- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

At this stage, what are the features to include in the corresponding software system?The end-result is a non-proprietary framework based on new generic design patterns and Boost for the plumbing and which can be customised/tailored for various kinds of models.1, 2 and 3 factor models to start with.C++, C# and Excel interop. Thanks for your input.

Last edited by Cuchulainn on October 8th, 2011, 10:00 pm, edited 1 time in total.

Step over the gap, not into it. Watch the space between platform and train.

http://www.datasimfinancial.com

http://www.datasim.nl

http://www.datasimfinancial.com

http://www.datasim.nl

Well, here is a generic one (spatial) factor system that I wish I could solve *much* more rapidly than in Mathematica.It is essentially eqn (2.19), pg 52 in my book. On the positive half-line, x in (0,infinity), we want f(t,x) = f(t,x;z) which solvesf_t = a(x) f_xx + b(x;z) f_x - c(x;z) f, where a(x) > 0 throughout the interior, and Re[c] > 0.f(0,x) = 1f(t,infinity) = 0.Often there are no boundary conditions for f(t,x=0). Here z is a complex parameter.So, some requirements that follow from this problem:1. Ability to handle semi-infinite domains, perhaps with an automatic transformation from x to X(x) = x/(x + x0) in (0,1)2. Ability to have complex-valued pde coefs (except for a(x)) -- same as my previous request for complex number support.3. Ability to *not* specify a boundary condition at a boundary and the software tries to handle it in a reasonable way.The reason this needs to be very fast is that the output f(t,x;z) is then fed to an integration routine over z.Let's say that integration routine uses 100 z-values and results in a call option value C(parms). Here parms is a short list of (real) parameters besides z (example in my p.s.) So to getC(parms) requires 100 calls to the pde solver. Then, let's say we are doing a calibration over parameterswith 50 options and need to change the parameters 100 times. Net result: the pde solver is called 500,000 times.So this leads to another requirement.4. FAST!! > 500,000 calls to the 1 factor solver are possible in, say, < 1hr. [And no memory leaks]My wish list --------------------------------------------------------------------------------------------------------------------------------------------------------p.s. Here is a specific example I wish I could handle as described, but can't because NDSolve in Mathematicais too slow. It is the stochastic vol. model called the GARCH diffusion, which hasa(x) = x^2b(x;z) = A - B x - i z rho x^(3/2)c(x;z) = 0.5 (z^2 - i z) xSo, besides z, the real parms are (A, B, rho), where I have set the vol. of vol. parm to 1.A typical range for z is z = u + i v, where v = 1/2 and u is any real. So Re[c] = 0.5 u^2 + 0.125 > 0, which keeps the pde problem very well-behaved.

Last edited by Alan on October 8th, 2011, 10:00 pm, edited 1 time in total.

If 1-factor solver is SIMD friendly then 0.01s to call 1D solver should be possible and it is very conservative estimate. Single threaded scalar throughput is about 3 GFlops. With 0.01s per call it gives 30M floating point operations to solve 1-factor. With 4 SIMD capable cores we get around 20-50 GFlops.

Last edited by renorm on October 9th, 2011, 10:00 pm, edited 1 time in total.

So, 500,000 solver calls in 5000 secs, or about 1.5 hrs. That would be wonderful for my purposes. For comparison, in Mathematica, NDSolve with a spatial grid of 100 points takes about 2/3 sec per solver call.So for 500,000 calls, we are talking about 93 hrs. You can see my problem! BTW, can anybody with Matlab solve my posted example, say with A = B = T = x0 = 1, rho = -0.5, z = 1 + 0.5 i,and report the estimated run-time for 500,000 pde solver calls in that environment? (Say 100 spatial grid points,and any reasonably small time step that looks like it is returning at least 5-6 good digits). To make it easy,you could keep the pde as it is, and just take f(t,xmax) = 0, say with xmax = 2 or something. We don'tneed an accurate result, just a timing estimate on a similar problem.

Last edited by Alan on October 9th, 2011, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

QuoteSo, some requirements that follow from this problem:1. Ability to handle semi-infinite domains, perhaps with an automatic transformation from x to X(x) = x/(x + x0) in (0,1)2. Ability to have complex-valued pde coefs (except for a(x)) -- same as my previous request for complex number support.3. Ability to *not* specify a boundary condition at a boundary and the software tries to handle it in a reasonable way.The reason this needs to be very fast is that the output f(t,x;z) is then fed to an integration routine over z.Let's say that integration routine uses 100 z-values and results in a call option value C(parms). Here parms is a short list of (real) parameters besides z (example in my p.s.) So to getC(parms) requires 100 calls to the pde solver. Then, let's say we are doing a calibration over parameterswith 50 options and need to change the parameters 100 times. Net result: the pde solver is called 500,000 times.So this leads to another requirement.4. FAST!! > 500,000 calls to the 1 factor solver are possible in, say, < 1hr. [And no memory leaks]Some initial feedback1A. No problem; V1 we could use hard-coded PDE and transformation; in V2 do functonal composition using C++ and bind to keep PDE explosion in check.2A. We have discussed this (3/2 model for CF). We have a templated PDE, no problem.3A. As long as I specify good interfaces we can experiment with several 'acceptable' BC.4A. For calibration, I would use ADE and *not* Crank Nicolson.(some results to follow...)

Last edited by Cuchulainn on October 9th, 2011, 10:00 pm, edited 1 time in total.

Step over the gap, not into it. Watch the space between platform and train.

http://www.datasimfinancial.com

http://www.datasim.nl

http://www.datasimfinancial.com

http://www.datasim.nl

Thanks, Daniel. If these run-time specs can be met, it would mean the GARCH diffusion can be calibrated.While a very natural stoch. vol model, it has never been calibrated to a full option chain, AFAIK, no doubt due tothis same run-time difficulty in getting option prices. More importantly, it would mean people couldmuch easier explore stoch. vol. models given as simply parameterized sde's, without the necessity of having a closed-formcharacteristic function -- that would be a major contribution to computational finance.

Last edited by Alan on October 9th, 2011, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

I took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.

Last edited by Cuchulainn on October 9th, 2011, 10:00 pm, edited 1 time in total.

Step over the gap, not into it. Watch the space between platform and train.

http://www.datasimfinancial.com

http://www.datasim.nl

http://www.datasimfinancial.com

http://www.datasim.nl

- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

For this problem and way of implementation, C# is [40,60]% slower than the C++ solution.

Last edited by Cuchulainn on October 9th, 2011, 10:00 pm, edited 1 time in total.

http://www.datasimfinancial.com

http://www.datasim.nl

QuoteOriginally posted by: CuchulainnI took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.Sounds very encouraging. Yes, multi-threads for calibration would also be very nice and the problem is well-suitedto 6-10 cores.

- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

QuoteOriginally posted by: AlanQuoteOriginally posted by: CuchulainnI took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.Sounds very encouraging. Yes, multi-threads for calibration would also be very nice and the problem is well-suitedto 6-10 cores.Full test in serial modeADE (B&C) 3421 secs.Full implicit 4873CN 6429I guess this is why LU methods and calibration don't mix well?If ADE (Saul'yev) used, I reckon [2000,3000] second, serial mode.Alan,Can you paraphrase/describe the use case of this process in layman's tems? in particular, what is shared read-only data etc. Roll on, parallel processing.

Last edited by Cuchulainn on October 10th, 2011, 10:00 pm, edited 1 time in total.

http://www.datasimfinancial.com

http://www.datasim.nl

QuoteOriginally posted by: CuchulainnCan you paraphrase/describe the use case of this process in layman's tems? in particular, what is shared read-only data etc. Roll on, parallel processing.Do you mean (i) the "calibration process" (and the use of 6 cores), or (ii) the "option valuation process" (pde + complex integration to get the option value), or(iii) the "stochastic process" (the GARCH diffusion)?

Last edited by Alan on October 10th, 2011, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62869**Joined:****Location:**Amsterdam-
**Contact:**

QuoteOriginally posted by: AlanQuoteOriginally posted by: CuchulainnCan you paraphrase/describe the use case of this process in layman's tems? in particular, what is shared read-only data etc. Roll on, parallel processing.Do you mean (i) the "calibration process" (and the use of 6 cores), or (ii) the "option valuation process" (pde + complex integration to get the option value), or(iii) the "stochastic process" (the GARCH diffusion)?Just a general description of the steps in getting from input to output.

http://www.datasimfinancial.com

http://www.datasim.nl

Well, for an option chain calibration against a model (in this case the GARCH diffusion) you need to first have a a valuation function in place.The valuation function we are discussing involves running your pde solver, to get a characteristic function, which is passed to an integration routine. That valuation procedure gives you one option value.Then, for a calibration, say against an option chain with 200 options, you run an optimizer that minimizes an objective function.For example, the objective could be the mean aboslute error between the market implied volatilties and the model implied volatilitites.The implied volatilities are calculated from the prices (market and model).The minimization is over the parameters of the model, in this case the (A,B,rho) that I posted earlier.So, to sum up, the input is a set of marketplace option prices.The out put is a set of calibrated model parameters.

Mathematica offers both local and global optimizers. For my purposes, the local nonlinear constrained optimizer is fine, andruns what they call an interior point method. I could point you to some terse implementation notes for it if you really want.So, for my purposes I just need QFCL to offer a similar interior point method, which I assume will be gotten from an optimizer codelibrary somewhere.

Last edited by Alan on October 10th, 2011, 10:00 pm, edited 1 time in total.

GZIP: On