Page 1 of 7

Posted: October 9th, 2011, 6:45 pm
At this stage, what are the features to include in the corresponding software system?The end-result is a non-proprietary framework based on new generic design patterns and Boost for the plumbing and which can be customised/tailored for various kinds of models.1, 2 and 3 factor models to start with.C++, C# and Excel interop. Thanks for your input.

Posted: October 9th, 2011, 7:58 pm
Well, here is a generic one (spatial) factor system that I wish I could solve *much* more rapidly than in Mathematica.It is essentially eqn (2.19), pg 52 in my book. On the positive half-line, x in (0,infinity), we want f(t,x) = f(t,x;z) which solvesf_t = a(x) f_xx + b(x;z) f_x - c(x;z) f, where a(x) > 0 throughout the interior, and Re[c] > 0.f(0,x) = 1f(t,infinity) = 0.Often there are no boundary conditions for f(t,x=0). Here z is a complex parameter.So, some requirements that follow from this problem:1. Ability to handle semi-infinite domains, perhaps with an automatic transformation from x to X(x) = x/(x + x0) in (0,1)2. Ability to have complex-valued pde coefs (except for a(x)) -- same as my previous request for complex number support.3. Ability to *not* specify a boundary condition at a boundary and the software tries to handle it in a reasonable way.The reason this needs to be very fast is that the output f(t,x;z) is then fed to an integration routine over z.Let's say that integration routine uses 100 z-values and results in a call option value C(parms). Here parms is a short list of (real) parameters besides z (example in my p.s.) So to getC(parms) requires 100 calls to the pde solver. Then, let's say we are doing a calibration over parameterswith 50 options and need to change the parameters 100 times. Net result: the pde solver is called 500,000 times.So this leads to another requirement.4. FAST!! > 500,000 calls to the 1 factor solver are possible in, say, < 1hr. [And no memory leaks]My wish list --------------------------------------------------------------------------------------------------------------------------------------------------------p.s. Here is a specific example I wish I could handle as described, but can't because NDSolve in Mathematicais too slow. It is the stochastic vol. model called the GARCH diffusion, which hasa(x) = x^2b(x;z) = A - B x - i z rho x^(3/2)c(x;z) = 0.5 (z^2 - i z) xSo, besides z, the real parms are (A, B, rho), where I have set the vol. of vol. parm to 1.A typical range for z is z = u + i v, where v = 1/2 and u is any real. So Re[c] = 0.5 u^2 + 0.125 > 0, which keeps the pde problem very well-behaved.

Posted: October 9th, 2011, 10:16 pm
If 1-factor solver is SIMD friendly then 0.01s to call 1D solver should be possible and it is very conservative estimate. Single threaded scalar throughput is about 3 GFlops. With 0.01s per call it gives 30M floating point operations to solve 1-factor. With 4 SIMD capable cores we get around 20-50 GFlops.

Posted: October 9th, 2011, 10:29 pm
So, 500,000 solver calls in 5000 secs, or about 1.5 hrs. That would be wonderful for my purposes. For comparison, in Mathematica, NDSolve with a spatial grid of 100 points takes about 2/3 sec per solver call.So for 500,000 calls, we are talking about 93 hrs. You can see my problem! BTW, can anybody with Matlab solve my posted example, say with A = B = T = x0 = 1, rho = -0.5, z = 1 + 0.5 i,and report the estimated run-time for 500,000 pde solver calls in that environment? (Say 100 spatial grid points,and any reasonably small time step that looks like it is returning at least 5-6 good digits). To make it easy,you could keep the pde as it is, and just take f(t,xmax) = 0, say with xmax = 2 or something. We don'tneed an accurate result, just a timing estimate on a similar problem.

Posted: October 10th, 2011, 3:52 pm
QuoteSo, some requirements that follow from this problem:1. Ability to handle semi-infinite domains, perhaps with an automatic transformation from x to X(x) = x/(x + x0) in (0,1)2. Ability to have complex-valued pde coefs (except for a(x)) -- same as my previous request for complex number support.3. Ability to *not* specify a boundary condition at a boundary and the software tries to handle it in a reasonable way.The reason this needs to be very fast is that the output f(t,x;z) is then fed to an integration routine over z.Let's say that integration routine uses 100 z-values and results in a call option value C(parms). Here parms is a short list of (real) parameters besides z (example in my p.s.) So to getC(parms) requires 100 calls to the pde solver. Then, let's say we are doing a calibration over parameterswith 50 options and need to change the parameters 100 times. Net result: the pde solver is called 500,000 times.So this leads to another requirement.4. FAST!! > 500,000 calls to the 1 factor solver are possible in, say, < 1hr. [And no memory leaks]Some initial feedback1A. No problem; V1 we could use hard-coded PDE and transformation; in V2 do functonal composition using C++ and bind to keep PDE explosion in check.2A. We have discussed this (3/2 model for CF). We have a templated PDE, no problem.3A. As long as I specify good interfaces we can experiment with several 'acceptable' BC.4A. For calibration, I would use ADE and *not* Crank Nicolson.(some results to follow...)

Posted: October 10th, 2011, 5:58 pm
Thanks, Daniel. If these run-time specs can be met, it would mean the GARCH diffusion can be calibrated.While a very natural stoch. vol model, it has never been calibrated to a full option chain, AFAIK, no doubt due tothis same run-time difficulty in getting option prices. More importantly, it would mean people couldmuch easier explore stoch. vol. models given as simply parameterized sde's, without the necessity of having a closed-formcharacteristic function -- that would be a major contribution to computational finance.

Posted: October 10th, 2011, 6:22 pm
I took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.

Posted: October 10th, 2011, 6:54 pm
For this problem and way of implementation, C# is [40,60]% slower than the C++ solution.

Posted: October 10th, 2011, 9:04 pm
QuoteOriginally posted by: CuchulainnI took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.Sounds very encouraging. Yes, multi-threads for calibration would also be very nice and the problem is well-suitedto 6-10 cores.

Posted: October 11th, 2011, 8:59 am
QuoteOriginally posted by: AlanQuoteOriginally posted by: CuchulainnI took a 1-factor PDE with domain transformatiion and NT = NY = 100. And 1/2 million calls. Kind of similar to your benchnark example.C++ADE 56 minutes (my V1 scheme was built for flexibility, not speed).CN (CTRL C after 1.5 hours ...)My test was rough and ready on a laptop and I did not use any of rnorm's 4-letter speedup tricks. A multi-threaded design would give even better speedup.For complex coefficients, it might be a bit slower, but on a bigger machine + ADE it would be fine I suppose.Even the serial FDM I could get even faster. Just strip down to 10 lines of ADE and some SIMD. Indeed, better to solve for CF by a PDE.Sounds very encouraging. Yes, multi-threads for calibration would also be very nice and the problem is well-suitedto 6-10 cores.Full test in serial modeADE (B&C) 3421 secs.Full implicit 4873CN 6429I guess this is why LU methods and calibration don't mix well?If ADE (Saul'yev) used, I reckon [2000,3000] second, serial mode.Alan,Can you paraphrase/describe the use case of this process in layman's tems? in particular, what is shared read-only data etc. Roll on, parallel processing.

Posted: October 11th, 2011, 12:58 pm
QuoteOriginally posted by: CuchulainnCan you paraphrase/describe the use case of this process in layman's tems? in particular, what is shared read-only data etc. Roll on, parallel processing.Do you mean (i) the "calibration process" (and the use of 6 cores), or (ii) the "option valuation process" (pde + complex integration to get the option value), or(iii) the "stochastic process" (the GARCH diffusion)?