Re a C/C++ version, I thought a little about that several days ago. The main bottleneck is that I need two special functions with fully complex arguments: the Bessel function I(nu,z) and the confluent hypergeometric function M(a,b,z). Googling, I could only locate one library supporting those, and it was not really set up for Windows, my current environment. If anybody knows of such a library that can easily be hooked into Visual Studio on Windows, I'd appreciate a link. (The usual suspects, Boost and GSL, don't support complex arguments, as far as I could tell.)

Yeah, typically a Fortran-era function that has not (yet) been ported to C++. It was another generation of mathematicians.

What about the following: use Kummer's series representation for the CHF and implement it using the facilities in Boost Math (I would assume that complex numbers are supported at that level, all things being equal). If not, it is not difficult to build an initial working model directly.
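To make the suggestion concrete, here is a minimal sketch of a truncated Kummer series M(a,b,z) = sum (a)_n z^n / ((b)_n n!) with std::complex arguments, using the term-ratio recurrence. The name kummerM, the term cap, and the tolerance are my own choices for illustration, not from any library; a production version would need care for large |z| and for b near non-positive integers.

```cpp
#include <complex>
#include <cstddef>

// Truncated Kummer series for the confluent hypergeometric function
// M(a,b,z) = sum_{n>=0} (a)_n z^n / ((b)_n n!), with complex arguments.
// Converges for all finite z, though slowly when |z| is large.
std::complex<double> kummerM(std::complex<double> a,
                             std::complex<double> b,
                             std::complex<double> z,
                             std::size_t maxTerms = 500,
                             double tol = 1e-15)
{
    std::complex<double> term(1.0, 0.0); // n = 0 term
    std::complex<double> sum = term;
    for (std::size_t n = 0; n < maxTerms; ++n)
    {
        // Ratio of consecutive terms: term_{n+1} = term_n * (a+n) z / ((b+n)(n+1))
        term *= (a + double(n)) * z / ((b + double(n)) * double(n + 1));
        sum += term;
        if (std::abs(term) < tol * std::abs(sum))
            break; // series has converged to the requested tolerance
    }
    return sum;
}
```

A quick sanity check is the identity M(1,1,z) = exp(z), which the truncated series reproduces to machine precision for moderate |z|.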

http://www.boost.org/doc/libs/1_54_0/li ... ation.html
Equation (12) looks doable IMO

http://www.ece.mtu.edu/faculty/wfp/arti ... l_math.pdf
And parallelism is clearly needed (e.g. do you know the Visual Studio PPL library? Parallel aggregation would suit (12) and is very easy to use), and possibly (Boost) multiprecision. Even easier is a one-liner in OpenMP 2.0 (shipped with Visual Studio) with a thread-safe reduction variable to parallelise the loop (for example, you can get a speedup of 5 on an 8-core machine for two-factor option pricing with ADE/FDM). BTW, does your GPU support double precision and C99 maths?

Here is the OpenMP piece as a file...

In 2016, two of my MSc students used the C++ AMP library for GPU work, and it is much easier to use than the CUDA interface. If you decide to go down the GPU road, I would advise C++ AMP in the short term, unless there are compelling reasons not to. My contacts say it takes 3-4 months to learn CUDA; the learning curve with C++ AMP is much shorter.

I have public-domain theses covering both CUDA and C++ AMP. Give me a shout if you are interested in copies.

Of course, parallel multicore + GPU on one machine also offers possibilities.

Then load balancing is the name of the game.

At the end of the day, the peculiarities of the algorithm determine which design pattern and platform are most suitable.