Got something working just now (before going to the dentist, but I need to go to my client now).
* I'm using float instead of double. I expect double to be about twice as slow. I picked float because I think it's interesting, and the example I modified was in float. I can do double too.
* I used 20 iterations of algorithm 3.2b. It didn't converge; I think there is a bug somewhere, and the results are wrong too. The number of multiplications / divisions probably won't change much when I fix it.
* Tested it on a low-end NVIDIA GTX 970 with 1664 cores.
For this case:
M[ 1.000000, 4.000000, 0.000000 + 50.000000i ] -> 2.99E9 evaluations / sec
1. The 101 case was exp(5, 0) = exp(5 + 0i), not exp(0, 50). This is because your case explodes with this algorithm (the article documents why it explodes). For exp(0, 50) only MP works.
2. float is faster but useless for 3.2b. For exp(0,50) even double blows up (MP needed).
3. Can you give running times for N = 10^6, ..., 10^9? I have these at hand myself for comparison.
4. I attach my code for the different algos for comparison. (If you have improvements etc., let me know.)
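
To illustrate point 2, here is a minimal Python sketch of why exp(0, 50) is so much harder than exp(5, 0). It is only an assumption-laden stand-in: it uses a plain Taylor series, not algorithm 3.2b, and exact rationals from `fractions` in place of a real MP library. The function names are made up for this example. The largest terms of the series for exp(50i) are around 50^50/50! ≈ 3e20 and alternate in sign, so in double the rounding errors of those terms swamp a result whose magnitude is 1.

```python
import cmath
from fractions import Fraction

def exp_taylor_float(z, terms=200):
    """Naive Taylor sum of exp(z) in double-precision complex arithmetic."""
    total = 0j
    term = 1 + 0j                      # z**0 / 0!
    for k in range(terms):
        total += term
        term = term * z / (k + 1)      # next term: z**(k+1) / (k+1)!
    return total

def exp_taylor_exact(re, im, terms=200):
    """Same series with exact rationals (stand-in for MP), rounded once at the end."""
    z_re, z_im = Fraction(re), Fraction(im)
    sum_re, sum_im = Fraction(0), Fraction(0)
    t_re, t_im = Fraction(1), Fraction(0)
    for k in range(terms):
        sum_re += t_re
        sum_im += t_im
        # complex multiply by z, then divide by (k + 1)
        t_re, t_im = ((t_re * z_re - t_im * z_im) / (k + 1),
                      (t_re * z_im + t_im * z_re) / (k + 1))
    return complex(float(sum_re), float(sum_im))

ref = cmath.exp(50j)                   # library value, used as reference
print("double, exp(50i):", abs(exp_taylor_float(50j) - ref))
print("exact,  exp(50i):", abs(exp_taylor_exact(0, 50) - ref))
print("double, exp(5):  ", abs(exp_taylor_float(5 + 0j) - cmath.exp(5)))
```

For exp(5 + 0i) all terms are positive, so nothing cancels and double is fine; for exp(50i) the same loop in double loses every digit, while the exact version recovers the reference value.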