Cuchulainn
Posts: 17286
Joined: July 16th, 2004, 7:38 am
Location: Lviv

Re: 100 millions time faster than ODE methods

How can someone even write 150K lines of proper code?
I wrote much more than 150K lines of code :/
Over a period of 20 years.
In general, productivity is about 20 LOC per programmer per week.

For pacemakers, it is one LOC per 3 weeks. A typical pricing library is 3-5 million LOC.

In the paper-tape/punch-card era

$salary = \alpha LOC + \beta$

The big swingers were those with boxes of punch cards.

planetoid 17300 == 2002 AV_63

Cuchulainn

How can someone even write 150K lines of proper code?
update: the code for the last version was 1,000 lines (1 KLOC). hth Scratch code a factor of 10.
Very clever.

“The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. The only question is whether to plan in advance to build a throwaway, or to promise to deliver the throwaway to customers.”

― Frederick P. Brooks Jr., The Mythical Man-Month: Essays on Software Engineering

Cuchulainn

It's like learning to play the violin. Ten thousand hours of practice, but then you can rattle through all four seasons in minutes.

But 'ukulele is still better.
What's easier?
1. tuning a ukulele?
2. tuning an ANN?

JohnLeM
Topic Author
Posts: 378
Joined: September 16th, 2008, 7:15 pm

Ouch. I think I picked the wrong formula for my salary. It seems more like
$salary = \frac{\alpha}{\beta + LOC}$

Cuchulainn

Here's an article claiming only ONE million times faster (in the small print, it takes 1 week to generate the training data on 24 cores).

https://arxiv.org/abs/1809.02233

It's all rather superficial and more of a description than an explanation, alas. It is 90% text.
The detailed algorithm is missing; they could write that it is TWO million times faster and I would still believe them.

//
E, W., J. Han, and A. Jentzen (2017, June). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. ArXiv e-prints.

Is this for real?? Someone with surname "E"???

Seems these are Princeton men.

Billy7
Posts: 262
Joined: March 30th, 2016, 2:12 pm

This is a trailer from a recent very good thesis on Heston and Rough Heston. Instead of ANN being $10^4$ faster, in this case it is $[8,17]$ times slower.

Back to this old favorite forum after a few years.

Being on the traditional side myself, I've been dismissing NN's; like many others, I get an allergic reaction to everything that feels hyped up. "100 millions times faster!" Lol. "NN's don't suffer from the curse of dimensionality!" Yeah right... "Every function can be approximated, the Universal Approximation Theorem says so, why are you wasting your time on complicated PDE methods?", a younger colleague of mine said. "I believe you," I said, "but to get the necessary accuracy, how many years must I spend training it?" This was my instinctive reaction. But during the last few months I've had lots of free time (and no social life) and finally gave this a try. Now I think I have a more balanced view of their applicability (but I'm still learning).

Checking for opinions, I was not surprised that most here share my initial suspicion. Having said all that, I think this thesis arrives at the wrong conclusion.
This is easy to check just by looking at the first and simplest case (Heston option pricing). The network used had 3 hidden layers of 50 nodes each, with ELU as the activation function. This means that to get a price one just needs to perform a couple of 50 x 50 matrix multiplications and 150 exp calculations. It is stated that this took 58ms, which tells me there was something seriously wrong with the implementation. The correct timing should've been about the same number but in microsecs. I actually checked this on my PC and, adjusting for the single-core performance difference between my CPU and the one used in the thesis, I reckon the ANN should've timed at no more than 0.15ms, which would mean the NN was actually about 60 times faster than the "traditional" method.
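
To put rough numbers on that cost estimate, here is a minimal sketch that counts the operations in one forward pass. The 3 hidden layers of 50 ELU nodes come from the post; the 10 inputs and the single linear output node are assumptions, and the thesis network may differ.

```python
# Rough operation count for one forward pass of a small dense network.
# Assumptions (not confirmed by the thesis): 10 inputs, 1 linear output node.
layer_sizes = [10, 50, 50, 50, 1]

def count_ops(sizes):
    """Return (multiply_adds, activations) for a fully connected network."""
    multiply_adds = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    activations = sum(sizes[1:-1])  # one ELU evaluation per hidden node
    return multiply_adds, activations

macs, elus = count_ops(layer_sizes)
print(f"multiply-adds: {macs}, ELU evaluations: {elus}")
# prints: multiply-adds: 5550, ELU evaluations: 150
```

A few thousand multiply-adds plus 150 exponentials is typically microsecond-scale work on one CPU core, so a millisecond-scale timing is more likely to reflect framework overhead than arithmetic.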

Besides, this was one of the most unfavorable comparisons for NN's. What about approximating the price of some exotic that requires a 2D PDE scheme, or Monte Carlo? That can take quite a bit longer than the method used in this thesis, and the difference an NN approximation would make in such cases (if successful) would be far greater than 60x.

So I wouldn't dismiss the hype just yet. The issues seem to be more how to guarantee accuracy everywhere in the parameter hyperspace (not just looking at average errors) and a perceived lack of transparency.

Alan
Posts: 2483
Joined: December 19th, 2001, 4:01 am
Location: California

Welcome back.

I would like to see a simple interpolating table lookup in the horserace. Take Heston option pricing, rounding the traditional method to 10ms. With 1000 secs of "table building", your (in-memory) table can have 100,000 entries. With 4 parameters, that's about 18 entries per parameter. Presumably the execution time to create an interpolated option value from the table is much faster than the best ANN times you mention. How does the accuracy compare?

Alan

Sorry -- really 6 parameters, I suppose: 4 V-process parameters + T and moneyness for the option.
So we're down to about 7 axis values per parameter.

Still, it might be an interesting comparison, maybe again using 10 values per parameter and allowing the extra time to make the million-entry table. (And we haven't even parallelized that!)

Billy7

Hi Alan, thanks, always good talking to you.

Why 6 parameters though and not 9? Namely S/K, T, r, d, v0, vBar, kappa, xi and rho. Don't we need all these to price a vanilla under Heston?

As for this 1000 secs, it's an arbitrary (curiously low) number that the student used. Since you only need to do this once, why be so stingy? What's stopping you from letting it run for a whole day or more? So you could create 10 million entries if you want, or more. But with 9 dimensions here, that would still only give you 6 points per parameter. I doubt that can cut it for any practical level of accuracy.
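
The grid arithmetic in this exchange is easy to reproduce. A minimal sketch, assuming a full Cartesian grid with the same number of points on every axis (the function name is illustrative):

```python
# Points per axis of a full Cartesian grid: n = entries ** (1 / dims).
def points_per_axis(entries, dims):
    return entries ** (1.0 / dims)

budget_secs, secs_per_price = 1000.0, 0.010    # 10 ms per "traditional" price
entries = round(budget_secs / secs_per_price)  # 100,000 table entries

for n_entries, dims in [(entries, 4), (entries, 6), (10**6, 6), (10**7, 9)]:
    print(f"{n_entries:>10,} entries, {dims} parameters: "
          f"{points_per_axis(n_entries, dims):.1f} points per axis")
```

This gives roughly 18, 7, 10 and 6 points per axis, matching the figures quoted in the thread. Because the count grows only as the d-th root of the table size, each extra parameter is expensive.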

Of course there are more sophisticated ways of interpolating in high dimensions, one idea being not to use a Cartesian grid, but rather to place the interpolating points à la Monte Carlo, or via quasi-random sequences. There was a relevant talk last week at the CQF conference, probably available online if you're interested; it was called something like "Alternatives to NN's in Finance".
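
As an illustration of the quasi-random idea, here is a minimal pure-Python sketch of a Halton sequence used to place nodes in a parameter box. Halton is just one common low-discrepancy sequence, picked here as an assumption; the talk may well have used something else, and the parameter ranges below are hypothetical.

```python
# Halton low-discrepancy points in the unit hypercube (pure Python).
PRIMES = [2, 3, 5, 7, 11, 13, 17]  # one base per dimension (up to 7 here)

def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points, dims):
    """Return n_points Halton points, skipping index 0 (the origin)."""
    return [[radical_inverse(i, PRIMES[d]) for d in range(dims)]
            for i in range(1, n_points + 1)]

def scale(point, bounds):
    """Map a unit-cube point onto per-parameter (low, high) ranges."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(point, bounds)]

# Hypothetical ranges for two parameters: T in [0.1, 2], rho in [-0.9, 0]
bounds = [(0.1, 2.0), (-0.9, 0.0)]
nodes = [scale(p, bounds) for p in halton(5, len(bounds))]
for node in nodes:
    print(node)
```

Unlike a Cartesian grid, the number of nodes can be chosen freely, but interpolating between scattered nodes needs a scattered-data method (radial basis functions, for example), which is part of the extra implementation effort.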

I have no idea how these alternative interpolating methods would compare with the NN's black magic. But they sure sounded more complicated to implement (though more "transparent"). And I may be wrong, but I got the impression that they wouldn't be an option for dimension > 10 (I wasn't paying 100% attention, I admit).

Alan

Same here Yiannis.

Oh, I forgot about v0. But you can collapse the carry stuff into X = log(S/K) + (r-d)T. Then a normalized call value is
c = C/(K e^(-r T)) = f(X, T, v0, vBar, kappa, xi, rho), so maybe the final answer is 7 for tables? Might (barely) be doable with acceptable accuracy.
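
The collapse can be checked numerically. The sketch below uses Black-Scholes as a stand-in pricer, since a Heston pricer would be too long here; that substitution is an assumption made for illustration, and all input values are hypothetical. Two contracts with different S, K, r and d but the same X and T should give the same normalized value c.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, d, sigma, T):
    """Black-Scholes call with dividend yield d (stand-in for a Heston pricer)."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r - d + 0.5 * sigma**2) * T) / sd
    d2 = d1 - sd
    return S * math.exp(-d * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def normalized_call(S, K, r, d, sigma, T):
    """c = C / (K e^(-r T)), the quantity one would tabulate against X."""
    return bs_call(S, K, r, d, sigma, T) / (K * math.exp(-r * T))

def carry_adjusted_moneyness(S, K, r, d, T):
    """X = log(S/K) + (r - d) T."""
    return math.log(S / K) + (r - d) * T

T, sigma = 1.5, 0.25
a = dict(S=100.0, K=110.0, r=0.04, d=0.01)
X = carry_adjusted_moneyness(T=T, **a)

# Second contract: different K, r, d; S is solved for so that X matches.
b = dict(K=50.0, r=0.00, d=0.02)
b["S"] = b["K"] * math.exp(X - (b["r"] - b["d"]) * T)

c_a = normalized_call(sigma=sigma, T=T, **a)
c_b = normalized_call(sigma=sigma, T=T, **b)
print(f"X = {X:.6f}, c_a = {c_a:.10f}, c_b = {c_b:.10f}")
```

The two normalized values agree to rounding error, so S, K, r and d enter only through X, which is what removes two table dimensions.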

Agree that 10+ parameters would be quite problematic for tables.

Billy7

Thanks Alan, I had not thought of applying any transformation. Yes, it could maybe cut it, barely as you say. It would be nice if anyone here is willing to try it; I have enough things to experiment with. But I am not expecting it to be sufficiently accurate to use for, say, a calibration. And as I said, such a coarse uniform grid would just not have enough resolution in areas of high curvature. My guess is that it would need to become fairly sophisticated to give acceptable results.
With NN's, on the other hand, you can easily get pretty good results using Python libraries like TensorFlow. If you have the time to experiment with all the possible combinations of set-ups, that is.

cko22
Posts: 1
Joined: June 26th, 2020, 6:13 pm

I am the author of the thesis. Thanks Billy7 for your comments.

Having read your comments, I re-ran the experiment, and it gave similar results. I have included a code snippet here. I would appreciate it if you could elaborate on your implementation. Did you also implement the traditional method? How long does it take you to compute an option price using the traditional method?
#### Measure ANN prediction time
import time

import pandas as pd

# `model` is the trained network, assumed to be loaded earlier in the script.

# Samples: spot_price, strike_price, risk_free_rate, dividend_yield, initial_vol, maturity_time,
#          long_term_vol, mean_reversion_rate, vol_vol, price_vol_corr
samples = pd.DataFrame([31.426, 42.533, 0.0423, 0, 0.05962, 1.5632, 0.5066, 0.9461, 0.4976, -0.4763])

def main():
    # Time a single call; samples.T has shape (1, 10), i.e. one row of 10 features
    tic = time.perf_counter()
    model.predict(samples.T)
    toc = time.perf_counter()
    print(f"Heston Option Pricing ANN prediction takes {toc - tic:0.4f} seconds")

if __name__ == "__main__":
    main()
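
A single timed call can include one-off overhead and is noisy, so a comparison is usually more stable with warm-up calls and a median over many repeats. A minimal sketch of such a harness follows; `time_call` and `dummy_pricer` are illustrative names and the workload is a stand-in, not the thesis code.

```python
import statistics
import time

def time_call(fn, repeats=200, warmup=10):
    """Return the median wall-clock seconds of fn() over `repeats` calls."""
    for _ in range(warmup):  # discard the first calls (caches, lazy setup)
        fn()
    timings = []
    for _ in range(repeats):
        tic = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - tic)
    return statistics.median(timings)

# Stand-in workload; in practice this could be
# lambda: model.predict(samples.T) or the traditional pricer.
def dummy_pricer():
    return sum(i * i for i in range(1000))

median_secs = time_call(dummy_pricer)
print(f"median time per call: {median_secs * 1e6:.1f} microseconds")
```

Running both the network and the traditional pricer through the same harness keeps the comparison like for like.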

Billy7

Hi there and welcome to the forum, good of you to reply!
That should give you a big speed up already:)
Then you could also just get the weight matrices and bias vectors from the saved model and perform the few matrix multiplications yourself using numpy.
That's what I did, but I also did it in C++ for a bit extra (which I don't think added anything really, since numpy matrix multiplications are very fast already).
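
A minimal sketch of what such a hand-rolled forward pass could look like, assuming 10 inputs, 3 hidden layers of 50 ELU nodes and a linear output. The weights here are random stand-ins, so the output is not a price. With a Keras model the arrays could presumably be taken from `model.get_weights()`, which for Dense layers alternates kernel and bias arrays, and any input scaling used in training would have to be applied too.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def forward(x, weights, biases):
    """Dense forward pass: ELU on hidden layers, linear output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = elu(a @ W + b)
    return a @ weights[-1] + biases[-1]

# Random stand-ins for the trained parameters (assumption: layer sizes below).
rng = np.random.default_rng(0)
sizes = [10, 50, 50, 50, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

# The sample row from the thread, shape (1, 10)
x = np.array([[31.426, 42.533, 0.0423, 0, 0.05962, 1.5632,
               0.5066, 0.9461, 0.4976, -0.4763]])
out = forward(x, weights, biases)
print(out.shape)  # (1, 1)
```

Timing `forward` directly, rather than going through the framework's prediction call, measures the arithmetic itself.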

No, I did nothing with the traditional method, no time :) I just looked up your processor, compared it with mine and made an adjustment to my timing (which was really 70 microsecs, so I thought it could be about 150 microsecs on your CPU). I read in your thesis that the code for the traditional method was given to you by your supervisor, and I imagine it is an optimized piece of C++ code, right? So in order to make a fair comparison with the net, you should be using an equally optimized version of the net calculation, or at least not the slowest available (which is model.predict).

snowde
Posts: 9
Joined: December 14th, 2021, 4:15 pm

One of my colleagues at The Turing has taken this even further with Neural-SDEs https://arxiv.org/abs/2007.04154