Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

MSc Theses on Machine Learning and Computational Finance

October 7th, 2019, 2:59 pm

Background
I have been an external supervisor at the University of Birmingham since 2014 for MSc students who research and produce a thesis in the summer months June-September. The focus is on analysing a financial model, approximating it using numerical methods and then producing working code in C++ (or Python). In short, students have three months to produce useful results.
We present two theses which we hope are of interest to those working in computational finance. Their results can be generalised to other related problems. The topics include:
  • The Heston model (analytical solution, Yanenko splitting and Alternating Direction Explicit (ADE) methods).
  • Using (Gaussian) Radial Basis Functions instead of traditional Backpropagation to compute neural network weights.
  • A mathematical, numerical and computational analysis of the Continuous Sensitivity Equation (CSE) method.
  • Parallel software design for ML/PDE applications.
A number of these methods would be at PhD level. If you have any queries, please do not hesitate to contact us.

Daniel J. Duffy is a mathematician, software designer and coach/trainer. He has a PhD from the University of Dublin, Ireland (Trinity College).


DALVIR MANDARA
This thesis is concerned with the application of artificial neural networks (ANNs) to pricing options under the Black-Scholes (BS) model and with an ANN-based framework for predicting the implied volatility generated by the SABR stochastic volatility model. The experiments show that the ANN architecture is able to predict the price of call options under BS as well as to predict individual implied volatilities and implied volatility surfaces under the normal and log-normal SABR models.
An important part of the thesis is the application of the promising image-based implicit method (Horvath et al. 2019), which uses a grid of output values in contrast to traditional neural network setups in which input vectors map to a single output. The thesis discusses the benefits of this new method, learning non-linear relationships being one of them.
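By way of illustration, a minimal Keras sketch of such a grid-valued ("image-based") network is given below; the layer sizes, the 8x8 grid and the random stand-in data are illustrative assumptions only, not the architecture or data used in the thesis.

import numpy as np
import tensorflow as tf

# Hypothetical dimensions: a few SABR-style inputs map to a whole 8x8 grid of
# implied volatilities (flattened), rather than to a single output value.
n_inputs, n_strikes, n_maturities = 4, 8, 8

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_inputs,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(n_strikes * n_maturities)   # the surface as one output vector
])
model.compile(optimizer='adam', loss='mse')

# Random stand-in data; in practice the targets would be SABR-generated surfaces.
X = np.random.rand(1024, n_inputs)
Y = np.random.rand(1024, n_strikes * n_maturities)
model.fit(X, Y, epochs=2, batch_size=64, verbose=0)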
This thesis can be seen as a serious work on applying and integrating machine learning and computational finance. It is very well written, and all the supervisor's suggestions (cross-validation, K-folds, Figure 0.1) were taken on board in a timely fashion. Some of the results are original and future research is a possibility. The programming language used is Python.
The main result is to show how ML can be used for both call and put option pricing and implied volatility calculation. It is of interest to quantitative analysts and developers.


MATT ROBINSON
This thesis introduces and elaborates on how to approximate option and bond price sensitivities (Black-Scholes and Cox-Ingersoll-Ross (CIR) models) in a variety of ways. In general, the option price depends on time and on the underlying stock variables as well as on a number of parameters such as volatility, interest rate and strike. The rate of change of the option price with respect to these quantities is computed (in the main, the first and second derivatives).
The thesis discusses a wide range of techniques for computing sensitivities. For example, if an analytic expression for the option price is known, then we can differentiate the formula or apply the Complex Step Method (CSM) to compute the sensitivity. Another popular method is Automatic Differentiation (AD). Continuing, it is possible to discretise the PDE to compute an approximate option price as an array and, from there, compute the option delta and gamma using divided differences or cubic splines. For other sensitivities (such as vega) this approach does not work, and then the Continuous Sensitivity Equation (CSE) method is used, which allows us to write the sensitivity as the solution of an initial boundary value problem for a Black-Scholes-type PDE. The student also discovered new research topics as the project progressed, such as the well-posedness of the PDEs resulting from the CSE and cases in which a PDE can have multiple solutions.
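As a quick illustration of the Complex Step Method (in Python for brevity; the thesis itself uses C++11), here is a sketch that differentiates the analytic Black-Scholes formula with respect to volatility. The formula, parameter values and the use of scipy's complex-capable erf are illustrative choices, not the thesis code.

import numpy as np
from scipy.special import erf

def norm_cdf(x):
    # Standard normal CDF written via erf so that complex arguments also work
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    # Black-Scholes call price; every operation here is complex-safe
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm_cdf(d1) - K * np.exp(-r * T) * norm_cdf(d2)

def complex_step(f, x, h=1.0e-20):
    # CSM: f'(x) ~ Im(f(x + i*h)) / h, with no subtractive cancellation
    return np.imag(f(x + 1j * h)) / h

S, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
vega = complex_step(lambda v: bs_call(S, K, r, v, T), sigma)
print(vega)   # close to the analytic vega S * phi(d1) * sqrt(T)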
The thesis is well written; the topics have been properly researched and documented. The programming language used is C++11, and the design patterns and state-of-the-art methods for PDE/FDM in the book Financial Instrument Pricing Using C++, 2nd edition, 2018 (John Wiley) are applied and extended.
The main result is to show how PDE models can be used to calculate option sensitivities using a range of robust and accurate numerical methods.

(A third thesis is concerned with the application of artificial neural networks (ANNs) to pricing options under the Black-Scholes (BS) model and the Heston stochastic volatility model. In both models the analytical solution is used to produce the training data. It will be published elsewhere.)
See below for where to download these two theses.

Dalvir Mandara Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model

Matt Robinson Sensitivities: A Numerical Approach

The full text is also to be found here:
Blogs :: Datasim
 
JohnLeM
Posts: 323
Joined: September 16th, 2008, 7:15 pm

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 7:57 am

Cuchulainn,
Hello. I have read the first one: Dalvir Mandara, Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model.


First, congratulations to the student and their supervisor; this is nice work for an MSc. It is very understandable and self-explanatory, a serious study, and was a real pleasure to read.

However, you already know my main criticisms: basically, there are no theoretical foundations in this work and in related ones such as Itkin's and the insulting paper of McGhee. The research is blind here: use generic tools from Keras or TensorFlow, measure errors... but nobody can tell whether the resulting algorithm is performant or not. In fact it is not: you are working with 34,049 trainable parameters in this study, whereas as few as 512 trainable parameters are enough to get similar results.
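(For readers wondering where counts of this order come from, a rough sketch: in a fully connected network the trainable parameters are the weights plus biases of each layer. The layer sizes below are placeholders, not the architecture used in the thesis.)

def mlp_param_count(layer_sizes):
    # weights + biases of each fully connected layer
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# A hypothetical stack of a few dense layers already gives a count of the same
# order of magnitude as the 34,049 quoted above.
print(mlp_param_count([4, 128, 128, 128, 64]))   # 41,920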

The second criticism is methodological. I don't understand the financial motivation, since there exists a faster method than these: we can calibrate any stochastic process directly and exactly through call/put prices. This is a very fast procedure, done once and for all. Then we can price any derivative with it. So why use AI here?
 
Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 9:37 am

JohnLeM,
Thanks. Glad you enjoyed it. The thesis was a joy for me as well. There's a real story in there with a beginning, middle and end. I will use it as a template structure for future MSc theses, and even industry quants who write notes/articles can learn from the style ;)

Now, the goal was in essence a proof-of-concept (POC) to compare against the small number of unclear articles floating around on the net. Remember, it is a 3-month project and the academic approach cannot be ignored. I motivated Dalvir to take a modular approach (Figure 0.1) to demystify that whole ML/PDE discussion.
So "get it working, then get it right" and avoid premature optimisation, because it is not yet on the critical path.

Some feedback
1. We used Python because OpenCV C++ was not up to the job, and there was no time to investigate alternatives in 3 months. Ideally, I would prefer to do everything in C++ from the ground up.
2. That this can be done in traditional ways does not concern me just yet; it's not the point. The goal is to show that AI can get similar results. Having said that, it may turn out that your approach is much better. Time will tell.
3. I agree, the maths behind the ML is somewhat flaky, as evidenced by the quality of the discussion on the "UAT" thread (for me, it is the equivalent of deus ex machina/not even wrong/Holy Hand Grenade of Antioch, whatever, LOL). CS and maths are sometimes like oil and water. These days I'm very much in nitty-gritty mode. Maybe taking more time before publishing would be better?
4. Performance via parallel design: TBD. We did not investigate it, as it is not yet on the critical path.
5. I am not a fan of MLP in the sense that it is not the only kid on the block. This is OK because of modular decomposition (see Figure 0.1 again).
6. "but nobody can tell if the resulting algorithm is performant or not" -- not sure I completely agree: cross-validation with 5 folds was used (a minimal sketch follows this list).
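As a minimal sketch of what 5-fold cross-validation involves (the data, features and model below are placeholders, not the thesis setup):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

X = np.random.rand(500, 4)                 # placeholder features
y = np.sin(X).sum(axis=1)                  # placeholder smooth target

scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))   # R^2 on the held-out fold

print(np.mean(scores), np.std(scores))     # spread across folds gauges robustness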

In a sense, I see it as the first stake in the ground for further discussion. Your points can now be addressed, one by one, i.e. as requirements for the next round of the 'spiral'.
Last edited by Cuchulainn on October 8th, 2019, 10:16 am, edited 8 times in total.
 
Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 9:54 am

P.S. All the honour goes to these two students.

 
JohnLeM
Posts: 323
Joined: September 16th, 2008, 7:15 pm

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 12:32 pm

Cuchulainn wrote: "6. 'but nobody can tell if the resulting algorithm is performant or not' -- not sure I completely agree: cross-validation with 5 folds was used."
I don't contest the numerical figures; you did a great job of guaranteeing them with this K-fold method, as well as teaching me that this method exists. I simply meant that the algorithm complexity is far from optimal: to get an extrapolation accuracy of order 10^-6 in dimension 6 for the smooth function studied, 500 interpolation points (or trainable parameters, as you call them) should be enough.
This is probably due to the multi-layer construction in deep learning: what is the motivation for using them? Vanishing gradients?
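(By way of illustration only, this is the kind of experiment that could test the claim with a kernel interpolator; the 6-dimensional test function, point counts and Gaussian kernel are placeholders, not the thesis setup, and the accuracy actually achieved depends on them.)

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
dim, n_train, n_test = 6, 500, 2000

def f(x):
    # placeholder smooth function in 6 dimensions
    return np.exp(-np.sum(x**2, axis=-1))

X_train = rng.uniform(-1.0, 1.0, size=(n_train, dim))
X_test = rng.uniform(-1.0, 1.0, size=(n_test, dim))

# Gaussian RBF interpolant built from 500 centres (one weight per centre)
rbf = RBFInterpolator(X_train, f(X_train), kernel='gaussian', epsilon=1.0)
err = np.max(np.abs(rbf(X_test) - f(X_test)))
print(err)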
 
Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 6:20 pm

JohnLeM wrote: "I simply meant that the algorithm complexity is far from optimal ... This is probably due to the multi-layer construction in deep learning: what is the motivation for using them? Vanishing gradients?"
There are well-documented issues with BPN. But the ball is in your court to prove or disprove your claim, if you are serious. You are the only one here who can resolve the problem by now doing the nitty-gritty work. We need to go deeper than coffee-salon discussions.
 
Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: MSc Theses on Machine Learning and Computational Finance

October 8th, 2019, 6:55 pm

JohnLeM,
Adding to your list of questions, I wrote this a while back

@Cuchulainn, for me, Gradient Descent is a Swiss-knife method. It always produces results, but can get stuck in local minima.


Local minima, if it is lucky. That's the least of your worries. GD has a whole lot of issues; off the top of my head:

0. Inside GD lurks a nasty Euler method.
1. The initial guess must be close to the real solution (Analyse Numérique 101).
2. No guarantee that GD is applicable in the first place (it assumes the cost function is smooth).
3. "Vanishing gradient syndrome"
https://en.wikipedia.org/wiki/Vanishing ... nt_problem
4. Learning rate parameter... so many to choose from (an ad hoc/trial-and-error process).
5. Use Armijo and Wolfe conditions to improve convergence (a small sketch follows this list).
6. Modify the algorithm by adding momentum.
7. And you have to compute the gradient: 1) exact, 2) FDM, 3) AD, 4) complex-step method.
8. Convergence is only to a local minimum.
9. The method is iterative, so no true, reliable quality of service (QOS).
10. It's not very robust (cf. adversarial examples). Try regularization.

There might be some more.

There are zillions of blogs out there on this...

11. Choice of activation function.
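(A toy sketch of points 4-5: pick the step size with a backtracking Armijo line search instead of hand-tuning a learning rate. The ill-conditioned quadratic is a placeholder objective, chosen only so the behaviour is easy to check.)

import numpy as np

A = np.diag([1.0, 100.0])              # condition number 100
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x     # simple quadratic objective

def grad(x):
    return A @ x - b

x = np.array([5.0, 5.0])
for k in range(500):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    t = 1.0
    while f(x - t * g) > f(x) - 1e-4 * t * (g @ g):   # Armijo sufficient-decrease test
        t *= 0.5
    x = x - t * g

print(x, np.linalg.solve(A, b))        # GD iterate vs the exact minimiser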
 
JohnLeM
Posts: 323
Joined: September 16th, 2008, 7:15 pm

Re: MSc Theses on Machine Learning and Computational Finance

October 9th, 2019, 6:24 am

Cuchulainn wrote: "There are well-documented issues with BPN. But the ball is in your court ... We need to go deeper than coffee-salon discussions."
OK, I suggest postponing the technical conversation to another thread in a few days. I am still reading the second thesis, but given that these are three-month projects, the theses are good; congrats again.
 
JohnLeM
Posts: 323
Joined: September 16th, 2008, 7:15 pm

Re: MSc Theses on Machine Learning and Computational Finance

October 9th, 2019, 6:33 am

Cuchulainn wrote: "Local minima, if it is lucky. That's the least of your worries. GD has a whole lot of issues ..." (the full list of Gradient Descent issues is in the post above)
Well, going deeper into the coffee-salon talk, I confess that I have another philosophy regarding minimization problems, a lazy one, or a precautionary principle: when I face a minimization problem for which a simple gradient descent fails, I usually prefer to try to reformulate the problem as a convex one rather than question the minimization algorithm. Indeed, a failure of the simple Euler method might indicate that the problem is misunderstood.
 
Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam

Re: MSc Theses on Machine Learning and Computational Finance

October 9th, 2019, 8:18 am

JohnLeM wrote: "... when I face a minimization problem for which a simple gradient descent fails, I usually prefer to try to reformulate the problem as a convex one rather than question the minimization algorithm. Indeed, a failure of the simple Euler method might indicate that the problem is misunderstood."
Yes! I agree GD is an Euler method in disguise, and I don't like all the 'momentum' fixes.
For global optimisation, solving an SDE shows more promise. I did a few tests combining the SGD idea with an SDE, but I need to resurrect the results:
https://forum.wilmott.com/viewtopic.php?f=34&t=101662
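(For readers unfamiliar with the idea, one common way to combine the two is overdamped Langevin dynamics: a gradient-descent/Euler step plus a noise term that lets the iterate escape local minima. The double-well objective and parameters below are a toy illustration, not the tests referred to above.)

import numpy as np

def f(x):
    return (x**2 - 1.0)**2             # double well: minima at x = -1 and x = +1

def grad(x):
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(42)
h, beta = 1e-3, 10.0                   # step size and inverse temperature
x = 2.5                                # start far from both minima
for _ in range(20000):
    # Euler-Maruyama step of dX = -grad f(X) dt + sqrt(2/beta) dW
    x = x - h * grad(x) + np.sqrt(2.0 * h / beta) * rng.standard_normal()

print(x)                               # typically ends up near one of the two wells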