Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

ISayMoo
Posts: 2050
Joined: September 30th, 2015, 8:30 pm

### Re: Universal Approximation theorem

Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf

Paul
Posts: 9465
Joined: July 20th, 2001, 3:28 pm

### Re: Universal Approximation theorem

Good find!

Genuine or spoof? And who has the patience to find out? But there's a clue in the first para:

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including,

So it's an advert for a Swiss AI research-paper bot.

FaridMoussaoui
Posts: 433
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

### Re: Universal Approximation theorem

So it's an advert for a Swiss AI research-paper bot.
Leave us alone!

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
Please, don't hit me with them negative waves so early in the morning. You know the answer already.
I think you miss my point: the maths is a model on paper, but how does it work on a computer as well? And are academics et al. any good at telling a good story?

The first post on DL-PDE was by the same author:

https://arxiv.org/pdf/1706.04702.pdf

My comments along the cable to CH were:

//
The background of the authors is DL and business (nothing wrong with that) but the article is riddled with errors that I don't even know where to begin:

1. The Galerkin method fell into disuse around 1943. More seriously, the article is really the MESHLESS method which has its own issues.
2. It is not even DL imo; just because you use a stochastic gradient method does not make it DL.
3. "DGM is a natural merging of ML and Galerkin". Yes?
4. Details are missing (e.g. Table 1!).
5. It seems DL applications need some kind of analytic solution for training? PDEs don't have this in general.
6. "Approximate the second derivatives using Monte Carlo". This is fiction.
7. Major numerical difficulties can't be swept under the carpet. In fact, DL will compound them, e.g. how to choose an optimal learning rate alpha (usually use Brent's method).
//
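To make point 7 concrete, here is a minimal sketch of choosing the step size alpha by a one-dimensional line search instead of fixing it by hand. Everything in it is an assumption for illustration: the test function, the bracket [0, 1] and the helper names are hypothetical, and it uses plain golden-section search (one of the building blocks of Brent's method) rather than Brent's method itself, to keep it dependency-free.

```python
import math

def golden_section(phi, a, b, tol=1e-10):
    """Minimise a unimodal scalar function phi on the bracket [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 0.618..., the golden ratio conjugate
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

def f(p):
    """Hypothetical ill-conditioned quadratic test function."""
    x, y = p
    return x * x + 10.0 * y * y

def grad(p):
    x, y = p
    return (2.0 * x, 20.0 * y)

def descend(p, steps=50):
    """Steepest descent where each step size alpha comes from a line search."""
    for _ in range(steps):
        g = grad(p)
        # Objective restricted to the ray p - alpha * g
        phi = lambda alpha: f((p[0] - alpha * g[0], p[1] - alpha * g[1]))
        alpha = golden_section(phi, 0.0, 1.0)  # assumed bracket for alpha
        p = (p[0] - alpha * g[0], p[1] - alpha * g[1])
    return p

p_final = descend((1.0, 1.0))
```

Even in this toy case each step costs dozens of function evaluations, and the stochastic, non-convex losses used in DL offer no unimodality guarantee, which is part of the difficulty.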

Both @JohnLeM and moi have these on our early warning system (EWS); we contacted them and got no answer. It is 100% spoof.
EWS: a pure mathematician stating the benefits of the Ritz-Galerkin method... LOL, the standard is FEM these days, mate.

ISayMoo
Posts: 2050
Joined: September 30th, 2015, 8:30 pm

### Re: Universal Approximation theorem

Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
Thinking about this more, how is it possible to write reams and reams of mathematical results and publish so many articles in such a short space of time? Even worse, who benefits?

And we have no idea if the maths is correct.

http://www.ajentzen.de/

It just looks like theorem-proving. It leads to colour-blindness if taken to extremes. Mathematical formulae on paper are immune to catastrophic cancellation effects.
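A small sketch of what catastrophic cancellation does to a formula that is exact on paper. The choice of `exp`, the point x = 1 and the two step sizes are assumptions picked for illustration; the second central difference converges to f''(x) as h goes to 0 in exact arithmetic, but not in double precision.

```python
import math

def second_difference(f, x, h):
    """Central second difference: tends to f''(x) as h -> 0 in exact arithmetic."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

exact = math.exp(1.0)  # the second derivative of exp at x = 1 is e

# Moderate step: the truncation error, of order h^2, dominates and is small.
err_moderate = abs(second_difference(math.exp, 1.0, 1e-3) - exact)

# Tiny step: the numerator is a difference of nearly equal numbers, so it is
# mostly rounding noise, and dividing by h^2 = 1e-16 amplifies that noise.
err_tiny = abs(second_difference(math.exp, 1.0, 1e-8) - exact)

print(err_moderate, err_tiny)
```

The smaller step gives the worse answer, which is the opposite of what the paper formula suggests.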

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

..

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including,
.
That's the sanity clause; it's in every AI paper.

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

I have some simple questions about the Itkin paper. IMHO the experimental procedure is not described well enough:

1. What was the distribution of the sampled vectors [S, K, T, r, q, sigma]?
2. How did they split the complete sample into training and test sets?
3. Which optimiser was used for the results presented in the first sections of the paper: RMSProp or Adam?
4. What were the optimiser parameters, e.g. learning rate? How were they chosen? Did they do a parameter sweep and select the best ones?
5. How many times was the test set used?
6. What were the values of no-arbitrage penalty constants lambda and m? How were they chosen?
7. How much accuracy was lost for out-of-distribution inputs?
I agree. There are a number of important issues that are not addressed. Maybe I am making a mountain out of a molehill:

For your point 6, I assume that you are referring to equation (5) (and (6)) of the Itkin paper? I am interested in how it works and how robust it is:

1. (5) looks like a variant of the Lagrange multiplier method albeit the constraints in equation (1) are strict > 0 instead of the 'usual' >= 0.
2. Solving (5) still entails computing gradients? As a follow-on, _how_ is (5) solved? The presence of $\Phi_{\lambda,m}(x)$ doesn't make it easier.
And how are the 'Dupire sensitivities' computed? The thesis of Matt Robinson solves them as PDEs, satisfying the constraints in equation (1) up-front. More precisely, a given sensitivity satisfies a PDE (+ maximum principle); approximate the PDE by a monotone FD scheme, which ensures the no-arbitrage constraints (1).
3. As the author mentions, the constrained optimisation solution does not guarantee no-arbitrage. And does (5) have a (unique) minimum?
4. Are there round-off issues with (5), e.g. ATM greeks swamping the other terms?
5. A hunch regarding section 4.2 "Stability and Convergence": using the PDE (CSE) for vega will avoid many bespoke problems, and maybe it can also be used as training data(?)
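Point 3 can be seen in a one-dimensional toy problem. This is only a sketch under loud assumptions: the quadratic penalty below is a generic soft-constraint term, not the paper's $\Phi_{\lambda,m}$, and the objective, the constraint x >= 0 and all names are hypothetical.

```python
def penalised_gradient(x, lam, target=-1.0):
    """Gradient of (x - target)^2 + lam * max(0, -x)^2 (generic soft penalty)."""
    g = 2.0 * (x - target)
    if x < 0.0:
        g += 2.0 * lam * x  # the penalty is only active where x >= 0 is violated
    return g

def minimise(lam, x0=0.0, steps=200):
    """Plain gradient descent with a step size scaled to the penalty weight."""
    lr = 0.25 / (1.0 + lam)
    x = x0
    for _ in range(steps):
        x -= lr * penalised_gradient(x, lam)
    return x

# The unconstrained optimum is x = -1 and the constraint is x >= 0.
# The penalised minimiser is x = -1 / (1 + lam): it approaches 0 as lam grows,
# but stays strictly negative for every finite lam.
minimisers = {lam: minimise(lam) for lam in (1.0, 10.0, 100.0)}
```

So a finite penalty weight reduces the violation without removing it, whereas a scheme that builds the constraint in (such as a monotone FD scheme) satisfies it by construction.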
//
@ISayMoo
Dalvira Mantara addresses your (universal?) questions in the context of the SABR model.

katastrofa
Posts: 8135
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Universal Approximation theorem

Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.
Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.
Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?
UAT is based on measurable functions, which is too broad a class for numerical analysis. However, the consequences of relying on the UAT magic wand deserve attention.

UAT is like a blunt chainsaw. It ignores the essential difficulties of mathematics.
But a review on ML finance would be more than UAT!

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

The mathematical precision in Cybenko 1988 has been superseded/improved on here:

http://www2.math.technion.ac.il/~pinkus/papers/acta.pdf

In particular, Theorems 3.1, 4.1, 5.1, 6.2, 6.7 and Proposition 3.3.

Seems like ML's maths is stuck in the 80s. The mathematical subtleties surrounding activation functions seem to have been missed, causing issues during numerics. Maybe I have missed something.
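For reference, the density statement surveyed in that paper (due to Leshno, Lin, Pinkus and Schocken) is usually given in roughly the following form. This is a paraphrase from memory, not a quotation, so check the paper for the exact hypotheses. For a continuous activation $\sigma$ define

$$
\mathcal{M}(\sigma) = \operatorname{span}\{\, \sigma(w \cdot x - \theta) : w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \,\}.
$$

Then $\mathcal{M}(\sigma)$ is dense in $C(\mathbb{R}^n)$, in the topology of uniform convergence on compact sets, if and only if $\sigma$ is not a polynomial. Equivalently, for every continuous $f$, compact $K$ and $\varepsilon > 0$ there exist $N$ and parameters $c_j, w_j, \theta_j$ with

$$
\sup_{x \in K} \Big| f(x) - \sum_{j=1}^{N} c_j\, \sigma(w_j \cdot x - \theta_j) \Big| < \varepsilon .
$$

Note that nothing here bounds $N$ in terms of $\varepsilon$, and nothing says how to find the parameters.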

// Allan Pinkus studied with Samuel Karlin.

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function

Cuchulainn
Topic Author
Posts: 59926
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Universal Approximation theorem

So, for a given problem, which activation function would you use?

https://en.wikipedia.org/wiki/Activation_function

And I don't mean "experiment and see which is 'best'".

Ex. Dalivir Mantara used ISRLU for Greeks, Itkin uses MELU.
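One non-experimental criterion is smoothness: if Greeks are obtained by differentiating the network, kinks in the activation show up in delta and gamma. The sketch below compares ReLU, ELU and ISRLU near 0 using their standard textbook definitions with alpha = 1 (an assumption). MELU is left out because its definition is not given in this thread, and the finite-difference helpers are hypothetical names.

```python
import math

def relu(x):
    return x if x > 0.0 else 0.0

def elu(x, alpha=1.0):
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)

def isrlu(x, alpha=1.0):
    # Inverse square root linear unit: identity for x >= 0, saturating for x < 0
    return x if x >= 0.0 else x / math.sqrt(1.0 + alpha * x * x)

def d1(f, x, h=1e-5):
    """Central first difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-5):
    """Central second difference."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def slope_jump(f, eps=1e-3):
    """Mismatch of f' just to the right and just to the left of 0."""
    return abs(d1(f, eps) - d1(f, -eps))

def curvature_jump(f, eps=1e-3):
    """Mismatch of f'' just to the right and just to the left of 0."""
    return abs(d2(f, eps) - d2(f, -eps))

for name, f in (("ReLU", relu), ("ELU", elu), ("ISRLU", isrlu)):
    print(name, slope_jump(f), curvature_jump(f))
```

ReLU has a jump in the first derivative at 0, ELU (with alpha = 1) is continuously differentiable but its second derivative jumps, and ISRLU has matching first and second derivatives at 0. That is one possible reason to prefer it when second-order Greeks matter, though it says nothing about approximation rates.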

JohnLeM
Posts: 323
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
I don't buy this paper! Try to read it and you will understand what I mean.
Last edited by JohnLeM on October 15th, 2019, 11:46 am, edited 1 time in total.