

Re: Universal Approximation theorem

Posted: October 9th, 2019, 7:37 pm
by ISayMoo
Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf

Re: Universal Approximation theorem

Posted: October 9th, 2019, 8:09 pm
by Paul
Good find!

Genuine or spoof? And who has the patience to find out? But there's a clue in the first para:

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including,
e.g., language processing, image recognition, fraud detection, and computational advertisement."

So it's an advert for a Swiss AI research-paper bot.

Re: Universal Approximation theorem

Posted: October 9th, 2019, 10:38 pm
by FaridMoussaoui
So it's an advert for a Swiss AI research-paper bot.
Leave us alone!

Re: Universal Approximation theorem

Posted: October 11th, 2019, 9:14 am
by Cuchulainn
Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
Please, don't hit me with them negative waves so early in the morning. You know the answer already.
I think you miss my point: the maths is a paper model, but how does it work in a computer (as well)? And are academics et al. any good at writing a good story?


The first post on DL-PDE was by the same author:

https://arxiv.org/pdf/1706.04702.pdf

My comments along the cable to CH were:

//
The background of the authors is DL and business (nothing wrong with that), but the article is riddled with so many errors that I don't even know where to begin:

1. The Galerkin method fell into disuse around 1943. More seriously, the article really describes the MESHLESS method, which has its own issues.
2. It is not even DL imo; just because you use a stochastic gradient method does not make it DL.
3. "DGM is a natural merging of ML and Galerkin". Yes?
4. Details are missing (e.g. Table 1!).
5. It seems DL applications need some kind of analytic solution for training? PDEs don't have this in general.
6. "Approximate the second derivatives using Monte Carlo". This is fiction.
7. Major numerical difficulties can't be swept under the carpet. In fact, DL will compound them, e.g. how to choose an optimal learning rate alpha (I usually use Brent's method; see the sketch below).
//
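On point 7, a minimal sketch of what I mean by a line-search choice of alpha. The toy quadratic objective, the bracket (0, 1] and the scipy call are my own illustration, not anything from the paper:

# Sketch: choose the step size alpha at each gradient step by a bounded Brent line search
# instead of a fixed learning rate. The ill-conditioned quadratic is a toy stand-in for a DL loss.
import numpy as np
from scipy.optimize import minimize_scalar

A = np.diag([1.0, 100.0])              # toy Hessian, deliberately ill-conditioned

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

x = np.array([1.0, 1.0])
for _ in range(20):
    g = grad_f(x)
    # minimise alpha -> f(x - alpha * g) over (0, 1] with scipy's Brent-based bounded minimiser
    res = minimize_scalar(lambda a: f(x - a * g), bounds=(1e-12, 1.0), method='bounded')
    x = x - res.x * g

print(x, f(x))                         # converges to the minimiser at the origin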



Both @JohnLeM and moi have these on our early warning system (EWS); we contacted them and got no answer. It is 100% spoof.
EWS: a pure mathematician stating the benefits of the Ritz-Galerkin method... LOL, the standard is FEM these days, mate.

Re: Universal Approximation theorem

Posted: October 11th, 2019, 9:51 am
by ISayMoo
Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.

Re: Universal Approximation theorem

Posted: October 12th, 2019, 3:15 pm
by Cuchulainn
Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
Thinking about this more, how is it possible to write reams and reams of mathematical results and publish so many articles in such a short space of time? Even worse, who benefits?

And we have no idea if the maths is correct.

http://www.ajentzen.de/

It just looks like theorem-proving. It leads to colour-blindness if taken to extremes. Mathematical formulae on paper are immune to catastrophic cancellation effects.

Re: Universal Approximation theorem

Posted: October 12th, 2019, 3:34 pm
by Cuchulainn
..

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including,
e.g., language processing, image recognition, fraud detection, and computational advertisement."
.
That's the sanity clause; it's in every AI paper.

Re: Universal Approximation theorem

Posted: October 14th, 2019, 10:10 am
by Cuchulainn
I have some simple questions about the Itkin paper. IMHO the experimental procedure is not described well enough:

1. What was the distribution of the sampled vectors [S, K, T, r, q, sigma]?
2. How did they split the complete sample into training and test sets?
3. Which optimiser was used for the results presented in the first sections of the paper: RMSProp or Adam?
4. What were the optimiser parameters, e.g. learning rate? How were they chosen? Did they do a parameter sweep and select the best ones?
5. How many times was the test set used?
6. What were the values of no-arbitrage penalty constants lambda and m? How were they chosen?
7. How much accuracy was lost for out-of-distribution inputs?
I agree. There are a number of important issues that are not addressed. Maybe I am making a mountain out of a molehill:

For your point 6, I assume that you are referring to equation (5) (and (6)) of the Itkin paper? I am interested in how it works and how robust it is:

1. (5) looks like a variant of the Lagrange multiplier method, albeit the constraints in equation (1) are strict > 0 instead of the 'usual' >= 0.
2. Solving (5) still entails computing gradients? As a follow-on, _how_ is (5) solved? The presence of [$]\Phi_{\lambda,m}(x)[$] doesn't make it easier (see the sketch below).
And how are the 'Dupire sensitivities' computed? The thesis of Matt Robinson solves them as PDEs, satisfying the constraints in equation (1) up-front. More precisely, a given sensitivity satisfies a PDE (+ maximum principle), which is approximated by a monotone FD scheme, ensuring the no-arbitrage constraints (1).
3. As the author mentions, the constrained optimisation solution does not guarantee no-arbitrage. And does (5) have a (unique) minimum?
4. Are there round-off issues with (5), e.g. ATM Greeks swamping the other terms?
5. A hunch is that in section 4.2 "Stability and Convergence" using the PDE (CSE) for vega would avoid many bespoke problems; maybe it could also be used as training data(?)
//
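On 1. and 2., a minimal sketch of how I read a penalty of this kind being minimised with gradients. The quadratic data loss, the constraint g, and the roles of lambda (weight) and m (margin) are my guesses for illustration only, not Itkin's actual [$]\Phi_{\lambda,m}[$]:

# Sketch (my reading, not the paper's formulation): enforce a strict constraint g(theta) > 0
# softly by adding lambda * max(0, m - g(theta))^2 to the data loss and descending the gradient.
import numpy as np

lam, m = 10.0, 1e-3                   # penalty weight and margin: hypothetical values

def data_loss(theta):                 # stand-in for the fit-to-market-prices term
    return np.sum((theta - 1.0) ** 2)

def g(theta):                         # stand-in for a no-arbitrage quantity that must stay > 0
    return theta[0]

def total_loss(theta):
    return data_loss(theta) + lam * max(0.0, m - g(theta)) ** 2

def num_grad(f, theta, h=1e-6):       # finite-difference gradient, for the sketch only
    e = np.eye(len(theta))
    return np.array([(f(theta + h * e[i]) - f(theta - h * e[i])) / (2.0 * h)
                     for i in range(len(theta))])

theta = np.array([-0.5, 0.0])
for _ in range(500):
    theta = theta - 0.05 * num_grad(total_loss, theta)

# The penalty pushes g(theta) above the margin, but nothing guarantees the constraint holds
# at the minimiser -- which is exactly point 3.
print(theta, g(theta) > 0.0)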
@ISayMoo
Dalvira Mantara addresses your (universal?) questions in the context of the SABR model.

Re: Universal Approximation theorem

Posted: October 14th, 2019, 2:36 pm
by katastrofa
Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.
Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?

Re: Universal Approximation theorem

Posted: October 14th, 2019, 3:11 pm
by Cuchulainn
Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.
Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?
UAT is based on measurable functions, which is too broad a class for numerical analysis. However, the consequences of relying on the UAT magic wand deserve attention.

UAT is like a blunt chainsaw. It ignores the essential difficulties of mathematics.
But a review of ML in finance would cover more than UAT!
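To be fair to the point about rates: the plain UAT gives density only, but rate results do exist. From memory (so check the constant and the exact norm), Barron's 1993 result says that for a target with finite first Fourier moment, a single hidden layer with n sigmoidal nodes achieves

[$] \inf_{f_n \in \Sigma_n} \| f - f_n \|_{L^2(\mu)} \;\lesssim\; \frac{C_f}{\sqrt{n}}, \qquad C_f = \int_{\mathbb{R}^d} |\omega| \, |\hat{f}(\omega)| \, d\omega, [$]

i.e. a dimension-independent exponent in n. That is the kind of statement a useful review would have to weigh up, not the bare UAT.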

Re: Universal Approximation theorem

Posted: October 15th, 2019, 9:04 am
by Cuchulainn
The mathematical precision in Cybenko 1988 has been superseded/improved on here:

http://www2.math.technion.ac.il/~pinkus/papers/acta.pdf

In particular, Theorems 3.1, 4.1, 5.1, 6.2, 6.7 and Proposition 3.3.
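For those who won't click: the headline density result (Theorem 3.1, if I recall the numbering correctly) is, roughly,

[$] \overline{\mathrm{span}} \{ \sigma(w \cdot x + b) : w \in \mathbb{R}^n, \ b \in \mathbb{R} \} = C(\mathbb{R}^n) \iff \sigma \in C(\mathbb{R}) \ \text{is not a polynomial}, [$]

with density in the sense of uniform convergence on compact sets. One hidden layer is dense precisely when the activation is non-polynomial; sigmoid and ReLU are special cases.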

Seems like ML's maths is stuck in the 80s. The mathematical subtleties surrounding activation functions seem to have been missed, causing issues in the numerics. Maybe I have missed something.

// Allan Pinkus studied with Samuel Karlin. 

Re: Universal Approximation theorem

Posted: October 15th, 2019, 9:21 am
by Cuchulainn
A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function 
What is your opinion after having read Pinkus' paper?

Re: Universal Approximation theorem

Posted: October 15th, 2019, 9:30 am
by Cuchulainn
So, for a given problem, which activation function would you use?

https://en.wikipedia.org/wiki/Activation_function

And I don't mean 'experiment and see which is best'.

Ex.: Dalivir Mantara used ISRLU for Greeks; Itkin uses MELU.
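For reference, a minimal sketch of ISRLU as usually defined (alpha = 1 here); MELU is Itkin's own construction, so I won't try to reproduce it. The point is that ISRLU is smooth through zero, which matters once you differentiate the network twice for Greeks:

# Sketch of ISRLU (inverse square root linear unit), as usually defined:
# f(x) = x for x >= 0, f(x) = x / sqrt(1 + alpha * x^2) for x < 0.
# Unlike ReLU it has a continuous first (and second) derivative at 0.
import numpy as np

def isrlu(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, x, x / np.sqrt(1.0 + alpha * x * x))

def isrlu_grad(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, 1.0, (1.0 + alpha * x * x) ** -1.5)

print(isrlu([-2.0, -0.1, 0.0, 0.1, 2.0]))
print(isrlu_grad([-2.0, 0.0, 2.0]))    # derivative approaches 1 from both sides of 0 -> no kink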

Re: Universal Approximation theorem

Posted: October 15th, 2019, 9:57 am
by JohnLeM
Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf
I don't buy this paper!! Try to read it, you will understand what I mean :)