theses can be found here

https://forum.wilmott.com/viewtopic.php?f=11&t=102097

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf

Good find!

Genuine or spoof? And who has the patience to find out? But there's a clue in the first para:

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., **language processing, image recognition, fraud detection, and computational advertisement**."

So it's an advert for a Swiss AI research-paper bot.

- FaridMoussaoui
**Posts:**507**Joined:****Location:**Genève, Genf, Ginevra, Geneva

"So it's an advert for a Swiss AI research-paper bot."

Leave us alone!

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

"Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf"

Please, don't hit me with them negative waves so early in the morning. You know the answer already.

I think you miss my points; the maths is a paper model but how does it work in a computer (as well). And are academics et al any good at writing a good story?

The first post on DL-PDE was by the same author

https://arxiv.org/pdf/1706.04702.pdf

My comments along the cable to CH were

//

//

Both @JohnLeM and moi have these on our early warning system, we contacted them and no answer. It is 100% spoof.

EWS: a purely mathematical paper stating the benefits of the Ritz-Galerkin method... LOL, the standard is FEM these days, mate.

Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow.

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

"Cuch, you complain that there's not enough maths in ML. Is this paper sufficiently mathy for you? https://arxiv.org/pdf/1908.10828.pdf"

Thinking about this more, how is it possible to write reams and reams of mathematical results and publish so many articles in such a short space of time???????? Even worse, who benefits?

And we have no idea if the maths is correct.

http://www.ajentzen.de/

It just looks like theorem-proving. It leads to colour-blindness if taken to extremes. Mathematical formulae on paper are immune to catastrophic cancellation effects.
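To make the cancellation remark concrete, here is a minimal sketch (my own toy example, not taken from any of the papers above). On paper, 1 - cos(x) and 2 sin^2(x/2) are the same function; in double precision the first form typically loses every significant digit for small x, while the second does not.

```python
import math


def one_minus_cos_naive(x: float) -> float:
    # Subtracts two nearly equal numbers when x is small:
    # cos(x) is within rounding distance of 1.0, so the digits cancel.
    return 1.0 - math.cos(x)


def one_minus_cos_stable(x: float) -> float:
    # Same quantity via the identity 1 - cos(x) = 2 * sin(x/2)**2,
    # which involves no subtraction of nearly equal numbers.
    s = math.sin(0.5 * x)
    return 2.0 * s * s


x = 1.0e-8
naive = one_minus_cos_naive(x)
stable = one_minus_cos_stable(x)
reference = 0.5 * x * x  # leading Taylor term, accurate to ~1e-16 relative here

print("naive    :", naive)
print("stable   :", stable)
print("reference:", reference)
```

The theorem is equally true for both formulas; only one of them survives contact with floating-point arithmetic.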

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

That's the sanity clause; it's in every AI paper...

"In the past few years deep artificial neural networks (DNNs) have been successfully employed in a large number of computational problems including, e.g., language processing, image recognition, fraud detection, and computational advertisement."

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

"I have some simple questions about the Itkin paper. IMHO the experimental procedure is not described well enough:

1. What was the distribution of the sampled vectors [S, K, T, r, q, sigma]?

2. How did they split the complete sample into training and test sets?

3. Which optimiser was used for the results presented in the first sections of the paper: RMSProp or Adam?

4. What were the optimiser parameters, e.g. learning rate? How were they chosen? Did they do a parameter sweep and select the best ones?

5. How many times was the test set used?

6. What were the values of no-arbitrage penalty constants lambda and m? How were they chosen?

7. How much accuracy was lost for out-of-distribution inputs?"

I agree. There are a number of important issues that are not addressed. Maybe I am making a mountain out of a molehill:

For your point 6, I assume that you are referring to equation (5) (and (6)) of the Itkin paper? I am interested in how it works and how robust it is:

1. (5) looks like a variant of the Lagrange multiplier method albeit the constraints in equation (1) are strict > 0 instead of the 'usual' >= 0.

2. Solving (5) still entails computing gradients? On a follow-on _how_ is (5) solved? The presence of [$]\Phi_{\lambda,m}(x)[$] doesn't make it easier.

And how are the 'Dupire sensitivities' computed? The thesis of Matt Robinson solves them as PDEs, satisfying the constraints in equation (1) up-front. More precisely, a given sensitivity satisfies a PDE (plus a maximum principle), and the PDE is approximated by a monotone FD scheme, ensuring the no-arbitrage constraints (1).

3. As the author mentions, the constrained optimisation solution does not guarantee no-arbitrage. And does (5) have a (unique) minimum?

4. Are there round-off issues with (5), e.g. ATM greeks swamping the other terms?

5. A hunch is that in section 4.2 "Stability and Convergence", using the PDE (CSE) for vega would avoid many bespoke problems, and could maybe also be used as training data(?)
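To illustrate why a soft penalty does not by itself guarantee no-arbitrage, here is a minimal sketch. It is NOT equation (5) of the Itkin paper: the function name `no_arbitrage_penalty`, the functional form `lam * sum(violation**m)`, and the toy price curve are all my assumptions. Only the names `lam` and `m` are borrowed from question 6 above. The sketch checks two of the usual call-price conditions on a strike grid (price decreasing and convex in strike) by finite differences.

```python
import numpy as np


def no_arbitrage_penalty(strikes, calls, lam=1.0, m=2):
    """Hypothetical soft penalty for strike-monotonicity and convexity violations."""
    strikes = np.asarray(strikes, dtype=float)
    calls = np.asarray(calls, dtype=float)
    # Finite-difference slope in strike; should be <= 0 for call prices.
    slope = np.diff(calls) / np.diff(strikes)
    # Convexity in strike means the slopes must be non-decreasing.
    slope_change = np.diff(slope)
    viol_monotone = np.maximum(slope, 0.0)        # positive part = violation
    viol_convex = np.maximum(-slope_change, 0.0)  # negative part = violation
    return lam * (np.sum(viol_monotone**m) + np.sum(viol_convex**m))


strikes = np.linspace(80.0, 120.0, 9)
# Toy curve (not an option model): decreasing and convex on this grid.
clean = (strikes - 130.0) ** 2 / 100.0
# Bump one point upwards to break convexity locally.
bumped = clean.copy()
bumped[4] += 0.5

penalty_clean = no_arbitrage_penalty(strikes, clean)
penalty_bumped = no_arbitrage_penalty(strikes, bumped)
penalty_bumped_small_lam = no_arbitrage_penalty(strikes, bumped, lam=1e-6)

print(penalty_clean, penalty_bumped, penalty_bumped_small_lam)
```

With a small `lam` the violation costs almost nothing, so an optimiser minimising "fit error + penalty" can happily keep it. That is the difference from a monotone FD scheme, where the constraint holds by construction.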

//

@ISayMoo

Dalvira Mantara addresses your (universal?) questions in the context of the SABR model.

- katastrofa
**Posts:**9215**Joined:****Location:**Alpha Centauri

"Someone (hint hint) should totally write a review of machine learning methods used for PDE to separate the good from the bad and the ugly, and submit it to JMLR. I've been told they do a good job with reviewing, albeit slow."

Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

"Review? A rant about the universal approximation theorem being useless because it doesn't tell anything about the rate of convergence to a function - oh wait, a function you don't even know? We all know that it doesn't make sense. It's yet another of our chaotic, inconclusive debates made for show. Right?"

UAT is based on measurable functions, which is too broad a class for numerical analysis. However, the consequences of relying on the UAT magic wand deserve attention.

UAT is like a blunt chainsaw. It ignores the essential difficulties of mathematics.

But a review on ML finance would be more than UAT!

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

The mathematical precision in Cybenko 1988 has been superseded/improved on here

http://www2.math.technion.ac.il/~pinkus/papers/acta.pdf

In particular, Theorems 3.1, 4.1, 5.1, 6.2, 6.7, Proposition 3.3.

Seems like ML's maths is stuck in the 80s. The mathematical subtleties surrounding activation functions seem to have been missed, causing issues during numerics. Maybe I have missed something.

// Allan Pinkus studied with Samuel Karlin.

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

What is your opinion after having read the Pinkus paper?

- Cuchulainn
**Posts:**62132**Joined:****Location:**Amsterdam-
**Contact:**

So, for a given problem, which activation function would you use?

https://en.wikipedia.org/wiki/Activation_function

And I don't mean, experiment and see which is 'best'..

Ex. Dalivir Mantara used ISRLU for Greeks, Itkin uses MELU.
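One argument that is not "experiment and see": Greeks are derivatives of the price, so the smoothness of the activation matters. A minimal sketch, assuming the ISRLU definition as commonly given (x for x >= 0, x / sqrt(1 + alpha x^2) for x < 0); the function names and the comparison with ReLU are mine, and MELU is left out because I am not certain of its exact form.

```python
import numpy as np


def isrlu(x, alpha=1.0):
    # Inverse square root linear unit: identity on x >= 0,
    # smooth saturation towards -1/sqrt(alpha) on x < 0.
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, x, x / np.sqrt(1.0 + alpha * x * x))


def isrlu_grad(x, alpha=1.0):
    # Derivative: 1 on x >= 0 and (1 + alpha x^2)^(-3/2) on x < 0,
    # so the two branches agree at x = 0.
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, 1.0, (1.0 + alpha * x * x) ** -1.5)


def relu_grad(x):
    # ReLU derivative: a step function, discontinuous at x = 0.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0.0, 1.0, 0.0)


eps = 1e-6
pts = np.array([-eps, eps])
isrlu_jump = abs(isrlu_grad(pts)[1] - isrlu_grad(pts)[0])
relu_jump = abs(relu_grad(pts)[1] - relu_grad(pts)[0])
left_limit = float(isrlu(-1.0e9))  # approaches -1/sqrt(alpha) = -1

print("derivative jump across 0: ISRLU", isrlu_jump, "ReLU", relu_jump)
print("ISRLU far-left value:", left_limit)
```

A network built on ReLU is piecewise linear in its inputs, so its second derivatives (gamma and friends) vanish almost everywhere; an activation with a continuous derivative at least gives the Greeks a chance.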

I don't buy this paper !! Try to read it, you will understand what I mean

Last edited by JohnLeM on October 15th, 2019, 11:46 am, edited 1 time in total.
