Serving the Quantitative Finance Community

Cuchulainn
Topic Author
Posts: 65000
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Universal Approximation theorem

(A UAT with measure theory as its foundation is essentially useless. Authors who blindly quote Cybenko/Hornik probably don't realise that it is as leaky as a sieve.) It's a bit like the Weierstrass theorem: it gives warm feelings but is not constructive (in the precise sense of Errett Bishop).

https://en.wikipedia.org/wiki/Stone%E2% ... ss_theorem

Here's a related article on UAT, again with more 'sharpness':
https://arxiv.org/abs/2006.08859

Without metric spaces, Cauchy sequences, completeness and stuff like that, how is quantitative analysis possible?

//
Until now, determining optimal width and depth using the classical UAT as the Holy Hand Grenade of Antioch has been a bit ad hoc. It's like getting a flat tire: the solution is to drive around the block a few times, maybe it will fix itself.
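
For reference, the classical statement under discussion can be written as follows. This is a sketch of Cybenko's (1989) formulation; the symbols $I_n$, $\alpha_j$, $w_j$, $\theta_j$ are notation assumed here and do not come from this thread.

```latex
Let $\sigma$ be a continuous sigmoidal function and $I_n = [0,1]^n$.
For every $f \in C(I_n)$ and every $\varepsilon > 0$ there exist
$N \in \mathbb{N}$, $\alpha_j, \theta_j \in \mathbb{R}$ and $w_j \in \mathbb{R}^n$ such that
\[
  G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right)
  \quad \text{satisfies} \quad
  \sup_{x \in I_n} \lvert G(x) - f(x) \rvert < \varepsilon .
\]
```

The statement is an existence result only: it gives no bound on $N$ in terms of $\varepsilon$ and no procedure for finding the coefficients, which is why it offers little guidance on width or depth.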

"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

Yep, all this clearly highlights the reliability problem with artificial intelligence tools that some investment banks are facing today in their industrial projects. From a theoretical point of view, for pricing methods with neural networks, one can also read this reference (see for instance Proposition 4.1, which is a very honest and good analysis, but it tells you that there are some problems). We also benchmarked our old, plain PDE methods against deep learning methods. The results are quite humiliating, and not only for high-dimensional problems coming from mathematical finance. To be honest, this might really be painful for all those who invested in these technologies, numerical scientists as well as investors. Please, always question your tools!

katastrofa
Posts: 10256
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Universal Approximation theorem

What's a "skinny" NN? If it means, I'm guessing, that the number of neurons in the first deep (or each) layer is smaller than the input size, then obviously there's an information bottleneck and it's not a universal approximator. BTW, am I missing something, or does the Rojas paper [28] quoted therein basically prove that one can separate two points with a line?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

What's a "skinny" NN? If it means, I'm guessing, that the number of neurons in the first deep (or each) layer is smaller than the input size, then obviously there's an information bottleneck and it's not a universal approximator. BTW, am I missing something, or does the Rojas paper [28] quoted therein basically prove that one can separate two points with a line?
I am taking a closer look: skinny networks are defined in Theorem 1. If I understand correctly, given a one-dimensional "one-to-one" activation function $\varphi : \mathbb{R} \to \mathbb{R}$ (it seems to be just any uniformly continuous injective function), a skinny network takes $n$ inputs and then consists of any number of layers, each of size at most $n$. Such a network cannot approximate a function $\psi$ if $\psi$ satisfies a quite common topological constraint. For instance, any function that tends to zero at infinity cannot be approximated by a skinny network using $\varphi$.
Last edited by JohnLeM on January 31st, 2021, 5:40 pm, edited 3 times in total.
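
In the one-dimensional case ($n = 1$) the obstruction described above can be seen directly: a width-1 network is a composition of scalar affine maps and an injective (hence monotone) activation, so the whole network is monotone and cannot follow a bump that tends to zero at infinity. Below is a minimal numerical sketch of that observation, assuming `tanh` as the injective activation. The function names `skinny_net_1d` and `is_monotone` are hypothetical, and this is an illustration only, not the proof given in the paper.

```python
import numpy as np

def skinny_net_1d(x, weights, biases, activation=np.tanh):
    """Evaluate a width-1 network: scalar affine map + activation per hidden layer,
    followed by a final scalar affine output layer."""
    h = np.asarray(x, dtype=float)
    for w, b in zip(weights[:-1], biases[:-1]):
        h = activation(w * h + b)
    return weights[-1] * h + biases[-1]

def is_monotone(y, tol=1e-9):
    """True if y is non-decreasing or non-increasing, up to a small tolerance."""
    d = np.diff(y)
    return bool(np.all(d >= -tol) or np.all(d <= tol))

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 2001)
depth = 6  # five hidden layers plus the output layer

# Random width-1 networks: every one of them should come out monotone.
results = []
for _ in range(100):
    w = rng.normal(size=depth)
    b = rng.normal(size=depth)
    results.append(is_monotone(skinny_net_1d(x, w, b)))
all_monotone = all(results)

# A target that vanishes at infinity: it rises and then falls, so it is not monotone.
bump = np.exp(-x**2)
bump_is_monotone = is_monotone(bump)

print("all random width-1 networks monotone:", all_monotone)
print("bump target monotone:", bump_is_monotone)
```

Since every network in this family is monotone, its sup-norm distance to the bump cannot be made arbitrarily small, whatever the depth.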

katastrofa
Posts: 10256
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Universal Approximation theorem

Dude, that made absolutely no sense (to begin with, psi is an activation function)... Ego te absolvo a peccatis tuis in Intelligentia Artificialis.

Cuchulainn
Topic Author
Posts: 65000
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Universal Approximation theorem

No one born after 1975 knows Latin.

Pity you don't have published work in this area. It is better than sniping.
You are not adding any value.
Last edited by Cuchulainn on January 31st, 2021, 5:19 pm, edited 1 time in total.

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

Dude, that made absolutely no sense (to begin with, psi is an activation function)... Ego te absolvo a peccatis tuis in Intelligentia Artificialis.
I read and wrote it too fast, sorry. I have corrected it. I am just trying to help; could you in turn please take a little more care with your communication, chica?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

Deep, Skinny Neural Networks are not Universal Approximators

https://arxiv.org/abs/1810.00393

trial and error?
@Cuchulainn: I was thinking about this paper. There is an interesting sentence: "This analysis is not meant to suggest that deep networks are worse than shallow networks, but rather to better understand how and why they will perform differently on different data sets". Indeed, this remark applies to kernels too. However, I think there is a quite clear answer for kernels: kernels generate function spaces.

So I am now convinced that the paper is correct, because a ReLU activation function generates a function space whose functions cannot vanish at infinity. Such function spaces are very handy in some situations. This paper is not at all a criticism of neural networks, as I had thought. Rather, it shows a little more clearly that kernel methods can describe neural networks.

Cuchulainn
Topic Author
Posts: 65000
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Universal Approximation theorem

What's a "skinny" NN? If it means, I'm guessing, that the number of neurons in the first deep (or each) layer is smaller than the input size, then obviously there's an information bottleneck and it's not a universal approximator. BTW, am I missing something, or does the Rojas paper [28] quoted therein basically prove that one can separate two points with a line?
I'm surprised the article got into the IEEE.
All I see is text, but where is the bespoke 'proof'?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

Here is a recent analysis of the convergence of deep learning methods by a serious mathematician, Schwab. It shows again that there are problems. Indeed, even for a one-dimensional option price, you will find that there are problems using this approach. How many institutions have built XVA systems or risk management systems with this method?
The artificial intelligence community today claims to be solving a problem, the curse of dimensionality, which they not only fail to solve but which was already solved several years ago. They have polluted the information systems of their institutions and are celebrating this mess with awards... This is starting to get interesting. Are we back in the Middle Ages?

Cuchulainn
Topic Author
Posts: 65000
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Universal Approximation theorem

Christoph Schwab was a PhD student of Ivo Babuška, Tzar of the FEM. You can trust that anything he (CS) writes on approximation theory will be rigorous.

A major problem with AI is that it is not rigorous and not up to the job of proving things that need to be proved. It is a finite-dimensional matrix (aka graph) world, grosso modo. Results are achieved a posteriori (mucho experimentation).

Cuchulainn
Topic Author
Posts: 65000
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Universal Approximation theorem

From Ruf and Wang 2020

The Stone-Weierstrass theorem asserts that any continuous function on a compact set can be approximated by polynomials. Similarly, the universal approximation theorems ensure that ANNs approximate continuous functions in a suitable way. In particular, ANNs are able to capture nonlinear dependencies between input and output.
With this understanding, an ANN can be used for many applications related to option pricing and hedging. In the most common form, an ANN learns the price of an option as a function of the underlying price, strike price, and possibly other relevant option characteristics. Similarly, ANNs might also be trained to learn implied

Sure; it is not even wrong.

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Universal Approximation theorem

This is really fun, as I hear the same answers almost daily, again and again, concerning the UAT and Weierstrass. Another one that is quite nice: kernel methods can't scale to industrial problems because they are quadratic in the training set size.
Even if you prove that UAT / Weierstrass is useless here, or if you show that kernel methods are linear in the training set size, they will stick to their Holy Hand Grenade of Antioch. Their best members write that deep learning provides non-converging methods, or we show them that deep learning does not work for partial differential equations? Don't bother: it is more fun to share awards, to give kudos, and to waste private and public money.
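
To make the scaling argument concrete, here is a minimal sketch contrasting an exact Gaussian-kernel ridge regression, which builds an $n \times n$ Gram matrix, with a feature-based approximation whose storage grows linearly in $n$. Random Fourier features are used purely as one well-known illustration; this is an assumption and not necessarily the method meant in the post. All function names and parameter values (`gamma`, `lam`, `D`) are hypothetical.

```python
import numpy as np

def gaussian_gram(X, Y, gamma):
    """Gram matrix k(x, y) = exp(-gamma * ||x - y||^2); shape (len(X), len(Y))."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_exact_krr(K, y, lam):
    """Exact kernel ridge regression: needs the full n x n matrix (quadratic in n)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def rff_features(X, W, b):
    """Random Fourier features whose inner products approximate the Gaussian kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_rff_ridge(Z, y, lam):
    """Ridge regression in feature space: n x D storage and a D x D solve (linear in n)."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
n, d, D = 200, 3, 2000          # samples, input dimension, number of random features
gamma, lam = 0.5, 1e-3
X = rng.normal(size=(n, d))
y = np.sin(X.sum(axis=1))

# Exact route: n x n Gram matrix.
K = gaussian_gram(X, X, gamma)
alpha = fit_exact_krr(K, y, lam)

# Approximate route: frequencies drawn from N(0, 2 * gamma * I), phases uniform on [0, 2*pi).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = rff_features(X, W, b)
beta = fit_rff_ridge(Z, y, lam)

# How well do the random features reproduce the kernel on the training set?
mean_abs_err = np.abs(Z @ Z.T - K).mean()
print("Gram matrix shape:", K.shape, "feature matrix shape:", Z.shape)
print("mean |Z Z^T - K|:", mean_abs_err)
```

With the number of features `D` held fixed, the feature matrix grows as $n \times D$ while the Gram matrix grows as $n \times n$, which is the difference the "quadratic" objection overlooks.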