SERVING THE QUANTITATIVE FINANCE COMMUNITY

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

I don't know if I posted this paper here already, but I think you might like it. I read it, it's a good paper and comes with a complete recipe: Probabilistic Line Searches for Stochastic Optimization
Yes, I remember. It is a very good article. The step from deterministic to stochastic Wolfe looks doable, and I have my C++ code for BVN and splines, as well as a pluggable component architecture (based on my 2018 book) defined by small interfaces. So far I used scalar functions; now it will be multivariate.
A practical issue is understanding their pseudocode and mapping it to my components. That's the plan, which will now be executed.
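For what it's worth, a minimal sketch of what such a pluggable design could look like — interface names, signatures and parameter values here are my own invention for illustration, not taken from the book or the paper:

```python
# Hypothetical sketch of a pluggable line-search component: an optimizer
# driver depends only on this small interface, so a deterministic Armijo
# search could later be swapped for a probabilistic Wolfe search
# (Mahsereci & Hennig) without touching the driver loop.
class LineSearch:
    def step_length(self, f, grad, x, d):
        """Return a step length t > 0 along descent direction d from x."""
        raise NotImplementedError

class BacktrackingArmijo(LineSearch):
    def __init__(self, c=1e-4, rho=0.5, t0=1.0):
        self.c, self.rho, self.t0 = c, rho, t0

    def step_length(self, f, grad, x, d):
        t, fx = self.t0, f(x)
        slope = sum(g_i * d_i for g_i, d_i in zip(grad(x), d))
        # Shrink t until the Armijo sufficient-decrease condition holds:
        # f(x + t*d) <= f(x) + c * t * <grad f(x), d>.
        while f([x_i + t * d_i for x_i, d_i in zip(x, d)]) > fx + self.c * t * slope:
            t *= self.rho
        return t
```

A driver would call `step_length` once per iteration; swapping line-search strategies then means swapping one object behind the interface.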

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Would you consider Open Sourcing the C++ implementation of probabilistic line search? It would be awesome for the ML/AI community if it could be included in TensorFlow, for example. I could probably help with that.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Would you consider Open Sourcing the C++ implementation of probabilistic line search? It would be awesome for the ML/AI community if it could be included in TensorFlow, for example. I could probably help with that.
Sounds like a very good idea. I'll have a think about this project and get back with feedback etc. Thanks.

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

@Cuchulainn JLM = JohnLeM, I presume? This is quite an interesting paper: I did not know that a link between NNs and RBFs had already been made 25 years ago. Thanks for pointing out the link! Concerning the Universal Approximation Theorem, my feeling is that... it is a little bit too universal: I never found a way to use it to get a usable convergence result, since it does not give any convergence rate. We had to build other tools.
I started a search for results on convergence of approximations by NNs. For shallow networks, the problem seems to be well understood:

Barron (1993): "feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform" (i.e. the Fourier transform of its gradient is integrable).

Mhaskar (1996): "to approximate a C^n-function on a d-dimensional set with infinitesimal error eps one needs a network of size about eps^(−d/n), assuming a smooth activation function" (after Yarotsky (2016)).

The behaviour of deep networks is less understood. I'll continue.
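Paraphrasing the two rates above in symbols (the notation is mine, following the cited sources):

```latex
% Barron (1993): for f with finite first Fourier moment
%   C_f = \int_{\mathbb{R}^d} |\omega| \, |\hat f(\omega)| \, d\omega < \infty,
% a one-hidden-layer network f_n with n sigmoidal units achieves
\int_{B_r} \bigl(f(x) - f_n(x)\bigr)^2 \, \mu(dx) \;\le\; \frac{(2 r C_f)^2}{n} \;=\; O(1/n).

% Mhaskar (1996): for f \in C^n([0,1]^d) and a smooth activation,
% uniform accuracy \varepsilon needs a network of size about
N(\varepsilon) \;=\; O\!\left(\varepsilon^{-d/n}\right).
```

Note the contrast: Barron's rate is dimension-independent (the dimension hides in $C_f$), while Mhaskar's exponent $-d/n$ carries the curse of dimensionality explicitly.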

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

I have ordered Grace Wahba's book.

katastrofa
Posts: 7236
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

@Cuchulainn JLM = JohnLeM, I presume? This is quite an interesting paper: I did not know that a link between NNs and RBFs had already been made 25 years ago. Thanks for pointing out the link! Concerning the Universal Approximation Theorem, my feeling is that... it is a little bit too universal: I never found a way to use it to get a usable convergence result, since it does not give any convergence rate. We had to build other tools.
I started a search for results on convergence of approximations by NNs. For shallow networks, the problem seems to be well understood:

Barron (1993): "feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform" (i.e. the Fourier transform of its gradient is integrable).

Mhaskar (1996): "to approximate a C^n-function on a d-dimensional set with infinitesimal error eps one needs a network of size about eps^(−d/n), assuming a smooth activation function" (after Yarotsky (2016)).

The behaviour of deep networks is less understood. I'll continue.
This is the convergence w/r to the network size. Since, in general, neural networks are simply parameterised kernel functions, do the proofs of time convergence for the respective classic kernel functions apply to them? E.g. perceptrons, popular in solid-state physics.

BTW, I've found this trying to find how to do that with deep networks: https://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf
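On the NN-kernel link: a related, easy-to-check construction (not from the Cho & Saul paper, which uses arc-cosine kernels) is random Fourier features (Rahimi & Recht), which realise the Gaussian RBF kernel as an inner product of one-random-layer "network" outputs with a cosine activation. A sketch — all names and parameter values are illustrative:

```python
import math
import random

def rbf_kernel(x, y, gamma=0.5):
    # The target: Gaussian RBF kernel k(x, y) = exp(-gamma * |x - y|^2).
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def random_layer(dim, n_features, gamma=0.5, seed=0):
    # w ~ N(0, 2*gamma*I) and b ~ U(0, 2*pi) reproduce the Gaussian
    # kernel in expectation (Bochner's theorem).
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, math.sqrt(2.0 * gamma)) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return ws, bs

def features(x, ws, bs):
    # One "hidden layer" with random weights and cosine activation.
    n = len(ws)
    return [math.sqrt(2.0 / n)
            * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(ws, bs)]

def approx_kernel(x, y, ws, bs):
    zx, zy = features(x, ws, bs), features(y, ws, bs)
    return sum(a * b for a, b in zip(zx, zy))
```

With a few thousand random features, `approx_kernel` matches `rbf_kernel` to Monte Carlo accuracy (error O(1/sqrt(n))) — the kernel machine is literally a wide one-layer network with frozen random weights.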

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

I think kernel functions are a fancy name for RBFs. And then we are back to the OP's original thesis. Parsimony is warmly welcomed, to avoid AI's propensity for putting new wine in old bottles and concocting terms willy-nilly. It confuses maths types no end.

A random example
https://ac.els-cdn.com/S089812211400463 ... 64af88ba32

katastrofa
Posts: 7236
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Dude... RBFs are a type of kernel function. Kernel methods and perceptrons are 1960s. THE classics.

katastrofa
Posts: 7236
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

A basic question: what other universal approximations are there? Kernel functions, decision trees and forests, neural nets, ...?

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Dude... RBFs are a type of Kernel functions. Kernel methods and perceptrons are 1960s. THE classics.
I plead temporary insanity. My bad. It could happen to a bishop.

Actually, kernel methods go back to Hilbert and Schmidt.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

I don't know if I posted this paper here already, but I think you might like it. I read it, it's a good paper and comes with a complete recipe: Probabilistic Line Searches for Stochastic Optimization
Update #1: I have reduced the scope by implementing the classic deterministic 1966 Armijo rule for GD. It is better than other line searches, but it breaks on the innocuous Rosenbrock banana with global minimum at $(A, A^2)$:

$(A-x)^2 + B(y - x^2)^2$

This counterexample shows that Armijo is not a panacea. On the other hand, solving this as an ODE gradient system is very robust:

A. Global convergence <==> convergence to an accumulation point for any $x_0$
B. Convergence to a global minimum.

For $A = 32$ I get $(31.9996, 1023.98)$ by ODE. Very nice.

//
1. A != B
2. The Wolfe conditions have recent updates
3. GD gets stuck in a valley.
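The ODE gradient-system route above can be sketched as explicit Euler on $\dot{x} = -\nabla f(x)$. The step size and iteration count below are my own illustrative choices, not tuned values from this post:

```python
# Explicit Euler on the gradient flow dx/dt = -grad f(x) for the
# Rosenbrock banana f(x, y) = (A - x)^2 + B*(y - x^2)^2.
# dt must be small relative to the Hessian's largest eigenvalue
# (entries of order B), otherwise the scheme blows up.
def rosenbrock_grad(x, y, A=1.0, B=100.0):
    gx = -2.0 * (A - x) - 4.0 * B * x * (y - x * x)
    gy = 2.0 * B * (y - x * x)
    return gx, gy

def gradient_flow(x, y, A=1.0, B=100.0, dt=5e-4, steps=400_000):
    for _ in range(steps):
        gx, gy = rosenbrock_grad(x, y, A, B)
        x, y = x - dt * gx, y - dt * gy
    return x, y
```

From the classic start $(-1.2, 1)$ with $A = 1$, $B = 100$, this crawls along the valley to the global minimum $(1, 1)$ — slow but robust, which matches the post's observation (it reports $(31.9996, 1023.98)$ for $A = 32$).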
Last edited by Cuchulainn on March 14th, 2019, 9:28 pm, edited 3 times in total.

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

You have reduced the scope by getting rid of the S in SGD? :O

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

You have reduced the scope by getting rid of the S in SGD? :O
It's temporary. It's the way mathematicians solve problems: by first taking concrete cases.
The S is not the essentially complex issue at this stage. Or should I build an SGD on top of a shaky GD? I am not interested in tweaking ad nauseam.
In fact, you should be the one making assertions to help me. Such one-liners don't help me much, unfortunately.

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

I'm worried that you're throwing the baby out with the bath-water. Line search for the deterministic case is a solved problem, isn't it? See Nocedal and Wright. The problem with the classic Wolfe conditions is that they fail when the function is stochastic (hence SGD). You can implement the probabilistic line search algorithm and apply it to the deterministic case, but I'm worried that it is a degenerate case the algorithm is not meant to handle.

Here's a proposition: implement a classic function (e.g. the famous banana) and add Gaussian noise to the gradients and values. You'll have a test case to train on. I understand why you don't want to start with batch SGD.
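A minimal sketch of that proposition — the noise level `sigma` and the factory interface are assumed knobs, not values prescribed in the thread:

```python
import random

# Wrap the deterministic Rosenbrock banana so that every function and
# gradient evaluation returns the true value plus additive Gaussian
# noise: the sg(x) = g(x) + noise model.
def make_noisy_banana(A=1.0, B=100.0, sigma=0.1, rng=None):
    rng = rng or random.Random()

    def f(x, y):
        return (A - x) ** 2 + B * (y - x * x) ** 2 + rng.gauss(0.0, sigma)

    def grad(x, y):
        gx = -2.0 * (A - x) - 4.0 * B * x * (y - x * x) + rng.gauss(0.0, sigma)
        gy = 2.0 * B * (y - x * x) + rng.gauss(0.0, sigma)
        return gx, gy

    return f, grad
```

Setting `sigma=0` recovers the deterministic case exactly, so a line search can be validated on the clean problem first and then stress-tested by dialling the noise up.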

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Here's a proposition: implement a classic function (e.g the famous banana) and add Gaussian noise to the gradients and values. You'll have a test case to train on. I understand why you don't want to start with batch SGD.

That's a good idea. This way I might understand the approach in Mahsereci & Hennig and get to SGD in steps.

Question: what is the precise definition of a stochastic gradient as an entity?

sg(x) = g(x) + noise(x)
sg(x) = g(x + noise(x))
something like equation (3) in M&H?

BTW, how does (3) look if I am only interested in gradients and not function values?

We are probably in metaheuristics territory. How would bees solve this problem?

Caveat: AI is a kind of (serious) hobby project at the moment and I am also interested in the wider applicability of these numerical methods.
Last edited by Cuchulainn on March 15th, 2019, 6:21 pm, edited 2 times in total.