 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 8th, 2019, 11:00 am

I don't know if I posted this paper here already, but I think you might like it. I read it, it's a good paper and comes with a complete recipe: Probabilistic Line Searches for Stochastic Optimization
Yes, I remember. It is a very good article. The step from deterministic to stochastic Wolfe looks doable. I have my C++ code for BVN and splines, as well as a pluggable component architecture (based on my 2018 book) defined by small interfaces; I used it with scalar functions and now it will be multivariate.
A practical issue is understanding their pseudocode and mapping it to my components. That's the plan, which will now be executed.
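For context, a minimal sketch of what I mean by small pluggable interfaces (hypothetical names, not my production code; the line search is one component the optimiser is assembled from):

// Minimal sketch of a pluggable line-search interface (hypothetical names).
#include <functional>
#include <vector>

using Vector = std::vector<double>;
using Objective = std::function<double(const Vector&)>;
using Gradient = std::function<Vector(const Vector&)>;

// A line search returns a step length alpha along direction d from point x.
struct ILineSearch
{
    virtual ~ILineSearch() = default;
    virtual double stepLength(const Objective& f, const Gradient& g,
                              const Vector& x, const Vector& d) const = 0;
};

// Trivial plug-in: a constant step, useful as a baseline component.
struct FixedStep : ILineSearch
{
    double alpha;
    explicit FixedStep(double a) : alpha(a) {}
    double stepLength(const Objective&, const Gradient&,
                      const Vector&, const Vector&) const override
    { return alpha; }
};

Swapping a deterministic Wolfe search for the probabilistic one should then only mean plugging in a different ILineSearch.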
 
User avatar
ISayMoo
Posts: 1653
Joined: September 30th, 2015, 8:30 pm

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 8th, 2019, 2:01 pm

Would you consider Open Sourcing the C++ implementation of probabilistic line search? It would be awesome for the ML/AI community if it could be included in TensorFlow, for example. I could probably help with that.
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 8th, 2019, 4:43 pm

Would you consider Open Sourcing the C++ implementation of probabilistic line search? It would be awesome for the ML/AI community if it could be included in TensorFlow, for example. I could probably help with that.
Sounds like a very good idea. I'll have a think about this project and get back with feedback etc. Thanks.
 
User avatar
ISayMoo
Posts: 1653
Joined: September 30th, 2015, 8:30 pm

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 10th, 2019, 5:19 pm

@Cuchulainn JLM = JohnLeM I presume? This is quite an interesting paper: I did not know that a link between NNs and RBFs was already established 25 years ago. Thanks for pointing out the link! Concerning the Universal Approximation Theorem, my feeling is that... it is a little bit too universal: I never found a way to use it to get a usable convergence result, since it does not give any convergence rate. We had to build other tools.
I started a search for results on convergence of approximations by NNs. For shallow networks, the problem seems to be well understood:

Barron (1993): "feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform" (i.e. the Fourier transform of its gradient is integrable).

Mhaskar (1996): "to approximate a [$]C^n[$]-function on a d-dimensional set with infinitesimal error [$]\epsilon[$] one needs a network of size about [$]\epsilon^{-d/n}[$], assuming a smooth activation function" (after Yarotsky (2016)).

The behaviour of deep networks is less understood. I'll continue.
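To get a feel for the Mhaskar bound, a back-of-the-envelope example (my numbers, not from the paper): for a [$]C^2[$] function in [$]d = 10[$] dimensions and target error [$]\epsilon = 0.01[$], the network size scales like [$]\epsilon^{-d/n} = 0.01^{-5} = 10^{10}[$] nodes. That is the curse of dimensionality in full force, whereas Barron's [$]O(1/n)[$] rate is dimension-free but pays for it through the Fourier-moment assumption on the target function.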
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 10th, 2019, 8:24 pm

I have ordered Grace Wahba's book.
 
User avatar
katastrofa
Posts: 7588
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 10th, 2019, 9:28 pm

@Cuchulainn JLM = JohnLeM I presume? This is quite an interesting paper: I did not know that a link between NNs and RBFs was already established 25 years ago. Thanks for pointing out the link! Concerning the Universal Approximation Theorem, my feeling is that... it is a little bit too universal: I never found a way to use it to get a usable convergence result, since it does not give any convergence rate. We had to build other tools.
I started a search for results on convergence of approximations by NNs. For shallow networks, the problem seems to be well understood:

Barron (1993): "feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform" (i.e. the Fourier transform of its gradient is integrable).

Mhaskar (1996): "to approximate a [$]C^n[$]-function on a d-dimensional set with infinitesimal error [$]\epsilon[$] one needs a network of size about [$]\epsilon^{-d/n}[$], assuming a smooth activation function" (after Yarotsky (2016)).

The behaviour of deep networks is less understood. I'll continue.
This is convergence w.r.t. the network size. Since, in general, neural networks are simply parameterised kernel functions, do the proofs of convergence in time for the corresponding classic kernel functions apply to them? E.g. perceptrons, which are popular in solid-state physics.

BTW, I found this while trying to work out how to do that for deep networks: https://cseweb.ucsd.edu/~saul/papers/nips09_kernel.pdf
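To make the correspondence concrete, a toy sketch (my notation, not from the paper): a single-hidden-layer RBF network is exactly a kernel expansion with learnable weights and centres.

// Toy sketch: an RBF network is a kernel expansion with learnable
// weights w_i and centres c_i: f(x) = sum_i w_i * k(x, c_i).
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;

double sqDist(const Vector& x, const Vector& c)
{
    double s = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
        s += (x[j] - c[j]) * (x[j] - c[j]);
    return s;
}

// Gaussian RBF kernel k(x, c) = exp(-||x - c||^2 / (2 h^2)).
double gaussKernel(const Vector& x, const Vector& c, double h)
{
    return std::exp(-sqDist(x, c) / (2.0 * h * h));
}

// One-hidden-layer RBF network evaluated at x.
double rbfNet(const Vector& x, const std::vector<Vector>& centres,
              const std::vector<double>& w, double h)
{
    double f = 0.0;
    for (std::size_t i = 0; i < centres.size(); ++i)
        f += w[i] * gaussKernel(x, centres[i], h);
    return f;
}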
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 10th, 2019, 9:54 pm

I think kernel functions are a fancy name for RBF. And then we are back to the OP's original thesis. Parsimony is warmly welcomed, to avoid AI's propensity for putting new wine in old bottles and concocting terms willy-nilly. It confuses maths types no end.

A random example
https://ac.els-cdn.com/S089812211400463 ... 64af88ba32
 
User avatar
katastrofa
Posts: 7588
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 10th, 2019, 11:56 pm

Dude... RBFs are a type of kernel function. Kernel methods and perceptrons are 1960s. THE classics.
 
User avatar
katastrofa
Posts: 7588
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 12th, 2019, 12:11 pm

A basic question: what other universal approximators are there? Kernel functions, decision trees and forests, neural nets, ...?
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 12th, 2019, 9:09 pm

Dude... RBFs are a type of Kernel functions. Kernel methods and perceptrons are 1960s. THE classics.
I plead temporary insanity. My bad. It could happen to a bishop.

Actually, kernel methods go back to Hilbert and Schmidt. 
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 14th, 2019, 8:50 pm

I don't know if I posted this paper here already, but I think you might like it. I read it, it's a good paper and comes with a complete recipe: Probabilistic Line Searches for Stochastic Optimization
Update #1: I have reduced the scope by implementing the deterministic, classic Armijo (1966) rule for GD. It is better than other line searches, but it breaks down on the innocuous Rosenbrock banana with global minimum at [$](A, A^2)[$]:

[$] (A-x)^2 + B(y - x^2)^2[$]

This counterexample shows that Armijo is not a panacea. On the other hand, solving this as an ODE gradient system is very robust:

A. Global convergence <==> convergence to an accumulation point for any [$]x_0[$]
B. Convergence to a global minimum.

For [$]A = 32[$] I get [$](31.9996, 1023.98)[$] by ODE, against the exact [$](32, 1024)[$]. Very nice.

//
1. A != B: convergence to an accumulation point does not imply convergence to a global minimum.
2. The Wolfe conditions have seen recent updates.
3. GD gets stuck in the valley.
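For reference, minimal sketches of the two ingredients, with my own untuned parameter choices: the Armijo backtracking rule, and the gradient system [$]\dot{x} = -\nabla f(x)[$] integrated with explicit Euler.

// Sketches: (1) Armijo backtracking on the Rosenbrock banana
// f(x, y) = (A - x)^2 + B(y - x^2)^2, and (2) the gradient system
// dx/dt = -grad f integrated by explicit Euler.
#include <cmath>

const double A = 1.0, B = 100.0; // classic Rosenbrock constants

double f(double x, double y)
{ return (A - x)*(A - x) + B*(y - x*x)*(y - x*x); }

void grad(double x, double y, double& gx, double& gy)
{
    gx = -2.0*(A - x) - 4.0*B*x*(y - x*x);
    gy =  2.0*B*(y - x*x);
}

// (1) Armijo (1966): shrink alpha until
//     f(x - alpha*g) <= f(x) - c*alpha*||g||^2.
double armijoStep(double x, double y, double gx, double gy,
                  double alpha0 = 1.0, double c = 1e-4, double rho = 0.5)
{
    const double fx = f(x, y), g2 = gx*gx + gy*gy;
    double alpha = alpha0;
    while (f(x - alpha*gx, y - alpha*gy) > fx - c*alpha*g2)
        alpha *= rho;
    return alpha;
}

// (2) Gradient flow with a small fixed Euler step (dt must respect
//     stability: dt < 2/L, L = largest Hessian eigenvalue ~ 1000 here).
void gradientFlow(double& x, double& y, double dt = 1.0e-4, long steps = 2000000)
{
    for (long i = 0; i < steps; ++i)
    {
        double gx, gy;
        grad(x, y, gx, gy);
        x -= dt * gx;
        y -= dt * gy;
    }
}

The Euler flow creeps along the valley floor but keeps descending, consistent with point 3 above; a proper ODE solver with error control would of course do better than fixed-step Euler.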
Last edited by Cuchulainn on March 14th, 2019, 9:28 pm, edited 3 times in total.
 
User avatar
ISayMoo
Posts: 1653
Joined: September 30th, 2015, 8:30 pm

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 14th, 2019, 8:54 pm

You have reduced the scope by getting rid of the S in SGD? :O
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 14th, 2019, 9:08 pm

You have reduced the scope by getting rid of the S in SGD? :O
It's temporary. It's the way mathematicians solve problems: take concrete cases first.
The S is not the essentially complex issue at this stage. Or should I build an SGD on top of a shaky GD? I'm not interested in tweaking ad nauseam.
In fact, you should be the one making assertions to help me. Such one-liners don't help me much, unfortunately.
 
User avatar
ISayMoo
Posts: 1653
Joined: September 30th, 2015, 8:30 pm

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 14th, 2019, 10:17 pm

I'm worried that you're throwing the baby out with the bath-water. The line search for the deterministic case is a solved problem, isn't it? See Nocedal and Wright. The problem with the classic Wolfe conditions is that they fail when the function is stochastic (hence SGD). You can implement the probabilistic line search algorithm and apply it to the deterministic case, but I'm worried that that is a degenerate case it is not meant to handle.

Here's a proposition: implement a classic function (e.g. the famous banana) and add Gaussian noise to the gradients and values. You'll have a test case to train on. I understand why you don't want to start with batch SGD.
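Something like this minimal sketch, say (my names; the noise level sigma is arbitrary):

// Sketch: turn the deterministic banana into a stochastic test oracle
// by adding i.i.d. Gaussian noise to function values and gradients.
#include <cmath>
#include <random>

const double A = 1.0, B = 100.0;
std::mt19937 rng{42};
std::normal_distribution<double> noise{0.0, 0.1}; // sigma = 0.1, arbitrary

double fNoisy(double x, double y)
{
    const double fExact = (A - x)*(A - x) + B*(y - x*x)*(y - x*x);
    return fExact + noise(rng);
}

void gradNoisy(double x, double y, double& gx, double& gy)
{
    gx = -2.0*(A - x) - 4.0*B*x*(y - x*x) + noise(rng);
    gy =  2.0*B*(y - x*x)                 + noise(rng);
}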
 
User avatar
Cuchulainn
Posts: 58987
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

March 15th, 2019, 1:00 pm

Here's a proposition: implement a classic function (e.g. the famous banana) and add Gaussian noise to the gradients and values. You'll have a test case to train on. I understand why you don't want to start with batch SGD.

That's a good idea. That way I might come to understand the approach in Mahsereci & Hennig and get to SGD in steps.

Question: what is the precise definition of a stochastic gradient as an entity? Is it

1. [$]sg(x) = g(x) + noise(x)[$],
2. [$]sg(x) = g(x + noise(x))[$], or
3. something like equation (3) in M&H?

BTW, how does (3) look if I am only interested in gradients and not function values?
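For concreteness, here is how I would code the first two candidates (a sketch with made-up names, just to pin the question down):

// The two candidate definitions of a stochastic gradient, made concrete.
#include <functional>
#include <random>

using Fn1D = std::function<double(double)>; // exact gradient g(x), 1D for brevity

std::mt19937 rng{7};
std::normal_distribution<double> eps{0.0, 0.1};

// Candidate 1: additive noise on the gradient value, sg(x) = g(x) + noise.
double sgAdditive(const Fn1D& g, double x) { return g(x) + eps(rng); }

// Candidate 2: noise on the evaluation point, sg(x) = g(x + noise).
double sgPerturbed(const Fn1D& g, double x) { return g(x + eps(rng)); }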

We are probably in metaheuristics territory. How would bees solve this problem?

Caveat: AI is a kind of (serious) hobby project at the moment and I am also interested in the wider applicability of these numerical methods.
Last edited by Cuchulainn on March 15th, 2019, 6:21 pm, edited 2 times in total.