
Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

The line search for the deterministic case is a solved problem, isn't it?
I don't really know; I don't know enough to say. But have a look at this article by Bill Hager, who is a numerical analyst. Granted, it's nonlinear CGM, but still:

http://people.cs.vt.edu/~asandu/Public/ ... survey.pdf

I have a feeling that Wolfe is applicable to differentiable functions. Does it break down at non-differentiable kinks? These issues tend to be brushed under the carpet in ML articles, or only lip service is paid to them.
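For the deterministic case, here is a minimal sketch of what such a line search looks like: a plain Armijo backtracking rule in Python. The function names and constants are my own illustrative choices, not Hager's algorithm.

```python
import numpy as np

def armijo_backtracking(f, grad, x, d, alpha0=1.0, c1=1e-4, rho=0.5, max_iter=50):
    """Backtracking line search enforcing the Armijo (sufficient decrease)
    condition: f(x + a d) <= f(x) + c1 * a * grad(x)^T d.
    Assumes f is differentiable along d; no guarantees at non-smooth kinks."""
    fx = f(x)
    slope = grad(x) @ d    # directional derivative; must be < 0 for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            break
        alpha *= rho       # shrink the step and try again
    return alpha

# Quadratic sanity check: f(x) = x^T x, steepest descent from x = (1, 1)
f = lambda x: x @ x
g = lambda x: 2.0 * x
x0 = np.array([1.0, 1.0])
step = armijo_backtracking(f, g, x0, -g(x0))
print(step)  # 0.5: alpha = 1 overshoots the minimum, one halving suffices
```

Note that the sufficient-decrease test uses the directional derivative $\nabla f(x)^T d$, which is exactly the quantity that stops being well defined at a non-differentiable kink.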

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Here's a proposition: implement a classic function (e.g. the famous banana function) and add Gaussian noise to the gradients and values. You'll have a test case to train on. I understand why you don't want to start with batch SGD.

That's a good idea. That way I might understand the approach in Mahsereci & Hennig and get to SGD in steps.

Question: what is the precise definition of a stochastic gradient as an entity?

sg(x) = g(x) + noise(x)
sg(x) = g(x + noise(x))
something like equation (3) in M&H?

BTW, how does (3) look if I am only interested in the gradient and not in function values?
Use Eq. (3) from M&H if you need to sample both $y$ and $dy/dx$. If you need only the gradient, reduce Eq. (3) by removing the $y$ bits (see https://en.wikipedia.org/wiki/Multivari ... tributions).
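For the banana-function proposition above, a quick sketch of the first two candidate definitions in Python. The names `sg_additive` and `sg_perturbed` are my own labels for the two options, and the noise scales are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def rosenbrock(x):
    """f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, global minimum at (1, 1)."""
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    dfdx = -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2)
    dfdy = 200.0 * (x[1] - x[0]**2)
    return np.array([dfdx, dfdy])

def sg_additive(x, sigma=1.0):
    """Definition 1: additive Gaussian noise on the exact gradient."""
    return rosenbrock_grad(x) + sigma * rng.standard_normal(2)

def sg_perturbed(x, sigma=0.01):
    """Definition 2: the exact gradient evaluated at a perturbed point."""
    return rosenbrock_grad(x + sigma * rng.standard_normal(2))

x0 = np.array([-1.2, 1.0])     # the classic starting point
print(rosenbrock(x0))          # 24.2
```

The two definitions differ in character: the first gives unbiased noise (like a minibatch gradient), while the second biases the estimate wherever the gradient is nonlinear in $x$.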

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

The line search for the deterministic case is a solved problem, isn't it?
I don't really know; I don't know enough to say. But have a look at this article by Bill Hager, who is a numerical analyst. Granted, it's nonlinear CGM, but still:

http://people.cs.vt.edu/~asandu/Public/ ... survey.pdf

I have a feeling that Wolfe is applicable to differentiable functions. Does it break down at non-differentiable kinks? These issues tend to be brushed under the carpet in ML articles, or only lip service is paid to them.
Thanks, good paper! I think you're right about the differentiability.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

You're welcome.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

The line search for the deterministic case is a solved problem, isn't it?
I don't really know; I don't know enough to say. But have a look at this article by Bill Hager, who is a numerical analyst. Granted, it's nonlinear CGM, but still:

http://people.cs.vt.edu/~asandu/Public/ ... survey.pdf

I have a feeling that Wolfe is applicable to differentiable functions. Does it break down at non-differentiable kinks? These issues tend to be brushed under the carpet in ML articles, or only lip service is paid to them.
Thanks, good paper! I think you're right about the differentiability.
Even Armijo (1966) and Nocedal & Wright are silent on the conditions that $f$ must satisfy in order for the claims to work. In particular, they seem to exclude the vanishing and exploding gradient problems that appear in practice. This opens Pandora's box (a potential can of worms) of pathological scenarios. One issue is that you can get catastrophic cancellation errors, etc.
Sticking to the tricky deterministic Rosenbrock function (steep valleys, non-convex, known global minimum): Armijo + GD breaks down, while gradient systems do better. The most robust one I tested is Gradient Langevin Dynamics with a (rough) variant of simulated annealing. I am experimenting to find optimal hyperparameters.
At this stage, my question is: how robust is the M&H algorithm? IMO taking a dynamical systems perspective is more mathematically mature.

What is a good test case for Stochastic GLD? Basically, how do you formulate it in a precise way? I have it ready to go.
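For concreteness, here is a minimal sketch of the GLD iteration described above on the deterministic Rosenbrock function. The step size, annealing schedule, and iteration count are illustrative guesses, not tuned hyperparameters:

```python
import numpy as np

def rosenbrock(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([-2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
                     200.0 * (x[1] - x[0]**2)])

def gld(grad, x0, steps=20_000, eta=1e-4, beta0=1.0, seed=0):
    """Gradient Langevin Dynamics with a crude annealing schedule:
        x_{k+1} = x_k - eta * grad(x_k) + sqrt(2 eta / beta_k) * xi_k,
    where the inverse temperature beta_k grows so the injected noise dies out."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        beta = beta0 * (k + 1.0)
        x = x - eta * grad(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal(x.size)
    return x

x_final = gld(rosenbrock_grad, [-1.2, 1.0])
print(rosenbrock(x_final))     # well below the starting value of 24.2
```

With the noise annealed away, this reduces to plain gradient descent in the tail, so the interesting hyperparameter questions are the initial temperature $1/\beta_0$ and how fast $\beta_k$ grows.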

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Try linear regression with L2 loss, but at every step select randomly a subset of (x, y) pairs.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Try linear regression with L2 loss, but at every step select randomly a subset of (x, y) pairs.
Input data: $(x_1,y_1),...,(x_n,y_n)$, $n$ yuge?

1. Take into account model errors in $x_j$ as well as in $y_j$, $j = 1,...,n$?
2. Input data is uniformly generated?
3. Mini-batch uniform sample of (constant?) size $m < n$ at each iteration?
4. Find parameters $a,b$ in $y = a + bx$.
5. Should it be the least-squares regression line?

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

@Cuchulainn JLM = JohnLeM I presume? This is quite an interesting paper: I did not know that a link between NNs and RBFs was already made 25 years ago. Thanks for pointing out the link! Concerning the universal approximation theorem, my feeling is that... it is a little bit too universal: I never found a way to use it to get a usable convergence result, since it does not give any convergence rate. We had to build other tools.
I started a search for results on convergence of approximations by NNs. For shallow networks, the problem seems to be well understood:

Barron (1993): "feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The function approximated is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform" (i.e. the Fourier transform of its gradient is integrable).

Mhaskar (1996): "to approximate a C^n-function on a d-dimensional set with infinitesimal error eps one needs a network of size about eps^(−d/n), assuming a smooth activation function" (after Yarotsky (2016)).

The behaviour of deep networks is less understood. I'll continue.
About (deep or shallow) very wide networks: there's a strong relation with Gaussian Processes:

Priors for Infinite Networks

Deep Neural Networks as Gaussian Processes

Gaussian Process Behaviour in Wide Deep Neural Networks

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
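The result in "Priors for Infinite Networks" is easy to check numerically: with output weights scaled as $1/\sqrt{H}$, the output of a random one-hidden-layer network at a fixed input becomes Gaussian as the width $H$ grows, by the CLT. A small sketch in Python (the width, input, and sample counts are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_net_outputs(x, width, n_nets):
    """Output at scalar input x of n_nets independent random one-hidden-layer
    tanh networks, with output weights ~ N(0, 1/width) (Neal's scaling)."""
    w = rng.standard_normal((n_nets, width))                   # input weights
    b = rng.standard_normal((n_nets, width))                   # hidden biases
    v = rng.standard_normal((n_nets, width)) / np.sqrt(width)  # output weights
    return np.sum(v * np.tanh(w * x + b), axis=1)

out = wide_net_outputs(0.5, width=500, n_nets=10_000)
z = (out - out.mean()) / out.std()
print(np.mean(z**3), np.mean(z**4))   # skewness ~ 0, kurtosis ~ 3 for a Gaussian
```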

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Try linear regression with L2 loss, but at every step select randomly a subset of (x, y) pairs.
Input data: $(x_1,y_1),...,(x_n,y_n)$, $n$ yuge?

1. Take into account model errors in $x_j$ as well as in $y_j$, $j = 1,...,n$?
2. Input data is uniformly generated?
3. Mini-batch uniform sample of (constant?) size $m < n$ at each iteration?
4. Find parameters $a,b$ in $y = a + bx$.
5. Should it be the least-squares regression line?
1. No, simply Loss = sum_i (y_i - a * x_i - b)^2
2. Up to you.
3. Yes.
4. Yes.
5. Yes.
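Putting the five answers together, here is a minimal SGD sketch for this test in Python. The data-generating parameters ($a = 2$, $b = -1$), the minibatch size, and the learning rate are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data around a line: y = 2 x - 1 + noise (true a = 2, b = -1)
n = 10_000
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x - 1.0 + 0.1 * rng.standard_normal(n)

a, b = 0.0, 0.0        # initial parameters
m, eta = 32, 0.1       # minibatch size and learning rate (illustrative)

for _ in range(2_000):
    idx = rng.integers(0, n, m)       # uniform minibatch (with replacement)
    xb, yb = x[idx], y[idx]
    resid = yb - (a * xb + b)
    # Gradients of the mean squared loss (1/m) sum_i resid_i^2 w.r.t. a and b
    a -= eta * (-2.0 * np.mean(resid * xb))
    b -= eta * (-2.0 * np.mean(resid))

print(a, b)            # close to the true values 2 and -1
```

Each step touches only $m$ of the $n$ points, which is where the speed-up over full-batch gradient descent comes from.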

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

1. Makes things easier.
3. m constant.

So, for this test, conceptually we've got $N$ samples (too many) hovering around a regression line, and we replace them by mini-batches of size $m$ which are just as accurate but faster. That's the essence of SGD?

// I find the style of Deisenroth et al. (2019) very clear. The 'balance' is excellent.

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

1. Makes things easier.
3. m constant.

So, for this test, conceptually we've got $N$ samples (too many) hovering around a regression line, and we replace them by mini-batches of size $m$ which are just as accurate but faster. That's the essence of SGD?
Less accurate, much faster, and equal in expectation to the complete gradient (because the loss is additive over sample points). That's SGD in a nutshell.
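The "equal in expectation" claim is easy to verify numerically: averaging many minibatch gradients recovers the full-sample gradient, precisely because the loss is additive over sample points. A small sketch in Python, reusing the regression setup (all names and parameter values are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Regression data; loss = sum_i (y_i - a x_i - b)^2 is additive over points
n, m = 1_000, 10
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x - 1.0 + 0.1 * rng.standard_normal(n)
a, b = 0.5, 0.5                        # an arbitrary evaluation point

def mean_grad(xs, ys):
    """Gradient (d/da, d/db) of the mean squared loss over the given sample."""
    r = ys - (a * xs + b)
    return np.array([-2.0 * np.mean(r * xs), -2.0 * np.mean(r)])

full = mean_grad(x, y)                 # complete gradient over all n points

draws = []
for _ in range(20_000):
    idx = rng.integers(0, n, m)        # uniform minibatch of size m
    draws.append(mean_grad(x[idx], y[idx]))
avg = np.mean(draws, axis=0)           # Monte Carlo estimate of E[minibatch grad]

print(full, avg)   # the two vectors agree up to Monte Carlo error
```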

// I find Deisenroth et al 2019 style very clear, The 'balance' is excellent.
What paper is that?

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Deisenroth et al have a book

https://mml-book.github.io/

JohnLeM
Topic Author
Posts: 225
Joined: September 16th, 2008, 7:15 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Hi all, sorry for not having participated during the last week; I was quite busy performing numerical tests using CoDeFi for ALM.
@Cuchulainn, you might find these results interesting, as they illustrate the use of meshfree methods (AKA neural networks, for our uninitiated AI readers ...) for a 6-dimensional finance problem.

ISayMoo
Posts: 1457
Joined: September 30th, 2015, 8:30 pm

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Deisenroth et al have a book

https://mml-book.github.io/
Ah, this one! Yeah.

Cuchulainn
Posts: 58401
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Are Artificial Intelligence methods (AKA Neural Networks) for PDEs about to rediscover the wheel ?

Hi all, sorry for not having participated during the last week, I was quite busy performing numerical tests using CoDeFi for ALM.
Certainly interesting. But it does feel a wee bit like an advertorial (nothing wrong with that), and it has little to do with NNs as far as I can see.
This thread has moved on since last week.