Page **6** of **12**

### Re: DL and PDEs

Posted: **April 8th, 2018, 6:57 pm**

by **lovenatalya**

It's a very ambitious article..

But what about its correctness? Does it look like it has solved a problem with, say, more accuracy and speed than the state of the art classical numerical methods? Do you see any flaws like you saw in the paper you discussed in the thread quoted below?

*Your call on the math being outdated and fairly basic is missing the point a bit I think. *

I may have missed a point, especially since you didn't give one. All I see is a link to a not useful article (that I waded in, not even a hint to pinpoint what's on your mind) and DL/PDE buzzwords in the title.

What's the question centering around DL/PDE,exactly? A guess; does DL solve curse of dimensionality?

// is there a GOOD paper on DL/PDE. 1st impression is it's a solution looking for a problem. 'Teaching' a pde sounds a bit weird.

I think this one is a bit better:

https://arxiv.org/abs/1708.07469 Yes, they argue that DL solves the curse of dimensionality.

When I first saw this "note' I thought it looked good but on deeper inspection it turned out to be a train wreck/horrror show. Part of my research back then was proving convergence of FEM (finite element) in Sobolev spaces, later in TV design, oil and gas. So, It is very interesting to see how someone uses maths maths to meet DL half way.

The background of the authors is DL and business (nothing wrong with that) but the article is riddled with errors that I don't even know where to begin:

1. The Galerkin method fell into disuse around 1943. More seriously, the article is really the MESHLESS method which has its own issues.

2. It is not even DL imo; just because you use a stochastic gradient method does not make it as DL.

3. "DGM is a natural merging of ML and Galerkin". Yes?

4.Details are missing (e.g. Table 1!!!!!!).

5. I seems DL applications need some kind of analytic solution for training? PDEs don't have this in general.

6. "Approximate the second derivatives using Monte Carlo". This is fiction.

7. Major numerical difficulties can't be swept under the carpet. In fact, DL will compound them, e.g how to choose an optimal learning rate alpha (usually use Brent's method).

I don't see much synergy between DL and PDE at this stage.

- that is really only the known knowns and the known unknowns. And each year, we discover a few more of those unknown unknowns.

//

AFAIR Meshless leads to a full (ill-conditioned) matrix system

http://user.it.uu.se/~bette/meshless05.pdf

### Re: DL and PDEs

Posted: **April 8th, 2018, 7:05 pm**

by **Cuchulainn**

*But what about its correctness? Does it look like it has solved a problem with, say, more accuracy and speed than the state of the art classical numerical methods? Do you see any flaws like you saw in the paper you discussed in the thread quoted below?*

OK,. I'll have a look. But I need to radically scope it down to 1d/2d PDE and less 'esotertic' PDE.

I don't believe PDE can be solved by NN/SGD but I might be wrong. My gut feeling is that there is a fatal flaw somewhere. I'll study the article but it sounds too good to be true. Caveat; I am a NN noobie here.

Some of the finest mathematical brains on PDE last 200 years. SGD is a 50-year old outgrowth and still evolving..

And solving a 100-dimensional PDE is only to be found in Kubla Khan's Xanadu.

### Re: DL and PDEs

Posted: **April 8th, 2018, 7:52 pm**

by **lovenatalya**

*But what about its correctness? Does it look like it has solved a problem with, say, more accuracy and speed than the state of the art classical numerical methods? Do you see any flaws like you saw in the paper you discussed in the thread quoted below?*

OK,. I'll have a look. But I need to radically scope it down to 1d/2d PDE and less 'esotertic' PDE.

I don't believe PDE can be solved by NN/SGD but I might be wrong. My gut feeling is that there is a fatal flaw somewhere. I'll study the article but it sounds too good to be true. Caveat; I am a NN noobie here.

Some of the finest mathematical brains on PDE last 200 years. SGD is a 50-year old outgrowth and still evolving..

Excellent and thank you, Cuchulainn. Awaiting your expert evaluation!

And solving a 100-dimensional PDE is only to be found in Kubla Khan's Xanadu.

But does the DL breakthrough in the Go playing, which also lives in an extremely high dimensional search space, not offer any inspiration to high dimensional PDE?

Anyway, this is of course speculation.

### Re: DL and PDEs

Posted: **April 10th, 2018, 4:55 pm**

by **Cuchulainn**

**@lovenatalya**
As promised, I went through the article as well as (call it Exhibit II upon which it builds -->)
https://arxiv.org/abs/1708.03223
There is so much that can be said, but I will reduce the scope by only looking at the :"numerics" (and not much in there). I did have a look some time ago and now again. It is really awful. I have experience of most of the numerical methods and in general, they don't work here or are not understood properly.

1. Exhibit II, section 2.1 are about 4 major issues.

a. fixed point iteration (even if you can find!) does not converge exponentially (it is linear, just try x = sin(x)). Robust contraction mappings are very difficult to construct.

b. Too many assumptions.

c. The jury is out on multilevel MC (does it work in theory?) and it is awful slow. (two theses).

d. A lot of symbolic formulae (sigm signs are always scary). ditto terms like "for sufficiently large/small N, h.".

Regarding your link

1. Strange PDEs (esoteric).

2. Seems Euler-Maruyama is at the heart of the work (this is not a good sign, it is a terrible method).

3. What is being approximated really? Table 1 is statistics,

**not numerics**, which uses [$]L\infty[$] error estimates.

4. Why not do a 1d and 2d PDE in detail for starters?

5. Performance? Not clear if the values in Table 2 are good or not.

I am unable to relate to much of the articles' contents. And the algorithms will break down.

On the other hand, maybe I am missing something really simple

Finally, to answer your specific questions:

*But what about its correctness? Does it look like it has solved a problem with, say, more accuracy and speed than the state of the art classical numerical methods?*
No..
*Do you see any flaws like you saw in the paper you discussed in the thread quoted below?*
Yes, quite a few, above.
Finally, who on earth solved a PDE by posing it as an optimal control/.BSDE/DL? If you are serious show for 1 factor early exercise option as Proof of Concept.

### Re: DL and PDEs

Posted: **April 10th, 2018, 7:49 pm**

by **lovenatalya**

@Cuchulainn:

That is quick, but I suppose you know the stuff like the back of your hand! Thank you for the valuable technical evaluation. I will look into the details you have mentioned. It will take me some time.

If you are serious show for 1 factor early exercise option as Proof of Concept.

That is precisely my motivation in looking into DL as an improvement on the Longstaff-Schwardz regression method for Monte Carlo simulation pricing of an American option. I presume you are not optimistic about its chance of success?

### Re: DL and PDEs

Posted: **April 10th, 2018, 9:35 pm**

by **outrun**

In low dimensions its common to use basis functions an least squares fitting -which is one of the longstaff schwartz variants (fitting a polynomial or fitting basis functions). That's also something people do in the Reinforement Learning subfield of Machine Learning, it's called "least squared temporal difference learning"

https://link.springer.com/article/10.1007/BF00114723
In high dimensions it makes sense to use NN as function approximators instead of a linear combination of basis functions or polynomials. But when fitting a NN you typically process your data many times via gradient descent and have the solution slowly move to an error minimum. Least square is much faster -*if* you can do e.g. a linear decomposition-.

Longstaff-Schwardz is very sample efficient, but the functions it uses to approximate the put value function are simple and that introduces model error.
What Alpha Go does is very different, it learns a high dimensional value functions of the game (is this a good position to be in?) via exploration and exploitation -trying things-, and then uses the value function to decide what move to make. This is a slow process.
With American options all you need it the expected value of the option one step forward in time -and you exercise if its below intrinsic-. The structure of the problem allows you to solve it recursively via backwards induction, e.g. with a binomial tree etc. In ML this is also often done, but the problems are typically more difficult: 1) the optimal value functions is not simply 1D, 2) the number of actions you can take at each point in time is often large, sometimes infinite (in what direction -angle- shall we move forward). 3) you don't know the transition probabilities (like the up down probability in the binomial tree). One of the main difference is that in order to estimate a high dimensional optimal value function (there are also other methods that don't use value functions) you need to know optimal behaviour and you end up with a chicken and an egg situation. This is often done iteratively: you try random actions, use the experience you collected to fit a value function, use the value function to figure out better actions to take etc etc. This is called value iteration: http://artint.info/html/ArtInt_227.html
These methods really shine if you don't have a dynamical model (that tells you how the stock prices move like you normally do in finance with a "let's pick Heston"), or don't have a reward model (this is never in finance, derivatives have clear contract terms), or the number of dimension is high, or the state of the market is partially observed, or if the underlying behaviour is non-Markovian.

### Re: DL and PDEs

Posted: **April 11th, 2018, 6:08 am**

by **lovenatalya**

@outrun:

Really appreciate your response with such rich information and references. I am a newbie in deep learning and neural network. I need to slowly digest what you have said. My following questions will sound naive.

Suppose one has an American option on many underlyings (dimensions) and one would like to use Monte Carlo simulation to solve it. From your post, can I conclude NN has an advantage over the least square fitting method?

You mentioned binomial trees. But it is not a problem solving American option in binomial tree since every vertex point has branches from which you can make the backward induction or perform dynamic programming. The problem comes when pricing using a Monte Carlo simulation where there is no branching to make the induction. So the comparison based on a binomial tree does not seem, to my naive mind, to be an appropriate one. What am I missing?

In ML this is also often done, but the problems are typically more difficult: 1) the optimal value functions is not simply 1D

in order to estimate a high dimensional optimal value function

What do you mean by an optimal value function [$]f[$] being high dimensional? Do you mean the objective function [$]f: R^m\rightarrow R^n[$] where [$]n>1[$]? Or do you just mean [$]m>1[$] but [$]n=1[$]? If it is the former case, how do you order the value?

### Re: DL and PDEs

Posted: **April 11th, 2018, 7:18 am**

by **outrun**

What is better -or even possible- depend on the problem details.

In least square fitting you estimate the expected value of the put at t+1 conditioned on the underlying at t and assuming you don't exercise, and you do so by fitting a continuous function on MC samples. (indeed you don't have a set of discrete points like in trees and grids). The purpose of the fitting a function is twofold: 1) to remove sample noise so that you can do with less MC samples, and don't have to do nested MC and 2) to have a value of the option for all possible underlying values (in trees and grids you consider the underlying only at a set of values)

There are many alternative ways to approximate that function: LS uses a polynomial fit, basis function use a linear combination of some functions g_i, f(S1,S2,..) = w1*g1(S1,S2,..) + w2*g2(S1,S2,..). Both these fits are not computationally expensive and that a good property. However a drawback is that the function you fit to your data might not be a good fit: can you accurately approximate the American Put price with a polynomial of a linear combination of basis functions? NN can also be used for function approximation, but since they a non-linear fitting them to data is more complicated and in general very slow. In practice I would not use a NN to price an American put, instead I would try to find a good set of basis function and do a LS fit.

--

There are 3 ways to model "optimal actions":

1) you can model the "value of state", e.g. "this particular Go board layout (the state) look very good for me: State->Value = R^361 -> R^1. This is a very high dimensional function. You would use any sort of function approximation method to estimate that value function. Once you have that value function you can then use it to pick optimal action. To do so you would need to have model that tells you the dynamics of the state transitions. This is the typical Bellmann setting. If I'm in state S1 and I do action a1 I land in state S2 which has value V(S2). If I did action a3 I would have landed in state S3 with value V(S3). Since V(S3) > V(S2) I'm going to do action a3. If you *don't* know the dynamics (you have no idea in what state you'll land when you do action a1) then you can't use this.

2) You can also model the value of each possible action you can do in each state. State,Action->Value R^361 x 361 -> R^1. You can then pick an action a3 if V(S1,a2) < V(S2,a3). For this you don't need to know the dynamics of the state transitions.

3) Finally there is also an approach where you directly model what action to take in a certain state with a probability distribution over actions. A model like when I'm in state S1 I'm going to do action a1 70% of the times: P(S1,a=a1)=0.7 P(S1,a=a2)=0.3. For some types of problems random-ish actions is best this to do, e.g. when you can't fully observe the state. Or, e.g. in rock paper scissors the optimal action is to do a random thing.

### Re: DL and PDEs

Posted: **April 11th, 2018, 9:42 am**

by **Cuchulainn**

@Cuchulainn:

That is quick, but I suppose you know the stuff like the back of your hand! Thank you for the valuable technical evaluation. I will look into the details you have mentioned. It will take me some time.

If you are serious show for 1 factor early exercise option as Proof of Concept.

That is precisely my motivation in looking into DL as an improvement on the Longstaff-Schwardz regression method for Monte Carlo simulation pricing of an American option. I presume you are not optimistic about its chance of success?

You're welcome.

Well, I don't know everything but I tend to be wary. What I miss is how they apply these numerical techniques (BTW my background is precisely numerical analysis). .Some claims are downright wrong.

LSM is also slow as you surely know. So, maybe DL will be the silver bullet. AFAIR (I supervised MSc students) the CURSE sets in LSM but like polynomially and you have to use orthogonal polynomials so maybe DL avoids this issue (if so, it would be magic).

I have no idea how one would formulate American option problem in a DL context. It would have be accurate and fast? Is DL

*ueberhaup*t the correct paradigm? I don't know.

Still, it would be a good project especially on n-core machine.

### Re: DL and PDEs

Posted: **April 17th, 2018, 4:41 pm**

by **Cuchulainn**

I recently sent a number of questions (mostly numerics and feasibility) to the authors after having read this and related articles a few times. Hopefully get some feedback soon.

### Re: DL and PDEs

Posted: **April 21st, 2018, 10:35 pm**

by **ISayMoo**

I recently sent a number of questions (mostly numerics and feasibility) to the authors after having read this and related articles a few times. Hopefully get some feedback soon.

If you ever write a self-contained comment or a critique of applications of ML to PDE solving, I'd really like to see it.

### Re: DL and PDEs

Posted: **April 22nd, 2018, 6:36 pm**

by **Cuchulainn**

Sure

*Hello,*
*I have a number of questions and remarks on the article. In general, I feel the numerical methods are difficult to use or just wrong:*
*1. Picard/Banach does not converge exponentially as you claim?*
*2. Finding a contraction is difficult (sometimes impossible). To he honest, I am sceptical*
*3. Multilevel Monte Carlo is slow from my experience. Is the underlying maths also robust?*
*4. The PDEs used are esoteric and not mainstream. Why not start with 3d heat equation, Black Scholes, Heston?*
*5. I don't see sharp error estimates.*
*Maybe I am missing something.*
*See section 2.1*
*https://arxiv.org/abs/1708.03223*

### Re: DL and PDEs

Posted: **July 5th, 2018, 8:47 pm**

by **Cuchulainn**

http://www.sciencemag.org/news/2018/05/ ... ng-alchemy
Rahimi charged that machine learning algorithms, in which computers learn through trial and error, **have become a form of "alchemy."** Researchers, he said, do not know why some algorithms work and others don't, nor do they have rigorous criteria for choosing one AI architecture over another.
Even for 'outsiders' this field does seem to be a tad chaotic.For example, the attention to mathematical rigour has lost out to benchmarks for the moment.

### Re: DL and PDEs

Posted: **July 11th, 2018, 8:25 am**

by **Cuchulainn**

*Despite all the marketing hype coming out of Mountain View, there really hasn’t been much in the way of conceptualbreakthroughs in machine learning since the 1990s. Improvements in neural networks have caused excitement, and the ability of deep learning to work more efficiently on images is an improvement in capabilities. Stuff like gradient boost machines have also been a considerable technical improvement in usable machine learning. They don’t really count as big conceptual breakthroughs; just normal improvements for a field of engineering that has poor theoretical substructure. As for actual “AI” -almost nobody is really working on this.*

One example that intrigues me is when the big three (Google etc.) came to a y junction in the road they chose for discrete dynamical systems (grosso modo, GD) and not continuous ones (ODEs). The numerical methods used are very old (not necessarily bad) but outdated IMO.

Of course, GD et al is easy to program.

### Re: DL and PDEs

Posted: **August 24th, 2018, 1:32 pm**

by **ISayMoo**