User avatar
mtsm
Topic Author
Posts: 78
Joined: July 28th, 2010, 1:40 pm

DL and PDEs

November 6th, 2017, 8:46 pm

Did you guys discuss this here already?
https://arxiv.org/pdf/1706.04702.pdf
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 6th, 2017, 9:47 pm

We know nothing about this here. It's kind of pure maths/university research.
Good, compelling examples are missing.
 
User avatar
ISayMoo
Posts: 2332
Joined: September 30th, 2015, 8:30 pm

Re: DL and PDEs

November 6th, 2017, 10:36 pm

Many people are trying such things now. What I find a bit weird about their implementation is that the solver has a recurrent structure, but they do not exploit this fact in the code. They just stack layer after layer, one for each time step. It won't scale.
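
To make the distinction concrete, here is a toy sketch (plain NumPy, illustrative only, not the paper's code) of stacking a separate layer per time step versus reusing one shared block recurrently:

# Sketch of the distinction above: one layer per time step vs. a single shared
# (recurrent) block. The parameter count grows with the number of steps in the
# first case and stays constant in the second.
import numpy as np

rng = np.random.default_rng(0)
steps, dim = 20, 8

# "Stacked" variant: a separate (A, b) per time step.
stacked = [(rng.standard_normal((dim, dim)), np.zeros(dim)) for _ in range(steps)]

# "Recurrent" variant: one (A, b) shared across all steps.
A, b = rng.standard_normal((dim, dim)), np.zeros(dim)

def run_stacked(x):
    for Ai, bi in stacked:
        x = np.maximum(Ai @ x + bi, 0.0)
    return x

def run_recurrent(x):
    for _ in range(steps):
        x = np.maximum(A @ x + b, 0.0)
    return x

out_stacked = run_stacked(np.ones(dim))
out_shared = run_recurrent(np.ones(dim))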
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 7th, 2017, 9:25 am

One point is not to succumb to what happened to String Theory, i.e. unrealistic expectations. AI has a track record, but maybe it is better to focus on what it does best (and what it does not do well).

I can't imagine a NN being more efficient and accurate than a standard finite difference method. But you never know.

I am reading Goodfellow et al. One first impression (which is allowed!) is that the underlying maths (chapter 4, for example) is fairly basic and somewhat outdated. But I need to read more, of course. Is there no alternative to gradient-based methods here? Gradients are badly-behaved objects. They fly off the handle so easily.
 
User avatar
mtsm
Topic Author
Posts: 78
Joined: July 28th, 2010, 1:40 pm

Re: DL and PDEs

November 7th, 2017, 2:20 pm

I practically created this thread for you. 

I don't think it is known yet what it does best and what it doesn't. I could imagine that for very high-dimensional problems the universal function approximation properties of neural nets could well be useful in various areas outside of computer vision.

Your call on the maths being outdated and fairly basic misses the point a bit, I think. The main problem with this forum, as I have been complaining for years if you read some of the threads I contributed to, is that it is filled with people who are essentially hung up, irreversibly, on parametric methods. It makes things look exceedingly old-fashioned. It's good that some people have now taken an interest in non-parametric methods. That is what machine learning is: basically non-parametric statistical modelling. Gradient descent is extremely general, so it is quite appropriate in such a general setting. The maths in this book isn't exactly simplistic, by the way.

My sense is that ML in finance is already past its prime. Success on the buy-side hasn't been spectacular. Banks haven't even started failing yet. I think things swung from extremely parametric models over to extremely non-parametric models. The truth hopefully lies somewhere in between.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 7th, 2017, 4:59 pm

I hope others can feel it is for them as well. I don't have a case I worked out myself, but that should not stop us asking questions such as:

1. Why not use Differential Evolution as well as Gradient Descent? (Saying it's slow is disingenuous.)
2. I would like C++ as well as Python. Is Python slow?
3. Is DL for more than computer vision?
4. A 101 example, A-Z, just to show how it works.
5. I had some exposure to topology in a previous life: somehow TDA feels more robust than universal approximators.

These are genuine questions.
 
User avatar
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: DL and PDEs

November 7th, 2017, 6:20 pm

1. NNs are typically 10k- to 10-million-dimensional functions. DE would be very slow: you would need at least twice as many agents as there are dimensions, and in practice many more, or else you would be searching in a subspace. Each agent represents an instance of the network, so the memory consumption and the computation per step are big.
NNs have lots of paths to minima; they have, e.g., lots of permutation invariances. You also don't want to find the global minimum, because you would be overfitting. The most common method is *stochastic* gradient descent, because the loss function is not smooth and convex (a minimal SGD loop is sketched after this list).
2. All the popular frameworks have bindings to popular languages like Python and C++. The backends are BLAS and CUDA libraries, which is where 99% of the code will execute.
3. Yes, anything where finding representations of data matters is a good candidate.
4. You have a good book: start coding and experimenting. Andrew Ng's course is a really good intro.
5. IMO the challenges are in the loss functions and the topologies. People look at NNs as networks with valves and intersections through which information flows; it's about gradient management. Unsupervised learning is the most exciting area at the moment.
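
As promised under point 1, a minimal sketch of what one SGD pass looks like (plain NumPy, illustrative names and sizes, a single ReLU layer with squared loss, not a real framework):

# One epoch of mini-batch stochastic gradient descent on y = max(Wx + b, 0).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))                      # 1000 samples, 16 features
Y = np.maximum(X @ rng.standard_normal((16, 4)), 0.0)    # synthetic targets

W = rng.standard_normal((16, 4)) * 0.1
b = np.zeros(4)
lr, batch = 0.01, 32

for start in range(0, len(X), batch):
    xb, yb = X[start:start + batch], Y[start:start + batch]
    z = xb @ W + b
    pred = np.maximum(z, 0.0)        # ReLU
    err = pred - yb                  # dLoss/dpred for 0.5 * squared error
    dz = err * (z > 0)               # backprop through the ReLU
    W -= lr * (xb.T @ dz) / len(xb)  # gradient step on the weights
    b -= lr * dz.mean(axis=0)        # gradient step on the bias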
 
User avatar
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: DL and PDEs

November 7th, 2017, 8:31 pm

You can try a simple fully connected NN of, e.g., 3 layers.

Input a vector x of length 256. Each layer does y = max(Ax + b, 0), with A a 256x256 matrix and b a vector of length 256. The y output of layer L is the x input of layer L+1.

So you have a 256-vector input, then a bunch of layers, and then a 256-vector output. Try to teach it to map some 256-dimensional input points to some 256-dimensional output points. E.g., provide it with 100,000 generated (x, y) pairs of some function: maybe sin(t*f) fragments as input, and teach it to output cos(5*t*f).

A first question:
How many variables does this NN have?
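
For concreteness, a minimal NumPy sketch of that network (forward pass only, untrained, illustrative sizes), with the parameter count worked out:

# Three layers, each y = max(Ax + b, 0) with A 256x256 and b of length 256.
# Parameters per layer: 256*256 + 256 = 65,792, so 3 * 65,792 = 197,376 in total.
import numpy as np

rng = np.random.default_rng(1)
layers = [(rng.standard_normal((256, 256)) * 0.05, np.zeros(256)) for _ in range(3)]
n_params = sum(A.size + b.size for A, b in layers)   # 197,376

def forward(x):
    for A, b in layers:
        x = np.maximum(A @ x + b, 0.0)   # one ReLU layer
    return x

x = np.sin(np.linspace(0.0, 10.0, 256))  # e.g. a sin(t*f) fragment as input
y = forward(x)                           # 256-dimensional output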
 
User avatar
ISayMoo
Posts: 2332
Joined: September 30th, 2015, 8:30 pm

Re: DL and PDEs

November 7th, 2017, 9:20 pm

Is there no alternative to gradient-based methods here? Gradients are badly-behaved objects. They fly off the handle so easily.
Geoff Hinton (the guy who invented back-propagation) kind of agrees with you: https://www.axios.com/ai-pioneer-advoca ... 37027.html
 
User avatar
ISayMoo
Posts: 2332
Joined: September 30th, 2015, 8:30 pm

Re: DL and PDEs

November 7th, 2017, 9:32 pm

I hope others can feel it is for them as well. I don't have a case I worked out myself, but that should not stop us asking questions such as:

1. Why not use Differential Evolution as well as Gradient Descent? (Saying it's slow is disingenuous.)
People use evolutionary algorithms to search for better network architectures, but practically everyone trains a NN using Stochastic Gradient Descent.
2. I would like C++ as well as Python. Is Python slow?
It is; that's why TensorFlow is C++ under the hood.
3. Is DL for more than computer vision?
Of course! Speech recognition, stochastic control, playing games...
4. A 101 example, A-Z, just to show how it works.
Try this: https://www.tensorflow.org/get_started/mnist/beginners
5. I had some exposure to topology in a previous life: somehow TDA feels more robust than universal approximators.
These are genuine questions.
And... surprise! Neural Networks are not robust: https://blog.openai.com/adversarial-example-research/

Adversarial examples are a big topic in AI research now.
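
A toy illustration of the idea (a fast-gradient-sign-style perturbation of a made-up linear classifier, plain NumPy; not the OpenAI code): nudge the input a small step along the sign of the gradient of the score, and the predicted class usually flips.

# Perturb a linear "classifier" score w.x + b against its current prediction.
import numpy as np

rng = np.random.default_rng(2)
w, b = rng.standard_normal(64), 0.0            # fixed, made-up "trained" weights
x = rng.standard_normal(64)                    # a clean input

score = w @ x + b                              # predicted class = sign(score)
eps = 0.2                                      # small per-coordinate perturbation
x_adv = x - eps * np.sign(w) * np.sign(score)  # step against the current class
print(np.sign(score), np.sign(w @ x_adv + b))  # the sign usually flips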
 
User avatar
katastrofa
Posts: 7440
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: DL and PDEs

November 7th, 2017, 11:36 pm

As much as a lot of ML looks to me like a set of not-so-sophisticated methods of overfitting (a.k.a. "universal representation"), there are some areas in which its efficiency can make a difference, e.g. faster algorithms for solving density functional theory (the fundamental method for modelling all sorts of many-body systems and the basis for developing new materials). Otherwise, ML researchers seem either to fly away in the same direction as the ST people or to appear utterly arrogant by claiming that they developed basic risk-score methods, Voronoi tessellation, etc. (I'm teasing ISayMoo) ;-)


@ISayMoo "Adversarial examples are a big topic in AI research now."

That's the thing. A lot of the most urgent real-life problems are about dealing with fat-tail risks, e.g. a chihuahua vs. a muffin, or a camera vs. a missile launcher.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 8th, 2017, 9:44 am

1. NNs are typically 10k- to 10-million-dimensional functions. DE would be very slow: you would need at least twice as many agents as there are dimensions, and in practice many more, or else you would be searching in a subspace. Each agent represents an instance of the network, so the memory consumption and the computation per step are big.


I don't get this answer, not at all. Are you saying finding a local minimum is OK?

1. In every branch, we desire a global minimum >> local minimum, even in DL (have a look at Goodfellow, figure 4.3, page 81).
2. Gradient descent methods find a local minimum, not necessarily a global one. Is that serious?
3. Overfitting is caused by high-order polynomials (yes?). I don't see what the relationship is with finding minima.
4. More evidence is needed on "how slow" DE is (the good news is that it always gives a global minimum).
5. The Cybenko universal approximation theorem seems to have little coupling to anything in mainstream numerical approximation. Maybe it is not necessary, but then say so. Borel measures and numerical accuracy are not a good mix, IMO.

Mathematically, it feels that this approach is not even wrong...
Last edited by Cuchulainn on November 8th, 2017, 10:12 am, edited 6 times in total.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 8th, 2017, 9:53 am

Yes, anything where finding representations of data matters is a good candidate.

This is very general. It reminds me of the 90s with OOT: 1) everything is an object, 2) objects are there for the plucking (easy to find).

What is needed is to list the criteria for using DL, if we want to avoid the String Theory promise of being all things to all men.

I am a DL noobie, so maybe all these questions have already been addressed...
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: DL and PDEs

November 9th, 2017, 3:24 pm

Your call on the maths being outdated and fairly basic misses the point a bit, I think.

I may have missed a point, especially since you didn't give one. All I see is a link to an article I did not find useful (which I waded into, with not even a hint to pinpoint what's on your mind) and DL/PDE buzzwords in the title.

What's the question centering on DL/PDE, exactly? A guess: does DL solve the curse of dimensionality?

// Is there a GOOD paper on DL/PDE? First impression is that it's a solution looking for a problem. 'Teaching' a PDE sounds a bit weird.
 
User avatar
ISayMoo
Posts: 2332
Joined: September 30th, 2015, 8:30 pm

Re: DL and PDEs

November 9th, 2017, 9:10 pm

1. NNs are typically 10k- to 10-million-dimensional functions. DE would be very slow: you would need at least twice as many agents as there are dimensions, and in practice many more, or else you would be searching in a subspace. Each agent represents an instance of the network, so the memory consumption and the computation per step are big.


I don't get this answer, not at all. Are you saying finding a local minimum is OK?

1. In every branch, we desire a global minimum >> local minimum, even in DL (have a look at Goodfellow, figure 4.3, page 81).
You don't want a global minimum on the training set, because that would be overfitting: early stopping (a minimal sketch is at the end of this post).
2. Gradient descent methods find a local minimum, not necessarily a global one. Is that serious?
Yes.
3. Overfitting is caused by high-order polynomials (yes?). I don't see what the relationship is with finding minima.
4. More evidence is needed on "how slow" DE is (the good news is that it always gives a global minimum).
5. The Cybenko universal approximation theorem seems to have little coupling to anything in mainstream numerical approximation. Maybe it is not necessary, but then say so. Borel measures and numerical accuracy are not a good mix, IMO.

Mathematically, it feels that this approach is not even wrong...
Your points feel the same way to me too :) More precision would be welcome :)
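
Since early stopping keeps coming up, a minimal sketch of the idea (framework-agnostic Python; train_step and val_loss are placeholder names, not a real API): keep training while the validation loss improves, and stop once it has not improved for a few checks.

# Early stopping: monitor a held-out validation loss and stop once it stalls.
def train_with_early_stopping(train_step, val_loss, max_epochs=100, patience=5):
    best, since_best, best_params = float("inf"), 0, None
    for epoch in range(max_epochs):
        params = train_step()           # one pass of SGD over the training set
        loss = val_loss(params)         # loss on held-out validation data
        if loss < best:
            best, since_best, best_params = loss, 0, params
        else:
            since_best += 1
            if since_best >= patience:  # no improvement for `patience` checks
                break
    return best_params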