@"Concerning your comment over literature, I could not say: I did not read Finnegans Wake."

That's like saying that Finnegans Wake and Fifty Shades of Grey are the same thing, because they both contain English words.
Eq. 5.1 in Bishop's book concerns regression problems. In that case you're not optimising the x_i, because they are fixed - they are the data. You're optimising the weights, or changing the kernel itself.
Once you depart from the regression problem and start thinking about classification, for example, the kernel-NN analogy breaks down.
Kernel methods also enforce conditions on kernel functions (symmetry, positive semi-definiteness) that we don't require of the functions an NN represents. Kernel functions live in different functional spaces.
It's not enough for the symbols to be exchangeable!
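To make the regression point concrete, here is a minimal sketch of kernel ridge regression (a toy of my own with a Gaussian kernel, not anybody's production code): the sample points x_i never move, and the only thing solved for is the weight vector.

```python
# Toy kernel ridge regression: the x_i are fixed data, only weights move.
import numpy as np

def gaussian_kernel(x, y, h=1.0):
    """Fixed-shape kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2.0 * h ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=50)            # fixed sample points x_i
y = np.sin(x) + 0.1 * rng.standard_normal(50)  # noisy targets

lam = 1e-3                                     # ridge regularisation
K = gaussian_kernel(x, x)
alpha = np.linalg.solve(K + lam * np.eye(x.size), y)  # optimise weights only

x_new = np.linspace(-3.0, 3.0, 200)
y_hat = gaussian_kernel(x_new, x) @ alpha      # predictions at new points
```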
@"Cuchullain, I am not developing the BET (Big Everything Theory). I am just saying that it probably exists, and that kernels could do the job."

I think this idée fixe on a grand unifying theory is leading nowhere. Give it up. It's a distraction. There are bigger challenges.
@"We did so many tests for this method... Concerning this very precise question, I already answered above: see this post for instance."

Bear in mind that people use the same set of MC paths to evaluate a whole portfolio, too. This is not just a performance optimisation; it also improves the accuracy of the hedging (less noise).

@"Yes. For instance, in this post you can see this effect: the sample points are computed within 20 secs (a 12-dimensional process, AFAIR). Then a portfolio of 10,000 complex products is evaluated in less than 10 secs. I understand your last paragraph as saying that I could price different derivative payoffs for the same underlying price process, right?"
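For concreteness, here is a minimal sketch of the path-sharing point (an illustrative Black-Scholes toy of mine, not anybody's production pricer): the paths are simulated once, and every payoff in the portfolio is evaluated on the same set, so the per-option cost is one vectorised payoff evaluation and common noise cancels across the book.

```python
# One set of Monte Carlo paths, reused for a whole strip of options.
import numpy as np

rng = np.random.default_rng(0)
S0, r, sigma, T = 100.0, 0.01, 0.2, 1.0
n_paths = 100_000

# Simulate terminal prices once (Black-Scholes, exact scheme).
Z = rng.standard_normal(n_paths)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

# Price every call in the portfolio on the SAME paths.
strikes = np.linspace(80.0, 120.0, 41)
payoffs = np.maximum(ST[None, :] - strikes[:, None], 0.0)
prices = np.exp(-r * T) * payoffs.mean(axis=1)
```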
Did you run large-scale tests? Take a portfolio of 10,000 options. Price them to the same level of error using a) your method and b) standard Monte Carlo tricks. What are the computation times for a) and b)?
@"I put an equality sign because both approaches are equivalent. To be really clear, it would be better for AI guys to notice this equivalence right now, if they plan one day to compute proper error estimates, or to avoid bothering future generations with unnecessary complexity... Quoting your own words, PDE methods have 'shapeshifting' properties."
I don't think I wrote anything like that. I don't even know what it would mean. I wrote that parameterised functions (here NNs) have, metaphorically speaking, shapeshifting properties - as opposed to fixed-shape kernels. That's the essential property of NNs, which motivates many of their applications.
The way I see it, there are different approaches to solving optimisation problems, and they often determine the choice of techniques for their implementation. To state the obvious: since the problem being solved doesn't change, the approaches are equivalent up to a certain level of detail. I wouldn't put an equality sign between your method and an NN optimisation, though, e.g. because your method expects the positions of particles…
@"Ok. But we are both mathematicians: give me a problem that one method can solve, and the other cannot."

There's at least one person from the AI community here who's trying to tell you that you're mistaken about the equivalence and explaining clearly why (however difficult it is to pin down what you mean).
@"This is a LinkedIn post. The MC pricer is basically an encapsulation of the Boost libraries. Computation time is not very important there, as we were looking for a reference price."

Your LinkedIn post only gives the computation time for your method; it doesn't give the computation time for a Monte Carlo run that achieves the same accuracy. And it's not clear from it how you implemented the MC pricer.
This sounds naive: "The overall error on the entire portfolio of 1024 AutoCall is 0.2% (relative error on price), which corresponds to a convergence factor of rate 1 / N with N = 512". What about the constant factor? A convergence rate of O(1/N) describes how the error eps(N) scales with N; it does not mean eps = 1/N for one particular pair of eps and N.
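Spelled out (my notation, not from the post): a rate statement is an asymptotic scaling,

```latex
\varepsilon(N) \approx C\,N^{-p}
\quad\Longleftrightarrow\quad
\log \varepsilon(N) \approx \log C - p \log N ,
```

so a single pair (N, eps) gives one equation in the two unknowns C and p and cannot pin down the rate.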
And first you're saying that your method converges at rate 1/N^2, but then you report rate 1/N for an AutoCall portfolio.
And buried in a footnote is the shocking news that your exciting 1/N^2 convergence rate does not hold for bounded variation functions, a class which includes most reasonable payoff functions one can think of (e.g. (S(T) - K)+)! So for all practical purposes, your method has the same convergence properties as Sobol numbers...
I think the problem you have in selling your method comes from the fact that you're overselling it and not presenting its strengths and limitations clearly enough.
@"Could you develop?"

It seems you're in a catch-22.
@"your method has the same convergence properties as Sobol numbers..."

Interesting remark. Yes, you are correct: these methods also converge at rate (ln N)^{D-1}/N if you consider a kernel generating the functional space corresponding to the Koksma-Hlawka inequality. Don't expect a miracle: the Koksma-Hlawka estimates are already optimal; you can't expect a sampling method to beat them.
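For readers following along, the standard statement being invoked here, in textbook form (my notation): for a point set x_1, ..., x_N in [0,1]^D,

```latex
\left| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - \int_{[0,1]^D} f(u)\,\mathrm{d}u \right|
\;\le\; V_{\mathrm{HK}}(f)\, D_N^{*}(x_1, \dots, x_N),
```

where V_HK(f) is the Hardy-Krause variation of f and D_N^* is the star discrepancy of the point set; well-chosen point sets achieve D_N^* = O((ln N)^{D-1}/N), which is where the quoted rate comes from.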
@"The answer is in the post: there are only 10, 11, 12 different reference prices in this test, but with 1024, 2048, 4096 combinations of underlyings for D = 10, 11, 12. Thus I ran the Monte Carlo pricer 10, 11, 12 times."

Computational time is very important for people who need to deliver risk numbers to the boss at 7am every morning. These are the people you're selling your stuff to.
And you didn't answer my question (again): did you run the MC pricer separately for every option, or once for the whole portfolio? It affects not only the computational time, but also the accuracy...
@"We did both: it is proven theoretically, and we also tested it numerically on a lot of examples. I need to publish all this."

Well, my criticism was even simpler: it's just not accurate or correct to say "0.2% (relative error on price), which corresponds to a convergence factor of rate 1 / N with N = 512", even if, in fact, 1/512 ~= 0.002. You CANNOT estimate a convergence rate from a single value of N. Either you prove it analytically (in the limit N -> infinity), or you run a series of N's and fit a linear function to log error as a function of log N.
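A minimal sketch of the fitting procedure I mean (the errors below are synthetic, generated with a known rate p = 1; none of these numbers come from the post):

```python
# Estimate an empirical convergence rate from a SERIES of N's:
# fit log(error) vs log(N); the slope estimates -p in eps(N) ~ C * N^-p.
import numpy as np

Ns = np.array([128, 256, 512, 1024, 2048, 4096])
rng = np.random.default_rng(0)
# Synthetic errors with C = 3, p = 1, plus multiplicative noise.
eps = 3.0 / Ns * np.exp(0.05 * rng.standard_normal(Ns.size))

slope, intercept = np.polyfit(np.log(Ns), np.log(eps), 1)
print(f"fitted rate p ~ {-slope:.2f}, constant C ~ {np.exp(intercept):.2f}")
```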
@"No: a call option is a function having a gradient of bounded variation. It is one order smoother than a barrier option. Rephrasing: the gradient of a call option is a barrier option."

You seem to be contradicting your LinkedIn post now (or I don't understand something). In it you wrote of "a bounded variation function, a function class for which we know that the convergence rates of a sampling method can not exceed 1 / N, not 1/N^2". The payoff of a call option, (S - K)+, is a bounded variation function, hence the convergence rate should be limited to 1/N.
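In formulas, the smoothness claim under discussion (my rendering, not a quote):

```latex
f(S) = (S - K)^{+}, \qquad f'(S) = \mathbf{1}_{\{S > K\}} .
```

The derivative is a step function - essentially a digital/barrier-type payoff - and is of bounded variation; f itself is Lipschitz, and also of bounded variation on any bounded range of S. Whether the binding class for the 1/N bound is "BV functions" or "functions with a BV gradient" is exactly what the disagreement above turns on.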