SERVING THE QUANTITATIVE FINANCE COMMUNITY

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

That's OK. I've got it. And I see from Storn and Price that the 1985 SDE method was obsoleted by DE circa 1995 because of a massive reduction in the number of function evaluations needed! (Table 3). So, while an SDE approach indeed works, it's not a good one. Well, I learned something today.
DE is a population-based ('safety-in-numbers') optimisation method and it uses a selection method to allow certain members of the population to evolve to the next generation. I programmed a C++11 and tested against the usual suspects (recall the 'iv' thread as well). It is extremely robust as far as I can see but compute-intense. It seems that choosing its hyperparameters is an art..

The Rastrigin function is non-convex and has many local minima. For n = 10 I got the answer with a population 6000 and 50000 generations.

https://en.wikipedia.org/wiki/Rastrigin_function

(BTW Ackley in 2d is a piece of cake NPOP = NGEN = 100 does the job; ODE solver converges to a local minimum).
DE is not meant for sequential code.@Alan, how does MM cope with DE (efficiency)?

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

But, I can see at least two problems. First, evidently missing from the idea is some supplementary criteria that makes you jump away from some local maxima that's inferior to some other one already investigated. Second, I would like the Brownian noise term to disappear when f( ) is convex, thus reducing the SDE system to your ODE system.

I read Aluffi et al 1985. The ODE solver converges when the function minimum is unique but it is somewhat of a leap-of-faith IMO to consider a 'suitable stochastic perturbation' (what's the ratonale?) that will find the global solution in some magic way.

The proposed algorithm reminds me of a protean evolutionary program?? When they 'split a path into 2 paths', what do they mean?

Alan
Posts: 9741
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

Re your MM question, you have to remember that Mathematica is really a black box in terms of algorithmic details. I don't use DE that much --  mostly as a double check that some local extremum I have found, say for maximum likelihood, is actually a global optimum.

DE is a submethod of its global optimizer called NMinimize. I don't know if there are details beyond this

The local optimizer is called FindMinimum. From casual use, NMinimize running the DE method seems to widely explore the parameter space until it thinks it has a good global candidate; then it appears to switch to FindMinimum for the final convergence. There is a user parameter called MaxIterations: say the default is 100. Then, If NMinimize hasn't converged by, let's say, 90 iterations, it seems to switch to FindMinimum on its best candidate so far. Then, it will either converge to the requested Precision (and/or Accuracy goal), or report a failure to converge. Again, these are just my casual use impressions of the black box.

The documentation cautions that, even with NMinimize, finding the global minimum is not guaranteed.

Alan
Posts: 9741
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

I read Aluffi et al 1985. The ODE solver converges when the function minimum is unique but it is somewhat of a leap-of-faith IMO to consider a 'suitable stochastic perturbation' (what's the ratonale?) that will find the global solution in some magic way.
No, they give the strong impression the method could be made rigorous. Plus Table 2 strongly suggests it indeed 'works'.

Their rationale was intuition from quantum mechanics for a particle in a multi-welled potential, as the Schrodinger equation and heat equation are formally similar. In the long-run, if you remove the diffusive term perturbation in the right way, apparently the particle is almost certainly found at the bottom of the deepest well: see (19). That's their intuition. It is kinda magic. But, see the last paragraph of Sec. 2; they promised a rigorous follow-up elsewhere.  Farid Moussaoui probably knows?

FaridMoussaoui
Posts: 370
Joined: June 20th, 2008, 10:05 am

### Re: Asymptotic behaviour of ODE/PDE (large time)

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

Re your MM question, you have to remember that Mathematica is really a black box in terms of algorithmic details. I don't use DE that much --  mostly as a double check that some local extremum I have found, say for maximum likelihood, is actually a global optimum.

DE is a submethod of its global optimizer called NMinimize. I don't know if there are details beyond this

The local optimizer is called FindMinimum. From casual use, NMinimize running the DE method seems to widely explore the parameter space until it thinks it has a good global candidate; then it appears to switch to FindMinimum for the final convergence. There is a user parameter called MaxIterations: say the default is 100. Then, If NMinimize hasn't converged by, let's say, 90 iterations, it seems to switch to FindMinimum on its best candidate so far. Then, it will either converge to the requested Precision (and/or Accuracy goal), or report a failure to converge. Again, these are just my casual use impressions of the black box.

The documentation cautions that, even with NMinimize, finding the global minimum is not guaranteed.
The Storn and Price paper's figure 2 is 19 lines of C that does the whole shebang.

Alan
Posts: 9741
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here
This looks very promising. I've done some reading + numerical experimentation on this Langevin SDE with noise (BTW momentum/friction  could be added to this SDE). The Geman article is related to Boltzmann annealing (and statistical mechanics) that avoids falling into local minima as seen with greedy algorithms such as SDE and those used in ML. Seems it can be used with minibatches as well..

// On DE, it is very robust by can slide downhill to a local minimum. Then use simulated annealing as a step in the algorithm.

Solving the Geman equation (1.1) numerically I see two possibilities:

1. Solve it as an SDE in the usual way.
2. Solve the ODE gradient system part and perturb the solution to get out of a rut.

What do you think?

//
The mahs become all this is much more robust and applicable (and pleasing!) than jumping straight into its discrete 'recipe' equivalents such as Gradient Descent (GD), which is quite brittle.

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

I can vaguely remember doing a course on "nonlinear differential equations" as an undergrad and pretty sure we looked at the stability of the "point" at infinity, but long since lost the notes and can't remember how we did it
Bear in mind that your transformation maps the circle $|t|=\infty$ to the point $\tau=1$

wiki has Barbalat's lemma and stability of time-varying systems for when you have $\frac{dy}{dt}=f(t)$ rather than $f(t;y(t))$
This seems like a good approach. But we now have an SDE which means we probably need a stochastic Barbalat.

http://liberzon.csl.illinois.edu/teachi ... rbalat.pdf

Alan
Posts: 9741
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here
This looks very promising. I've done some reading + numerical experimentation on this Langevin SDE with noise (BTW momentum/friction  could be added to this SDE). The Geman article is related to Boltzmann annealing (and statistical mechanics) that avoids falling into local minima as seen with greedy algorithms such as SDE and those used in ML. Seems it can be used with minibatches as well..

// On DE, it is very robust by can slide downhill to a local minimum. Then use simulated annealing as a step in the algorithm.

Solving the Geman equation (1.1) numerically I see two possibilities:

1. Solve it as an SDE in the usual way.
2. Solve the ODE gradient system part and perturb the solution to get out of a rut.

What do you think?

//
The mahs become all this is much more robust and applicable (and pleasing!) than jumping straight into its discrete 'recipe' equivalents such as Gradient Descent (GD), which is quite brittle.
As I read the link, you need to use 1.1 with $T(t) = c/\log(1+t)$ or some other function $T(t)$ that meets the condition of the Theorem on pg. 1035. So, yes, 1. with that proviso. 2. sounds ad hoc. Haven't read it closely enough or followed the cites to see what "c" should be, though.

figaro
Posts: 149
Joined: October 3rd, 2005, 5:49 pm

### Re: Asymptotic behaviour of ODE/PDE (large time)

That initial T=t/(1+t) is actually pretty problematic. Perfectly well behaved functions can go completely crazy at T=1.

E.g., if your target function has any oscilatory component whatsoever, it will probably have infinitely many maxima and minima in an epsilon sized interval. For any epsilon. It won't help if the amplitude of the oscillation goes to zero, you still have the exact same problem.

The only way stochastic smoothing can sort it out is by smoothing out all oscillations, which (probably) defeats the purpose.

What is wrong with simple long time asymptotics, t = T / epsilon? It is the exact same t ~ 1/epsilon scaling at leading order, but without the time compression.

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

This thread stated life as ODEs for miniminisation (locals), which are found for large t. The transfomation y = t/(1+t) (whch btw are used for boundary value problems) was to see if we could reduce the ODE to one on (0,1). Trying out/kick tyres.

It became clear that an SDE in combination with Boltzman annealing is needed if we wish to find global mimina.

There's yuge amount on how to solve these SDEs. A related topic is to use simulated annealing with robust but greedy Diferential Evolution..

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

Try linear regression with L2 loss, but at every step select randomly a subset of (x, y) pairs.

Yes. Did as per this suggestion.

This refers to optimisation the MSE(L2) cost function using linear regession.
I have written a small prototype that implements SGD but using gradient system ODE that I solve numerically. It;'s like a Langevin SDE.

1. Batch size m << N.
2. Exact and perturbed function values.

Looks robust on examples. And probably a better way to go than the traditional discrete-time GD method.

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

The method employed has the same structure as in this article except that I solve a gradient system ODE (both with and without noise) instead of discrete GD.

https://arxiv.org/abs/1707.06618

Cuchulainn
Topic Author
Posts: 59017
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: Asymptotic behaviour of ODE/PDE (large time)

An interesting spin-off is using gradients systems for equality and inequality constraints (called regularisation in ML?). In the former case I used the quadratic penalty method by taking the gradient of the square of the constraint function. Till now, it looks good, Easier to program then the classical methods; the ODE solver takes care of the 'hyperparameters' like $dt$.
Another approach is to add a Lagrange multiplier term to the ODE on a manifold as here:

// I see regularisation as defining constraints to reduce generalisation error with new data. It is an add-on(s) to the primary cost/loss function. It is a specific  example of constrained optimisation,
Last edited by Cuchulainn on April 17th, 2019, 12:41 pm, edited 4 times in total.

Wilmott.com has been "Serving the Quantitative Finance Community" since 2001. Continued...

 JOBS BOARD

Looking for a quant job, risk, algo trading,...? Browse jobs here...

GZIP: On