Serving the Quantitative Finance Community

 
User avatar
Cuchulainn
Topic Author
Posts: 20250
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 2:29 pm

That's OK. I've got it. And I see from Storn and Price that the 1985 SDE method was obsoleted by DE circa 1995 because of a massive reduction in the number of function evaluations needed! (Table 3). So, while an SDE approach indeed works, it's not a good one. Well, I learned something today.
DE is a population-based ('safety-in-numbers') optimisation method and it uses a selection method to allow certain members of the population to evolve to the next generation. I programmed a C++11 and tested against the usual suspects (recall the 'iv' thread as well). It is extremely robust as far as I can see but compute-intense. It seems that choosing its hyperparameters is an art..

The Rastrigin function is non-convex and has many local minima. For n = 10 I got the answer with a population 6000 and 50000 generations. 

https://en.wikipedia.org/wiki/Rastrigin_function

(BTW Ackley in 2d is a piece of cake NPOP = NGEN = 100 does the job; ODE solver converges to a local minimum).
DE is not meant for sequential code.@Alan, how does MM cope with DE (efficiency)?
 
User avatar
Cuchulainn
Topic Author
Posts: 20250
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 3:51 pm

But, I can see at least two problems. First, evidently missing from the idea is some supplementary criteria that makes you jump away from some local maxima that's inferior to some other one already investigated. Second, I would like the Brownian noise term to disappear when f( ) is convex, thus reducing the SDE system to your ODE system.  

I read Aluffi et al 1985. The ODE solver converges when the function minimum is unique but it is somewhat of a leap-of-faith IMO to consider a 'suitable stochastic perturbation' (what's the ratonale?) that will find the global solution in some magic way.

The proposed algorithm reminds me of a protean evolutionary program?? When they 'split a path into 2 paths', what do they mean?
 
User avatar
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 4:15 pm

Re your MM question, you have to remember that Mathematica is really a black box in terms of algorithmic details. I don't use DE that much --  mostly as a double check that some local extremum I have found, say for maximum likelihood, is actually a global optimum.  

DE is a submethod of its global optimizer called NMinimize. I don't know if there are details beyond this

The local optimizer is called FindMinimum. From casual use, NMinimize running the DE method seems to widely explore the parameter space until it thinks it has a good global candidate; then it appears to switch to FindMinimum for the final convergence. There is a user parameter called MaxIterations: say the default is 100. Then, If NMinimize hasn't converged by, let's say, 90 iterations, it seems to switch to FindMinimum on its best candidate so far. Then, it will either converge to the requested Precision (and/or Accuracy goal), or report a failure to converge. Again, these are just my casual use impressions of the black box. 

The documentation cautions that, even with NMinimize, finding the global minimum is not guaranteed.
 
User avatar
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 4:31 pm

I read Aluffi et al 1985. The ODE solver converges when the function minimum is unique but it is somewhat of a leap-of-faith IMO to consider a 'suitable stochastic perturbation' (what's the ratonale?) that will find the global solution in some magic way.
No, they give the strong impression the method could be made rigorous. Plus Table 2 strongly suggests it indeed 'works'.

Their rationale was intuition from quantum mechanics for a particle in a multi-welled potential, as the Schrodinger equation and heat equation are formally similar. In the long-run, if you remove the diffusive term perturbation in the right way, apparently the particle is almost certainly found at the bottom of the deepest well: see (19). That's their intuition. It is kinda magic. :D But, see the last paragraph of Sec. 2; they promised a rigorous follow-up elsewhere.  Farid Moussaoui probably knows?  
 
User avatar
FaridMoussaoui
Posts: 327
Joined: June 20th, 2008, 10:05 am
Location: Genève, Genf, Ginevra, Geneva

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 7:01 pm

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
 
User avatar
Cuchulainn
Topic Author
Posts: 20250
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 8:19 pm

Re your MM question, you have to remember that Mathematica is really a black box in terms of algorithmic details. I don't use DE that much --  mostly as a double check that some local extremum I have found, say for maximum likelihood, is actually a global optimum.  

DE is a submethod of its global optimizer called NMinimize. I don't know if there are details beyond this

The local optimizer is called FindMinimum. From casual use, NMinimize running the DE method seems to widely explore the parameter space until it thinks it has a good global candidate; then it appears to switch to FindMinimum for the final convergence. There is a user parameter called MaxIterations: say the default is 100. Then, If NMinimize hasn't converged by, let's say, 90 iterations, it seems to switch to FindMinimum on its best candidate so far. Then, it will either converge to the requested Precision (and/or Accuracy goal), or report a failure to converge. Again, these are just my casual use impressions of the black box. 

The documentation cautions that, even with NMinimize, finding the global minimum is not guaranteed.
The Storn and Price paper's figure 2 is 19 lines of C that does the whole shebang.
 
User avatar
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Asymptotic behaviour of ODE/PDE (large time)

December 22nd, 2018, 10:47 pm

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here
 
User avatar
Cuchulainn
Topic Author
Posts: 20250
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: Asymptotic behaviour of ODE/PDE (large time)

January 18th, 2019, 10:09 am

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here
This looks very promising. I've done some reading + numerical experimentation on this Langevin SDE with noise (BTW momentum/friction  could be added to this SDE). The Geman article is related to Boltzmann annealing (and statistical mechanics) that avoids falling into local minima as seen with greedy algorithms such as SDE and those used in ML. Seems it can be used with minibatches as well..

// On DE, it is very robust by can slide downhill to a local minimum. Then use simulated annealing as a step in the algorithm.

Solving the Geman equation (1.1) numerically I see two possibilities:

1. Solve it as an SDE in the usual way.
2. Solve the ODE gradient system part and perturb the solution to get out of a rut.

What do you think? 

//
The mahs become all this is much more robust and applicable (and pleasing!) than jumping straight into its discrete 'recipe' equivalents such as Gradient Descent (GD), which is quite brittle.
 
User avatar
Cuchulainn
Topic Author
Posts: 20250
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: Asymptotic behaviour of ODE/PDE (large time)

January 18th, 2019, 3:36 pm

I can vaguely remember doing a course on "nonlinear differential equations" as an undergrad and pretty sure we looked at the stability of the "point" at infinity, but long since lost the notes and can't remember how we did it
Bear in mind that your transformation maps the circle [$]|t|=\infty[$] to the point [$]\tau=1[$]

wiki has Barbalat's lemma and stability of time-varying systems for when you have [$]\frac{dy}{dt}=f(t)[$] rather than [$]f(t;y(t))[$]
This seems like a good approach. But we now have an SDE which means we probably need a stochastic Barbalat.

http://liberzon.csl.illinois.edu/teachi ... rbalat.pdf
 
User avatar
Alan
Posts: 2958
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Re: Asymptotic behaviour of ODE/PDE (large time)

January 21st, 2019, 5:13 am

There were a follow up paper in 1988 in ACM Trans Math Soft but don't know if it included a proof about (19).
Thanks -- couldn't tell either. But found proofs by other authors here
This looks very promising. I've done some reading + numerical experimentation on this Langevin SDE with noise (BTW momentum/friction  could be added to this SDE). The Geman article is related to Boltzmann annealing (and statistical mechanics) that avoids falling into local minima as seen with greedy algorithms such as SDE and those used in ML. Seems it can be used with minibatches as well..

// On DE, it is very robust by can slide downhill to a local minimum. Then use simulated annealing as a step in the algorithm.

Solving the Geman equation (1.1) numerically I see two possibilities:

1. Solve it as an SDE in the usual way.
2. Solve the ODE gradient system part and perturb the solution to get out of a rut.

What do you think? 

//
The mahs become all this is much more robust and applicable (and pleasing!) than jumping straight into its discrete 'recipe' equivalents such as Gradient Descent (GD), which is quite brittle.
As I read the link, you need to use 1.1 with [$]T(t) = c/\log(1+t)[$] or some other function [$]T(t)[$] that meets the condition of the Theorem on pg. 1035. So, yes, 1. with that proviso. 2. sounds ad hoc. Haven't read it closely enough or followed the cites to see what "c" should be, though.
 
User avatar
figaro
Posts: 7
Joined: October 3rd, 2005, 5:49 pm

Re: Asymptotic behaviour of ODE/PDE (large time)

February 6th, 2019, 11:08 am

That initial T=t/(1+t) is actually pretty problematic. Perfectly well behaved functions can go completely crazy at T=1.

E.g., if your target function has any oscilatory component whatsoever, it will probably have infinitely many maxima and minima in an epsilon sized interval. For any epsilon. It won't help if the amplitude of the oscillation goes to zero, you still have the exact same problem.

The only way stochastic smoothing can sort it out is by smoothing out all oscillations, which (probably) defeats the purpose.

What is wrong with simple long time asymptotics, t = T / epsilon? It is the exact same t ~ 1/epsilon scaling at leading order, but without the time compression.