Cuchulainn

Re: Asymptotic behaviour of ODE/PDE (large time)

April 17th, 2019, 12:26 pm

That's OK. I've got it. And I see from Storn and Price that the 1985 SDE method was superseded by Differential Evolution (DE) circa 1995 because of a massive reduction in the number of function evaluations needed (Table 3). So, while an SDE approach indeed works, it's not a good one. Well, I learned something today.
DE is robust, but a global minimum is not guaranteed: DE is a 'greedy' algorithm (see the selection step in the sketch below), and it needs to be shaken up by annealing etc.
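For concreteness, here is a minimal sketch of the classic DE/rand/1/bin scheme of Storn and Price, with the greedy selection step marked. The Rastrigin test function and the settings NP, F, CR and the generation count are my own illustrative choices, not canonical ones.

[code]
import numpy as np

def rastrigin(x):
    # Standard multimodal test function; global minimum f(0) = 0.
    return 10.0 * len(x) + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def de_rand_1_bin(f, lo, hi, NP=40, F=0.8, CR=0.9, gens=300, seed=42):
    rng = np.random.default_rng(seed)
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(NP, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(NP):
            # Mutation: base vector plus scaled difference of two others.
            a, b, c = pop[rng.choice([j for j in range(NP) if j != i], 3, replace=False)]
            mutant = a + F * (b - c)
            # Binomial crossover; force at least one mutant component.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.clip(np.where(mask, mutant, pop[i]), lo, hi)
            # Greedy selection: the trial replaces the parent only if it
            # is no worse. This is what makes DE 'greedy'.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = np.argmin(fit)
    return pop[best], fit[best]

lo, hi = np.full(5, -5.12), np.full(5, 5.12)
print(de_rand_1_bin(rastrigin, lo, hi))
[/code]

Note there is no mechanism here for accepting a worse point, which is exactly where an annealing-style acceptance rule would plug in.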
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

Cuchulainn

Re: Asymptotic behaviour of ODE/PDE (large time)

May 6th, 2019, 1:28 pm

This is an interesting approach, in particular the SDE for global minimisation. It is more robust than the well-publicised GD.

https://www.researchgate.net/publicatio ... ns_47_1-16

An "SGD" version is doable.For the ODE I did it.
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

Cuchulainn

Re: Asymptotic behaviour of ODE/PDE (large time)

October 27th, 2019, 2:57 pm

Here are some very good and mathematically robust methods for solving optimisation problems using bespoke ODE techniques. It is a brilliant paper (Wikipedia tells us J.C. Platt is an ML guru).

https://papers.nips.cc/paper/4-constrained-differential-optimization.pdf

I did some tests of methods (10) and (27) plus multiple constraints using ODE solvers, and the results were very good. Mathematically, it is consistent with the Weltanschauung of this thread.

What's the rationale for using an ODE for that type of problem? Why not just use standard optimizers?

Probably more robust numerically, and the maths >> the plain Lagrange multiplier trick; see the sketch below.

// I wonder how much of the mainstream ecosystem knows/uses this method.
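To make this concrete, here is a minimal sketch of the basic differential multiplier method as I read equation (10) of the Platt and Barr paper: gradient descent on the Lagrangian in x coupled with gradient ascent in the multiplier. The toy problem (minimise x^2 + y^2 subject to x + y = 1, exact solution (1/2, 1/2)) and the use of SciPy's solve_ivp are my own choices, not from the paper.

[code]
import numpy as np
from scipy.integrate import solve_ivp

# Toy problem (illustrative): minimise f(x, y) = x^2 + y^2
# subject to g(x, y) = x + y - 1 = 0; exact solution (0.5, 0.5).
def grad_f(x):
    return 2.0 * x

def g(x):
    return x[0] + x[1] - 1.0

def grad_g(x):
    return np.array([1.0, 1.0])

def bdmm_rhs(t, z):
    # State z = (x, y, lambda): descend the Lagrangian in x,
    # ascend it in the multiplier lambda.
    x, lam = z[:2], z[2]
    dx = -grad_f(x) - lam * grad_g(x)
    dlam = g(x)
    return np.append(dx, dlam)

sol = solve_ivp(bdmm_rhs, (0.0, 50.0), [2.0, -1.0, 0.0], rtol=1e-8, atol=1e-8)
print(sol.y[:2, -1], g(sol.y[:2, -1]))  # approx [0.5 0.5], residual ~ 0
[/code]

For this toy problem the linearised dynamics form a stable spiral, so the trajectory oscillates into the constrained minimum rather than solving an algebraic system; that is part of the appeal over the static Lagrange multiplier trick.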
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

Cuchulainn

Re: Asymptotic behaviour of ODE/PDE (large time)

December 30th, 2019, 1:49 pm

My first thought was something like

(*) [$]d \vec{x}_t = -\nabla f(\vec{x}_t) \, dt + \vec{k} \, \nabla f(\vec{x}_t) \cdot d \vec{W}_t[$]
That was a "standard" approach to building a global optimiser. Have a look at Aluffi-Pentini et al., Journal of Optimization Theory and Applications, Vol. 47, 1985.
These approaches seem to have a 'handle', and are special cases of random dynamical systems:

https://en.wikipedia.org/wiki/Random_dynamical_system
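For illustration, here is a minimal Euler-Maruyama discretisation of an Aluffi-Pentini-style SDE with state-independent noise, dX = -f'(X) dt + eps(t) dW, rather than (*) exactly. The double-well objective, the starting point and the ad hoc decay of eps are my own choices; rigorous annealing schedules decay far more slowly, on the order of 1/sqrt(log t).

[code]
import numpy as np

# Euler-Maruyama for  dX_t = -f'(X_t) dt + eps(t) dW_t,  eps(t) -> 0.
# Double-well objective (illustrative): f(x) = (x^2 - 1)^2 + 0.3 x has
# its global minimum near x = -1.04 and a shallower local one near x = 0.96.
def fprime(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

rng = np.random.default_rng(0)
dt, T = 1.0e-3, 200.0
x = 1.0                                  # start in the *local* basin
for k in range(int(T / dt)):
    eps = 1.0 / np.sqrt(1.0 + k * dt)    # ad hoc decaying noise level
    x += -fprime(x) * dt + eps * np.sqrt(dt) * rng.standard_normal()
print(x)  # typically ends near the global minimum x ~ -1.04
[/code]

With eps = 0 this is plain gradient descent and stays in the local basin near x = 0.96; the decaying noise is what lets the path cross the barrier and settle in the deeper well.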
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

Cuchulainn

Re: Asymptotic behaviour of ODE/PDE (large time)

June 17th, 2021, 10:51 am

The more I investigate Gradient Descent (GD), the more I believe the ML community has opened a can of worms (in its naivety). Invoking Sapir-Whorf, for them life begins with linear algebra and iterative methods, i.e. they look only at the discrete solution process, because algebra is low-hanging fruit.

Let's skip the zillion fixes and patches for GD; enough blogs already. One example, however:

"The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. ... Momentum can accelerate training and learning rate schedules can help to converge the optimization process."

There is much ado about this, and a whole cottage industry has grown up around it, e.g. grid search, ugh.

GD is really an explicit finite-difference (Euler) scheme for a dissipative gradient ODE (Lagrange, Poincaré); the learning rate is the step size in the ODE solver, as the sketch below makes explicit. In the ODE setting, equality and inequality constraints are easy to add (try that with GD...)
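A minimal sketch of that identification, with the quadratic objective and step size being my own choices: explicit Euler applied to dx/dt = -grad f(x) is, line for line, the GD weight update with learning rate h, and the Euler stability bound is the familiar learning-rate bound.

[code]
import numpy as np

# Explicit Euler for the gradient ODE dx/dt = -grad f(x).
# Quadratic bowl (illustrative): f(x) = 0.5 x^T A x, so grad f(x) = A x.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])

def grad_f(x):
    return A @ x

def euler_gradient_flow(x0, h, n_steps):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - h * grad_f(x)   # identical to one GD update, lr = h
    return x

print(euler_gradient_flow([1.0, 1.0], h=0.1, n_steps=200))
# -> approx [0 0]; explicit Euler is stable iff h < 2 / max eigenvalue = 2/3
[/code]

Seen this way, learning rate schedules are adaptive step-size control, and momentum corresponds to discretising a second-order damped ODE instead, both standard topics in the ODE literature.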
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl