
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 6th, 2019, 7:34 pm

It's got a precise mathematical definition (also in symbols, see Wiki):

The dynamical system concept is a mathematical formalization for any fixed "rule" which describes the time dependence of a point's position in its ambient space. The concept unifies very different types of such "rules" in mathematics: the different choices made for how time is measured and the special properties of the ambient space may give an idea of the vastness of the class of objects described by this concept. Time can be measured by integers, by real or complex numbers or can be a more general algebraic object, losing the memory of its physical origin, and the ambient space may be simply a set, without the need of a smooth space-time structure defined on it. 

Example: ODEs.
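For context, the ODE case in symbols (a standard formulation, added for concreteness; not part of the Wiki excerpt): an autonomous ODE defines a dynamical system through its flow map,

\frac{dx}{dt} = f(x(t)), \quad x(0) = x_0, \qquad \varphi_t(x_0) := x(t), \qquad \varphi_0 = \mathrm{id}, \quad \varphi_{t+s} = \varphi_t \circ \varphi_s .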

Go back to Lagrange and Hamilton.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 6th, 2019, 7:44 pm

"Deep Learning is running out steam" P. Morgan @24.08

www.youtube.com/watch?v=qZf4f8nHKKU

(it's a  catch-all)
 
User avatar
tags
Posts: 3162
Joined: February 21st, 2010, 12:58 pm

Re: If you are bored with Deep Networks

December 6th, 2019, 8:36 pm

"Deep Learning is running out steam" P. Morgan @24.08

www.youtube.com/watch?v=qZf4f8nHKKU

(it's a  catch-all)
decreasing marginal steam?
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 7th, 2019, 11:41 am

"Deep Learning is running out steam" P. Morgan @24.08

www.youtube.com/watch?v=qZf4f8nHKKU

(it's a  catch-all)
decreasing marginal steam?
With a slight touch of hubris.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 7th, 2019, 12:13 pm

In what sense is GD more ad-hoc than say Newton-Raphson?
Is this a random question? A Venn diagram might be useful.

The original question (as always) is SGD versus ODEs to solve the same kind of problem.
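One way to make the SGD-versus-ODE comparison concrete (a standard observation, added here for context): plain gradient descent is explicit Euler applied to the gradient-flow ODE, so the ODE view contains GD as its crudest discretisation,

\frac{dx}{dt} = -\nabla f(x(t)) \quad \text{(gradient flow)}, \qquad x_{k+1} = x_k - \eta\,\nabla f(x_k) \quad \text{(explicit Euler with step } \eta = \Delta t\text{)}.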
 
User avatar
katastrofa
Posts: 7440
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 7th, 2019, 8:27 pm

It's a question about what you wrote: "The thesis is that dynamical systems are more robust (or are at least worth investigating) than the somewhat slightly  flaky, ad-hoc gradient descent method and its myriad variants."

What do you mean by replacing SGD with an ODE? An ODE is something one needs to integrate, and this is done analogously to GD methods. What exactly do you suggest? I can suggest nothing - the idea doesn't hold, imho.
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 7th, 2019, 8:42 pm

Ok, let's agree on:

0. Newton-Raphson is irrelevant.
1. (S)GD is an optimisation method, really, yes?
2. Have a look at the many links on using ODEs to minimise stuff (these have been known since the time of Poincare). One last time -> (Alan and Farid)

https://forum.wilmott.com/viewtopic.php?f=34&t=101662

Anyhoo, if all else fails, don't forget the documentation. :lol:
 
User avatar
katastrofa
Posts: 7440
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 7th, 2019, 8:52 pm

Yes, GD is an optimisation method (just like the irrelevant Newton, but without the second derivative - ergo speed vs accuracy).

Point 2 is unclear.
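For the record, the two updates being compared (standard formulas, added for context):

x_{k+1} = x_k - \eta\,\nabla f(x_k) \quad \text{(GD: first order, cheap per step)}, \qquad x_{k+1} = x_k - \big[\nabla^2 f(x_k)\big]^{-1}\nabla f(x_k) \quad \text{(Newton: needs the Hessian, locally quadratic convergence)}.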
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 7th, 2019, 11:06 pm

OK, here's a good start (AFAIK JC Platt is at Google). And FYI I've tested these methods. 

https://papers.nips.cc/paper/4-constrai ... zation.pdf

If this is unclear, I'll give up. It's a very clear paper. And if clear, buy me a Guinness.
 
User avatar
katastrofa
Posts: 7440
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 8th, 2019, 7:09 am

OK, thanks. Strangely, when I read your post before, it ended at "One last time->".
Can you briefly describe your thesis in your own words?
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 14th, 2019, 10:55 am

OK, thanks. Strangely, when I read your post before, it ended at "One last time->".
Can you briefly describe your thesis in your own words?
No, I have already explained; say something once, why say it again? But see the aforementioned Platt and Barr 1988.

"Lagrange multipliers/penalty meet ODE and energy basins."
 
User avatar
katastrofa
Posts: 7440
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 15th, 2019, 3:04 am

OK, Love, but where do you see the constraints in the loss function optimisation?
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 15th, 2019, 1:02 pm

OK, Love, but where do you see the constraints in the loss function optimisation?
Good question. You can put equality (Lagrange multiplier/KKT) and inequality (penalty method) constraints into the ODE (Platt & Barr, eq. (27)). The numerical results look very good to date (including the well-known optimisation benchmarks such as Griewank). In a nutshell: solve the ODE numerically.
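A minimal sketch of the differential-multiplier idea in the spirit of Platt & Barr 1988 (not their exact equations or code; the toy objective, constraint, file name, step size and plain Euler loop below are illustrative choices only, where odeint would do the integration properly):

// ConstrainedGradientFlow.cpp (illustrative sketch only)
//
// Minimise f(x,y) = x^2 + y^2 subject to g(x,y) = x + y - 1 = 0
// using differential-multiplier dynamics:
//
//     dx/dt      = -df/dx - lambda * dg/dx
//     dy/dt      = -df/dy - lambda * dg/dy
//     dlambda/dt = g(x, y)
//
// integrated here with crude explicit Euler.
// Analytic answer: (x, y) = (0.5, 0.5), lambda = -1.

#include <iostream>

int main()
{
	double x = 0.0, y = 0.0, lambda = 0.0;
	const double dt = 1.0e-3;
	const int nSteps = 200000;	// integrate to t = 200

	for (int k = 0; k < nSteps; ++k)
	{
		double g = x + y - 1.0;             // equality constraint value
		double dx = -2.0 * x - lambda;      // -df/dx - lambda * dg/dx
		double dy = -2.0 * y - lambda;      // -df/dy - lambda * dg/dy
		double dl = g;                      // multiplier dynamics

		x += dt * dx;
		y += dt * dy;
		lambda += dt * dl;
	}

	std::cout << "x = " << x << ", y = " << y << ", lambda = " << lambda
	          << ", g(x,y) = " << (x + y - 1.0) << '\n';
	return 0;
}

An inequality constraint would be handled analogously, with a penalty term added to the right-hand side instead of a multiplier equation.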

Next..
For ML, we are probably talking initially about some kind of Tikhonov regularisation, which at first glance looks even simpler, although ridge regression fits a dynamical-system formulation more naturally than LASSO regression. But then we could solve ODEs with a discontinuous RHS (Filippov).
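To connect the Tikhonov/ridge remark to the linear-cost code below (a standard identity, spelled out for context): the ridge objective yields exactly a linear gradient flow of the form dx/dt = -(Ax - b),

L(\beta) = \tfrac{1}{2}\lVert X\beta - y\rVert^2 + \tfrac{\lambda}{2}\lVert\beta\rVert^2 \;\Longrightarrow\; \frac{d\beta}{dt} = -\nabla L(\beta) = -\big[(X^\top X + \lambda I)\beta - X^\top y\big],

i.e. A = X^\top X + \lambda I (symmetric positive definite) and b = X^\top y. The LASSO term \lambda\lVert\beta\rVert_1, by contrast, has a subgradient that jumps at \beta_i = 0, which is where the Filippov (discontinuous RHS) machinery comes in.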
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 16th, 2019, 5:04 pm

Here is POC code for the class of linear cost functions (it uses Boost odeint) that competes with CGM and the normal equations.
In this case the problem is unconstrained.
Once the gradient is known it becomes easy.

kats, I'll post a simple test code to show how it works if you like.
// LinearSystemGradientDescent.hpp
//
// Gradient-flow ODE dx/dt = -(Ax - b) for a linear cost function.
// The independent variable is transformed from (0, inf) to (0, 1),
// which introduces the 1/(1-t)^2 factor below.
//
// (C) Datasim Education BV 2020

#ifndef Gradient_System
#define Gradient_System

#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/odeint.hpp>

namespace ublas = boost::numeric::ublas;

using value_type = double;
using vector_type = ublas::vector<value_type>;
using state_type = ublas::vector<value_type>;

struct LinearSystemGradientDescentOde
{ // Solve dx/dt = -(Ax - b)

	ublas::matrix<double> A;
	ublas::vector<double> b;

	LinearSystemGradientDescentOde(const ublas::matrix<double>& matrixA,
		const ublas::vector<double>& vectorB)
		: A(matrixA), b(vectorB)
	{
	}

	void operator()(const vector_type& x, vector_type& dxdt, value_type t)
	{ // ODE right-hand side based on the minimisation problem

		for (std::size_t i = 0; i < A.size1(); ++i)
		{
			// Compute (Ax - b)_i
			value_type tmp = 0.0;
			for (std::size_t j = 0; j < A.size2(); ++j)
			{
				tmp += A(i, j) * x[j];
			}
			dxdt[i] = tmp - b[i];

			// Remember we work on the transformed interval (0,1):
			// with s = t/(1-t) the chain rule gives ds/dt = 1/(1-t)^2
			value_type tmp2 = 1.0 - t;
			dxdt[i] = -dxdt[i] / (tmp2 * tmp2);
		}
	}
};

#endif
 
User avatar
Cuchulainn
Posts: 20252
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Re: If you are bored with Deep Networks

December 17th, 2019, 1:19 pm

Test code 
// Interval strategy on (0, inf):
//
//	1. Truncation to a large finite T
//  2. Transform (0, inf) to (0, 1)
//
// Option 2 tends to give more accurate results.
//
// (C) Datasim Education BV 2020
//


#include "LinearSystemGradientDescent.hpp"
#include <iostream>

#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>
namespace ublas = boost::numeric::ublas;

#include <boost/numeric/odeint.hpp>
namespace Bode = boost::numeric::odeint;

int main()
{
	
	// 2x2 test matrices:
	//
	// A1 == symmetric and positive definite (pd)
	// A2 == symmetric and NOT pd
	// A3 == NOT symmetric and pd
	// A4 == NOT symmetric and NOT pd
	//
	// (only A1 is set up in this snippet)
	std::size_t r = 2; std::size_t c = 2;

	ublas::matrix<double> A1(r, c);
	A1(0, 0) = 2; A1(0, 1) = 1; A1(1, 0) = 1; A1(1, 1) = 2;
	
		
	ublas::vector<double> b1(r); 
	b1[0] = 4;  b1[1] = 5;
	

	
	// ODE solver; x = (1, 2) is the exact solution in all cases
	ublas::vector<double> x(r); x[0] = x[1] = 0.0;
	ublas::vector<double> x2(r);  x2[0] = x2[1] = 0.0;

	// Integrate on [L, T]; T < 1 approximates t -> 1, i.e. the infinite horizon
	// Exercise: try T = 0.1, 0.25, 0.5, 0.75, 0.95, 0.9999, etc.
	double L = 0.0; double T = 0.99584959825795;
	double dt = 1.0e-5;


	LinearSystemGradientDescentOde ode(A1, b1);

	// Default stepper chosen by integrate() (the Ford Cortina option)
	std::size_t steps = Bode::integrate(ode, x, L, T, dt);

	std::cout << "Number of steps " << steps << '\n';
	std::cout << "Solution " << x << '\n';

	// Bulirsch-Stoer (the upmarket model)
	Bode::bulirsch_stoer<state_type, value_type> bsStepper;
	
	std::size_t steps3 = Bode::integrate_adaptive(bsStepper, ode, x2, L, T, dt);

	std::cout << "Number of steps, Bulirsch-Stoer " << steps3 << '\n';
	std::cout << "Solution II " << x2 << '\n';
	

	return 0;
}