SERVING THE QUANTITATIVE FINANCE COMMUNITY

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

AFAIK the standard practice is to evaluate the full gradient and use it, but recently there have been a number of papers discussing Block Coordinate Descent algorithms. They all take the structure of the network into account, though. Yours doesn't.
This family of algorithms could indeed be interesting. The methods are decades old and well known in the numerical analysis community (see e.g. Ortega and Rheinboldt): an n-dimensional problem is solved as a sequence of simpler 1-d (or blocked) problems (ADI, ADE, Gauss-Seidel etc.). So it has a great future behind it.
It is easy to program and can be parallelised (the article by Stephen Wright is nice). Computing the full gradient is becoming less palatable (even the current simple code attests to this) than computing simpler partial derivatives, which may even have analytical solutions.

IMO it could be useful. I suppose anything is better than pesky learning rates?

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

There is no escape from pesky learning rates

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

If (as I predict) you scoff at point 1, at least use automatic differentiation in C++; those gradients are easy to get wrong by hand. AD removes the manual labour, but is it a wonder drug? More recently (1998, 2012) the complex-step and multicomplex-step methods have emerged as a good alternative for computing gradients, Jacobians and Hessians (for Hessians, bicomplex numbers suffice). The underlying mathematics dates back to Cayley, Dickson, Hamilton et al. There are no round-off errors, but you have to work with complex numbers.

Bicomplex == std::complex<std::complex<T>>

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

You need to specify an epsilon though, so it's just an approximation.

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

You need to specify an epsilon though, so it's just an approximation.
$h = 10^{-100}$, or whatever you're having yourself. And no round-off errors, in contrast to the usual divided differences (there is no subtraction).

Actually, "step length" is the more appropriate term in this context.

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

How does it work when using 16-bit floats?

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

How does it work when using 16-bit floats?
How does 'what' work? It's OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed precision, i.e. float in the parts where accuracy is not critical?

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

How does it work when using 16-bit floats?
Computing $de^x/dx$ at $x = 5$ in single precision (float): FDM classic breaks down, the complex step does not. The exact value is $e^5 \approx 148.4131591$.
h                   FDM classic     Complex step
0.10000000149       156.087341309   148.165924072
0.0500000007451     152.186584473   148.351318359
0.0250000003725     150.284423828   148.397705078
0.0125000001863     149.342041016   148.409286499
0.00625000009313    148.874511719   148.412200928
0.00312500004657    148.65234375    148.412918091
0.00156250002328    148.53515625    148.413101196
0.000781250011642   148.4375        148.413146973
0.000390625005821   148.3984375     148.413162231
0.00019531250291    148.59375       148.413162231
9.76562514552e-05   148.59375       148.413162231
4.88281257276e-05   147.8125        148.413162231
2.44140628638e-05   147.5           148.413162231
1.22070314319e-05   150             148.413162231
6.10351571595e-06   150             148.413162231
3.05175785797e-06   140             148.413162231
1.52587892899e-06   140             148.413162231
7.62939464494e-07   180             148.413162231
3.81469732247e-07   160             148.413162231
1.90734866123e-07   0               148.413162231

For every further halving of h below that, down to h = 1.11022304117e-17, FDM classic returns 0 while the complex-step result stays at 148.413162231.
Last edited by Cuchulainn on July 4th, 2018, 4:57 pm, edited 1 time in total.

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

The same run in double precision, with h pushed down towards the smallest normal double (about 2.2e-308):

h                     FDM classic   Complex step
2.27847563111e-306    0             148.413159103
1.13923781556e-306    0             148.413159103
5.69618907778e-307    0             148.413159103
2.84809453889e-307    0             148.413159103
1.42404726944e-307    0             148.413159103
7.12023634722e-308    0             148.413159103
3.56011817361e-308    0             148.413159103

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

How does it work when using 16-bit floats?
How does 'what' work? It's OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed precision, i.e. float in the parts where accuracy is not critical?
TPUs.

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

How does it work when using 16-bit floats?
How does 'what' work? It's OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed precision, i.e. float in the parts where accuracy is not critical?
TPUs.
Sounds reasonable: accuracy to 10 decimal places is not needed.
BTW what is a minifloat?

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

https://www.revolvy.com/main/index.php?s=Minifloat

Not everyone in the AI community is happy about it, but the TPU designers face many constraints.

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

https://www.revolvy.com/main/index.php?s=Minifloat

Not everyone in the AI community is happy about it, but the TPU designers face many constraints.
Outside the current comfort zone?

ISayMoo
Topic Author
Posts: 1562
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

?

Cuchulainn
Posts: 58741
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

There is no escape from pesky learning rates
In the current test code the algorithm is a special case of the Euler method applied to a (stiff?) discrete dynamical system, and the learning rate is the step length. Realising this explains much of the hullabaloo about time-dependent learning rates.

It's going to remain extremely ad-hoc ....

My take:
Now, if you add more layers and take smaller steps, in the limit you get an ODE that models the NN:
$dx/dt = - \nabla E(x)$ (1)

where $x$ is the vector of network weights, $E(x)$ is the loss/error/objective function and $\nabla E(x)$ is the gradient.

You can now call on the results of Poincaré et al. to prove properties of, and compute, solutions of ODE system (1) (it is mostly a stiff ODE, so use an adaptive solver).

Much better approach IMO. All of this should be known in AI circles. It uses pure mathematics/physics and numerical analysis.

Wilmott.com has been "Serving the Quantitative Finance Community" since 2001.
