User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

June 29th, 2018, 4:26 pm

AFAIK the standard practice is to evaluate the full gradient and use it, but recently there have been a number of papers discussing Block Coordinate Descent algorithms. They all take the structure of the network into account, though. Yours doesn't.
This family of algorithms could indeed be interesting. They are many years old and well known in the numerical analysis community (see e.g. Ortega and Rheinboldt): basically, an n-d problem is solved as a sequence of simpler 1-d (or blocked) problems (ADI, ADE, Gauss-Seidel etc.). So it has a great future behind it.
It is easy to program and can be parallelised (the article by Stephen Wright is nice). Computing the full gradient is becoming less palatable (even the current simple code attests to this) than computing simpler partial derivatives, which may even have analytical forms.

IMO it could be useful. I suppose anything is better than pesky learning rates?
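
To make the idea concrete, a minimal sketch of cyclic coordinate descent on a toy quadratic loss; the loss, step length and sweep count are illustrative, not from the code discussed here.

#include <cstddef>
#include <iostream>
#include <vector>

// Cyclic coordinate descent on E(x) = sum_i (x_i - i)^2: take a 1-d
// gradient step on one coordinate at a time instead of forming the
// full gradient. The partial derivative is analytical here.
double partial(const std::vector<double>& x, std::size_t i)
{
    return 2.0 * (x[i] - static_cast<double>(i)); // dE/dx_i
}

int main()
{
    std::vector<double> x(4, 0.0); // starting point
    const double step = 0.1;       // 1-d step length
    for (int sweep = 0; sweep < 100; ++sweep)
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] -= step * partial(x, i); // update coordinate i only
    for (double v : x)
        std::cout << v << ' ';     // converges to ~ 0 1 2 3
    std::cout << '\n';
}

A blocked variant updates a group of coordinates per step, and the sweeps can be parallelised along the lines of Wright's article.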
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 29th, 2018, 9:33 pm

There is no escape from pesky learning rates :)
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 4th, 2018, 6:50 am

1. If (as I predict) you scoff at 1., at least use automatic differentiation in C++; those gradients are easy to get wrong.
AD removes the manual labour, but is it a wonder drug?
Recently (1998, 2012) the complex-step and multicomplex-step methods have emerged as a good alternative for computing gradients, Jacobians and Hessians (for Hessians, bicomplex numbers are sufficient). The underlying maths dates from Cayley, Dickson, Hamilton et al. There are no round-off errors, but you have to use complex numbers.

Bicomplex == std::complex<std::complex<T>>
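
A minimal sketch of the complex-step method, assuming the target function can be evaluated on std::complex arguments (the helper name is illustrative): f'(x) ~ Im f(x + ih) / h, with no subtraction and hence no cancellation.

#include <complex>
#include <iostream>

// Complex-step derivative: f'(x) ~ Im(f(x + ih)) / h.
// No subtraction, so no round-off cancellation; h can be absurdly small.
template <typename F>
double complex_step(F f, double x, double h = 1.0e-100)
{
    return f(std::complex<double>(x, h)).imag() / h;
}

int main()
{
    auto f = [](std::complex<double> z) { return std::exp(z); };
    std::cout << complex_step(f, 5.0) << '\n'; // ~ e^5 = 148.4131591...
}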
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

July 4th, 2018, 11:27 am

You need to specify an epsilon though, so it's just an approximation.
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 4th, 2018, 1:05 pm

You need to specify an epsilon though, so it's just an approximation.
[$]h = 10^{-100}[$], or whatever you're having yourself. And there are no round-off errors, in contrast to the usual divided differences, because there is no subtraction.

Actually, "step length" is the more appropriate term than "epsilon" in this context.
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

July 4th, 2018, 3:12 pm

How does it work when using 16-bit floats?
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 4th, 2018, 3:26 pm

How does it work when using 16-bit floats?
How does 'what' work? It is OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed mode, i.e. floats in the parts where accuracy is not critical?
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 4th, 2018, 4:40 pm

How does it work when using 16-bit floats?
For [$]e^x[$] at [$]x=5[$] in floats, the classic FDM breaks down while the complex step does not.
Computing [$]de^x/dx[$]:
h                   FDM classic     Complex
0.10000000149       156.087341309   148.165924072
0.0500000007451     152.186584473   148.351318359
0.0250000003725     150.284423828   148.397705078
0.0125000001863     149.342041016   148.409286499
0.00625000009313    148.874511719   148.412200928
0.00312500004657    148.65234375    148.412918091
0.00156250002328    148.53515625    148.413101196
0.000781250011642   148.4375        148.413146973
0.000390625005821   148.3984375     148.413162231
0.00019531250291    148.59375       148.413162231
9.76562514552e-05   148.59375       148.413162231
4.88281257276e-05   147.8125        148.413162231
2.44140628638e-05   147.5           148.413162231
1.22070314319e-05   150             148.413162231
6.10351571595e-06   150             148.413162231
3.05175785797e-06   140             148.413162231
1.52587892899e-06   140             148.413162231
7.62939464494e-07   180             148.413162231
3.81469732247e-07   160             148.413162231
1.90734866123e-07   0               148.413162231
9.53674330617e-08   0               148.413162231
4.76837165309e-08   0               148.413162231
2.38418582654e-08   0               148.413162231
1.19209291327e-08   0               148.413162231
5.96046456636e-09   0               148.413162231
2.98023228318e-09   0               148.413162231
1.49011614159e-09   0               148.413162231
7.45058070795e-10   0               148.413162231
3.72529035397e-10   0               148.413162231
1.86264517699e-10   0               148.413162231
9.31322588493e-11   0               148.413162231
4.65661294247e-11   0               148.413162231
2.32830647123e-11   0               148.413162231
1.16415323562e-11   0               148.413162231
5.82076617808e-12   0               148.413162231
2.91038308904e-12   0               148.413162231
1.45519154452e-12   0               148.413162231
7.2759577226e-13    0               148.413162231
3.6379788613e-13    0               148.413162231
1.81898943065e-13   0               148.413162231
9.09494715325e-14   0               148.413162231
4.54747357663e-14   0               148.413162231
2.27373678831e-14   0               148.413162231
1.13686839416e-14   0               148.413162231
5.68434197078e-15   0               148.413162231
2.84217098539e-15   0               148.413162231
1.4210854927e-15    0               148.413162231
7.10542746348e-16   0               148.413162231
3.55271373174e-16   0               148.413162231
1.77635686587e-16   0               148.413162231
8.88178432935e-17   0               148.413162231
4.44089216468e-17   0               148.413162231
2.22044608234e-17   0               148.413162231
1.11022304117e-17   0               148.413162231
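
A sketch of the kind of loop behind a table like this (an assumption, not necessarily the code actually used): 'FDM classic' is taken to be the one-sided divided difference in float arithmetic, whose subtraction cancels catastrophically as h shrinks, while the complex step holds steady.

#include <cmath>
#include <complex>
#include <iostream>

// Compare the one-sided divided difference in float arithmetic with the
// complex-step derivative of e^x at x = 5, halving h each iteration.
int main()
{
    const float x = 5.0f;
    float h = 0.1f;
    for (int k = 0; k < 54; ++k, h *= 0.5f) {
        float fdm = (std::exp(x + h) - std::exp(x)) / h;              // cancels
        float cstep = std::exp(std::complex<float>(x, h)).imag() / h; // does not
        std::cout << h << ", " << fdm << ", " << cstep << '\n';
    }
}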
Last edited by Cuchulainn on July 4th, 2018, 4:57 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 4th, 2018, 4:49 pm

The same experiment in doubles:

h                   FDM classic   Complex
2.27847563111e-306  0             148.413159103
1.13923781556e-306  0             148.413159103
5.69618907778e-307  0             148.413159103
2.84809453889e-307  0             148.413159103
1.42404726944e-307  0             148.413159103
7.12023634722e-308  0             148.413159103
3.56011817361e-308  0             148.413159103
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

July 4th, 2018, 5:20 pm

How does it work when using 16-bit floats?
How does 'what' work? It is OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed mode, i.e. floats in the parts where accuracy is not critical?
TPUs.
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 5th, 2018, 10:05 am

How does it work when using 16-bit floats?
How does 'what' work? It is OK because there is no subtraction.
Of course, h < std::numeric_limits<float>::epsilon()
Who uses floats these days? Are you suggesting mixed mode, i.e. floats in the parts where accuracy is not critical?
TPUs.
Sounds reasonable: accuracy to 10 decimal places is not needed.
BTW what is a minifloat?
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

July 5th, 2018, 5:33 pm

https://www.revolvy.com/main/index.php?s=Minifloat

Not everyone in the AI community is happy about it, but the TPU designers face many constraints.
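
For the record, a toy decoder for one common 8-bit minifloat layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits). The layout is an assumption for illustration; actual hardware formats (e.g. bfloat16 on TPUs) differ.

#include <cmath>
#include <iostream>

// Decode an 8-bit minifloat: 1 sign, 4 exponent (bias 7), 3 mantissa bits.
double decode_minifloat(unsigned char b)
{
    const int sign = (b >> 7) & 0x1;
    const int exp  = (b >> 3) & 0xF;
    const int man  = b & 0x7;
    const double s = sign ? -1.0 : 1.0;
    if (exp == 0) // subnormal: no implicit leading 1
        return s * (man / 8.0) * std::pow(2.0, -6);
    return s * (1.0 + man / 8.0) * std::pow(2.0, exp - 7);
}

int main()
{
    std::cout << decode_minifloat(0x38) << '\n'; // exponent 7, mantissa 0 -> 1.0
    std::cout << decode_minifloat(0x01) << '\n'; // smallest subnormal ~ 0.00195
}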
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 5th, 2018, 7:28 pm

https://www.revolvy.com/main/index.php?s=Minifloat

Not everyone in the AI community is happy about it, but the TPU designers face many constraints.
Outside the current comfort zone?
 
User avatar
ISayMoo
Topic Author
Posts: 1082
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

July 5th, 2018, 7:48 pm

?
 
User avatar
Cuchulainn
Posts: 57041
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

July 5th, 2018, 8:07 pm

There is no escape from pesky learning rates :)
In the current test code the algorithm is a special case of the Euler method applied to a (stiff?) discrete dynamical system, and the learning rate is the step length. Realising this fact explains much of the hullabaloo about time-dependent learning rates.

It's going to remain extremely ad-hoc ....

//
My take ..
Now if you add more layers and take smaller steps, you get an ODE that models the NN:
[$]dx/dt = - \nabla E(x)[$] (1)

where [$]x[$] is the vector of network weights, [$]E(x)[$] is the loss/error/objective function and [$]\nabla E(x)[$] is the gradient.

You can now call on results by Poincaré et al. to prove stuff and to compute a solution of ODE system (1) (mostly a stiff ODE, so use an adaptive ODE solver).

The gradients are computed by an adjoint (backward) ODE solver in conjunction with AD.

Much better approach IMO. All of this should be known in AI circles. It uses pure mathematics/physics and numerical analysis.
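
A minimal sketch of this view on a toy quadratic loss (the loss and constants are illustrative): one explicit Euler step on (1) with step length dt is exactly one gradient-descent update with learning rate dt, and stiffness caps the stable step.

#include <cstddef>
#include <iostream>
#include <vector>

// Gradient flow dx/dt = -grad E(x) integrated by explicit Euler.
// Toy loss E(x) = 0.5*(a0*x0^2 + a1*x1^2), deliberately stiff:
// stability requires dt < 2/max(a_i), capping the learning rate.
int main()
{
    std::vector<double> x = {1.0, 1.0};
    const std::vector<double> a = {1.0, 100.0}; // curvatures (eigenvalues)
    const double dt = 0.015;                    // learning rate = step length
    for (int n = 0; n < 1000; ++n)
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] -= dt * a[i] * x[i];           // Euler step = GD update
    std::cout << x[0] << ' ' << x[1] << '\n';   // both decay towards 0
}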