User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 6th, 2018, 8:56 pm

I've been told, by several renowned experts in the field, that the best learning rate to use is 2E-4. Some people say it's 3E-4, but they're a minority.
 
User avatar
Cuchulainn
Posts: 57307
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

December 6th, 2018, 9:18 pm

I've been told, by several renowned experts in the field, that the best learning rate to use is 2E-4. Some people say it's 3E-4, but they're a minority.
BTW to whom or what is this a response?
Nah, it's just an arbitrarily chosen number (assuming it is a number..). Go back to the maths and you'll see it's a step length (no fancy name needed).
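For concreteness, a minimal sketch (plain NumPy, a made-up quadratic objective, nobody's production code) of what that number actually is: the step length in the classical gradient-descent iteration. Pick it badly and the iteration diverges, whatever the experts say.

import numpy as np

def gradient_descent(grad, x0, eta=2e-4, n_steps=1000):
    # Fixed-step gradient descent; eta is nothing more than the step length.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad(x)
    return x

# Toy objective f(x) = 0.5 * x'Ax with gradient Ax; the largest eigenvalue is 10.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x

print(gradient_descent(grad, x0=[1.0, 1.0], eta=2e-4))  # converges, but slowly
print(gradient_descent(grad, x0=[1.0, 1.0], eta=0.21))  # eta > 2/10: diverges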

And of course you believe them. I don't, but I don't have a superior. I have to discover all these things on my own.
A serious discussion with you will be impossible. Are you allowed to say more, or does it stop at 2e-4? Anyway, I must have touched a chord, but which one?

I only need to give one counterexample to prove it is wrong. And that is easy.
But I digress.

Where have all the AI experts gone?
 
User avatar
Cuchulainn
Posts: 57307
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

December 6th, 2018, 9:53 pm

ISayMoo
I've been told, by several renowned experts in the field, that the best learning rate to use is 2E-4. Some people say it's 3E-4, but they're a minority.
[image]
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 6th, 2018, 11:06 pm

Relax, I was joking. Some men are so easy to rile up.
 
User avatar
Cuchulainn
Posts: 57307
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

December 7th, 2018, 11:38 am

Relax, I was joking. Some men are so easy to rile up.
So, I will just ignore you? 
This was your thread, which you have sucked all the life out of.
You aren't even humorous, although you do seem to be trying.
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 10th, 2018, 11:24 am

Hmm, I wouldn't say that it was me who destroyed this thread. You seem to have a thing against deep learning (it's a fashionable thing to rail against these days, sure), you criticise it for basically not being a mutation of FEM, pretend to be on the verge of gifting us with an insightful criticism, but then seem to have trouble working through any basic problem.

Deliver or get off the pot, as they say ;-)
 
User avatar
katastrofa
Posts: 6567
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 10th, 2018, 3:38 pm

I've been told, by several renowned experts in the field, that the best learning rate to use is 2E-4. Some people say it's 3E-4, but they're a minority.
Why 2E-4? :-)
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 10th, 2018, 3:47 pm

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
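For illustration only, a minimal sketch of that heuristic (plain NumPy, made-up data, nothing deep about it): try a few of the "usually works" learning rates for SGD on a toy logistic regression and keep whichever does best on held-out data. Trial and error, dressed up as a grid.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def sgd_logistic(X, y, lr, epochs=50):
    # Plain SGD on the logistic loss, one sample at a time.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]
    return w

def log_loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

for lr in [2e-4, 3e-4, 1e-2, 1e-1]:  # the "usually works" grid
    w = sgd_logistic(X_tr, y_tr, lr)
    print(f"lr={lr:g}  validation loss={log_loss(w, X_va, y_va):.4f}")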
 
User avatar
Cuchulainn
Posts: 57307
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

December 10th, 2018, 4:20 pm

It's a mystery.

Just to be clear: I was joking. 
I thought it was more of a bluff. But if you say you are joking, let's leave it at that. Being a Sararīman has its advantages (I suppose) in that you can experiment on someone else's time.
 
User avatar
katastrofa
Posts: 6567
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 10th, 2018, 10:41 pm

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
I'm used to this sort of heuristic after years of theoretical research in applied solid state physics. Crystals worked, and my job was to find out why - but that was secondary. Maybe ML models just need quality testing before they leave the production line - instead of explanations? After all, all models are wrong, but some are useful :-) (This is how most applied physics "works" - e.g. do you think the Bloch band structure models have anything to do with reality? There are often completely different physical mechanisms behind those bands.)
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 10th, 2018, 11:34 pm

Working on it!
 
User avatar
Paul
Posts: 8731
Joined: July 20th, 2001, 3:28 pm

Re: If you are bored with Deep Networks

December 11th, 2018, 5:34 am

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 11th, 2018, 11:09 am

Part of the documentation of the routine LBFGS by J. Nocedal (one of the giants in the field of nonlinear numerical optimization):

C     GTOL is a DOUBLE PRECISION variable with default value 0.9, which
C        controls the accuracy of the line search routine MCSRCH. If the
C        function and gradient evaluations are inexpensive with respect
C        to the cost of the iteration (which is sometimes the case when
C        solving very large problems) it may be advantageous to set GTOL
C        to a small value. A typical small value is 0.1.  Restriction:
C        GTOL should be greater than 1.D-04.
Why 0.9? Why 0.1? Why greater than 1e-4? You can shoot off the same questions at LBFGS as the ones Cuch throws at SGD. SGD doesn't have an automated way of setting the learning rate, so it's "dumb". Methods like LBFGS contain an algorithm to set the learning rate automatically, but in 99 cases out of 100 these algorithms contain some hyper-parameters which you either adjust to every problem or set to some "typical" value. If you're lucky, there's a theorem which tells you the bounds you have to stay within. But because this is hidden somewhere in the bowels of an ancient Fortran library, people naively think it "just works". Just like SGD, it works until it doesn't. There's no magical way around the problem: if you're optimising a function based on point estimates, you have a learning problem and the no-free-lunch theorem comes down on you like a ton of bricks.
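A hedged illustration (a SciPy sketch on a toy quadratic, not a claim about Nocedal's code): SciPy's generic Wolfe line search exposes the same two constants, c1=1e-4 and c2=0.9 by default, which, as far as I can tell, play the roles of FTOL and GTOL in MCSRCH. Tighten c2 to 0.1 and you are asking for a more accurate line search, exactly as the comment block above suggests.

import numpy as np
from scipy.optimize import line_search

# Toy quadratic f(x) = 0.5 * x'Ax, with gradient Ax.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0 = np.array([1.0, 1.0])
p = -grad(x0)  # steepest-descent direction

for c2 in (0.9, 0.1):  # loose vs. accurate curvature condition
    alpha, *_ = line_search(f, grad, x0, p, c1=1e-4, c2=c2)
    print(f"c2={c2}: accepted step length alpha={alpha}")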
 
User avatar
katastrofa
Posts: 6567
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

December 11th, 2018, 1:42 pm

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.
That's why I've always considered ML and the whole of computer science to be engineering rather than science. It's not good, though, if they don't adhere to the basic rules of statistics in their experiments (vide all of them testing their image recognition methods on the same ImageNet collection - statisticians have a nice name for it: data butchery).
 
User avatar
ISayMoo
Topic Author
Posts: 1119
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

December 11th, 2018, 2:36 pm

Working on it!