SERVING THE QUANTITATIVE FINANCE COMMUNITY

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

As I said, all this stuff is Euler method (duh).

Here is the first step to solve an ODE (is AD the way to do it here??)

https://arxiv.org/pdf/1806.07366.pdf

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.
It only looks that way. You can prove convergence of FEM schemes in weighted Sobolev spaces, but 1) it takes longer and 2) someone has to pay for it.

Numerical methods for AI are somewhat outdated, to a greater or lesser extent.

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

Part of the documentation of the routine LBFGS by J. Nocedal (one of the giants in the field of nonlinear numerical optimization):

C        GTOL is a DOUBLE PRECISION variable with default value 0.9, which
C        controls the accuracy of the line search routine MCSRCH. If the
C        function and gradient evaluations are inexpensive with respect
C        to the cost of the iteration (which is sometimes the case when
C        solving very large problems) it may be advantageous to set GTOL
C        to a small value. A typical small value is 0.1.  Restriction:
C        GTOL should be greater than 1.D-04.
Why 0.9? Why 0.1? Why greater than 1e-4? You can shoot the same questions at LBFGS as the ones Cuch throws at SGD. SGD doesn't have an automated way of setting the learning rate, so it's "dumb". Methods like LBFGS contain an algorithm to set the learning rate automatically, but in 99 cases out of 100 these algorithms contain hyper-parameters which you either adjust to every problem or set to some "typical" value. If you're lucky, there's a theorem which tells you the bounds within which you have to stay. But because all this is hidden somewhere in the bowels of an ancient Fortran library, people naively think it "just works". Just like SGD, it works until it doesn't. There's no magical way around the problem: if you're optimising a function based on point estimates, you have a learning problem, and the no-free-lunch theorem comes down on you like a ton of bricks.
If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.
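For reference, GTOL in Nocedal's routine is the constant in the (strong Wolfe) curvature condition that the line search MCSRCH enforces. A toy Python sketch of what is actually being checked; the function names and the quadratic example are mine, not Nocedal's:

```python
import numpy as np

def wolfe_conditions(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions for step length alpha along p.

    c2 plays the role of GTOL above: 0.9 gives a loose (cheap) line
    search, 0.1 a tight (accurate) one. The theory requires only
    0 < c1 < c2 < 1 -- hence "GTOL should be greater than 1.D-04".
    """
    phi0 = f(x)
    dphi0 = grad(x) @ p            # slope at alpha = 0 (negative for descent)
    phi = f(x + alpha * p)
    dphi = grad(x + alpha * p) @ p
    armijo = phi <= phi0 + c1 * alpha * dphi0   # sufficient decrease
    curvature = abs(dphi) <= c2 * abs(dphi0)    # the GTOL-controlled condition
    return bool(armijo and curvature)

# Toy quadratic: f(x) = 0.5 x'x; along -grad the exact minimiser is alpha = 1.
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, 2.0])
p = -grad(x)
print(wolfe_conditions(f, grad, x, p, alpha=1.0))   # True
```

Same story as in the Fortran comment: c1 and c2 are hyper-parameters with "typical" values, constrained (but not pinned down) by the theory.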

ISayMoo
Topic Author
Posts: 1718
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.
I know this book, I implemented some algorithms from it. Can you tell me what general purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.
I know this book, I implemented some algorithms from it. Can you tell me what general purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?
I would say the backtracking discrete methods (e.g. page 37 etc.). The continuous analogues (ODE solvers) have all this built in, so as 'user' you just give a tolerance and the solver does the rest, instead of having to choose from a palette of learning rates. Backtracking is well-established in numerical analysis.

But if by 'user' you mean something else that's another discussion.
Last edited by Cuchulainn on December 17th, 2018, 3:18 pm, edited 1 time in total.

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

“"Neural networks" are a sad misnomer. They're neither neural nor even networks. They're chains of differentiable, parameterized geometric functions,  trained with gradient descent (with gradients obtained via the chain rule). A small set of highschool-level ideas put together.” — François Chollet

https://en.wikipedia.org/wiki/Keras
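To Chollet's point, such a "chain" fits in a few lines of plain NumPy: two parameterized maps, a loss, and gradients obtained via the chain rule, checked against a finite difference. A toy sketch; all names and numbers are mine:

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(size=(3, 2))   # parameters of the first "geometric function"
W2 = rng.normal(size=(1, 3))   # parameters of the second
x = rng.normal(size=2)
y = 1.0                        # target

def loss(W1_, W2_):
    a1 = np.tanh(W1_ @ x)          # layer 1: linear map + pointwise tanh
    out = (W2_ @ a1)[0]            # layer 2: linear map
    return 0.5 * (out - y) ** 2    # squared error

# Gradients via the chain rule, written out by hand.
a1 = np.tanh(W1 @ x)
out = (W2 @ a1)[0]
dout = out - y                          # dL/d(out)
gW2 = dout * a1[None, :]                # dL/dW2
da1 = dout * W2[0]                      # back through layer 2
gW1 = np.outer(da1 * (1 - a1 ** 2), x)  # back through tanh and layer 1

# Sanity check one entry against a central finite difference.
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
fd = (loss(Wp, W2) - loss(Wm, W2)) / (2 * eps)
print(abs(fd - gW1[0, 0]) < 1e-6)   # True: the chain rule agrees
```

High-school calculus plus matrix multiplication, as advertised.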

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

Just like SGD, it works until it doesn't.

Sounds defeatist.
In mathematics, conditions are stated under which an algorithm works for a given class of problems. It will not work when the problem does not satisfy the assumptions. For example, the fixed-point iteration

$x_{k+1} = f(x_k), \quad k = 0, 1, 2, \ldots$

is guaranteed to converge when $f$ is a contraction.
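To make the contraction condition concrete, a small sketch (my toy examples):

```python
import math

def fixed_point(f, x0, n=100):
    """Iterate x_{k+1} = f(x_k) for n steps."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x

# f(x) = cos(x) is a contraction near its fixed point (|f'(x)| = |sin x| < 1),
# so the iteration converges (to the Dottie number, ~0.739085).
x = fixed_point(math.cos, 1.0)
print(abs(x - math.cos(x)) < 1e-12)   # True: a fixed point was reached

# f(x) = 2x is not a contraction (|f'| = 2 > 1): the iterates blow up.
print(fixed_point(lambda t: 2 * t, 1.0))   # 2**100 -- divergence
```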

I see it as: for what class of problems does SGD work? I don't think this has been established, at least not in the spate of articles to date. For example, SGD only finds a local minimum. And then SGD for constrained optimisation is a Pandora's box, yes? It gets very fuzzy.

ODE solvers take into account the problem they are trying to solve and adjust parameters accordingly.
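A sketch of the kind of built-in control I mean: step-doubling error control on top of Euler. The 'user' supplies only a tolerance; the step size is chosen and re-chosen automatically. A toy, not production code (real solvers use embedded Runge-Kutta pairs):

```python
import math

def adaptive_euler(f, t0, y0, t1, tol):
    """Integrate y' = f(t, y) from t0 to t1 with step-doubling error control."""
    t, y, h = t0, y0, (t1 - t0) / 100
    while t < t1:
        h = min(h, t1 - t)
        y_full = y + h * f(t, y)                         # one full Euler step
        y_half = y + (h / 2) * f(t, y)
        y_two = y_half + (h / 2) * f(t + h / 2, y_half)  # two half steps
        err = abs(y_two - y_full)                        # local error estimate
        if err < tol:
            t, y = t + h, y_two       # accept the more accurate value
            if err < tol / 4:
                h *= 2                # error cheap: grow the step
        else:
            h /= 2                    # too inaccurate: shrink and retry
    return y

# y' = y, y(0) = 1  =>  y(1) = e
print(abs(adaptive_euler(lambda t, y: y, 0.0, 1.0, 1.0, tol=1e-8) - math.e) < 1e-3)  # True
```

No learning-rate palette in sight: the tolerance drives everything.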

ISayMoo
Topic Author
Posts: 1718
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.
I know this book, I implemented some algorithms from it. Can you tell me what general purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?
I would say the backtracking discrete methods (e.g. page 37 etc.). The continuous analogues (ODE solvers) have all this built in, so as 'user' you just give a tolerance and the solver does the rest, instead of having to choose from a palette of learning rates. Backtracking is well-established in numerical analysis.
Hmm. In my edition (2nd, Springer, 2006) page 37 has Algorithm 3.1 which has 2 hyperparameters (c and rho). What is the theory for choosing the best values of c and rho?
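Algorithm 3.1 is short enough to write out, and the two hyper-parameters are right there in the signature. A Python sketch (variable names and the toy example are mine):

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, p, alpha0=1.0, c=1e-4, rho=0.5):
    """Algorithm 3.1 of Nocedal & Wright (2nd ed., p. 37), sketched.

    Note the two hyper-parameters: c (sufficient-decrease constant)
    and rho (shrink factor). The theory only demands c in (0, 1) and
    rho in (0, 1); 1e-4 and 0.5 are the usual folklore choices.
    """
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ p   # must be negative: p is a descent direction
    while f(x + alpha * p) > fx + c * alpha * slope:   # Armijo condition
        alpha *= rho
    return alpha

# Toy quadratic: the full step alpha = 1 along -grad is the exact minimiser.
f = lambda x: 0.5 * x @ x
g = lambda x: x
x = np.array([2.0, 0.0])
print(backtracking_line_search(f, g, x, -g(x)))              # 1.0
print(backtracking_line_search(f, g, x, -g(x), alpha0=4.0))  # 1.0, after two shrinks
```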

ISayMoo
Topic Author
Posts: 1718
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

*crickets*

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

Politicians fume after Amazon's face-recog AI fingers dozens of them as suspected crooks

https://www.theregister.co.uk/2018/07/2 ... ion_sucks/

ISM: how do you think the crooks feel about it?

It's not there yet.

ISayMoo
Topic Author
Posts: 1718
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

Yes, still a way to go

“Then we’ll keep it alive as long as we live,” added Legolas. “And we won’t forget the first great battle of the night, even if we may have forgotten the final defeat.”

“I agree,” Gandalf said, “but we will all remember it as the last battle in Middle-earth, and the first great battle of the new day.”

Aragorn drew his sword, and the Battle of Fangorn was won. As they marched out through the thicket the morning mist cleared, and the day turned to dusk.
But at least we have now an inexhaustible supply of really bad Tolkien fan fiction.

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

It won't be long before you can do a PhD in a liberal arts college on

"Finnegans Wake:  a Hidden Markov Model and Bayesian Network Approach"

riverrun, past Eve and Adam's, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs. Sir Tristram, violer d'amores, fr'over the short sea, had passencore rearrived from North Armorica on this side the scraggy isthmus of Europe Minor to wielderfight his penisolate war: nor had topsawyer's rocks by the stream Oconee exaggerated themselse to Laurens County's gorgios while they went doublin their mumper all the time: nor avoice from afire bellowsed mishe mishe to tauftauf thuartpeatrick: not yet, though venissoon after, had a kidscad buttended a bland old isaac: not yet, though all's fair in vanessy, were sosie sesthers wroth with twone nathandjoe. Rot a peck of pa's malt had Jhem or Shen brewed by arclight and rory end to the regginbrow was to be seen ringsome on the aquaface. The fall (bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk!) of a once wallstrait oldparr is retaled early in bed and later on life down through all christian minstrelsy

ISayMoo
Topic Author
Posts: 1718
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

I'm sure you could now.

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

In general, there are 3 layers between Joyce's effects and causes. Great PhD topic.

And domain annotations are essential to build the influence diagram.

BTW what's a good C++ open-source for HMMs and BNs?
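Whatever the library, the core HMM computation (the forward algorithm) fits in a few lines. A toy Python sketch with made-up numbers, just to fix ideas:

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Forward algorithm: likelihood of an observation sequence under an
    HMM with initial distribution pi, transition matrix A (rows sum to 1)
    and emission matrix B[state, symbol]."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return alpha.sum()

# Two hidden states, two observable symbols, made-up numbers.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(hmm_forward(pi, A, B, [0, 1, 0]))   # likelihood of observing 0, 1, 0
```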

Cuchulainn
Posts: 59013
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

Yes, still a way to go

“Then we’ll keep it alive as long as we live,” added Legolas. “And we won’t forget the first great battle of the night, even if we may have forgotten the final defeat.”

“I agree,” Gandalf said, “but we will all remember it as the last battle in Middle-earth, and the first great battle of the new day.”

Aragorn drew his sword, and the Battle of Fangorn was won. As they marched out through the thicket the morning mist cleared, and the day turned to dusk.
But at least we have now an inexhaustible supply of really bad Tolkien fan fiction.
LOTR is 1-dimensional (all live happily ever after) while FW is a vicus cycle.