Sounds defeatist.

In mathematics, conditions are stated under which an algorithm works for a given class of problems. It will not work when the problem does not satisfy those assumptions. For example,

the fixed-point iteration [$]x_{k+1} = f(x_k), \; k = 0, 1, 2, \ldots[$] is only guaranteed to converge when [$]f[$] is a contraction.
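A minimal sketch of that iteration in Python (the function names are mine, purely illustrative): when [$]f[$] is a contraction, such as cos on [0, 1], the iterates settle down; for an expanding map they do not.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = f(x_k) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence: f may not be a contraction")

# f(x) = cos(x) is a contraction on [0, 1] (|f'(x)| <= sin(1) < 1),
# so by the Banach fixed-point theorem the iteration converges
# to the unique solution of x = cos(x).
root = fixed_point(math.cos, 0.5)

# By contrast g(x) = 2x has |g'| = 2 > 1: not a contraction,
# and the same iteration diverges from any nonzero start.
```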

I see it as: for what class of problems does SGD work? I don't think this question has been answered, at least not in the spate of articles to date. For example, SGD only finds local minima. And then SGD for constrained optimisation is a Pandora's box, yes? It gets very fuzzy.
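To illustrate the "local minima only" point, a toy SGD sketch (the function, learning rate and noise level are all illustrative assumptions, not from any post here): on (x² − 1)², which has minima at ±1, the answer depends entirely on the starting point.

```python
import random

def grad(x):
    # derivative of f(x) = (x^2 - 1)^2, which has local minima at x = -1 and x = +1
    return 4.0 * x * (x * x - 1.0)

def sgd(x0, lr=0.05, steps=2000, noise=0.01, seed=0):
    """Plain SGD: gradient descent on a noisy gradient estimate."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        g = grad(x) + rng.gauss(0.0, noise)  # noisy gradient, as in mini-batch SGD
        x -= lr * g
    return x

# Which minimum SGD finds depends on where you start: no global guarantee.
left = sgd(-0.5)   # drifts into the basin around x = -1
right = sgd(+0.5)  # drifts into the basin around x = +1
```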

ODE solvers take into account the problem they are trying to solve and adjust parameters accordingly.
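A hedged sketch of that idea, using nothing beyond the standard embedded-estimate trick: an Euler step and a Heun step give a free local error estimate, and the solver adjusts the step size h itself instead of asking the user for one.

```python
import math

def solve_adaptive(f, t0, t1, y0, tol=1e-8):
    """Heun's method with an embedded Euler estimate; the step size h is
    grown or shrunk from the local error estimate, mimicking what
    production ODE solvers do instead of using a fixed user-chosen step."""
    t, y, h = t0, y0, (t1 - t0) / 100.0
    while t < t1:
        h = min(h, t1 - t)                # do not step past the end point
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_euler = y + h * k1              # first-order estimate
        y_heun = y + 0.5 * h * (k1 + k2)  # second-order estimate
        err = abs(y_heun - y_euler)       # local error estimate
        if err <= tol or h < 1e-12:
            t, y = t + h, y_heun          # accept the step
        # standard controller: scale h by (tol/err)^(1/2), within safe bounds
        h *= min(2.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.5))
    return y

# y' = y, y(0) = 1 has the exact solution y(1) = e.
approx = solve_adaptive(lambda t, y: y, 0.0, 1.0, 1.0)
```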

Statistics: Posted by Cuchulainn — Yesterday, 11:30 pm

]]>

https://en.wikipedia.org/wiki/Keras

Statistics: Posted by Cuchulainn — Yesterday, 3:15 pm

]]>

I know this book, I implemented some algorithms from it. Can you tell me what general purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.

But if by 'user' you mean something else that's another discussion.

Statistics: Posted by Cuchulainn — Yesterday, 3:14 pm

]]>

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.

I know this book, I implemented some algorithms from it. Can you tell me what general purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?

Statistics: Posted by ISayMoo — December 16th, 2018, 4:55 pm

]]>

Part of the documentation of the routine LBFGS by J. Nocedal (one of the giants in the field of nonlinear numerical optimization):

If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.

Why 0.9? Why 0.1? Why greater than 1e-4? You can shoot off the same questions at LBFGS as the ones Cuch throws at SGD. SGD doesn't have an automated way of setting the learning rate, so it's "dumb". Methods like LBFGS contain an algorithm to automatically set the learning rate, but these algorithms in 99 cases out of 100 contain some hyper-parameters which you either adjust to every problem or set to some "typical" value. If you're lucky, there's a theorem which tells you the bounds within which you have to fit. But because this is hidden somewhere in the bowels of an ancient Fortran library, people naively think it "just works". Just like SGD, it works until it doesn't. There's no magical way around it: if you're optimising a function based on point estimates, you have a learning problem and the no-free-lunch theorem comes down on you like a ton of bricks.

C GTOL is a DOUBLE PRECISION variable with default value 0.9, which

C controls the accuracy of the line search routine MCSRCH. If the

C function and gradient evaluations are inexpensive with respect

C to the cost of the iteration (which is sometimes the case when

C solving very large problems) it may be advantageous to set GTOL

C to a small value. A typical small value is 0.1. Restriction:

C GTOL should be greater than 1.D-04.
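For comparison, SciPy's present-day wrapper of the L-BFGS-B code exposes its own set of knobs; the names overlap only loosely with the Fortran GTOL above (SciPy's gtol is a projected-gradient tolerance, not the Wolfe curvature parameter 0.9), but the point stands: every default below is a hyper-parameter someone chose. A sketch on the standard Rosenbrock test problem:

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function and its gradient: a standard non-convex test
# problem with a unique minimum f(1, 1) = 0.
def rosen(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
        200.0 * (x[1] - x[0]**2),
    ])

res = minimize(
    rosen, x0=np.array([-1.2, 1.0]), jac=rosen_grad, method="L-BFGS-B",
    options={
        "maxcor": 10,    # history length of the limited-memory Hessian
        "ftol": 2.2e-9,  # relative change in f at which to stop
        "gtol": 1e-8,    # projected-gradient norm tolerance
        "maxls": 20,     # max line-search iterations per step
    },
)
```

Why 10, why 20, why 2.2e-9? Same question, one software generation later.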

Statistics: Posted by Cuchulainn — December 15th, 2018, 7:27 pm

]]>

Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.

Numerical methods for AI are somewhat outdated, to a lesser or greater extent.

Statistics: Posted by Cuchulainn — December 15th, 2018, 7:21 pm

]]>

Here is a first step towards solving an ODE (is AD the way to do it here?)

https://arxiv.org/pdf/1806.07366.pdf

Some weekend reading.
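One way to read the AD question: you can differentiate straight through an explicit solver. A self-contained sketch with forward-mode dual numbers (everything below is illustrative, not code from the paper, which instead advocates the adjoint method):

```python
# Forward-mode AD via dual numbers, then differentiate an Euler ODE solve
# with respect to a model parameter theta -- the "backprop through the
# solver" baseline that the adjoint method of the paper improves on.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot   # value and derivative w.r.t. the seed
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def euler_solve(theta, y0, t1, n=1000):
    """Solve y' = theta * y by explicit Euler; works on floats or Duals."""
    h = t1 / n
    y = y0
    for _ in range(n):
        y = y + h * (theta * y)
    return y

# Seed theta with derivative 1 to get d y(t1) / d theta alongside y(t1).
out = euler_solve(Dual(1.0, 1.0), Dual(1.0), 1.0)
# Exact solution y(1) = e^theta, so both y(1) and dy/dtheta equal e at theta = 1;
# the Euler values agree up to the O(h) discretisation error.
```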

Statistics: Posted by Cuchulainn — December 15th, 2018, 7:14 pm

]]>

I tried all of it and your suggestion worked well. In the end I decided just to go with the non-parametric realized variance solution. It works and does the job, plus I avoid a lot of issues. When possible, non-parametric solutions are my favourites.
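For anyone landing here later, the non-parametric estimator in question is just the sum of squared log returns over the sample window; a minimal sketch (the prices are made-up numbers):

```python
import math

def realized_variance(prices):
    """Non-parametric realized variance: sum of squared log returns.
    No model to fit; only a series of price observations is needed."""
    rets = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    return sum(r * r for r in rets)

prices = [100.0, 100.5, 99.8, 100.2, 101.0]
rv = realized_variance(prices)
vol = math.sqrt(rv)  # realized volatility over the same window
```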

Thanks again.

Statistics: Posted by volatilityMan — December 14th, 2018, 9:26 pm

]]>


Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.

Statistics: Posted by katastrofa — December 11th, 2018, 1:42 pm

]]>

Why 0.9? Why 0.1? Why greater than 1e-4? You can shoot off the same questions at LBFGS as the ones Cuch throws at SGD. SGD doesn't have an automated way of setting the learning rate, so it's "dumb". Methods like LBFGS contain an algorithm to automatically set the learning rate, but these algorithms in 99 cases out of 100 contain some hyper-parameters which you either adjust to every problem or set to some "typical" value. If you're lucky, there's a theorem which tells you the bounds within which you have to fit. But because this is hidden somewhere in the bowels of an ancient Fortran library, people naively think it "just works". Just like SGD, it works until it doesn't. There's no magical way around it: if you're optimising a function based on point estimates, you have a learning problem and the no-free-lunch theorem comes down on you like a ton of bricks.

C GTOL is a DOUBLE PRECISION variable with default value 0.9, which

C controls the accuracy of the line search routine MCSRCH. If the

C function and gradient evaluations are inexpensive with respect

C to the cost of the iteration (which is sometimes the case when

C solving very large problems) it may be advantageous to set GTOL

C to a small value. A typical small value is 0.1. Restriction:

C GTOL should be greater than 1.D-04.

Statistics: Posted by ISayMoo — December 11th, 2018, 11:09 am

]]>

It's a mystery.

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.

Well, I thought it was a joke. And quite a good one. Isn’t much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don’t like all these subjects.

Statistics: Posted by Paul — December 11th, 2018, 5:34 am

]]>

]]>

Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.

Statistics: Posted by katastrofa — December 10th, 2018, 10:41 pm

]]>