As I said, all this stuff is Euler method (duh).
Here is the first step to solve an ODE (is AD the way to do it here??)
https://arxiv.org/pdf/1806.07366.pdf
Some weekend reading.
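For anyone skimming the paper: the "Euler method" being alluded to is just the forward step y_{k+1} = y_k + h f(t_k, y_k). A minimal sketch (my own toy code, not from the paper):

```python
import numpy as np

def euler(f, y0, t0, t1, n_steps):
    """Forward Euler: repeatedly take y <- y + h * f(t, y)."""
    h = (t1 - t0) / n_steps
    t, y = t0, np.asarray(y0, dtype=float)
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t += h
    return y

# dy/dt = y, y(0) = 1, so y(1) should be close to e
approx = euler(lambda t, y: y, 1.0, 0.0, 1.0, 1000)
```

The residual-network analogy in the paper is exactly this shape: each layer adds h * f to the state, and letting h go to zero gives the ODE.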
It only looks that way. You can prove convergence of FEM schemes in weighted Sobolev spaces, but 1) it takes longer, 2) someone has to pay.

Well, I thought it was a joke. And quite a good one. Isn't much of numerical analysis and optimization like that? Rather arbitrary. One of the many reasons I don't like all these subjects.

It's a mystery.
Just to be clear: I was joking. The point is that people choose a heuristic learning rate and some value ranges "usually work". But everyone knows that this is not how things should be done, and the theory of optimisation in statistical learning is an active research field.
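To make the point concrete, here is what the heuristic looks like in practice; the 0.01 below is exactly the kind of "usually works" value being mocked, with nothing in the code to justify or adapt it (toy example, not anyone's production recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noiseless least squares: minimise ||X w - y||^2 over w.
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.01                                # the heuristic learning rate
for _ in range(500):
    i = rng.integers(len(X))             # one sample per step (the "S" in SGD)
    grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of the i-th squared residual
    w -= lr * grad                       # nothing here adapts lr for us
```

Change lr to 1.0 and it diverges; change it to 1e-6 and it goes nowhere in 500 steps. The value range that "works" is found by trial, not theory.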
If you study the book by Nocedal and Wright (BTW I have) you will see that there are more nuanced approaches as well.

Part of the documentation of the routine LBFGS by J. Nocedal (one of the giants in the field of nonlinear numerical optimization):

C     GTOL is a DOUBLE PRECISION variable with default value 0.9, which
C     controls the accuracy of the line search routine MCSRCH. If the
C     function and gradient evaluations are inexpensive with respect
C     to the cost of the iteration (which is sometimes the case when
C     solving very large problems) it may be advantageous to set GTOL
C     to a small value. A typical small value is 0.1. Restriction:
C     GTOL should be greater than 1.D-04.

Why 0.9? Why 0.1? Why greater than 1e-4? You can shoot the same questions at LBFGS as the ones Cuch throws at SGD. SGD doesn't have an automated way of setting the learning rate, so it's "dumb". Methods like LBFGS contain an algorithm that sets the learning rate automatically, but in 99 cases out of 100 these algorithms contain hyper-parameters which you either adjust to every problem or set to some "typical" value. If you're lucky, there's a theorem which tells you the bounds within which you have to stay. But because this is hidden somewhere in the bowels of an ancient Fortran library, people naively think it "just works". Just like SGD, it works until it doesn't. There's no magical way around the problem: if you're optimising a function based on point estimates, you have a learning problem, and the no-free-lunch theorem comes down on you like a ton of bricks.
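For context (my reading of the MCSRCH docs, so treat the mapping as an assumption): GTOL plays the role of the curvature parameter c2 in the Wolfe conditions, and the 1e-4 floor matches the customary sufficient-decrease parameter c1. The theory only demands 0 < c1 < c2 < 1; the specific 1e-4 and 0.9 are convention. A sketch of the check itself:

```python
import numpy as np

def wolfe_ok(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions for step length alpha.

    c1: sufficient-decrease parameter (the 1e-4 above),
    c2: curvature parameter (GTOL-like, 0.9).
    Theory only requires 0 < c1 < c2 < 1.
    """
    phi0 = f(x)
    dphi0 = grad(x) @ p                      # directional derivative at alpha = 0
    armijo = f(x + alpha * p) <= phi0 + c1 * alpha * dphi0
    curvature = abs(grad(x + alpha * p) @ p) <= c2 * abs(dphi0)
    return bool(armijo and curvature)

# f(x) = x^2 in 1-D, searching from x = 1 along p = -1
f = lambda x: float(x[0] ** 2)
g = lambda x: np.array([2.0 * x[0]])
ok = wolfe_ok(f, g, np.array([1.0]), np.array([-1.0]), alpha=1.0)
```

The step alpha = 1 lands at the minimiser and passes; alpha = 2 overshoots to x = -1 and fails the sufficient-decrease test.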
I know this book; I implemented some algorithms from it. Can you tell me which of the general-purpose non-linear optimisation methods described in it are fully automatic and have no user-adjustable parameters?
I would say the backtracking discrete methods (e.g. page 37 etc.). The continuous analogues (ODE solvers) have all this built in, so as a 'user' you just give a tolerance and the solver does the rest, instead of having to choose from a palette of learning rates. Backtracking is well-established in numerical analysis.
Hmm. In my edition (2nd, Springer, 2006) page 37 has Algorithm 3.1, which has two hyperparameters (c and rho). What is the theory for choosing the best values of c and rho?
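For readers without the book to hand, Algorithm 3.1 is short enough to quote in code; the defaults below (c = 1e-4, rho = 0.5) are the customary choices, not derived from any theorem, which is the point of the question:

```python
import numpy as np

def backtracking(f, grad, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Backtracking line search (Algorithm 3.1, Nocedal & Wright):
    shrink alpha by rho until the Armijo condition holds.
    c and rho are the two hyperparameters under discussion; the
    theory only requires 0 < c < 1 and 0 < rho < 1."""
    alpha = alpha0
    fx = f(x)
    slope = grad(x) @ p          # must be negative for a descent direction
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

# Steepest descent direction on f(x) = x^2 from x = 2
f = lambda x: float(x[0] ** 2)
g = lambda x: np.array([2.0 * x[0]])
x = np.array([2.0])
alpha = backtracking(f, g, x, -g(x))
```

Termination is guaranteed for a descent direction, but the alpha you get (and the iteration count of the outer method) depends on c and rho, which is exactly the "palette of knobs" complaint in another guise.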
But at least we now have an inexhaustible supply of really bad Tolkien fan fiction.

“Then we’ll keep it alive as long as we live,” added Legolas. “And we won’t forget the first great battle of the night, even if we may have forgotten the final defeat.”
“I agree,” Gandalf said, “but we will all remember it as the last battle in Middle-earth, and the first great battle of the new day.”
Aragorn drew his sword, and the Battle of Fangorn was won. As they marched out through the thicket the morning mist cleared, and the day turned to dusk.
LOTR is 1-dimensional (all live happily ever after) while PW is a vicious cycle.

Yes, still a way to go