Speaking of boredom, I don't get bored so often. However, having studied discrete maths optimisation methods (GD and its many variants and use cases), it would seem that they all break down unless you start tweaking. It feels like premature optimisation, like first buying the windows and doors when building a house.
And as I have said a few times: inside GD hides a (nasty) Euler scheme for an otherwise unspecified ODE. Why? Maybe because of ML's roots in linear algebra. Maybe it's better to start with the ODE leading to GD (the mathematical approach) rather than jumping head-first into GD.
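To make the Euler remark concrete, here is a minimal sketch on a toy problem. The gradient flow ODE is dx/dt = -grad f(x), and one explicit Euler step of size h gives x_new = x - h * grad f(x), which is exactly a GD update with learning rate h. The quadratic test function, the eigenvalues 1 and 100, the step sizes and all function names below are assumptions chosen for illustration, not anything from the linked slides. The implicit Euler variant is included only to show that the choice of scheme, and not the ODE itself, is what causes the trouble.

```python
# Toy problem (an assumption for illustration): f(x) = 0.5 * sum(lam_i * x_i^2),
# so grad f(x)_i = lam_i * x_i and the gradient flow is dx/dt = -grad f(x).
LAMBDAS = [1.0, 100.0]  # widely separated eigenvalues make the ODE stiff


def grad_f(x):
    return [lam * xi for lam, xi in zip(LAMBDAS, x)]


def explicit_euler(x0, h, n_steps):
    """Explicit Euler on the gradient flow; identical to GD with learning rate h."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad_f(x)
        x = [xi - h * gi for xi, gi in zip(x, g)]
    return x


def implicit_euler(x0, h, n_steps):
    """Implicit Euler: x_new = x_old - h * grad f(x_new).
    For this diagonal quadratic the step has the closed form x_i / (1 + h * lam_i)."""
    x = list(x0)
    for _ in range(n_steps):
        x = [xi / (1.0 + h * lam) for lam, xi in zip(LAMBDAS, x)]
    return x


def norm(x):
    return sum(xi * xi for xi in x) ** 0.5


x0 = [1.0, 1.0]
small = norm(explicit_euler(x0, h=0.015, n_steps=2000))   # h < 2/100: converges
large = norm(explicit_euler(x0, h=0.03, n_steps=200))     # h > 2/100: blows up
implicit = norm(implicit_euler(x0, h=0.03, n_steps=200))  # same h, stays stable

print(small, large, implicit)
```

For this quadratic, explicit Euler is stable only when h < 2/lambda_max = 0.02, so h = 0.03 makes the iterates blow up while the implicit scheme with the same h still decays towards the minimum at the origin. Seen this way, learning-rate tweaking is a stability restriction of the time-stepping scheme.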
Standing on the yuge shoulders of Lyapunov and Poincaré leads to Gradient Flows:
https://icml.cc/media/Slides/icml/2019/hallb(12-16-00)-12-17-05-5119-deep_generative.pdf
Looks promising but I bow to the ML experts here.
I have done several small experiments (POC) based on the ODE approximation (SGD as an ODE, optimisation ODE, noisy data). It is much easier/more elegant (for me) A-Z than the many algos in Nocedal and Wright. At the least, it keeps the grey cells ticking, like in ye olde days here.
https://forum.wilmott.com/viewtopic.php?f=34&t=101662