> I would expect that there are very few minima because of the high dimension, but maybe my intuition is completely wrong...

It's got lots of minima!
I have the same feeling; the chance of having zero derivatives in all directions in high dimensions is nearly zero. There is also recent research on that (and also older work in physics): https://stats.stackexchange.com/questio ... lue-to-the
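To make the "zero derivatives in all directions" intuition concrete (this is my own toy illustration, in the spirit of the random-matrix arguments the link above points at, not something from the thread): if the Hessian at a random critical point behaved like a random symmetric matrix, the chance that all its eigenvalues are positive, i.e. that the critical point is a genuine local minimum rather than a saddle, collapses quickly as the dimension grows. A rough numpy sketch of that toy model:

```python
# Toy model, not from the thread: if the Hessian at a random critical point
# looked like a random symmetric (GOE-style) matrix, how often would the
# point be a true local minimum (all eigenvalues > 0) rather than a saddle?
import numpy as np

rng = np.random.default_rng(0)

def fraction_of_minima(n, trials=2000):
    """Fraction of random symmetric n x n matrices that are positive definite."""
    hits = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        hessian = (a + a.T) / 2.0                 # symmetrize
        if np.all(np.linalg.eigvalsh(hessian) > 0):
            hits += 1
    return hits / trials

for n in (1, 2, 3, 5, 8, 12):
    print(f"n = {n:2d}   fraction of 'minima': {fraction_of_minima(n):.4f}")
# The fraction is already essentially zero by n ~ 10: in this toy model almost
# every critical point is a saddle, which is the intuition behind the claim.
```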
Talk is cheap. Prove it.
>> Quantum annealers (e.g. D-wave) find global minima. What deep networks may do, unlike classical algorithms, is finding correlations of the minima positions (more complex than e.g. momentum methods) - it's a quantum algorithm's capability. Introducing memory might possibly help too...

> Some people say it does: "Our analysis suggests that the convergence issues may be fixed by endowing such algorithms with 'long-term memory' of past gradients"

Ergo, the method is not robust. (The criticism I heard of their fix is that it fixes convergence in some cases, but makes it worse in others.)
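Assuming the "long-term memory" being quoted is the usual max-of-second-moments fix (an assumption on my part; the thread doesn't name the paper or the variant), the whole idea is one line: keep a running maximum of the squared-gradient accumulator so the effective step size can never bounce back up after a large gradient has been seen. A rough numpy sketch, not anyone's reference implementation:

```python
# Rough sketch of the "long-term memory" idea (my reading, not the thread's
# code): an Adam-style update where the squared-gradient accumulator is
# replaced by its running maximum, so the denominator never shrinks.
import numpy as np

def adam_like_step(theta, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8,
                   long_term_memory=True):
    """One optimiser step; long_term_memory toggles the max accumulator."""
    m, v, vmax = state
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    if long_term_memory:
        vmax = np.maximum(vmax, v)        # never forget a large past gradient
        denom = np.sqrt(vmax) + eps
    else:
        denom = np.sqrt(v) + eps          # plain Adam-style: v may shrink again
    theta = theta - lr * m / denom        # (bias correction omitted for brevity)
    return theta, (m, v, vmax)

# Toy usage on the quadratic loss 0.5 * ||theta||^2, whose gradient is theta:
theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), np.zeros(2))
for _ in range(1000):
    theta, state = adam_like_step(theta, theta, state)
print(theta)   # ends up very close to the minimum at [0, 0]
```

The criticism mentioned above presumably reflects the trade-off visible here: the frozen denominator buys convergence guarantees, but it can also keep the step size permanently small where the plain version would have adapted back up.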
>>> tl;dr: we don't understand how deep networks learn

>> Ah, but does it matter? If the goal is science, then "yes". If the goal is practical solutions, then "no".

> It matters a lot! If you don't understand the mathematical foundations, you're groping around in the dark and progress is very slow. People who want to push AI forward are very keen on understanding the mathematical foundations of NN learning, because only then will we be able to create systems which can "learn to learn", and create AGI (artificial general intelligence).

We don't have true mathematical foundations in any science, only a best-fit-so-far set of theories expressed in math. Admittedly, those best-fit-so-far maths are amazingly accurate. But are they correct? The entire history of science is just a series of discarded "foundations", with no guarantees that today's math is correct.
>> Humans seem perfectly comfortable using their brains despite having no clue how they work.

> We do have some clues, and I wouldn't say that we are "perfectly comfortable" with the current state of knowledge about how ours (and other animals') brains work - we don't know how to treat depression and other mental disorders, we don't know how to optimally teach and train people, etc.

We know less about brains than we know about neural nets, which isn't surprising in that artificial neural nets are modeled on natural neural nets. Certainly there's no solid mathematical foundation for the brain. We don't know why we must sleep (which seems like an extremely maladaptive thing to do). We don't know how anesthetics and painkillers actually work. And what mathematical foundation predicts the placebo effect? And what the hell are all those glial cells and astrocytes doing?
You first need to prove that that function is a potential loss function of a deep neural network. I'm missing a lot of dimensions, for a start! Your NN is stuck in an off-topic dogma. Every method has issues you can find with a quick Google search in 80s textbooks: the question is to prove if those issues are applicable... and if you are capable of doing so!
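For reference, if "that function" means the training loss of a deep net, it has the standard form below (standard notation, nothing specific to this thread). The "lot of dimensions" point is that theta collects every weight and bias, so the surface lives in a space with a huge number of coordinates, not the one or two dimensions of textbook counterexamples:

```latex
L(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr),
\qquad \theta \in \mathbb{R}^{p},\quad p \text{ often in the millions,}
```

where \(f_\theta\) is the network, \(\ell\) a per-example loss, and \(\{(x_i, y_i)\}_{i=1}^{N}\) the finite training set.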
> Links?

Sure, my knowledge is mostly by osmosis.
So, I take it you cannot solve this problem using NN? I think it is a fundamental chasm between maths and engineering.
So, prove that deep NNs have lots of relevant local minima that make them act horribly. Then prove that the set of tools - like stochastic gradient descent, dropout, activation functions like SELU, etc. - causes deep NNs to act horribly. Also, really understand that we have high dimensions and a loss function spanned by a finite set of training examples! And that we want to generalize, not memorize.
This all reminds me of someone saying that B&S is all wrong. Exactly the same behaviour.
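For concreteness, here is what that "set of tools" typically looks like wired together: a small net with SELU activations and (alpha) dropout, trained by stochastic gradient descent on a finite batch of examples. A minimal sketch assuming PyTorch is available; the sizes, learning rate and random data are all made up for illustration:

```python
# Illustrative only: SGD + dropout + SELU on a finite, randomly generated
# training set. Assumes PyTorch; all sizes and hyperparameters are arbitrary.
import torch
from torch import nn

torch.manual_seed(0)

# A finite training set: 256 examples, 20 features, random binary labels.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.SELU(),                # self-normalising activation
    nn.AlphaDropout(p=0.1),   # dropout variant intended for SELU nets
    nn.Linear(64, 64),
    nn.SELU(),
    nn.AlphaDropout(p=0.1),
    nn.Linear(64, 2),
)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(20):
    perm = torch.randperm(X.shape[0])
    for i in range(0, X.shape[0], 32):           # stochastic: mini-batches
        idx = perm[i:i + 32]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])    # loss on a finite sample only
        loss.backward()
        opt.step()

model.eval()                                     # switch dropout off
with torch.no_grad():
    print("training loss:", loss_fn(model(X), y).item())
```

With random labels like these, a big enough net will happily drive the training loss down by memorizing, which is exactly the generalize-versus-memorize distinction the post insists on.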
What do you mean "you can't solve it with NN"? Can you at least specify the problem more clearly? Where does NN come in?
"...the source of all great mathematics is the special case, the concrete example. It is frequent in mathematics that every instance of a concept of seemingly great generality is, in essence, the same as a small and concrete special case."
-- Paul Halmos
> And what the hell is your point, T4A?

Math is not robust in the broader physical sense.
And that's where our roads diverge. Do you really believe what you have written?
Math may be extremely, even perfectly, provably robust within its closed-world domain of logic, but to the extent that there are any discrepancies between the selected math and the selected physical system, the results of all that math may be logically correct and practically wrong.
Yes I do believe that.
You don't know what you are saying about what maths is and what it is not.
> Math is not robust in the broader physical sense.

BS. A wee bit of humility is in order here, T4A.
Meshuggah