Cuchulainn wrote:
outrun wrote:
katastrofa wrote:
I would expect that there are very few minima because of the high dimension, but maybe my intuition is completely wrong...

I have the same feeling: the chance of having zero derivatives in all directions in high dimensions is nearly zero. There is also recent research on this (and older work in physics): https://stats.stackexchange.com/questio ... lue-to-the
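A rough numerical sketch of that intuition, assuming (as in the usual random-matrix heuristic) that the Hessian at a random critical point looks like a random symmetric Gaussian matrix; `frac_minima` is just an illustrative name, not anything from the linked thread. The fraction of such Hessians with all eigenvalues positive - i.e. the fraction of critical points that are minima rather than saddles - collapses fast as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_minima(d, trials=2000):
    # Sample random symmetric "Hessians" and count how often every
    # eigenvalue is positive, i.e. the critical point is a true minimum.
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((d, d))
        h = (a + a.T) / 2.0          # symmetrize: a random Hessian
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    return count / trials

for d in (1, 2, 4, 8):
    print(d, frac_minima(d))
```

Even at d = 8 the fraction is already essentially zero, and real networks have millions of dimensions. Of course this says nothing about the *actual* distribution of Hessians along an SGD trajectory; it only illustrates why "all second derivatives conspire to be positive" is a very special event in high dimensions.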

Talk is cheap. Prove it.

You first need to prove that that function is a potential loss function of a deep neural network. I'm missing a lot of dimensions, for a start! Your NN is stuck in an off-topic dogma. Every method has issues you can find with a quick Google search in 80s textbooks; the question is to prove whether those issues actually apply... and whether you are capable of doing so!

So, prove that deep NNs have lots of relevant local minima that make them perform horribly. Then prove that, even with the set of tools we use -stochastic gradient descent, dropout, activation functions like SELU, etc.- deep NNs still perform horribly. Also, really understand that we are in high dimensions, with a loss function spanned by a finite set of training examples, and that we want to generalize, not memorize.

This all reminds me of someone saying that B&S is all wrong. Exactly the same behaviour.