> I would expect that there are very few minima because of the high dimension, but maybe my intuition is completely wrong...

It's got lots of minima!
I have the same feeling; the chance of having zero derivatives in all directions in high dimensions is nearly zero. There is also recent research on that (and also older work in physics): https://stats.stackexchange.com/questio ... lue-to-the
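To make the "zero derivatives in all directions" intuition concrete (this is my own toy illustration, in the spirit of the random-matrix arguments the link above points at, not something from the thread): if the Hessian at a random critical point behaved like a random symmetric matrix, the chance that all its eigenvalues are positive, i.e. that the critical point is a genuine local minimum rather than a saddle, collapses quickly as the dimension grows. A rough numpy sketch of that toy model:

```python
# Toy model, not from the thread: if the Hessian at a random critical point
# looked like a random symmetric (GOE-style) matrix, how often would the
# point be a true local minimum (all eigenvalues > 0) rather than a saddle?
import numpy as np

rng = np.random.default_rng(0)

def fraction_of_minima(n, trials=2000):
    """Fraction of random symmetric n x n matrices that are positive definite."""
    hits = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        hessian = (a + a.T) / 2.0                 # symmetrize
        if np.all(np.linalg.eigvalsh(hessian) > 0):
            hits += 1
    return hits / trials

for n in (1, 2, 3, 5, 8, 12):
    print(f"n = {n:2d}   fraction of 'minima': {fraction_of_minima(n):.4f}")
# The fraction is already essentially zero by n ~ 10: in this toy model almost
# every critical point is a saddle, which is the intuition behind the claim.
```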
Talk is cheap. Prove it.
>> Quantum annealers (e.g. D-wave) find global minima. What deep networks may do, unlike classical algorithms, is finding correlations of the minima positions (more complex than e.g. momentum methods) - it's a quantum algorithm's capability. Introducing memory might possibly help too...

> Some people say it does: "Our analysis suggests that the convergence issues may be fixed by endowing such algorithms with 'long-term memory' of past gradients"

Ergo, the method is not robust. (The criticism I heard of their fix is that it fixes convergence in some cases, but makes it worse in others.)
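Assuming the "long-term memory" being quoted is the usual max-of-second-moments fix (an assumption on my part; the thread doesn't name the paper or the variant), the whole idea is one line: keep a running maximum of the squared-gradient accumulator so the effective step size can never bounce back up after a large gradient has been seen. A rough numpy sketch, not anyone's reference implementation:

```python
# Rough sketch of the "long-term memory" idea (my reading, not the thread's
# code): an Adam-style update where the squared-gradient accumulator is
# replaced by its running maximum, so the denominator never shrinks.
import numpy as np

def adam_like_step(theta, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8,
                   long_term_memory=True):
    """One optimiser step; long_term_memory toggles the max accumulator."""
    m, v, vmax = state
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    if long_term_memory:
        vmax = np.maximum(vmax, v)        # never forget a large past gradient
        denom = np.sqrt(vmax) + eps
    else:
        denom = np.sqrt(v) + eps          # plain Adam-style: v may shrink again
    theta = theta - lr * m / denom        # (bias correction omitted for brevity)
    return theta, (m, v, vmax)

# Toy usage on the quadratic loss 0.5 * ||theta||^2, whose gradient is theta:
theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), np.zeros(2))
for _ in range(1000):
    theta, state = adam_like_step(theta, theta, state)
print(theta)   # ends up very close to the minimum at [0, 0]
```

The criticism mentioned above presumably reflects the trade-off visible here: the frozen denominator buys convergence guarantees, but it can also keep the step size permanently small where the plain version would have adapted back up.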
>>> tl;dr: we don't understand how deep networks learn

>> Ah, but does it matter? If the goal is science, then "yes". If the goal is practical solutions, then "no".

> It matters a lot! If you don't understand the mathematical foundations, you're groping around in the dark and progress is very slow. People who want to push AI forward are very keen on understanding the mathematical foundations of NN learning, because only then will we be able to create systems which can "learn to learn", and create AGI (artificial general intelligence).

We don't have true mathematical foundations in any science, only a best-fit-so-far set of theories expressed in math. Admittedly, those best-fit-so-far maths are amazingly accurate. But are they correct? The entire history of science is just a series of discarded "foundations", with no guarantees that today's math is correct.
>> Humans seem perfectly comfortable using their brains despite having no clue how they work.

> We do have some clues, and I wouldn't say that we are "perfectly comfortable" with the current state of knowledge about how ours (and other animals') brains work - we don't know how to treat depression and other mental disorders, we don't know how to optimally teach and train people, etc.

We know less about brains than we know about neural nets, which isn't surprising in that artificial neural nets are modeled on natural neural nets. Certainly there's no solid mathematical foundation for the brain. We don't know why we must sleep (which seems like an extremely maladaptive thing to do). We don't know how anesthetics and painkillers actually work. And what mathematical foundation predicts the placebo effect? And what the hell are all those glial cells and astrocytes doing?
You first need to prove that that function is a potential loss function of a deep neural network. I'm missing a lot of dimensions, for a start! Your NN is stuck in an off-topic dogma. Every method has issues you can find with a quick Google search in 80s textbooks: the question is to prove if those issues are applicable... and if you are capable of doing so!
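For reference, if "that function" means the training loss of a deep net, it has the standard form below (standard notation, nothing specific to this thread). The "lot of dimensions" point is that theta collects every weight and bias, so the surface lives in a space with a huge number of coordinates, not the one or two dimensions of textbook counterexamples:

```latex
L(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr),
\qquad \theta \in \mathbb{R}^{p},\quad p \text{ often in the millions,}
```

where \(f_\theta\) is the network, \(\ell\) a per-example loss, and \(\{(x_i, y_i)\}_{i=1}^{N}\) the finite training set.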
> Links?

Sure, my knowledge is mostly by osmosis.
So, I take it you cannot solve this problem using NN? I think it is a fundamental chasm between maths and engineering.
So, prove that deep NNs have lots of relevant local minima that make them act horribly. Then prove that the set of tools - like stochastic gradient descent, dropout, activation functions like SELU, etc. - causes deep NNs to act horribly. Also, really understand that we have high dimensions and a loss function spanned by a finite set of training examples! And that we want to generalize, not memorize.
This all reminds me of someone saying that B&S is all wrong. Exactly the same behaviour.
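For concreteness, here is what that "set of tools" typically looks like wired together: a small net with SELU activations and (alpha) dropout, trained by stochastic gradient descent on a finite batch of examples. A minimal sketch assuming PyTorch is available; the sizes, learning rate and random data are all made up for illustration:

```python
# Illustrative only: SGD + dropout + SELU on a finite, randomly generated
# training set. Assumes PyTorch; all sizes and hyperparameters are arbitrary.
import torch
from torch import nn

torch.manual_seed(0)

# A finite training set: 256 examples, 20 features, random binary labels.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.SELU(),                # self-normalising activation
    nn.AlphaDropout(p=0.1),   # dropout variant intended for SELU nets
    nn.Linear(64, 64),
    nn.SELU(),
    nn.AlphaDropout(p=0.1),
    nn.Linear(64, 2),
)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(20):
    perm = torch.randperm(X.shape[0])
    for i in range(0, X.shape[0], 32):           # stochastic: mini-batches
        idx = perm[i:i + 32]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])    # loss on a finite sample only
        loss.backward()
        opt.step()

model.eval()                                     # switch dropout off
with torch.no_grad():
    print("training loss:", loss_fn(model(X), y).item())
```

With random labels like these, a big enough net will happily drive the training loss down by memorizing, which is exactly the generalize-versus-memorize distinction the post insists on.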
What do you mean "you can't solve it with NN"? Can you at least specify the problem more clearly? Where does NN come in?
"...the source of all great mathematics is the special case, the concrete example. It is frequent in mathematics that every instance of a concept of seemingly great generality is, in essence, the same as a small and concrete special case."
-- Paul Halmos
> And what the hell is your point, T4A?

Math is not robust in the broader physical sense.
And that's where our roads diverge. Do you really believe what you have written?
Math may be extremely, even perfectly, provably robust within its closed-world domain of logic, but to the extent that there are any discrepancies between the selected math and the selected physical system, the results of all that math may be logically correct and practically wrong.
Yes I do believe that.
You don't know what you are saying about what maths is and what it is not.
> Math is not robust in the broader physical sense.

BS. A wee bit of humility is in order here, T4A.
Meshuggah