Time for capsules!

Technical papers:

https://arxiv.org/abs/1710.09829

https://openreview.net/pdf?id=HJWLfGWRb

What is really impressive is that the new architecture is much more robust to adversarial examples - a bane of deep learning.
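At the heart of the first paper are a "squash" non-linearity and routing-by-agreement between capsule layers. A minimal numpy sketch of those two mechanics (the shapes, random data, and iteration count here are illustrative, not the paper's actual configuration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity: keeps a vector's orientation,
    # maps its length into [0, 1) so length can encode probability.
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

def route(u_hat, n_iters=3):
    # u_hat: prediction vectors from lower capsules, shape (n_in, n_out, dim).
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs (softmax over outputs)
        s = (c[:, :, None] * u_hat).sum(axis=0)               # weighted sum of predictions
        v = squash(s)                                         # output capsule vectors
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)          # reward agreement
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 8))   # 6 input capsules, 3 output capsules, dim 8
v = route(u_hat)
print(v.shape)                       # (3, 8)
print(np.linalg.norm(v, axis=-1))    # all lengths below 1, thanks to squash
```

The iterative agreement step is what replaces plain max-pooling: lower capsules send their output to the higher capsule whose current output they agree with.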

- Cuchulainn

Prof. Rowat has some remarks:

http://www.socscistaff.bham.ac.uk/rowat ... -Rowat.pdf

These adversarial examples probably give gradient-based methods a hard time because the problems are non-convex and nonlinear, and possibly even non-differentiable.

If a single bit changes a dog into a cat, then maybe generalised metrics and metric spaces should be used. Topology?

If you used a more robust metric, then a one-bit change would still give a dog, no?

I suppose it's all part of the scientific process, continuous improvement?

My 2 cents.
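The worry about metrics can be made concrete: under an averaged metric a single-pixel ("one bit") change is nearly invisible, while under a worst-case metric it is maximal, so what counts as a "small" perturbation depends entirely on the metric chosen. A toy sketch on a made-up blank image:

```python
import numpy as np

img = np.zeros((28, 28))            # toy blank "image"
perturbed = img.copy()
perturbed[14, 14] = 1.0             # change a single pixel ("one bit")

diff = perturbed - img
rms = np.sqrt(np.mean(diff ** 2))   # root-mean-square (normalised L2) metric
linf = np.max(np.abs(diff))         # Chebyshev (L-infinity) metric

print(round(rms, 4))                # 0.0357 -- nearly invisible on average
print(linf)                         # 1.0    -- maximal in the worst-case metric
```

Adversarial attacks typically bound the L-infinity distance precisely because it is the metric under which the human eye sees "no change" while the classifier flips.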

Classifiers segment space; there will always be points close to the border that need just a very small nudge to pass it. That's actually the algorithm behind finding adversarial cases.

outrun wrote: Classifiers segment space; there will always be points close to the border that need just a very small nudge to pass it. That's actually the algorithm behind finding adversarial cases.

But the point about adversarial examples in deep learning is that you can find an adversarial example from almost any starting point, not just a few near a "border". There are algorithms which do it (two are referenced in the capsule paper). The reason is not that deep networks are classifiers, but that the probability distribution they learn is too concentrated on the examples they see.

Your example above is hand-crafted. You can't take a random photo of a duck, modify it a bit, and turn it into a photo an average person will classify as a rabbit. (BTW, it is not a good duck picture, because ducks don't have beaks like that.)
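One widely used attack of this kind is the fast gradient sign method: step the input along the sign of the loss gradient, bounded in L-infinity. A self-contained toy sketch on a made-up logistic classifier (the weights, input, and epsilon are all illustrative, not from either paper):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=784)            # weights of a toy logistic classifier
x = rng.normal(size=784)            # an input "image"
y = 1.0                             # its true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss w.r.t. the INPUT (not the weights):
# for logistic regression, d(loss)/dx = (sigmoid(w.x) - y) * w.
grad_x = (sigmoid(w @ x) - y) * w

# FGSM: one step in the direction that increases the loss,
# with every coordinate nudged by at most eps.
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

print(sigmoid(w @ x))       # model's confidence on the clean input
print(sigmoid(w @ x_adv))   # strictly lower confidence after the bounded nudge
```

Note the point made above: this works from (almost) any starting input, because every coordinate of the gradient contributes a little to moving the loss, not just inputs sitting near a decision border.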

Cuchulainn wrote:

Indeed. All these things are well-known in the community.

I recommend this paper: https://arxiv.org/pdf/1611.03530.pdf

tl;dr: we don't understand how deep networks learn

- Traden4Alpha

ISayMoo wrote: tl;dr: we don't understand how deep networks learn

Ah, but does it matter? If the goal is science, then "yes". If the goal is practical solutions, then "no".

Humans seem perfectly comfortable using their brains despite having no clue how they work.

- Cuchulainn

Traden4Alpha wrote: ISayMoo wrote: tl;dr: we don't understand how deep networks learn

Ah, but does it matter? If the goal is science, then "yes". If the goal is practical solutions, then "no".

Humans seem perfectly comfortable using their brains despite having no clue how they work.

This is not wrong. But it does not address the core issue.

Even for practical methods, you need to know what the underlying principles are, I suppose. It avoids nasty surprises. Even Hinton is reviewing his backpropagation algorithm.

I would prefer to meet a panda than a gibbon in a dark alleyway on a Saturday night. Improving in the next evolutionary version is too late.

Are all sheep in Scotland black or is that a sheep in the field, one of whose sides is black?

Seriously, you need to know for which class of problems a given method is suitable. The original Perceptron broke down because the corresponding mathematical/minimisation problem did not have a solution: it cannot separate classes that are not linearly separable, such as XOR. Seems to me that this should be the first problem to address.
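That breakdown is easy to reproduce: XOR is not linearly separable, so the classical perceptron learning rule never reaches zero errors, no matter how long it runs. A toy sketch:

```python
import numpy as np

# XOR: no straight line separates the two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w = np.zeros(2)
b = 0.0
for epoch in range(1000):          # far more epochs than AND/OR would need
    errors = 0
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        if pred != yi:             # classic perceptron update rule
            w += (yi - pred) * xi
            b += (yi - pred)
            errors += 1
    if errors == 0:
        break

print(errors)   # never reaches 0: the minimisation problem has no solution
```

Replace `y` with the labels for AND or OR and the same loop converges in a handful of epochs, which is exactly the "know the class of problems your method suits" point.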

- Traden4Alpha
**Posts:**23951**Joined:**

Agreed! Your sentiment is the basis for all of science and engineering. Yes, we certainly should try to determine how things work so that we can predict which operating conditions lead to performance or failure.

The question is how. Can we use deductive methods to logically prove the system properties or must we use inductive methods to empirically assess performance and then use interpolation (sometimes safe) or extrapolation (often dangerous) to predict performance under new conditions?

For simple math and simple code, deductive analysis of the system can mathematically prove the properties of the system. For more complex things (human brains and neural networks) deduction may be intractable. (It may even be true that any deductive system capable of proving the properties of a complex system would, itself, be so complex that we'd not know how that deductive system works, and thus not trust the deductive system's assessment of the complex system.)

Your example of the panda, gibbon, and alley illustrates this nicely. Technically, we really don't know how pandas, gibbons, and dark alleys work in any mathematical sense -- it's all empirical knowledge. Moreover, we know the empirical properties of pandas, gibbons, and dark alleys independently of each other -- having no data on those specific animals in those specific locations. Predicting whether a gibbon in a dark alley or a panda in a dark alley is actually more dangerous calls for extrapolation. Extrapolation is dangerous, and yet we do it because there's no alternative.

The stickier issue is: are there things that perform as well as back-propagation for learning-related tasks for which we do know how they work? If we don't use back-propagation, what do we use that is provably better? (BTW: human brains are demonstrably worse on both deductive and inductive dimensions: we don't know how they work and they have worse empirical performance.)
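The interpolation-safe / extrapolation-dangerous asymmetry is easy to demonstrate with made-up data: fit a flexible model on [0, 1], then query it inside and far outside that range:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size)  # noisy "truth"

coeffs = np.polyfit(x, y, deg=9)      # a flexible model fit on [0, 1]
model = np.poly1d(coeffs)

x_in = 0.5                            # interpolation: inside the data range
x_out = 2.0                           # extrapolation: far outside it

err_in = abs(model(x_in) - np.sin(2 * np.pi * x_in))
err_out = abs(model(x_out) - np.sin(2 * np.pi * x_out))

print(err_in)    # small: the fit tracks the truth between the data points
print(err_out)   # huge: the degree-9 polynomial blows up outside [0, 1]
```

The empirical model looks excellent under interpolation and fails silently under extrapolation, which is the panda-in-the-alley situation in miniature.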

An NN + a loss function is a high-dimensional function. Back propagation is a supervised calibration method. Replacing BP with something else doesn't change these issues. These models are indeed empirically assessed, but is that any different from a table in a QF magazine illustrating the error in a small set of option prices for some numerical method?
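That "NN + loss = high-dimensional function, BP = calibration" view is literally the chain rule plus gradient descent. A one-weight toy sketch (the numbers are illustrative, not from any post above):

```python
import numpy as np

# One-neuron "network": f(x) = tanh(w * x); loss = (f(x) - target)^2.
x, target = 0.7, 0.5
w = 2.0                                # initial weight

lr = 0.5
for step in range(200):
    h = np.tanh(w * x)                 # forward pass
    loss = (h - target) ** 2
    # Backward pass (chain rule): dloss/dw = 2(h - t) * (1 - h^2) * x.
    grad = 2.0 * (h - target) * (1.0 - h ** 2) * x
    w -= lr * grad                     # gradient-descent calibration step

print(loss)    # near zero
print(w)       # close to the exact answer atanh(0.5) / 0.7
```

Backpropagation in a real network is the same computation pushed through many layers; the empirical-assessment question is then exactly the option-price-table question, but in a million dimensions instead of one.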

outrun wrote: An NN + a loss function is a high-dimensional function. Back propagation is a supervised calibration method. Replacing BP with something else doesn't change these issues. These models are indeed empirically assessed, but is that any different from a table in a QF magazine illustrating the error in a small set of option prices for some numerical method?

I sincerely hope that the AI community is able to reach beyond the level of QF magazines.

Traden4Alpha wrote: ISayMoo wrote: tl;dr: we don't understand how deep networks learn

Ah, but does it matter? If the goal is science, then "yes". If the goal is practical solutions, then "no".

It matters a lot! If you don't understand the mathematical foundations, you're groping around in the dark and progress is very slow. People who want to push AI forward are very keen on understanding the mathematical foundations of NN learning, because only then will we be able to create systems which can "learn to learn", and create AGI (artificial general intelligence).

Humans seem perfectly comfortable using their brains despite having no clue how they work.

We do have some clues, and I wouldn't say that we are "perfectly comfortable" with the current state of knowledge about how our (and other animals') brains work - we don't know how to treat depression and other mental disorders, we don't know how to optimally teach and train people, etc.

I just got out of bed and almost mistook our cat for my black adidas sneaker with white stripes. I was trying to grab it and put it on, but then it hissed and ran away.

That's why GANs have two NNs: me and the cat critic. Things sort themselves out.

However, what if I mistake it the other way around next time? Give food and pats to my sneakers? Nothing would happen. Who would correct me?

- Cuchulainn

As ISayMoo says, we are a long way away from modelling the human brain. So maybe it's time to temper expectations. Even Hinton says something new is needed. Gradient descent is kind of basic.

From a mathematical viewpoint, the foundations of NN look fairly basic IMO. A new kind of maths is probably needed.

Regarding models, what is needed in NN is the equivalent of the Navier-Stokes PDE.

- Cuchulainn

Too binary: "NN good, deductive bad"? We are being pushed into a corner.

Caveat: I am an NN application noobie, but that doesn't mean I can't ask questions, especially if there is a bit of maths in there.