
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 7th, 2018, 7:49 pm

The 1st article is very old (from 2007). A lot of the stuff in it is out of date or applies to things people are no longer really interested in.

The 2nd article (guidelines) is kind of "Captain Obvious speaking" paper. The author's right about everything (or maybe almost everything, but I didn't have the time to go nitpicking), but it's rather well-known stuff.
Fair enough. So things have progressed in leaps and bounds in the period 2007-2017?
2007 is not very old. BP is at least 50 years old. 
Yes, things have progressed quite a bit since 2007.
 
User avatar
Cuchulainn
Posts: 60518
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

June 7th, 2018, 8:13 pm

The 1st article is very old (from 2007). A lot of the stuff in it is out of date or applies to things people are no longer really interested in.

The 2nd article (guidelines) is kind of "Captain Obvious speaking" paper. The author's right about everything (or maybe almost everything, but I didn't have the time to go nitpicking), but it's rather well-known stuff.
Fair enough. So things have progressed in leaps and bounds in the period 2007-2017?
2007 is not very old. BP is at least 50 years old. 
Yes, things have progressed quite a bit since 2007.
You're beginning to sound like a politician. And I am not joking.
http://www.datasimfinancial.com
http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..
R. van Gulik
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 7th, 2018, 10:36 pm

I think the problem is at the receiving end ;-)

Honestly, the 1st paper wouldn't have been very up to date even in 2007. It fails to mention things like LSTMs or CNNs, which were already present then. It only discusses feed-forward networks with a single layer and mentions larger architectures only in passing.

This doesn't mean that it doesn't raise valid points. People often neglect to describe the training procedure exactly, or fail to mention in their papers the tricks which have to be performed to make the method work (the very existence of "tricks" is a problem, of course...)

It's a field which is developing. A physical analogy would be quantum mechanics before the mathematicians came and sorted out the foundations.
 
User avatar
Cuchulainn
Posts: 60518
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

June 8th, 2018, 12:20 pm

Thank you. Clear.
http://www.datasimfinancial.com
http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..
R. van Gulik
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 9th, 2018, 7:19 am

You should like this paper: Do CIFAR-10 Classifiers Generalize to CIFAR-10?

(tl;dr: Deep Learning image classifiers break under even a minor shift in the data distribution.)
 
User avatar
katastrofa
Posts: 8527
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

June 9th, 2018, 11:08 am

You should like this paper: Do CIFAR-10 Classifiers Generalize to CIFAR-10?

(tl;dr: Deep Learning image classifiers break under even a minor shift in the data distribution.)
Could it be overcome by introducing a translation vector as an extra training parameter?
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: If you are bored with Deep Networks

June 9th, 2018, 3:19 pm

You should like this paper: Do CIFAR-10 Classifiers Generalize to CIFAR-10?

(tl;dr: Deep Learning image classifiers break under even a minor shift in the data distribution.)
Could it be overcome by introducing a translation vector as an extra training parameter?
That's a popular technique called "data augmentation".

In general (no matter what technique you use) there is of course always an issue with image classification: you have a relatively small set of samples in a very high-dimensional space. How do you generalize?

My intuition is that techniques borrowed from random forests and support vector machines (and many GANs) will improve the robustness of generalization.

I've also read that the new test samples generated in that paper are easy for a human to classify as different from the original CIFAR-10 training samples. If so, what does that say? IMO it's actually good that the models perform worse on the new samples, and at the same time it means the original samples weren't representative of the task you want the model to perform.

It's still a good point that the paper is making, I like it.
 
User avatar
katastrofa
Posts: 8527
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: If you are bored with Deep Networks

June 9th, 2018, 4:15 pm

More precisely: at the training stage, I would draw a minibatch of pictures and transform each of them by translating it by a vector in the plane and rotating it. I would calculate the loss for each transformation - a 3D function of the shift and rotation angle - and look for its minimum. It cannot be just any minimum though - it needs to be a smooth bump rather than a sharp peak. The procedure sounds cumbersome, but there are many statistical methods which would accelerate it. There's a risk that the network would start to confuse objects which, for example, look similar to different objects when rotated, but there's a chance that the network would reach for finer differences to successfully train itself.

Apart from translation and rotation, one could consider other transformations, such as asymmetric scaling to manipulate the perspective. This would be much more challenging, but could potentially teach the network to recognise the same object seen at different angles.

Anyway, I would try this out myself but I don't have sufficient computational power :-( (If I had, I would use it for something more interesting anyway :-P)
Last edited by katastrofa on June 9th, 2018, 4:49 pm, edited 1 time in total.
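
A rough way to prototype the idea (an untested sketch: model_loss is a placeholder for the per-example loss of whatever network is being trained, and images are assumed to be height x width x channel arrays):

Code:
import itertools
import numpy as np
from scipy.ndimage import shift, rotate

def smoothed_min_loss(image, label, model_loss, k=3):
    # scan a small grid of translations (pixels) and rotations (degrees)
    shifts = [-2, 0, 2]
    angles = [-10, 0, 10]
    losses = []
    for dx, dy, angle in itertools.product(shifts, shifts, angles):
        img = shift(image, (dy, dx, 0), order=1, mode="nearest")   # translate in the plane
        img = rotate(img, angle, axes=(0, 1), reshape=False, order=1, mode="nearest")
        losses.append(model_loss(img, label))
    # "smooth bump, not a sharp peak": average the k best losses instead of taking the single minimum
    return np.mean(np.sort(np.asarray(losses))[:k])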
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 9th, 2018, 4:47 pm

You should like this paper: Do CIFAR-10 Classifiers Generalize to CIFAR-10?

(tl;dr: Deep Learning image classifiers break under even a minor shift in the data distribution.)
Could it be overcome by introducing a translation vector as an extra training parameter?
That's a popular technique called "data augmentation".
Don't believe this Medium post, they're extremely naive. Training procedures for models like VGG (trained on ImageNet) do all that (cropping, reflections, etc.) and they still overfit. Do you know what's in the ImageNet dataset? Over 10% of it is dog photos, and less than half of that is cats - while everyone knows that most of the Internet is cat pics!

Convnets are simply too simplistic a tool to solve image recognition, whatever data augmentation you do. They're not even - contrary to popular opinion - translationally invariant (because they downsample).
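
A toy 1D illustration of the downsampling point, in plain numpy (purely illustrative):

Code:
import numpy as np

def conv_pool(x, kernel):
    feat = np.convolve(x, kernel, mode="valid")   # a single "convolutional" filter
    return feat[::2]                              # stride-2 downsampling

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
kernel = np.array([1.0, -1.0, 1.0])

y0 = conv_pool(x, kernel)
y1 = conv_pool(np.roll(x, 1), kernel)             # the same signal shifted by one sample

# because of the downsampling, no shift of y0 reproduces y1
print(any(np.allclose(np.roll(y0, s), y1) for s in range(len(y0))))   # prints False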

In general (no matter what technique you use) there is of course always an issue with image classification: you have a relatively small set of samples in a very high-dimensional space. How do you generalize?

My intuition is that techniques borrowed from random forests and support vector machines (and many GANs) will improve the robustness of generalization.

I've also read that the new test samples generated in that paper are easy for a human to classify as different from the original CIFAR-10 training samples. If so, what does that say? IMO it's actually good that the models perform worse on the new samples, and at the same time it means the original samples weren't representative of the task you want the model to perform.
CIFAR-10 is not representative of anything. Its only merit is that it's small and models train fast on it.

It's still a good point that the paper is making, I like it.
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 9th, 2018, 5:04 pm

More precisely: at the training stage, I would draw a minibatch of pictures and transform each of them by translating it by a vector in the plane and rotating it. I would calculate the loss for each transformation - a 3D function of the shift and rotation angle - and look for its minimum. It cannot be just any minimum though - it needs to be a smooth bump rather than a sharp peak. The procedure sounds cumbersome, but there are many statistical methods which would accelerate it. There's a risk that the network would start to confuse objects which, for example, look similar to different objects when rotated, but there's a chance that the network would reach for finer differences to successfully train itself.

Apart from translation and rotation, one could consider other transformations, such as asymmetric scaling to manipulate the perspective. This would be much more challenging, but could potentially teach the network to recognise the same object seen at different angles.

Anyway, I would try this out myself but I don't have sufficient computational power :-( (If I had, I would use it for something more interesting anyway :-P)
I like this approach. It's ambitious and, to my intuition, looks expensive computationally, but somehow aligns better with how animals learn visual recognition. We learn from video streams, not static pictures.
 
User avatar
Cuchulainn
Posts: 60518
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

June 10th, 2018, 9:41 am

You should like this paper: Do CIFAR-10 Classifiers Generalize to CIFAR-10?

(tl;dr: Deep Learning image classifiers break under even a minor shift in the data distribution.)
Could it be overcome by introducing a translation vector as an extra training parameter?
This probably means that the problem is "ill-posed" in some way. Adversarial examples are probably a good illustration: adding a very small increment to the image results in it being classified as a gibbon instead of a panda. In numerical analysis this would mean that the maths that processes the image is not stable. Maybe it's different with ML, but there is something not kosher. Correct me if I am wrong (gut feeling says the metric used to compute distance is not robust enough).
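
(For reference, the panda/gibbon effect is usually produced with the fast gradient sign method; a minimal PyTorch sketch, assuming model is any differentiable classifier, image is a (1, 3, H, W) tensor with values in [0, 1], and label is a length-1 tensor of class indices:)

Code:
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.007):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # nudge every pixel by +/- epsilon in the direction that increases the loss
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()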

On {over,under}fitting, it seems they are introduced in the same breath as global polynomials, scary beasts at the best of times. The fix seems to be regularisation (aka penalty, Lagrange multipliers), which is a Pandora's box of optimisation and opens a new can of worms, e.g. which one to use?

In Géron's book (page 123) he talks about a 300-degree polynomial regression model. Is that serious? No one in numerical analysis does this.
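
A small scikit-learn sketch of the issue (degree 30 instead of 300 just to keep it quick; the data is made up):

Code:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
def sample(n):
    x = np.sort(rng.uniform(-1, 1, n)).reshape(-1, 1)
    return x, np.sin(3 * x).ravel() + 0.1 * rng.standard_normal(n)

x_train, y_train = sample(30)
x_test, y_test = sample(200)

for name, model in [
    ("degree 30, no penalty", make_pipeline(PolynomialFeatures(30), StandardScaler(), LinearRegression())),
    ("degree 30, ridge", make_pipeline(PolynomialFeatures(30), StandardScaler(), Ridge(alpha=1.0))),
]:
    model.fit(x_train, y_train)
    # the unpenalised fit typically drives the train error towards zero while the test error is far worse;
    # ridge trades a little train error for a much better test error
    print(name,
          "train MSE:", mean_squared_error(y_train, model.predict(x_train)),
          "test MSE:", mean_squared_error(y_test, model.predict(x_test)))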

// finding the 'closeness' of 2 data distributions? Which (semi-)metric is used??
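
One concrete option for that question: for one-dimensional samples, the empirical Wasserstein distance and the Kullback-Leibler divergence are common choices. A sketch with made-up samples:

Code:
import numpy as np
from scipy.stats import wasserstein_distance, entropy

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, 10000)     # "training" distribution
q = rng.normal(0.3, 1.0, 10000)     # slightly shifted "test" distribution

print("Wasserstein-1:", wasserstein_distance(p, q))

bins = np.linspace(-5, 5, 51)
p_hist, _ = np.histogram(p, bins=bins, density=True)
q_hist, _ = np.histogram(q, bins=bins, density=True)
# KL is a divergence (asymmetric), so strictly speaking not a metric
print("KL(p || q):", entropy(p_hist + 1e-12, q_hist + 1e-12))
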
http://www.datasimfinancial.com
http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..
R. van Gulik
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 10th, 2018, 12:38 pm

Adversarial examples also exist for "old school" classifiers such as SVMs; it's not a problem only with deep learning.
 
User avatar
outrun
Posts: 4573
Joined: April 29th, 2016, 1:40 pm

Re: If you are bored with Deep Networks

June 10th, 2018, 3:15 pm

Data augmentation is adding a priori knowledge by extending your dataset. If you have a labelled picture of a cat, then you know that you can shift it one pixel to the right and have another cat image (but if it was a picture of a QR code pointing to a Spanish domain, then not!). Giving the model more samples created by leveraging known invariants during training helps it generalize: you teach the model about those invariants through samples.

If I trained some model to distinguish cats from dogs based on a small set of pictures, and then later revealed that the task was actually about black vs brown objects instead of cats vs dogs, there is no way the model could have known which interpretation was intended from the labelled training data alone. It could have learned either interpretation of the objective; both would fit the training data. Even worse, the cat pictures might have been taken with a flash, and it might have learned to detect flashes. The model is only as good as the data you provide, and it needs a lot of data (or a priori knowledge) to be able to correctly attribute labels to features and generalize well. These are general statistical / information problems, not specific to deep learning.

There are of course other ways to add a priori knowledge than creating extra samples. E.g. CNNs assume that the pixels have some 2D layout and that you should use hierarchical features; that's also a form of adding a priori knowledge to your model. In the 90s there were experiments with rotation, scale and shift invariants.
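
A minimal numpy sketch of the one-pixel-shift idea (shapes and the augmentation policy are made up; real pipelines usually do this on the fly in the data loader):

Code:
import numpy as np

def augment(image, label, rng, n_copies=4, max_shift=2):
    """Yield the original (image, label) pair plus randomly shifted/flipped copies."""
    yield image, label
    for _ in range(n_copies):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        aug = np.roll(image, (dy, dx), axis=(0, 1))   # small translation
        if rng.random() < 0.5:
            aug = aug[:, ::-1]                        # horizontal flip - only valid if the label is flip-invariant!
        yield aug, label

rng = np.random.default_rng(0)
cat = rng.random((32, 32, 3))                         # stand-in for a CIFAR-sized cat picture
print(len(list(augment(cat, "cat", rng))), "training samples from one labelled image")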
 
User avatar
ISayMoo
Topic Author
Posts: 2210
Joined: September 30th, 2015, 8:30 pm

Re: If you are bored with Deep Networks

June 10th, 2018, 5:52 pm

These are general statistical / information problems, not specific to deep learning.
What is specific to deep learning is the rejection of prior knowledge in your models, relying only, or almost only, on the training data as a source of knowledge. The bias embedded in CNNs is not strong, and it is only loosely related to the task they are being used for (as evidenced by the fact that CNNs trained on ImageNet have been successfully applied to e.g. audio data).
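
A sketch of that kind of transfer, using torchvision (the spectrogram batch is a made-up placeholder; only the pretrained-backbone recipe is standard practice):

Code:
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(pretrained=True)              # ImageNet weights (older torchvision API)
for p in backbone.parameters():
    p.requires_grad = False                               # freeze the ImageNet-learned features
backbone.fc = nn.Linear(backbone.fc.in_features, 10)      # new head for, say, 10 audio classes

spectrograms = torch.randn(8, 3, 224, 224)                # stand-in for a batch of log-mel spectrogram "images"
print(backbone(spectrograms).shape)                        # torch.Size([8, 10])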
 
User avatar
Cuchulainn
Posts: 60518
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: If you are bored with Deep Networks

June 10th, 2018, 6:48 pm

Adversarial examples also exist for "old school" classifiers such as SVMs; it's not a problem only with deep learning.
Those methods seem to have some commonality.
http://www.datasimfinancial.com
http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..
R. van Gulik