Serving the Quantitative Finance Community

Cuchulainn
Topic Author
Posts: 64439
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Machine Learning: Frequency asked Questions

Who needs maths anyways; I remember this infamous post here

My deeper point was that Cauchy sequences may be an approximation for some things in the real world (as long as one does not look too far ahead in time or too closely at tiny epsilons), but Cauchy sequences don't actually occur in the real world. The pure, proven properties of Cauchy sequences exist only in the minds of mathematicians. Engineers and physicists might use Cauchy sequences as an approximation but must always be aware that they do not actually exist in the physical world.
I prefer not to know who wrote this post
No need for math today? Shouldn't we reply with just a kids' song?
"Little pig, little pig, let me come in."
"No, no, by the hair on my chinny chin chin."
"Then I'll huff, and I'll puff, and I'll blow your house in."
More like fighting off Morlocks.
viewtopic.php?f=34&t=101293&p=826665&hi ... hy#p826665
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

Cuchulainn
Topic Author
Posts: 64439
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Machine Learning: Frequency asked Questions

Another gem.

You can't use it (Cauchy sequence) to say something about "approximation to an unknown function".

Unknown means that you assume no prior and also that you need to sample it, and samples don't give any guarantee about future samples. In the real world you are in the realm of sampling and statistics. You don't know how the function behaves between samples (interpolation), you don't know if repeated samples will give the same outcome (stationarity: the functions you approximate in RL are, e.g., non-stationary), and there is the finite resolution of samples: measurement and representation error.

However, the main point is that you can't generalize from a finite set of samples and give guarantees.
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

katastrofa
Posts: 10082
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Machine Learning: Frequency asked Questions

BTW, the heatmap is a sensitivity analysis of the output (i.e. the probability distribution over the classes) with respect to the activations of the last convolutional layer. In simple words, it shows where the model sees those strange things - indeed, in my cat!
Strangely, lots of objects are classified as a dishwasher. The classifications are hardly ever correct - in most cases they are nonsense.
I feel kind of sorry for all those people spending their lives tweaking those models to beat some benchmarks. Some of them might have had a chance to be real scientists and do something valuable. Not to mention the wasted computational resources.
@Katastrofa, first of all congratulations, your cat is very cute.
Concerning classification algorithms, my experience is that it can be quite difficult to build efficient ones, particularly if you wish to classify among a large number of categories. Moreover, to be honest, classification algorithms based on neural networks might not be very efficient for that task.
Thanks. She's very bright too
What methods would you recommend as better than NNs for classification?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

BTW, the heatmap is a sensitivity analysis of the output (i.e. the probability distribution over the classes) with respect to the activations of the last convolutional layer. In simple words, it shows where the model sees those strange things - indeed, in my cat!
Strangely, lots of objects are classified as a dishwasher. The classifications are hardly ever correct - in most cases they are nonsense.
I feel kind of sorry for all those people spending their lives tweaking those models to beat some benchmarks. Some of them might have had a chance to be real scientists and do something valuable. Not to mention the wasted computational resources.
@Katastrofa, first of all congratulations, your cat is very cute.
Concerning classification algorithms, my experience is that it can be quite difficult to build efficient ones, particularly if you wish to classify among a large number of categories. Moreover, to be honest, classification algorithms based on neural networks might not be very efficient for that task.
Thanks. She's very bright too
What methods would you recommend as better than NNs for classification?
I am using kernel methods (support vector machines - SVMs). I have recorded better results with them than with neural networks on academic ML problems such as MNIST, and on all the Kaggle tests I have made so far - mainly classification / prediction problems. This is quite explainable: SVMs can describe any neural network, but the converse is false.
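As a sketch of the kind of comparison described above (my own illustration, not the poster's actual benchmark: the dataset, split, and hyperparameters here are assumptions), an RBF-kernel SVM on scikit-learn's small MNIST-like digits dataset takes only a few lines:

```python
# Sketch: RBF-kernel SVM on a small MNIST-like dataset (scikit-learn's digits).
# Dataset, split and hyperparameters are illustrative, not the poster's setup.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)             # 1797 8x8 images, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=10.0)  # a kernel method, as discussed
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                    # typically well above 0.9
```

On this toy problem the kernel machine is competitive with small networks out of the box, which is consistent with the claim above, though MNIST proper is a much larger dataset.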

Cuchulainn
Topic Author
Posts: 64439
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Machine Learning: Frequency asked Questions

Moved it to "Monotone".
Last edited by Cuchulainn on October 7th, 2020, 7:29 am, edited 7 times in total.
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

The scheme above

$u_i^{n+1}=u_i^{n} -\frac{dt}{dx}(u_{i+1}^{n} - u_{i}^{n})$

is unconditionally unstable,
I am happy not to be the only one who mixes up channels

Cuchulainn
Topic Author
Posts: 64439
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

### Re: Machine Learning: Frequency asked Questions

The scheme above

$u_i^{n+1}=u_i^{n} -\frac{dt}{dx}(u_{i+1}^{n} - u_{i}^{n})$

is unconditionally unstable,
I am happy not to be the only one who mixes up channels
You have set a precedent and some of it is rubbing off. Anyway, it is still an open question.
BTW how's that monotone CN coming along?

https://en.wikipedia.org/wiki/Upwind_scheme
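A minimal numpy sketch (my own illustration, not from the thread; grid size and CFL number are arbitrary) of why the quoted forward-difference scheme for $u_t + u_x = 0$ blows up while the upwind variant stays bounded:

```python
import numpy as np

# Advection u_t + u_x = 0 on a periodic grid; lam = dt/dx = 0.5 (CFL number).
N, lam, steps = 100, 0.5, 200
x = np.linspace(0.0, 1.0, N, endpoint=False)
u0 = np.exp(-200.0 * (x - 0.5) ** 2)       # sharp pulse: many Fourier modes

def step_downwind(u):
    # The scheme quoted above: difference taken *against* the flow direction.
    return u - lam * (np.roll(u, -1) - u)  # np.roll(u, -1)[i] == u[i+1]

def step_upwind(u):
    # Upwind scheme: backward difference, monotone and stable for lam <= 1.
    return u - lam * (u - np.roll(u, 1))   # np.roll(u, 1)[i] == u[i-1]

ud, uu = u0.copy(), u0.copy()
for _ in range(steps):
    ud, uu = step_downwind(ud), step_upwind(uu)

print(np.abs(ud).max())   # grows without bound as steps increase
print(np.abs(uu).max())   # stays bounded by max(u0) = 1
```

A von Neumann check confirms this: the downwind amplification factor satisfies $|g(\theta)| > 1$ for every $\lambda > 0$, hence "unconditionally unstable", while the upwind update is a convex combination of neighbouring values for $\lambda \le 1$.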
"Compatibility means deliberately repeating other people's mistakes."
David Wheeler

http://www.datasimfinancial.com
http://www.datasim.nl

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

You have set a precedent and some of it is rubbing off. Anyway, it is still an open question.
BTW how's that monotone CN coming along?

https://en.wikipedia.org/wiki/Upwind_scheme
Indeed, it is straightforward now. We are using monotone, conservative, entropy-dissipative schemes that define a stochastic transition matrix in any dimension, for any kind of stochastic process. They are obtained with an approach close to the original CN one. These are the schemes that were presented in Wilmott magazine. I will add these results to our latest paper with P. LeFloch.
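The claim that monotone schemes "define a stochastic transition matrix" can be made concrete with a toy case (my own sketch of the standard upwind scheme, not the schemes from the Wilmott paper): the explicit update is $u^{n+1} = P u^n$ with $P$ row-stochastic whenever $\lambda = dt/dx \le 1$.

```python
import numpy as np

# Upwind scheme for u_t + u_x = 0 on a periodic grid, written as u^{n+1} = P u^n.
# For lam = dt/dx <= 1 every entry of P is nonnegative and each row sums to 1,
# i.e. P is a stochastic (Markov transition) matrix - the monotonicity property.
N, lam = 8, 0.6
P = (1.0 - lam) * np.eye(N) + lam * np.roll(np.eye(N), -1, axis=1)  # P[i, i-1] = lam

# Sanity checks: P is stochastic, and P reproduces the upwind update.
u = np.random.default_rng(0).standard_normal(N)
print((P >= 0).all(), np.allclose(P.sum(axis=1), 1.0))
print(np.allclose(P @ u, u - lam * (u - np.roll(u, 1))))
```

Because $P$ is stochastic, the scheme satisfies a discrete maximum principle, which is the link to transition matrices of stochastic processes being alluded to above.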

katastrofa
Posts: 10082
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Machine Learning: Frequency asked Questions

@Katastrofa, first of all congratulations, your cat is very cute.
Concerning classification algorithms, my experience is that it can be quite difficult to build efficient ones, particularly if you wish to classify among a large number of categories. Moreover, to be honest, classification algorithms based on neural networks might not be very efficient for that task.
Thanks. She's very bright too
What methods would you recommend as better than NNs for classification?
I am using kernel methods (support vector machines - SVMs). I have recorded better results with them than with neural networks on academic ML problems such as MNIST, and on all the Kaggle tests I have made so far - mainly classification / prediction problems. This is quite explainable: SVMs can describe any neural network, but the converse is false.
Theoretically, SVM can describe any NN. Practically, it can - especially if you find the right kernel function using NNs
Why is the converse not possible?

BTW, can you share some examples of your work with SVMs?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

Theoretically, SVM can describe any NN. Practically, it can - especially if you find the right kernel function using NNs
Why is the converse not possible?

BTW, can you share some examples of your work with SVMs?
We do it practically also: we know exactly what the kernel corresponding to any deep tralala neural network is. We can insert them directly into an SVM machine (we developed a platform a little bit similar to TensorFlow - except that I am coding it alone, I don't have Google's resources - but running SVMs, called codefi if you are a C++ user, or codpy if you use it through a Python interface). Neural networks are a very, very tiny part of SVMs. We can characterize, both theoretically and computationally, the error that a neural network is achieving. In practice, we face some issues concerning NNs, one of them being that NNs do not provide convergent methods for finance applications.

Hence we use other kernels more adapted to our purposes. These kernels can't be inserted into TensorFlow or PyTorch, to my knowledge. That should highlight why the converse is false.

All my LinkedIn posts and articles are done using our SVM framework, see https://www.linkedin.com/in/jeanmarcmercier/. You should find a track record of seven years of examples there; maybe other material can be found on my ResearchGate page. We also have tons of unpublished examples if you are interested.

katastrofa
Posts: 10082
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Machine Learning: Frequency asked Questions

Theoretically, SVM can describe any NN. Practically, it can - especially if you find the right kernel function using NNs
Why is the converse not possible?

BTW, can you share some examples of your work with SVMs?
We do it practically also: we know exactly what the kernel corresponding to any deep tralala neural network is. We can insert them directly into an SVM machine (we developed a platform a little bit similar to TensorFlow - except that I am coding it alone, I don't have Google's resources - but running SVMs, called codefi if you are a C++ user, or codpy if you use it through a Python interface). Neural networks are a very, very tiny part of SVMs. We can characterize, both theoretically and computationally, the error that a neural network is achieving. In practice they suffer from a lot of problems, the worst being that they do not provide convergent methods for finance applications.

We use other kernels more adapted to our purposes. These kernels can't be inserted into TensorFlow or PyTorch, to my knowledge.

All my LinkedIn posts and articles are done using our SVM framework, see https://www.linkedin.com/in/jeanmarcmercier/. You should find a track record of seven years of examples there; maybe other material can be found on my ResearchGate page. We also have tons of unpublished examples if you are interested.
It's cruel to tell someone to go and dig through your LinkedIn notes. Would you kindly select an example for me/us?

Honestly, I still don't understand the statements you've made in your previous post, but nvm. Now I'm even more confused about something else - what do you mean by the kernel of a NN? You say that we know exactly what it is - I don't have the faintest idea.
BTW, I think the word "kernel" is being a bit abused by ML researchers. It did no harm to anyone, though.
BTW, I found a method called Neural SVM - is that what you're doing?

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

Theoretically, SVM can describe any NN. Practically, it can - especially if you find the right kernel function using NNs
Why is the converse not possible?

BTW, can you share some examples of your work with SVMs?
We do it practically also: we know exactly what the kernel corresponding to any deep tralala neural network is. We can insert them directly into an SVM machine (we developed a platform a little bit similar to TensorFlow - except that I am coding it alone, I don't have Google's resources - but running SVMs, called codefi if you are a C++ user, or codpy if you use it through a Python interface). Neural networks are a very, very tiny part of SVMs. We can characterize, both theoretically and computationally, the error that a neural network is achieving. In practice they suffer from a lot of problems, the worst being that they do not provide convergent methods for finance applications.

We use other kernels more adapted to our purposes. These kernels can't be inserted into TensorFlow or PyTorch, to my knowledge.

All my LinkedIn posts and articles are done using our SVM framework, see https://www.linkedin.com/in/jeanmarcmercier/. You should find a track record of seven years of examples there; maybe other material can be found on my ResearchGate page. We also have tons of unpublished examples if you are interested.
It's cruel to tell someone to go and dig through your LinkedIn notes. Would you kindly select an example for me/us?

Honestly, I still don't understand the statements you've made in your previous post, but nvm. Now I'm even more confused about something else - what do you mean by the kernel of a NN? You say that we know exactly what it is - I don't have the faintest idea.
BTW, I think the word "kernel" is being a bit abused by ML researchers. It did no harm to anyone, though.
BTW, I found a method called Neural SVM - is that what you're doing?
@Katastrofa.
A) OK, here are some links to those LinkedIn posts (not peer-reviewed papers, and quickly written - I am a private researcher, and this is LinkedIn, free style!)
1) The SVM framework benchmarked against TensorFlow on the MNIST problem can be found here (this one is indeed a 4-year-old test).
2) The same SVM framework toying with the Navier-Stokes equation
3) The same SVM framework used as a general stress-test framework
4) The ground ideas for computing error estimates with this framework
5) A remark to neural-network users who want to apply NNs to high-dimensional PDEs here
6) The same SVM framework used as a general, high-dimensional PDE solver for finance applications (pricing / hedging) - the post is from 2014.

B) I am trying to use a very precise vocabulary. A support vector machine is fully described by a single function called a kernel. I know (it is surely known by others) that a neural network, whatever the number of layers it has, can also be represented by a kernel, called an activation-function kernel. An activation kernel is a very particular kernel, and most of the kernels that I use can't fit into activation-function kernels: SVMs are more general than NNs.
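One concrete, well-known instance of a "kernel of a NN" - offered as background, not as the poster's own construction - is the arc-cosine kernel of Cho and Saul: the expected inner product of random ReLU features (an infinitely wide one-hidden-layer network) has a closed form, which a Monte Carlo check makes visible. The test vectors below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.3])
y = np.array([0.2, -1.0, 0.7])

# Closed form (Cho & Saul's arc-cosine kernel of degree 1, up to normalisation):
# E_w[relu(w.x) relu(w.y)] for w ~ N(0, I) has the explicit expression below.
nx, ny = np.linalg.norm(x), np.linalg.norm(y)
theta = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
k_exact = nx * ny / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

# Monte Carlo estimate with one wide random ReLU layer: the "NN side" of the kernel.
m = 200_000
W = rng.standard_normal((m, 3))
k_mc = np.maximum(W @ x, 0.0) @ np.maximum(W @ y, 0.0) / m

print(k_exact, k_mc)  # the two estimates agree to within a few percent
```

This is one precise sense in which a (random, infinitely wide) network corresponds to a single kernel; the poster's claim about arbitrary trained deep networks is stronger and is his own.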

C) I am having a look at Neural SVM. To be honest, my first reaction on reading the abstract is: "do these guys understand that neural networks are part of SVMs?". It means that their Neural-SVM machine can be fully described by a single kernel in an SVM machine... (addendum) I stopped reading this paper at the sentence "The standard SVM is unable to handle multiple outputs in a single architecture". This is erroneous.

The point is that no one took the time to understand the math behind NNs, yet billions were invested in this technology. Now what is happening?

"Then I'll huff, and I'll puff, and I'll blow your house in."

katastrofa
Posts: 10082
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: Machine Learning: Frequency asked Questions

I can see some superficial formal analogies between SVMs and NNs if I forget about all the conditions the SVM's kernels need to meet to produce trustworthy results (see the Mercer conditions).
SVMs are based on the trick of changing the metric of the data space in such a way that datapoints which aren't linearly separable in the original metric become linearly separable - the so-called kernel trick. Here is a very nice pictorial explanation.
Those kernel functions are simply the scalar products of the vectors (representing datapoints) in that space. For instance, let's say a polynomial kernel is a good candidate to fit our training d@ta: $K(x_i, x_j) = a x_i^T x_j + b$. You can call it an "activation function" if you like. You can also see that the number of model parameters grows with the number of datapoints. That's why SVMs were replaced with NNs, which scale better with the dataset size (at the cost of mathematical rigour). NNs don't have this problem, because the actual activation function is $w_i^T x_j + b_i$, where $w$ and $b$ are parameters of the NN units. Hence the dimension of the problem is controlled by your chosen size of the NN layer (not the data).
Summarising, I definitely wouldn't say that SVMs are NNs.

("d@ta" because Wordpress blocks me for "data"! - "A potentially unsafe operation has been detected in your request to this site")

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

I can see some superficial formal analogies between SVMs and NNs if I forget about all the conditions the SVM's kernels need to meet to produce trustworthy results (see the Mercer conditions).
SVMs are based on the trick of changing the metric of the data space in such a way that datapoints which aren't linearly separable in the original metric become linearly separable - the so-called kernel trick. Here is a very nice pictorial explanation.
Those kernel functions are simply the scalar products of the vectors (representing datapoints) in that space. For instance, let's say a polynomial kernel is a good candidate to fit our training d@ta: $K(x_i, x_j) = a x_i^T x_j + b$. You can call it an "activation function" if you like. You can also see that the number of model parameters grows with the number of datapoints. That's why SVMs were replaced with NNs, which scale better with the dataset size (at the cost of mathematical rigour). NNs don't have this problem, because the actual activation function is $w_i^T x_j + b_i$, where $w$ and $b$ are parameters of the NN units. Hence the dimension of the problem is controlled by your chosen size of the NN layer (not the data).
Summarising, I definitely wouldn't say that SVMs are NNs.

("d@ta" because Wordpress blocks me for "data"! - "A potentially unsafe operation has been detected in your request to this site")
kernel functions are simply the scalar products of the vectors - these are scalar products, and thus kernels, for SVMs. But a kernel is ANY symmetric function $k(x,y)$ (more precisely, any admissible kernel). For instance $k(x,y) = \max(\langle x, y \rangle, 0)$ is the ReLU network of TensorFlow. OK, you are speaking about scalar-product kernels. What's wrong with them?

And no again to "You can call it an "activation function" if you like. You can also see that the number of model parameters grows with the number of datapoints".
I can work with whatever computational resources you want, or at a given precision, while taking into account all your data. This holds of course also for linear kernels, but those are not really interesting: they amount to linear regression. I also have an algorithm that is quite similar to learning: same methods, but more general, and theoretically bullet-proof.

And more interestingly, I can now explain why and when a Neural Network fails, and propose a patch to fix this mess when it occurs. This is the "Huff". But I can also "Puff" if I wish to...

JohnLeM
Posts: 515
Joined: September 16th, 2008, 7:15 pm

### Re: Machine Learning: Frequency asked Questions

C) I am having a look at Neural SVM. To be honest, my first reaction on reading the abstract is: "do these guys understand that neural networks are part of SVMs?". It means that their Neural-SVM machine can be fully described by a single kernel in an SVM machine... (addendum) I stopped reading this paper at the sentence "The standard SVM is unable to handle multiple outputs in a single architecture". This is erroneous.
I was in killer mode yesterday - sorry for this too-severe judgment. I read the paper and put it into a folder of mine named "kernel engineering".