I can see some superficial formal analogies between SVMs and NNs if I forget about all the conditions the SVM's kernels need to meet to produce trustworthy results (see Mercer's condition).

SVMs are based on the trick of changing the metric of the data space so that datapoints which aren't linearly separable in the original metric become linearly separable: the so-called kernel trick.
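To make the trick concrete, here is a minimal sketch in plain Python. The two concentric rings, the lifting map `feature_map` and the threshold value are all assumptions chosen for illustration, not anything taken from a particular SVM library. The inner ring sits inside the outer one, so no straight line separates them in the plane, but after adding the squared radius as a third coordinate a flat plane does.

```python
import math

def feature_map(point):
    """Hypothetical lifting map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

def lifted_kernel(p, q):
    """Kernel = ordinary scalar product, taken in the lifted space."""
    return sum(a * b for a, b in zip(feature_map(p), feature_map(q)))

def ring(radius, n):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring(1.0, 8)  # one class
outer = ring(3.0, 8)  # the other class, surrounding the first

# Third lifted coordinate is the squared radius: about 1 vs about 9.
inner_z = [feature_map(p)[2] for p in inner]
outer_z = [feature_map(p)[2] for p in outer]

# Assumed separating plane z = 4 in the lifted space.
threshold = 4.0
separable = max(inner_z) < threshold < min(outer_z)
print("separable after lifting:", separable)
```

In a real SVM the lifted vectors are never built explicitly; only values such as `lifted_kernel(p, q)` are needed.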

Here is a very nice pictorial explanation.
Those **kernel functions are simply the scalar products of the vectors** (representing datapoints) in that space. For instance, let's say a polynomial kernel (here of degree one) is a good candidate to fit our training d@ta: [$]K(x_i, x_j) = a x_i^T x_j + b[$]. You can call it an "activation function" if you like. You can also see that the number of model parameters grows with the number of datapoints, since the kernel is evaluated on every pair [$]x_i, x_j[$]. That's why SVMs were replaced with NNs, which scale better with the dataset size (at the cost of mathematical rigour). NNs don't have this problem, because the actual activation function is [$]w_i x_j + b_i[$], where [$]w_i[$] and [$]b_i[$] are parameters of the NN units. Hence the dimension of the problem is controlled by your chosen size of the NN layer (not by the data).
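A rough sketch of the scaling argument, in plain Python. The helper names (`affine_kernel`, `gram_matrix`, `dense_layer_param_count`) and the layer size of 16 units are illustrative assumptions. The kernel matrix has one entry per pair of datapoints, so it grows as n squared, while the parameter count of a dense layer depends only on the input dimension and the number of units.

```python
def affine_kernel(x, y, a=1.0, b=0.0):
    """K(x_i, x_j) = a * x_i^T x_j + b, as in the post above."""
    return a * sum(xi * yj for xi, yj in zip(x, y)) + b

def gram_matrix(points, kernel):
    """One kernel evaluation per pair of datapoints: an n x n matrix."""
    return [[kernel(p, q) for q in points] for p in points]

def dense_layer_param_count(n_inputs, n_units):
    """Each unit has one weight per input plus one bias."""
    return n_units * (n_inputs + 1)

for n in (10, 100):
    # Made-up 2-dimensional datapoints, only to get a dataset of size n.
    points = [(float(i), float(i % 3)) for i in range(n)]
    K = gram_matrix(points, affine_kernel)
    kernel_entries = len(K) * len(K[0])
    layer_params = dense_layer_param_count(n_inputs=2, n_units=16)
    print(n, kernel_entries, layer_params)
```

Going from 10 to 100 datapoints multiplies the number of kernel entries by 100, while the layer's parameter count does not change.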

Summarising, I definitely wouldn't say that SVMs are NNs.

("d@ta" because Wordpress blocks me for "data"! - "A potentially unsafe operation has been detected in your request to this site")

**kernel functions are simply the scalar products of the vectors** Those are scalar-product kernels, and thus kernels for SVMs. A kernel is ANY symmetric function [$]k(x,y)[$] (more precisely, any admissible kernel). For instance, [$]k(x,y) = \max(\langle x,y \rangle, 0)[$] is the ReLU network of TensorFlow. OK, you are speaking about scalar-product kernels. What is wrong with them?
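For readers who want to poke at the difference between "symmetric" and "admissible", here is a small plain-Python sketch. The helper names and the three sample points are assumptions made up for illustration. It checks symmetry of the ReLU-type kernel above and evaluates the quadratic form [$]\sum_{i,j} c_i c_j k(x_i, x_j)[$], which Mercer's condition requires to be non-negative for every choice of coefficients. A single non-negative value proves nothing about admissibility; a single negative value would disprove it.

```python
def dot(x, y):
    """Plain scalar product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def relu_kernel(x, y):
    """k(x, y) = max(<x, y>, 0), the kernel quoted above."""
    return max(dot(x, y), 0.0)

def is_symmetric(kernel, points, tol=1e-12):
    """True if k(x, y) == k(y, x) on every pair of the sample."""
    return all(abs(kernel(p, q) - kernel(q, p)) <= tol
               for p in points for q in points)

def quadratic_form(kernel, points, coeffs):
    """sum_ij c_i c_j k(x_i, x_j); must be >= 0 for an admissible kernel."""
    return sum(ci * cj * kernel(p, q)
               for ci, p in zip(coeffs, points)
               for cj, q in zip(coeffs, points))

# Made-up sample points and coefficients, for illustration only.
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
coeffs = (1.0, -1.0, 1.0)

print("symmetric:", is_symmetric(relu_kernel, points))
print("quadratic form, scalar product:", quadratic_form(dot, points, coeffs))
print("quadratic form, ReLU kernel:", quadratic_form(relu_kernel, points, coeffs))
```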

And no again to "You can call it "activation function" if you like. You can also see that the number of the model parameters grows with the number of datapoints"

I can work with whatever computational resources you want, or at a given precision, while taking all your data into account. This of course also holds for linear kernels, but those are not really interesting: they amount to linear regression. I also have an algorithm that is quite similar to learning. Same methods, but more general, and theoretically bullet-proof.

And more interestingly, I can now explain why and when a Neural Network fails, and propose a patch to fix this mess when it occurs. This is the "Huff". But I can also "Puff" if I wish to...