- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

> > A loss surface of a neural network can approximate anything. Including cows.
>
> This paper seems quite interesting, but I spent two hours unsuccessfully trying to understand it. But nice cows indeed!

Is the experience cathartic?

"Compatibility means deliberately repeating other people's mistakes."

David Wheeler

http://www.datasimfinancial.com

http://www.datasim.nl

> > > A loss surface of a neural network can approximate anything. Including cows.
> >
> > This paper seems quite interesting, but I spent two hours unsuccessfully trying to understand it. But nice cows indeed!
>
> Is the experience cathartic?

I am not sure if it is cathartic or not. I will try again to see if I can get out of my body. I'll let you know.

But I agree with you, a lot of authors out there should reconsider their relationship with their readers.

- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

If both you and I have difficulty, what about freshly minted data scientists? arXiv is a litany of unreadable, unrefereed papers.

Documentation and writing skills have never been a priority in CS. Most articles take no account of the fact that 95% of people don't do AI/ML.

I tried again for an hour this morning, but it is still really unclear. However, one of the included references seems more detailed and clearer, so I'll fall back on it to understand their work.

- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

In a sense, all of the ML articles to date have missed the mark because they are based on statistics and linear algebra (no metrics, no norms, and basically lacking in hard functional analysis). Sapir-Whorf in action (the language you use shapes how you think). The current methods might be barking up the wrong tree and may not even be wrong.

An alternative is to *embed* probability distributions in a reproducing kernel Hilbert space (RKHS); the analysis then takes place in the context of inner product spaces by defining feature maps and a kernel mean.

A special case is the kernel trick in SVM.

Much of RKHS rests on the magical Riesz Representation Theorem.

https://en.wikipedia.org/wiki/Riesz_rep ... on_theorem
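
For readers who want to see where Riesz enters, here is a minimal sketch, under the standard assumption that point evaluation [$]f \mapsto f(x)[$] is a continuous linear functional on the Hilbert space [$]H[$]. The symbols [$]H[$], [$]k_x[$], [$]\varphi[$] and [$]\mu_P[$] are notation chosen for this sketch, not taken from any particular paper.

```latex
% Riesz: a continuous linear functional on H is an inner product with a unique element of H.
% Applied to point evaluation, this gives the reproducing property:
f(x) = \langle f, k_x \rangle_H \quad \text{for all } f \in H

% Feature map and kernel as an inner product of features:
\varphi(x) = k_x = k(\cdot, x), \qquad k(x, y) = \langle \varphi(x), \varphi(y) \rangle_H

% Kernel mean of a probability distribution P
% (assuming \mathbb{E}_{X \sim P}\sqrt{k(X, X)} < \infty):
\mu_P = \mathbb{E}_{X \sim P}[\varphi(X)], \qquad
\mathbb{E}_{X \sim P}[f(X)] = \langle f, \mu_P \rangle_H \quad \text{for all } f \in H
```

The SVM kernel trick uses only the middle line: inner products of features are computed through [$]k[$] without ever forming [$]\varphi[$] explicitly.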

JohnLeM is the resident expert on RKHS. Please feel free to correct any flaws.

// BTW kernels can be characterised as being universal, characteristic, translation-invariant, strictly positive-definite. What's that?


- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

Here is a nice paper on kernel methods to test if two samples are from different distributions. If I didn't know any better I would be inclined to say that there be a Cauchy sequence hiding in equation (2).

https://arxiv.org/abs/0805.2368
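
To make the idea concrete, here is a rough sketch of a kernel two-sample statistic: the squared distance in the RKHS between the empirical kernel means of two samples (usually called MMD). It assumes a Gaussian kernel with a hand-picked bandwidth and uses the simple biased estimator; the function names are made up for this sketch, and it does not compute a rejection threshold, so it is not a complete test.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gram matrix k(a_i, b_j) of the Gaussian kernel for 1-D samples."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def mmd_squared_biased(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2: squared RKHS distance between
    the empirical kernel means of the samples x and y."""
    return (
        gaussian_gram(x, x, bandwidth).mean()
        + gaussian_gram(y, y, bandwidth).mean()
        - 2.0 * gaussian_gram(x, y, bandwidth).mean()
    )


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y_same = rng.normal(0.0, 1.0, size=200)     # same distribution as x
y_shifted = rng.normal(2.0, 1.0, size=200)  # mean shifted by 2

mmd_same = mmd_squared_biased(x, y_same)
mmd_shifted = mmd_squared_biased(x, y_shifted)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

The statistic should be close to zero when both samples come from the same distribution and clearly larger when they do not. Turning it into a test still requires a threshold, for example from a permutation procedure.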


- katastrofa
**Posts:** 10155 **Location:** Alpha Centauri

Sounds like a standard PCA to me. You know, the non-parametric unsupervised learning technique developed by ML guys.

- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

> Sounds like a standard PCA to me. You know, the non-parametric unsupervised learning technique developed by ML guys.

You mean this?

http://citeseerx.ist.psu.edu/viewdoc/su ... .1.29.1366

Almost. They applied kernel methods (from 1907) to PCA (Pearson 1901). Kernel methods from Hilbert, Schmidt, Volterra etc. On giants' shoulders.

It contains kernel PCA as an instance/sub case. There are many others.

But the

It is worth investigating. You seem to be suggesting that it is well known and universal in ML, which, it seems, it is not. I could be wrong.

Q: people use the term ML instead of Statistical Learning. Is it sexier?

Why not call it multivariate statistics and be done with it (parsimony >> verbosity but it may not sell well).


For me it's perfectly clear what the article is saying.

- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

> For me it's perfectly clear what the article is saying.

I would hope so.

Unfortunately, that's irrelevant to the other stakeholders.

Edit: Maybe the threshold can be lowered by more tutorial-style articles, e.g. "RKHS for the impatient".


> For me it's perfectly clear what the article is saying.

After reading one of their included references, it is now clearer. I was unable to understand their log-entropy functional (3) without this reading. Frankly, they could have developed it a little more to make it understandable, or at least added a "see [] for details".

> // BTW kernels can be characterised as being universal, characteristic, translation-invariant, strictly positive-definite. What's that?

Strictly positive kernels on [$]\Omega[$] are functions [$]k(x,y)[$] such that the matrix [$]\left(k(x^i,x^j)\right)_{i,j \le N}[$] is symmetric positive definite for any set of distinct points [$]x^i \in \Omega[$].

Translation-invariant kernels are kernels of the form [$]k(x,y) = \varphi(x-y)[$].
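
Both properties are easy to check numerically for a given kernel and a given set of points. The sketch below assumes a Gaussian kernel [$]\varphi(r) = \exp(-r^2 / (2 h^2))[$] with an arbitrary bandwidth [$]h = 0.1[$] on [$]\Omega = [0,1][$]; the function name and the parameter values are choices made for this illustration only. A numerical check on one point set is of course not a proof for all point sets.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=0.1):
    """k(x, y) = phi(x - y) with phi(r) = exp(-r^2 / (2 * bandwidth^2))."""
    r = np.asarray(x, dtype=float)[:, None] - np.asarray(y, dtype=float)[None, :]
    return np.exp(-r**2 / (2.0 * bandwidth**2))


# Distinct points x^i in Omega = [0, 1]
points = np.linspace(0.0, 1.0, 8)
gram = gaussian_kernel(points, points)

# Strict positivity on this point set: symmetric matrix, all eigenvalues > 0
is_symmetric = np.allclose(gram, gram.T)
eigenvalues = np.linalg.eigvalsh(gram)
is_strictly_positive = bool(eigenvalues.min() > 0.0)

# Translation invariance: shifting both arguments leaves the Gram matrix unchanged
shift = 3.7
shifted_gram = gaussian_kernel(points + shift, points + shift)
is_translation_invariant = np.allclose(gram, shifted_gram)

print(is_symmetric, is_strictly_positive, is_translation_invariant)
```

With points that are very close together, or a much larger bandwidth, the smallest eigenvalue can fall below floating-point precision even though the kernel is strictly positive in exact arithmetic.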

I found some definitions of universal kernels in https://arxiv.org/pdf/1003.0887.pdf. To me the definition seems a little bit obsolete: it tells you that a kernel can reproduce any continuous function in some Banach space.

I found the definition of characteristic kernels in this reference: https://www.ism.ac.jp/~fukumizu/papers/fukumizu_etal_nips2007_extended.pdf. To me it is a slightly strange definition. As far as I understand: consider a kernel [$]k(x,y)[$], generating a space of functions [$]H_k[$]. It is said to be characteristic if, for any two probability measures [$]\mu, \nu[$], the equality [$]\int f(x) d\mu = \int f(x) d\nu[$] for all [$]f \in H_k[$] implies [$]\mu = \nu[$]. I would say that both definitions (characteristic and universal kernels) are almost surely equivalent.

- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

time for reflection...

Some updates on Cybenko's holy Universal Approximation Theorem (1989). Has it stood the test of time?

*“Classical measure theory is fundamentally non-constructive, since the classical definition of Lebesgue measure does not describe any way to compute the measure of a set or the integral of a function. In fact, if one thinks of a function just as a rule that "inputs a real number and outputs a real number" then there cannot be any algorithm to compute the integral of a function, since any algorithm would only be able to call finitely many values of the function at a time, and finitely many values are not enough to compute the integral to any nontrivial accuracy.”*

All fine, but where does measure theory help in approximation theory, functional and numerical analysis? How would you answer as

. Data scientist

. Computer scientist

. Pure mathematician

. Physicist

. Philosopher (the full cycle)

Nice warm feelings but we need a priori estimates.

It is now almost 2021,

https://en.wikipedia.org/wiki/Universal ... on_theorem
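
For anyone who wants to poke at the theorem numerically: Cybenko's result says that finite sums [$]\sum_j \alpha_j \sigma(w_j x + \theta_j)[$] with a continuous sigmoidal [$]\sigma[$] are dense in the continuous functions on a compact cube. The sketch below is only an experiment in that spirit, not the proof and not a training algorithm from any paper: the hidden weights and biases are drawn at random (an assumption of this sketch), and only the output weights [$]\alpha_j[$] are fitted by least squares. The target function, the sampling ranges and all names are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)

# Target: a continuous function on [0, 1], sampled on a grid
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2.0 * np.pi * x)

# Random hidden layer (not trained); narrower networks reuse the first units
max_hidden = 50
w = rng.normal(0.0, 10.0, size=max_hidden)
theta = rng.uniform(-10.0, 10.0, size=max_hidden)


def rms_error(n_hidden):
    """Fit the output weights alpha by least squares, return the RMS error on the grid."""
    hidden = sigmoid(np.outer(x, w[:n_hidden]) + theta[:n_hidden])
    alpha, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    return float(np.sqrt(np.mean((hidden @ alpha - target) ** 2)))


errors = {n: rms_error(n) for n in (5, 20, 50)}
for n, err in errors.items():
    print(f"{n:3d} hidden units: RMS error {err:.2e}")
```

The error on the grid shrinks as units are added, which is consistent with density, but the experiment gives no rate and no a priori estimate: exactly the gap complained about above.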


- Cuchulainn
**Posts:** 64682 **Location:** Drosophila melanogaster

Deep, Skinny Neural Networks are not Universal Approximators

https://arxiv.org/abs/1810.00393

trial and error?


> Deep, Skinny Neural Networks are not Universal Approximators
>
> https://arxiv.org/abs/1810.00393
>
> trial and error?

I just quickly read this paper. @Cuchulainn thank you, it is very valuable!