
Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

A loss surface of a neural network can approximate anything. Including cows.
This paper seems quite interesting, but I spent two unsuccessful hours trying to understand it. But nice cows indeed!
Is the experience cathartic?

JohnLeM
Posts: 380
Joined: September 16th, 2008, 7:15 pm

Re: Universal Approximation theorem

A loss surface of a neural network can approximate anything. Including cows.
This paper seems quite interesting, but I spent two unsuccessful hours trying to understand it. But nice cows indeed!
Is the experience cathartic?
I am not sure if it is cathartic or not. I will try again to see if I can get out of my body. I'll let you know.
But I agree with you: a lot of authors out there should reconsider their connection to their readers.

Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

But I agree with you: a lot of authors out there should reconsider their connection to their readers.

If both you and I have difficulty, what about freshly-minted data scientists? arXiv is a litany of unreadable/unrefereed papers.

Documentation and writing skills have never held much interest in CS. Most articles take no account of the fact that 95% of people don't do AI/ML.

JohnLeM
Posts: 380
Joined: September 16th, 2008, 7:15 pm

Re: Universal Approximation theorem

I tried again for an hour this morning, but it is really unclear. However, one of the included references seems more detailed and clearer; I'll fall back to it to understand their work.

Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

In a sense, all of the ML articles to date have missed the mark because they are based on statistics and linear algebra (no metrics, no norms, and basically lacking in hard functional analysis). Sapir-Whorf in action (the language you use shapes how you think). The current methods might be barking up the wrong tree and may not even be wrong.
An alternative is to embed probability distributions in a reproducing kernel Hilbert space (RKHS), and then the analysis takes place in the context of inner product spaces by defining feature maps and a kernel mean.

A special case is the kernel trick in SVM.

Much of RKHS rests on the magical Riesz Representation Theorem.

https://en.wikipedia.org/wiki/Riesz_rep ... on_theorem
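To make the "kernel trick" mentioned below concrete, here is a minimal sketch (my own illustration, not from any of the linked papers): for the polynomial kernel $k(x,y) = (x \cdot y)^2$ on $\mathbb{R}^2$, the kernel evaluates the inner product of an explicit feature map $\varphi$ without ever forming $\varphi$.

```python
# Illustration of the kernel trick: k(x, y) = (x . y)^2 equals
# <phi(x), phi(y)> for the explicit feature map
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The names phi and k are
# my own, for illustration only.
import math

def phi(x):
    # explicit (3-dimensional) feature map for the degree-2 polynomial kernel
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def k(x, y):
    # the kernel: same inner product, computed without phi
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
assert abs(explicit - k(x, y)) < 1e-12  # both equal 1.0
```

In an SVM the same identity lets one work in a high- (or infinite-) dimensional feature space while only ever computing kernel values.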

JohnLeM is the resident expert on RKHS. Please feel free to correct any flaws.

// BTW kernels can be characterised as being universal, characteristic, translation-invariant, or strictly positive-definite. What are those?

Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

Here is a nice paper on kernel methods to test if two samples are from different distributions. If I didn't know any better, I would be inclined to say that there is a Cauchy sequence hiding in equation (2).

https://arxiv.org/abs/0805.2368
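For anyone who wants to see the statistic from that paper in action, here is a bare-bones sketch of the (biased) squared Maximum Mean Discrepancy with a Gaussian kernel. The fixed bandwidth is my own placeholder assumption, not the paper's median heuristic, and the function name is mine.

```python
# Minimal biased MMD^2 estimate (Gretton et al., two-sample test idea):
# MMD^2 = mean(K_xx) + mean(K_yy) - 2 mean(K_xy).
# sigma=1.0 is an arbitrary bandwidth choice for illustration.
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy."""
    def gram(A, B):
        # pairwise squared distances, then Gaussian kernel
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 1)), rng.normal(size=(200, 1)))
diff = mmd2(rng.normal(size=(200, 1)), rng.normal(2.0, 1.0, size=(200, 1)))
# same distribution -> near 0; shifted mean -> clearly larger
```

The paper then supplies the null distribution needed to turn this number into an actual hypothesis test; the sketch above only computes the statistic itself.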

katastrofa
Posts: 9327
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Universal Approximation theorem

Sounds like a standard PCA to me. You know, the non-parametric unsupervised learning technique developed by ML guys.

Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

Sounds like a standard PCA to me. You know, the non-parametric unsupervised learning technique developed by ML guys.
You mean this?
http://citeseerx.ist.psu.edu/viewdoc/su ... .1.29.1366

Almost. They applied kernel methods (from 1907) to PCA (Pearson, 1901). Kernel methods from Hilbert, Schmidt, Volterra etc. On giants' shoulders.
It contains kernel PCA as an instance/special case. There are many others.
But the main point IMO is that the methods of applied functional analysis are being used.
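To show the "PCA as a special case" point concretely, here is a bare-bones kernel PCA sketch in the style of the Scholkopf et al. paper linked above (my own toy code, not theirs): a linear kernel gives ordinary PCA, and swapping in a Gaussian kernel changes nothing else in the algorithm.

```python
# Toy kernel PCA: build the Gram matrix, double-center it, and project
# onto the leading eigenvectors. The function name and signature are
# my own, for illustration.
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    n = X.shape[0]
    K = np.array([[kernel(a, b) for b in X] for a in X])
    # double-center the Gram matrix: K <- H K H with H = I - (1/n) 11^T
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    # projections onto the leading kernel principal axes
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

X = np.random.default_rng(1).normal(size=(50, 3))
lin = kernel_pca(X, lambda a, b: a @ b)                      # ordinary PCA
rbf = kernel_pca(X, lambda a, b: np.exp(-((a - b) ** 2).sum()))
```

The only thing that distinguishes the two runs is the kernel argument, which is exactly the functional-analytic point: the algorithm lives in the inner product space, not in the original coordinates.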

It's worth investigating, and you seem to be suggesting that it is well-known and universal in ML, which it seems it is not. I could be wrong.

Q: people use the term ML instead of Statistical Learning. Is it sexier?
Why not call it multivariate statistics and be done with it (parsimony >> verbosity but it may not sell well).

ISayMoo
Posts: 2314
Joined: September 30th, 2015, 8:30 pm

Re: Universal Approximation theorem

For me it's perfectly clear what the article is saying.

Cuchulainn
Topic Author
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Re: Universal Approximation theorem

For me it's perfectly clear what the article is saying.
I would hope so.
Unfortunately, that's irrelevant to the other stakeholders.

Edit: Maybe the threshold can be lowered by more tutorial-style articles, e.g. "RKHS for the impatient".

JohnLeM
Posts: 380
Joined: September 16th, 2008, 7:15 pm

Re: Universal Approximation theorem

For me it's perfectly clear what the article is saying.
After reading one of their included references, it is now clearer. I was unable to understand their log-entropy functional (3) without this reading. Frankly, they could have developed it a little bit more to make it understandable, or at least added a "see [] for details".

JohnLeM
Posts: 380
Joined: September 16th, 2008, 7:15 pm

Re: Universal Approximation theorem

// BTW kernels can be characterised as being universal, characteristic, translation-invariant, or strictly positive-definite. What are those?
Strictly positive-definite kernels on $\Omega$ are functions $k(x,y)$ such that the matrix $(k(x^i,x^j))_{i,j \le N}$ is symmetric positive-definite for any set of distinct points $x^i \in \Omega$.
Translation-invariant kernels are kernels of the form $k(x,y) = \varphi(x-y)$.
I found some definitions of universal kernels in https://arxiv.org/pdf/1003.0887.pdf. To me it might be a slightly dated definition, telling you that a kernel can approximate any continuous function in some Banach space.
I found the definition of characteristic kernels in this reference: https://www.ism.ac.jp/~fukumizu/papers/fukumizu_etal_nips2007_extended.pdf. To me it is a slightly strange definition. As far as I understood: consider a kernel $k(x,y)$ generating a space of functions $H_k$. It is said to be characteristic if, for any two probability measures $\mu, \nu$, having $\int f(x) \, d\mu = \int f(x) \, d\nu$ for all $f \in H_k$ implies $\mu = \nu$. I would say that both definitions (characteristic and universal kernels) are almost surely equivalent.
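A quick numerical sanity check of the strict positive-definiteness property above (my own toy example, with an arbitrary Gaussian bandwidth): the Gram matrix of a Gaussian kernel on distinct points should have strictly positive eigenvalues.

```python
# Check that the Gaussian kernel's Gram matrix on a set of distinct
# random points is symmetric positive-definite (all eigenvalues > 0,
# up to floating-point round-off). Bandwidth is implicitly 1/sqrt(2)
# here; the choice is arbitrary for this illustration.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(size=(20, 2)) * 3.0                 # 20 distinct points in R^2
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2)                                      # Gaussian kernel Gram matrix
eigs = np.linalg.eigvalsh(K)
assert eigs.min() > 0      # strictly positive-definite, as the definition states
```

The same check with, say, $k(x,y) = x \cdot y$ would fail (the linear kernel is only positive semi-definite once $N$ exceeds the dimension), which is why the "strictly" matters.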