I'm waiting for your comments on probabilistic line search, and for the results of your and Paul's project of teaching NNs the differential operator.

I am waiting for your numerical experiments on the first issue. This thread is not a project and there is no project leader.

// I recall your post

OK, so this looks like a reasonably decent paper: Probabilistic Line Searches for Stochastic Optimization

They discuss convergence guarantees briefly in Sec. 3.4.1. Experiments look encouraging, but I'd like to test them on something more challenging.

//

What are your findings, ISM? I have some views, but first yours!

I read the paper and I think it's a decent, well-tested idea. I didn't have the time to run numerical experiments.

There's also Pang.

He's building self-driving cars now. Let's not talk about Pang.

Assuming you had time, what is the main technical challenge?

1. Code for cubic splines and the BVN (bivariate normal) CDF (I have them in my new C++ book)

2. Section 3.1 is not clear to me (what's a Gaussian process prior?)

3. Putting it all together

4. Other?
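For item 1, assuming BVN means the bivariate normal CDF (as I read the paper, the probability that the Wolfe conditions hold is expressed through such a probability), a short stand-in needs only the standard library. This is a sketch based on Plackett's identity and Simpson's rule, not the algorithm from the book or the paper; `bvn_cdf` and its `n` parameter are hypothetical names of my own. How the paper obtains h, k and rho from its GP is not reproduced here.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_density(h: float, k: float, r: float) -> float:
    """Standard bivariate normal density at (h, k) with correlation r."""
    det = 1.0 - r * r
    q = (h * h - 2.0 * r * h * k + k * k) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def bvn_cdf(h: float, k: float, rho: float, n: int = 200) -> float:
    """P(X <= h, Y <= k) for standard normals X, Y with correlation rho.

    Plackett's identity d/dr Phi2(h, k, r) = phi2(h, k, r) gives
    Phi2(h, k, rho) = Phi(h) * Phi(k) + integral_0^rho phi2(h, k, r) dr.
    The integral is evaluated with composite Simpson's rule (n even).
    Adequate for |rho| bounded away from 1 only.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly in (-1, 1)")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    step = rho / n  # negative when rho < 0, which gives the signed integral
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * bvn_density(h, k, i * step)
    return norm_cdf(h) * norm_cdf(k) + total * step / 3.0
```

As a sanity check, Phi2(0, 0, rho) = 1/4 + arcsin(rho)/(2*pi), so `bvn_cdf(0.0, 0.0, 0.5)` should be very close to 1/3. For rho near ±1 the integrand becomes steep, and a dedicated routine (Genz, or Drezner-Wesolowsky) is the safer choice.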

?

I'm not sure you're right. I think they use the standard machinery of statistical learning for regularisation problems and describe it in some twisted jargon (I can't read it, sorry). The problem of minimising a penalised loss function in linear regression, splines, ML algos, etc. can generally be phrased as

min (f in H) [L(y, f(x)) + alpha * J(f)],

where x and y are the data, H is the space of candidate functions, L is a loss function, J is the penalty, and alpha > 0 is a constant that balances the smoothness of the fit f against its errors (under- vs overfitting).
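To make the roles of L, J and alpha concrete, here is the smallest possible instance, under assumptions of my own: squared loss summed over the data, H = lines through the origin f(x) = w*x, and J(f) = w^2. This is ordinary ridge regression with one coefficient; `ridge_slope` is a hypothetical name.

```python
def ridge_slope(xs, ys, alpha):
    """Minimise sum_i (y_i - w * x_i)**2 + alpha * w**2 over the scalar w.

    Setting the derivative with respect to w to zero gives
    w = sum(x_i * y_i) / (sum(x_i**2) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)
```

With xs = [1, 2, 3] and ys = [2, 4, 6], alpha = 0 recovers least squares (w = 2), while alpha = 14 halves the slope (w = 1); as alpha grows, w shrinks towards 0, which is the underfitting end of the trade-off.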

In machine learning, J is defined on functions f which live in a reproducing kernel Hilbert space, RKHS (BTW, the concept was first introduced by Stanisław Zaremba, one of the greatest Polish mathematicians). The space has properties that allow the infinite-dimensional minimisation problem to be reduced to a finite-dimensional one (the representer theorem).
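The finite-dimensional reduction can be seen in kernel ridge regression: with squared loss and J(f) = ||f||^2 in the RKHS norm, the minimiser has the form f(x) = sum_i c_i K(x, x_i) with c = (K + alpha*I)^{-1} y, where K is the kernel matrix on the data points. A minimal pure-Python sketch, assuming a Gaussian kernel with unit length scale chosen purely for illustration; `solve` and `fit_kernel_ridge` are hypothetical helper names.

```python
import math


def gaussian_kernel(a: float, b: float, length: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the length scale is an illustrative choice."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kernel_ridge(xs, ys, alpha, kernel=gaussian_kernel):
    """Representer theorem: the infinite-dimensional problem over the RKHS
    reduces to an n x n linear solve for the coefficients c."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = solve(K, ys)
    return lambda x: sum(ci * kernel(x, xi) for ci, xi in zip(c, xs))
```

A tiny alpha makes the fit (almost) interpolate the data, while a huge alpha pulls f towards the zero function, mirroring the scalar ridge case.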

The above can be rephrased in the Bayesian framework (which seems quite popular in ML, judging by Google searches) for f defined as a kernel integral with respect to some Borel measure. The measure can be interpreted as a stochastic process, usually generated by an alpha-stable distribution such as the Gaussian (this alpha is the stability index, not the penalty constant above). Putting a prior on the measure corresponds to putting a prior on the space of functions f, which you can then use as the prior in posterior inference.
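In the Gaussian case the link is exact: with a zero-mean GP prior with kernel k and Gaussian observation noise of variance sigma^2, the posterior mean is k_*^T (K + sigma^2 I)^{-1} y, i.e. the kernel ridge regression solution with alpha = sigma^2; the Bayesian view adds a posterior variance. The sketch below uses the cubic spline (integrated Wiener process) kernel, which as I read the paper is the prior it places on the objective along the search direction (the paper also uses gradient observations, which this sketch ignores). `theta`, `tau` and `gp_posterior` are hypothetical placeholders, not the paper's settings.

```python
def cubic_spline_kernel(t, u, theta=1.0, tau=1.0):
    """Integrated Wiener process covariance theta^2 * (m^3/3 + |a-b| m^2/2),
    with m = min(a, b) and a, b shifted by tau > 0 to keep them positive."""
    a, b = t + tau, u + tau
    m = min(a, b)
    return theta ** 2 * (m ** 3 / 3.0 + 0.5 * abs(a - b) * m ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior(xs, ys, noise_var, x_star, kernel=cubic_spline_kernel):
    """Posterior mean and variance of f(x_star) given noisy values ys at xs."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(x_star, xi) for xi in xs]
    # Mean: same formula as kernel ridge regression with alpha = noise_var.
    mean = sum(ks * ci for ks, ci in zip(k_star, solve(K, ys)))
    w = solve(K, k_star)
    var = kernel(x_star, x_star) - sum(ks * wi for ks, wi in zip(k_star, w))
    return mean, max(var, 0.0)  # clamp tiny negative values from round-off
```

Between and at the data the posterior mean is a piecewise cubic, which is where the "cubic spline" in item 1 comes from; the variance is what, as I understand the paper, lets a probabilistic line search ask how likely a candidate step is to satisfy the Wolfe conditions.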

Since in such a condensed form it doesn't sound anything like it's supposed to, I can lend you a book on statistical learning.

With some effort, the above description can be made into a precise mathematical explanation.

Sure! It can even be made into a statistical learning course for quants, £1000 per person per day.

Thanks Katastrofa, I didn't know about the connection to RKHS.

Functional analysis is coming in from the cold. Useful representation theorems.

Is it hands-on?

