SERVING THE QUANTITATIVE FINANCE COMMUNITY

Amin
Posts: 2100
Joined: July 14th, 2002, 3:00 am

### Re: If you are bored with Deep Networks

T4A, you are a great example that some animals are very bad at learning anything, including proper manners.
I was going through this thread as I know nothing about deep networks (only superficial ones) and you appear to be quite/very knowledgeable on this.
I agree on both counts. Friendly advice; write up or shut up. Fight on OT, not here.

Kata's comments on the quality (or lack of) of posts are well-founded and reflect my own opinion. And it's on record.

There has been a yuge amount of waffle lately. I feel it reached a peak in the last few weeks. It is maddening.

Maybe you could delete your posts. If you look you can see I'm in a useful conversation with ISM and I now have to wade through this muck you have posted.
I totally fail to understand the reason behind the recent increase in personal arguments, and on a TECHNICAL forum at that. There are other recent threads with similar activity where different people are making personal attacks on a technical forum. Have they added brain control chemicals to the entire water supply of major European cities?

katastrofa
Posts: 7790
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: If you are bored with Deep Networks

I don't need chemicals to tell people that they are lame. Adiós!

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

Maybe you could delete your posts. If you look you can see I'm in a useful conversation with ISM and I now have to wade through this muck you have posted.
And your post has added to the crap. I posted something serious and I now have to wade through the muck you have just created.
Give me a break on the water rant; the water supply to me house has dried up.
Last edited by Cuchulainn on July 13th, 2018, 11:14 am, edited 3 times in total.

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

E is stochastic (because mini-batches). Does your argument still apply?
It would seem that mini-batches are mainly an optimisation and not a functional requirement ("what you lose on the swings you gain on the roundabouts" aka Pyrrhic victory).
Going to MB entails a new dynamical system in an enlarged state space. Then the ODE system will probably become a random ODE with white noise or even some kind of Langevin/OU SDE. Then you are in the area of numerical solution use cases.
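
To make the mini-batch remark concrete, here is a minimal sketch on a toy setup of my own (nothing here comes from the thread): a one-parameter model $y \approx wx$ on synthetic data, where each mini-batch gradient equals the full gradient plus a zero-mean perturbation. Iterating $w \leftarrow w - \eta\, g_{\text{batch}}$ is then an explicit Euler step of the gradient flow with a noise term, which is one sense in which a random ODE or Langevin-type SDE could appear. The data, the names `grad` and `sgd`, and the step size 0.1 are all assumptions for illustration.

```python
import random

def grad(w, batch):
    """Gradient of the mean-squared error of y ~ w*x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

random.seed(0)
# Synthetic data (an assumption for illustration): y = 3x + small noise.
data = [(x, 3.0 * x + random.gauss(0.0, 0.1))
        for x in (random.uniform(-1.0, 1.0) for _ in range(64))]

w = 0.0
full = grad(w, data)  # deterministic full-batch gradient

# Split the data into 8 disjoint mini-batches of equal size.
batches = [data[i:i + 8] for i in range(0, len(data), 8)]
mb_grads = [grad(w, b) for b in batches]
mean_mb = sum(mb_grads) / len(mb_grads)

# Each mini-batch gradient = full gradient + a perturbation that averages to zero.
noise = [g - full for g in mb_grads]

def sgd(w, lr, epochs):
    """Euler-like iteration w <- w - lr * (mini-batch gradient)."""
    for _ in range(epochs):
        random.shuffle(batches)
        for b in batches:
            w -= lr * grad(w, b)
    return w

w_sgd = sgd(0.0, 0.1, 50)
print(full, mean_mb, w_sgd)
```

With equal-sized disjoint batches the mini-batch gradients average exactly to the full gradient, so the stochasticity enters only through the order and choice of batches.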

I understand SGD 'plumbing' but I don't understand the problem it purports to solve. It is yet to be mathematically specified IMO. I see SGD as an example of gradient descent in a specific context. Maybe I am missing some vital info.
I am bumping this post again. Amin, stick to your own turf. Gracias.

Paul
Posts: 9155
Joined: July 20th, 2001, 3:28 pm

### Re: If you are bored with Deep Networks

A. He’s right.
B. His comment might well be a joke. Which is pretty remarkable, considering.

Back to the topic...sorry, I haven’t been following this thread but Cuch have you written a network version of a simple diffusion equation?

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

If it was a joke, then it's hilarious. It's the water. That's why I use Harp lager tanks for my supply.
PDE/ML not yet, but I will do. At this stage just two-layer function approximation 101 stuff.
The issue at this stage is the lack of a decent specification and no response to my mails to PDE/ML gurus. Of course, how can you take them seriously when they say they have solved a PDE in 100 dimensions?

If someone does indeed have even a simple diffusion PDE then I could try it. Although I feel we should start with diffusion models for neuron activity. Need to start with continuous time. Just an idea at this stage. BTW just discovered the Handbook of Brain Theory and NN in my bookcase. It's very comprehensive. It's from 1998.

BTW we have a DL-PDE thread as well.

ISayMoo
Topic Author
Posts: 1799
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

Although I feel we should start with diffusion models for neuron activity.
http://iopscience.iop.org/article/10.10 ... a89ff/meta

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

A question...

Assuming the analysis/formulae to compute the learning rate in SGD algorithms is an (unstated?) variation of a standard and established line search method, then some questions and remarks naturally apply:

1. The approach to compute the learning rate seems to be very context-sensitive and ad hoc, and entails a lot of numerical experimentation. This is necessary but it is not science.
2. In contrast, line search methods have robust 1d-solvers to compute the optimal step size. Brent's method springs to mind. BTW what about Barzilai-Borwein?(..?..)
3. The learning rate parameter should be 'hidden' from the user, in much the same way that the step length is automatically adjusted in ODE solvers. It's knowledge that the 'user' cannot guess and should not have to guess IMO.
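
A rough sketch of points 2 and 3, assuming an ill-conditioned quadratic as the test function (my choice, not a claim about neural-network losses). The step length is computed inside the solver, either by a 1D minimisation along the search direction or by the Barzilai-Borwein formula $t = s^Ts / s^Ty$. Golden-section search stands in here for Brent's method to keep the code dependency-free; all function names are hypothetical.

```python
import math

def f(p):
    """Ill-conditioned quadratic test function (illustrative choice)."""
    x, y = p
    return 0.5 * (x * x + 25.0 * y * y)

def grad_f(p):
    x, y = p
    return (x, 25.0 * y)

def golden_section(phi, a, b, tol=1e-10):
    """Minimise a unimodal 1D function phi on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

def gd_line_search(p, iters):
    """Steepest descent; the step comes from a 1D solve, not from the user."""
    for _ in range(iters):
        g = grad_f(p)
        phi = lambda t: f((p[0] - t * g[0], p[1] - t * g[1]))
        t = golden_section(phi, 0.0, 1.0)  # bracket [0, 1] suits this quadratic
        p = (p[0] - t * g[0], p[1] - t * g[1])
    return p

def gd_barzilai_borwein(p, iters, t0=0.01):
    """Gradient descent with the BB step t = (s.s)/(s.y); only the first step is guessed."""
    g = grad_f(p)
    t = t0
    for _ in range(iters):
        p_new = (p[0] - t * g[0], p[1] - t * g[1])
        g_new = grad_f(p_new)
        s = (p_new[0] - p[0], p_new[1] - p[1])
        y = (g_new[0] - g[0], g_new[1] - g[1])
        sy = s[0] * y[0] + s[1] * y[1]
        p, g = p_new, g_new
        if sy <= 1e-300:  # converged (or no usable curvature information): stop
            break
        t = (s[0] * s[0] + s[1] * s[1]) / sy
    return p

p_ls = gd_line_search((1.0, 1.0), 200)
p_bb = gd_barzilai_borwein((1.0, 1.0), 200)
print(f(p_ls), f(p_bb))
```

Neither variant exposes a learning rate to the caller, which is the ODE-solver analogy. Whether this carries over to stochastic, non-convex losses is exactly the open question.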

ISayMoo
Topic Author
Posts: 1799
Joined: September 30th, 2015, 8:30 pm

### Re: If you are bored with Deep Networks

"Should"? It's great if you can do it, but the world is a complex and messy place. ODEs are relatively simple beasts compared to fitting highly non-linear functions with millions of free parameters.

It's "should" only if you have a point to make.

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

Don't just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis?

-Paul Halmos

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

and sigmoid + SGD :-/
sigmoid_sgd.png
1. This approximation looks VERY bad
2. Why is sigmoid bad for this benign function?
I am having mixed results here. The properties of the input function are important, e.g. $e^{-x}$ is good while $e^x$ is awful.

@ISayMoo: selecting a 'good' activation function is an open question in the research community. As a numerical analyst, I find them a bit simplistic. They are non-adaptive and cannot approximate a given function in the global domain.

Classic high order polynomials are almost useless (everyone in ML seems to be using them as recipes, why?) while 3rd order piecewise polynomials are (much) better approximations in general. You can vary the control points. But you know that, I reckon.
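
A small illustration of the polynomial remark, under assumptions of my own: the target is Runge's function $1/(1+25x^2)$ on $[-1,1]$ with 11 equispaced nodes, and a piecewise cubic Hermite interpolant with exact derivatives stands in for a cubic spline. It compares one global degree-10 interpolating polynomial against the piecewise cubic on the same nodes.

```python
def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def runge_prime(x):
    return -50.0 * x / (1.0 + 25.0 * x * x) ** 2

def lagrange(nodes, values, x):
    """Evaluate the single global interpolating polynomial at x."""
    total = 0.0
    for i, xi in enumerate(nodes):
        term = values[i]
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def hermite_cubic(nodes, values, slopes, x):
    """Evaluate the piecewise cubic Hermite interpolant at x."""
    k = 0
    while k < len(nodes) - 2 and x > nodes[k + 1]:
        k += 1  # locate the sub-interval [nodes[k], nodes[k+1]] containing x
    h = nodes[k + 1] - nodes[k]
    t = (x - nodes[k]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return (h00 * values[k] + h10 * h * slopes[k]
            + h01 * values[k + 1] + h11 * h * slopes[k + 1])

n = 11
nodes = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
values = [runge(x) for x in nodes]
slopes = [runge_prime(x) for x in nodes]

# Measure the maximum error on a fine grid.
grid = [-1.0 + 2.0 * i / 1000 for i in range(1001)]
err_poly = max(abs(lagrange(nodes, values, x) - runge(x)) for x in grid)
err_cubic = max(abs(hermite_cubic(nodes, values, slopes, x) - runge(x)) for x in grid)
print(err_poly, err_cubic)
```

The global polynomial oscillates near the ends of the interval while the piecewise cubic stays close to the function, at least for this particular target and node layout.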

Maybe some parts of ML algos do not need the fundamental laws of mathematics and they converge with enough experimentation and ad-hocness.

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

Another remark: if you have a choice of 10 activation functions and 5 learning rate strategies then you have 50 code blocks to test.

aka a big test department.
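
The count is just a Cartesian product. A tiny sketch with placeholder names (the thread does not list specific activation functions or strategies):

```python
from itertools import product

# Hypothetical placeholder names, for illustration only.
activations = [f"activation_{i}" for i in range(10)]
lr_strategies = [f"lr_strategy_{j}" for j in range(5)]

# Every (activation, strategy) pair is a separate configuration to test.
test_matrix = list(product(activations, lr_strategies))
n_cases = len(test_matrix)
print(n_cases)
```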

katastrofa
Posts: 7790
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

### Re: If you are bored with Deep Networks

I think it's not necessarily about sigmoid, but maybe about SGD, learning rate, NN hyperparameters, etc. I can't remember the exact values - I just scratched out some quick code in Python to see what would happen and closed it without saving. As you said, there's a lot to optimise. I describe an example of the parameter tuning procedure in my paper, Appendix A.4 and A.5 (I later improved the NN model and I'll publish a new version soon).

Cuchulainn
Posts: 59215
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

### Re: If you are bored with Deep Networks

I think it's not necessarily about sigmoid, but maybe about SGD, learning rate, NN hyperparameters, etc. I can't remember the exact values - I just scratched out some quick code in Python to see what would happen and closed it without saving. As you said, there's a lot to optimise. I describe an example of the parameter tuning procedure in my paper, Appendix A.4 and A.5 (I later improved the NN model and I'll publish a new version soon).
There seems to be a disconnect between computer science and numerical analysis in this area. It seems the laws of gravity are not relevant in this context. In classical optimisation theory the step length (ML calls it learning rate, what's in a name) is computed using the Wolfe conditions etc.
https://en.wikipedia.org/wiki/Wolfe_conditions
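
For reference, a minimal sketch of what the (weak) Wolfe conditions check for a trial step $\alpha$ along a descent direction $p$: sufficient decrease, $f(x+\alpha p) \le f(x) + c_1 \alpha \nabla f(x)^T p$, and curvature, $\nabla f(x+\alpha p)^T p \ge c_2 \nabla f(x)^T p$. The helper name `wolfe_check`, the test function $f(x)=x^2$ and the constants $c_1 = 10^{-4}$, $c_2 = 0.9$ are assumptions for illustration.

```python
def wolfe_check(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for step alpha along direction p."""
    slope0 = sum(g * d for g, d in zip(grad(x), p))      # directional derivative at x
    x_new = [xi + alpha * di for xi, di in zip(x, p)]
    slope1 = sum(g * d for g, d in zip(grad(x_new), p))  # directional derivative at x_new
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * slope0
    curvature = slope1 >= c2 * slope0
    return sufficient_decrease, curvature

def f(x):
    return x[0] ** 2

def grad(x):
    return [2.0 * x[0]]

x0 = [1.0]
p0 = [-2.0]  # steepest-descent direction at x0

too_short = wolfe_check(f, grad, x0, p0, 0.01)   # decreases f, but the slope has barely changed
acceptable = wolfe_check(f, grad, x0, p0, 0.5)   # lands on the minimiser
too_long = wolfe_check(f, grad, x0, p0, 1.0)     # overshoots: no sufficient decrease
print(too_short, acceptable, too_long)
```

A line search accepts a step only when both flags are true, so steps that are too short and steps that are too long are both rejected without the user choosing a rate.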

The rationale for learning rates and the criteria for choosing the most suitable one are not clear.