Serving the Quantitative Finance Community

 
Cuchulainn
Topic Author
Posts: 20254
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

A place to ask about numerical methods

November 16th, 2017, 3:01 pm

The (stochastic) gradient descent method is well established, most recently in Deep Learning. A crucial part of the algorithm is how to compute the learning rate parameter:

1. In DL it is a constant (not too big, not too small). You can use learning schedules, but this demands insight.
2. Elsewhere, update the learning rate parameter at each iteration by solving a 1d nonlinear problem (Brent's method is very good for this). Using Brent obviates the need for learning schedules, and we can then use it as a black box. My intuition would also tell me that it is more robust (an industry standard). I have used it a lot.
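Option 2 can be sketched as follows, using SciPy's Brent minimiser for the 1d line search at each step. The quadratic test function and all names here are my own illustration, not an industry implementation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_brent(f, grad, x0, tol=1e-8, max_iter=100):
    """Steepest descent where each step length is found by Brent's method."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # 1d problem: minimise f along the steepest-descent direction -g
        phi = lambda t: f(x - t * g)
        t_star = minimize_scalar(phi, bracket=(0.0, 1.0), method='brent').x
        x = x - t_star * g
    return x

# Example: quadratic bowl f(x) = x'Ax/2 - b'x, whose minimum is A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_min = gradient_descent_brent(f, grad, np.zeros(2))
```

Each iteration trades the hand-tuned learning rate for a handful of extra function evaluations inside Brent's bracketing search.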

What are the consequences of using each one? 
 
outrun
Posts: 4573
Joined: January 1st, 1970, 12:00 am

Re: A place to ask about numerical methods

November 16th, 2017, 3:32 pm

First read this to see what people are doing: http://ruder.io/optimizing-gradient-des ... challenges

I mostly use ADAM and have so far had no issues with saddle points. The main issue I worry about and manage is overtraining when datasets are small (<100k samples). This means that I have to try and prevent it from finding the global minimum, and instead have to measure when to stop training.
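For readers unfamiliar with ADAM: one update step maintains exponential moving averages of the gradient and its square, with bias correction. A minimal numpy sketch with the usual default hyperparameters (my own toy example, not library code):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update: moment estimates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise f(w) = (w - 3)^2, gradient 2(w - 3)
w = np.array([0.0]); m = np.zeros(1); v = np.zeros(1)
for t in range(1, 20001):
    g = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, g, m, v, t, lr=0.005)
```

Note how the effective step size is roughly lr regardless of the gradient's magnitude, which is why a single constant often works in practice.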


You should give it a try, but Brent's method in high dimensions doesn't seem like something that would scale. Suppose you have an NN with 10k weights: how would you attack it with Brent's? How many points and function evaluations would you need in each step? ...but that said, give it a try!
 
Cuchulainn

Re: A place to ask about numerical methods

November 16th, 2017, 4:20 pm

I am using the Goodfellow et al. book which everyone here is recommending to me.
I have known that link for some time now; it does not address my question.

but Brent's in high dim doesn't seem like something that would scale. 
This tells me that you don't know what Brent's method is and how it links into the gradient descent method. In fairness, I should have said 'scalar' instead of 'systems'.

Here's Brent in the Boost Math Toolkit

http://www.boost.org/doc/libs/1_63_0/li ... inima.html

Here is the last part of the jigsaw: plug the scalar parameter into Steepest (or Gradient, if you prefer) Descent:
https://en.wikipedia.org/wiki/Gradient_descent

My question remains open.

You should give it a try, 
Yes, I will. But after I have understood all the assumptions and mathematics. 


"Suppose you have a NN with 10k weights, how would you attack it"

I don't see many computational problems as far as that’s concerned.
 
outrun

Re: A place to ask about numerical methods

November 16th, 2017, 5:19 pm

Very good, let's see it! Try a simple 2 layer NN, MNIST is a good benchmark, classify 28x28 pixel images into 10 categories.
 
outrun

Re: A place to ask about numerical methods

November 16th, 2017, 5:28 pm

Here are the datasets in easy CSV format: https://pjreddie.com/projects/mnist-in-csv/

Each row contains the intensities of the pixels of a single image.

Do you need help with the NN and loss function?
 
Cuchulainn

Re: A place to ask about numerical methods

November 16th, 2017, 5:38 pm

Very good, let's see it! Try a simple 2 layer NN, MNIST is a good benchmark, classify 28x28 pixel images into 10 categories.
An open issue is to understand the data flow and structures. Should I use BGL graphs to hold the weights?
Last edited by Cuchulainn on February 21st, 2018, 5:18 pm, edited 1 time in total.
 
outrun

Re: A place to ask about numerical methods

November 16th, 2017, 5:52 pm

You can do it in a week yourself, why wait for 2018?

This is what I would suggest: as basic as it gets, yet an actual NN, and an extensively studied problem.

 
input: "x" vector of 784 floats. This is where you input the image pixel intensities.

hidden layer: 256 units. It has a weight matrix W1 of size 784x256 and a bias vector b1 of length 256. These are variables you need to optimize. The output of the hidden layer units is

h = max(W1*x + b1, 0)

max is the easiest "activation function" from a gradient perspective, works well in practice, and is called ReLU. So it's a linear transform followed by a non-linear transform. Very easy: matmul, sum, max.

The output of the NN is going to be 10 class probabilities: what digit do we see in the image? Similar to the hidden layer, you have a linear transform followed by a non-linear transform,

y  = sm(W2*h + b2)

W2 is a 256x10 matrix, b2 a vector of length 10. You also optimize these.

sm(x) is the softmax function: https://en.wikipedia.org/wiki/Softmax_function It transforms the vector of 10 floats into 10 probabilities (all in the range 0-1, and the sum == 1).

...that's the "forward pass"; it specifies how to convert inputs to outputs.
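The forward pass described above, as a numpy sketch (the 784-256-10 sizes are from the post; the random weights and input are purely to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters to be optimised: 784 -> 256 -> 10
W1 = rng.normal(0.0, 0.01, (784, 256)); b1 = np.zeros(256)
W2 = rng.normal(0.0, 0.01, (256, 10));  b2 = np.zeros(10)

def softmax(z):
    z = z - z.max()                      # shift by max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    h = np.maximum(W1.T @ x + b1, 0.0)   # hidden layer: linear transform + ReLU
    y = softmax(W2.T @ h + b2)           # output layer: 10 class probabilities
    return h, y

x = rng.random(784)                      # stand-in for one image's pixel intensities
h, y = forward(x)
```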

For the backward pass you need to define a loss function. The image was e.g. a "9" but the NN said p=[0.1, 0.05, .. 0.03]. How do you convert that mismatch into a number? You can then differentiate that loss function with respect to W1, b1, W2, b2 and get your gradient.
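One standard choice of loss here (an assumption on my part; the post leaves it open) is cross-entropy. Combined with softmax it has a famously tidy gradient at the output logits: p - onehot(label). A sketch:

```python
import numpy as np

def cross_entropy(p, label):
    """Loss for one sample: -log of the probability assigned to the true class."""
    return -np.log(p[label])

def output_gradient(p, label):
    """d(loss)/d(logits) for softmax followed by cross-entropy: p - onehot(label)."""
    g = p.copy()
    g[label] -= 1.0
    return g

# The NN put only probability 0.1 on the true class "9"
p = np.array([0.1, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1])
loss = cross_entropy(p, 9)
g = output_gradient(p, 9)
```

From this output gradient, the chain rule propagates back through W2, the ReLU, and W1.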

Do you need pointers with equations for the backward pass?
Last edited by outrun on November 16th, 2017, 5:58 pm, edited 1 time in total.
 
outrun

Re: A place to ask about numerical methods

November 16th, 2017, 5:57 pm

.. so two matrices and two vectors hold the weights (or you merge the b_i vectors with the W_i matrices for storage and have 2 matrices).
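That merge is the usual bias trick: append a constant 1 to the layer input and fold b into an extra row of W. A quick numerical check that the two storage forms agree (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(784, 256)); b = rng.normal(size=256); x = rng.random(784)

# Augmented form: one 785x256 matrix, and the input gets a trailing 1
W_aug = np.vstack([W, b[None, :]])
x_aug = np.append(x, 1.0)

z1 = W.T @ x + b      # separate weight matrix and bias vector
z2 = W_aug.T @ x_aug  # single merged matrix
```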