I'd like to see some benchmarks for S = 120, K = 100, r = 0.08, volatility = 0.2, dividend rate q = 0.08, and expiry T = 3. This case was studied in "On the analytical/numerical pricing of American put options against binomial tree prices" by Mark Joshi and Mike Staunton, Quantitative Finance, Vol. 12, No. 1, January 2012, 17-20, and was the subject of a certain amount of dispute. The price obtained there was 5.929805.
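The binomial side of that benchmark is easy to reproduce with a plain Cox-Ross-Rubinstein tree (a sketch only, not the analytical/numerical method of the paper; the tree value oscillates around the quoted 5.929805 as the step count grows):

```python
import numpy as np

def american_put_crr(S, K, r, q, sigma, T, N):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp((r - q) * dt) - d) / (u - d)   # risk-neutral up probability
    disc = np.exp(-r * dt)
    # stock prices and payoffs at expiry (nodes j = 0..N)
    ST = S * u ** (2.0 * np.arange(N + 1) - N)
    V = np.maximum(K - ST, 0.0)
    # roll back through the tree, checking early exercise at every node
    for i in range(N - 1, -1, -1):
        Si = S * u ** (2.0 * np.arange(i + 1) - i)
        V = disc * (p * V[1:] + (1.0 - p) * V[:-1])
        V = np.maximum(V, K - Si)
    return V[0]

# the disputed case: S=120, K=100, r=0.08, sigma=0.2, q=0.08, T=3
price = american_put_crr(120.0, 100.0, 0.08, 0.08, 0.2, 3.0, 2000)
```

With N = 2000 steps the result should already agree with 5.929805 to a couple of decimal places.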

Browsing this thread today searching for Mark's material.

Thanks Mark for all the wonderful things you shared!

Statistics: Posted by outrun — November 21st, 2017, 10:20 pm


This is what I would suggest: it's as basic as it gets, yet an actual NN, and an extensively studied problem.

input: "x" vector of 784 floats. This is where you input the image pixel intensities.

hidden layer: 256 units. It has a weight matrix W1 of size 784x256 and a bias vector b1 of length 256. These are the variables you need to optimize. The output of the hidden layer units is

h = max(W1*x + b1, 0)

max is the easiest "activation function" from a gradient perspective, works well in practice, and is called ReLU. So it's a linear transform followed by a non-linear transform. Very easy: matmul, sum, max.

the output of the NN is going to be 10 class probabilities: which digit do we see in the image? As with the hidden layer, you have a linear transform followed by a non-linear transform,

y = sm(W2*h + b2)

W2 is a 256x10 matrix, b2 a vector of length 10. You also optimize these.

sm(x) is the softmax function: https://en.wikipedia.org/wiki/Softmax_function It transforms the vector of 10 floats into 10 probabilities (all in the range 0-1, and summing to 1).

...that's the "forward pass": it specifies how to convert inputs to outputs.
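Putting the two layers together, a minimal NumPy sketch of this forward pass (the random weights and input are placeholders just to show the shapes; x is treated as a row vector so W1 keeps the 784x256 shape above):

```python
import numpy as np

def relu(z):
    # elementwise max(z, 0): the ReLU activation
    return np.maximum(z, 0.0)

def softmax(z):
    # subtract the max before exponentiating, for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, size=(784, 256))  # hidden-layer weights
b1 = np.zeros(256)
W2 = rng.normal(0.0, 0.01, size=(256, 10))   # output-layer weights
b2 = np.zeros(10)

x = rng.random(784)          # stand-in for one image's pixel intensities
h = relu(x @ W1 + b1)        # hidden layer: linear transform + ReLU
y = softmax(h @ W2 + b2)     # 10 class probabilities summing to 1
```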

for the backward pass you need to define a loss function. The image was, e.g., a "9" but the NN said p = [0.1, 0.05, .. 0.03]. How do you convert that mismatch into a single number? You can then differentiate that loss function with respect to W1, b1, W2, b2 and get your gradient.
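One common choice (my assumption; the post leaves the loss open) is the cross-entropy loss L = -log y[target], whose gradient through the softmax collapses to y - one_hot(target). A self-contained sketch of the backward pass under that choice:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    h = relu(x @ W1 + b1)
    y = softmax(h @ W2 + b2)
    return h, y

def backward(x, h, y, target, W2):
    """Gradients of L = -log y[target] for the 2-layer net."""
    dz2 = y.copy()
    dz2[target] -= 1.0        # softmax + cross-entropy: dL/dz2 = y - one_hot
    dW2 = np.outer(h, dz2)    # same shape as W2
    db2 = dz2
    dh = W2 @ dz2             # back through the output linear layer
    dz1 = dh * (h > 0)        # ReLU passes gradient only where it was active
    dW1 = np.outer(x, dz1)    # same shape as W1
    db1 = dz1
    return dW1, db1, dW2, db2
```

The functions are dimension-agnostic, so the same code works for the 784/256/10 shapes above or for a tiny toy net.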

Do you need pointers with equations for the backward pass?

Statistics: Posted by outrun — November 16th, 2017, 5:52 pm


Very good, let's see it! Try a simple 2 layer NN, MNIST is a good benchmark, classify 28x28 pixel images into 10 categories.

I am writing up a WBSO project for 2018 and I think this is a good start. But I don't want to use others' numerical libraries, at least not just yet.

An open issue is understanding the data flow and structures. Should I use BGL graphs to hold the weights?

// do you apply for WBSO? It's a good one. It must not be routine stuff but new and technically risky.

Statistics: Posted by Cuchulainn — November 16th, 2017, 5:38 pm


each row contains the intensities of the pixels on a single image.

do you need help with the NN and loss function?

Statistics: Posted by outrun — November 16th, 2017, 5:28 pm


This tells me that you don't know what Brent's method is and how it links into the gradient descent method. In fairness, I should have said 'scalar' instead of 'systems'.

Here's Brent in the Boost Math Toolkit

http://www.boost.org/doc/libs/1_63_0/li ... inima.html

Here is the last part of the jigsaw: plug the scalar parameter into Steepest (or Gradient, if you prefer) Descent:

https://en.wikipedia.org/wiki/Gradient_descent

My question remains open.

Yes, I will. But after I have understood all the assumptions and mathematics.

"Suppose you have a NN with 10k weights, how would you attack it"

I don't see many computational problems as far as that’s concerned.

Statistics: Posted by Cuchulainn — November 16th, 2017, 4:20 pm


I mostly use ADAM and have so far had no issues with saddle points. The main issue I worry about and manage is overtraining when datasets are small (<100k samples). This means that I have to try to prevent it from finding the global minimum and instead have to measure when to stop training.

You should give it a try, but Brent's in high dimensions doesn't seem like something that would scale. Suppose you have a NN with 10k weights, how would you attack it with Brent's? How many points and function evaluations would you need in each step? ...but that said, give it a try!

Statistics: Posted by outrun — November 16th, 2017, 3:32 pm


1. in DL it is a constant (not too big, not too small). You can use learning schedules but this demands insight.

2. in other places, update the learning rate parameter at each iteration by solving a 1d nonlinear problem (Brent is very, very good). Using Brent obviates the need for learning schedules, and then we just use it as a black box. My intuition would also tell me that it is more robust (industry standard). Have used it a lot.

What are the consequences of using each one?
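A minimal sketch of option 2, using SciPy's scipy.optimize.minimize_scalar (whose 'brent' method is the same algorithm as the Boost routine linked above) to pick the step size along the negative gradient at each iteration; the quadratic objective is just a stand-in:

```python
import numpy as np
from scipy.optimize import minimize_scalar

b = np.array([1.0, 2.0])

def f(w):
    # stand-in convex objective with minimizer at w = b
    return 0.5 * w @ w - b @ w

def grad(w):
    return w - b

w = np.zeros(2)
for _ in range(50):
    g = grad(w)
    if np.linalg.norm(g) < 1e-10:
        break
    # the learning rate is chosen per iteration by Brent on the 1d slice,
    # so no hand-tuned schedule is needed
    res = minimize_scalar(lambda t: f(w - t * g), bracket=(0.0, 1.0),
                          method='brent')
    w = w - res.x * g
# w ends up at the minimizer b = [1, 2]
```

The trade-off outrun raises still applies: each Brent step costs several extra function evaluations of the full objective, which is cheap here but expensive for a 10k-weight NN.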

Statistics: Posted by Cuchulainn — November 16th, 2017, 3:01 pm


(And those are exciting advancements in QCM. I wonder whether, after a few years, QCM bit-depth growth will be exponential or slower. When will there be a megabit QCM?)

Statistics: Posted by Traden4Alpha — November 14th, 2017, 12:22 pm


Still, QCM is almost here:

http://www-03.ibm.com/press/us/en/press ... /53374.wss

https://phys.org/news/2017-11-ibm-miles ... antum.html

Statistics: Posted by katastrofa — November 13th, 2017, 1:42 pm


Let's say I have two assets, [x] with a duration of 2 and [y] with a duration of 1.

This implies the formula for the line between these two securities' price movements is [x] = (2/1) [y], which implies the beta is 2. For the inverse, the beta would be 1/2.

Where I am getting hung up is that the durations seem to imply a perfect correlation between the two assets: beta_xy = 1/beta_yx. This only holds if the correlation is exactly 1.

There are a lot of assumptions in there. To get the ratio of the price movements from the durations, you seem to be assuming something like a uniform (parallel) shift in interest rates, and if you ignore convexity they probably are perfectly correlated under those assumptions.
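This point can be checked numerically: if every return comes from the same parallel shift dr via dP/P ≈ -D dr, then rx = 2 ry exactly, the two regression betas are 2 and 1/2, and the correlation is exactly 1 (the normal shift distribution below is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
dr = rng.normal(0.0, 0.001, size=10_000)  # assumed parallel rate shifts

Dx, Dy = 2.0, 1.0
rx = -Dx * dr      # duration-approximation returns of asset x
ry = -Dy * dr      # duration-approximation returns of asset y

beta_xy = np.polyfit(ry, rx, 1)[0]   # regress x on y: slope is 2
beta_yx = np.polyfit(rx, ry, 1)[0]   # regress y on x: slope is 0.5
corr = np.corrcoef(rx, ry)[0, 1]     # exactly 1 under these assumptions
```

Add an independent idiosyncratic term to ry and the correlation drops below 1; then beta_xy * beta_yx equals corr**2 and the tidy beta_xy = 1/beta_yx relation breaks, which is exactly the hang-up described above.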

Statistics: Posted by ppauper — November 10th, 2017, 9:00 pm


However, I am interested in doing this using a forward looking metric: duration off Bloomberg.

Let's say I have two assets, [x] with a duration of 2 and [y] with a duration of 1.

This implies the formula for the line between these two securities' price movements is [x] = (2/1) [y], which implies the beta is 2. For the inverse, the beta would be 1/2.

Where I am getting hung up is that the durations seem to imply a perfect correlation between the two assets: beta_xy = 1/beta_yx. This only holds if the correlation is exactly 1.

So what's the preferred approach? Even if I use a whole loan model with a fancy yield curve simulator, I'm still modeling using the same underlier... The resulting price strips will appear "over correlated" and I miss out on the supply/demand mechanics that could account for variances "within" the durations.

I sort of feel like I'm missing something...

Statistics: Posted by th14 — November 9th, 2017, 7:43 pm


"As vivien said, to help PDE schemes, a general idea is to solve PDE for martingale processes, then to use maps to fit underlyings"

Statistics: Posted by JohnLeM — October 30th, 2017, 8:27 am


The topic is the following: it is, to our knowledge, a new method to produce samples for Monte Carlo methods, which we also use for PDE (Partial Differential Equation) methods. We can compute these Monte Carlo samples for a quite large class of stochastic processes: any stochastic process defined by an SDE (Stochastic Differential Equation), and also in high dimensions (we have tested up to 64 dimensions). These samples can be used together with a PDE engine to boost the convergence rate of regulatory-type computations, as illustrated in the post.

Note that we used this sampling method for this Wilmott post.

Statistics: Posted by JohnLeM — October 30th, 2017, 8:09 am
