PDE methods for optimal quantizers of stochastic processes ? An example with Heston PDE.

Alan · January 19th, 2018, 8:33 pm

A micro optimiser regarding Godunov Double Sweep vs Thomas (ADI)

We have stress-tested these solvers when applied to the heat equation that we discuss in section 13.3 using mesh sizes of 100,000 in space and time. Both methods give accurate results (even though the double sweep tends to be more accurate). The processing times in seconds were in the ranges [490, 588] for double sweep and [730, 780] for the Thomas algorithm.
Where is this from? Do they mention the CPU used, coding language, any parallelism, etc.?
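For readers who have not met the Thomas algorithm mentioned in the quote, here is a minimal pure-Python sketch. It is not the code that was benchmarked above (the source of those timings is unknown); the names `thomas_solve` and `implicit_heat_step` are made up for illustration, and the heat-equation step assumes backward Euler with zero Dirichlet boundaries. The double sweep variant is not shown.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs without pivoting.

    lower[i] = A[i][i-1] (lower[0] is ignored),
    diag[i]  = A[i][i],
    upper[i] = A[i][i+1] (upper[-1] is ignored).
    No pivoting is done, so A should be diagonally dominant.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    # Forward elimination: remove the sub-diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def implicit_heat_step(u, lam):
    """One backward-Euler step of u_t = u_xx (assumed zero Dirichlet boundaries).

    u holds the interior values only; lam = dt / dx**2.
    """
    n = len(u)
    return thomas_solve([-lam] * n, [1.0 + 2.0 * lam] * n, [-lam] * n, u)


if __name__ == "__main__":
    print(implicit_heat_step([0.0, 1.0, 0.0], 0.5))
```

The cost is linear in the number of unknowns, which is why absolute timings like those quoted depend so heavily on language, hardware and implementation details.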

Anyway, this is slightly different from Alan's "project" I think, which is to efficiently calibrate to GARCH using whatever ingredients work. It could be interesting to benchmark different PDE solvers in a separate topic maybe, in 1D (BS) and 2D (Heston/GARCH), run some random tests as I did for the GARCH solver and then compare average precision/runtime ratios. Also check delta/gamma. But first let's see some more numbers!

So, to bump this thread, Billy7 and I collaborated (well, he did the hard stuff) to solve that GARCH diffusion calibration problem. For anybody interested, we posted it to the arXiv here

outrun · January 19th, 2018, 9:07 pm

Very impressive! Very thorough report and lots of smart tricks. It must have taken a lot of iteration to get all those goodies in?

From the applied part it's interesting to see that [$]\bar{v}[$] isn't as stable as the other GARCH parameters (other than [$]v_0[$] of course). The others are quite stable. That might be an interesting difference between realised and implied GARCH.

Also, Heston sometimes fits better even though (IMO) GARCH is a *much* better model for the underlying behaviour than Heston.

And on a side note: in a little project of mine I had been thinking about calibrating to price instead of the IV. For a far out-of-the-money option with a theoretical price of say $0.01, a trader might not mind paying $0.01 extra (it's difficult to predict small probabilities accurately, there are not enough observations/events), but in IV terms that would be a huge increase. It's a complicated aspect I am still pondering: there is the bid-ask spread (and the mid might not be the best), and the close prices are sensitive to last-minute underlying movement through the delta.
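To put rough numbers on the price-versus-IV point, here is a small sketch using plain Black-Scholes with zero rates and no dividends (both assumptions), and a bisection implied-vol solver. The function names and the example strikes and maturity are made up for illustration; none of this is taken from the paper.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, maturity, vol, rate=0.0):
    """Black-Scholes call price (no dividends)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, maturity, rate=0.0, lo=1e-4, hi=5.0):
    """Implied vol by bisection; the call price is increasing in vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, maturity, mid, rate) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Far out-of-the-money call: the price goes from $0.01 to $0.02.
    otm_shift = implied_vol(0.02, 100.0, 150.0, 0.25) - implied_vol(0.01, 100.0, 150.0, 0.25)
    # At-the-money call: the same $0.01 bump on top of a 20% vol price.
    atm_price = bs_call(100.0, 100.0, 0.25, 0.20)
    atm_shift = implied_vol(atm_price + 0.01, 100.0, 100.0, 0.25) - 0.20
    print("OTM IV shift:", otm_shift)
    print("ATM IV shift:", atm_shift)
```

The same one-cent move is a much larger IV move for the wing option because its vega is tiny, which is the trade-off between the two calibration targets.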

Congratulations with this very cool paper!

Alan · January 19th, 2018, 11:17 pm

Thanks for taking a look and the kind comments.

Yes, definitely a few iterations. By 3 months in, Yiannis had built a Corvette; by the end, it was a jet plane. Well, maybe I exaggerate a little.

You're right on [$]\bar{v}[$]. If the model wasn't misspecified, then even in the very low volatility environment of 2017, [$]\bar{v}[$] should be estimated the same as for our (higher vol) 2010 data set. Ditto for any constant parameter vol model.

Heston '93 often fitting a little better was a surprise, and I agree that the GARCH diffusion is better on the underlying. Also, the (better) Heston fits typically came with very low values for the so-called Feller condition ratio, which raises other "P vs Q" issues for the Heston model.

It's a subjective choice on what to calibrate to. We liked the IV's because (first) they're the same order of magnitude across all the options. Then, to the next approximation, they weight the deep-out-of-the-money puts more. That puts stress on the places where the (diffusion) models have a hard time fitting the market. But, agree, the IV's can get weird/noisy if you push this too far.

Billy7 · January 20th, 2018, 12:14 am

Thanks outrun, it was quite a bit of work all right! But mostly fun and working with Alan was a pleasure too

outrun · January 20th, 2018, 10:27 am

I can imagine!

It's probably the first time that someone has calibrated GARCH to implied vols, and that is very relevant: relevant in practice (the Corvette) and from a researcher's point of view (e.g. insight into parameter stability and fit).

I've seen people use GARCH for the underlying (in a risk setting) and Heston for the implied, but then when doing MC (for VaR analysis on portfolios with derivatives) the Heston calibration to implied vol scenarios was eventually dropped because of speed issues. I've seen that Yiannis' PDE method is extremely performant, so that might be very interesting for those types of applications. Having the ability to use the same model for underlying behaviour and implied gives you the option to study the relation between the two sets of parameters. If you find one, and if the relation is stable, then you can build a joint underlying+implied GARCH model.

Very cool! I'm still reading it and digging into the details.

Billy7 · January 20th, 2018, 4:48 pm

I've seen people use GARCH for the underlying (in a risk setting) and Heston for the implied, but then when doing MC (for VaR analysis on portfolios with derivatives) the Heston calibration to implied vol scenarios was eventually dropped because of speed issues. I've seen that Yiannis' PDE method is extremely performant, so that might be very interesting for those types of applications.

Well, to be honest, I think a Heston calibration using the semi-analytic solution is still faster (they are supposed to take a few seconds). Using Quantlib's semi-analytic Heston pricer with what seems to be the most robust setting (using Gauss-Lobatto integration), I don't get a few seconds for the 246-option chain, but it still calibrates faster than the GARCH/PDE setup (using Excel's solver for both).

Cuchulainn · March 1st, 2018, 2:44 pm

I reckon a yuge number of manhours went into that paper. Well done

This could be a great baseline for other ADI and FDM peoples. So that others can reproduce the results, you could provide the input data in XML, which can be read from any language.

A general remark: ADI (not my favourite for several reasons, but that is irrelevant in this context; tomorrow is another day) with mixed derivatives has always been a bone of contention.

The 'flaw'/workaround/whatever IMO with ADI is that the FDM scheme fails to preserve the ellipticity of the PDE unless all kinds of mesh adaptation tricks are used ... correct me if I have misunderstood.
Samarskii (who BTW rediscovered the Craig-Sneyd 1989 scheme in 1964) discusses monotone schemes here

http://samarskii.ru/articles/2002/2002-003ocr.pdf

In particular, inequality (3.21) is key and for constant coeff PDE (e.g. HW2) it becomes

rho (s1/s2) < h1/h2 < (s1/s2)/rho

This seems to be quite elegant and gives a nice M-matrix.
// It is possible to 'rotate' the PDE to remove mixed derivative but the domain becomes a trapezium..
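The mesh-ratio inequality quoted above is easy to check numerically. In the sketch below the reading of the symbols is an assumption, since the post does not define them: s1 and s2 are taken to be the constant volatilities of the two factors, h1 and h2 the constant mesh sizes in the two directions, and the absolute value of rho is used so that negative correlations can be fed in. The function names are hypothetical.

```python
def mesh_ratio_bounds(s1, s2, rho):
    """Return (lower, upper) bounds for h1/h2 from
    rho*(s1/s2) < h1/h2 < (s1/s2)/rho.

    Assumption: |rho| is used in place of rho; for rho == 0 the
    upper bound is taken to be infinite.
    """
    r = abs(rho)
    ratio = s1 / s2
    upper = float("inf") if r == 0.0 else ratio / r
    return r * ratio, upper


def mesh_ratio_ok(h1, h2, s1, s2, rho):
    """True if the constant mesh sizes h1, h2 satisfy the inequality."""
    lower, upper = mesh_ratio_bounds(s1, s2, rho)
    return lower < h1 / h2 < upper


if __name__ == "__main__":
    print(mesh_ratio_bounds(0.2, 0.1, -0.5))
    print(mesh_ratio_ok(0.02, 0.01, 0.2, 0.1, -0.5))
```

The admissible window for h1/h2 narrows as |rho| approaches 1, so strongly correlated factors leave little freedom in choosing the two mesh sizes independently.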

Billy7 · March 5th, 2018, 4:34 pm

Sorry, just saw this. Thank you, yes indeed, hundreds of unpaid hours:)
Yes, you're right, the discretization used does not guarantee that the solution always stays positive. But as I explain in the paper, I went for what I thought would be the most efficient way to deliver the final result. Yes, I had indeed seen the Samarskii paper (as well as the Sheppard thesis), since you have posted links to them a few times. But (3.21) is for an equation with no convection terms and for a uniform grid. For the GARCH pricing PDE on non-uniform grids, (4.24) from Samarskii should apply, but then you do have to use mesh adaptation tricks, i.e. you have to generate your grid in a special way, subject to the restriction imposed by that mathematical requirement (4.24) rather than by the "physical" characteristics of the problem. I chose against it, as I prefer to be able to place my grid points where I want better resolution. It may be that going the Samarskii route is better (I doubt it, but who knows), and it would be great if someone else came up with more efficient calibrations still.

Cuchulainn · March 6th, 2018, 8:30 am

I am trying to view these kinds of PDE/FDM from several viewpoints and to trigger discussions, e.g. the ADI with bells and whistles is fine but not easy for FDM noobies. For example, how easy would ADI be for _non_ Heston-style problems, e.g. a 2-factor Hull-White Bermudan callable bond, to take a random example?

I have not seen much on ADI for more general derivative problems. Most of the literature is very much focused on Heston. Curious, why no preprocessing x = log(S)?

Equation 4.24 is the condition for monotonicity for convection-diffusion, and Samarskii is essentially doing a kind of fitting. A better starting point is eq. 3.21: in a number of cases the quotients abs(k12)/k22 etc. reduce to constants (think of log-transformed Heston, basket, HW2 PDEs), so we can get a monotone scheme with constant meshes (it may not be what is needed for other pesky reasons).

Regarding your eq 30, what is the compelling reason for using it? It is not clear to me what problem(s) is solved by this transformation.

OPEN QUESTION:
P.S. Samarskii mentions that 4.24 can be solved but does not say _how_. I have no idea how I would tackle this problem. It is highly nonlinear because the coefficients must be evaluated at mesh points (which are themselves unknown).

Alan · March 6th, 2018, 4:04 pm

re (30), I can make a few comments and am sure Yiannis can provide more.

In general, with these stoch. vol. problems, there are various competing things that any good solver will likely want to incorporate and a lot of discretion in doing so. Here are 4, discussed at various points in the paper:

1. In the continuum, the V-state space is [$][0,\infty)[$] or [$](0,\infty)[$] depending on the model. For the Heston model, you definitely want V=0 in the state space, as V=0 is reachable under calibrated parameters. When V=0 is simply an entrance boundary (so not reachable from the interior, but the process can be started there), there is more discretion. This is the case for the GARCH diffusion. But, if you want to have a somewhat unified treatment of the two models (and perhaps the class [$]dV = (a - b V) dt + c(V) dW[$]), you will probably want V=0 on your grid -- whatever the grid. Then, the PDE is still "running" at V=0.

2. At the other end, you need very large [$]V_{max}[$] so that the probability mass in [$][0,V_{max})[$] is *very* close to 1.

3. It's desirable, for efficiency, to concentrate points near [$]V_0[$] as that is where you want the solutions, which may be very small (as in 2017).

4. Finally, it's very efficient to employ spatial Richardson extrapolation, which works better with a uniform grid spacing.

So, trying to accommodate these 4 competing interests led to the hybrid v-grid of (30). Not uniquely, of course.
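As a rough illustration of points 1 to 3 only, here is a generic sinh-stretched v-grid. To be clear, this is not the hybrid grid of (30) from the paper, which is not reproduced in this thread, and it does not address point 4. The name `sinh_grid` and the concentration parameter `c` are hypothetical; smaller `c` packs more points around [$]V_0[$].

```python
import math


def sinh_grid(v0, v_max, n, c):
    """Return n+1 increasing points on [0, v_max], densest near v0.

    The grid is the image of a uniform grid in xi under
    v = v0 + c * sinh(xi); c > 0 controls how tightly points
    cluster around v0 (an illustrative choice, not the paper's).
    """
    xi_min = math.asinh(-v0 / c)            # maps to v = 0
    xi_max = math.asinh((v_max - v0) / c)   # maps to v = v_max
    d_xi = (xi_max - xi_min) / n
    grid = [v0 + c * math.sinh(xi_min + i * d_xi) for i in range(n + 1)]
    grid[0], grid[-1] = 0.0, v_max          # pin the end points exactly
    return grid


if __name__ == "__main__":
    grid = sinh_grid(v0=0.04, v_max=4.0, n=50, c=0.02)
    spacings = [b - a for a, b in zip(grid, grid[1:])]
    print("smallest spacing:", min(spacings))
    print("largest spacing:", max(spacings))
```

V=0 is a grid point, [$]V_{max}[$] can be pushed far out at little cost because the spacing grows quickly away from [$]V_0[$], and the finest cells sit around [$]V_0[$]. Note that [$]V_0[$] itself is in general not a node of this grid.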

Billy7 · March 6th, 2018, 7:00 pm

I am trying to view these kinds of PDE/FDM from several viewpoints and to trigger discussions, e.g. the ADI with bells and whistles is fine but not easy for FDM noobies. For example, how easy would ADI be for _non_ Heston-style problems, e.g. a 2-factor Hull-White Bermudan callable bond, to take a random example?

What does that HW PDE look like? How is it different?

I have not seen much on ADI for more general derivative problems. Most of the literature is very much focused on Heston. Curious, why no preprocessing x = log(S)?

Well, now you have one more paper which is not on Heston:). But I'm sure I've seen papers using ADI for the (3D) Heston Hull-White PDE, for example. Or do you mean something different? As mentioned in the paper, the log(S) transformation is used but is generally less accurate when discretized, so it is not my first choice.

Regarding your eq 30, what is the compelling reason for using it? It is not clear to me what problem(s) is solved by this transformation.

A non-uniform grid is absolutely necessary here for the reasons Alan mentioned. The hybrid construction described by (30) leads to more regular convergence and hence to better Richardson Extrapolation results (RE is based on the premise that the convergence order is the theoretical one, 2 in this case). One needs to be in the asymptotic range of course, which means some minimum spatial resolution is needed for RE to "work" (and so to be of benefit). There are comments on this in the paper. In general it works pretty well and I think this is one of the most interesting contributions in the paper, as I haven't seen any other paper in finance using spatial RE. Have you? It's a good alternative I think to 4th order schemes (certainly much simpler) and I would guess of comparable accuracy. The accuracy gains of higher (4th+) order FD schemes also increase with increasing resolution, and their performance may not be better than that of a 2nd order scheme when a low resolution is used (same as RE).
Another interesting thing is that a 4th order (observed) order of convergence is possible with RE, despite the discontinuous initial conditions (the vanilla payoff). That's with the simple smoothing proposed in the paper. I remember the OP (JohnLeM) saying that this is difficult. Which it is, say with the "old" 5-pt stencil 4th order FD schemes, but I remember reading that it is possible when a compact 4th order scheme is used, together with some sophisticated way to smooth the initial payoff. See for example a few papers by During like this one: https://arxiv.org/abs/1512.02529v1
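The mechanics of RE for a 2nd order approximation fit in a few lines. The toy below is not the paper's PDE solver: it applies the standard combination (4 x fine - coarse)/3 to a central second difference of exp at x = 0, purely to show the observed order and the gain. In the PDE setting the same combination would be applied to solutions computed with spacings h and h/2 at a node shared by both grids. Function names are made up for illustration.

```python
import math


def second_difference(f, x, h):
    """Central second difference, a 2nd order approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def richardson(coarse, fine, order=2):
    """Combine results at spacing h (coarse) and h/2 (fine), assuming the
    leading error term is proportional to h**order."""
    factor = 2.0 ** order
    return (factor * fine - coarse) / (factor - 1.0)


exact = 1.0  # second derivative of exp at x = 0
h = 0.1
coarse = second_difference(math.exp, 0.0, h)
fine = second_difference(math.exp, 0.0, h / 2.0)
extrapolated = richardson(coarse, fine)

# Observed order from the two raw results; it should be close to 2
# once h is in the asymptotic range.
observed_order = math.log2(abs(coarse - exact) / abs(fine - exact))

if __name__ == "__main__":
    print("error (h):    ", abs(coarse - exact))
    print("error (h/2):  ", abs(fine - exact))
    print("error (RE):   ", abs(extrapolated - exact))
    print("observed order:", observed_order)
```

If the observed order is not close to the assumed one (too coarse a grid, or an unsmoothed payoff kink), the combination can make things worse instead of better, which is the "asymptotic range" caveat above.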

OPEN QUESTION:
P.S. Samarskii mentions that 4.24 can be solved but does not say _how_. I have no idea how I would tackle this problem. It is highly nonlinear because the coefficients must be evaluated at mesh points (which are themselves unknown).

Yes, that's partly why I didn't try it:) I think what he means is that constructing a non-uniform grid that satisfies this constraint is possible. I think an equivalent approach (and a way to do it) is presented in one of the references, by Ikonen & Toivanen : http://users.jyu.fi/~tene/papers/reportB12-05.pdf

Cuchulainn · March 6th, 2018, 9:14 pm

The Hull White PDE is in the pdf. I am coaching a quant who 4 weeks ago knew no FDM. Now he has ADE (Barakat-Clark, Saul'yev variants), Soviet Splitting, Samarskii and a bunch of MOL variants. Bermudan calls + calibrated [$]\theta(t)[$]. All methods give the same results, with and w/o exponential fitting.
The PDE is kind of benign in the sense that Dirichlet BCs seem to be good. No degeneracy on the boundary. And Saul'yev is super fast and 2 lines of code (well, more or less).
Until now we use constant meshes using Samarskii's constraints. Also Yanenko and this seems to give similar results. Once finished, we wish to stress test for a range of parameters.

How would ADI be used in this case?

Cuchulainn · March 6th, 2018, 9:24 pm

Regarding RE in space, I am not sure how robust it is. Here is a compact ADI scheme that the authors claim is 6th order (caveat: I have not checked, I leave that to ADI users).
https://ac.els-cdn.com/S089812210600322 ... 219cd88840

Might be a pointer.

// I used RE for [$]u_{xy} = f[$] with constant meshes and it works very well. But not for 2nd order elliptic PDE. Just feels odd to me.

Billy7 · March 6th, 2018, 10:07 pm

How would ADI be used in this (HW2D) case?

I see no structural difference from the SV PDE's, so the same way as in the paper, or the papers referenced therein I guess. Not difficult I think. For the spatial discretization one can use whatever they like, the one proposed by Samarskii for example.

Regarding RE in space, I am not sure how robust it is.
// I used RE for [$]u_{xy} = f[$] with constant meshes and works very well. But not for 2nd order elliptic PDE. Just feels odd to me.

The only way to be more sure is to try it. The good thing is that it's so cheap time-wise to try (unlike those higher order compact schemes). Why does it feel odd for 2nd order elliptic PDE? Imo it's not the type of the PDE as much that can affect its effectiveness, as the presence of singularities/discontinuities. I've tried it and it works very well indeed for 2D elliptic PDE's, especially if you use uniform grids. It depends on the accuracy you want, so you try and see if it gives you an advantage for the accuracy you seek and if it's robust or not.

Cuchulainn · March 7th, 2018, 9:55 am

Compact schemes are indeed a bit more work.

A while back I collaborated with Alex Levin on a HW2-type PDE (no correlation). Long story short, we were unable to get ADI to work and went for Marchuk's 1-2-2-1 dimensional splitting method. Maybe someone at some stage could try it for Heston.

http://citeseerx.ist.psu.edu/viewdoc/do ... 1&type=pdf

//
On another note, when finishing my PhD in 1980 I compared my fitted method against Jan Verwer's RK method. JV was prof in UVA and he lived in the next village to mine. He retired a few years back; two weeks later he passed.
