Serving the Quantitative Finance Community

 
mizhael
Topic Author
Posts: 0
Joined: September 25th, 2005, 4:46 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 21st, 2008, 6:25 pm

Hi all,

Has anybody compared the performance of Bayesian MCMC vs. optimal filtering in the stochastic volatility setting? The article "Filtering in Finance" discusses several filtering techniques, especially particle filters. At the end of the day there are certain approximations involved in using filtering, and filtering is actually a very mechanical procedure. On top of the filter there is still a need for MLE, and MLE involves optimization, which is hard for high-dimensional objective functions: the inner loop of the optimization is numerical and approximation-based, which forces the use of non-gradient-based optimization methods.

On the other hand, Bayesian MCMC can be used in the stochastic volatility setting too. For simple models, such as fitting data to a simple Poisson model, Bayesian MCMC is arguably inferior to MLE, because MCMC is itself a Monte-Carlo-type approximation. But in the stochastic volatility setting, where MLE + filtering may run into trouble, will Bayesian MCMC outperform MLE + filtering? Has anybody done a comparative study of these two methods, in terms of estimating both the parameters and the latent variables?

Any thoughts? Thanks!
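For concreteness, here is a minimal sketch (in Python/NumPy) of the kind of "filter + likelihood" machinery the question refers to: a bootstrap particle filter that returns a Monte Carlo estimate of the log-likelihood for a basic discrete-time log-volatility model. The model, parameter names and particle count are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def pf_loglik(y, mu, phi, sigma_eta, n_particles=500, seed=0):
    """Bootstrap particle filter log-likelihood for a basic SV model
    (an illustrative assumption, not a specific paper's model):
        h_t = mu + phi * (h_{t-1} - mu) + sigma_eta * eps_t
        y_t = exp(h_t / 2) * e_t,    eps_t, e_t ~ N(0, 1)
    Returns an approximate (Monte Carlo) log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    # initialise particles from the stationary distribution of h
    h = rng.normal(mu, sigma_eta / np.sqrt(1.0 - phi**2), n_particles)
    loglik = 0.0
    for yt in y:
        # propagate all particles through the state equation
        h = mu + phi * (h - mu) + sigma_eta * rng.normal(size=n_particles)
        # weight by the observation density N(0, exp(h)), in log space
        logw = -0.5 * (np.log(2 * np.pi) + h + yt**2 * np.exp(-h))
        m = logw.max()
        w = np.exp(logw - m)           # stabilised weights
        loglik += m + np.log(w.mean())  # incremental likelihood contribution
        # multinomial resampling
        h = rng.choice(h, size=n_particles, p=w / w.sum())
    return loglik
```

This is the quantity that then sits inside the outer MLE loop, which is exactly why each optimization step is both expensive and noisy.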
 
Rez
Posts: 24
Joined: May 28th, 2003, 9:27 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 21st, 2008, 8:52 pm

Quote: "Has anybody compared the performance of Bayesian MCMC vs. Optimal Filtering in the stochastic volatility setting?"
Continuous or discrete time? Any particular process? What do you mean by 'optimal filtering'?

Quote: "At the end of the day, there are certain approximations in using filtering."
What do you mean by approximations?

Quote: "And filtering is actually a very mechanical procedure."
I don't agree. Why do you say that?

Quote: "On top of the filters, there is still a need for MLE. And MLE has optimization, which is hard for high-dimensional objective functions with sophisticated numerical evaluation and approximation."
I am not sure what you mean. Optimization is a separate issue with separate algorithms. Bayesian methods need to converge as well.

Quote: "In fact, Bayesian MCMC is inferior to MLE for simple models."
Inferior by which standard? Why is that 'in fact'?

Quote: "However, I am not sure in the stochastic volatility setting, when MLE + filtering may have trouble, will Bayesian MCMC outperform MLE + filtering?"
That's a very general statement. Both methods need fine tuning. A crap MCMC will be outperformed by a correctly built MLE, and vice versa. Typically someone who writes a paper on MCMC cannot set up and implement MLE correctly, and the opposite is true for MLE folks.

K
 
mizhael
Topic Author
Posts: 0
Joined: September 25th, 2005, 4:46 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 23rd, 2008, 1:37 am

I was referring to the setup in those papers. In fact, due to my limited knowledge, I am not very sure how discrete vs. continuous time changes the picture here.

For example, in a linear Gaussian setting, Kalman filtering is optimal; particle filters involve approximation. You follow a recipe to do it. MLE is bad because it often converges to a local optimum, and also you don't have enough data to obtain the asymptotic gains.

I am asking: in cases where both methods are available, are there any comparative studies in the literature?
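To make the linear Gaussian point concrete: in that setting every filtering quantity is available in closed form, so the likelihood is exact rather than Monte Carlo. A minimal scalar Kalman filter sketch (the model and parameter names are illustrative assumptions):

```python
import numpy as np

def kalman_loglik(y, a, c, q, r, m0=0.0, p0=1.0):
    """Exact log-likelihood for a scalar linear Gaussian state-space model:
        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
        y_t = c * x_t + v_t,      v_t ~ N(0, r)
    Unlike a particle filter, nothing here is simulated:
    every quantity is computed in closed form."""
    m, p = m0, p0
    loglik = 0.0
    for yt in y:
        # predict step
        m, p = a * m, a * a * p + q
        # innovation and its variance
        v = yt - c * m
        s = c * c * p + r
        loglik += -0.5 * (np.log(2 * np.pi * s) + v * v / s)
        # update step
        k = p * c / s
        m, p = m + k * v, (1.0 - k * c) * p
    return loglik
```

Two calls with the same inputs give the same number to machine precision, which is the sense in which "an optimal filter would not approximate".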
 
Rez
Posts: 24
Joined: May 28th, 2003, 9:27 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 23rd, 2008, 8:04 am

Quote: "For example, in a linear Gaussian setting, Kalman filtering is optimal. Particle filters involve approximation."
Exactly. So you cannot say "at the end of the day, there are certain approximations in using filtering", as this is not universally true. Kalman is not an approximation. An optimal filter would not approximate.

Quote: "You follow a recipe to do it."
If you see it as a recipe, it means that you don't understand what's going on and why.

Quote: "MLE is bad because it often converges to a local optimum, and also you don't have enough data to obtain the asymptotic gains."
Is that a problem of MLE, or of the maximization routine you use? If you don't have enough data, no procedure in the world will help you; it might only mislead you into believing that it does.

K
 
JediQuant
Posts: 0
Joined: October 14th, 2005, 3:28 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 23rd, 2008, 10:58 am

Apologies for the intrusion; here's a related question from me.

I have recently acquired Javaheri's book, 'Inside Volatility Arbitrage'. I must say the book has proved to be really useful and is quite beautifully written. My question is whether anyone here has implemented any of the particle filters he gives for estimating jump diffusions with stochastic volatility? I have tried to, but it takes so much time! My dataset has 4,000 values, and with 1,000 particles each function evaluation takes ages. I am using the Nelder-Mead algorithm for optimisation, but that runs into problems.

What I normally do with MLE is use many different random starting values and then select the set which gives the highest likelihood. However, with the particle filtering method, evaluating the function takes so much time that using multiple starting values seems quite absurd. In fact, in the examples in Javaheri's book he tends to use starting parameter values which are quite close to the actual ones (he plays with simulated data).

Any thoughts on this? Are there any faster algorithms? Any references people can point me towards? Many thanks!
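One thing that sometimes helps Nelder-Mead on a particle-filter likelihood is fixing the random seed across evaluations (common random numbers), so the objective becomes a deterministic function of the parameters instead of jittering with every call. A toy sketch: the objective below is a made-up stand-in for a particle-filter negative log-likelihood, not Javaheri's code.

```python
import numpy as np
from scipy.optimize import minimize

def noisy_objective(theta, seed=None):
    """Toy stand-in for a particle-filter negative log-likelihood:
    the true minimum is at theta = 2, but each evaluation carries
    Monte Carlo noise, like resampling noise in a particle filter.
    Passing a fixed seed makes the function deterministic in theta."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 0.05, 200).mean()
    return (theta[0] - 2.0) ** 2 + noise

# Common random numbers: re-using the SAME seed on every call gives
# Nelder-Mead a smooth surface; a fresh seed per call gives a jittery
# one on which the simplex can stall far from the optimum.
res = minimize(noisy_objective, x0=[0.0], args=(123,), method="Nelder-Mead")
```

With the seed fixed, the noise is the same on every call, so the simplex sees a clean quadratic and converges to the true minimum.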
 
mizhael
Topic Author
Posts: 0
Joined: September 25th, 2005, 4:46 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 23rd, 2008, 4:26 pm

Quote: Originally posted by Rez (above).

Please be advised that I was talking about "particle filters" being approximate, not Kalman filtering in its optimal use. As for MLE: that there is no guarantee of reaching the global optimum is a well-known fact.
 
Rez
Posts: 24
Joined: May 28th, 2003, 9:27 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 24th, 2008, 7:42 am

Quote: Originally posted by JediQuant (above).

Jedi, this is an inherent problem of particle filters and other such methods: they are just expensive to compute. There is nothing you can do about it.

The trick of starting the re-estimation from the simulated values is old, and routinely used to pump up the results and make them look good. You can only imagine how long the optimization would take if you started randomly! And using the simplex method with a relatively large stopping criterion will make the optimization stop relatively quickly around the starting values, which are chosen to be the true ones... hmmm... I would bet money that if you randomize the starting points, particle filters will be as good/crap as anything else. Also, playing with simulated data does not address the robustness of the method. In reality the model is only an approximation, and when there is misspecification you cannot foresee what the biases will be. If you run an experiment where there are a few jumps in the series, or if you bootstrap the noise rather than making it Gaussian, your filter will break down.

Anyway, using simplex to optimize an expensive function is absurd, but I guess with an objective function that is full of nuisance parameters you are trapped between a rock and a hard place. A few years ago I was using NPSOL and was very happy with it, as it struck a good balance between speed and robustness. But it is (was?) in Fortran 77, and you have to spend time to understand what it does. Using a number of starting points is a way to go: what I do now is use a heuristic method first to locate a starting point, and then a hill-climbing algorithm to optimize locally if needed. Keeping the points visited by the heuristic also helps me say something about the standard errors of the MLE.

Which brings me to another point: performing MLE is pointless without getting robust standard errors. Many people use BFGS or BHHH or variants of other methods that involve gradients and estimates of the Hessian; this is what Matlab does, for example. I found that the Hessians that come out of these procedures are total rubbish for constructing confidence intervals. They are not built to be accurate; they just give a rough idea of the next hill-climbing step size.

Apologies for the long post.
Kyriakos
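On the standard-error point: one cheap sanity check is to recompute the Hessian at the optimum by central finite differences, rather than trusting the running approximation a quasi-Newton routine leaves behind. A minimal sketch (the step size and the test usage are arbitrary choices, not a recommendation for any particular likelihood):

```python
import numpy as np

def central_hessian(f, theta, h=1e-4):
    """Central-difference Hessian of f at theta.
    BFGS-style running Hessian approximations are tuned for choosing
    the next step, not for accuracy; for MLE confidence intervals,
    invert an explicitly recomputed Hessian of the NEGATIVE
    log-likelihood at the optimum instead."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = h
            e_j = np.zeros(n); e_j[j] = h
            # standard four-point central second-difference formula
            H[i, j] = (f(theta + e_i + e_j) - f(theta + e_i - e_j)
                       - f(theta - e_i + e_j) + f(theta - e_i - e_j)) / (4 * h * h)
    return H
```

Standard errors then come from the square roots of the diagonal of the inverse Hessian, which at least reflects the curvature of the actual objective rather than an optimizer's internal bookkeeping.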
 
Rez
Posts: 24
Joined: May 28th, 2003, 9:27 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 24th, 2008, 7:57 am

Quote: "MLE has no guarantee of global optimum, this is a well-known fact."
It might be a well-known fact in your circle, but I have never heard of it before, and calling it well known does not make it true. What exactly do you mean: that the likelihood function does not exhibit a global maximum, or that your optimization method cannot find it?

The MLE of a linear regression model has a global maximum, which coincides with least squares. Therefore we cannot say that a global maximum does not exist. If your optimization method gets trapped in local maxima, then use a different method; it is not an MLE problem.

In stochastic volatility models you can even find pathological points where the likelihood goes to infinity... how's that for a global maximum? Just take the drift equal to the first point and the initial volatility equal to zero. The other parameters are irrelevant. Estimation done.

K
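The pathology is easy to see numerically: with the mean set equal to the first observation, the Gaussian log-density of that point grows without bound as the volatility shrinks, so the sample log-likelihood has no finite global maximum. A toy illustration (the observation value is made up):

```python
import numpy as np

# Hypothetical first observation; set the model "drift" equal to it,
# so the residual (y1 - y1) is exactly zero.
y1 = 0.42

def gauss_logpdf_at_mean(sigma):
    # log N(y1; mean=y1, sd=sigma) = -0.5 * log(2 * pi * sigma^2),
    # since the squared-residual term vanishes
    return -0.5 * np.log(2 * np.pi * sigma**2)

for s in (1.0, 0.1, 0.001):
    print(f"sigma = {s}: log-density of first point = {gauss_logpdf_at_mean(s):.2f}")
```

Each shrinking of sigma raises the log-density of that one point without limit, regardless of how badly the rest of the sample fits, which is exactly the degenerate "global maximum" described above.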
 
mizhael
Topic Author
Posts: 0
Joined: September 25th, 2005, 4:46 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 24th, 2008, 9:58 pm

Quote: Originally posted by JediQuant (above).

Have you made the example C++ code in his book work? I think that code is reasonably fast, because the dimension of the problem is low and the scale of the problem is still small.
 
JediQuant
Posts: 0
Joined: October 14th, 2005, 3:28 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 25th, 2008, 10:44 pm

Many thanks for your comments, Rez! I agree with what you said: particle filters will probably turn out to be as good/crap as anything else with randomly chosen starting values. In fact, Javaheri does do a lot of simulation exercises and does demonstrate that one doesn't always get parameters close to the actual ones with all of these various filters, even though the simulations match the sample moments.

I am using R for my optimisations, and it does take a lot of time. I gave up on the Nelder-Mead algorithm ages ago; R does come with a couple of others, like BFGS, CG and something called the 'SANN' method (simulated annealing), and I am presently playing around with that.

I have also come across a few review papers on MCMC by the main researchers (Johannes, Polson, Stroud); these seem really interesting.
 
JediQuant
Posts: 0
Joined: October 14th, 2005, 3:28 pm

Bayesian MCMC vs. Optimal Filtering for stochastic volatility?

April 25th, 2008, 10:50 pm

Quote: Originally posted by mizhael (above).

I have translated his C++ code into R. The model I am playing around with is a bit different from his; nonetheless, I don't think I have made any mistake, and the dimension of my problem is the same as his. Essentially the filter has two for loops: I have 4,000 data points, and for each one of these it has to loop through my chosen number of particles. He uses 1,000 particles in his book, so with that number my function becomes computationally very demanding (4,000 x 1,000)!

I guess these methods are by default very computationally demanding, and for a large dataset this becomes an issue. Or perhaps it's because I am using R and should really be using C++, although I am not sure how much of a difference this would make.
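On the two-loop structure: in an interpreted language like R (or Python), most of the cost of a literal translation is the inner per-particle loop, which can usually be replaced by whole-vector operations with no change in the result. A sketch in Python/NumPy; the weight formula assumes a basic log-volatility observation density purely for illustration:

```python
import numpy as np

def weights_loop(h, yt):
    """Per-particle loop, as in a literal translation of textbook
    pseudocode: one observation-density weight per particle."""
    out = np.empty(len(h))
    for i in range(len(h)):
        # unnormalised N(0, exp(h_i)) density of yt, constants dropped
        out[i] = np.exp(-0.5 * (h[i] + yt**2 * np.exp(-h[i])))
    return out

def weights_vec(h, yt):
    """The same computation on the whole particle vector at once;
    the interpreter sees one expression instead of n_particles of them."""
    return np.exp(-0.5 * (h + yt**2 * np.exp(-h)))
```

Only the outer loop over the 4,000 time steps is inherently sequential (resampling depends on the previous step), so vectorising the particle dimension is typically where the time goes.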