Regression

barryyu · September 23rd, 2007, 11:14 am

I am a bit confused about which quantities we should run a regression on. I have seen regressions run on the differences P(t) - P(t-1) or on log P(t), but also on the percentage change P(t)/P(t-1) - 1 or on log[P(t)/P(t-1)].

In theory, which one is better, and under which circumstances? In the former case (price differences or log prices), should we also normalize the data (for example, indexed to 100 on a particular date)?

I would really appreciate it if someone could shed some light on this!

Regards,

msperlin · September 23rd, 2007, 10:14 pm

Quote (originally posted by barryyu): "I am a bit confused about the quantities that we run regression on... I have seen regression run on the differences P(t) - P(t-1) or log P(t), but also on % change P(t)/P(t-1) - 1 or log[P(t)/P(t-1)]. In theory which one is better (under which circumstances as well)? If we do the former case, should we also normalize the data (indexed at 100 on a particular date)? I really appreciate it if someone could shed some light on this! Regards,"

From my short academic experience, the only one I have never seen used is P(t) - P(t-1). Log P(t) is usually used in cointegration tests, while P(t)/P(t-1) - 1 and log[P(t)/P(t-1)] are both just returns. The log formula is far more common because log returns are easier to handle in research (for example, they add up across periods).

If you are doing academic research, it is important to speak the same language as everyone else, so check the papers on the subject and keep the same unit so that you can compare results.

In practical research, whether you work with P(t) - P(t-1), log P(t), P(t)/P(t-1) - 1 or log[P(t)/P(t-1)] is not really important, because the real "juice" is the pattern that you are trying to pick up. The pattern will exist in any of the transformed series, since they all take the price vector as input, but of course it will be shaped differently depending on the transformation.

There is no right answer to your question. If you have to choose, go with log returns.
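
As a rough illustration of the four transformations discussed in this thread, the sketch below computes each of them from a price vector and checks why log returns are often considered easier to handle: they sum across periods, whereas simple returns compound. The price series and the function names are made up for the example; nothing here comes from a particular library.

```python
import math

# Hypothetical price series, chosen only for illustration.
prices = [100.0, 101.0, 99.5, 102.0, 103.5]

def price_differences(p):
    """P(t) - P(t-1)."""
    return [b - a for a, b in zip(p, p[1:])]

def log_prices(p):
    """log P(t)."""
    return [math.log(x) for x in p]

def simple_returns(p):
    """P(t)/P(t-1) - 1."""
    return [b / a - 1.0 for a, b in zip(p, p[1:])]

def log_returns(p):
    """log[P(t)/P(t-1)]."""
    return [math.log(b / a) for a, b in zip(p, p[1:])]

diffs = price_differences(prices)
logp = log_prices(prices)
simple = simple_returns(prices)
logr = log_returns(prices)

# Log returns are additive: their sum equals the log return over the whole period.
total_log_return = sum(logr)
whole_period_log_return = math.log(prices[-1] / prices[0])

# Simple returns compound multiplicatively instead of adding.
compounded = 1.0
for r in simple:
    compounded *= 1.0 + r
compounded -= 1.0
whole_period_simple_return = prices[-1] / prices[0] - 1.0
```

Each log return is just log(1 + simple return), so for small price moves the two are numerically close; the difference matters mainly when aggregating over time or when moves are large.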

Colossus2420 · September 25th, 2007, 1:27 pm

Msperlin,

Great help. I've often wondered about that as well.

Now the next question is about regression itself. There seem to be numerous methods of running a regression. If I want to run multiple regressions, where should I start, and with which method?

msperlin · September 25th, 2007, 2:44 pm

Quote (originally posted by Colossus2420): "Msperlin, Great help. I've often wondered about that as well. Now the next question is about regression. It seems there are numerous methods of running regression and if I want to run multiple regressions, where should I start? With which method?"

What do you mean by multiple regressions? Is it a multivariate regression? A vector autoregression? Can you be more specific?

Regarding methods, if you are dealing with simple models that have an equation only for the conditional mean, least squares is suitable. The point is simply to minimize the sum of squared errors of the model, in other words, to minimize the residual variance.

If you are modeling both the conditional mean and the conditional variance (a.k.a. ARCH models), and perhaps conditional kurtosis as well, the most general method is maximum likelihood. The idea is quite neat: choose the parameters that maximize the probability (likelihood) of the observed data under the assumed distribution.

Under some conditions, for instance normally distributed errors with constant variance, the maximum likelihood solution is the same as the least squares solution, since both minimize the sum of squared errors.
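
The equivalence mentioned in the last paragraph can be checked numerically. The sketch below is only an illustration under stated assumptions: a single-regressor model y = a + b*x, normally distributed errors with constant variance, and a small made-up data set. It fits the line by closed-form least squares and then confirms that moving the coefficients away from the least squares estimates lowers the Gaussian log-likelihood. All function and variable names are hypothetical.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

def ols_fit(x, y):
    """Closed-form least squares estimates for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sse(a, b, x, y):
    """Sum of squared errors of the fitted line."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def gaussian_loglik(a, b, sigma2, x, y):
    """Log-likelihood assuming i.i.d. normal errors with variance sigma2."""
    n = len(x)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sse(a, b, x, y) / (2.0 * sigma2)

a_hat, b_hat = ols_fit(x, y)
sigma2_hat = sse(a_hat, b_hat, x, y) / len(x)  # ML estimate of the error variance
best_loglik = gaussian_loglik(a_hat, b_hat, sigma2_hat, x, y)

# For a fixed variance, the log-likelihood depends on (a, b) only through
# the sum of squared errors, so any move away from the OLS estimates lowers it.
perturbed_logliks = [
    gaussian_loglik(a_hat + da, b_hat + db, sigma2_hat, x, y)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.05, 0.0, 0.05)
    if (da, db) != (0.0, 0.0)
]
```

For models with a time-varying conditional variance, such as the ARCH family mentioned above, there is no such closed form and the likelihood is normally maximized numerically.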

quantmeh · September 26th, 2007, 2:43 am

Quote (originally posted by barryyu): "I have seen regression run on the differences P(t) - P(t-1) or log P(t), but also on % change P(t)/P(t-1) - 1 or log[P(t)/P(t-1)]. In theory which one is better (under which circumstances as well)?"

You need a really basic book on regression. None of them is better than the others. If you are talking about linear regression, then these are ways to linearize nonlinear models.

If your model is something like P(t) = a + b t, then P(t) is the best choice. If it is like P(t) = a exp(-b t), then log P(t) is better, since log P(t) = log(a) - b t is linear in t, and so on.
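
A minimal sketch of the linearization idea, assuming the exponential model P(t) = a exp(-b t) from the post: taking logs gives a straight line in t, which ordinary least squares can fit, and the original parameters are then recovered from the intercept and slope. The parameter values, the noise-free data and the helper name fit_line are assumptions made for the example.

```python
import math

# Hypothetical parameters for P(t) = a * exp(-b * t), for illustration only.
a_true, b_true = 50.0, 0.3
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
p = [a_true * math.exp(-b_true * ti) for ti in t]  # noise-free by construction

def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# log P(t) = log(a) - b * t, so regress log P(t) on t.
intercept, slope = fit_line(t, [math.log(pi) for pi in p])
a_est = math.exp(intercept)
b_est = -slope
```

With noisy data the fit in log space would implicitly treat the errors as multiplicative in P(t), which may or may not suit a given problem; that choice is part of deciding which transformation to regress on.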