Computing correlation with missing data / unsynchronised data

sunil100 · August 3rd, 2005, 11:56 am

I have two time series: the FTSE 100 and the S&P 500 stock indices. I want to compute their correlation coefficient, but I am concerned about the unsynchronised nature of the data: the FTSE 100 trades in London and the S&P 500 in the US, so the prices of the latter are lagged by 5 hours. Can I just naively compute the correlation as normal, or is there a way to address this problem?

Also, how do I deal with missing data? Is it best to drop days with no prices available, or is it better to interpolate between days?

Finally, when computing the correlation, should I adjust for the drift, or is it OK to assume that the mean of the log-returns is zero?

APD · August 4th, 2005, 5:08 pm

It is generally sensible to make sure that you are comparing time-synchronised data. If you think about it, it would not make much sense to compare 9am numbers from the UK with 9am numbers from the US; far better to compare the US 9am numbers with 2pm numbers from the UK, or something similar. As for missing data, interpolating between missing dates (i.e. when there has been a holiday in one location but not the other) is again not a particularly good idea. With the data you describe you would have no difficulty finding several years' worth of data, so the odd dropped data point is not going to adversely affect your results.
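
A minimal sketch of the drop-the-missing-days approach, in Python with only the standard library. The function names and the prices are made up for illustration (they are not real index levels), and the sketch assumes each series is a dict keyed by ISO date string and already sampled at comparable times of day. It keeps only dates present in both series, takes log-returns between consecutive common dates, and computes the usual sample correlation, which subtracts the sample means.

```python
import math

def common_date_log_returns(prices_a, prices_b):
    """Keep only dates present in both series, then take log-returns
    between consecutive common dates (no interpolation)."""
    dates = sorted(set(prices_a) & set(prices_b))  # ISO strings sort chronologically
    pairs = list(zip(dates, dates[1:]))
    ret_a = [math.log(prices_a[d1] / prices_a[d0]) for d0, d1 in pairs]
    ret_b = [math.log(prices_b[d1] / prices_b[d0]) for d0, d1 in pairs]
    return dates[1:], ret_a, ret_b

def correlation(x, y):
    """Sample (Pearson) correlation; the sample means are subtracted."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical prices, for illustration only.
ftse = {"2005-07-25": 5250.0, "2005-07-26": 5270.0, "2005-07-27": 5260.0,
        "2005-07-28": 5290.0, "2005-07-29": 5280.0}
# No price on 2005-07-27 in the second series (e.g. a local holiday).
spx = {"2005-07-25": 1230.0, "2005-07-26": 1233.0,
       "2005-07-28": 1240.0, "2005-07-29": 1236.0}

dates, r_ftse, r_spx = common_date_log_returns(ftse, spx)
rho = correlation(r_ftse, r_spx)
print(dates, rho)
```

Because the missing day is dropped from both series, the return that spans the gap is a two-day return in each series, so the pairs stay aligned.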