September 14th, 2007, 12:07 pm
1. Watch out for overfitting. In any large dataset, there will be patterns that occur by chance. Make sure you retest any relationship you find against out-of-sample data (preferably new data that you've never tested on before). You might also want to construct a randomized dataset, cross-correlate that with your lead-lag system, and study the distribution of the correlation coefficients. Testing on randomized data will give you a feel for when a correlation coefficient is abnormally good.

2. Watch out for transaction costs. Even if the correlation coefficient is great, the amount of explained price variance (in currency units) may be too small to cover the transaction costs of trading. Paradoxically, you may get higher profits by trading a lower-correlation setup on a higher-variance dataset than by trading a higher-correlation setup on a lower-variance dataset.

3. Watch out for latency and jitter issues. You may find that lag = X works best, but in implementing a trading system you have some unavoidable lags in getting the data, cleaning it, processing it, generating orders, submitting them, waiting for them to appear on the exchange, and waiting for them to execute. The system will need to accommodate these delays. Unfortunately, some of these delays vary, which means you are really trading a system with lag +/- delta, where delta is a highly skewed random variable.
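The randomized-data check from point 1 could be sketched roughly as below. This is only an illustration, not a definitive method: it assumes the two series are returns (not price levels), it randomizes by plain shuffling (which also destroys any autocorrelation in the leader), and the names `lagged_corr`, `null_distribution`, and `empirical_p_value`, as well as the synthetic data, are made up for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_corr(leader, follower, lag):
    """Correlation of leader[t] with follower[t + lag], for lag >= 1."""
    return pearson(leader[:-lag], follower[lag:])


def null_distribution(leader, follower, lag, n_shuffles=1000, seed=0):
    """Lagged correlations obtained after shuffling the leader series.

    Shuffling breaks any real lead-lag link, so these values show what
    pure chance produces on data of this size.
    """
    rng = random.Random(seed)
    shuffled = list(leader)
    corrs = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        corrs.append(lagged_corr(shuffled, follower, lag))
    return corrs


def empirical_p_value(observed, null_corrs):
    """Share of shuffled correlations at least as extreme as the observed one."""
    extreme = sum(1 for c in null_corrs if abs(c) >= abs(observed))
    return (extreme + 1) / (len(null_corrs) + 1)


# Synthetic (hypothetical) data: the follower echoes the leader two steps later.
data_rng = random.Random(42)
leader = [data_rng.gauss(0, 1) for _ in range(500)]
follower = [0.0, 0.0] + [0.5 * leader[t] + data_rng.gauss(0, 1) for t in range(498)]

observed = lagged_corr(leader, follower, lag=2)
null_corrs = null_distribution(leader, follower, lag=2)
p_value = empirical_p_value(observed, null_corrs)
print(f"observed corr = {observed:.3f}, empirical p = {p_value:.4f}")
```

If your real coefficient sits well inside the spread of `null_corrs`, it is probably one of the chance patterns. If you scanned many lags or many instrument pairs to find it, compare against the best shuffled result across the same scan, not against a single lag.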