August 4th, 2004, 5:54 pm
Let's suppose you are trying to beat a roulette wheel. A perfect wheel has a 1/38 chance for each number. But say the wheels in one casino are slightly off balance: 28 of the numbers come up with probability 0.025 each and 10 of the numbers come up with probability 0.030 each.

You record 200 spins of such a wheel, in which case you expect each of the 28 low-probability numbers to come up 5 times and each of the 10 high-probability numbers to come up 6 times. So you decide to bet in the future on every number that came up 6 or more times in the sample.

Your backtest will show, on average, a 33% return on this strategy. But if you actually try it in the future, you will get, on average, a -4% return.

Why? The trouble is that you are using the same data both to decide which numbers to bet on and to evaluate the strategy. 38% of the low-probability numbers will have 6 or more hits in 200 spins, and 44% of the high-probability numbers will have 5 or fewer. So in the future you will be betting on roughly twice as many low-probability numbers (38% x 28) as high-probability ones (56% x 10).

However, even if you correctly identified the high-probability numbers and bet on only those 10, you would get only an 8% average return. The 33% from the backtest comes from the few numbers that had a lot of hits in the sample. The odds are better than even that some number will come up 10 or more times in 200 spins. Your technique will pick this number to bet on in the future, and project that it will come up 5% of the time for an 80% profit (0.05 x 36 - 1). The actual return on betting this number going forward is 3%.

Your backtesting misleads you in two ways. First, it gets you to pick the wrong strategy; the correct strategy is to bet on all numbers that came up 7 or more times in the 200 spins, which has an expected return going forward of 1%. Second, it wildly overestimates the success your strategy will enjoy in the future.

The antidote to this effect is to fit your model on one set of data, then test it on another, preferably data from a period later than the fitting period. Or test your model dynamically, updating the parameters and testing each day based on the parameters you would have known at that time.

Even that does not eliminate the problem. There may be unconscious biases in model selection and fitting based on what you know about all the data. This is one of several reasons that backtesting results have to be viewed with caution, especially if they are not computed by a careful, honest and experienced analyst.
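To make the arithmetic concrete, here is a short Monte Carlo sketch in Python (not part of the original post). It simulates the off-balance wheel, applies the "bet on every number with 6 or more hits in 200 spins" rule, and compares the backtested return measured on the same 200 spins, the return on a fresh out-of-sample 200 spins (the simplest version of the antidote), and the true expected return. The 35:1 single-number payout (a win returns 36x the stake) and the wheel probabilities follow the text above; the trial count and the equal stake per selected number are my own assumptions.

# Monte Carlo sketch of the biased-wheel backtest (illustrative assumptions noted above)
import random
from statistics import mean

PROBS = [0.025] * 28 + [0.030] * 10   # 28 low- and 10 high-probability numbers
PAYOUT = 36                           # a winning single-number bet returns 36x the stake
SPINS, THRESHOLD, TRIALS = 200, 6, 20_000

def spin_counts():
    """Hit counts for each of the 38 numbers over SPINS spins of the biased wheel."""
    counts = [0] * 38
    for n in random.choices(range(38), weights=PROBS, k=SPINS):
        counts[n] += 1
    return counts

def avg_return(picked, counts):
    """Equal-stake average return of betting the picked numbers, given hit counts."""
    return sum(counts[i] / SPINS * PAYOUT - 1 for i in picked) / len(picked)

backtest, out_of_sample, true_ev = [], [], []
for _ in range(TRIALS):
    sample = spin_counts()
    picked = [i for i, c in enumerate(sample) if c >= THRESHOLD]
    if not picked:
        continue
    backtest.append(avg_return(picked, sample))               # evaluated on the same data used to pick
    out_of_sample.append(avg_return(picked, spin_counts()))   # evaluated on a fresh 200 spins
    true_ev.append(sum(PROBS[i] * PAYOUT - 1 for i in picked) / len(picked))

print(f"backtested return:    {mean(backtest):+.1%}")      # roughly +33%, as quoted above
print(f"out-of-sample return: {mean(out_of_sample):+.1%}")  # roughly -4%
print(f"true expected return: {mean(true_ev):+.1%}")        # roughly -4%

Running it a few times should land close to the +33% in-sample and -4% forward figures quoted above; the gap between the first line and the other two is exactly the bias of evaluating a rule on the data used to select it.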