What is the metric by which different classifications will be compared? What are you trying to accomplish?
I am interested in taking price (or rather, returns) data and dividing it into n periods according to distribution moments, in a way that creates the best separation. How do you approach that?
I start by comparing by volatility, but I'm not really sure what the relevant metric is. The purpose is to map statistical properties of the data to the performance of a trading system. I will probably need to narrow it down more, for example to volatility at certain times, or in specific areas (for example only when the price is above the previous day's range). Or it may be autocorrelation of returns, or autocorrelation in specific areas.
So far you've focused on classifying returns (unsupervised learning), which describes the data as it is. I would look at regression rather than classification (supervised learning): find the relation between your performance and the market.
Yes, individual returns by themselves are not enough. I also look at autocorrelation, but I'm not sure what other measures to add. Perhaps the Hurst exponent? The performance of my systems seems to depend strongly on the extent to which the market is mean reverting, and that changes from one period to another.
Also, instead of modelling returns without any regard for returns in the preceding days, you would need to model the dynamical state of the market, because your trading system has memory: your trading rules depend on both your current positions (an accumulation of past trades) and perhaps on actions taken in the past, like pending limit orders.
You can do all sorts of things to model the dynamical state of the market (latent variable models like stochastic volatility), but the performance of your trading system would be the most sensible measure to use to characterise the state of the market, since that's the most relevant indicator and rids you of needless intermediate models!
Absolute returns produce different partitions.
Interesting. I would try your method on absolute returns and see how the splits align with other possibilities, like NBER business cycle dates for US data. You could also do a GARCH-type fit and compare with that.
As a practical matter, subperiod means will be very hard to estimate, subperiod variances much easier, and subperiod higher moments very unstable. So splitting by variances (proxied by absolute returns) may make the most sense.
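For illustration, here is a minimal sketch of one way to split a series into n contiguous periods using absolute returns as the variance proxy. It is not the method used by anyone in this thread: it assumes a least-squares criterion (minimise the within-period sum of squared deviations of |return| from the period mean) solved by dynamic programming, and the function name `segment_by_mean` and the synthetic data are made up for the example.

```python
import numpy as np

def segment_by_mean(x, n_segments):
    """Split x into n_segments contiguous pieces that minimise the total
    within-segment sum of squared deviations from each segment's mean.
    Returns the boundaries [0, b1, ..., len(x)]."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    if not 1 <= n_segments <= T:
        raise ValueError("n_segments must be between 1 and len(x)")
    # Prefix sums give the cost of any segment x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def cost(i, j):
        # Sum of squared deviations of x[i:j] from its mean (i < j).
        n = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / n

    # best[k, j]: minimal cost of splitting x[:j] into k segments.
    best = np.full((n_segments + 1, T + 1), np.inf)
    prev = np.zeros((n_segments + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    prev[k, j] = i
    # Backtrack from the end to recover the segment boundaries.
    bounds = [T]
    j = T
    for k in range(n_segments, 0, -1):
        j = int(prev[k, j])
        bounds.append(j)
    return bounds[::-1]

# Synthetic example (assumed data): a calm period followed by a volatile one.
rng = np.random.default_rng(0)
returns = np.concatenate([rng.normal(0, 0.005, 150), rng.normal(0, 0.03, 150)])
print(segment_by_mean(np.abs(returns), n_segments=2))
```

The triple loop is O(n T^2), which is fine for a few thousand daily observations but slow for intraday data. Passing raw returns instead of absolute returns would split on the mean, which, as noted above, is much harder to estimate.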
Good points.
Trades that come out of a trading system will be the most direct indicator, but they are a limited sample of the market characteristics I try to exploit. The trading rules include many filters: some are probably less important than others, some parameters help achieve reasonable performance across various conditions and smooth the equity curve rather than exploit specific conditions, and some trades will not appear because of considerations unrelated to market behavior (e.g. not opening more than one position in parallel, or a setup occurring in hours when the system cannot be monitored). So, if a certain behavior in the data provides an edge, the system will probably exploit just a small part of it, and the number of trades to analyze won't be very large. Reacting to changes in the market may be late and costly.
In any case, to use trades of a system in supervised learning, I still need to know on which features of the market to focus.
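As a starting point for the supervised approach, here is a rough sketch using only the two features already mentioned in this thread, volatility and lag-1 autocorrelation, computed on non-overlapping windows and regressed against per-window system performance. Everything here is an assumption for illustration: the function names `window_features` and `fit_linear`, the window length, and the idea that you have one performance number (e.g. P&L) per window. It does not answer which features matter, it only shows the mechanics of testing candidates.

```python
import numpy as np

def window_features(returns, window):
    """Volatility and lag-1 autocorrelation on non-overlapping windows.
    Returns an array of shape (n_windows, 2); a trailing partial window is dropped."""
    r = np.asarray(returns, dtype=float)
    n_win = len(r) // window
    feats = []
    for w in range(n_win):
        seg = r[w * window:(w + 1) * window]
        vol = seg.std(ddof=1)               # sample standard deviation
        d = seg - seg.mean()
        denom = np.dot(d, d)
        # Lag-1 autocorrelation; negative values suggest mean reversion.
        ac1 = np.dot(d[:-1], d[1:]) / denom if denom > 0 else 0.0
        feats.append((vol, ac1))
    return np.array(feats)

def fit_linear(features, performance):
    """Ordinary least squares of performance on the features plus an intercept.
    Returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones(len(features)), features])
    y = np.asarray(performance, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Usage would look like `coef = fit_linear(window_features(returns, 20), pnl_per_window)`, where `pnl_per_window` is a hypothetical array with one entry per 20-bar window. With few trades per window the estimates will be noisy, which is the small-sample concern raised above.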