November 24th, 2009, 8:58 pm
Hi,

If you want to keep only a few variables that capture most of the variance of your dataset, use PCA; if you want components that are as non-Gaussian as possible (measured, for example, by kurtosis), use ICA. If you want to build a multivariate distribution from given marginals, use copulas.

But as I understand it, you want to eliminate irrelevant explanatory variables. There are many techniques for eliminating or filtering out irrelevant variables (filters, wrappers, combinations of the two, and other feature-selection methods). I like to use Random Forest for these kinds of experiments, although the choice is, of course, personal (my background is in ML). Another method I like to use is called ADHOC, Automatic Discovery in High Order Correlations. Hopefully you will be able to find a set of inputs that helps the learning method generalize to independent samples.

Hope this helps!
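To make the "filter" idea concrete, here is a minimal sketch in plain Python of the simplest kind of filter: rank each explanatory variable by the absolute Pearson correlation with the target and drop those below a threshold. This is not the Random Forest or ADHOC approach mentioned above. The function names `pearson` and `correlation_filter`, the threshold of 0.1, and the synthetic data are all hypothetical choices for illustration.

```python
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def correlation_filter(columns, target, threshold=0.1):
    """Hypothetical filter: keep columns with |corr(column, target)| >= threshold.

    Returns the kept column names ranked from strongest to weakest,
    plus the score computed for every column.
    """
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    return kept, scores

# Synthetic example (assumed data): y depends on x1 and x2, not on "noise".
rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + 0.5 * b + 0.1 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

kept, scores = correlation_filter({"x1": x1, "x2": x2, "noise": noise}, y)
print(kept)
```

A linear filter like this is cheap but blind to nonlinear effects and interactions between variables, which is one reason model-based approaches such as Random Forest importances are often preferred for this task.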