September 25th, 2014, 7:40 am
Dear All,

I have a very large class of equity returns (about 500 time series, each of 500 daily observations) from which I infer some statistics to detect outliers (sample mean, sample standard deviation, quartiles, median absolute deviation and so on). Once I have calculated the above statistics for each time series, I need to aggregate them into a single measure for detecting outliers that is valid for the whole class.

Question 1: I have 500 daily means, daily standard deviations, quartiles (Q1, Q3 and the interquartile range IQR) for interquartile analysis, and median absolute deviations (MAD). I can easily calculate the average of the 500 means, and the average of the standard deviations using Average Std = Sqrt(Average of Variances), but I don't know how to aggregate the other statistics (Q1, Q3, IQR, MAD) in a statistically consistent way. A simple mean of 500 of these statistics does not seem statistically correct. Any formulas or ideas to solve this problem?

Question 2: If I use Principal Component Analysis to reduce the dimension of my asset class, is it better to estimate the above statistics directly from the first n principal components (for example, PC1 to PC10), or is it better to estimate them from the principal component approximation of the original returns, calculated as in (II.2.5) of the book "Market Risk Analysis vol. 2" by Carol Alexander? Again, how can I aggregate the ten statistics I get (Q1, Q3, IQR and MAD for 10 time series) into an average Q1, an average Q3, an average IQR and an average MAD? If this average aggregation does not make statistical sense, can I calculate my statistics on the first principal component PC1 alone, provided it explains enough of the variation in my original returns (so as to avoid any aggregation problem)? In general, is a statistical analysis of the dispersion of my original returns better made on a given number of first principal components, or on the principal component approximation of the original returns?

Thank you very much for replying to all my questions.

Pier
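P.S. To make the setup of Question 1 concrete, here is a minimal sketch (Python with numpy; synthetic normal data stands in for my real returns, and all names are just placeholders) of the per-series statistics and the variance-based aggregation I am using so far:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for the real data: 500 time series x 500 daily observations.
returns = rng.normal(0.0, 0.01, size=(500, 500))

# Per-series statistics: one value per time series (axis=1 runs over days).
means = returns.mean(axis=1)
stds = returns.std(axis=1, ddof=1)
q1 = np.percentile(returns, 25, axis=1)
q3 = np.percentile(returns, 75, axis=1)
iqr = q3 - q1
mad = np.median(np.abs(returns - np.median(returns, axis=1, keepdims=True)),
                axis=1)

# The aggregation I am comfortable with so far:
avg_mean = means.mean()
# Average Std = Sqrt(Average of Variances)
avg_std = np.sqrt((stds ** 2).mean())

# For Q1, Q3, IQR and MAD I currently have only the naive simple mean,
# which is exactly what I suspect is not statistically correct:
naive_avg_iqr = iqr.mean()
naive_avg_mad = mad.mean()
```

This is only meant to show where the 500-fold statistics come from; the open question is what should replace the two naive means at the end.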