Using PCA with equity data

Blazes · March 4th, 2005, 11:12 am

Hi, I am currently building a VaR model for an equity business. I am using "historical simulation", i.e. looking at the past return pattern and modifying today's prices accordingly to obtain my data set. The difficulty is that a subset of stocks (10 to 20%) do not have a full data set. I am testing the use of Principal Components to generate the additional points for the truncated series. Briefly, I estimate PC parameters for the stock with the fewest data points, using the data for all stocks over the period for which I have a full data set. For the stock with the next fewest data points I use the data for all stocks bar the first one. I continue with this process until I have PC parameter data for all of the stocks for which data is incomplete. I calculate the actual Principal Components using the maximum set of parameters I have for a given day. I then estimate the missing data points by multiplying the parameter array by the PC array for the appropriate points. I would appreciate any comments on the pitfalls associated with this approach, or anything that would allow me to refine or improve it.

Aaron · March 6th, 2005, 6:09 pm

I think you're using the principal components to fill in the holes of the historical data. That seems like overkill to me. I think a one-factor regression would do a better job. In either case, it's more important to fill in the error terms than to get the covariance right.
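
A minimal sketch of what a one-factor fill with error terms could look like, offered as one possible reading of the suggestion above rather than a description of any poster's actual method. The function name `backfill_one_factor`, the use of a single market return series as the factor, and the choice to resample regression residuals as the "error terms" are all assumptions made for illustration. Arrays are ordered oldest first.

```python
import numpy as np

def backfill_one_factor(short_returns, market_returns, rng):
    """Extend a truncated return series back in time with a one-factor regression.

    short_returns : the n most recent daily returns of the stock (oldest first)
    market_returns: T >= n daily returns of the factor (hypothetical market proxy)
    rng           : numpy random Generator used to resample residuals
    """
    T, n = len(market_returns), len(short_returns)
    overlap = market_returns[T - n:]                 # days where both series exist
    beta, alpha = np.polyfit(overlap, short_returns, 1)  # slope first, then intercept
    residuals = short_returns - (alpha + beta * overlap)

    # Systematic part for the missing days, plus resampled residuals so the
    # filled days carry idiosyncratic noise instead of being a pure factor fit.
    missing = market_returns[:T - n]
    noise = rng.choice(residuals, size=T - n, replace=True)
    filled = alpha + beta * missing + noise
    return np.concatenate([filled, short_returns])
```

Without the resampled residuals the filled days would understate the stock's own volatility, which is the concern about error terms; whether a bootstrap of residuals is the right way to add them back is a judgment call.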

Blazes · March 7th, 2005, 3:17 pm

Yes, you are correct regarding what I am trying to do, although it is for truncated data sets rather than for "holes". I'm afraid I don't understand what you mean by "filling in" the error terms; I would be most obliged if you could elaborate. In the portfolios I have looked at so far (with between 50 and 80 stocks) one needs a relatively high number of PCs to get high explanatory power, certainly relative to analysis I have done on Fixed Income portfolios, so I'm not sure a one-factor regression would be superior.

Aaron · March 10th, 2005, 5:52 pm

Let me see if I understand.

You compute historical VaR by looking at the distribution of value of your stock portfolio over the past. Some of your securities didn't exist in the past, for example IPOs, spin-offs and mergers. So you fill in model prices for these stocks using principal components.

I hope you do not use the same data to fit the components as you use to manage the portfolio. That would be a big danger.

The actual risk a stock will add to the portfolio is composed of the risk indicated by the PC fit, the risk introduced by errors in the PC fit, and idiosyncratic risk layered on to the PC fit. I think in most cases the first part will be negligible, and a high-dimension fit will add more to the second source than it reduces from the first. The third will also be significant.

Blazes · March 11th, 2005, 8:06 am

Perhaps I can best illustrate by an example. Assume the portfolio consists of 10 stocks. I have complete data on 8 (assume a 500 day history) and incomplete data on 2 (say 200 and 300 days). The first step is to calculate PCs for the 200 point data set, so I have 10 PC parameters for each stock; then I calculate PC parameters for the 300 point data set, where I have 9 PC parameters; and finally for the 500 point data set, where I will have 8 PC parameters. The next step is to back out the daily PCs (using the 9 parameters I have) for days 201 to 300. My observations for those days for the stock that has only 200 days of history are then generated by simply multiplying the PC parameters (obtained from the 200 data point set) by the PCs (9 for each day). I then back out the daily PCs (using the 8 parameters I have) for days 301 to 500 and generate the missing data points for the stocks for which I have incomplete data sets. Of course I am losing the risk specific to those stocks for the data points I have to simulate, but in practice I have a much larger number of stocks, so the percentage of the actual movement that is determined by the missing PCs is small. I see the point about errors in the PC fit, but I am not so sure about the other two points. Thanks for the comments.
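
The description above leaves some details open, so the following is only a sketch of one way a PC-based fill could be set up, not the exact scheme in the post. It assumes the components are estimated from the stocks that have data over the whole window, and that the truncated stock's "PC parameters" are obtained by regressing its returns on the component scores over the overlapping days. The function name `pc_backfill` and the parameter `k` (number of components kept) are hypothetical. Arrays are ordered oldest first.

```python
import numpy as np

def pc_backfill(full, short, k):
    """Fill the early part of a truncated return series from principal components.

    full : (T, m) returns of stocks with complete history
    short: (n,) the n most recent returns of the truncated stock, n < T
    k    : number of principal components to keep (k <= m)
    """
    T, m = full.shape
    n = len(short)

    # PCA of the complete stocks via SVD; rows of vt are the component loadings.
    centered = full - full.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:k].T                     # (T, k) daily component values

    # Loadings of the truncated stock: least-squares fit on the overlapping days.
    X_overlap = np.column_stack([np.ones(n), scores[T - n:]])
    coef, *_ = np.linalg.lstsq(X_overlap, short, rcond=None)

    # Model returns for the days on which the stock has no data.
    X_missing = np.column_stack([np.ones(T - n), scores[:T - n]])
    return np.concatenate([X_missing @ coef, short])
```

For the 10-stock example, this could be called once for the 300-day stock against the 8 complete stocks, and once for the 200-day stock. The filled values contain no stock-specific noise, which is the loss of specific risk mentioned in the post.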

Aaron · March 11th, 2005, 3:28 pm

I tried this, and it worked better than I expected. I generated a multivariate Normal random set of data for 10 stocks (2% volatility of return, 0.3 pairwise correlations) for 500 days. I selected 10 random portfolios (independent uniformly distributed weight for each stock between -0.5 and +0.5). I computed the 99% 500-day historical VaR for each portfolio.

Then I deleted 200 days of data for one stock and 300 for another. I filled in the data using the PC method you described. Then I recomputed the 99% 500-day historical VaR. The table below shows the recomputed number and the original estimate for each of the ten portfolios.

Portfolio   Recomputed (filled data)   Original (full data)
 1          -3.87%                     -3.23%
 2          -2.05%                     -2.41%
 3          -2.06%                     -2.18%
 4          -5.35%                     -6.17%
 5          -7.42%                     -7.22%
 6          -3.01%                     -2.92%
 7          -2.33%                     -2.33%
 8          -3.55%                     -2.66%
 9          -5.65%                     -5.89%
10          -4.07%                     -3.76%

Losing 10% of your data does make a significant difference to the estimate in some cases, for example the eighth portfolio, where VaR increases from -2.66% to -3.55%. But the method clearly distinguishes between the high and low risk portfolios, it does not seem to be biased, and the error in VaR is not large compared to all the uncertainties in this kind of calculation. In practice, with non-Normal data, complex correlation structure and highly offset portfolios, the error from filling in data this way is probably negligible.

I still don't like it on general principles; I prefer simpler methods with fewer parameters. But it seems to work okay in your example.
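
For readers who want to set up a similar experiment, here is a rough sketch of the simulation and VaR step using the parameters stated in the post (10 stocks, 500 days, 2% volatility, 0.3 pairwise correlation, weights uniform on -0.5 to +0.5). The seed, the function name `historical_var`, and the use of the empirical 1st percentile as the 99% VaR are assumptions, so the numbers it produces will not match the figures quoted in the post.

```python
import numpy as np

def historical_var(returns, weights, level=0.99):
    """Historical VaR as the (1 - level) empirical quantile of portfolio returns."""
    portfolio_returns = returns @ weights
    return np.percentile(portfolio_returns, 100 * (1 - level))

rng = np.random.default_rng(0)                       # arbitrary seed
n_stocks, n_days, vol, rho = 10, 500, 0.02, 0.3

# Covariance matrix with equal volatilities and equal pairwise correlations.
corr = np.full((n_stocks, n_stocks), rho)
np.fill_diagonal(corr, 1.0)
cov = corr * vol ** 2

returns = rng.multivariate_normal(np.zeros(n_stocks), cov, size=n_days)
weights = rng.uniform(-0.5, 0.5, size=(10, n_stocks))   # ten random portfolios

var_full = np.array([historical_var(returns, w) for w in weights])
```

To repeat the comparison, one would blank out the early rows of two columns, fill them with whichever method is being tested, and call `historical_var` again on the filled matrix.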

hedgeQuant · March 14th, 2005, 10:44 pm

In the discussion so far, isn't there an assumption that the portfolio is going to be the same historically? That is, we look at the current portfolio and compute the returns that this portfolio would have generated historically. Wouldn't it be better if we looked at the characteristics of the portfolio and tried to calculate the VaR in terms of those characteristics? For example, suppose the portfolio is long Large Cap stocks. If we can quantify the market cap exposure of the portfolio, shouldn't we be looking at the risk associated with the large cap exposure and calculate the VaR based on that?

This would also simplify the problem of missing data. Take a large and well diversified universe of stocks. For each day, calculate the returns to various factors like market cap, dividend yield etc. using this diverse universe (a risk model from BARRA / Northfield would aid this calculation). Even if a stock has missing data, we certainly have the current exposure of the portfolio to the various factors. This should enable us to calculate the daily (historical) returns of the portfolio. Comments?

Hedge Q.
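
A small sketch of the factor-based idea, under strong simplifying assumptions: factor returns are estimated each day by an ordinary cross-sectional regression of stock returns on exposures, the exposures are held constant through time, and stock-specific returns are ignored. Commercial risk models differ in all three respects. The names `factor_returns` and `portfolio_history` are hypothetical.

```python
import numpy as np

def factor_returns(universe_returns, exposures):
    """Daily factor returns from cross-sectional least squares.

    universe_returns: (T, N) returns of a broad universe of stocks
    exposures       : (N, K) factor exposures of those stocks (assumed constant)
    """
    # Solves exposures @ f_t = r_t for every day t at once; result is (K, T).
    f, *_ = np.linalg.lstsq(exposures, universe_returns.T, rcond=None)
    return f.T                                        # (T, K)

def portfolio_history(weights, holdings_exposures, f):
    """Historical factor-driven returns of the current portfolio.

    weights           : (M,) current portfolio weights
    holdings_exposures: (M, K) current factor exposures of the holdings
    f                 : (T, K) factor returns from factor_returns
    """
    portfolio_exposure = holdings_exposures.T @ weights   # (K,)
    return f @ portfolio_exposure                          # (T,)
```

A holding with a short price history only needs current exposures here, not past returns, which is how this approach sidesteps the missing data. The price is that the resulting series omits specific risk entirely unless it is added back separately.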

Aaron · March 15th, 2005, 12:53 pm

The standard definition of VaR is based on the distribution of value of a portfolio given no transactions in normal markets. This is a useful measure. It's not based on the assumption that there will be no trading; it's a benchmark for quantification.

It's also useful to try to project future value changes in the moving portfolio, but that's not VaR.

jasemin · January 20th, 2006, 4:02 am

I am curious how you guys did it.

Assume portfolio consists of 10 stocks. I have complete data on 8 (assume 500 day history) and incomplete data on 2 (say 200 and 300 day). First step is to calculate PCs for 200 point data set so have 10 PC parameters for each stock, then calculate PC parameters for 300 point data set now have 9 PC parameters and finally for 500 point data set where I will have 8 PC parameters. The next step is to back out the daily PCs (using the 9 parameters I have) for days 201 to 300. My observations for those days for the stock that has only 200 days of history are then generated by simply multiplying the PC parameters (obtained from the 200 data point set) by the PCs (9 for each day).

So here you need to get a 300 by 10 matrix, but you only have 200X8 = 200X8 * 8PC; 300X9 = 300X9 * 9PC. How do you fill the holes?

If anyone can post a spreadsheet, I'll greatly appreciate it!