Identifying the components in PCA

Posted: April 5th, 2019, 11:36 am
by miltenpoint
My understanding is that PCA does not identify which variables depend on each other and does not specifically identify each component, so how do analysts interpret the factors/components in PCA?

A good example is the set of PCA factors for US Treasuries of different maturities. We are told PC1 is parallel shifts, PC2 is slope changes (steepeners and flatteners) and PC3 is curvature (convexity) changes - how do we know these descriptors are the factors/components and why in that order? What could be PC>3?

Re: Identifying the components in PCA

Posted: April 13th, 2019, 12:17 pm
by ISayMoo
@how do we know these descriptors are the factors/components and why in that order? 

1. visual inspection
2. simulation of the data by independently varying the coefficients of PCA components
3. comparing their eigenvalues
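
A minimal Python sketch of points 2 and 3, under loudly stated assumptions: the covariance matrix below is hypothetical, built from three made-up drivers (level, slope, curvature) on five curve points, and `component_shock` is an illustrative helper, not a library function. It moves one component's score by one standard deviation while holding the others at zero, shows the resulting move in every rate, and compares the eigenvalues as shares of total variance.

```python
import numpy as np

# HYPOTHETICAL inputs: five curve points and three made-up drivers.
x = np.linspace(-1.0, 1.0, 5)
shapes = np.vstack([np.ones_like(x), x, x**2 - np.mean(x**2)])  # level, slope, curvature
vols = np.array([6.0, 3.0, 2.0])  # assumed driver volatilities, in bp

# Covariance implied by the drivers plus a little independent noise per point.
cov = (shapes.T * vols**2) @ shapes + 0.1 * np.eye(x.size)

# eigh returns eigenvalues in ascending order, so re-sort largest first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvector signs are arbitrary; make the loading at the last point non-negative.
eigvecs = eigvecs * np.where(eigvecs[-1] < 0, -1.0, 1.0)


def component_shock(eigvals, eigvecs, k, n_std=1.0):
    """Move in each rate when only component k's score moves by n_std std devs."""
    return n_std * np.sqrt(eigvals[k]) * eigvecs[:, k]


# Point 3: compare eigenvalues as shares of total variance.
variance_share = eigvals / eigvals.sum()
for k in range(3):
    print(f"PC{k + 1}: share={variance_share[k]:.3f}",
          "shock (bp):", np.round(component_shock(eigvals, eigvecs, k), 2))
```

A component whose shock moves every rate in the same direction reads as a level move, one that moves the two ends in opposite directions reads as a slope move, and the variance shares explain the ordering.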

@What could be PC>3?

It depends. You need to stop looking at the data and think about the underlying processes.

Re: Identifying the components in PCA

Posted: April 13th, 2019, 1:17 pm
by bearish
I have used variations of this as an interview question in recent times. If you grab constant maturity US Treasury yields from, say, 1990-2007, calculate the covariance matrix of the absolute (basis point) changes ordered by maturity, and plot the eigenvectors corresponding to the three largest eigenvalues against the bond maturities, you do indeed get something that closely approximates the level, slope and curvature story. However, if you repeat the exercise using data from the last decade (conveniently missing the most turbulent months of 2008 and 2009), the story changes. Now the first PC has a strong slope component over the first 5 years or so, given Fed promises to keep rates "low for long". Since the second PC needs to be orthogonal to the first, it needs to have a richer structure than a simple slope, and so on. These are empirical regularities, rather than laws of nature.
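
A rough sketch of the mechanics of that exercise in Python with NumPy. No real Treasury data is used here: the daily changes are synthetic, generated from three made-up drivers with assumed volatilities, and the maturity grid is an assumption too. With actual constant maturity yields you would replace `changes` with the day-over-day differences in basis points.

```python
import numpy as np

# Assumed maturity grid in years (labels only; not taken from any data set).
maturities = np.array([1, 2, 3, 5, 7, 10, 20, 30])

# SYNTHETIC daily changes in bp, built from three made-up drivers.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, maturities.size)
shapes = np.vstack([np.ones_like(x), x, x**2 - np.mean(x**2)])  # level, slope, curvature
vols = np.array([6.0, 3.0, 2.0])  # assumed driver volatilities, in bp
scores = rng.standard_normal((2000, 3)) * vols
changes = scores @ shapes + 0.3 * rng.standard_normal((2000, maturities.size))

# Covariance matrix of the changes, with columns ordered by maturity.
cov = np.cov(changes, rowvar=False)

# Eigen-decomposition; eigh returns ascending eigenvalues, so re-sort.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings of the three largest components, one row per maturity.
top3 = eigvecs[:, :3]
for m, row in zip(maturities.tolist(), top3):
    print(f"{m:>2}y", np.round(row, 2))
```

Plotting each column of `top3` against `maturities` gives the kind of picture described above. On this synthetic data the level, slope and curvature shapes come out by construction, so the sketch only illustrates the procedure, not the empirical finding.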

As for higher order PCs, they could just be noise, or they could pick up on more subtle clustering effects. E.g., the 30 year rate could have a bit of a life of its own, driven by insurance companies and pension funds hedging very long dated liabilities. Your specific choice of rates to include in the analysis also matters, especially for the higher order PCs. 

Re: Identifying the components in PCA

Posted: April 13th, 2019, 2:04 pm
by miltenpoint
Thanks guys. Great answers!

Re: Identifying the components in PCA

Posted: June 20th, 2019, 2:04 pm
by ikicker
@My understanding is that PCA does not identify which variables depend on each other and does not specifically identify each component

Actually, it's the opposite. PCA uses the covariance matrix of standardized values (that is, the correlation matrix) to determine how much UNIQUE information each component contains. Because it is built on a covariance matrix, it does take co-movement between the variables into account.

Procedure: 1. convert your data to z-scores -> 2. create a covariance matrix of the z-scores -> 3. compute the eigenvalues of that matrix and their corresponding eigenvectors

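After performing this procedure, the components with the largest eigenvalues contain the most unique information, in the sense that they explain the most variance. If you would like, you can compute each component's contribution % to the model as its eigenvalue divided by the sum of all eigenvalues.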

Forgetting to standardize your data into z-scores typically results in incomparable measures, because covariance, unlike correlation, is not a unitless measure: it scales with the units of each variable.
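
The three steps above can be sketched in Python with NumPy. The data here is synthetic and made up purely for illustration: two correlated columns on very different scales plus one unrelated column. The variable names are assumptions, not part of any standard API.

```python
import numpy as np

# SYNTHETIC data (made up): two correlated columns on very different
# scales, plus one unrelated column.
rng = np.random.default_rng(1)
n = 500
common = rng.standard_normal(n)
data = np.column_stack([
    100.0 * (common + 0.5 * rng.standard_normal(n)),  # large-scale variable
    0.01 * (common + 0.5 * rng.standard_normal(n)),   # small-scale variable
    rng.standard_normal(n),                           # unrelated variable
])

# 1. Convert to z-scores (ddof=1 to match np.cov's default normalization).
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# 2. Covariance matrix of the z-scores, which is the correlation matrix.
corr = np.cov(z, rowvar=False)

# 3. Eigenvalues and eigenvectors, sorted from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Contribution % of each component: eigenvalue over the sum of eigenvalues.
contribution = eigvals / eigvals.sum()
print("standardized:  ", np.round(100 * contribution, 1))

# For comparison: skipping step 1 lets the large-scale column dominate.
raw_vals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
raw_contribution = raw_vals / raw_vals.sum()
print("unstandardized:", np.round(100 * raw_contribution, 1))
```

In the unstandardized run nearly all of the variance is attributed to the first component simply because one column is measured on a much larger scale, which is the comparability problem standardizing avoids.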