July 2nd, 2013, 2:43 am
Hi All,

I am a student struggling to understand how PCA helps in finding the impact of, say, 1,000 exogenous features (x1, x2, ..., x1000) on an endogenous variable (y), beyond the fact that it reduces the dimensionality and lets us view the overall impact in 2D or 3D space. What is not clear to me is that the so-called "impact" we are visualizing (that of the handful of principal components that explain most of the variance) is not the impact of the actual variables x1, x2, ..., x1000, but of the principal components! Each of the new features z1, z2, ..., zk (where k << 1000) is a linear combination of all the original features: each principal component is the dot product of the corresponding eigenvector and the vector of the original features. So, by examining just the top k PCs, how do I reverse-engineer them and conclude which of the original features are the impactful ones?

I have googled this, and there are a variety of answers, but none that makes sense to me (maybe I am missing something fundamental), and many students are in the same predicament. Can any experts on this forum share their insight, please? Most grateful for your responses to this post.
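
To make the question concrete, here is a minimal sketch of the setup in Python with NumPy. The data is synthetic, the sizes are scaled down from 1,000 features to 10, and every name (`X`, `V_k`, `Z`, `gamma`, `beta_implied`) is a placeholder chosen for illustration. It shows that each component score is the dot product of an eigenvector with the centered feature vector, and that if y is regressed on the top k components, the fitted coefficients can be carried back to the original features through the same eigenvectors. This is only the algebra under those assumptions, not a definitive recipe for ranking features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): n samples, p features, keep k PCs.
n, p, k = 200, 10, 3
X = rng.normal(size=(n, p))
# In this toy setup y depends only on features 0 and 1, plus noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

# PCA works on centered data.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA via SVD: the rows of Vt are the eigenvectors of the covariance matrix,
# ordered by decreasing explained variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # fraction of variance per component

V_k = Vt[:k].T                    # shape (p, k): top-k eigenvectors as columns
Z = Xc @ V_k                      # shape (n, k): scores z1..zk

# Each score is a dot product of one eigenvector with one centered sample.
assert np.isclose(Z[0, 0], V_k[:, 0] @ Xc[0])

# Regress y on the k component scores (principal component regression).
gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)

# Since Z = Xc @ V_k, the fit Z @ gamma equals Xc @ (V_k @ gamma),
# so V_k @ gamma gives the implied coefficients on the original features.
beta_implied = V_k @ gamma

print("explained variance ratio (top k):", explained[:k])
print("implied coefficients on original features:", beta_implied)
```

Because PCA never looks at y when choosing the components, the directions with the largest variance need not be the ones that predict y best, so `beta_implied` depends on the choice of k.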