PCA vs LSE

Alkmene · February 5th, 2010, 1:01 am

What is the difference? To me it seems that LSE minimises the squared error with respect to the actual data, whereas PCA finds the PC that has the maximum variance and hence has the most "power" as an explanatory variable. But isn't the least-squares assumption similar, in that the errors are assumed to be normally distributed with a mean of zero? I haven't had much time to think about it in detail, but I'm very curious. Are there any papers outlining the differences and similarities, as well as the interpretation?

Thanks,
Alk

Edit: I can see that PCA tells me which variable explains most of the variance of the dependent variable, and that LSE creates a new estimator (a linear function) to represent the dependent variable. With LSE I can then make predictions if I wish, but how can I weave PCA into this way of analysing data?
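To make the two optimisation problems in the question concrete, here is a minimal sketch on synthetic data (the data, the use of NumPy, and the helper `sse` are assumptions for illustration, not anything from the thread). LSE picks the intercept and slope that minimise squared vertical errors in y; PCA picks the unit direction in (x, y) space with the largest variance of the projected, centred data. The brute-force loop simply checks that no tested direction beats the first PC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): y is linear in x plus Gaussian noise.
n = 500
x = rng.normal(0.0, 1.0, n)
y = 0.5 + 1.5 * x + rng.normal(0.0, 0.8, n)

# LSE: pick intercept and slope that minimise the sum of squared vertical errors in y.
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def sse(c):
    """Sum of squared vertical errors for coefficients c = (intercept, slope)."""
    return float(np.sum((y - A @ c) ** 2))

# PCA: the unit direction in the (x, y) plane along which the centred data varies most.
centred = np.column_stack([x, y]) - np.array([x.mean(), y.mean()])
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))  # ascending order
pc1 = eigvecs[:, -1]  # eigenvector belonging to the largest eigenvalue
var_pc1 = float(np.var(centred @ pc1, ddof=1))

# Brute-force check: no other unit direction captures more variance than PC1.
angles = np.linspace(0.0, np.pi, 181)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (181, 2)
var_dirs = np.var(centred @ dirs.T, axis=0, ddof=1)       # one variance per direction

print("LSE (intercept, slope):", coef)
print("SSE at LSE fit:", sse(coef), "vs nudged slope:", sse(coef + [0.0, 0.05]))
print("PC1 direction:", pc1, "variance along PC1:", var_pc1)
print("largest variance over 181 test directions:", var_dirs.max())
```

Note that the two criteria are different objects: LSE treats y as special, while PCA treats x and y symmetrically.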

Traden4Alpha · February 5th, 2010, 1:15 pm

"Standard" LSE has one or more independent variables that are assumed to be measured without error, and a linear model that explains the value of the dependent variable up to some additive error. Any random errors in the independent variables will induce biases in standard LSE.

PCA makes no assumption that any variable is dependent/independent or error-free. Instead, it finds orthogonal modes of covariation. The dominant mode found by PCA minimizes the least-squares error measured orthogonally to that mode (the mode being a linear combination of ALL variables), rather than the least-squares error with respect to a single variable.
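Both points can be checked numerically. The sketch below is a hypothetical simulation (NumPy, the seed, and the helpers `ols_slope`, `pc1_slope` and `orthogonal_sse` are all illustrative assumptions): adding noise to x attenuates the OLS slope, while the dominant PCA mode minimises the sum of squared perpendicular distances. The PCA slope happens to land near the true value here only because the error variances in x and y were deliberately set equal; with unequal error variances, or after rescaling a variable, it would not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: a latent x_true, y = 2 * x_true + noise, and x observed with its own noise.
n = 20_000
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true + rng.normal(0.0, 1.0, n)   # error in the dependent variable
x_obs = x_true + rng.normal(0.0, 1.0, n)     # error in the "independent" variable

def ols_slope(x, y):
    """Least-squares slope of y on x: minimises squared errors in y only."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def pc1_slope(x, y):
    """Slope of the dominant PCA mode of the (x, y) cloud."""
    _, vecs = np.linalg.eigh(np.cov(x, y))
    v = vecs[:, -1]
    return float(v[1] / v[0])

def orthogonal_sse(x, y, slope):
    """Sum of squared perpendicular distances to the line through the means."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum((yc - slope * xc) ** 2) / (1.0 + slope ** 2))

b_clean = ols_slope(x_true, y)   # close to 2.0
b_noisy = ols_slope(x_obs, y)    # attenuated: about 2.0 * 1 / (1 + 1) = 1.0
b_pca = pc1_slope(x_obs, y)      # close to 2.0 here because both error variances are 1

print(f"OLS, error-free x: {b_clean:.3f}")
print(f"OLS, noisy x:      {b_noisy:.3f}")
print(f"PC1, noisy x:      {b_pca:.3f}")
print("orthogonal SSE, PC1 vs OLS line:",
      orthogonal_sse(x_obs, y, b_pca), orthogonal_sse(x_obs, y, b_noisy))
```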

JojoLeBarjo · February 10th, 2010, 3:10 am

To go one step further, I would say that LSE and PCA have nothing in common. The LSE output is an orthogonal projection in an L2 space, whereas the PCA output is an ordered orthogonal basis along which the projected variance decreases.
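Both outputs can be computed directly. In this sketch (synthetic data and NumPy are assumptions), the LSE fit is the projection y_hat = H y with H = X (X^T X)^{-1} X^T, so H is symmetric and idempotent and the residual is orthogonal to every column of X. The PCA of the regressors yields an orthonormal basis whose axes carry non-increasing variance. For Gaussian data the entropy along a direction grows with its variance, so ordering by variance and by entropy coincide in that case.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data (assumption): intercept column plus three random regressors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 0.0]) + rng.normal(scale=0.3, size=n)

# LSE as an orthogonal projection onto the column space of X: y_hat = H y,
# with the "hat" matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
resid = y - y_hat

# PCA of the regressors (intercept column dropped): rows of Vt form an
# orthonormal basis, ordered so the projected variance is non-increasing.
Z = X[:, 1:] - X[:, 1:].mean(axis=0)
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
proj_var = s ** 2 / (n - 1)

print("H is symmetric and idempotent:", np.allclose(H, H.T), np.allclose(H @ H, H))
print("residual orthogonal to every column of X:", np.allclose(X.T @ resid, 0.0, atol=1e-8))
print("variance along each PCA axis:", proj_var)
```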

Alkmene · February 11th, 2010, 5:53 am

Thanks for your answers. I wonder if there is some overlap, though. I am only using these things in practice and am trying to understand a bit more of the "why" to extend the "how" of what I am doing.

Can I use PCA to make a prediction about some dependent variable without LSE, or is it only useful for selecting/reducing a set of variables? There is also something always lurking in the back of my mind: the relation R^2 = correl^2 in LSE (assuming only 2 variables) and the use of PCA on a var/covar matrix. Is there a connection between them through this?

Thanks,
Alk
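For the two-variable case there is a simple link: both quantities are functions of the correlation r. The R^2 of a simple regression with intercept equals r^2, and the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 - |r| and 1 + |r|, so the first PC explains a share (1 + |r|) / 2 of the total standardised variance. A sketch on synthetic data (NumPy and the data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two illustrative variables (assumption): y linearly related to x with noise.
n = 1000
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(scale=0.5, size=n)

# R^2 of the least-squares fit of y on x (with intercept).
slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
resid = y - (intercept + slope * x)
r_squared = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Sample correlation.
r = np.corrcoef(x, y)[0, 1]

# PCA on the 2x2 correlation matrix [[1, r], [r, 1]].
eigvals = np.linalg.eigvalsh(np.corrcoef(x, y))   # ascending order
share_pc1 = eigvals[-1] / eigvals.sum()

print(f"R^2 = {r_squared:.4f}, r^2 = {r**2:.4f}")
print(f"eigenvalues = {eigvals}, expected {1 - abs(r):.4f} and {1 + abs(r):.4f}")
print(f"PC1 share of total variance = {share_pc1:.4f}, (1 + |r|) / 2 = {(1 + abs(r)) / 2:.4f}")
```

So a strongly correlated pair gives both a high R^2 and a dominant first PC, but the two numbers answer different questions.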

Traden4Alpha · February 11th, 2010, 1:45 pm

Both LSE and the dominant mode of PCA encode a "best fit" linear equation in the variables. Both minimize an L2 norm between the data and the best-fit equation. For a truly linear system (Y = c_0 + sum_i c_i*X_i), as the amount of stochastic error declines, both will converge to the same solution.

As I posted before, the biggest issue is whether one assumes the independent variables are measured without error. If the independent variables are error-free (i.e., all errors are assumed to be in the measurement of the dependent variable), then use LSE.

PCA can give spurious results if the sampling of the independent variables contains marked linearities, or if the variances in the sampling of the independent variables are small relative to the variances of the dependent-variable errors. The dominant mode can then become aligned with the structure of the sampling rather than with the structure of the dependent variable's best-fit equation.
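Both claims can be illustrated with a small simulation (NumPy, the seed, the noise levels, and the helpers `ols_slope` and `pc1_slope` are assumptions). In case 1, x is error-free and the true relation is y = x: as the error in y shrinks, the OLS slope and the dominant-mode slope converge, while at large error the PCA slope is noticeably steeper than 1. In case 2, x is sampled over a narrow range relative to the error in y, and the dominant mode swings towards the y axis, i.e. it follows the noise rather than the best-fit equation.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def pc1_slope(x, y):
    """Slope of the dominant PCA mode of the (x, y) cloud."""
    _, vecs = np.linalg.eigh(np.cov(x, y))
    v = vecs[:, -1]
    return float(v[1] / v[0])

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(0.0, 1.0, n)   # independent variable, error-free by construction

# Case 1: true relation y = x; shrink the error in y and watch both slopes converge to 1.
results = {}
for sigma in (1.0, 0.3, 0.03):
    y = x + rng.normal(0.0, sigma, n)
    results[sigma] = (ols_slope(x, y), pc1_slope(x, y))
    print(f"sigma={sigma:<5} OLS slope={results[sigma][0]:.3f}  PC1 slope={results[sigma][1]:.3f}")

# Case 2: same true relation, but x is sampled over a narrow range (sd 0.1)
# while the error in y has sd 1.0. The dominant mode tilts towards the y axis.
x_narrow = rng.normal(0.0, 0.1, n)
y_narrow = x_narrow + rng.normal(0.0, 1.0, n)
b_ols_narrow = ols_slope(x_narrow, y_narrow)
b_pc1_narrow = pc1_slope(x_narrow, y_narrow)
print(f"narrow x: OLS slope={b_ols_narrow:.2f}  PC1 slope={b_pc1_narrow:.1f}")
```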

Alkmene · February 11th, 2010, 9:05 pm

Traden, thanks for that. It makes a lot of sense.

Just one more thing, as per my previous post: once PCA has revealed the dominant mode (or the variable that is responsible for the largest portion of the variance), what can you do to make a forecast or inference about unknown dependent variables?

Thanks,
Alk
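One possible approach in the two-variable case, sketched under the assumption that the dominant mode is used directly as the best-fit line (as in Traden4Alpha's description of it encoding a linear equation): take the line through the centroid along the first eigenvector and read y off it at new x values. This is the orthogonal (total least-squares) fit, not standard LSE, so it inherits the caveats Traden4Alpha raised and its slope changes if a variable is rescaled. The helper names `fit_dominant_mode` and `predict_y` and the synthetic data are hypothetical. With small noise relative to the spread of x, as here, the PCA and LSE forecasts nearly coincide.

```python
import numpy as np

def fit_dominant_mode(x, y):
    """Return the centroid and unit direction of the dominant PCA mode of (x, y)."""
    data = np.column_stack([x, y])
    _, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return data.mean(axis=0), vecs[:, -1]  # eigenvector of the largest eigenvalue

def predict_y(centroid, direction, x_new):
    """Read y off the dominant-mode line through the centroid at the given x values."""
    slope = direction[1] / direction[0]
    return centroid[1] + slope * (np.asarray(x_new, dtype=float) - centroid[0])

rng = np.random.default_rng(5)
x = rng.normal(10.0, 2.0, 300)
y = 3.0 + 0.8 * x + rng.normal(0.0, 0.2, 300)   # synthetic data: true line y = 3 + 0.8 x

centroid, direction = fit_dominant_mode(x, y)
x_new = np.array([8.0, 10.0, 12.0])
y_pca = predict_y(centroid, direction, x_new)

# Ordinary LSE forecast for comparison (np.polyfit returns [slope, intercept] for degree 1).
slope_ols, intercept_ols = np.polyfit(x, y, 1)
y_ols = intercept_ols + slope_ols * x_new

print("PCA dominant-mode forecast:", np.round(y_pca, 3))
print("LSE forecast:              ", np.round(y_ols, 3))
```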