JosephFrank
Topic Author
Posts: 1
Joined: June 13th, 2003, 3:41 pm

nb of variables in a regression

February 7th, 2008, 10:16 am

Hi, I have a small sample of 96 observations. I developed an empirical model with 4 independent variables to explain the dependent variable, and the results make sense. A referee asked me to add 9 more variables to the model to test its robustness. With those added, the results no longer make sense, including the coefficients on 2 of the 4 initial independent variables. I don't have any multicollinearity issue, and I suspect it is because the sample is small and 13 variables is too many for it. Does that make sense? In general, what is the maximum number of independent variables that would give reliable results for such a sample?

J
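A minimal Python sketch of the situation, on synthetic data (the real dataset isn't shown here, and the overlap between the referee's 9 extra variables and the original 4 is an assumption made only so the effect is visible): with 96 observations, the coefficients and standard errors on the original 4 regressors can move noticeably once 9 related regressors are added.

import numpy as np

rng = np.random.default_rng(0)
n = 96
X4 = rng.normal(size=(n, 4))                      # the 4 original regressors
y = X4 @ np.array([1.0, -0.5, 0.8, 0.3]) + rng.normal(size=n)

# 9 extra regressors, each built from one of the originals plus noise
# (a hypothetical choice; the referee's actual variables are unknown)
X9 = X4[:, rng.integers(0, 4, size=9)] + 0.5 * rng.normal(size=(n, 9))
X13 = np.hstack([X4, X9])

for X in (X4, X13):
    Xc = np.column_stack([np.ones(n), X])         # add an intercept
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    sigma2 = resid @ resid / (n - Xc.shape[1])    # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))
    print(f"k={X.shape[1]:2d}  coefficients on the original 4:",
          beta[1:5].round(2), " s.e.:", se[1:5].round(2))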
 
bostonquant
Posts: 0
Joined: March 26th, 2006, 7:46 pm

nb of variables in a regression

February 9th, 2008, 11:02 pm

Quote, originally posted by JosephFrank: "I have a small sample of 96 observations. [...] In general, what is the maximum number of independent variables that would give reliable results for such a sample?"

First off, this makes no sense (don't take it personally).

If you are trying to model a dependent variable from a set of independent variables, I would HIGHLY recommend looking at mean squared error, especially in the situation you describe. Too many people treat R-squared as the ultimate decision factor in regression. Fact: you cannot lose R-squared; you can keep adding insignificant independent variables to a model and R-squared will not go down. But adding insignificant variables will typically increase your mean squared error (MSE), because the residual sum of squares barely falls while the residual degrees of freedom shrink.

The best model is some combination of high R-squared and minimized MSE.

Also, consider that the bond market is highly predictable with 2 independent variables, and equities with 10-15.
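To see that R-squared-versus-MSE point concretely, here is a small Python sketch on synthetic data (all numbers made up; only the mechanics matter): two genuine regressors followed by eleven pure-noise ones. R-squared never decreases as variables are added, while the residual mean square SSE / (n - k - 1) stops improving and drifts up once the noise variables enter.

import numpy as np

rng = np.random.default_rng(1)
n = 96
x = rng.normal(size=(n, 2))                         # two genuine regressors
y = x @ np.array([1.0, -1.0]) + rng.normal(size=n)

X = np.ones((n, 1))                                 # start from intercept only
sst = np.sum((y - y.mean()) ** 2)
for k in range(13):
    new = x[:, k] if k < 2 else rng.normal(size=n)  # real first, then noise
    X = np.column_stack([X, new])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    r2 = 1 - sse / sst                              # can only go up (or stay)
    mse = sse / (n - X.shape[1])                    # penalised by lost d.o.f.
    print(f"k={k + 1:2d}  R^2={r2:.4f}  MSE={mse:.4f}")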
 
jaguaracer
Posts: 0
Joined: January 7th, 2007, 1:12 am

nb of variables in a regression

February 10th, 2008, 2:22 am

Grabbed from a google search for *adjusted r-square*. This should help you out.

R-squared (the coefficient of determination) is the percent of the total sum of squares that is explained, i.e., the regression sum of squares (explained deviation) divided by the total sum of squares (total deviation). This calculation yields a percentage. It also has a weakness: the denominator is fixed, and the numerator can only increase. Each additional variable used in the equation will therefore at least not decrease the numerator, and will probably increase it slightly, resulting in a higher R-squared even when the new variable makes the equation less efficient (worse). In theory, using an unlimited number of independent variables to explain the change in a dependent variable would drive R-squared to one. In other words, the R-squared value can be manipulated and should be treated with suspicion.

The adjusted R-squared is an attempt to correct this shortcoming by adjusting both the numerator and the denominator by their respective degrees of freedom. Unlike R-squared, the adjusted R-squared can decline if the new variable's contribution to the explained deviation is smaller than its cost in degrees of freedom. This means the adjusted R-squared reacts to alternative equations for the same dependent variable much as the standard error of the estimate does: the equation with the smallest standard error of the estimate will most likely also have the highest adjusted R-squared.

A final caution, however: while R-squared is a percent, the adjusted R-squared is NOT, and should be referred to as an index value.
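One thing the quoted passage leaves out is the formula itself. The standard definition, for n observations and k regressors, is adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1). A quick Python check using this thread's n = 96 and two hypothetical R-squared values shows how a slightly higher raw R-squared can still mean a lower adjusted one:

def adj_r2(r2, n, k):
    # standard adjustment: penalise R^2 by the degrees of freedom used
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 96                       # sample size from the original post
print(adj_r2(0.40, n, 4))    # hypothetical R^2 with  4 regressors -> ~0.374
print(adj_r2(0.42, n, 13))   # hypothetical R^2 with 13 regressors -> ~0.328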