Stat 301– Lecture 20 Sums of Squares

Sums of Squares
SS(C. Total) = 203748.80
SS(Year) = 187300.39: Year explains 91.9%.
SS(Year²|Year) = 16264.87: Year² adds 8.0%.
[Figure: diagram of this partition of SS(C. Total)]

Sums of Squares
SS(C. Total) = 203748.80
SS(Year²) = 188977.13: Year² explains 92.8%.
SS(Year|Year²) = 14588.13: Year adds 7.2%.
[Figure: diagram of this partition of SS(C. Total)]

Sums of Squares
SS(C. Total) = 203748.80
SS(shared) = 172712.26 (84.8%), the explained variation that Year and Year² have in common: SS(Year) – SS(Year|Year²) = SS(Year²) – SS(Year²|Year).
SS(Year|Year²) = 14588.13 (7.2%)
SS(Year²|Year) = 16264.87 (8.0%)
The remaining 183.54 (0.1%) is unexplained (error) variation.

Sums of Squares
SS(C. Total) = 203748.80
SS(Year – 1900) = 187300.39: (Year – 1900) explains 91.9%.
SS((Year – 1900)²|(Year – 1900)) = 16264.87: (Year – 1900)² adds 8.0%.
[Figure: diagram of this partition of SS(C. Total)]

Sums of Squares
SS(C. Total) = 203748.80
SS((Year – 1900)²) = 16264.87: (Year – 1900)² explains 8.0%.
SS((Year – 1900)|(Year – 1900)²) = 187300.39: (Year – 1900) adds 91.9%.
[Figure: diagram of this partition of SS(C. Total)]

Sums of Squares
SS(C. Total) = 203748.80
SS((Year – 1900)|(Year – 1900)²) = 187300.39 (91.9%)
SS(shared) = 0.00 (0.0%)
SS((Year – 1900)²|(Year – 1900)) = 16264.87 (8.0%)

Effects of Centering
Year² shares about 85% of the explained variation with Year.
(Year – 1900)² shares none of the explained variation with (Year – 1900).

Why does this happen?
Year² and Year are very highly correlated (the correlation is statistically significant). This is multicollinearity: most of what one term explains overlaps with what the other explains.
The correlation between (Year – 1900)² and (Year – 1900) is zero, so there is no linear relationship and the two terms explain separate parts of the variation.

What about 1940 & 1950?
The predictions for 1940 and 1950 are much higher than the actual population values. Why?
Can we add a term to the model that could account for this?
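The sums-of-squares arithmetic on these slides can be checked with a short script. This is a sketch, not part of the lecture: the SS values are copied from the slides, percentages are taken of SS(C. Total), and the shared part is computed as SS(Year) – SS(Year|Year²). The helpers `pct` and `pearson` are written here for illustration. The correlation check assumes the data are the 23 decennial census years 1790 through 2010, which the slides do not list; those years are spread evenly around 1900.

```python
import math

# Sums of squares copied from the slides.
SS_TOTAL = 203748.80              # SS(C. Total)
SS_YEAR = 187300.39               # SS(Year), Year entered first
SS_YEAR2_GIVEN_YEAR = 16264.87    # SS(Year²|Year)
SS_YEAR2 = 188977.13              # SS(Year²), Year² entered first
SS_YEAR_GIVEN_YEAR2 = 14588.13    # SS(Year|Year²)


def pct(ss):
    """Percent of SS(C. Total), rounded to one decimal place."""
    return round(100 * ss / SS_TOTAL, 1)


# Either order of entry gives the same model SS (up to rounding):
# SS(Year) + SS(Year²|Year) = SS(Year²) + SS(Year|Year²).
ss_model = SS_YEAR + SS_YEAR2_GIVEN_YEAR
# Shared explained variation: what Year explains alone minus what it adds after Year².
ss_shared = SS_YEAR - SS_YEAR_GIVEN_YEAR2
# Variation left unexplained by both terms together.
ss_error = SS_TOTAL - ss_model


def pearson(x, y):
    """Pearson correlation from raw sums; exact integer arithmetic for int inputs."""
    n = len(x)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    sxx = n * sum(a * a for a in x) - sum(x) ** 2
    syy = n * sum(b * b for b in y) - sum(y) ** 2
    return sxy / math.sqrt(sxx * syy)


# Assumption: the census years 1790, 1800, ..., 2010.
years = list(range(1790, 2011, 10))
centered = [yr - 1900 for yr in years]

print("SS(Model) =", round(ss_model, 2))
print("SS(shared) =", round(ss_shared, 2), f"({pct(ss_shared)}%)")
print("SS(Error) =", round(ss_error, 2))
print("Year adds", pct(SS_YEAR_GIVEN_YEAR2), "% after Year²")
print("corr(Year, Year²) =", round(pearson(years, [yr * yr for yr in years]), 5))
print("corr(Year-1900, (Year-1900)²) =", pearson(centered, [c * c for c in centered]))
```

Because the centered years are symmetric about 0, the sum of (Year – 1900)³ is zero, which forces the centered correlation to be exactly zero; the uncentered correlation is very close to 1.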
Dummy Variable
A dummy, or indicator, variable can be used to identify individual values or sets of values.
X = 1 if Year is 1940 or 1950
X = 0 otherwise

Quadratic with Dummy
Predicted Population = 75.467 + 1.368*(Year – 1900) + 0.0066577*(Year – 1900)² – 8.947*X
Note that the other estimated slope coefficients are very close to those in the quadratic model.

Quadratic with Dummy
For 1940 and 1950 (X = 1), the prediction is lowered by 8.947 million. For every other year (X = 0), the dummy term drops out.

Quadratic
1940: Actual = 132.165, Predicted = 139.426, Residual = –7.261
1950: Actual = 151.326, Predicted = 159.129, Residual = –7.803

Quadratic with Dummy
1940: Actual = 132.165, Predicted = 131.908, Residual = 0.257
1950: Actual = 151.326, Predicted = 151.583, Residual = –0.257

Change in R²
Quadratic: R² = 0.9991 (99.91% of the variation explained)
Quadratic + Dummy: R² = 0.9998 (99.98% of the variation explained)
Only a small increase.

Significant Improvement?
Dummy variable X added to the quadratic model: t = –7.25, P-value < 0.0001.
Because the P-value is small, the dummy variable X adds significantly to the quadratic model.

Change in RMSE
Quadratic: RMSE = 3.029
Quadratic + Dummy: RMSE = 1.602
RMSE is reduced by almost half.

[Figure: residuals plotted against Year (1800 to 2000), vertical axis from –5 to 5]

Plot of Residuals
One might detect an up–down–up–down wave.
The worst predictions are still within 3 or 4 million of the actual population.
We probably can't do much better.
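A similar sketch checks the dummy-variable slides. It plugs the rounded coefficients from the fitted equation into hypothetical helpers `dummy` and `predict`, so its predictions differ from the slide values by about 0.02 million (the software works with unrounded estimates). The t check assumes n = 23 observations (the census years 1790 through 2010), which the slides do not state. Under that assumption, SSE = RMSE² × error df for each model, and for a single added term |t| is the square root of the partial F statistic.

```python
import math

# Rounded coefficients from the "Quadratic with Dummy" slide.
B0, B1, B2, B_DUMMY = 75.467, 1.368, 0.0066577, -8.947


def dummy(year):
    """Indicator X: 1 for 1940 or 1950, 0 otherwise."""
    return 1 if year in (1940, 1950) else 0


def predict(year):
    """Predicted population (millions) from the quadratic-plus-dummy model."""
    x = year - 1900
    return B0 + B1 * x + B2 * x ** 2 + B_DUMMY * dummy(year)


actual = {1940: 132.165, 1950: 151.326}
preds = {yr: predict(yr) for yr in actual}
residuals = {yr: actual[yr] - preds[yr] for yr in actual}

# Assumption: n = 23 census years; the quadratic has 3 coefficients, quadratic + dummy has 4.
N = 23
df_quad, df_dummy = N - 3, N - 4
rmse_quad, rmse_dummy = 3.029, 1.602
sse_quad = rmse_quad ** 2 * df_quad
sse_dummy = rmse_dummy ** 2 * df_dummy

# Partial F for adding one term; its square root is |t| for that term.
f_stat = (sse_quad - sse_dummy) / (sse_dummy / df_dummy)
t_abs = math.sqrt(f_stat)

SS_TOTAL = 203748.80
r2_quad = 1 - sse_quad / SS_TOTAL
r2_dummy = 1 - sse_dummy / SS_TOTAL

for yr in actual:
    print(yr, "predicted", round(preds[yr], 3), "residual", round(residuals[yr], 3))
print("|t| =", round(t_abs, 2))
print("R² quadratic =", round(r2_quad, 4), " R² with dummy =", round(r2_dummy, 4))
```

With these assumptions the recomputed |t| and both R² values agree with the slides to the precision shown, which supports reading the t-test and the RMSE drop as two views of the same improvement.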
