Sums of Squares
SS(C. Total) = 203748.80
SS(Year) = 187300.39
Year explains 91.9%
SS(Year2|Year) = 16264.87
Year2 adds 8.0%

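As a rough check of these percentages, each sum of squares can be divided by SS(C. Total). This is only a sketch; the variable and function names are illustrative and not part of the original analysis.

```python
# Sums of squares copied from the slide.
ss_total = 203748.80             # SS(C. Total)
ss_year = 187300.39              # SS(Year), Year entered first
ss_year2_given_year = 16264.87   # SS(Year2|Year), Year2 added second


def pct_of_total(ss, total=ss_total):
    """Return a sum of squares as a percentage of the total."""
    return 100.0 * ss / total


pct_year = pct_of_total(ss_year)
pct_year2_added = pct_of_total(ss_year2_given_year)

print(f"Year explains {pct_year:.1f}%")
print(f"Year2 adds {pct_year2_added:.1f}%")
```
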
Sums of Squares
SS(C. Total) = 203748.80
SS(Year2) = 188977.13
Year2 explains 92.8%
SS(Year|Year2) = 14588.13
Year adds 7.1%

Sums of Squares
SS(C. Total) = 203748.80
SS(shared) = SS(Year) – SS(Year|Year2) = 172712.26 (84.8%)
SS(Year|Year2) = 14588.13 (7.1%)
SS(Year2|Year) = 16264.87 (8.0%)

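One way to compute the shared piece is to subtract what a predictor adds when entered last from what it explains when entered first. Both predictors should give the same answer. Subtracting the two unique pieces from SS(C. Total) instead would also fold in the unexplained (residual) variation. The sketch below uses only the sums of squares reported on the slides; the names are illustrative.

```python
# Sums of squares copied from the slides.
ss_year = 187300.39              # SS(Year), Year entered first
ss_year2 = 188977.13             # SS(Year2), Year2 entered first
ss_year_given_year2 = 14588.13   # SS(Year|Year2), Year entered last
ss_year2_given_year = 16264.87   # SS(Year2|Year), Year2 entered last

# Shared variation, computed from each predictor's point of view.
shared_from_year = ss_year - ss_year_given_year2
shared_from_year2 = ss_year2 - ss_year2_given_year

print(round(shared_from_year, 2), round(shared_from_year2, 2))
```
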
Sums of Squares
SS(C. Total) = 203748.80
SS(YearCtr) = 187300.39
YearCtr explains 91.9%
SS(YearCtr2|YearCtr) = 16264.87
YearCtr2 adds 8.0%

Sums of Squares
SS(C. Total) = 203748.80
SS(YearCtr2) = 16264.87
YearCtr2 explains 8.0%
SS(YearCtr|YearCtr2) = 187300.39
YearCtr adds 91.9%

Sums of Squares
SS(C. Total) = 203748.80
SS(YearCtr|YearCtr2) = 187300.39 (91.9%)
SS(shared) = 0.00 (0.0%)
SS(YearCtr2|YearCtr) = 16264.87 (8.0%)

Effects of Centering
Year2 shares about 85% of the explained variation with Year.
YearCtr2 shares none of the explained variation with YearCtr.

Why does this happen?
The correlation between Year2 and Year is statistically significant; this is multicollinearity.
The correlation between YearCtr2 and YearCtr is zero; there is no linear relationship between them.

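The effect of centering on the correlation can be illustrated with a small sketch. The slides do not list the data, so the evenly spaced census years below (1790 to 2010, every 10 years) are an assumption made only for illustration, and `pearson` is a hypothetical helper written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# ASSUMPTION: evenly spaced census years; not taken from the slides.
years = list(range(1790, 2011, 10))
mean_year = sum(years) / len(years)

year2 = [y ** 2 for y in years]              # Year2
year_ctr = [y - mean_year for y in years]    # YearCtr
year_ctr2 = [c ** 2 for c in year_ctr]       # YearCtr2

r_raw = pearson(years, year2)         # very close to 1
r_ctr = pearson(year_ctr, year_ctr2)  # zero for years symmetric about the mean

print(f"corr(Year, Year2)       = {r_raw:.4f}")
print(f"corr(YearCtr, YearCtr2) = {r_ctr:.4f}")
```

The zero correlation depends on the years being symmetric about their mean; with unevenly spaced years, centering would reduce the correlation but not necessarily remove it.
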
What about 1940 & 1950?
The predictions for 1940 and 1950 are much higher than the actual population values.
Why?
Can we add a term to the model that could account for this?

Dummy Variable
A dummy or indicator variable can be used to identify individual values or sets of values.
X = 1 if Year is 1940 or 1950
X = 0 otherwise

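A minimal sketch of how such an indicator could be built. The function name and the short list of years are hypothetical and used only for illustration.

```python
def dummy_1940_1950(year):
    """Return 1 if the year is 1940 or 1950, and 0 otherwise."""
    return 1 if year in (1940, 1950) else 0


# Illustrative subset of years, not the full data set.
sample_years = [1930, 1940, 1950, 1960]
x = [dummy_1940_1950(y) for y in sample_years]

print(x)  # [0, 1, 1, 0]
```
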
Quadratic with Dummy
Predicted Population = 75.467 + 1.368*(Year – 1900) + 0.0066577*(Year – 1900)² – 8.947*X
Note that the other estimated slope coefficients are very close to those in the quadratic model.

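The fitted equation can be written as a small function. This is a sketch that uses the rounded coefficients shown on the slide, so its predictions may differ from the reported values in the second decimal place. The function name is hypothetical; population is in millions.

```python
def predict_population(year):
    """Predicted population (millions) from the quadratic-with-dummy fit."""
    year_ctr = year - 1900                  # centered year
    x = 1 if year in (1940, 1950) else 0    # dummy variable
    return (75.467
            + 1.368 * year_ctr
            + 0.0066577 * year_ctr ** 2
            - 8.947 * x)


for yr in (1940, 1950):
    print(yr, round(predict_population(yr), 2))
```
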
Quadratic with Dummy

For 1940 and 1950, the prediction is lowered by 8.947 million.

Quadratic
1940: Actual = 132.165, Predicted = 139.426, Residual = –7.261
1950: Actual = 151.326, Predicted = 159.129, Residual = –7.803

Quadratic with Dummy
1940: Actual = 132.165, Predicted = 131.908, Residual = 0.257
1950: Actual = 151.326, Predicted = 151.583, Residual = –0.257

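Each residual is the actual value minus the predicted value. The short sketch below recomputes the residuals for both models from the actual and predicted values on the slides; the dictionary and function names are illustrative.

```python
# Actual and predicted populations (millions), copied from the slides.
actual = {1940: 132.165, 1950: 151.326}
pred_quadratic = {1940: 139.426, 1950: 159.129}
pred_with_dummy = {1940: 131.908, 1950: 151.583}


def residuals(actual_values, predicted_values):
    """Residual = actual - predicted, rounded to 3 decimals per year."""
    return {yr: round(actual_values[yr] - predicted_values[yr], 3)
            for yr in actual_values}


res_quadratic = residuals(actual, pred_quadratic)
res_with_dummy = residuals(actual, pred_with_dummy)

print("Quadratic:           ", res_quadratic)
print("Quadratic with dummy:", res_with_dummy)
```
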
Change in R²
Quadratic: R² = 0.9991 (99.91% explained variation)
Quadratic + Dummy: R² = 0.9998 (99.98% explained variation)
Only a small increase.

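The R² for the quadratic model can be recovered from the sums of squares reported earlier, since the explained variation is SS(Year) plus SS(Year2|Year). A short sketch with illustrative variable names:

```python
# Sums of squares copied from the earlier slides.
ss_total = 203748.80                        # SS(C. Total)
ss_model_quadratic = 187300.39 + 16264.87   # SS(Year) + SS(Year2|Year)

r2_quadratic = ss_model_quadratic / ss_total

print(f"R^2 (quadratic) = {r2_quadratic:.4f}")
```
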
Significant Improvement?
Dummy variable, X, added to the quadratic model.
t = –7.25, P-value < 0.001
Because the P-value is small, the dummy variable, X, adds significantly to the quadratic model.

Change in RMSE
Quadratic: RMSE = 3.029
Quadratic + Dummy: RMSE = 1.602
RMSE reduced quite a bit.

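The RMSE values can be tied back to the sums of squares, since RMSE is the square root of the error sum of squares divided by the error degrees of freedom. The slides do not state the number of observations; the value of 23 used below is an assumption, chosen because it is consistent with the reported RMSE of the quadratic model. Under the same assumption, the drop in error sum of squares also gives a partial F whose square root should be close to the magnitude of the reported t statistic for X.

```python
from math import sqrt

# ASSUMPTION: 23 observations; not stated on the slides.
n_obs = 23

# Quadratic model: error SS = SS(C. Total) - SS(Year) - SS(Year2|Year).
sse_quadratic = 203748.80 - 187300.39 - 16264.87
df_quadratic = n_obs - 3            # intercept, linear, and quadratic terms
rmse_quadratic = sqrt(sse_quadratic / df_quadratic)

# Quadratic + dummy: back out the error SS from the reported RMSE.
rmse_dummy = 1.602
df_dummy = n_obs - 4                # one more parameter for X
sse_dummy = rmse_dummy ** 2 * df_dummy

# Partial F for adding X; its square root is |t| for the dummy coefficient.
partial_f = (sse_quadratic - sse_dummy) / (sse_dummy / df_dummy)
t_magnitude = sqrt(partial_f)

print(f"RMSE (quadratic) = {rmse_quadratic:.3f}")
print(f"|t| for X        = {t_magnitude:.2f}")
```
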
Plot of Residuals
One might detect an up-down-up-down wave.
Worst predictions are still within 3 or 4 million of the actual population.
Probably can’t do much better.

Residuals