Stat 330 (Spring 2015)
Slide set 31
Last update: April 21, 2015

Review: sample correlation as a measure of goodness of fit

♠ After fitting the line ŷ = b0 + b1x, one no longer predicts y as ȳ and no
longer suffers the errors of prediction yi − ȳ, but rather only the errors

ŷi − yi =: ei.
Goodness of fit (Cont’d)

Second measure of goodness of fit: the coefficient of determination R². It is
based on a comparison of “variation accounted for” by the line versus the “raw
variation” of y.

♠ Idea: The quantity

Σ (yi − ȳ)² = Σ yi² − (1/n) (Σ yi)² = SST : Total Sum of Squares

(all sums run over i = 1, …, n) is a measure for the variability of y, and

Σ ei² = Σ (yi − ŷi)² = SSE : Sum of Squares of Errors

is a measure for the remaining/residual/error variation.

♠ Notice SST is (n − 1) · s²y, where s²y is the sample variance of Y.

♠ The fact is that SST ≥ SSE, so that SSR := SST − SSE ≥ 0 is taken
as a measure of “variation accounted for” in the fitting of the line.
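To make these definitions concrete, here is a minimal Python sketch (not from the slides) on a hypothetical toy data set; it fits the least squares line and verifies that SST ≥ SSE, with SSR := SST − SSE picking up the “variation accounted for”:

```python
# Minimal sketch: sums-of-squares decomposition for a least squares line.
# Toy data, chosen only for illustration.
x = [0, 1, 2, 3, 4]
y = [1.1, 1.9, 3.2, 3.8, 5.1]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n

# Least squares fit: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * a for a in x]

SST = sum((b - ybar) ** 2 for b in y)             # variability of y
SSE = sum((b - h) ** 2 for b, h in zip(y, yhat))  # residual variation
SSR = SST - SSE                                   # variation accounted for

print(SST >= SSE, round(SST, 3), round(SSE, 3), round(SSR, 3))
```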
Example (Olympics-continued): observed y, inputs x = year − 1900, fitted values ŷ, and residuals

    y     x = year − 1900     ŷ      y − ŷ    (y − ŷ)²
  7.185          0          7.204   -0.019     0.000
  7.341          4          7.266    0.075     0.006
  7.480          8          7.328    0.152     0.023
  7.601         12          7.390    0.211     0.045
  7.150         20          7.513   -0.363     0.132
  7.445         24          7.575   -0.130     0.017
  7.741         28          7.637    0.104     0.011
  7.639         32          7.699   -0.060     0.004
  8.060         36          7.761    0.299     0.089
  7.823         48          7.947   -0.124     0.015
  7.569         52          8.009   -0.440     0.194
  7.830         56          8.071   -0.241     0.058
  8.122         60          8.133   -0.011     0.000
  8.071         64          8.195   -0.124     0.015
  8.903         68          8.257    0.646     0.417
  8.242         72          8.319   -0.077     0.006
  8.344         76          8.381   -0.037     0.001
  8.541         80          8.443    0.098     0.010
  8.541         84          8.505    0.036     0.001
  8.720         88          8.567    0.153     0.024
  8.670         92          8.629    0.041     0.002
  8.500         96          8.691   -0.191     0.036
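As an aside, the ŷ and y − ŷ columns can be reproduced from the fitted line ŷ = b0 + b1x, with b0 = 7.2037 and b1 = 0.0155 as reported later in this slide set; a short sketch follows, where small disagreements with the table come from rounding of the coefficients:

```python
# Sketch: recomputing the fitted values and residuals in the table above
# from the fitted line of the Olympics example (b0, b1 from these slides).
x = [0, 4, 8, 12, 20, 24, 28, 32, 36, 48, 52,
     56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96]
y = [7.185, 7.341, 7.480, 7.601, 7.150, 7.445, 7.741, 7.639, 8.060, 7.823,
     7.569, 7.830, 8.122, 8.071, 8.903, 8.242, 8.344, 8.541, 8.541, 8.720,
     8.670, 8.500]
b0, b1 = 7.2037, 0.0155

for xi, yi in zip(x, y):
    yhat = b0 + b1 * xi
    print(f"{xi:3d} {yi:6.3f} {yhat:6.3f} {yi - yhat:7.3f}")
```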
Coefficient of determination R²

♠ Definition: The coefficient of determination R² is defined as:

R² = SSR / SST

♠ Obviously: 0 ≤ R² ≤ 1; the closer R² is to 1, the better the linear fit.
In other words, the more variability can be explained by the line or linear
model.

♠ Example (Olympics-continued):

SST = Syy = Σ yi² − (1/n) (Σ yi)² = 1406.109 − 175.518²/22 = 5.81.
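A one-line check of this arithmetic, using only the sums reported on the slide (Σ yi² = 1406.109, Σ yi = 175.518, n = 22):

```python
# Sketch: SST via the shortcut formula, from the slide's summary sums.
sum_y2, sum_y, n = 1406.109, 175.518, 22
SST = sum_y2 - sum_y ** 2 / n
print(round(SST, 2))  # 5.81
```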
Regression Sum of Squares, SSR

♠ Using previous notation, SSR is also shown to be

Σ (ŷi − ȳ)² = SSR : Regression Sum of Squares,

which is the portion of the total variation explained by the fitted model.

♠ It is easier to compute it using the formula

SSR = b1 · Sxy

where Sxy = Σ xiyi − (1/n) (Σ xi) (Σ yi).

♠ Example (Olympics-continued):

Sxy = 9079.584 − (1100 · 175.518)/22 = 303.684

SSR = b1 · Sxy = 0.0155 · 303.684 = 4.707
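The same kind of check for Sxy and SSR, again using only the sums reported on the slide (Σ xiyi = 9079.584, Σ xi = 1100, Σ yi = 175.518, n = 22) and the fitted slope b1 = 0.0155:

```python
# Sketch: Sxy via its shortcut formula, then SSR = b1 * Sxy.
sum_xy, sum_x, sum_y, n = 9079.584, 1100, 175.518, 22
b1 = 0.0155
Sxy = sum_xy - sum_x * sum_y / n
SSR = b1 * Sxy
print(round(Sxy, 3), round(SSR, 3))  # 303.684, 4.707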
Coefficient of determination R² (Cont’d)

♠ Example (Olympics-continued):

SSE = SST − SSR = 5.810 − 4.707 = 1.103

and

R² = SSR / SST = 4.707 / 5.810 = 0.81
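SSE can also be checked directly against the residual column of the table; the direct sum (about 1.106) differs slightly from SST − SSR (1.103) only because the table's residuals are rounded to three decimals:

```python
# Sketch: SSE directly from the (rounded) residuals, and via SST - SSR.
resid = [-0.019, 0.075, 0.152, 0.211, -0.363, -0.130, 0.104, -0.060,
         0.299, -0.124, -0.440, -0.241, -0.011, -0.124, 0.646, -0.077,
         -0.037, 0.098, 0.036, 0.153, 0.041, -0.191]
SSE_direct = sum(e * e for e in resid)
SSE = 5.810 - 4.707
R2 = 4.707 / 5.810
print(round(SSE_direct, 3), round(SSE, 3), round(R2, 2))  # ~1.106, 1.103, 0.81
```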
Connection Between R and r

♠ R² is SSR/SST - that’s the squared sample correlation of y and ŷ!

♠ If - and only if! - we use a linear function in x to predict y, i.e.
ŷ = b0 + b1x, the correlation between ŷ and x is 1.

♠ Then R² is equal to the squared sample correlation between y and x,
which is exactly r²:

R² = r² if and only if ŷ = b0 + b1x.

♠ Example (Olympics-continued): R² = 0.8095 = (0.8997)² = r².

♣ It is possible to go beyond simply fitting a line and summarizing the
goodness of fit in terms of r and R² to doing inference, i.e. making
confidence intervals, predictions, … based on the line fitting. But for that,
we need a probability model.
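A numerical check of R² = r² on the Olympics data; the correlation r is computed from the shortcut sums Sxy, Sxx, Syy, so nothing here goes beyond the slide's own quantities:

```python
# Sketch: verifying R^2 = r^2 on the Olympics data.
import math

x = [0, 4, 8, 12, 20, 24, 28, 32, 36, 48, 52,
     56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96]
y = [7.185, 7.341, 7.480, 7.601, 7.150, 7.445, 7.741, 7.639, 8.060, 7.823,
     7.569, 7.830, 8.122, 8.071, 8.903, 8.242, 8.344, 8.541, 8.541, 8.720,
     8.670, 8.500]
n = len(x)

Sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(a * a for a in x) - sum(x) ** 2 / n
Syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = Sxy / math.sqrt(Sxx * Syy)   # sample correlation of x and y
R2 = (Sxy / Sxx) * Sxy / Syy     # SSR / SST, since SSR = b1 * Sxy
print(round(r, 4), round(r ** 2, 4), round(R2, 4))  # ~0.8997, ~0.8095, ~0.8095
```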
Simple linear regression model

♠ Idea: in words, for input x the output y is normally distributed with mean
β0 + β1x = μy|x and standard deviation σ.

♠ In symbols: yi = β0 + β1xi + εi with εi i.i.d. normal N(0, σ²).

♠ β0, β1, and σ² are the parameters of the model and have to be estimated
from the data (the data pairs (xi, yi)).
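To see the model as a data-generating mechanism, here is a small simulation sketch; the parameter values are just the Olympics estimates from the next slide, used as hypothetical “true” values:

```python
# Sketch: simulating y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, sigma^2).
import random

beta0, beta1, sigma2 = 7.2037, 0.0155, 0.055   # hypothetical "true" values
for xi in [0, 20, 40, 60, 80]:
    # random.gauss takes the standard deviation, hence the square root.
    yi = beta0 + beta1 * xi + random.gauss(0.0, sigma2 ** 0.5)
    print(xi, round(yi, 3))   # y | x ~ N(beta0 + beta1 * x, sigma2)
```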
Estimates for regression model

♠ Point estimates: how to estimate β0, β1 and σ²?

♥ β̂0 = b0, β̂1 = b1 from the Least Squares fit (which gives β̂0 and β̂1 the
name Least Squares Estimates).

♥ What about σ²?

♠ σ² measures the variation around the “true” line β0 + β1x - we don’t know
that line, but only b0 + b1x. Should we base the estimation of σ² on this
line?

♠ The “right” estimator for σ² turns out to be:

σ̂² = (1/(n − 2)) Σ (yi − ŷi)² = SSE/(n − 2).

♣ Example (Olympics-continued):

β̂0 = b0 = 7.2037 (in m)
β̂1 = b1 = 0.0155 (in m/year)
σ̂² = SSE/(n − 2) = 1.103/20 = 0.055.

Overall, we assume a linear regression model of the form:

y = 7.2037 + 0.0155x + e, with e ∼ N(0, 0.055).

♣ Remark: Using the model, not only can we estimate β0, β1 and σ², we can
also pursue further statistical inferences like confidence intervals and
hypothesis testing. The tool for inference is called ANOVA (analysis of
variance).
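The division by n − 2 (rather than n or n − 1) reflects the two parameters already estimated from the data; a final check with the Olympics numbers:

```python
# Sketch: sigma^2-hat = SSE / (n - 2) with SSE = 1.103 and n = 22.
SSE, n = 1.103, 22
sigma2_hat = SSE / (n - 2)
print(round(sigma2_hat, 3))  # 0.055
```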