Partitioning the Sums of Squares for Linear Regression

MA 223
Let (xk, yk), 1 ≤ k ≤ n, be any set of points in the plane. As we showed in class, the
best-fit line y = a + bx through this data is obtained when
\[ b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}, \]
where as usual x̄ and ȳ denote the sample means for the xk and yk respectively and
\[ S_{xx} = \sum_k (x_k - \bar{x})^2, \qquad S_{xy} = \sum_k (x_k - \bar{x})(y_k - \bar{y}). \]
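The formulas for b and a translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are illustrative assumptions, not part of the handout.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x.

    Hypothetical helper: computes b = Sxy / Sxx and a = y_bar - b * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        # All x values coincide, so the slope formula divides by zero.
        raise ValueError("all x values are equal; slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up points lying exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

The guard on `s_xx` reflects the one case the formula does not cover: if every xk is the same, Sxx = 0 and no slope is determined.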
Recall also that the total sum of squares for the data is
\[ SS_{\text{total}} = \sum_k (y_k - \bar{y})^2 \]
and the residual sum of squares is
\[ SS_{\text{res}} = \sum_k (\hat{y}_k - y_k)^2 = \sum_k (b x_k + a - y_k)^2. \]
In the text it is claimed on page 119 that
\[ SS_{\text{total}} = SS_{\text{res}} + b S_{xy}. \tag{1} \]
Note that since b = Sxy/Sxx, the quantity bSxy = (Sxy)^2/Sxx is always non-negative
(Sxx is a sum of squares, and is positive unless all the xk are equal), so that
SStotal ≥ SSres is always true.
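Equation (1) can be checked numerically before proving it. The sketch below uses plain Python and made-up data; the function name `sums_of_squares` and the data values are assumptions for illustration only.

```python
import math


def sums_of_squares(xs, ys):
    """Return b, Sxy, SStotal and SSres for the least-squares line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx  # assumes the x values are not all equal
    a = y_bar - b * x_bar
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((b * x + a - y) ** 2 for x, y in zip(xs, ys))
    return {"b": b, "s_xy": s_xy, "ss_total": ss_total, "ss_res": ss_res}


# Made-up data, chosen only to exercise the identity.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 2.8, 4.5, 3.7, 5.5]
r = sums_of_squares(xs, ys)

# Equation (1): SStotal = SSres + b * Sxy, compared up to floating-point rounding.
identity_holds = math.isclose(r["ss_total"], r["ss_res"] + r["b"] * r["s_xy"], rel_tol=1e-9)
print(identity_holds)
```

The comparison uses `math.isclose` rather than `==` because the two sides are computed along different floating-point paths and may differ in the last few digits.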
Equation (1) isn’t too hard to prove. First, I claim that we may as well assume that
x̄ = 0 and ȳ = 0. To see this, recall from class that we can add any constant p to each xk
and the quantity xk − x̄ is unchanged for any k. Similarly we can add any constant q to each
yk and the quantity yk − ȳ is unchanged for any k. This leaves Sxx, Syy and Sxy unchanged,
and of course then b = Sxy /Sxx and SStotal are also unchanged. However, x̄ changes to x̄ + p
and ȳ changes to ȳ + q, so that a = ȳ − bx̄ changes to a + q − bp. Then the quantity (bxk + a − yk)
changes to b(xk + p) + (a + q − bp) − (yk + q) = bxk + a − yk, that is, remains unchanged. Thus
SSres also remains unchanged.
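The invariance argument above can also be illustrated numerically. This is a minimal sketch in plain Python; the helper name `regression_quantities`, the data, and the shifts p = 10 and q = −3 are all illustrative assumptions.

```python
import math


def regression_quantities(xs, ys):
    """Return a, b, Sxx, Sxy, SStotal, SSres for the least-squares line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx  # assumes the x values are not all equal
    a = y_bar - b * x_bar
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((b * x + a - y) ** 2 for x, y in zip(xs, ys))
    return {"a": a, "b": b, "s_xx": s_xx, "s_xy": s_xy,
            "ss_total": ss_total, "ss_res": ss_res}


# Made-up data and shifts: add p to every x and q to every y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 2.8, 4.5, 3.7, 5.5]
p, q = 10.0, -3.0

before = regression_quantities(xs, ys)
after = regression_quantities([x + p for x in xs], [y + q for y in ys])

# b, Sxx, Sxy, SStotal and SSres should agree up to rounding.
invariant_keys = ["b", "s_xx", "s_xy", "ss_total", "ss_res"]
unchanged = all(
    math.isclose(before[k], after[k], rel_tol=1e-9, abs_tol=1e-12)
    for k in invariant_keys
)

# The intercept is the one quantity that moves: a becomes a + q - b*p.
intercept_shift_ok = math.isclose(
    after["a"], before["a"] + q - before["b"] * p, rel_tol=1e-9, abs_tol=1e-12
)
print(unchanged, intercept_shift_ok)
```

Choosing p = −x̄ and q = −ȳ in the same code would centre the data, which is the special case used in the rest of the argument.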
The upshot of all this is that adding any constant to the xk and/or yk leaves the essential
quantities of interest b, Sxx , Sxy , SStotal , and SSres unchanged. As a consequence we may as
well assume, for simplicity, that x̄ = ȳ = 0, since we can always add appropriate constants
to make this true. In this case we have
\[ b = \frac{\sum_k x_k y_k}{\sum_k x_k^2}, \qquad a = 0, \qquad SS_{\text{res}} = \sum_k (b x_k - y_k)^2, \qquad S_{xy} = \sum_k x_k y_k, \qquad SS_{\text{total}} = \sum_k y_k^2. \]
Then
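\[
\begin{aligned}
SS_{\text{res}} + b S_{xy}
&= \sum_k (b x_k - y_k)^2 + \frac{1}{\sum_k x_k^2}\Bigl(\sum_k x_k y_k\Bigr)^2 \\
&= b^2 \sum_k x_k^2 - 2b \sum_k x_k y_k + \sum_k y_k^2 + \frac{\bigl(\sum_k x_k y_k\bigr)^2}{\sum_k x_k^2} \\
&= \frac{\bigl(\sum_k x_k y_k\bigr)^2}{\sum_k x_k^2} - 2\,\frac{\bigl(\sum_k x_k y_k\bigr)^2}{\sum_k x_k^2} + \sum_k y_k^2 + \frac{\bigl(\sum_k x_k y_k\bigr)^2}{\sum_k x_k^2} \\
&= \sum_k y_k^2 \\
&= SS_{\text{total}}.
\end{aligned}
\]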
This demonstrates equation (1).