Partitioning of Sums of Squares in Simple Linear Regression
George H. Olson, Ph.D.
Leadership and Educational Studies
Appalachian State University
The parametric model for the regression of Y on X is given by

$$Y_i = \alpha + \beta X_i + \varepsilon_i. \qquad (1)$$
The model for the regression of Y on X in a sample is

$$Y_i = a + bX_i + e_i. \qquad (2)$$
Calculation of the constants in the model: the slope is given by

$$b = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad (3)$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviation scores, and the intercept by

$$a = \bar{Y} - b\bar{X}. \qquad (4)$$
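As an illustration of (3) and (4), here is a minimal Python sketch (the function name and variable names are my own, not part of the handout) that computes the slope and intercept from raw scores:

```python
# Minimal sketch of equations (3) and (4): slope and intercept from raw scores.
def fit_simple_regression(X, Y):
    n = len(X)
    X_bar = sum(X) / n
    Y_bar = sum(Y) / n
    x = [Xi - X_bar for Xi in X]  # deviation scores x_i = X_i - Xbar
    y = [Yi - Y_bar for Yi in Y]  # deviation scores y_i = Y_i - Ybar
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)  # (3)
    a = Y_bar - b * X_bar  # (4)
    return a, b

# Using the data from the computational example later in the handout:
a, b = fit_simple_regression([1, 0, -1, 1, 2], [3, 1, 0, 4, 5])
```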
A closer look at the regression equation, $\hat{Y}_i = a + bX_i$, leads us to

$$\begin{aligned}
\hat{Y}_i &= a + bX_i \\
&= (\bar{Y} - b\bar{X}) + bX_i \\
&= \bar{Y} + b(X_i - \bar{X}) \\
&= \bar{Y} + bx_i. \qquad (5)
\end{aligned}$$
Partitioning the Sum of Squares, $\sum (Y_i - \bar{Y})^2$
First, consider the following identity:

$$Y_i = \bar{Y} + (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i). \qquad (6)$$
If we subtract $\bar{Y}$ from each side of the equation, we obtain

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i). \qquad (7)$$
After squaring and summing, we have

$$\begin{aligned}
\sum (Y_i - \bar{Y})^2 &= \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2 \\
&= \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 + 2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i), \qquad (8)
\end{aligned}$$
or, after simplifying (the Appendix shows how the cross-product term vanishes),

$$\sum y_i^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 = SS_{reg} + SS_{res}, \qquad (9)$$

where $SS_{reg}$ is the regression sum of squares and $SS_{res}$ is the residual sum of squares.
Dividing (9) through by the total sum of squares, $SS_{tot} = \sum y_i^2$, gives

$$\frac{\sum y_i^2}{\sum y_i^2} = \frac{SS_{reg}}{\sum y_i^2} + \frac{SS_{res}}{\sum y_i^2}, \qquad (10)$$
or

$$1 = \frac{SS_{reg}}{\sum y_i^2} + \frac{SS_{res}}{\sum y_i^2}. \qquad (11)$$
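The partition in (9) and the proportions in (11) can be checked numerically; this Python sketch anticipates the data from the computational example below:

```python
# Numerical check of the partition in (9) and the proportions in (11).
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
n = len(Y)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
b = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) \
    / sum((Xi - X_bar) ** 2 for Xi in X)
a = Y_bar - b * X_bar
Y_hat = [a + b * Xi for Xi in X]

SS_tot = sum((Yi - Y_bar) ** 2 for Yi in Y)
SS_reg = sum((Yh - Y_bar) ** 2 for Yh in Y_hat)
SS_res = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, Y_hat))

print(SS_tot, SS_reg + SS_res)            # equal, per (9)
print(SS_reg / SS_tot + SS_res / SS_tot)  # 1.0, per (11)
```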
A computational example

It is often useful to devise simple computational examples, such as the following:

    Y     X
    3     1
    1     0
    0    -1
    4     1
    5     2
The means of the two variables, X and Y, are

$$\bar{X} = \frac{\sum X_i}{n} = \frac{3}{5} = .6, \qquad \bar{Y} = \frac{\sum Y_i}{n} = \frac{13}{5} = 2.6.$$
Having computed the means, we now compute the deviations, squares of deviations, and cross-products of deviations:

Deviations, squares, and cross-products

    Y      y      y2      X      x      x2      xy
    3     .4     .16      1     .4     .16     .16
    1   -1.6    2.56      0    -.6     .36     .96
    0   -2.6    6.76     -1   -1.6    2.56    4.16
    4    1.4    1.96      1     .4     .16     .56
    5    2.4    5.76      2    1.4    1.96    3.36
The sums of squares and cross-products are computed as

$$\sum x_i^2 = 5.2, \qquad \sum y_i^2 = 17.2, \qquad \sum x_i y_i = 9.2,$$
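These sums can be reproduced in a few lines of Python (a sketch; results are exact up to floating-point rounding):

```python
# Reproduce the sums of squares and cross-products from the table above.
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
n = len(X)
x = [Xi - sum(X) / n for Xi in X]  # deviations from Xbar = .6
y = [Yi - sum(Y) / n for Yi in Y]  # deviations from Ybar = 2.6
print(round(sum(xi ** 2 for xi in x), 4))              # 5.2
print(round(sum(yi ** 2 for yi in y), 4))              # 17.2
print(round(sum(xi * yi for xi, yi in zip(x, y)), 4))  # 9.2
```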
and the regression coefficients as

$$b = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{9.2}{5.2} = 1.769,$$

and

$$a = \bar{Y} - b\bar{X} = 2.6 - 1.769 \times .6 = 2.6 - 1.061 = 1.539.$$
The regression equation can now be written as

$$\hat{Y}_i = 1.539 + 1.769 X_i.$$
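As a sanity check, the hand computation can be compared with a library fit; the sketch below assumes NumPy is available:

```python
import numpy as np

X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
# polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)  # approximately 1.769 and 1.539
```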
From an earlier equation, (9), we obtain

$$\begin{aligned}
SS_{reg} &= \sum (\hat{Y}_i - \bar{Y})^2 \\
&= \sum [(\bar{Y} + bx_i) - \bar{Y}]^2 \\
&= \sum (bx_i)^2 \\
&= b^2 \sum x_i^2 \\
&= \left(\frac{\sum x_i y_i}{\sum x_i^2}\right)^2 \sum x_i^2 \\
&= \frac{(\sum x_i y_i)^2}{\sum x_i^2} \\
&= \frac{84.64}{5.2} \\
&= 16.28. \qquad (12)
\end{aligned}$$
Note that we could also have computed

$$SS_{reg} = b \sum x_i y_i = 1.769 \times 9.2 = 16.28. \qquad (13)$$
An alternative calculation is given by

$$SS_{reg} = b^2 \sum x_i^2 = (1.769)^2 \times 5.2 = 3.1294 \times 5.2 = 16.28. \qquad (14)$$
Hence,

$$SS_{res} = \sum y_i^2 - SS_{reg} = 17.2 - 16.28 = .92. \qquad (15)$$
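The agreement of (12) through (15) is easy to confirm numerically; a sketch using the sums from the example:

```python
# Equivalent computations of SS_reg, per (12)-(14), and SS_res, per (15).
Sxx, Syy, Sxy = 5.2, 17.2, 9.2  # sum(x^2), sum(y^2), sum(xy) from the example
b = Sxy / Sxx                   # 1.769...
print(Sxy ** 2 / Sxx)           # (12): 84.64 / 5.2   -> about 16.28
print(b * Sxy)                  # (13): 1.769 * 9.2   -> about 16.28
print(b ** 2 * Sxx)             # (14): 3.1294 * 5.2  -> about 16.28
print(Syy - Sxy ** 2 / Sxx)     # (15): SS_res        -> about .92
```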
The square of the Pearson correlation is given by

$$r_{xy}^2 = \frac{(\sum x_i y_i)^2}{\sum x_i^2 \sum y_i^2}. \qquad (16)$$
Therefore, $SS_{reg}$ can also be computed as

$$r_{xy}^2 \sum y_i^2 = \frac{(\sum x_i y_i)^2}{\sum x_i^2 \sum y_i^2} \sum y_i^2 = \frac{(\sum x_i y_i)^2}{\sum x_i^2} = SS_{reg}. \qquad (17)$$
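A quick numeric check of (16) and (17) with the example sums (a sketch):

```python
# r^2 per (16); multiplying by sum(y^2) recovers SS_reg, per (17).
Sxx, Syy, Sxy = 5.2, 17.2, 9.2
r2 = Sxy ** 2 / (Sxx * Syy)  # about .946
print(r2 * Syy)              # about 16.28, matching SS_reg
```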
Appendix

Beginning with

$$\sum (Y_i - \bar{Y})^2 = \sum [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 + 2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i),$$

we need to show that

$$2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = 0.$$

Recalling (3), (4), and (5), we can write

$$\begin{aligned}
2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) &= 2\sum \left((\bar{Y} - b\bar{X} + bX_i) - \bar{Y}\right)(Y_i - \hat{Y}_i) \\
&= 2\sum \left(\bar{Y} + b(X_i - \bar{X}) - \bar{Y}\right)(Y_i - \hat{Y}_i) \\
&= 2\sum b(X_i - \bar{X})(Y_i - \hat{Y}_i) \\
&= 2\sum b(X_i - \bar{X})\left(Y_i - (\bar{Y} - b\bar{X}) - bX_i\right) \\
&= 2b\sum (X_i - \bar{X})\left(Y_i - (\bar{Y} + b(X_i - \bar{X}))\right) \\
&= 2b\sum (X_i - \bar{X})\left((Y_i - \bar{Y}) - b(X_i - \bar{X})\right) \\
&= 2b\sum \left((X_i - \bar{X})(Y_i - \bar{Y}) - b(X_i - \bar{X})^2\right) \\
&= 2b\sum (x_i y_i - bx_i^2) \\
&= 2b\left(\sum x_i y_i - \frac{\sum x_i y_i}{\sum x_i^2}\sum x_i^2\right) \\
&= 2b\left(\sum x_i y_i - \sum x_i y_i\right) \\
&= 0.
\end{aligned}$$
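The appendix result can also be verified numerically; a minimal sketch with the example data:

```python
# Verify that the cross-product term 2 * sum((Yhat_i - Ybar)(Y_i - Yhat_i))
# vanishes, as shown algebraically above.
X = [1, 0, -1, 1, 2]
Y = [3, 1, 0, 4, 5]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
b = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) \
    / sum((Xi - X_bar) ** 2 for Xi in X)
a = Y_bar - b * X_bar
Y_hat = [a + b * Xi for Xi in X]
cross = 2 * sum((Yh - Y_bar) * (Yi - Yh) for Yh, Yi in zip(Y_hat, Y))
print(cross)  # 0 up to floating-point error
```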