
252mreg.doc 1/22/07
Roger Even Bove
J. MULTIPLE REGRESSION
1. Two explanatory variables
a. Model
Let us assume that we have two independent variables, so that $Y_j$ represents the $j$th observation on the dependent variable and $X_{ij}$ is the $j$th observation on independent variable $i$. For example, $X_{15}$ is the 5th observation on independent variable 1 and $X_{29}$ is the 9th observation on independent variable 2. We wish to estimate the coefficients $\beta_0$, $\beta_1$ and $\beta_2$ of the presumably 'true' regression line $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$. Any actual point $Y_j$ may not be precisely on the regression line, so we write $Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \varepsilon_j$, where $\varepsilon_j$ is a random variable usually assumed to be $N(0,\sigma)$, and $\sigma$ is unknown but constant.
The line that we estimate will have the equation $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$. Our prediction of $Y$ for any specific $X_{1j}$ and $X_{2j}$ will be $\hat{Y}_j$, and since $\hat{Y}_j$ is unlikely to equal $Y_j$ exactly, we call our error (in estimating $Y_j$) $e_j = Y_j - \hat{Y}_j$, so that $Y_j = b_0 + b_1 X_{1j} + b_2 X_{2j} + e_j$.
b. Solution.
After computing a set of six "spare parts," put them together in a set of Simplified Normal Equations

$\sum X_1Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)$

$\sum X_2Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$

and solve them as two equations in two unknowns for $b_1$ and $b_2$; then get $b_0$ by solving $b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2$.
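For readers who want to check the arithmetic by machine, a minimal Python sketch of this procedure follows (an addition to these notes, not part of the original handout; the function and argument names are illustrative). It solves the two Simplified Normal Equations by Cramer's rule and recovers $b_0$ from the means.

```python
# Sketch: solve the two Simplified Normal Equations for b1 and b2,
# then recover b0 from the means (Eqn. 3). Names are illustrative.
def solve_two_variable_regression(s_x1y, s_x2y, ss_x1, ss_x2, s_x1x2,
                                  ybar, x1bar, x2bar):
    # Simplified Normal Equations:
    #   s_x1y = ss_x1  * b1 + s_x1x2 * b2
    #   s_x2y = s_x1x2 * b1 + ss_x2  * b2
    det = ss_x1 * ss_x2 - s_x1x2 ** 2          # determinant of the 2x2 system
    b1 = (s_x1y * ss_x2 - s_x2y * s_x1x2) / det
    b2 = (ss_x1 * s_x2y - s_x1x2 * s_x1y) / det
    b0 = ybar - b1 * x1bar - b2 * x2bar        # Eqn. 3
    return b0, b1, b2
```

With the spare parts of the example below, this returns $b_0 \approx 1.4535$, $b_1 \approx 0.6279$ and $b_2 \approx -1.3953$.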
c. Example
Recall our original example. $Y$ is the number of children actually born and $X$ is the number of children wanted. Add a new independent variable $W$, a dummy variable indicating the education of the wife ($W = 1$ if she has finished college, $W = 0$ if she has not). In the above equations, $X = X_1$ and $W = X_2$.
  i    Y   X   W   X²   W²   XW   XY   WY   Y²
  1    0   0   1    0    1    0    0    0    0
  2    2   1   0    1    0    0    2    0    4
  3    1   2   1    4    1    2    2    1    1
  4    3   1   0    1    0    0    3    0    9
  5    1   0   0    0    0    0    0    0    1
  6    3   3   0    9    0    0    9    0    9
  7    4   4   0   16    0    0   16    0   16
  8    2   2   1    4    1    2    4    2    4
  9    1   2   1    4    1    2    2    1    1
 10    2   1   0    1    0    0    2    0    4
sum   19  16   4   40    4    6   40    4   49
Copy sums: $n = 10$, $\sum Y = 19$, $\sum X = \sum X_1 = 16$, $\sum W = \sum X_2 = 4$, $\sum X^2 = \sum X_1^2 = 40$, $\sum W^2 = \sum X_2^2 = 4$, $\sum X_1X_2 = \sum XW = 6$, $\sum X_1Y = \sum XY = 40$, $\sum X_2Y = \sum WY = 4$ and $\sum Y^2 = 49$.

Then compute the means: $\bar{Y} = \frac{\sum Y}{n} = \frac{19}{10} = 1.90$, $\bar{X}_1 = \bar{X} = \frac{\sum X}{n} = \frac{16}{10} = 1.60$ and $\bar{X}_2 = \bar{W} = \frac{\sum W}{n} = \frac{4}{10} = 0.40$.

Spare Parts:

$\sum Y^2 - n\bar{Y}^2 = 49 - 10(1.9)^2 = 12.90 = SST = SS_Y$

$\sum X_1Y - n\bar{X}_1\bar{Y} = 40 - 10(1.6)(1.9) = 9.60 = S_{X_1Y}$

$\sum X_2Y - n\bar{X}_2\bar{Y} = 4 - 10(0.4)(1.9) = -3.60 = S_{X_2Y}$

$\sum X_1^2 - n\bar{X}_1^2 = 40 - 10(1.60)^2 = 14.40 = SS_{X_1}$

$\sum X_2^2 - n\bar{X}_2^2 = 4 - 10(0.4)^2 = 2.40 = SS_{X_2}$

$\sum X_1X_2 - n\bar{X}_1\bar{X}_2 = 6 - 10(1.6)(0.4) = -0.40 = S_{X_1X_2}$
Note that $SS_Y$, $SS_{X_1}$ and $SS_{X_2}$ must be positive, while the other sums can be either positive or negative. Also note that $df = n - k - 1 = 10 - 2 - 1 = 7$. ($k$ is the number of independent variables.) SST is used later.
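As a machine check (a sketch added to these notes, not in the original), the sums, means and spare parts above can be reproduced directly from the data table:

```python
# Data from the example: Y = children born, X = X1 = children wanted,
# W = X2 = wife's education dummy.
Y = [0, 2, 1, 3, 1, 3, 4, 2, 1, 2]
X = [0, 1, 2, 1, 0, 3, 4, 2, 2, 1]
W = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0]
n = len(Y)                                                 # 10
ybar, x1bar, x2bar = sum(Y) / n, sum(X) / n, sum(W) / n    # 1.90, 1.60, 0.40

SSY    = sum(y * y for y in Y) - n * ybar ** 2                 # 12.90 = SST
S_X1Y  = sum(x * y for x, y in zip(X, Y)) - n * x1bar * ybar   #  9.60
S_X2Y  = sum(w * y for w, y in zip(W, Y)) - n * x2bar * ybar   # -3.60
SS_X1  = sum(x * x for x in X) - n * x1bar ** 2                # 14.40
SS_X2  = sum(w * w for w in W) - n * x2bar ** 2                #  2.40
S_X1X2 = sum(x * w for x, w in zip(X, W)) - n * x1bar * x2bar  # -0.40
```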
Rewrite the Normal Equations to move the unknowns to the right.

$\sum X_1Y - n\bar{X}_1\bar{Y} = \left(\sum X_1^2 - n\bar{X}_1^2\right)b_1 + \left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)b_2$   (Eqn. 1)

$\sum X_2Y - n\bar{X}_2\bar{Y} = \left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)b_1 + \left(\sum X_2^2 - n\bar{X}_2^2\right)b_2$   (Eqn. 2)

$\bar{Y} = b_0 + \bar{X}_1b_1 + \bar{X}_2b_2$   (Eqn. 3)

Or:

$S_{X_1Y} = SS_{X_1}b_1 + S_{X_1X_2}b_2$   (Eqn. 1)

$S_{X_2Y} = S_{X_1X_2}b_1 + SS_{X_2}b_2$   (Eqn. 2)

$\bar{Y} = b_0 + \bar{X}_1b_1 + \bar{X}_2b_2$   (Eqn. 3)
If we fill in the above spare parts, we get:

$9.60 = 14.40b_1 - 0.40b_2$   (Eqn. 1)

$-3.60 = -0.40b_1 + 2.40b_2$   (Eqn. 2)

$1.90 = b_0 + 1.60b_1 + 0.40b_2$   (Eqn. 3)
We solve the first two equations alone by multiplying one of them so that the coefficients of $b_1$ or $b_2$ are equal in magnitude. We then add or subtract the two equations to eliminate one of the variables. In this case, note that if we multiply Equation 1 by 6, the coefficients of $b_2$ in Equations 1 and 2 will be equal and opposite, so that, if we add them together, $b_2$ will be eliminated.
$6\times$ Eqn. 1:  $57.60 = 86.40b_1 - 2.40b_2$
$+$ Eqn. 2:  $-3.60 = -0.40b_1 + 2.40b_2$
Sum:  $54.00 = 86.00b_1$

But if $86.00b_1 = 54.00$, then $b_1 = \frac{54}{86} = 0.62791$.
Now, solve either Equation 1 or 2 for $b_2$. If we pick Equation 1, we can write it as $-0.40b_2 = 9.60 - 14.40b_1$. We can solve this for $b_2$ by dividing through by $-0.40$, so that $b_2 = -24.0 + 36.0b_1$. If we substitute in $b_1 = 0.62791$, we find that $b_2 = -24.0 + 36.0(0.62791) = -1.3956$.

Finally, rearrange Equation 3 to read $b_0 = 1.90 - 1.60b_1 - 0.40b_2 = 1.90 - 1.60(0.62791) - 0.40(-1.3956) = 1.4536$. Now that we have values of all the coefficients, our regression equation, $\hat{Y} = b_0 + b_1X_1 + b_2X_2$, becomes $\hat{Y} = 1.4536 + 0.6279X_1 - 1.3956X_2$ or $Y = 1.4536 + 0.6279X_1 - 1.3956X_2 + e$.
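The hand solution can be cross-checked with a standard least-squares routine. A sketch (assuming numpy is available; not part of the original handout):

```python
import numpy as np

# Same data as the hand computation above.
Y  = np.array([0, 2, 1, 3, 1, 3, 4, 2, 1, 2], dtype=float)
X1 = np.array([0, 1, 2, 1, 0, 3, 4, 2, 2, 1], dtype=float)
X2 = np.array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0], dtype=float)

A = np.column_stack([np.ones_like(Y), X1, X2])  # design matrix [1, X1, X2]
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(b)  # approximately [ 1.4535  0.6279 -1.3953], matching the hand work
```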
2. Interpretation
Each coefficient in $\hat{Y} = 1.4536 + 0.6279X_1 - 1.3956X_2$ is the effect of its own variable with the other variable held constant. Here $b_1 = 0.6279$ says that, for a given level of the wife's education, each additional child wanted is associated with about 0.63 more children born; $b_2 = -1.3956$ says that, for a given number of children wanted, wives who finished college average about 1.4 fewer children born; and $b_0 = 1.4536$ is the predicted number of children born when $X_1 = 0$ and $X_2 = 0$.
3. Standard errors
Recall that in the example in J1, $SST = \sum Y^2 - n\bar{Y}^2 = SS_Y = 12.90$ and that we had computed the Spare Parts $S_{X_1Y} = 9.60$, $S_{X_2Y} = -3.60$, $SS_{X_1} = 14.40$, $SS_{X_2} = 2.40$ and $S_{X_1X_2} = -0.40$.

The explained or regression sum of squares is $SSR = b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right) = b_1S_{X_1Y} + b_2S_{X_2Y}$. The error or residual sum of squares is $SSE = SST - SSR$. $k = 2$ is the number of independent variables.
The coefficient of determination is

$R^2 = \frac{SSR}{SST} = \frac{b_1S_{X_1Y} + b_2S_{X_2Y}}{SS_Y} = \frac{b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right)}{\sum Y^2 - n\bar{Y}^2}$.

An alternate formula, if spare parts are not available, is

$R^2 = \frac{b_0\sum Y + b_1\sum X_1Y + b_2\sum X_2Y - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$.

The standard error is

$s_e^2 = \frac{SSE}{n-k-1} = \frac{SST - SSR}{n-3} = \frac{SS_Y\left(1 - R^2\right)}{n-3}$

Or $s_e^2 = \frac{\sum Y^2 - n\bar{Y}^2 - b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) - b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right)}{n-3} = \frac{SS_Y - b_1S_{X_1Y} - b_2S_{X_2Y}}{n-3}$.

An alternate formula, if spare parts are not available, is

$s_e^2 = \frac{\sum Y^2 - b_0\sum Y - b_1\sum X_1Y - b_2\sum X_2Y}{n-3}$.
If we wish the coefficient of determination in the example in J1, recall that

$SSR = b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right) = b_1S_{X_1Y} + b_2S_{X_2Y} = 0.6279(9.60) + (-1.3956)(-3.60) = 6.02784 + 5.02416 = 11.0520$

so that

$R^2 = \frac{b_1S_{X_1Y} + b_2S_{X_2Y}}{SS_Y} = \frac{SSR}{SST} = \frac{11.0520}{12.90} = .8567$.

This represents a considerable improvement over the simple regression.
$SSE = SST - SSR = SS_Y - b_1S_{X_1Y} - b_2S_{X_2Y} = 12.90 - 11.0520 = 1.848$

Alternate computations for $s_e^2$ are

$s_e^2 = \frac{SSE}{n-k-1} = \frac{1.848}{10-3} = \frac{1.848}{7} = 0.2640$

Or $s_e^2 = \frac{\left(\sum Y^2 - n\bar{Y}^2\right)\left(1 - R^2\right)}{n-k-1} = \frac{SS_Y\left(1 - R^2\right)}{n-3} = \frac{12.90\left(1 - .8567\right)}{7} = .2640$

Or, if no spare parts are available,

$s_e^2 = \frac{\sum Y^2 - b_0\sum Y - b_1\sum X_1Y - b_2\sum X_2Y}{n-3} = \frac{49 - 1.453(19) - 0.6279(40) + 1.3956(4)}{10-3} = \frac{1.8594}{7} = 0.2655$.

(The small differences among these values are due to rounding of the coefficients.)
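A short Python sketch (again an added check, reusing the spare parts above) collects these computations:

```python
# Sums of squares and the standard error from the spare parts (J1 notation).
b1, b2 = 0.62791, -1.39535
SSY, S_X1Y, S_X2Y = 12.90, 9.60, -3.60
n, k = 10, 2

SSR = b1 * S_X1Y + b2 * S_X2Y    # about 11.051 (explained sum of squares)
SSE = SSY - SSR                  # about  1.849 (residual sum of squares)
R2  = SSR / SSY                  # about  0.857
se2 = SSE / (n - k - 1)          # about  0.264
se  = se2 ** 0.5                 # the standard error itself
```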
We can use $s_e$ to find an approximate confidence interval for the mean, $\mu_{Y_0} = \hat{Y}_0 \pm t\,\frac{s_e}{\sqrt{n}}$, and an approximate prediction interval for a single observation, $Y_0 = \hat{Y}_0 \pm t\,s_e$. Remember $df = n - k - 1 = 10 - 2 - 1 = 7$.
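For illustration, assuming scipy is available for the t table, the two intervals at a made-up point $X_1 = 2$, $X_2 = 1$ would be computed as follows (a sketch, not part of the original notes):

```python
from scipy import stats

n, k = 10, 2
se = 0.2641 ** 0.5                   # standard error from above
t = stats.t.ppf(0.975, n - k - 1)    # two-sided 5% level, df = 7, about 2.365

y_hat0 = 1.4536 + 0.6279 * 2 - 1.3956 * 1    # prediction at X1 = 2, X2 = 1
ci = (y_hat0 - t * se / n ** 0.5,
      y_hat0 + t * se / n ** 0.5)            # approximate CI for the mean
pi = (y_hat0 - t * se, y_hat0 + t * se)      # approximate PI for a single Y0
```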
To do significance tests or confidence intervals for the coefficients of the regression equation, we must either invert a matrix or use a computer output. To test $H_0\!: \beta_i = \beta_{i0}$ against $H_1\!: \beta_i \neq \beta_{i0}$, use $t_i = \frac{b_i - \beta_{i0}}{s_{b_i}}$. For example, part of the computer output for the current example reads:
Predictor      Coef    Stdev  t-ratio      p
Constant     1.4535   0.3086     4.71  0.000
X            0.6279   0.1357     4.63  0.000
W           -1.3953   0.3325    -4.20  0.004
To test the coefficient of W for significance, use $t_2 = \frac{b_2 - 0}{s_{b_2}} = \frac{-1.3953}{0.3325} = -4.196$. To do a test at the 5% significance level, compare this with $t_{.025}^{n-k-1}$. If the t-ratio lies between $\pm t_{.025}^{n-k-1}$, it is not significant.
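The same test in a Python sketch (values taken from the printout above; scipy is assumed for the table value):

```python
from scipy import stats

b2, s_b2 = -1.3953, 0.3325
n, k = 10, 2
t2 = (b2 - 0) / s_b2                    # about -4.196, as computed above
t_crit = stats.t.ppf(0.975, n - k - 1)  # t.025 with 7 df, about 2.365
print(abs(t2) > t_crit)                 # True: the coefficient is significant
```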
Toward the end of the computer output we find:

Analysis of Variance
SOURCE        DF        SS        MS       F      p
Regression     2   11.0512    5.5256   20.92  0.001
Error          7    1.8488    0.2641
Total          9   12.9000
The first SS is the regression Sum of Squares $SSR = b_1\left(\sum X_1Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2Y - n\bar{X}_2\bar{Y}\right) = b_1S_{X_1Y} + b_2S_{X_2Y}$ and the third SS is the Total Sum of Squares $SST = \sum Y^2 - n\bar{Y}^2 = SS_Y$, so that $R^2 = \frac{SSR}{SST}$. Of course, the second sum of squares is the Error Sum of Squares $SSE = SST - SSR$. This means that

$F^{(k,\,n-k-1)} = \frac{\left(\frac{SSR}{k}\right)}{\left(\frac{SSE}{n-k-1}\right)} = \frac{\left(\frac{R^2}{k}\right)}{\left(\frac{1-R^2}{n-k-1}\right)}$.

In other words, we can substitute $R^2$ and $1 - R^2$ for SSR and SSE in the F-test and get the same results. (In this case $F = 20.92$ is larger than $F_{.05}^{(2,7)}$, so we reject the null hypothesis of no relationship between the dependent variable and the independent variables.) For a Minitab printout example of the multiple regression problem used here and in subsequent sections, see 252regrex2.
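A sketch of the substitution just described (scipy is assumed for the table value of F):

```python
from scipy import stats

n, k, R2 = 10, 2, 0.8567
F = (R2 / k) / ((1 - R2) / (n - k - 1))   # about 20.9, matching the ANOVA F
F_crit = stats.f.ppf(0.95, k, n - k - 1)  # F.05 with (2, 7) df, about 4.74
print(F > F_crit)                         # True: reject the null hypothesis
```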
4. Stepwise regression
Note: In the material that follows we compare the simple regression and the regression with two independent variables with a third version of the regression that has three independent variables. The third independent variable is $X_3 = H$, a dummy variable that represents the husband's education, again a 1 if the husband completed college, a zero if he did not. The ten observations now read:

Obs   Y   X   W   H
  1   0   0   1   1
  2   2   1   0   0
  3   1   2   1   0
  4   3   1   0   0
  5   1   0   0   0
  6   3   3   0   0
  7   4   4   0   0
  8   2   2   1   0
  9   1   2   1   1
 10   2   1   0   0
$R^2$ adjusted for degrees of freedom is $R_k^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where $k$ is the number of independent variables. From this we can get an F statistic to check the value of adding $r$ independent variables in addition to the $k$ already in use:

$F^{(r,\,n-k-r-1)} = \frac{n-k-1}{r}\left[\frac{1-R_k^2}{1-R_{k+r}^2}\right] - \frac{n-k-r-1}{r} = \frac{n-k-r-1}{r}\left[\frac{R_{k+r}^2 - R_k^2}{1-R_{k+r}^2}\right]$

(In the first form the $R^2$ values are the adjusted ones, $R_k^2$ and $R_{k+r}^2$; in the second form the unadjusted $R^2$ values for $k$ and $k+r$ variables give the same result.) Though $R^2$ adjusted for degrees of freedom is useful as a preliminary way to see if adding new independent variables is worthwhile, a trick using $R^2$ and a faked or real ANOVA table may be easier to follow. In the example we have been using, three ANOVA tables are generated, the first for one independent variable, the second for two independent variables and the third for three independent variables. Beneath the second two ANOVAs the effect of adding the independent variables on SSR is given in sequence; see the tables after the sketch below.
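Before looking at those tables, here is a quick Python check (an added sketch) of the adjusted $R^2$ values that appear beside them:

```python
def r2_adjusted(r2, n, k):
    # R-squared adjusted for degrees of freedom, as defined above.
    return ((n - 1) * r2 - k) / (n - k - 1)

print(r2_adjusted(.496, 10, 1))   # about .433 (one independent variable)
print(r2_adjusted(.857, 10, 2))   # about .816 (two independent variables)
print(r2_adjusted(.906, 10, 3))   # about .859 (three independent variables)
```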
Analysis of Variance (one independent variable)
SOURCE        DF        SS        MS      F      p
Regression     1    6.4000    6.4000   7.88  0.023
Error          8    6.5000    0.8125
Total          9   12.9000

$k = 1$, $R^2 = .496$, $R_k^2 = \frac{9(.496) - 1}{8} = .433$

Analysis of Variance (two independent variables)
SOURCE        DF        SS        MS       F      p
Regression     2   11.0512    5.5256   20.92  0.001
Error          7    1.8488    0.2641
Total          9   12.9000

SOURCE   DF   SEQ SS
X         1   6.4000
W         1   4.6512

$k = 2$, $R^2 = .857$, $R_k^2 = \frac{9(.857) - 2}{7} = .816$

Analysis of Variance (three independent variables)
SOURCE        DF        SS        MS       F      p
Regression     3   11.6937    3.8979   19.39  0.002
Error          6    1.2063    0.2011
Total          9   12.9000

SOURCE   DF   SEQ SS
X         1   6.4000
W         1   4.6512
H         1   0.6425

$k = 3$, $R^2 = .906$, $R_k^2 = \frac{9(.906) - 3}{6} = .859$
If we wish to tell whether the addition of the second and third independent variables is an improvement over the regression with one independent variable, $r = 2$ and $k = 1$. Compute

$F^{(r,\,n-k-r-1)} = \frac{n-k-1}{r}\left[\frac{1-R_k^2}{1-R_{k+r}^2}\right] - \frac{n-k-r-1}{r} = \frac{10-1-1}{2}\left[\frac{1-.433}{1-.859}\right] - \frac{6}{2} = 4\left[\frac{.567}{.141}\right] - 3 = 13.09$ or

$F^{(r,\,n-k-r-1)} = \frac{n-k-r-1}{r}\left[\frac{R_{k+r}^2 - R_k^2}{1-R_{k+r}^2}\right] = \frac{10-1-2-1}{2}\left[\frac{.906-.496}{1-.906}\right] = 3\left[\frac{.410}{.094}\right] = 13.09$
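Both versions in a Python sketch (an added check; recall that the first form uses the adjusted values and the second the unadjusted ones):

```python
n, k, r = 10, 1, 2
R2adj_k, R2adj_kr = .433, .859    # adjusted R-squared, k = 1 and k + r = 3
R2_k, R2_kr = .496, .906          # unadjusted R-squared

F1 = (n - k - 1) / r * (1 - R2adj_k) / (1 - R2adj_kr) - (n - k - r - 1) / r
F2 = (n - k - r - 1) / r * (R2_kr - R2_k) / (1 - R2_kr)
print(F1, F2)                     # both about 13.09
```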
It may be easier to see what is happening if we copy the third ANOVA, but split the regression sum of squares between the first independent variable and the last two, using the regression sum of squares in the first regression or the itemization in the third.
SOURCE        DF        SS        MS        F
Regression     3   11.6937
  X            1    6.4000
  W,H          2    5.2937   2.64685  13.1619
Error          6    1.2063    0.2011
Total          9   12.9000
The F-test here would be done with 2 and 6 degrees of freedom, and could as easily be done with .410 in place of 5.2937 and .094 in place of 1.2063, as was done in the equation immediately above this ANOVA. The difference in the value of F is due to rounding. In any case the table value $F^{(r,\,n-k-r-1)} = F_{.05}^{(2,6)} = 5.14$, so we reject the null hypothesis that W and H do not add to the explanatory power of the independent variables.
Appendix to J1 – Derivation of the regression equations.
Ignore this appendix unless you have had calculus.
As we did with simple regression, we want to pick values of $b_0$, $b_1$ and $b_2$ that minimize the sum of squares, $SS = \sum e^2$. To do so we use a technique involving partial derivatives. A partial derivative is simply a derivative of a function of several variables with respect to one variable, taken while presuming that the other variables are constants. For example, remember that if $a$ is a constant and $x$ is a variable, $\frac{d}{dx}(ax) = a$, $\frac{d}{dx}(ax^2) = 2ax$ and $\frac{d}{dx}(ax^3) = 3ax^2$. (Also recall that if $v$ is a function of $x$ and $f(v)$ is a function of $v$, then $\frac{d}{dx}f(v) = \frac{df}{dv}\,\frac{dv}{dx}$.)

Remembering this, and using the symbol $\partial$ for a partial derivative, we can find the partial derivatives of $f(x,y,z) = xy^2z^3$. $\frac{\partial}{\partial x}\left(xy^2z^3\right) = y^2z^3$ (remember, $x$ is a variable, but $y^2z^3$ is treated as a constant). Similarly, $\frac{\partial}{\partial y}\left(xy^2z^3\right) = 2xyz^3$ (this time $y$ is the variable and $xz^3$ is treated as a constant) and $\frac{\partial}{\partial z}\left(xy^2z^3\right) = 3xy^2z^2$. Just as with regular derivatives, to have a minimum, partial derivatives must equal zero.
If we take partial derivatives of the sum of squares with respect to our unknowns, $b_0$, $b_1$ and $b_2$, and set them equal to zero, we find for $SS = \sum e_j^2 = \sum\left(Y_j - b_0 - b_1X_{1j} - b_2X_{2j}\right)^2$:

$\frac{\partial SS}{\partial b_0} = \sum 2\left(Y_j - b_0 - b_1X_{1j} - b_2X_{2j}\right)\left(-1\right) = 0$

$\frac{\partial SS}{\partial b_1} = \sum 2\left(Y_j - b_0 - b_1X_{1j} - b_2X_{2j}\right)\left(-X_{1j}\right) = 0$

$\frac{\partial SS}{\partial b_2} = \sum 2\left(Y_j - b_0 - b_1X_{1j} - b_2X_{2j}\right)\left(-X_{2j}\right) = 0$

If we carry out the multiplication by the term in parentheses on the right ($-1$, $-X_{1j}$, or $-X_{2j}$) and divide through by $-2$, these equations, called the Normal Equations, become:

$\sum Y = nb_0 + b_1\sum X_1 + b_2\sum X_2$

$\sum X_1Y = b_0\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$

$\sum X_2Y = b_0\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$

Sums such as $\sum Y$, $\sum X_1$, $\sum X_2$, $\sum X_1Y$, $\sum X_2Y$, $\sum X_1X_2$, $\sum X_1^2$ and $\sum X_2^2$ are easy to compute, but we are still left with three simultaneous equations in three unknowns, $b_0$, $b_1$ and $b_2$.
Fortunately, we can use the first of the three Normal Equations to reduce this system to two equations in two unknowns. If we divide this first equation, $\sum Y = nb_0 + b_1\sum X_1 + b_2\sum X_2$, by $n$, we get $\bar{Y} = b_0 + b_1\bar{X}_1 + b_2\bar{X}_2$.
If we multiply this equation by $n\bar{X}_1$ and subtract it from the second Normal Equation, $\sum X_1Y = b_0\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$, the second Normal Equation becomes:

$\sum X_1Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)$.
Similarly, if we multiply the equation for $\bar{Y}$ by $n\bar{X}_2$ and subtract it from the third Normal Equation, $\sum X_2Y = b_0\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$, the third Normal Equation becomes:

$\sum X_2Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$.
X Y , X Y ,
, X
and  X X ; third, to compute our spare parts,  X Y  nX Y ,  X Y  nX Y ,
 nX ,  X  nX and  X X  nX X :fourth, to substitute these numbers into the
Thus our procedure is : first, to compute Y , X 1 and X 2 ; second, to compute

X
X 12
2
1
2
2
2
1
1
2
2
1
2
2
2
1
Simplified Normal Equations:
X 1Y  nX 1Y  b1
2
1
1
1
2
2
2
2
 X  nX   b  X X  nX X 

 X Y  nX Y  b  X X  nX X   b  X  nX 
2
2
1
2
1
1
2
1
2
2
1
2
1
2
2
1
2
2
2
2
2
and solve them as two equations in two unknowns for b1 and b2 ; and, finally to get b0 by solving:
b0  Y  b1 X 1  b2 X 2 .
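For those with the calculus background, the derivation can be checked symbolically. A sketch assuming sympy is available; each derivative is shown for a single term of the sum, since differentiation passes through the summation:

```python
import sympy as sp

b0, b1, b2, Y, X1, X2 = sp.symbols('b0 b1 b2 Y X1 X2')
e = Y - b0 - b1 * X1 - b2 * X2    # one residual
SS_term = e ** 2                  # one term of the sum of squares

for b in (b0, b1, b2):
    # Dividing by -2 gives the corresponding term of each Normal Equation;
    # summing each printed expression over j and setting it to zero
    # reproduces the three Normal Equations above.
    print(sp.expand(sp.diff(SS_term, b) / -2))
```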