KNNL_Ch06

advertisement
Multiple Regression I
KNNL – Chapter 6
Models with Multiple Predictors
• Most Practical Problems have more than one
potential predictor variable
• Goal is to determine effects (if any) of each
predictor, controlling for others
• Can include polynomial terms to allow for
nonlinear relations
• Can include product terms to allow for
interactions when effect of one variable
depends on level of another variable
• Can include “dummy” variables for categorical
predictors
First-Order Model with 2 Numeric Predictors
Yi  0  1 X i1  2 X i 2   i
E  i   0  E Y   0  1 X1  2 X 2 Plane in 3-dimensions
E(Y)=100-5X1+10X2
200
E(Y)
150
100
150-200
50
100-150
01
4
7
10
X1
13
16
19
50-100
0-50
Interpretation of Regression Coefficients
• Additive: E{Y} = 0  1X1 + 2X2 ≡ Mean of Y @ X1, X2
• 0 ≡ Intercept, Mean of Y when X1=X2=0
• 1 ≡ Slope with Respect to X1 (effect of increasing X1 by 1
unit, while holding X2 constant)
• 2 ≡ Slope with Respect to X2 (effect of increasing X2 by 1
unit, while holding X1 constant)
• These can also be obtained by taking the partial
derivatives of E{Y} with respect to X1 and X2, respectively
• Interaction Model: E{Y} = 0  1X1 + 2X2 + 3X1X2
• When X2 = 0: Effect of increasing X1 by 1: 1(1)+3(1)(0)= 1
• When X2 = 1: Effect of increasing X1 by 1: 1(1)+3(1)(1)= 1+3
• The effect of increasing X1 depends on level of X2, and vice versa
General Linear Regression Model
Yi   0  1 X i1   2 X i 2  ...   p 1 X i , p 1   i
p 1
 Yi   0    k X ik   i
k 1
p 1
 Yi    k X ik   i
k 0
where: X i 0  1
E  i   0  E Y    0  1 X 1   2 X 2  ...   p 1 X p 1 (Hyperplane in p -dimensions)
p  1  1  Simple linear regression
Normality, independence, and constant variance for errors:
 i ~ NID  0,  2   Yi ~ N   0  1 X i1   2 X i 2  ...   p 1 X i , p 1 ,  2   Yi , Y j   0  i  j
Special Types of Variables/Models - I
• p-1 distinct numeric predictors (attributes)
 Y = Sales, X1=Advertising, X2=Price
• Categorical Predictors – Indicator (Dummy) variables,
representing m-1 levels of a m level categorical variable
 Y = Salary, X1=Experience, X2=1 if College Grad, 0 if Not
• Polynomial Terms – Allow for bends in the Regression
 Y=MPG, X1=Speed, X2=Speed2
• Transformed Variables – Transformed Y variable to
achieve linearity Y’=ln(Y) Y’=1/Y
Special Types of Variables/Models - II
• Interaction Effects – Effect of one predictor depends on
levels of other predictors






Y = Salary, X1=Experience, X2=1 if Coll Grad, 0 if Not, X3=X1X2
E(Y) = 0  1X1  2X2  3X1X2
Non-College Grads (X2 =0):
E(Y) = 0  1X1  2(0)  3X1(0)= 0  1X1
College Grads (X2 =1):
E(Y) = 0  1X1  2(1)  3X1(1)= 0  21 3 X1
• Response Surface Models
 E(Y) = 0  1X1  2X12  3X2  4X22 5X1X2
• Note: Although the Response Surface Model has
polynomial terms, it is linear wrt Regression parameters
Matrix Form of Regression Model
Yi   0  1 X i1   2 X i 2  ...   p 1 X i , p 1   i
i  1,..., n
Matrix Form:
 Y1 
Y 
Y   2
n1
 
 
Yn 
1 X 11
1 X
21
X 
n p


1 X n1
 0 
  
1 
β 


p1



 p 1 
 1 
 
ε   2
n1
 
 
 n 
Y X β ε
n1
n p p1
n1
 
X 12
X 22
X n2
X 1, p 1 
X 2, p 1 


X n , p 1 
0 
0 
E ε  
n×1
 
 
0 
 


 E Y = EX β + ε   X β
n×1
n×p p×1 n×1  n p p1
 2 0

2
0

2
σ ε 
n1


0
 0


σ 2 Y   2I
n1
0

0
  2I

2
 
Least Squares Estimation of Regression Coefficients
Goal: Minimize: Q      Yi   0  1 X i1  ...   p 1 X i , p 1 
n
i 1
n
2
i
2
i 1
 Obtain Estimates of  0 , 1 ,...,  p 1 that minimize Q  b0 , b1 ,..., bp 1
 b0 
b 
-1
1 

Normal Equations: X ' X b  X ' Y  b 
  X'X  X'Y
p p p1
p1
p1




b
 p 1 
Maximum Likelihood also leads to the same estimator b :

L β, 
2
 
 2
2

 n /2
 1
exp   2
 2

Y




X

...


X
 i 0 1 i1

p 1 i , p 1  
i 1

n
2
 Y  
n
since maximizing L involves minimizing
i 1
i
0
 1 X i1  ...   p 1 X i , p 1 
2
Fitted Values and Residuals
^ 
Y 1 
^ 
^
Fitted Values: Y  Y 2 
n1
 
 
Y^ 
 n
^
 e1 
e 
Residuals: e   2 
n1
 
 
 en 
Y  X b  X  X'X  X'Y = HY
H = X  X'X  X' H = H' = HH
-1
n1
-1
n p p1
^
e = Y  Y  Y  X b  Y  X  X'X  X'Y =  I  H  Y
n1
-1
n1
n1

2
^
n1
n p p1
σ Y = σ HY = Hσ Y H' = σ H
2
2
2
I  H   I  H  '  I  H  I  H 

^
s Y  MSE  H 
2
σ 2 e = σ 2  I  H  Y =  I  H  σ 2 Y I  H  ' = σ 2  I  H 
s 2 e  MSE  I  H 
Analysis of Variance – Sums of Squares
^ 
Y 1 
^ 
^
Y  Y 2   Xb  HY
n1
 
 
Y^ n 
 
 Y1 
Y 
Y   2
n1
 
 
Yn 
n

SSTO   Yi  Y
i 1
Y 
 
1
Y
1
 
Y     
n1
n
 
1
Y 
 Y1 
1  
 Y2    1  JY
    n 
1  
Yn 
   Y  Y  '  Y  Y   Y'  I   1n  J  Y
2
2
^
^
^

 
 

SSE    Yi  Y i    Y  Y  '  Y  Y   Y'  I  H  Y  Y'Y - b'X'Y
 
 

i 1 
n
MSE 

1 
1
^
 ^
 ^

SSR    Y i  Y    Y  Y  '  Y  Y   Y'  H    J  Y  b'X'Y  Y'   JY
 
 

n 
n
i 1 

SSE
n p
2
n
MSR 
E MSE   2
p 1
p 1
E MSR      SS kk     k  k ' SS kk '
2
k 1
E MSR  E MSE
2
k
k 1 k ' k
n

SS kk '   X ik  X k
i 1
 X
E MSR  E MSE  1  ...   p 1  0
ik '
 X k'

SSR
p 1
ANOVA Table, F-test, and R2
Analysis of Variance (ANOVA) Table
Source
df
Sum of Squares
Mean Square
------------------------------------------------------------------------------------------------------------------------------------Regression
p 1
1
SSR  b'X'Y - Y'   JY
n
Error
n p
SSE  Y'Y - b'X'Y
Total
n 1
1
SSTO  Y'Y - Y'   JY
n
Test of H 0 : 1  ...   p 1  0
Test Statistic: F * 
MSR
MSE
 E Y    
MSR 
MSE 
SSR
p 1
SSE
n p
H A : Not all  k  0
0
Rejection Region: F *  F 1   ; p  1, n  p  P-value= Pr F  p  1, n  p   F *
Coefficient of Multiple Determination: R 2 
SSR
SSE
1
SSTO
SSTO
 SSE 
n  p
  1   n  1  SSE
2
Adjusted-R  1  


 SSTO 
 n  p  SSTO
 n  1 
Correlation: R  R 2
places a "penalty" on models with extra predictors
Inferences Regarding Regression Parameters
E ε = 0
Y = Xβ + ε

σ 2 ε =  2I  E Y = Xβ

σ 2 Y =  2I
E b = E  X'X  X'Y =  X'X  X'E Y =  X'X  X'Xβ = β
-1
-1
-1
 b0 , bp 1
  2 b0 
 b0 , b1

  b1 , b0 
 2 b1
2
σ b  


 bp 1 , b0   bp 1 , b1
σ 2 b = σ 2
 X'X 
-1
s 2 b = MSE  X'X 
bk   k
~ tn  p
s bk 


 b1 , bp 1 



 2 bp 1 


 s 2 b0 
s b0 , b1 

 s b1 , b0 
s 2 b1
2
s b  


 s bp 1 , b0  s bp 1 , b1
X'Y   X'X  X'σ 2 Y  X'X  X' '   2  X'X  X'X  X'X    2  X'X 
-1
-1
-1
-1
1   100% CI for  k  bk  t 1 
Test of H 0 :  k  0 H A :  k  0

Test Statistic: t* 
 

Rejection Region: t *  t 1  ; n  p 
 2



; n  p  s bk 
2

bk
s bk 
P-value=2Pr  t  n  p   t * 



Simultaneous 1   100% CI s for g  p  s : bk  t 1 
; n  p  s bk 
 2g

-1
s b0 , b p 1

s b1 , b p 1 



s 2 bp 1 
-1
Estimating Mean Response at Specific X-levels
Given set of levels of X 1 ,..., X p 1 : X h1 ,..., X h , p 1
 1 
 X 
h1 
Xh  


p1


X
 h , p 1 

^
E Yh  X β
'
h
^
E Yh   X β
Y h  X'h b
'
h

^
 Y h  X σ b X h   X  X'X  X h
2
1   100% CI for E

^
Yh
'
h
2
2
-1
'
h

^

s Y h  MSE X 'h  X'X  X h
2
-1


 
 ^
: Y h  t 1  ; n  p  s Y h
 2

^
1   100% Confidence Region for Regression Surface:

^
^

^
1   100% CI for several (g ) E Y h : Y h  Bs Y h
^

^
Y h  Ws Y h
W 2  pF 1   ; p, n  p 



B  t 1 
;n  p
 2g

Predicting New Response(s) at Specific X-levels
Given set of levels of X 1 ,..., X p 1 : X h1 ,..., X h , p 1
 1 
 X 
h1 
Xh  


p1


 X h , p 1 
E Yh   X β
'
h

s 2 pred  MSE 1  X'h  X'X  X h
-1
^
Y h  X'h b

-1
1

mean of m observations (at same X-levels): s 2 predmean  MSE   X'h  X'X  X h 
m

1   100% CI for Yh (new) :
 

Y h  t 1  ; n  p  s pred
 2

^
^
Scheffe: 1   100% CI for several (g ) Yh (new) : Y h  Ss pred
^
Bonferroni: 1   100% CI for several (g ) Yh (new) : Y h  Bs pred
S 2  gF 1   ; g , n  p 



B  t 1 
;n  p 
 2g

Download