Multiple Regression

advertisement
Objectives of Multiple Regression
• Establish the linear equation that best predicts values
of a dependent variable Y using more than one
explanatory variable from a large set of potential
predictors {x1, x2, ... xk}.
• Find that subset of all possible predictor variables that
explains a significant and appreciable proportion of the
variance of Y, trading off adequacy of prediction
against the cost of measuring more predictor
variables.
15-1
Expanding Simple Linear Regression
• Quadratic model.
Y = b0 + b1x1 + b2x1 + e
2
• General polynomial model.
Y = b0 + b1 x 1 + b2 x 1 2 +
b3x13 + ... + bkx1k + e
Adding one
or more
polynomial
terms to the
model.
Any independent variable, xi, which appears in
the polynomial regression model as xik is
called a kth-degree term.
15-2
Polynomial model shapes.
10
20
30
x
Adding one more
terms to the model
significantly improves
the model fit.
40
1.5 2.0 2.5yn 3.0 3.5
1.5 2.0 2.5y 3.0 3.5
Linear
10
Quadratic
20
30
40
xn
15-3
Incorporating Additional Predictors
Simple additive multiple regression model
y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + e
Additive (Effect) Assumption - The expected
change in y per unit increment in xj is constant
and does not depend on the value of any other
predictor. This change in y is equal to bj.
15-4
Additive regression models:
For two independent variables, the response is modeled as a surface.
15-5
Interpreting Parameter Values
(Model Coefficients)
• “Intercept” - value of y when all predictors are 0.
b0
• “Partial slopes”
b1, b2, b3, ... bk
bj - describes the expected change in y per unit
increment in xj when all other predictors in
the model are held at a constant value.
15-6
Graphical depiction of bj.
b1 - slope in direction of x1.
b2 - slope in direction of x2.
15-7
Multiple Regression with Interaction Terms
Y = b0 + b1x1 + b2x2 +
b3x3 + ... + bkxk +
b12x1x2 + b13x1x3 +
cross-product
terms quantify
the interaction
among
predictors.
... + b1kx1xk + ...
+ bk-1,kxk-1xk + e
Interactive (Effect) Assumption: The effect of one
predictor, xi, on the response, y, will depend on the value
of one or more of the other predictors.
15-8
Interpreting Interaction
Interaction Model
y i  b 0  b 1 x i1  b 2 x i 2  b 12 x i1 x i 2  e i
or Define:
x i 3  x i1 x i 2
No
difference
y i  b 0  b 1 x i1  b 2 x i 2  b 12 x i 3  e i
b1 – No longer the expected change in Y per unit increment in X1!
b12 – No easy interpretation! The effect on y of a unit increment in X1,
now depends on X2.
15-9
x2=2
no-interaction
y
}b
}
x2=1
x2=0
2
b2
b1
b0
x1
y
b1
b 0+ 2 b 2
b0+b2
b0
x2=0
interaction
x2=1
b1+2b12
x2=2
x1
15-10
Multiple Regression models with interaction:
Lines move apart
Lines come together
15-11
Effect of the Interaction Term in Multiple Regression
Surface is twisted.
15-12
A Protocol for Multiple Regression
Identify all possible predictors.
Establish a method for estimating model
parameters and their standard errors.
Develop tests to determine if a parameter is
equal to zero (i.e. no evidence of association).
Reduce number of predictors appropriately.
Develop predictions and associated standard error.
15-13
Estimating Model Parameters
Least Squares Estimation
Assuming a random sample of n observations
(yi, xi1,xi2,...,xik), i=1,2,...,n. The estimates of
the parameters for the best predicting
equation:
yˆ i  bˆ 0  bˆ 1 x i1  bˆ 2 x i 2    bˆ k x ik
Is found by choosing the values:
bˆ 0 , bˆ 1 ,  , bˆ k
which minimize the expression:
n
SSE 
 ( y i  yˆ i ) 
2
i1
n
 ( y i  b 0  b 1x i1  b 2 x i 2    b k x ik )
2
i 1
15-14
Normal Equations
Take the partial derivatives of the SSE function with respect to b0,
b1,…, bk, and equate each equation to 0. Solve this system of k+1
equations in k+1 unknowns to obtain the equations for the
parameter estimates.
n
1
y
i
 n bˆ 0 
i1
n
x i1

i1


x i1bˆ 0 
i1
bˆ 1
 
i1
n

x i1bˆ 1
2
x
ik
bˆ k
x
i1
x ik bˆ k
i1
n
 
i1
i1


n
x ik
x
n
i1
n
x i1 y i 
n

i1
n
x ik y i 

i1
x ik bˆ 0 
n

i1
x ik x i1bˆ 1
n
 

x ik bˆ k
2
i1
15-15
An Overall Measure of How Well the Full
Model Performs
Coefficient of Multiple Determination
• Denoted as R2.
• Defined as the proportion of the variability in the
dependent variable y that is accounted for by the
independent variables, x1, x2, ..., xk, through the
regression model.
• With only one independent variable (k=1), R2 = r2, the
square of the simple correlation coefficient.
15-16
Computing the Coefficient of Determination
R
2
y  x1 x 2  x k

SSR

S yy  SSE
S yy
,
0  R 1
2
S yy
n
S yy 

( y i  y )  TSS
2
i 1
n
SSE


2
( y i  yˆ i )

y  bˆ 0  y i  bˆ1  x1 i y i    bˆ k  x ik y i
i 1
n

i 1
n
n
n
i 1
i 1
i 1
2
i
15-17
Multicollinearity
A further assumption in multiple regression (absent in SLR), is that the
predictors (x1, x2, ... xk) are statistically uncorrelated. That is, the
predictors do not co-vary. When the predictors are significantly
correlated (correlation greater than about 0.6) then the multiple
regression model is said to suffer from problems of multicollinearity.
r=0
r = 0.8
r = 0.6
5
5
6
4
3
x2
4
x2
x2
3
1
2
1
0
2
1
1
2
3
x
1
4
5
1
0
1
2
3
x
1
4
5
6
0
2
4
6
x
1
15-18
Effect of Multicollinearity on the Fitted Surface
Extreme collinearity
y
x
x
x
xx
x
x
2
x
x
x1
15-19
Multicollinearity leads to
•
Numerical instability in the estimates of the regression parameters – wild
fluctuations in these estimates if a few observations are added or removed.
•
No longer have simple interpretations for the regression coefficients in the
additive model.
Ways to detect multicollinearity
•
Scatterplots of the predictor variables.
•
Correlation matrix for the predictor variables – the higher these correlations
the worse the problem.
•
Variance Inflation Factors (VIFs) reported by software packages. Values
larger than 10 usually signal a substantial amount of collinearity.
What can be done about multicollinearity
•
Regression estimates are still OK, but the resulting confidence/prediction
intervals are very wide.
•
Choose explanatory variables wisely! (E.g. consider omitting one of two
highly correlated variables.)
•
More advanced solutions: principal components analysis; ridge regression.
15-20
Testing in Multiple Regression
• Testing individual parameters in the model.
• Computing predicted values and associated standard
errors.
Y  b 0  b1 X 1    b k X
k
e,
e ~ N (0, 
2
)
Overall AOV F-test
H0: None of the explanatory variables is a significant predictor of Y
F 
Reject if:
SSR / k
SSE /( n  k  1)

MSR
MSE
F  F k , n  k 1 ,
15-21
Standard Error for Partial Slope Estimate
The estimated standard error for:
s bˆ  ˆ e
j
and
bˆ j
1
n  ( k  1)
where
S x j x j (1  R x j  x1 x 2  x j 1 x j 1  x k )
2
2
R x j  x1 x 2  x j 1 x j 1  x k
n
S xjxj 
 0
s bˆ 
j
ij
 xj)
2
i 1
If there is high dependency?
R x j  x 1 x 2  x j 1 x j 1  x k  1
2
2
x j  x1 x 2  x j 1 x j 1  x k
 (x
is the coefficient of determination for the
model with xj as the dependent variable and
all other x variables as predictors.
What happens if all the predictors are
truly independent of each other?
R
SSE
ˆ e 
ˆ e
s bˆ  
j
S xjxj
15-22
Confidence Interval
100(1-)% Confidence Interval for bˆ j
bˆ j  t n  ( k  1), s bˆ
2
j
Reflects the number of data points minus the
number of parameters that have to be estimated.
df for SSE
15-23
Testing whether a partial slope coefficient is equal to zero.
H0
bj 0
Rejection Region:
Alternatives:
Ha
bj 0
t  t n  ( k  1),
bj 0
t   t n  ( k  1),
bj  0
| t | t n  ( k  1),
Test Statistic:
t 
2
bˆ j
s bˆ
j
15-24
Predicting Y
• We use the least squares fitted value, yˆ , as our
predictor of a single value of y at a particular value of
the explanatory variables (x1, x2, ..., xk).
• The corresponding interval about the predicted value of
y is called a prediction interval.
• The least squares fitted value also provides the best
predictor of E(y), the mean value of y, at a particular
value of (x1, x2, ..., xk). The corresponding interval for
the mean prediction is called a confidence interval.
• Formulas for these intervals are much more
complicated than in the case of SLR; they cannot be
calculated by hand (see the book).
15-25
Minimum R2 for a “Significant” Regression
Since we have formulas for R2 and F, in terms of n, k, SSE and TSS,
we can relate these two quantities.
We can then ask the question: what is the min R2 which will ensure
the regression model will be declared significant, as measured by the
appropriate quantile from the F distribution?
The answer (below), shows that this depends on n, k, and SSE/TSS.
R
2
min

k
SSE
n  k  1 TSS
F k , n  k  1 ,
15-26
Minimum R2 for Simple Linear Regression (k=1)
15-27
Download