Chapter 14
Multiple Regression Models
© 2008 Brooks/Cole, a division of Thomson Learning, Inc.
Multiple Regression Models
A general additive multiple regression model, which relates a dependent variable y to k predictor variables x1, x2, …, xk, is given by the model equation

y = α + β1x1 + β2x2 + … + βkxk + e
Multiple Regression Models
The random deviation e is assumed to be normally distributed with mean value 0 and variance σ² for any particular values of x1, x2, …, xk.

This implies that for fixed x1, x2, …, xk values, y has a normal distribution with variance σ² and

(mean y value for fixed x1, x2, …, xk values) = α + β1x1 + β2x2 + … + βkxk
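As a concrete illustration of this generative model, here is a minimal Python sketch (not from the original slides; numpy-based, with made-up values for α, the βi's, and σ) that produces y values as the population regression function plus a normally distributed random deviation e:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (illustrative values only)
alpha = 2.0
betas = np.array([1.5, -0.8])        # beta_1, beta_2 for k = 2 predictors
sigma = 0.5                          # std. deviation of the random deviation e

n = 100
X = rng.uniform(0, 10, size=(n, 2))  # observed values of x1 and x2

# y = alpha + beta1*x1 + beta2*x2 + e, with e ~ N(0, sigma^2)
e = rng.normal(0.0, sigma, size=n)
y = alpha + X @ betas + e
```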
Multiple Regression Models
The βi's are called population regression coefficients; each βi can be interpreted as the true average change in y when the predictor xi increases by 1 unit and the values of all the other predictors remain fixed.

The deterministic portion

α + β1x1 + β2x2 + … + βkxk

is called the population regression function.
Polynomial Regression Models
The kth-degree polynomial regression model

y = α + β1x + β2x² + … + βkxᵏ + e

is a special case of the general multiple regression model with x1 = x, x2 = x², …, xk = xᵏ.

The population regression function (mean value of y for fixed values of the predictors) is

α + β1x + β2x² + … + βkxᵏ
Polynomial Regression Models
The most important special case other than simple linear regression (k = 1) is the quadratic regression model

y = α + β1x + β2x² + e

This model replaces the line y = α + βx with a parabolic curve of mean values α + β1x + β2x². If β2 > 0, the curve opens upward, whereas if β2 < 0, the curve opens downward.
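A quadratic model can be fit with any multiple regression routine by supplying x and x² as the two predictor columns. A minimal sketch along those lines, using simulated data and illustrative coefficients (β2 < 0, so the curve opens downward):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, size=50)

# Treat the quadratic model as multiple regression with x1 = x, x2 = x^2
D = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)  # estimates of alpha, beta1, beta2
print(coef)  # should be close to [1.0, 2.0, -0.5]
```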
Interaction
If the change in the mean y value
associated with a 1-unit increase in one
independent variable depends on the value
of a second independent variable, there is
interaction between these two variables.
When the variables are denoted by x1 and
x2, such interaction can be modeled by
including x1x2, the product of the variables
that interact, as a predictor variable.
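In terms of the design matrix, this just means appending a column holding the product x1x2. A sketch with simulated data (the coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(0, 5, size=60)
x2 = rng.uniform(0, 5, size=60)

# Simulated response in which the effect of x1 depends on x2 (interaction)
y = 1.0 + 0.5 * x1 + 1.2 * x2 + 0.8 * x1 * x2 + rng.normal(0, 0.4, size=60)

# Include the product x1*x2 as an additional predictor variable
D = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(coef)  # estimates of alpha, beta1, beta2, and the interaction coefficient
```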
Qualitative Predictor Variables
Up to now, we have only considered the inclusion of quantitative (numerical) predictor variables in a multiple regression model. Two other types are very common:

Dichotomous variable: one with just two possible categories, coded 0 and 1
Examples
• Gender {male, female}
• Marital status {married, not married}
Qualitative Predictor Variables
Ordinal variables: categorical variables that have a natural ordering

• Activity level {light, moderate, heavy}, coded respectively as 1, 2, and 3
• Education level {none, elementary, secondary, college, graduate}, coded respectively as 1, 2, 3, 4, 5 (or, for that matter, any 5 consecutive integers)
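One way such coded predictor columns might be constructed, sketched in Python with hypothetical labels:

```python
import numpy as np

gender = np.array(["male", "female", "female", "male"])
activity = np.array(["light", "heavy", "moderate", "light"])

# Dichotomous variable: 0/1 indicator coding (male = 0, female = 1)
gender_coded = (gender == "female").astype(int)

# Ordinal variable: integers that respect the natural ordering
activity_codes = {"light": 1, "moderate": 2, "heavy": 3}
activity_coded = np.array([activity_codes[a] for a in activity])

print(gender_coded)    # [0 1 1 0]
print(activity_coded)  # [1 3 2 1]
```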
Least Squares Estimates
According to the principle of least squares, the fit of a particular estimated regression function a + b1x1 + b2x2 + … + bkxk to the observed data is measured by the sum of squared deviations between the observed y values and the y values predicted by the estimated function:

Σ[y − (a + b1x1 + b2x2 + … + bkxk)]²

The least squares estimates of α, β1, β2, …, βk are those values of a, b1, b2, …, bk that make this sum of squared deviations as small as possible.
Predicted Values & Residuals
The first predicted value ŷ1 is obtained by taking the values of the predictor variables x1, x2, …, xk for the first sample observation and substituting these values into the estimated regression function.

Doing this successively for the remaining observations yields the predicted values

ŷ2, ŷ3, …, ŷn

(sometimes referred to as the fitted values or fits).
Predicted Values & Residuals
The residuals are then the differences

y1 − ŷ1, y2 − ŷ2, …, yn − ŷn

between the observed and predicted y values.
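The estimation and prediction steps of the last few slides can be sketched in a few lines of numpy (simulated data; this mirrors, but is not taken from, the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
X = rng.uniform(0, 10, size=(n, k))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.5, size=n)

# Least squares: choose a, b1, ..., bk minimizing the sum of squared deviations
D = np.column_stack([np.ones(n), X])           # design matrix with intercept
coefs, *_ = np.linalg.lstsq(D, y, rcond=None)  # a, b1, ..., bk

y_hat = D @ coefs        # predicted (fitted) values
residuals = y - y_hat    # observed minus predicted y values
```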
Sums of Squares
The residual (or error) sum of squares, SSResid, and total sum of squares, SSTo, are given by

SSResid = Σ(y − ŷ)²    SSTo = Σ(y − ȳ)²

where ȳ is the mean of the y observations in the sample.

The number of degrees of freedom associated with SSResid is n − (k + 1), because k + 1 df are lost in estimating the k + 1 coefficients α, β1, β2, …, βk.
Estimate for σ²
An estimate of the random deviation variance σ² is given by

se² = SSResid / (n − (k + 1))

and se = √se² is the estimate of σ.
Coefficient of Multiple Determination, R²
The coefficient of multiple determination, R², interpreted as the proportion of variation in observed y values that is explained by the fitted model, is

R² = 1 − SSResid / SSTo
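Continuing the earlier sketch, a self-contained snippet that computes SSResid, SSTo, the estimates se² and se, and R² for a simulated fit:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
X = rng.uniform(0, 10, size=(n, k))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.5, size=n)

D = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(D, y, rcond=None)
y_hat = D @ coefs

ss_resid = np.sum((y - y_hat) ** 2)    # residual (error) sum of squares
ss_to = np.sum((y - y.mean()) ** 2)    # total sum of squares

df = n - (k + 1)                       # df lost estimating k + 1 coefficients
se2 = ss_resid / df                    # estimate of sigma^2
se = np.sqrt(se2)                      # estimate of sigma

r_sq = 1 - ss_resid / ss_to            # coefficient of multiple determination
print(se, r_sq)
```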
Adjusted R²
Generally, a model with a large R² and a small se is desirable. If a large number of variables (relative to the number of data points) is used, those conditions may be satisfied, but the model will be unrealistic and difficult to interpret.
Adjusted R²
To sort out this problem, computer packages sometimes compute a quantity called the adjusted R²:

adjusted R² = 1 − [(n − 1) / (n − (k + 1))] × (SSResid / SSTo)

Notice that when a large number of variables are used to build the model, this value will be substantially lower than R² and give a better indication of the usability of the model.
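As a small sketch of the formula above (a hypothetical helper, not a library function):

```python
def adjusted_r_squared(ss_resid: float, ss_to: float, n: int, k: int) -> float:
    """adjusted R^2 = 1 - [(n - 1) / (n - (k + 1))] * (SSResid / SSTo)"""
    return 1 - (n - 1) / (n - (k + 1)) * (ss_resid / ss_to)

# With many predictors relative to n, adjusted R^2 drops well below R^2:
print(1 - 20.0 / 100.0)                                            # R^2 = 0.80
print(adjusted_r_squared(ss_resid=20.0, ss_to=100.0, n=30, k=10))  # ~0.695
```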
F Distributions
F distributions are similar to chi-square distributions, but have two parameters, dfnum and dfden (the numerator and denominator degrees of freedom).
The F Test for Model Utility
The regression sum of squares, denoted by SSRegr, is defined by

SSRegr = SSTo − SSResid
The F Test for Model Utility
When all k βi's are zero in the model

y = α + β1x1 + β2x2 + … + βkxk + e

and when the distribution of e is normal with mean 0 and variance σ² for any particular values of x1, x2, …, xk, the statistic

F = (SSRegr / k) / (SSResid / (n − (k + 1)))

has an F probability distribution based on k numerator df and n − (k + 1) denominator df.
The F Test for Utility of the Model
y = α + β1x1 + β2x2 + … + βkxk + e

Null hypothesis:
H0: β1 = β2 = … = βk = 0
(There is no useful linear relationship between y and any of the predictors.)

Alternate hypothesis:
Ha: At least one among β1, β2, …, βk is not zero
(There is a useful linear relationship between y and at least one of the predictors.)
The F Test for Utility of the Model
y = α + β1x1 + β2x2 + … + βkxk + e

Test statistic:

F = (SSRegr / k) / (SSResid / (n − (k + 1)))

where SSRegr = SSTo − SSResid.

An alternate formula:

F = (R² / k) / ((1 − R²) / (n − (k + 1)))
The F Test for Utility of the Model
y = α + β1x1 + β2x2 + … + βkxk + e

The test is upper-tailed, and a table of values that capture specified upper-tail F curve areas is used to obtain a bound or bounds on the P-value, using numerator df = k and denominator df = n − (k + 1).

Assumptions: For any particular combination of predictor variable values, the distribution of e, the random deviation, is normal with mean 0 and constant variance.
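The whole model utility test can be sketched in Python, with scipy.stats.f supplying the upper-tail area in place of a printed F table (simulated data, illustrative coefficients; note the third predictor is deliberately useless):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(4)
n, k = 40, 3
X = rng.uniform(0, 10, size=(n, k))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.5, size=n)

D = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(D, y, rcond=None)
ss_resid = np.sum((y - D @ coefs) ** 2)
ss_to = np.sum((y - y.mean()) ** 2)
ss_regr = ss_to - ss_resid             # regression sum of squares

# F = (SSRegr / k) / (SSResid / (n - (k + 1)))
F = (ss_regr / k) / (ss_resid / (n - (k + 1)))

# Upper-tailed test: P-value is the F curve area to the right of F
p_value = f_dist.sf(F, k, n - (k + 1))
print(F, p_value)
```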
Example
A number of years ago, a group of college professors teaching statistics met at an NSF program and put together a sample student research project. They attempted to create a model to explain lung capacity in terms of a number of variables. Specifically,

Numerical variables: height, age, weight, chest, waist
Categorical variables: gender, activity level, and smoking status
Example
They managed to sample 41 subjects and obtain/measure the variables.

There was some discussion, and many felt that the calculated variable (height)(waist)² would be useful, since it would likely be proportional to the volume of the individual.

The initial regression analysis performed with Minitab appears on the next slide.
Example
Linear Model with All Numerical Variables

The regression equation is
Capacity = - 13.0 - 0.0158 Age + 0.232 Height - 0.00064 Weight
           - 0.0029 Chest + 0.101 Waist - 0.000018 hw2

40 cases used 1 cases contain missing values

Predictor        Coef        SE Coef        T      P
Constant     -13.016         2.865       -4.54  0.000
Age           -0.015801      0.007847    -2.01  0.052
Height         0.23215       0.02895      8.02  0.000
Weight        -0.000639      0.006542    -0.10  0.923
Chest         -0.00294       0.06491     -0.05  0.964
Waist          0.10068       0.09427      1.07  0.293
hw2           -0.00001814    0.00001761  -1.03  0.310

S = 0.5260   R-Sq = 78.2%   R-Sq(adj) = 74.2%
Example
The only coefficient that appeared to be significant at the 5% level was that of height. Since the P-value for the coefficient on age was very close to 5% (5.2%), it was decided that a linear model with the two independent variables height and age would be fit.

The resulting model is on the next slide.
Example
Linear Model with Variables: Height & Age

Notice that even though the R² value decreases slightly, the adjusted R² value actually increases. Also note that the coefficient on Age is now significant at 5%.

The regression equation is
Capacity = - 10.2 + 0.215 Height - 0.0133 Age

40 cases used 1 cases contain missing values

Predictor        Coef     SE Coef        T      P
Constant     -10.217      1.272        -8.03  0.000
Height         0.21481    0.01921      11.18  0.000
Age           -0.013322   0.005861     -2.27  0.029

S = 0.5073   R-Sq = 77.2%   R-Sq(adj) = 76.0%
Example
To determine whether incorporating the categorical variables into the model would significantly enhance it:

Gender was coded as an indicator variable (male = 0 and female = 1),
Smoking was coded as an indicator variable (No = 0 and Yes = 1), and
Activity level (light, moderate, heavy) was coded respectively as 1, 2, and 3.

The resulting Minitab output is given on the next slide.
Example
Linear Model with Categorical Variables Added

The regression equation is
Capacity = - 7.58 + 0.171 Height - 0.0113 Age - 0.383 C-Gender
           + 0.260 C-Activity - 0.289 C-Smoke

37 cases used 4 cases contain missing values

Predictor        Coef     SE Coef        T      P
Constant      -7.584      2.005        -3.78  0.001
Height         0.17076    0.02919       5.85  0.000
Age           -0.011261   0.005908     -1.91  0.066
C-Gender      -0.3827     0.2505       -1.53  0.137
C-Activi       0.2600     0.1210        2.15  0.040
C-Smoke       -0.2885     0.2126       -1.36  0.185

S = 0.4596   R-Sq = 84.2%   R-Sq(adj) = 81.7%
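As a quick arithmetic check (not part of the original output), the adjusted R² formula from earlier reproduces the reported value: with n = 37 cases used and k = 5 predictors, adjusted R² = 1 − [(37 − 1)/(37 − 6)](1 − 0.842) = 1 − (36/31)(0.158) ≈ 0.817, matching R-Sq(adj) = 81.7%.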
Example
It was noted that the coefficients for the coded indicator variables gender and smoking were not significant, but after considerable discussion, the group felt that a number of the variables were highly related. This, the group felt, was confounding the study.

In an attempt to determine a reasonably optimal subgroup of the variables to keep in the study, and since the study was small, a stepwise regression was run; the variables Height, Age, Coded Activity, and Coded Gender were kept, and the following model was obtained.
Example
Linear Model with Height, Age & Coded Activity and Gender

The regression equation is
Capacity = - 6.93 + 0.161 Height - 0.0137 Age
           + 0.302 C-Activity - 0.466 C-Gender

40 cases used 1 cases contain missing values

Predictor        Coef     SE Coef        T      P
Constant      -6.929      1.708        -4.06  0.000
Height         0.16079    0.02454       6.55  0.000
Age           -0.013744   0.005404     -2.54  0.016
C-Activi       0.3025     0.1133        2.67  0.011
C-Gender      -0.4658     0.2082       -2.24  0.032

S = 0.4477   R-Sq = 83.2%   R-Sq(adj) = 81.3%
Example
Linear Model with Height, Age & Coded Activity and Gender

The rest of the Minitab output is given below.

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       4  34.8249  8.7062  43.44  0.000
Residual Error  35   7.0151  0.2004
Total           39  41.8399

Source    DF   Seq SS
Height     1  30.9878
Age        1   1.3296
C-Activi   1   1.5041
C-Gender   1   1.0034

Unusual Observations

Obs  Height  Capacity     Fit  SE Fit  Residual  St Resid
  4    66.0    2.2000  3.2039  0.1352   -1.0039    -2.35R
 23    74.0    5.7000  4.7635  0.2048    0.9365     2.35R
 39    70.0    5.4000  4.4228  0.1064    0.9772     2.25R

R denotes an observation with a large standardized residual
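As a check (not in the original slides), the F statistic can be read directly off the ANOVA table using the model utility formula: F = (SSRegr/k) / (SSResid/(n − (k + 1))) = (34.8249/4) / (7.0151/35) = 8.7062/0.2004 ≈ 43.44, with numerator df = 4 and denominator df = 40 − (4 + 1) = 35; the P-value of 0.000 indicates a useful model.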
Example
Linear Model with Height, Age & Coded Activity and Gender

All of the coefficients in this model were significant at the 5% level, and the R² and adjusted R² were both fairly large.

This appeared to be a reasonable model for describing lung capacity, even though the study was limited by sample size and by measurement limitations due to antique equipment.

Minitab identified 3 outliers (observations whose standardized residuals were unusually large). Various plots of the standardized residuals are produced on the next few slides, with comments.
Example
Linear Model with Height, Age & Coded Activity and Gender

[Histogram of the residuals (response is Capacity): frequency versus residual, with residuals ranging from about -1.0 to 1.0]

The histogram of the residuals appears to be consistent with the assumption that the residuals are a sample from a normal distribution.
Example
Linear Model with Height, Age & Coded Activity and Gender

[Normal probability plot of the residuals (response is Capacity): normal score versus residual]

The normal probability plot also tends to indicate that the residuals can reasonably be thought of as a sample from a normal distribution.
Example
Linear Model with Height, Age & Coded Activity and Gender

[Plot of residuals versus the fitted values (response is Capacity)]

The residual plot also tends to indicate that the model assumptions are not unreasonable, although there would be some concern that the residuals are predominantly positive for smaller fitted lung capacities.