Chapter 13 Multiple Regression

advertisement
Chapter 13
Multiple Regression







Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Qualitative Independent Variables
© 2003 Thomson/South-Western
Slide 1
Multiple Regression Model


The equation that describes how the dependent
variable y is related to the independent variables x1,
x2, . . . xp and an error term is called the multiple
regression model.
The multiple regression model is:
y = b0 + b1x1 + b2x2 + . . . + bpxp + e
• b0, b1, b2, . . . , bp are the parameters.
• e is a random variable called the error term.
© 2003 Thomson/South-Western
Slide 2
Multiple Regression Equation


The equation that describes how the mean value of y
is related to x1, x2, . . . xp is called the multiple
regression equation.
The multiple regression equation is:
E(y) = b0 + b1x1 + b2x2 + . . . + bpxp
© 2003 Thomson/South-Western
Slide 3
Estimated Multiple Regression Equation

A simple random sample is used to compute sample
statistics b0, b1, b2, . . . , bp that are used as the point
estimators of the parameters b0, b1, b2, . . . , bp.

The estimated multiple regression equation is:
y^ = b0 + b1x1 + b2x2 + . . . + bpxp
© 2003 Thomson/South-Western
Slide 4
Estimation Process
Multiple Regression Model
E(y) = b0 + b1x1 + b2x2 + . . + bpxp + e
Multiple Regression Equation
E(y) = b0 + b1x1 + b2x2 + . . . + bpxp
Unknown parameters are
Sample Data:
x1 x2 . . . xp y
. .
. .
. .
. .
b0 , b1 , b2 , . . . , bp
b0 , b1 , b2 , . . . , bp
provide estimates of
b0 , b1 , b2 , . . . , bp
© 2003 Thomson/South-Western
Estimated Multiple
Regression Equation
yˆ  b0  b1 x1  b2 x2  ...  bp xp
b0, b1, b2, . . . , bp
are sample statistics
Slide 5
Least Squares Method

Least Squares Criterion
min  ( y i  y^i ) 2

Computation of Coefficients Values
The formulas for the regression coefficients b0, b1,
b2, . . . bp involve the use of matrix algebra. We will
rely on computer software packages to perform the
calculations.
© 2003 Thomson/South-Western
Slide 6
Least Squares Method

A Note on Interpretation of Coefficients
bi represents an estimate of the change in y
corresponding to a one-unit change in xi when all
other independent variables are held constant.
© 2003 Thomson/South-Western
Slide 7
Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE
SST = SSR + SSE
2
2
2
ˆ
ˆ
(
y

y
)

(
y

y
)

(
y

y
)
 i
 i
 i i
© 2003 Thomson/South-Western
Slide 8
Multiple Coefficient of Determination

Multiple Coefficient of Determination
R 2 = SSR/SST

Adjusted Multiple Coefficient of Determination
Ra2  1  ( 1  R 2 )
© 2003 Thomson/South-Western
n1
np1
Slide 9
Model Assumptions

Assumptions About the Error Term e
1. The error e is a random variable with mean of
zero.
2. The variance of e , denoted by 2, is the same for
all values of the independent variables.
3. The values of e are independent.
4. The error e is a normally distributed random
variable reflecting the deviation between the y
value and the expected value of y given by
b0 + b1x1 + b2x2 + . . . + bpxp
© 2003 Thomson/South-Western
Slide 10
Example: Programmer Salary Survey
A software firm collected data for a sample of 20
computer programmers. A suggestion was made that
regression analysis could be used to determine if salary
was related to the years of experience and the score on
the firm’s programmer aptitude test.
The years of experience, score on the aptitude test,
and corresponding annual salary ($1000s) for a sample
of 20 programmers is shown on the next slide.
© 2003 Thomson/South-Western
Slide 11
Example: Programmer Salary Survey
Exper.
4
7
1
5
8
10
0
1
6
6
Score
78
100
86
82
86
84
75
80
83
91
Salary
24
43
23.7
34.3
35.8
38
22.2
23.1
30
33
© 2003 Thomson/South-Western
Exper.
9
2
10
5
6
8
4
6
3
3
Score
88
73
75
81
74
87
79
94
70
89
Salary
38
26.6
36.2
31.6
29
34
30.1
33.9
28.2
30
Slide 12
Example: Programmer Salary Survey

Multiple Regression Model
Suppose we believe that salary (y) is related to the
years of experience (x1) and the score on the
programmer aptitude test (x2) by the following
regression model:
y = b0 + b1x1 + b2x2 + e
where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test
© 2003 Thomson/South-Western
Slide 13
Example: Programmer Salary Survey

Solving for the Estimates of b0, b1, b2
Least Squares
Output
Input Data
x1
x2 y
4 78 24
7 100 43
.
.
.
. . .
3 89 30
Computer
Package
for Solving
Multiple
Regression
Problems
© 2003 Thomson/South-Western
b0 =
b1 =
b2 =
R2 =
etc.
Slide 14
Example: Programmer Salary Survey

Minitab Computer Output
The regression is
Salary = 3.174 + 1.404 Exper + 0.251 Score
Predictor
Constant
Exper
Score
Coef
3.174
1.4039
.25089
s = 2.419
R-sq = 83.4%
© 2003 Thomson/South-Western
Stdev
6.156
.1986
.07735
t-ratio
.52
7.07
3.24
p
.613
.000
.005
R-sq(adj) = 81.5%
Slide 15
Example: Programmer Salary Survey

Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.2509(SCORE)
Note: Predicted salary will be in thousands of dollars
© 2003 Thomson/South-Western
Slide 16
Testing for Significance


In simple linear regression, the F and t tests provide
the same conclusion.
In multiple regression, the F and t tests have different
purposes.
© 2003 Thomson/South-Western
Slide 17
Testing for Significance: F Test


The F test is used to determine whether a significant
relationship exists between the dependent variable
and the set of all the independent variables.
The F test is referred to as the test for overall
significance.
© 2003 Thomson/South-Western
Slide 18
Testing for Significance: t Test



If the F test shows an overall significance, the t test is
used to determine whether each of the individual
independent variables is significant.
A separate t test is conducted for each of the
independent variables in the model.
We refer to each of these t tests as a test for
individual significance.
© 2003 Thomson/South-Western
Slide 19
Testing for Significance: F Test



Hypotheses
H 0 : b 1 = b2 = . . . = bp = 0
Ha: One or more of the parameters
is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if F > F
where F is based on an F distribution with p d.f. in
the numerator and n - p - 1 d.f. in the denominator.
© 2003 Thomson/South-Western
Slide 20
Testing for Significance: t Test

Hypotheses
H 0 : bi = 0
H a : bi = 0

Test Statistic
bi
t
sbi

Rejection Rule
Reject H0 if t < -tor t > t
where t is based on a t distribution with
n - p - 1 degrees of freedom.
© 2003 Thomson/South-Western
Slide 21
Example: Programmer Salary Survey

Minitab Computer Output (continued)
Analysis of Variance
SOURCE
Regression
Error
Total
DF
2
17
19
SS
500.33
99.46
599.79
© 2003 Thomson/South-Western
MS
250.16
5.85
F
42.76
P
0.000
Slide 22
Example: Programmer Salary Survey

F Test
• Hypotheses
H 0 : b1 = b2 = 0
Ha: One or both of the parameters
is not equal to zero.
• Rejection Rule
For  = .05 and d.f. = 2, 17:
F.05 = 3.59
Reject H0 if F > 3.59.
© 2003 Thomson/South-Western
Slide 23
Example: Programmer Salary Survey

F Test
• Test Statistic
F = MSR/MSE
= 250.16/5.85 = 42.76
• Conclusion
F = 42.76 > 3.59, so we can reject H0.
© 2003 Thomson/South-Western
Slide 24
Example: Programmer Salary Survey

t Test for Significance of Individual Parameters
• Hypotheses
H 0 : bi = 0
H a : bi = 0
• Rejection Rule
For  = .05 and d.f. = 17:
t.025 = 2.11
Reject H0 if t > 2.11
© 2003 Thomson/South-Western
Slide 25
Example: Programmer Salary Survey

t Test for Significance of Individual Parameters
• Test Statistics
•
b1 1. 4039

 7 . 07
sb1
. 1986
Conclusions
b2 . 25089

 3. 24
sb2 . 07735
Reject H0: b1 = 0 and reject H0: b2 = 0.
Both independent variables are significant.
© 2003 Thomson/South-Western
Slide 26
Testing for Significance: Multicollinearity


The term multicollinearity refers to the correlation
among the independent variables.
When the independent variables are highly
correlated (say, |r | > .7), it is not possible to
determine the separate effect of any particular
independent variable on the dependent variable.
© 2003 Thomson/South-Western
Slide 27
Testing for Significance: Multicollinearity


If the estimated regression equation is to be used
only for predictive purposes, multicollinearity is
usually not a serious problem.
Every attempt should be made to avoid including
independent variables that are highly correlated.
© 2003 Thomson/South-Western
Slide 28
Using the Estimated Regression Equation
for Estimation and Prediction




The procedures for estimating the mean value of y
and predicting an individual value of y in multiple
regression are similar to those in simple regression.
We substitute the given values of x1, x2, . . . , xp into
the estimated regression equation and use the
corresponding value of y^ as the point estimate.
The formulas required to develop interval estimates
for the mean value of y and for an individual value
of y are beyond the scope of the text.
Software packages for multiple regression will often
provide these interval estimates.
© 2003 Thomson/South-Western
Slide 29
Qualitative Independent Variables



In many situations we must work with qualitative
independent variables such as gender (male, female),
method of payment (cash, check, credit card), etc.
For example, x2 might represent gender where x2 = 0
indicates male and x2 = 1 indicates female.
In this case, x2 is called a dummy or indicator
variable.
© 2003 Thomson/South-Western
Slide 30
Qualitative Independent Variables


If a qualitative variable has k levels, k - 1 dummy
variables are required, with each dummy variable
being coded as 0 or 1.
For example, a variable with levels A, B, and C
would be represented by x1 and x2 values of (0, 0),
(1, 0), and (0,1), respectively.
© 2003 Thomson/South-Western
Slide 31
Example: Programmer Salary Survey (B)
As an extension of the problem involving the
computer programmer salary survey, suppose that
management also believes that the annual salary is
related to whether or not the individual has a graduate
degree in computer science or information systems.
The years of experience, the score on the programmer
aptitude test, whether or not the individual has a
relevant graduate degree, and the annual salary ($000)
for each of the sampled 20 programmers are shown on
the next slide.
© 2003 Thomson/South-Western
Slide 32
Example: Programmer Salary Survey (B)
Exp. Score Degr. Salary
4
78
No
24
7
100 Yes
43
1
86
No
23.7
5
82
Yes
34.3
8
86
Yes
35.8
10
84
Yes
38
0
75
No
22.2
1
80
No
23.1
6
83
No
30
6
91
Yes
33
© 2003 Thomson/South-Western
Exp. Score Degr. Salary
9
88
Yes
38
2
73
No
26.6
10
75
Yes
36.2
5
81
No
31.6
6
74
No
29
8
87
Yes
34
4
79
No
30.1
6
94
Yes
33.9
3
70
No
28.2
3
89
No
30
Slide 33
Example: Programmer Salary Survey (B)


Multiple Regression Equation
E(y ) = b0 + b1x1 + b2x2 + b3x3
Estimated Regression Equation
y^ = b0 + b1x1 + b2x2 + b3x3
where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a grad. degree
1 if individual does have a grad. degree
Note: x3 is referred to as a dummy variable.
© 2003 Thomson/South-Western
Slide 34
Example: Programmer Salary Survey (B)

Minitab Computer Output
The regression is
Salary = 7.95 + 1.15 Exp + 0.197 Score + 2.28 Deg
Predictor
Constant
Exp
Score
Deg
s = 2.396
Coef
7.945
1.1476
.19694
2.280
Stdev
7.381
.2976
.0899
1.987
R-sq = 84.7%
© 2003 Thomson/South-Western
t-ratio
1.08
3.86
2.19
1.15
p
.298
.001
.044
.268
R-sq(adj) = 81.8%
Slide 35
Example: Programmer Salary Survey (B)

Minitab Computer Output (continued)
Analysis of Variance
SOURCE
Regression
Error
Total
DF
3
16
19
SS
507.90
91.89
599.79
© 2003 Thomson/South-Western
MS
169.30
5.74
F
29.48
P
0.000
Slide 36
Example: Programmer Salary Survey (B)

Interpreting the Parameters
• b1 = 1.15
Salary is expected to increase by $1,150 for each
additional year of experience (when all other
independent variables are held constant)
© 2003 Thomson/South-Western
Slide 37
Example: Programmer Salary Survey (B)

Interpreting the Parameters
• b2 = 0.197
Salary is expected to increase by $197 for each
additional point scored on the programmer
aptitude test (when all other independent
variables are held constant)
© 2003 Thomson/South-Western
Slide 38
Example: Programmer Salary Survey (B)

Interpreting the Parameters
• b3 = 2.28
Salary is expected to be $2,280 higher for an
individual with a graduate degree than one
without a graduate degree (when all other
independent variables are held constant)
© 2003 Thomson/South-Western
Slide 39
End of Chapter 13
© 2003 Thomson/South-Western
Slide 40
Download