Chap. 12: Multiple Regression

© 2000 Prentice-Hall, Inc.
Statistics
Multiple Regression and Model Building
Chapter 12 part I
Learning Objectives
1. Explain the Linear Multiple Regression Model
2. Explain Residual Analysis
3. Test Overall Significance
4. Explain Multicollinearity
5. Interpret Linear Multiple Regression Computer Output
Types of Regression Models
Regression Models
  1 Explanatory Variable: Simple (Linear or NonLinear)
  2+ Explanatory Variables: Multiple (Linear or NonLinear)
Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation
Linear Multiple Regression Model
Hypothesizing the Deterministic Component
Linear Multiple Regression Model
1. Relationship between 1 dependent & 2 or more independent variables is a linear function

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

   - $Y_i$: Dependent (response) variable
   - $\beta_0$: Population Y-intercept
   - $\beta_1, \ldots, \beta_k$: Population slopes
   - $X_{1i}, \ldots, X_{ki}$: Independent (explanatory) variables
   - $\varepsilon_i$: Random error
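Population Multiple Regression Model
Bivariate model
Observed Y: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Response plane: $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$
[Figure: response plane over the X1 and X2 axes with Y-intercept $\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ differs from the plane by $\varepsilon_i$.]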
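Sample Multiple Regression Model
Bivariate model
Observed Y: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\varepsilon}_i$
Response plane: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$
[Figure: estimated response plane over the X1 and X2 axes with Y-intercept $\hat{\beta}_0$; the observed Y at $(X_{1i}, X_{2i})$ differs from the plane by $\hat{\varepsilon}_i$.]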
Parameter Estimation
Multiple Linear Regression Equations
Too complicated by hand! Ouch!
Interpretation of Estimated Coefficients
1. Slope ($\hat{\beta}_k$)
   - Estimated Y Changes by $\hat{\beta}_k$ for Each 1 Unit Increase in $X_k$, Holding All Other Variables Constant
   - Example: If $\hat{\beta}_1 = 2$, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising ($X_1$), Given the Number of Sales Reps ($X_2$)
2. Y-Intercept ($\hat{\beta}_0$)
   - Average Value of Y When All $X_k = 0$
Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).
You've collected the following data:

Resp  Size  Circ
1     1     2
4     8     8
1     3     1
3     5     7
2     6     4
4     10    6
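As a rough check on the example, the least-squares estimates can be computed directly from the six observations. This is a minimal sketch, not the software that produced the slides' output: it reads the data listing as six (Resp, Size, Circ) rows, uses the textbook deviation-sum formulas for a model with exactly two predictors, and the helper names (`mean`, `cross`, `fit_two_predictors`) are made up for illustration.

```python
# Ad-response data, read as six (Resp, Size, Circ) rows.
resp = [1, 4, 1, 3, 2, 4]    # ad responses (hundreds)
size = [1, 8, 3, 5, 6, 10]   # ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # circulation (thousands)


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares (b0, b1, b2) for y = b0 + b1*x1 + b2*x2.

    Hypothetical helper: solves the two normal equations in deviation form,
    so it only handles the two-predictor case.
    """
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(resp, size, circ)
print(f"intercept={b0:.4f}  adsize={b1:.4f}  circ={b2:.4f}")
```

The printed values should agree, to four decimals, with the INTERCEP, ADSIZE and CIRC estimates in the computer output for this example.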
Parameter Estimation Computer Output

Parameter Estimates
Variable  DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob>|T|
INTERCEP  1   0.0640              0.2599          0.246              0.8214
ADSIZE    1   0.2049              0.0588          3.484              0.0399
CIRC      1   0.2805              0.0686          4.089              0.0264

The Parameter Estimate column gives $\hat{\beta}_0$ (INTERCEP), $\hat{\beta}_1$ (ADSIZE) and $\hat{\beta}_2$ (CIRC).
Interpretation of Coefficients Solution
1. Slope ($\hat{\beta}_1$)
   - # Responses to Ad Is Expected to Increase by .2049 Hundred (20.49 Responses) for Each 1 Sq. In. Increase in Ad Size, Holding Circulation Constant
2. Slope ($\hat{\beta}_2$)
   - # Responses to Ad Is Expected to Increase by .2805 Hundred (28.05 Responses) for Each 1 Unit (1,000) Increase in Circulation, Holding Ad Size Constant
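The "holding the other variable constant" reading of each slope can be illustrated by plugging values into the fitted equation. The sketch below uses the rounded estimates from the computer output; the function name and the baseline ad (4 sq. in., circulation 5,000) are hypothetical values chosen only for illustration.

```python
# Rounded estimates taken from the Parameter Estimates output.
B0, B1, B2 = 0.0640, 0.2049, 0.2805


def predicted_responses(ad_size_sq_in, circulation_thousands):
    """Predicted ad responses, in hundreds (hypothetical helper)."""
    return B0 + B1 * ad_size_sq_in + B2 * circulation_thousands


# Hypothetical baseline: 4 sq. in. ad, circulation of 5 (thousand).
base = predicted_responses(4, 5)
bigger_ad = predicted_responses(5, 5)       # +1 sq. in., same circulation
more_readers = predicted_responses(4, 6)    # +1 (thousand) circulation, same ad size

print(f"change for +1 sq. in.:      {bigger_ad - base:.4f} hundred responses")
print(f"change for +1,000 readers:  {more_readers - base:.4f} hundred responses")
```

Whatever baseline is chosen, the two differences equal the two slopes, because the fitted model is linear in each X.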
Evaluating the Model
Evaluating Multiple Regression Model Steps
1. Examine Variation Measures
2. Do Residual Analysis
3. Test Parameter Significance
   - Overall Model
   - Individual Coefficients
4. Test for Multicollinearity
Variation Measures
Coefficient of Multiple Determination
1. Proportion of Variation in Y 'Explained' by All X Variables Taken Together

$R^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} = \dfrac{SSR}{SS_{yy}}$

2. Never Decreases When New X Variable Is Added to Model
   - Only Y Values Determine $SS_{yy}$
   - Disadvantage When Comparing Models
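Both points can be checked numerically on the ad-response example. The sketch below is an illustration under stated assumptions: it reads the example data as six (Resp, Size, Circ) rows, uses the identity SSR = sum of (slope times the cross-product of that X with Y) for least-squares fits with an intercept, and compares a model with ad size alone against the model with ad size and circulation. Helper names are hypothetical.

```python
resp = [1, 4, 1, 3, 2, 4]    # ad responses (hundreds)
size = [1, 8, 3, 5, 6, 10]   # ad size (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # circulation (thousands)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


ss_yy = cross(resp, resp)            # total variation; depends on Y only

# Model 1: ad size only.
slope_only = cross(size, resp) / cross(size, size)
ssr_one = slope_only * cross(size, resp)
r2_one = ssr_one / ss_yy

# Model 2: ad size and circulation (two-predictor normal equations).
s11, s22, s12 = cross(size, size), cross(circ, circ), cross(size, circ)
s1y, s2y = cross(size, resp), cross(circ, resp)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
ssr_two = b1 * s1y + b2 * s2y
r2_two = ssr_two / ss_yy

print(f"SSyy = {ss_yy:.4f}")
print(f"R^2, ad size only:           {r2_one:.4f}")
print(f"R^2, ad size + circulation:  {r2_two:.4f}")
```

The total variation is the same in both fits, and the explained share can only stay the same or grow when circulation is added, which is why a raw R² comparison favors the larger model.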
Residual Analysis
1. Graphical Analysis of Residuals
   - Plot Estimated Errors vs. Xi Values
      - Difference Between Actual Yi & Predicted Yi
      - Estimated Errors Are Called Residuals
   - Plot Histogram or Stem-&-Leaf of Residuals
2. Purposes
   - Examine Functional Form (Linear vs. Non-Linear Model)
   - Evaluate Violations of Assumptions
Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
3. Probability Distribution of Error Is Normal
4. Errors Are Independent
Residual Plot for Functional Form
[Figure: residuals ($\hat{e}$) plotted against X in two panels, "Add X² Term" and "Correct Specification".]
Residual Plot for Equal Variance
[Figure: standardized residuals (SR) plotted against X in two panels, "Unequal Variance" (fan-shaped) and "Correct Specification".]
Standardized residuals are typically used.
Residual Plot for Independence
[Figure: standardized residuals (SR) plotted against X in two panels, "Not Independent" and "Correct Specification".]
Plots reflect the sequence in which the data were collected.
Residual Analysis Computer Output

     Dep Var  Predict             Student
Obs  SALES    Value     Residual  Residual   -2-1-0 1 2
1    1.0000   0.6000     0.4000    1.044     |    |**  |
2    1.0000   1.3000    -0.3000   -0.592     |   *|    |
3    2.0000   2.0000     0         0.000     |    |    |
4    2.0000   2.7000    -0.7000   -1.382     |  **|    |
5    4.0000   3.4000     0.6000    1.567     |    |*** |

The last column is a plot of the standardized (student) residuals.
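The residual and student-residual columns can be reproduced with a short calculation. This is a sketch under a loud assumption: the slide shows only the observed SALES and predicted values, not the X values, so the code assumes a simple regression on X = 1, 2, 3, 4, 5 (any equally spaced X gives the same fitted values and leverages). Each student residual is the raw residual divided by s·sqrt(1 - h_i), where h_i is the observation's leverage.

```python
import math

x = [1, 2, 3, 4, 5]          # ASSUMED predictor values; not shown on the slide
sales = [1, 1, 2, 2, 4]      # observed SALES from the output

n = len(x)
x_bar = sum(x) / n
y_bar = sum(sales) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, sales))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * a for a in x]
residuals = [obs - fit for obs, fit in zip(sales, predicted)]

# Estimated standard deviation of the error (n - 2 degrees of freedom).
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (a - x_bar) ** 2 / s_xx for a in x]

student = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (fit, e, t) in enumerate(zip(predicted, residuals, student), start=1):
    print(f"{i}  pred={fit:.4f}  resid={e:+.4f}  student={t:+.3f}")
```

Under that assumption the printed columns line up with the output above, which suggests the "Student Residual" column is the internally studentized residual.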
Testing Parameters
Testing Overall Significance
1. Shows If There Is a Linear Relationship Between All X Variables Together & Y
2. Uses F Test Statistic
3. Hypotheses
   - $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
      - No Linear Relationship
   - $H_a$: At Least One Coefficient Is Not 0
      - At Least One X Variable Affects Y
Testing Overall Significance Computer Output

Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Value  Prob>F
Model    2   9.2497          4.6249       55.440   0.0043
Error    3   0.2503          0.0834
C Total  5   9.5000

DF: Model = k, Error = n - k - 1, C Total = n - 1
F Value = MS(Model) / MS(Error)
Prob>F = P-Value
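The F value and its p-value follow from the sums of squares and degrees of freedom in the table. The sketch below starts from the rounded sums of squares printed in the output, so its F differs slightly from the printed 55.440. The p-value line uses a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; for other models an F-distribution routine from a statistics library would be needed.

```python
# Values read from the Analysis of Variance output.
ss_model, ss_error = 9.2497, 0.2503
n, k = 6, 2                      # observations, explanatory variables

df_model = k
df_error = n - k - 1

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

# Upper-tail probability of an F(2, df_error) distribution.
# This closed form is valid ONLY for numerator df = 2.
p_value = (1 + 2 * f_value / df_error) ** (-df_error / 2)

print(f"MS(Model) = {ms_model:.4f}")
print(f"MS(Error) = {ms_error:.4f}")
print(f"F = {f_value:.3f}, p-value = {p_value:.4f}")
```

With a p-value this small, H0 (all slopes equal to 0) would be rejected at the usual 0.05 level.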
Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists -- Matter of Degree
5. Example: Using Both Age & Height as Explanatory Variables in Same Model
Detecting Multicollinearity
1. Examine Correlation Matrix
   - Correlations Between Pairs of X Variables Are Higher Than Their Correlations With the Y Variable
2. Examine Variance Inflation Factor (VIF)
   - If $VIF_j > 5$, Multicollinearity Exists
3. Few Remedies
   - Obtain New Sample Data
   - Eliminate One Correlated X Variable
Correlation Matrix Computer Output

Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6

            RESPONSE   ADSIZE    CIRC
RESPONSE    1.00000    0.90932   0.93117
            0.0        0.0120    0.0069
ADSIZE      0.90932    1.00000   0.74118
            0.0120     0.0       0.0918
CIRC        0.93117    0.74118   1.00000
            0.0069     0.0918    0.0

$r_{Y1}$ = 0.90932, $r_{Y2}$ = 0.93117, $r_{12}$ = 0.74118; the diagonal is all 1's.
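The three pairwise correlations can be recomputed from the example data. This is a minimal sketch assuming the ad-response data are read as six (Resp, Size, Circ) rows; `pearson` is a hypothetical helper written out by hand rather than taken from a statistics package, and the p-values in the output are not reproduced.

```python
import math

resp = [1, 4, 1, 3, 2, 4]    # RESPONSE (hundreds)
size = [1, 8, 3, 5, 6, 10]   # ADSIZE (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # CIRC (thousands)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    return cross(u, v) / math.sqrt(cross(u, u) * cross(v, v))


r_y1 = pearson(resp, size)
r_y2 = pearson(resp, circ)
r_12 = pearson(size, circ)

print(f"r_Y1 = {r_y1:.5f}")
print(f"r_Y2 = {r_y2:.5f}")
print(f"r_12 = {r_12:.5f}")
```

Here the correlation between the two X variables is lower than either X's correlation with Y, so the correlation-matrix check does not point to a multicollinearity problem.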
Variance Inflation Factors Computer Output

Variable  DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob>|T|  Variance Inflation
INTERCEP  1   0.0640              0.2599          0.246              0.8214    0.0000
ADSIZE    1   0.2049              0.0588          3.484              0.0399    2.2190
CIRC      1   0.2805              0.0686          4.089              0.0264    2.2190

$VIF_1 < 5$
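The variance inflation factor for X_j is defined as 1 / (1 - R_j²), where R_j² comes from regressing X_j on the other X variables. With only two explanatory variables, R_j² is simply the squared correlation between them, so both VIFs are equal. The sketch below applies that special case to the ad-size and circulation values from the example (read as six rows); it is not a general VIF routine.

```python
import math

size = [1, 8, 3, 5, 6, 10]   # ADSIZE (sq. in.)
circ = [2, 8, 1, 7, 4, 6]    # CIRC (thousands)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


# Correlation between the two explanatory variables.
r_12 = cross(size, circ) / math.sqrt(cross(size, size) * cross(circ, circ))

# Two-predictor special case: R_j^2 equals r_12^2 for both variables.
vif = 1 / (1 - r_12 ** 2)

print(f"r_12 = {r_12:.5f}")
print(f"VIF (ADSIZE and CIRC) = {vif:.4f}")
print("multicollinearity flagged" if vif > 5 else "VIF below 5: not flagged")
```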
Regression Cautions
1. Violated Assumptions
2. Relevancy of Historical Data
3. Level of Significance
4. Extrapolation
5. Cause & Effect
Extrapolation
[Figure: Y plotted against X. Interpolation applies within the relevant range of X; extrapolation applies beyond it on either side.]
Cause & Effect
[Figure: liquor consumption plotted against # teachers.]
Conclusion
1. Explained the Linear Multiple Regression Model
2. Explained Residual Analysis
3. Tested Overall Significance
4. Explained Multicollinearity
5. Interpreted Linear Multiple Regression Computer Output
End of Chapter