© 2000 Prentice-Hall, Inc.

Statistics
Multiple Regression and Model Building
Chapter 12, Part I

Learning Objectives
1. Explain the Linear Multiple Regression Model
2. Explain Residual Analysis
3. Test Overall Significance
4. Explain Multicollinearity
5. Interpret Linear Multiple Regression Computer Output

Types of Regression Models
- 1 Explanatory Variable: Simple (Linear or NonLinear)
- 2+ Explanatory Variables: Multiple (Linear or NonLinear)

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Linear Multiple Regression Model
Hypothesizing the Deterministic Component

Linear Multiple Regression Model
1. Relationship between 1 dependent & 2 or more independent variables is a linear function

   $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \epsilon_i$

   - $Y_i$: dependent (response) variable
   - $\beta_0$: population Y-intercept
   - $\beta_1, \dots, \beta_k$: population slopes
   - $X_{1i}, \dots, X_{ki}$: independent (explanatory) variables
   - $\epsilon_i$: random error
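The linear multiple regression model can be sketched in a few lines of Python. This is only an illustration: the parameter values, the function names, and the choice of a normally distributed error with standard deviation `sigma` are assumptions made for the example, not values from the slides.

```python
import random


def expected_response(betas, xs):
    """Deterministic component: beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    intercept, slopes = betas[0], betas[1:]
    if len(slopes) != len(xs):
        raise ValueError("need exactly one x value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


def simulate_response(betas, xs, sigma, rng):
    """Observed Y: deterministic component plus a random error term.

    Assumption for this sketch: the error is normal with mean 0 and
    standard deviation sigma.
    """
    return expected_response(betas, xs) + rng.gauss(0.0, sigma)


# Hypothetical population parameters: beta_0 = 1, beta_1 = 2, beta_2 = -0.5.
betas = [1.0, 2.0, -0.5]
rng = random.Random(0)

mean_y = expected_response(betas, [3.0, 4.0])          # 1 + 2*3 - 0.5*4
observed_y = simulate_response(betas, [3.0, 4.0], sigma=1.0, rng=rng)

# Raising X1 by one unit while holding X2 fixed shifts E(Y) by beta_1.
shift = expected_response(betas, [4.0, 4.0]) - mean_y
print(mean_y, shift, observed_y)
```

Separating `expected_response` from `simulate_response` mirrors the split between the deterministic component and the random error in the model equation.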
Population Multiple Regression Model (Bivariate model)
[Figure: response plane over the X1 and X2 axes with intercept $\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ lies a distance $\epsilon_i$ from the plane.]
- Observed Y: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i$
- Response plane: $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$

Sample Multiple Regression Model (Bivariate model)
[Figure: estimated response plane over the X1 and X2 axes with intercept $\hat{\beta}_0$; the observed Y at $(X_{1i}, X_{2i})$ lies a distance $\hat{\epsilon}_i$ from the plane.]
- Observed Y: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\epsilon}_i$
- Response plane: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$

Parameter Estimation

Multiple Linear Regression Equations
Too complicated by hand! Ouch!

Interpretation of Estimated Coefficients
1. Slope ($\hat{\beta}_k$)
   - Estimated Y Changes by $\hat{\beta}_k$ for Each 1 Unit Increase in $X_k$, Holding All Other Variables Constant
   - Example: If $\hat{\beta}_1 = 2$, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising ($X_1$), Given the Number of Sales Reps ($X_2$)
2. Y-Intercept ($\hat{\beta}_0$)
   - Average Value of Y When Every $X_k = 0$

Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You've collected the following data:

Resp  Size  Circ
  1     1     2
  4     8     8
  1     3     1
  3     5     7
  2     6     4
  4    10     6

Parameter Estimation Computer Output
Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob>|T|
INTERCEP    1       0.0640             0.2599           0.246          0.8214
ADSIZE      1       0.2049             0.0588           3.484          0.0399
CIRC        1       0.2805             0.0686           4.089          0.0264

The Parameter Estimate column gives $\hat{\beta}_0$ (INTERCEP), $\hat{\beta}_1$ (ADSIZE), and $\hat{\beta}_2$ (CIRC).

Interpretation of Coefficients Solution
1. Slope ($\hat{\beta}_1$)
   - # Responses to Ad Is Expected to Increase by .2049 Hundred (20.49 Responses) for Each 1 Sq. In. Increase in Ad Size, Holding Circulation Constant
2. Slope ($\hat{\beta}_2$)
   - # Responses to Ad Is Expected to Increase by .2805 Hundred (28.05 Responses) for Each 1 Unit (1,000) Increase in Circulation, Holding Ad Size Constant

Evaluating the Model

Evaluating Multiple Regression Model Steps
1. Examine Variation Measures
2. Do Residual Analysis
3. Test Parameter Significance
   - Overall Model
   - Individual Coefficients
4. Test for Multicollinearity

Variation Measures

Coefficient of Multiple Determination
1. Proportion of Variation in Y ‘Explained’ by All X Variables Taken Together

   $R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SSR}{SS_{yy}}$

2. Never Decreases When New X Variable Is Added to Model
   - Only Y Values Determine $SS_{yy}$
   - Disadvantage When Comparing Models

Residual Analysis
1. Graphical Analysis of Residuals
   - Plot Estimated Errors vs. $X_i$ Values
     - Difference Between Actual $Y_i$ & Predicted $Y_i$
     - Estimated Errors Are Called Residuals
   - Plot Histogram or Stem-&-Leaf of Residuals
2. Purposes
   - Examine Functional Form (Linear vs. Non-Linear Model)
   - Evaluate Violations of Assumptions

Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
3. Probability Distribution of Error Is Normal
4. Errors Are Independent

Residual Plot for Functional Form
[Figure: two plots of residuals $\hat{e}$ vs. X. Panels: "Add $X^2$ Term" and "Correct Specification".]

Residual Plot for Equal Variance
[Figure: two plots of standardized residuals (SR) vs. X. Panels: "Unequal Variance" and "Correct Specification".]
Fan-shaped. Standardized residuals used typically.

Residual Plot for Independence
[Figure: two plots of standardized residuals (SR) vs. X. Panels: "Not Independent" and "Correct Specification".]
Plots reflect the sequence in which data were collected.
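The residuals that these plots display can be computed directly. The sketch below uses the ad-response data and the rounded coefficient estimates from the parameter estimation example; the variable names are illustrative, and the standardization shown is the plain residual divided by the estimated error standard deviation, which is an assumption of this example and omits the leverage adjustment that studentized residuals in computer output also apply.

```python
import math

# Ad-response data from the parameter estimation example.
responses = [1, 4, 1, 3, 2, 4]      # ad responses (00)
ad_size = [1, 8, 3, 5, 6, 10]       # ad size (sq. in.)
circulation = [2, 8, 1, 7, 4, 6]    # circulation (000)

# Rounded estimates as reported in the parameter estimation computer output.
b0, b1, b2 = 0.0640, 0.2049, 0.2805

# Predicted Y and residual (actual Y minus predicted Y) for each observation.
predicted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(ad_size, circulation)]
residuals = [y - y_hat for y, y_hat in zip(responses, predicted)]

# Estimated standard deviation of the error: s = sqrt(SSE / (n - k - 1)).
n, k = len(responses), 2
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - k - 1))

# Simple standardization (assumption: e / s, with no leverage adjustment).
standardized = [e / s for e in residuals]

for obs, (y, y_hat, e, sr) in enumerate(
    zip(responses, predicted, residuals, standardized), start=1
):
    print(f"{obs}  {y:6.4f}  {y_hat:6.4f}  {e:7.4f}  {sr:6.3f}")
```

Plotting `residuals` or `standardized` against `ad_size`, `circulation`, or the observation order gives the kinds of residual plots described above.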
Residual Analysis Computer Output

Obs  Dep Var SALES  Predict Value  Residual  Student Residual   -2 -1 0 1 2
 1      1.0000         0.6000       0.4000        1.044         |    |**  |
 2      1.0000         1.3000      -0.3000       -0.592         |   *|    |
 3      2.0000         2.0000       0             0.000         |    |    |
 4      2.0000         2.7000      -0.7000       -1.382         |  **|    |
 5      4.0000         3.4000       0.6000        1.567         |    |*** |

The right-hand column is a plot of standardized (student) residuals.

Testing Parameters

Testing Overall Significance
1. Shows If There Is a Linear Relationship Between All X Variables Together & Y
2. Uses F Test Statistic
3. Hypotheses
   - $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
     - No Linear Relationship
   - $H_a$: At Least One Coefficient Is Not 0
     - At Least One X Variable Affects Y

Testing Overall Significance Computer Output

Analysis of Variance

Source    DF              Sum of Squares  Mean Square  F Value  Prob>F
Model      2 (k)              9.2497        4.6249      55.440   0.0043
Error      3 (n - k - 1)      0.2503        0.0834
C Total    5 (n - 1)          9.5000

F Value = MS(Model) / MS(Error); Prob>F is the P-Value.

Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists -- Matter of Degree
5. Example: Using Both Age & Height as Explanatory Variables in Same Model

Detecting Multicollinearity
1. Examine Correlation Matrix
   - Correlations Between Pairs of X Variables Are More than With Y Variable
2. Examine Variance Inflation Factor (VIF)
   - If $VIF_j > 5$, Multicollinearity Exists
3. Few Remedies
   - Obtain New Sample Data
   - Eliminate One Correlated X Variable

Correlation Matrix Computer Output

Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6

            RESPONSE    ADSIZE      CIRC
RESPONSE    1.00000     0.90932     0.93117
            0.0         0.0120      0.0069
ADSIZE      0.90932     1.00000     0.74118
            0.0120      0.0         0.0918
CIRC        0.93117     0.74118     1.00000
            0.0069      0.0918      0.0

$r_{Y1} = 0.90932$, $r_{Y2} = 0.93117$, $r_{12} = 0.74118$; the diagonal entries are all 1's.

Variance Inflation Factors Computer Output

Variable   DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob>|T|  Variance Inflation
INTERCEP    1       0.0640             0.2599           0.246          0.8214        0.0000
ADSIZE      1       0.2049             0.0588           3.484          0.0399        2.2190
CIRC        1       0.2805             0.0686           4.089          0.0264        2.2190

$VIF_1 < 5$

Regression Cautions
1. Violated Assumptions
2. Relevancy of Historical Data
3. Level of Significance
4. Extrapolation
5. Cause & Effect

Extrapolation
[Figure: Y vs. X; interpolation inside the relevant range of X, extrapolation on either side of it.]

Cause & Effect
[Figure: Liquor Consumption plotted against # Teachers.]

Conclusion
1. Explained the Linear Multiple Regression Model
2. Explained Residual Analysis
3. Tested Overall Significance
4. Explained Multicollinearity
5. Interpreted Linear Multiple Regression Computer Output
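The main figures in the computer output of this chapter (parameter estimates, analysis of variance, and variance inflation factors) can be reproduced from the six observations of the parameter estimation example. The following is a minimal sketch in plain Python, assuming ordinary least squares solved through the normal equations; the helper names `solve`, `fit_ols`, and `vif` are illustrative and do not come from the slides or from any statistics package.

```python
# Ad-response data from the parameter estimation example.
responses = [1, 4, 1, 3, 2, 4]      # ad responses (00)
ad_size = [1, 8, 3, 5, 6, 10]       # ad size (sq. in.)
circulation = [2, 8, 1, 7, 4, 6]    # circulation (000)


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(y, predictors):
    """Least-squares fit of y on an intercept plus the predictor columns.

    Returns (coefficients, explained sum of squares, error sum of squares).
    """
    rows = [[1.0] + [float(col[i]) for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return coef, ss_total - sse, sse


def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other Xs."""
    _, explained, unexplained = fit_ols(target, others)
    return (explained + unexplained) / unexplained


coef, ssr, sse = fit_ols(responses, [ad_size, circulation])
n, k = len(responses), 2
r_squared = ssr / (ssr + sse)                 # SSR / SSyy
f_value = (ssr / k) / (sse / (n - k - 1))     # MS(Model) / MS(Error)
vif_ad_size = vif(ad_size, [circulation])
vif_circulation = vif(circulation, [ad_size])

print("coefficients:", [round(c, 4) for c in coef])
print("SSR, SSE:", round(ssr, 4), round(sse, 4))
print("R^2:", round(r_squared, 4), " F:", round(f_value, 2))
print("VIF:", round(vif_ad_size, 4), round(vif_circulation, 4))
```

With only two explanatory variables, both variance inflation factors reduce to $1 / (1 - r_{12}^2)$, which is why ADSIZE and CIRC share the same value in the output.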