Slides by John Loucks, St. Edward’s University
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Chapter 14, Part B: Simple Linear Regression
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
• Residual Analysis: Validating Model Assumptions
• Residual Analysis: Outliers and Influential Observations

Using Excel’s Regression Tool
Up to this point, you have seen how Excel can be used for various parts of a regression analysis. Excel also has a comprehensive tool in its Data Analysis package called Regression. The Regression tool can be used to perform a complete regression analysis.

Estimated Regression Equation
Excel Worksheet (showing data)

      A      B        C
1     Week   TV Ads   Cars Sold
2     1      1        14
3     2      3        24
4     3      2        18
5     4      1        17
6     5      3        27

Using Excel’s Regression Tool
Performing the Regression Analysis
Step 1  Select the Tools menu
Step 2  Choose the Data Analysis option
Step 3  Choose Regression from the list of Analysis Tools

Using Excel’s Regression Tool
[Figure: Excel Regression dialog box]
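As a cross-check outside Excel, the estimated regression equation for the worksheet data can be computed directly by least squares. The sketch below is only an illustration: the function name `fit_line` and the variable names are assumptions made here, not part of the slides or of Excel.

```python
# Worksheet data from the slides: TV ads (x) and cars sold (y) for five weeks.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def fit_line(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x.

    fit_line is a hypothetical helper name used for this illustration.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line(tv_ads, cars_sold)
print(f"y-hat = {b0:g} + {b1:g}x")  # prints: y-hat = 10 + 5x
```

The intercept and slope agree with the Coefficients column that the Regression tool reports for this data set.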
Using Excel’s Regression Tool
Excel Value Worksheet

Data
Week   TV Ads   Cars Sold
1      1        14
2      3        24
3      2        18
4      1        17
5      3        27

SUMMARY OUTPUT

Regression Statistics Output
Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

ANOVA Output
             df   SS    MS         F          Significance F
Regression   1    100   100        21.42857   0.018986231
Residual     3    14    4.666667
Total        4    114

Estimated Regression Equation Output
            Coefficients   Standard Error   t Stat     P-value
Intercept   10             2.366431913      4.225771   0.024236
TV Ads      5              1.08012345       4.6291     0.018986

            Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   2.468950436   17.53104956   2.468950436   17.53104956
TV Ads      1.562561893   8.437438107   1.562561893   8.437438107

Using Excel’s Regression Tool
Excel Value Worksheet (bottom-left portion)

      A           B          C           D        E
23                Coeffic.   Std. Err.   t Stat   P-value
24    Intercept   10         2.36643     4.2258   0.02424
25    TV Ads      5          1.08012     4.6291   0.01899

Note: Columns F-I are not shown.

Using Excel’s Regression Tool
Excel Value Worksheet (bottom-right portion)

      A           B          F          G          H             I
23                Coeffic.   Low. 95%   Up. 95%    Low. 95.0%    Up. 95.0%
24    Intercept   10         2.46895    17.53105   2.46895044    17.5310496
25    TV Ads      5          1.562562   8.437438   1.56256189    8.43743811

Note: Columns C-E are hidden.
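The coefficient rows of the output can be reproduced by hand with the usual simple linear regression formulas for the standard errors of the intercept and slope. The following is a minimal sketch under stated assumptions: `coefficient_table` is a hypothetical helper, and the t value 3.1824 (95% confidence, n - 2 = 3 degrees of freedom) is taken from the interval calculations later in the slides rather than computed. P-values are omitted because the Python standard library has no t distribution.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
T_CRIT = 3.1824  # t value for 95% confidence with 3 df, as quoted in the slides


def coefficient_table(x, y, t_crit):
    """Return coefficient, standard error, t stat and interval limits per term.

    coefficient_table is a hypothetical name used for this illustration.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    rows = {}
    for name, coef, se in (("Intercept", b0, se_b0), ("TV Ads", b1, se_b1)):
        rows[name] = {
            "coef": coef,
            "se": se,
            "t": coef / se,
            "lower": coef - t_crit * se,
            "upper": coef + t_crit * se,
        }
    return rows


table = coefficient_table(tv_ads, cars_sold, T_CRIT)
for name, r in table.items():
    print(f"{name:<10} {r['coef']:8.4f} {r['se']:8.5f} {r['t']:8.4f} "
          f"{r['lower']:9.5f} {r['upper']:9.5f}")
```

Because the t value is rounded to four decimals, the interval limits can differ from Excel's in the fourth decimal place.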
Using Excel’s Regression Tool
Excel Value Worksheet (middle portion)

ANOVA
             df   SS    MS        F         Significance F
Regression   1    100   100       21.4286   0.018986231
Residual     3    14    4.66667
Total        4    114

Using Excel’s Regression Tool
Excel Value Worksheet (top portion)

Regression Statistics
Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

Using the Estimated Regression Equation for Estimation and Prediction
A confidence interval is an interval estimate of the mean value of y for a given value of x.
A prediction interval is used whenever we want to predict an individual value of y for a new observation corresponding to a given value of x.
The margin of error is larger for a prediction interval.

Using the Estimated Regression Equation for Estimation and Prediction
Confidence Interval Estimate of E(y*):
$\hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*}$
Prediction Interval Estimate of y*:
$\hat{y}^* \pm t_{\alpha/2}\, s_{\text{pred}}$
where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.

Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y} = 10 + 5(3) = 25$ cars
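The ANOVA and Regression Statistics values shown above, along with the point estimate for 3 TV ads, follow from the sums of squares of the fitted line. The sketch below recomputes them; `regression_summary` and its dictionary keys are names assumed for this illustration only.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def regression_summary(x, y):
    """Return sums of squares and summary statistics for a simple linear fit.

    regression_summary is a hypothetical helper name.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # total
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # residual
    ssr = sst - sse                                                 # regression
    mse = sse / (n - 2)
    r_square = ssr / sst
    return {
        "b0": b0,
        "b1": b1,
        "SSR": ssr,
        "SSE": sse,
        "SST": sst,
        "MSE": mse,
        "F": ssr / mse,  # MSR equals SSR because regression df is 1
        "R Square": r_square,
        "Multiple R": math.sqrt(r_square),
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / (n - 2),
        "Standard Error": math.sqrt(mse),
    }


summary = regression_summary(tv_ads, cars_sold)
point_estimate = summary["b0"] + summary["b1"] * 3  # mean cars sold for 3 TV ads
for key in ("SSR", "SSE", "SST", "F", "R Square", "Standard Error"):
    print(f"{key:<15} {summary[key]:.6f}")
print(f"Point estimate  {point_estimate:g}")
```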
Confidence Interval for E(y*)
Estimate of the Standard Deviation of $\hat{y}^*$
$s_{\hat{y}^*} = s\sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\hat{y}^*} = 2.16025\sqrt{\dfrac{1}{5} + \dfrac{(3-2)^2}{(1-2)^2 + (3-2)^2 + (2-2)^2 + (1-2)^2 + (3-2)^2}}$
$s_{\hat{y}^*} = 2.16025\sqrt{\dfrac{1}{5} + \dfrac{1}{4}} = 1.4491$

Confidence Interval for E(y*)
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
$\hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*}$
25 ± 3.1824(1.4491)
25 ± 4.61
20.39 to 29.61 cars

Prediction Interval for y*
Estimate of the Standard Deviation of an Individual Value of y*
$s_{\text{pred}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\text{pred}} = 2.16025\sqrt{1 + \dfrac{1}{5} + \dfrac{1}{4}}$
$s_{\text{pred}} = 2.16025(1.20416) = 2.6013$

Prediction Interval for y*
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
$\hat{y}^* \pm t_{\alpha/2}\, s_{\text{pred}}$
25 ± 3.1824(2.6013)
25 ± 8.28
16.72 to 33.28 cars

Residual Analysis
If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.
The residuals provide the best information about ε.
Residual for Observation i: $y_i - \hat{y}_i$
Much of the residual analysis is based on an examination of graphical plots.
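The confidence interval and prediction interval calculations for x* = 3 can be sketched in a few lines. This is an illustration under assumptions: `interval_estimates` is a hypothetical helper, and the t value 3.1824 is the one quoted in the slides rather than derived from a t distribution.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
T_CRIT = 3.1824  # t value for 95% confidence with n - 2 = 3 df, from the slides


def interval_estimates(x, y, x_star, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at x_star.

    interval_estimates is a hypothetical name used for this illustration.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_star
    # Shared term: 1/n + (x* - x_bar)^2 / sum (x_i - x_bar)^2
    h = 1 / n + (x_star - x_bar) ** 2 / sxx
    s_yhat = s * math.sqrt(h)       # standard deviation of y-hat*
    s_pred = s * math.sqrt(1 + h)   # standard deviation of an individual y*
    ci = (y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)
    pi = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
    return y_hat, ci, pi


y_hat, ci, pi = interval_estimates(tv_ads, cars_sold, 3, T_CRIT)
print(f"point estimate: {y_hat:g}")
print(f"95% confidence interval: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% prediction interval: {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is wider because `s_pred` adds the variance of an individual observation to the variance of the estimated mean.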
Residual Plot Against x
If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, Good Pattern]

Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, Nonconstant Variance]

Residual Plot Against x
[Figure: residual $y - \hat{y}$ plotted against x, Model Form Not Adequate]

Residual Plot Against x
Residuals

Residual Plot Against x
Using Excel to Produce a Residual Plot
• The steps outlined earlier to obtain the regression output are performed with one change.
• When the Regression dialog box appears, we must also select the Residual Plot option.
• The output will include two new items:
  • A plot of the residuals against the independent variable, and
  • A list of predicted values of y and the corresponding residual values.
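The list of predicted values and residuals that accompanies the residual plot can also be computed directly from the fitted line. The sketch below is an assumption-labeled illustration: `residual_table` is a hypothetical helper, and it only tabulates the values rather than drawing the plot.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def residual_table(x, y):
    """Return rows of (observation, x, predicted y, residual).

    residual_table is a hypothetical name used for this illustration.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rows = []
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        predicted = b0 + b1 * xi
        rows.append((i, xi, predicted, yi - predicted))  # residual = y - y-hat
    return rows


rows = residual_table(tv_ads, cars_sold)
print("Obs  TV Ads  Predicted y  Residual")
for obs, xi, predicted, residual in rows:
    print(f"{obs:>3}  {xi:>6}  {predicted:>11g}  {residual:>8g}")
```

Plotting the residual column against the TV Ads column gives the residual plot against x described above.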
Residual Plot Against x
[Figure: TV Ads Residual Plot. Residuals (vertical axis, -3 to 3) plotted against TV Ads (horizontal axis, 0 to 4).]

Standardized Residuals
Standardized Residual for Observation i:
$\dfrac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$
where:
$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$
$h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$

Standardized Residual Plot
The standardized residual plot can provide insight about the assumption that the error term ε has a normal distribution.
If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.

Standardized Residual Plot
Standardized Residuals

Observation   Predicted y   Residual   Standardized Residual
1             15            -1         -0.5345
2             25            -1         -0.5345
3             20            -2         -1.0690
4             15             2          1.0690
5             25             2          1.0690

Standardized Residual Plot
RESIDUAL OUTPUT

Observation   Predicted Y   Residuals   Standard Residuals
1             15            -1          -0.534522
2             25            -1          -0.534522
3             20            -2          -1.069045
4             15             2           1.069045
5             25             2           1.069045

[Figure: Standardized Residual Plot. Standard Residuals (vertical axis, -1.5 to 1.5) plotted against Cars Sold (horizontal axis, 0 to 30).]
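The standardized residuals can be computed two ways for comparison. The first follows the leverage-based formula given on the Standardized Residuals slide. The second divides each residual by the square root of SSE/(n - 1); this is an observation made here from the numbers, not something the slides state, but it reproduces the tabulated values (-0.5345, -1.0690, 1.0690), while the leverage-based formula gives slightly different ones. Function and key names below are hypothetical.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def standardized_residuals(x, y):
    """Return residuals standardized two ways.

    'leverage': e_i / (s * sqrt(1 - h_i)), the formula on the slide.
    'tabulated': e_i / sqrt(SSE / (n - 1)), an assumed scaling that happens
    to match the values tabulated in the slides.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    by_leverage = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    by_residual_sd = [e / math.sqrt(sse / (n - 1)) for e in residuals]
    return {"leverage": by_leverage, "tabulated": by_residual_sd}


result = standardized_residuals(tv_ads, cars_sold)
print("Obs  Leverage-based  Tabulated-style")
for i, (a, b) in enumerate(zip(result["leverage"], result["tabulated"]), start=1):
    print(f"{i:>3}  {a:>14.4f}  {b:>15.4f}")
```

Both versions stay within ±1.5 for this data set, so either one leads to the same reading of the standardized residual plot.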
Standardized Residual Plot
All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that ε has a normal distribution.

Outliers and Influential Observations
Detecting Outliers
• An outlier is an observation that is unusual in comparison with the other data.
• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.
• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.
• This rule’s shortcoming can be circumvented by using studentized deleted residuals.
• When the i th observation is an outlier, the |i th studentized deleted residual| will be larger than the |i th standardized residual|.

End of Chapter 14, Part B