A continuation of regression analysis
M23- Residuals & Minitab
Department of ISM, University of Alabama, 1992-2003

Lesson Objectives
• Continue to build on regression analysis.
• Learn how residual plots help identify problems with the analysis.

Example 1: continued …
Sample of n = 5 students, Y = Weight in pounds, X = Height in inches.

    Case   X (Height)   Y (Weight)
      1        73          175
      2        68          158
      3        67          140
      4        72          207
      5        62          115

Prediction equation: $\widehat{\text{Wt}} = -332.73 + 7.189\,\text{Ht}$
r-square = ?   Std. error = ?   (To be found later.)

Example 1, continued
[Figure: scatterplot of WEIGHT versus HEIGHT with the fitted line $\hat{Y} = -332.7 + 7.189X$.]
Residuals = distance from point to line, measured parallel to the Y-axis.

Calculation:
For each case, residual = observed value − estimated mean.
For the ith case, $e_i = y_i - \hat{y}_i$.

Example 1, continued
Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs.
fitted value = $\hat{y}_4 = -332.73 + 7.189($ ____ $)$ = _________
residual = $e_4 = y_4 - \hat{y}_4$ = __________ = __________

Residual plot: a scatterplot of residuals vs. the predicted means of Y, $\hat{Y}$, or vs. an X-variable.

Example 1, continued
$e_4 = +22.12$.
[Figure: the same scatterplot of WEIGHT versus HEIGHT; residuals are measured parallel to the Y-axis.]

Example 1, continued
[Figure: Residual Plot, residuals (−24 to 24) versus HEIGHT (60 to 76).]
The regression line from the previous plot is rotated to horizontal.
$e_4$, the residual for the 4th case, is +22.12.
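The slope and intercept of Example 1 can be recomputed from the five cases with the usual least squares formulas, $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. The Python sketch below uses only the standard library; `least_squares` is a hypothetical helper written for this illustration, not a Minitab command. It then evaluates the fitted value and residual for case 4 using the rounded coefficients from the slide. The unrounded intercept comes out near −332.736, which the slides show as −332.73.

```python
# Example 1: n = 5 students, X = height (inches), Y = weight (pounds)
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]


def least_squares(x, y):
    """Hypothetical helper: return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = least_squares(heights, weights)
print(f"Unrounded fit: y-hat = {a:.3f} + {b:.3f} x")

# Case 4 (X = 72 in, Y = 207 lb), using the rounded coefficients shown on the slide
y_hat_4 = -332.73 + 7.189 * 72
e_4 = 207 - y_hat_4      # residual = observed - fitted
print(f"fitted value = {y_hat_4:.2f} lb, residual = {e_4:+.2f} lb")
```

Running it gives a fitted value of 184.88 lb and a residual of +22.12 lb for case 4, the value quoted on the later slides.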
Residual Plot
Scatterplot of residuals versus the predicted means of Y, $\hat{Y}$; or an X-variable, or Time.
Expect random dispersion around a horizontal line at zero.
Problems occur if there are:
• Unusual patterns
• Unusual cases

[Figure: Residuals versus X, or time: good random pattern.]

[Figure: Residuals versus X, or time: outliers?]
Next step: ________ to determine if a recording error has occurred.

[Figure: Residuals versus X, or time: nonlinear relationship.]
Next step: Add a "quadratic term," or use "______."

[Figure: Residuals versus X, or time: variance is increasing.]
Next step: Stabilize the variance by using "________."

Residual plots help identify
Unusual patterns:
• Possible curvature in the data.
• Variances that are not constant as X changes.
Unusual cases:
• Outliers
• High leverage cases
• Influential cases

Three properties of residuals, illustrated with some computations.

Property 1.
Y = Weight, X = Height, $\hat{Y} = -332.73 + 7.189X$

      X     Y       Ŷ    Residual e = Y − Ŷ
     73   175   192.07        −17.07
     68   158   156.12          1.88
     67   140     ____          ____
     72   207     ____          ____
     62   115     ____          ____
                  Sum:           .01   (round-off error)

Find the sum of the residuals.
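Property 1 can be checked numerically. The sketch below (standard library only) completes the residual table with the slide's rounded coefficients. It then refits with unrounded coefficients. With the exact fit the residuals sum to zero up to floating-point error, so the .01 in the table is purely round-off.

The same lines also check the other two properties of the least squares line stated on the following slides:
• No other line has a smaller sum of squared residuals. The alternative lines are a few arbitrary shifts and tilts chosen only for illustration.
• The line passes through $(\bar{x}, \bar{y})$.

The helper names `residuals` and `sse` are hypothetical.

```python
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]


def residuals(a, b, x, y):
    """Residuals e_i = y_i - (a + b*x_i) for the line y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sse(a, b, x, y):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum(e ** 2 for e in residuals(a, b, x, y))


# Completed residual table using the rounded slide coefficients
a_slide, b_slide = -332.73, 7.189
table = residuals(a_slide, b_slide, heights, weights)
for xi, yi, ei in zip(heights, weights, table):
    print(f"{xi:>3} {yi:>4} {yi - ei:8.2f} {ei:8.2f}")   # X, Y, fitted, residual
rounded_sum = sum(round(ei, 2) for ei in table)
print(f"Sum of rounded residuals: {rounded_sum:.2f}")

# Unrounded least squares coefficients
n = len(heights)
x_bar, y_bar = sum(heights) / n, sum(weights) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(heights, weights)) / sum(
    (xi - x_bar) ** 2 for xi in heights
)
a = y_bar - b * x_bar

# Property 1: residuals sum to zero (up to floating-point error)
exact_sum = sum(residuals(a, b, heights, weights))
# Property 2: other lines have a larger SSE (a few illustrative alternatives)
best_sse = sse(a, b, heights, weights)
others = [(a + 5, b), (a - 5, b), (a, b + 0.1), (a + 10, b - 0.2)]
# Property 3: the line passes through (x-bar, y-bar)
gap_at_mean = (a + b * x_bar) - y_bar

print(f"Exact sum of residuals: {exact_sum:.1e}")
print(f"SSE of least squares line: {best_sse:.2f}")
print("Every alternative line has a larger SSE:",
      all(sse(ao, bo, heights, weights) > best_sse for ao, bo in others))
print(f"Line minus y-bar at x-bar: {gap_at_mean:.1e}")
```

With unrounded coefficients the SSE is about 868.04. The 867.98 on the slide comes from squaring the rounded residuals.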
Property 2.
Y = Weight, X = Height, $\hat{Y} = -332.73 + 7.189X$

      X     Y       Ŷ    e = Y − Ŷ       e²
     73   175   192.07    −17.07      291.38
     68   158   156.12      1.88        3.53
     67   140   148.93     −8.93       79.74
     72   207   184.88     22.12      489.29
     62   115   112.99      2.01        4.04
                  Sum:       .01      867.98

Find the sum of squares of the residuals.

Property 3.
[Figure: scatterplot of WEIGHT versus HEIGHT with the fitted line; the point $(\bar{x}, \bar{y}) = (68.4, 159)$ lies on the line.]

Properties of Least Squares Line
1. Residuals always sum to zero: $\sum e_i = 0$.
2. This "least squares" line produces a smaller "sum of squared residuals" than any other straight line can: $\sum e_i^2 = \text{SSE} = 867.98 <$ "SSE for any other line".
3. The line always passes through the point $(\bar{x}, \bar{y})$.

Illustration of unusual cases: outliers, leverage, influential cases.

[Figure: Y versus X, one point labeled "outlier".]
The "unusual point" does not follow the pattern. It is near the X-mean; the entire line is pulled toward it.

[Figure: Y versus X, one point labeled "outlier".]
The "unusual point" does not follow the pattern. The line is pulled down and twisted slightly.

[Figure: Y versus X, one point labeled "High leverage".]
The "unusual point" is far from the X-mean, but still follows the pattern.

[Figure: Y versus X with one unusual point.]
The "unusual point" is far from the X-mean, but does not follow the pattern.
The line really twists! This point is both a high leverage case and an outlier, so it is influential.

Definitions:
Outlier: An unusual y-value relative to the pattern of the other cases. Usually has a large residual.
High Leverage Case: An extreme X value relative to the other X values.
Influential Case: A case that has an unusually large effect on the slope of the least squares line.

Conclusion:
High leverage ⇒ potentially influential.
High leverage & Outlier ⇒ influential!!

Why do we care about identifying unusual cases?
The least squares regression line is not resistant to unusual cases.

Lesson Objectives
• Learn two ways to use Minitab to run a regression analysis.
• Learn how to read output from Minitab.

Example 3, continued … Can height be predicted using shoe size?
Step 1?

Example 3, continued …
Graph > Plot … (Scatterplot)
[Figure: scatterplot of Height versus shoe size, with Female and Male cases marked separately.]
"Jitter" added in the X-direction.
The scatter for each subpopulation is about the same; i.e., there is "constant variance."

Example 3, continued …
Method 1: Stat > Regression > Regression …   (fits Y = a + bX)

Example 3, continued …
Copied from the "Session Window":

    Regression Analysis: Height versus Shoe Size

    The regression equation is
    Height = 50.5 + 1.87 Shoe Size

    Predictor      Coef   SE Coef       T      P
    Constant    50.5230    0.5912   85.45  0.000
    Shoe Siz    1.87241   0.06033   31.04  0.000

    S = 1.947   R-Sq = 79.1%   R-Sq(adj) = 79.0%

    Analysis of Variance

    Source       DF       SS       MS       F      P
    Regression    1   3650.0   3650.0  963.26  0.000
    Error       255    966.3      3.8
    Total       256   4616.3

Reading the output:
• The Coef column gives the least squares estimated coefficients.
• Total "Degrees of Freedom" = number of cases − 1, so this sample has 256 + 1 = 257 cases.
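The entries in the Analysis of Variance table are tied together by simple arithmetic. The sketch below copies the printed SS and DF values and recomputes the other summary numbers. R-Sq, S, and MSE are explained on the following slides.

The R-Sq(adj) formula used here, $1 - \text{MSE}/(\text{TSS}/\text{DF}_{\text{Total}})$, is the standard textbook definition, not something stated in this lesson. Because the printed SS values are rounded, the recomputed F (about 963.2) differs slightly from Minitab's 963.26.

```python
import math

# Sums of squares and degrees of freedom copied from the Minitab output
ss_regression, df_regression = 3650.0, 1
ss_error, df_error = 966.3, 255
ss_total, df_total = 4616.3, 256

n = df_total + 1                         # Total DF = number of cases - 1
ms_regression = ss_regression / df_regression
mse = ss_error / df_error                # MS for Error: mean squared error
f_stat = ms_regression / mse             # F = MS Regression / MS Error
r_sq = ss_regression / ss_total          # R-Sq = SSR / TSS
s = math.sqrt(mse)                       # S = standard error of regression
r_sq_adj = 1 - mse / (ss_total / df_total)

print(f"n = {n}")
print(f"SSR + SSE = {ss_regression + ss_error:.1f} (TSS = {ss_total})")
print(f"MSE = {mse:.1f}, F = {f_stat:.1f}")
print(f"S = {s:.3f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```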
Example 3, continued …
More from the same output:
• $R^2 = \text{SSR}/\text{TSS} = 3650.0/4616.3 = 0.791$, i.e., R-Sq = 79.1%.
• S = 1.947 is the standard error of regression, a measure of variation around the regression line.
  – The Error SS, 966.3, is the sum of squared residuals (SSE).
  – Dividing it by its DF gives the mean squared error: MSE = 966.3/255 ≈ 3.79, printed as 3.8.
  – Then $S = \sqrt{\text{MSE}} = \sqrt{3.79} \approx 1.947$.

Example 3, continued …
[Figure: plot for the shoe-size regression, with no "jitter" added.]
Are there any problems visible in this plot? ___________

Example 3, continued …
Least squares regression equation: Height = 50.52 + 1.872 Shoe
r-square = 79.1%, Std. error = 1.947 inches
These are the two summary measures that should always be given with the equation.

Example 3, continued …
Method 2: Stat > Regression > Fitted Line Plot …   (fits Y = a + bX)
This program gives a scatterplot with the regression line superimposed on it.

Example 3, continued …
Regression Plot
Height = 50.5230 + 1.87241 Shoe Size
S = 1.94659   R-Sq = 79.1 %   R-Sq(adj) = 79.0 %
[Figure: fitted line plot of Height (60 to 80) versus Shoe Size (5 to 15), annotated "The fit looks …"]

Example 3, continued …
Look again at the regression output for Height versus Shoe Size. What information do the T, F, and P values provide?

How do you determine if the X-variable is a useful predictor?
1. Use the "t-statistic" or the F-statistic.
   • "t" measures how many standard errors the estimated coefficient is from zero: t = Coef / SE Coef.
   • "F" = t² for simple regression.
2. A "P-value" is associated with "t" and "F". The further "t" is from zero, in either direction (equivalently, the larger "F" is), the smaller the corresponding P-value will be.
   • P-value: informally, a measure of how plausible it is that the true coefficient IS ZERO.
   • More precisely, it is the probability of getting a "t" at least this far from zero if the true coefficient really were zero.
3. If the P-value IS SMALL (typically < 0.10), then conclude:
   1. It is unlikely that the true coefficient is really zero; therefore
   2. The X-variable IS a useful predictor for the Y-variable. Keep the variable!
   If the P-value is NOT SMALL (i.e., > 0.10), then conclude:
   1. For all practical purposes, the true coefficient MAY BE ZERO; therefore
   2. The X-variable IS NOT a useful predictor of the Y-variable. Don't use it.
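The three steps above can be sketched for the Shoe Size row of the output. Assumption: the P-value here comes from the standard normal distribution instead of Minitab's t distribution with 255 error degrees of freedom. With that many degrees of freedom the two are very close, but this is an approximation, not Minitab's exact calculation. The helper names are hypothetical, and the 0.10 cutoff is the one suggested on the slide.

```python
import math


def t_statistic(coef, se_coef):
    """How many standard errors the estimated coefficient lies from zero."""
    return coef / se_coef


def approx_two_sided_p(t):
    """Two-sided P-value from the standard normal curve (approximation to the t test)."""
    return math.erfc(abs(t) / math.sqrt(2))


def useful_predictor(p_value, cutoff=0.10):
    """Decision rule from the slide: keep the X-variable when its P-value is small."""
    return p_value < cutoff


# Shoe Size row of the Minitab output: Coef = 1.87241, SE Coef = 0.06033
t_shoe = t_statistic(1.87241, 0.06033)
p_shoe = approx_two_sided_p(t_shoe)

print(f"t = {t_shoe:.2f}, t^2 = {t_shoe ** 2:.1f}")   # compare T = 31.04, F = 963.26
print(f"approximate P-value = {p_shoe:.3g}")
print("Keep Shoe Size as a predictor:", useful_predictor(p_shoe))

# A hypothetical, non-significant case for contrast: t = 1.2
p_weak = approx_two_sided_p(1.2)
print(f"t = 1.2 gives P = {p_weak:.3f}; keep?", useful_predictor(p_weak))
```

The small gap between t² (about 963.2) and the printed F = 963.26 comes from rounding in the printed Coef and SE Coef.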
Example 3, continued … Can height be predicted using shoe size?
Could "shoe size" have a true coefficient that is actually zero?
From the output, Shoe Size has Coef = 1.87241, SE Coef = 0.06033, T = 31.04, P = 0.000. In the Analysis of Variance table, F = 963.26 with P = 0.000.
The P-value for Shoe Size IS SMALL (< 0.10).
Conclusion: It is very unlikely that the true "shoe size" coefficient is zero! "Shoe size" IS a useful predictor of the mean of "height".

The logic just explained is statistical inference. This will be covered in more detail during the last three weeks of the course.