ST 260, M23: Residuals & Minitab
Department of ISM, University of Alabama, 1992-2003

Residuals

Lesson Objectives
- Continue to build on regression analysis.
- Learn how residual plots help identify problems with the analysis.

A continuation of regression analysis

Example 1, continued ...
Sample of n = 5 students; Y = weight in pounds, X = height in inches.

Case   X (height)   Y (weight)
 1        73           175
 2        68           158
 3        67           140
 4        72           207
 5        62           115

Prediction equation: predicted Wt = -332.73 + 7.189 Ht
r-square = ?   Std. error = ?   (To be found later.)

Example 1, continued
[Scatterplot: WEIGHT (100 to 220) versus HEIGHT (60 to 76), with the fitted line Ŷ = -332.7 + 7.189X.]
Residuals = distance from point to line, measured parallel to the Y-axis.

Example 1, continued
Calculation: for each case, residual = observed value - estimated mean.
For the i-th case, e_i = y_i - ŷ_i.
Compute the fitted value and residual for the 4th person in the sample; i.e., X = 72 inches, Y = 207 lbs.
fitted value = ŷ_4 = -332.73 + 7.189( ___ ) = _________
residual = e_4 = y_4 - ŷ_4 = _________ = _________

Residual Plots

Example 1, continued
[Scatterplot: WEIGHT versus HEIGHT with the fitted line Ŷ = -332.7 + 7.189X; the residual for the 4th case is marked.]
e_4 is the residual for the 4th case, = +22.12.

Example 1, continued
[Residual plot: residuals (-24 to 24) versus HEIGHT (60 to 76).]
The regression line from the previous plot is rotated to horizontal.
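As a rough check on the hand calculation, the fitted values and residuals for all five cases in Example 1 can be computed directly from the prediction equation. This is a minimal sketch in Python (the course itself uses Minitab); the function name `fitted_and_residuals` is hypothetical, and the coefficients are the rounded values given on the slide.

```python
# Example 1 data from the slides: X = height (in), Y = weight (lb).
heights = [73, 68, 67, 72, 62]
weights = [175, 158, 140, 207, 115]

# Rounded coefficients of the prediction equation given on the slide.
INTERCEPT = -332.73
SLOPE = 7.189


def fitted_and_residuals(xs, ys, intercept=INTERCEPT, slope=SLOPE):
    """Return (fitted, residuals), where residual = observed - fitted."""
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return fitted, residuals


fitted, residuals = fitted_and_residuals(heights, weights)
for case, (x, y, f, e) in enumerate(zip(heights, weights, fitted, residuals), start=1):
    print(f"case {case}: X={x}  Y={y}  fitted={f:.2f}  residual={e:+.2f}")
```

For the 4th case this gives a fitted value of about 184.88 and a residual of about +22.12, the value marked on the residual plot.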
Residual Plot
Scatterplot of residuals versus the predicted means of Y (Ŷ), or an X-variable, or time.
Expect random dispersion around a horizontal line at zero.
Problems occur if there are:
- Unusual patterns
- Unusual cases

Residuals versus X
[Residual plot: residuals versus X, or time.]
Good random pattern.

Residuals versus X
[Residual plot: residuals versus X, or time.]
Outliers?
Next step: ________ to determine if a recording error has occurred.

Residuals versus X
[Residual plot: residuals versus X, or time.]
Nonlinear relationship.
Next step: Add a "quadratic term," or use "______."

Residuals versus X
[Residual plot: residuals versus X, or time.]
Variance is increasing.
Next step: Stabilize variance by using "________."

Residual plots help identify
Unusual patterns:
- Possible curvature in the data.
- Variances that are not constant as X changes.
Unusual cases:
- Outliers
- High leverage cases
- Influential cases

Three properties of residuals, illustrated with some computations.
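The three properties of least squares residuals (they sum to zero, the least squares line has a smaller sum of squared residuals than any other straight line, and the line passes through the point of means) can be checked numerically for the Example 1 data. The sketch below is one way to do it in plain Python, assuming the usual closed-form formulas (slope = Sxy / Sxx, intercept = ȳ - slope × x̄); the helper names `least_squares` and `sse` are hypothetical and not part of the course materials. The check of the second property only spot-checks a handful of other lines; it is not a proof.

```python
heights = [73, 68, 67, 72, 62]       # X, inches
weights = [175, 158, 140, 207, 115]  # Y, pounds


def least_squares(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(heights, weights)
x_bar = sum(heights) / len(heights)
y_bar = sum(weights) / len(weights)

# Property 1: residuals sum to zero (up to floating-point error).
residual_sum = sum(y - (a + b * x) for x, y in zip(heights, weights))

# Property 2 (spot check): nudging the intercept or slope increases the SSE.
best_sse = sse(heights, weights, a, b)
other_sses = [
    sse(heights, weights, a + da, b + db)
    for da in (-5.0, 0.0, 5.0)
    for db in (-0.5, 0.0, 0.5)
    if (da, db) != (0.0, 0.0)
]

# Property 3: the line passes through (x_bar, y_bar).
y_at_x_bar = a + b * x_bar

print(f"intercept = {a:.2f}, slope = {b:.3f}")
print(f"sum of residuals = {residual_sum:.6f}")
print(f"SSE = {best_sse:.2f}; smallest SSE among the other lines = {min(other_sses):.2f}")
print(f"line at x_bar = {x_bar}: {y_at_x_bar:.2f} (y_bar = {y_bar})")
```

Because this uses unrounded coefficients, its SSE (about 868.04) differs slightly from a hand calculation based on residuals rounded to two decimals.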
Property 1.
Y = Weight, X = Height;  Ŷ = -332.73 + 7.189 X
Find the sum of the residuals.

 X     Y       Ŷ      e = Y - Ŷ
73    175   192.07     -17.07
68    158   156.12       1.88
67    140   148.93      -8.93
72    207   184.88      22.12
62    115   112.99       2.01
               Sum:      0.01   (round-off error)

Properties of Least Squares Line
1. Residuals always sum to zero: Σ e_i = 0.

Property 2.
Find the sum of squares of the residuals.

e = Y - Ŷ      e^2
 -17.07      291.38
   1.88        3.53
  -8.93       79.74
  22.12      489.29
   2.01        4.04
Sum: 0.01    867.98

Σ e_i^2 = SSE = 867.98 < "SSE for any other line".

Properties of Least Squares Line
1. Residuals always sum to zero.
2. This "least squares" line produces a smaller "sum of squared residuals" than any other straight line can.

Property 3.
[Scatterplot: WEIGHT (100 to 220) versus HEIGHT (60 to 76) with the fitted line; the point of means X̄ = 68.4, Ȳ = 159 is marked.]

Properties of Least Squares Line
1. Residuals always sum to zero.
2. This "least squares" line produces a smaller "sum of squared residuals" than any other straight line can.
3. The line always passes through the point (x̄, ȳ).

Illustration of unusual cases:
- Outliers
- Leverage
- Influential

[Plot of Y versus X; one point labeled "outlier."]
"Unusual point" does not follow the pattern. It's near the X-mean; the entire line is pulled toward it.
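The effect of an unusual point near the X-mean can be illustrated with made-up numbers. In this sketch the data are hypothetical (ten points lying exactly on y = 2 + 3x, not taken from the course), and `least_squares` is an illustrative helper. One extra point is added exactly at the X-mean, 20 units above the pattern.

```python
def least_squares(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: 10 points exactly on the line y = 2 + 3x.
xs = list(range(1, 11))
ys = [2 + 3 * x for x in xs]
a0, b0 = least_squares(xs, ys)

# Add one unusual point at the X-mean (5.5), 20 units above the pattern.
x_mean = sum(xs) / len(xs)
xs_out = xs + [x_mean]
ys_out = ys + [2 + 3 * x_mean + 20]
a1, b1 = least_squares(xs_out, ys_out)

print(f"without outlier: intercept = {a0:.3f}, slope = {b0:.3f}")
print(f"with outlier:    intercept = {a1:.3f}, slope = {b1:.3f}")
```

Here the slope stays at 3 while the intercept rises by 20/11 (about 1.82): the whole line shifts toward the unusual point without tilting, because a point at the X-mean contributes nothing to the slope calculation.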
Illustration of unusual cases, continued

[Plot of Y versus X; one point labeled "outlier."]
"Unusual point" does not follow the pattern. The line is pulled down and twisted slightly.

[Plot of Y versus X; one point labeled "High leverage."]
"Unusual point" is far from the X-mean, but still follows the pattern.

[Plot of Y versus X; one point labeled "leverage & outlier, influential."]
"Unusual point" is far from the X-mean, but does not follow the pattern. Line really twists!

Definitions:
Outlier: An unusual y-value relative to the pattern of the other cases. Usually has a large residual.
High Leverage Case: An extreme X value relative to the other X values.
Influential Case: Has an unusually large effect on the slope of the least squares line.

Conclusion:
High leverage -> potentially influential.
High leverage & outlier -> influential!!

Why do we care about identifying unusual cases?
The least squares regression line is not resistant to unusual cases.

Regression Analysis in Minitab

Example 3, continued ...
Can height be predicted using shoe size? Step 1?

Lesson Objectives
- Learn two ways to use Minitab to run a regression analysis.
- Learn how to read output from Minitab.

Step 1: DTDP

Example 3, continued ...
Can height be predicted using shoe size?
Minitab menu: Graph > Plot ...
[Scatterplot: Height (56 to 84) versus Shoe Size (5 to 15), with Female and Male plotted as separate groups. "Jitter" added in X-direction.]
The scatter for each subpopulation is about the same; i.e., there is "constant variance."

Example 3, continued ...
Method 1
Minitab menu: Stat > Regression > Regression ...
Fits Y = a + bX.

Example 3, continued ...
Copied from "Session Window."

Regression Analysis: Height versus Shoe Size

The regression equation is
Height = 50.5 + 1.87 Shoe Size

Predictor       Coef    SE Coef        T        P
Constant     50.5230     0.5912    85.45    0.000
Shoe Siz     1.87241    0.06033    31.04    0.000

S = 1.947    R-Sq = 79.1%    R-Sq(adj) = 79.0%

Analysis of Variance
Source         DF        SS        MS         F        P
Regression      1    3650.0    3650.0    963.26    0.000
Error         255     966.3       3.8
Total         256    4616.3

Reading the output: the Coef column holds the least squares estimated coefficients.
Reading the output, continued: in the Analysis of Variance table, total "Degrees of Freedom" = number of cases - 1 (here 256 = 257 - 1).

Example 3, continued ...
R-Sq = SSR / TSS = 3650.0 / 4616.3 = 79.1%
(SSR is the Regression SS; TSS is the Total SS.)

Example 3, continued ...
Standard Error of Regression: a measure of variation around the regression line.
S = sqrt(MSE) = sqrt(3.8) ≈ 1.95 (Minitab reports S = 1.947, computed from the unrounded MSE).
In the Analysis of Variance table, the Error SS (966.3) is the sum of squared residuals, and the Error MS (3.8) is the Mean Squared Error, MSE.

Example 3, continued ...
Residuals Versus Shoe Siz (response is Height)
[Residual plot: Residual (-5 to 5) versus Shoe Siz (5 to 15). No "jitter" added.]
Are there any problems visible in this plot? ___________
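The summary measures in the Minitab output can be reproduced from the Analysis of Variance table. This is a minimal sketch in Python that uses only the rounded SS and DF values printed in the output, so the results agree with Minitab only up to rounding; the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table (Height versus Shoe Size).
ss_regression = 3650.0   # SSR
ss_error = 966.3         # SSE, the sum of squared residuals
ss_total = 4616.3        # TSS
df_error = 255
df_total = 256

n_cases = df_total + 1             # total DF = number of cases - 1
r_sq = ss_regression / ss_total    # R-Sq = SSR / TSS
mse = ss_error / df_error          # Mean Squared Error
s = math.sqrt(mse)                 # standard error of regression

print(f"n = {n_cases}")
print(f"R-Sq = {100 * r_sq:.1f}%")
print(f"MSE = {mse:.1f}")
print(f"S = {s:.3f}")
```

Computing S from SSE / DF rather than from the rounded MSE of 3.8 is what recovers the reported 1.947.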
Example 3, continued ...
Can height be predicted using shoe size?
Least squares regression equation:
    Height = 50.52 + 1.872 Shoe
    r-square = 79.1%, Std. error = 1.947 inches
These are the two summary measures that should always be given with the equation.

Example 3, continued ...
Method 2
Minitab menu: Stat > Regression > Fitted Line Plot ...
Fits Y = a + bX. This program gives a scatterplot with the regression line superimposed on it.

Example 3, continued ...
Regression Plot
Height = 50.5230 + 1.87241 Shoe Size
S = 1.94659    R-Sq = 79.1 %    R-Sq(adj) = 79.0 %
[Fitted line plot: Height (60 to 80) versus Shoe Size (5 to 15).]
The fit looks ________.

Example 3, continued ...
Regression Analysis: Height versus Shoe Size (the same Minitab output as in Method 1)
What information do the values in this output provide?

How do you determine if the X-variable is a useful predictor? (part 1)
Use the "t-statistic" or the F-stat.
"t" measures how many standard errors the estimated coefficient is from "zero."
"F" = t^2 for simple regression.
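To make the two statistics concrete, the sketch below recomputes them from the printed Minitab values for Example 3 (Coef and SE Coef for Shoe Size). It is an illustrative Python calculation, not Minitab's own procedure, and the small difference from the printed F comes from rounding in the output.

```python
# Shoe Size row of the Minitab coefficient table.
coef = 1.87241
se_coef = 0.06033
f_reported = 963.26   # F from the Analysis of Variance table

# t = number of standard errors the estimated coefficient lies from zero.
t_stat = coef / se_coef

# In simple regression (one X-variable), F equals t squared.
f_from_t = t_stat ** 2

print(f"t = {t_stat:.2f}")
print(f"t^2 = {f_from_t:.1f} (Minitab F = {f_reported})")
```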
How do you determine if the X-variable is a useful predictor? (part 2)
A "P-value" is associated with "t" and "F".
The further "t" and "F" are from zero, in either direction, the smaller the corresponding P-value will be.
P-value: a measure of the "likelihood that the true coefficient IS ZERO." More precisely, it is the probability of getting a "t" or "F" at least this far from zero if the true coefficient really were zero.

How do you determine if the X-variable is a useful predictor? (part 3)
If the P-value IS SMALL (typically "< 0.10"), then conclude:
1. It is unlikely that the true coefficient is really zero, and therefore,
2. The X variable IS a useful predictor for the Y variable. Keep the variable!
If the P-value is NOT SMALL (i.e., "> 0.10"), then conclude:
1. For all practical purposes the true coefficient MAY BE ZERO; therefore
2. The X variable IS NOT a useful predictor of the Y variable. Don't use it.

Example 3, continued ...
Can height be predicted using shoe size?
Could "shoe size" have a true coefficient that is actually "zero"?
From the Minitab output (Regression Analysis: Height versus Shoe Size), the Shoe Siz row reads: Coef = 1.87241, SE Coef = 0.06033, T = 31.04, P = 0.000.
"t" measures how many standard errors the estimated coefficient is from "zero."
The P-value for Shoe Size IS SMALL (< 0.10).
Conclusion: The "shoe size" coefficient is NOT zero! "Shoe size" IS a useful predictor of the mean of "height".
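The decision rule can be sketched in code. The example below approximates the two-sided P-value for the Shoe Size t-statistic with a standard normal distribution. That is an assumption made here for simplicity: Minitab works with a t distribution on the 255 error degrees of freedom, and with that many degrees of freedom the two are close. The function names are hypothetical, and the second t-value is made up for contrast.

```python
import math


def two_sided_p_normal(t_stat):
    """Approximate two-sided P-value, treating t as standard normal.

    P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


def is_useful_predictor(p_value, cutoff=0.10):
    """Decision rule from the slides: a small P-value means keep the variable."""
    return p_value < cutoff


p_shoe = two_sided_p_normal(31.04)   # T for Shoe Size in the Minitab output
print(f"P-value below 0.0005 (prints as 0.000): {p_shoe < 0.0005}")
print(f"Shoe size is a useful predictor: {is_useful_predictor(p_shoe)}")

# For comparison, a hypothetical t of 1.2 is not far enough from zero.
p_weak = two_sided_p_normal(1.2)
print(f"t = 1.2 gives P = {p_weak:.3f}; useful predictor: {is_useful_predictor(p_weak)}")
```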
The logic just explained is statistical inference. This will be covered in more detail during the last three weeks of the course.