1 Chapter 3 Diagnostics and Remedial Measures 3.2, 3.3 Diagnostics for Residuals The residuals are defined as ei yi yˆ i yi b0 b1 xi1 bp 1 xip 1 , i 1,2,, n. If the least square estimate is a sensible estimate, e i can be regarded as the “estimate” of i , ei 0 1 xi1 p1 xip1 i b0 b1 xi1 b p1 xip1 0 b0 1 b1 xi1 p1 b p1 xip1 i i Thus, the residuals should exhibit tendencies that tend to confirm the assumptions about i , such as independence, having zero mean, having a constant variance 2 , and following a normal distribution. On the other hand, as the residuals do not exhibit tendencies, this might imply the assumptions might be violated. Usually, we can plot the residuals and then examine the tendencies. The principal ways of plotting residuals e i are I. To check for non-normality. II. To check for time effects if the time order of the data is known. III. To check for nonconstant variance. IV. To check for curvature of higher order than fitted in the X’s I. Non-Normality Checks on Residuals: A simple histogram or a stem and leaf plot of the residuals can be used to check the normality. If these residuals are bell-shaped distributed, this might imply the normality assumption might not be violated. 0 5 10 15 Histogram: -2 -1 0 Res idual s 1 2 3 2 Stem-and leaf plot: -2 : 2 -1 : 9755 -1 : 33322111110 -0 : 9988887776665555 -0 : 44444333333332210 0 : 011111222222234444 0 : 5666667777888999 1 : 011223 1 : 5556788 2 : 01 2 : 89 0 -2 -1 Residuals 1 2 3 The better method to check normality is the normal probability plot. The residuals and their corresponding normal scores are plotted. If the plot look straight, then the normality assumption might not be violated. -2 -1 0 1 Quantiles of Standard Normal II, III, IV: The 3 properties can be checked by plotting the residuals versus (a) fitted value ŷi (b) the covariate x ij The typical satisfactory plots are as follows: 2 0 -3 -2 -1 Residuals 1 2 3 3 0 20 40 60 Three typical unsatisfactory plots are as follows: (i) fitted value or covariate value (ii) 80 100 4 fitted value or covariate value (iii) fitted value or covariate value (a) versus fitted value ŷi : Why ŷi , but not yi : rey correlatio n coefficien t of e and y 1 R 2 1/ 2 reyˆ 0 Therefore, ei ' s and y i ' s are correlated. If we plot ei ' s versus y i ' s , then a linear 5 trend might be due to the positive correlation of ei ' s and y i ' s rather than the violation of the assumptions about the random error. On the other hand, since ei ' s and yˆ i ' s are uncorrelated, a unsatisfactory residual plot might do imply the violation of the assumptions. reyˆ 0 : Derivation of Let n e e i 1 n yˆ i ˆ y and n i i 1 . n Since n n e yˆ yˆ e i i 1 i i 1 0 and X t e X t y yˆ X t y Xb X t y X t Xb 1 Xty XtX XtX Xty Xty Xty 0 then seYˆ ei e yˆ i yˆ ei yˆ i yˆ ei yˆ i since i 1 i 1 i 1 t yˆ t e Xb e bt X t e 0 since X t e 0 n n n ˆ e y 0 i i 1 n Thus, n reyˆ Derivation of Sicne s seyˆ s ee yˆyˆ 1/ 2 e i 1 i e yˆ i yˆ 2 2 ˆ ˆ e e y y i i i 1 i 1 n rey 1 R 2 n 1/ 2 : 1/ 2 0 6 S ey ei e yi y ei yi y ei yi since i 1 i 1 i 1 n n ei yi yˆ i since i 1 n n n n e yˆ i i i 1 n n e y y e 0 i 1 i i 1 i 0 yi yˆ i 2 i 1 thus, n reY S S ey S yy 1/ 2 ee n y i 1 1/ 2 e i i 1 n n 2 2 e e y y i i i 1 i 1 i n yˆ i n 2 ˆ y y i i i 1 n 2 y y i i 1 n 1/ 2 y 1/ 2 i 1 2 yˆ i 2 i 1/ 2 n n 2 2 ˆ y y y y i i i i 1 i 1 n 2 ˆ y y i i 1 1 n 2 yi y i 1 yi y i 1 1/ 2 2 n 2 n 2 ei yi y i 1 i 1 since e yi y 1/ 2 1/ 2 1 R2 1/ 2 2 yi yˆi yˆi y i 1 i 1 n 2 n The residual plot (i): nonconstant variance. Remedy: use weighted least square or transformation of yi . Note: Transformation of the response yi can stabilize the variance (make 7 the variance of the transformed yi constant). The residual plot (ii): errors in analysis, some extra terms highly correlated to the other terms are missing or wrongful omission of 0 . [Justification:] Suppose the postulated model is yi 0 1 xi1 2 xi 2 p1 xip1 i and the true model is Yi 0 1 xi1 2 xi 2 p1 xip1 p xip i , i 1,2,, n. Then, yˆ i b0 b1 xi1 b p1 xip1 and ei p xip i . If p 0 and xip is correlated to ŷi (since xip is correlated to some covariates in the postulated model), then the residual plot will be like (ii). Remedy: rechecking the analysis process, adding some extra terms to the model or adding 0 back to the model. The residual plot (iii): some extra terms (might be second order terms of existing variables or some variables not in the model) are missing or unequal variance Remedy: adding extra terms to the model or transformation of yi . 8 (b) versus the covariate x ij : The residual plot (i): nonconstant variance. Remedy: use weighted least square or transformation of yi . The residual plot (ii): errors in analysis or some extra terms highly correlated to X j are missing. Suppose the true model yi 0 1 xi1 p1 xip1 p zi i , and the covariate X j is correlated to Z. Thus, heuristically, ei p zi i . As p 0 and X j is positively correlated to Z, the residual plot would look like (ii). Remedy: rechecking the analysis process or adding some extra terms to the model. The residual plot (iii): some extra terms are missing or the variance of yi are not equal. Remedy: adding extra terms (second order terms of the existing variables or the terms not in the original model) to the model or transformation of Yi 3.9 Transformation Box-Cox Transformation: Suppose the tentatively used model is y X . Two transformations can be used. The first transformation is 9 Wi ( ) ( yi 1) , 0 log( yi ), 0 , i 1,2,, n. The above transformation is also called Box-Cox transformation. Wi ( ) lim Note: lim 0 0 yi 1 log( yi ) . The other transformation is to scale Wi ( ) by Vi ( ) ( yi 1) y 1 y log( yi ), , 0 0 where 1/ n y n n y1 y2 yn yi i 1 is the geometric mean of y1 , y 2 ,, y n . The model based on the transformed response is W1 ( ) W ( ) X W ( ) 2 Wn ( ) or V1 ( ) V ( ) X V ( ) 2 . Vn ( ) Note: V transformation is usually preferred!! 11.1 Weighted Least Squares Suppose the linear regression model is , 10 yi 0 1 xi1 2 xi 2 p1 xip1 i , i ~ N (0, i2 ), i 1,, n. Let wi 1 i2 and w1 0 W 0 0 0 . wn 0 w2 0 To estimate , we need to minimize the weighted least square criterion, S w S w 0 , 1 ,, p 1 wi yi 0 1 xi1 p 1 xip1 n 2 i 1 y X W y X t The weighted least square estimate is b X tWX 1 X tWY