ch3.3

1. Model diagnostics: The following plots and quantities can be used to assess the fit and to examine the assumptions made in the model fitting process.

(a) Residuals versus fitted values, square root of absolute residuals versus fitted values, and normal quantile plot of residuals:

Let $e_i = Y_i - \hat{Y}_i = Y_i - x_i b$ be the residual for the $i$'th observation, $i = 1, 2, \ldots, n$, and let $\hat{Y}_i = x_i b$ be the fitted value. Since $\epsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2)$ and $e_i$ is the "estimate" of $\epsilon_i$, the behavior of the $e_i$ reflects the likely behavior of the $\epsilon_i$. A plot of the $e_i$ helps to identify outliers and to reveal structure in the residuals. If the residuals scatter randomly, with no trend and roughly constant spread, the plot gives no evidence against the assumptions of independence and equal variance. The reason to plot $e_i$ versus $\hat{Y}_i$, not versus $Y_i$, is that $\mathrm{Cov}(e_i, \hat{Y}_i) = 0$ but $\mathrm{Cov}(e_i, Y_i) \neq 0$, so a plot against $Y_i$ would show a trend even when the model is adequate.

Example:

    > plot(ozonelm3)               # page 1: e_i versus Yhat_i
                                   # page 2: sqrt(|e_i|) versus Yhat_i
    > resid <- ozonelm3$residuals  # e_i
    > par(mfrow=c(1,2))
    > plot(yhat4, resid)           # e_i versus Yhat_i
    > plot(ozonelm3, ask=T)
    Selection: 2
    Selection: 0

The normal quantile plot of the residuals provides a visual check of the assumption that the $\epsilon_i$ are normally distributed. If the points in the quantile-quantile plot lie close to a straight line, the plot is consistent with normally distributed errors.

    > plot(ozonelm3, ask=T)
    Selection: 5

(b) $Y_i$ versus $\hat{Y}_i$: This plot shows how well the model has captured the broad outlines of the data.

    > plot(ozonelm3, ask=T)
    Selection: 4

(c) Outliers and influential observations:

• Outliers: Two diagnostics are commonly used to identify outliers.

(i) Internally studentized residuals:

$$t_i = \frac{e_i}{\mathrm{s.e.}(e_i)} = \frac{e_i}{s\sqrt{1 - p_{ii}}},$$

where $p_{ii}$ is the $i$'th diagonal element of $P = X(X^t X)^{-1} X^t$. Note: $\mathrm{Var}(e_i) = (1 - p_{ii})\sigma^2$.
(ii) Externally studentized residuals:

$$t_i^* = \frac{e_i}{s_{(i)}\sqrt{1 - p_{ii}}} = \frac{t_i}{\left[ \dfrac{n - p - t_i^2}{n - p - 1} \right]^{1/2}},$$

where $s_{(i)}^2$ is the residual mean square from fitting the linear regression with the $i$'th observation deleted.

Example:

    > help(lm.influence)
    > lminflu <- lm.influence(ozonelm3)
    > hat <- lminflu$hat                 # p_ii
    > resid <- ozonelm3$residuals        # e_i
    > yhat4 <- ozonelm3$fitted.values    # Yhat
    > s2 <- sum((ozone-yhat4)^2)/107     # s^2, with n - p = 111 - 4 = 107
    > itresid <- resid/sqrt(s2*(1-hat))  # t_i = e_i / sqrt(s^2 (1 - p_ii))
    > itresid
    > plot(itresid)
    > etresid <- itresid/sqrt((107-itresid^2)/106)  # t_i* = t_i / [(n-p-t_i^2)/(n-p-1)]^(1/2)
    > etresid
    > plot(etresid)
    > si <- lminflu$sigma                # s_(i)
    > etresid2 <- resid/(si*sqrt(1-hat)) # t_i* = e_i / (s_(i) sqrt(1 - p_ii))
    > etresid-etresid2                   # differences should be zero up to rounding

• Influential observations: The commonly used diagnostic for identifying influential observations is Cook's distance,

$$C_i = \frac{(b - b_{(i)})^t X^t X (b - b_{(i)})}{p s^2} = \frac{\| Xb - Xb_{(i)} \|^2}{p s^2} = \frac{(\hat{Y} - \hat{Y}_{(i)})^t (\hat{Y} - \hat{Y}_{(i)})}{p s^2} = \frac{\| \hat{Y} - \hat{Y}_{(i)} \|^2}{p s^2},$$

where $b_{(i)}$ is the least squares estimate with observation $i$ deleted and $\hat{Y}_{(i)} = X b_{(i)}$ is the vector of fitted values without contribution from observation $i$.

Example:

    > bi <- lminflu$coefficients   # n x p matrix whose i'th row is b_(i)^t
    > bi[5,]
    > dellm5 <- lm(ozone[-5]~radi[-5]+temper[-5]+wind[-5])
    > dellm5$coefficients          # compare with bi[5,]
    > x <- cbind(1,air[,2:4])      # X
    > b <- ozonelm3$coefficients   # b = (b0, b1, b2, b3)^t
    > cook <- rep(0,111)
    > for (j in 1:111){
    +   cook[j] <- sum(((b-bi[j,])%*%t(x))^2)/(4*s2)
    + }
    > par(mfrow=c(1,3))
    > plot(cook)
    > plot(ozonelm3, ask=T)
    Selection: 7

Note: a simpler formula for Cook's distance that avoids the loop is

$$C_i = \frac{t_i^2}{p} \cdot \frac{p_{ii}}{1 - p_{ii}}.$$

Example (continued):

    > cook2 <- (itresid^2/4)*(hat/(1-hat))
    > plot(cook2)
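Example (sketch): The formulas for the studentized residuals and Cook's distance can be checked against the diagnostics built into current R. This is a minimal sketch under stated assumptions, not the notes' own session: R's built-in `airquality` data stands in for the `air` data, and the names `aq`, `fit`, `t_int`, `t_ext`, `cook_short` and `cook_del` are illustrative. The console transcripts in these notes appear to follow S-PLUS conventions; in R, `lm.influence(...)$coefficients` holds the changes in the coefficients when a case is dropped, not the deleted estimates $b_{(i)}$, so the sketch refits the model without observation $i$ instead.

```r
# Assumption: built-in `airquality` stands in for the notes' `air` data;
# `fit` plays the role of ozonelm3 but is not the same fitted object.
aq  <- na.omit(airquality[, c("Ozone", "Solar.R", "Temp", "Wind")])
fit <- lm(Ozone ~ Solar.R + Temp + Wind, data = aq)

n  <- nrow(aq)              # number of observations
p  <- length(coef(fit))     # number of coefficients, intercept included
e  <- residuals(fit)        # e_i
h  <- hatvalues(fit)        # p_ii, diagonal of the hat matrix P
s2 <- sum(e^2) / (n - p)    # s^2, residual mean square

# Internally and externally studentized residuals from the formulas
t_int <- e / sqrt(s2 * (1 - h))
t_ext <- t_int / sqrt((n - p - t_int^2) / (n - p - 1))

# Cook's distance: shortcut formula, then the deletion-based definition
cook_short <- (t_int^2 / p) * (h / (1 - h))
X <- model.matrix(fit)
cook_del <- sapply(seq_len(n), function(i) {
  # b_(i): least squares estimate with observation i deleted
  b_i <- coef(lm(Ozone ~ Solar.R + Temp + Wind, data = aq[-i, ]))
  sum((X %*% (coef(fit) - b_i))^2) / (p * s2)
})

# Compare with R's built-in diagnostics; each comparison should give TRUE
all.equal(unname(t_int), unname(rstandard(fit)))
all.equal(unname(t_ext), unname(rstudent(fit)))
all.equal(unname(cook_short), unname(cooks.distance(fit)))
all.equal(unname(cook_short), cook_del)
```

In everyday use the built-in functions `rstandard`, `rstudent`, `cooks.distance` and `hatvalues` are the shorter route. The hand computation is shown only to tie each quantity back to its formula.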
