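Statistical Tests: Two methods can be used to test for serial correlation:

(a) Durbin-Watson test: Let $\rho_s$ be the correlation between $\varepsilon_t$ and $\varepsilon_{t-s}$. For example, for $s = 1$,

$$\rho_1 = \frac{\operatorname{cov}(\varepsilon_t, \varepsilon_{t-1})}{\operatorname{Var}(\varepsilon_t)}$$

is the correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$. We want to determine whether the errors are correlated, that is, to test

$$H_0: \rho_s = 0,\ s = 1, 2, \ldots \qquad \text{vs} \qquad H_1: \rho_s = \rho^s,\ \rho \neq 0,\ |\rho| < 1.$$

Note: an equivalent formulation of these hypotheses is $H_0$: the $\varepsilon_t$'s are uncorrelated vs $H_1$: $\varepsilon_t = \rho\,\varepsilon_{t-1} + z_t$, where $z_t \sim N(0, \sigma^2)$ and $z_t$ is independent of $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ and of $z_{t-1}, z_{t-2}, \ldots$. That is, $H_0$: the $\varepsilon_t$'s are uncorrelated vs $H_1$: the $\varepsilon_t$'s follow an autoregressive process with lag 1.

Durbin-Watson statistic: The Durbin-Watson statistic for testing $H_0: \rho_s = 0,\ s = 1, 2, \ldots$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0,\ |\rho| < 1$ is

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2},$$

where $e_1, \ldots, e_n$ are the residuals.

Properties of the Durbin-Watson statistic:

$0 \le d \le 4$: Since $(e_t - e_{t-1})^2 \le 2e_t^2 + 2e_{t-1}^2$,

$$\sum_{t=2}^{n} (e_t - e_{t-1})^2 \le 2\sum_{t=2}^{n} e_t^2 + 2\sum_{t=2}^{n} e_{t-1}^2 = \Big(2\sum_{t=1}^{n} e_t^2 - 2e_1^2\Big) + \Big(2\sum_{t=1}^{n} e_t^2 - 2e_n^2\Big) = 4\sum_{t=1}^{n} e_t^2 - 2\left(e_1^2 + e_n^2\right).$$

Therefore,

$$0 \le d \le \frac{4\sum_{t=1}^{n} e_t^2 - 2\left(e_1^2 + e_n^2\right)}{\sum_{t=1}^{n} e_t^2} = 4 - \frac{2\left(e_1^2 + e_n^2\right)}{\sum_{t=1}^{n} e_t^2} \le 4.$$

$\rho \approx 1$ (very strong positive correlation): $d \approx 0$. $\rho \approx -1$ (very strong negative correlation): $d \approx 4$. $\rho \approx 0$ (no correlation): $d \approx 2$.

[Heuristic Justification:] Expanding the square in the numerator gives

$$\sum_{t=2}^{n} (e_t - e_{t-1})^2 = \sum_{t=2}^{n} e_t^2 + \sum_{t=1}^{n-1} e_t^2 - 2\sum_{t=2}^{n} e_t e_{t-1},$$

so that, with $r_1 = \sum_{t=2}^{n} e_t e_{t-1} \big/ \sum_{t=1}^{n} e_t^2$ the lag-1 sample autocorrelation of the residuals,

$$d = 2(1 - r_1) - \frac{e_1^2 + e_n^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - r_1).$$

Suppose $e_t \approx \varepsilon_t$ and $\varepsilon_t = \rho\,\varepsilon_{t-1} + z_t$. Then $r_1$ estimates $\rho$ and the end term $(e_1^2 + e_n^2)/\sum_{t=1}^{n} e_t^2$ is negligible for large $n$, so $d \approx 2(1 - \rho)$. Hence $d$ is near 0, 2 or 4 according as $\rho$ is near 1, 0 or $-1$.

Primary Durbin-Watson Test:

(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$:
  if $d < d_L$, reject $H_0$ at level $\alpha$;
  if $d > d_U$, do not reject $H_0$;
  if $d_L \le d \le d_U$, no conclusion,
where $d_L$ and $d_U$ are critical values which can be found in Table 7.1 (Draper & Smith, pp. 184-192). Note: in Table 7.1, $n$ = sample size and $k$ = number of covariates.

(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$:
  if $4 - d < d_L$, reject $H_0$ at level $\alpha$;
  if $4 - d > d_U$, do not reject $H_0$;
  if $d_L \le 4 - d \le d_U$, no conclusion.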
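The statistic $d$, the upper bound from the derivation, the lag-1 sample autocorrelation $r_1 = \sum_{t=2}^{n} e_t e_{t-1} / \sum_{t=1}^{n} e_t^2$, and the one-sided bounds tests (I) and (II) can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names are hypothetical, the residuals are toy values, and $d_L$, $d_U$ must still be read from Table 7.1. The exact identity $d = 2(1 - r_1) - (e_1^2 + e_n^2)/\sum_{t=1}^{n} e_t^2$ can also be checked numerically with these helpers.

```python
# Hypothetical helper names; a sketch of the formulas, not the notes' own code.

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2."""
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)


def dw_upper_bound(e):
    """Bound from the derivation: d <= 4 - 2(e_1^2 + e_n^2) / sum e_t^2."""
    return 4 - 2 * (e[0] ** 2 + e[-1] ** 2) / sum(x * x for x in e)


def lag1_autocorr(e):
    """Lag-1 sample autocorrelation r_1 = sum_{t=2}^n e_t e_{t-1} / sum e_t^2."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(x * x for x in e)


def dw_primary(d, d_L, d_U, alternative="positive"):
    """Bounds test (I) for rho > 0 or (II) for rho < 0."""
    if alternative == "positive":
        s = d          # (I): compare d with the bounds
    elif alternative == "negative":
        s = 4 - d      # (II): compare 4 - d with the bounds
    else:
        raise ValueError("alternative must be 'positive' or 'negative'")
    if s < d_L:
        return "reject"
    if s > d_U:
        return "do not reject"
    return "no conclusion"   # d_L <= s <= d_U


# Toy residuals (not from the notes)
flat = [1.0, 1.0, 1.0, 1.0]     # no sign changes: behaves like rho near 1
alt = [1.0, -1.0, 1.0, -1.0]    # alternating signs: behaves like rho near -1
print(durbin_watson(flat))                       # 0.0
print(durbin_watson(alt), dw_upper_bound(alt))   # 3.0 3.0

# Critical values quoted in these notes for n = 25, k = 3
print(dw_primary(0.625, 0.9, 1.41))              # reject
print(dw_primary(1.2, 0.9, 1.41))                # no conclusion
```

In practice, `statsmodels.stats.stattools.durbin_watson` computes the same ratio from a residual array; the plain-Python version only makes the formula explicit.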
(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
  if $d < d_L$ or $4 - d < d_L$, reject $H_0$ at level $2\alpha$;
  if $d > d_U$ and $4 - d > d_U$, do not reject $H_0$;
  otherwise, no conclusion.

Example: Suppose the following are the residuals from the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$:

| $e_1$ | $e_2$ | $e_3$ | $e_4$ | ... | $e_{24}$ | $e_{25}$ |
|---|---|---|---|---|---|---|
| 0.12 | 0.66 | 0.72 | -0.32 | … | -0.64 | -0.68 |

Then

$$d = \frac{\sum_{t=2}^{25} (e_t - e_{t-1})^2}{\sum_{t=1}^{25} e_t^2} = \frac{(0.66 - 0.12)^2 + (0.72 - 0.66)^2 + \cdots + \big(-0.68 - (-0.64)\big)^2}{0.12^2 + 0.66^2 + 0.72^2 + \cdots + (-0.68)^2} = 0.625.$$

To test $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$, we obtain $d_L = 0.9$ and $d_U = 1.41$ from Table 7.1 (with $n = 25$, $k = 3$). Since $d = 0.625 < d_L = 0.9$, we reject $H_0$; that is, we conclude that there is serial correlation in the residuals.

Note: the inconclusive region of the tests above is unattractive.

A Simplified, Approximate Durbin-Watson Test:

(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$:
  if $d < d_U$, reject $H_0$ at level $\alpha$;
  if $d \ge d_U$, do not reject $H_0$.

(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$:
  if $4 - d < d_U$, reject $H_0$ at level $\alpha$;
  if $4 - d \ge d_U$, do not reject $H_0$.

(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
  if $d < d_U$ or $4 - d < d_U$, reject $H_0$ at level $2\alpha$;
  otherwise, do not reject $H_0$.

(b) Run test:

Motivating Example: Suppose we have the following 6 residuals:

-1.5  -2.1  0.4  -0.7  -0.6  1.8

The signs of these residuals are (- -) (+) (- -) (+), a total of 4 runs. Intuitively, a very small number of runs suggests that the residuals may have positive serial correlation; for example, a single run (+ + + ...). On the other hand, a very large number of runs (that is, many sign switches) suggests that the residuals may have negative serial correlation; for example, (+) (-) (+) (-) (+) (-) ....

For 6 residuals, suppose there are 2 positive residuals and 4 negative residuals.
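Under $H_0$ every ordering of $n_1$ positive and $n_2$ negative signs is equally likely, so the null distribution of the number of runs can be computed exactly rather than read from a table. The sketch below assumes no residual is exactly zero; `count_runs` and `runs_pmf` are hypothetical names, and the closed-form count of orderings with a given number of runs is a standard combinatorial result used here in place of listing every arrangement by hand.

```python
# Hypothetical helper names; a sketch of the exact null distribution of runs.
from fractions import Fraction
from math import comb


def count_runs(residuals):
    """Number of runs of equal signs; assumes no residual is exactly zero."""
    if any(x == 0 for x in residuals):
        raise ValueError("zero residuals have no sign")
    signs = [x > 0 for x in residuals]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))


def runs_pmf(n1, n2):
    """Exact P(R = r) when all C(n1 + n2, n1) sign orders are equally likely (H0)."""
    if n1 < 1 or n2 < 1:
        raise ValueError("need at least one residual of each sign")
    total = comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of each sign; either sign may come first.
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 runs of one sign and k runs of the other.
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


print(count_runs([-1.5, -2.1, 0.4, -0.7, -0.6, 1.8]))   # 4
pmf = runs_pmf(2, 4)                                     # 2 positive, 4 negative
print(sum(p for r, p in pmf.items() if r <= 2))          # 2/15
print(sum(p for r, p in pmf.items() if r >= 5))          # 1/5
p_few = sum(p for r, p in runs_pmf(10, 10).items() if r <= 5)
print(float(p_few))                                       # about 0.0045
```

For $n_1 = n_2 = 10$ the exact value is $P(R \le 5) = 830/184756 \approx 0.0045$, in line with the 0.004 quoted from Table 7.5 in these notes.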
Then the following sign arrangements are possible:

| Arrangement | Number of runs |
|---|---|
| (+ +) (- - - -) | 2 |
| (+) (-) (+) (- - -) | 4 |
| (+) (- -) (+) (- -) | 4 |
| (+) (- - -) (+) (-) | 4 |
| (+) (- - - -) (+) | 3 |
| (-) (+ +) (- - -) | 3 |
| (-) (+) (-) (+) (- -) | 5 |
| (-) (+) (- -) (+) (-) | 5 |
| (-) (+) (- - -) (+) | 4 |
| (- -) (+ +) (- -) | 3 |
| (- -) (+) (-) (+) (-) | 5 |
| (- -) (+) (- -) (+) | 4 |
| (- - -) (+ +) (-) | 3 |
| (- - -) (+) (-) (+) | 4 |
| (- - - -) (+ +) | 2 |

There are $\frac{6!}{2!\,4!} = 15$ arrangements in total, and under $H_0$ each is equally likely. The distribution of the number of runs is:

| Runs | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| Frequency | 2 | 4 | 6 | 3 |
| Probability | 2/15 | 4/15 | 6/15 | 3/15 |
| Cumulative probability | 2/15 | 2/5 | 4/5 | 1 |

To check whether too few runs occur at $\alpha = 0.25$, the hypotheses are $H_0$: uncorrelated residuals vs $H_1$: positively correlated residuals (too few runs). With 6 observations we obtain 6 residuals; suppose they form 2 runs. Then we conclude there are too few runs, since p-value $= P(\text{number of runs} \le 2) = 2/15 \approx 0.133 < 0.25$. On the other hand, if the 6 residuals form more than 2 runs, we do not reject $H_0$.

To check whether too many runs occur at $\alpha = 0.25$, the hypotheses are $H_0$: uncorrelated residuals vs $H_1$: negatively correlated residuals (too many runs). Suppose the 6 residuals form 5 runs. Then we conclude there are too many runs, since p-value $= P(\text{number of runs} \ge 5) = 3/15 = 0.2 < 0.25$. On the other hand, if the 6 residuals form fewer than 5 runs, we do not reject $H_0$.

Run Test: Let $n$ be the sample size, $n_1$ the number of positive residuals, $n_2$ the number of negative residuals, and $r$ the number of runs.

(I) Small samples, $3 \le n_1, n_2 \le 10$: The p-values for the run test can be found in Tables 7.5 and 7.6 (pp. 196-197).

Example: Suppose we fit a regression model and obtain 20 residuals, 10 positive and 10 negative, which form 5 runs of signs. Is this an unusually small number of runs at the 0.01 level?

[Solution:] $n_1 = n_2 = 10$, $r = 5$: $P(r \le 5) = 0.004 < 0.01$ (by Table 7.5).
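We conclude there are too few runs.

(II) Large samples, $n_1 > 10$ and $n_2 > 10$: When the sample size is large, it is convenient to use a normal approximation. Under $H_0$ (uncorrelated residuals), $r$ is approximately $N(\mu, \sigma^2)$, where

$$\mu = \frac{2n_1 n_2}{n_1 + n_2} + 1 \qquad \text{and} \qquad \sigma^2 = \frac{2n_1 n_2\,(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}.$$

With a continuity correction of $\frac{1}{2}$, the statistic for testing for too few runs is

$$\zeta_1 = \frac{r - \mu + \frac{1}{2}}{\sigma} = \frac{r - \frac{2n_1 n_2}{n_1 + n_2} - \frac{1}{2}}{\sqrt{\dfrac{2n_1 n_2\,(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}}} \;\overset{\text{approx.}}{\sim}\; N(0, 1),$$

and the statistic for testing for too many runs is

$$\zeta_2 = \frac{r - \mu - \frac{1}{2}}{\sigma} = \frac{r - \frac{2n_1 n_2}{n_1 + n_2} - \frac{3}{2}}{\sqrt{\dfrac{2n_1 n_2\,(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}}} \;\overset{\text{approx.}}{\sim}\; N(0, 1).$$

Below, $z_{1-\alpha}$ denotes the $(1-\alpha)$ quantile of $N(0, 1)$.

(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$ (too few runs):
  if $\zeta_1 < -z_{1-\alpha}$, reject $H_0$ at level $\alpha$;
  otherwise, do not reject $H_0$.

(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$ (too many runs):
  if $\zeta_2 > z_{1-\alpha}$, reject $H_0$ at level $\alpha$;
  otherwise, do not reject $H_0$.

(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
  if $\zeta_1 < -z_{1-\alpha/2}$ or $\zeta_2 > z_{1-\alpha/2}$, reject $H_0$ at level $\alpha$;
  otherwise, do not reject $H_0$.

Example: Suppose we fit a regression model and obtain 27 residuals, 15 positive and 12 negative, which form 7 runs of signs. Does the arrangement of signs appear to have "too few runs"?

[Solution:] $n = 27$, $n_1 = 15$, $n_2 = 12$, $r = 7$:

$$\mu = \frac{2(15)(12)}{15 + 12} + 1 = \frac{43}{3}, \qquad \sigma^2 = \frac{2(15)(12)\,\big[2(15)(12) - 15 - 12\big]}{(15 + 12)^2\,(15 + 12 - 1)} = \frac{740}{117},$$

$$\zeta_1 = \frac{7 - 43/3 + 1/2}{\sqrt{740/117}} \approx -2.717 < -1.645 = -z_{0.95}.$$

Hence, at $\alpha = 0.05$, there are too few runs.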
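The large-sample version can be sketched as below, assuming Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` method gives the standard normal quantiles $z_{1-\alpha}$. The function names `runs_normal_stats` and `runs_test_normal` are hypothetical; the usage lines reproduce the worked example with $n_1 = 15$, $n_2 = 12$, $r = 7$.

```python
# Hypothetical helper names; a sketch of the normal-approximation run test.
from math import sqrt
from statistics import NormalDist


def runs_normal_stats(n1, n2, r):
    """Mean, variance and continuity-corrected z statistics for the run count r."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    sd = sqrt(var)
    zeta1 = (r - mu + 0.5) / sd   # for "too few runs" (positive correlation)
    zeta2 = (r - mu - 0.5) / sd   # for "too many runs" (negative correlation)
    return mu, var, zeta1, zeta2


def runs_test_normal(n1, n2, r, alpha=0.05, alternative="positive"):
    """True if H0 (uncorrelated residuals) is rejected at level alpha."""
    _, _, zeta1, zeta2 = runs_normal_stats(n1, n2, r)
    z = NormalDist().inv_cdf            # z(q) is the q-quantile of N(0, 1)
    if alternative == "positive":
        return zeta1 < -z(1 - alpha)
    if alternative == "negative":
        return zeta2 > z(1 - alpha)
    if alternative == "two-sided":
        return zeta1 < -z(1 - alpha / 2) or zeta2 > z(1 - alpha / 2)
    raise ValueError("alternative must be 'positive', 'negative' or 'two-sided'")


mu, var, zeta1, _ = runs_normal_stats(15, 12, 7)
print(round(mu, 4), round(var, 4), round(zeta1, 3))   # 14.3333 6.3248 -2.717
print(runs_test_normal(15, 12, 7))                    # True
```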