Statistical Tests:
Two methods can be used to test for serial correlation. They are:
(a) Durbin-Watson test:
Let $\rho_s$ be the correlation between $\varepsilon_t$ and $\varepsilon_{t-s}$; for example, for $s = 1$,
$$\rho_1 = \frac{\operatorname{cov}(\varepsilon_t,\ \varepsilon_{t-1})}{\operatorname{Var}(\varepsilon_t)} = \text{correlation between } \varepsilon_t \text{ and } \varepsilon_{t-1}.$$
We want to determine whether there exists correlation between the observations. That is, we test
$$H_0: \rho_s = 0,\ s = 1, 2, \ldots \quad \text{vs} \quad H_1: \rho_s = \rho^s\ (\rho \neq 0,\ |\rho| < 1).$$
Note: an equivalent form of the above hypotheses is
$$H_0: \varepsilon_t\text{'s are uncorrelated} \quad \text{vs} \quad H_1: \varepsilon_t = \rho\,\varepsilon_{t-1} + z_t,$$
where $z_t \sim N(0, \sigma^2)$ and $z_t$ is independent of $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ and $z_{t-1}, z_{t-2}, \ldots$. That is,
$H_0$: $\varepsilon_t$'s are uncorrelated vs $H_1$: $\varepsilon_t$'s are autoregressive residuals with lag 1.
Durbin-Watson statistic:
The Durbin-Watson statistic for testing
$$H_0: \rho_s = 0,\ s = 1, 2, \ldots \quad \text{vs} \quad H_1: \rho_s = \rho^s\ (\rho \neq 0,\ |\rho| < 1)$$
is
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}.$$
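To make the formula concrete, the statistic can be computed from a residual vector in a couple of lines. The sketch below (Python, not part of the original notes) uses an illustrative residual vector, and the function name durbin_watson is ours.

import numpy as np

def durbin_watson(e):
    # d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative residuals (not taken from the notes).
e = np.array([0.3, 0.5, 0.4, -0.1, -0.4, -0.2, 0.1, 0.3])
print(durbin_watson(e))   # always lies between 0 and 4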
Properties of the Durbin-Watson statistic:
• $0 \le d \le 4$:
Since et  et 1   2et2  2et21 , thus
2
n
 e
t 2
n
n
 et 1   2 e  2 e
2
t
2
t
t 2
t 2

n
n
2
t 1
 2 e  2e  2 et2  2en2
t 1
 4 e  2 e  e
t 1
2
t
n
2
1
2
n
2
t
2
1
t 1
.

Therefore,
n
0d 
 e
t 2
 et 1 
t
n
 et2

n
2

4 et2  2 e12  en2

t 1
n
 et2
t 1
 4

2 e12  en2
n
e
t 1
t 1
4
2
t
 0  d  4.

  1 (very strong positive correlation), d  0 .
  1 (very strong negative correlation), d  4 .
  0 (no correlation), d  1 .
[Heuristic Justification:]
Suppose $e_t \approx \varepsilon_t$. Under $H_1$, $\varepsilon_t = \rho\,\varepsilon_{t-1} + z_t$, so
$$(e_t - e_{t-1})^2 \approx (\varepsilon_t - \varepsilon_{t-1})^2 = (\rho\,\varepsilon_{t-1} + z_t - \varepsilon_{t-1})^2 = \left((\rho - 1)\varepsilon_{t-1} + z_t\right)^2 \approx (\rho - 1)^2 \varepsilon_{t-1}^2$$
when $z_t^2$ is small compared with $\varepsilon_t^2$ (which is the case when $|\rho|$ is close to 1). Thus,
$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx \frac{(\rho - 1)^2 \sum_{t=2}^{n} \varepsilon_{t-1}^2}{\sum_{t=1}^{n} \varepsilon_t^2} = \frac{(\rho - 1)^2 \sum_{t=1}^{n-1} \varepsilon_t^2}{\sum_{t=1}^{n} \varepsilon_t^2} \approx (\rho - 1)^2.$$
More generally, expanding $(e_t - e_{t-1})^2 = e_t^2 - 2e_t e_{t-1} + e_{t-1}^2$ without dropping the cross term gives the familiar approximation $d \approx 2(1 - \hat{\rho})$, where $\hat{\rho} = \sum_{t=2}^{n} e_t e_{t-1} \big/ \sum_{t=1}^{n} e_t^2$. Therefore,
$$d \approx 0 \text{ as } \rho \approx 1, \qquad d \approx 4 \text{ as } \rho \approx -1, \qquad d \approx 2 \text{ as } \rho \approx 0.$$
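A quick simulation (again a sketch, not part of the notes) illustrates the approximation: generate AR(1) errors $\varepsilon_t = \rho\,\varepsilon_{t-1} + z_t$ for several values of $\rho$ and compare the Durbin-Watson statistic with $2(1 - \rho)$.

import numpy as np

rng = np.random.default_rng(0)

def ar1_errors(rho, n, sigma=1.0):
    # e_t = rho * e_{t-1} + z_t with z_t ~ N(0, sigma^2)
    e = np.zeros(n)
    z = rng.normal(0.0, sigma, size=n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + z[t]
    return e

for rho in (-0.9, 0.0, 0.9):
    e = ar1_errors(rho, n=5000)
    d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
    print(f"rho = {rho:5.2f}   d = {d:.3f}   2*(1 - rho) = {2 * (1 - rho):.3f}")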
Primary Durbin-Watson Test:
(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$:
if $d < d_L$, reject $H_0$ at level $\alpha$;
if $d > d_U$, do not reject $H_0$;
if $d_L \le d \le d_U$, no conclusion;
where $d_L$ and $d_U$ are critical values which can be found in Table 7.1 (Draper & Smith, pp. 184~192).
Note: in Table 7.1, $n$ = sample size and $k$ = number of covariates.
(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$:
if $4 - d < d_L$, reject $H_0$ at level $\alpha$;
if $4 - d > d_U$, do not reject $H_0$;
if $d_L \le 4 - d \le d_U$, no conclusion.
(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
if $d < d_L$ or $4 - d < d_L$, reject $H_0$ at level $2\alpha$;
if $d > d_U$ and $4 - d > d_U$, do not reject $H_0$;
otherwise, no conclusion.
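The three decision rules can be packaged into a small helper. This is only a sketch; the bounds dL and dU must still be looked up in Table 7.1 for the given $n$, $k$, and $\alpha$.

def dw_primary_test(d, dL, dU, alternative="positive"):
    # Primary Durbin-Watson test; dL and dU come from Table 7.1.
    # alternative: "positive" (rho > 0), "negative" (rho < 0), or "two-sided".
    if alternative == "two-sided":        # level 2*alpha
        if d < dL or 4 - d < dL:
            return "reject H0"
        if d > dU and 4 - d > dU:
            return "do not reject H0"
        return "no conclusion"
    stat = d if alternative == "positive" else 4 - d
    if stat < dL:
        return "reject H0"
    if stat > dU:
        return "do not reject H0"
    return "no conclusion"

# The example below: d = 0.625 with dL = 0.9, dU = 1.41.
print(dw_primary_test(0.625, 0.9, 1.41, "positive"))   # reject H0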
Example:
Suppose the following are the residuals from the model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon :$$

$e_1$     $e_2$     $e_3$     $e_4$     ...     $e_{24}$     $e_{25}$
0.12    0.66    0.72    -0.32    ...    -0.64    -0.68

Then,
$$d = \frac{\sum_{t=2}^{25}(e_t - e_{t-1})^2}{\sum_{t=1}^{25} e_t^2} = \frac{(0.66 - 0.12)^2 + (0.72 - 0.66)^2 + \cdots + (-0.68 - (-0.64))^2}{0.12^2 + 0.66^2 + 0.72^2 + \cdots + (-0.68)^2} = 0.625.$$
Then, if we want to test $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$, we obtain $d_L = 0.9$ and $d_U = 1.41$ (by Table 7.1). Since $d = 0.625 < d_L = 0.9$, we reject $H_0$. That is, we conclude that there is serial correlation in the residuals.
Note: the possibility of an inconclusive result makes the tests above less attractive.
A Simplified, Approximate Durbin-Watson Test:
(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$:
if $d < d_U$, reject $H_0$ at level $\alpha$;
if $d \ge d_U$, do not reject $H_0$.
(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$:
if $4 - d < d_U$, reject $H_0$ at level $\alpha$;
if $4 - d \ge d_U$, do not reject $H_0$.
(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
if $d < d_U$ or $4 - d < d_U$, reject $H_0$ at level $2\alpha$;
otherwise, do not reject $H_0$.
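The simplified rule only needs $d_U$, which removes the inconclusive region; a minimal sketch:

def dw_simplified_test(d, dU, alternative="positive"):
    # Simplified, approximate Durbin-Watson test using only dU from Table 7.1.
    if alternative == "positive":
        reject = d < dU
    elif alternative == "negative":
        reject = (4 - d) < dU
    else:                                  # two-sided, level 2*alpha
        reject = d < dU or (4 - d) < dU
    return "reject H0" if reject else "do not reject H0"

print(dw_simplified_test(0.625, 1.41, "positive"))   # reject H0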
(b) Run test:
Motivating Example:
Suppose we have the following 6 residuals:
-1.5    -2.1    0.4    -0.7    -0.6    1.8
The signs of the above residuals are
(- -) (+) (- -) (+),
4 runs in total. Intuitively, a very small number of runs implies that the residuals might have positive serial correlation; for example, there is only one run in (+ + + ...). On the other hand, a very large number of runs (a very large number of sign switches) implies that the residuals might have negative serial correlation, for example, (+) (-) (+) (-) (+) (-) ....
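Counting runs amounts to counting sign changes and adding one; the sketch below (Python, illustrative, not part of the notes) reproduces the 4 runs of the motivating example.

import numpy as np

def count_runs(e):
    # Number of runs in the sign sequence of the residuals.
    signs = np.sign(e)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

e = np.array([-1.5, -2.1, 0.4, -0.7, -0.6, 1.8])
print(count_runs(e))   # 4, i.e. (- -)(+)(- -)(+)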
For 6 residuals, suppose there are 2 positive residuals and 4 negative residuals. Then, the following sign arrangements are possible:

Arrangement                Number of runs
(+ +) (- - - -)            2
(+) (-) (+) (- - -)        4
(+) (- -) (+) (- -)        4
(+) (- - -) (+) (-)        4
(+) (- - - -) (+)          3
(-) (+ +) (- - -)          3
(-) (+) (-) (+) (- -)      5
(-) (+) (- -) (+) (-)      5
(-) (+) (- - -) (+)        4
(- -) (+ +) (- -)          3
(- -) (+) (-) (+) (-)      5
(- -) (+) (- -) (+)        4
(- - -) (+ +) (-)          3
(- - -) (+) (-) (+)        4
(- - - -) (+ +)            2

There are in total $\binom{6}{2} = \frac{6!}{2!\,4!} = 15$ arrangements. The distribution of the number of runs is
Runs                                  2       3       4       5
Frequency                             2       4       6       3
Empirical probability                2/15    4/15    6/15    3/15
Cumulative empirical probability     2/15    2/5     4/5     1
As we want to know whether too few runs occur, with $\alpha = 0.25$ the hypotheses are
$H_0$: uncorrelated residuals vs $H_1$: positively correlated residuals (too few runs).
If we have 6 observations, then 6 residuals could be obtained. Suppose the number of runs of the 6 residuals is 2. Then we would conclude that there are too few runs, since
$$p\text{-value} = P(\text{number of runs} \le 2) = 2/15 \approx 0.133 < 0.25.$$
On the other hand, if the number of runs of the 6 residuals is greater than 2, then we would not reject $H_0$. As we want to know whether too many runs occur, with $\alpha = 0.25$ the hypotheses are
$H_0$: uncorrelated residuals vs $H_1$: negatively correlated residuals (too many runs).
Suppose the number of runs of the 6 residuals is 5. Then we would conclude that there are too many runs, since
$$p\text{-value} = P(\text{number of runs} \ge 5) = 3/15 = 0.2 < 0.25.$$
On the other hand, if the number of runs of the 6 residuals is smaller than 5, then we would not reject $H_0$.
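The run distribution above, and the two p-values just computed, can be reproduced by enumerating all $\binom{6}{2} = 15$ placements of the two positive signs; a sketch:

from itertools import combinations
from collections import Counter

n1, n2 = 2, 4                        # positive and negative residuals
n = n1 + n2
counts = Counter()
for pos in combinations(range(n), n1):            # positions of the '+' signs
    signs = ['-'] * n
    for i in pos:
        signs[i] = '+'
    runs = 1 + sum(signs[t] != signs[t - 1] for t in range(1, n))
    counts[runs] += 1

total = sum(counts.values())                      # 15 arrangements
print(dict(sorted(counts.items())))               # {2: 2, 3: 4, 4: 6, 5: 3}
print(sum(v for r, v in counts.items() if r <= 2) / total)   # P(runs <= 2) = 2/15
print(sum(v for r, v in counts.items() if r >= 5) / total)   # P(runs >= 5) = 3/15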
Run Test:
Let $n$ be the sample size, $n_1$ be the number of positive residuals, $n_2$ be the number of negative residuals, and $r$ be the number of runs.
(I) Small sample size, $3 \le n_1,\ n_2 \le 10$:
The p-values for the run test can be found in Tables 7.5 and 7.6 (Draper & Smith, pp. 196~197).
Example:
Suppose we fit a regression model and obtain 20 residuals, 10 positive and 10 negative. Suppose the number of runs of signs is 5. Is this an unusually small number at the $\alpha = 0.01$ level?
[solution:]
$n_1 = n_2 = 10,\ r = 5 \;\Rightarrow\; P(r \le 5) = 0.004 < 0.01$ (by Table 7.5). We conclude that there are too few runs.
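The tabulated value 0.004 can be checked against the exact null distribution of the number of runs (all sign arrangements equally likely); the closed-form probabilities used below are the standard combinatorial ones, not formulas given in these notes.

from math import comb

def runs_prob(r, n1, n2):
    # P(number of runs = r) when all C(n1+n2, n1) sign arrangements are equally likely.
    denom = comb(n1 + n2, n1)
    if r % 2 == 0:
        k = r // 2
        return 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1) / denom
    k = (r - 1) // 2
    return (comb(n1 - 1, k - 1) * comb(n2 - 1, k)
            + comb(n1 - 1, k) * comb(n2 - 1, k - 1)) / denom

p = sum(runs_prob(r, 10, 10) for r in range(2, 6))   # P(r <= 5)
print(round(p, 3))   # 0.004, matching Table 7.5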
(II) Large sample size, $n_1 > 10$, $n_2 > 10$:
As the sample size is large, it is convenient to use a normal approximation: approximately $r \sim N(\mu, \sigma^2)$, where
$$\mu = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \qquad \text{and} \qquad \sigma^2 = \frac{2 n_1 n_2\,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}.$$
Thus, when the residuals are uncorrelated, for testing for too few runs we use
$$\frac{r - \mu + \tfrac{1}{2}}{\sigma} = \frac{r - \left(\dfrac{2 n_1 n_2}{n_1 + n_2} + 1\right) + \dfrac{1}{2}}{\sqrt{\dfrac{2 n_1 n_2\,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}} \;\sim\; N(0, 1) \quad \text{approximately},$$
and for testing for too many runs we use
$$\frac{r - \mu - \tfrac{1}{2}}{\sigma} = \frac{r - \left(\dfrac{2 n_1 n_2}{n_1 + n_2} + 1\right) - \dfrac{1}{2}}{\sqrt{\dfrac{2 n_1 n_2\,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}} \;\sim\; N(0, 1) \quad \text{approximately}.$$
(The $\pm\tfrac{1}{2}$ is a continuity correction.)
(I) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho > 0$ (too few runs):
if $\dfrac{r - \mu + \tfrac{1}{2}}{\sigma} < -z_{1-\alpha}$, reject $H_0$ at level $\alpha$;
otherwise, do not reject $H_0$.
(II) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho < 0$ (too many runs):
if $\dfrac{r - \mu - \tfrac{1}{2}}{\sigma} > z_{1-\alpha}$, reject $H_0$ at level $\alpha$;
otherwise, do not reject $H_0$.
(III) $H_0: \rho_s = 0$ vs $H_1: \rho_s = \rho^s,\ \rho \neq 0$:
if $\left|\dfrac{r - \mu}{\sigma}\right| > z_{1-\alpha/2}$, reject $H_0$ at level $\alpha$;
otherwise, do not reject $H_0$.
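A sketch of the large-sample run test, combining the normal approximation with the decision rules above (the function name and arguments are illustrative, not from the notes):

from math import sqrt
from statistics import NormalDist

def runs_test_large(r, n1, n2, alternative="too_few", alpha=0.05):
    # Large-sample run test based on the approximation r ~ N(mu, sigma^2).
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                 / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    if alternative == "too_few":           # rho > 0: reject if z < -z_{1-alpha}
        z = (r - mu + 0.5) / sigma
        reject = z < -NormalDist().inv_cdf(1 - alpha)
    elif alternative == "too_many":        # rho < 0: reject if z > z_{1-alpha}
        z = (r - mu - 0.5) / sigma
        reject = z > NormalDist().inv_cdf(1 - alpha)
    else:                                  # two-sided at level alpha
        z = (r - mu) / sigma
        reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
    return z, ("reject H0" if reject else "do not reject H0")

# Hypothetical example: 27 residuals with 14 positive, 13 negative, and 9 runs.
print(runs_test_large(9, 14, 13, "too_few", alpha=0.05))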
Example:
Suppose we fit a regression model and obtain 27 residuals, 15 positive and 12 negative. Suppose the number of runs of signs is 7. Does the arrangement of signs appear to have "too few runs"?
[solution:]
$n = 27,\ n_1 = 15,\ n_2 = 12,\ r = 7$,
$$\mu = \frac{2 \cdot 15 \cdot 12}{15 + 12} + 1 = \frac{43}{3}, \qquad \sigma^2 = \frac{2 \cdot 15 \cdot 12\,(2 \cdot 15 \cdot 12 - 15 - 12)}{(15 + 12)^2 (15 + 12 - 1)} = \frac{740}{117}.$$
Then
$$\frac{r - \mu + \tfrac{1}{2}}{\sigma} = \frac{7 - 43/3 + 1/2}{\sqrt{740/117}} \approx -2.72 < -1.645 = -z_{0.95} \;\Rightarrow\; \text{too few runs!!}$$
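The arithmetic in this example can be verified directly; a standalone check (Python):

from math import sqrt

n1, n2, r = 15, 12, 7
mu = 2 * n1 * n2 / (n1 + n2) + 1                   # 43/3
var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
       / ((n1 + n2) ** 2 * (n1 + n2 - 1)))         # 740/117
z = (r - mu + 0.5) / sqrt(var)                     # continuity-corrected, lower tail
print(round(mu, 3), round(var, 3), round(z, 3))    # 14.333 6.325 -2.717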