Statistics & Probability Letters 42 (1999) 69 – 79
A comparison of some of pattern identification methods
for order determination of mixed ARMA models
Wai-Sum Chan ∗
Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong
Received February 1998; received in revised form June 1998
Abstract
Model identification is a crucial step in time-series modelling. The orthodox Box–Jenkins (BJ) identification examines the patterns of the sample autocorrelation function (SACF) and the sample partial autocorrelation function (SPACF). However, for mixed ARMA processes, the SACF and SPACF often exhibit similar behaviour, which makes the identification much more difficult. Recently, identification methods using the patterns of some functions of the autocorrelations have been proposed to supplement the BJ methods. This paper studies some of these proposed procedures. Their performances for order selection of a mixed ARMA process are compared with an expert system in a simulation study. Comments on each individual identification method are also given. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Corner method; ESACF table; Model identification; Outliers; SCAN table; Time-series ARMA models
∗ Tel.: +852 2859 2466; fax: +852 2858 9041; e-mail: chanws@hku.hk.
0167-7152/99/$ – see front matter. PII: S0167-7152(98)00195-3
1. Introduction
Suppose that a time series $Y_t$ has the stationary autoregressive-moving average representation

$$\phi(B) Y_t = \theta(B) a_t, \tag{1}$$

where $B$ is the backshift operator such that $B^s Y_t = Y_{t-s}$,

$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,$$

$$\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q,$$

$\phi(B)$ has all its roots outside the unit circle, and $a_t$ is white noise with zero mean and constant variance $\sigma_a^2 < \infty$. We further assume that the AR and MA polynomials in Eq. (1) have no common factors.
For all integers $k$, defining the autocovariance $\gamma_k$ at lag $k$ by $\mathrm{Cov}[Y_t, Y_{t-k}]$, we have

$$\gamma_k = E[Y_t Y_{t-k}]. \tag{2}$$

In particular, $\gamma_0 = \mathrm{Var}(Y_t)$. The general process in Eq. (1) can be fully characterized by the parameter vector

$$(\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma_a^2). \tag{3}$$

Alternatively, it can be represented by

$$\{\gamma_0, \gamma_1, \ldots, \gamma_{p+q}\}.$$

Defining the autocorrelation at lag $k$ by

$$\rho_k = \frac{\gamma_k}{\gamma_0}, \tag{4}$$

precisely the same information as in Eq. (3) is contained in $\gamma_0$ and

$$\{\rho_1, \rho_2, \ldots, \rho_{p+q}\}.$$

The complete set of autocorrelations $\rho_1, \rho_2, \ldots$ is termed the autocorrelation function (ACF). The partial autocorrelation function (PACF) discussed in Box and Jenkins (1976, p. 64) can be obtained from the ACF through the Yule–Walker equations.
The orthodox Box–Jenkins (BJ) identification method examines the patterns of the ACF and the PACF. If $\{Y_t\}$ is an MA($q$) process, then the ACF has a cut-off at lag $q$. On the other hand, the PACF cuts off at lag $p$ for an AR($p$) model. However, the BJ identification method is not very useful when dealing with mixed ARMA processes, because both the ACF and the PACF then have tail-off patterns. Simple inspection of the graphs of the ACF and the PACF would not, in general, give clear values of $p$ and $q$ for mixed models. The difficulty is compounded when the ACF and the PACF are replaced by their estimates.
Since an ARMA process can be completely characterized by its ACF, identification methods using the patterns of some functions of the autocorrelations have been proposed to supplement the BJ method for identification of mixed ARMA processes. These are often classified as pattern identification methods in the time series literature. They include the R and S array method (Gray et al., 1978), the Corner method (Beguin et al., 1980), the GPAC method (Woodward and Gray, 1981), the EACF method (Tsay and Tiao, 1984), the SCAN method (Tsay and Tiao, 1985), and many others. Choi (1992, Ch. 5) provided a comprehensive review of pattern identification methods.
Pattern identification methods have received mixed responses from time series analysts. Some, like Liu and Hanssens (1982), Lii (1985), and Rezayat and Anandalingam (1988), favour the Corner method. Others, like de Gooijer and Heuts (1981) and Petruccelli and Davies (1984), are less optimistic. Davies and Petruccelli (1984) presented an argument against the use of the GPAC method. De Gooijer and Heuts (1981) mentioned that the complexity and theoretical difficulty of the R and S array method deterred them from using it.
This article concentrates on three selected pattern identification methods. They are the Corner method, the EACF method, and the SCAN method. These methods are selected on the criteria that they are: (i) relatively simple to use; (ii) available in some time-series computer packages; and (iii) less criticized by other time-series analysts. A Monte Carlo study is performed to compare the identification power of these methods for a mixed ARMA model.
2. Some selected pattern identification methods
2.1. The Corner method
Beguin et al. (1980) considered the determinant

$$\Delta(r, s) = \begin{vmatrix} \rho_r & \rho_{r-1} & \cdots & \rho_{r-s+1} \\ \rho_{r+1} & \rho_r & \cdots & \rho_{r-s+2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{r+s-1} & \rho_{r+s-2} & \cdots & \rho_r \end{vmatrix} \tag{5}$$

for $r \ge 1$ and $s \ge 1$. The Corner table $C$ is defined by

$$C(i, j) = \Delta(j + 1, i + 1) \tag{6}$$

for $i, j = 0, 1, \ldots, K$, where $K$ is an arbitrary but large integer. The asymptotic pattern of the Corner table $C$ for an ARMA($p, q$) model is given in Table 1. The Corner table is expected to contain a large south-east rectangular sub-matrix whose elements are all zero. The coordinates of the north-west corner of this zero-rectangle are $(p, q)$, which provides a strong clue to the order of the underlying process.

Table 1
The asymptotic pattern of the Corner table and the SCAN table for an ARMA(p, q) model

AR order (i)    MA order (j)
                0    ···  q−1  q    q+1  q+2  ···  K
0               X    ···  X    X    X    X    ···  X
⋮               ⋮         ⋮    ⋮    ⋮    ⋮         ⋮
p−1             X    ···  X    X    X    X    ···  X
p               X    ···  X    O    O    O    ···  O
p+1             X    ···  X    O    O    O    ···  O
p+2             X    ···  X    O    O    O    ···  O
⋮               ⋮         ⋮    ⋮    ⋮    ⋮         ⋮
K               X    ···  X    O    O    O    ···  O

Note: X represents nonzero; O represents zero.
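As an illustration of the zero-rectangle, the following sketch evaluates the Corner table from theoretical autocorrelations of the ARMA(2,1) process (1 − 0.8B)(1 − 0.5B)Y_t = (1 + 0.5B)a_t. It is only a minimal sketch under stated assumptions: the function names `arma_acf` and `corner_table` are hypothetical, and the autocorrelations are approximated by truncating the infinite moving-average expansion rather than computed in closed form.

```python
import numpy as np

def arma_acf(phi, theta, max_lag, n_psi=2000):
    """Approximate rho_0..rho_max_lag for phi(B) Y_t = theta(B) a_t, where
    phi(B) = 1 - phi_1 B - ... and theta(B) = 1 - theta_1 B - ... .
    Assumption: the psi-weight expansion is truncated after n_psi terms."""
    psi = np.zeros(n_psi)
    psi[0] = 1.0
    for k in range(1, n_psi):
        val = -theta[k - 1] if k <= len(theta) else 0.0
        for m in range(1, min(k, len(phi)) + 1):
            val += phi[m - 1] * psi[k - m]
        psi[k] = val
    gamma = np.array([psi[: n_psi - k] @ psi[k:] for k in range(max_lag + 1)])
    return gamma / gamma[0]

def corner_table(rho, K):
    """C(i, j) = Delta(j + 1, i + 1) for i, j = 0..K, using rho_{-k} = rho_k."""
    C = np.empty((K + 1, K + 1))
    for i in range(K + 1):
        for j in range(K + 1):
            r, s = j + 1, i + 1
            # Row a, column b of the s x s matrix holds rho_{r + a - b}.
            M = np.array([[rho[abs(r + a - b)] for b in range(s)]
                          for a in range(s)])
            C[i, j] = np.linalg.det(M)
    return C

K = 4
# (1 - 1.3B + 0.4B^2) Y_t = (1 + 0.5B) a_t, so phi = (1.3, -0.4), theta_1 = -0.5.
rho = arma_acf(phi=[1.3, -0.4], theta=[-0.5], max_lag=2 * K + 1)
C = corner_table(rho, K)
print(np.round(C, 4))
```

With exact autocorrelations, the entries with i ≥ 2 and j ≥ 1 vanish up to rounding error, while the entries bordering the corner, C(1, 1) and C(2, 0), stay away from zero.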
In practice, we only have a finite number of observations, so the autocorrelations in Eq. (5) have to be estimated. We calculate the Corner table using $\hat\Delta(r, s)$. De Gooijer and Heuts (1981) complained that it is difficult to locate the possible values of $p$ and $q$ by visual inspection of the numerical elements inside the table. Following Beguin et al. (1980) and Tsay and Tiao (1984), we simplify the Corner table using indicator symbols. The Simplified Corner table is defined as

$$C^*(i, j) = \begin{cases} \text{O}, & \text{if } \left| \dfrac{\hat\Delta(j + 1, i + 1)}{\mathrm{SE}(\hat\Delta(j + 1, i + 1))} \right| < 2, \\ \text{X}, & \text{otherwise} \end{cases} \tag{7}$$

for $i, j = 0, 1, \ldots, K$, where "O" is an indicator symbol representing an element whose value is not different from zero and "X" represents a nonzero element. The standard error of any $\hat\Delta$ element in the estimated Corner table is given by

$$\mathrm{SE}(\hat\Delta) = \sqrt{\frac{A G A'}{n}}, \tag{8}$$

where $A$ is a $(1 \times h)$ vector with elements $a(j) = \partial\Delta / \partial\rho_j$; $h$ is the maximal lag among all the autocorrelations in Eq. (5); $n$ is the sample size; and $G$ is an $(h \times h)$ matrix whose $(i, j)$ element is

$$\sum_{k=-\infty}^{\infty} \{\rho_k \rho_{k-i+j} + \rho_{k+j} \rho_{k-i} - 2\rho_k \rho_j \rho_{k-i} - 2\rho_i \rho_k \rho_{k-j} + 2\rho_i \rho_j \rho_k^2\}. \tag{9}$$

The estimated matrices $\hat A$ and $\hat G$ can be obtained by replacing the autocorrelations in Eqs. (5) and (9) by their corresponding estimates. In practice, we also have to approximate the summation in Eq. (9) using a finite number of terms (say, $-50 \le k \le 50$).
The Simplified Corner table with indicator symbols is easier to compare with its asymptotic pattern in Table 1. However, it should be noted that the indicator symbols only provide a crude guide and are not meant to give formal significance tests.
2.2. The ESACF method
Tsay and Tiao (1984) proposed the extended sample autocorrelation function (ESACF) based on the concept of iterated least-squares (ILS) regression. Let $\{y_1, \ldots, y_n\}$ be a realization from an ARMA($p, q$) model. The $l$th iterated AR($k$) regression is defined as

$$y_t = \sum_{r=1}^{k} \phi_{k,r}^{(l)} y_{t-r} - \sum_{s=1}^{l} \theta_{k,s}^{(l)} \hat a_{k,t-s}^{(l-s)} + a_{k,t}^{(l)} \tag{10}$$

for $t = k + l + 1, \ldots, n$; $k = 1, 2, \ldots$; and $l = 0, 1, \ldots$, where

$$\hat a_{k,t}^{(l)} = y_t - \sum_{r=1}^{k} \hat\phi_{k,r}^{(l)} y_{t-r} + \sum_{s=1}^{l} \hat\theta_{k,s}^{(l)} \hat a_{k,t-s}^{(l-s)} \tag{11}$$

is the estimated residual of the $l$th iterated AR($k$) regression, and $\{\hat\phi_{k,1}^{(l)}, \ldots, \hat\phi_{k,k}^{(l)}\}$ and $\{\hat\theta_{k,1}^{(l)}, \ldots, \hat\theta_{k,l}^{(l)}\}$ are the ordinary least-squares estimates obtained from the regression in Eq. (10).
Consider the transformed series

$$y_t - \sum_{r=1}^{k} \hat\phi_{k,r}^{(l)} y_{t-r} \tag{12}$$

for $k = 0, 1, \ldots$. The extended sample autocorrelation $\{r_{k,l}\}$ is defined as the sample autocorrelation at lag $l$ of the series in Eq. (12). It should be noted that $\{r_{0,l}\}$ is simply the standard SACF ($\hat\rho_l$) of $\{y_t\}$. We can form a two-way table from the ESACF. The ESACF table, $E$, is defined as

$$E(i, j) = r_{i,j+1} \tag{13}$$

for $i, j = 0, 1, \ldots, K$, where $K$ is an arbitrary but large integer. The asymptotic ESACF pattern for an ARMA($p, q$) model is tabulated in Table 2. There is a remarkable zero-triangle in the table, and its vertex is in the $(p, q)$ position. Hence, the ESACF can be a useful tool in model identification, particularly for a mixed ARMA process.
Tsay and Tiao (1984) proposed using a simplified ESACF table for finite sample situations. The original ESACF table of values can be summarized in a condensed form by replacing those values that are within two standard errors of zero by an "O", and by an "X" otherwise. The standard errors of $\{r_{k,l}\}$ can be approximated by $1/\sqrt{n - k - l}$.
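The condensing step can be sketched as follows. This is only an illustration of the two-standard-error rule, not the SCA implementation; the function name `simplify_esacf` and the numerical table are hypothetical. The table entry E(i, j) holds r_{i,j+1}, so its approximate standard error is taken as 1/sqrt(n − i − (j + 1)).

```python
import math

def simplify_esacf(E, n):
    """Replace each ESACF value by 'O' (within two standard errors of zero)
    or 'X' (otherwise). E[i][j] is assumed to hold r_{i, j+1}."""
    simplified = []
    for i, row in enumerate(E):
        symbols = []
        for j, value in enumerate(row):
            se = 1.0 / math.sqrt(n - i - (j + 1))
            symbols.append("O" if abs(value) < 2.0 * se else "X")
        simplified.append(symbols)
    return simplified

# Hypothetical 2 x 2 corner of an ESACF table computed from n = 100 observations.
E_example = [[0.50, 0.10],
             [0.30, 0.15]]
simplified_example = simplify_esacf(E_example, n=100)
print(simplified_example)
```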
Table 2
The asymptotic pattern of the ESACF table for an ARMA(p, q) model

AR order (i)    MA order (j)
                0    ···  q−1  q    q+1  q+2  ···  K
0               X    ···  X    X    X    X    ···  X
⋮               ⋮         ⋮    ⋮    ⋮    ⋮         ⋮
p−1             X    ···  X    X    X    X    ···  X
p               X    ···  X    O    O    O    ···  O
p+1             X    ···  X    X    O    O    ···  O
p+2             X    ···  X    X    X    O    ···  O
⋮               ⋮         ⋮    ⋮    ⋮    ⋮         ⋮
K               X    ···  X    X    X    X    ···  O

Note: X represents nonzero; O represents zero.

2.3. The SCAN method
Tsay and Tiao (1985) proposed a smallest canonical correlation (SCAN) approach for tentative order determination in building ARMA models. Let $\{y_1, \ldots, y_n\}$ be a realization from an ARMA($p, q$) process. The steps for calculating the SCAN table are summarized as follows:
1. For $0 \le i \le K$ and $0 \le j \le K$ ($K$ is an arbitrary but large integer), we compute the smallest eigenvalue $\hat\lambda(i, j)$ of the matrix

$$\hat A(i, j)\, \hat B(i, j), \tag{14}$$

where

$$\hat A(i, j) = \Big( \sum_t Z_{i,t} Z_{i,t}' \Big)^{-1} \Big( \sum_t Z_{i,t} Z_{i,t-j-1}' \Big), \qquad \hat B(i, j) = \Big( \sum_t Z_{i,t-j-1} Z_{i,t-j-1}' \Big)^{-1} \Big( \sum_t Z_{i,t-j-1} Z_{i,t}' \Big)$$

are $(i + 1) \times (i + 1)$ matrices with summation over $t$ from $(i + j + 2)$ to $n$, and

$$Z_{i,t} = (y_t, \ldots, y_{t-i})'.$$

2. For each $i$ and $j$, a statistic $c(i, j)$ is computed,

$$c(i, j) = -(n - i - j) \ln\left( 1 - \frac{\hat\lambda(i, j)}{d(i, j)} \right), \tag{15}$$

where

$$d(i, 0) = 1, \qquad d(i, j) = 1 + 2 \sum_{k=1}^{j} \hat\rho_k(\omega)$$

and $\hat\rho_k(\omega)$ denotes the sample autocorrelation of the transformed series $\{\omega_{i,t}\}$ at lag $k$. The transformed series can be obtained by

$$\omega_{0,t} = y_t, \qquad \omega_{i,t} = y_t - \hat\phi_1^{(j)} y_{t-1} - \cdots - \hat\phi_i^{(j)} y_{t-i},$$

where $(1, -\hat\phi_1^{(j)}, \ldots, -\hat\phi_i^{(j)})$ is a normalized eigenvector of $\hat A(i, j)\hat B(i, j)$ corresponding to $\hat\lambda(i, j)$.
3. A two-way SCAN table $S$ can be arranged by

$$S(i, j) = c(i, j) \quad \text{for } i, j = 0, 1, \ldots, K. \tag{16}$$

Table 3
A hypothetical simplified pattern identification table

AR order (i)    MA order (j)
                0    1    2    3    4    5    6    7
0               X    X    X    X    X    O    O    O
1               X    X    X    X    X    O    O    O
2               X    O    O    O    O    O    O    O
3               O    O    O    O    O    O    O    O
4               O    O    O    O    O    O    O    O

Tsay and Tiao (1985) showed that the SCAN statistic $S(i, j)$ of Eq. (16) is asymptotically a $\chi^2_1$ random variable when $i = p$ and $j \ge q$ or when $i \ge p$ and $j = q$. The asymptotic pattern of the SCAN table is given in Table 1. It has a pattern similar to that of the Corner method: there is a large zero-rectangle with $(p, q)$ as the upper-left vertex inside the table.
As in the case of the ESACF, a simplified SCAN table is commonly used in practice. Liu and Hudak (1994, p. 545) suggested that a symbol "X" be displayed to indicate a position where the statistic $S(i, j)$ is significant at the one per cent level, and that the symbol "O" be displayed otherwise. Therefore, we can determine possible values for $p$ and $q$ by searching for a corner in the simplified SCAN table.
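Step 1 of the SCAN procedure can be sketched numerically as below. This is a rough illustration under stated assumptions, not the SCA routine: the helper names `lagged_rows` and `smallest_eigenvalue` are hypothetical, the series is treated as zero-mean, and only the smallest eigenvalue of the product of the two moment matrices is computed (the variance adjustment and the test statistic of step 2 are omitted). The example series is a simulated AR(1) process with coefficient 0.7, chosen only for illustration.

```python
import numpy as np

def lagged_rows(y, i, shift, start):
    """Stack Z_{i, t-shift} = (y_{t-shift}, ..., y_{t-shift-i})' as rows,
    for 0-based t = start, ..., n - 1."""
    n = len(y)
    return np.column_stack(
        [y[start - shift - r : n - shift - r] for r in range(i + 1)]
    )

def smallest_eigenvalue(y, i, j):
    """Smallest eigenvalue of A(i, j) B(i, j) built from the moment matrices."""
    y = np.asarray(y, dtype=float)
    start = i + j + 1  # 0-based counterpart of t = i + j + 2
    Z0 = lagged_rows(y, i, 0, start)
    Z1 = lagged_rows(y, i, j + 1, start)
    A = np.linalg.solve(Z0.T @ Z0, Z0.T @ Z1)
    B = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z0)
    return float(np.min(np.linalg.eigvals(A @ B).real))

# Illustrative data: AR(1) with coefficient 0.7, so (p, q) = (1, 0).
rng = np.random.default_rng(0)
n = 5000
a = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + a[t]

lam_00 = smallest_eigenvalue(y, 0, 0)  # roughly the squared lag-1 autocorrelation
lam_10 = smallest_eigenvalue(y, 1, 0)  # should be close to zero at (p, q) = (1, 0)
print(lam_00, lam_10)
```

For i = 0 the eigenvalue reduces to the squared sample autocorrelation at lag j + 1, which is why it stays clearly away from zero here, whereas at position (1, 0) a linear combination of (y_t, y_{t−1}) is nearly uncorrelated with the past.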
3. Monte Carlo study
Computations for constructing the pattern identification tables described in Section 2 are relatively simple. Mareschal and Melard (1988) provided FORTRAN codes for the Corner method. The Corner table can also be efficiently calculated using matrix-based computer languages (e.g., Splus, GAUSS, MATLAB, SCA, etc.). The ESACF and the SCAN methods are incorporated in the SCA system (Liu and Hudak, 1994).
Despite the availability of efficient algorithms for the selected pattern identification methods, there are only a few empirical studies on them. De Gooijer and Heuts (1981) examined the identification power of the Corner method based on 15 simulation runs. Rezayat and Anandalingam (1988) compared the ESACF and the Corner method in a Monte Carlo experiment with 50 replications. The mean and standard deviation of the elements in the SCAN table were studied by Tsay and Tiao (1985) using simulation with 400 runs. Unfortunately, the identification power of the SCAN method was not examined in their paper.
One of the major barriers in studying the order selection power of a pattern identification method is the visual search for the corner (vertex) inside the simplified table. It is subjective and time consuming for a large-scale simulation experiment.
In this section, we compare the identification power of the selected methods through a larger scale (1000 runs) simulation. Since most time series analysts (especially beginners) rely heavily on the simplified tables, our study will only be based on the patterns of the symbols "X" and "O" inside each table. We propose some objective rules for the computer to locate the potential corners/vertices automatically. Following Rezayat and Anandalingam (1988), we allow each method to select a set of orders instead of a single identification. The computer is instructed to include $(i, j)$ in the order set if and only if
1. the symbol in the $(i - 1, j)$ position is an "X"; and
2. the symbol in the $(i, j - 1)$ position is an "X"; and
3. the symbol in the $(i, j)$ position is an "O".
A check is passed automatically if $(i - 1) < 0$ or $(j - 1) < 0$. For example, in Table 3, orders (0,5), (2,1) and (3,0) are selected into the order set. These rules might not be perfect, but we find that they are quite helpful in specifying low order ARMA models.
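The three rules can be written as a short routine. The following is a minimal sketch, assuming the simplified table is stored as a list of strings with rows indexed by the AR order i and columns by the MA order j; the function name `select_orders` is hypothetical. It is applied to the pattern of Table 3.

```python
def select_orders(table):
    """Return every (i, j) whose symbol is 'O' while the symbols directly above,
    at (i - 1, j), and directly to the left, at (i, j - 1), are 'X'.
    A neighbour that falls outside the table passes the check automatically."""
    selected = []
    for i, row in enumerate(table):
        for j, symbol in enumerate(row):
            if symbol != "O":
                continue
            above_ok = i == 0 or table[i - 1][j] == "X"
            left_ok = j == 0 or table[i][j - 1] == "X"
            if above_ok and left_ok:
                selected.append((i, j))
    return selected

# Table 3: rows are AR orders 0..4, columns are MA orders 0..7.
table_3 = [
    "XXXXXOOO",
    "XXXXXOOO",
    "XOOOOOOO",
    "OOOOOOOO",
    "OOOOOOOO",
]
order_set = select_orders(table_3)
print(order_set)  # [(0, 5), (2, 1), (3, 0)]
```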
Table 4
Number of selected combinations (p, q) in 1000 simulations from the ARMA(2,1) model in Eq. (17) for various pattern identification methods and the SCA Expert System

        CORNER                  ESACF                   SCAN                    EXPERT
(p, q)  100   200   400   1000  100   200   400   1000  100   200   400   1000  100   200   400   1000
(0, 0)  0     0     0     0     0     0     0     0     0     0     0     0     1     0     0     0
(1, 0)  102   0     0     0     0     0     0     0     0     0     0     0     13    8     0     0
(2, 0)  891   945   154   0     98    0     0     0     575   268   247   215   292   148   49    3
(3, 0)  7     53    805   174   126   58    9     0     400   687   704   754   204   456   717   883
(4, 0)  4     11    39    793   233   165   94    22    52    64    55    40    65    111   140   108
(0, 1)  0     0     0     0     0     0     0     0     0     0     0     0     1     1     0     0
(0, 2)  13    0     0     0     31    0     0     0     109   0     0     0     1     0     1     0
(0, 3)  210   9     0     0     354   19    0     0     574   72    0     0     4     0     0     0
(0, 4)  344   149   2     0     376   207   2     0     276   418   36    0     0     0     0     0
(1, 1)  675   410   48    0     421   117   5     0     687   331   46    0     166   120   36    2
(1, 2)  163   475   673   314   485   628   541   192   227   591   788   476   7     4     1     0
(1, 3)  27    74    196   514   38    207   340   529   4     40    140   436   3     3     0     0
(2, 1)  5     45    799   981   342   553   577   591   30    266   463   502   44    48    41    4
(2, 2)  2     5     9     14    55    153   167   160   7     12    8     25    21    22    6     0
(2, 3)  0     5     9     15    0     8     36    115   4     3     5     6     7     3     0     0
(3, 1)  1     0     6     14    73    79    83    53    15    27    19    14    0     1     0     0
(3, 2)  0     0     1     5     2     13    58    103   22    14    12    10    0     4     0     0
NS      –     –     –     –     –     –     –     –     –     –     –     –     171   71    9     0
Total   2444  2181  2741  2824  2634  2207  1912  1765  2982  2793  2523  2478  1000  1000  1000  1000

Note: NS represents non-stationary models.

The following ARMA(2,1) process is considered:

$$(1 - 0.8B)(1 - 0.5B) Y_t = (1 + 0.5B) a_t. \tag{17}$$

The parameters are chosen to reduce possible cancellation effects. In the first experiment, we consider time series generated from model (17). The first 100 observations are discarded to reduce the effects of starting values. We consider sample sizes $n = 100, 200, 400, 1000$; $\sigma_a^2 = 1$; and the greatest possible order $K = 5$. Simplified tables for the Corner method, the ESACF method and the SCAN method are obtained by the SCA system (Liu and Hudak, 1994). Using our proposed rules, the selected order set for each method is recorded. The experiment is repeated 1000 times and the numbers of selected combinations $(p, q)$ for the different methods are summarized in Table 4. For comparison purposes, order selection results by the SCA Expert System are also reported. This expert system is described in detail by Liu (1993).
Outliers or unusual observations could easily affect the performance of the identification methods in a real application. In the second experiment, a few outlying observations are introduced into the data. Let $Y_t$ be the outlier-free time series generated from model (17). The contaminated time series $Z_t$ is obtained by

$$Z_t = Y_t + \sum_{s=1}^{m} \omega I_t^{(T_s)}, \tag{18}$$

where $m$ is the number of outliers, $\omega$ represents the magnitude of the outliers, and

$$I_t^{(T_s)} = \begin{cases} 1, & t = T_s, \\ 0, & t \ne T_s \end{cases} \tag{19}$$

is the indicator variable representing the presence or absence of an outlier at time $T_s$. Time series are generated from model (18) with the same basic setup as the first experiment.
Three outliers are introduced at $T_1 = 0.25n$, $T_2 = 0.5n$, $T_3 = 0.75n$ with magnitude $\omega = 0, 3, 5, 7, 9$ and $\infty$. The case of $\omega \to \infty$ is approximated by setting $\omega = 10\,000$. The experiment is repeated 1000 times. From a practical point of view, under-specification may be a more serious problem than minor over-specification. For example, since the true model is ARMA(2,1), an AR(2) specification is a more "serious mistake" than an AR(3) or an ARMA(1,2) specification. Consequently, for the second experiment, the rates of under-specification (in percent) for each identification method are reported in Table 5.

Table 5
Rates of under-specification (in percent) for various methods in the presence of outliers

                  ω
Method    n       0       3       5       7       9       ∞
CORNER    100     68.8    66.0    61.1    62.6    63.7    100.0
          200     65.1    63.5    61.0    55.8    63.6    100.0
          400     7.4     40.6    54.3    47.4    34.9    100.0
          1000    0.0     0.0     1.9     47.4    49.1    100.0
ESACF     100     20.9    37.5    29.9    43.9    51.7    82.4
          200     5.3     33.9    18.6    16.7    39.0    79.9
          400     0.3     2.8     4.4     7.8     2.9     82.2
          1000    0.0     0.2     0.7     7.3     2.8     87.3
SCAN      100     46.0    63.0    54.5    55.9    59.6    99.4
          200     21.4    54.3    56.0    50.8    53.4    99.9
          400     11.6    25.9    52.3    49.2    39.2    99.9
          1000    8.7     9.3     8.7     45.7    49.2    100.0
EXPERT    100     60.8    95.4    92.6    96.7    93.8    100.0
          200     30.7    93.2    94.9    93.3    96.4    100.0
          400     9.4     95.5    97.6    98.9    99.0    100.0
          1000    0.5     68.9    99.8    88.8    98.8    100.0

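The data-generating step of the two experiments can be sketched as follows. This is a minimal illustration, not the SCA code used in the study: the function names `simulate_model_17` and `add_outliers` are hypothetical, the innovations are assumed Gaussian with unit variance, and the outlier times 0.25n, 0.5n and 0.75n are interpreted as 1-based time indices.

```python
import numpy as np

def simulate_model_17(n, rng, burn_in=100):
    """Simulate (1 - 0.8B)(1 - 0.5B) Y_t = (1 + 0.5B) a_t, i.e.
    Y_t = 1.3 Y_{t-1} - 0.4 Y_{t-2} + a_t + 0.5 a_{t-1},
    discarding the first burn_in observations."""
    total = n + burn_in
    a = rng.standard_normal(total)  # assumption: Gaussian white noise
    y = np.zeros(total)
    for t in range(total):
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        a1 = a[t - 1] if t >= 1 else 0.0
        y[t] = 1.3 * y1 - 0.4 * y2 + a[t] + 0.5 * a1
    return y[burn_in:]

def add_outliers(y, omega, fractions=(0.25, 0.5, 0.75)):
    """Add an outlier of size omega at times T_s = fraction * n (1-based)."""
    z = np.array(y, dtype=float)
    n = len(z)
    for fraction in fractions:
        T = int(round(fraction * n))
        z[T - 1] += omega
    return z

rng = np.random.default_rng(1)
y_clean = simulate_model_17(100, rng)
z_contaminated = add_outliers(y_clean, omega=5.0)
print(np.flatnonzero(z_contaminated != y_clean))  # positions of the outliers
```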
4. Discussions
For the first experiment, the number of correct identifications for each method in 1000 simulations appears in the (2, 1) row of Table 4. A "correct identification" means that the true order has been selected into the identified order set by the method. We observe that the number of correct identifications increases as the sample size increases for each pattern identification method. For example, for the Corner method, the number of correct identifications is 5 for n = 100, 45 for n = 200, 799 for n = 400, and it further climbs to 981 for n = 1000. These agree with the asymptotic results for each method.
The Corner method works well in the large sample situations (n = 400, 1000). However, its performance for the cases with n ≤ 200 is very discouraging. The method includes the true order in only 0.5% (n = 100) and 4.5% (n = 200) of the 1000 simulation runs. When the sample size is small, it erroneously favours the AR(2) identification. Out of the 1000 simulations, the method selects (p, q) = (2, 0) into the order set in 891 runs for n = 100, and 945 runs for n = 200.
The performance of the ESACF method is quite robust to the sample size. The number of correct identifications is 342 for n = 100, 553 for n = 200, 577 for n = 400, and 591 for n = 1000. The average number of elements in the selected order set decreases quickly with the sample size. It is 2.634 for n = 100, 2.207 for n = 200, 1.912 for n = 400, and 1.765 for n = 1000. This implies that the ESACF method usually makes a more concrete identification for time series with larger sample sizes.

Table 6
Identification ratios (in percent) for various methods

n       CORNER   ESACF   SCAN    EXPERT
100     0.2      13.0    1.0     4.4
200     2.1      25.1    9.5     4.8
400     29.1     30.2    18.4    4.1
1000    34.7     33.5    20.3    0.4

The results of the SCAN method are less optimistic. Its most favoured combination is (1; 1) for n = 100,
(3; 0) for n = 200, (1, 2) for n = 400, and (3; 0) for n = 1000. However, none of them is the true order (2; 1)
of model (17).
For comparison purposes, automatic identification results from the SCA Expert System (Liu, 1993) are also reported in Table 4. It should be noted that the expert system is only allowed to make a single identification in each simulation run. It is, therefore, not fair to directly compare the numbers of the expert system with the results of the other pattern identification methods. However, the performance of the expert system is rather disappointing, even for a sample size as large as n = 1000. It clearly favours an ARMA(3,0) model for the simulated time series. Since model (17) can be written as

$$Y_t - 1.8Y_{t-1} + 1.3Y_{t-2} - 0.65Y_{t-3} + 0.33Y_{t-4} - 0.16Y_{t-5} + 0.08Y_{t-6} - \cdots = a_t, \tag{20}$$

the frequent choice of an ARMA(3,0) model by the expert system is not totally unreasonable. Our limited experience in this study shows that there is still plenty of room for the expert system to be improved.
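The coefficients of the autoregressive representation in Eq. (20) can be checked by expanding the ratio of the AR and MA polynomials of model (17) as a power series in B. The sketch below is an illustrative check only; the function name `pi_weights` is hypothetical.

```python
import numpy as np

def pi_weights(ar_poly, ma_poly, n_terms):
    """First n_terms coefficients of ar_poly(B) / ma_poly(B) as a power series
    in B. Both polynomials are given in increasing powers of B."""
    ar = list(ar_poly) + [0.0] * n_terms
    pi = []
    for k in range(n_terms):
        acc = ar[k]
        # Match the coefficient of B^k in pi(B) * ma_poly(B) = ar_poly(B).
        for m in range(1, min(k, len(ma_poly) - 1) + 1):
            acc -= ma_poly[m] * pi[k - m]
        pi.append(acc / ma_poly[0])
    return np.array(pi)

ar_poly = np.convolve([1.0, -0.8], [1.0, -0.5])  # 1 - 1.3B + 0.4B^2
ma_poly = [1.0, 0.5]                             # 1 + 0.5B
pi = pi_weights(ar_poly, ma_poly, 7)
print(pi)  # coefficients of Y_t, Y_{t-1}, ..., Y_{t-6}
```

The coefficients beyond lag 3 decay geometrically by a factor of −0.5, which is consistent with a low order AR model capturing most of the structure.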
In order to have a criterion for evaluating the performance of all the identification methods, we compute the identification ratio

$$R = \frac{\text{Number of correct identifications in the 1000 runs}}{\text{Total number of identifications in the 1000 runs}}. \tag{21}$$

The results are summarised in Table 6.
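The ratio in Eq. (21) can be reproduced directly from the counts in Table 4: the numerator is the entry in the (2, 1) row and the denominator is the column total. The short sketch below does this arithmetic; the dictionary layout and variable names are illustrative choices.

```python
# Counts taken from Table 4, in the order n = 100, 200, 400, 1000.
correct = {
    "CORNER": [5, 45, 799, 981],
    "ESACF": [342, 553, 577, 591],
    "SCAN": [30, 266, 463, 502],
    "EXPERT": [44, 48, 41, 4],
}
total = {
    "CORNER": [2444, 2181, 2741, 2824],
    "ESACF": [2634, 2207, 1912, 1765],
    "SCAN": [2982, 2793, 2523, 2478],
    "EXPERT": [1000, 1000, 1000, 1000],
}

# Identification ratio R of Eq. (21), expressed in percent.
ratios = {
    method: [round(100.0 * c / t, 1) for c, t in zip(correct[method], total[method])]
    for method in correct
}
for method, values in ratios.items():
    print(method, values)
```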
Based on this criterion, the ESACF method outperforms the others. The Corner method has a reasonable performance in large sample situations. On the other hand, our simulation study suggests that the SCAN method and the SCA Expert System do not have adequate identification power in many cases.
Performances of the identification methods in the presence of time series outliers are summarized in Table 5. All methods are adversely influenced by the outlying observations. Only the ESACF method can provide some resistance to extreme observations when the sample size is large.
Finally, some other characteristics of the identification methods are compared in Table 7. The ESACF and the SCAN methods can directly handle nonstationary processes. On the other hand, the ACF cannot be obtained from a nonstationary time series, and without proper autocorrelations the Corner table in Eq. (6) is not defined. Furthermore, the Corner method can easily under-specify the order if one of the AR roots is close to, but not on, the unit circle. For example, consider model (17). If (1 − 0.8B) in the AR polynomial is replaced by (1 − 0.98B), the estimated serial correlations in the determinant of Eq. (5) are all very close to one, and the Corner method can easily identify the model as an AR(1). The multivariate extension of the ESACF approach was proposed by Tiao and Tsay (1983). The SCAN method was also generalised by Tiao and Tsay (1989) for identifying vector ARMA models. Liu and Hanssens (1982) discussed the use of the Corner table to select the orders in a rational transfer function model. Unfortunately, none of the selected pattern identification methods provides a direct extension to the identification of seasonal time series models. We also record in Table 7 the CPU time (in seconds) required by each identification method (n = 1000) using the SCA package on an SGI PowerChallenge (running IRIX 6.2) system at the National University of Singapore.

Table 7
Some other characteristics of the identification methods

                               CORNER   ESACF   SCAN    EXPERT
Extensions to:
1. Nonstationary time series   No       Yes     Yes     Yes
2. Seasonal time series        No       No      No      Yes
3. Transfer function models    Yes      No      No      Yes
4. Vector time series          No       Yes     Yes     No
CPU time (in seconds)          4        7       80      15

5. Conclusions
For outlier-free time series, the general conclusions observed from Table 6 are (a) the Corner method works well when the sample size is large, but it fares poorly in small to moderate sample sizes; (b) the ESACF method is more robust with respect to sample size and works reasonably well; and (c) the automatic method fares poorly for all sample sizes used. However, as noted in Section 4, the automatic method only provides a single model whereas multiple models are allowed for the other methods. Therefore, direct comparison of the identification power of the expert system with the other methods should be avoided.
For time series contaminated with a few outlying observations, the conclusions obtained from Table 5 are (a) no identification method in this study can resist extremely large outliers; (b) the adverse effects in model identification caused by outliers can generally be diluted by increasing the sample size; and (c) only the ESACF method can provide some resistance to moderate outliers when the sample size is large.
Finally, we should emphasize that our discussions are confined to the ARMA(2,1) model we have studied. Extrapolation to other situations should be done with caution.
Acknowledgements
The author is grateful to the anonymous referee for his/her helpful comments on an earlier version of this paper.
References
Beguin, J.M., Gourieroux, C., Monfort, A., 1980. Identification of a mixed autoregressive-moving average process: the corner method. In: Anderson, O.D. (Ed.), Time Series. North-Holland, Amsterdam.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis, Forecasting and Control, 2nd ed. Holden-Day, San Francisco.
Choi, B., 1992. ARMA Model Identification. Springer, New York.
Davies, N., Petruccelli, J.D., 1984. On the use of the general partial autocorrelation function for order determination in ARMA (p, q) processes. J. Amer. Statist. Assoc. 79, 374–377.
de Gooijer, J.G., Heuts, R.M.J., 1981. The corner method: an investigation of an order determination procedure for general ARMA processes. J. Oper. Res. Soc. 32, 1039–1046.
Gray, H.L., Kelley, G.D., McIntire, D.D., 1978. A new approach to ARMA modelling. Commun. Statist. B7, 1–77.
Lii, K.S., 1985. Transfer function model order and parameter estimation. J. Time Series Anal. 6, 153–169.
Liu, L.M., 1993. A new expert system for time series modelling and forecasting. ASA Proc. Business and Economic Statistics Section, 424–429.
Liu, L.M., Hanssens, D.M., 1982. Identification of multiple-input transfer function models. Commun. Statist. A11, 297–314.
Liu, L.M., Hudak, G.B., 1994. Forecasting and Time Series Analysis using the SCA Statistical System. Scientific Computing Associates Corp., P.O. Box 4692, Oak Brook, Illinois 60522.
Mareschal, B., Melard, G., 1988. The corner method for identifying autoregressive moving average models. Appl. Statist. 37, 301–316.
Petruccelli, J.D., Davies, N., 1984. Some restrictions on the use of corner method hypothesis tests. Commun. Statist. A13, 543–551.
Rezayat, F., Anandalingam, G., 1988. Using instrumental variables for selecting the order of ARMA models. Commun. Statist. A17, 3029–3065.
Tiao, G.C., Tsay, R.S., 1983. Multiple time series modelling and extended sample cross-correlation. J. Bus. Econom. Statist. 1, 43–56.
Tiao, G.C., Tsay, R.S., 1989. Model specification in multivariate time series. J. Roy. Statist. Soc. B51, 157–213.
Tsay, R.S., Tiao, G.C., 1984. Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models. J. Amer. Statist. Assoc. 79, 84–96.
Tsay, R.S., Tiao, G.C., 1985. Use of canonical analysis in time series model identification. Biometrika 72, 299–315.
Woodward, W.A., Gray, H.L., 1981. On the relationship between the S array and the Box-Jenkins method of ARMA model identification. J. Amer. Statist. Assoc. 76, 579–587.