Extreme Value Distribution of a Recursive–type Detector in a Linear Model
Alexander Aue¹ and Mario Kühn²
Abstract: We study a CUSUM–type monitoring scheme designed to sequentially detect
changes in the regression parameter of an underlying linear model. The test statistic
used is based on recursive residuals. The main aim of this paper is to derive the limiting
extreme value distribution under the null hypothesis of structural stability. The model
assumptions are flexible enough to include very general classes of error sequences such as
augmented GARCH(1,1) processes. The result is underlined by an illustrative simulation
study.
AMS 2000 Subject Classification: Primary 62J05; Secondary 62L10
Keywords and Phrases: linear models; recursive residuals; augmented GARCH processes; Darling–Erdős limit theorems; sequential testing.
¹ Department of Mathematics, University of Utah, 155 South 1440 East, Salt Lake City, UT 84112–0090, USA, email: aue@math.utah.edu
² Mathematisches Institut, Universität zu Köln, Weyertal 86–90, 50938 Köln, Germany, email: mkuehn@math.uni-koeln.de
Corresponding author: A. Aue, phone: +1–801–581–5231, fax: +1–801–581–4148.
Research partially supported by NATO grant PST.EAP.CLG 980599 and NSF–OTKA grant INT–0223262.
1 Introduction
Testing for structural stability of a time series is of major interest in statistics and also in
related areas such as engineering and business. Following Chu et al. (1996), we will focus
on data that are assumed to be generated by a linear model. Based on a historical data
set (of size m) we wish to perform an on–line inspection, that is, a sequential monitoring,
to check whether or not the assumption of structural stability is still justified. If so,
the monitoring is continued, since observations are regarded as freely [at least cheaply]
available. [Possible applications in economics and finance, and in geophysics can be found
in Chu et al. (1996) and Horváth et al. (2004), cf. also the references therein.]
In this setting, test statistics are defined in terms of suitably defined detectors Γ(m, k)
and thresholds g(m, k), where the null hypothesis of structural stability is rejected as soon
as a lag k is reached such that the absolute value of Γ(m, k) crosses the value of g(m, k),
that is at the stopping time
τm = inf {k ≥ 1 : |Γ(m, k)| ≥ g(m, k)} .
Note that the index k labels the time that has elapsed after the monitoring has commenced. Horváth et al. (2004) studied CUSUM–type detectors based on [recursive] residuals. Their approach, yielding asymptotic results (as m → ∞), included threshold functions of the form $g(m,k) = \sqrt{m}\,h(k/m)$, with h satisfying regularity conditions that assure a finite and non–degenerate limit process. Special emphasis is put on the family
\[
h_{\gamma}(t) = t^{\gamma}(1+t)^{1-\gamma}, \qquad 0 \le \gamma < \tfrac{1}{2}, \tag{1.1}
\]
which allows for a flexible sensitivity adjustment of the test procedure by choosing different values of γ. Now, simulation studies [cf. Horváth et al. (2004) and Aue et al. (2006b)]
and theoretical results for a change in the mean scenario [cf. Aue and Horváth (2004)]
imply that $h_{1/2}$—which, however, is excluded due to the law of the iterated logarithm—
would be a desirable boundary function in terms of an early detection of changes in the
regression parameter of the underlying linear model. A non–trivial extension leads to a
Darling–Erdős limit theorem for non–recursive residuals [cf. Horváth et al. (2005)].
Instead, we will deal with the recursive residuals here. They have played an important
role in statistics ever since their introduction by Brown et al. (1975), since they offer,
for example, the possibility to include the additional information obtained during the
monitoring process, while the non–recursive residuals only rely on the historical data and,
hence, no updating takes place even though the assumed model is still considered to be
adequate.
Moreover, we will allow for more flexible choices of the threshold h by defining only
the limit relations for very small and very large arguments. To be able to perform an
asymptotic analysis, we additionally need to introduce a maximal number of observations, which is allowed to grow at most like a power of the historical data size m and at least linearly.
Also, instead of independent and identically distributed errors, we consider a much
wider class of innovations which include, for instance, augmented GARCH(1,1) processes
as possible noise sequences. Actually, the proofs show that basically any sequence satisfying a [strong] invariance principle that allows for a blocking argument along the lines of
Aue et al. (2006a) could be used.
Our main result is an extreme value asymptotic for the recursive–type CUSUM detector
under the assumption of structural stability. The finite sample properties are reported in
a simulation study.
The paper is organized as follows. In Section 2 we introduce the model and the main
results. Simulation results are stated and discussed in Section 3, while proofs are given
in Sections 4 and 5.
2 Model assumptions and main results
(a) Assumptions and results. In what follows we study the linear model
\[
y_i = x_i^T \beta_i + \varepsilon_i, \qquad 1 \le i < \infty,
\]
where {xi } are p×1 dimensional random vectors, {β i } are p×1 dimensional deterministic
vectors and {εi } is some noise sequence. More specifically, we impose the following set of
conditions. Let | · | denote the maximum norm of both vectors and matrices.
(A) The noise sequence {εi } satisfies
\[
E\varepsilon_i = 0, \qquad 0 < E\varepsilon_i^2 \le C, \qquad E\varepsilon_i\varepsilon_j = 0 \quad (i \ne j),
\]
with some C > 0.
Moreover, we assume that the following invariance principles hold true. For each
m, there are independent Wiener processes {W1,m (t) : t ≥ 0} and {W2,m (t) : t ≥ 0}
and a constant σ > 0 such that
\[
\sup_{1 \le k < \infty} \frac{1}{k^{1/\nu}} \left| \sum_{i=m+1}^{m+k} \varepsilon_i - \sigma W_{1,m}(k) \right| = O_P(1) \qquad (m \to \infty) \tag{2.1}
\]
and
\[
\left| \sum_{i=1}^{m} \varepsilon_i - \sigma W_{2,m}(m) \right| = O_P\!\left(m^{1/\nu}\right) \qquad (m \to \infty) \tag{2.2}
\]
with some ν > 2. Confer part (b) of this section, where we show that augmented
GARCH(1,1) processes satisfy conditions (2.1) and (2.2).
(B) The sequences {εi } and {xi } are independent.
(C) For all i ≥ 1, it holds that $x_i^T = (1, x_{2,i}, \dots, x_{p,i})$.
(D) There is a positive definite matrix C and a constant κ > 0 such that
\[
\left| \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T - C \right| = O\!\left(n^{-\kappa}\right) \qquad (n \to \infty) \quad \text{a.s.}
\]
(E) There are random variables ξ, $m_0$ and a constant ρ > 0 such that
\[
\left| \sum_{i=m+1}^{m+k} (x_i - c_1) \right| \le \xi\,(m+k)^{1/2-\rho} + \sqrt{k \log N}
\]
for all 1 ≤ k ≤ N and m ≥ $m_0$, where $c_1$ is the first column of the matrix C defined in part (D) and N is the maximal number of observations [see also assumption (G) below].
Condition (E) is admittedly of a rather technical nature. However, Horváth et al. (2005) pointed
out that, under (G) below, it is satisfied for a large class of random variables, for instance,
if the {xi } fulfill a strong invariance principle.
The next assumption states that there is no change in the regression parameter in what is called a training period of size m, that is,

(F) $\beta_i = \beta_0$, $\quad 1 \le i \le m$.
Condition (F) is particularly important because the test statistic to be defined can use
this historical data set as basis for comparisons with later observations. We are interested
in testing the null hypothesis of structural stability
\[
H_0:\quad \beta_i = \beta_0, \qquad i = m+1, m+2, \dots, \tag{2.3}
\]
against the alternative hypothesis of a structural break
\[
H_A:\quad \begin{array}{l} \text{there is a } k^* \ge 1 \text{ such that } \beta_i = \beta_0 \text{ for } m < i < m + k^*, \\ \text{but } \beta_i = \beta^* \text{ for } i = m + k^*, m + k^* + 1, \dots \text{ with } \beta_0 \ne \beta^*, \end{array} \tag{2.4}
\]
where the parameters β 0 , β ∗ and k ∗ , the so–called change–point, are unknown.
Define the recursive residuals
\[
\tilde\varepsilon_i = y_i - x_i^T \hat\beta_{i-1}, \qquad m+1 \le i < \infty.
\]
In contrast to the non–recursive residuals investigated in Horváth et al. (2005), these
residuals use the additional information obtained from the observations ym+1 , . . . , yi−1 to
calculate the least–squares estimator β̂ i−1 , which, at time n ≥ 1, is given by
\[
\hat\beta_n = \left( \sum_{j=1}^{n} x_j x_j^T \right)^{-1} \sum_{j=1}^{n} x_j y_j.
\]
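In the simulations below we simply refit this estimator at every step. In practice, $\hat\beta_n$ can also be updated recursively via the Sherman–Morrison formula, a standard identity that the paper does not spell out; the following R sketch is ours, with hypothetical names rls.update, beta and P.

# Recursive least-squares update via the Sherman-Morrison formula.
# P is the running inverse of sum_{j<=n} x_j x_j^T; after the update,
# beta equals the least-squares estimator based on all n observations.
rls.update <- function(beta, P, x.new, y.new) {
  Px <- P %*% x.new
  P <- P - (Px %*% t(Px)) / c(1 + crossprod(x.new, Px))
  beta <- beta + P %*% x.new * c(y.new - crossprod(x.new, beta))
  list(beta = beta, P = P)
}

One would initialize P once on the historical sample, e.g. P <- solve(crossprod(x[1:m, ])), and then feed in one new observation per monitoring step.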
For k, m ≥ 1 define the CUSUM of the ε̃j by
\[
\tilde Q(m,k) = \tilde\varepsilon_{m+1} + \dots + \tilde\varepsilon_{m+k}
\]
and the stopping rule
\[
\tau_m = \inf\left\{ k \ge 1 : |\tilde Q(m,k)| \ge \sqrt{m}\, h\!\left(\frac{k}{m}\right) \right\}
\]
[with the understanding that inf ∅ = ∞], where h is a suitable boundary function. If h is positive and continuous on (0, ∞) and if it satisfies the growth conditions
\[
\lim_{t \to 0} \frac{t^{\gamma}}{h(t)} = 0, \qquad \limsup_{t \to \infty} \frac{\sqrt{t \log\log t}}{h(t)} < \infty
\]
with some $0 \le \gamma < \min\{\kappa, \tfrac{1}{2}\}$, then Horváth et al. (2004) proved that, under (B)–(F) with iid errors having more than two finite moments,
\[
\lim_{m \to \infty} P\left\{ \frac{1}{\hat\sigma_m} \sup_{k \ge 1} \frac{|\tilde Q(m,k)|}{\sqrt{m}\, h(k/m)} \le 1 \right\} = P\left\{ \sup_{t > 0} \frac{|W(t)|}{h(t)} \le 1 \right\},
\]
where {W(t) : t ≥ 0} denotes a standard Wiener process and $\hat\sigma_m$ a consistent estimator.
A modified proof, taking into account the specifications made in (A), establishes the same
result also for the more general errors that are under consideration here. Details may be
found in Kühn (2006), but, to conserve space, they are omitted here.
It can be seen, using the law of the iterated logarithm for Wiener processes at zero,
that γ < 1/2 is a necessary condition to obtain a non–degenerate and finite limit. Here,
we are interested in imposing weaker conditions on the threshold h. This goes along with
a modification of the stopping rule τm . Instead of a continued monitoring we will stop
according to
\[
\tilde\tau_m = \min\{\tau_m, N\}.
\]
Therein, N denotes a maximal number of observations for which we assume

(G) $N = O(m^{\lambda})$ with some $\lambda \ge 1$ and $\liminf_{m \to \infty} N/m > 0$.

Furthermore, the behavior of h for very small and very large arguments is changed to

(H) $\lim_{t \to 0} \sqrt{t(1+t)}/h(t) = a > 0$,

(I) $\limsup_{t \to \infty} \sqrt{t}/h(t) < \infty$,

(J) h is positive and continuous on (0, ∞).
Specific examples of boundary functions satisfying (H)–(J) are, for instance,
\[
h(t) = \sqrt{t(1+t)} \qquad\text{and}\qquad h_b(t) = \sqrt{(1+t)\left[b^2 + \log(1+t)\right]}.
\]
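In R, these two boundary families transcribe directly; a minimal sketch (the function names h and hb are ours):

h  <- function(t) sqrt(t * (1 + t))                      # boundary from (H)
hb <- function(t, b) sqrt((1 + t) * (b^2 + log(1 + t)))  # Lerche-type family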
The latter family of functions has the advantage that, in its case, the limit distribution, that of $\sup_{t>0} |W(t)|/h_b(t)$, is known, which can be used to control the probability of a false alarm.
More detailed expositions on this topic may be found in Lerche (1984). Here, however,
the main focus is on the following extreme value asymptotic for the new stopping rule τ̃m .
Theorem 2.1 Let the assumptions (A)–(J) be satisfied. Then, under $H_0$,
\[
\lim_{m \to \infty} P\left\{ A(\log m) \max_{1 \le k \le N} \frac{|\tilde Q(m,k)|}{\sigma a\, g(m,k)} - D(\log m) \le t \right\} = \exp\left(-e^{-t}\right)
\]
for all real t, where $\sigma^2 = E\varepsilon_1^2$, $g(m,k) = \sqrt{m}\, h(k/m)$ and, for x > 0,
\[
A(x) = \sqrt{2 \log x}, \qquad D(x) = 2 \log x + \frac{1}{2} \log\log x - \frac{1}{2} \log \pi.
\]
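For later use in the simulations, note that the Gumbel limit can be inverted to obtain an asymptotic critical value: solving $\exp(-e^{-t}) = 1 - \alpha$ gives $t_\alpha = -\log(-\log(1-\alpha))$, and $H_0$ is rejected once the normalized detector exceeds $(t_\alpha + D(\log m))/A(\log m)$. A minimal R sketch under the choice a = 1 (the function names are ours, not from the paper):

A <- function(x) sqrt(2 * log(x))
D <- function(x) 2 * log(x) + 0.5 * log(log(x)) - 0.5 * log(pi)
# Multiplier such that the test rejects as soon as
# |Q(m,k)| / (sigma.hat * g(m,k)) >= critical.value(m, alpha).
critical.value <- function(m, alpha = 0.05) {
  t.alpha <- -log(-log(1 - alpha))   # Gumbel (1 - alpha)-quantile
  (t.alpha + D(log(m))) / A(log(m))
}
critical.value(300)                  # e.g., m = 300 at the 5% level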
The application of Theorem 2.1 requires the estimation of the unknown variance parameter
$\sigma^2$. Since there is no change in the historical data, a natural estimator is
\[
\hat\sigma_m^2 = \frac{1}{m-p} \sum_{i=1}^{m} \hat\varepsilon_i^2, \qquad \hat\varepsilon_i = y_i - x_i^T \hat\beta_m.
\]
This leads to the following corollary.
Corollary 2.1 The statement of Theorem 2.1 remains true if σ is replaced by the estimator σ̂m .
The proofs of Theorem 2.1 and its Corollary 2.1 are given in Section 4.
(b) Augmented GARCH(1,1) processes. We follow the presentation in Aue et al.
(2006a). A process {εj } is called augmented generalized autoregressive conditional heteroskedastic of order (1, 1), shortly augmented GARCH(1,1), if it satisfies the relations
\[
\varepsilon_j = \sigma_j \xi_j, \tag{2.5}
\]
\[
\Lambda(\sigma_j^2) = w(\xi_{j-1}) + b(\xi_{j-1}) \Lambda(\sigma_{j-1}^2), \tag{2.6}
\]
where j runs through all integers, Λ, b and w are real–valued functions and {ξj } is a
sequence of independent, identically distributed random variables. To solve (2.6) for $\sigma_j^2$, we need $\Lambda^{-1}$ to exist.
Example 2.1 (i) On letting Λ(x) = x, w(x) = ω > 0 and $b(x) = \beta + \alpha x^2$, where α, β ≥ 0, we obtain the standard GARCH(1,1) model as introduced by Bollerslev (1986).
(ii) The exponential GARCH model of Nelson (1991) is defined by Λ(x) = log x, $w(x) = \omega + \alpha_1 x + \alpha_2 |x|$ and $b(x) = \beta$, where ω > 0, $\alpha_1, \alpha_2, \beta \ge 0$.
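To illustrate the recursion (2.5)–(2.6), the following R sketch simulates the standard GARCH(1,1) model of part (i); the parameter values anticipate DGP–2 of Section 3, while the initialization at the stationary variance and the function name are our own choices.

# Standard GARCH(1,1) via the augmented GARCH recursion (2.5)-(2.6)
# with Lambda(x) = x, w(x) = omega, b(x) = beta + alpha * x^2:
# sigma2[j] = omega + (beta + alpha * xi[j-1]^2) * sigma2[j-1].
simulate.garch11 <- function(n, omega = 0.01, alpha = 0.09, beta = 0.9,
                             burn.in = 100) {
  xi <- rnorm(n + burn.in)
  sigma2 <- numeric(n + burn.in)
  sigma2[1] <- omega / (1 - alpha - beta)   # stationary variance
  for (j in 2:(n + burn.in))
    sigma2[j] <- omega + (beta + alpha * xi[j - 1]^2) * sigma2[j - 1]
  eps <- sqrt(sigma2) * xi
  eps[(burn.in + 1):(n + burn.in)]          # discard the burn-in phase
}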
For plenty of further polynomial [$\Lambda(x) = x^{\delta}$ with some δ > 0] and exponential GARCH(1,1)
submodels [Λ(x) = log x] of considerable interest, which are subsumed under the notion
of augmented GARCH(1,1) processes, we refer to Aue et al. (2006a), and Carrasco and
Chen (2002).
Next, we give a sufficient criterion for the existence of a strictly stationary solution to
(2.5) and (2.6). The solution is called non–anticipative if, for any j, εj is independent of
{ξi : i > j}. Set log+ x = log(max{x, 1}).
Theorem 2.2 [Aue et al. (2006a)] Let E log+ |w(ξ0)| < ∞ and E log+ |b(ξ0 )| < ∞.
If E log |b(ξ0 )| < 0, then the defining equations (2.5) and (2.6) have a unique strictly
stationary and non–anticipative solution given by
\[
\varepsilon_j = \sqrt{\Lambda^{-1}(\Lambda(\sigma_j^2))}\,\xi_j, \qquad \Lambda(\sigma_j^2) = \sum_{i=1}^{\infty} w(\xi_{j-i}) \prod_{k=1}^{i-1} b(\xi_{j-k}).
\]
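For instance, specializing the solution formula to the standard GARCH(1,1) model of Example 2.1(i), where Λ(x) = x, w ≡ ω and $b(x) = \beta + \alpha x^2$, yields
\[
\sigma_j^2 = \omega \sum_{i=1}^{\infty} \prod_{k=1}^{i-1} \left( \beta + \alpha\, \xi_{j-k}^2 \right),
\]
and the condition $E \log |b(\xi_0)| < 0$ becomes the familiar strict stationarity condition $E \log(\beta + \alpha \xi_0^2) < 0$ for GARCH(1,1) processes.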
The following result gives conditions for the existence of moments of Λ(σj2 ).
Theorem 2.3 [Aue et al. (2006a)] Let ν > 0 and $E \log |b(\xi_0)| < 0$. If $E|w(\xi_0)|^{\nu} < \infty$ and $E|b(\xi_0)|^{\nu} < 1$, then $E|\Lambda(\sigma_j^2)|^{\nu} < \infty$.
Since σj and ξj are independent, it is clear that the previous theorem also implies a
criterion for the finiteness of $E|\varepsilon_1|^{\nu}$ by additionally imposing the condition $E|\xi_0|^{\nu} < \infty$.
Both theorems have a necessity counterpart which is suppressed here, but can be found
in Aue et al. (2006a).
To prove Theorem 2.1 it is necessary to obtain more insight into the structure of the
augmented GARCH errors. In particular, we need an approximation of their partial sums
with a Wiener process.
Proposition 2.1 If $\{\varepsilon_i\}$ satisfies (2.5), (2.6) and the conditions of Theorem 2.2 with $E|\varepsilon_1|^{\nu} < \infty$ for some ν > 2, then, as k → ∞,
\[
\frac{1}{k^{1/\nu}} \left| \sum_{i=1}^{k} \varepsilon_i - \sigma W(k) \right| = O(1) \qquad \text{a.s.},
\]
where {W(t) : t ≥ 0} denotes a Wiener process.
Proposition 2.1 is based on the general strong invariance principle in Eberlein (1986). A
refinement states that the partial sums of the errors ε1 , . . . , εm of the training period and
those of the errors εm+1 , . . . , εm+k appearing after the monitoring process has commenced
can be approximated by independent Wiener processes [confer assumptions (2.1) and
(2.2)]. Towards this end, we need, in case of a polynomial–type GARCH model, to
impose the following technical assumptions on the function Λ.
• $\Lambda(\sigma_0^2) \ge \Delta$ with some Δ > 0;

• Λ′ exists, is nonnegative, and there are constants C and µ such that
\[
\frac{1}{\Lambda'(\Lambda^{-1}(x))} \le C x^{\mu} \qquad \text{for all } x \ge \Delta. \tag{2.7}
\]
Obviously, these conditions are violated for exponential–type GARCH errors. Fortunately, they are not required in that case [cf. the proofs of Lemmas 5.1–5.3 below]. Throughout we work either
with a polynomial–type sequence {εj } satisfying these conditions or with an exponential
GARCH process without further restrictions.
Proposition 2.2 If $\{\varepsilon_i\}$ satisfies (2.5), (2.6) and the conditions of Theorem 2.2 with $E|\varepsilon_1|^{\nu} < \infty$ for some ν > 2, then, for each m, there are independent Wiener processes
{W1,m (t) : t ≥ 0} and {W2,m (t) : t ≥ 0} such that conditions (2.1) and (2.2) are satisfied.
Proposition 2.2 is proved via an exponential inequality for the difference between σj and
a suitably constructed variable from a sequence of blockwise independent variables which
is “close” to the original sequence {σj }. Towards this end, let ζ ∈ (0, 1) and define the
sequence $\{\tilde\sigma_k^2\}$ as the solution of the equations
\[
\Lambda(\tilde\sigma_j^2) = \sum_{i=1}^{j^{\zeta}} w(\xi_{j-i}) \prod_{k=1}^{i-1} b(\xi_{j-k}). \tag{2.8}
\]
Now, set $\tilde\varepsilon_j = \tilde\sigma_j \xi_j$. Then $\tilde\varepsilon_i$ and $\tilde\varepsilon_k$ are independent if $i < k - k^{\zeta}$, since $\tilde\varepsilon_k$ depends only on $\xi_k, \xi_{k-1}, \dots$ back to lag $k^{\zeta}$. We will show
that
\[
\max_{1 \le n < \infty} \left| \sum_{i=1}^{n} \varepsilon_i - \sum_{i=1}^{n} \tilde\varepsilon_i \right| = O(1) \qquad \text{a.s.},
\]
so that it suffices to approximate the partial sums of the ε̃j , which, in turn, enables us
to define Wiener processes that are independent. The detailed proofs of Propositions 2.1
and 2.2 are given in Section 5.
3 Simulations
In this section, we report the results of a simulation study that includes both independent,
identically distributed innovations and GARCH(1,1) errors as data generating processes
(DGPs). The finite sample properties obtained here underline the empirical observations
as found in earlier contributions by Horváth et al. (2004) and Aue et al. (2006b), which deal with open–end CUSUM–type procedures based on the threshold family
\[
g_{\gamma}(m,k) = \sqrt{m}\, h_{\gamma}\!\left(\frac{k}{m}\right), \qquad \gamma \in [0, \tfrac{1}{2}),
\]
where hγ (t) is defined in (1.1). In greater generality, however, we will not merely consider
a change in the mean model or a simple linear regression, respectively, but a linear model
with three–dimensional (random) xi and (nonrandom) β i . The simulations are performed
using the free software R, see http://cran.r-project.org for more information.
(a) Choice of data generating processes. We study the following two data generating
processes based on the linear model yi = xTi β i + εi that differ in the assumptions made
on the innovations εi . In particular, we use
• DGP–1: {εi } are independent, standard normally distributed random variables.
• DGP–2: {εi} are GARCH(1,1) variables, that is, $\varepsilon_i = \sigma_i \xi_i$, where {ξi} is a sequence of independent, standard normally distributed random variables, with parameter specifications ω = 0.01, α = 0.09, β = 0.9.
These choices imply, since α + β < 1, that {εi } is a strictly stationary sequence
with Var εi = 1. We use a standard burn–in phase of 100 repetitions initialized with
a standard normal random variable to approach the stationary distribution before
starting the actual simulations. The programming is done with the fSeries package
which is available under http://www.itp.phys.ethz.ch/econophysics/R/2.2.
The parameter specifications are similar to what is often observed for financial data,
where α tends to be close to 0 while β is close to 1. Also, the moment conditions
needed to ensure that (2.1) and (2.2) hold are satisfied.
Moreover, it is assumed that $x_i = (1, x_{2,i}, x_{3,i})^T$ with $\{x_{2,i}\}$ and $\{x_{3,i}\}$ being independent sequences of independent standard normally distributed random variables. Consequently, the $\beta_i$ are three–dimensional vectors; we choose $\beta_0 = (1, 1, 1)^T$ under $H_0$.
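The following R sketch generates one data set of length n from this design under $H_0$; the function name is ours, and simulate.garch11 refers to the sketch given in Section 2(b) above.

# One data set: intercept plus two independent N(0,1) regressors,
# beta0 = (1,1,1), and errors from DGP-1 or DGP-2.
generate.data <- function(n, dgp = c("dgp1", "dgp2")) {
  dgp <- match.arg(dgp)
  x <- cbind(1, matrix(rnorm(2 * n), ncol = 2))   # x_i = (1, x_{2,i}, x_{3,i})
  eps <- if (dgp == "dgp1") rnorm(n) else simulate.garch11(n)
  list(x = x, y = as.vector(x %*% c(1, 1, 1)) + eps)
}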
(b) Empirical Sizes. We compare the asymptotic extreme value tests based on the non–
recursive detector investigated in Horváth et al. (2005) with those based on the recursive
detector studied in the present paper. The detector in Horváth et al. (2005) is based on
the non–recursive CUSUM
\[
\hat Q(m,k) = \hat\varepsilon_{m+1} + \dots + \hat\varepsilon_{m+k}, \qquad \hat\varepsilon_i = y_i - x_i^T \hat\beta_m \quad (m+1 \le i < \infty),
\]
and leads to the stopping rule
\[
\hat\tau_m = \inf\left\{ k \ge 1 : |\hat Q(m,k)| \ge \sqrt{m}\, h\!\left(\frac{k}{m}\right) \right\},
\]
where $h(t) = \sqrt{t(1+t)}$. The same threshold h(t) will also be used in the recursive setting.
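Both stopping rules can be implemented in a few lines of R. The sketch below recomputes the least–squares fit at every step for clarity rather than speed (qr.solve returns the least–squares solution for rectangular systems) and uses the critical.value multiplier from Section 2; all names are our own.

h <- function(t) sqrt(t * (1 + t))
# First exceedance times of the recursive and non-recursive CUSUM
# detectors with boundary crit * sigma.hat * sqrt(m) * h(k/m).
monitor <- function(x, y, m, N, crit) {
  beta.hat <- function(n) qr.solve(x[1:n, , drop = FALSE], y[1:n])
  b.m <- beta.hat(m)
  sigma.hat <- sqrt(sum((y[1:m] - x[1:m, ] %*% b.m)^2) / (m - ncol(x)))
  Q.rec <- Q.non <- 0
  tau.rec <- tau.non <- Inf
  for (k in 1:N) {
    i <- m + k
    Q.rec <- Q.rec + y[i] - sum(x[i, ] * beta.hat(i - 1))  # recursive residual
    Q.non <- Q.non + y[i] - sum(x[i, ] * b.m)              # non-recursive residual
    bound <- crit * sigma.hat * sqrt(m) * h(k / m)
    if (is.infinite(tau.rec) && abs(Q.rec) >= bound) tau.rec <- k
    if (is.infinite(tau.non) && abs(Q.non) >= bound) tau.non <- k
    if (is.finite(tau.rec) && is.finite(tau.non)) break
  }
  c(recursive = tau.rec, non.recursive = tau.non)
}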
At first, we consider the performance under the null hypothesis of no change in the
regression parameter. We have performed simulations using different sizes of the training period, namely m = 25, 100, 300, and report the rate of false alarms up to the maximal number of observations N, chosen as a multiple of m, at which the monitoring was terminated.
The procedure is based on 2,500 replications. An overview can be found in Tables 1 and 2. It can be seen that the empirical sizes of the tests performed on DGP–1 stay far below the asymptotic critical levels 0.05 and 0.10, while those performed on DGP–2 have approximately the asymptotic level predicted by our theory. It should be noted that both procedures used the same set of observations. While, on one hand, one can expect tests coming from independent, identically distributed data to be better than those applied to dependent data, it is, on the other hand, quite remarkable that even for the fixed m under consideration the empirical sizes for the GARCH(1,1) errors are relatively close to the asymptotic level. This is even more surprising, since it is well known that the convergence rates for extreme value asymptotics are very slow.

Furthermore, it is also immediate from Tables 1 and 2 that the tests based on the recursive residuals have a lower false alarm rate than their respective non–recursive counterparts.

                    recursive            non–recursive
   m     N      α = 0.05  α = 0.10    α = 0.05  α = 0.10
   25    2m     0.0072    0.0360      0.0196    0.0760
         4m     0.0096    0.0332      0.0216    0.0820
         6m     0.0080    0.0336      0.0228    0.0772
         8m     0.0084    0.0324      0.0232    0.0772
         10m    0.0092    0.0288      0.0308    0.0764
   100   2m     0.0128    0.0276      0.0232    0.0532
         4m     0.0100    0.0396      0.0216    0.0712
         6m     0.0052    0.0364      0.0168    0.0692
         8m     0.0132    0.0336      0.0256    0.0656
         10m    0.0064    0.0356      0.0148    0.0636
   300   2m     0.0120    0.0325      0.0216    0.0568
         4m     0.0048    0.0364      0.0112    0.0592
         6m     0.0112    0.0312      0.0220    0.0636
         8m     0.0096    0.0332      0.0200    0.0552
         10m    0.0116    0.0352      0.0228    0.0628

Table 1: Empirical sizes of detectors with N(0, 1) innovations (DGP–1).
                    recursive            non–recursive
   m     N      α = 0.05  α = 0.10    α = 0.05  α = 0.10
   25    2m     0.0404    0.0644      0.0580    0.0920
         4m     0.0320    0.0760      0.0512    0.1220
         6m     0.0400    0.0620      0.0576    0.1040
         8m     0.0400    0.0604      0.0600    0.0968
         10m    0.0404    0.0728      0.0576    0.1044
   100   2m     0.0460    0.0688      0.0624    0.0968
         4m     0.0400    0.0704      0.0532    0.0928
         6m     0.0420    0.0776      0.0616    0.1012
         8m     0.0412    0.0680      0.0616    0.1000
         10m    0.0424    0.0688      0.0568    0.0996
   300   2m     0.0520    0.0772      0.0668    0.0932
         4m     0.0464    0.0776      0.0608    0.0992
         6m     0.0560    0.0820      0.0692    0.1012
         8m     0.0528    0.0996      0.0688    0.1224
         10m    0.0512    0.0776      0.0668    0.1008

Table 2: Empirical sizes of detectors with GARCH(1,1) innovations (DGP–2).
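Combining the sketches above, the empirical size of, say, the recursive detector can be estimated by Monte Carlo along the following lines; 2,500 replications as in the tables, although a smaller reps keeps the illustration fast.

empirical.size <- function(m, N.mult = 2, reps = 2500, alpha = 0.05,
                           dgp = "dgp1") {
  N <- N.mult * m
  crit <- critical.value(m, alpha)
  mean(replicate(reps, {
    d <- generate.data(m + N, dgp)
    is.finite(monitor(d$x, d$y, m, N, crit)["recursive"])  # TRUE = false alarm
  }))
}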
(c) Empirical Power. To investigate the behavior of the stopping rules under the alternative, we consider different choices of β*. In particular, we have used the specifications $\beta^*_i = 1.1, 1.2, 1.3, 1.5$ for i = 1, 2, 3 and—as worst case scenario—$\beta^* = (1, 0.5, 1.5)^T$. In the latter case, the second and third components change in different directions with the same absolute size. The comparisons in Tables 3 and 4 show that, under $H_A$, the non–recursive detectors are superior to the recursive ones in terms of earlier detection; a fact that is also visible in the corresponding density estimates in Figure 1, where $\beta^* = (1.3, 1.3, 1.3)^T$. They are especially more sensitive when it comes to detecting smaller changes, while the recursive tests turn out to be very conservative.

That both tests should only be applied with some care is underlined by the poor performance in the worst case scenario. Here, both stopping rules have extremely low power but, yet again, the non–recursive ones perform better.
(d) Conclusions. Summarizing the observed phenomena, a rule of thumb could be derived as follows. If an early detection of a change is extremely important, one should
choose a non–recursive detector, due to the higher empirical power. If, on the other hand,
a false alarm is costly and time consuming, then, due to their better empirical sizes, recursive detectors should be preferred. One might also think of a parallel monitoring using
both tests, with a deeper statistical analysis commencing after one detector but not the
other rejects the stability hypothesis. This, however, is not the subject of the present study.
There is an expected decay in the quality of the monitoring schemes when the errors
exhibit dependence, here conditional heteroskedasticity governed by a GARCH(1,1) sequence. This difference, however, does not violate the asymptotic level predicted by the
limiting extreme value theorems, so that the tests can be easily used even in the finite
sample case.
4 Proofs of Theorem 2.1 and Corollary 2.1
The first proof section is divided into two parts. First, we will transform the detector
into a more suitable form, which can be approximated by a Wiener process. A functional
of this Wiener process is studied to derive the limiting extreme value distribution in the
second part.
Throughout the proofs, we will make use of the following basic properties of the inverse matrix $C_n^{-1}$, where
\[
C_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T.
\]

Lemma 4.1 If condition (D) is satisfied for some κ > 0, then it also holds with probability one, as n → ∞, that
\[
|C_n^{-1} - C^{-1}| = O\!\left(n^{-\kappa}\right).
\]

Proof. It is clear that the assertion follows directly from assumption (D). □
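For instance, one way to fill in the step is the resolvent identity: since $C_n \to C$ a.s. and C is positive definite, $|C_n^{-1}|$ is a.s. bounded for large n, and
\[
C_n^{-1} - C^{-1} = C_n^{-1}(C - C_n)C^{-1}, \qquad\text{hence}\qquad |C_n^{-1} - C^{-1}| \le p^2\, |C_n^{-1}|\, |C_n - C|\, |C^{-1}| = O\!\left(n^{-\kappa}\right) \quad \text{a.s.},
\]
where the factor $p^2$ accounts for the maximum norm not being submultiplicative.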
All proofs will be two–fold, owing to the fact that the threshold function h is defined only via its limiting behavior for t → 0 and t → ∞. Moreover, we will assume that a = 1 in (H). Finally, note that, by assumption (G), without loss of generality N > cm with some constant c > 0.
(a) Transformation of the detector. Using the definition of the recursive residuals
$\tilde\varepsilon_i$, we obtain
\[
\tilde Q(m,k) = \sum_{i=m+1}^{m+k} \varepsilon_i - \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T C_{i-1}^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j.
\]
Our first goal is to simplify the second term on the right–hand side of the latter equation.
It turns out that, under the assumptions made, it can be replaced by the much simpler
expression
\[
\sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{i-1} \varepsilon_j,
\]
which contains neither any $x_i$ nor any inverse matrix $C_{i-1}^{-1}$. Towards this end, write
\[
\sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T C_{i-1}^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j - \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{i-1} \varepsilon_j \tag{4.1}
\]
\[
= \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T \left[ C_{i-1}^{-1} - C^{-1} \right] \sum_{j=1}^{i-1} x_j \varepsilon_j + \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, [x_i - c_1]^T C^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j.
\]
Therein, we have applied that $c_1^T C^{-1} = (1, 0, \dots, 0)$ and the special form of the $x_j$ given in assumption (C), which implies $c_1^T C^{-1} x_j \varepsilon_j = \varepsilon_j$. It has to be shown that the difference in (4.1) is asymptotically small.
Lemma 4.2 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,
\[
\sup_{1 \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T \left[ C_{i-1}^{-1} - C^{-1} \right] \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = o_P(1).
\]
Proof. (i) By independence of {εj} and {xj} [see assumption (B)], $Ex_j \varepsilon_j = 0$ and $\operatorname{Var}(x_{j,\ell}\, \varepsilon_j) \le C |x_j|^2$. So Kolmogorov's maximal inequality [see, e.g., Durrett (2005, p. 61)] implies
\[
\max_{1 \le i \le (c+1)m} \left| \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = O_P\!\left(\sqrt{m}\right) \qquad (m \to \infty). \tag{4.2}
\]
Since the boundary function g(m,k) is for "small" k determined by the relation (H), it follows after an application of Lemma 4.1 and assumption (E) that
\[
\max_{1 \le k \le cm} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T \left[ C_{i-1}^{-1} - C^{-1} \right] \sum_{j=1}^{i-1} x_j \varepsilon_j \right|
\]
\[
= O_P\!\left(\sqrt{\log m}\right) \max_{1 \le k \le cm} \frac{1}{\sqrt{k}} \left| \sum_{i=m+1}^{m+k} (i-1)^{-\kappa-1} [x_i - c_1]^T \right| + O_P\!\left(\sqrt{\log m}\right) \max_{1 \le k \le cm} \frac{1}{\sqrt{k}} \left| \sum_{i=m+1}^{m+k} (i-1)^{-\kappa-1} c_1^T \right| = o_P(1),
\]
finishing the first part of the proof.
(ii) Exercise 7.4.10 in Chow and Teicher (1988, p. 249) yields that, for any δ, δ* > 0, there is a random variable m* such that
\[
\left| \sum_{j=1}^{i-1} x_j \varepsilon_j \right| \le \delta^* \sqrt{(i-1)\,[\log(i-1)]^{\delta}} \tag{4.3}
\]
for all m ≥ m*. Hence, we can find a random variable ξ such that
\[
\sum_{i=m+1}^{m+k} |x_i|\, i^{-\kappa-1} \left| \sum_{j=1}^{i-1} x_j \varepsilon_j \right| \le \xi \sum_{i=m+1}^{m+k} |x_i|\, i^{-\kappa-1/2} (\log i)^{1/2+\delta} \le \xi\, [\log(m+k)]^{1/2+\delta} \sum_{i=m+1}^{m+k} |x_i|\, i^{-\kappa-1/2}. \tag{4.4}
\]
By Abel's summation formula [cf. Milne–Thomson (1933, p. 276)],
\[
\sum_{i=m+1}^{m+k} |x_i|\, i^{-\kappa-1/2} = \sum_{i=m+1}^{m+k} \left[ i^{-\kappa-1/2} - (i+1)^{-\kappa-1/2} \right] \sum_{j=m+1}^{i} |x_j| + (m+k+1)^{-\kappa-1/2} \sum_{j=m+1}^{m+k} |x_j|
\]
\[
\le \sum_{i=m+1}^{m+k} i^{-\kappa-3/2} \sum_{j=m+1}^{i} |x_j| + (m+k+1)^{-\kappa-1/2} \sum_{j=m+1}^{m+k} |x_j| \le \left( \sup_{\ell \ge 1} \frac{1}{\ell} \sum_{j=1}^{\ell} |x_j| \right) \left( \sum_{i=m+1}^{m+k} i^{-\kappa-1/2} + (m+k+1)^{-\kappa+1/2} \right). \tag{4.5}
\]
Combining the previous statements, for "large" k we obtain that, as m → ∞,
\[
\max_{cm \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, x_i^T \left[ C_{i-1}^{-1} - C^{-1} \right] \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = O_P(1) \max_{cm \le k \le N} \sqrt{\frac{m+k}{k}}\, \frac{[\log(m+k)]^{1/2+\delta}}{(m+k)^{\kappa}} = o_P(1).
\]
Therein, we have used (4.4), (4.5) and Lemma 4.1 to obtain the first equality sign. The second one follows after simple arithmetic operations. Hence, the proof is complete. □
Lemma 4.3 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,
\[
\max_{1 \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, [x_i - c_1]^T C^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = o_P(1).
\]
Proof. (i) Observe that, by assumption (G), log N = O(log m). Hence, applying (4.2) yields
\[
\max_{1 \le k \le cm} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, [x_i - c_1]^T C^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = O_P(1) \max_{1 \le k \le cm} \frac{1}{\sqrt{k(m+k)}} \left| \sum_{i=m+1}^{m+k} [x_i - c_1] \right|.
\]
Using (E), the right–hand side can be further estimated by
\[
O(1) \max_{1 \le k \le cm} \frac{(m+k)^{1/2-\rho} + \sqrt{k \log m}}{\sqrt{k(m+k)}} = O(1) \left( m^{-\rho} + \sqrt{\frac{\log m}{m}} \right) = o(1)
\]
as m → ∞.
(ii) Similarly, by (4.3) and assumption (E) it holds that, as m → ∞,
\[
\max_{cm \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1}\, [x_i - c_1]^T C^{-1} \sum_{j=1}^{i-1} x_j \varepsilon_j \right| = O_P(1) \max_{cm \le k \le N} \frac{[\log(m+k)]^{1/2+\delta}}{\sqrt{km}} \left| \sum_{i=m+1}^{m+k} [x_i - c_1] \right|
\]
\[
= O_P(1) \max_{cm \le k \le N} [\log(m+k)]^{1/2+\delta} \left( \sqrt{\frac{m+k}{k}}\, \frac{(m+k)^{-\rho}}{\sqrt{m}} + \sqrt{\frac{\log m}{m}} \right) = o_P(1).
\]
Thus, the proof is complete. □
So far, we have shown that $\tilde Q(m,k)$ can be asymptotically replaced by
\[
Q^*(m,k) = \sum_{i=m+1}^{m+k} \varepsilon_i - \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{i-1} \varepsilon_j.
\]
Next, $Q^*(m,k)$ will be approximated by a Wiener process.
Lemma 4.4 If the assumptions of Theorem 2.1 are satisfied, then, for all m ≥ 1, there is a Wiener process {Wm(t) : t ≥ 0} such that
\[
\max_{a_m \le k \le N} \frac{1}{g(m,k)} \left| Q^*(m,k) - \sigma W_m(k) \right| = O_P\!\left(a_m^{1/\nu - 1/2}\right) \qquad (m \to \infty),
\]
where $a_m = O(m)$ and ν > 2.
Proof. (i) Rewriting
\[
Q^*(m,k) = \sum_{i=m+1}^{m+k} \left( \varepsilon_i - \frac{1}{i-1} \sum_{j=m+1}^{i-1} \varepsilon_j \right) - \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{m} \varepsilon_j, \tag{4.6}
\]
it can be seen that the first term on the right–hand side contains only innovations $\varepsilon_i$ with time index i ≥ m + 1, while the second term only contains errors $\varepsilon_i$ for which i ≤ m. Hence, the invariance principles (2.1) and (2.2) can be utilized to approximate all partial sums in (4.6) in the following.
(ii) First note that from (i) and the invariance principle (2.1) we obtain, as m → ∞,
\[
\max_{a_m \le k \le cm} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \varepsilon_i - \sigma W_{1,m}(k) \right| = O_P(1) \max_{a_m \le k \le cm} \sqrt{\frac{m}{m+k}}\, k^{1/\nu - 1/2} = O_P\!\left(a_m^{1/\nu - 1/2}\right)
\]
on account of 1/ν − 1/2 < 0. Similarly,
\[
\max_{cm \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \varepsilon_i - \sigma W_{1,m}(k) \right| = O_P\!\left(a_m^{1/\nu - 1/2}\right)
\]
as m → ∞, since $a_m = O(m)$.
(iii) From (i) and (2.1),
\[
\left| \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=m+1}^{i-1} \varepsilon_j - \sigma \sum_{i=m+1}^{m+k} \frac{1}{i}\, W_{1,m}(i-m) \right|
\]
\[
= \left| \sum_{i=m+1}^{m+k} \left( \frac{1}{i-1} - \frac{1}{i} \right) \left( \sum_{j=m+1}^{i} \varepsilon_j - \varepsilon_i \right) + \sum_{i=m+1}^{m+k} \frac{1}{i} \left( \sum_{j=m+1}^{i} \varepsilon_j - \varepsilon_i - \sigma W_{1,m}(i-m) \right) \right| = O_P(1) \sum_{i=m+1}^{m+k} \frac{(i-m)^{1/\nu}}{i} = O_P(1)\, k^{1/\nu} \log\!\left( \frac{m+k}{m} \right).
\]
Hence,
\[
\max_{a_m \le k \le cm} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=m+1}^{i-1} \varepsilon_j - \sigma \sum_{i=m+1}^{m+k} \frac{1}{i}\, W_{1,m}(i-m) \right| = O_P(1) \max_{a_m \le k \le cm} \sqrt{\frac{m}{m+k}}\, k^{1/\nu - 1/2} \log\!\left( \frac{m+k}{m} \right) = O_P\!\left(a_m^{1/\nu - 1/2}\right)
\]
and
\[
\max_{cm \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=m+1}^{i-1} \varepsilon_j - \sigma \sum_{i=m+1}^{m+k} \frac{1}{i}\, W_{1,m}(i-m) \right| = O_P(1) \max_{cm \le k \le N} k^{1/\nu - 1/2} \log\!\left( \frac{m+k}{m} \right) = O_P\!\left(a_m^{1/\nu - 1/2}\right),
\]
since $a_m = O(m)$. All results have to be viewed as m → ∞.
(iv) Observe that, for |t| < 1,
\[
\log(1+t) = t - \frac{t^2}{2} + \frac{t^3}{3} \mp \dots
\]
Hence, as in part (iii) of the proof,
\[
\max_{a_m \le k \le cm} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{m} \varepsilon_j - \sigma \sum_{i=m+1}^{m+k} \frac{1}{i}\, W_{2,m}(m) \right| = O_P(1) \max_{a_m \le k \le cm} \sqrt{\frac{m}{m+k}}\, \frac{m^{1/\nu}}{\sqrt{k}} \log\!\left( \frac{m+k}{m} \right) = O_P\!\left(a_m^{1/\nu - 1/2}\right)
\]
and
\[
\max_{cm \le k \le N} \frac{1}{g(m,k)} \left| \sum_{i=m+1}^{m+k} \frac{1}{i-1} \sum_{j=1}^{m} \varepsilon_j - \sigma \sum_{i=m+1}^{m+k} \frac{1}{i}\, W_{2,m}(m) \right| = O_P(1) \max_{cm \le k \le N} \frac{m^{1/\nu}}{\sqrt{k}} \log\!\left( \frac{m+k}{m} \right) = O_P\!\left(a_m^{1/\nu - 1/2}\right)
\]
as m → ∞.
(v) For k ≥ 1 set
\[
W_m(k) = W_{1,m}(k) - \sum_{i=m+1}^{m+k} \frac{1}{i} \left[ W_{1,m}(i-m) + W_{2,m}(m) \right].
\]
We have just shown in parts (ii)–(iv) of the proof that
\[
\max_{a_m \le k \le N} \frac{1}{g(m,k)} \left| Q^*(m,k) - \sigma W_m(k) \right| = O_P\!\left(a_m^{1/\nu - 1/2}\right).
\]
It is easy to see that the finite dimensional distributions of {Wm(k) : k ≥ 1} are normal and that $EW_m(k) = 0$ for all k ≥ 1. Lengthy elementary calculations also show that $EW_m(k) W_m(\ell) = \min\{k, \ell\}$. Hence, the proof is complete. □
From the previous calculations it is immediate that we have to derive the limiting extreme value distribution of the process $\{g(m,k)^{-1} W_m(k) : k \ge 1\}$. This will be approached in the next subsection.

(b) Derivation of the extreme value distribution. Let {W(t) : t ≥ 0} be a Wiener process. Set
\[
\Gamma(t_k) = \frac{|W(t_k)|}{h(t_k)}, \qquad t_k = \frac{k}{m}.
\]
From part (a) of the proof we note that, by the scale transformation for Wiener processes,
\[
\max_{1 \le k \le N} \frac{|W_m(k)|}{g(m,k)} \stackrel{\mathcal{D}}{=} \max_{1 \le k \le N} \Gamma(t_k).
\]
The main idea in the upcoming proofs is to show that Γ(t_k) attains its maximum for very small values of k. But in this range h(t_k) is determined by $\sqrt{t_k(1+t_k)}$ [cf. assumption (H)] and can hence be estimated by $\sqrt{t_k}$, which in turn makes possible the application of a result of Darling and Erdős (1956). We start with a lemma identifying the dominating k and thereby also the order of the maximum.
Lemma 4.5 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,
\[
\text{(i)}\quad \frac{1}{\sqrt{2 \log\log m}} \max_{1 \le k \le a_m} \Gamma(t_k) \stackrel{P}{\longrightarrow} 0,
\]
\[
\text{(ii)}\quad \frac{1}{\sqrt{2 \log\log m}} \max_{a_m \le k \le b_m} \Gamma(t_k) \stackrel{P}{\longrightarrow} 1,
\]
\[
\text{(iii)}\quad \frac{1}{\sqrt{2 \log\log m}} \max_{b_m \le k \le N} \Gamma(t_k) \stackrel{P}{\longrightarrow} 0,
\]
where we choose $a_m = (\log m)^{\delta}$, with some δ > 0, in accordance with Lemma 4.4, and $b_m = cm/\log m$.
Proof. (i) Let ε > 0 and set $s_k = 1 + t_k$. Then, for the k under consideration, $1 \le s_k \le 1 + a_m/m \to 1$ as m → ∞. Thus, we can find a random variable $m_0$ such that
\[
\frac{1}{\sqrt{1+\varepsilon}} \max_{1 \le k \le a_m} \frac{|W(t_k)|}{\sqrt{t_k}} \le \max_{1 \le k \le a_m} \frac{|W(t_k)|}{\sqrt{t_k s_k}} \le \max_{1 \le k \le a_m} \frac{|W(t_k)|}{\sqrt{t_k}}
\]
for all m ≥ $m_0$, where we have used relation (H) to estimate $h(t_k)$. But, by the scale transformation for Wiener processes,
\[
\max_{1 \le k \le a_m} \frac{|W(t_k)|}{\sqrt{t_k}} \stackrel{\mathcal{D}}{=} \max_{1 \le k \le a_m} \frac{|W(k)|}{\sqrt{k}}
\]
and hence the law of the iterated logarithm for partial sums yields
\[
\frac{1}{\sqrt{2 \log\log a_m}} \max_{1 \le k \le a_m} \frac{|W(t_k)|}{\sqrt{t_k}} \stackrel{P}{\longrightarrow} 1 \qquad (m \to \infty).
\]
On recognizing that $\log\log a_m \sim \log\log\log m$ as m → ∞, assertion (i) follows from the sandwich theorem on letting ε ց 0. [We say that $x_n \sim y_n$ if $x_n y_n^{-1} \to 1$ as n → ∞.]

(ii) Let ε > 0 and set $s_k = 1 + t_k$. Still, $1 \le s_k \le 1 + b_m/m \to 1$ as m → ∞. Hence, we can similarly define a random variable $m_1$ such that
\[
\frac{1}{\sqrt{1+\varepsilon}} \max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k}} \le \max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k s_k}} \le \max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k}}.
\]
Using the scale transformation for Wiener processes and the law of the iterated logarithm as in part (i),
\[
\frac{1}{\sqrt{2 \log\log m}} \max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k}} \stackrel{P}{\longrightarrow} 1 \qquad (m \to \infty),
\]
since $\log\log b_m \sim \log\log m$. Letting ε ց 0 in combination with (i) finishes the proof.
(iii) Recall that without loss of generality N > cm, where c > 0. Assumptions (H) and (J) imply that the maximum of $\Gamma(t_k) = |W(t_k)|/h(t_k)$ can be estimated by
\[
\max_{b_m \le k \le cm} \frac{|W(t_k)|}{\sqrt{t_k s_k}} \le \max_{b_m \le k \le cm} \frac{|W(t_k)|}{\sqrt{t_k}} \le \sup_{b_m/2m \le t \le c} \frac{|W(t)|}{\sqrt{t}}.
\]
An application of Lemmas 1.1 and 1.2 in Csörgő and Horváth (1993, pp. 255–256) gives
\[
\frac{1}{\sqrt{2 \log\log\log m}} \sup_{b_m/2m \le t \le c} \frac{|W(t)|}{\sqrt{t}} \stackrel{P}{\longrightarrow} 1 \qquad (m \to \infty)
\]
and therefore
\[
\frac{1}{\sqrt{2 \log\log m}} \max_{b_m \le k \le cm} \Gamma(t_k) \stackrel{P}{\longrightarrow} 0 \qquad (m \to \infty).
\]
It remains to examine the range cm ≤ k ≤ N. Using the limit relation (I) imposed on the threshold h, we obtain
\[
\max_{cm \le k \le N} \frac{|W(t_k)|}{\sqrt{t_k}} \le \sup_{c \le t < \infty} \frac{|W(t)|}{\sqrt{t}},
\]
so that the assertion follows from Lemma 3.6 in Horváth et al. (2005). □
Note that all $o_P(1)$ rates obtained in the previous lemmas can be given in the form $m^{-\delta}$ with some δ > 0, so they remain valid after multiplication by the normalizing factor A(log m), which is needed in what follows.
Lemma 4.6 If the assumptions of Theorem 2.1 are satisfied, then
\[
\lim_{m \to \infty} P\left\{ A(\log m) \max_{1 \le k \le N} \Gamma(t_k) - D(\log m) \le t \right\} = \exp\left(-e^{-t}\right)
\]
for all real t.
Proof. Lemma 4.5 implies that the extreme value asymptotic is solely determined by the range $a_m \le k \le b_m$. But, by assumption (H), $h(t_k)$ is close to $\sqrt{t_k(1+t_k)}$ for those k, while the latter expression itself is close to $\sqrt{t_k}$. Hence, after another application of the sandwich theorem, it is enough to derive the extreme value asymptotic of
\[
\max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k}} \stackrel{\mathcal{D}}{=} \max_{1 \le k \le b_m} \frac{|W(k)|}{\sqrt{k}},
\]
where we have also used that the k ≤ $a_m$ do not contribute to the maximum according to Lemma 4.5(i). Now, the result of Darling and Erdős (1956) yields
\[
\lim_{m \to \infty} P\left\{ A(\log b_m) \max_{1 \le k \le b_m} \frac{|W(t_k)|}{h(t_k)} \le t + D(\log b_m) \right\} = \exp\left(-e^{-t}\right)
\]
for all real t. Since furthermore
\[
A(\log b_m) - A(\log m) \to 0 \qquad\text{and}\qquad D(\log b_m) - D(\log m) \to 0
\]
as m → ∞, Lemma 4.6 is readily proved. □
Lemma 4.7 If the assumptions of Theorem 2.1 are satisfied, then
\[
\lim_{m \to \infty} P\left\{ A(\log m) \max_{1 \le k \le N} \frac{|\tilde Q(m,k)|}{\sigma g(m,k)} - D(\log m) \le t \right\} = \exp\left(-e^{-t}\right)
\]
for all real t.

Proof. By Lemma 4.4, we obtain that
\[
A(\log m) \left| \max_{a_m \le k \le N} \frac{|\tilde Q(m,k)|}{\sigma g(m,k)} - \max_{a_m \le k \le N} \frac{|W(t_k)|}{h(t_k)} \right| = O_P\!\left( \sqrt{\log\log m}\, (\log m)^{\delta(1/\nu - 1/2)} \right) = o_P(1)
\]
by the definition of $a_m$ and on account of ν > 2. □

Proof of Theorem 2.1. The assertion follows on combining Lemmas 4.1–4.7. □
It remains to replace the variance parameter $\sigma^2$ by its estimator $\hat\sigma_m^2$.

Proof of Corollary 2.1. The consistent estimator $\hat\sigma_m^2$ satisfies the relation
\[
\hat\sigma_m^2 - \sigma^2 = o_P\!\left( \frac{1}{\log\log m} \right) \qquad (m \to \infty)
\]
[see Csörgő and Horváth (1997, p. 228)]. Therefore, the result is immediate from Theorem 2.1. □
5 Proofs of Propositions 2.1 and 2.2
At first, we prove that the partial sums of the augmented GARCH(1,1) innovations can be
approximated with a Wiener process, thereby satisfying a specified rate of convergence.
Proof of Proposition 2.1. The assertion follows from Theorem 1 in Eberlein (1986), since GARCH(1,1)–type sequences are martingale difference arrays. □
Next, we turn our attention to Proposition 2.2. Its proof is based on three lemmas which are stated and verified now. The first lemma establishes a maximal inequality for $|\sigma_j - \tilde\sigma_j|$, where the blockwise independent sequence $\{\tilde\sigma_j\}$ is defined via (2.8). Note that, under assumption (A), following Lemma 5.1 in Aue et al. (2006a), there are constants $C_1$, $C_2$ and $C_3$ such that
\[
P\left\{ \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right| \ge \exp(-C_1 j^{\zeta}) \right\} \le C_2 \exp(-C_3 j^{\zeta}), \tag{5.1}
\]
where ζ ∈ (0, 1).
Lemma 5.1 If the assumptions of Theorem 2.1 are satisfied and {εj} is a polynomial–type GARCH sequence, then there are constants $C_4$, $C_5$ and $C_6$ such that
\[
P\left\{ |\sigma_j - \tilde\sigma_j| \ge \exp(-C_4 j^{\zeta}) \right\} \le C_5 \exp(-C_6 j^{\zeta}).
\]
Proof. Applying the mean–value theorem, we obtain
\[
\sigma_j - \tilde\sigma_j = \sqrt{\Lambda^{-1}(\Lambda(\sigma_j^2))} - \sqrt{\Lambda^{-1}(\Lambda(\tilde\sigma_j^2))} = \sqrt{\frac{1}{\Lambda'(\Lambda^{-1}(\zeta_j))}} \left( \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right),
\]
where $\zeta_j$ is between $\Lambda(\sigma_j^2)$ and $\Lambda(\tilde\sigma_j^2)$. Hence, (2.7) implies that
\[
|\sigma_j - \tilde\sigma_j| \le C \left( |\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde\sigma_j^2)|^{\mu/2} \right) \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right|.
\]
If µ ≤ 0, then the assertion follows from (5.1) and the fact that $\Lambda(\sigma_0^2) \ge \Delta > 0$ by assumption. If µ > 0, then Theorem 2.3 and (5.1) yield
\[
P\left\{ \left( |\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde\sigma_j^2)|^{\mu/2} \right) \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right| \ge \exp\left( -\frac{C_1}{2} j^{\zeta} \right) \right\}
\]
\[
\le P\left\{ \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right| \ge \exp(-C_1 j^{\zeta}) \right\} + P\left\{ |\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde\sigma_j^2)|^{\mu/2} \ge \exp\left( \frac{C_1}{2} j^{\zeta} \right) \right\}
\]
\[
\le 2 C_2 \exp(-C_3 j^{\zeta}) + P\left\{ |\Lambda(\sigma_j^2)|^{\nu} \ge \left( \frac{1}{2} \exp\left( \frac{C_1}{2} j^{\zeta} \right) \right)^{2\nu/\mu} \right\} + P\left\{ |\Lambda(\tilde\sigma_j^2)|^{\nu} \ge \left( \frac{1}{2} \exp\left( \frac{C_1}{2} j^{\zeta} \right) \right)^{2\nu/\mu} \right\} \le C_5 \exp(-C_6 j^{\zeta})
\]
and the proof is complete. □
Next, we also provide an inequality for exponential–type GARCH sequences {εj}.

Lemma 5.2 If the assumptions of Theorem 2.1 are satisfied and {εj} is an exponential–type GARCH sequence, then there are constants $C_7$ and $C_8$ such that
\[
P\left\{ |\sigma_j - \tilde\sigma_j| \ge \exp(-C_7 j^{\zeta}) \right\} \le C_8\, j^{-\zeta\nu}.
\]
Proof. Recall that Λ(x) = log x. An application of the mean–value theorem gives
\[
\sigma_j - \tilde\sigma_j = \exp(\log\sigma_j) - \exp(\log\tilde\sigma_j) = \exp(\zeta_j)(\log\sigma_j - \log\tilde\sigma_j),
\]
where $\zeta_j$ is between $\log\sigma_j$ and $\log\tilde\sigma_j$. Therefore,
\[
|\sigma_j - \tilde\sigma_j| \le \frac{1}{2} \left[ \exp(\log\sigma_j) + \exp(\log\tilde\sigma_j) \right] \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right|.
\]
We will suppress the factor $\tfrac{1}{2}$ on the right–hand side of the inequality in the following. Observe that on the set $-1 < \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) < 1$ it holds that $\exp(\Lambda(\tilde\sigma_j^2)) \le 3 \exp(\Lambda(\sigma_j^2))$, so that Theorem 2.3 and (5.1) give
\[
P\left\{ |\sigma_j - \tilde\sigma_j| \ge \exp\left( -\frac{C_1}{2} j^{\zeta} \right) \right\} \le P\left\{ 3 \exp(\Lambda(\sigma_j^2)) \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right| \ge \exp\left( -\frac{C_1}{2} j^{\zeta} \right) \right\} + C_2 \exp(-C_3 j^{\zeta})
\]
\[
\le P\left\{ \left| \Lambda(\sigma_j^2) - \Lambda(\tilde\sigma_j^2) \right| \ge \exp(-C_1 j^{\zeta}) \right\} + P\left\{ 3 \exp(\Lambda(\sigma_j^2)) \ge \exp\left( \frac{C_1}{2} j^{\zeta} \right) \right\} + C_2 \exp(-C_3 j^{\zeta})
\]
\[
\le 2 C_2 \exp(-C_3 j^{\zeta}) + P\left\{ \Lambda(\sigma_j^2) \ge \frac{C_1}{2} j^{\zeta} - \log 3 \right\} \le 2 C_2 \exp(-C_3 j^{\zeta}) + P\left\{ |\Lambda(\sigma_j^2)|^{\nu} \ge \left( \frac{C_1}{2} j^{\zeta} - \log 3 \right)^{\nu} \right\} \le C_8\, j^{-\zeta\nu}.
\]
This proves the assertion. □
Recall that $\tilde\varepsilon_j = \tilde\sigma_j \xi_j$. Lemmas 5.1 and 5.2 will help to establish that the partial sums obtained from the sequence {εj} and those coming from the sequence {ε̃j} are "close".

Lemma 5.3 If the assumptions of Theorem 2.1 are satisfied, then
\[
\max_{1 \le n < \infty} \left| \sum_{j=1}^{n} \varepsilon_j - \sum_{j=1}^{n} \tilde\varepsilon_j \right| = O(1) \qquad \text{a.s.}
\]
Proof. Lemmas 5.1 and 5.2 imply that there is a constant $C_9$ such that
\[
|\sigma_j - \tilde\sigma_j| = O\!\left( \exp(-C_9 j^{\zeta}) \right) \qquad \text{a.s.}
\]
Moreover, the moment conditions imposed in assumption (A) give that, as j → ∞,
\[
|\xi_j| = o\!\left( j^{1/\nu} \right) \qquad \text{a.s.}
\]
Thus, we conclude
\[
\max_{1 \le n < \infty} \left| \sum_{j=1}^{n} \varepsilon_j - \sum_{j=1}^{n} \tilde\varepsilon_j \right| \le \sum_{j=1}^{\infty} |\varepsilon_j - \tilde\varepsilon_j| = \sum_{j=1}^{\infty} |\sigma_j - \tilde\sigma_j| |\xi_j| = O(1) \sum_{j=1}^{\infty} j^{1/\nu} \exp(-C_9 j^{\zeta}) = O(1) \qquad \text{a.s.},
\]
finishing the proof. □
Proof of Proposition 2.2. In view of Proposition 2.1, it remains to establish the independence of the approximating Wiener processes. By Lemma 5.3 it suffices to do so for the sequence {ε̃j}. First note that, by definition,
\[
\sum_{j=1}^{m - m^{\zeta}} \tilde\varepsilon_j \qquad\text{and}\qquad \left\{ \sum_{j=m+1}^{m+k} \tilde\varepsilon_j : k \ge 1 \right\}
\]
are independent. On the other hand, the strong invariance principle verified in Proposition 2.1 and the upper bounds for the increments of Wiener processes [cf. Csörgő and Révész (1981)] imply
\[
\sum_{j=m - m^{\zeta} + 1}^{m} \tilde\varepsilon_j = O\!\left( m^{1/\nu} \right) \qquad \text{a.s.}
\]
as m → ∞ and the assertion follows readily. □
References
[1] Aue, A., and Horváth, L. (2004). Delay time in sequential detection of change. Statistics and Probability Letters 67, 221–231.
[2] Aue, A., Berkes, I., and Horváth, L. (2006a). Strong approximation for the sums of
squares of augmented GARCH sequences. Bernoulli, to appear.
[3] Aue, A., Horváth, L., Hušková, M., and Kokoszka, P. (2006b). Change–point monitoring in linear models. Preprint, University of Utah.
[4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.
[5] Brown, R.L., Durbin, J., and Evans, J.M. (1975). Techniques for testing the constancy of regression relationships over time (with discussion). Journal of the Royal
Statistical Society B 37, 149–192.
[6] Carrasco, M., and Chen, X. (2002). Mixing and moment properties of various
GARCH and stochastic volatility models. Econometric Theory 18, 17–39.
[7] Chow, Y.S., and Teicher, H. (1988). Probability Theory (2nd ed.). Springer, New
York.
[8] Chu, C.–S.J., Stinchcombe, M., and White, H. (1996). Monitoring structural change.
Econometrica 64, 1045–1065.
[9] Csörgő, M., and Horváth, L. (1993). Weighted Approximations in Probability and
Statistics. Wiley, New York.
[10] Csörgő, M., and Horváth, L. (1997). Limit Theorems in Change–Point Analysis.
Wiley, New York.
[11] Csörgő, M., and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
[12] Darling, D.A., and Erdős, P. (1956). A limit theorem for the maximum of normalized
sums of independent random variables. Duke Mathematical Journal 23, 143–155.
[13] Durrett, R. (2005). Probability: Theory and Examples (3rd ed.). Brooks/Cole–
Thomson Learning, Belmont, CA.
[14] Eberlein, E. (1986). On strong invariance principles under dependence assumptions.
Annals of Probability 14, 260–270.
[15] Horváth, L., Hušková, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring
changes in linear models. Journal of Statistical Planning and Inference 126, 225–
251.
[16] Horváth, L., Kokoszka, P., and Steinebach, J. (2005). On sequential detection of
parameter changes in linear regression. Preprint, University of Utah.
[17] Kühn, M. (2006). Dissertation, University of Cologne, in preparation.
[18] Lerche, H. (1984). Boundary Crossing Probabilities for Brownian Motion. Springer–
Verlag, New York.
[19] Milne–Thomson, L.M. (1933). The Calculus of Finite Differences. MacMillan Press,
London (Reprinted by the American Mathematical Society, Providence, RI, 2000).
[20] Nelson, D.B. (1991). Conditional heteroskedasticity in asset returns: a new approach.
Econometrica 59, 347–370.
m = 300, k* − m = 1, N + m = 3300, α = 0.05, noise: N(0, 1)
                         recursive                        non–recursive
  β*^T            min  Q.25  Q.5   Q.75  max   det     min  Q.25  Q.5   Q.75  max   det
  (1.1,1.1,1.1)   1    100   3000  3000  3000  0.4576  1    68    144   287   3000  0.9748
  (1.2,1.2,1.2)   1    17    30    51    3000  0.9996  1    16    28    45    201   1
  (1.3,1.3,1.3)   1    9     13    20    72    1       1    8     13    20    70    1
  (1.5,1.5,1.5)   1    3     5     8     25    1       1    3     5     8     25    1
  (1.0,1.5,0.5)   1    3000  3000  3000  3000  0.06    1    3000  3000  3000  3000  0.0964

m = 300, k* − m = 300, N + m = 3300, α = 0.05, noise: N(0, 1)
  (1.1,1.1,1.1)   1    3000  3000  3000  3000  0.0408  1    711   1011  1623  3000  0.8848
  (1.2,1.2,1.2)   1    496   562   642   3000  0.9972  1    439   496   562   1159  1
  (1.3,1.3,1.3)   1    404   429   457   588   1       1    382   412   445   608   1
  (1.5,1.5,1.5)   1    353   365   377   432   1       1    346   361   377   452   1
  (1.0,1.5,0.5)   4    3000  3000  3000  3000  0.0076  4    3000  3000  3000  3000  0.0200

m = 300, k* − m = 1200, N + m = 3300, α = 0.05, noise: N(0, 1)
  (1.1,1.1,1.1)   1    3000  3000  3000  3000  0.0088  1    2527  3000  3000  3000  0.3876
  (1.2,1.2,1.2)   1    1957  2105  2290  3000  0.9864  1    1644  1813  2018  3000  0.9968
  (1.3,1.3,1.3)   1    1568  1624  1678  1898  1       1    1464  1551  1654  2067  1
  (1.5,1.5,1.5)   1    1384  1409  1432  1570  1       1    1344  1393  1440  1678  1
  (1.0,1.5,0.5)   1    3000  3000  3000  3000  0.0112  1    3000  3000  3000  3000  0.0228

Table 3: Empirical power of detectors with N(0, 1) innovations (DGP–1).

m = 300, k* − m = 1, N + m = 3300, α = 0.05, noise: GARCH(1,1)
                         recursive                        non–recursive
  β*^T            min  Q.25  Q.5   Q.75  max   det     min  Q.25  Q.5   Q.75  max   det
  (1.1,1.1,1.1)   1    133   3000  3000  3000  0.4084  1    82    158   300   3000  0.9772
  (1.2,1.2,1.2)   1    19    32    49    3000  0.9988  1    18    29    44    524   1
  (1.3,1.3,1.3)   1    9     14    19    91    1       1    8     13    18    84    1
  (1.5,1.5,1.5)   1    3     5     7     42    1       1    3     5     7     42    1
  (1.0,1.5,0.5)   1    3000  3000  3000  3000  0.0784  1    3000  3000  3000  3000  0.1032

m = 300, k* − m = 300, N + m = 3300, α = 0.05, noise: GARCH(1,1)
  (1.1,1.1,1.1)   1    3000  3000  3000  3000  0.0864  1    710   991   1517  3000  0.9164
  (1.2,1.2,1.2)   1    493   558   635   3000  0.9960  1    434   490   551   1165  1
  (1.3,1.3,1.3)   1    404   428   454   610   1       1    382   412   442   622   1
  (1.5,1.5,1.5)   1    352   364   376   437   1       1    344   360   375   451   1
  (1.0,1.5,0.5)   1    3000  3000  3000  3000  0.0520  1    3000  3000  3000  3000  0.0712

m = 300, k* − m = 1200, N + m = 3300, α = 0.05, noise: GARCH(1,1)
  (1.1,1.1,1.1)   1    3000  3000  3000  3000  0.0528  1    2445  3000  3000  3000  0.4132
  (1.2,1.2,1.2)   1    1947  2090  2269  3000  0.9832  1    1630  1799  1977  3000  0.9976
  (1.3,1.3,1.3)   1    1565  1618  1672  1982  1       1    1456  1551  1641  2054  1
  (1.5,1.5,1.5)   1    1385  1409  1431  1569  1       1    1345  1390  1433  1653  1
  (1.0,1.5,0.5)   1    3000  3000  3000  3000  0.0480  1    3000  3000  3000  3000  0.0676

Table 4: Empirical power of detectors with GARCH(1,1) innovations (DGP–2).
[Figure 1: six density panels, arranged as noise N(0,1) (left column) versus GARCH(1,1) (right column) for cp − m = 1, 300, 1200 (top to bottom); each panel plots kernel density estimates of the time of detection after the training period for the recursive ("rec") and non–recursive ("non–rec") detectors, with m = 300, N = 3000 and alpha = 0.05.]

Figure 1: Density estimation of the first exceedance of the 5% critical level.