Extreme Value Distribution of a Recursive–type Detector in a Linear Model

Alexander Aue 1   Mario Kühn 2

Abstract: We study a CUSUM–type monitoring scheme designed to sequentially detect changes in the regression parameter of an underlying linear model. The test statistic used is based on recursive residuals. The main aim of this paper is to derive the limiting extreme value distribution under the null hypothesis of structural stability. The model assumptions are flexible enough to include very general classes of error sequences such as augmented GARCH(1,1) processes. The result is underlined by an illustrative simulation study.

AMS 2000 Subject Classification: Primary 62J05; Secondary 62L10

Keywords and Phrases: linear models; recursive residuals; augmented GARCH processes; Darling–Erdős limit theorems; sequential testing.

1 Department of Mathematics, University of Utah, 155 South 1440 East, Salt Lake City, UT 84112–0090, USA, email: aue@math.utah.edu
2 Mathematisches Institut, Universität zu Köln, Weyertal 86–90, 50938 Köln, Germany, email: mkuehn@math.uni-koeln.de

Corresponding author: A. Aue, phone: +1–801–581–5231, fax: +1–801–581–4148.
Research partially supported by NATO grant PST.EAP.CLG 980599 and NSF–OTKA grant INT–0223262.

1 Introduction

Testing for structural stability of a time series is of major interest in statistics and also in related areas such as engineering and business. Following Chu et al. (1996), we will focus on data that are assumed to be generated by a linear model. Based on a historical data set (of size m), we wish to perform an on–line inspection, that is, a sequential monitoring, to check whether or not the assumption of structural stability is still justified. If so, the monitoring is continued, since observations are regarded as freely [at least cheaply] available. [Possible applications in economics and finance, and in geophysics, can be found in Chu et al. (1996) and Horváth et al. (2004), cf. also the references therein.]
In this setting, test statistics are defined in terms of suitably defined detectors Γ(m, k) and thresholds g(m, k), where the null hypothesis of structural stability is rejected as soon as a lag k is reached such that the absolute value of Γ(m, k) crosses the value of g(m, k), that is, at the stopping time

    τ_m = inf{ k ≥ 1 : |Γ(m, k)| ≥ g(m, k) }.

Note that the index k labels the time that has elapsed after the monitoring has commenced. Horváth et al. (2004) studied CUSUM–type detectors based on [recursive] residuals. Their approach, yielding asymptotic results (as m → ∞), included threshold functions of the form g(m, k) = √m h(k/m), with h satisfying regularity conditions that ensure a finite and non–degenerate limit process. Special emphasis is put on the family

    h_γ(t) = t^γ (1 + t)^{1−γ},  0 ≤ γ < 1/2,                    (1.1)

which allows for a flexible sensitivity adjustment of the test procedure by choosing different values of γ. Now, simulation studies [cf. Horváth et al. (2004) and Aue et al. (2006b)] and theoretical results for a change–in–the–mean scenario [cf. Aue and Horváth (2004)] imply that h_{1/2} (which, however, is excluded due to the law of the iterated logarithm) would be a desirable boundary function in terms of an early detection of changes in the regression parameter of the underlying linear model. A non–trivial extension leads to a Darling–Erdős limit theorem for non–recursive residuals [cf. Horváth et al. (2005)]. Instead, we will deal with the recursive residuals here. They have played an important role in statistics ever since their introduction by Brown et al. (1975), since they offer, for example, the possibility to include the additional information obtained during the monitoring process, while the non–recursive residuals rely only on the historical data and, hence, no updating takes place even when the assumed model is still considered to be adequate.
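As an illustration (not part of the formal development; the function names are ours), the boundary family (1.1) and the stopping time τ_m can be sketched in a few lines of Python:

```python
import numpy as np

def h_gamma(t, gamma):
    """Boundary family (1.1): h_gamma(t) = t**gamma * (1 + t)**(1 - gamma), 0 <= gamma < 1/2."""
    return t**gamma * (1.0 + t)**(1.0 - gamma)

def stopping_time(detector, m, gamma=0.0, crit=1.0):
    """First lag k >= 1 with |Gamma(m, k)| >= crit * sqrt(m) * h_gamma(k/m).

    `detector` holds the values Gamma(m, 1), Gamma(m, 2), ...; `crit` is a
    critical value (hypothetical calibration). Returns None if the boundary
    is never crossed, i.e. monitoring would simply continue."""
    for k, val in enumerate(detector, start=1):
        if abs(val) >= crit * np.sqrt(m) * h_gamma(k / m, gamma):
            return k
    return None
```

Larger values of γ inflate the boundary near t = 0 more slowly, which makes the procedure more sensitive to early changes; γ = 1/2 itself is excluded, as discussed above.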
Moreover, we will allow for more flexible choices of the threshold h by prescribing only its limit behavior for very small and very large arguments. To be able to perform an asymptotic analysis, we additionally need to introduce a maximal number of observations, which is allowed to grow at a rate between linear and polynomial in the historical data size m. Also, instead of independent and identically distributed errors, we consider a much wider class of innovations which includes, for instance, augmented GARCH(1,1) processes as possible noise sequences. In fact, the proofs show that essentially any sequence satisfying a [strong] invariance principle that allows for a blocking argument along the lines of Aue et al. (2006a) could be used. Our main result is an extreme value asymptotic for the recursive–type CUSUM detector under the assumption of structural stability. The finite sample properties are reported in a simulation study.

The paper is organized as follows. In Section 2 we introduce the model and the main results. Simulation results are stated and discussed in Section 3, while proofs are given in Sections 4 and 5.

2 Model assumptions and main results

(a) Assumptions and results. In what follows we study the linear model

    y_i = x_i^T β_i + ε_i,  1 ≤ i < ∞,

where {x_i} are p×1 dimensional random vectors, {β_i} are p×1 dimensional deterministic vectors and {ε_i} is some noise sequence. More specifically, we impose the following set of conditions. Let |·| denote the maximum norm of both vectors and matrices.

(A) The noise sequence {ε_i} satisfies

    Eε_i = 0,  0 < Eε_i² ≤ C,  Eε_iε_j = 0 (i ≠ j),

with some C > 0. Moreover, we assume that the following invariance principles hold true. For each m, there are independent Wiener processes {W_{1,m}(t) : t ≥ 0} and {W_{2,m}(t) : t ≥ 0} and a constant σ > 0 such that

    sup_{1≤k<∞} (1/k^{1/ν}) | ∑_{i=m+1}^{m+k} ε_i − σW_{1,m}(k) | = O_P(1)  (m → ∞)      (2.1)

and

    | ∑_{i=1}^{m} ε_i − σW_{2,m}(m) | = O_P(m^{1/ν})  (m → ∞)                          (2.2)

with some ν > 2.
Confer part (b) of this section, where we show that augmented GARCH(1,1) processes satisfy conditions (2.1) and (2.2).

(B) The sequences {ε_i} and {x_i} are independent.

(C) For all i ≥ 1, it holds that x_i^T = (1, x_{2,i}, ..., x_{p,i}).

(D) There is a positive definite matrix C and a constant κ > 0 such that

    | (1/n) ∑_{i=1}^{n} x_i x_i^T − C | = O(n^{−κ})  a.s.  (n → ∞).

(E) There are random variables ξ, m₀ and a constant ρ > 0 such that

    | ∑_{i=m+1}^{m+k} (x_i − c₁) | ≤ ξ [ (m + k)^{1/2−ρ} + √(k log N) ]

for all 1 ≤ k ≤ N and m ≥ m₀, where c₁ is the first column of C defined in part (D) and N is the maximal number of observations [see also assumption (G) below].

Condition (E) is admittedly of a rather technical nature. However, Horváth et al. (2005) pointed out that, under (G) below, it is satisfied for a large class of random variables, for instance, if the {x_i} fulfill a strong invariance principle. The next assumption states that there is no change in the regression parameter in what is called a training period of size m, that is,

(F) β_i = β₀, 1 ≤ i ≤ m.

Condition (F) is particularly important because the test statistic to be defined can use this historical data set as a basis for comparisons with later observations. We are interested in testing the null hypothesis of structural stability

    H₀: β_i = β₀,  i = m + 1, m + 2, ...,                                              (2.3)

against the alternative hypothesis of a structural break

    H_A: there is a k* ≥ 1 such that β_i = β₀ for m < i < m + k*,
         but β_i = β* for i = m + k*, m + k* + 1, ..., with β₀ ≠ β*,                   (2.4)

where the parameters β₀, β* and k*, the so–called change–point, are unknown. Define the recursive residuals

    ε̃_i = y_i − x_i^T β̂_{i−1},  m + 1 ≤ i < ∞.

In contrast to the non–recursive residuals investigated in Horváth et al. (2005), these residuals use the additional information obtained from the observations y_{m+1}, ..., y_{i−1} to calculate the least–squares estimator β̂_{i−1}, which, at time n ≥ 1, is given by

    β̂_n = ( ∑_{j=1}^{n} x_j x_j^T )^{−1} ∑_{j=1}^{n} x_j y_j.
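To make the recursive residuals concrete, here is a minimal Python sketch (our own illustration; a practical implementation would update β̂ by a rank-one formula rather than refitting at each step):

```python
import numpy as np

def recursive_residuals(X, y, m):
    """Recursive residuals eps~_i = y_i - x_i' beta^hat_{i-1}, i = m+1, ..., n,
    where beta^hat_{i-1} is the least-squares estimator based on the first
    i-1 observations. Indices are 0-based: row i is predicted from rows 0..i-1."""
    n = len(y)
    res = []
    for i in range(m, n):
        beta, *_ = np.linalg.lstsq(X[:i], y[:i], rcond=None)
        res.append(y[i] - X[i] @ beta)
    return np.array(res)
```

With noise-free data generated from a fixed β, every recursive residual vanishes, reflecting that ε̃_i isolates the innovation ε_i plus an estimation-error term.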
For k, m ≥ 1 define the CUSUM of the ε̃_j by

    Q̃(m, k) = ε̃_{m+1} + ... + ε̃_{m+k}

and the stopping rule

    τ_m = inf{ k ≥ 1 : |Q̃(m, k)| ≥ √m h(k/m) }

[with the understanding that inf ∅ = ∞], where h is a suitable boundary function. If h is positive and continuous on (0, ∞) and if it satisfies the growth conditions

    lim_{t→0} t^γ / h(t) = 0,   lim sup_{t→∞} √(t log log t) / h(t) < ∞

with some 0 ≤ γ < min{κ, 1/2}, then Horváth et al. (2004) proved that, under (B)–(F) with iid errors having more than two finite moments,

    lim_{m→∞} P{ (1/σ̂_m) sup_{k≥1} |Q̃(m, k)| / (√m h(k/m)) ≤ 1 } = P{ sup_{t>0} |W(t)| / h(t) ≤ 1 },

where {W(t) : t ≥ 0} denotes a standard Wiener process and σ̂_m a consistent estimator. A modified proof, taking into account the specifications made in (A), establishes the same result also for the more general errors that are under consideration here. Details may be found in Kühn (2006), but, to conserve space, they are omitted here. It can be seen, using the law of the iterated logarithm for Wiener processes at zero, that γ < 1/2 is a necessary condition to obtain a non–degenerate and finite limit.

Here, we are interested in imposing weaker conditions on the threshold h. This goes along with a modification of the stopping rule τ_m. Instead of a continued monitoring we will stop according to

    τ̃_m = min{τ_m, N}.

Therein, N denotes a maximal number of observations, for which we assume

(G) N = O(m^λ) with some λ ≥ 1 and lim inf_{m→∞} N/m > 0.

Furthermore, the behavior of h for very small and very large arguments is governed by

(H) lim_{t→0} √(t(1 + t)) / h(t) = a > 0,

(I) lim sup_{t→∞} √t / h(t) < ∞,

(J) h is positive and continuous on (0, ∞).

Specific examples of boundary functions satisfying (H)–(J) are, for instance,

    h(t) = √(t(1 + t))   and   h_b(t) = √((1 + t)[b² + log(1 + t)]).

The latter family of functions has the advantage that, in their case, the limit distribution of sup_t |W(t)|/h_b(t) is known, which can be used to control the probability of a false alarm.
More detailed expositions on this topic may be found in Lerche (1984). Here, however, the main focus is on the following extreme value asymptotic for the new stopping rule τ̃_m.

Theorem 2.1 Let the assumptions (A)–(J) be satisfied. Then, under H₀,

    lim_{m→∞} P{ A(log m) max_{1≤k≤N} |Q̃(m, k)| / (σ a g(m, k)) − D(log m) ≤ t } = exp(−e^{−t})

for all real t, where σ² = Eε₁², g(m, k) = √m h(k/m) and, for x > 0,

    A(x) = √(2 log x),   D(x) = 2 log x + (1/2) log log x − (1/2) log π.

The application of Theorem 2.1 requires the estimation of the unknown variance parameter σ². Since there is no change in the historical data, a natural estimator is

    σ̂²_m = (1/(m − p)) ∑_{i=1}^{m} ε̂_i²,   ε̂_i = y_i − x_i^T β̂_m.

This leads to the following corollary.

Corollary 2.1 The statement of Theorem 2.1 remains true if σ is replaced by the estimator σ̂_m.

The proofs of Theorem 2.1 and its Corollary 2.1 are given in Section 4.

(b) Augmented GARCH(1,1) processes. We follow the presentation in Aue et al. (2006a). A process {ε_j} is called augmented generalized autoregressive conditionally heteroskedastic of order (1,1), shortly augmented GARCH(1,1), if it satisfies the relations

    ε_j = σ_j ξ_j,                                                                     (2.5)
    Λ(σ_j²) = w(ξ_{j−1}) + b(ξ_{j−1}) Λ(σ_{j−1}²),                                     (2.6)

where j runs through all integers, Λ, b and w are real–valued functions and {ξ_j} is a sequence of independent, identically distributed random variables. To solve (2.6) for σ_j², we need that Λ^{−1} exists.

Example 2.1 (i) On letting Λ(x) = x, w(x) = ω > 0 and b(x) = β + αx², where α, β ≥ 0, we obtain the standard GARCH(1,1) model as introduced by Bollerslev (1986). (ii) The exponential GARCH model of Nelson (1991) is defined by Λ(x) = log x, w(x) = ω + α₁x + α₂|x| and b(x) = β, where ω > 0, α₁, α₂, β ≥ 0.

For plenty of further polynomial [Λ(x) = x^δ with some δ > 0] and exponential GARCH(1,1) submodels [Λ(x) = log x] of considerable interest, which are subsumed under the notion of augmented GARCH(1,1) processes, we refer to Aue et al. (2006a) and Carrasco and Chen (2002).
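Numerically, Theorem 2.1 is applied by inverting the Gumbel limit: given a level α, one solves exp(−e^{−t}) = 1 − α for t and transforms back through the norming constants. A small sketch (our own helper, with `m` the training sample size):

```python
import math

def gumbel_critical(alpha, m):
    """Critical value c(alpha, m) such that, asymptotically under H0,
    P( max_k |Q~(m,k)| / (sigma * g(m,k)) >= c ) ~ alpha, using the
    Darling-Erdos norming A, D of Theorem 2.1 evaluated at x = log m."""
    t = -math.log(-math.log(1.0 - alpha))        # Gumbel quantile
    x = math.log(m)
    A = math.sqrt(2.0 * math.log(x))
    D = 2.0 * math.log(x) + 0.5 * math.log(math.log(x)) - 0.5 * math.log(math.pi)
    return (t + D) / A
```

Smaller levels α give larger critical values, and the dependence on m is only through the slowly varying factors A(log m) and D(log m), which is the usual signature of extreme value asymptotics.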
Next, we give a sufficient criterion for the existence of a strictly stationary solution to (2.5) and (2.6). The solution is called non–anticipative if, for any j, ε_j is independent of {ξ_i : i > j}. Set log⁺ x = log(max{x, 1}).

Theorem 2.2 [Aue et al. (2006a)] Let E log⁺|w(ξ₀)| < ∞ and E log⁺|b(ξ₀)| < ∞. If E log|b(ξ₀)| < 0, then the defining equations (2.5) and (2.6) have a unique strictly stationary and non–anticipative solution given by

    ε_j = √(Λ^{−1}(Λ(σ_j²))) ξ_j,   Λ(σ_j²) = ∑_{i=1}^{∞} w(ξ_{j−i}) ∏_{k=1}^{i−1} b(ξ_{j−k}).

The following result gives conditions for the existence of moments of Λ(σ_j²).

Theorem 2.3 [Aue et al. (2006a)] Let ν > 0 and E log|b(ξ₀)| < 0. If E|w(ξ₀)|^ν < ∞ and E|b(ξ₀)|^ν < 1, then E|Λ(σ_j²)|^ν < ∞.

Since σ_j and ξ_j are independent, it is clear that the previous theorem also implies a criterion for the finiteness of E|ε₁|^ν by additionally imposing the condition E|ξ₀|^ν < ∞. Both theorems have a necessity counterpart which is suppressed here, but can be found in Aue et al. (2006a).

To prove Theorem 2.1 it is necessary to obtain more insight into the structure of the augmented GARCH errors. In particular, we need an approximation of their partial sums with a Wiener process.

Proposition 2.1 If {ε_i} satisfies (2.5), (2.6) and the conditions of Theorem 2.2 with E|ε₁|^ν < ∞ for some ν > 2, then, as k → ∞,

    (1/k^{1/ν}) | ∑_{i=1}^{k} ε_i − σW(k) | = O(1)  a.s.,

where {W(t) : t ≥ 0} denotes a Wiener process.

Proposition 2.1 is based on the general strong invariance principle in Eberlein (1986). A refinement states that the partial sums of the errors ε₁, ..., ε_m of the training period and those of the errors ε_{m+1}, ..., ε_{m+k} appearing after the monitoring process has commenced can be approximated by independent Wiener processes [confer assumptions (2.1) and (2.2)]. Towards this end, we need, in the case of a polynomial–type GARCH model, to impose the following technical assumptions on the function Λ.
• Λ(σ₀²) ≥ ∆ with some ∆ > 0,
• Λ′ exists, is nonnegative, and there are constants C and µ such that

    1/Λ′(Λ^{−1}(x)) ≤ C x^µ  for all x ≥ ∆.                                            (2.7)

Obviously, these conditions are violated for exponential–type GARCH errors. But, fortunately, they are then not required [cf. the proofs of Lemmas 5.1–5.3 below]. Throughout we work either with a polynomial–type sequence {ε_j} satisfying these conditions or with an exponential GARCH process without further restrictions.

Proposition 2.2 If {ε_i} satisfies (2.5), (2.6) and the conditions of Theorem 2.2 with E|ε₁|^ν < ∞ for some ν > 2, then, for each m, there are independent Wiener processes {W_{1,m}(t) : t ≥ 0} and {W_{2,m}(t) : t ≥ 0} such that conditions (2.1) and (2.2) are satisfied.

Proposition 2.2 is proved via an exponential inequality for the difference between σ_j and a suitably constructed variable from a sequence of blockwise independent variables which is "close" to the original sequence {σ_j}. Towards this end, let ζ ∈ (0, 1) and define the sequence {σ̃_j²} as the solution of the equations

    Λ(σ̃_j²) = ∑_{i=1}^{⌊j^ζ⌋} w(ξ_{j−i}) ∏_{k=1}^{i−1} b(ξ_{j−k}).                      (2.8)

Now, set ε̃_j = σ̃_j ξ_j. Then, clearly, ε̃_i and ε̃_k are independent if i < k − k^ζ. We will show that

    max_{1≤n<∞} | ∑_{i=1}^{n} ε_i − ∑_{i=1}^{n} ε̃_i | = O(1)  a.s.,

so that it suffices to approximate the partial sums of the ε̃_j, which, in turn, enables us to define Wiener processes that are independent. The detailed proofs of Propositions 2.1 and 2.2 are given in Section 5.

3 Simulations

In this section, we report the results of a simulation study that includes both independent, identically distributed innovations and GARCH(1,1) errors as data generating processes (DGPs). The finite sample properties obtained here underline the empirical observations found in earlier contributions by Horváth et al. (2004) and Aue et al. (2006b), which deal with open–end CUSUM–type procedures based on the threshold family

    g_γ(m, k) = √m h_γ(k/m),   γ ∈ [0, 1/2),

where h_γ(t) is defined in (1.1).
In greater generality, however, we will not merely consider a change–in–the–mean model or a simple linear regression, respectively, but a linear model with three–dimensional (random) x_i and (nonrandom) β_i. The simulations are performed using the free software R, see http://cran.r-project.org for more information.

(a) Choice of data generating processes. We study the following two data generating processes based on the linear model y_i = x_i^T β_i + ε_i that differ in the assumptions made on the innovations ε_i. In particular, we use

• DGP–1: {ε_i} are independent, standard normally distributed random variables.
• DGP–2: {ε_i} are GARCH(1,1) variables, that is, ε_i = σ_i ξ_i, where {ξ_i} is a sequence of independent, standard normally distributed random variables, with parameter specifications ω = 0.01, α = 0.09, β = 0.9.

These choices imply, since α + β < 1, that {ε_i} is a strictly stationary sequence with Var ε_i = 1. We use a standard burn–in phase of 100 repetitions initialized with a standard normal random variable to approach the stationary distribution before starting the actual simulations. The programming is done with the fSeries package, which is available under http://www.itp.phys.ethz.ch/econophysics/R/2.2. The parameter specifications are similar to what is often observed for financial data, where α tends to be close to 0 while β is close to 1. Also, the moment conditions needed to ensure that (2.1) and (2.2) hold are satisfied.

Moreover, it is assumed that x_i = (1, x_{2,i}, x_{3,i})^T with {x_{2,i}} and {x_{3,i}} being independent sequences of independent standard normally distributed random variables. Consequently, the β_i are three–dimensional vectors; we choose β₀ = (1, 1, 1)^T under H₀.

(b) Empirical Sizes. We compare the asymptotic extreme value tests based on the non–recursive detector investigated in Horváth et al. (2005) with those based on the recursive detector studied in the present paper. The detector in Horváth et al.
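The two data generating processes can be sketched in Python (an illustration only; the paper's study uses R with the fSeries package, and the function names below are ours):

```python
import numpy as np

def dgp2_innovations(n, omega=0.01, alpha=0.09, beta=0.9, burn_in=100, seed=1):
    """GARCH(1,1) innovations eps_i = sigma_i * xi_i with
    sigma_i^2 = omega + alpha * eps_{i-1}^2 + beta * sigma_{i-1}^2 (DGP-2).
    Since alpha + beta < 1, the stationary variance is omega/(1-alpha-beta) = 1;
    a burn-in of 100 steps approximates the stationary distribution."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n + burn_in)
    eps = np.empty(n + burn_in)
    sigma2, eps_prev = 1.0, rng.standard_normal()   # standard normal start
    for i in range(n + burn_in):
        sigma2 = omega + alpha * eps_prev**2 + beta * sigma2
        eps[i] = np.sqrt(sigma2) * xi[i]
        eps_prev = eps[i]
    return eps[burn_in:]

def dgp_data(n, eps, beta0=(1.0, 1.0, 1.0), seed=3):
    """Linear-model data y_i = x_i' beta0 + eps_i with x_i = (1, x_{2,i}, x_{3,i})'
    and standard normal regressors, as in Section 3."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    return X, X @ np.asarray(beta0) + eps[:n]
```

DGP–1 is recovered by passing iid standard normal innovations to `dgp_data` instead of the GARCH sequence.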
(2005) is based on the non–recursive CUSUM

    ε̂_i = y_i − x_i^T β̂_m  (m + 1 ≤ i < ∞),   Q̂(m, k) = ε̂_{m+1} + ... + ε̂_{m+k},

and leads to the stopping rule

    τ̂_m = inf{ k ≥ 1 : |Q̂(m, k)| ≥ √m h(k/m) },

where h(t) = √(t(1 + t)). The same threshold h(t) will also be used in the recursive setting. At first, we consider the performance under the null hypothesis of no change in the regression parameter. We have performed simulations which use different sizes of the training period, namely m = 25, 100, 300, and have recorded the number of false alarms up to maximal numbers of observations N, chosen as multiples of m, at which we terminated the monitoring. The procedure is based on 2,500 replications. An overview can be found in Tables 1 and 2. It can be seen that the empirical sizes of the tests performed on DGP–1 stay far below the asymptotic critical levels 0.05 and 0.10, while those performed on DGP–2 have approximately the asymptotic level predicted by our theory. It should be noted that both procedures used the same set of observations.

   m      N        recursive              non–recursive
                α = 0.05  α = 0.10     α = 0.05  α = 0.10
   25     2m     0.0072    0.0360       0.0196    0.0760
          4m     0.0096    0.0332       0.0216    0.0820
          6m     0.0080    0.0336       0.0228    0.0772
          8m     0.0084    0.0324       0.0232    0.0772
         10m     0.0092    0.0288       0.0308    0.0764
  100     2m     0.0128    0.0276       0.0232    0.0532
          4m     0.0100    0.0396       0.0216    0.0712
          6m     0.0052    0.0364       0.0168    0.0692
          8m     0.0132    0.0336       0.0256    0.0656
         10m     0.0064    0.0356       0.0148    0.0636
  300     2m     0.0120    0.0325       0.0216    0.0568
          4m     0.0048    0.0364       0.0112    0.0592
          6m     0.0112    0.0312       0.0220    0.0636
          8m     0.0096    0.0332       0.0200    0.0552
         10m     0.0116    0.0352       0.0228    0.0628

Table 1: Empirical sizes of detectors with N(0,1) innovations (DGP–1).

While, on the one hand, one can expect tests coming from independent, identically distributed data to be better than those applied to dependent data, it is, on the other hand, quite remarkable that even for the fixed m under consideration the empirical sizes for the GARCH(1,1) errors are relatively close to the asymptotic level.
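The size experiment underlying Tables 1 and 2 can be reproduced in miniature as follows (a hedged sketch with far fewer replications than the paper's 2,500 and our own helper names; the critical value comes from the Gumbel limit of Theorem 2.1):

```python
import numpy as np

def empirical_size(reps=50, m=100, N=100, alpha=0.10, seed=2):
    """Monte Carlo estimate of the false alarm rate of the recursive CUSUM
    under H0 with iid N(0,1) errors (DGP-1) and boundary h(t) = sqrt(t(1+t))."""
    rng = np.random.default_rng(seed)
    # critical value from the extreme value limit, A/D evaluated at log m
    t_a = -np.log(-np.log(1.0 - alpha))
    x = np.log(m)
    crit = (t_a + 2 * np.log(x) + 0.5 * np.log(np.log(x))
            - 0.5 * np.log(np.pi)) / np.sqrt(2 * np.log(x))
    alarms = 0
    for _ in range(reps):
        eps = rng.standard_normal(m + N)
        X = np.column_stack([np.ones(m + N), rng.standard_normal((m + N, 2))])
        y = X @ np.array([1.0, 1.0, 1.0]) + eps
        beta_m, *_ = np.linalg.lstsq(X[:m], y[:m], rcond=None)
        sigma_hat = np.sqrt(np.sum((y[:m] - X[:m] @ beta_m) ** 2) / (m - 3))
        # recursive residuals and their CUSUM over the monitoring period
        res = [y[i] - X[i] @ np.linalg.lstsq(X[:i], y[:i], rcond=None)[0]
               for i in range(m, m + N)]
        Q = np.abs(np.cumsum(res))
        k = np.arange(1, N + 1)
        g = np.sqrt(m) * np.sqrt((k / m) * (1 + k / m))
        alarms += bool(np.any(Q >= crit * sigma_hat * g))
    return alarms / reps
```

Even with a handful of replications the conservative behavior of the recursive detector under DGP–1 is visible: the estimated size lies well below the nominal level.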
This is even more surprising, since it is well known that the convergence rates for extreme value asymptotics are very slow. Furthermore, it is also immediate from Tables 1 and 2 that the tests based on the recursive residuals have a lower false alarm rate than their respective non–recursive counterparts.

   m      N        recursive              non–recursive
                α = 0.05  α = 0.10     α = 0.05  α = 0.10
   25     2m     0.0404    0.0644       0.0580    0.0920
          4m     0.0320    0.0760       0.0512    0.1220
          6m     0.0400    0.0620       0.0576    0.1040
          8m     0.0400    0.0604       0.0600    0.0968
         10m     0.0404    0.0728       0.0576    0.1044
  100     2m     0.0460    0.0688       0.0624    0.0968
          4m     0.0400    0.0704       0.0532    0.0928
          6m     0.0420    0.0776       0.0616    0.1012
          8m     0.0412    0.0680       0.0616    0.1000
         10m     0.0424    0.0688       0.0568    0.0996
  300     2m     0.0520    0.0772       0.0668    0.0932
          4m     0.0464    0.0776       0.0608    0.0992
          6m     0.0560    0.0820       0.0692    0.1012
          8m     0.0528    0.0996       0.0688    0.1224
         10m     0.0512    0.0776       0.0668    0.1008

Table 2: Empirical sizes of detectors with GARCH(1,1) innovations (DGP–2).

(c) Empirical Power. To investigate the behavior of the stopping rules under the alternative, we consider different choices of β*. In particular, we have used the specifications β*_i = 1.1, 1.2, 1.3, 1.5 for i = 1, 2, 3, and, as a worst case scenario, β* = (1, 0.5, 1.5)^T. In the latter case, the second and third components change in different directions with the same absolute size. The comparisons in Tables 3 and 4 on pages 29 and 30 show that, under H_A, the non–recursive detectors are superior to the recursive ones in terms of an earlier detection; a fact that is also visible in the corresponding density estimations in Figure 1 on page 31, where β* = (1.3, 1.3, 1.3)^T. They are especially more sensitive when it comes to the detection of smaller changes, while the recursive tests turn out to be very conservative. That both tests should only be applied with some care is underlined by the bad performance in the worst case scenario. Here both stopping rules have an extremely low power, but, yet again, the non–recursive ones perform better.
(d) Conclusions. Summarizing the observed phenomena, a rule of thumb could be derived as follows. If an early detection of a change is extremely important, one should choose a non–recursive detector, due to its higher empirical power. If, on the other hand, a false alarm is costly and time consuming, then, due to their better empirical sizes, recursive detectors should be preferred. One might also think of a parallel monitoring using both tests, with a deeper statistical analysis commencing after one detector but not the other rejects the stability hypothesis. This, however, is not the subject of the present study.

There is an expected decay in the quality of the monitoring schemes when the errors exhibit dependence, here, conditional heteroskedasticity governed by a GARCH(1,1) sequence. This difference, however, does not violate the asymptotic level predicted by the limiting extreme value theorems, so that the tests can be readily used even in the finite sample case.

4 Proofs of Theorem 2.1 and Corollary 2.1

The first proof section is divided into two parts. First, we will transform the detector into a more suitable form, which can be approximated by a Wiener process. A functional of this Wiener process is then studied to derive the limiting extreme value distribution in the second part. Throughout the proofs, we will make use of the following basic properties of the inverse matrix C_n^{−1}, where

    C_n = (1/n) ∑_{i=1}^{n} x_i x_i^T.

Lemma 4.1 If condition (D) is satisfied for some κ > 0, then it also holds with probability one that, as n → ∞,

    |C_n^{−1} − C^{−1}| = O(n^{−κ}).

Proof. It is clear that the assertion follows directly from assumption (D). □

All proofs will be split into two parts due to the fact that the threshold function h is defined only via its limiting behavior for t → 0 and t → ∞. Moreover, we will assume that a = 1 in (H). Finally, note that by assumption (G), without loss of generality, N > cm with some constant c > 0.

(a) Transformation of the detector.
Using the definition of the recursive residuals ε̃_i, we obtain

    Q̃(m, k) = ∑_{i=m+1}^{m+k} ε_i − ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T C_{i−1}^{−1} ∑_{j=1}^{i−1} x_j ε_j.

Our first goal is to simplify the second term on the right–hand side of the latter equation. It turns out that, under the assumptions made, it can be replaced by the much simpler expression

    ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{i−1} ε_j,

which contains neither any x_i nor any inverse matrix C_{i−1}^{−1}. Towards this end, write

    ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T C_{i−1}^{−1} ∑_{j=1}^{i−1} x_j ε_j − ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{i−1} ε_j            (4.1)
      = ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T [C_{i−1}^{−1} − C^{−1}] ∑_{j=1}^{i−1} x_j ε_j
        + ∑_{i=m+1}^{m+k} (1/(i−1)) [x_i − c₁]^T C^{−1} ∑_{j=1}^{i−1} x_j ε_j.

Therein, we have applied the identity c₁^T C^{−1} = (1, 0, ..., 0) and the special form of the x_j given in assumption (C), which together imply c₁^T C^{−1} x_j ε_j = ε_j. It has to be shown that the difference in (4.1) is asymptotically small.

Lemma 4.2 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,

    sup_{1≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T [C_{i−1}^{−1} − C^{−1}] ∑_{j=1}^{i−1} x_j ε_j | = o_P(1).

Proof. (i) By the independence of {ε_j} and {x_j} [see assumption (B)], E x_j ε_j = 0 and Var(x_{j,ℓ} ε_j) ≤ C E|x_j|². So Kolmogorov's maximal inequality [see, e.g., Durrett (2005, p. 61)] implies

    max_{1≤i≤(c+1)m} | ∑_{j=1}^{i−1} x_j ε_j | = O_P(√m)  (m → ∞).                       (4.2)

Since the boundary function g(m, k) is, for "small" k, determined by relation (H), it follows after an application of (4.2), Lemma 4.1 and assumption (E) that

    max_{1≤k≤cm} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T [C_{i−1}^{−1} − C^{−1}] ∑_{j=1}^{i−1} x_j ε_j |
      = O_P(m^{−κ} √(log m)) = o_P(1),

finishing the first part of the proof.

(ii) Exercise 7.4.10 in Chow and Teicher (1988, p. 249) yields that, for any δ, δ* > 0, there is a random variable m* such that

    | ∑_{j=1}^{i−1} x_j ε_j | ≤ δ* √(i − 1) [log(i − 1)]^{1/2+δ}                          (4.3)

for all m ≥ m*.
Hence, we can find a random variable ξ such that

    ∑_{i=m+1}^{m+k} |x_i| i^{−κ−1} | ∑_{j=1}^{i−1} x_j ε_j | ≤ ξ ∑_{i=m+1}^{m+k} |x_i| i^{−κ−1/2} (log i)^{1/2+δ}
      ≤ ξ [log(m + k)]^{1/2+δ} ∑_{i=m+1}^{m+k} |x_i| i^{−κ−1/2}.                        (4.4)

By Abel's summation formula [cf. Milne–Thomson (1933, p. 276)],

    ∑_{i=m+1}^{m+k} |x_i| i^{−κ−1/2}
      = ∑_{i=m+1}^{m+k} [ i^{−κ−1/2} − (i + 1)^{−κ−1/2} ] ∑_{j=m+1}^{i} |x_j| + (m + k + 1)^{−κ−1/2} ∑_{j=m+1}^{m+k} |x_j|
      ≤ ∑_{i=m+1}^{m+k} i^{−κ−3/2} ∑_{j=m+1}^{i} |x_j| + (m + k + 1)^{−κ−1/2} ∑_{j=m+1}^{m+k} |x_j|
      ≤ sup_{ℓ≥1} (1/ℓ) ∑_{j=1}^{ℓ} |x_j| [ ∑_{i=m+1}^{m+k} i^{−κ−1/2} + (m + k + 1)^{−κ+1/2} ].  (4.5)

Combining the previous statements, for "large" k we obtain that, as m → ∞,

    max_{cm≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) x_i^T [C_{i−1}^{−1} − C^{−1}] ∑_{j=1}^{i−1} x_j ε_j |
      = O_P(1) max_{cm≤k≤N} √((m + k)/k) [log(m + k)]^{1/2+δ} / (m + k)^κ = o_P(1).

Therein, we have used (4.4), (4.5) and Lemma 4.1 to obtain the first equality sign. The second one follows after simple arithmetic operations. Hence, the proof is complete. □

Lemma 4.3 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,

    max_{1≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) [x_i − c₁]^T C^{−1} ∑_{j=1}^{i−1} x_j ε_j | = o_P(1).

Proof. (i) Observe that, by assumption (G), log N = O(log m). Hence, applying (4.2) yields

    max_{1≤k≤cm} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) [x_i − c₁]^T C^{−1} ∑_{j=1}^{i−1} x_j ε_j |
      = O_P(1) max_{1≤k≤cm} (1/√(k(m + k))) | ∑_{i=m+1}^{m+k} (x_i − c₁) |.

Using (E), the right–hand side can be further estimated by

    O(1) max_{1≤k≤cm} [ (m + k)^{1/2−ρ} + √(k log m) ] / √(k(m + k)) = O(1) [ m^{−ρ} + √(log m / m) ] = o(1)

as m → ∞.

(ii) Similarly, by (4.3) and assumption (E) it holds that, as m → ∞,

    max_{cm≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) [x_i − c₁]^T C^{−1} ∑_{j=1}^{i−1} x_j ε_j |
      = O_P(1) max_{cm≤k≤N} ([log(m + k)]^{1/2+δ} / √(km)) | ∑_{i=m+1}^{m+k} (x_i − c₁) |
      = O_P(1) max_{cm≤k≤N} [log(m + k)]^{1/2+δ} [ √((m + k)/k) (m + k)^{−ρ} / √m + √(log m / m) ]
      = o_P(1).

Thus, the proof is complete. □

So far, we have shown that Q̃(m, k) can asymptotically be replaced by

    Q*(m, k) = ∑_{i=m+1}^{m+k} ε_i − ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{i−1} ε_j.

Next, Q*(m, k) will be approximated by a Wiener process.
Lemma 4.4 If the assumptions of Theorem 2.1 are satisfied, then, for all m ≥ 1, there is a Wiener process {W_m(t) : t ≥ 0} such that

    max_{a_m≤k≤N} (1/g(m, k)) |Q*(m, k) − σW_m(k)| = O_P(a_m^{1/ν−1/2})  (m → ∞),

where a_m = O(m) and ν > 2.

Proof. (i) Rewriting

    Q*(m, k) = ∑_{i=m+1}^{m+k} [ ε_i − (1/(i−1)) ∑_{j=m+1}^{i−1} ε_j ] − ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{m} ε_j,   (4.6)

it can be seen that the first term on the right–hand side contains only innovations ε_i with time index i ≥ m + 1, while the second term contains only errors ε_i for which i ≤ m. Hence, the invariance principles (2.1) and (2.2) can be utilized to approximate all partial sums in (4.6) in the following.

(ii) First note that from (i) we obtain, as m → ∞,

    max_{a_m≤k≤cm} (1/g(m, k)) | ∑_{i=m+1}^{m+k} ε_i − σW_{1,m}(k) |
      = O_P(1) max_{a_m≤k≤cm} √(m/(m + k)) k^{1/ν−1/2} = O_P(a_m^{1/ν−1/2})

on account of 1/ν − 1/2 < 0. Similarly,

    max_{cm≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} ε_i − σW_{1,m}(k) | = O_P(a_m^{1/ν−1/2})

as m → ∞, since a_m = O(m).

(iii) From (i),

    | ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=m+1}^{i−1} ε_j − σ ∑_{i=m+1}^{m+k} (1/i) W_{1,m}(i − m) |
      ≤ | ∑_{i=m+1}^{m+k} [ (1/(i−1)) ∑_{j=m+1}^{i−1} ε_j − (1/i) ∑_{j=m+1}^{i} ε_j ] |
        + | ∑_{i=m+1}^{m+k} (1/i) [ ∑_{j=m+1}^{i} ε_j − σW_{1,m}(i − m) ] |
      = O_P(1) ∑_{i=m+1}^{m+k} (i − m)^{1/ν} / i = O_P(1) k^{1/ν} log((m + k)/m).

Hence,

    max_{a_m≤k≤cm} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=m+1}^{i−1} ε_j − σ ∑_{i=m+1}^{m+k} (1/i) W_{1,m}(i − m) |
      = O_P(1) max_{a_m≤k≤cm} √(m/(m + k)) k^{1/ν−1/2} log((m + k)/m) = O_P(a_m^{1/ν−1/2})

and

    max_{cm≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=m+1}^{i−1} ε_j − σ ∑_{i=m+1}^{m+k} (1/i) W_{1,m}(i − m) |
      = O_P(1) max_{cm≤k≤N} k^{1/ν−1/2} log((m + k)/m) = O_P(a_m^{1/ν−1/2}),

since a_m = O(m). All results have to be viewed as m → ∞.

(iv) Observe that, for |t| < 1,

    log(1 + t) = t − t²/2 + t³/3 ∓ ...

Hence, as in part (iii) of the proof,

    max_{a_m≤k≤cm} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{m} ε_j − σ ∑_{i=m+1}^{m+k} (1/i) W_{2,m}(m) |
      = O_P(1) max_{a_m≤k≤cm} √(m/(m + k)) (m^{1/ν}/√k) log((m + k)/m) = O_P(a_m^{1/ν−1/2})

and

    max_{cm≤k≤N} (1/g(m, k)) | ∑_{i=m+1}^{m+k} (1/(i−1)) ∑_{j=1}^{m} ε_j − σ ∑_{i=m+1}^{m+k} (1/i) W_{2,m}(m) |
      = O_P(1) max_{cm≤k≤N} (m^{1/ν}/√k) log((m + k)/m) = O_P(a_m^{1/ν−1/2})

as m → ∞.
(v) For k ≥ 1 set

    W_m(k) = W_{1,m}(k) − ∑_{i=m+1}^{m+k} (1/i) [ W_{1,m}(i − m) + W_{2,m}(m) ].

We have just shown in parts (ii)–(iv) of the proof that

    max_{a_m≤k≤N} (1/g(m, k)) |Q*(m, k) − σW_m(k)| = O_P(a_m^{1/ν−1/2}).

It is easy to see that the finite dimensional distributions of {W_m(k) : k ≥ 1} are normal and that EW_m(k) = 0 for all k ≥ 1. Lengthy elementary calculations also show that EW_m(k)W_m(ℓ) = min{k, ℓ}. Hence, the proof is complete. □

From the previous calculations it is immediate that we have to derive the limiting extreme value distribution of the process {g(m, k)^{−1} W_m(k) : k ≥ 1}. This will be approached in the next subsection.

(b) Derivation of the extreme value distribution. Let {W(t) : t ≥ 0} be a Wiener process. Set

    Γ(t_k) = |W(t_k)| / h(t_k),   t_k = k/m.

From part (a) of the proof we note that, by the scale transformation for Wiener processes,

    max_{1≤k≤N} |W_m(k)| / g(m, k)  =_D  max_{1≤k≤N} Γ(t_k).

The main idea in the upcoming proofs is to show that Γ(t_k) attains its maximum for very small values of k. But in this range h(t_k) is determined by √(t_k(1 + t_k)) [cf. assumption (H)] and can hence be estimated by √t_k, which in turn makes possible the application of a result of Darling and Erdős (1956). We start with a lemma identifying the dominating k and thereby also the order of the maximum.

Lemma 4.5 If the assumptions of Theorem 2.1 are satisfied, then, as m → ∞,

    (i)   (1/√(2 log log m)) max_{1≤k≤a_m} Γ(t_k) →_P 0,
    (ii)  (1/√(2 log log m)) max_{a_m≤k≤b_m} Γ(t_k) →_P 1,
    (iii) (1/√(2 log log m)) max_{b_m≤k≤N} Γ(t_k) →_P 0,

where we choose a_m = (log m)^δ, with some δ > 0, in accordance with Lemma 4.4, and b_m = cm/log m.

Proof. (i) Let ε > 0 and set s_k = 1 + t_k. Then, for the k under consideration, 1 ≤ s_k ≤ 1 + a_m/m → 1 as m → ∞. Thus, we can find a random variable m₀ such that

    (1/√(1 + ε)) max_{1≤k≤a_m} |W(t_k)|/√t_k ≤ max_{1≤k≤a_m} |W(t_k)|/√(t_k s_k) ≤ max_{1≤k≤a_m} |W(t_k)|/√t_k

for all m ≥ m₀, where we have used relation (H) to estimate h(t_k).
But, by the scale transformation for Wiener processes,

    max_{1≤k≤a_m} |W(t_k)|/√t_k  =_D  max_{1≤k≤a_m} |W(k)|/√k,

and hence the law of the iterated logarithm for partial sums yields

    (1/√(2 log log a_m)) max_{1≤k≤a_m} |W(t_k)|/√t_k →_P 1  (m → ∞).

On recognizing that log log a_m ∼ log log log m as m → ∞, assertion (i) follows from the sandwich theorem on letting ε ց 0. [We say that x_n ∼ y_n if x_n y_n^{−1} → 1 as n → ∞.]

(ii) Let ε > 0 and set s_k = 1 + t_k. Still, 1 ≤ s_k ≤ 1 + b_m/m → 1 as m → ∞. Hence, we can similarly define a random variable m₁ such that

    (1/√(1 + ε)) max_{1≤k≤b_m} |W(t_k)|/√t_k ≤ max_{1≤k≤b_m} |W(t_k)|/√(t_k s_k) ≤ max_{1≤k≤b_m} |W(t_k)|/√t_k.

Using the scale transformation for Wiener processes and the law of the iterated logarithm as in part (i),

    (1/√(2 log log m)) max_{1≤k≤b_m} |W(t_k)|/√t_k →_P 1  (m → ∞),

since log log b_m ∼ log log m. Letting ε ց 0 in combination with (i) finishes the proof.

(iii) Recall that without loss of generality N > cm, where c > 0. Assumptions (H) and (J) imply that the maximum of the Γ(t_k) can be estimated by

    max_{b_m≤k≤cm} |W(t_k)|/√(t_k s_k) ≤ max_{b_m≤k≤cm} |W(t_k)|/√t_k ≤ sup_{b_m/2m≤t≤c} |W(t)|/√t.

An application of Lemmas 1.1 and 1.2 in Csörgő and Horváth (1993, pp. 255–256) gives

    (1/√(2 log log log m)) sup_{b_m/2m≤t≤c} |W(t)|/√t →_P 1  (m → ∞)

and therefore

    (1/√(2 log log m)) max_{b_m≤k≤cm} Γ(t_k) →_P 0  (m → ∞).

It remains to examine the range cm ≤ k ≤ N. Using the limit relation (I) imposed on the threshold h, we obtain

    max_{cm≤k≤N} |W(t_k)|/√t_k ≤ sup_{c≤t<∞} |W(t)|/√t,

so that the assertion follows from Lemma 3.6 in Horváth et al. (2005). □

Note that all o_P(1) rates obtained in the previous lemmas can be given in the form m^{−δ} with some δ > 0, so they remain valid after a multiplication with the normalizing factor A(log m), which is needed in what follows.

Lemma 4.6 If the assumptions of Theorem 2.1 are satisfied, then

    lim_{m→∞} P{ A(log m) max_{1≤k≤N} Γ(t_k) − D(log m) ≤ t } = exp(−e^{−t})

for all real t.

Proof.
Lemma 4.5 implies that the extreme value asymptotic is solely determined by the range a_m ≤ k ≤ b_m. But, by assumption (H), h(t_k) is close to $\sqrt{t_k(1+t_k)}$ for those k, while the latter expression itself is close to $\sqrt{t_k}$. Hence, after another application of the sandwich theorem, it is enough to derive the extreme value asymptotic of
$$\max_{1 \le k \le b_m} \frac{|W(t_k)|}{\sqrt{t_k}} \stackrel{\mathcal{D}}{=} \max_{1 \le k \le b_m} \frac{|W(k)|}{\sqrt{k}},$$
where we have also used that the k ≤ a_m do not contribute to the maximum according to Lemma 4.5(i). Now, the result of Darling and Erdős (1956) yields
$$\lim_{m \to \infty} P\Bigl\{A(\log b_m) \max_{1 \le k \le b_m} \frac{|W(t_k)|}{h(t_k)} - D(\log b_m) \le t\Bigr\} = \exp\bigl(-e^{-t}\bigr)$$
for all real t. Since furthermore A(log b_m) − A(log m) → 0 and D(log b_m) − D(log m) → 0 as m → ∞, Lemma 4.6 is readily proved. □

Lemma 4.7 If the assumptions of Theorem 2.1 are satisfied, then
$$\lim_{m \to \infty} P\Bigl\{A(\log m) \max_{1 \le k \le N} \frac{|\tilde{Q}(m,k)|}{\sigma g(m,k)} - D(\log m) \le t\Bigr\} = \exp\bigl(-e^{-t}\bigr)$$
for all real t.

Proof. By Lemma 4.4, we obtain that
$$A(\log m)\,\Bigl|\max_{a_m \le k \le N} \frac{|\tilde{Q}(m,k)|}{\sigma g(m,k)} - \max_{a_m \le k \le N} \frac{|W(t_k)|}{h(t_k)}\Bigr| = O_P\Bigl(\sqrt{\log\log m}\,(\log m)^{\delta(1/\nu - 1/2)}\Bigr) = o_P(1)$$
by the definition of a_m and on account of ν > 2. □

Proof of Theorem 2.1. The assertion follows on combining Lemmas 4.1–4.7. □

It remains to replace the variance parameter σ² by its estimator σ̂²_m.

Proof of Corollary 2.1. The consistent estimator σ̂²_m satisfies the relation
$$\hat{\sigma}_m^2 - \sigma^2 = o_P\Bigl(\frac{1}{\log\log m}\Bigr) \qquad (m \to \infty)$$
[see Csörgő and Horváth (1997, p. 228)]. Therefore, the result is immediate from Theorem 2.1. □

5 Proofs of Propositions 2.1 and 2.2

At first, we prove that the partial sums of the augmented GARCH(1,1) innovations can be approximated by a Wiener process satisfying a specified rate of convergence.

Proof of Proposition 2.1. The assertion follows from Theorem 1 in Eberlein (1986), since GARCH(1,1)–type sequences are martingale difference arrays. □

Next, we turn our attention to Proposition 2.2. Its proof is based on three lemmas which are stated and verified now.
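Before turning to these lemmas, the process class at issue can be illustrated with a small simulation. The sketch below is a non-authoritative illustration using the classical GARCH(1,1) model of Bollerslev (1986), which is the polynomial–type special case Λ(x) = x of the augmented GARCH recursion treated here; the parameter values ω = 0.1, α = 0.1, β = 0.8 are hypothetical choices satisfying the stationarity condition α + β < 1.

```python
import numpy as np

def garch11(n, omega=0.1, alpha=0.1, beta=0.8, rng=None):
    """Simulate eps_j = sigma_j * xi_j with the GARCH(1,1) recursion
    sigma_j^2 = omega + alpha * eps_{j-1}^2 + beta * sigma_{j-1}^2,
    i.e. the special case Lambda(x) = x of the augmented GARCH class."""
    rng = np.random.default_rng(rng)
    xi = rng.standard_normal(n)               # i.i.d. innovations xi_j
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the stationary variance
    eps[0] = np.sqrt(sigma2[0]) * xi[0]
    for j in range(1, n):
        sigma2[j] = omega + alpha * eps[j - 1] ** 2 + beta * sigma2[j - 1]
        eps[j] = np.sqrt(sigma2[j]) * xi[j]
    return eps

eps = garch11(100_000, rng=42)
# The unconditional variance should be close to omega / (1 - alpha - beta) = 1.0
print(eps.var())
```

Under α + β < 1 the sequence is strictly stationary with unconditional variance ω/(1 − α − β), which a long sample should roughly reproduce; the exponential–type case corresponds instead to Λ(x) = log x.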
The first lemma establishes a maximal inequality for |σ_j − σ̃_j|, where the blockwise independent sequence {σ̃_j} is defined via (2.8). Note that under assumption (A), following Lemma 5.1 in Aue et al. (2006a), there are constants C₁, C₂ and C₃ such that
$$P\bigl\{\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr| \ge \exp(-C_1 j^\zeta)\bigr\} \le C_2 \exp(-C_3 j^\zeta), \tag{5.1}$$
where ζ ∈ (0, 1).

Lemma 5.1 If the assumptions of Theorem 2.1 are satisfied and {ε_j} is a polynomial–type GARCH sequence, then there are constants C₄, C₅ and C₆ such that
$$P\bigl\{|\sigma_j - \tilde{\sigma}_j| \ge \exp(-C_4 j^\zeta)\bigr\} \le C_5 \exp(-C_6 j^\zeta).$$

Proof. Applying the mean–value theorem, we obtain
$$\sigma_j - \tilde{\sigma}_j = \sqrt{\Lambda^{-1}(\Lambda(\sigma_j^2))} - \sqrt{\Lambda^{-1}(\Lambda(\tilde{\sigma}_j^2))} = \sqrt{\frac{1}{\Lambda'(\Lambda^{-1}(\zeta_j))}}\,\bigl[\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr],$$
where ζ_j is between Λ(σ_j²) and Λ(σ̃_j²). Hence, (2.7) implies that
$$|\sigma_j - \tilde{\sigma}_j| \le C\bigl(|\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde{\sigma}_j^2)|^{\mu/2}\bigr)\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr|.$$
If μ ≤ 0, then the assertion follows from (5.1) and the fact that Λ(σ₀²) ≥ ∆ > 0 by assumption. If μ > 0, then Theorem 2.3 and (5.1) yield
$$P\Bigl\{\bigl(|\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde{\sigma}_j^2)|^{\mu/2}\bigr)\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr| \ge \exp\bigl(-\tfrac{C_1}{2} j^\zeta\bigr)\Bigr\}$$
$$\le P\Bigl\{\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr| \ge \exp(-C_1 j^\zeta)\Bigr\} + P\Bigl\{|\Lambda(\sigma_j^2)|^{\mu/2} + |\Lambda(\tilde{\sigma}_j^2)|^{\mu/2} \ge \exp\bigl(\tfrac{C_1}{2} j^\zeta\bigr)\Bigr\}$$
$$\le 2C_2 \exp(-C_3 j^\zeta) + P\Bigl\{|\Lambda(\sigma_j^2)|^{\nu} \ge \Bigl(\tfrac{1}{2}\exp\bigl(\tfrac{C_1}{2} j^\zeta\bigr)\Bigr)^{2\nu/\mu}\Bigr\} + P\Bigl\{\bigl(|\Lambda(\sigma_j^2)| + \exp(-C_1 j^\zeta)\bigr)^{\nu} \ge \Bigl(\tfrac{1}{2}\exp\bigl(\tfrac{C_1}{2} j^\zeta\bigr)\Bigr)^{2\nu/\mu}\Bigr\}$$
$$\le C_5 \exp(-C_6 j^\zeta)$$
and the proof is complete. □

Next, we also provide an inequality for exponential–type GARCH sequences {ε_j}.

Lemma 5.2 If the assumptions of Theorem 2.1 are satisfied and {ε_j} is an exponential–type GARCH sequence, then there are constants C₇ and C₈ such that
$$P\bigl\{|\sigma_j - \tilde{\sigma}_j| \ge \exp(-C_7 j^\zeta)\bigr\} \le C_8 j^{-\zeta\nu}.$$

Proof. Recall that Λ(x) = log x. An application of the mean–value theorem gives
$$\sigma_j - \tilde{\sigma}_j = \exp(\log \sigma_j) - \exp(\log \tilde{\sigma}_j) = \exp(\zeta_j)(\log \sigma_j - \log \tilde{\sigma}_j),$$
where ζ_j is between log σ_j and log σ̃_j. Therefore,
$$|\sigma_j - \tilde{\sigma}_j| \le \frac{1}{2}\bigl[\exp(\log \sigma_j) + \exp(\log \tilde{\sigma}_j)\bigr]\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr|.$$
We will suppress the factor 1/2 on the right–hand side of the inequality in the following.
Observe that on the set {−1 < Λ(σ_j²) − Λ(σ̃_j²) < 1}, it holds that exp(Λ(σ̃_j²)) ≤ 3 exp(Λ(σ_j²)), so that Theorem 2.3 and (5.1) give
$$P\Bigl\{|\sigma_j - \tilde{\sigma}_j| \ge \exp\bigl(-\tfrac{C_1}{2} j^\zeta\bigr)\Bigr\} \le P\Bigl\{3\exp(\Lambda(\sigma_j^2))\,\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr| \ge \exp\bigl(-\tfrac{C_1}{2} j^\zeta\bigr)\Bigr\} + C_2 \exp(-C_3 j^\zeta)$$
$$\le P\Bigl\{\bigl|\Lambda(\sigma_j^2) - \Lambda(\tilde{\sigma}_j^2)\bigr| \ge \exp(-C_1 j^\zeta)\Bigr\} + P\Bigl\{3\exp(\Lambda(\sigma_j^2)) \ge \exp\bigl(\tfrac{C_1}{2} j^\zeta\bigr)\Bigr\} + C_2 \exp(-C_3 j^\zeta)$$
$$\le 2C_2 \exp(-C_3 j^\zeta) + P\Bigl\{\Lambda(\sigma_j^2) \ge \tfrac{C_1}{2} j^\zeta - \log 3\Bigr\}$$
$$\le 2C_2 \exp(-C_3 j^\zeta) + P\Bigl\{|\Lambda(\sigma_j^2)|^{\nu} \ge \bigl(\tfrac{C_1}{2} j^\zeta - \log 3\bigr)^{\nu}\Bigr\} \le C_8 j^{-\zeta\nu}.$$
This proves the assertion. □

Recall that ε̃_j = σ̃_j ξ_j. Lemmas 5.1 and 5.2 will help to establish that the partial sums obtained from the sequence {ε_j} and those coming from the sequence {ε̃_j} are “close”.

Lemma 5.3 If the assumptions of Theorem 2.1 are satisfied, then
$$\max_{1 \le n < \infty} \Bigl|\sum_{j=1}^{n} \varepsilon_j - \sum_{j=1}^{n} \tilde{\varepsilon}_j\Bigr| = O(1) \qquad \text{a.s.}$$

Proof. Lemmas 5.1 and 5.2 imply that there is a constant C₉ such that
$$|\sigma_j - \tilde{\sigma}_j| = O\bigl(\exp(-C_9 j^\zeta)\bigr) \qquad \text{a.s.}$$
Moreover, the moment conditions imposed in assumption (A) give that, as j → ∞,
$$|\xi_j| = o\bigl(j^{1/\nu}\bigr) \qquad \text{a.s.}$$
Thus, we conclude
$$\max_{1 \le n < \infty} \Bigl|\sum_{j=1}^{n} \varepsilon_j - \sum_{j=1}^{n} \tilde{\varepsilon}_j\Bigr| \le \sum_{j=1}^{\infty} |\varepsilon_j - \tilde{\varepsilon}_j| = \sum_{j=1}^{\infty} |\sigma_j - \tilde{\sigma}_j||\xi_j| = O(1)\sum_{j=1}^{\infty} j^{1/\nu}\exp(-C_9 j^\zeta) = O(1) \qquad \text{a.s.},$$
finishing the proof. □

Proof of Proposition 2.2. In view of Proposition 2.1, it remains to establish the independence of the approximating Wiener processes. By Lemma 5.3 it suffices to do so for the sequence {ε̃_j}. First note that, by definition,
$$\sum_{j=1}^{m-m^\zeta} \tilde{\varepsilon}_j \qquad \text{and} \qquad \Bigl\{\sum_{j=m+1}^{m+k} \tilde{\varepsilon}_j : k \ge 1\Bigr\}$$
are independent. On the other hand, the strong invariance principle verified in Proposition 2.1 and the upper bounds for the increments of Wiener processes [cf. Csörgő and Révész (1981)] imply
$$\sum_{j=m-m^\zeta+1}^{m} \tilde{\varepsilon}_j = O\bigl(m^{1/\nu}\bigr) \qquad \text{a.s.}$$
as m → ∞, and the assertion follows readily. □

References

[1] Aue, A., and Horváth, L. (2004). Delay time in sequential detection of change. Statistics and Probability Letters 67, 221–231.

[2] Aue, A., Berkes, I., and Horváth, L. (2006a). Strong approximation for the sums of squares of augmented GARCH sequences. Bernoulli, to appear.
[3] Aue, A., Horváth, L., Hušková, M., and Kokoszka, P. (2006b). Change–point monitoring in linear models. Preprint, University of Utah.

[4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

[5] Brown, R.L., Durbin, J., and Evans, J.M. (1975). Techniques for testing the constancy of regression relationships over time (with discussion). Journal of the Royal Statistical Society B 37, 149–192.

[6] Carrasco, M., and Chen, X. (2002). Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.

[7] Chow, Y.S., and Teicher, H. (1988). Probability Theory (2nd ed.). Springer, New York.

[8] Chu, C.–S.J., Stinchcombe, M., and White, H. (1996). Monitoring structural change. Econometrica 64, 1045–1065.

[9] Csörgő, M., and Horváth, L. (1993). Weighted Approximations in Probability and Statistics. Wiley, New York.

[10] Csörgő, M., and Horváth, L. (1997). Limit Theorems in Change–Point Analysis. Wiley, New York.

[11] Csörgő, M., and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.

[12] Darling, D.A., and Erdős, P. (1956). A limit theorem for the maximum of normalized sums of independent random variables. Duke Mathematical Journal 23, 143–155.

[13] Durrett, R. (2005). Probability: Theory and Examples (3rd ed.). Brooks/Cole–Thomson Learning, Belmont, CA.

[14] Eberlein, E. (1986). On strong invariance principles under dependence assumptions. Annals of Probability 14, 260–270.

[15] Horváth, L., Hušková, M., Kokoszka, P., and Steinebach, J. (2004). Monitoring changes in linear models. Journal of Statistical Planning and Inference 126, 225–251.

[16] Horváth, L., Kokoszka, P., and Steinebach, J. (2005). On sequential detection of parameter changes in linear regression. Preprint, University of Utah.

[17] Kühn, M. (2006). Dissertation, University of Cologne, in preparation.

[18] Lerche, H. (1984).
Boundary Crossing Probabilities for Brownian Motion. Springer–Verlag, New York.

[19] Milne–Thomson, L.M. (1933). The Calculus of Finite Differences. MacMillan Press, London (Reprinted by the American Mathematical Society, Providence, RI, 2000).

[20] Nelson, D.B. (1991). Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59, 347–370.

m = 300, k* − m = 1, N + m = 3300, α = 0.05, noise: N(0,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1   100  3000  3000  3000  0.4576       1    68   144   287  3000  0.9748
(1.2,1.2,1.2)     1    17    30    51  3000  0.9996       1    16    28    45   201  1
(1.3,1.3,1.3)     1     9    13    20    72  1            1     8    13    20    70  1
(1.5,1.5,1.5)     1     3     5     8    25  1            1     3     5     8    25  1
(1.0,1.5,0.5)     1  3000  3000  3000  3000  0.06         1  3000  3000  3000  3000  0.0964

m = 300, k* − m = 300, N + m = 3300, α = 0.05, noise: N(0,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1  3000  3000  3000  3000  0.0408       1   711  1011  1623  3000  0.8848
(1.2,1.2,1.2)     1   496   562   642  3000  0.9972       1   439   496   562  1159  1
(1.3,1.3,1.3)     1   404   429   457   588  1            1   382   412   445   608  1
(1.5,1.5,1.5)     1   353   365   377   432  1            1   346   361   377   452  1
(1.0,1.5,0.5)     4  3000  3000  3000  3000  0.0076       4  3000  3000  3000  3000  0.0200

m = 300, k* − m = 1200, N + m = 3300, α = 0.05, noise: N(0,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1  3000  3000  3000  3000  0.0088       1  2527  3000  3000  3000  0.3876
(1.2,1.2,1.2)     1  1957  2105  2290  3000  0.9864       1  1644  1813  2018  3000  0.9968
(1.3,1.3,1.3)     1  1568  1624  1678  1898  1            1  1464  1551  1654  2067  1
(1.5,1.5,1.5)     1  1384  1409  1432  1570  1            1  1344  1393  1440  1678  1
(1.0,1.5,0.5)     1  3000  3000  3000  3000  0.0112       1  3000  3000  3000  3000  0.0228

Table 3: Empirical power of detectors with N(0,1) innovations (DGP–1).

m = 300, k* − m = 1, N + m = 3300, α = 0.05, noise: GARCH(1,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1   133  3000  3000  3000  0.4084       1    82   158   300  3000  0.9772
(1.2,1.2,1.2)     1    19    32    49  3000  0.9988       1    18    29    44   524  1
(1.3,1.3,1.3)     1     9    14    19    91  1            1     8    13    18    84  1
(1.5,1.5,1.5)     1     3     5     7    42  1            1     3     5     7    42  1
(1.0,1.5,0.5)     1  3000  3000  3000  3000  0.0784       1  3000  3000  3000  3000  0.1032

m = 300, k* − m = 300, N + m = 3300, α = 0.05, noise: GARCH(1,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1  3000  3000  3000  3000  0.0864       1   710   991  1517  3000  0.9164
(1.2,1.2,1.2)     1   493   558   635  3000  0.9960       1   434   490   551  1165  1
(1.3,1.3,1.3)     1   404   428   454   610  1            1   382   412   442   622  1
(1.5,1.5,1.5)     1   352   364   376   437  1            1   344   360   375   451  1
(1.0,1.5,0.5)     1  3000  3000  3000  3000  0.0520       1  3000  3000  3000  3000  0.0712

m = 300, k* − m = 1200, N + m = 3300, α = 0.05, noise: GARCH(1,1)
                          recursive                               non–recursive
β∗T             min  Q.25  Q.5   Q.75  max   det        min  Q.25  Q.5   Q.75  max   det
(1.1,1.1,1.1)     1  3000  3000  3000  3000  0.0528       1  2445  3000  3000  3000  0.4132
(1.2,1.2,1.2)     1  1947  2090  2269  3000  0.9832       1  1630  1799  1977  3000  0.9976
(1.3,1.3,1.3)     1  1565  1618  1672  1982  1            1  1456  1551  1641  2054  1
(1.5,1.5,1.5)     1  1385  1409  1431  1569  1            1  1345  1390  1433  1653  1
(1.0,1.5,0.5)     1  3000  3000  3000  3000  0.0480       1  3000  3000  3000  3000  0.0676

Table 4: Empirical power of detectors with GARCH(1,1) innovations (DGP–2).

Figure 1: Density estimation of the first exceedance of the 5% critical level. [Six density panels comparing the recursive (“rec”) and non–recursive (“non–rec”) detectors for N(0,1) and GARCH(1,1) noise, with m = 300, N = 3000, α = 0.05 and change points cp − m = 1, 300 and 1200; the horizontal axis shows the time of detection after the training period.]
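The quantities reported in Tables 3 and 4 are obtained by recording, for each simulated trajectory, whether and when the detector crosses its boundary. The sketch below illustrates the mechanics in a simplified change-in-the-mean setting, a hypothetical stand-in for the paper's full linear-model design: the critical value is taken from the Gumbel limit, assuming the standard Darling–Erdős normalizations A(x) = √(2 log x) and D(x) = 2 log x + ½ log log x − ½ log π (the paper's exact A and D are defined in Section 2, not shown in this excerpt), the boundary uses γ = 1/2, and the error variance is known (σ = 1), so no estimator σ̂²_m is needed.

```python
import math
import numpy as np

def critical_value(m, alpha):
    """Asymptotic level-alpha critical value from the Gumbel limit
    P{A(log m) * max - D(log m) <= t} -> exp(-e^{-t}),
    assuming the standard Darling-Erdos normalizations A and D."""
    x = math.log(m)
    A = math.sqrt(2.0 * math.log(x))
    D = 2.0 * math.log(x) + 0.5 * math.log(math.log(x)) - 0.5 * math.log(math.pi)
    t_alpha = -math.log(-math.log(1.0 - alpha))  # Gumbel quantile
    return (t_alpha + D) / A

def monitor(m=300, N=3000, delta=0.5, k_star=1, alpha=0.05, n_rep=200, seed=0):
    """Monte Carlo sketch of a CUSUM-type monitor in a change-in-the-mean
    model (simplified stand-in for the paper's regression setup)."""
    rng = np.random.default_rng(seed)
    c = critical_value(m, alpha)
    k = np.arange(1, N + 1)
    g = np.sqrt(m) * np.sqrt((k / m) * (1.0 + k / m))  # boundary, gamma = 1/2
    taus = np.full(n_rep, N)                           # censor undetected runs at N
    for r in range(n_rep):
        x = rng.standard_normal(m + N)
        x[m + k_star:] += delta                        # mean shift after lag k_star
        detector = np.abs(np.cumsum(x[m:] - x[:m].mean()))
        hits = np.nonzero(detector >= c * g)[0]
        if hits.size:
            taus[r] = hits[0] + 1                      # first boundary crossing
    return taus, np.mean(taus < N)                     # delays and 'det' proportion

taus, det = monitor()
print(det, np.quantile(taus, [0.25, 0.5, 0.75]))
```

With a shift of δ = 0.5 immediately after the training period, essentially every trajectory is stopped well before the monitoring horizon, mirroring the qualitative pattern of the `det` and quantile columns in the tables above.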