Weighted Sums and Residual Empirical Processes for Time-varying Processes

Gabriel Chandler and Wolfgang Polonik

Department of Mathematics, Pomona College, Claremont, CA 91711
Department of Statistics, University of California, Davis, CA 95616
E-mail: gabriel.chandler@pomona.edu and polonik@wald.ucdavis.edu

December 15, 2012

Abstract

In the context of a time-varying AR-process we study both function indexed weighted sums and sequential residual empirical processes. As for the latter, it turns out that, somewhat surprisingly, under appropriate assumptions the non-parametric estimation of the parameter functions has a negligible asymptotic effect on the estimation of the error distribution. Function indexed weighted sum processes are generalized partial sums. An exponential inequality for such weighted sums of time-varying processes provides the basis for a weak convergence result for weighted sum processes. Properties of weighted sum processes are needed to treat the residual empirical processes. As an application of our results we discuss testing for the shape of the variance function.

Keywords: Cumulants, empirical process theory, exponential inequality, locally stationary processes, nonstationary processes, investigating unimodality.

1 Introduction

Consider the following time-varying AR-model of the form

$$Y_t - \sum_{k=1}^{p} \theta_k\big(\tfrac{t}{n}\big)\, Y_{t-k} = \sigma\big(\tfrac{t}{n}\big)\, \epsilon_t, \quad t = -p+1, \ldots, 0, 1, \ldots, n, \tag{1}$$

where the $\theta_k$ are the autoregressive parameter functions, $p$ is the order of the model, $\sigma$ is a function controlling the volatility, and the $\epsilon_t \sim (0,1)$ are i.i.d. Following Dahlhaus (1997), time is rescaled to the interval $[0,1]$ in order to make a large sample analysis feasible. Observe that this in particular means that $Y_t = Y_{t,n}$ satisfying (1) in fact forms a triangular array.

The consideration of non-stationary time series models goes back to Priestley (1965), who considered evolutionary spectra, i.e. spectra of time series evolving in time. The time-varying AR-process has always been an important special case, either in more methodological and theoretical considerations of non-stationary processes, or in applications such as signal processing and (financial) econometrics; e.g. Subba Rao (1970), Grenier (1983), Hall et al. (1983), Rajan and Rayner (1996), Girault et al. (1998), Eom (1999), Drees and Stǎricǎ (2002), Orbe et al. (2005), Fryzlewicz et al. (2006), and Chandler and Polonik (2006). Dahlhaus (1997) advanced the formal analysis of time-varying processes by introducing the notion of a locally stationary process. This is a time-varying process, with time rescaled to $[0,1]$, that satisfies certain regularity assumptions; see (14)-(16) below. We would like to point out, however, that in this paper local stationarity is only used to calculate the asymptotic covariance function in Theorem 3. All the other results hold under weaker assumptions.

Let $\mathbf{Y}_{t-1} = (Y_{t-1}, \ldots, Y_{t-p})'$. Given an estimator $\hat\theta$ of $\theta = (\theta_1, \ldots, \theta_p)'$ and corresponding residuals $\hat\eta_t = Y_t - \hat\theta(\tfrac{t}{n})'\,\mathbf{Y}_{t-1}$, we estimate the distribution of the innovations $\eta_t = \sigma(\tfrac{t}{n})\,\epsilon_t$ by the distribution function of the residuals. Define the sequential empirical distribution function of the residuals as

$$\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\{\hat\eta_t \le z\}, \quad z \in \mathbb{R},\ \alpha \in [0,1].$$

Let $F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\{\eta_t \le z\}$ denote the sequential empirical distribution function of the innovations $\eta_t$.
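To fix ideas, the following minimal sketch (Python with NumPy; all function names and the particular choices of $\theta_1$ and $\sigma$ are hypothetical illustrations, not part of the model) simulates a time-varying AR(1) version of (1) and evaluates $\hat F_n(\alpha,z)$. For illustration the true parameter function is used in place of an estimator $\hat\theta$, so the "residuals" below are in fact the innovations $\eta_t$.

```python
import numpy as np

def simulate_tvar1(n, theta, sigma, rng):
    """Simulate Y_t = theta(t/n) * Y_{t-1} + sigma(t/n) * eps_t, i.e. model (1) with p = 1."""
    y = np.zeros(n + 1)                      # y[0] serves as the starting value Y_0
    eps = rng.standard_normal(n + 1)         # i.i.d. innovations with mean 0 and variance 1
    for t in range(1, n + 1):
        u = t / n                            # rescaled time u = t/n in (0, 1]
        y[t] = theta(u) * y[t - 1] + sigma(u) * eps[t]
    return y

def sequential_edf(res, alpha, z):
    """F_hat_n(alpha, z) = (1/n) * #{t <= floor(alpha*n) : res_t <= z}."""
    n = len(res)
    return np.sum(res[: int(np.floor(alpha * n))] <= z) / n

rng = np.random.default_rng(0)
n = 500
theta = lambda u: 0.5 * np.cos(np.pi * u)    # hypothetical smooth parameter function
sigma = lambda u: 1.0 + 0.5 * u              # hypothetical volatility function
y = simulate_tvar1(n, theta, sigma, rng)
eta = np.array([y[t] - theta(t / n) * y[t - 1] for t in range(1, n + 1)])
print(sequential_edf(eta, alpha=0.5, z=0.0))
```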
A somewhat surprising result that will be shown here is that under appropriate conditions we have (see Theorem 2)

$$\sup_{\alpha \in [0,1],\, z \in [-L,L]} \big| \hat F_n(\alpha, z) - F_n(\alpha, z) \big| = o_P(n^{-1/2}) \tag{2}$$

even though non-parametric estimation of the parameter functions $\theta_k$ is involved. See also (12), where the $\hat\eta_t$ are replaced by the $\hat\epsilon_t$. One can see that, using appropriate estimators for the parameter functions, the (average) distribution function of the $\eta_t$ can be estimated just as well as if the parameters were known. The same applies to the estimation of the distribution function of the errors $\epsilon_t$ (cf. (12) in Section 2.1). This effect is similar to what is known in the literature from the parametric case, i.e. for a standard AR(p)-process (see Koul 2002). In our case, this phenomenon is caused by the particular structure underlying the time-varying processes considered. Other empirical processes based on residuals studied in the literature that also involve nonparametric estimation do not allow for such a behavior. See, for instance, Akritas and van Keilegom (2001), Cheng (2005), Müller, Schick and Wefelmeyer (2007, 2009a, 2009b, 2012) and van Keilegom et al. (2008).

The fact that in this work we emphasize processes based on the $\hat\eta_t$ rather than on the $\hat\epsilon_t$ is triggered by the applications of our results we have in mind. However, the incorporation of the estimation of $\sigma$ is more or less straightforward. See Section 2.1 for more details.

The proof of (2) (and also that of (12)) involves the behavior of two types of stochastic processes, whose large sample behavior, based on observations from a non-stationary process satisfying (1), is investigated in this paper. One of these processes is the residual sequential empirical process, closely related to the sequential empirical process based on the residuals; the other is a generalized partial sum process, or weighted sums process. Both of them are of independent interest.

The residual sequential empirical process is defined as follows. First observe that we can write $\hat F_n(\alpha, z) = \frac{1}{n} \sum_{t=1}^{\lfloor \alpha n \rfloor} \mathbf{1}\big\{ \sigma(\tfrac tn)\, \epsilon_t \le (\hat\theta - \theta)(\tfrac tn)'\, \mathbf Y_{t-1} + z \big\}$. This motivates the consideration of the process

$$\nu_n(\alpha, z, \mathbf g) = \frac{1}{\sqrt n} \sum_{t=1}^{\lfloor \alpha n \rfloor} \Big[ \mathbf 1\big\{ \sigma(\tfrac tn)\, \epsilon_t \le \mathbf g(\tfrac tn)'\, \mathbf Y_{t-1} + z \big\} - F\big( \big( \mathbf g(\tfrac tn)'\, \mathbf Y_{t-1} + z \big) / \sigma(\tfrac tn) \big) \Big], \tag{3}$$

where $\alpha \in [0,1]$, $z \in \mathbb R$, $\mathbf g : [0,1] \to \mathbb R^p$, $\mathbf g \in \mathcal G^p = \{\mathbf g = (g_1, \ldots, g_p)',\ g_i \in \mathcal G\}$ with $\mathcal G$ an appropriate function class such that $\theta - \hat\theta \in \mathcal G^p$ (see below), and $F(z)$ denotes the distribution function of the errors $\epsilon_t$. Following Koul (2002), we call the process (3) a residual sequential empirical process even though residuals are not directly involved anymore. Observe that with $\mathbf 0 = (0, \ldots, 0)'$ denoting the $p$-vector of null functions, $\nu_n(\alpha, z, \mathbf 0)$ equals the sequential empirical process based on the innovations $\eta_t$. The basic form of $\nu_n$ is standard, and residual empirical processes based on non-parametric models have been considered in the literature as indicated above. Our contribution here is to study these processes based on a non-stationary $Y_t$ of the form (1).

The second type of key processes in this paper are weighted sums of the form

$$Z_n(h) = \frac{1}{\sqrt n} \sum_{t=1}^{n} h\big(\tfrac tn\big)\, Y_t, \quad h \in \mathcal H,$$

where $\mathcal H$ is an appropriate function class. Such processes can be considered as generalizations of partial sum processes. Exponential inequalities and weak convergence results for such processes are derived below.

Motivation. Properties of both processes, $\nu_n(\alpha, z, \mathbf g)$ and $Z_n(h)$, are used to derive conditions for (2) to hold.
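As a concrete illustration of $Z_n(h)$, the following sketch (hypothetical helper name; `y` is assumed to hold an observed path $Y_1,\ldots,Y_n$, e.g. from the simulation snippet above) evaluates the weighted sum for a given weight function.

```python
import numpy as np

def weighted_sum(y, h):
    """Z_n(h) = n^{-1/2} * sum_{t=1}^n h(t/n) * Y_t for a weight function h on [0, 1]."""
    y = np.asarray(y)
    n = len(y)
    u = np.arange(1, n + 1) / n
    return np.sum(h(u) * y) / np.sqrt(n)

# Example: an indicator weight h(u) = 1{u <= 1/2} turns Z_n(h) into a rescaled partial sum,
# which is the special case used for the sequential processes discussed below.
# z = weighted_sum(y, lambda u: (u <= 0.5).astype(float))
```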
Moreover, properties of both processes are also used to derive the large sample behavior of a test statistic for the shape of the variance function in model (1) that is proposed in an accompanying paper, Chandler and Polonik (2012). This test statistic is based on the sequential empirical process of the squared residuals; see the definition of $\hat G_{n,\gamma}(\alpha)$ given in (21) below. For more details, see Section 4.

The outline of the paper is as follows. In Sections 2 and 3 we analyze the large sample behavior of the function indexed residual empirical process and of weighted sums, respectively, under the time-varying model (1), and apply these results to show (2). Section 4 applies some of the results from the preceding sections to derive an approximation result for $\hat G_{n,\gamma}(\alpha)$. This approximation provides a crucial ingredient to Chandler and Polonik (2012), where the asymptotically distribution free limit of a test statistic for testing modality of the variance function in model (1) is derived. Proofs are deferred to Section 5.

Remark on measurability. Suprema of function indexed processes will enter the theoretical results below. We assume throughout the paper that such suprema are measurable.

2 Residual empirical processes under time-varying AR-models

Work on empirical processes based on residuals in models involving non-parametric components has already been mentioned above. There of course exists a substantial body of work on residual empirical processes indexed by a finite dimensional parameter. For the time series setting see, for instance, Horváth et al. (2001), Stute (2001), Koul (2002), Koul and Ling (2006), Laïb et al. (2008), Müller et al. (2009a), and references therein. Here we are considering function indexed residual empirical processes based on triangular arrays of non-stationary time series of the form (1).

In order to formulate one of our main results for the residual empirical process $\nu_n(\alpha, z, \mathbf g)$ defined in (3) above, we first introduce some notation and formulate the underlying assumptions. Let $\mathcal H$ denote a class of functions defined on $[0,1]$, and let $d$ denote a metric on $\mathcal H$. For a given $\delta > 0$, let $N(\delta, \mathcal H, d)$ denote the minimal number $N$ of $d$-balls of radius less than or equal to $\delta$ needed to cover $\mathcal H$; i.e., there exist functions $g_k$, $k = 1, \ldots, N$, such that the balls $A_k = \{h \in \mathcal H : d(g_k, h) \le \delta\}$ satisfy $\mathcal H \subset \bigcup_{k=1}^N A_k$. Then $\log N(\delta, \mathcal H, d)$ is called the metric entropy of $\mathcal H$ with respect to $d$. If the balls $A_k$ are replaced by brackets $B_k = \{h \in \mathcal H : \underline g_k \le h \le \overline g_k\}$ for pairs of functions $\underline g_k \le \overline g_k$, $k = 1, \ldots, N$, with $d(\underline g_k, \overline g_k) \le \delta$, then the minimal number $N = N_B(\delta, \mathcal H, d)$ of such brackets with $\mathcal H \subset \bigcup_{k=1}^N B_k$ is called a bracketing covering number, and $\log N_B(\delta, \mathcal H, d)$ is called the metric entropy with bracketing of $\mathcal H$ with respect to $d$. For a function $h : [0,1] \to \mathbb R$ we denote

$$\|h\|_\infty := \sup_{u \in [0,1]} |h(u)| \quad \text{and} \quad \|h\|_n^2 := \frac{1}{n} \sum_{t=1}^{n} h^2\big(\tfrac tn\big).$$

We further denote by $N_n(\delta, \mathcal H)$ and $N_{n,B}(\delta, \mathcal H)$ the covering number and the bracketing covering number, respectively, with respect to the metric defined by $\|\cdot\|_n$.

Assumptions.

(i) The process $Y_t = Y_{t,T}$ has an MA-type representation

$$Y_{t,T} = \sum_{j=0}^{\infty} a_{t,T}(j)\, \epsilon_{t-j}, \tag{4}$$

where the $\epsilon_t$ are i.i.d. $(0,1)$. The distribution $F$ of the $\epsilon_t$ has a strictly positive, Lipschitz continuous Lebesgue density $f$. The function $\sigma(u)$ in (1) is of bounded variation with $0 < m_* < \sigma(u) < m^* < \infty$ for all $u$.
(ii) The coefficients $a_{t,T}(\cdot)$ of the MA-type representation of $Y_{t,T}$ given in (i) satisfy

$$\sup_{1 \le t \le T} |a_{t,T}(j)| \le \frac{K}{\ell(j)}$$

for some $K > 0$, where, for some $\kappa > 0$, $\ell(j) = j\,(\log j)^{1+\kappa}$ for $j > 1$ and $\ell(j) = 1$ for $j = 0, 1$.

Assumptions (i) and (ii) have been used in the literature on locally stationary processes before (e.g. Dahlhaus and Polonik, 2006, 2009). It is shown in Dahlhaus and Polonik (2005), by using a proof similar to Künsch (1995), that (i) holds for time-varying AR-processes (1) if the zeros of the corresponding AR-polynomials are bounded away from the unit circle (uniformly in the rescaled time $u$) and the parameter functions are of bounded variation.

Theorem 1 Suppose that assumptions (i) and (ii) hold, and that $\mathcal G$ denotes a function class satisfying, for some $c > 0$,

$$\int_{c/n}^{1} \sqrt{\log N_{n,B}(u^2, \mathcal G)}\; du < \infty \quad \forall\, n. \tag{5}$$

Then we have for any $L > 0$ and $\delta_n \to 0$ as $n \to \infty$ that

$$\sup_{\alpha \in [0,1],\, z \in [-L,L],\, \mathbf g \in \mathcal G^p,\, \|\mathbf g\|_n \le \delta_n} \big| \nu_n(\alpha, z, \mathbf g) - \nu_n(\alpha, z, \mathbf 0) \big| = o_P(1). \tag{6}$$

This theorem is of a nature typical for work on residual empirical processes (e.g. (8.2.32) in Koul 2002), although here we are dealing with time-varying AR-processes, and we are considering a non-parametric index class of functions. Also keep in mind that we are considering triangular arrays (recall that $Y_t = Y_{t,T}$).

Notice that the above result considers the difference of two processes where the 'right' centering is used. In contrast to that, we now consider the difference between two sequential empirical distribution functions, one of them based on the residuals and the other on the innovations. It will turn out that, somewhat surprisingly, the difference of these two functions is $o_P(1/\sqrt n)$, so that the non-parametric estimation of the parameter functions has a negligible asymptotic effect on the estimation of the (average) innovation distribution. More assumptions are needed.

Assumptions.

(iii) There exists a class $\mathcal G$ with $\theta_k(\cdot) - \hat\theta_k(\cdot) \in \mathcal G$, $k = 1, \ldots, p$, with probability tending to 1 as $n \to \infty$, such that $\sup_{g \in \mathcal G} \|g\|_\infty < \infty$ and, for some $C, c > 0$,

$$\int_{c/n}^{1} \log N_n(u, \mathcal G)\, du < C < \infty \quad \forall\, n.$$

(iv) For $k = 1, \ldots, p$ we have $\big( \frac{1}{n} \sum_{t=1}^{n} \big( \hat\theta_k(\tfrac tn) - \theta_k(\tfrac tn) \big)^2 \big)^{1/2} = O_P(m_n^{-1})$ with $m_n \to \infty$ as $n \to \infty$.

Assumptions (iii) and (iv) are discussed below.

Theorem 2 Suppose that the assumptions of Theorem 1 hold. If, in addition, assumptions (iii) and (iv) are satisfied, where $m_n$ satisfies $n^{1/2} \log n\, /\, m_n^2 \to 0$ as $n \to \infty$, then for any $L > 0$

$$\sup_{\alpha \in [0,1],\, z \in [-L,L]} \big| \hat F_n(\alpha, z) - F_n(\alpha, z) \big| = o_P(1/\sqrt n). \tag{7}$$

Discussion of assumptions (iii) and (iv). The assumptions on the entropy integrals (see assumption (iii) and (5)) control the complexity of the class $\mathcal G$. Many classes $\mathcal G$ are known to satisfy these assumptions; see below for one example (for more examples we refer to the literature on empirical process theory). Notice that a more standard condition on the covering integral is $\int_{c/n}^{1} \sqrt{\log N_n(u, \mathcal G)}\, du < \infty$ (or similarly with bracketing). In contrast to that, the entropy integral in assumption (iii) does not have a square root. This is similar to condition (5), where the integral is over $\sqrt{\log N_{n,B}(u^2, \mathcal G)}$ (notice the $u^2$). This makes both of our entropy conditions stronger than the standard assumption. The reason for this is that the exponential inequality underlying the derivations of our results is not of sub-Gaussian type (see Lemma 3), which in turn is caused by the dependence structure of our underlying process. A class of non-parametric estimators satisfying conditions (iii) and (iv) is given by the wavelet estimators of Dahlhaus et al. (1999).
These estimators lie in the Besov smoothness class $B^s_{p,q}(C)$, where the smoothness parameters satisfy the condition $s + \frac12 - \frac{1}{\max(2,p)} > 1$. The constant $C > 0$ is a uniform bound on the (Besov) norm of the functions in the class. Dahlhaus et al. derive conditions under which their estimators converge at rate $\big(\frac{\log n}{n}\big)^{s/(2s+1)}$ in the $L_2$-norm. For $s \ge 1$ the functions in $B^s_{p,q}(C)$ have uniformly bounded total variation. Assuming that the model parameter functions also possess this property, the rate of convergence in the $\|\cdot\|_n$-norm is the same as that in the $L_2$-norm, because in this case the error in approximating the integral by the average over equidistant points is of the order $O(n^{-1})$. Consequently, in this case we have $m_n^{-1} = \big(\frac{\log n}{n}\big)^{s/(2s+1)}$. In order to verify the condition on the bracketing covering numbers from (iii), we use Nickl and Pötscher (2007). Their Corollary 1, applied with $s = 2$, $p = q = 2$, implies that the bracketing entropy with respect to the $L_2$-norm can be bounded by $C\,\delta^{-1/2}$. (When applying their Corollary to our situation, choose, in their notation, $\beta = 0$, $\mu = U[0,1]$, $r = 2$ and $\gamma = 2$, say.)

The proof of Theorem 1 rests on the following lemma, which is of independent interest. It is modeled after similar results for empirical processes (see van de Geer, 2000, Theorem 5.11). Let $\mathcal H_L = [0,1] \times [-L,L] \times \mathcal G^p$ denote the index space of the process $\nu_n$, where $L > 0$ is some constant. Define a metric on $\mathcal H_L$ as

$$d_n(h_1, h_2) = d_n\big( (\alpha_1, z_1, \mathbf g_1), (\alpha_2, z_2, \mathbf g_2) \big) = |\alpha_1 - \alpha_2| + |z_1 - z_2| + \sum_{k=1}^{p} \|g_{1,k} - g_{2,k}\|_n. \tag{8}$$

Lemma 1 For $C_0 > 0$ let $A_n = \big\{ \frac1n \sum_{s=-p+1}^{n} Y_s^2 \le C_0 \big\}$ and define $K^* = 1 + (1 + \sqrt{p\,C_0})\, \frac{\|f\|_\infty}{m_*}$. Suppose that $\mathcal H_L$ is totally bounded with respect to $d_n$, that assumptions (i) and (ii) hold, and that for $C_1 > 0$,

$$\eta \ge \frac{2^6 K^*}{\sqrt n}, \tag{9}$$

$$\eta \le \frac12\, K^* \sqrt n\, (\tau^2 \wedge \tau), \tag{10}$$

$$\eta \ge C_1 \Big( \int_{\eta/(2^8 K^* \sqrt n)}^{\tau} \sqrt{\log N_B(u^2, \mathcal H_L, d_n)}\; du \ \vee\ \tau \Big). \tag{11}$$

Then, for every $L > 0$ and for $C_1 \ge 2\sqrt{10\, K^*}$, we have with $C_2 = \frac{2^6 (2^6 + 1) K^*}{C_1^2} + 2$ that

$$P\Big( \sup_{h_1, h_2 \in \mathcal H_L:\ d_n(h_1, h_2) \le \tau} \big| \nu_n(h_1) - \nu_n(h_2) \big| \ge \eta,\ A_n \Big) \le C_2 \exp\Big( - \frac{\eta^2}{2^6 (2^6 + 1)\, K^*\, \tau^2} \Big).$$

2.1 Estimating the distribution function of the errors $\epsilon_t$

As has been indicated in the introduction, the fact that we present our results for the case of $\sigma(\cdot)$ not being estimated is related to the applications we have in mind (see Section 4). However, the case of also estimating the variance function, and thus estimating the distribution of $\epsilon_t$ rather than the average distribution of $\eta_t = \sigma(\tfrac tn)\epsilon_t$, can be treated by using very similar techniques. This is discussed here briefly.

Assume we have available an estimator $\hat\sigma(\cdot)$ of the variance function. Then the residuals become

$$\hat\epsilon_t = \frac{Y_t - \hat\theta(\tfrac tn)'\,\mathbf Y_{t-1}}{\hat\sigma(\tfrac tn)},$$

and the sequential empirical distribution function of the $\hat\epsilon_t$ can be written as

$$\hat F_n^*(\alpha, z) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\{\hat\epsilon_t \le z\} = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\Big\{ \epsilon_t \le \frac{(\hat\theta - \theta)(\tfrac tn)'\,\mathbf Y_{t-1}}{\sigma(\tfrac tn)} + \Big( \frac{\hat\sigma(\tfrac tn)}{\sigma(\tfrac tn)} - 1 \Big) z + z \Big\}.$$

Accordingly, we define the modified residual sequential empirical process as

$$\nu_n^*(\alpha, z, \mathbf g, s) = \frac{1}{\sqrt n} \sum_{t=1}^{\lfloor\alpha n\rfloor} \Big[ \mathbf 1\big\{ \sigma(\tfrac tn)\,\epsilon_t \le \mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z\, s(\tfrac tn) + z\,\sigma(\tfrac tn) \big\} - F\Big( \frac{\mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z\, s(\tfrac tn)}{\sigma(\tfrac tn)} + z \Big) \Big],$$

where the function $s$ lies in a class $\mathcal S$ such that the difference $\hat\sigma - \sigma \in \mathcal S$ (with probability tending to 1 as $n \to \infty$). An adapted version of Theorem 1 holds for $\nu_n^*(\alpha, z, \mathbf g, s) - \nu_n^*(\alpha, z, \mathbf 0, 0)$, where the adaptation consists in replacing $\mathcal G$ by $\mathcal G \cup \mathcal S$ in the entropy integral, and the supremum is also taken over all $s \in \mathcal S$ with $\|s\|_n \le \delta_n \to 0$ as $n \to \infty$.
Assuming further that the conditions on $\mathcal S$ and on the difference $\hat\sigma - \sigma \in \mathcal S$ exactly parallel the ones given in assumptions (iii) and (iv) above, then, under the same assumption on the rates of convergence of our estimators, the non-parametric estimation of the AR-parameter functions and of the variance function has a negligible asymptotic effect on the estimation of the distribution function of the errors $\epsilon_t$, i.e.

$$\sup_{\alpha\in[0,1],\, z\in[-L,L]} \Big| \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\{\hat\epsilon_t \le z\} - \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\{\epsilon_t \le z\} \Big| = o_P(1/\sqrt n). \tag{12}$$

The proofs of these results require only some straightforward adaptations, beginning with a modification of $d_n$ (given in (8)) to a metric on $\mathcal H_L^* = [0,1] \times [-L,L] \times \mathcal G^p \times \mathcal S$ by adding $\|s_1 - s_2\|_n$. With this modification Lemma 1 still holds for $\mathcal H_L^*$. This then is the main ingredient to the proofs of Theorems 1 and 2 for processes based on the $\hat\epsilon_t$ rather than on the $\hat\eta_t$. Just as for the estimators of the AR-parameters, the wavelet based estimator of the variance function of Dahlhaus et al. (1999) satisfies the necessary assumptions on $\hat\sigma$.

3 Weighted sums under local stationarity

As discussed above, the second type of processes of importance in our context are weighted partial sums of locally stationary processes given by

$$Z_n(h) = \frac{1}{\sqrt n} \sum_{t=1}^{n} h\big(\tfrac tn\big)\, Y_t, \quad h \in \mathcal H. \tag{13}$$

In the i.i.d. case, weighted sums have received some attention in the literature. For functional central limit theorems and exponential inequalities see, for instance, Alexander and Pyke (1986), van de Geer (2000), and references therein. We will show below that under appropriate assumptions $Z_n(h)$ converges weakly to a Gaussian process. In order to calculate the covariance function of the limit we assume that the process $Y_t$ is locally stationary as in Dahlhaus and Polonik (2009). Recalling assumption (i), this means that we assume the existence of functions $a(\cdot, j) : (0,1] \to \mathbb R$ with

$$\sup_u |a(u,j)| \le \frac{K}{\ell(j)}, \tag{14}$$

$$\sup_j \sum_{t=1}^{n} \Big| a_{t,T}(j) - a\big(\tfrac tT, j\big) \Big| \le K, \tag{15}$$

$$TV\big(a(\cdot, j)\big) \le \frac{K}{\ell(j)}, \tag{16}$$

where for a function $g : [0,1] \to \mathbb R$ we denote by $TV(g)$ the total variation of $g$ on $[0,1]$. Conditions (14)-(16) hold if the zeros of the corresponding AR-polynomials are bounded away from the unit circle (uniformly in the rescaled time $u$) and the parameter functions are of bounded variation (see Dahlhaus and Polonik, 2006). Further, we define the time varying spectral density as the function

$$f(u,\lambda) := \frac{1}{2\pi}\, |A(u,\lambda)|^2 \quad \text{with} \quad A(u,\lambda) := \sum_{j=-\infty}^{\infty} a(u,j)\, \exp(-i\lambda j),$$

and

$$c(u,k) := \int_{-\pi}^{\pi} f(u,\lambda)\, \exp(i\lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u, k+j)\, a(u,j)$$

is the time varying covariance of lag $k$ at rescaled time $u \in [0,1]$. We also denote by $\mathrm{cum}_k(X)$ the $k$-th order cumulant of a random variable $X$.

Theorem 3 Let $\mathcal H$ denote a class of uniformly bounded, real valued functions of bounded variation defined on $[0,1]$. Assume further that for some $C, c > 0$,

$$\int_{c/n}^{1} \log N_n(u, \mathcal H)\, du < C < \infty \quad \forall\, n, \tag{17}$$

and that for some constant $C_1 > 0$ we have $|\mathrm{cum}_k(\epsilon_t)| \le k!\, C_1^k$ for $k = 1, 2, \ldots$. Then we have under assumptions (i) and (ii) that, as $n \to \infty$, the process $Z_n(h)$, $h \in \mathcal H$, converges weakly to a tight, mean zero Gaussian process $\{G(h),\ h \in \mathcal H\}$. If, in addition, (14)-(16) hold, then the variance-covariance function of $G(h)$ can be calculated as $C(h_1, h_2) = \int_0^1 h_1(u)\, h_2(u)\, S(u)\, du$, where $S(u) = \sum_{k=-\infty}^{\infty} c(u,k)$.

Remarks. a) Here weak convergence is meant in the sense of Hoffmann-Jørgensen; see van der Vaart and Wellner (1996) for more details.

b) It is well known that for normally distributed errors we have $\mathrm{cum}_k(\epsilon_t) = 0$ for $k = 3, 4, \ldots$, and thus they satisfy the assumptions of the theorem.

c) Weighted partial sums of the form

$$Z_n(\alpha, h) = \frac{1}{\sqrt n} \sum_{t=1}^{\lfloor\alpha n\rfloor} g\big(\tfrac tn\big)\, Y_t$$

are in fact a special case of processes considered in the theorem. Here $h(u) = h_{g,\alpha}(u) = \mathbf 1_{[0,\alpha]}(u)\, g(u)$. Note that if $g \in \mathcal G$ and $\mathcal G$ satisfies the assumptions on the covering integral from the above theorem, then so does the class $\{h_{g,\alpha}(u) : g \in \mathcal G,\ \alpha \in [0,1]\}$. In this case, the limit covariance can be written as $C(h_{g_1,\alpha_1}, h_{g_2,\alpha_2}) = \int_0^{\alpha_1\wedge\alpha_2} g_1(u)\, g_2(u)\, S(u)\, du$.

d) Assumptions (14)-(16) are only used for calculating the covariance function of the limit process.
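For the special case of an AR(1)-type process with unit innovation variance one has $a(u,j) = \theta_1(u)^j$, so that $c(u,k) = \theta_1(u)^{|k|}/(1-\theta_1(u)^2)$ and $S(u) = (1-\theta_1(u))^{-2}$. The following sketch (hypothetical names, assuming this AR(1) special case) evaluates the limit covariance $C(h_1,h_2)$ of Theorem 3 by numerical integration.

```python
import numpy as np

def S_ar1(theta_u):
    """S(u) = sum_k c(u,k) for an AR(1) with coefficient theta_1(u) and unit innovation
    variance: c(u,k) = theta^{|k|} / (1 - theta^2), which sums to 1 / (1 - theta)^2."""
    return 1.0 / (1.0 - theta_u) ** 2

def limit_cov(h1, h2, theta1, n_grid=10_000):
    """C(h1, h2) = int_0^1 h1(u) h2(u) S(u) du, approximated by a midpoint rule."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    return np.mean(h1(u) * h2(u) * S_ar1(theta1(u)))

# Limit variance of Z_n(1) for the hypothetical parameter function theta_1(u) = 0.5*cos(pi*u):
print(limit_cov(lambda u: np.ones_like(u), lambda u: np.ones_like(u),
                lambda u: 0.5 * np.cos(np.pi * u)))
```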
The main ingredients to the proof of Theorem 3 are presented in the following results, which are of independent interest.

Theorem 4 Let $\{Y_t,\ t = 1, \ldots, n\}$ satisfy assumptions (i) and (ii) and let $\mathcal H = \{h : [0,1] \to \mathbb R\}$ be totally bounded with respect to $\|\cdot\|_n$. Further assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \ldots$ we have $|\mathrm{cum}_k(\epsilon_t)| \le k!\, C^k$, and let $A_n = \big\{ \frac1n \sum_{t=1}^n Y_t^2 \le M^2 \big\}$, where $M > 0$. There exist constants $c_0, c_1, c_2 > 0$ such that for all $\eta > 0$ satisfying

$$\eta < 16\, M\, \sqrt n\, \tau \tag{18}$$

and

$$\eta > c_0 \Big( \int_{\eta/(8 M \sqrt n)}^{\tau} \log N_n(u, \mathcal H)\, du \ \vee\ \tau \Big) \tag{19}$$

we have

$$P\Big( \sup_{h \in \mathcal H,\ \|h\|_n \le \tau} |Z_n(h)| > \eta,\ A_n \Big) \le c_1 \exp\Big\{ - c_2\, \frac{\eta}{\tau} \Big\}.$$

The next result of importance for the proof of Theorem 3 deals with cumulants. For random variables $X_1, \ldots, X_k$ we denote by $\mathrm{cum}(X_1, \ldots, X_k)$ their joint cumulant, and if $X_i = X$ for all $i = 1, \ldots, k$, then $\mathrm{cum}(X_1, \ldots, X_k) = \mathrm{cum}(X, \ldots, X) = \mathrm{cum}_k(X)$, the $k$-th order cumulant of $X$.

Lemma 2 Let $\{Y_t,\ t = 1, \ldots, n\}$ satisfy assumptions (i) and (ii). For $j = 1, 2, \ldots$, let $h_j$ be functions defined on $[a,b]$ with $\|h_j\|_n < \infty$. Then there exists a constant $1 \le K_0 < \infty$ such that for all $k \ge 1$,

$$\big| \mathrm{cum}\big(Z_n(h_1), \ldots, Z_n(h_k)\big) \big| \le K_0^{k-1}\, \big|\mathrm{cum}_k(\epsilon_1)\big| \prod_{j=1}^{k} \|h_j\|_n.$$

If, in addition, $\|h_j\|_\infty \le M < \infty$, $j = 1, \ldots, k$, then for $k \ge 3$,

$$\big| \mathrm{cum}\big(Z_n(h_1), \ldots, Z_n(h_k)\big) \big| \le K_0^{k-2}\, M^k\, \big|\mathrm{cum}_k(\epsilon_1)\big|\, n^{-\frac{k-2}{2}}.$$

The behavior of the cumulants given in the above lemma is needed for the following crucial exponential inequality.

Lemma 3 Let $\{Y_t,\ t = 1, \ldots, n\}$ satisfy the assumptions of Lemma 2. Let $h$ be a function with $\|h\|_n < \infty$. Assume that there exists a constant $C > 0$ such that for all $k = 1, 2, \ldots$ we have $|\mathrm{cum}_k(\epsilon_t)| \le k!\, C^k$. Then there exist constants $c_1, c_2 > 0$ such that for any $\eta > 0$ we have

$$P\big( |Z_n(h)| > \eta \big) \le c_1 \exp\Big\{ - \frac{c_2\, \eta}{\|h\|_n} \Big\}. \tag{20}$$

The assumptions on the cumulants in particular hold for errors with $E|\epsilon_t|^k \le \big(\tfrac C2\big)^k$ for all $k = 1, 2, \ldots$.

4 Applications

We present two applications that motivate the above study of the two types of processes. The first is a test for the constancy of the variance function, i.e. homoskedasticity. The second, closely related though more involved than the first, is a test for determining the modality of the variance function.

4.1 A test of homoskedasticity

We are interested in testing the null hypothesis $H : \sigma(u) = \sigma_0$ for all $u \in [0,1]$. (Note that if we additionally assume that the AR parameters do not depend on time, then under mild conditions on the $\theta_k$ this is also a test for weak stationarity.) To this end, consider the process

$$\hat G_{n,\gamma}(\alpha) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \hat\eta_t^2 \ge \hat q_\gamma^2 \big), \quad \alpha \in [0,1], \tag{21}$$

where $\hat q_\gamma^2$ is the empirical upper $\gamma$-quantile of the squared residuals. Notice that $\hat G_{n,\gamma}(\alpha)$ records (after normalization by $n$) the number of large (squared) residuals among the first $(100 \times \alpha)\%$ of the observations $Y_1, \ldots, Y_n$. If the variance function is constant, then, since there is a total of $\lfloor n\gamma \rfloor$ large residuals, the expected value of $\hat G_{n,\gamma}(\alpha)$ approximately equals $\alpha\gamma$. This motivates the form of our test statistic,

$$T_n = \sup_{\alpha\in[0,1]} \big| \hat G_{n,\gamma}(\alpha) - \alpha\gamma \big|.$$

The behavior of this test statistic follows from the discussion of our second application.
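In code, the statistic can be sketched as follows (Python; names are hypothetical, `eta_hat` denotes residuals from a fitted model, and taking $\hat q_\gamma^2$ as the empirical $(1-\gamma)$-quantile of the squared residuals is an implementation choice).

```python
import numpy as np

def G_hat(eta_hat, gamma):
    """G_hat_{n,gamma}(alpha) from (21), evaluated on the grid alpha = 1/n, 2/n, ..., 1."""
    sq = np.asarray(eta_hat) ** 2
    n = len(sq)
    q2 = np.quantile(sq, 1.0 - gamma)   # empirical upper-gamma quantile of squared residuals
    return np.cumsum(sq >= q2) / n

def T_n(eta_hat, gamma=0.1):
    """T_n = sup_alpha |G_hat_{n,gamma}(alpha) - alpha * gamma|."""
    n = len(eta_hat)
    alpha = np.arange(1, n + 1) / n
    return np.max(np.abs(G_hat(eta_hat, gamma) - alpha * gamma))
```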
4.2 Asymptotic distribution of a test statistic for modality of the variance function

We argue briefly that the test for constancy of the variance from Subsection 4.1 can be modified to a test for modality of the variance function. For more details we refer to Chandler and Polonik (2012). A key realization is that for a function to be monotonic (which, say, a unimodal function is on either side of the mode), the variance function must be 'at least' constant. In other words, a constant variance function provides the boundary case under the null for a test of monotonicity. We conduct this test on appropriate intervals $[a,b] \subset [0,1]$, where this crucial choice depends on the underlying variance function. Under the alternative hypothesis the interval needs to be located around a deviation from the null hypothesis, meaning around an antimode. Chandler and Polonik (2012) provide an estimator for the interval of interest and show that this estimation does not influence the asymptotic limit of the test statistic. Via this estimation, and supposing we are testing for a monotonically increasing function (locally on $[a,b]$), it is appropriate to modify the test statistic to $T_n = \sup_{\alpha\in[a,b]} \big( \hat G_{n,\gamma}(\alpha) - \alpha\gamma \big)$.

To simplify the notation we assume in the following that $[a,b] = [0,1]$. Notice that $\hat G_{n,\gamma}(\alpha)$ is closely related to the sequential residual empirical process and, as can be seen from the proof of Theorem 5, weighted empirical processes enter the analysis of $\hat G_{n,\gamma}(\alpha)$ through the handling of the estimation of $q_\gamma$. Theorem 5 below provides an approximation of the test statistic $T_n$ by independent (but not necessarily identically distributed) random variables. This result crucially enters the proofs in Chandler and Polonik (2012). In particular, it implies that the large sample behavior of the test statistic $T_n$ under the null hypothesis is not influenced by the in general non-parametric estimation of the parameter functions, as long as the rate of convergence of these estimators is sufficiently fast. This is somewhat of a surprise, and it is connected to the particular structure of our model.

First we introduce some additional notation. Let $f_u$ denote the pdf of $\sigma(u)\,\epsilon_t$, i.e. $f_u(z) = \frac{1}{\sigma(u)}\, f\big(\frac{z}{\sigma(u)}\big)$, and

$$G_{n,\gamma}(\alpha) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\Big( \epsilon_t^2\, \sigma^2\big(\tfrac tn\big) \ge q_\gamma^2 \Big),$$

where $q_\gamma$ is defined by means of

$$\Psi(z) = \int_0^1 F\Big( \frac{z}{\sigma(u)} \Big)\, du - \int_0^1 F\Big( \frac{-z}{\sigma(u)} \Big)\, du, \quad z \ge 0, \tag{22}$$

as the solution to the equation

$$\Psi(q_\gamma) = 1 - \gamma. \tag{23}$$

Notice that this solution is unique, since we have assumed $F$ to be strictly monotonic, and if $\sigma^2(u) = \sigma_0^2$ is constant for all $u \in [a,b]$, then $q_\gamma^2$ equals the upper $\gamma$-quantile of the squared innovations $\eta_t^2 = \sigma_0^2\, \epsilon_t^2$. The approximation result that follows does not assume that the variance is constant, however.

Theorem 5 Let $\gamma \in [0,1]$ and suppose that $0 \le a < b \le 1$ are non-random. Assume further that $|\mathrm{cum}_k(|\epsilon_t|)| < k!\, C^k$ for some $C > 0$ and all $k = 1, 2, \ldots$. Then, under assumptions (i)-(iv), with $n^{1/2} \log n\, /\, m_n^2 = o(1)$, we have

$$\sup_{\alpha\in[0,1]} \sqrt n\, \Big| \hat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) + c(\alpha)\, \big( G_{n,\gamma}(1) - E G_{n,\gamma}(1) \big) \Big| = o_P(1), \tag{24}$$

where

$$c(\alpha) = \frac{\int_0^\alpha \big( f_u(q_\gamma) + f_u(-q_\gamma) \big)\, du}{\int_0^1 \big( f_u(q_\gamma) + f_u(-q_\gamma) \big)\, du}.$$

Under the null hypothesis $\sigma(u) \equiv \sigma_0 > 0$ for $u \in [0,1]$ we have $c(\alpha) = \alpha$. Moreover, in case the AR-parameters in model (1) are constant and $\sqrt n$-consistent estimators are used, the moment assumptions on the innovations can be significantly relaxed to $E\epsilon_t^2 < \infty$.
Under the (worst case) null hypothesis, $\sigma^2(u) = \sigma_0^2$ for all $u \in [0,1]$, the innovations are i.i.d., we have $c(\alpha) = \alpha$, and $E G_{n,\gamma}(\alpha) = \alpha\gamma$. Therefore the above result implies that $(\gamma(1-\gamma))^{-1/2}\, \sqrt n\, \big( \hat G_{n,\gamma}(\alpha) - \alpha\gamma \big)$ converges weakly to a standard Brownian bridge. Further details can be found in Chandler and Polonik (2012).
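The Brownian bridge limit suggests the following Monte Carlo calibration of the test. This is only a sketch under the stated null (constant variance, so $c(\alpha) = \alpha$); all names, grid sizes and replication counts are hypothetical implementation choices.

```python
import numpy as np

def bb_sup_quantile(level=0.95, n_steps=1000, n_rep=20_000, seed=1):
    """Monte Carlo quantile of sup_alpha |B(alpha)| for a standard Brownian bridge B."""
    rng = np.random.default_rng(seed)
    incr = rng.standard_normal((n_rep, n_steps)) / np.sqrt(n_steps)
    w = np.cumsum(incr, axis=1)                     # Brownian motion on the grid
    t = np.arange(1, n_steps + 1) / n_steps
    bridge = w - t * w[:, -1][:, None]              # B(t) = W(t) - t * W(1)
    return np.quantile(np.max(np.abs(bridge), axis=1), level)

# Under the null, reject homoskedasticity at the 5% level when
#   sqrt(n) * T_n(eta_hat, gamma) / sqrt(gamma * (1 - gamma)) > bb_sup_quantile(0.95)
```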
5 Proofs

5.1 Proof of Lemma 1

We only present a brief outline. First notice that $\nu_n(\alpha, z, \mathbf g)$ is a sum of bounded martingale differences, because with $\xi_t^{z,\mathbf g} = \mathbf 1\big\{ \sigma(\tfrac tn)\,\epsilon_t \le \mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z \big\}$ we have

$$\nu_n(\alpha, z, \mathbf g) = \frac{1}{\sqrt n} \sum_{t=1}^{n} \tilde\xi_t^{z,\mathbf g}\, \mathbf 1\big\{ \tfrac tn \le \alpha \big\},$$

where $\tilde\xi_t^{z,\mathbf g} = \xi_t^{z,\mathbf g} - E(\xi_t^{z,\mathbf g} \mid \mathcal F_{t-1})$ and $\mathcal F_t = \sigma(\epsilon_t, \epsilon_{t-1}, \ldots)$ denotes the $\sigma$-algebra generated by $\{\epsilon_t, \epsilon_{t-1}, \ldots\}$; obviously, $\nu_n(\alpha_1, z_1, \mathbf g_1) - \nu_n(\alpha_2, z_2, \mathbf g_2)$ is also a sum of martingale differences. The proof of this lemma is based on the basic chaining device that is well known in empirical process theory, utilizing the following exponential inequality for sums of bounded martingale differences from Freedman (1975).

Lemma (Freedman 1975). Let $Z_1, \ldots, Z_n$ denote martingale differences with respect to a filtration $\{\mathcal F_t,\ t = 0, \ldots, n-1\}$ with $|Z_t| \le C$ for all $t = 1, \ldots, n$. Let further $S_n = \frac{1}{\sqrt n}\sum_{t=1}^n Z_t$ and $V_n = V_n(S_n) = \frac1n \sum_{t=1}^n E(Z_t^2 \mid \mathcal F_{t-1})$. Then we have for all $\epsilon, \tau^2 > 0$ that

$$P\big( S_n \ge \epsilon,\ V_n \le \tau^2 \big) \le \exp\Bigg( - \frac{\epsilon^2}{2\tau^2 + \frac{2\epsilon C}{\sqrt n}} \Bigg). \tag{25}$$

The form of (25) shows that we first need to control the quadratic variation $V_n$. Let $\eta_t^{\alpha,z,\mathbf g} = \tilde\xi_t^{z,\mathbf g}\, \mathbf 1\{\tfrac tn \le \alpha\}$, and write $F_u(z) = F(z/\sigma(u))$ for the distribution function of $\sigma(u)\epsilon_t$. For $h_1 = (\alpha_1, z_1, \mathbf g_1),\ h_2 = (\alpha_2, z_2, \mathbf g_2) \in \mathcal H_L$ we have

$$V_n = V_n\big( \nu_n(h_1) - \nu_n(h_2) \big) = \frac1n \sum_{t=1}^n E\big( (\eta_t^{\alpha_1,z_1,\mathbf g_1} - \eta_t^{\alpha_2,z_2,\mathbf g_2})^2 \,\big|\, \mathcal F_{t-1} \big)$$
$$\le \frac1n \sum_{t=1}^n \big| \mathbf 1\{\tfrac tn \le \alpha_1\} - \mathbf 1\{\tfrac tn \le \alpha_2\} \big|\, E\big( |\xi_t^{z_1,\mathbf g_1}| \,\big|\, \mathcal F_{t-1} \big) + \frac1n \sum_{t=1}^n E\big( |\xi_t^{z_1,\mathbf g_1} - \xi_t^{z_2,\mathbf g_2}| \,\big|\, \mathcal F_{t-1} \big)$$
$$\le |\alpha_1 - \alpha_2| + \frac1n + \frac1n \sum_{t=1}^n E\big( |\xi_t^{z_1,\mathbf g_1} - \xi_t^{z_2,\mathbf g_2}| \,\big|\, \mathcal F_{t-1} \big).$$

On $A_n$ we have for the last sum that

$$\frac1n \sum_{t=1}^n E\big( |\xi_t^{z_1,\mathbf g_1} - \xi_t^{z_2,\mathbf g_2}| \,\big|\, \mathcal F_{t-1} \big) \le \frac1n \sum_{t=1}^n \Big| F_{\frac tn}\big( z_1 + \mathbf g_1(\tfrac tn)'\mathbf Y_{t-1} \big) - F_{\frac tn}\big( z_2 + \mathbf g_1(\tfrac tn)'\mathbf Y_{t-1} \big) \Big| + \frac1n \sum_{t=1}^n \Big| F_{\frac tn}\big( z_2 + \mathbf g_1(\tfrac tn)'\mathbf Y_{t-1} \big) - F_{\frac tn}\big( z_2 + \mathbf g_2(\tfrac tn)'\mathbf Y_{t-1} \big) \Big|$$
$$\le \sup_{u,x} f_u(x)\, |z_1 - z_2| + \sup_{u,x} f_u(x)\, \frac1n \sum_{t=1}^n \big| (\mathbf g_1 - \mathbf g_2)(\tfrac tn)'\, \mathbf Y_{t-1} \big| \le \frac{\|f\|_\infty}{m_*}\, |z_1 - z_2| + \frac{\|f\|_\infty}{m_*}\, \sqrt{C_0}\, \sum_{k=1}^p \|g_{1,k} - g_{2,k}\|_n, \tag{26}$$

where the last step uses the Cauchy-Schwarz inequality and $\frac1n\sum_{t=-p}^n Y_t^2 \le C_0$ on $A_n$. Thus, on $A_n$, we have for $h_1, h_2 \in \mathcal H_L$ with $d_n(h_1, h_2) \le \epsilon$ and $\epsilon \ge \frac1n$ that

$$V_n\big( \nu_n(h_1) - \nu_n(h_2) \big) = \frac1n \sum_{s=1}^n E\big( (\eta_s^{h_1} - \eta_s^{h_2})^2 \,\big|\, \mathcal F_{s-1} \big) \le K^*\, \epsilon. \tag{27}$$

This control of the quadratic variation in conjunction with Freedman's exponential bound for martingales now enables us to apply the chaining argument in a way similar to the proof of Theorem 5.11 in van de Geer (2000). Details are omitted.

5.2 Proof of Theorem 1

We will utilize Lemma 1. For $\eta > 0$ we have

$$P\Big( \sup_{\alpha\in[0,1],\, z\in[-L,L],\, \mathbf g\in\mathcal G^p,\, \|\mathbf g\|_n\le\delta_n} |\nu_n(\alpha, z, \mathbf g) - \nu_n(\alpha, z, \mathbf 0)| \ge \eta \Big) \le P\big( A_n^c \big) + P\Big( \sup_{\substack{h_1, h_2 \in \mathcal H_L \\ d_n(h_1,h_2) \le \delta_n}} |\nu_n(h_1) - \nu_n(h_2)| \ge \eta,\ A_n \Big),$$

where again $A_n = \{\frac1n \sum_{t=-p+1}^n Y_t^2 \le C\}$ for some $C > 0$. We will see below that $\frac1n \sum_{t=-p+1}^n Y_t^2 = O_P(1)$ as $n \to \infty$. Therefore, for any given $\epsilon > 0$ we can choose $C$ such that $P(A_n^c) \le \epsilon$ for $n$ large enough. An application of Lemma 1 now gives the assertion. Similar arguments as those leading to (19) show that $\int_{c/n}^1 \sqrt{\log N_{n,B}(u^2, \mathcal G)}\, du < \infty$ implies $\int_{c/n}^1 \sqrt{\log N_B(u^2, \mathcal H_L, d_n)}\, du < \infty$ (see also (46)).

It remains to show that in fact $\frac1n \sum_{t=-p+1}^n Y_t^2 = O_P(1)$. To this end we show that

$$E\Big[ \frac1n \sum_{t=1}^n Y_t^2 \Big] < \infty, \tag{28}$$

$$\mathrm{Var}\Big[ \frac1n \sum_{t=1}^n Y_t^2 \Big] = o(1). \tag{29}$$

These two facts can be seen by direct calculations, as demonstrated now. First we consider (28). We have

$$E Y_t^2 = E\Big[ \sum_{j=-\infty}^{t} a_{t,T}(t-j)\,\epsilon_j \Big]^2 = \sum_{j=-\infty}^{t} \sum_{k=-\infty}^{t} a_{t,T}(t-j)\, a_{t,T}(t-k)\, E(\epsilon_j \epsilon_k) = \sum_{j=-\infty}^{t} a_{t,T}^2(t-j) \le \sum_{j=0}^{\infty} \frac{K^2}{\ell^2(j)} < C < \infty \tag{30}$$

for some $C > 0$, where we used assumption (ii). This implies (28).

Next we indicate (29). Straightforward calculations show that

$$E(Y_t^2 Y_s^2) - E(Y_t^2)\, E(Y_s^2) = \sum_{i=-\infty}^{t} \sum_{j=-\infty}^{t} \sum_{k=-\infty}^{s} \sum_{\ell=-\infty}^{s} a_{t,T}(t-i)\, a_{t,T}(t-j)\, a_{s,T}(s-k)\, a_{s,T}(s-\ell)\, E(\epsilon_i \epsilon_j \epsilon_k \epsilon_\ell) - \sum_{i=-\infty}^{t} a_{t,T}^2(t-i) \sum_{k=-\infty}^{s} a_{s,T}^2(s-k)$$
$$= \big( E\epsilon_t^4 - 3 \big)\, A(t,s) + 2\, B(t,s),$$

where

$$A(t,s) = \sum_{i=0}^{\infty} a_{t,T}^2(i)\, a_{s,T}^2(i + |t-s|), \qquad B(t,s) = \Big( \sum_{i=0}^{\infty} a_{t,T}(i)\, a_{s,T}(i + |t-s|) \Big)^2.$$

We obtain that for some $C^* < \infty$,

$$A(t,s) \le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell^2(i)}\, \frac{1}{\ell^2(i + |t-s|)} \le K^4 \sum_{i=0}^{\infty} \frac{1}{\ell(i)}\, \frac{1}{\ell(i + |t-s|)} \le K^4\, \frac{1}{\ell(|t-s|)} \sum_{i=0}^{\infty} \frac{1}{\ell(i)} < C^*\, \frac{1}{\ell(|t-s|)},$$

and similarly

$$|B(t,s)| \le \Big[ K^2 \sum_{i=0}^{\infty} \frac{1}{\ell(i)}\, \frac{1}{\ell(i + |t-s|)} \Big]^2 \le C^*\, \frac{1}{\ell(|t-s|)}. \tag{31}$$

Now we have

$$\frac{1}{n^2} \sum_{s=1}^n \sum_{t=1}^n A(t,s) \le \frac{C^*}{n^2} \sum_{s=1}^n \sum_{t=1}^n \frac{1}{\ell(|t-s|)} \le \frac{C^*}{n^2} \sum_{k=0}^{n-1} \frac{2(n-k)}{\ell(k)} \le \frac{2\,C^*}{n} \sum_{k=0}^{n-1} \frac{1}{\ell(k)} = O\Big(\frac1n\Big) \quad \text{as } n \to \infty.$$

This implies that $\mathrm{Var}\big( \frac1n \sum_{s=1}^n Y_s^2 \big) = O(1/n)$, which is (29).

5.3 Proof of Theorem 3

Showing weak convergence of the process $Z_n(h)$ means proving asymptotic tightness and convergence of the finite dimensional distributions (e.g. see van der Vaart and Wellner, 1996). Tightness follows from Theorem 4. It remains to show convergence of the finite dimensional distributions. To this end we utilize the Cramér-Wold device in conjunction with the method of cumulants. It follows from Lemma 2 that all the cumulants of $Z_n(h)$ of order $k \ge 3$ converge to zero as $n \to \infty$. Using the multilinearity of cumulants, the same holds for any linear combination of $Z_n(h_i)$, $i = 1, \ldots, K$. The mean of all the $Z_n(h)$ equals zero. It remains to show convergence of the covariances $\mathrm{cov}(Z_n(h_1), Z_n(h_2))$.

The ranges of the summation indices below are such that the indices of the $Y$-variables are between 1 and $n$. For ease of notation we achieve this by formally setting $h_i(u) = 0$ for $u \le 0$ and $u > 1$, $i = 1, 2$. We have

$$\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac1n \sum_{t=1}^n \sum_{s=1}^n h_1\big(\tfrac tn\big)\, h_2\big(\tfrac sn\big)\, \mathrm{cov}(Y_s, Y_t) = \frac1n \sum_{t=1}^n \sum_{k=t-n}^{t-1} h_1\big(\tfrac tn\big)\, h_2\big(\tfrac{t-k}{n}\big)\, \mathrm{cov}(Y_t, Y_{t-k})$$
$$= \frac1n \sum_{t=1}^n \sum_{|k| \le \sqrt n} h_1\big(\tfrac tn\big)\, h_2\big(\tfrac{t-k}{n}\big)\, \mathrm{cov}(Y_t, Y_{t-k}) + R_{1n}, \tag{32}$$

where for $n$ sufficiently large

$$|R_{1n}| \le \frac1n \sum_{t=1}^n \sum_{|k| > \sqrt n} \Big| h_1\big(\tfrac tn\big)\, h_2\big(\tfrac{t-k}{n}\big)\, \mathrm{cov}(Y_t, Y_{t-k}) \Big|.$$

From Proposition 5.4 of Dahlhaus and Polonik (2009) we obtain that $\sup_t |\mathrm{cov}(Y_t, Y_{t-k})| \le \frac{K}{\ell(k)}$ for some constant $K$. Since both $h_1$ and $h_2$ are bounded and $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(k)} < \infty$, we can conclude that $R_{1n} = o(1)$. The main term in (32) can be approximated by

$$\frac1n \sum_{t=1}^n \sum_{|k| \le \sqrt n} h_1\big(\tfrac tn\big)\, h_2\big(\tfrac{t-k}{n}\big)\, c\big(\tfrac tn, k\big) + R_{2n}, \tag{33}$$

where

$$|R_{2n}| \le \frac1n \sum_{t=1}^n \sum_{|k| \le \sqrt n} \Big| h_1\big(\tfrac tn\big)\, h_2\big(\tfrac{t-k}{n}\big) \Big|\, \Big| \mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac tn, k\big) \Big|.$$

Proposition 5.4 of Dahlhaus and Polonik (2009) also says that for $|k| \le \sqrt n$ we have $\sum_{t=0}^n \big| \mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac tn, k\big) \big| \le K\big( 1 + \frac{|k|}{\ell(|k|)} \big)$ for some $K > 0$. Applying this result we obtain

$$|R_{2n}| \le \frac1n \sum_{k=-\sqrt n}^{\sqrt n} \sum_{t=1}^n \Big| \mathrm{cov}(Y_t, Y_{t-k}) - c\big(\tfrac tn, k\big) \Big| \le K_1\, \frac1n \sum_{k=-\sqrt n}^{\sqrt n} \Big( 1 + \frac{|k|}{\ell(|k|)} \Big) = o(1)$$

as $n \to \infty$. Next we replace $h_2\big(\tfrac{t-k}{n}\big)$ in the main term of (33) by $h_2\big(\tfrac tn\big)$. The approximation error can be bounded by

$$\frac1n \sum_{t=1}^n \sum_{|k| \le \sqrt n} \Big| h_1\big(\tfrac tn\big) \Big|\, \Big| h_2\big(\tfrac{t-k}{n}\big) - h_2\big(\tfrac tn\big) \Big|\, \frac{K}{\ell(|k|)} = o(1).$$
Here we are using the fact that $\sup_u |c(u,k)| \le \frac{K}{\ell(|k|)}$ (see Proposition 5.4 in Dahlhaus and Polonik, 2009) together with the assumed (uniform) continuity of $h_2$, the boundedness of $h_1$, and the boundedness of $\sum_{k=-\infty}^{\infty} \frac{1}{\ell(|k|)}$. We have seen that

$$\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac1n \sum_{t=1}^n h_1\big(\tfrac tn\big)\, h_2\big(\tfrac tn\big) \sum_{|k| \le \sqrt n} c\big(\tfrac tn, k\big) + o(1).$$

Since $S(u) = \sum_{k=-\infty}^{\infty} c(u, k) < \infty$, we also have

$$\mathrm{cov}(Z_n(h_1), Z_n(h_2)) = \frac1n \sum_{t=1}^n h_1\big(\tfrac tn\big)\, h_2\big(\tfrac tn\big) \sum_{k=-\infty}^{\infty} c\big(\tfrac tn, k\big) + o(1).$$

Finally, we utilize the fact that $TV\big(c(\cdot, k)\big) \le \frac{K}{\ell(|k|)}$, which is another result from Proposition 5.4 of Dahlhaus and Polonik (2009). This result, together with the assumed bounded variation of both $h_1$ and $h_2$, allows us to replace the average over $t$ by the integral.

5.4 Proof of Lemma 2

By utilizing the multilinearity of cumulants we obtain

$$\mathrm{cum}\big( Z_n(h_1), \ldots, Z_n(h_k) \big) = n^{-\frac k2} \sum_{t_1, \ldots, t_k = 1}^{n} h_1\big(\tfrac{t_1}{n}\big) \cdots h_k\big(\tfrac{t_k}{n}\big)\, \mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k}).$$

In order to estimate $\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k})$ we utilize the special structure of the $Y_t$-variables. Since the $\epsilon_j$ are independent, we obtain, by again using the multilinearity of cumulants together with the fact that $\mathrm{cum}(\epsilon_{j_1}, \ldots, \epsilon_{j_k}) = 0$ unless all the $j_\ell$, $\ell = 1, \ldots, k$, are equal, that

$$\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k}) = \sum_{j=-\infty}^{\min\{t_1, \ldots, t_k\}} a_{t_1,T}(t_1 - j) \cdots a_{t_k,T}(t_k - j)\, \mathrm{cum}(\epsilon_j, \ldots, \epsilon_j) = \mathrm{cum}_k(\epsilon_1) \sum_{j=-\infty}^{\min\{t_1, \ldots, t_k\}} a_{t_1,T}(t_1 - j) \cdots a_{t_k,T}(t_k - j). \tag{34}$$

Thus $|\mathrm{cum}(Y_{t_1}, \ldots, Y_{t_k})| \le |\mathrm{cum}_k(\epsilon_1)|\, \sum_{j=-\infty}^{\infty} \prod_{i=1}^k \frac{K}{\ell(|t_i - j|)}$, and consequently,

$$\big| \mathrm{cum}\big( Z_n(h_1), \ldots, Z_n(h_k) \big) \big| \le n^{-\frac k2}\, |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{k} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big]$$
$$= n^{-\frac k2}\, |\mathrm{cum}_k(\epsilon_1)| \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big] \times \prod_{i=3}^{k} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big].$$

Utilizing the Cauchy-Schwarz inequality we have for the last product

$$\prod_{i=3}^{k} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big] \le \prod_{i=3}^{k} \sqrt{\sum_{t_i=0}^{n} h_i^2\big(\tfrac{t_i}{n}\big)}\; \sqrt{\sum_{t_i=0}^{n} \Big( \frac{K}{\ell(|t_i - j|)} \Big)^2} \le K_0^{k-2}\, n^{\frac k2 - 1} \prod_{i=3}^{k} \|h_i\|_n, \tag{35}$$

where we used the fact that $\sqrt{ \sum_{t=-\infty}^{\infty} \big( \frac{K}{\ell(|t|)} \big)^2 } \le \sum_{t=-\infty}^{\infty} \frac{K}{\ell(|t|)} \le K_0$ for some $K_0 < \infty$. Notice that the bound (35) does not depend on the index $j$ anymore, so that

$$\big| \mathrm{cum}\big( Z_n(h_1), \ldots, Z_n(h_k) \big) \big| \le K_0^{k-2}\, n^{-1}\, |\mathrm{cum}_k(\epsilon_1)| \prod_{i=3}^{k} \|h_i\|_n\; \sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big].$$

As for the last sum, we have, by again using the fact that $\sum_{j=-\infty}^{\infty} \frac{K}{\ell(|t_1 - j|)}\, \frac{K}{\ell(|t_2 - j|)} \le \frac{K^*}{\ell(|t_1 - t_2|)}$ for some $K^* > 0$ (cf. (31)), together with the Cauchy-Schwarz inequality, that

$$\sum_{j=-\infty}^{\infty} \prod_{i=1}^{2} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{K}{\ell(|t_i - j|)} \Big] \le \sum_{t_1=0}^{n} \sum_{t_2=0}^{n} \big| h_1\big(\tfrac{t_1}{n}\big) \big|\, \big| h_2\big(\tfrac{t_2}{n}\big) \big|\, \frac{K^*}{\ell(|t_1 - t_2|)}$$
$$\le K^* \sqrt{ \sum_{t_1=0}^{n} \sum_{t_2=0}^{n} h_1^2\big(\tfrac{t_1}{n}\big)\, \frac{1}{\ell(|t_1 - t_2|)} }\; \sqrt{ \sum_{t_1=0}^{n} \sum_{t_2=0}^{n} h_2^2\big(\tfrac{t_2}{n}\big)\, \frac{1}{\ell(|t_1 - t_2|)} } \le K_0\, n\, \|h_1\|_n\, \|h_2\|_n.$$

This completes the proof of the first part of the lemma. The second part follows similarly to the above by observing that if $\|h_i\|_\infty < M$ for all $i = 1, \ldots, k$, then, instead of the estimate (35), we have

$$\prod_{i=3}^{k} \Big[ \sum_{t_i=0}^{n} \big| h_i\big(\tfrac{t_i}{n}\big) \big|\, \frac{1}{\ell(|t_i - j|)} \Big] \le M^{k-2} \prod_{i=3}^{k} \sum_{t_i=0}^{n} \frac{1}{\ell(|t_i - j|)} \le (M K_0)^{k-2}$$

with $K_0 = \sum_{t=-\infty}^{\infty} \frac{1}{\ell(|t|)}$.

5.5 Proof of Lemma 3

Using Lemma 2, the assumptions on the cumulants imply that

$$\Psi_{Z_n(h)}(t) = \log E\, e^{t Z_n(h)} = \sum_{k=0}^{\infty} \frac{t^k}{k!}\, \mathrm{cum}_k\big( Z_n(h) \big) \le \frac{1}{K_0} \sum_{k=1}^{\infty} \big( t\, C\, K_0\, \|h\|_n \big)^k,$$

assuming that $t > 0$ is such that the infinite sum exists and is finite.
We obtain

$$P\big( |Z_n(h)| > \eta \big) \le 2\, e^{-t\eta}\, E\, e^{t Z_n(h)} = 2 \exp\{-t\eta\}\, \exp\{\Psi_{Z_n(h)}(t)\} \le 2 \exp\Big\{ -t\eta + K_0^{-1} \sum_{k=1}^{\infty} \big( t\, C\, K_0\, \|h\|_n \big)^k \Big\}.$$

Choosing $t = \frac{1}{2\, C\, K_0\, \|h\|_n}$ gives the assertion:

$$P\big( |Z_n(h)| > \eta \big) \le 2\, e^{1/K_0} \exp\Big\{ - \frac{\eta}{2\, C\, K_0\, \|h\|_n} \Big\}. \tag{36}$$

The fact that $|\mathrm{cum}_j(\epsilon_t)| \le j!\, C^j$, $j = 1, 2, \ldots$, holds if $E|\epsilon_t|^k \le \big(\tfrac C2\big)^k$, $k = 1, 2, \ldots$, can be seen by induction as follows. First observe that since $|\mathrm{cum}_1(\epsilon_t)| = |E(\epsilon_t)| = 0$, the claimed bound holds for $j = 1$. Now assume that the bound holds for all $j = 1, \ldots, k-1$. Using $|\mathrm{cum}_k(\epsilon_t)| \le E(|\epsilon_t|^k) + \sum_{j=1}^{k-1} \binom{k-1}{j-1}\, E|\epsilon_t|^{k-j}\, |\mathrm{cum}_j(\epsilon_t)|$ we get

$$|\mathrm{cum}_k(\epsilon_t)| \le \Big(\frac C2\Big)^k + \sum_{j=1}^{k-1} \binom{k-1}{j-1} \Big(\frac C2\Big)^{k-j}\, j!\, C^j \le \Big(\frac C2\Big)^k + (k-1)!\, C^k \sum_{j=1}^{k-1} \frac{j}{(k-j)!\, 2^{k-j}}$$
$$\le \Big(\frac C2\Big)^k + k!\, C^k \sum_{\ell=1}^{k-1} \frac{2^{-\ell}}{\ell!} \le \Big(\frac C2\Big)^k + k!\, C^k\, (\sqrt e - 1) \le k!\, C^k \quad \text{(for } k \ge 2\text{)}.$$

5.6 Proof of Theorem 4

Using the exponential inequality from Lemma 3 we can mimic the proof of Lemma 3.2 from van de Geer (2000). As compared to van de Geer, our exponential bound is of the form $c_1 \exp\big\{ -c_2\, \frac{\eta}{\|h\|_n} \big\}$ rather than $c_1 \exp\big\{ -c_2\, \frac{\eta^2}{\|h\|_n^2} \big\}$. It is well known that this type of inequality leads to the covering integral being the integral of the metric entropy rather than of the square root of the metric entropy. (See for instance Theorem 2.2.4 in van der Vaart and Wellner, 1996.) This indicates the necessary modifications to the proof in van de Geer. Details are omitted. Condition (18) just makes sure that the upper limit in the integral from (19) is larger than the lower limit.

5.7 Proof of Theorems 2 and 5

First we indicate how the two processes studied in this paper enter the picture, and simultaneously we provide an outline of the proof. Recall that

$$\hat F_n(\alpha, z) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1(\hat\eta_t \le z) \quad \text{and} \quad F_n(\alpha, z) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \sigma(\tfrac tn)\,\epsilon_t \le z \big).$$

With this notation we can write

$$\sqrt n\, \big( \hat G_{n,\gamma}(\alpha) - G_{n,\gamma}(\alpha) \big) = - \Big[ \sqrt n\, \big( \hat F_n(\alpha, \hat q_\gamma) - F_n(\alpha, \hat q_\gamma) \big) - \sqrt n\, \big( \hat F_n(\alpha, -\hat q_\gamma) - F_n(\alpha, -\hat q_\gamma) \big) \Big]$$
$$\quad - \Big[ \sqrt n\, \big( F_n(\alpha, \hat q_\gamma) - F_n(\alpha, q_\gamma) \big) - \sqrt n\, \big( F_n(\alpha, -\hat q_\gamma) - F_n(\alpha, -q_\gamma) \big) \Big] =: -I_n(\alpha) - II_n(\alpha), \tag{37}$$

with $I_n(\alpha)$ and $II_n(\alpha)$, respectively, denoting the two quantities inside the two $[\cdot]$-brackets. The assertion of Theorem 5 follows from

$$\sup_{\alpha\in[0,1]} |I_n(\alpha)| = o_P(1) \quad \text{and} \tag{38}$$

$$\sup_{\alpha\in[0,1]} \big| II_n(\alpha) - c(\alpha)\, \sqrt n\, \big( G_{n,\gamma}(1) - E G_{n,\gamma}(1) \big) \big| = o_P(1). \tag{39}$$

Property (39) can be shown by using empirical process theory for independent, but not identically distributed random variables. This will be done below. First we consider (38). We will see that this involves properties of both residual empirical processes and weighted sum processes.

Proof of $\sup_{\alpha\in[0,1]} |I_n(\alpha)| = o_P(1)$ and proof of Theorem 2. Observe that

$$\hat F_n(\alpha, z) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \hat\eta_t \le z \big) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \sigma(\tfrac tn)\,\epsilon_t \le \sigma(\tfrac tn)\,\epsilon_t - \hat\eta_t + z \big) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \sigma(\tfrac tn)\,\epsilon_t \le (\hat\theta - \theta)(\tfrac tn)'\,\mathbf Y_{t-1} + z \big).$$

Recall that by assumption $P\big( \hat\theta(\cdot) - \theta(\cdot) \in \mathcal G^p \big) \to 1$ as $n \to \infty$, where for $\mathbf g = (g_1, \ldots, g_p)' : [0,1] \to \mathbb R^p$ we write $\{\mathbf g \in \mathcal G^p\}$ for $\{g_i \in \mathcal G,\ i = 1, \ldots, p\}$. We also write

$$F_n(\alpha, z, \mathbf g) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} \mathbf 1\big( \sigma(\tfrac tn)\,\epsilon_t \le \mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z \big).$$

By taking into account that by assumption $\|\hat\theta - \theta\|_n = O_P(m_n^{-1})$, the above implies that for any $\epsilon > 0$ we can find a $C > 0$ such that, with probability at least $1 - \epsilon$, we have for large enough $n$ that

$$\sup_{\alpha\in[0,1],\, z\in[-L,L]} \sqrt n\, \big| \hat F_n(\alpha, z) - F_n(\alpha, z) \big| \le \sup_{\alpha\in[0,1],\, z\in[-L,L],\, \mathbf g\in\mathcal G^p,\, \|\mathbf g\|_n \le C m_n^{-1}} \sqrt n\, \big| F_n(\alpha, z, \mathbf g) - F_n(\alpha, z) \big|. \tag{40}$$

Thus, if $L$ is such that $q_\gamma \in (-L, L)$ and $P(\hat q_\gamma \in [-L,L]) \to 1$ as $n \to \infty$, then the right hand side in (40) being $o_P(1)$ for any $C > 0$ implies that $\sup_\alpha |I_n(\alpha)| = o_P(1)$, and this then also proves Theorem 2.
In order to control the right hand side in (40), we first introduce an appropriate centering. Let

$$E_n(\alpha, z, \mathbf g) := \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} E\Big[ \mathbf 1\big( \sigma(\tfrac tn)\,\epsilon_t \le \mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z \big) \,\Big|\, \epsilon_{t-1}, \epsilon_{t-2}, \ldots \Big] = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} F\Big( \frac{\mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z}{\sigma(\tfrac tn)} \Big) \tag{41}$$

and let $\mathbf 0 = (0, \ldots, 0)' \in \mathcal G^p$ denote the $p$-vector of null functions. Write

$$F_n(\alpha, z) - F_n(\alpha, z, \mathbf g) = \Big[ \big( F_n(\alpha, z, \mathbf 0) - E_n(\alpha, z, \mathbf 0) \big) - \big( F_n(\alpha, z, \mathbf g) - E_n(\alpha, z, \mathbf g) \big) \Big] + \Big[ E_n(\alpha, z, \mathbf 0) - E_n(\alpha, z, \mathbf g) \Big] =: T_{n1}(\alpha, z, \mathbf g) + T_{n2}(\alpha, z, \mathbf g) \tag{42}$$

with $T_{n1}$ and $T_{n2}$ denoting the two terms inside the two $[\cdot]$-brackets. This centering makes $T_{n1}$ a sum of martingale differences (cf. the proof of Lemma 1 given above). The quantity $\nu_n(\alpha, z, \mathbf g) = \sqrt n\, \big( F_n(\alpha, z, \mathbf g) - E_n(\alpha, z, \mathbf g) \big)$ in fact is the residual empirical process discussed in Section 2. It follows from (40) and (42) that, together,

$$\sup_{\alpha\in[0,1],\, z\in[-L,L],\, \mathbf g\in\mathcal G^p,\, \|\mathbf g\|_n \le C m_n^{-1}} \big| \sqrt n\, T_{n1}(\alpha, z, \mathbf g) \big| = o_P(1) \tag{43}$$

and

$$\sup_{\alpha\in[0,1],\, z\in[-L,L],\, \mathbf g\in\mathcal G^p,\, \|\mathbf g\|_n \le C m_n^{-1}} \big| \sqrt n\, T_{n2}(\alpha, z, \mathbf g) \big| = o_P(1) \tag{44}$$

imply the desired result, provided we can show that $P(\hat q_\gamma \notin [-L,L]) = o(1)$ as $n \to \infty$. Since $q_\gamma \in (-L,L)$, the desired property follows from consistency of $\hat q_\gamma$ as an estimator of $q_\gamma$. This will be shown at the end of this proof.

Verification of (43). We have, with $\mathcal H_L$ as defined immediately above Lemma 1 in Section 2, that

$$\sup_{\alpha\in[0,1],\, z\in[-L,L],\, \mathbf g\in\mathcal G^p,\, \|\mathbf g\|_n \le C m_n^{-1}} \big| \nu_n(\alpha, z, \mathbf g) - \nu_n(\alpha, z, \mathbf 0) \big| \le \sup_{\substack{h_1, h_2 \in \mathcal H_L \\ d_n(h_1, h_2) \le C m_n^{-1}}} \big| \nu_n(h_1) - \nu_n(h_2) \big|.$$

As above, let $A_n = \{\frac1n \sum_{t=-p+1}^n Y_t^2 \le C_0\}$ for some $C_0 > 0$. It is shown in the proof of Theorem 1 that $\frac1n \sum_{s=-p+1}^n Y_s^2 = O_P(1)$, and thus $P(A_n^c) = o(1)$ for $C_0$ large enough. It remains to show that for $\eta > 0$ we have

$$P\Big( \sup_{\substack{h_1, h_2 \in \mathcal H_L \\ d_n(h_1, h_2) \le C m_n^{-1}}} \big| \nu_n(h_1) - \nu_n(h_2) \big| \ge \eta,\ A_n \Big) = o(1). \tag{45}$$

This, however, is an immediate application of Lemma 1, and (43) is verified. Notice here that the finiteness of the covering integral of $\mathcal H_L$ with respect to $\|\cdot\|_n$ follows by standard arguments. In fact, it is not difficult to see that for some $C_0 > 0$,

$$\log N_n(C_1\, \delta, \mathcal H_L) \le -C_0 \log\delta + \log N_n(\delta, \mathcal G). \tag{46}$$

Verification of (44). We have

$$\sqrt n\, T_{n2}(\alpha, z, \mathbf g) = -\frac{1}{\sqrt n} \sum_{t=1}^{\lfloor n\alpha\rfloor} \Big[ F\Big( \frac{\mathbf g(\tfrac tn)'\,\mathbf Y_{t-1} + z}{\sigma(\tfrac tn)} \Big) - F\Big( \frac{z}{\sigma(\tfrac tn)} \Big) \Big]$$
$$= -\frac{1}{\sqrt n} \sum_{t=1}^{\lfloor n\alpha\rfloor} f_{\frac tn}(z)\, \mathbf g\big(\tfrac tn\big)'\,\mathbf Y_{t-1} - \frac{1}{\sqrt n} \sum_{t=1}^{\lfloor n\alpha\rfloor} \big( f_{\frac tn}(\zeta_t) - f_{\frac tn}(z) \big)\, \mathbf g\big(\tfrac tn\big)'\,\mathbf Y_{t-1} \tag{47}$$

with $\zeta_t$ between $z$ and $z + \mathbf g(\tfrac tn)'\,\mathbf Y_{t-1}$. The second term in (47) is a remainder term that will be treated below. The first term is a weighted sum of the $Y_s$'s, and we will now use Theorem 4 to control this term. We can write the first term as $-\sum_{k=1}^p Z_{k,n}(h)$, where

$$Z_{k,n}(h) = \frac{1}{\sqrt n} \sum_{t=1}^n h\big(\tfrac tn\big)\, Y_{t-k}, \quad k = 1, \ldots, p, \tag{48}$$

for $h \in \mathcal H$ with $\mathcal H = \{ h_{\alpha,z,g}(u) = \mathbf 1_\alpha(u)\, f_u(z)\, g(u),\ \alpha\in[0,1],\ z\in\mathbb R,\ g\in\mathcal G \}$, where we use the shorthand notation $\mathbf 1_\alpha(u) = \mathbf 1\{u \le \alpha\}$. We will apply Theorem 4 to show that each $Z_{k,n}(h)$ tends to zero uniformly in $h \in \mathcal H$. Since by our assumptions the functions $\{f_u(z),\ z\in[-L,L]\}$ are uniformly bounded, we have $\sup_{\alpha,z,g} \|h_{\alpha,z,g}\|_n^2 \le \sup_{u,z} |f_u(z)|^2\, C^2\, m_n^{-2} = o(1)$. Since $E Z_n(h_{\alpha,z,g}) = 0$, an application of Theorem 4 now shows that the first term of (47) converges to zero in probability, uniformly in $(\alpha, z, \mathbf g)$.

Now we treat the second term in (47). Recall that our assumptions imply that the functions $\{f_u,\ u\in[0,1]\}$ are uniformly Lipschitz continuous with Lipschitz constant $c$, say.
Therefore we can estimate the last term in (47) by

$$\frac{c}{\sqrt n} \sum_{t=1}^n \big( \mathbf g\big(\tfrac tn\big)'\,\mathbf Y_{t-1} \big)^2 \le \frac{c\,p}{\sqrt n} \sum_{t=1}^n \sum_{k=1}^p g_k^2\big(\tfrac tn\big)\, Y_{t-k}^2 \le c\,p \sup_{-p\le t\le n} Y_t^2\; \sqrt n \sum_{k=1}^p \|g_k\|_n^2 \le c\,p^2\, \frac{C^2 \sqrt n}{m_n^2}\, O_P(\log n) = o_P(1),$$

where the last inequality uses the fact that $\sup_{-p\le t\le n} Y_t^2 = O_P(\log n)$, which follows as in the proof of Lemma 5.9 of Dahlhaus and Polonik (2009), together with the assumption $n^{1/2} \log n / m_n^2 = o(1)$.

Proof of $\sup_{\alpha\in[0,1]} \big| II_n(\alpha) - c(\alpha)\,\sqrt n\,\big( G_{n,\gamma}(1) - E G_{n,\gamma}(1) \big) \big| = o_P(1)$. Define

$$\bar F_n(\alpha, z) := E F_n(\alpha, z) = \frac1n \sum_{t=1}^{\lfloor\alpha n\rfloor} F\Big( \frac{z}{\sigma(\tfrac tn)} \Big).$$

We can write

$$II_n(\alpha) = \sqrt n\, \Big[ (F_n - \bar F_n)(\alpha, \hat q_\gamma) - (F_n - \bar F_n)(\alpha, q_\gamma) \Big] \tag{49}$$
$$\quad - \sqrt n\, \Big[ (F_n - \bar F_n)(\alpha, -\hat q_\gamma) - (F_n - \bar F_n)(\alpha, -q_\gamma) \Big] \tag{50}$$
$$\quad + \sqrt n\, \big( \bar F_n(\alpha, \hat q_\gamma) - \bar F_n(\alpha, q_\gamma) \big) - \sqrt n\, \big( \bar F_n(\alpha, -\hat q_\gamma) - \bar F_n(\alpha, -q_\gamma) \big). \tag{51}$$

The process $\nu_n(\alpha, z, \mathbf 0) = \sqrt n\, (F_n - \bar F_n)(\alpha, z)$ is a sequential empirical process, or Kiefer-Müller process, based on independent, but not necessarily identically distributed random variables. This process is asymptotically stochastically equicontinuous, uniformly in $\alpha$, with respect to $\rho_n(v, w) = |\bar F_n(1, v) - \bar F_n(1, w)|$; i.e., for every $\eta > 0$ there exists an $\epsilon > 0$ with

$$\limsup_{n\to\infty} P\Big( \sup_{\alpha\in[0,1],\ \rho_n(z_1,z_2)\le\epsilon} \big| \nu_n(\alpha, z_1, \mathbf 0) - \nu_n(\alpha, z_2, \mathbf 0) \big| > \eta \Big) = 0. \tag{52}$$

In fact, with $\bar\rho_n\big( (\alpha_1, z_1), (\alpha_2, z_2) \big) = |\alpha_1 - \alpha_2| + \rho_n(z_1, z_2)$ we have

$$\sup_{\alpha\in[0,1]}\ \sup_{z_1,z_2\in\mathbb R,\ \rho_n(z_1,z_2)\le\epsilon} \big| \nu_n(\alpha, z_1, \mathbf 0) - \nu_n(\alpha, z_2, \mathbf 0) \big| \le \sup_{\substack{\alpha_1,\alpha_2\in[0,1],\ z_1,z_2\in\mathbb R \\ \bar\rho_n((\alpha_1,z_1),(\alpha_2,z_2))\le\epsilon}} \big| \nu_n(\alpha_1, z_1, \mathbf 0) - \nu_n(\alpha_2, z_2, \mathbf 0) \big|.$$

Thus, (52) follows from asymptotic stochastic $\bar\rho_n$-equicontinuity of $\nu_n(\alpha, z, \mathbf 0)$. This in turn follows from a proof similar to, but simpler than, the proof of Lemma 1. In fact, it can be seen from (26) that for $\mathbf g_1 = \mathbf g_2 = \mathbf 0$ we can simply use the metric $\bar\rho_n\big( (\alpha_1, z_1), (\alpha_2, z_2) \big) = |\alpha_1 - \alpha_2| + \rho_n(z_1, z_2)$ in the estimation of the quadratic variation, which in the simple case of $\mathbf g_1 = \mathbf g_2 = \mathbf 0$ amounts to the estimation of the variance, because the randomness only comes in through the $\epsilon_t$. With this modification the proof of the $\rho_n$-equicontinuity of $\nu_n(\alpha, z, \mathbf 0)$ follows the proof of Lemma 1.

Thus, if $\hat q_\gamma$ is consistent for $q_\gamma$ with respect to $\rho_n$, then it follows that both (49) and (50) are $o_P(1)$. We now prove this consistency of $\hat q_\gamma$.

Recall that $\Psi(z) = \int_0^1 F\big( \tfrac{z}{\sigma(u)} \big)\, du - \int_0^1 F\big( \tfrac{-z}{\sigma(u)} \big)\, du$. Observe that $\int_0^1 F\big( \tfrac{q_\gamma}{\sigma(u)} \big)\, du$ is close to $\bar F_n(1, q_\gamma)$. In fact, since by assumption $u \mapsto F\big( \tfrac{q_\gamma}{\sigma(u)} \big)$ is of bounded variation, their difference is of the order $O(1/n)$. Consequently we have

$$\Big| (1-\gamma) - \big( \bar F_n(1, q_\gamma) - \bar F_n(1, -q_\gamma) \big) \Big| = \Big| \Psi(q_\gamma) - \big( \bar F_n(1, q_\gamma) - \bar F_n(1, -q_\gamma) \big) \Big| \le c\, n^{-1} \tag{53}$$

for some $c \ge 0$. In fact, (53) holds uniformly in $\gamma$. This follows from the fact that the functions $u \mapsto F\big( \tfrac{z}{\sigma(u)} \big)$ are Lipschitz continuous uniformly in $z$. (Notice that in the case where $\sigma(\cdot) \equiv \sigma_0$ is constant on $[0,1]$, then $\Psi(z) = F\big( \tfrac{z}{\sigma_0} \big) - F\big( \tfrac{-z}{\sigma_0} \big) = \bar F_n(1, z) - \bar F_n(1, -z)$. In other words, in this case we can choose $c = 0$.) We now show that under our assumptions we have, for any
Utilizing triangular inequality it remains to show that supz |Fn (1, z)−F n (1, z)| = oP (1). This uniform law of large numbers result follows from the arguments given below. It can also be seen directly by observing that for every fixed z we have |Fn (1, z) − F n (1, z)| = oP (1) (which easily follows by observing that the variance of this quantity tends to zero as n → ∞) together with a standard argument as in the proof of the classical Glivenko-Cantelli theorem for continuous random variables, utilizing monotonicity of F n (1, z) and Fn (1, z). This completes the proof of (54), and as outlined above, this implies that both (49) and (50) are oP (1), uniformly in α. It remains to consider the quantity in (51). First, we derive an upper bound for qbγ . Let Bn () = {|F n (b qγ ) − F n (qγ )| < ; supz |(Fbn − F n )(1, z)| < δ /6} where δ > 0 is such that |F n (qγ±δ ) − F n (qγ )| < . We already know that P (Bn ()) → 1 as n → ∞ for any > 0. Now choose > 0 small enough (such that γ > δ /6 in order for the below to be well defined). On Bn () we have qbγ = inf{z ≥ 0 : Fbn (1, z) − Fbn (1, −z) ≥ 1 − γ; |F n (z) − F n (qγ )| < } ≤ inf{z ≥ 0 : F n (1, z) − F n (1, −z) ≥ 1 − γ − (Fbn − F n )(1, qγ ) − (Fbn − F n )(1, −qγ ) +2 |(Fbn − F n )(1, v) − |(Fbn − F n )(1, w)|} sup |F n (v)−F n (w)|≤ ≤ inf{z ≥ 0 : Ψ(z) ≥ 1 − γ − (Fbn − F n )(1, qγ ) − (Fbn − F n )(1, −qγ ) 30 +2 |(Fbn − F n )(1, v) − |(Fbn − F n )(1, w)| + c n−1 } sup |F n (v)−F n (w)|≤ −1 = Ψ (1 − γ − Qn + rn ), where for short rn = 2 sup|F n (v)−F n (w)|≤ |(Fbn − F n )(1, v) − |(Fbn − F n )(1, w)| + cn−1 with c from (53), and Qn = (Fbn − F n )(1, qγ ) − (Fbn − F n )(1, −qγ ) . Now let Z Ψ(α, z) = 0 α z F ( σ(u) ) du Z α −z F ( σ(u)) ) du, − 0 z ≥ 0, α ∈ [0, 1]. (57) Observe that Ψ(z) = Ψ(1, z), where Ψ(z) is defined in (22). Since by definition of qγ we have qγ = Ψ−1 (1 − γ), and since Ψ(α, z) is strictly increasing in z for any α we have on Bn (), F n (α, qbγ ) − F n (α, qγ ) − F n (α, −b qγ ) − F n (α, −qγ ) ≤ Ψ(α, qbγ ) − Ψ(α, qγ ) + c n−1 Ψ α, Ψ−1 1 − γ − Qn + rn − Ψ α, Ψ−1 (1 − γ) + c n−1 = − = ∂ Ψ(α, Ψ−1 (ξn+ )) ∂z Ψ0 (Ψ−1 (ξn+ )) (Qn − rn ) + c n−1 Rα = −1 + −1 + f (Ψ (ξ )) + f (−Ψ (ξ )) du u u n n − R0 1 (Qn − rn ) + c n−1 −1 + −1 + fu (Ψ (ξn )) + fu (−Ψ (ξn )) du 0 =: − (c(α) + oP (1)) (Qn − rn ) + c n−1 with ξn+ ∈ [1 − γ, 1 − γ − Qn + rn ], fu (z) = z ∂ F ( σ(u) ) ∂z = 1 σ(u) f z σ(u) , and the oP (1)-term equals cn (α) − c(α) with cn (α) the ratio from the second to last line in the above formula, and Rα fu (qγ )) + fu (−qγ ) du . c(α) = R 1 f (q ) + f (−q ) du u γ u γ 0 0 The fact that |cn (α) − c(α)| = oP (1) follows from our smoothness assumptions together with the fact that |Qn − rn | = oP (1). Similarly, we obtain a lower bound of the form F n (α, qbγ ) − F n (α, qγ ) − F n (α, −b qγ ) − F n (α, −qγ ) ≥ −(c(α) + oP (1)) (Qn + rn ) − c n−1 . 31 From the above we know that √ and n rn = oP (1), so that √ n Qn = √ √ n (Fn − F n )(1, qγ ) − n (Fn − F n )(1, −qγ ) + oP (1) √ n Fn (α, qbγ ) − F n (α, qγ ) − Fn (α, −b qγ ) − F n (α, −qγ ) n c(α) X 1(σ 2 ns 2t ≤ qγ2 ) − P (σ 2 nt 2t ≤ qγ2 ) + oP (1) = −√ n t=1 √ = − c(α) n (Gn,γ (1) − EGn,γ (1) ) + oP (1). Finally, observe that under H0 , where σ(u) ≡ σ0 , we have c(α) = α, because fu does not depend on u ∈ [a, b]. The proof is complete once we have shown that |b qγ − qγ | = oP (1) as n → ∞ (cf. comment right after (44)). To see that, first notice that our regularity conditions imply that inf u fu (qγ ) =: d∗ > 0. 
Again using the fact that our assumptions imply the uniform Lipschitz continuity of the functions $z \mapsto f_u(z)$, we obtain the existence of an $\epsilon_0 > 0$ such that $f_u(z) > \frac{d_*}{2}$ for all $|z - q_\gamma| \le \epsilon_0$ and all $u \in [0,1]$. Consequently, if $|\hat q_\gamma - q_\gamma| \le \epsilon_0$, then

$$\Big| F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac tn)} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac tn)} \Big) \Big| = \Big| \int_{q_\gamma}^{\hat q_\gamma} f_{\frac tn}(u)\, du \Big| \ge \frac{d_*}{2}\, |\hat q_\gamma - q_\gamma| \quad \forall\ t = 0, 1, \ldots, n.$$

On the other hand, if $|\hat q_\gamma - q_\gamma| > \epsilon_0$, then, because the $f_u(z)$ are all non-negative,

$$\Big| F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac tn)} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac tn)} \Big) \Big| = \Big| \int_{q_\gamma}^{\hat q_\gamma} f_{\frac tn}(u)\, du \Big| \ge \frac{d_*}{2}\, \epsilon_0 \quad \forall\ t = 0, 1, \ldots, n.$$

Thus, for any $\epsilon > 0$ there exists an $\eta > 0$ with

$$\big\{ |\hat q_\gamma - q_\gamma| > \epsilon \big\} \subset \Big\{ \Big| F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac tn)} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac tn)} \Big) \Big| > \eta \quad \forall\ t = 0, 1, \ldots, n \Big\}. \tag{58}$$

Since all the functions $z \mapsto F\big( \tfrac{z}{\sigma(t/n)} \big)$ are strictly monotone, we have

$$\rho_n(\hat q_\gamma, q_\gamma) = \big| \bar F_n(1, \hat q_\gamma) - \bar F_n(1, q_\gamma) \big| = \Big| \frac1n \sum_{t=1}^n \Big[ F\Big( \frac{\hat q_\gamma}{\sigma(\tfrac tn)} \Big) - F\Big( \frac{q_\gamma}{\sigma(\tfrac tn)} \Big) \Big] \Big|.$$

Consequently, (58) implies that if $|\hat q_\gamma - q_\gamma|$ is not $o_P(1)$, then $\rho_n(\hat q_\gamma, q_\gamma)$ is also not $o_P(1)$. This is a contradiction to (54), and this completes our proof.

6 References

AKRITAS, M.G. and VAN KEILEGOM, I. (2001). Nonparametric estimation of the residual distribution. Scand. J. Statist. 28, 549-567.

ALEXANDER, K.S. and PYKE, R. (1986). A uniform central limit theorem for set-indexed partial-sum processes with finite variance. Ann. Probab. 14, 582-597.

CHANDLER, G. and POLONIK, W. (2006). Discrimination of locally stationary time series based on the excess mass functional. J. Amer. Statist. Assoc. 101, 240-253.

CHANDLER, G. and POLONIK, W. (2012). Mode identification of volatility in time-varying autoregression. J. Amer. Statist. Assoc. 107, 1217-1229.

CHENG, F. (2005). Asymptotic distributions of error density and distribution function estimators in nonparametric regression. J. Statist. Plann. Inference 128, 327-349.

DAHLHAUS, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist. 25, 1-37.

DAHLHAUS, R., NEUMANN, M. and VON SACHS, R. (1999). Nonlinear wavelet estimation of time-varying autoregressive processes. Bernoulli 5, 873-906.

DAHLHAUS, R. and POLONIK, W. (2005). Nonparametric maximum likelihood estimation and empirical spectral processes for Gaussian locally stationary processes. Technical report.

DAHLHAUS, R. and POLONIK, W. (2006). Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist. 34, 2790-2824.

DAHLHAUS, R. and POLONIK, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli 15, 1-39.

DREES, H. and STǍRICǍ, C. (2002). A simple non-stationary model for stock returns. Preprint.

EOM, K.B. (1999). Analysis of acoustic signatures from moving vehicles using time-varying autoregressive models. Multidimensional Systems and Signal Processing 10, 357-378.

FRYZLEWICZ, P., SAPATINAS, T. and SUBBA RAO, S. (2006). A Haar-Fisz technique for locally stationary volatility estimation. Biometrika 93, 687-704.

GIRAULT, J.-M., OSSANT, F., OUAHABI, A., KOUAMÉ, D. and PATAT, F. (1998). Time-varying autoregressive spectral estimation for ultrasound attenuation in tissue characterization. IEEE Trans. Ultrasonics, Ferroelectrics, and Frequency Control 45, 650-659.

GRENIER, Y. (1983). Time-dependent ARMA modeling of nonstationary signals. IEEE Trans. on Acoustics, Speech, and Signal Processing 31, 899-911.

HALL, M.G., OPPENHEIM, A.V. and WILLSKY, A.S. (1983). Time varying parametric modeling of speech. Signal Processing 5, 267-285.

HORVÁTH, L., KOKOSZKA, P. and TEYSSIÈRE, G. (2001). Empirical process of the squared residuals of an ARCH sequence. Ann. Statist. 29, 445-469.

KOUL, H.L. (2002).
Weighted empirical processes in dynamic nonlinear models, 2nd edn. Lecture Notes in Statistics 166, Springer, New York.

KOUL, H.L. and LING, S. (2006). Fitting an error distribution in some heteroscedastic time series models. Ann. Statist. 34, 994-1012.

KÜNSCH, H.R. (1995). A note on causal solutions for locally stationary AR processes. Preprint, ETH Zürich.

LAÏB, N., LEMDANI, M. and OULD-SAÏD, E. (2008). On residual empirical processes of GARCH-SM models: Application to conditional symmetry tests. J. Time Ser. Anal. 29, 762-782.

MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2007). Estimating the error distribution function in semiparametric regression. Statistics & Decisions 25, 1-18.

MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2009a). Estimating the innovation distribution in nonparametric autoregression. Probab. Theory Relat. Fields 144, 53-77.

MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2009b). Estimating the error distribution function in nonparametric regression with multivariate covariates. Statist. Probab. Letters 79, 957-964.

MÜLLER, U., SCHICK, A. and WEFELMEYER, W. (2012). Estimating the error distribution function in semiparametric additive regression models. J. Statist. Plann. Inference 142, 552-566.

NICKL, R. and PÖTSCHER, B.M. (2007). Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theor. Probab. 20, 177-199.

ORBE, S., FERREIRA, E. and RODRIGUEZ-POO, J. (2005). Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126, 53-77.

PRIESTLEY, M.B. (1965). Evolutionary spectra and non-stationary processes. J. Royal Statist. Soc. Ser. B 27, 204-237.

RAJAN, J.J. and RAYNER, P.J. (1996). Generalized feature extraction for time-varying autoregressive models. IEEE Trans. on Signal Processing 44, 2498-2507.

STUTE, W. (2001). Residual analysis for ARCH(p)-time series. Test 10, 393-403.

SUBBA RAO, T. (1970). The fitting of nonstationary time-series models with time-dependent parameters. J. Royal Statist. Soc. Ser. B 32, 312-322.

VAN DE GEER, S. (2000). Empirical Processes in M-estimation. Cambridge University Press.

VAN DER VAART, A.W. and WELLNER, J. (1996). Weak Convergence and Empirical Processes. Springer, New York.

VAN KEILEGOM, I., GONZÁLEZ MANTEIGA, W. and SÁNCHEZ SELLERO, C. (2008). Goodness-of-fit tests in parametric regression based on the estimation of the error distribution. Test 17, 401-415.