Empirical spectral processes for locally stationary time series

Rainer Dahlhaus, Institut für Angewandte Mathematik, Universität Heidelberg, Im Neuenheimer Feld 294, 69120 Heidelberg, Germany
Wolfgang Polonik, Department of Statistics, University of California, Davis, CA 95616-8705, USA

January 21, 2007

Abstract. A time varying empirical spectral process indexed by classes of functions is defined for locally stationary time series. We derive weak convergence in a function space, prove a maximal exponential inequality and a Glivenko-Cantelli type convergence result. The results use conditions based on metric entropy of the index class. In contrast to related earlier work no Gaussian assumption is made. As applications quasi likelihood estimation, goodness of fit testing and inference under model misspecification are discussed. In an extended application uniform rates of convergence are derived for local Whittle estimates of the parameter curves of locally stationary time series models.

Running title: Empirical spectral processes for time series.
Keywords: Empirical spectral process, asymptotic normality, quadratic forms, locally stationary processes, nonstationary time series.
AMS 2000 Mathematics Subject Classification: Primary 62M10; secondary 62E20.

1 Introduction

In recent years several methods have been derived for locally stationary time series models, that is for models which can locally be approximated by stationary processes.
Out of the large literature we mention the work of Priestley (1965) on oscillatory processes, Dahlhaus (1997) on locally stationary processes, Neumann and von Sachs (1997) on wavelet estimation of evolutionary spectra, Nason, von Sachs and Kroisandt (2000) on a wavelet-based model of evolutionary spectra, and more recent work such as Davis, Lee and Rodriguez-Yam (2005) on piecewise stationary processes, Fryzlewicz, Sapatinas and Subba Rao (2006) on locally stationary volatility estimation, and Sakiyama and Taniguchi (2004) on discriminant analysis for locally stationary processes.

In this paper we emphasize the relevance of the empirical spectral process for locally stationary time series. During the last decade the theory of empirical processes has developed considerably, and the number of statistical problems approached utilizing concepts from empirical process theory is steeply increasing. In this paper we show how large parts of the existing methodology on empirical processes can fruitfully be used for time series analysis of locally stationary processes. In our setup the role of the empirical distribution of iid data is taken over by the empirical time-varying spectral measure. This generalizes a similar approach for stationary time series, cf. Dahlhaus (1988), Mikosch and Norvaisa (1997), Fay and Soulier (2001). An overview of these methods and some references to the existing literature on empirical process techniques in other settings may be found in Dahlhaus and Polonik (2002).

In Section 2 we introduce the empirical spectral process indexed by classes of functions, derive its convergence including a functional central limit theorem, prove a maximal exponential inequality and a Glivenko-Cantelli type convergence result. These results use conditions based on the metric entropy of the index class. The empirical spectral process plays a key role in many statistical applications.
In Section 3 we briefly discuss parametric quasi likelihood estimation, nonparametric quasi likelihood estimation, inference under model misspecification by stationary models, and local estimates. An extended application is given in Section 4 where uniform rates of convergence are derived for local Whittle estimates of the parameter curves of locally stationary time series models. Although our concept is based on empirical process techniques in the frequency domain, there exist many applications in the time domain. Section 3 and Section 4 contain many examples in the time domain, particularly with time varying ARMA models.

The empirical spectral process for locally stationary processes has also briefly been considered in Dahlhaus and Polonik (2006) in the special context of nonparametric estimation. In comparison to that paper we consider here also the case of non-Gaussian processes and use weaker assumptions on the underlying process. We mention that the assumptions on the underlying process are very weak, allowing for jumps in the parameter curves by assuming bounded variation instead of continuity in time direction. All proofs are delegated without further reference to Section 5.

2 The time varying empirical spectral process

In this section we define the empirical spectral process and derive its properties including a functional central limit theorem and a maximal exponential inequality.

2.1 Locally stationary processes

Locally stationary processes were introduced in Dahlhaus (1997) by using a time varying spectral representation. In contrast to this we use in this paper a time varying MA(∞) representation and formulate the assumptions in the time domain. As in nonparametric regression we rescale the functions in time to the unit interval in order to achieve a meaningful asymptotic theory. The following definition of local stationarity is the same as used in Dahlhaus and Polonik (2006).
It is more general than for example in Dahlhaus (1997) since the parameter curves are allowed to have jumps. Let

$$V(g) = \sup\Big\{ \sum_{k=1}^{m} |g(x_k) - g(x_{k-1})| : 0 \le x_0 < \ldots < x_m \le 1,\ m \in \mathbb{N} \Big\} \qquad (1)$$

be the total variation of a function g on [0, 1], and for some κ > 0 let

$$\ell(j) := \begin{cases} 1, & |j| \le 1, \\ |j| \log^{1+\kappa} |j|, & |j| > 1. \end{cases}$$

Definition 2.1 (Locally stationary processes) The sequence of stochastic processes X_{t,n} (t = 1, ..., n) is called a locally stationary process if X_{t,n} has a representation

$$X_{t,n} = \sum_{j=-\infty}^{\infty} a_{t,n}(j)\, \varepsilon_{t-j} \qquad (2)$$

satisfying the following conditions:

$$\sup_t |a_{t,n}(j)| \le \frac{K}{\ell(j)} \qquad \text{(with $K$ not depending on $n$)}, \qquad (3)$$

and there exist functions a(·, j) : (0, 1] → ℝ with

$$\sup_u |a(u, j)| \le \frac{K}{\ell(j)}, \qquad (4)$$

$$\sup_j \sum_{t=1}^{n} \Big| a_{t,n}(j) - a\big(\tfrac{t}{n}, j\big) \Big| \le K, \qquad (5)$$

$$V(a(\cdot, j)) \le \frac{K}{\ell(j)}. \qquad (6)$$

The ε_t are assumed to be independent and identically distributed with Eε_t ≡ 0 and Eε_t² ≡ 1. In addition we assume in this paper that all moments of ε_t exist. We set κ₄ := cum₄(ε_t). The above assumptions are discussed in Remark 2.12 below.

Definition 2.2 (Time varying spectral density and covariance) Let X_{t,n} be a locally stationary process. The function

$$f(u, \lambda) := \frac{1}{2\pi} |A(u, \lambda)|^2 \qquad \text{with} \qquad A(u, \lambda) := \sum_{j=-\infty}^{\infty} a(u, j) \exp(-i\lambda j)$$

is the time varying spectral density, and

$$c(u, k) := \int_{-\pi}^{\pi} f(u, \lambda) \exp(i\lambda k)\, d\lambda = \sum_{j=-\infty}^{\infty} a(u, k+j)\, a(u, j) \qquad (7)$$

is the time varying covariance of lag k at rescaled time u. For a deeper understanding of the time varying covariance see also Proposition 5.4.

A simple example of a process X_{t,n} which fulfills the above assumptions is X_{t,n} = φ(t/n) Y_t where Y_t = Σ_j a(j) ε_{t-j} is stationary with |a(j)| ≤ K/ℓ(j) and φ is of bounded variation. From the following proposition it follows that time varying ARMA (tvARMA) models whose coefficient functions are of bounded variation are locally stationary in the above sense. The result is proved in Section 6.
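A short simulation may make the definitions concrete. The following Python sketch is our own illustration and not part of the paper; the MA(1) choice Y_t = ε_t + θε_{t−1} with θ = 0.6 and the amplitude curve φ are assumptions made for concreteness. It generates X_{t,n} = φ(t/n) Y_t and evaluates the time varying spectral density and covariance implied by Definition 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
eps = rng.standard_normal(n + 1)

# Stationary MA(1): Y_t = eps_t + theta * eps_{t-1}, i.e. a(0) = 1, a(1) = theta
theta = 0.6
Y = eps[1:] + theta * eps[:-1]

# Amplitude curve phi of bounded variation; X_{t,n} = phi(t/n) * Y_t
def phi(u):
    return 1.0 + 0.5 * np.sin(2 * np.pi * u)

u = np.arange(1, n + 1) / n
X = phi(u) * Y

# Time varying spectral density f(u, lam) = phi(u)^2 |1 + theta e^{-i lam}|^2 / (2 pi)
def f(u0, lam):
    return phi(u0) ** 2 * np.abs(1 + theta * np.exp(-1j * lam)) ** 2 / (2 * np.pi)

# Time varying covariance via (7): a(u, j) = phi(u) a_Y(j), hence
# c(u, 0) = phi(u)^2 (1 + theta^2) and c(u, 1) = phi(u)^2 * theta
c0 = phi(0.25) ** 2 * (1 + theta ** 2)
c1 = phi(0.25) ** 2 * theta
```

Here conditions (3)-(6) hold because φ and the MA(1) coefficients are bounded and of bounded variation.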
Proposition 2.3 (tvARMA) Consider the system of difference equations

$$\sum_{j=0}^{p} \alpha_j\big(\tfrac{t}{n}\big)\, X_{t-j,n} = \sum_{k=0}^{q} \beta_k\big(\tfrac{t}{n}\big)\, \sigma\big(\tfrac{t-k}{n}\big)\, \varepsilon_{t-k} \qquad (8)$$

where ε_t are iid with Eε_t = 0, E|ε_t| < ∞, α₀(u) ≡ β₀(u) ≡ 1 and α_j(u) = α_j(0), β_k(u) = β_k(0) for u < 0. If all α_j(·) and β_k(·) as well as σ²(·) are of bounded variation and Σ_{j=0}^{p} α_j(u) z^j ≠ 0 for all u and all 0 < |z| ≤ 1 + δ for some δ > 0, then there exists a solution of the form

$$X_{t,n} = \sum_{j=0}^{\infty} a_{t,n}(j)\, \varepsilon_{t-j}$$

which fulfills (3)-(6) of Definition 2.1. The time varying spectral density is given by

$$f(u, \lambda) = \frac{\sigma^2(u)}{2\pi}\, \frac{\big| \sum_{k=0}^{q} \beta_k(u) \exp(i\lambda k) \big|^2}{\big| \sum_{j=0}^{p} \alpha_j(u) \exp(i\lambda j) \big|^2}.$$

2.2 Convergence of the empirical spectral process

The empirical spectral process is defined by

$$E_n(\phi) = \sqrt{n}\, \big( F_n(\phi) - F(\phi) \big) \qquad (9)$$

where

$$F(\phi) = \int_0^1 \int_{-\pi}^{\pi} \phi(u, \lambda)\, f(u, \lambda)\, d\lambda\, du \qquad (10)$$

and

$$F_n(\phi) = \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \phi\big(\tfrac{t}{n}, \lambda\big)\, J_n\big(\tfrac{t}{n}, \lambda\big)\, d\lambda \qquad (11)$$

with the pre-periodogram

$$J_n\big(\tfrac{t}{n}, \lambda\big) = \frac{1}{2\pi} \sum_{k:\ 1 \le [t+1/2+k/2],\, [t+1/2-k/2] \le n} X_{[t+1/2+k/2],n}\, X_{[t+1/2-k/2],n}\, \exp(-i\lambda k). \qquad (12)$$

If X_{[t+1/2+k/2],n} X_{[t+1/2-k/2],n} is regarded as a (raw) estimate of c(t/n, k), then J_n(t/n, λ) can be regarded as a (raw) estimate of f(t/n, λ); however, in order to become consistent, J_n(t/n, λ) needs to be smoothed in time and frequency direction. The pre-periodogram J_n was first defined by Neumann and von Sachs (1997). Many statistics occurring in the analysis of nonstationary time series can be written as a functional of F_n(φ). Several examples are discussed in Section 3 and Section 4.

We first prove a central limit theorem for E_n(φ) under the assumption that we have bounded variation in both components of φ(u, λ). Besides the definition in (1) we need a definition in two dimensions. Let

$$V^2(\phi) = \sup\Big\{ \sum_{j=1}^{\ell} \sum_{k=1}^{m} \big| \phi(u_j, \lambda_k) - \phi(u_{j-1}, \lambda_k) - \phi(u_j, \lambda_{k-1}) + \phi(u_{j-1}, \lambda_{k-1}) \big| :$$
$$\qquad\qquad 0 \le u_0 < \ldots < u_\ell \le 1;\ -\pi \le \lambda_0 < \ldots < \lambda_m \le \pi;\ \ell, m \in \mathbb{N} \Big\}.$$

For simplicity we set

$$\|\phi\|_{\infty,V} := \sup_u V\big(\phi(u, \cdot)\big), \qquad \|\phi\|_{V,\infty} := \sup_\lambda V\big(\phi(\cdot, \lambda)\big), \qquad \|\phi\|_{V,V} := V^2(\phi)$$

and ‖φ‖_{∞,∞} := sup_{u,λ} |φ(u, λ)|.
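A direct transcription of the pre-periodogram (12) may help to see what J_n computes; the function below is our own illustrative sketch (not code from the paper), with the rounding [·] read as the integer part.

```python
import numpy as np

def pre_periodogram(X, t, lam):
    """Pre-periodogram J_n(t/n, lam) from (12): lagged products of observations
    lying symmetrically around time t, Fourier transformed over the lag k.
    [.] is read as the integer part.  The result is real, since the terms
    for k and -k are complex conjugates of each other."""
    n = len(X)
    val = 0.0
    for k in range(-(n - 1), n):
        i = int(np.floor(t + 0.5 + k / 2))   # index [t + 1/2 + k/2]
        j = int(np.floor(t + 0.5 - k / 2))   # index [t + 1/2 - k/2]
        if 1 <= i <= n and 1 <= j <= n:
            val = val + X[i - 1] * X[j - 1] * np.exp(-1j * lam * k)
    return val.real / (2 * np.pi)

# hand-checked example: X = (1, 2, 3), t = 2, lam = 0 gives
# (X_2^2 + 2 X_2 X_3 + 2 X_1 X_3) / (2 pi) = 22 / (2 pi)
J = pre_periodogram(np.array([1.0, 2.0, 3.0]), 2, 0.0)
```

As the text notes, J_n is only a raw estimate: it must be smoothed in time and frequency before it becomes a consistent estimate of f(u, λ).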
Theorem 2.4 Let X_{t,n} be a locally stationary process and φ₁, ..., φ_k functions with ‖φⱼ‖_{∞,V}, ‖φⱼ‖_{V,∞}, ‖φⱼ‖_{V,V} and ‖φⱼ‖_{∞,∞} being finite (j = 1, ..., k). Then

$$\big( E_n(\phi_j) \big)_{j=1,\ldots,k} \overset{\mathcal{D}}{\to} \big( E(\phi_j) \big)_{j=1,\ldots,k}$$

where (E(φⱼ))_{j=1,...,k} is a Gaussian random vector with mean 0 and

$$\operatorname{cov}\big( E(\phi_j), E(\phi_k) \big) = 2\pi \int_0^1 \int_{-\pi}^{\pi} \phi_j(u, \lambda)\, \big[ \phi_k(u, \lambda) + \phi_k(u, -\lambda) \big]\, f^2(u, \lambda)\, d\lambda\, du$$
$$\qquad + \ \kappa_4 \int_0^1 \Big( \int_{-\pi}^{\pi} \phi_j(u, \lambda_1) f(u, \lambda_1)\, d\lambda_1 \Big) \Big( \int_{-\pi}^{\pi} \phi_k(u, \lambda_2) f(u, \lambda_2)\, d\lambda_2 \Big)\, du.$$

Remark 2.5 (i) We mention that Theorem 5.3 contains a similar statement under a different set of conditions which is obtained as a by-product from our calculations. Furthermore we mention that Theorem 2.4 also holds if a data taper is used, i.e. if Fₙ(φ) and F(φ) are defined as in (42) and (44) (in that case we in addition need Assumption 5.1, and cov(E(φⱼ), E(φₖ)) must be replaced by c_E^h(φⱼ, φₖ) as defined in Theorem 5.3). For simplicity we consider the tapered case only in Section 5.

(ii) In contrast to earlier results (cf. Dahlhaus and Neumann, 2001, Lemma 2.1) the assumptions on φ(u, λ) and f(u, λ) are very weak. In particular we allow for noncontinuous behavior.

(iii) In the stationary case where φⱼ(u, λ) = φ̃ⱼ(λ) and f(u, λ) = f̃(λ) this is the classical central limit theorem for the weighted periodogram (see Example 3.3 below).

(iv) The limit behavior for complex valued φⱼ can easily be derived from Theorem 2.4 by considering the real and imaginary part separately.

In Theorem 2.10 a functional central limit theorem indexed by function spaces and in Theorem 2.11 a Glivenko-Cantelli type theorem are proved. The central ingredient of their proofs will be an exponential inequality for the empirical spectral process and a maximal inequality derived in the next section.

2.3 A maximal exponential inequality

Let

$$\rho_2(\phi) := \Big( \int_0^1 \int_{-\pi}^{\pi} \phi(u, \lambda)^2\, d\lambda\, du \Big)^{1/2}, \qquad \rho_{2,n}(\phi) := \Big( \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \phi\big(\tfrac{t}{n}, \lambda\big)^2\, d\lambda \Big)^{1/2} \qquad (13)$$

and

$$\tilde{E}_n(\phi) := \sqrt{n}\, \big( F_n(\phi) - E F_n(\phi) \big). \qquad (14)$$
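In the Gaussian stationary special case (κ₄ = 0, f(u, λ) = f̃(λ), φ(u, λ) = φ̃(λ)) the limiting variance in Theorem 2.4 reduces to 2π ∫ φ̃(λ) [φ̃(λ) + φ̃(−λ)] f̃(λ)² dλ. The sketch below is our own numerical illustration (the MA(1) spectral density with θ = 0.6 and the choice φ̃(λ) = cos λ are assumptions); for this choice the integral has the closed form (1 + θ²)² + 3θ², which the quadrature reproduces.

```python
import numpy as np

theta = 0.6                                   # MA(1) coefficient (illustrative choice)
m = 20000
lam = np.linspace(-np.pi, np.pi, m, endpoint=False)
dlam = 2 * np.pi / m

# stationary MA(1) spectral density f(lam) = |1 + theta e^{-i lam}|^2 / (2 pi)
f = np.abs(1 + theta * np.exp(-1j * lam)) ** 2 / (2 * np.pi)

phi = np.cos(lam)                             # index function phi(lam) = cos(lam), even in lam

# limiting variance from Theorem 2.4 with kappa_4 = 0 and no time variation:
# 2 pi * int phi(lam) [phi(lam) + phi(-lam)] f(lam)^2 dlam   (phi is even here)
var_limit = 2 * np.pi * np.sum(phi * (2 * phi) * f ** 2) * dlam

# closed form of the same integral: (1 + theta^2)^2 + 3 theta^2
closed_form = (1 + theta ** 2) ** 2 + 3 * theta ** 2
```

For θ = 0.6 both evaluate to about 2.9296; this is the classical Bartlett-type variance of the lag-1 empirical covariance mentioned in Remark 2.5(iii) and Example 3.3.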
Theorem 2.6 (Exponential inequality) Let X_{t,n} be a locally stationary process with E|ε_t|^k ≤ C_ε^k for all k ∈ ℕ for the ε_t from Definition 2.1. Then we have for all η > 0

$$P\big( |\tilde{E}_n(\phi)| \ge \eta \big) \le c_1 \exp\Big( - c_2 \sqrt{\frac{\eta}{\rho_{2,n}(\phi)}} \Big) \qquad (15)$$

with some constants c₁, c₂ > 0 independent of n.

Remark 2.7 (i) In the Gaussian case it is possible to omit the square root in (15) or to prove a Bernstein-type inequality which is even stronger (cf. Dahlhaus and Polonik, 2006, Theorem 4.1).

(ii) To treat the bias EFₙ(φ) − F(φ) we set F⁺(φ) := (1/n) Σ_{t=1}^{n} ∫_{−π}^{π} φ(t/n, λ) f(t/n, λ) dλ. Then we have (for a proof see (5.9))

$$\sqrt{n}\, \big| E F_n(\phi) - F^+(\phi) \big| \le K\, \rho_{2,n}(\phi), \qquad (16)$$

$$\sqrt{n}\, \big| F^+(\phi) - F(\phi) \big| \le \frac{K}{\sqrt{n}} \big( \|\phi\|_{V,\infty} + \|\phi\|_{\infty,\infty} \big), \qquad (17)$$

and

$$\rho_{2,n}(\phi)^2 \le \rho_2(\phi)^2 + \frac{4\pi}{n}\, \|\phi\|_{V,\infty}\, \|\phi\|_{\infty,\infty}, \qquad (18)$$

leading for example to the exponential inequality

$$P\big( |E_n(\phi)| \ge \eta \big) \le c_1' \exp\Big( - c_2'\, \frac{\eta^{1/2}}{\big( \rho_2(\phi) + \tfrac{1}{\sqrt{n}} \|\phi\|_{V,\infty} + \tfrac{1}{\sqrt{n}} \|\phi\|_{\infty,\infty} \big)^{1/2}} \Big). \qquad (19)$$

An alternative inequality used later is

$$\sqrt{n}\, \big| E F_n(\phi) - F^+(\phi) \big| \le K\, \frac{\log n}{\sqrt{n}}\, \|\phi\|_{\infty,V} + \frac{K}{\sqrt{n}}\, \|\phi\|_{\infty,\infty}. \qquad (20)$$

The above exponential inequality is the core of the proof of the following result which then leads to stochastic equicontinuity of the empirical spectral process. Analogously to standard empirical process theory, stochastic equicontinuity is crucial for proving tightness. As for the standard empirical process, the results for the function indexed empirical spectral process E_n(φ), φ ∈ Φ, are derived under conditions on the richness of Φ measured by the metric entropy. For each ε > 0, the covering number of Φ with respect to the metric ρ₂ is defined by

$$N(\epsilon, \Phi, \rho_2) = \inf\big\{ n \ge 1 : \exists\ \phi_1, \ldots, \phi_n \in \Phi \text{ such that } \forall\ \phi \in \Phi\ \exists\ 1 \le i \le n \text{ with } \rho_2(\phi, \phi_i) \le \epsilon \big\},$$

and the metric entropy of Φ with respect to ρ₂ by

$$H(\epsilon, \Phi, \rho_2) = \log N(\epsilon, \Phi, \rho_2). \qquad (21)$$

For technical reasons we assume that H(ε, Φ, ρ₂) ≤ H̃_Φ(ε) with H̃_Φ(·) continuous and strictly monotonically decreasing. We mention that in standard empirical process theory often so-called bracketing numbers are used instead.
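For a finite class the covering number admits a simple greedy upper bound. The sketch below is our own illustration (the grid approximation of ρ₂ and the one-parameter class {cos(θλ)} are assumed choices, not from the paper) of how N(ε, Φ, ρ₂) and the metric entropy H(ε, Φ, ρ₂) can be evaluated numerically.

```python
import numpy as np

def covering_number(F, eps, metric):
    """Greedy upper bound on the covering number N(eps, F, metric):
    repeatedly declare an uncovered element a center and discard its eps-ball."""
    uncovered = list(range(len(F)))
    centers = 0
    while uncovered:
        c = uncovered[0]
        centers += 1
        uncovered = [i for i in uncovered if metric(F[c], F[i]) > eps]
    return centers

# rho_2 distance approximated on a grid of (u, lam) values
u = np.linspace(0, 1, 50)
lam = np.linspace(-np.pi, np.pi, 100)
U, L = np.meshgrid(u, lam, indexing="ij")

def rho2(phi, psi):
    # integral over [0,1] x [-pi,pi] (area 2*pi) of the squared difference
    return np.sqrt(((phi(U, L) - psi(U, L)) ** 2).mean() * 2 * np.pi)

# a small parametric class phi_theta(u, lam) = cos(theta * lam), theta in [0, 2]
F = [lambda u_, l_, th=th: np.cos(th * l_) for th in np.linspace(0.0, 2.0, 21)]

N_half = covering_number(F, 0.5, rho2)     # covering number at eps = 0.5
H_half = np.log(N_half)                    # metric entropy H(0.5, F, rho2)
```

The entropy conditions (23) and (26) below control how fast such numbers may grow as ε shrinks.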
Here we do not use bracketing covering numbers since the empirical spectral process is not monotone in φ. Let

$$\tau_{\infty,V} := \sup_{\phi \in \Phi} \|\phi\|_{\infty,V}, \qquad \tau_{V,\infty} := \sup_{\phi \in \Phi} \|\phi\|_{V,\infty}, \qquad \tau_{V,V} := \sup_{\phi \in \Phi} \|\phi\|_{V,V} \qquad \text{and} \qquad \tau_{\infty,\infty} := \sup_{\phi \in \Phi} \|\phi\|_{\infty,\infty}.$$

Measurability: In order to avoid further technical assumptions the following results assume measurability of all random quantities without further mentioning it.

Theorem 2.8 (Maximal inequality) Let X_{t,n} be a locally stationary process with E|ε_t|^k ≤ C_ε^k for all k ∈ ℕ for the ε_t from Definition 2.1. Suppose that Φ is such that τ_{∞,V}, τ_{V,∞}, τ_{V,V}, and τ_{∞,∞} are finite and sup_{φ∈Φ} ρ₂(φ) ≤ τ₂ < ∞. Let L = max{K₁, K₂, K} where K₁, K₂ > 0 are the constants from Lemma 5.13 and K is the constant from (17) and (20). Then there exists a set Bₙ (see (74)) with limₙ→∞ P(Bₙ) = 1 such that for all η satisfying

$$\eta \ge 2^6\, L\, \max\{\tau_{\infty,V}, \tau_{V,\infty}, \tau_{V,V}, \tau_{\infty,\infty}\}\, \frac{(\log n)^3}{\sqrt{n}} \qquad (22)$$

and

$$\eta \ge \frac{72}{c_2^2} \int_0^{\alpha} \tilde{H}_\Phi^2(s)\, ds \qquad \text{with} \qquad \alpha := \tilde{H}_\Phi^{-1}\Big( \frac{c_2}{4} \sqrt{\frac{\eta}{\tau_2}} \Big) \qquad (23)$$

we have

$$P\Big( \sup_{\phi \in \Phi} |\tilde{E}_n(\phi)| > \eta\,,\ B_n \Big) \le 3\, c_1 \exp\Big( - \frac{c_2}{4} \sqrt{\frac{\eta}{\tau_2}} \Big) \qquad (24)$$

and

$$P\Big( \sup_{\phi \in \Phi} |E_n(\phi)| > \eta\,,\ B_n \Big) \le 3\, c_1 \exp\Big( - \frac{c_2}{4} \sqrt{\frac{\eta}{\tau_2}} \Big) \qquad (25)$$

where c₁, c₂ > 0 are the constants from (15).

Remark 2.9 (i) A maximal inequality for {Ẽₙ(φ), φ ∈ Φ} assuming Gaussian innovations can be found in Dahlhaus and Polonik (2006). The additional Gaussian assumption enables a weakening of the crucial assumption (23), essentially replacing ∫₀^α H̃_Φ²(s) ds by ∫₀^α H̃_Φ(s) ds. Furthermore, the resulting exponential inequality is stronger, replacing √(η/τ₂) in the exponent of (24) by η/τ₂. On the other hand the proof in the present non-Gaussian case is much more complicated.

(ii) The restriction to the set Bₙ has several advantages. First it allows for replacing ρ₂,ₙ(φ) by ρ₂(φ) (more precisely by τ₂ = sup_{φ∈Φ} ρ₂(φ)) which makes the results much simpler. Furthermore, the extra terms due to the bias as in (19) can be avoided.
For many results the set Bₙ means no restriction, in particular if the probability of an event is calculated as for equicontinuity.

2.4 A functional central limit theorem and a GC-type result

The two Theorems 2.4 and 2.8 are the main ingredients for deriving the following weak convergence result for the process {Eₙ(φ); φ ∈ Φ} in the space ℓ^∞(Φ) of uniformly bounded (real-valued) functions on Φ, i.e. with ‖g‖_Φ := sup_{φ∈Φ} |g(φ)| we have ℓ^∞(Φ) = {g : Φ → ℝ; ‖g‖_Φ < ∞}.

Theorem 2.10 (Functional limit theorem) Let X_{t,n} be a locally stationary process with E|ε_t|^k ≤ C_ε^k for all k ∈ ℕ for the ε_t from Definition 2.1. Furthermore let Φ be such that τ_{∞,V}, τ_{V,∞}, τ_{V,V}, τ_{∞,∞} and sup_{φ∈Φ} ρ₂(φ) are finite. If in addition

$$\int_0^1 \tilde{H}_\Phi(s)^2\, ds < \infty, \qquad (26)$$

then we have as n → ∞ that Eₙ(·) → E(·) weakly in ℓ^∞(Φ), where {E(φ), φ ∈ Φ} denotes a tight, mean zero Gaussian process with covariance structure as given in Theorem 2.4.

Weak convergence in the above theorem means that E*α(Eₙ) → Eα(E) as n → ∞ for every bounded, continuous real-valued function α on ℓ^∞(Φ) equipped with the supremum norm, where E* denotes outer expectation. This Hoffmann-Jørgensen type formulation of weak convergence avoids measurability considerations for the process {Eₙ(φ), φ ∈ Φ}. Measurability of Eₙ might become problematic in particular if Φ is not separable. Nevertheless, this notion of weak convergence admits applications of useful probabilistic tools such as continuous mapping theorems. For more details we refer to van der Vaart and Wellner (1996).

Finally we mention our conjecture that it should be possible to prove another version of the above central limit theorem under much weaker moment assumptions on the ε_t if the class Φ is smaller (by avoiding the use of the maximal inequality for proving equicontinuity). As another application of the maximal inequality we now prove a Glivenko-Cantelli type theorem for the empirical spectral process.
Here we allow for a class Φ = Φₙ which may be increasing with n. We set τ^{(n)}_{∞,V} := sup_{φ∈Φₙ} ‖φ‖_{∞,V} etc.

Theorem 2.11 Let X_{t,n} be a locally stationary process with E|ε_t|^k ≤ C_ε^k for all k ∈ ℕ for the ε_t from Definition 2.1. Suppose that Φₙ is such that τ^{(n)}_{∞,V}, τ^{(n)}_{V,∞}, τ^{(n)}_{V,V}, and τ^{(n)}_{∞,∞} are of order o(√n / log³ n), sup_{φ∈Φₙ} ρ₂(φ) ≤ τ₂^{(n)} = o(√n) and

$$\int_0^1 \tilde{H}_{\Phi_n}(s)^2\, ds = o(\sqrt{n}).$$

Then

$$\sup_{\phi \in \Phi_n} \big| F_n(\phi) - F(\phi) \big| = \sup_{\phi \in \Phi_n} \Big| \frac{1}{\sqrt{n}} E_n(\phi) \Big| \overset{P}{\to} 0.$$

Remark 2.12 (Discussion of Definition 2.1) (i) The rather complicated construction with different coefficients a_{t,n}(j) and a(t/n, j) is necessary since we need on the one hand a certain smoothness in time direction (guaranteed by bounded variation of the functions a(u, j)) and on the other hand a class which is rich enough to cover interesting examples. For instance Proposition 2.3 implies that the process X_{t,n} = φ(t/n) X_{t−1,n} + ε_t has a representation of the form (2). However, the proof of Proposition 2.3 reveals that this X_{t,n} does not have a representation of the form

$$X_{t,n} = \sum_{j=-\infty}^{\infty} a\big(\tfrac{t}{n}, j\big)\, \varepsilon_{t-j}. \qquad (27)$$

(ii) The conditions may give the impression that only homoscedastic innovations ε_t are allowed. This is not true since a time varying scaling factor of the ε_t may be included in the a_{t,n}(j). An example are the tvARMA models from Proposition 2.3.

(iii) The time varying MA(∞) representation (2) can easily be transformed into a time varying spectral representation as used e.g. in Dahlhaus (1997). If the ε_t are assumed to be stationary then there exists a Cramér representation

$$\varepsilon_t = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \exp(i\lambda t)\, d\xi(\lambda)$$

where ξ(λ) is a process with mean 0 and orthonormal increments (cf. Brillinger, 1981). Let

$$A_{t,n}(\lambda) := \sum_{j=-\infty}^{\infty} a_{t,n}(j) \exp(-i\lambda j). \qquad (28)$$

Then

$$X_{t,n} = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \exp(i\lambda t)\, A_{t,n}(\lambda)\, d\xi(\lambda). \qquad (29)$$

If (5) is replaced by the stronger condition

$$\sup_t \Big| a_{t,n}(j) - a\big(\tfrac{t}{n}, j\big) \Big| \le \frac{K}{n\, \ell(j)}$$

then it follows that

$$\sup_{t,\lambda} \Big| A_{t,n}(\lambda) - A\big(\tfrac{t}{n}, \lambda\big) \Big| \le K n^{-1} \qquad (30)$$

which was assumed in Dahlhaus (1997).
Conversely, if we start with (29) and (30) then the conditions of Definition 2.1 can be derived from adequate smoothness conditions on A(u, λ).

3 Applications

In this section we give several examples for the statistic Fₙ(φ). In all cases the results from Section 2 can be applied. As a nontrivial application the uniform convergence of local Whittle estimates is proved in the next section.

Example 3.1 (Parametric quasi likelihood estimation) In Dahlhaus (2000) it has been shown that

$$\mathcal{L}_n(\theta) := \frac{1}{4\pi} \frac{1}{n} \sum_{t=1}^{n} \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta\big(\tfrac{t}{n}, \lambda\big) + \frac{J_n(\tfrac{t}{n}, \lambda)}{f_\theta(\tfrac{t}{n}, \lambda)} \Big\}\, d\lambda \qquad (31)$$

is an approximation to the negative log Gaussian likelihood of a locally stationary process. The above likelihood is a generalization of the Whittle likelihood (Whittle, 1953) to locally stationary processes. An example for a locally stationary process with finite dimensional parameter θ is the tvARMA process from Proposition 2.3 with coefficient functions being polynomials in time. Proving the asymptotic properties of

$$\hat{\theta}_n := \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}_n(\theta)$$

is simplified a lot by using the above properties of the empirical spectral process. We give a brief sketch. Let

$$\mathcal{L}(\theta) := \frac{1}{4\pi} \int_0^1 \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta(u, \lambda) + \frac{f(u, \lambda)}{f_\theta(u, \lambda)} \Big\}\, d\lambda\, du \qquad (32)$$

be (up to a constant) the asymptotic Kullback-Leibler divergence between the true process and the fitted model (cf. Dahlhaus, 1996a, Theorem 3.4 ff) and

$$\theta_0 := \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}(\theta)$$

the best approximating parameter from Θ (this is the true parameter if the model is correctly specified). We have

$$\mathcal{L}_n(\theta) - \mathcal{L}(\theta) = \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{4\pi} f_\theta^{-1} \Big) + R_{\log}(f_\theta)$$

with

$$R_{\log}(f_\theta) := \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big[ \frac{1}{n} \sum_{t=1}^{n} \log f_\theta\big(\tfrac{t}{n}, \lambda\big) - \int_0^1 \log f_\theta(u, \lambda)\, du \Big]\, d\lambda. \qquad (33)$$

Thus, ignoring the R_log term, uniform convergence follows from the Glivenko-Cantelli type Theorem 2.11. The R_log term can be treated as in Dahlhaus and Polonik (2006), Lemma A.2. If Θ is compact and the minimum θ₀ is unique this implies consistency of θ̂ₙ.
Furthermore,

$$\sqrt{n}\, \nabla \mathcal{L}_n(\theta_0) = E_n\Big( \frac{1}{4\pi} \nabla f_{\theta_0}^{-1} \Big)$$

and

$$\nabla^2 \mathcal{L}_n(\theta) = \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{4\pi} \nabla^2 f_\theta^{-1} \Big) + \frac{1}{n} \sum_{t=1}^{n} \frac{1}{4\pi} \int_{-\pi}^{\pi} \nabla \log f_\theta\big(\tfrac{t}{n}, \lambda\big)\, \nabla \log f_\theta\big(\tfrac{t}{n}, \lambda\big)'\, d\lambda.$$

The first term of ∇²Lₙ(θ) converges uniformly to 0 while the second term converges for θ → θ₀ to the Fisher information matrix. The usual Taylor expansion then gives a central limit theorem for √n(θ̂ₙ − θ₀). For details about the result and examples we refer to Dahlhaus (2000), Theorem 3.1, whose proof is greatly simplified by using the above arguments. Furthermore, the present assumptions are weaker. Due to (36) below the result also covers the misspecified stationary case where the stationary Whittle likelihood is used with a stationary model but the true process is only locally stationary.

Another application of the empirical spectral process is model selection for Whittle estimates. In Van Bellegem and Dahlhaus (2006) a model selection criterion for semiparametric model selection has been derived. Furthermore, an upper bound for the risk has been proved by using the exponential inequality for the empirical spectral process.

Example 3.2 (Nonparametric quasi likelihood estimation) In Dahlhaus and Polonik (2006) we have considered the corresponding nonparametric estimator

$$\hat{f}_n = \operatorname*{argmin}_{g \in \mathcal{F}} \mathcal{L}_n(g) \qquad \text{with} \qquad \mathcal{L}_n(g) = \frac{1}{n} \sum_{t=1}^{n} \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{ \log g\big(\tfrac{t}{n}, \lambda\big) + \frac{J_n(\tfrac{t}{n}, \lambda)}{g(\tfrac{t}{n}, \lambda)} \Big\}\, d\lambda \qquad (34)$$

where the contrast functional now is minimized over an "infinite dimensional" target class F of spectral densities whose complexity is characterized by metric entropy conditions. The optimal rate of convergence has been derived for sieve estimates in the Gaussian case by using a 'peeling device' and 'chaining' together with an exponential inequality similar to the one in Theorem 2.6 (the exponential inequality is stronger due to the additional Gaussian assumption).
It is an open problem whether the optimal rates of convergence can also be achieved for full nonparametric maximum likelihood estimates or (as in the present paper) without the assumption of Gaussianity.

Example 3.3 (Stationary processes/Model misspecification by stationary models) We start by showing how several classical results for the stationary case can be obtained from the results above. Let φ(u, λ) = φ̃(λ) be time invariant. Then

$$F_n(\phi) = \int_{-\pi}^{\pi} \tilde{\phi}(\lambda)\, \frac{1}{n} \sum_{t=1}^{n} J_n\big(\tfrac{t}{n}, \lambda\big)\, d\lambda. \qquad (35)$$

However we have

$$\frac{1}{n} \sum_{t=1}^{n} J_n\big(\tfrac{t}{n}, \lambda\big) = \frac{1}{2\pi} \sum_{k=-(n-1)}^{n-1} \Big( \frac{1}{n} \sum_{t=1}^{n-|k|} X_t X_{t+|k|} \Big) \exp(-i\lambda k) = \frac{1}{2\pi n} \Big| \sum_{s=1}^{n} X_s \exp(-i\lambda s) \Big|^2 = I_n(\lambda) \qquad (36)$$

where Iₙ(λ) is the classical periodogram. Therefore Fₙ(φ) is the classical spectral mean in the stationary case with the following applications:

(i) φ(u, λ) = φ̃(λ) = I_{[0,µ]}(λ) gives the empirical spectral measure;

(ii) φ(u, λ) = φ̃(λ) = (1/4π) ∇f_θ^{-1}(λ) is the score function of the Whittle likelihood (similar to Example 3.1 above);

(iii) φ(u, λ) = φ̃(λ) = cos λk is the empirical covariance estimator of lag k.

Theorem 2.4 gives in all cases the asymptotic distribution, both in the stationary case and in the misspecified case where the true underlying process is only locally stationary. In case (i) Theorem 2.10 leads to a functional central limit theorem on C[0, π] with the supremum norm. If φ̃(λ) is a kernel we obtain a kernel estimate of the spectral density (see the remark below).

Example 3.4 (Local estimates) There is an even larger variety of local estimates - some of them are listed below. The asymptotic distribution of these estimates is not covered by Theorem 2.4 since the function φ(u, λ) depends on n in this case. However, in all cases the uniform rate of convergence of these estimators may be derived by using the maximal inequality in Theorem 2.8.
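Identity (36) can be verified numerically: averaging the pre-periodogram over time reproduces the classical periodogram exactly, not just asymptotically. The sketch below is our own check (with [·] read as the integer part and simulated white noise as an assumed input).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
X = rng.standard_normal(n)
lam = 0.7

# left-hand side of (36): time average of the pre-periodogram J_n(t/n, lam)
avg = 0.0
for t in range(1, n + 1):
    for k in range(-(n - 1), n):
        i = int(np.floor(t + 0.5 + k / 2))   # index [t + 1/2 + k/2]
        j = int(np.floor(t + 0.5 - k / 2))   # index [t + 1/2 - k/2]
        if 1 <= i <= n and 1 <= j <= n:
            avg += (X[i - 1] * X[j - 1] * np.exp(-1j * lam * k)).real
avg /= 2 * np.pi * n

# right-hand side of (36): the classical periodogram I_n(lam)
s = np.arange(1, n + 1)
I_n = np.abs(np.sum(X * np.exp(-1j * lam * s))) ** 2 / (2 * np.pi * n)
```

The identity holds because, for each lag k, the valid symmetric products X_{[t+1/2+k/2]} X_{[t+1/2−k/2]} run exactly over the n − |k| lag-|k| pairs of the sample.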
A detailed example is given in the next section where the uniform rate of convergence of local Whittle estimates is derived ((iii) below). For a short overview let kₙ(x) = (1/bₙ) K(x/bₙ) be some kernel with bandwidth bₙ. Then

(i) φ(u, λ) = kₙ(u − u₀) kₙ(λ − λ₀) gives an estimator of the time varying spectral density f(u₀, λ₀);

(ii) φ(u, λ) = kₙ(u − u₀) cos λk gives a local estimator of the covariance function c(u₀, k);

(iii) φ(u, λ) = kₙ(u − u₀) (1/4π) ∇f_θ^{-1}(λ) is the score function of the local Whittle estimator of the parameter curve θ(u₀).

4 Uniform convergence of local Whittle estimates

We now study kernel estimates for parameter curves of locally stationary processes and derive uniform consistency from the Glivenko-Cantelli type Theorem 2.11 and a uniform rate of convergence from the maximal inequality in Theorem 2.8. We investigate locally stationary processes where the time varying spectral density is of the form f(u, λ) = f_{θ(u)}(λ) with θ(u) ∈ Θ ⊆ ℝ^d for all u ∈ [0, 1]. An example is the tvARMA process from Proposition 2.3. Let

$$\hat{\theta}_n(u) := \operatorname*{argmin}_{\theta \in \Theta} \mathcal{L}_n(u, \theta)$$

with

$$\mathcal{L}_n(u, \theta) := \frac{1}{4\pi} \frac{1}{n} \sum_{t=1}^{n} \frac{1}{b_n} K\Big( \frac{u - t/n}{b_n} \Big) \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta(\lambda) + \frac{J_n(\tfrac{t}{n}, \lambda)}{f_\theta(\lambda)} \Big\}\, d\lambda. \qquad (37)$$

We assume that the kernel K has compact support on [−1/2, 1/2] and is of bounded variation with ∫_{−1/2}^{1/2} x K(x) dx = 0 and ∫_{−1/2}^{1/2} K(x) dx = 1. Furthermore let bₙ → 0 and nbₙ → ∞ as n → ∞.

In case of a tvAR(p) process θ̂ₙ(u) is the solution of the local Yule-Walker equations: Let

$$\hat{c}_n(u, k) := \frac{1}{n} \sum_t \frac{1}{b_n} K\Big( \frac{u - t/n}{b_n} \Big)\, X_{[t+1/2+k/2],n}\, X_{[t+1/2-k/2],n}$$

(cf. Proposition 5.4), Cₙ(u) = (ĉₙ(u, 1), ..., ĉₙ(u, p))′ and Σₙ(u) = {ĉₙ(u, i−j)}_{i,j=1,...,p}. If θ̂ₙ(u) = (α̂₁(u), ..., α̂_p(u), σ̂²(u))′ then it is not difficult to show that

$$(\hat{\alpha}_1(u), \ldots, \hat{\alpha}_p(u))' = -\Sigma_n(u)^{-1} C_n(u)$$

and

$$\hat{\sigma}^2(u) = \hat{c}_n(u, 0) + \sum_{k=1}^{p} \hat{\alpha}_k(u)\, \hat{c}_n(u, k).$$

We now derive a uniform rate of convergence for θ̂ₙ(u).
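For p = 1 the local Yule-Walker equations reduce to α̂₁(u) = −ĉₙ(u, 1)/ĉₙ(u, 0), so the AR coefficient curve a(u) = −α₁(u) is estimated by ĉₙ(u, 1)/ĉₙ(u, 0). The simulation sketch below is our own illustration (the coefficient curve a(u) = 0.9 − u, the rectangular kernel, the bandwidth and the seed are assumed choices; the kernel satisfies the stated support and moment conditions).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
u_grid = np.arange(1, n + 1) / n

# tvAR(1): X_t = a(t/n) X_{t-1} + eps_t with a smooth coefficient curve
def a(u):
    return 0.9 - u

X = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    X[t] = a(u_grid[t]) * X[t - 1] + eps[t]

def c_hat(u0, k, bn):
    """Local covariance estimator c_n(u0, k), rectangular kernel on [-1/2, 1/2]."""
    t = np.arange(1, n + 1)
    w = np.where(np.abs((u0 - t / n) / bn) <= 0.5, 1.0 / bn, 0.0)
    i = np.floor(t + 0.5 + k / 2).astype(int)       # index [t + 1/2 + k/2]
    j = np.floor(t + 0.5 - k / 2).astype(int)       # index [t + 1/2 - k/2]
    ok = (i >= 1) & (i <= n) & (j >= 1) & (j <= n)
    return np.sum(w[ok] * X[i[ok] - 1] * X[j[ok] - 1]) / n

bn = n ** (-0.2)        # bandwidth of the order n^{-1/5} suggested by Theorem 4.1
u0 = 0.5
a_hat = c_hat(u0, 1, bn) / c_hat(u0, 0, bn)   # local Yule-Walker estimate of a(0.5)
```

With these choices the estimate a_hat lies close to a(0.5) = 0.4; the bandwidth trades off the kernel bias of order bₙ² against the stochastic error of order (bₙn)^{−1/2}, as in Theorem 4.1 below.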
Theorem 4.1 Let X_{t,n} be a locally stationary process with E|ε_t|^k ≤ C_ε^k for all k ∈ ℕ for the ε_t from Definition 2.1 and time varying spectral density f(u, λ) = f_{θ₀(u)}(λ). Suppose

(i) θ is identifiable from f_θ (i.e. f_θ(λ) = f_{θ′}(λ) for all λ implies θ = θ′) and θ₀(u) lies in the interior of the compact parameter space Θ ⊆ ℝ^d for all u;

(ii) θ₀(u) is twice differentiable with Lipschitz continuous second derivative;

(iii) f_θ(λ) is twice differentiable in θ with Lipschitz continuous second derivative;

(iv) f_θ^{-1}(λ) and the components of ∇f_θ^{-1}(λ), ∇²f_θ^{-1}(λ) are uniformly bounded in λ and θ;

(v) the minimal eigenvalue of I(θ) := (1/4π) ∫_{−π}^{π} ∇log f_θ(λ) ∇log f_θ(λ)′ dλ is bounded from below uniformly in θ.

Then we have for bₙ n >> (log n)⁶

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \hat{\theta}_n(u) - \theta_0(u) \big\|_2 = O_p\Big( \frac{1}{\sqrt{b_n n}} + b_n^2 \Big),$$

that is for bₙ ∼ n^{−1/5} we obtain the uniform rate O_p(n^{−2/5}).

REMARKS. (i) Instead of assumptions (ii) and (iii) we only need second order differentiability with Lipschitz continuous second order derivative of f_{θ₀(u)}(λ) in u. This enters the following proof only in the estimation of the second summand of (41).

(ii) We conjecture that a similar result holds in case of model misspecification where the model spectral density f_{θ₀(u)}(λ) is only an approximation to the true spectral density f(u, λ).

(iii) For tvAR(p) processes it follows from Moulines, Priouret and Roueff (2005) that this is the optimal rate of convergence.

Proof. We note at the beginning that the difficult parts of the following proof are handled by using the empirical spectral process and applying Theorems 2.8 and 2.11. We start by proving consistency. With

$$\mathcal{L}(u, \theta) := \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big\{ \log 4\pi^2 f_\theta(\lambda) + \frac{f(u, \lambda)}{f_\theta(\lambda)} \Big\}\, d\lambda$$

we have

$$\mathcal{L}_n(u, \theta) - \mathcal{L}(u, \theta) = \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{b_n} K\Big(\frac{u - \cdot}{b_n}\Big)\, \frac{1}{4\pi} f_\theta^{-1} \Big)$$
$$\qquad + \ \frac{1}{4\pi} \int_{-\pi}^{\pi} \int_0^1 \frac{1}{b_n} K\Big(\frac{u - v}{b_n}\Big)\, \frac{f(v, \lambda) - f(u, \lambda)}{f_\theta(\lambda)}\, dv\, d\lambda$$
$$\qquad + \ \frac{1}{4\pi} \int_{-\pi}^{\pi} \Big[ \frac{1}{n} \sum_{t=1}^{n} \frac{1}{b_n} K\Big(\frac{u - t/n}{b_n}\Big) - 1 \Big]\, \log 4\pi^2 f_\theta(\lambda)\, d\lambda.$$
We now apply Theorem 2.11 with the class Φₙ = { (1/bₙ) K((u − ·)/bₙ) (1/4π) f_θ^{-1}(·) : u ∈ [bₙ/2, 1−bₙ/2], θ ∈ Θ }. It is straightforward to show that N(ε, Φₙ, ρ₂) = K bₙ^{−(d+4)/2} ε^{−(d+2)}, i.e. ∫₀¹ H̃_{Φₙ}(s)² ds = O((log bₙ)²). Furthermore τ^{(n)}_{∞,V}, τ^{(n)}_{V,∞}, τ^{(n)}_{V,V}, and τ^{(n)}_{∞,∞} are of order O(bₙ^{−1}) and τ₂^{(n)} = sup_{φ∈Φₙ} ρ₂(φ) = O(bₙ^{−1/2}). Thus for bₙ n ≥ log⁴ n Theorem 2.11 implies

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \sup_{\theta \in \Theta} \big| \mathcal{L}_n(u, \theta) - \mathcal{L}(u, \theta) \big| \overset{P}{\to} 0. \qquad (38)$$

The identifiability condition implies that the minimum θ₀(u) of L(u, θ) is unique for all u. By using standard arguments we therefore can conclude that

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \hat{\theta}_n(u) - \theta_0(u) \big\|_2 \overset{P}{\to} 0. \qquad (39)$$

We now derive the rate of convergence by using the maximal inequality. We have for each u

$$Z_n(u) := \nabla \mathcal{L}_n\big(u, \hat{\theta}_n(u)\big) - \nabla \mathcal{L}_n\big(u, \theta_0(u)\big) = \nabla^2 \mathcal{L}_n\big(u, \bar{\theta}_n(u)\big)\, \big( \hat{\theta}_n(u) - \theta_0(u) \big) \qquad (40)$$

with ‖θ̄ₙ(u) − θ₀(u)‖ ≤ ‖θ̂ₙ(u) − θ₀(u)‖. The main term on the left hand side is

$$\frac{\partial}{\partial \theta_j} \mathcal{L}_n\big(u, \theta_0(u)\big) = \frac{1}{\sqrt{n}}\, E_n(\phi_n) + \frac{1}{4\pi} \int_{-\pi}^{\pi} \int_0^1 \frac{1}{b_n} K\Big(\frac{u - v}{b_n}\Big) \big( f_{\theta_0(v)}(\lambda) - f_{\theta_0(u)}(\lambda) \big)\, \frac{\partial}{\partial \theta_j} f_\theta^{-1}(\lambda)\Big|_{\theta = \theta_0(u)}\, dv\, d\lambda \qquad (41)$$

with φₙ(v, λ) := (1/bₙ) K((u − v)/bₙ) (1/4π) (∂/∂θⱼ) f_θ^{-1}(λ)|_{θ = θ₀(u)}. We apply the maximal inequality of Theorem 2.8 to the class Φₙ = { (1/bₙ) K((u − ·)/bₙ) (1/4π) (∂/∂θⱼ) f_θ^{-1}(·) : u ∈ [bₙ/2, 1−bₙ/2], θ ∈ Θ }. Again we can show N(ε, Φₙ, ρ₂) = K bₙ^{−(d+4)/2} ε^{−(d+2)}, i.e. ∫₀¹ H̃_{Φₙ}(s)² ds = O((log bₙ)²). Furthermore τ^{(n)}_{∞,V}, τ^{(n)}_{V,∞}, τ^{(n)}_{V,V}, and τ^{(n)}_{∞,∞} are of order O(bₙ^{−1}) and τ₂^{(n)} = sup_{φ∈Φₙ} ρ₂(φ) ∼ bₙ^{−1/2}. We now apply Theorem 2.8 with η = τ₂ δ for arbitrary δ. If bₙ n >> log⁶ n the conditions (22) and (23) are fulfilled and we obtain

$$P\Big( \sup_{\phi \in \Phi_n} |E_n(\phi)| > \tau_2\, \delta\,,\ B_n \Big) \le 3\, c_1 \exp\Big( - \frac{c_2}{4} \sqrt{\delta} \Big),$$

i.e. sup_{φ∈Φₙ} |(1/√n) Eₙ(φ)| = O_p(1/√(bₙ n)). Since f_θ(λ) and θ₀(u) are twice differentiable (in θ and u respectively) with Lipschitz continuous second derivative, the second summand of (41) can be uniformly bounded with a Taylor expansion by O_p(bₙ²), that is we obtain

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \nabla \mathcal{L}_n\big(u, \theta_0(u)\big) \big\| = O_p\Big( \frac{1}{\sqrt{b_n n}} + b_n^2 \Big).$$
If θ̂ₙ(u) lies on the boundary of Θ for some u then ‖θ̂ₙ(u) − θ₀(u)‖₂ ≥ κ for some κ > 0 and

$$P\Big( \sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \nabla \mathcal{L}_n\big(u, \hat{\theta}_n(u)\big) \big\|_2 > \delta\, \frac{1}{\sqrt{b_n n}} \Big) \le P\Big( \sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \hat{\theta}_n(u) - \theta_0(u) \big\|_2 \ge \kappa \Big) \to 0,$$

i.e. sup_{u ∈ [bₙ/2, 1−bₙ/2]} ‖Zₙ(u)‖₂ = O_p(1/√(bₙ n) + bₙ²). In order to obtain the assertion of the theorem from (40) we now prove that the minimal eigenvalue of the matrix ∇²Lₙ(u, θ̄ₙ(u)) is uniformly bounded from below in probability. We have

$$\nabla^2 \mathcal{L}_n(u, \theta) = \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{b_n} K\Big(\frac{u - \cdot}{b_n}\Big)\, \frac{1}{4\pi} \nabla^2 f_\theta^{-1} \Big) + \frac{1}{n} \sum_{t=1}^{n} \frac{1}{b_n} K\Big(\frac{u - t/n}{b_n}\Big)\, \frac{1}{4\pi} \int_{-\pi}^{\pi} \nabla \log f_\theta(\lambda)\, \nabla \log f_\theta(\lambda)'\, d\lambda.$$

Since f_θ is twice differentiable in θ with Lipschitz continuous second derivative we obtain exactly as above from Theorem 2.11 for bₙ n ≥ log⁴ n and i, j = 1, ..., d

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \sup_{\theta \in \Theta} \Big| \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{b_n} K\Big(\frac{u - \cdot}{b_n}\Big)\, \frac{1}{4\pi} \frac{\partial^2}{\partial \theta_i \partial \theta_j} f_\theta^{-1} \Big) \Big| \overset{P}{\to} 0.$$

Therefore also

$$\sup_{u \in [b_n/2,\, 1-b_n/2]} \Big\| \frac{1}{\sqrt{n}}\, E_n\Big( \frac{1}{b_n} K\Big(\frac{u - \cdot}{b_n}\Big)\, \frac{1}{4\pi} \nabla^2 f_\theta^{-1}\big|_{\theta = \bar{\theta}_n(u)} \Big) \Big\|_{\mathrm{spec}} \overset{P}{\to} 0.$$

Since the minimal eigenvalue of I(θ) := (1/4π) ∫_{−π}^{π} ∇log f_θ(λ) ∇log f_θ(λ)′ dλ is bounded from below by λ_min(I) > 0 uniformly in θ, this implies

$$P\Big( \sup_{u \in [b_n/2,\, 1-b_n/2]} \big\| \big( \nabla^2 \mathcal{L}_n(u, \bar{\theta}_n(u)) \big)^{-1} \big\|_{\mathrm{spec}} \le \frac{2}{\lambda_{\min}(I)} \Big) \to 1.$$

Since ‖θ̂ₙ(u) − θ₀(u)‖₂ ≤ ‖(∇²Lₙ(u, θ̄ₙ(u)))^{-1}‖_spec ‖Zₙ(u)‖₂ this implies the result.

5 Proofs: CLT and exponential inequality

In this section we provide the proofs for the results of Section 2. In particular we derive the asymptotic behavior of the moments of the empirical process. First we extend the definitions of Section 2 to tapered data X^{(hₙ)}_{t,n} = hₙ(t/n) · X_{t,n} where hₙ : (0, 1] → [0, ∞) is a data taper (with hₙ(u) = I_{(0,1]}(u) being the nontapered case of Section 2). This is done for two reasons:

(i) The main reason is that all proofs are greatly simplified since the data taper now automatically takes care of the range of summation (hₙ(t/n) is zero for all t outside the observation domain).
The consideration of arbitrary tapers $h_n$ instead of the 'no-taper' $I_{(0,1]}$ does not introduce any extra technical complexity.

(ii) The use of a data taper in the periodogram is standard for stationary time series. It leads to better small-sample performance in the presence of strong peaks in the spectrum. It may turn out that this also holds in the present situation (which requires further investigation). Furthermore, missing data can be modelled with adequate tapers.

As before, the empirical spectral process is defined by $E_n(\phi) = \sqrt n\,\big(F_n(\phi) - F(\phi)\big)$ where $F_n(\phi)$ now is

$$F_n(\phi) = \sum_{t=1}^n \frac{h_n^2(\frac{t}{n})}{H_{2,n}} \int_{-\pi}^{\pi} \phi\Big(\frac{t}{n},\lambda\Big)\,J_n^{(h_n)}\Big(\frac{t}{n},\lambda\Big)\,d\lambda \qquad (42)$$

with the tapered pre-periodogram

$$J_n^{(h_n)}\Big(\frac{t}{n},\lambda\Big) = \frac{1}{2\pi}\,h_n\Big(\frac{t}{n}\Big)^{-2} \sum_{k:\,1\le[t+1/2\pm k/2]\le n} \exp(-i\lambda k)\,X^{(h_n)}_{[t+1/2-k/2],n}\,X^{(h_n)}_{[t+1/2+k/2],n} \qquad (43)$$

and $H_{2,n} := \sum_{t=1}^n h_n\big(\frac{t}{n}\big)^2$. $F(\phi)$ is its theoretical counterpart

$$F(\phi) = \int_0^1 \frac{h^2(u)}{\|h\|_2^2} \int_{-\pi}^{\pi} \phi(u,\lambda)\,f(u,\lambda)\,d\lambda\,du \qquad (44)$$

with $h$ defined below. For stationary time series it is often assumed that the percentage of tapered data points tends to zero as $n\to\infty$. For this reason we allow for dependence of the taper on $n$.

Assumption 5.1 The data taper $h_n : (0,1]\to[0,\infty)$ fulfills $\sup_n V(h_n) \le C$ and $\sup_{u,n} h_n(u) \le C$ for some $C < \infty$. There exists a function $h$ with $\|h_n - h\|_1 = O(n^{-1+\kappa})$ with some $\kappa\in[0,1/2)$.

Note that $\kappa = 0$ if $h_n \equiv h$ (in particular in the nontapered case). We need the following notation. With

$$\hat\phi(u,j) := \int_{-\pi}^{\pi} \phi(u,\lambda)\,\exp(i\lambda j)\,d\lambda \qquad (45)$$

we define

$$\rho_\infty(\phi) := \sum_{j=-\infty}^{\infty} \sup_u |\hat\phi(u,j)|, \qquad \tilde v(\phi) := \sup_j V\big(\hat\phi(\cdot,j)\big), \qquad (46)$$

$$v_\Sigma(\phi) := \sum_{j=-\infty}^{\infty} V\big(\hat\phi(\cdot,j)\big). \qquad (47)$$

We mention that

$$\sup_{u,\lambda}|\phi(u,\lambda)| \le \frac{1}{2\pi}\,\rho_\infty(\phi), \qquad \tilde v(\phi) \le v_\Sigma(\phi), \qquad \tilde v(\phi) \le \int V\big(\phi(\cdot,\lambda)\big)\,d\lambda, \qquad \rho_2(\phi) \le \frac{1}{\sqrt{2\pi}}\,\rho_\infty(\phi).$$

The idea now is to prove the CLT in Theorem 2.4 by the convergence of all cumulants. The convergence of the cumulants is derived below under the assumptions $\rho_\infty(\phi) < \infty$ and $v_\Sigma(\phi) < \infty$.
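Before turning to the cumulant calculations, the defining formulas (42)-(43) can be made concrete. The following sketch (function names, the floor convention for $[\cdot]$, and the small test series are our own illustrative choices, not from the paper) computes the nontapered pre-periodogram and checks the elementary identity $\int_{-\pi}^{\pi} J_n(t/n,\lambda)\,d\lambda = X_{t,n}^2$, which holds because only the $k=0$ term of (43) survives integration over a full period:

```python
import numpy as np

def pre_periodogram(X, t, lams):
    """Nontapered pre-periodogram J_n(t/n, lam) of (43); t is 1-based.
    The k and -k terms pair into cosines, so the result is real-valued."""
    n = len(X)
    J = np.zeros_like(lams)
    for k in range(-2 * n, 2 * n + 1):
        a = int(np.floor(t + 0.5 - k / 2))   # index [t + 1/2 - k/2]
        b = int(np.floor(t + 0.5 + k / 2))   # index [t + 1/2 + k/2]
        if 1 <= a <= n and 1 <= b <= n:
            J += np.cos(lams * k) * X[a - 1] * X[b - 1]
    return J / (2 * np.pi)

rng = np.random.default_rng(0)
X = rng.standard_normal(16)
lams = np.linspace(-np.pi, np.pi, 4097)
t = 7
J = pre_periodogram(X, t, lams)
integral = np.trapz(J, lams)   # should equal X_t^2, the k = 0 term of (43)
```

The tapered version simply replaces $X$ by $h_n(t/n)X_{t,n}$ and rescales by $h_n(t/n)^{-2}$ as in (43).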
Unfortunately these assumptions are not fulfilled for functions of bounded variation as assumed in Theorem 2.4. Therefore its proof also uses certain approximation arguments; cf. Section 5.8. The following CLT is a by-product which follows immediately from the cumulant calculations below. It is of independent interest since the result does not follow from Theorem 2.4 (the condition $\rho_\infty(\phi)<\infty$ does not imply bounded variation in $\lambda$-direction; furthermore the conditions may be easier to check in some situations).

Assumption 5.2 Suppose $\phi:[0,1]\times[-\pi,\pi]\to\mathbb R$ fulfills $\rho_\infty(\phi)<\infty$ and $\tilde v(\phi)<\infty$. If the process $X_{t,n}$ is non-Gaussian we assume in addition that $v_\Sigma(\phi)<\infty$.

Theorem 5.3 Let $X_{t,n}$ be a locally stationary process. Suppose that Assumptions 5.1 and 5.2 hold for $\phi_1,\dots,\phi_k$. Then

$$\big(E_n(\phi_j)\big)_{j=1,\dots,k} \overset{\mathcal D}{\to} \big(E(\phi_j)\big)_{j=1,\dots,k}$$

where $\big(E(\phi_j)\big)_{j=1,\dots,k}$ is a Gaussian random vector with mean $0$ and $\mathrm{cov}\big(E(\phi_j),E(\phi_k)\big) = c_E^h(\phi_j,\phi_k)$ with

$$c_E^h(\phi_j,\phi_k) = 2\pi \int_0^1 \frac{h^4(u)}{\|h\|_2^4} \int_{-\pi}^{\pi} \phi_j(u,\lambda)\,\big[\phi_k(u,\lambda)+\phi_k(u,-\lambda)\big]\,f^2(u,\lambda)\,d\lambda\,du$$
$$\qquad + \kappa_4 \int_0^1 \frac{h^4(u)}{\|h\|_2^4} \Big(\int_{-\pi}^{\pi} \phi_j(u,\lambda_1)\,f(u,\lambda_1)\,d\lambda_1\Big) \Big(\int_{-\pi}^{\pi} \phi_k(u,\lambda_2)\,f(u,\lambda_2)\,d\lambda_2\Big)\,du.$$

Proof. The result follows from the convergence of all cumulants, which is proved in Lemma 5.5(ii), Lemma 5.6(ii) and Lemma 5.7(iii).

For the following proofs we first need a result on the behaviour (decay) of the covariances of the process. The case $h_n(u) = I_{(0,1]}(u)$ gives the results for the ordinary covariances.

Proposition 5.4 Let $X_{t,n}$ be a locally stationary process. Suppose that Assumption 5.1 holds. Then we have for all $k$, with some $K$ independent of $k$ and $n$,

$$\sup_t \big|\mathrm{cov}\big(X^{(h_n)}_{t,n},X^{(h_n)}_{t+k,n}\big)\big| \le \frac{K}{\ell(k)}, \qquad (48)$$

$$\sup_u |c(u,k)| \le \frac{K}{\ell(k)}, \qquad (49)$$

$$\sum_{t=1}^n \Big|\mathrm{cov}\big(X^{(h_n)}_{t+k_1,n},X^{(h_n)}_{t-k_2,n}\big) - h_n\Big(\frac{t}{n}\Big)^2 c\Big(\frac{t}{n},k_1+k_2\Big)\Big| \le K\,\frac{1+\min\{|k_1|,n\}}{\ell(k_1+k_2)}, \qquad (50)$$

$$V\big(c(\cdot,k)\big) \le \frac{K}{\ell(k)}. \qquad (51)$$

Proof.
(49) follows from (7), (4) and the relation

$$\sum_{j=-\infty}^{\infty} \frac{1}{\ell(k+j)}\,\frac{1}{\ell(j)} \le \frac{K}{\ell(k)} \qquad (52)$$

which is easily established. Furthermore, we have with $k = k_1+k_2$

$$\mathrm{cov}\big(X^{(h_n)}_{t+k_1,n},X^{(h_n)}_{t-k_2,n}\big) = h_n\Big(\frac{t+k_1}{n}\Big)\,h_n\Big(\frac{t-k_2}{n}\Big) \sum_{j=-\infty}^{\infty} a_{t+k_1,n}(j+k)\,a_{t-k_2,n}(j).$$

For $k_1=k$ and $k_2=0$ this gives (48) by using (3). Replacing $t+k_1$ and $t-k_2$ successively by $t$ gives (50). (51) is obtained similarly.

A trick which greatly simplifies the following proofs is to set $a_{t,n}(j)=0$ for $t\notin\{1,\dots,n\}$ and $j\in\mathbb Z$, $a(u,j)=0$ for $u\notin(0,1]$ and $j\in\mathbb Z$, $\phi(u,\lambda)=0$ for $u\notin(0,1]$ and $\lambda\in[-\pi,\pi]$, and $h_n(u)=0$ for $u\notin(0,1]$. With this convention (3)-(6) and (48)-(51) remain valid for $u\in\mathbb R$ and $t\in\mathbb Z$, with $V(f)$ now denoting the total variation over $\mathbb R$, and with $t$ in (5) and (50) ranging from $-\infty$ to $\infty$. Furthermore the summation range of $k$ in (12) and of $t$ in (11) can be extended from $-\infty$ to $\infty$. Therefore, all summation ranges are from $-\infty$ to $\infty$ in the following proofs unless otherwise indicated. We also set $\tilde a(j) = \sup_u |a(u,j)|$, $\tilde\phi(j) = \max\{\sup_u|\hat\phi(u,j)|,\,\sup_u|\hat\phi(u,-j)|\}$ and $\tilde c(j) = \sup_u|c(u,j)|$.

Lemma 5.5 (i) Let $X_{t,n}$ be a locally stationary process. Suppose Assumption 5.1 holds and $\phi:[0,1]\times[-\pi,\pi]\to\mathbb R$ is a function possibly depending on $n$. Then we have with $K>0$

$$|E F_n(\phi) - F(\phi)| \le K n^{-1+\kappa}\big(\rho_\infty(\phi)+\tilde v(\phi)\big).$$

(ii) If in addition $\phi$ is independent of $n$ and fulfills Assumption 5.2 then $E\,E_n(\phi) = O(n^{-1/2+\kappa})$.

Proof. (i) We have

$$F_n(\phi) = \frac{1}{2\pi H_{2,n}} \sum_{t=1}^n \sum_k \hat\phi\Big(\frac{t}{n},-k\Big)\,X^{(h_n)}_{[t+1/2+k/2],n}\,X^{(h_n)}_{[t+1/2-k/2],n}. \qquad (53)$$

Since $H_{2,n} = n\|h\|_2^2 + O(n^\kappa)$, we therefore obtain from Proposition 5.4

$$E F_n(\phi) = \frac{1}{2\pi H_{2,n}} \sum_{t,\,|k|\le n} \hat\phi\Big(\frac{t}{n},-k\Big)\,\mathrm{cov}\big(X^{(h_n)}_{[t+1/2+k/2],n},X^{(h_n)}_{[t+1/2-k/2],n}\big) \qquad (54)$$
$$= \frac{1}{2\pi H_{2,n}} \sum_{t,k} h_n^2\Big(\frac{t}{n}\Big)\,\hat\phi\Big(\frac{t}{n},-k\Big)\,c\Big(\frac{t}{n},k\Big) + R$$

with

$$|R| \le \frac{K}{n}\sum_{|k|\le n} \tilde\phi(k)\,\frac{1+\min(|k|,n)}{\ell(k)} + K \sum_{|k|>n} \frac{\tilde\phi(k)}{\ell(k)} \le \frac{K}{n}\,\rho_\infty(\phi). \qquad (55)$$
Furthermore,

$$\Big| \frac{1}{2\pi H_{2,n}} \sum_{t,k} h_n^2\Big(\frac{t}{n}\Big)\,\hat\phi\Big(\frac{t}{n},-k\Big)\, c\Big(\frac{t}{n},k\Big) - F(\phi) \Big| \le \frac{1}{2\pi} \sum_k \sum_{t} \int_{(t-1)/n}^{t/n} \Big| \frac{h_n^2(\frac{t}{n})}{H_{2,n}/n}\,\hat\phi\Big(\frac{t}{n},-k\Big)\,c\Big(\frac{t}{n},k\Big) - \frac{h^2(v)}{\|h\|_2^2}\,\hat\phi(v,-k)\,c(v,k) \Big|\,dv \qquad (56)$$
$$\le \frac{K}{2\pi n} \sum_k \Big[ V\big(\hat\phi(\cdot,-k)\big)\,\tilde c(k) + \tilde\phi(k)\,V\big(c(\cdot,k)\big) + n^\kappa\,\tilde\phi(k)\,\tilde c(k) \Big]$$

leading to the result. (ii) follows immediately.

Lemma 5.6 (i) Let $X_{t,n}$ be a locally stationary process. Suppose Assumption 5.1 holds and $\phi_1,\phi_2:[0,1]\times[-\pi,\pi]\to\mathbb R$ are functions possibly depending on $n$. Then we have

$$\mathrm{cov}\big(E_n(\phi_1),E_n(\phi_2)\big) = c_E^h(\phi_1,\phi_2) + R_n$$

with

$$|R_n| \le \frac{K}{n}\sum_{k_1,k_2} \tilde\phi_1(k_1)\,\tilde\phi_2(k_2)\Big[1+\frac{\min\{|k_1|,n\}}{\ell(k_1+k_2)}\Big] + K n^{-1+\kappa}\sum_{k_1,k_2}\tilde\phi_1(k_1)\,\tilde\phi_2(k_2)$$
$$+ \frac{K}{n}\sum_{k_1,k_2,k_3}\big[\tilde\phi_1(k_1)\,V(\hat\phi_2(\cdot,k_2)) + V(\hat\phi_1(\cdot,k_1))\,\tilde\phi_2(k_2)\big]\,\frac{\min\{|k_1|+|k_2|+|k_3|,\,n\}}{\ell(k_3)\,\ell(k_1+k_2+k_3)}$$
$$+ \frac{K}{n}\sum_{k_1,k_2}\big[\tilde\phi_1(k_1)\,V(\hat\phi_2(\cdot,k_2)) + V(\hat\phi_1(\cdot,k_1))\,\tilde\phi_2(k_2)\big]\Big[\frac{1}{\ell(k_1)}+\frac{1}{\ell(k_2)}\Big],$$

where the last term can be omitted if $X_{t,n}$ is Gaussian.

(ii) If $\phi_1$ and $\phi_2$ are independent of $n$ and fulfill Assumption 5.2 then $R_n = o(1)$.

Proof. (i) We have with (53)

$$\mathrm{cov}\big(E_n(\phi_1),E_n(\phi_2)\big) = n\,\mathrm{cov}\big(F_n(\phi_1),F_n(\phi_2)\big) = \frac{n}{(2\pi)^2 H_{2,n}^2} \sum_{t_1,t_2,k_1,k_2} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_2}{n},-k_2\Big)\ \times \qquad (57)$$
$$\times\Big[ \mathrm{cov}\big(X^{(h_n)}_{[t_1+1/2+k_1/2],n},X^{(h_n)}_{[t_2+1/2+k_2/2],n}\big)\,\mathrm{cov}\big(X^{(h_n)}_{[t_1+1/2-k_1/2],n},X^{(h_n)}_{[t_2+1/2-k_2/2],n}\big)$$
$$+ \mathrm{cov}\big(X^{(h_n)}_{[t_1+1/2+k_1/2],n},X^{(h_n)}_{[t_2+1/2-k_2/2],n}\big)\,\mathrm{cov}\big(X^{(h_n)}_{[t_1+1/2-k_1/2],n},X^{(h_n)}_{[t_2+1/2+k_2/2],n}\big)$$
$$+ \mathrm{cum}\big(X^{(h_n)}_{[t_1+1/2+k_1/2],n},X^{(h_n)}_{[t_1+1/2-k_1/2],n},X^{(h_n)}_{[t_2+1/2+k_2/2],n},X^{(h_n)}_{[t_2+1/2-k_2/2],n}\big)\Big].$$

Let $k_3 := t_1-t_2+[k_1/2+1/2]-[k_2/2+1/2]$. By using Proposition 5.4 we replace the first summand in $[\dots]$ by $h_n(\frac{t_1}{n})^4\,c(\frac{t_1}{n},k_3)\,c(\frac{t_1}{n},k_3+k_2-k_1)$. The remainder can be bounded by

$$\frac{K}{n}\sum_{k_1,k_2,k_3} \tilde\phi_1(k_1)\,\tilde\phi_2(k_2)\,\Big\{ \frac{1}{\ell(k_3)}\Big[1+\frac{\min\{|k_1|,n\}}{\ell(k_3+k_2-k_1)}\Big] + \frac{1}{\ell(k_3+k_2-k_1)}\Big[1+\frac{\min\{|k_1|,n\}}{\ell(k_3)}\Big] \Big\}.$$

(52) implies that this is bounded as asserted.
Therefore the first term is equal to

$$\frac{n}{(2\pi)^2 H_{2,n}^2} \sum_{t_1,k_1,k_2,k_3} h_n\Big(\frac{t_1}{n}\Big)^4\,\hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_1+k^o}{n},-k_2\Big)\,c\Big(\frac{t_1}{n},k_3\Big)\,c\Big(\frac{t_1}{n},k_3+k_2-k_1\Big) + R_n \qquad (58)$$

where $k^o = -k_3+[k_1/2+1/2]-[k_2/2+1/2]$. Replacing $\hat\phi_2\big(\frac{t_1+k^o}{n},-k_2\big)$ by $\hat\phi_2\big(\frac{t_1}{n},-k_2\big)$ yields the error term

$$\frac{K}{n}\sum_{k_1,k_2,k_3} \tilde\phi_1(k_1)\,\tilde c(k_3)\,\tilde c(k_3+k_2-k_1)\,\sum_t \Big|\hat\phi_2\Big(\frac{t+k^o}{n},-k_2\Big)-\hat\phi_2\Big(\frac{t}{n},-k_2\Big)\Big|$$

which also is bounded as asserted. As in (56) we now replace the $\frac1n\sum_{t_1}$-sum in (58) (with $k^o=0$) by the integral over $[0,1]$, with the same replacement error. Furthermore, we replace $h_n$ by $h$. Direct calculation (or repeated application of Parseval's equality) yields

$$\frac{1}{(2\pi)^2} \int_0^1 \frac{h^4(u)}{\|h\|_2^4} \sum_{k_1,k_2,k_3} \hat\phi_1(u,-k_1)\,\hat\phi_2(u,-k_2)\,c(u,k_3)\,c(u,k_3+k_2-k_1)\,du = 2\pi \int_0^1 \frac{h^4(u)}{\|h\|_2^4} \int_{-\pi}^{\pi} \phi_1(u,\lambda)\,\phi_2(u,-\lambda)\,f(u,\lambda)^2\,d\lambda\,du.$$

The second term in (57) is treated in the same way. The third term is, with the representation $X_{t,n} = \sum_{j=-\infty}^{\infty} a_{t,n}(t-j)\,\varepsilon_j$ and the abbreviations $t_\nu^+ = t_\nu^+(t_\nu,k_\nu) = [t_\nu+1/2+k_\nu/2]$, $t_\nu^- = t_\nu^-(t_\nu,k_\nu) = [t_\nu+1/2-k_\nu/2]$, equal to

$$\frac{n\,\kappa_4}{(2\pi)^2 H_{2,n}^2} \sum_{t_1,t_2,k_1,k_2} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_2}{n},-k_2\Big) \times \sum_i h_n\Big(\frac{t_1^+}{n}\Big) h_n\Big(\frac{t_1^-}{n}\Big) h_n\Big(\frac{t_2^+}{n}\Big) h_n\Big(\frac{t_2^-}{n}\Big)\,a_{t_1^+,n}(t_1^+-i)\,a_{t_1^-,n}(t_1^--i)\,a_{t_2^+,n}(t_2^+-i)\,a_{t_2^-,n}(t_2^--i).$$

By using Definition 2.1 and (52) we now replace this by

$$\frac{n\,\kappa_4}{(2\pi)^2 H_{2,n}^2} \sum_{t_1,t_2,k_1,k_2} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_2}{n},-k_2\Big) \times \sum_i h_n\Big(\frac{t_1}{n}\Big)^2 h_n\Big(\frac{t_2}{n}\Big)^2\,a\Big(\frac{t_1}{n},t_1^+-i\Big)\,a\Big(\frac{t_1}{n},t_1^--i\Big)\,a\Big(\frac{t_2}{n},t_2^+-i\Big)\,a\Big(\frac{t_2}{n},t_2^--i\Big) \qquad (59)$$

with replacement error $K n^{-1}\sum_{k_1,k_2}\tilde\phi_1(k_1)\,\tilde\phi_2(k_2)$. We now replace the term $a\big(\frac{t_2}{n},t_2^--i\big)$ in the above expression by $a\big(\frac{t_1}{n},t_2^--i\big)$, leading with the substitutions $d = t_2-t_1$ and $j = i-t_1$ to a replacement error of

$$\frac{K}{n}\sum_{k_1,k_2} \tilde\phi_1(k_1)\,\tilde\phi_2(k_2) \sum_{d,j} \frac{1}{\ell\big([\frac12+\frac{k_1}2]-j\big)\,\ell\big([\frac12-\frac{k_1}2]-j\big)\,\ell\big([d+\frac12+\frac{k_2}2]-j\big)}\ \sum_{t_1} \Big| a\Big(\frac{t_1+d}{n},\Big[d+\frac12-\frac{k_2}2\Big]-j\Big) - a\Big(\frac{t_1}{n},\Big[d+\frac12-\frac{k_2}2\Big]-j\Big) \Big|.$$
The last sum is bounded by $|d|\,V\big(a(\cdot,[d+\frac12-\frac{k_2}2]-j)\big)$, leading to the upper bound

$$\frac{K}{n}\sum_{k_1,k_2}\tilde\phi_1(k_1)\,\tilde\phi_2(k_2) \sum_{d,j} \frac{\big|[d+\frac12+\frac{k_2}2]-j\big| + \big|[d+\frac12-\frac{k_2}2]-j\big| + \big|[\frac12+\frac{k_1}2]-j\big| + \big|[\frac12-\frac{k_1}2]-j\big|}{\ell\big([\frac12+\frac{k_1}2]-j\big)\,\ell\big([\frac12-\frac{k_1}2]-j\big)\,\ell\big([d+\frac12+\frac{k_2}2]-j\big)\,\ell\big([d+\frac12-\frac{k_2}2]-j\big)} \le \frac{K}{n}\sum_{k_1,k_2}\tilde\phi_1(k_1)\,\tilde\phi_2(k_2)$$

for the replacement error. In the same way we replace (59) by

$$\frac{n\,\kappa_4}{(2\pi)^2 H_{2,n}^2}\sum_{t_1,t_2,k_1,k_2} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_1}{n},-k_2\Big) \times h_n\Big(\frac{t_1}{n}\Big)^4 \sum_i a\Big(\frac{t_1}{n},t_1^+-i\Big)\,a\Big(\frac{t_1}{n},t_1^--i\Big)\,a\Big(\frac{t_1}{n},t_2^+-i\Big)\,a\Big(\frac{t_1}{n},t_2^--i\Big) \qquad (60)$$
$$= \frac{n\,\kappa_4}{(2\pi)^2 H_{2,n}^2}\sum_{t_1} h_n\Big(\frac{t_1}{n}\Big)^4 \sum_{k_1,k_2} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\,\hat\phi_2\Big(\frac{t_1}{n},-k_2\Big)\,c\Big(\frac{t_1}{n},k_1\Big)\,c\Big(\frac{t_1}{n},k_2\Big) \qquad (61)$$

with replacement error $K n^{-1}\sum_{k_1,k_2}\tilde\phi_1(k_1)\big[\tilde\phi_2(k_2)+V(\hat\phi_2(\cdot,k_2))\big]\big[\frac{1}{\ell(k_1)}+\frac{1}{\ell(k_2)}\big]$. As in (56) we now replace the $\frac1n\sum_{t_1}$-sum by the integral over $[0,1]$ and $h_n$ by $h$. Application of Parseval's equality gives the final form of the fourth-order cumulant term.

(ii) Considering the cases $|k|\le\sqrt n$ and $|k|>\sqrt n$ separately shows that

$$\frac1n\sum_k \min\{|k|,n\}\,\tilde\phi_i(k) = o(1) \qquad\text{and}\qquad \frac1n\sum_k \frac{\min\{|k|,n\}}{\ell(k)} = o(1). \qquad (62)$$

This implies that the first term of $R_n$ tends to zero. Since

$$\min\{|k_1|+|k_2|+|k_3|,\,n\} \le 2\min\{|k_1+k_2+k_3|,n\} + 2\min\{|k_1|,n\} + 2\min\{|k_3|,n\}$$

and $V(\hat\phi_i(\cdot,k)) \le K$, also the third term of $R_n$ tends to zero.

Lemma 5.7 Let $X_{t,n}$ be a locally stationary process. Suppose Assumption 5.1 holds and $\phi_1,\dots,\phi_\ell:[0,1]\times[-\pi,\pi]\to\mathbb R$ are functions possibly depending on $n$.

(i) If $\ell\ge2$ then

$$\big|\mathrm{cum}\big(E_n(\phi_1),\dots,E_n(\phi_\ell)\big)\big| \le K\,n^{1-\ell/2}\,\rho_{2,n}(\phi_1)\,\rho_{2,n}(\phi_2) \prod_{j=3}^{\ell}\rho_\infty(\phi_j)$$

with a constant $K$ independent of $n$.

(ii) If $\ell\ge2$ then

$$\big|\mathrm{cum}\big(E_n(\phi_1),\dots,E_n(\phi_\ell)\big)\big| \le K^\ell\,(2\ell)! \prod_{j=1}^{\ell}\rho_{2,n}(\phi_j) \qquad (63)$$

with a constant $K$ independent of $n$ and $\ell$.

(iii) If $\ell\ge3$ and $\phi_1,\dots,\phi_\ell$ are independent of $n$ and fulfill Assumption 5.2 then $\mathrm{cum}\big(E_n(\phi_1),\dots,E_n(\phi_\ell)\big) = o(1)$.

Proof. (i) We have with (53)
$$\mathrm{cum}\big(E_n(\phi_1),\dots,E_n(\phi_\ell)\big) = n^{\ell/2}\,\mathrm{cum}\big(F_n(\phi_1),\dots,F_n(\phi_\ell)\big)$$
$$= \frac{n^{\ell/2}}{(2\pi)^\ell H_{2,n}^\ell} \sum_{t_1,\dots,t_\ell}\sum_{k_1,\dots,k_\ell} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\cdots\hat\phi_\ell\Big(\frac{t_\ell}{n},-k_\ell\Big) \times \mathrm{cum}\Big( X^{(h_n)}_{[t_1+1/2+k_1/2],n} X^{(h_n)}_{[t_1+1/2-k_1/2],n},\ \dots,\ X^{(h_n)}_{[t_\ell+1/2+k_\ell/2],n} X^{(h_n)}_{[t_\ell+1/2-k_\ell/2],n} \Big).$$

We now use the representation $X_{t,n} = \sum_{j=-\infty}^{\infty} a_{t,n}(t-j)\,\varepsilon_j$ and obtain with the product theorem for cumulants (cf. Brillinger, 1981, Theorem 2.3.2) and the abbreviations $t_\nu^+ = t_\nu^+(t_\nu,k_\nu) = [t_\nu+1/2+k_\nu/2]$, $t_\nu^- = t_\nu^-(t_\nu,k_\nu) = [t_\nu+1/2-k_\nu/2]$ that this is equal to

$$\frac{n^{\ell/2}}{(2\pi)^\ell H_{2,n}^\ell} \sum_{t_1,\dots,t_\ell}\sum_{k_1,\dots,k_\ell} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\cdots\hat\phi_\ell\Big(\frac{t_\ell}{n},-k_\ell\Big) \times \sum_{i_1,\dots,i_\ell,\,j_1,\dots,j_\ell} \Big[\prod_{\nu=1}^{\ell} h_n\Big(\frac{t_\nu^+}{n}\Big)\,h_n\Big(\frac{t_\nu^-}{n}\Big)\,a_{t_\nu^+,n}(t_\nu^+-i_\nu)\,a_{t_\nu^-,n}(t_\nu^--j_\nu)\Big] \times \sum_{\{P_1,\dots,P_m\}\ \mathrm{i.p.}} \prod_{j=1}^{m} \mathrm{cum}\big(\varepsilon_s \mid s\in P_j\big)$$

where the last sum is over all indecomposable partitions (i.p.) $\{P_1,\dots,P_m\}$ of the table

$$\begin{matrix} i_1 & j_1 \\ \vdots & \vdots \\ i_\ell & j_\ell \end{matrix} \qquad (64)$$

with $|P_\nu|\ge2$ (since $E\,X_t = 0$). Using the upper bound $\sup_t |a_{t,n}(j)| \le \frac{K}{\ell(j)}$ gives

$$\big|\mathrm{cum}\big(E_n(\phi_1),\dots,E_n(\phi_\ell)\big)\big| \le K n^{-\ell/2} \sum_{t_1,\dots,t_\ell}\sum_{k_1,\dots,k_\ell} \Big|\hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)\Big|\cdots\Big|\hat\phi_\ell\Big(\frac{t_\ell}{n},-k_\ell\Big)\Big|\ \times \qquad (65)$$
$$\times \sum_{i_1,\dots,i_\ell,\,j_1,\dots,j_\ell} \Big[\prod_{\nu=1}^{\ell}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--j_\nu)}\Big] \sum_{\{P_1,\dots,P_m\}\ \mathrm{i.p.}} \prod_{j=1}^{m}\big|\mathrm{cum}\big(\varepsilon_s\mid s\in P_j\big)\big|.$$

The Cauchy-Schwarz inequality (with squares of $\phi_1$ and $\phi_2$) now leads to the upper bound

$$K n^{-\ell/2}\Bigg\{ \sum_{t_1,\dots,t_\ell}\sum_{k_1,\dots,k_\ell} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)^2\,\tilde\phi_3(k_3)\cdots\tilde\phi_\ell(k_\ell) \sum_{i_1,\dots,i_\ell,\,j_1,\dots,j_\ell}\Big[\prod_{\nu=1}^{\ell}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--j_\nu)}\Big] \sum_{\mathrm{i.p.}}\prod_{j=1}^{m}\big|\mathrm{cum}(\varepsilon_s\mid s\in P_j)\big| \Bigg\}^{1/2} \big\{ \text{similar term} \big\}^{1/2} \qquad (66)$$

which by using (52) is bounded by

$$K n^{-\ell/2}\Bigg\{ \sum_{t_1}\sum_{k_1,\dots,k_\ell} \hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)^2\,\tilde\phi_3(k_3)\cdots\tilde\phi_\ell(k_\ell) \sum_{i_1,\dots,i_\ell,\,j_1,\dots,j_\ell} \frac{1}{\ell(t_1^+-i_1)}\,\frac{1}{\ell(t_1^--j_1)}\prod_{\nu=2}^{\ell}\frac{1}{\ell(k_\nu-i_\nu+j_\nu)} \sum_{\mathrm{i.p.}}\prod_{j=1}^{m}\big|\mathrm{cum}(\varepsilon_s\mid s\in P_j)\big| \Bigg\}^{1/2}\big\{\text{similar term}\big\}^{1/2}. \qquad (67)$$

Note that the term $\mathrm{cum}(\varepsilon_s\mid s\in P_j)$ leads to the restriction that all $i_\nu, j_\nu \in P_j$ are equal. We now sum over the remaining indices from $k_2$,
$i_1,\dots,i_\ell,\,j_1,\dots,j_\ell$, leading, due to the indecomposability of the partition and the fact that $1/\ell(j)\le K$, to the upper bound

$$K n^{-\ell/2}\Big\{\sum_{t_1}\sum_{k_1,k_3,\dots,k_\ell}\hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)^2\,\tilde\phi_3(k_3)\cdots\tilde\phi_\ell(k_\ell)\Big\}^{1/2}\big\{\text{similar term}\big\}^{1/2} \le K n^{1-\ell/2}\,\rho_{2,n}(\phi_1)\,\rho_{2,n}(\phi_2)\prod_{j=3}^{\ell}\rho_\infty(\phi_j)$$

and therefore to the result.

(ii) Here the generic constant $K$ needs to be independent of $\ell$. Again we have (65) (with $K$ replaced by $K^\ell$). Remember that the term $\mathrm{cum}(\varepsilon_s\mid s\in P_j)$ leads to the restriction that all $i_\nu,j_\nu\in P_j$ are equal. We denote this index by $i_{(j)}$ ($j=1,\dots,m$). We start by considering the case $\ell$ even. For each fixed partition $\{P_1,\dots,P_m\}$ we can renumber the indices $\{1,\dots,\ell\}$ in such a way that for every $j\in\{1,\dots,m\}$ there exists at least one even $\nu\in\{1,\dots,\ell\}$ and one odd $\nu\in\{1,\dots,\ell\}$ such that $i_{(j)} = i_\nu$ or $i_{(j)} = j_\nu$. This can be derived from the indecomposability of the partition and $|P_k|\ge2$ for all $k$. The Cauchy-Schwarz inequality now yields as an upper bound for (65)

$$K^\ell n^{-\ell/2}\sum_{\{P_1,\dots,P_m\}\ \mathrm{i.p.}} \Bigg\{ \sum_{t_1,\dots,t_\ell}\sum_{k_1,\dots,k_\ell}\,\prod_{j\ \mathrm{even}}\hat\phi_j\Big(\frac{t_j}{n},-k_j\Big)^2 \sum_{i_{(1)},\dots,i_{(m)}}\prod_{\nu=1}^{\ell}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--j_\nu)}\Bigg\}^{1/2}\Big\{\text{the same term with odd } j\Big\}^{1/2} \qquad (68)$$

where $i_\nu = i_{(j)}$ if $i_\nu\in P_j$ and $j_\nu = i_{(j)}$ if $j_\nu\in P_j$. By using relation (52) we have

$$\sum_{t_\nu,k_\nu}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--j_\nu)} \le K\sum_{k_\nu}\frac{1}{\ell(k_\nu-i_\nu+j_\nu)} \le K,$$

that is, the first bracket in (68) is bounded by

$$K^\ell\Bigg\{\sum_{t_\nu,k_\nu;\ \nu\ \mathrm{even}}\ \prod_{\nu\ \mathrm{even}}\hat\phi_\nu\Big(\frac{t_\nu}{n},-k_\nu\Big)^2 \sum_{i_{(1)},\dots,i_{(m)}}\prod_{\nu\ \mathrm{even}}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--j_\nu)}\Bigg\}^{1/2} \le K^\ell n^{\ell/4}\prod_{\nu\ \mathrm{even}}\rho_{2,n}(\phi_\nu).$$

The same applies to the second bracket in (68). Since the number of indecomposable partitions is bounded by $4^\ell(2\ell)!$ we obtain (63).

The case $\ell$ odd is a bit more involved. For each partition $\{P_1,\dots,P_m\}$ with $m<\ell$ the result follows as in the case $\ell$ even. For $m=\ell$ we can renumber the indices
$\{1,\dots,\ell\}$ such that each $P_\nu$ contains exactly one element of $\{i_\nu,j_\nu\}$ and one of $\{i_{\nu+1},j_{\nu+1}\}$ (where $i_{\ell+1}=i_1$, $j_{\ell+1}=j_1$). For simplicity we treat the case where $P_\nu = \{i_\nu,j_{\nu+1}\}$ ($\nu=1,\dots,\ell$); the other cases follow analogously. We obtain with the Cauchy-Schwarz inequality as an upper bound for (65)

$$K^\ell n^{-\ell/2}\sum_{t_\ell,k_\ell}\Big|\hat\phi_\ell\Big(\frac{t_\ell}{n},-k_\ell\Big)\Big| \times \Bigg\{\sum_{t_1,\dots,t_{\ell-1}}\sum_{k_1,\dots,k_{\ell-1}}\,\prod_{j\ \mathrm{even}}\hat\phi_j\Big(\frac{t_j}{n},-k_j\Big)^2\sum_{i_1,\dots,i_\ell}\prod_{\nu=1}^{\ell}\frac{1}{\ell(t_\nu^+-i_\nu)}\,\frac{1}{\ell(t_\nu^--i_{\nu-1})}\Bigg\}^{1/2} \times \Big\{\text{the same term with } j=1,3,\dots,\ell-2\Big\}^{1/2}$$

where $i_0 = i_\ell$. By using relation (52) this is bounded by

$$K^\ell\Big\{\prod_{j=2}^{\ell-2}\rho_{2,n}(\phi_j)\Big\}\,n^{-3/2}\sum_{t_\ell,k_\ell}\Big|\hat\phi_\ell\Big(\frac{t_\ell}{n},-k_\ell\Big)\Big| \times \Bigg\{\sum_{t_{\ell-1},k_{\ell-1}}\hat\phi_{\ell-1}\Big(\frac{t_{\ell-1}}{n},-k_{\ell-1}\Big)^2\,\frac{1}{\ell(t_\ell^--t_{\ell-1}^+)}\Bigg\}^{1/2}\Bigg\{\sum_{t_1,k_1}\hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)^2\,\frac{1}{\ell(t_\ell^+-t_1^-)}\Bigg\}^{1/2}.$$

The Cauchy-Schwarz inequality now yields

$$K^\ell\Big\{\prod_{j=2}^{\ell-2}\rho_{2,n}(\phi_j)\Big\}\,\rho_{2,n}(\phi_\ell)\times\Bigg\{n^{-1}\sum_{t_\ell,k_\ell}\ \sum_{t_1,k_1,t_{\ell-1},k_{\ell-1}}\hat\phi_1\Big(\frac{t_1}{n},-k_1\Big)^2\,\hat\phi_{\ell-1}\Big(\frac{t_{\ell-1}}{n},-k_{\ell-1}\Big)^2\,\frac{1}{\ell(t_\ell^--t_{\ell-1}^+)}\,\frac{1}{\ell(t_\ell^+-t_1^-)}\Bigg\}^{1/2} \le K^\ell\prod_{j=1}^{\ell}\rho_{2,n}(\phi_j)$$

which finally leads to (ii).

(iii) follows from (i) since $\rho_{2,n}(\phi)^2 \le \rho_2(\phi)^2 + \frac1n\,\rho_\infty(\phi)\,\tilde v(\phi)$.

5.8 (Proof of Theorem 2.4) (The proof in the tapered case is exactly the same; compare Remark 2.5.) We have for each $\phi_j$ (denoted by $\phi$ for simplicity) and $k\ne0$

$$\hat\phi(u,k) = \int_0^{2\pi}\frac{\exp(-ik\lambda)-1}{ik}\,\phi_R(u,d\lambda)$$

where $\phi_R(u,d\lambda)$ is the signed measure corresponding to $\phi_R(u,\lambda) := \lim_{\mu\downarrow\lambda}\phi(u,\mu)$ (since $\phi$ is of bounded variation in $\lambda$ the limit exists; for the same reason $\phi_R(u,d\lambda)$ is a signed measure). This implies for $k\ne0$

$$\tilde\phi(k) \le \sup_u|\hat\phi(u,k)| \le \frac{K}{|k|}\,\sup_u V\big(\phi(u,\cdot)\big) \qquad\text{and}\qquad V\big(\hat\phi(\cdot,k)\big) \le \frac{K}{|k|}\,V^2(\phi) \qquad (69)$$

and for $k=0$

$$\tilde\phi(0) = \sup_u|\hat\phi(u,0)| \le 2\pi\,\|\phi\|_{\infty,\infty} \qquad\text{and}\qquad V\big(\hat\phi(\cdot,0)\big) \le 2\pi\,\sup_\lambda V\big(\phi(\cdot,\lambda)\big). \qquad (70)$$

Thus $\rho_\infty(\phi)$ is not necessarily bounded and Theorem 5.3 cannot be applied.
The trick now is to smooth $\phi(u,\lambda)$ in $\lambda$-direction and to prove asymptotic normality instead for the resulting sequence of approximations: Let $k(x) := \frac{1}{\sqrt{2\pi}}\exp\{-\frac12x^2\}$ be the Gaussian kernel, $k_b(x) := \frac1b\,k\big(\frac xb\big)$ and

$$\phi_n(u,\lambda) = \int_{-\infty}^{\infty} k_b(\lambda-\mu)\,\phi(u,\mu)\,d\mu$$

with $b = b_n \to 0$ as $n\to\infty$ (where $\phi(u,\mu) = 0$ for $|\mu|>\pi$). We have

$$\hat\phi_n(u,k) = \hat\phi(u,k)\,\hat k_b(k) \qquad\text{with}\qquad \hat k_b(k) = \exp(-k^2b^2/2). \qquad (71)$$

We obtain from Lemma 5.7(i)

$$n\,\mathrm{var}\big(F_n(\phi_n)-F_n(\phi)\big) \le \frac{K}{n}\sum_{t=1}^{n}\sum_{k=-\infty}^{\infty}\Big|\hat\phi_n\Big(\frac tn,k\Big)-\hat\phi\Big(\frac tn,k\Big)\Big|^2 \le K\sum_k\frac{\big(\exp(-k^2b^2/2)-1\big)^2}{k^2}. \qquad (72)$$

Since $|1-\exp(-k^2b^2/2)| \le \min\{1,\frac{k^2b^2}{2}\}$, this is bounded by

$$K\sum_{|k|\le1/b}\frac{k^2b^4}{4} + K\sum_{|k|>1/b}\frac{1}{k^2} = O(b)$$

which implies

$$\sqrt n\,\Big[\big\{F_n(\phi_n)-EF_n(\phi_n)\big\} - \big\{F_n(\phi)-EF_n(\phi)\big\}\Big] \overset{P}{\to} 0. \qquad (73)$$

We now derive a CLT for $\sqrt n\,\big(F_n(\phi_{j,n})-EF_n(\phi_{j,n})\big)_{j=1,\dots,k}$ by applying Lemma 5.6(i) and Lemma 5.7(i). We obtain from (70) and (71)

$$\rho_\infty(\phi_n) \le 2\pi\,\|\phi\|_{\infty,\infty} + K\,\|\phi\|_{\infty,V}\sum_{k=1}^{\infty}\frac{1}{|k|}\exp(-k^2b^2/2) \le K b^{-1}$$

and $v_\Sigma(\phi_n) \le K b^{-1}$. Therefore the remainder term $R_n$ in Lemma 5.6(i) and the higher cumulants in Lemma 5.7(i) converge to zero provided we choose $b$ such that $b\,n^{(1-\kappa)/2}\to\infty$. Furthermore

$$c_E(\phi_{j,n},\phi_{k,n}) = c_E(\phi_j,\phi_k) + O(b^{1/2}) \qquad (j,k = 1,\dots,k).$$

This follows easily by application of the Cauchy-Schwarz inequality, $\sup_u\int\big(\phi_{j,n}(u,\lambda)-\phi_j(u,\lambda)\big)^2\,d\lambda = O(b)$ (obtained with the Parseval formula as in (72)), and $\int\big(\phi_{j,n}(u,\lambda)\big)^2\,d\lambda \le K$. This gives the required CLT, and with (73) also the CLT for $\sqrt n\,\big(F_n(\phi_j)-EF_n(\phi_j)\big)_{j=1,\dots,k}$. We obtain from (55) and (56)

$$\sqrt n\,\big|EF_n(\phi)-F(\phi)\big| \le K n^{-1/2}\Big[\sum_{|k|\le n}|k|\,\tilde\phi(k)\,\frac{1}{\ell(k)} + \sum_{|k|>n}\tilde\phi(k)\Big] + K n^{-1/2+\kappa}\sum_{|k|\le n}\tilde\phi(k) \le K n^{-1/2+\kappa}\log n$$

which finally proves Theorem 2.4.

5.9 (Proof of Theorem 2.6 and Remark 2.7) We obtain from Lemma 5.7(ii) for the $\ell$-th order cumulant in the case $\ell\ge2$

$$\big|\mathrm{cum}_\ell\big(E_n(\phi)\big)\big| \le K^\ell\,(2\ell)!\,\rho_{2,n}(\phi)^\ell.$$

The result now follows in the same way as in the proof of Lemma 2.3 in Dahlhaus (1988).
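The Fourier-damping identity (71) behind the smoothing step above can be verified numerically. In the sketch below all concrete choices (the indicator test function of bounded variation, $b = 0.5$, the integration range) are our own illustrative assumptions; for $\phi(\mu) = 1_{[-1,1]}(\mu)$ one has $\hat\phi(k) = 2\sin(k)/k$, and convolving with the Gaussian kernel $k_b$ gives the closed form $\Phi\big(\frac{\lambda+1}{b}\big)-\Phi\big(\frac{\lambda-1}{b}\big)$ with $\Phi$ the standard normal CDF:

```python
import numpy as np
from math import erf

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

b = 0.5
k = 3
lam = np.linspace(-10.0, 10.0, 20001)   # wide grid; Gaussian tails beyond are negligible

# phi_n = k_b * phi for phi = indicator of [-1, 1] (closed-form convolution)
phi_n = np.array([Phi((l + 1) / b) - Phi((l - 1) / b) for l in lam])

lhs = np.trapz(phi_n * np.cos(k * lam), lam)        # Fourier coefficient of the smoothed phi_n
rhs = (2 * np.sin(k) / k) * np.exp(-k**2 * b**2 / 2)  # phi_hat(k) * exp(-k^2 b^2 / 2), cf. (71)
```

Since $\phi$ is even, its Fourier coefficient reduces to the cosine integral computed above, and `lhs` matches `rhs` up to quadrature error.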
We now prove the inequalities in Remark 2.7: We obtain from the proof of Lemma 5.5 and an application of the Cauchy-Schwarz inequality

$$\sqrt n\,\big|EF_n(\phi)-F(\phi)\big| \le \sqrt n\,\rho_{2,n}(\phi)\times\Bigg\{\frac1n\sum_{t=1}^{n}\sum_{|k|\le n}\Big[\mathrm{cov}\big(X^{(h_n)}_{[t+1/2+k/2],n},X^{(h_n)}_{[t+1/2-k/2],n}\big)-c\Big(\frac tn,k\Big)\Big]^2 + \sum_{|k|>n}c\Big(\frac tn,k\Big)^2\Bigg\}^{1/2}.$$

Application of Proposition 5.4 yields that the term in the brackets is of order $n^{-1/2}$, leading to (16). (17) and (18) follow by straightforward calculations, noting that Definition 2.1 implies $\sup_{u,\lambda}|f(u,\lambda)| < \infty$. (20) follows from an upper bound of (55) obtained by using (70).

5.10 (Proof of Theorem 2.8) The proof uses a chaining technique as in Alexander (1984). Let

$$B_n = \Big\{\max_{t=1,\dots,n}|X_{t,n}| \le 2\log n\Big\}. \qquad (74)$$

Lemma 5.14 gives $\lim_{n\to\infty}P(B_n) = 1$. We will replace $\phi$ by

$$\phi_n^*(u,\lambda) = n\int_{u-\frac1n}^{u}\phi(v,\lambda)\,dv \qquad\text{(with } \phi(v,\lambda)=0 \text{ for } v<0\text{)}. \qquad (75)$$

The reason for doing so is that otherwise we would need the exponential inequality (15) to hold with $\rho_2(\phi)$ instead of $\rho_{2,n}(\phi)$. Such an inequality does not hold. Instead we exploit the following property of $\phi_n^*$:

$$\rho_{2,n}(\phi_n^*)^2 = \frac1n\sum_{t=1}^{n}\int_{-\pi}^{\pi}\phi_n^*\Big(\frac tn,\lambda\Big)^2 d\lambda = \frac1n\sum_{t=1}^{n}\int_{-\pi}^{\pi}\Big(n\int_{\frac{t-1}{n}}^{\frac tn}\phi(u,\lambda)\,du\Big)^2 d\lambda \le \sum_{t=1}^{n}\int_{-\pi}^{\pi}\int_{\frac{t-1}{n}}^{\frac tn}\phi^2(u,\lambda)\,du\,d\lambda = \rho_2^2(\phi). \qquad (76)$$

Denote

$$\tilde E_n^*(\phi) := \tilde E_n(\phi_n^*) = \sqrt n\,(F_n - EF_n)(\phi_n^*). \qquad (77)$$

Since the assertion and the proof of Theorem 2.6 are for fixed $n$ we obtain from (15)

$$P\big(|\tilde E_n^*(\phi)| \ge \eta\big) \le c_1\exp\Big(-c_2\sqrt{\frac{\eta}{\rho_2(\phi)}}\Big). \qquad (78)$$

On $B_n$ we have, by using Lemma 5.13 and Lemma 5.14, that

$$|\tilde E_n^*(\phi)-\tilde E_n(\phi)| \le \sqrt n\,|F_n(\phi_n^*)-F_n(\phi)| + \sqrt n\,|EF_n(\phi_n^*)-EF_n(\phi)| \le 4K_1\Big(\|\phi\|_{V,V}\,\frac{(\log n)^3}{n^{1/2}} + \|\phi\|_{V,\infty}\,\frac{(\log n)^2}{n^{1/2}}\Big) + K_2\,\|\phi\|_{V,\infty}\,\frac{1}{n^{1/2}}$$

and therefore with (22)

$$\sup_{\phi\in\Phi}|\tilde E_n^*(\phi)-\tilde E_n(\phi)| \le \frac{\eta}{2}. \qquad (79)$$

Thus

$$P\Big(\sup_{\phi\in\Phi}|\tilde E_n(\phi)|>\eta,\ B_n\Big) \le P\Big(\sup_{\phi\in\Phi}|\tilde E_n^*(\phi)|>\frac{\eta}{2},\ B_n\Big).$$

Let $\alpha := \tilde H_\Phi^{-1}\big(\frac{c_2}{4}\sqrt{\frac{\eta}{\tau_2}}\big)$. We obtain for any sequence $(\delta_j)_j$ with $\alpha = \delta_0 > \delta_1 > \dots > 0$ and $\delta_{j+1}\le\delta_j/2$, with $\eta_{j+1} := \frac{9}{c_2^2}\,\delta_{j+1}\,\tilde H_\Phi(\delta_{j+1})^2$,

$$\frac{\eta}{4} \ge \frac{18}{c_2^2}\int_0^{\alpha}\tilde H_\Phi(s)^2\,ds \ge \frac{18}{c_2^2}\sum_{j=0}^{\infty}(\delta_{j+1}-\delta_{j+2})\,\tilde H_\Phi(\delta_{j+1})^2 \ge \sum_{j=0}^{\infty}\eta_{j+1}. \qquad (80)$$
For each number $\delta_j$ choose a finite subset $A_j$ corresponding to the definition of the covering numbers $N(\delta_j,\Phi,\rho_2)$. In other words, the set $A_j$ consists of the smallest possible number $N_j = N(\delta_j,\Phi,\rho_2)$ of midpoints of $\rho_2$-balls of radius $\delta_j$ such that the corresponding balls cover $\Phi$. Now telescope

$$\tilde E_n^*(\phi) = \tilde E_n^*(\phi_0) + \sum_{j=0}^{\infty}\tilde E_n^*(\phi_{j+1}-\phi_j) \qquad (81)$$

where the $\phi_j$ are the approximating functions to $\phi$ from $A_j$, i.e. $\rho_2(\phi,\phi_j) < \delta_j$. The above equality holds on $B_n$ because Lemma 5.13 implies that

$$\sup_{\phi\in\Phi}|\tilde E_n^*(\phi-\phi_j)| \le 5K_3\,n\,(\log n)^2\,\sup_{\phi\in\Phi}\rho_2(\phi-\phi_j) \le 5K_3\,n\,(\log n)^2\,\delta_j \to 0 \quad (j\to\infty) \text{ for each fixed } n.$$

Thus

$$P\Big(\sup_{\phi\in\Phi}|\tilde E_n^*(\phi)|>\frac{\eta}{2},\ B_n\Big) \le P\Big(\sup_{\phi\in\Phi}|\tilde E_n^*(\phi_0)|>\frac{\eta}{4}\Big) + \sum_{j=0}^{\infty}N_j N_{j+1}\,\sup_{\phi\in\Phi}P\big(|\tilde E_n^*(\phi_{j+1}-\phi_j)|>\eta_{j+1}\big) =: I + II.$$

Hence, using exponential inequality (78), we have by the definition of $\alpha$ that

$$I \le c_1\exp\Big\{\tilde H_\Phi(\alpha)-\frac{c_2}{2}\sqrt{\frac{\eta}{\tau_2}}\Big\} = c_1\exp\Big\{-\frac{c_2}{4}\sqrt{\frac{\eta}{\tau_2}}\Big\}.$$

In order to estimate $II$ we need a particular definition of the $\delta_j$. We set

$$\delta_{j+1} = \sup\Big\{x : x\le\delta_j/2;\ \tilde H_\Phi(x) \ge \tilde H_\Phi(\delta_j)+\frac{1}{\sqrt{j+1}}\Big\}. \qquad (82)$$

Since $\sum_{\ell=1}^{j+1}\frac{1}{\sqrt\ell} \ge \int_1^{j+2}\frac{dx}{\sqrt x} = 2\sqrt{j+2}-2 \ge 2\log(j+1)$, we obtain for term $II$:

$$II \le \sum_{j=0}^{\infty}c_1\exp\Big\{2\tilde H_\Phi(\delta_{j+1})-c_2\sqrt{\frac{\eta_{j+1}}{\delta_{j+1}}}\Big\} = \sum_{j=0}^{\infty}c_1\exp\big\{-\tilde H_\Phi(\delta_{j+1})\big\} \le \sum_{j=0}^{\infty}c_1\exp\big\{-\tilde H_\Phi(\alpha)-2\log(j+1)\big\}$$
$$= c_1\exp\Big\{-\frac{c_2}{4}\sqrt{\frac{\eta}{\tau_2}}\Big\}\sum_{j=1}^{\infty}\frac{1}{j^2} \le 2c_1\exp\Big\{-\frac{c_2}{4}\sqrt{\frac{\eta}{\tau_2}}\Big\}.$$

This implies the maximal inequality (24). To prove (25) we note that $|E_n(\phi)-\tilde E_n(\phi)| = \sqrt n\,|EF_n(\phi)-F(\phi)|$, that is, we obtain instead of (79), on $B_n$, with (83), (84), (17), (20) and (22),

$$\sup_{\phi\in\Phi}|\tilde E_n^*(\phi)-E_n(\phi)| \le 13L\,\max\{\tau_{\infty,V},\tau_{V,\infty},\tau_{V,V},\tau_{\infty,\infty}\}\,\frac{(\log n)^3}{\sqrt n} \le \frac{\eta}{2}.$$

The rest of the proof is the same, i.e. we also obtain (25).

5.11 (Proof of Theorem 2.10) To prove weak convergence of $E_n$ we have to show weak convergence of the finite-dimensional distributions and asymptotic equicontinuity in probability of $E_n$ (cf. van der Vaart and Wellner, 1996, Theorems 1.5.4 and 1.5.7).
Convergence of the finite-dimensional distributions has been shown in Theorem 2.4. Asymptotic equicontinuity means that for every $\epsilon,\eta>0$ there exists a $\tau_2>0$ such that

$$\liminf_n\ P\Big(\sup_{\rho_2(\phi,\psi)<\tau_2}|E_n(\phi-\psi)|>\eta\Big)<\epsilon.$$

In order to see this we apply Theorem 2.8. For fixed $\eta>0$ there exists a $\tau_2>0$ small enough such that (23) holds. To see this notice that $\alpha\to0$ as $\tau_2\to0$, and hence, using assumption (26), it follows that the integral on the right-hand side of (23) also tends to zero as $\tau_2\to0$. As $\eta>0$ is fixed, (22) holds for $n$ large enough. Hence we obtain with $B_n$ from (74) for $\tau_2$ small enough

$$\liminf_n\ P\Big(\sup_{\rho_2(\phi,\psi)<\tau_2}|E_n(\phi-\psi)|>\eta\Big) \le \liminf_n\ P\Big(\sup_{\rho_2(\phi,\psi)<\tau_2}|E_n(\phi-\psi)|>\eta,\ B_n\Big) + \lim_n P(B_n^c) \le 3c_1\exp\Big\{-\frac{c_2}{4}\sqrt{\frac{\eta}{\tau_2}}\Big\} < \epsilon.$$

5.12 (Proof of Theorem 2.11) Let $\delta>0$. The assumptions of Theorem 2.8 are fulfilled for $\eta = \delta\sqrt n$ and $n$ sufficiently large. For those $n$ we obtain from Theorem 2.8

$$P\Big(\sup_{\phi\in\Phi_n}|F_n(\phi)-F(\phi)|>\delta\Big) = P\Big(\sup_{\phi\in\Phi_n}|E_n(\phi)|>\delta\sqrt n\Big) \le 3c_1\exp\Bigg\{-\frac{c_2}{4}\sqrt{\frac{\delta\sqrt n}{\tau_2^{(n)}}}\Bigg\} + P(B_n^c) \to 0.$$

Lemma 5.13 (Properties of $F_n(\phi_n^*)$) Let $X_{t,n}$ be a locally stationary process and $\phi_n^*(u,\lambda) = n\int_{u-1/n}^{u}\phi(v,\lambda)\,dv$ (with $\phi(v,\lambda)=0$ for $v<0$). Then we have with $X_{(n)} := \max_{t=1,\dots,n}|X_{t,n}|$

$$|F_n(\phi)-F_n(\phi_n^*)| \le K_1\,X_{(n)}^2\Big(\frac{\log n}{n}\,\|\phi\|_{V,V} + \frac1n\,\|\phi\|_{V,\infty}\Big), \qquad (83)$$

$$|EF_n(\phi)-EF_n(\phi_n^*)| \le K_2\,\frac1n\,\|\phi\|_{V,\infty}, \qquad (84)$$

$$|F_n(\phi_n^*)-EF_n(\phi_n^*)| \le K_3\,\sqrt n\,\big(X_{(n)}^2+1\big)\,\rho_2(\phi). \qquad (85)$$

Proof. We have with (70)

$$|F_n(\phi)-F_n(\phi_n^*)| = \Big|\frac1n\sum_{t=1}^{n}\int_{-\pi}^{\pi}\Big[\phi\Big(\frac tn,\lambda\Big)-\phi_n^*\Big(\frac tn,\lambda\Big)\Big]\,J_n\Big(\frac tn,\lambda\Big)\,d\lambda\Big|$$
$$\le O\big(X_{(n)}^2\big)\,\frac1n\sum_{t=1}^{n}\sum_{k=-n}^{n}n\,\Big|\int_{\frac{t-1}{n}}^{\frac tn}\Big[\hat\phi\Big(\frac tn,-k\Big)-\hat\phi(u,-k)\Big]\,du\Big| \le O\big(X_{(n)}^2\big)\,\frac1n\sum_{k=-n}^{n}V\big(\hat\phi(\cdot,k)\big) \le K_1\,X_{(n)}^2\,\frac1n\Big(\log n\,\|\phi\|_{V,V} + \|\phi\|_{V,\infty}\Big).$$

Furthermore we obtain with Proposition 5.4 and (70)

$$|EF_n(\phi)-EF_n(\phi_n^*)| \le \frac1n\sum_{t=1}^{n}\sum_{k=-n}^{n}\frac{1}{2\pi}\,n\,\Big|\int_{\frac{t-1}{n}}^{\frac tn}\Big[\hat\phi\Big(\frac tn,-k\Big)-\hat\phi(u,-k)\Big]\,du\Big|\,\big|\mathrm{cov}\big(X_{[t+1/2+k/2],n},X_{[t+1/2-k/2],n}\big)\big| \le K_2\,\frac1n\,\sup_\lambda V\big(\phi(\cdot,\lambda)\big).$$

(85) has been proved in Lemma A.3 of Dahlhaus and Polonik (2006).
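Property (76) of the locally averaged function $\phi_n^*$ is a Jensen-type inequality (averaging over $u$-slices can only decrease the empirical norm) and is easy to confirm numerically. In the following sketch the test function $\phi(u,\lambda) = u^2(1+\cos\lambda)$, the choice $n = 8$ and all grids are our own illustrative assumptions:

```python
import numpy as np

n = 8
lams = np.linspace(-np.pi, np.pi, 801)

def phi(u, lam):
    """Illustrative test function of bounded variation in both arguments."""
    return u**2 * (1.0 + np.cos(lam))

def phi_star(t, lam, m=400):
    """phi_n^*(t/n, lam) = n * integral of phi(v, lam) over [(t-1)/n, t/n], cf. (75)."""
    v = np.linspace((t - 1) / n, t / n, m)
    return np.trapz(phi(v[:, None], lam[None, :]), v, axis=0) * n

# rho_{2,n}(phi_n^*)^2 = (1/n) sum_t int phi_n^*(t/n, lam)^2 dlam, cf. (76)
rho2n_sq = np.mean([np.trapz(phi_star(t, lams)**2, lams) for t in range(1, n + 1)])

# rho_2(phi)^2 = int_0^1 int phi(u, lam)^2 dlam du  (= 3*pi/5 for this phi)
us = np.linspace(0.0, 1.0, 2001)
inner = np.array([np.trapz(phi(u, lams)**2, lams) for u in us])
rho2_sq = np.trapz(inner, us)
```

Since $u^2$ is non-constant on each slice, the inequality $\rho_{2,n}(\phi_n^*)^2 \le \rho_2(\phi)^2$ is strict here, with the gap equal to the within-slice variance lost by averaging.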
Lemma 5.14 Let $X_{t,n}$ be a locally stationary process with $E|\varepsilon_t|^k \le C_\varepsilon^k$ for all $k\in\mathbb N$, for the $\varepsilon_t$ from Definition 2.1, and let $X_{(n)} := \max_{t=1,\dots,n}|X_{t,n}|$. Then we have

$$P\big(X_{(n)} > 2\log n\big) \to 0.$$

Proof. From (3) we have $\sup_{t,n}\sum_{j=-\infty}^{\infty}|a_{t,n}(j)| \le m_0 < \infty$. The monotone convergence theorem and Jensen's inequality then imply

$$E|X_{t,n}|^k \le E\Big(\sum_{j=-\infty}^{\infty}|a_{t,n}(j)|\,|\varepsilon_{t-j}|\Big)^k = m_0^k\,E\Big(\sum_{j=-\infty}^{\infty}\frac{|a_{t,n}(j)|}{m_0}\,|\varepsilon_{t-j}|\Big)^k \le m_0^k\,C_\varepsilon^k$$

leading to

$$P\big(X_{(n)}>2\log n\big) \le n\,\max_{t=1,\dots,n}P\big(|X_{t,n}|>2\log n\big) \le n\,\frac{1}{e^{2\log n}}\,E\,e^{|X_{t,n}|} \le \frac1n\sum_{k=0}^{\infty}\frac{m_0^k\,C_\varepsilon^k}{k!} = \frac1n\,e^{m_0 C_\varepsilon} \to 0.$$

6 Appendix: Proof of Proposition 2.3

6.1 (Proof of Proposition 2.3) We only give the proof for tvAR processes ($q=0$); the extension to tvARMA processes then is straightforward. The proof is similar to Künsch (1995), who proved the existence of a solution of the form (2) under the assumption that the functions $\alpha_i(u)$ are continuous. Let

$$\alpha(u) = \begin{pmatrix} -\alpha_1(u) & -\alpha_2(u) & \dots & \dots & -\alpha_p(u) \\ 1 & 0 & \dots & \dots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \dots & 0 & 1 & 0 \end{pmatrix}$$

and $\alpha(u) = \alpha(0)$ for $u<0$. Since $\det\big(\lambda E_p - \alpha(u)\big) = \lambda^p\big(\sum_{j=0}^{p}\alpha_j(u)\lambda^{-j}\big)$ it follows that $\delta\big(\alpha(u)\big) \le \frac{1}{1+\delta}$ for all $u$, where $\delta(A) := \max\{|\lambda| : \lambda \text{ eigenvalue of } A\}$. Let

$$a_{t,n}(j) = \Big(\prod_{\ell=0}^{j-1}\alpha\Big(\frac{t-\ell}{n}\Big)\Big)_{11}\,\sigma\Big(\frac{t-j}{n}\Big) \qquad\text{and}\qquad X_{t,n} = \sum_{j=0}^{\infty}a_{t,n}(j)\,\varepsilon_{t-j}.$$

It is easy to check that $X_{t,n}$ is a solution of (8) provided the coefficients are absolutely summable. To prove this we note (cf. Householder, 1964, p. 46) that for every $\varepsilon>0$ and $u\in[0,1]$ there exists a matrix $M(u)$ with

$$\|\alpha(u)\|_{M(u)} \le \delta\big(\alpha(u)\big)+\varepsilon$$

where $\|A\|_M := \sup\{\|Ax\|_M : \|x\|_M = 1\}$ and $\|x\|_M := \|M^{-1}x\|_1 = \sum_{i=1}^{p}|(M^{-1}x)_i|$. Since the $\alpha_i(u)$ are functions of bounded variation (i.e. the difference of two monotonic functions), there exists for all $\varepsilon>0$ a finite partition of intervals $I_1\cup\dots\cup I_m = [0,1]$ such that $|\alpha_i(u)-\alpha_i(v)|<\varepsilon$ for all $i$ whenever $u,v$ are in the same $I_k$. Let $M_k := M(u_k)$ for an arbitrary $u_k\in I_k$. Therefore $m$ (and the partition) can be chosen such that

$$\|\alpha(v)\|_{M_k} \le \rho := \frac{1+(1+\delta)^{-1}}{2} < 1 \qquad\text{for all } v\in I_k.$$
We now replace the first interval $I_1$ by $I_1\cup(-\infty,0)$ (remember that $\alpha(u) = \alpha(0)$ for $u<0$). There exists a constant $c_0$ such that $\|B\|_1 := \sum_{i,j}|B_{i,j}| \le c_0\,\|B\|_{M_k}$ for all $k$. For $t$ and $n$ fixed we now define $L_k := \{\ell\ge0 : \frac{t-\ell}{n}\in I_k\}$ and $L_{k,j} := L_k\cap\{0,\dots,j-1\}$. Then

$$|a_{t,n}(j)| = \Big|\Big(\prod_{\ell=0}^{j-1}\alpha\Big(\frac{t-\ell}{n}\Big)\Big)_{11}\,\sigma\Big(\frac{t-j}{n}\Big)\Big| \le \Big\|\prod_{\ell=0}^{j-1}\alpha\Big(\frac{t-\ell}{n}\Big)\Big\|_1\,\Big|\sigma\Big(\frac{t-j}{n}\Big)\Big|$$
$$\le \Big|\sigma\Big(\frac{t-j}{n}\Big)\Big|\,\prod_{k=1}^{m}\Big\|\prod_{\ell\in L_{k,j}}\alpha\Big(\frac{t-\ell}{n}\Big)\Big\|_1 \le c_0^m\,\sup_u\sigma(u)\,\prod_{k=1}^{m}\rho^{|L_{k,j}|} = K\rho^j \qquad\text{(since $m$ is fixed),}$$

that is, we have proved (3). Since $\|\alpha(\frac{t-k}{n})-\alpha(\frac tn)\|_1 = \sum_{i=1}^{p}|\alpha_i(\frac{t-k}{n})-\alpha_i(\frac tn)|$, we obtain with similar arguments and $a(u,j) := \big(\alpha(u)^j\big)_{11}\,\sigma(u)$

$$\sum_{t=1}^{n}\Big|a_{t,n}(j)-a\Big(\frac tn,j\Big)\Big| \le \sum_{t=1}^{n}\sum_{k=1}^{j-1}\Big\|\Big(\prod_{\ell=k+1}^{j-1}\alpha\Big(\frac{t-\ell}{n}\Big)\Big)\Big[\alpha\Big(\frac{t-k}{n}\Big)-\alpha\Big(\frac tn\Big)\Big]\,\alpha\Big(\frac tn\Big)^{k}\Big\|_1\,\Big|\sigma\Big(\frac{t-j}{n}\Big)\Big| + \sum_{t=1}^{n}\Big\|\alpha\Big(\frac tn\Big)^{j}\Big\|_1\,\Big|\sigma\Big(\frac{t-j}{n}\Big)-\sigma\Big(\frac tn\Big)\Big|$$
$$\le \sum_{t=1}^{n}\sum_{k=1}^{j-1}c_0^{m}\,\rho^{j-1-k}\,\Big(\sum_{i=1}^{p}\Big|\alpha_i\Big(\frac{t-k}{n}\Big)-\alpha_i\Big(\frac tn\Big)\Big|\Big)\,c_0\,\rho^{k} \le K\,j^2\rho^{j-1},$$

i.e. (5). (4) and (6) follow similarly.

Acknowledgement. This work has been partially supported by the NSF (grants #0103606 and #0406431) and the Deutsche Forschungsgemeinschaft (DA 187/15-1).

References

Brillinger, D.R. (1981). Time Series: Data Analysis and Theory. Holden Day, San Francisco.

Dahlhaus, R. (1988). Empirical spectral processes and their applications to time series analysis. Stoch. Proc. Appl. 30 69-83.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist. 25 1-37.

Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. Ann. Statist. 28 1762-1794.

Dahlhaus, R. and Neumann, M.H. (2001). Locally adaptive fitting of semiparametric models to nonstationary time series. Stoch. Proc. Appl. 91 277-308.

Dahlhaus, R. and Polonik, W. (2002). Empirical spectral processes and nonparametric maximum likelihood estimation for time series. In Empirical Process Techniques for Dependent Data, H. Dehling, T. Mikosch, and M. Sørensen, editors, pp. 275-298.

Dahlhaus, R. and Polonik, W. (2006).
Nonparametric quasi maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist., to appear.

Davis, R.A., Lee, T. and Rodriguez-Yam, G. (2005). Structural break estimation for nonstationary time series models. J. Am. Statist. Ass. 101 223-239.

Fay, G. and Soulier, P. (2001). The periodogram of an i.i.d. sequence. Stoch. Proc. Appl. 92 315-343.

Fryzlewicz, P., Sapatinas, T. and Subba Rao, S. (2006). A Haar-Fisz technique for locally stationary volatility estimation. Biometrika 93 687-704.

Künsch, H.R. (1995). A note on causal solutions for locally stationary AR processes. Preprint, ETH Zürich.

Mikosch, T. and Norvaisa, R. (1997). Uniform convergence of the empirical spectral distribution function. Stoch. Proc. Appl. 70 85-114.

Moulines, E., Priouret, P. and Roueff, F. (2005). On recursive estimation for time varying autoregressive processes. Ann. Statist. 33 2610-2654.

Nason, G.P., von Sachs, R. and Kroisandt, G. (2000). Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum. J. Royal Statist. Soc. B 62 271-292.

Neumann, M.H. and von Sachs, R. (1997). Wavelet thresholding in anisotropic function classes and applications to adaptive estimation of evolutionary spectra. Ann. Statist. 25 38-76.

Priestley, M.B. (1965). Evolutionary spectra and non-stationary processes. J. Royal Statist. Soc. B 27 204-237.

Sakiyama, K. and Taniguchi, M. (2004). Discriminant analysis for locally stationary processes. J. Multiv. Anal. 90 282-300.

Van Bellegem, S. and Dahlhaus, R. (2006). Semiparametric estimation by model selection for locally stationary processes. J. Royal Statist. Soc. B 68 721-746.

van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes. Springer, New York.

Whittle, P. (1953). Estimation and information in stationary time series. Ark. Mat. 2 423-434.