Test of independence for functional data

Lajos Horváth (a), Marie Hušková (b), Gregory Rice (a)
(a) Department of Mathematics, University of Utah, Salt Lake City, UT 84112-0090, USA
(b) Department of Statistics, MFF UK, Sokolovská 83, CZ-18600 Praha, Czech Republic

Journal of Multivariate Analysis 117 (2013) 100–119. Received 1 August 2012; available online 18 February 2013. doi:10.1016/j.jmva.2013.02.005
Research supported by NSF grant DMS 0905400 and grant CAČR P201/12/1277. Corresponding author: G. Rice, rice@math.utah.edu.

AMS 1991 subject classifications: primary 62G20; secondary 62G10, 62M10.

Keywords: Variables in Hilbert spaces; Test for independence; Sample autocovariances; Karhunen–Loève expansion; Asymptotic normality

Abstract. We wish to test the null hypothesis that a collection of functional observations is independent and identically distributed. Our procedure is based on the sum of the $L^2$ norms of the empirical correlation functions. The limit distribution of the proposed test statistic is established under the null hypothesis. Under the alternative the sample exhibits serial correlation, and consistency is shown when the sample size as well as the number of lags used in the test statistic tend to $\infty$. A Monte Carlo study illustrates the small sample behavior of the test, and the procedure is applied to two data sets: Eurodollar futures and magnetogram records.

1. Introduction and results

The majority of statistical methods in the book of Ramsay and Silverman [36] are based on the assumption that the functional observations are independent and identically distributed. This assumption is ensured in designed experiments (cf. [31]) or reasonably justified by the data generating process (cf. [1,11,4]) in some cases. However, it is not always obvious that this assumption holds when the data are generated by observing a time series, as is often the case with physical phenomena (cf. [30]) or economic activities (cf. [23]). It has been observed by Horváth et al. [21] that neglecting dependence in functional observations reduces the power of statistical tests and may lead to misleading results. Hence it is crucial when using these procedures to check the validity of the independence assumption in the data. Tests for the independence of real and vector valued observations have been developed in the time series literature (cf. [9,27]). Due to the popularity of the Box–Ljung–Pierce approach (cf. [7,26]), the majority of tests used to check the independence assumption verify that all autocovariances and/or autocorrelations up to lag $H$ are suitably close to 0. The univariate results of Box and Pierce [7] and Ljung and Box [26] were extended to multivariate time series by Chitturi [12], Hosking [22] and Li and McLeod [25].
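To recall the scalar version of this approach: for a real valued series, the Ljung–Box statistic of [26] aggregates squared sample autocorrelations up to lag $H$ and is compared with a $\chi^2(H)$ distribution. The following minimal sketch is our own illustration, not part of the paper; the function name and the use of numpy are our choices.

```python
import numpy as np

def ljung_box(x, H):
    """Ljung-Box statistic Q = n(n+2) * sum_{h=1}^{H} r_h^2 / (n-h), where r_h is
    the lag-h sample autocorrelation of the scalar series x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    r = np.array([np.sum(xc[:n - h] * xc[h:]) / denom for h in range(1, H + 1)])
    return n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, H + 1)))

# Under the i.i.d. null, Q is approximately chi-square with H degrees of freedom.
rng = np.random.default_rng(0)
q = ljung_box(rng.standard_normal(500), H=10)
```

The functional test developed below replaces the squared autocorrelations with $L^2$ norms of sample autocovariance functions and lets $H$ grow with the sample size.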
It is noted in [17] that the methods used for univariate and vector valued time series cannot be used in the case of functional observations. In this paper we follow the Box–Ljung–Pierce approach in the case of functional observations and provide autocovariance based procedures to test whether a given functional time series is a white noise sequence. We assume that we have observations $X_1(t), \ldots, X_n(t)$ which are square integrable random functions on $[0,1]$. Let $\|f\| = \big(\int f^2(t)\,dt\big)^{1/2}$, where unspecified integrals are taken over $[0,1]$, i.e. $\int = \int_0^1$.

We wish to test the null hypothesis

H0: $X_1, \ldots, X_n$ are independent and identically distributed random functions,

against the alternative

HA: $X_1, \ldots, X_n$ is a stationary and ergodic sequence such that for some $h_0 \ge 1$
$$\int\!\!\int C_{h_0}^2(t,s)\,dt\,ds > 0, \quad\text{where } C_{h_0}(t,s) = \mathrm{cov}\big(X_1(t), X_{1+h_0}(s)\big).$$

Such tests are typically referred to as "portmanteau" tests; the null hypothesis H0 is well specified, but no test of H0 can have power against all possible alternatives. The reason for defining HA as we have done is quite simple. Since this test is to be applied to a functional time series, it should have power to detect whether the data form a white noise sequence, or whether they instead follow one of the available models for dependent functional time series, such as the functional autoregressive model of order one (FAR(1)) (cf. [23,2]) or the functional linear process (cf. [6]). Such models exhibit serial correlation by construction and thus satisfy HA. A similar argument is behind the definition of the alternative hypothesis for the analogous tests for univariate and multivariate data, in which case the data follow an ARIMA model under the alternative.

To motivate the definition of the test statistic we note that under H0, $\mathrm{cov}(X_1(t), X_{1+h}(s)) = 0$ for all $h \ge 1$ and for almost all $(t,s) \in [0,1]^2$; hence all sample autocovariance functions
$$\hat C_{n,h}(t,s) = \frac{1}{n}\sum_{i=1}^{n-h}\big(X_i(t) - \bar X_n(t)\big)\big(X_{i+h}(s) - \bar X_n(s)\big), \qquad h \ge 0,$$
should be close to 0 for all $h \ge 1$, where
$$\bar X_n(t) = \frac{1}{n}\sum_{i=1}^{n}X_i(t)$$
denotes the sample mean. The function $\hat C_{n,0}(t,s)$ is an estimator for the covariance function $C_0(t,s) = E[(X_1(t) - EX_1(t))(X_1(s) - EX_1(s))]$, assuming that $X_i$, $i \ge 1$, is a stationary sequence. The weak convergence of the covariance operator given by $\hat C_{n,0}$ was studied by Mas [29]. Panaretos et al. [32] and Fremdt et al. [15] used the estimated covariance function $\hat C_{n,0}$ in statistical inference. The testing procedures for multivariate observations in [12,22,25] are based on quadratic forms of the sample correlation or covariance matrices. In our approach these quadratic forms are replaced with the square integrals of the sample covariance functions $\hat C_{n,h}(t,s)$. Under HA at least one of the functions $\hat C_{n,h}(t,s)$ is significantly different from zero, and so we reject H0 if
$$\hat V_{n,H} = \sum_{h=1}^{H}\int\!\!\int \hat C_{n,h}^2(t,s)\,dt\,ds$$
is large. In the case of univariate and vector valued observations, the number of correlations used in the testing procedure has traditionally been held fixed. The limit distributions of these portmanteau tests are $\chi^2$, where the degrees of freedom depend on the number of lags used.
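In practice the curves are observed on a finite grid, so the double integrals are approximated by Riemann sums. The following sketch is our own (it assumes curves sampled on an equally spaced grid of $[0,1]$, as in the simulations of Section 2); the function name is hypothetical.

```python
import numpy as np

def vhat(X, H):
    """Compute V_hat_{n,H} = sum_{h=1}^H int int C_hat_{n,h}(t,s)^2 dt ds for
    curves X of shape (n, m) observed on an equally spaced m-point grid of [0,1].
    Integrals are approximated by Riemann sums with mesh 1/m."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)                 # center with the sample mean curve
    v = 0.0
    for h in range(1, H + 1):
        C = Xc[:n - h].T @ Xc[h:] / n       # C[t_idx, s_idx] ~ C_hat_{n,h}(t, s)
        v += np.sum(C ** 2) / m ** 2        # double Riemann sum of C^2
    return v
```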
However, it has been observed that the $\chi^2$ approximation can be poor, and several modifications have been suggested (cf. [34,24,14,28]). The approximation tends to work well when both the sample size and the number of lags used to define the statistic are large. Hence, in this paper we consider the case when $H$, the number of lags used in the definition of $\hat V_{n,H}$, depends on the sample and tends to $\infty$ as the sample size increases:

Assumption 1.1. $H = H(n) \to \infty$ and $H = O((\log n)^\alpha)$ with some $\alpha > 0$, as $n \to \infty$.

First we consider the asymptotic behavior of $\hat V_{n,H}$ under the null hypothesis. We assume

Assumption 1.2. $E\|X_i\|^4 < \infty$, for all $i \ge 1$,

which implies that $C_0(t,s)$ is square integrable, and that there are eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and orthonormal eigenfunctions $\varphi_1, \varphi_2, \ldots$ satisfying
$$\lambda_\ell \varphi_\ell(t) = \int C_0(t,s)\varphi_\ell(s)\,ds. \tag{1.1}$$
We also note that
$$C_0(t,s) = \sum_{\ell=1}^{\infty}\lambda_\ell\varphi_\ell(t)\varphi_\ell(s) \tag{1.2}$$
(cf. [18]). It is a simple consequence of the Cauchy–Schwarz inequality that under Assumption 1.2
$$\sum_{\ell=1}^{\infty}\lambda_\ell < \infty \quad\text{and therefore}\quad \sum_{\ell=1}^{\infty}\lambda_\ell^2 < \infty. \tag{1.3}$$

Theorem 1.1. If H0 and Assumptions 1.1 and 1.2 are satisfied, then
$$\frac{1}{(2H\sigma^2)^{1/2}}\big(n\hat V_{n,H} - H\mu\big) \xrightarrow{\ \mathcal D\ } N(0,1),$$
where $N(0,1)$ stands for a standard normal random variable,
$$\mu = \Big(\sum_{\ell=1}^{\infty}\lambda_\ell\Big)^2 \quad\text{and}\quad \sigma^2 = \Big(\sum_{\ell=1}^{\infty}\lambda_\ell^2\Big)^2.$$

It follows from (1.3) that both $\mu$ and $\sigma^2$ are finite. Since the eigenvalues are unknown in practice, we need to estimate $\mu$ and $\sigma^2$ from the sample. It follows from (1.2) that
$$\int C_0(t,t)\,dt = \sum_{\ell=1}^{\infty}\lambda_\ell \quad\text{and}\quad \int\!\!\int C_0^2(t,s)\,dt\,ds = \sum_{\ell=1}^{\infty}\lambda_\ell^2.$$
Since $\hat C_{n,0}(t,s)$, the sample covariance, is a well established and studied estimator for $C_0(t,s)$, we use
$$\hat\mu_n = \Big(\int \hat C_{n,0}(t,t)\,dt\Big)^2 \quad\text{and}\quad \hat\sigma_n^2 = \Big(\int\!\!\int \hat C_{n,0}^2(t,s)\,dt\,ds\Big)^2$$
to estimate $\mu$ and $\sigma^2$. Interestingly, Theorem 1.1 remains true when the theoretical $\mu$ and $\sigma^2$ are replaced with the corresponding estimators.

Theorem 1.2. If H0 and Assumptions 1.1 and 1.2 are satisfied, then
$$\frac{1}{(2H\hat\sigma_n^2)^{1/2}}\big(n\hat V_{n,H} - H\hat\mu_n\big) \xrightarrow{\ \mathcal D\ } N(0,1),$$
where $N(0,1)$ stands for a standard normal random variable.
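A minimal, self-contained sketch of the resulting test (our own illustration, under the same grid assumptions as above): it computes $\hat\mu_n$ and $\hat\sigma_n^2$ from the lag-zero sample covariance and standardizes $n\hat V_{n,H}$ as in Theorem 1.2; the null hypothesis is rejected for large values of the statistic.

```python
import numpy as np
from math import erf, sqrt

def independence_test(X, H):
    """Standardized statistic of Theorem 1.2 for curves X of shape (n, m) on an
    equally spaced grid of [0,1]; returns (Z_n, one-sided normal p-value)."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    C0 = Xc.T @ Xc / n                             # lag-0 sample covariance C_hat_{n,0}
    mu_hat = (np.trace(C0) / m) ** 2               # (int C_hat_{n,0}(t,t) dt)^2
    sigma2_hat = (np.sum(C0 ** 2) / m ** 2) ** 2   # (int int C_hat_{n,0}^2 dt ds)^2
    v = sum(np.sum((Xc[:n - h].T @ Xc[h:] / n) ** 2) / m ** 2 for h in range(1, H + 1))
    z = (n * v - H * mu_hat) / sqrt(2 * H * sigma2_hat)
    pval = 1 - 0.5 * (1 + erf(z / sqrt(2)))        # reject H0 for large z
    return z, pval
```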
Gabrys and Kokoszka [17] and Gabrys et al. [16] provide a different approach to testing H0. By the Karhunen–Loève expansion (cf. [20]) we have
$$X_i(t) = \sum_{\ell=1}^{\infty}\xi_{i,\ell}\varphi_\ell(t), \tag{1.4}$$
where $\xi_{i,\ell} = \langle X_i, \varphi_\ell\rangle$ and $\langle f,g\rangle = \int f(t)g(t)\,dt$ denotes the inner product in the Hilbert space $L^2$. Gabrys and Kokoszka [17] suggest using the first $p$ coefficients in expansion (1.4) and testing whether the vectors $(\xi_{i,1},\ldots,\xi_{i,p})^T \in \mathbb{R}^p$, $1 \le i \le n$, are independent using the sum of the sample correlation matrices up to lag $H$. The portmanteau test in [17] is extended to a test for independence in the residuals of a functional linear model by Gabrys et al. [16]. In the papers of Gabrys and Kokoszka [17] and Gabrys et al. [16] it is assumed that the number of lags $H$ and the number of projections $p$ are fixed and do not depend on the sample size. Furthermore, since the eigenfunctions $\varphi_\ell$ are unknown, they are estimated by $\hat\varphi_\ell$ from the sample, and the empirical projections $\langle X_i, \hat\varphi_\ell\rangle$ are used in the statistical analysis. They showed that under the independence null hypothesis the limit distribution of the portmanteau test statistic is $\chi^2$ with $p^2H$ degrees of freedom. For large $p$ and $H$ the $\chi^2$ approximation is suitable. The choice of $p$ is widely discussed in the literature (cf. [36]), but there is no optimal choice. The estimation of the $\varphi_\ell$'s uses the fda package, where the observations are replaced with smoothed curves. The extra smoothing step can introduce bias and might mask important features of the original data. In our approach we do not use projections, so we do not need to fix the value of $p$, and we allow $H$ to increase with the sample size. Our method also has the advantage that the empirical eigenfunctions $\hat\varphi_\ell$ need not be computed.

Next we consider the consistency of the testing procedure based on Theorem 1.2 under HA. Under the alternative HA we only require

Assumption 1.3. $E\|X_1\|^2 < \infty$

and

Assumption 1.4. $H = H(n)$, $n/H^{3/2} \to \infty$.

Theorem 1.3. If HA and Assumptions 1.3 and 1.4 are satisfied, then
$$\frac{1}{(2H\hat\sigma_n^2)^{1/2}}\big(n\hat V_{n,H} - H\hat\mu_n\big) \xrightarrow{\ P\ } \infty.$$

Next we give an example in which the conditions of Theorem 1.3 are satisfied.

Example 1.1. The FAR(1) process satisfies the recursion
$$X_n(t) = \int \psi(t,u)X_{n-1}(u)\,du + \varepsilon_n(t), \qquad 0 \le t \le 1,\ -\infty < n < \infty, \tag{1.5}$$
where $\varepsilon_n(t)$, $-\infty < n < \infty$, are independent and identically distributed random processes with $E\varepsilon_n(t) = 0$ and $E\|\varepsilon_n\|^2 < \infty$. Bosq [6] proved that if $\|\psi\| < 1$, then (1.5) has a unique stationary and ergodic solution. Since $C_0(t,s)$ is a positive definite function we have (1.1) and (1.2), and we can assume that $\varphi_\ell$, $1 \le \ell < \infty$, is a basis or can be extended into a basis. The square integrability of $\psi$ yields
$$\psi(s,u) = \sum_{i,j=1}^{\infty}\mu_{i,j}\varphi_i(s)\varphi_j(u). \tag{1.6}$$
Multiplying both sides of (1.5) by $X_{n-1}(t)$ and then taking expected values, we obtain the equation
$$C_1(t,s) = \int \psi(s,u)C_0(u,t)\,du.$$
Now the expansions in (1.2) and (1.6), together with the orthonormality of the $\varphi_\ell$'s, imply
$$C_1(t,s) = \sum_{i,j=1}^{\infty}\mu_{i,j}\lambda_j\varphi_i(s)\varphi_j(t)$$
and therefore
$$\int\!\!\int C_1^2(t,s)\,dt\,ds = \sum_{i,j=1}^{\infty}\mu_{i,j}^2\lambda_j^2.$$
If all $\lambda_i$'s are positive (i.e. $X_n$ is not spanned by finitely many functions) and $\|\psi\| > 0$, then HA holds with $h_0 = 1$.
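Example 1.1 is easy to simulate. The sketch below is our own: the Brownian-motion innovations, the burn-in length and the grid size are assumptions borrowed from the simulation setup of Section 2, and the kernel $\psi_c$ is the one used there. The integral in (1.5) is approximated by a Riemann sum.

```python
import numpy as np

def simulate_far1(n, m=100, c=0.3416, burn=50, seed=0):
    """Simulate a FAR(1) sample on an m-point grid of [0,1] with kernel
    psi(t,u) = c*exp((t^2 + u^2)/2) and Brownian-motion innovations, via
    X_i = Psi @ X_{i-1} / m + eps_i (Riemann-sum approximation of (1.5))."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, m + 1) / m
    Psi = c * np.exp((grid[:, None] ** 2 + grid[None, :] ** 2) / 2)
    X = np.zeros(m)
    out = np.empty((n, m))
    for i in range(-burn, n):                                # burn-in toward stationarity
        eps = np.cumsum(rng.standard_normal(m)) / np.sqrt(m) # Brownian motion on the grid
        X = Psi @ X / m + eps
        if i >= 0:
            out[i] = X
    return out
```

With $c = 0.3416$ the kernel norm $\|\psi_c\|$ is approximately 0.5, so the simulated sample satisfies HA with $h_0 = 1$.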
Remark 1.1. One can show that under the conditions of Theorem 1.1, for every fixed $h_0$,
$$\Big(n\int\!\!\int \hat C_{n,h}^2(t,s)\,dt\,ds,\ 1 \le h \le h_0\Big) \xrightarrow{\ \mathcal D\ } (\phi_h,\ 1 \le h \le h_0),$$
where $\phi_h$, $1 \le h \le h_0$, are independent identically distributed random variables which can be represented as an infinite sum of weighted $\chi^2(1)$ random variables. However, the weights depend on the unknown eigenvalues $\lambda_1, \lambda_2, \ldots$, so the implementation of the Bonferroni inequality is not immediate when finitely many lags are used.

Remark 1.2. To obtain a model for a local alternative in the functional setting, one would consider
$$X_i(t) = X_{i,n}(t) = Y_i(t) + \gamma_n Z_i(t), \qquad 0 \le t \le 1,\ 1 \le i \le n,$$
where $\{Y_i\}$ is an independent, identically distributed sequence, and $\{Z_i\}$ is a Bernoulli shift, $\{4+\delta\}$-decomposable for some $\delta > 0$ in the sense of Hörmann and Kokoszka [19]. We assume that $\{Y_i\}$ and $\{Z_i\}$ are independent of each other, $EY_i(t) = EZ_i(t) = 0$, and $E\|Y_i\|^4 < \infty$. The deviation from the null hypothesis is asymptotically small in the sense that $H^{1/2}\gamma_n \to 0$ as $n \to \infty$. The sequence $\{Z_i\}$ is correlated at some lag, which means that there is $h_0 \ge 1$ such that
$$\int\!\!\int A_{h_0}^2(t,s)\,dt\,ds > 0, \quad\text{where } A_h(t,s) = EZ_1(t)Z_{1+h}(s). \tag{1.7}$$
We also assume that $H$ satisfies Assumption 1.1. We illustrate at the end of Section 4 that if $\gamma_n^4 n/H^{1/2} \to 0$, then Theorem 1.2 remains true. If $\gamma_n^4 n/H^{1/2} \to \infty$, then we reject H0 with probability tending to 1, since $(n\hat V_{n,H} - H\hat\mu_n)/(2H\hat\sigma_n^2)^{1/2} \to \infty$ in probability. In the borderline case when $\gamma_n^4 n/H^{1/2} \to c > 0$, $(n\hat V_{n,H} - H\hat\mu_n)/(2H\hat\sigma_n^2)^{1/2}$ converges to a normal random variable with mean $c(2\bar\sigma^2)^{-1/2}\sum_{h=1}^{\infty}\int\!\!\int A_h^2(t,s)\,dt\,ds$. In this case the rejection probability does not go to zero as $n \to \infty$. The asymptotic variance $\bar\sigma^2$ is defined as the limit of the sum of the squares of the eigenvalues of $E[X_{1,n}(t)X_{1,n}(s)]$ as $n \to \infty$. We note that $\bar\sigma^2$ is also the sum of the squares of the eigenvalues of $E[Y_1(t)Y_1(s)]$.

2. Simulations and applications

In the first part of this section we investigate the finite-sample properties of the test in Theorem 1.2. The estimator $\hat C_{n,h}$ is biased, so to improve the finite sample performance we replaced $H\hat\mu_n$ with $H^*\hat\mu_n$, where $H^* = H - H(H+1)/(2n)$. Let
$$V_{n,H} = (2H\hat\sigma_n^2)^{-1/2}\big(n\hat V_{n,H} - H^*\hat\mu_n\big).$$
To investigate the size of the test for finite sample sizes, we simulated sample paths of independent Wiener processes (standard Brownian motions) and Brownian bridges. To simulate Wiener processes, cumulative sums of independent normal random variables were computed on a grid of 100 equi-spaced points in the interval $[0,1]$ and linearly connected. Table 2.1 shows the percentage of $V_{n,H} \ge \Phi^{-1}(1-\alpha)$ ($\alpha = 0.1, 0.05, 0.01$) based on 1000 repetitions. Table 2.2 provides the same information when the $X_i$'s are independent and identically distributed Brownian bridges.

Table 2.1: Empirical size of the test when the observations are independent Wiener processes.

n     H    10%    5%    1%
50    5    8.3    5.3   2.9
50    10   9.2    5.4   2.7
50    20   9.0    5.8   2.9
50    25   9.1    5.6   2.8
100   10   8.9    5.2   2.3
100   20   10.1   6.6   3.5
100   30   10.4   5.6   2.1
100   40   9.9    5.5   2.2
150   10   10.0   6.2   2.4
150   20   9.4    6.0   2.3
150   30   9.9    6.2   2.4
150   40   10.0   6.5   2.5
200   10   10.2   5.9   2.1
200   20   10.9   5.5   2.1
200   30   9.5    6.1   2.1
200   40   9.5    5.6   2.6
250   10   10.7   5.8   1.4
250   20   9.4    4.8   2.0
250   30   10.9   5.7   2.0
250   40   9.3    5.1   2.5
300   10   8.6    5.4   2.5
300   20   9.7    6.0   1.6
300   30   10.0   5.8   2.0
300   40   11.0   6.4   1.8

Table 2.2: Empirical size of the test when the observations are independent Brownian bridges.

n     H    10%    5%    1%
50    5    9.4    5.8   3.2
50    10   8.9    6.1   2.6
50    20   9.2    5.5   2.3
50    25   8.5    5.4   2.5
100   10   9.3    6.2   2.2
100   20   10.2   5.7   2.3
100   30   10.0   5.9   2.7
100   40   10.5   7.0   2.9
150   10   9.6    4.8   2.3
150   20   10.3   6.2   2.2
150   30   11.1   6.3   3.2
150   40   10.8   7.0   3.3
200   10   11.4   6.3   2.4
200   20   10.6   6.3   2.1
200   30   10.0   6.4   1.9
200   40   9.5    5.7   2.1
250   10   11.8   6.7   2.4
250   20   9.4    5.7   2.3
250   30   10.1   6.2   1.9
250   40   10.2   5.8   1.7
300   10   9.6    5.9   1.7
300   20   9.8    5.6   1.8
300   30   9.5    5.2   1.5
300   40   10.0   6.3   1.5

As one can see from the empirical sizes in Tables 2.1 and 2.2, the convergence of the distribution of the test statistic to that of the standard normal appears to be somewhat slow in the tails. This is not surprising, since for every $h$ the limit of $n\hat V_{n,h}$ is a weighted sum of independent $\chi^2(1)$ random variables, so the limit has an exponential tail, where $\chi^2(\nu)$ denotes a $\chi^2$ random variable with $\nu$ degrees of freedom. It has been known since [13] that the normal approximation to the $\chi^2$ does not work well in the tails when the degrees of freedom are large. To get better finite sample performance, transformations of the $\chi^2$ variables have been suggested (cf. [13,33,10]). By a simple manipulation, Theorem 1.2 implies that
$$\Big(\frac{2H\hat\mu_n^2}{\hat\sigma_n^2}\Big)^{-1/2}\Big(\frac{n\hat V_{n,H}\hat\mu_n}{\hat\sigma_n^2} - \frac{H\hat\mu_n^2}{\hat\sigma_n^2}\Big) \xrightarrow{\ \mathcal D\ } N(0,1).$$
This can be interpreted as $n\hat V_{n,H}\hat\mu_n/\hat\sigma_n^2$ being distributed approximately as a $\chi^2(H\hat\mu_n^2/\hat\sigma_n^2)$ random variable. It is shown in [10] that if
$$m(\nu) = \frac{5}{6} - \frac{1}{9\nu} - \frac{7}{648\nu^2} + \frac{25}{2187\nu^3} \quad\text{and}\quad v(\nu) = \frac{1}{18\nu} + \frac{1}{162\nu^2} - \frac{37}{11664\nu^3},$$
then the power transformation
$$T\big(\chi^2(\nu)\big) = v(\nu)^{-1/2}\Bigg[\Big(\frac{\chi^2(\nu)}{\nu}\Big)^{1/6} - \frac{1}{2}\Big(\frac{\chi^2(\nu)}{\nu}\Big)^{1/3} + \frac{1}{3}\Big(\frac{\chi^2(\nu)}{\nu}\Big)^{1/2} - m(\nu)\Bigg]$$
is closer to the standard normal distribution, in terms of the maximum absolute error, than the ordinary normal approximation of a $\chi^2(\nu)$ random variable. Applying this transformation to $n\hat V_{n,H}\hat\mu_n/\hat\sigma_n^2$ with degrees of freedom $H\hat\mu_n^2/\hat\sigma_n^2$ provides some improvement in the empirical size of the test. Fig. 2.1 shows the cumulative distribution function of the standard normal along with the empirical distribution functions of $V_{n,H}$ and $T(n\hat V_{n,H}\hat\mu_n/\hat\sigma_n^2)$ computed from 1000 simulations using $n = 100$ and $H = 20$.

Fig. 2.1. The graphs of the standard normal distribution function (solid) and the simulated distribution functions of $V_{100,20}$ (circles) and $T(n\hat V_{100,20}\hat\mu_n/\hat\sigma_n^2)$ (crosses).
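A sketch of the transformed statistic (our own implementation of the formulas above; the function names are ours):

```python
import numpy as np

def m_nu(nu):
    # Mean correction m(nu) from [10]
    return 5/6 - 1/(9*nu) - 7/(648*nu**2) + 25/(2187*nu**3)

def v_nu(nu):
    # Variance correction v(nu) from [10]
    return 1/(18*nu) + 1/(162*nu**2) - 37/(11664*nu**3)

def T_transform(x, nu):
    """Power transformation T of a chi-square(nu) variable toward standard normality."""
    r = x / nu
    return (r**(1/6) - r**(1/3)/2 + r**(1/2)/3 - m_nu(nu)) / np.sqrt(v_nu(nu))

# Usage with the quantities of Theorem 1.2 (v_hat, mu_hat, sigma2_hat computed as above):
# nu_hat = H * mu_hat**2 / sigma2_hat
# z = T_transform(n * v_hat * mu_hat / sigma2_hat, nu_hat)  # compare with N(0,1) quantiles
```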
The study of the power of the test is based on the alternative under which $X_n(t)$ is a Hilbert space valued autoregressive process of order one (HAR(1)) as in Example 1.1. In order to compare the power of the test in Theorem 1.2 with Table 2 in [17] we used the kernel
$$\psi_c(t,s) = c\exp\Big(\frac{t^2+s^2}{2}\Big).$$
With the choice of $c = 0.3416$ we have $\|\psi_c\| \approx 0.5$. Table 2.3 shows the percentage of $V_{n,H} \ge \Phi^{-1}(1-\alpha)$ for $\alpha = 0.1$, 0.05 and 0.01 based on 1000 repetitions under the alternative when $X_n$ follows a HAR(1) process.

Table 2.3: Empirical power of the test (in %).

n     H     10%    5%     1%
50    5     79.7   74.1   61.1
50    10    70.1   62.9   49.7
50    15    65.8   58.6   44.9
50    20    64.7   56.6   41.4
100   10    96.1   95.5   90.6
100   20    93.3   89.4   81.1
100   40    88.2   84.0   73.6
100   50    86.3   81.2   70.1
200   20    100    100    99.4
200   40    99.4   99.0   97.3
200   60    99.2   98.8   96.9
200   100   98.8   97.1   93.5

To illustrate the applicability of our test we consider two data sets which have been widely studied using functional data analysis, the first of which consists of Eurodollar futures rate data. Such data were considered in [23], where it is argued that the curves constructed from the prices of Eurodollar futures contracts with decreasing expiration dates follow a HAR(1) model. In this case the curves are constructed from 114 points per day; point $i$ of $X_n$ corresponds to the price of a contract with closing date $i$ months from day $n$. The sample we consider consists of 100 days of data taken from January 24 to June 17, 1997. Fig. 2.2 gives the first seven rate curves from this sample, normalized to the $[0,1]$ interval, and Fig. 2.3 shows the same curves after centering with the sample mean. As expected, the test rejects the null hypothesis at the 1% level for all lags $H$ between 10 and 40.

Fig. 2.2. The graphs of seven Eurodollar futures rate curves taken from January 24 to February 1, 1997, normalized to the $[0,1]$ interval.

Fig. 2.3. The graphs of seven Eurodollar futures rate curves taken from January 24 to February 1, 1997, after subtracting the sample mean and normalizing to the $[0,1]$ interval.

We also applied our test to ground-based magnetogram records taken at Honolulu in 2001. In particular, we focused on the horizontal intensity, which is the component of the magnetic field tangent to the Earth's surface and pointing toward the magnetic north pole. In this case each daily curve is constructed from measurements of the horizontal intensity taken every minute, giving 1440 measurements per day. We considered 365 such curves taken from January 1 to December 31, 2001. Seven of these curves are illustrated in Fig. 2.4. Following Xu and Kamide [39] and Gabrys and Kokoszka [17], we subtracted the linear change over each day to obtain the sample to which we applied our test. Fig. 2.5 shows this transformation applied to the curves in Fig. 2.4.
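The daily detrending step can be sketched as follows. This is our own reading of "subtracting the linear change over each day": we remove the straight line through the first and last observations of each daily curve; a least-squares line would be an equally plausible interpretation.

```python
import numpy as np

def remove_daily_linear_change(X):
    """For each daily curve (row of X, shape (n, m)), subtract the straight line
    through its first and last observations, removing the linear change over the
    day in the spirit of Xu and Kamide [39]."""
    n, m = X.shape
    w = np.linspace(0.0, 1.0, m)                    # position within the day
    trend = X[:, [0]] * (1 - w) + X[:, [-1]] * w    # line from first to last value
    return X - trend
```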
Using $H = 40$ we get a p-value less than $10^{-4}$, and therefore we reject the i.i.d. null hypothesis at the 1% level.

Fig. 2.4. The graphs of seven horizontal intensity curves taken from March 1 to March 7, 2001.

Fig. 2.5. The graphs of seven horizontal intensity curves taken from March 1 to March 7, 2001, after subtracting the linear change in each day as in [39].

3. Proofs of Theorems 1.1 and 1.2

Throughout this section we assume that H0 is satisfied. It follows from the definition of $\hat V_{n,H}$ that it does not depend on the value of $EX_1(t)$, and therefore in this section we assume that $EX_1(t) = 0$. First we show that $\hat V_{n,H}$ and $V_{n,H}$ have the same limit distribution, where
$$V_{n,H} = \sum_{h=1}^{H}\int\!\!\int C_{n,h}^2(t,s)\,dt\,ds \quad\text{and}\quad C_{n,h}(t,s) = \frac{1}{n}\sum_{i=1}^{n}X_i(t)X_{i+h}(s).$$

Lemma 3.1. If H0 and Assumptions 1.1 and 1.2 hold, then
$$\frac{n}{H^{1/2}}\big(\hat V_{n,H} - V_{n,H}\big) = o_P(1), \quad\text{as } n \to \infty.$$

Proof. First we show that
$$\frac{n}{H^{1/2}}\big(\hat V_{n,H} - \tilde V_{n,H}\big) = o_P(1), \quad\text{as } n \to \infty, \tag{3.1}$$
where
$$\tilde V_{n,H} = \sum_{h=1}^{H}\int\!\!\int \tilde C_{n,h}^2(t,s)\,dt\,ds \quad\text{with}\quad \tilde C_{n,h}(t,s) = \frac{1}{n}\sum_{i=1}^{n-h}X_i(t)X_{i+h}(s).$$
It is easy to see that
$$\big|\hat C_{n,h}^2(t,s) - \tilde C_{n,h}^2(t,s)\big| \le 2\big|\tilde C_{n,h}(t,s)a_{n,h}(t,s)\big| + a_{n,h}^2(t,s),$$
where $a_{n,h}(t,s) = -a_{n,h}^{(1)}(t,s) + a_{n,h}^{(2)}(t,s) + a_{n,h}^{(3)}(t,s)$ with
$$a_{n,h}^{(1)}(t,s) = \Big(1+\frac{h}{n}\Big)\bar X_n(t)\bar X_n(s), \qquad a_{n,h}^{(2)}(t,s) = \bar X_n(s)\frac{1}{n}\sum_{i=n-h+1}^{n}X_i(t) \qquad\text{and}\qquad a_{n,h}^{(3)}(t,s) = \bar X_n(t)\frac{1}{n}\sum_{i=1}^{h}X_i(s).$$
Using H0 we get
$$E\big(\bar X_n(t)\bar X_n(s)\big)^2 \le \frac{1}{n^4}\big(nEX_1^2(t)X_1^2(s) + 3n^2EX_1^2(t)EX_1^2(s)\big) \tag{3.2}$$
and
$$\int\!\!\int EX_1^2(t)X_1^2(s)\,dt\,ds = E\Big(\int X_1^2(t)\,dt\Big)^2 = E\|X_1\|^4, \qquad \int\!\!\int EX_1^2(t)EX_1^2(s)\,dt\,ds = \big(E\|X_1\|^2\big)^2 \le E\|X_1\|^4, \tag{3.3}$$
resulting in
$$E\int\!\!\int\big(\bar X_n(t)\bar X_n(s)\big)^2\,dt\,ds \le \frac{4}{n^2}E\|X_1\|^4. \tag{3.4}$$
Repeating the arguments leading to (3.4), we obtain
$$E\big(a_{n,h}^{(2)}(t,s)\big)^2 \le \frac{4}{n^2}\big(EX_1^2(t)X_1^2(s) + EX_1^2(t)EX_1^2(s)\big) \tag{3.5}$$
and
$$E\big(a_{n,h}^{(3)}(t,s)\big)^2 \le \frac{4}{n^2}\big(EX_1^2(t)X_1^2(s) + EX_1^2(t)EX_1^2(s)\big), \tag{3.6}$$
and therefore
$$E\int\!\!\int a_{n,h}^2(t,s)\,dt\,ds \le \frac{c_1}{n^2}E\|X_1\|^4 \tag{3.7}$$
with some constant $c_1$. Elementary arguments give
$$E\tilde C_{n,h}^2(t,s) \le \frac{1}{n}EX_1^2(t)EX_1^2(s). \tag{3.8}$$
By the Cauchy–Schwarz inequality for expected values, (3.5), (3.6) and (3.8), we have, with some constant $c_2$,
$$E\big|\tilde C_{n,h}(t,s)a_{n,h}(t,s)\big| \le \big(E\tilde C_{n,h}^2(t,s)\,Ea_{n,h}^2(t,s)\big)^{1/2} \le \frac{c_2}{n^{3/2}}\big(EX_1^2(t)EX_1^2(s)\big)^{1/2}\Big\{\big(EX_1^2(t)X_1^2(s)\big)^{1/2} + \big(EX_1^2(t)EX_1^2(s)\big)^{1/2}\Big\},$$
and by the Cauchy–Schwarz inequality for integrals and then for expected values we obtain
$$\int\!\!\int\big(EX_1^2(t)EX_1^2(s)\big)^{1/2}\big(EX_1^2(t)X_1^2(s)\big)^{1/2}\,dt\,ds \le \Big(\int\!\!\int EX_1^2(t)EX_1^2(s)\,dt\,ds\Big)^{1/2}\Big(\int\!\!\int EX_1^2(t)X_1^2(s)\,dt\,ds\Big)^{1/2} = E\|X_1\|^2\big(E\|X_1\|^4\big)^{1/2} \le E\|X_1\|^4,$$
where the last inequality is obtained by Hölder's inequality. Hence by (3.3) we conclude
$$\int\!\!\int E\big|\tilde C_{n,h}(t,s)a_{n,h}(t,s)\big|\,dt\,ds \le \frac{2c_2}{n^{3/2}}E\|X_1\|^4.$$
Thus we showed that
$$E\big|\hat V_{n,H} - \tilde V_{n,H}\big| \le c_3\frac{H}{n^{3/2}},$$
and therefore (3.1) follows from Assumption 1.1 via Markov's inequality. Next we prove that
$$\frac{n}{H^{1/2}}\big(\tilde V_{n,H} - V_{n,H}\big) = o_P(1), \quad\text{as } n \to \infty. \tag{3.9}$$
As in the proof of (3.1) we have
$$\big|\tilde C_{n,h}^2(t,s) - C_{n,h}^2(t,s)\big| \le 2\big|C_{n,h}(t,s)\bar a_{n,h}(t,s)\big| + \bar a_{n,h}^2(t,s) \quad\text{with}\quad \bar a_{n,h}(t,s) = \frac{1}{n}\sum_{i=n-h+1}^{n}X_i(t)X_{i+h}(s).$$
It is easy to see that
$$E\bar a_{n,h}^2(t,s) \le \frac{h}{n^2}EX_1^2(t)EX_1^2(s) \quad\text{and}\quad EC_{n,h}^2(t,s) \le \frac{1}{n}EX_1^2(t)EX_1^2(s).$$
Thus we get
$$E\big|\tilde V_{n,H} - V_{n,H}\big| \le c_1\sum_{h=1}^{H}\frac{h^{1/2}}{n^{3/2}},$$
so Assumption 1.1 and Markov's inequality imply (3.9). Lemma 3.1 is an immediate consequence of (3.1) and (3.9).

Next we note that for all $i = 1, 2, \ldots$
$$E\xi_{i,\ell} = 0, \qquad E\xi_{i,\ell}^2 = \lambda_\ell \qquad\text{and}\qquad E\xi_{i,\ell}\xi_{i,k} = 0 \ \text{if}\ k \ne \ell, \tag{3.10}$$
where $\xi_{i,\ell}$ is defined in (1.4). Using (1.4) we obtain
$$\int\!\!\int C_{n,h}^2(t,s)\,dt\,ds = \frac{1}{n^2}\sum_{i,j=1}^{n}\sum_{\ell,k=1}^{\infty}\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}.$$
Let $T > 1$ be an integer and define
$$Q_n(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\frac{1}{n}\sum_{i,j=1}^{n}\sum_{\ell,k=1}^{T}\big(\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k} - E\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}\big),$$
$$Q_{n,1}(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\frac{1}{n}\sum_{i,j=1}^{n}\sum_{\ell=1}^{T}\sum_{k=T+1}^{\infty}\big(\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k} - E\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}\big),$$
$$Q_{n,2}(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\frac{1}{n}\sum_{i,j=1}^{n}\sum_{\ell=T+1}^{\infty}\sum_{k=1}^{T}\big(\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k} - E\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}\big),$$
$$Q_{n,3}(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\frac{1}{n}\sum_{i,j=1}^{n}\sum_{\ell,k=T+1}^{\infty}\big(\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k} - E\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}\big).$$
If there is a $T_0$ such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{T_0} > \lambda_{T_0+1} = 0$, then $X_i(t)$ in (1.4) is a finite sum and $Q_{n,i}(T) = 0$ for all $i = 1, 2, 3$. So in this case Lemma 3.2 automatically holds, and the proofs are mathematically much simpler. Hence from now on we assume that $\lambda_j > 0$ for all $j = 1, 2, \ldots$.

Lemma 3.2. If H0 and Assumptions 1.1 and 1.2 hold, then for every $\varepsilon > 0$
$$\lim_{T\to\infty}\limsup_{n\to\infty}P\{|Q_{n,i}(T)| > \varepsilon\} = 0, \qquad i = 1, 2, 3. \tag{3.11}$$

Proof. It follows from (3.10) that
$$Q_{n,1}(T) = \frac{1}{H^{1/2}n}\sum_{h=1}^{H}\sum_{\ell=1}^{T}\sum_{k=T+1}^{\infty}\Bigg[\sum_{i=1}^{n}\big(\xi_{i,\ell}^2\xi_{i+h,k}^2 - \lambda_\ell\lambda_k\big) + 2\sum_{1\le i<j\le n}\xi_{i,\ell}\xi_{i+h,k}\xi_{j,\ell}\xi_{j+h,k}\Bigg] = Q_{n,1,1}(T) + 2Q_{n,1,2}(T).$$
Introducing
$$\zeta_{i+h} = \zeta_{i+h}(T) = \sum_{k=T+1}^{\infty}\xi_{i+h,k}^2, \qquad \eta_i = \eta_i(T) = \sum_{\ell=1}^{T}\xi_{i,\ell}^2 \qquad\text{and}\qquad w_{i,h} = \eta_i\zeta_{i+h} - E\eta_iE\zeta_{i+h},$$
we can write
$$Q_{n,1,1}(T) = \frac{1}{H^{1/2}n}\sum_{h=1}^{H}\sum_{i=1}^{n}w_{i,h}.$$
Clearly, $EQ_{n,1,1}(T) = 0$ and
$$EQ_{n,1,1}^2(T) = \frac{1}{Hn^2}\sum_{i,j=1}^{n}\sum_{h,k=1}^{H}Ew_{i,h}w_{j,k}.$$
Using the independence of the $X_i$'s, we observe that $Ew_{i,h}w_{j,k} = 0$, except in the following cases:
(i) $i \ne j$ and $i = j + k$: the number of such terms is bounded by $n^2H$ and $|Ew_{i,h}w_{j,k}| \le |E(\eta_1\zeta_1)E\eta_1E\zeta_1| + E(\eta_1\zeta_1)^2 + (E\eta_1E\zeta_1)^2$;
(ii) $i \ne j$ and $j = i + h$: as in (i), the number of such terms is less than $n^2H$ and $|Ew_{i,h}w_{j,k}| \le |E(\eta_1\zeta_1)E\eta_1E\zeta_1| + E(\eta_1\zeta_1)^2 + (E\eta_1E\zeta_1)^2$;
(iii) $i \ne j$ and $i + h = j + k$: the number of such terms is again less than $n^2H$ and $|Ew_{i,h}w_{j,k}| \le (E\eta_1)^2E\zeta_1^2 + (E\eta_1E\zeta_1)^2$;
(iv) $i = j$: the number of such terms is less than $nH^2$ and $|Ew_{i,h}w_{j,k}| \le E\eta_1^2E\zeta_1^2 + E\eta_1^2(E\zeta_1)^2 + (E\eta_1E\zeta_1)^2$.
Thus we get, with some constant $c_1$,
$$EQ_{n,1,1}^2(T) \le c_1Q^*(T), \tag{3.12}$$
where
$$Q^*(T) = |E(\eta_1\zeta_1)E\eta_1E\zeta_1| + E(\eta_1\zeta_1)^2 + (E\eta_1)^2E\zeta_1^2 + E\eta_1^2(E\zeta_1)^2 + (E\eta_1E\zeta_1)^2.$$
With
$$\bar\zeta_{i,j,h} = \bar\zeta_{i,j,h}(T) = \sum_{k=T+1}^{\infty}\xi_{i+h,k}\xi_{j+h,k} \qquad\text{and}\qquad \bar\eta_{i,j} = \bar\eta_{i,j}(T) = \sum_{\ell=1}^{T}\xi_{i,\ell}\xi_{j,\ell},$$
we can write
$$Q_{n,1,2}(T) = \frac{1}{H^{1/2}n}\sum_{h=1}^{H}\sum_{1\le i<j\le n}\bar\zeta_{i,j,h}\bar\eta_{i,j}.$$
By the independence of the $X_i$'s we have $EQ_{n,1,2}(T) = 0$ and
$$EQ_{n,1,2}^2(T) = \frac{1}{Hn^2}\sum_{h,k=1}^{H}\sum_{1\le i_1<j_1\le n}\ \sum_{1\le i_2<j_2\le n}E\big[\bar\zeta_{i_1,j_1,h}\bar\eta_{i_1,j_1}\bar\zeta_{i_2,j_2,k}\bar\eta_{i_2,j_2}\big].$$
Due to the independence of the $X_i$'s, $E[\bar\zeta_{i_1,j_1,h}\bar\eta_{i_1,j_1}\bar\zeta_{i_2,j_2,k}\bar\eta_{i_2,j_2}] = 0$ except in the case when $i_1 = i_2$, $j_1 = j_2$ and $h = k$. In this case
$$E\big[\bar\zeta_{i_1,j_1,h}\bar\eta_{i_1,j_1}\bar\zeta_{i_2,j_2,k}\bar\eta_{i_2,j_2}\big] = E\bar\zeta_{1,2,0}^2\,E\bar\eta_{1,2}^2.$$
Thus we have
$$EQ_{n,1,2}^2(T) \le E\bar\zeta_{1,2,0}^2\,E\bar\eta_{1,2}^2.$$
Next we observe that by (3.10)
$$E\bar\zeta_{1,2,0}^2 = \sum_{k_1,k_2=T+1}^{\infty}E\xi_{1,k_1}\xi_{2,k_1}\xi_{1,k_2}\xi_{2,k_2} = \sum_{k=T+1}^{\infty}\big(E\xi_{1,k}^2\big)^2 = \sum_{k=T+1}^{\infty}\lambda_k^2,$$
and similarly
$$E\bar\eta_{1,2}^2 = \sum_{k=1}^{T}\lambda_k^2.$$
It follows that
$$\lim_{T\to\infty}\limsup_{n\to\infty}EQ_{n,1,2}^2(T) = 0. \tag{3.13}$$
Using (1.4) and (3.10) we get
$$E\|X_1\|^4 = E\Bigg(\int\Big(\sum_{\ell=1}^{\infty}\xi_{1,\ell}\varphi_\ell(t)\Big)^2dt\Bigg)^2 = E\Big(\sum_{\ell=1}^{\infty}\xi_{1,\ell}^2\Big)^2 = \sum_{\ell=1}^{\infty}E\xi_{1,\ell}^4 + \sum_{1\le\ell\ne k<\infty}\lambda_\ell\lambda_k, \tag{3.14}$$
and therefore Assumption 1.2 yields
$$\sum_{\ell=1}^{\infty}E\xi_{1,\ell}^4 < \infty. \tag{3.15}$$
Using the definition of $Q^*(T)$ and (3.15), we conclude that $Q^*(T) \to 0$ as $T \to \infty$, and therefore by (3.12)
$$\lim_{T\to\infty}\limsup_{n\to\infty}EQ_{n,1,1}^2(T) = 0. \tag{3.16}$$
Now (3.11) follows from (3.13) and (3.16) via Chebyshev's inequality if $i = 1$. The other two cases can be proved in the same way, so the details are omitted.

To study the asymptotic behavior of $Q_n(T)$ we note that
$$Q_n(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\sum_{\ell,k=1}^{T}\Bigg[\Big(\frac{1}{n^{1/2}}\sum_{i=1}^{n}\xi_{i,\ell}\xi_{i+h,k}\Big)^2 - \lambda_\ell\lambda_k\Bigg].$$
The goal of the next lemmas is to provide an approximation with rates for the vector of the partial sums $n^{-1/2}\sum_{i=1}^{n}\xi_{i,\ell}\xi_{i+h,k}$, $1 \le \ell, k \le T$, $1 \le h \le H$. Next we introduce the standardized variables
$$\chi_{i,h;\ell,k} = \frac{\xi_{i,\ell}\xi_{i+h,k}}{(\lambda_\ell\lambda_k)^{1/2}}$$
and the column vector $\chi_i \in \mathbb{R}^d$, $d = HT^2$, whose coordinates are $\chi_{i,h;\ell,k}$ ordered lexicographically in $h, \ell, k$.

Lemma 3.3. If H0 and Assumptions 1.1 and 1.2 hold, then
$$E\chi_i = 0, \tag{3.17}$$
$$E\chi_i\chi_i^T = I_d, \tag{3.18}$$
where $I_d$ is the $d \times d$ identity matrix,
$$E\Big(\sum_{i=1}^{m}\chi_i\Big)\Big(\sum_{i=1}^{m}\chi_i\Big)^T = mI_d, \tag{3.19}$$
and for all $m \ge H$
$$E\Big(\sum_{i=1}^{m}\chi_{i,h;\ell,k}\Big)^4 \le \frac{c}{\lambda_T^4}H^2m^2, \tag{3.20}$$
where $c$ only depends on $E\|X_1\|^4$.

Proof. The independence of $\xi_{i,\ell}$ and $\xi_{i+h,k}$ gives (3.17), since $E\xi_{i,\ell} = 0$. It follows from (3.10) and independence that $E\chi_{i,h;\ell,k}^2 = 1$ and that the coordinates of $\chi_i$ are uncorrelated, proving (3.18). Since $\chi_1, \chi_2, \ldots$ are also uncorrelated random vectors, (3.18) implies (3.19). Let
$$Q(r) = Q(r;h) = \{i: 1 \le i \le m,\ \mathrm{mod}(i, h+1) = r\}, \qquad r = 0, 1, \ldots, h,$$
and decompose
$$\sum_{i=1}^{m}\chi_{i,h;\ell,k} = \sum_{r=0}^{h}\sum_{i\in Q(r)}\chi_{i,h;\ell,k}.$$
Since for every $r$, $\sum_{i\in Q(r)}\chi_{i,h;\ell,k}$ is a sum of independent and identically distributed random variables, the Rosenthal inequality (cf. [35, p. 59]) yields
$$E\Big(\sum_{i\in Q(r)}\chi_{i,h;\ell,k}\Big)^4 \le c_1\Bigg(\frac{m}{h+1}E\chi_{1,h;\ell,k}^4 + \Big(\frac{m}{h+1}\Big)^2\Bigg).$$
Also, by the independence of the $X_i$'s, we obtain from (3.14) that
$$E\chi_{1,h;\ell,k}^4 = \frac{E\xi_{1,\ell}^4E\xi_{1,k}^4}{\lambda_\ell^2\lambda_k^2} \le \frac{(E\|X_1\|^4)^2}{\lambda_\ell^2\lambda_k^2},$$
and therefore
$$E\Big(\sum_{i\in Q(r)}\chi_{i,h;\ell,k}\Big)^4 \le c_2\frac{m^2}{\lambda_T^4(h+1)^2}.$$
By Hölder's inequality
$$E\Big(\sum_{i=1}^{m}\chi_{i,h;\ell,k}\Big)^4 \le (h+1)^3\sum_{r=0}^{h}E\Big(\sum_{i\in Q(r)}\chi_{i,h;\ell,k}\Big)^4 \le \frac{c_3}{\lambda_T^4}h^2m^2,$$
and we obtain (3.20) immediately.

Let $|x|$ denote the Euclidean norm of vectors and matrices.

Lemma 3.4. Let $\delta_1, \delta_2, \ldots, \delta_m$ be independent and identically distributed random vectors in $\mathbb{R}^d$ with $E\delta_1 = 0$, $E\delta_1\delta_1^T = I_d$ and $E|\delta_1|^3 < \infty$. Then for all $m$ we can define $\gamma_m$, a standard normal vector in $\mathbb{R}^d$, such that
$$P\Big\{\Big|m^{-1/2}\sum_{i=1}^{m}\delta_i - \gamma_m\Big| \ge cm^{-1/8}d^{1/4}\big(E|\delta_1|^3 + E|\gamma_m|^3\big)^{1/4}\Big\} \le cm^{-1/8}d^{1/4}\big(E|\delta_1|^3 + E|\gamma_m|^3\big)^{1/4},$$
where $c$ is an absolute constant.

Proof. The result is an immediate consequence of Theorem 6.4.1 in [37, p. 207] and the corollary to Theorem 11 in [38]. For further results on how the rate of convergence in the central limit theorem depends on the dimension of the summands we refer to Bentkus [3].

We note that
$$\big(E|\delta_1|^3 + E|\gamma_m|^3\big)^{1/4} \le \big(E|\delta_1|^3\big)^{1/4} + \big(E|\gamma_m|^3\big)^{1/4} \le \big(E|\delta_1|^4\big)^{3/16} + \big(E|\gamma_m|^4\big)^{3/16}, \tag{3.21}$$
and since $|\gamma_m|^2$ has a $\chi^2$ distribution with $d$ degrees of freedom,
$$E|\gamma_m|^4 \le c_1d^2. \tag{3.22}$$
Lemma 3.5. If H0 and Assumptions 1.1 and 1.2 hold, then for every $n$ we can define a standard normal vector $\gamma_n \in \mathbb{R}^d$ such that
$$P\Big\{\Big|n^{-1/2}\sum_{i=1}^{n}\chi_i - \gamma_n\Big| \ge cn^{-2/21}H^{11/8}T^2/\lambda_T^{3/4}\Big\} \le cn^{-2/21}H^{11/8}T^2/\lambda_T^{3/4},$$
where $c$ only depends on $E\|X_1\|^4$.

Proof. Since $\chi_1, \ldots, \chi_n$ are not independent random variables, Lemma 3.4 cannot be used immediately. However, the sequence $\chi_i$, $i \ge 1$, is $H$-dependent. Let $N > H$ and define the vectors
$$\delta_i = \sum_{j\in U(i)}\chi_j \qquad\text{and}\qquad \delta_{i,H} = \sum_{j\in J(i)}\chi_j, \tag{3.23}$$
where
$$U(i) = \{j: 1 \le j \le n,\ (N+H)(i-1)+1 \le j \le (N+H)(i-1)+N\}$$
and
$$J(i) = \{j: 1 \le j \le n,\ (N+H)(i-1)+N+1 \le j \le (N+H)i\}.$$
It is easy to see that the number of random vectors defined in (3.23) is not more than $K+1$, where $K = \lfloor n/(N+H)\rfloor$. Also, $\delta_i$, $1 \le i \le K+1$, are independent and $\delta_i$, $1 \le i \le K$, are identically distributed. Similarly, $\delta_{i,H}$, $1 \le i \le K+1$, are independent and, with the possible exception of $\delta_{K+1,H}$, are identically distributed. Clearly,
$$\sum_{i=1}^{n}\chi_i = \sum_{i=1}^{K}\delta_i + \sum_{i=1}^{K+1}\delta_{i,H} + \delta_{K+1}. \tag{3.24}$$
For every $x > 0$ we have by Markov's inequality that
$$P\Big\{n^{-1/2}\Big|\sum_{i=1}^{K+1}\delta_{i,H} + \delta_{K+1}\Big| > x\Big\} \le \frac{1}{x^4n^2}E\Big|\sum_{i=1}^{K+1}\delta_{i,H} + \delta_{K+1}\Big|^4 \le \frac{2^4}{x^4n^2}\Bigg(E\Big|\sum_{i=1}^{K+1}\delta_{i,H}\Big|^4 + E|\delta_{K+1}|^4\Bigg). \tag{3.25}$$
It follows from (3.20) that
$$E|\delta_{K+1}|^4 \le c_1d^4H^2N^2/\lambda_T^4. \tag{3.26}$$
We use the notation $\delta_{i,H}(j)$ for the $j$th coordinate of $\delta_{i,H}$. It is easy to see that
$$E\Big|\sum_{i=1}^{K+1}\delta_{i,H}\Big|^4 \le E\Bigg(\sum_{j=1}^{d}\Big|\sum_{i=1}^{K+1}\delta_{i,H}(j)\Big|\Bigg)^4 \le d^3\sum_{j=1}^{d}E\Big(\sum_{i=1}^{K+1}\delta_{i,H}(j)\Big)^4.$$
Using Rosenthal's inequality (cf. [35, p. 59]) together with Lemma 3.3, we obtain
$$E\Big(\sum_{i=1}^{K+1}\delta_{i,H}(j)\Big)^4 \le c_2\big(KH^4/\lambda_T^4 + (KH)^2\big) \le c_2K^2H^4/\lambda_T^4.$$
Hence (3.25) and (3.26) imply
$$P\Big\{n^{-1/2}\Big|\sum_{i=1}^{K+1}\delta_{i,H} + \delta_{K+1}\Big| > x\Big\} \le \frac{c_3d^4}{x^4n^2\lambda_T^4}\big(H^2N^2 + K^2H^4\big). \tag{3.27}$$
The vectors $N^{-1/2}\delta_i \in \mathbb{R}^d$, $1 \le i \le K$, satisfy all the conditions of Lemma 3.4, since they are independent and identically distributed and, according to Lemma 3.3, $EN^{-1/2}\delta_i = 0$ and $E(N^{-1/2}\delta_i)(N^{-1/2}\delta_i)^T = I_d$. So in light of (3.22) we only need to compute $E|N^{-1/2}\delta_i|^4$. It follows from (3.20) that
$$E|N^{-1/2}\delta_i|^4 \le c_4d^4H^2/\lambda_T^4,$$
and therefore for any standard normal random vector $\gamma_m \in \mathbb{R}^d$
$$\big(E|N^{-1/2}\delta_1|^4\big)^{3/16} + \big(E|\gamma_m|^4\big)^{3/16} \le c_5d^{3/4}H^{3/8}/\lambda_T^{3/4}.$$
Applying Lemma 3.4, we can find a standard normal random vector $\bar\gamma_K \in \mathbb{R}^d$ such that
$$P\Big\{\Big|K^{-1/2}\sum_{i=1}^{K}N^{-1/2}\delta_i - \bar\gamma_K\Big| \ge c_6K^{-1/8}dH^{3/8}/\lambda_T^{3/4}\Big\} \le c_6K^{-1/8}dH^{3/8}/\lambda_T^{3/4}. \tag{3.28}$$
Let $N = \lfloor n^{5/21}\rfloor$ in the definitions of $\delta_i$ and $\delta_{i,H}$. By Assumption 1.1, $K \approx n^{16/21}$. So using (3.24), (3.27) with $x = n^{-2/21}dH^{3/8}/\lambda_T^{3/4}$, and (3.28), we conclude
$$P\Bigg\{\Big|n^{-1/2}\sum_{i=1}^{n}\chi_i - \frac{(NK)^{1/2}}{n^{1/2}}\bar\gamma_K\Big| \ge c_7n^{-2/21}dH^{3/8}/\lambda_T^{3/4}\Bigg\} \le c_7n^{-2/21}dH^{3/8}/\lambda_T^{3/4}. \tag{3.29}$$
Elementary arguments show that
$$\Big|\frac{(NK)^{1/2}}{n^{1/2}} - 1\Big| = O\big(Hn^{-5/21}\big),$$
and since $|\bar\gamma_K|^2$ has a $\chi^2$ distribution with $d$ degrees of freedom, there is a constant $c_8$ such that $P\{|\bar\gamma_K| > c_8d^{1/2}\log n\} \le c_8d/n^2$. Hence the lemma follows from (3.29) with the choice $\gamma_n = \bar\gamma_K$ and $d = HT^2$.

Lemma 3.6. If H0 and Assumptions 1.1 and 1.2 hold, then for every $n$ we can define independent standard normal random variables $\gamma_n(h;\ell,k)$, $1 \le h \le H$, $1 \le k, \ell \le T$, such that for every $T > 0$
$$|Q_n(T) - Z_n^*(T)| = o_P(1), \quad\text{as } n \to \infty,$$
where
$$Z_n^*(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\sum_{\ell,k=1}^{T}\big(\gamma_n^2(h;\ell,k) - 1\big)\lambda_\ell\lambda_k.$$

Proof. The result is an immediate consequence of Lemma 3.5 and Assumption 1.1.
Let $\gamma(h;\ell,k)$, $1 \le k, \ell, h < \infty$, be independent standard normal random variables and define
$$\bar Z_n(T) = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\sum_{\ell,k=1}^{T}\big(\gamma^2(h;\ell,k) - 1\big)\lambda_\ell\lambda_k \qquad\text{and}\qquad Z_n = \frac{1}{H^{1/2}}\sum_{h=1}^{H}\sum_{\ell,k=1}^{\infty}\big(\gamma^2(h;\ell,k) - 1\big)\lambda_\ell\lambda_k.$$
(We note that (1.3) implies that $Z_n$ is a well defined random variable.) It is clear that for every $n$ and $T$
$$Z_n^*(T) \stackrel{\mathcal D}{=} \bar Z_n(T).$$

Lemma 3.7. If H0 and Assumptions 1.1 and 1.2 hold, then for every $\varepsilon > 0$ we have
$$\lim_{T\to\infty}\limsup_{n\to\infty}P\{|\bar Z_n(T) - Z_n| > \varepsilon\} = 0.$$

Proof. The result can be established along the lines of the proof of Lemma 3.2, but the arguments simplify since $\bar Z_n(T)$ as well as $Z_n$ are weighted sums of independent $\chi^2$ random variables with one degree of freedom. We omit the details.

Lemma 3.8. If H0 and Assumptions 1.1 and 1.2 hold, then
$$\frac{Z_n}{(2\sigma^2)^{1/2}} \xrightarrow{\ \mathcal D\ } N(0,1),$$
where $N(0,1)$ stands for a standard normal random variable and $\sigma^2$ is defined in Theorem 1.1.

Proof. The variables $\sum_{\ell,k=1}^{\infty}(\gamma^2(h;\ell,k) - 1)\lambda_\ell\lambda_k$, $1 \le h \le H$, are independent and identically distributed with zero mean and variance $2\sigma^2$, so the result follows immediately from the central limit theorem.

Proof of Theorem 1.1. The proof is done in several steps. It follows from Lemma 3.1 that it is enough to consider $V_{n,H}$. Next we use the Karhunen–Loève expansion (1.4) and express $V_{n,H}$ as an infinite sum of the projection coefficients $\xi_{i,\ell} = \langle X_i, \varphi_\ell\rangle$. Lemma 3.2 shows that it is sufficient to establish the normality of $Q_n(T)$, i.e. that in the infinite series representation of $V_{n,H}$ we need to keep only finitely many terms. Lemmas 3.6 and 3.7 approximate $Q_n(T)$ with a weighted average of independent $\chi^2$ random variables with one degree of freedom. The final step of the proof is the central limit theorem given in Lemma 3.8.

Proof of Theorem 1.2. It follows from the definition of $\hat C_{n,0}$ that
$$\int \hat C_{n,0}(t,t)\,dt = \frac{1}{n}\sum_{i=1}^{n}\int X_i^2(t)\,dt - \int \bar X_n^2(t)\,dt.$$
Since by Assumption 1.2 we have $E\big(\int X_i^2(t)\,dt\big)^2 < \infty$, the central limit theorem yields
$$\int \hat C_{n,0}(t,t)\,dt - \int C_0(t,t)\,dt = O_P(n^{-1/2}).$$
Elementary calculations give
$$E\int \bar X_n^2(t)\,dt = \frac{1}{n}\int C_0(t,t)\,dt,$$
so by Markov's inequality we have $\int \bar X_n^2(t)\,dt = O_P(1/n)$. Thus we get immediately that $|\hat\mu_n - \mu| = O_P(n^{-1/2})$. Next we observe that $X_i(t)X_i(s)$, $i \ge 1$, is a sequence of independent and identically distributed random variables in $L^2([0,1]^2)$, and by Assumption 1.2
$$E\int\!\!\int\big(\hat C_{n,0}(t,s) - C_0(t,s)\big)^2\,dt\,ds = O(1/n).$$
Hence $\hat\sigma_n^2 \xrightarrow{P} \sigma^2$ follows from Markov's inequality. By Assumption 1.1 we have $H/n \to 0$, so Theorem 1.1 implies the result.

4. Proof of Theorem 1.3 and the outline of the proof of Remark 1.2

Proof of Theorem 1.3. We can assume that $H \ge h_0$, and clearly
$$\frac{1}{(2H\hat\sigma_n^2)^{1/2}}\big(n\hat V_{n,H} - H\hat\mu_n\big) \ge \frac{1}{(2H\hat\sigma_n^2)^{1/2}}\Big(n\int\!\!\int \hat C_{n,h_0}^2(t,s)\,dt\,ds - H\hat\mu_n\Big).$$
We can assume without loss of generality that $EX_1(t) = 0$. Elementary arguments yield
$$\hat\mu_n = \Bigg(\frac{1}{n}\sum_{i=1}^{n}\int X_i^2(t)\,dt - \int \bar X_n^2(t)\,dt\Bigg)^2.$$
Under the assumptions of Theorem 1.3, $\int X_i^2(t)\,dt$, $i \ge 1$, is a stationary and ergodic sequence, so by the ergodic theorem (cf. [8, p. 118])
$$\frac{1}{n}\sum_{i=1}^{n}\int X_i^2(t)\,dt \to E\int X_1^2(t)\,dt \quad\text{a.s.}$$
Next we use Theorem A.1 with $Y_i(t) = X_i(t)$ to conclude
$$\int \bar X_n^2(t)\,dt \to 0 \quad\text{a.s.} \tag{4.1}$$
and therefore
$$\hat\mu_n \to \mu \quad\text{a.s.} \tag{4.2}$$
Similarly, writing $\hat\sigma_n^2 = \big(\int\!\!\int \hat C_{n,0}^2(t,s)\,dt\,ds\big)^2$, we have
$$\int\!\!\int \hat C_{n,0}^2(t,s)\,dt\,ds = \int\!\!\int\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(t)X_i(s)\Big)^2\,dt\,ds - 2\int\!\!\int\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(t)X_i(s)\Big)\bar X_n(t)\bar X_n(s)\,dt\,ds + \int\!\!\int\big(\bar X_n(t)\bar X_n(s)\big)^2\,dt\,ds.$$
Next we use Theorem A.1 with $Y_i(t,s) = X_i(t)X_i(s)$ to conclude
$$\int\!\!\int\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(t)X_i(s)\Big)^2\,dt\,ds \to \int\!\!\int\big(EX_1(t)X_1(s)\big)^2\,dt\,ds \quad\text{a.s.}$$
Hence the Cauchy–Schwarz inequality together with (4.1) implies
$$\Bigg|\int\!\!\int\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(t)X_i(s)\Big)\bar X_n(t)\bar X_n(s)\,dt\,ds\Bigg| \le \Bigg(\int\!\!\int\Big(\frac{1}{n}\sum_{i=1}^{n}X_i(t)X_i(s)\Big)^2\,dt\,ds\Bigg)^{1/2}\int \bar X_n^2(t)\,dt \to 0 \quad\text{a.s.}$$
Hence we get
$$\hat\sigma_n^2 \to \sigma^2 \quad\text{a.s.} \tag{4.3}$$
Repeating the arguments leading to (4.3), one can easily verify that
$$\int\!\!\int \hat C_{n,h_0}^2(t,s)\,dt\,ds \to \int\!\!\int C_{h_0}^2(t,s)\,dt\,ds > 0 \quad\text{a.s.}$$
Now Theorem 1.3 follows from Assumption 1.4.

Outline of the proof of Remark 1.2. One can verify that $\hat C_{n,h}(t,s)$ can be approximated by
$$\hat C_{n,h}^*(t,s) = \frac{1}{n}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \frac{\gamma_n}{n}\sum_{i=1}^{n}Y_i(t)Z_{i+h}(s) + \frac{\gamma_n}{n}\sum_{i=1}^{n}Z_i(t)Y_{i+h}(s) + \frac{\gamma_n^2}{n}\sum_{i=1}^{n}Z_i(t)Z_{i+h}(s),$$
so
$$n\hat V_{n,H} = \sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \frac{\gamma_n}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Z_{i+h}(s) + \frac{\gamma_n}{n^{1/2}}\sum_{i=1}^{n}Z_i(t)Y_{i+h}(s) + \frac{\gamma_n^2}{n^{1/2}}\sum_{i=1}^{n}Z_i(t)Z_{i+h}(s)\Bigg)^2\,dt\,ds$$
$$= \sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \frac{\gamma_n^2}{n^{1/2}}\sum_{i=1}^{n}Z_i(t)Z_{i+h}(s)\Bigg)^2\,dt\,ds + o_P(H^{1/2}),$$
since $H^{1/2}\gamma_n \to 0$. For every $h$, $\{Z_iZ_{i+h}\}$ satisfies the central limit theorem in $L^2$ according to Berkes et al. [5], thus
$$\frac{1}{n}\sum_{i=1}^{n}Z_i(t)Z_{i+h}(s) - A_h(t,s) = O_P(n^{-1/2}),$$
where the square integrable function $A_h(t,s)$ is defined in (1.7). By the Cauchy–Schwarz inequality we have
$$\sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \frac{\gamma_n^2}{n^{1/2}}\sum_{i=1}^{n}Z_i(t)Z_{i+h}(s)\Bigg)^2\,dt\,ds = \sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \gamma_n^2n^{1/2}A_h(t,s)\Bigg)^2\,dt\,ds + O_P\big(\gamma_n^4Hn^{1/2}\big).$$
Due to the $\{4+\delta\}$-decomposability of $\{Z_i\}$,
$$\sum_{h=1}^{\infty}\int\!\!\int A_h^2(t,s)\,dt\,ds < \infty$$
holds, and therefore
$$\sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s) + \gamma_n^2n^{1/2}A_h(t,s)\Bigg)^2\,dt\,ds$$
$$= \sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s)\Bigg)^2\,dt\,ds + 2\gamma_n^2\sum_{h=1}^{H}\int\!\!\int\Bigg(\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s)\Bigg)A_h(t,s)\,dt\,ds + \gamma_n^4n\sum_{h=1}^{H}\int\!\!\int A_h^2(t,s)\,dt\,ds$$
$$= \sum_{h=1}^{H}\int\!\!\int\Bigg(\frac{1}{n^{1/2}}\sum_{i=1}^{n}Y_i(t)Y_{i+h}(s)\Bigg)^2\,dt\,ds + \gamma_n^4n\sum_{h=1}^{H}\int\!\!\int A_h^2(t,s)\,dt\,ds + O_P\big(\gamma_n^2n^{1/2}\big).$$
The remark now follows from Theorem 1.1.

Appendix. Ergodic theorem in Hilbert spaces

In this section we assume that $Y_1(t), Y_2(t), \ldots$ are random functions defined on a measurable set $\mathcal C \subseteq \mathbb{R}^d$ satisfying the following conditions:

Assumption A.1. $Y_1, Y_2, \ldots$ is a stationary and ergodic sequence of square integrable random functions,

and

Assumption A.2. $E\|Y_1\| < \infty$.

Theorem A.1. If Assumptions A.1 and A.2 hold, then, as $n \to \infty$,
$$\Big\|\frac{1}{n}\sum_{i=1}^{n}Y_i\Big\| \to \|EY_1\| \quad\text{a.s.}$$

Proof. Since
$$\Bigg|\ \Big\|\frac{1}{n}\sum_{i=1}^{n}Y_i\Big\| - \|EY_1\|\ \Bigg| \le \Big\|\frac{1}{n}\sum_{i=1}^{n}(Y_i - EY_i)\Big\|,$$
we can assume without loss of generality that $EY_i = 0$. Let $v_\ell$, $1 \le \ell < \infty$, be a basis for the square integrable functions defined on $\mathcal C$. Then we have
$$Y_i(t) = \sum_{\ell=1}^{\infty}\langle Y_i, v_\ell\rangle v_\ell(t) \qquad\text{and}\qquad \|Y_i\|^2 = \sum_{\ell=1}^{\infty}\langle Y_i, v_\ell\rangle^2.$$
According to Assumption A.2,
$$E\Big(\sum_{\ell=1}^{\infty}\langle Y_1, v_\ell\rangle^2\Big)^{1/2} < \infty,$$
and therefore
$$\lim_{K\to\infty}E\Big(\sum_{\ell=K+1}^{\infty}\langle Y_1, v_\ell\rangle^2\Big)^{1/2} = 0. \tag{A.1}$$
For every fixed $K \ge 1$ we define
$$Y_{i,K}(t) = \sum_{\ell=1}^{K}\langle Y_i, v_\ell\rangle v_\ell(t), \qquad 1 \le i < \infty.$$
By the triangle inequality we have
$$\Big\|\sum_{i=1}^{n}Y_i\Big\| \le \Big\|\sum_{i=1}^{n}(Y_i - Y_{i,K})\Big\| + \Big\|\sum_{i=1}^{n}Y_{i,K}\Big\| \le \sum_{i=1}^{n}\|Y_i - Y_{i,K}\| + \Big\|\sum_{i=1}^{n}Y_{i,K}\Big\|.$$
It follows from Assumption A.1 that $\|Y_i - Y_{i,K}\|$, $1 \le i < \infty$, is a stationary and ergodic sequence, and by Assumption A.2, $E\|Y_1 - Y_{1,K}\| < \infty$. So using the ergodic theorem for random variables (cf. [8, p. 118]) we conclude that
$$\frac{1}{n}\sum_{i=1}^{n}\|Y_i - Y_{i,K}\| \to E\|Y_1 - Y_{1,K}\| \quad\text{a.s.} \tag{A.2}$$
Let $\varepsilon > 0$. Putting together (A.1) and (A.2), we can find an integer $K = K(\varepsilon)$ such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\|Y_i - Y_{i,K}\| \le \varepsilon \quad\text{a.s.} \tag{A.3}$$
For every fixed $K$ we can write
$$\sum_{i=1}^{n}Y_{i,K} = \sum_{\ell=1}^{K}\Big(\sum_{i=1}^{n}\langle Y_i, v_\ell\rangle\Big)v_\ell(t),$$
and therefore
$$\Big\|\sum_{i=1}^{n}Y_{i,K}\Big\|^2 = \sum_{\ell=1}^{K}\Big(\sum_{i=1}^{n}\langle Y_i, v_\ell\rangle\Big)^2.$$
For every $\ell \ge 1$, the sequence $\langle Y_i, v_\ell\rangle$, $1 \le i < \infty$, is stationary and ergodic, and by Assumption A.2, $E|\langle Y_1, v_\ell\rangle| \le E\|Y_1\| < \infty$. Hence $EY_1(t) = 0$ implies $E\langle Y_1, v_\ell\rangle = \int_{\mathcal C}EY_1(t)v_\ell(t)\,dt = 0$. So using again the ergodic theorem in [8, p. 118], we get for any $\ell$ that
$$\frac{1}{n}\sum_{i=1}^{n}\langle Y_i, v_\ell\rangle \to 0 \quad\text{a.s.},$$
resulting in
$$\Big\|\frac{1}{n}\sum_{i=1}^{n}Y_{i,K}\Big\| \to 0 \quad\text{a.s.} \tag{A.4}$$
for every $K \ge 1$. It follows from (A.3) and (A.4) that
$$\limsup_{n\to\infty}\Big\|\frac{1}{n}\sum_{i=1}^{n}Y_i\Big\| \le \varepsilon \quad\text{a.s.}$$
for any $\varepsilon > 0$, which completes the proof of Theorem A.1.
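Theorem A.1 is easy to illustrate numerically. The sketch below is our own (Brownian-bridge curves on a grid, so that $EY_1 = 0$): it computes $\|n^{-1}\sum_{i=1}^{n}Y_i\|$, which should shrink toward $\|EY_1\| = 0$ as $n$ grows.

```python
import numpy as np

def norm_of_mean(n, m=200, seed=0):
    """Monte Carlo illustration of Theorem A.1 for i.i.d. Brownian bridges:
    return the L2 norm of the sample mean curve on an m-point grid of [0,1]."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, m + 1) / m
    inc = rng.standard_normal((n, m)) / np.sqrt(m)
    W = np.cumsum(inc, axis=1)                   # Brownian motions on the grid
    Y = W - np.outer(W[:, -1], grid)             # Brownian bridges: B(t) = W(t) - t W(1)
    mean_curve = Y.mean(axis=0)
    return np.sqrt(np.sum(mean_curve ** 2) / m)  # L2 norm via Riemann sum

# norm_of_mean(100) is typically several times larger than norm_of_mean(10000).
```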
References

[1] T. Ando, S. Imoto, S. Miyano, Functional data analysis of the dynamics of gene regulatory networks, in: Knowledge Exploration in Life Science Informatics, Lecture Notes in Computer Science, vol. 3303, Springer, 2004, pp. 69–83.
[2] A. Antoniadis, E. Paparoditis, T. Sapatinas, A functional wavelet-kernel approach for time series prediction, Journal of the Royal Statistical Society Series B 68 (2006) 837–857.
[3] V. Bentkus, On the dependence of the Berry–Esseen bound on dimension, Journal of Statistical Planning and Inference 113 (2003) 385–402.
[4] I. Berkes, R. Gabrys, L. Horváth, P. Kokoszka, Detecting changes in the mean of functional observations, Journal of the Royal Statistical Society Series B 70 (2009) 927–946.
[5] I. Berkes, L. Horváth, G. Rice, Weak invariance principles for sums of dependent random functions, Stochastic Processes and their Applications 123 (2013) 385–403.
[6] D. Bosq, Linear Processes in Function Spaces, Springer Verlag, 2000.
[7] G.E.P. Box, D.A. Pierce, Distribution of residual autocorrelations in autoregressive-integrated moving average time series models, Journal of the American Statistical Association 65 (1970) 1509–1526.
[8] L. Breiman, Probability, Addison-Wesley, 1968.
[9] P.J. Brockwell, R.A. Davis, Time Series: Theory and Methods, Springer, New York, 1987.
[10] L. Canal, A normal approximation for the chi-square distribution, Computational Statistics and Data Analysis 48 (2005) 803–808.
[11] J.R. Carey, P. Liedo, L. Harshman, H.-G. Müller, J.-L. Wang, Life history response of Mediterranean fruit flies to dietary restrictions, Aging Cell 1 (2002) 140–148.
[12] R.V. Chitturi, Distribution of multivariate white noise autocorrelation, Journal of the American Statistical Association 71 (1976) 223–226.
[13] R.A. Fisher, On the interpretation of χ² from contingency tables and calculation of P, Journal of the Royal Statistical Society Series A 85 (1922) 87–94.
[14] C. Francq, H. Raïssi, Multivariate portmanteau test for autoregressive models with uncorrelated but nonindependent errors, Journal of Time Series Analysis 28 (2006) 454–470.
[15] S. Fremdt, L. Horváth, P. Kokoszka, J. Steinebach, Testing the equality of covariance operators in functional samples, Scandinavian Journal of Statistics 40 (2013) 138–152.
[16] R. Gabrys, L. Horváth, P. Kokoszka, Tests for error correlation in the functional linear model, Journal of the American Statistical Association 105 (2010) 1113–1125.
[17] R. Gabrys, P. Kokoszka, Portmanteau test of independence for functional observations, Journal of the American Statistical Association 102 (2007) 1338–1348.
[18] I. Gohberg, S. Goldberg, M.A. Kaashoek, Classes of Linear Operators, Operator Theory: Advances and Applications, vol. 49, Birkhäuser, 1990.
[19] S. Hörmann, P. Kokoszka, Weakly dependent functional data, Annals of Statistics 38 (2010) 1845–1884.
[20] L. Horváth, P. Kokoszka, Inference for Functional Data with Applications, Springer, New York, 2012.
[21] L. Horváth, P. Kokoszka, R. Reeder, Estimation of the mean of functional time series and a two-sample problem, Journal of the Royal Statistical Society Series B 75 (2013) 103–122.
[22] J.R.M. Hosking, The multivariate portmanteau statistic, Journal of the American Statistical Association 75 (1980) 602–608.
[23] V. Kargin, A. Onatski, Curve forecasting by functional autoregression, Journal of Multivariate Analysis 99 (2008) 2508–2526.
[24] W.K. Li, Diagnostic Checks in Time Series, Chapman and Hall/CRC, New York, 2004.
[25] W.K. Li, A.I. McLeod, Distribution of the residual autocorrelations in multivariate ARMA time series models, Journal of the Royal Statistical Society Series B 43 (1981) 231–239.
[26] G. Ljung, G. Box, On a measure of lack of fit in time series models, Biometrika 65 (1978) 297–303.
[27] H. Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer, Berlin, 2005.
[28] E. Mahdi, A.I. McLeod, Improved multivariate portmanteau test, Journal of Time Series Analysis 33 (2012) 211–222.
[29] A. Mas, Weak convergence for the covariance operators of a Hilbertian linear process, Stochastic Processes and their Applications 99 (2002) 117–135.
[30] I. Maslova, P. Kokoszka, J. Sojka, L. Zhu, Removal of nonconstant daily variation by means of wavelet and functional data analysis, Journal of Geophysical Research 114 (2009) A03202.
[31] H.-G. Müller, U. Stadtmüller, Generalized functional linear models, Annals of Statistics 33 (2005) 774–805.
[32] V.M. Panaretos, D. Kraus, J.H. Maddocks, Second-order comparison of Gaussian random functions and the geometry of DNA minicircles, Journal of the American Statistical Association 105 (2010) 670–682.
[33] D.B. Peizer, J.W. Pratt, A normal approximation for binomial, F, beta, and other common, related tail probabilities, Journal of the American Statistical Association 63 (1968) 1416–1456.
[34] D. Peña, J. Rodríguez, A powerful portmanteau test of lack of fit for time series, Journal of the American Statistical Association 97 (2002) 601–610.
[35] V.V. Petrov, Limit Theorems of Probability Theory, Oxford Science Publications, 1995.
[36] J.O. Ramsay, B.W. Silverman, Functional Data Analysis, Springer, 2005.
[37] V.V. Senatov, Normal Approximation: New Results, Methods and Problems, VSP, 1998.
[38] V. Strassen, The existence of probability measures with given marginals, Annals of Mathematical Statistics 36 (1965) 423–439.
[39] W.-Y. Xu, Y. Kamide, Decomposition of daily geomagnetic variations by using method of natural orthogonal components, Journal of Geophysical Research 109 (2004) A05218.