Spectral estimation in a random effect model

Luengo I.¹, Hernández, C. N.², and Saavedra P.²

¹ Departamento de Informática y Sistemas, mluengo@dis.ulpgc.es
² Departamento de Matemáticas, cflores@dma.ulpgc.es; saavedra@dma.ulpgc.es

Summary. We observe a set of time series evaluated on a sample of objects from a population, all at the same time points. In this paper an additive random effects model is considered for this set of time series, based on the asymptotic representation of the periodogram of general linear processes. In section 2 the choice of model is justified, and in section 3 estimates are proposed for both the population component and the individual components and their properties are analyzed.

Key words: Replicated time series, spectral analysis, kernel estimation.

1 Introduction

When analysing a set of time series corresponding to the levels of LH hormone in the blood of a sample of subjects from a population, Diggle and Al-Wasel (1993) observed that such series could not be considered realisations of a single linear stationary process, because of the variability among the sample subjects. They therefore proposed a random effect model for the spectral analysis of the corresponding set of time series, based on the asymptotic representation of the periodogram for general linear processes. This model involves a parameter called the population spectrum, a random component specific to each individual or time series and, finally, an error term related to the errors of the individual periodograms. Hernández et al. (1999), using a more general model, estimate the population spectrum by means of the bootstrap and analyze the consistency of this method using the Mallows distance. Saavedra et al. (2000) develop a theory of doubly stochastic stationary processes to analyze a set of replicated time series in the frequency domain. According to this theory, the population spectrum is approximated by means of kernel estimates.
Here, the smoothing parameter depends on the number of time series and on the number of observations per time series. The asymptotic properties of the estimate are therefore based on the number of time series, the number of observations per series and the bandwidth. In this paper we estimate not only the population pattern but also the individual component of each series, and the properties of both estimates are analyzed.

2 Model

Let $(B, \mathcal{B}, P_B)$ be a probability space such that each object $b \in B$ has an associated general linear process $\{X(b,t) : t \in \mathbb{Z}\}$. Let $\{X_i(t) : i = 1, \ldots, r;\ t = 1, \ldots, N\}$ be a set of time series evaluated on a random sample of $r$ objects $b_1, \ldots, b_r$ from $B$ and observed at the same $N$ times. The periodogram of each series at the $j$-th Fourier frequency $\omega_j = 2\pi j/N$, for $j = 1, \ldots, \nu = [N/2] - 1$, is defined as

$$ I_{i,N}(\omega_j) = \frac{1}{2\pi N} \left| \sum_{t=1}^{N} X_i(t)\, e^{-i\omega_j t} \right|^2 \tag{1} $$

Let $C_0 = -0.5772$ be the negative of Euler's constant and let $y_{ij} = \log I_{i,N}(\omega_j) - C_0$. We suppose that, for any objects $b_1, \ldots, b_r \in B$ and for any times $t = 1, \ldots, N$, $y_{ij}$ can be written as

$$ y_{ij} = \mu(\omega_j) + Z_i(\omega_j) + e_{ij} \tag{2} $$

for $i = 1, \ldots, r$; $j = 1, \ldots, \nu = [N/2] - 1$, where:

1. $\mu(\omega)$, for $|\omega| \le \pi$, represents an underlying pattern in the population.
2. $\{Z_i(\omega) ; |\omega| \le \pi\}$, $i = 1, \ldots, r$, are $r$ independent trajectories of a stochastic process $\{Z(\omega)\}$ such that $E[Z(\omega_j)] = 0$ for $j = 1, \ldots, \nu$. We will write $\operatorname{cov}[Z(\omega), Z(\theta)] = \Psi_Z(\omega, \theta)$, for $|\omega|, |\theta| \le \pi$.
3. $\{e_{ij}\}$ are random variables, independent and identically distributed in $i$, with $E[e_{ij}] = 0$ and $E[e_{ij}^2] = \sigma_e^2 < \infty$ for $j = 1, \ldots, \nu$.

Observe that $\{Z_i(\omega) ; |\omega| \le \pi\}$ captures the specificity of the $i$-th object in the population.

Remark 1. The proposed model is approximately satisfied by sets of time series such that:

(i) For each $i = 1, \ldots, r$, $\{X_i(t), t \in \mathbb{Z}\}$ is a trajectory of a general linear process with absolutely continuous spectral distribution, the corresponding spectral density being $\{Q_i(\omega), |\omega| \le \pi\}$.
(ii) The process $\{X_i(t), t \in \mathbb{Z}\}$ satisfies the conditions of Theorem 6.2.2 in Priestley (1981, pp. 424–425).

Under these conditions the periodogram of each trajectory $\{X_i(t) : t = 1, \ldots, N\}$ satisfies

$$ I_i(\omega_j) = Q_i(\omega_j)\, U_{ij} + R_{i,j} \tag{3} $$

where, for each $i = 1, \ldots, r$, $R_{i,j}$ denotes a term which is asymptotically negligible, and the $U_{ij}$ are asymptotically independent random variables having the standard exponential distribution (this distribution is exact if the processes are Gaussian). Thus we can take $E[\log U_{ij}] \approx C_0$, the constant introduced above. Neglecting the term $R_{i,j}$ (as in Franke and Härdle (1992)) and taking logarithms in (3), we obtain

$$ y_{ij} = \mu(\omega_j) + Z_i(\omega_j) + e_{ij} \quad \text{for } i = 1, \ldots, r;\ j = 1, \ldots, \nu $$

where

$$ \mu(\omega_j) = E[\log Q_i(\omega_j)], \qquad Z_i(\omega_j) = \log Q_i(\omega_j) - E[\log Q_i(\omega_j)], \qquad e_{ij} = \log U_{ij} - C_0 . $$

It is obvious that $E[Z_i(\omega_j)] = 0$ for all $i$ and all $|\omega_j| \le \pi$, and that the random variables $e_{ij}$ are independent in $i$ and approximately identically distributed in $j$, with $E[e_{ij}] \approx 0$ (see Davis and Jones (1968)).

3 Estimation of the population pattern.

In this section we propose a kernel estimate for the population parameter $\mu(\omega)$ defined in equation (2) with conditions (i) and (ii), and likewise for each of the individual trajectories $Z_i(\omega)$. With this aim we consider a kernel function $K(\theta)$ having the following properties:

K1) $K(\theta)$ is a symmetric, nonnegative function on the real line, with compact support $[-\kappa, \kappa]$, uniformly Lipschitz with constant $L_\kappa$.
K2) $\frac{1}{2\pi} \int_{-\infty}^{\infty} K(\theta)\, d\theta = 1$ and $\frac{1}{2\pi} \int_{-\infty}^{\infty} \theta^2 K(\theta)\, d\theta = 1$.

The kernel estimate of $\mu(\omega)$ is then defined as a smoothing of the averages $\bar{y}_{\cdot j} = (1/r) \sum_{i=1}^{r} y_{ij}$ in the form

$$ \hat{\mu}(\omega) = \frac{1}{Nh} \sum_{j=-\nu}^{\nu} \bar{y}_{\cdot j}\, K\!\left( \frac{\omega - \omega_j}{h} \right) \tag{4} $$

where $h$ is the bandwidth. Once this parameter has been estimated, we smooth the residuals $y_{ij} - \hat{\mu}(\omega_j)$ to define the estimate of each individual trajectory $Z_i(\omega)$, for $i = 1, \ldots, r$:

$$ \hat{Z}_i(\omega) = \frac{1}{N\lambda_i} \sum_{j=-\nu}^{\nu} \left( y_{ij} - \hat{\mu}(\omega_j) \right) K\!\left( \frac{\omega - \omega_j}{\lambda_i} \right) \tag{5} $$

where $\lambda_i$ is the corresponding bandwidth. In order to establish the properties of the estimates we suppose that the following conditions are satisfied:

M1) The function $\mu(\omega)$ is twice continuously differentiable on $[-\pi, \pi]$.
M2) The function $\Psi_Z(\omega, \theta) = \operatorname{cov}[Z_i(\omega), Z_i(\theta)]$ is twice continuously differentiable on the square $[-\pi, \pi] \times [-\pi, \pi]$.

The following theorems provide properties of the estimates $\hat{\mu}(\omega)$ and $\hat{Z}_i(\omega)$.

Theorem 1. Let us suppose that (M1) holds and that $\hat{\mu}(\omega)$ is the estimate defined in (4), where the kernel $K(\theta)$ satisfies the assumptions (K1) and (K2). Then, for $h \to 0$ and $r, Nh \to \infty$, $\hat{\mu}(\omega)$ satisfies:

(a) $$ E[\hat{\mu}(\omega)] = \mu(\omega) + \frac{h^2}{2} \mu''(\omega) + o(h^2) + O\!\left((Nh)^{-1}\right) $$

(b) $$ \operatorname{var}[\hat{\mu}(\omega)] \le \frac{1}{r} \left\{ \operatorname{var}[Z_i(\omega)] + \sigma_e^2 + \left[ O(h^2) + O\!\left((Nh)^{-1}\right) \right] I\!\left( \operatorname{var}[Z_i(\omega)] > 0 \right) \right\} + \frac{\sigma_e^2}{rNh} \|K\|_2^2 + o\!\left((rNh)^{-1}\right) $$

where $\|K\|_2^2 = \frac{1}{2\pi} \int K^2(\theta)\, d\theta$ and $I$ is the indicator function.

The proofs of all theorems are deferred to the Appendix.

Corollary 1. Under the hypotheses of Theorem 1, if $E[e_{ij} e_{il}] = O(N^{-1})$ uniformly in $\omega$ for $j \ne l$, then

$$ \operatorname{var}[\hat{\mu}(\omega)] = \frac{1}{r} \left\{ \operatorname{var}[Z_i(\omega)] + \left[ O(h^2) + O\!\left((Nh)^{-1}\right) \right] I\!\left( \operatorname{var}[Z_i(\omega)] > 0 \right) \right\} + \frac{\sigma_e^2}{rNh} \|K\|_2^2 + O\!\left((rN)^{-1}\right) + o\!\left((rNh)^{-1}\right) $$

where $\|K\|_2^2 = \frac{1}{2\pi} \int K(\theta)^2\, d\theta$.

These expressions are useful for determining the order of the optimum bandwidth, for example the one that minimizes the mean integrated squared error (MISE).

Theorem 2. Let us suppose that $\mu(\omega)$ satisfies (M1) and that the covariance function $\Psi_Z(\omega, \theta)$ satisfies (M2). We will further suppose that $K(\theta)$ satisfies (K1) and (K2).
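Then, for $h, \lambda \to 0$ and $r, Nh, N\lambda \to \infty$, the estimate $\hat{Z}_i(\omega)$ defined in (5) satisfies

$$ E\left( \hat{Z}_i(\omega) - Z_i(\omega) \right)^2 \le \frac{1}{r} \operatorname{var}[Z_i(\omega)] + \sigma_e^2 + O(h^2) + O(\lambda^2) + O\!\left((Nh)^{-1}\right) + O\!\left((N\lambda)^{-1}\right) $$

uniformly in $|\omega| < \pi - \kappa \max\{h, \lambda\}$.

Corollary 2. Under the hypotheses of Theorem 2, if $E[e_{ij} e_{il}] = O(N^{-1})$ uniformly in $\omega$ for $j \ne l$, then

$$ E\left( \hat{Z}_i(\omega) - Z_i(\omega) \right)^2 = \frac{1}{r} \operatorname{var}[Z_i(\omega)] + O(h^2) + O(\lambda^2) + O\!\left((Nh)^{-1}\right) + O\!\left((N\lambda)^{-1}\right) $$

Remark 2. The bandwidth $\lambda$ is deterministic and independent of the trajectory $Z_i(\omega)$. It seems reasonable, however, to select bandwidths based on trajectories or even on observations across the objects.

4 Appendix

Lemma 1. [Franke and Härdle (1992)] Let $K(\theta)$ satisfy the assumptions (K1) and (K2) and, for $|\omega| < \pi - \kappa h$, let $p(\omega)$ be twice continuously differentiable on $[\omega - \kappa h, \omega + \kappa h]$. Then, for $h \to 0$ and $Nh \to \infty$,

$$ \left| \frac{1}{Nh} \sum_{j=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) p(\omega_j) - p(\omega) - \frac{h^2}{2} p''(\omega) \right| \le \frac{c}{Nh} \left\{ \sup_\theta |p(\theta)| + h \sup_\theta |p'(\theta)| \right\} + \frac{h^2}{2} \sup_\theta \left| p''(\theta) - p''(\omega) \right| $$

where $c$ is a suitable constant and the suprema are taken over the interval $[\omega - \kappa h, \omega + \kappa h]$.

Lemma 2. Let $K(\theta)$ satisfy the assumptions (K1) and (K2), and let $p(x, y)$ be twice continuously differentiable on the rectangle $[\omega \pm \kappa h] \times [\varphi \pm \kappa\lambda]$. Then, for $h, \lambda \to 0$ and $Nh, N\lambda \to \infty$,

$$ \left| \frac{1}{N^2 h\lambda} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) K\!\left( \frac{\varphi - \omega_l}{\lambda} \right) p(\omega_j, \omega_l) - p(\omega, \varphi) - \frac{h^2}{2} p''_{xx}(\omega, \varphi) - \frac{\lambda^2}{2} p''_{yy}(\omega, \varphi) - h\lambda\, p''_{xy}(\omega, \varphi) \right| $$
$$ \le \frac{1}{Nh} \left\{ c_1 \sup_{(x,y)} |p(x, y)| + c_2 h \sup_{(x,y)} |p'_x(x, y)| \right\} + \frac{1}{N\lambda} \left\{ c_3 \sup_{(x,y)} |p(x, y)| + c_4 \lambda \sup_{(x,y)} |p'_y(x, y)| \right\} $$
$$ + \frac{h^2}{2} \sup_{(x,y)} \left| p''_{xx}(x, y) - p''_{xx}(\omega, \varphi) \right| + \frac{\lambda^2}{2} \sup_{(x,y)} \left| p''_{yy}(x, y) - p''_{yy}(\omega, \varphi) \right| + h\lambda \sup_{(x,y)} \left| p''_{xy}(x, y) - p''_{xy}(\omega, \varphi) \right| $$

where $c_1, c_2, c_3, c_4$ are suitable constants and the suprema are taken over the rectangle $[\omega \pm \kappa h] \times [\varphi \pm \kappa\lambda]$.

Proof.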
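The compactness of the support of $K$, its Lipschitz continuity and the differentiability of $p$ imply, uniformly in $|\omega| < \pi - \kappa h$, $|\varphi| < \pi - \kappa\lambda$,

$$ \left| \frac{1}{N^2 h\lambda} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) K\!\left( \frac{\varphi - \omega_l}{\lambda} \right) p(\omega_j, \omega_l) - \frac{1}{4\pi^2} \iint_{\mathbb{R}^2} K(\theta_1) K(\theta_2)\, p(\omega + h\theta_1, \varphi + \lambda\theta_2)\, d\theta_1\, d\theta_2 \right| $$
$$ \le \frac{1}{Nh} \left\{ c_1 \sup_{(x,y)} |p(x, y)| + c_2 h \sup_{(x,y)} |p'_x(x, y)| \right\} + \frac{1}{N\lambda} \left\{ c_3 \sup_{(x,y)} |p(x, y)| + c_4 \lambda \sup_{(x,y)} |p'_y(x, y)| \right\} $$

The assertion follows from the Taylor expansion of $p(\omega + h\theta_1, \varphi + \lambda\theta_2)$, using the conditions (K1) and (K2) for $K(\theta)$.

Lemma 3. Let us suppose that model (2) satisfies (M1) and (M2) and that the kernel function satisfies the assumptions (K1) and (K2). Then the following results hold:

(i) $E[y_{ij}\, \hat{\mu}(\omega_l)] = \mu(\omega_j)\mu(\omega_l) + \frac{1}{r} \Psi_Z(\omega_j, \omega_l) + O(h^2) + O\!\left((Nh)^{-1}\right)$
(ii) $E[\hat{\mu}(\omega_j)\, \hat{\mu}(\omega_l)] = \mu(\omega_j)\mu(\omega_l) + \frac{1}{r} \Psi_Z(\omega_j, \omega_l) + O(h^2) + O\!\left((Nh)^{-1}\right)$

where the terms $O(h^2)$ and $O\!\left((Nh)^{-1}\right)$ are independent of $i$, $j$ and $l$.

Proof. Directly from Lemmas 1 and 2.

Proof (of Theorem 1).

(a) We have

$$ E[\hat{\mu}(\omega)] = \frac{1}{Nh} \sum_{j=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) E[\bar{y}_{\cdot j}] = \frac{1}{Nh} \sum_{j=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) \mu(\omega_j) $$

and the result follows from Lemma 1.

(b) We have

$$ \operatorname{var}[\hat{\mu}(\omega)] = E\left( \hat{\mu}(\omega) - E[\hat{\mu}(\omega)] \right)^2 = \frac{1}{rN^2h^2} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) K\!\left( \frac{\omega - \omega_l}{h} \right) \left\{ \Psi_Z(\omega_j, \omega_l) + E[e_{ij} e_{il}] \right\} $$

From Lemma 2, we have

$$ \frac{1}{rN^2h^2} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) K\!\left( \frac{\omega - \omega_l}{h} \right) \Psi_Z(\omega_j, \omega_l) = \frac{1}{r} \left\{ \Psi_Z(\omega, \omega) + \left[ O(h^2) + O\!\left((Nh)^{-1}\right) \right] I\!\left( \Psi_Z(\omega, \omega) > 0 \right) \right\} $$

uniformly in $\omega$. The second term can be written as

$$ \frac{\sigma_e^2}{rN^2h^2} \sum_{j=-\nu}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right)^2 + \frac{1}{rN^2h^2} \sum_{j=-\nu}^{\nu} \sum_{\substack{l=-\nu \\ l \ne j}}^{\nu} K\!\left( \frac{\omega - \omega_j}{h} \right) K\!\left( \frac{\omega - \omega_l}{h} \right) E[e_{ij} e_{il}] $$

But $E[e_{ij} e_{il}] \le \sigma_e^2$, thus

$$ \operatorname{var}[\hat{\mu}(\omega)] \le \frac{1}{r} \left\{ \operatorname{var}[Z_i(\omega)] + \sigma_e^2 + \left[ O(h^2) + O\!\left((Nh)^{-1}\right) \right] I\!\left( \operatorname{var}[Z_i(\omega)] > 0 \right) \right\} + \frac{\sigma_e^2}{rNh} \|K\|_2^2 + o\!\left((rNh)^{-1}\right) $$

which completes the proof of (b).

Proof (of Theorem 2). Throughout this proof we will denote $K_j = K\!\left( \frac{\omega - \omega_j}{\lambda} \right)$, $j = 1, \ldots, \nu$. So

$$ E\left( \hat{Z}_i(\omega) - Z_i(\omega) \right)^2 = E\left[ \hat{Z}_i(\omega)^2 \right] - 2 E\left[ \hat{Z}_i(\omega) Z_i(\omega) \right] + E\left[ Z_i(\omega)^2 \right] $$

Let us now calculate each of the terms separately. For the first term,

$$ E\left[ \hat{Z}_i(\omega)^2 \right] = E\left[ \left\{ \frac{1}{N\lambda} \sum_{j=-\nu}^{\nu} K_j \left( y_{ij} - \hat{\mu}(\omega_j) \right) \right\}^2 \right] = \frac{1}{N^2\lambda^2} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K_j K_l\, E\left[ y_{ij} y_{il} - y_{ij} \hat{\mu}(\omega_l) - y_{il} \hat{\mu}(\omega_j) + \hat{\mu}(\omega_j) \hat{\mu}(\omega_l) \right] $$
$$ \le \frac{r-1}{r} \Psi_Z(\omega, \omega) + \sigma_e^2 + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right) $$

We can write the second term as

$$ E\left[ \hat{Z}_i(\omega) Z_i(\omega) \right] = E\left[ E\left[ \hat{Z}_i(\omega) Z_i(\omega) \mid b_i \right] \right] = E\left[ Z_i(\omega)\, E\left[ \hat{Z}_i(\omega) \mid b_i \right] \right] $$

Let us first calculate the conditional mean:

$$ E\left[ \hat{Z}_i(\omega) \mid b_i \right] = \frac{1}{N\lambda} \sum_{j=-\nu}^{\nu} K_j \left\{ E[y_{ij} \mid b_i] - E[\hat{\mu}(\omega_j) \mid b_i] \right\} $$
$$ = \frac{1}{N\lambda} \sum_{j=-\nu}^{\nu} K_j \left\{ \mu(\omega_j) + Z_i(\omega_j) \right\} - \frac{1}{N^2 h\lambda} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K_j\, K\!\left( \frac{\omega_j - \omega_l}{h} \right) \left\{ \mu(\omega_l) + Z_{\cdot}(\omega_l) \right\} $$

Thus,

$$ E\left[ \hat{Z}_i(\omega) Z_i(\omega) \right] = \frac{1}{N\lambda} \sum_{j=-\nu}^{\nu} K_j\, \Psi_Z(\omega, \omega_j) - \frac{1}{N^2 h\lambda} \sum_{j=-\nu}^{\nu} \sum_{l=-\nu}^{\nu} K_j\, K\!\left( \frac{\omega_j - \omega_l}{h} \right) E\left[ Z_i(\omega) Z_{\cdot}(\omega_l) \right] $$
$$ = \Psi_Z(\omega, \omega) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right) - \frac{1}{rN\lambda} \sum_{j=-\nu}^{\nu} K_j\, \Psi_Z(\omega, \omega_j) + O(h^2) + O\!\left((Nh)^{-1}\right) $$

Therefore,

$$ E\left[ \hat{Z}_i(\omega) Z_i(\omega) \right] = \frac{r-1}{r} \Psi_Z(\omega, \omega) + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right) $$

Obviously, the third summand is $E\left[ Z_i(\omega)^2 \right] = \Psi_Z(\omega, \omega)$. Combining the three terms, the coefficient of $\Psi_Z(\omega, \omega)$ is $\frac{r-1}{r} - 2\,\frac{r-1}{r} + 1 = \frac{1}{r}$, and we finally have

$$ E\left( \hat{Z}_i(\omega) - Z_i(\omega) \right)^2 \le \frac{1}{r} \Psi_Z(\omega, \omega) + \sigma_e^2 + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right) $$

uniformly in $|\omega| < \pi - \kappa \max\{h, \lambda\}$. This completes the proof.

References

[Bick81] Bickel, P. and Freedman, D.: Some Asymptotic Theory for the Bootstrap. Ann. Statist., 9, 1196–1217 (1981).
[Dav68] Davis, H.T. and Jones, R.H.: Estimation of the Innovation Variance of a Stationary Time Series. Journal Amer. Statist. Assoc., 63, 141–149 (1968).
[Dig93] Diggle, P. J. and Al-Wasel, I.: On Periodogram-Based Spectral Estimation for Replicated Time Series, in: Subba Rao (Ed), Developments in Time Series Analysis. Chapman and Hall, Great Britain, 341–354 (1993).
[Fran92] Franke, J. and Härdle, W.: On Bootstrapping Kernel Spectral Estimates. Ann. Statist., 20, 121–145 (1992).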
[Hern99] Hernández-Flores, C.N., Artiles-Romero, J. and Saavedra-Santana, P.: Estimation of the Population Spectrum with Replicated Time Series. Comp. Stat. and Data Anal., 30, 271–280 (1999).
[Prie81] Priestley, M.B.: Spectral Analysis and Time Series. Wiley, New York (1981).
[Saa00] Saavedra, P., Hernández, C.N. and Artiles, J.: Spectral Analysis with Replicated Time Series. Communications in Statistics Theory and Methods, 29, 2343–2362 (2000).
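
Numerical illustration. The estimates (4) and (5) of section 3 can be sketched numerically as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`kernel`, `log_periodograms`, `smooth`, `estimate`) are hypothetical; the kernel is a rescaled Epanechnikov kernel chosen because it satisfies (K1) and (K2) with $\kappa = \sqrt{5}$; a single bandwidth `lam` is used for all series, as in Theorem 2; the negative frequencies are filled in by the symmetry of the periodogram of a real series; and the term $j = 0$ is omitted from the sums because $y_{ij}$ is only defined for $j = 1, \ldots, \nu$. The data are simulated Gaussian white noise, for which $Q_i(\omega) = 1/(2\pi)$ and hence $\mu(\omega) = -\log(2\pi)$ and $Z_i(\omega) = 0$.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329   # C0 = E[log U] = -EULER_GAMMA for U ~ Exp(1)
KAPPA = np.sqrt(5.0)               # half-width of the kernel support (assumed kernel)


def kernel(u):
    """Rescaled Epanechnikov kernel (an assumed choice satisfying K1 and K2):
    (1/2pi) * int K = 1 and (1/2pi) * int u^2 K = 1."""
    u = np.asarray(u, dtype=float)
    density = 3.0 / (4.0 * KAPPA) * (1.0 - u**2 / 5.0)
    return 2.0 * np.pi * np.where(np.abs(u) <= KAPPA, density, 0.0)


def log_periodograms(x):
    """x has shape (r, N). Returns omega_j (nu,) and y_ij (r, nu) as in (1)-(2)."""
    n = x.shape[1]
    nu = n // 2 - 1
    j = np.arange(1, nu + 1)
    # The FFT indexes t from 0 instead of 1; this only changes the phase,
    # not the squared modulus used in the periodogram.
    dft = np.fft.fft(x, axis=1)[:, 1:nu + 1]
    pgram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    return 2.0 * np.pi * j / n, np.log(pgram) + EULER_GAMMA


def smooth(values, omega_j, omega, bandwidth, n):
    """(1/(N b)) * sum over j = -nu..nu, j != 0, of values_j K((omega - omega_j)/b),
    using values_{-j} = values_j (assumption: symmetry about frequency zero)."""
    omega = np.atleast_1d(omega)
    weights = (kernel((omega[:, None] - omega_j[None, :]) / bandwidth)
               + kernel((omega[:, None] + omega_j[None, :]) / bandwidth))
    return values @ weights.T / (n * bandwidth)


def estimate(x, omega, h, lam):
    """Kernel estimates of mu (eq. 4) and of each Z_i (eq. 5) at frequencies omega."""
    n = x.shape[1]
    omega_j, y = log_periodograms(x)
    y_bar = y.mean(axis=0)                                # averages over the r series
    mu_hat = smooth(y_bar, omega_j, omega, h, n)          # equation (4)
    residuals = y - smooth(y_bar, omega_j, omega_j, h, n)
    z_hat = smooth(residuals, omega_j, omega, lam, n)     # equation (5), common lam
    return mu_hat, z_hat


# Simulated example: r = 20 Gaussian white-noise series of length N = 256.
rng = np.random.default_rng(0)
x = rng.standard_normal((20, 256))
mu_hat, z_hat = estimate(x, np.array([np.pi / 2]), h=0.3, lam=0.3)
print(mu_hat[0], -np.log(2.0 * np.pi))   # the two values should be close
```

The evaluation frequency $\pi/2$ is chosen so that $|\omega| < \pi - \kappa \max\{h, \lambda\}$ holds, which keeps the kernel windows away from the boundary of $[-\pi, \pi]$.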