Spectral estimation in a random effect model

Luengo I.¹, Hernández, C. N.², and Saavedra P.²
¹ Departamento de Informática y Sistemas, mluengo@dis.ulpgc.es
² Departamento de Matemáticas, cflores@dma.ulpgc.es; saavedra@dma.ulpgc.es
Summary. A set of time series, evaluated on a sample of objects from a population,
is observed at the same time points. In this paper an additive random effects model
is considered for this set of time series, based on the asymptotic representation of
the periodogram of general linear processes. Section 2 justifies the choice of model;
Section 3 proposes estimates for both the population component and the individual
components and analyzes their properties.
Key words: Replicated time series, spectral analysis, kernel estimation.
1 Introduction
When analysing a set of time series corresponding to the levels of LH hormone in the
blood of a sample of subjects from a population, Diggle and Al-Wasel (1993) observed
that such series could not be considered realisations of a single linear stationary
process, because of the variability among the sample subjects. They therefore
proposed a random effects model for the spectral analysis of the corresponding
set of time series, based on the asymptotic representation of the periodogram for
general linear processes. This model involves a parameter called the population
spectrum, a random component specific to each individual or time series and, finally,
an error term related to the errors of the individual periodograms. Hernández et
al. (1999), using a more general model, estimate the population spectrum by means of
the bootstrap and analyze the consistency of this method using the Mallows distance.
Saavedra et al. (2000) develop a theory of doubly stochastic stationary processes to
analyze a set of replicated time series in the frequency domain. According to this
theory, the population spectrum is approximated by means of kernel estimates. Here,
the smoothing parameter depends on the number of time series and the number of
observations per time series. The asymptotic properties of the estimate are
therefore based on the number of time series, the number of observations per series
and the bandwidth. In this paper we estimate not only the population pattern but
also the individual component of each series, and we analyze the properties of both
estimates.
2 Model
Let $(B, \mathcal{B}, P_B)$ be a probability space such that each object $b \in B$ has an
associated general linear process $\{X(b,t) : t \in \mathbb{Z}\}$.
Let $\{X_i(t) : i = 1,\dots,r;\ t = 1,\dots,N\}$ be a set of time series evaluated on a
random sample of $r$ objects $b_1,\dots,b_r$ from $B$ and observed at the same $N$ times.
The periodogram of each series at the $j$-th Fourier frequency $\omega_j = 2\pi j/N$, for
$j = 1,\dots,\nu = [N/2]-1$, is defined as:
$$
I_{i,N}(\omega_j) = \frac{1}{2\pi N}\left|\sum_{t=1}^{N} X_i(t)\, e^{-i\omega_j t}\right|^2 \tag{1}
$$
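As a minimal sketch of definition (1), the following pure-Python function evaluates the periodogram of one series at the Fourier frequencies. The function name `periodogram` and the toy cosine series are illustrative assumptions, not part of the paper; a direct sum is used instead of an FFT to keep the correspondence with (1) explicit.

```python
import cmath
import math

def periodogram(x):
    """Periodogram of one series at the Fourier frequencies w_j = 2*pi*j/N,
    j = 1, ..., nu with nu = floor(N/2) - 1, following definition (1)."""
    n = len(x)
    nu = n // 2 - 1
    freqs = [2.0 * math.pi * j / n for j in range(1, nu + 1)]
    values = []
    for w in freqs:
        # Discrete Fourier sum over t = 1, ..., N
        s = sum(x[t - 1] * cmath.exp(-1j * w * t) for t in range(1, n + 1))
        values.append(abs(s) ** 2 / (2.0 * math.pi * n))
    return freqs, values

# Hypothetical toy series: a pure cosine at the Fourier frequency with j = 3
N = 32
x = [math.cos(2.0 * math.pi * 3 * t / N) for t in range(1, N + 1)]
freqs, I = periodogram(x)
peak_j = 1 + max(range(len(I)), key=lambda k: I[k])
print(peak_j, I[peak_j - 1])
```

For this toy input all the power sits at the single frequency of the cosine, which is a convenient sanity check before applying the function to real series.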
Let $C_0 = -0.5772$ be the negative of Euler's constant, and let $y_{ij} = \log I_{i,N}(\omega_j) - C_0$.
We suppose that, for any objects $b_1,\dots,b_r \in B$ and for any times $t = 1,\dots,N$, $y_{ij}$ can be
written as:
$$
y_{ij} = \mu(\omega_j) + Z_i(\omega_j) + e_{ij} \tag{2}
$$
for $i = 1,\dots,r$; $j = 1,\dots,\nu = [N/2]-1$, where:
1. $\mu(\omega)$, for $|\omega| \le \pi$, represents an underlying pattern in the population.
2. $\{Z_i(\omega);\ |\omega| \le \pi\}$, $i = 1,\dots,r$, are $r$ independent trajectories of a stochastic
process $\{Z(\omega)\}$ such that $E[Z(\omega_j)] = 0$ for $j = 1,\dots,\nu$. We will write
$\operatorname{cov}[Z(\omega), Z(\theta)] = \Psi_Z(\omega,\theta)$, for $|\omega|, |\theta| \le \pi$.
3. $\{e_{ij}\}$ are random variables, independent and identically distributed on $i$, with
$E[e_{ij}] = 0$ and $E[e_{ij}^2] = \sigma_e^2 < \infty$ for $j = 1,\dots,\nu$.
Observe that {Zi (ω) ; |ω| ≤ π} involves the specificity of the i-th object in the
population.
Remark 1. The proposed model is approximately satisfied by sets of time series such
that:
(i) For each $i = 1,\dots,r$, $\{X_i(t),\ t \in \mathbb{Z}\}$ is a trajectory of a general linear
process with absolutely continuous spectral distribution, with corresponding
spectral density $\{Q_i(\omega),\ |\omega| \le \pi\}$.
(ii) The process $\{X_i(t),\ t \in \mathbb{Z}\}$ satisfies the conditions of Theorem 6.2.2 in Priestley
(1981, pp. 424-425).
Under these conditions the periodogram of each trajectory $\{X_i(t) : t = 1,\dots,N\}$
satisfies:
$$
I_i(\omega_j) = Q_i(\omega_j)\, U_{ij} + R_{i,j} \tag{3}
$$
where, for each $i = 1,\dots,r$, $R_{i,j}$ denotes a term which is asymptotically negligible,
and $U_{ij}$ are asymptotically independent random variables having the standard
exponential distribution (this distribution is exact if the processes are Gaussian). Thus,
we can consider $E[\log U_{ij}] \approx C_0$, the constant introduced above. Neglecting
the term $R_{i,j}$ (as in Franke and Härdle (1992)), taking logarithms in (3) and
subtracting $C_0$, we obtain:
$$
y_{ij} = \mu(\omega_j) + Z_i(\omega_j) + e_{ij} \quad \text{for } i = 1,\dots,r;\ j = 1,\dots,\nu
$$
where $\mu(\omega_j) = E[\log Q_i(\omega_j)]$, $Z_i(\omega_j) = \log Q_i(\omega_j) - E[\log Q_i(\omega_j)]$ and
$e_{ij} = \log U_{ij} - C_0$.
It is obvious that $E[Z_i(\omega_j)] = 0$ for all $i$ and $|\omega| \le \pi$, and that the random
variables $e_{ij}$ are independent on $i$ and approximately identically distributed on $j$,
with $E[e_{ij}] \approx 0$ (see Davis and Jones 1968).
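The centring by $C_0$ can be checked numerically. The sketch below is a Monte Carlo illustration under the idealised assumption that $U_{ij}$ is exactly standard exponential; the sample size and seed are arbitrary choices. It also reports the sample variance of $e_{ij}$, which for an exactly exponential $U_{ij}$ is known to equal $\pi^2/6$.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant
C0 = -EULER_GAMMA                 # E[log U] for a standard exponential U

rng = random.Random(0)            # fixed seed, arbitrary choice
# random.expovariate(1.0) draws from the standard exponential distribution
log_u = [math.log(rng.expovariate(1.0)) for _ in range(200_000)]

mean_log_u = sum(log_u) / len(log_u)

# Centred errors e = log U - C0, as in the model
e = [v - C0 for v in log_u]
mean_e = sum(e) / len(e)
var_e = sum((v - mean_e) ** 2 for v in e) / len(e)

print(mean_log_u, C0)
print(var_e, math.pi ** 2 / 6.0)
```

In practice the exponential law holds only asymptotically (or exactly in the Gaussian case), so these values should be read as the limiting behaviour of the errors rather than as finite-sample guarantees.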
3 Estimation of the population pattern.
In this section we propose a kernel estimate for the population parameter $\mu(\omega)$
defined in equation (2) with conditions (i) and (ii), and likewise for each of the
individual trajectories $Z_i(\omega)$. With this aim we consider a kernel function $K(\theta)$
having the following properties:
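K1) $K(\theta)$ is a symmetric, nonnegative function on the real line, with compact
support $[-\kappa, \kappa]$, uniformly Lipschitz with constant $L_\kappa$.
K2) $\frac{1}{2\pi}\int_{-\infty}^{\infty} K(\theta)\, d\theta = 1$ and $\frac{1}{2\pi}\int_{-\infty}^{\infty} \theta^2 K(\theta)\, d\theta = 1$.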
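As an illustration of a kernel satisfying (K1) and (K2), one possible choice (an assumption of this sketch, not a kernel prescribed by the paper) is a rescaled Epanechnikov kernel with $\kappa = \sqrt{5}$, namely $K(\theta) = \frac{3\pi}{2\sqrt{5}}\left(1 - \theta^2/5\right)$ for $|\theta| \le \sqrt{5}$. The code checks both normalisations in (K2) with a simple midpoint rule.

```python
import math

KAPPA = math.sqrt(5.0)  # half-width of the support [-kappa, kappa]

def kernel(theta):
    """Rescaled Epanechnikov kernel (illustrative choice):
    symmetric, nonnegative, Lipschitz, supported on [-KAPPA, KAPPA]."""
    if abs(theta) > KAPPA:
        return 0.0
    return 3.0 * math.pi / (2.0 * KAPPA) * (1.0 - theta * theta / 5.0)

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n subintervals."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

# The two conditions in (K2)
mass = integrate(kernel, -KAPPA, KAPPA) / (2.0 * math.pi)
second_moment = integrate(lambda t: t * t * kernel(t), -KAPPA, KAPPA) / (2.0 * math.pi)

print(mass, second_moment)
```

Any other symmetric, compactly supported Lipschitz density could be rescaled in the same way; the factor $2\pi$ comes from the normalisation used in (K2).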
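In this way, the kernel estimate of $\mu(\omega)$ is defined as a smoothing of the averages
$y_{\cdot j} = (1/r)\sum_{i=1}^{r} y_{ij}$ in the form:
$$
\hat{\mu}(\omega) = \frac{1}{Nh}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) y_{\cdot j} \tag{4}
$$
where $h$ is the bandwidth.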
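A minimal sketch of the estimate (4) follows. Several details are assumptions made for illustration: the rescaled Epanechnikov kernel, the function name `population_estimate`, the extension to negative indices by the symmetry $y_{\cdot,-j} = y_{\cdot j}$, and the omission of the term $j = 0$ (the model only defines $y_{ij}$ for $j = 1,\dots,\nu$).

```python
import math

KAPPA = math.sqrt(5.0)

def kernel(theta):
    """Illustrative kernel satisfying (K1) and (K2)."""
    if abs(theta) > KAPPA:
        return 0.0
    return 3.0 * math.pi / (2.0 * KAPPA) * (1.0 - theta * theta / 5.0)

def population_estimate(y, n_obs, h, omega):
    """Kernel estimate (4) of mu at frequency omega.

    y[i][j - 1] holds y_ij for series i and Fourier index j = 1, ..., nu;
    n_obs is the series length N and h the bandwidth."""
    r = len(y)
    nu = len(y[0])
    total = 0.0
    for j in range(1, nu + 1):
        y_bar = sum(y[i][j - 1] for i in range(r)) / r  # average over series
        w_j = 2.0 * math.pi * j / n_obs
        # Assumption: ordinates at -w_j mirror those at w_j; j = 0 is skipped
        weight = kernel((omega - w_j) / h) + kernel((omega + w_j) / h)
        total += weight * y_bar
    return total / (n_obs * h)

# Hypothetical data: three series whose log-periodograms are flat at 1, 2, 3
N = 256
nu = N // 2 - 1
y = [[c] * nu for c in (1.0, 2.0, 3.0)]
mu_hat = population_estimate(y, N, h=0.3, omega=1.0)
print(mu_hat)
```

With flat inputs the estimate should be close to the average level across series, which makes the normalisation by $Nh$ easy to verify.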
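Once this parameter has been estimated, we smooth the residuals $y_{ij} - \hat{\mu}(\omega_j)$
to define the estimate of each individual trajectory $Z_i(\omega)$, for $i = 1,\dots,r$, as
$$
\hat{Z}_i(\omega) = \frac{1}{N\lambda_i}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{\lambda_i}\right)\left(y_{ij} - \hat{\mu}(\omega_j)\right) \tag{5}
$$
where $\lambda_i$ is the corresponding bandwidth.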
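The two-stage smoothing in (4) and (5) can be sketched as follows. As before, the kernel, the helper names `smooth` and `individual_estimate`, the mirrored treatment of negative indices and the omission of $j = 0$ are assumptions of this illustration; no boundary correction is attempted, so the sketch is only meaningful for $\omega$ well inside $(0, \pi)$.

```python
import math

KAPPA = math.sqrt(5.0)

def kernel(theta):
    """Illustrative kernel satisfying (K1) and (K2)."""
    if abs(theta) > KAPPA:
        return 0.0
    return 3.0 * math.pi / (2.0 * KAPPA) * (1.0 - theta * theta / 5.0)

def smooth(values, n_obs, bandwidth, omega):
    """Kernel smoothing of values[j - 1], j = 1, ..., nu, at frequency omega.
    Assumption: values at -w_j mirror those at w_j; j = 0 is skipped."""
    total = 0.0
    for j, v in enumerate(values, start=1):
        w_j = 2.0 * math.pi * j / n_obs
        weight = kernel((omega - w_j) / bandwidth) + kernel((omega + w_j) / bandwidth)
        total += weight * v
    return total / (n_obs * bandwidth)

def individual_estimate(y, i, n_obs, h, lam, omega):
    """Estimate (5) of Z_i(omega): smooth the residuals y_ij - mu_hat(w_j)."""
    r = len(y)
    nu = len(y[0])
    y_bar = [sum(y[k][j] for k in range(r)) / r for j in range(nu)]
    residuals = []
    for j in range(1, nu + 1):
        w_j = 2.0 * math.pi * j / n_obs
        mu_hat_j = smooth(y_bar, n_obs, h, w_j)  # estimate (4) at w_j
        residuals.append(y[i][j - 1] - mu_hat_j)
    return smooth(residuals, n_obs, lam, omega)

# Hypothetical data: three series with flat log-periodograms at 1, 2, 3
N = 256
nu = N // 2 - 1
y = [[c] * nu for c in (1.0, 2.0, 3.0)]
z_hat = [individual_estimate(y, i, N, h=0.2, lam=0.2, omega=math.pi / 2) for i in range(3)]
print(z_hat)
```

For these flat inputs each estimated individual component should be close to the deviation of that series from the average level, that is, roughly -1, 0 and 1.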
In order to establish the properties of the estimates we suppose that the following
conditions are satisfied:
M1) The function $\mu(\omega)$ is twice continuously differentiable on $[-\pi, \pi]$.
M2) The function $\Psi_Z(\omega,\theta) = \operatorname{cov}[Z_i(\omega), Z_i(\theta)]$ is twice continuously differentiable
on the square $[-\pi, \pi] \times [-\pi, \pi]$.
The following theorems provide properties of the estimates $\hat{\mu}(\omega)$ and $\hat{Z}_i(\omega)$.
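Theorem 1. Let us suppose that (M1) holds and that $\hat{\mu}(\omega)$ is the estimate defined in (4),
where the kernel $K(\theta)$ satisfies the assumptions (K1) and (K2). Then, for $h \to 0$ and
$r, Nh \to \infty$, $\hat{\mu}(\omega)$ satisfies:
(a)
$$
E[\hat{\mu}(\omega)] = \mu(\omega) + \frac{h^2}{2}\mu''(\omega) + o(h^2) + O\!\left((Nh)^{-1}\right)
$$
(b)
$$
\operatorname{var}[\hat{\mu}(\omega)] \le \frac{1}{r}\left\{\operatorname{var}[Z_i(\omega)] + \sigma_e^2 + O(h^2) + O\!\left((Nh)^{-1}\right)\right\} \cdot I\left(\operatorname{var}[Z_i(\omega)] > 0\right) + \frac{\sigma_e^2}{rNh}\|K\|_2^2 + o\!\left((rNh)^{-1}\right)
$$
where $\|K\|_2^2 = \frac{1}{2\pi}\int K(\theta)^2\, d\theta$ and $I$ is the indicator function.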
The proofs of all theorems are deferred to the Appendix.
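Corollary 1. Under the hypotheses of Theorem 1, if $E[e_{ij} e_{il}] = O(N^{-1})$ uniformly
in $\omega$ for $j \ne l$, then
$$
\operatorname{var}[\hat{\mu}(\omega)] = \frac{1}{r}\left\{\operatorname{var}[Z_i(\omega)] + O(h^2) + O\!\left((Nh)^{-1}\right)\right\} I\left(\operatorname{var}[Z_i(\omega)] > 0\right) + \frac{\sigma_e^2}{rNh}\|K\|_2^2 + O\!\left((rN)^{-1}\right) + o\!\left((rNh)^{-1}\right)
$$
where $\|K\|_2^2 = \frac{1}{2\pi}\int K(\theta)^2\, d\theta$.
These expressions are useful for determining the order of the optimum bandwidth,
for example, the one that minimizes the mean integrated squared error (MISE).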
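Theorem 2. Let us suppose that $\mu(\omega)$ satisfies (M1) and that the covariance function
$\Psi_Z(\omega,\theta)$ satisfies (M2). We will further suppose that $K(\theta)$ satisfies (K1) and (K2).
Then, for $h, \lambda \to 0$ and $r, Nh, N\lambda \to \infty$, the estimate $\hat{Z}_i(\omega)$ defined in (5) satisfies:
$$
E\left[\left(\hat{Z}_i(\omega) - Z_i(\omega)\right)^2\right] \le \frac{1}{r}\operatorname{var}[Z_i(\omega)] + \sigma_e^2 + O(h^2) + O(\lambda^2) + O\!\left((Nh)^{-1}\right) + O\!\left((N\lambda)^{-1}\right)
$$
uniformly in $|\omega| < \pi - \kappa \max\{h, \lambda\}$.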
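Corollary 2. Under the hypotheses of Theorem 2, if $E[e_{ij} e_{il}] = O(N^{-1})$ uniformly
in $\omega$ for $j \ne l$, then
$$
E\left[\left(\hat{Z}_i(\omega) - Z_i(\omega)\right)^2\right] = \frac{1}{r}\operatorname{var}[Z_i(\omega)] + O(h^2) + O(\lambda^2) + O\!\left((Nh)^{-1}\right) + O\!\left((N\lambda)^{-1}\right)
$$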
Remark 2. The bandwidth λ is deterministic and independent of the trajectory
Zi (ω). It seems reasonable, however, to select bandwidths based on trajectories
or even on observations across the objects.
4 Appendix
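Lemma 1. [Franke and Härdle (1992)] Let $K(\theta)$ satisfy the assumptions (K1) and
(K2), and, for $|\omega| < \pi - \kappa h$, let $p(\omega)$ be twice continuously differentiable on
$[\omega - \kappa h, \omega + \kappa h]$. Then, for $h \to 0$ and $Nh \to \infty$,
$$
\left|\frac{1}{Nh}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) p(\omega_j) - p(\omega) - \frac{h^2}{2}p''(\omega)\right| \le \frac{c}{Nh}\left\{\sup_\theta |p(\theta)| + h \sup_\theta |p'(\theta)|\right\} + \frac{h^2}{2}\sup_\theta \left|p''(\theta) - p''(\omega)\right|
$$
where $c$ is a suitable constant and the suprema are taken over the interval
$[\omega - \kappa h, \omega + \kappa h]$.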
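Lemma 2. Let $K(\theta)$ satisfy the assumptions (K1) and (K2), and let $p(x, y)$ be twice
continuously differentiable on the rectangle $[\omega \pm \kappa h] \times [\varphi \pm \kappa\lambda]$. Then, for $h, \lambda \to 0$ and
$Nh, N\lambda \to \infty$,
$$
\begin{aligned}
&\left|\frac{1}{N^2 h\lambda}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) K\!\left(\frac{\varphi-\omega_l}{\lambda}\right) p(\omega_j,\omega_l) - p(\omega,\varphi) - \frac{h^2}{2}p''_{xx}(\omega,\varphi) - \frac{\lambda^2}{2}p''_{yy}(\omega,\varphi) - h\lambda\, p''_{xy}(\omega,\varphi)\right| \\
&\quad\le \frac{1}{Nh}\left\{c_1 \sup_{(x,y)}|p(x,y)| + c_2 h \sup_{(x,y)}|p'_x(x,y)|\right\} + \frac{1}{N\lambda}\left\{c_3 \sup_{(x,y)}|p(x,y)| + c_4 \lambda \sup_{(x,y)}|p'_y(x,y)|\right\} \\
&\qquad + \frac{h^2}{2}\sup_{(x,y)}\left|p''_{xx}(x,y) - p''_{xx}(\omega,\varphi)\right| + \frac{\lambda^2}{2}\sup_{(x,y)}\left|p''_{yy}(x,y) - p''_{yy}(\omega,\varphi)\right| + h\lambda \sup_{(x,y)}\left|p''_{xy}(x,y) - p''_{xy}(\omega,\varphi)\right|
\end{aligned}
$$
where $c_1, c_2, c_3, c_4$ are suitable constants and the suprema are taken over the rectangle
$[\omega \pm \kappa h] \times [\varphi \pm \kappa\lambda]$.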
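Proof. The compactness of the support of $K$, its Lipschitz continuity and the
differentiability of $p$ imply, uniformly in $|\omega| < \pi - \kappa h$, $|\varphi| < \pi - \kappa\lambda$,
$$
\begin{aligned}
&\left|\frac{1}{N^2 h\lambda}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) K\!\left(\frac{\varphi-\omega_l}{\lambda}\right) p(\omega_j,\omega_l) - \frac{1}{4\pi^2}\iint_{\mathbb{R}^2} K(\theta_1)K(\theta_2)\, p(\omega + h\theta_1, \varphi + \lambda\theta_2)\, d\theta_1\, d\theta_2\right| \\
&\quad\le \frac{1}{Nh}\left\{c_1 \sup_{(x,y)}|p(x,y)| + c_2 h \sup_{(x,y)}|p'_x(x,y)|\right\} + \frac{1}{N\lambda}\left\{c_3 \sup_{(x,y)}|p(x,y)| + c_4 \lambda \sup_{(x,y)}|p'_y(x,y)|\right\}
\end{aligned}
$$
The assertion follows from the Taylor expansion of $p(\omega + h\theta_1, \varphi + \lambda\theta_2)$, using the
conditions (K1) and (K2) for $K(\theta)$.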
Lemma 3. Let us suppose that model (2) satisfies (M1) and (M2) and that the kernel
function satisfies the assumptions (K1) and (K2). Then the following results hold:
(i) $E[y_{ij}\,\hat{\mu}(\omega_l)] = \mu(\omega_j)\mu(\omega_l) + \frac{1}{r}\Psi_Z(\omega_j,\omega_l) + O(h^2) + O\!\left((Nh)^{-1}\right)$
(ii) $E[\hat{\mu}(\omega_j)\,\hat{\mu}(\omega_l)] = \mu(\omega_j)\mu(\omega_l) + \frac{1}{r}\Psi_Z(\omega_j,\omega_l) + O(h^2) + O\!\left((Nh)^{-1}\right)$
where the terms $O(h^2)$ and $O\!\left((Nh)^{-1}\right)$ are independent of $i$, $j$ and $l$.
Proof. Directly from Lemmas 1 and 2.
Proof (of Theorem 1).
(a) We have
$$
E[\hat{\mu}(\omega)] = \frac{1}{Nh}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) E[y_{\cdot j}] = \frac{1}{Nh}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) \mu(\omega_j)
$$
and the result follows from Lemma 1.
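(b) We have
$$
\operatorname{var}[\hat{\mu}(\omega)] = E\left[\left(\hat{\mu}(\omega) - E[\hat{\mu}(\omega)]\right)^2\right] = \frac{1}{rN^2h^2}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) K\!\left(\frac{\omega-\omega_l}{h}\right)\left\{\Psi_Z(\omega_j,\omega_l) + E[e_{ij}e_{il}]\right\}
$$
From Lemma 2, we have:
$$
\frac{1}{rN^2h^2}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) K\!\left(\frac{\omega-\omega_l}{h}\right)\Psi_Z(\omega_j,\omega_l) = \frac{1}{r}\left\{\Psi_Z(\omega,\omega) + O(h^2) + O\!\left((Nh)^{-1}\right)\right\} I\left(\Psi_Z(\omega,\omega) > 0\right)
$$
uniformly in $\omega$. The second term can be written as:
$$
\frac{\sigma_e^2}{rN^2h^2}\sum_{j=-\nu}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right)^2 + \frac{1}{rN^2h^2}\sum_{j=-\nu}^{\nu}\sum_{\substack{l=-\nu \\ l \ne j}}^{\nu} K\!\left(\frac{\omega-\omega_j}{h}\right) K\!\left(\frac{\omega-\omega_l}{h}\right) E[e_{ij}e_{il}]
$$
But $E[e_{ij}e_{il}] \le \sigma_e^2$, thus
$$
\operatorname{var}[\hat{\mu}(\omega)] \le \frac{1}{r}\left\{\operatorname{var}[Z_i(\omega)] + \sigma_e^2 + O(h^2) + O\!\left((Nh)^{-1}\right)\right\} \cdot I\left(\operatorname{var}[Z_i(\omega)] > 0\right) + \frac{\sigma_e^2}{rNh}\|K\|_2^2 + o\!\left((rNh)^{-1}\right)
$$
which completes the proof of (b).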
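Proof (of Theorem 2).
Throughout this proof we will denote $K_j = K\!\left(\frac{\omega-\omega_j}{\lambda}\right)$, $j = 1,\dots,\nu$. So
$$
E\left[\left(\hat{Z}_i(\omega) - Z_i(\omega)\right)^2\right] = E\left[\hat{Z}_i(\omega)^2\right] - 2E\left[\hat{Z}_i(\omega) Z_i(\omega)\right] + E\left[Z_i(\omega)^2\right]
$$
Let us now calculate each of the terms separately:
$$
\begin{aligned}
E\left[\hat{Z}_i(\omega)^2\right] &= E\left[\left\{\frac{1}{N\lambda}\sum_{j=-\nu}^{\nu} K_j\left(y_{ij} - \hat{\mu}(\omega_j)\right)\right\}^2\right] \\
&= \frac{1}{N^2\lambda^2}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K_j K_l\, E\left[y_{ij}y_{il} - y_{ij}\hat{\mu}(\omega_l) - y_{il}\hat{\mu}(\omega_j) + \hat{\mu}(\omega_j)\hat{\mu}(\omega_l)\right] \\
&\le \frac{r-1}{r}\Psi_Z(\omega,\omega) + \sigma_e^2 + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right)
\end{aligned}
$$
We can write the second term as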
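$$
E\left[\hat{Z}_i(\omega) Z_i(\omega)\right] = E\left[E\left[\hat{Z}_i(\omega) Z_i(\omega) \mid b_i\right]\right] = E\left[Z_i(\omega)\, E\left[\hat{Z}_i(\omega) \mid b_i\right]\right]
$$
Let us first calculate the conditional mean
$$
\begin{aligned}
E\left[\hat{Z}_i(\omega) \mid b_i\right] &= \frac{1}{N\lambda}\sum_{j=-\nu}^{\nu} K_j\left\{E[y_{ij} \mid b_i] - E[\hat{\mu}(\omega_j) \mid b_i]\right\} \\
&= \frac{1}{N\lambda}\sum_{j=-\nu}^{\nu} K_j\left\{\mu(\omega_j) + Z_i(\omega_j)\right\} - \frac{1}{N^2h\lambda}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K_j\, K\!\left(\frac{\omega_j-\omega_l}{h}\right)\left\{\mu(\omega_l) + Z_\cdot(\omega_l)\right\}
\end{aligned}
$$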
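Thus,
$$
\begin{aligned}
E\left[\hat{Z}_i(\omega) Z_i(\omega)\right] &= \frac{1}{N\lambda}\sum_{j=-\nu}^{\nu} K_j \Psi_Z(\omega,\omega_j) - \frac{1}{N^2h\lambda}\sum_{j=-\nu}^{\nu}\sum_{l=-\nu}^{\nu} K_j\, K\!\left(\frac{\omega_j-\omega_l}{h}\right) E\left[Z_i(\omega) Z_\cdot(\omega_l)\right] \\
&= \Psi_Z(\omega,\omega) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right) - \frac{1}{rN\lambda}\sum_{j=-\nu}^{\nu} K_j \Psi_Z(\omega,\omega_j) + O(h^2) + O\!\left((Nh)^{-1}\right)
\end{aligned}
$$
Therefore, writing $C_2 = E\left[\hat{Z}_i(\omega) Z_i(\omega)\right]$ for this second term,
$$
C_2 = \frac{r-1}{r}\Psi_Z(\omega,\omega) + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right)
$$
Obviously, the third summand is $E\left[Z_i(\omega)^2\right] = \Psi_Z(\omega,\omega)$ and we finally have:
$$
E\left[\left(\hat{Z}_i(\omega) - Z_i(\omega)\right)^2\right] = \frac{1}{r}\Psi_Z(\omega,\omega) + O(h^2) + O\!\left((Nh)^{-1}\right) + O(\lambda^2) + O\!\left((N\lambda)^{-1}\right)
$$
uniformly in $|\omega| < \pi - \kappa \max\{h, \lambda\}$. This completes the proof.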
References
[Bick81] Bickel, P. and Freedman, D.: Some Asymptotic Theory for the Bootstrap.
Ann. Statist., 9, 1196–1217 (1981).
[Dav68] Davis, H.T. and Jones, R.H.: Estimation of the Innovation Variance of
a Stationary Time Series. Journal Amer. Statist. Assoc., 63, 141–149
(1968).
[Dig93] Diggle, P. J. and Al-Wasel, I.: On Periodogram-Based Spectral Estimation
for Replicated Time Series, in: Subba Rao (Ed), Developments in Time
Series Analysis. Chapman and Hall, Great Britain, 341–354 (1993).
[Fran92] Franke, J. and Härdle, W.: On Bootstrapping Kernel Spectral Estimates.
Ann. Statist., 20, 121–145 (1992).
[Hern99] Hernández-Flores, C.N., Artiles-Romero, J., Saavedra-Santana, P.: Estimation
of the Population Spectrum with Replicated Time Series. Comp.
Stat. and Data Anal., 30, 271–280 (1999).
[Prie81] Priestley, M.B.: Spectral Analysis and Time Series. Wiley, New York
(1981).
[Saa00] Saavedra, P., Hernández, C.N. and Artiles, J.: Spectral Analysis with
Replicated Time Series. Communications in Statistics - Theory and Methods,
29, 2343–2362 (2000).