Moments Estimators for Moving-Average Models on Z^d
Chrysoula Dimitriou-Fakalou
chrysoula@stats.ucl.ac.uk
Starting from a causal auto-regression on Z^d, we generalize the standard Yule-Walker equations in order to derive equations that involve the moments of the invertible moving-average model on Z^d with the same polynomial. With observations from the moving-average process, we imitate these equations and so define the new estimators. Under very weak conditions and for any number of dimensions d, the estimators derived are shown to be consistent and asymptotically normal. We also provide the form of their variance matrix.
Key words: Edge-effect; Method of moments estimation; Moving-average model; Yule-Walker equations
Research report No 287, Department of Statistical Science, University College London. Date: December 2007.
Introduction
The estimation of the parameters of a causal auto-regression on Z^d, where Z = {0, ±1, · · · } and d is any positive integer, is considered an easy task. This is because, for any number of dimensions d, a finite auto-regressive representation always allows a finite number of observations to be transformed into a finite number of uncorrelated random variables from the underlying process that drives the process of interest. Nevertheless, when d ≥ 2, treating all other stationary processes through their AR(∞) representation during estimation resurrects the problem known as the 'edge-effect'. For example, Yao and Brockwell (9) suggested a way of confining the edge-effect for ARMA(p, q) processes on Z^2.
On the other hand, an invertible moving-average process possesses an AR(∞) representation, but it also has the privilege of a finite moving-average representation. Guyon (6) used a modification of the spectral-domain version of the Gaussian likelihood, as proposed by Whittle (7), in order to produce estimators that would defeat the edge-effect for any stationary process on Z^d. As one of its special cases, N observations from a moving-average process could also be used to compute estimators with a bias of the ideal order N^{-1}. Moreover, the fact that the moving-average model has a finite auto-covariance function implies that a simplified version of the Gaussian likelihood should be maximized.
Both the finite auto-regressive and moving-average models possess exceptional second-order properties, which are reflected in the spectral densities of interest. An auto-regression has a spectral density with a finite-order denominator; for Gaussian processes on Z^2, Besag (2) demonstrated how this translates into a conditional expectation of the value at one location that depends only on the values of a finite set of neighbouring locations. A moving-average process has a spectral density with a finite-order numerator and, thus, an auto-covariance function that takes non-zero values only on a finite set of lags of Z^d.
The need to establish results for estimators of parameters of processes taking place on Z^d, for any positive integer d, stems directly from the growing use of spatial and spatio-temporal statistics. A spatial process often takes place on the two dimensions of a surface; it has been claimed that a spatial ARMA model cannot be meaningful there, as it is based on an unnatural, unilateral ordering of locations implied by the causality and invertibility conditions. In that case, it might be preferable to resort to the analysis introduced by Besag (2) or (3) for data observed on regular or irregular sites, respectively. Nevertheless, the spatio-temporal ARMA model for processes taking place on more than two dimensions can be an extremely useful tool for capturing the second-order properties of interest; the inclusion of the time axis then gives a meaningful interpretation to the unilateral ordering of locations.
In this paper, we demonstrate a new way of defining estimators for invertible moving-average models of fixed order on Z^d. The new estimators are consistent and asymptotically normal, as they escape the edge-effect. They are defined as solutions of equations, similarly to the Yule-Walker estimators for causal auto-regressions, which are also the least squares and maximum conditional Gaussian likelihood estimators. Unlike the modified likelihood estimators of Guyon (6), our method of moments estimators are defined especially for the case of moving-average models, and our methods stay in the time domain. Our aim has been to highlight the special second-order characteristics of a moving-average process and to use them to best advantage for estimation, as we are used to doing for an auto-regression of fixed order. While Guyon (6) requires the strong condition of a finite fourth moment in order to use the estimators in any number of dimensions, and Yao and Brockwell (9) relax this condition but deal with two-dimensional processes only, for our new results to be established in any number of dimensions we require only the weak condition of a finite second moment. This is the same condition that has been used when estimating the parameters of causal auto-regressions on Z^d.
1 Theoretical moments equations
We write 'τ' for the transpose operator. For any v^τ ∈ Z^d, we consider the invertible moving-average process {Y(v)} defined by the equation

Y(v) = \varepsilon(v) + \sum_{n=1}^{q} \theta_{i_n} \varepsilon(v - i_n),    (1)
where {ε(v)} is a sequence of uncorrelated zero-mean random variables with variance σ². In (1), we have assumed a unilateral ordering of the fixed lags, 0 < i_1 < · · · < i_q. When d = 1, this is the standard ordering on Z, and when d = 2 we refer to the conventional unilateral ordering introduced by Whittle (7). The ordering of two locations on Z^d when d ≥ 3 was generalized by Guyon (6). As for the invertibility condition, it is only imposed to ensure that we can write
\varepsilon(v) = Y(v) + \sum_{i>0} \Theta_i\, Y(v - i), \qquad \sum_{i>0} |\Theta_i| < \infty,
which also uses the unilateral ordering i > 0. For a sufficient condition on the
invertibility of the filter of interest, we refer to Anderson and Jury (1).
We consider the complex numbers z_1, · · · , z_d and the vector z = (z_1, · · · , z_d), such that we can define the polynomial

\theta(z) \equiv 1 + \sum_{n=1}^{q} \theta_{i_n} z^{i_n}.    (2)

In (2), we consider that

z^{i} \equiv z^{(i_1, \cdots, i_d)} = z_1^{i_1} \cdots z_d^{i_d}.
Similarly, for the backwards operators

B_k^{i_k}\, \varepsilon(v_1, \cdots, v_k, \cdots, v_d) \equiv \varepsilon(v_1, \cdots, v_k - i_k, \cdots, v_d), \quad k = 1, \cdots, d,

and the vector backwards operator

B^{i}\, \varepsilon(v) \equiv B_1^{i_1} \cdots B_d^{i_d}\, \varepsilon(v) \equiv \varepsilon(v - i),

we may now re-write (1) as

Y(v) = \theta(B)\, \varepsilon(v).    (3)
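For concreteness, here is a minimal Python sketch (not part of the paper) that simulates a realisation of model (3) on a rectangle of Z^2; the lags, coefficients, grid size and seed are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: d = 2, q = 2 unilateral lags with their coefficients.
lags = [(1, 0), (0, 1)]                     # 0 < i_1 < i_2 in the unilateral ordering
theta = {(1, 0): 0.5, (0, 1): -0.3}
sigma = 1.0

M1, M2, pad = 60, 60, 2                     # observation rectangle and padding for the needed epsilons
eps = rng.normal(0.0, sigma, size=(M1 + pad, M2 + pad))

# Y(v) = eps(v) + sum_n theta_{i_n} eps(v - i_n), evaluated on the rectangle.
Y = np.empty((M1, M2))
for v1 in range(M1):
    for v2 in range(M2):
        y = eps[v1 + pad, v2 + pad]
        for (l1, l2), th in theta.items():
            y += th * eps[v1 + pad - l1, v2 + pad - l2]
        Y[v1, v2] = y
```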
From the same sequence {ε(v)}, we define the unilateral auto-regression {X(v)} from the equation

\theta(B^{-1})\, X(v) = X(v) + \sum_{n=1}^{q} \theta_{i_n} X(v + i_n) = \varepsilon(v).    (4)
The auto-regression is unilateral but not causal, in the sense that we can write

X(v) = \varepsilon(v) + \sum_{i>0} \Theta_i\, \varepsilon(v + i), \qquad \sum_{i>0} |\Theta_i| < \infty,

as a function of the 'future' terms ε(v + i), i ≥ 0. Moreover, if we write the two polynomials

\gamma(z) = \theta(z)\, \theta(z^{-1}) \equiv \sum_{i^\tau \in F} \gamma_i z^{i}, \qquad c(z) = \gamma(z)^{-1} \equiv \sum_{i^\tau \in Z^d} c_i z^{i},    (5)

then it holds that

Y(v) = \gamma(B)\, X(v), \qquad X(v) = c(B)\, Y(v).    (6)
In (5) the set F ⊂ Z^d is a set of finite cardinality, such that

F \equiv \{ i^\tau : i = i_n, \, -i_n, \, i_n - i_m, \; n, m = 1, \cdots, q \}.

We write (2q^* + 1) for the cardinality of F, where q^* is the cardinality of the set

F_+ \equiv \{ i^\tau : i = i_n, \, i_n - i_m, \; n, m = 1, \cdots, q, \; n > m \}.
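To make the bookkeeping concrete, the following Python sketch (reusing the hypothetical lags of the previous sketch) builds F_+, F and the coefficients γ_i of γ(z) = θ(z)θ(z^{-1}) directly from the θ_{i_n}; Python's lexicographic tuple comparison happens to agree with the unilateral ordering used here.

```python
import numpy as np

def gamma_coefficients(lags, theta):
    """Return F_+, F and {i: gamma_i} for gamma(z) = theta(z) theta(z^{-1})."""
    d = len(lags[0])
    zero = tuple([0] * d)
    full = {zero: 1.0}
    full.update({tuple(l): theta[tuple(l)] for l in lags})   # theta_0 = 1 plus the q coefficients
    gamma = {}
    for i, a in full.items():                # gamma_k = sum over coefficient pairs with i - j = k
        for j, b in full.items():
            k = tuple(np.subtract(i, j))
            gamma[k] = gamma.get(k, 0.0) + a * b
    F = set(gamma)
    F_plus = {k for k in F if k > zero}      # lexicographic order = unilateral order on Z^d
    return F_plus, F, gamma

lags = [(1, 0), (0, 1)]
theta = {(1, 0): 0.5, (0, 1): -0.3}
F_plus, F, gamma = gamma_coefficients(lags, theta)
q_star = len(F_plus)                         # cardinality of F is 2 q* + 1
print(q_star, len(F), gamma[(1, 0)])
```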
The original Yule-Walker equations for the unilateral auto-regression {X(v)} dictate

c_0 + \sum_{n=1}^{q} \theta_{i_n} c_{i_n} = 1    (7)

and

c_i + \sum_{n=1}^{q} \theta_{i_n} c_{i - i_n} = 0, \quad i > 0,    (8)

where E(X(v)X(v − i)) = σ² c_i. Indeed, the spectral density of the auto-regression can be written as

g_X(\omega_1, \cdots, \omega_d) \equiv \frac{\sigma^2}{(2\pi)^d}\, \frac{1}{\theta(z)\, \theta(z^{-1})} = \frac{\sigma^2}{(2\pi)^d}\, c(z), \qquad z_k = e^{-i\omega_k}, \ \omega_k \in (-\pi, \pi), \ k = 1, \cdots, d,

according to (5). Similarly, the spectral density of the moving-average process is

g_Y(\omega_1, \cdots, \omega_d) \equiv \frac{\sigma^2}{(2\pi)^d}\, \theta(z)\, \theta(z^{-1}) = \frac{\sigma^2}{(2\pi)^d}\, \gamma(z), \qquad z_k = e^{-i\omega_k}, \ \omega_k \in (-\pi, \pi), \ k = 1, \cdots, d,

which also implies that E(Y(v)Y(v − i)) = σ² γ_i.
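As a small numerical illustration (hypothetical coefficients, d = 2, σ² = 1, not part of the paper), g_Y can be evaluated directly from θ(z) at any frequency pair, following the display above.

```python
import numpy as np

lags = [(1, 0), (0, 1)]
theta = {(1, 0): 0.5, (0, 1): -0.3}
sigma2, d = 1.0, 2

def theta_poly(z):
    """theta(z) = 1 + sum_n theta_{i_n} z^{i_n}, with z^i = z_1^{i_1} z_2^{i_2}."""
    return 1.0 + sum(th * z[0] ** l[0] * z[1] ** l[1] for l, th in theta.items())

def g_Y(omega):
    z = np.exp(-1j * np.asarray(omega))
    # theta(z) theta(z^{-1}) = |theta(z)|^2 on the unit circle, since the coefficients are real
    return sigma2 / (2 * np.pi) ** d * abs(theta_poly(z)) ** 2

print(g_Y((0.3, -1.2)))
```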
According to (8), we may write for i > 0 the following (q + 1) equations

c_i + \theta_{i_1} c_{i - i_1} + \theta_{i_2} c_{i - i_2} + \cdots + \theta_{i_q} c_{i - i_q} = 0
\theta_{i_1} c_{i + i_1} + \theta_{i_1}^2 c_i + \theta_{i_1}\theta_{i_2} c_{i + i_1 - i_2} + \cdots + \theta_{i_1}\theta_{i_q} c_{i + i_1 - i_q} = 0
\vdots
\theta_{i_q} c_{i + i_q} + \theta_{i_q}\theta_{i_1} c_{i + i_q - i_1} + \theta_{i_q}\theta_{i_2} c_{i + i_q - i_2} + \cdots + \theta_{i_q}^2 c_i = 0

and their sum

\sum_{j^\tau \in F} \gamma_j\, c_{i - j} = \sum_{j^\tau \in F} \gamma_j\, c_{j - i} = \sum_{-j^\tau \in F} \gamma_{-j}\, c_{-j - i} = \sum_{j^\tau \in F} \gamma_j\, c_{j + i} = 0.

We may re-write it as

\sum_{j^\tau + i^\tau \in F} c_j\, \gamma_{j + i} = 0, \quad i \neq 0.    (9)
Similarly to (9), we can create the equation

\sum_{j^\tau \in F} c_j\, \gamma_j = 1    (10)

from (7) and (8).
We may explicitly derive (9) and (10) from the fact that

\gamma(z)\, c(z) = 1.
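The identities (9) and (10) are easy to check numerically. The sketch below does so for a hypothetical one-dimensional MA(2), computing the c_j as the (scaled) auto-covariances of the unilateral auto-regression by truncating the expansion of 1/θ(z); the truncation level K is an arbitrary choice.

```python
import numpy as np

theta = [0.4, -0.25]          # hypothetical invertible MA(2): theta(z) = 1 + 0.4 z - 0.25 z^2
q = len(theta)

# gamma_j, j in F = {-q, ..., q}: coefficients of theta(z) theta(z^{-1})
t = np.array([1.0] + theta)
gamma = {j: sum(t[n] * t[n - j] for n in range(q + 1) if 0 <= n - j <= q)
         for j in range(-q, q + 1)}

# Theta_i from 1 / theta(z) = sum_{i >= 0} Theta_i z^i  (power-series inversion, truncated)
K = 200
Theta = np.zeros(K)
Theta[0] = 1.0
for i in range(1, K):
    Theta[i] = -sum(theta[n - 1] * Theta[i - n] for n in range(1, min(i, q) + 1))

# c_j = sum_{i >= 0} Theta_i Theta_{i + |j|}  (auto-covariance of X divided by sigma^2)
def c(j):
    j = abs(j)
    return float(np.dot(Theta[: K - j], Theta[j:]))

# Check (10): sum_{j in F} c_j gamma_j = 1, and (9): sum_{j + i in F} c_j gamma_{j + i} = 0, i != 0
for i in range(0, 5):
    s = sum(c(j) * gamma[j + i] for j in range(-q - i, q - i + 1))
    print(i, round(s, 10))    # prints 1 for i = 0 and 0 otherwise, up to truncation error
```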
However, the way we have chosen to derive these equations reveals that they are a more general version of the Yule-Walker equations; they have been constructed according to the coefficients θ_{i_1}, · · · , θ_{i_q}, just like the standard Yule-Walker equations. Nevertheless, they involve the non-zero auto-covariances of the process {Y(v)}, as well as auto-covariances of the process {X(v)}, rather than the coefficients θ_{i_1}, · · · , θ_{i_q} themselves. Thus, for the parameters of a moving-average model, they will be used as theoretical prototypes to be imitated by their sample analogues for the sake of estimation.
2 Method of moments estimators
We have now recorded observations {Y(v), v^τ ∈ S} from the moving-average process defined by (3), where S ⊂ Z^d is a set of finite cardinality N. We are interested in estimating the q moving-average parameters from the recordings available.
Given the set S, we define for any v^τ ∈ Z^d the set F_v ⊂ Z^d; we consider i^τ ∈ F_v if and only if v^τ − i^τ ∈ S. Next, we consider the corrected set S^* with N^* elements, such that v^τ ∈ S^* if and only if v^τ + i^τ ∈ S for every i^τ ∈ F. This implies that, for any v^τ ∈ S^*, it holds that F ⊆ F_v.
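Continuing the hypothetical d = 2 example in code, the membership test defining F_v and the corrected set S^* could look as follows (here S is taken to be a rectangle).

```python
# S: observation sites (a 60 x 60 rectangle); F: the finite lag set of gamma(z)
S = {(v1, v2) for v1 in range(60) for v2 in range(60)}
F = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)}

def in_F_v(i, v, S):
    """i belongs to F_v if and only if v - i belongs to S."""
    return (v[0] - i[0], v[1] - i[1]) in S

# S* = {v in S : v + i in S for every i in F}; then F is a subset of F_v for every v in S*.
S_star = {v for v in S if all((v[0] + i[0], v[1] + i[1]) in S for i in F)}
print(len(S), len(S_star))    # N and N*
```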
We imitate (9) and we set the estimators θ̂ = (θ̂_{i_1}, · · · , θ̂_{i_q})^τ to be the solutions of the equations

\sum_{v^\tau \in S^*} \Big\{ \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j) \Big\}\, Y(v) \equiv 0, \quad n = 1, \cdots, q,    (11)
where, of course,

\hat{\theta}(z) \equiv 1 + \sum_{n=1}^{q} \hat{\theta}_{i_n} z^{i_n}, \qquad \hat{c}(z) \equiv \sum_{i^\tau \in Z^d} \hat{c}_i z^{i} \equiv \{\hat{\theta}(z)\, \hat{\theta}(z^{-1})\}^{-1}.
However, based on the next proposition, we will additionally use the equations

\sum_{v^\tau \in S^*} \Big\{ \sum_{j^\tau - (i_n^\tau - i_m^\tau) \in F_v} \hat{c}_j\, Y(v + i_n - i_m - j) \Big\}\, Y(v) \equiv 0, \quad n, m = 1, \cdots, q, \; n > m,    (12)

in order to ensure later on that the estimators are consistent. The next proposition guarantees that the use of q^*, instead of q, equations is sufficient, under the invertibility condition of the process of interest, to provide a unique solution to the non-linear equations that have been used as prototypes for estimation.
Finally, we need to make clear that our moments estimators are not identical to the estimators, say θ̃^(0), derived by setting q^* unbiased sample auto-covariances with as many pairs as possible, say

\tilde{\gamma}_i = \tilde{\gamma}_{-i} = \frac{\sum_{v} Y(v)\, Y(v - i)}{N_i}, \quad i^\tau \in F_+,

equal to the relevant functions of θ̃^(0) in

\tilde{\gamma}(z) = \tilde{\theta}(z)\, \tilde{\theta}(z^{-1}).
An indication of this is that our equations use different pairs of observations for the two lags i and −i, where i^τ ∈ F_+. In Section 2.1, this becomes obvious for the simple case of an MA(1) model on Z. Moreover, we always choose to use as many pairs as there are elements of S^*, though more pairs might be available from the sample for a specific lag i^τ ∈ F_+. The difference between the two methods might look minimal, especially when d = 1. However, the steps we suggest need to be followed exactly, in order to take advantage later of the fact that Y(v − i) and X(v) are two independent random variables when i > 0 only, provided {ε(v)} is a sequence of independent random variables; the same cannot be said for Y(v + i) and X(v) for i ≥ 0. Using this property will allow us to relax our conditions when establishing the asymptotic normality. For example, if in (11) we had used Y(v − i_n − j), j^τ + i_n^τ ∈ F_v, v^τ ∈ S^*, instead of Y(v + i_n − j), for at least one n = 1, · · · , q, we would not be able to proceed without a condition on a finite fourth moment, unless we could assume that {u(v)} is also a sequence of independent random variables, where we define

Y(v) \equiv \theta(B^{-1})\, u(v) \qquad \text{or} \qquad \theta(B)\, X(v) \equiv u(v).

Furthermore, the passage from the coefficients of γ̃(z) to the estimators θ̃^(0) in θ̃(z) would be neither immediate nor cheap; at the end of this paper, we propose a straightforward way of approximating our estimators θ̂ instead, using the equations according to which we have defined them.
Proposition 1. For the given lags 0 < i_1 < · · · < i_q, for the set F, and for fixed numbers θ_{i_1,0}, · · · , θ_{i_q,0}, such that

\theta_0(z) = 1 + \sum_{n=1}^{q} \theta_{i_n,0} z^{i_n}, \qquad \theta_0(z)^{-1} = 1 + \sum_{i>0} \Theta_{i,0} z^{i}, \qquad \sum_{i>0} |\Theta_{i,0}| < \infty,

we define the polynomials

\gamma_0(z) = \theta_0(z)\, \theta_0(z^{-1}) = \sum_{j^\tau \in F} \gamma_{j,0} z^{j}, \qquad c_0(z) = \gamma_0(z)^{-1} = \sum_{j^\tau \in Z^d} c_{j,0} z^{j}.

Then, if we consider all the possible sets of numbers θ_{i_1}, · · · , θ_{i_q}, such that

\theta(z) = 1 + \sum_{n=1}^{q} \theta_{i_n} z^{i_n}, \qquad \theta(z)^{-1} = 1 + \sum_{i>0} \Theta_i z^{i}, \qquad \sum_{i>0} |\Theta_i| < \infty,

\gamma(z) = \theta(z)\, \theta(z^{-1}) = \sum_{j^\tau \in F} \gamma_j z^{j}, \qquad c(z) = \gamma(z)^{-1} = \sum_{j^\tau \in Z^d} c_j z^{j},

and the q^* equations

\sum_{j^\tau \in F} \gamma_{j,0}\, c_{j - i_n} = 0, \quad n = 1, \cdots, q,

\sum_{j^\tau \in F} \gamma_{j,0}\, c_{j - (i_n - i_m)} = 0, \quad n, m = 1, \cdots, q, \; n > m,

then the unique set of solutions that satisfies the above equations is

\theta_{i_n} = \theta_{i_n,0}, \quad n = 1, \cdots, q.
For mathematical convenience, we define a new variable that depends on the sampling set,

H_Y(v) \equiv \begin{cases} Y(v), & v^\tau \in S, \\ 0, & \text{otherwise,} \end{cases}

and we may re-write (11) as

\sum_{v^\tau \in S^*} \Big\{ \sum_{j^\tau \in Z^d} c_{j,0}\, H_Y(v + i_n - j) \Big\}\, Y(v) - J_n (\hat{\theta} - \theta_0) = 0, \quad n = 1, \cdots, q,    (13)

or

\sum_{v^\tau \in S^*} \{ c_0(B)\, H_Y(v + i_n) \}\, Y(v) - J_n (\hat{\theta} - \theta_0) = 0, \quad n = 1, \cdots, q,    (14)

where θ_0 = (θ_{i_1,0}, · · · , θ_{i_q,0})^τ is the true parameter vector and we denote with a zero sub-index all the quantities that refer to it. In (13) and (14), we write J_n = (J_{n,1}, · · · , J_{n,q}) with elements

J_{n,m} = \sum_{v^\tau \in S^*} \big\{ c_0(B) \big( \theta_0(B)^{-1} H_Y(v + i_n - i_m) + \theta_0(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v) + O_P(N\, \|\hat{\theta} - \theta_0\|).    (15)
Finally, if we imitate (10), we may also define the estimator of the error variance

\hat{\sigma}^2 \equiv \sum_{j^\tau \in F} \hat{c}_j \sum_{v^\tau \in S^*} Y(v)\, Y(v - j) / N^*.    (16)

2.1 A special case
For the sake of example, we make a reference to one-dimensional processes. In the cases where d = 1, it holds that q = q^*, as we can always write i_n − i_m = i_r, n, m = 1, · · · , q, for some r = 1, · · · , q.
For the simplest case, when q = 1, we record observations {Y(v), v = 1, · · · , N} from the process defined by

Y(v) = e(v) + \theta\, e(v - 1), \qquad |\theta| < 1, \ \theta \neq 0,
where {e(v)} is a sequence of uncorrelated random variables with unit variance. Our estimator θ̂ comes as a solution of the quadratic equation

\sum_{v=2}^{N-1} Y(v) Y(v+1) \; - \; \hat{\theta} \sum_{v=2}^{N-1} Y(v)^2 \; + \; \hat{\theta}^2 \sum_{v=2}^{N-1} Y(v) Y(v-1) = 0,
which reduces to

\hat{\theta} = \frac{\sum_{v=2}^{N-1} Y(v)^2 \pm \sqrt{\big(\sum_{v=2}^{N-1} Y(v)^2\big)^2 - 4 \big(\sum_{v=2}^{N-1} Y(v) Y(v-1)\big) \big(\sum_{v=2}^{N-1} Y(v) Y(v+1)\big)}}{2 \sum_{v=2}^{N-1} Y(v) Y(v-1)}

or

\hat{\theta} = \frac{1 \pm \sqrt{1 - 4\, \hat{\rho}_1^{+} \hat{\rho}_1^{-}}}{2\, \hat{\rho}_1^{+}},

with

\hat{\rho}_1^{+} = \frac{\sum_{v=2}^{N-1} Y(v) Y(v-1)}{\sum_{v=2}^{N-1} Y(v)^2}, \qquad \hat{\rho}_1^{-} = \frac{\sum_{v=2}^{N-1} Y(v) Y(v+1)}{\sum_{v=2}^{N-1} Y(v)^2}.
For the actual auto-correlation at lags ±1,

\rho_1 = \frac{\theta}{1 + \theta^2},

it holds that |ρ_1| < 1/2 and D = 1 − 4ρ_1² > 0. As a result, if 0 < θ < 1 and 0 < ρ_1 < 1/2, then 1/(2ρ_1) > 1 and the value

\frac{1 + \sqrt{1 - 4\rho_1^2}}{2\rho_1} = \frac{1}{2\rho_1} + \frac{\sqrt{D}}{2\rho_1}

is bigger than 1. If, on the other hand, −1 < θ < 0 and −1/2 < ρ_1 < 0, then 1/(2ρ_1) < −1 and the same value is smaller than −1.
Thus, we conclude with the estimator of the parameter

\hat{\theta} = \frac{\sum_{v=2}^{N-1} Y(v)^2 - \sqrt{\big(\sum_{v=2}^{N-1} Y(v)^2\big)^2 - 4 \big(\sum_{v=2}^{N-1} Y(v) Y(v-1)\big) \big(\sum_{v=2}^{N-1} Y(v) Y(v+1)\big)}}{2 \sum_{v=2}^{N-1} Y(v) Y(v-1)}.
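A minimal numerical sketch of this closed-form estimator on simulated data; the coefficient, sample size and seed are arbitrary illustration choices, and a negative discriminant (possible in small samples, as discussed next) is reported as a missing value.

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, N = 0.6, 2000
e = rng.normal(size=N + 1)
Y = e[1:] + theta_true * e[:-1]                    # observations Y(1), ..., Y(N)

v = np.arange(1, N - 1)                            # v = 2, ..., N-1 in the paper's 1-based notation
b = np.sum(Y[v] ** 2)                              # sum of Y(v)^2
a = np.sum(Y[v] * Y[v - 1])                        # sum of Y(v) Y(v-1)
c = np.sum(Y[v] * Y[v + 1])                        # sum of Y(v) Y(v+1)

disc = b ** 2 - 4.0 * a * c
theta_hat = (b - np.sqrt(disc)) / (2.0 * a) if disc >= 0 else np.nan
print(theta_hat)
```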
Though we cannot make guarantees about the distribution of θ̂ for small sample sizes, as |ρ̂_1^+| or |ρ̂_1^−| might be larger than or equal to 1/2 with positive probability, we will establish next that our estimators are consistent and asymptotically normal. For the case of d = 1 dimension, the use of maximum Gaussian likelihood arguments might improve dramatically the properties of the estimator presented above; however, the edge-effect does not allow the standard Gaussian likelihood estimators to be asymptotically normal in two or more dimensions. Thus, our methods can be more useful for higher dimensionalities and for large sample sizes, thanks to the weak conditions required in order to establish the consistency and the asymptotic normality next.
3 Properties of estimators
In this section, we establish the consistency and the asymptotic normality of our estimators. While the consistency is a straightforward derivation from the definition of our estimators, the asymptotic normality of estimators of the parameters of stationary processes on Z^d is, in general, problematic when d ≥ 2. As Guyon (6) demonstrated, there is a bias of order N^{-1/d}, while we would want the absolute bias to tend to 0 asymptotically when multiplied by N^{1/2}. This happens for sure only when d = 1; for example, we refer to Yao and Brockwell (8).
Regarding our estimators, in (13) we can see that the biases will come from the expected value of the quantities

\sum_{v^\tau \in S^*} \Big\{ \sum_{j^\tau \in Z^d} c_{j,0} \big( Y(v + i_n - j) - H_Y(v + i_n - j) \big)\, Y(v) \Big\} / N^*, \quad n = 1, \cdots, q,

which express what is 'missing' from the sample. The good news is that, if the biases are multiplied by N^{*1/2}, then they produce

N^{*-1/2} \sum_{v^\tau \in S^*} \; \sum_{v^\tau + i_n^\tau - j^\tau \notin S} c_{j,0}\, E\{ Y(v + i_n - j)\, Y(v) \} = 0, \quad n = 1, \cdots, q,
which are zero thanks to our selection of S^*. This is a consequence of the special characteristic of a moving-average process, namely that the auto-covariance function cuts off to zero outside a finite set of lags. If the auto-covariance function were decaying at an exponential rate, a different route would have to be followed for each dimensionality d. For example, Yao and Brockwell (9) have proposed a modification of a mathematical nature on the sampling set, which works for ARMA models on Z^2 only.
While Yao and Brockwell (9) have used a series of mathematical arguments to deal with the edge-effect for a specific number of dimensions, Guyon (6) has cancelled the edge-effect for any positive integer number of dimensions, using intuitive, spectral-domain arguments. The spectral-domain version of the Gaussian likelihood, as given by Whittle (7), involves the sample auto-covariances and, thus, the unbiased estimators were plugged in there; this implies that it could only be a modification of the likelihood, as the auto-covariances used would not necessarily correspond to a non-negative definite sample variance matrix. Dahlhaus and Künsch (5) dealt with such problems, but paid the price of having to restrict the dimensionality d in order to secure their results. Finally, the conditions used by Guyon (6) to obtain the asymptotic normality of the estimators are strong and require a finite fourth moment of the process of interest.
Our suggestion skips the Gaussian likelihood and uses the moments, or generalized Yule-Walker, equations for estimation. It follows the time domain and can be applied to moving-average models of fixed order. It holds for any number of dimensions, unlike the suggestion of Yao and Brockwell (9). The conditions required are weak and relate to a finite second moment only, unlike the suggestion of Guyon (6).
We will also be using the following two conditions. The second condition was used by Guyon (6). The first part (i) is needed for the consistency of the estimators; for the asymptotic normality, part (ii) is necessary too.

CONDITION C1. We consider the parameter space Θ ⊂ R^q to be a compact set containing the true value θ_0. Further, for any θ ∈ Θ, the moving-average model (3) is invertible.

CONDITION C2. (i) For a set S ≡ S_N ⊂ Z^d of cardinality N, we write N → ∞ if the length M of the minimal hypercube including S, say S ⊆ C_M, and the length m of the maximal hypercube included in S, say C_m ⊆ S, are such that M, m → ∞. (ii) Further, as M, m → ∞, it holds that M/m is bounded away from ∞.
Theorem 1. If {ε(v)} ∼ IID(0, σ²), then under condition (C1), it holds that

\hat{\theta} \xrightarrow{P} \theta_0, \qquad \hat{\sigma}^2 \xrightarrow{P} \sigma^2,

as N → ∞ and (C2)(i) holds.
From (13) and (14), we may stack all the q equations together and write

N^{1/2} (\hat{\theta} - \theta_0) = (J/N)^{-1} \Big( N^{-1/2} \sum_{v^\tau \in S^*} \mathbf{H}_Y(v) \Big),    (17)

where J^\tau = (J_1^\tau, \cdots, J_q^\tau) and

\mathbf{H}_Y(v) \equiv \begin{pmatrix} c_0(B)\, H_Y(v + i_1) \\ \vdots \\ c_0(B)\, H_Y(v + i_q) \end{pmatrix} Y(v), \quad v^\tau \in Z^d,    (18)
which depends on the sampling set S. The next proposition reveals what happens
to the part (J/N ) in (17). Then, Theorem 2 establishes the asymptotic normality
of the estimators.
Proposition 2. Let the polynomial

\theta_0(z)^{-1} = \Big( 1 + \sum_{n=1}^{q} \theta_{i_n,0} z^{i_n} \Big)^{-1} \equiv \sum_{i \ge 0} \Theta_{i,0} z^{i}, \qquad \Theta_{0,0} = 1.

If {ε(v)} ∼ IID(0, σ²), then under (C1), it holds that

J/N \xrightarrow{P} \sigma^2 \Theta_0 \equiv \sigma^2 \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \Theta_{i_2 - i_1, 0} & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ \Theta_{i_q - i_1, 0} & \Theta_{i_q - i_2, 0} & \Theta_{i_q - i_3, 0} & \cdots & 1 \end{pmatrix}

as N → ∞ and (C2)(i) holds.
Theorem 2. Let {W(v)} ∼ IID(0, 1), and let the auto-regression {η(v)} be defined by

\theta_0(B)\, \eta(v) \equiv W(v).

Also let the vector ξ ≡ (η(−i_1), · · · , η(−i_q))^τ and the variance matrix

W_q^* \equiv \mathrm{Var}\big( \xi \mid W(-i_1 - i), \; i > 0, \; i \neq i_2 - i_1, \cdots, i_q - i_1 \big).

If {ε(v)} ∼ IID(0, σ²), then under (C1), it holds that

N^{1/2} (\hat{\theta} - \theta_0) \xrightarrow{D} N(0, W_q^{*\,-1})

as N → ∞ and (C2) holds.
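Since the outline proof in Appendix A.4 identifies W_q^* with Θ_0^τ Θ_0, where Θ_0 is the triangular matrix of Proposition 2, the limiting covariance matrix of N^{1/2}(θ̂ − θ_0) can be computed directly from the AR(∞) coefficients. A small sketch for a hypothetical one-dimensional MA(2) with lags 1 and 2 (truncation level arbitrary):

```python
import numpy as np

theta0 = [0.4, -0.25]                 # hypothetical true MA(2) coefficients, lags i_1 = 1, i_2 = 2
q = len(theta0)

# Theta_{i,0} from theta_0(z)^{-1} = sum_{i >= 0} Theta_{i,0} z^i
K = 50
Theta = np.zeros(K)
Theta[0] = 1.0
for i in range(1, K):
    Theta[i] = -sum(theta0[n - 1] * Theta[i - n] for n in range(1, min(i, q) + 1))

# Theta_0 matrix of Proposition 2: lower triangular, (n, m) entry Theta_{i_n - i_m, 0} for n >= m
lags = [1, 2]
Theta0 = np.zeros((q, q))
for n in range(q):
    for m in range(n + 1):
        Theta0[n, m] = Theta[lags[n] - lags[m]]

W_star = Theta0.T @ Theta0            # W_q^* = Theta_0^tau Theta_0 (Appendix A.4)
asym_cov = np.linalg.inv(W_star)      # limiting covariance of N^{1/2} (theta_hat - theta_0)
print(asym_cov)
```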
4 Approximate moments estimators
Our estimators have not been defined as minimizers of a random quantity; if that were the case, we would be able to compute this quantity for as many different values in the parameter space as possible and then select the set of values that guarantees the quantity has reached its minimum. Instead, we have to propose a different way of getting close enough to our estimators and, consequently, to their properties.
We consider that we have initial estimators θ̃^(0), which need to be consistent. These estimates must either be already available, or they must be computed from the same set of observations before the moments estimators are derived. Later, we will refer to different ways of defining consistent estimators. For now, similarly to (17), we define
\tilde{\theta} \equiv (\tilde{J})^{-1} \sum_{v^\tau \in S^*} \tilde{\mathbf{H}}_Y(v) + \tilde{\theta}^{(0)},    (19)

where J̃ is a (q × q) matrix with (n, m)-th element equal to

\tilde{J}_{n,m} \equiv \sum_{v^\tau \in S^*} \big\{ \tilde{c}(B) \big( \tilde{\theta}(B)^{-1} H_Y(v + i_n - i_m) + \tilde{\theta}(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v)

and

\tilde{\mathbf{H}}_Y(v) \equiv \begin{pmatrix} \tilde{c}(B)\, H_Y(v + i_1) \\ \vdots \\ \tilde{c}(B)\, H_Y(v + i_q) \end{pmatrix} Y(v).
The polynomials θ̃(z), c̃(z) refer to the initial estimators θ̃^(0). The estimators defined by (19) can be computed, provided the initial estimators θ̃^(0) and the observations {Y(v), v^τ ∈ S} are available.
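A simplified one-step sketch in the spirit of (19): here the matrix J̃ is replaced by a numerical Jacobian of the sample moment equations (11), which is my own substitution and not the paper's explicit formula, and the moment_equations helper is the hypothetical d = 1 sketch following (12).

```python
import numpy as np

def one_step(theta_init, Y, h=1e-5):
    """One-step correction of a consistent initial estimate, numerical-Jacobian variant of (19)."""
    theta_init = np.asarray(theta_init, dtype=float)
    g0 = moment_equations(theta_init, Y)
    q = len(theta_init)
    J = np.zeros((q, q))
    for m in range(q):
        step = np.zeros(q)
        step[m] = h
        J[:, m] = (moment_equations(theta_init + step, Y) - g0) / h
    # first-order expansion: g(theta) ~ g(theta_init) + J (theta - theta_init) = 0
    return theta_init - np.linalg.solve(J, g0)

# e.g. theta_tilde = one_step([0.3, -0.1], Y)
```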
The fact that we may compute the estimators θ̃ instead, but still derive the statistical properties of θ̂, comes straight from the following arguments. We can write another Taylor expansion from (11)

\sum_{v^\tau \in S^*} \{ \tilde{c}(B)\, H_Y(v + i_n) \}\, Y(v) - \tilde{J}_n^* (\hat{\theta} - \tilde{\theta}^{(0)}) = 0, \quad n = 1, \cdots, q,    (20)

where we write \tilde{J}_n^* = (\tilde{J}_{n,1}^*, \cdots, \tilde{J}_{n,q}^*) with elements

\tilde{J}_{n,m}^* = \tilde{J}_{n,m} + O_P(N\, \|\hat{\theta} - \tilde{\theta}^{(0)}\|), \quad n, m = 1, \cdots, q.

Since both the estimators θ̂, using all the q^* equations to define them, and θ̃^(0) are consistent, it holds that

\tilde{J}_{n,m}^* / N - \tilde{J}_{n,m} / N \xrightarrow{P} 0, \quad n, m = 1, \cdots, q,

as N → ∞ and (C2)(i) holds. If we consider J̃^* the (q × q) matrix with elements \tilde{J}_{n,m}^*, n, m = 1, \cdots, q, we can put together all the equations (20)

\hat{\theta} = (\tilde{J}^*)^{-1} \sum_{v^\tau \in S^*} \tilde{\mathbf{H}}_Y(v) + \tilde{\theta}^{(0)}.    (21)
Thus, from (19) and (21), we may conclude that

N^{1/2} (\hat{\theta} - \tilde{\theta}) = \big\{ (\tilde{J}^*/N)^{-1} - (\tilde{J}/N)^{-1} \big\}\, N^{-1/2} \sum_{v^\tau \in S^*} \tilde{\mathbf{H}}_Y(v),

and all the elements of this vector tend to 0 in probability, as N → ∞ and (C2) holds; this is exactly what we want.
Finally, if a set of initial estimates is not available, we will also need to define consistent estimators prior to finding our moments estimates. Consistency is a minimal statistical requirement, and the estimators of the parameters of models on Z^d are not deprived of it even when d > 1. Thus, all the modifications that have been proposed to the standard estimators are there to maintain the asymptotic normality. The standard maximum Gaussian likelihood estimators are consistent; for example, both Guyon (6) and Yao and Brockwell (9) have demonstrated that. Again, no more than a finite second-moment condition is needed in order to secure the consistency.
Appendix: Outline proofs
A.1. Proof of Proposition 1
Under the invertibility condition for the polynomials θ(z), there is a one-to-one correspondence between the q coefficients θ_{i_1}, · · · , θ_{i_q} and the q^* auto-correlations ρ_{i_1}, · · · , ρ_{i_q}, ρ_{i_2-i_1}, · · · , ρ_{i_q-i_1}, where ρ_j = γ_j / γ_0, j^τ ∈ Z^d.
We know that the coefficients θ_{i_n,0}, n = 1, · · · , q, generate the numbers c_{j,0}, j^τ ∈ Z^d, and that they can be a solution to the q^* equations of interest. Let us also imagine that there is another solution, say θ_{i_n,1}, n = 1, · · · , q, generating c_{j,1}, j^τ ∈ Z^d, for which it holds that

\sum_{j^\tau \in F, \, j \neq 0} \rho_{j,0}\, c_{j - i_n, 1} = -c_{i_n, 1}, \quad n = 1, \cdots, q,

\sum_{j^\tau \in F, \, j \neq 0} \rho_{j,0}\, c_{j - (i_n - i_m), 1} = -c_{i_m - i_n, 1}, \quad n, m = 1, \cdots, q, \; n > m.
On the other hand, the general Yule-Walker equations for this solution imply that

\sum_{j^\tau \in F, \, j \neq 0} \rho_{j,1}\, c_{j - i_n, 1} = -c_{i_n, 1}, \quad n = 1, \cdots, q,

\sum_{j^\tau \in F, \, j \neq 0} \rho_{j,1}\, c_{j - (i_n - i_m), 1} = -c_{i_m - i_n, 1}, \quad n, m = 1, \cdots, q, \; n > m.

Thus, we may derive the q^* linear equations with q^* unknowns

\sum_{j^\tau \in F, \, j \neq 0} (\rho_{j,0} - \rho_{j,1})\, c_{j - i_n, 1} = 0, \quad n = 1, \cdots, q,

\sum_{j^\tau \in F, \, j \neq 0} (\rho_{j,0} - \rho_{j,1})\, c_{j - (i_n - i_m), 1} = 0, \quad n, m = 1, \cdots, q, \; n > m,

with a unique solution, since the auto-covariances c_{j,1}, j^τ ∈ Z^d, refer to a (weakly) stationary process, say {X_1(v)}. Indeed, it holds then that the variance matrix

[\, c_{l_n + l_m, 1} + c_{l_n - l_m, 1} \,]_{n,m=1}^{q^*} = \frac{C_1}{2}, \qquad l_n^\tau \in F_+, \ n = 1, \cdots, q^*,

with

C_1 = \big[ \mathrm{Cov}\big( X_1(v + l_n) + X_1(v - l_n), \, X_1(v + l_m) + X_1(v - l_m) \big) \big]_{n,m=1}^{q^*},

has an inverse, and there is a unique solution to our equations

\rho_{j,1} = \rho_{j,0}, \quad j^\tau \in F.
A.2. Proof of Theorem 1
We can re-write (11) as

\sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^*
= \sum_{j^\tau - i_n^\tau \in Z^d} \hat{c}_j \sum_{v^\tau \in S^*} Y(v + i_n - j)\, Y(v) / N^*
- \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^* = 0, \quad n = 1, \cdots, q.    (22)
Under the assumption that {ε(v)} is a sequence of independent and identically distributed random variables, we can derive, as N → ∞, that

\sum_{v^\tau \in S^*} Y(v + i_n - j)\, Y(v) / N^* \xrightarrow{P} E\{ Y(v + i_n - j)\, Y(v) \} = \sigma^2 \gamma_{j - i_n, 0},

according to Proposition 6.3.10 or the Weak Law of Large Numbers and Proposition 7.3.5 of Brockwell and Davis (4), which can be extended to include the cases d ≥ 2. Then for n = 1, · · · , q, for the first of the two terms in (22), we can write

\sum_{j^\tau - i_n^\tau \in Z^d} \hat{c}_j \sum_{v^\tau \in S^*} Y(v + i_n - j)\, Y(v) / N^* - \sigma^2 \sum_{j^\tau - i_n^\tau \in Z^d} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0

or

\sum_{j^\tau - i_n^\tau \in Z^d} \hat{c}_j \sum_{v^\tau \in S^*} Y(v + i_n - j)\, Y(v) / N^* - \sigma^2 \sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0.
For the second term, we may write

E\Big| \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^* \Big|
\le \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} E| \hat{c}_j\, Y(v + i_n - j)\, Y(v) | / N^*
\le \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2}\, E( Y(v + i_n - j)^2\, Y(v)^2 )^{1/2} / N^*
= E(Y(v)^2) \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2} / N^*,

due to the Cauchy-Schwarz inequality and the independence of the random variables Y(v), Y(v − j), j^τ ∉ F. Now for any θ ∈ Θ, it holds that c_j ≡ c_j(θ) is the corresponding auto-covariance function of a causal auto-regression. This guarantees that the auto-covariance function decays at an exponential rate, and we can find constants C(θ) > 0 and α(θ) ∈ (0, 1) such that we can write

c_j(\theta)^2 \le C(\theta)\, \alpha(\theta)^{\sum_{k=1}^{d} |j_k|}, \qquad j = (j_1, \cdots, j_d).

Similarly for the estimator θ̂, we can write

\hat{c}_j^2 \le C(\hat{\theta})\, \alpha(\hat{\theta})^{\sum_{k=1}^{d} |j_k|} \le \sup_{\theta \in \Theta} C(\theta)\, \{ \sup_{\theta \in \Theta} \alpha(\theta) \}^{\sum_{k=1}^{d} |j_k|}

with probability 1 and, thus,

E(\hat{c}_j^2) \le \sup_{\theta \in \Theta} C(\theta)\, \{ \sup_{\theta \in \Theta} \alpha(\theta) \}^{\sum_{k=1}^{d} |j_k|}.
For the case of observations on a hyper-rectangle, if (C2)(ii) holds, we can easily verify that

\sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2) = O(N^{(d-1)/d}).

For example, we can see the arguments of Yao and Brockwell (9) when d = 2. In general, we can write that

\sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2) / N^* \to 0

and that

\sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^* \xrightarrow{P} 0,

as (C2)(i) holds.
After combining the two results for the terms of (22), we may write that

\sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^* - \sigma^2 \sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0, \quad n = 1, \cdots, q,

where the first term has been defined to be equal to 0. Thus,

\sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0,

exactly like the theoretical analogue

\sum_{j^\tau - i_n^\tau \in F} c_{j,0}\, \gamma_{j - i_n, 0} = 0
would imply. Since we have used q^* instead of q equations, there is a unique solution θ_0, according to Proposition 1, and

\hat{\theta} \xrightarrow{P} \theta_0

as N → ∞ and (C2)(i) holds. Finally, the consistency of θ̂ implies, according to (16), that

\hat{\sigma}^2 \xrightarrow{P} \sigma^2 \sum_{j^\tau \in F} c_{j,0}\, \gamma_{j,0} = \sigma^2,    (23)

since

\sum_{j^\tau \in F} c_{j,0}\, \gamma_{j,0} = 1.
A.3. Proof of Proposition 2
According to (15), for the (n, m)-th element of J/N, it suffices to look at

\sum_{v^\tau \in S^*} \big\{ c_0(B) \big( \theta_0(B)^{-1} H_Y(v + i_n - i_m) + \theta_0(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v) / N + o_P(1), \quad n, m = 1, \cdots, q,    (24)
where the last term tends to 0 in probability, thanks to the consistency of the estimators arising from the use of all the q^* equations. If we define the polynomial

d_0(z) \equiv \theta_0(z)^{-1} c_0(z) = \theta_0(z)^{-1} c_0(z^{-1}) \equiv \sum_{i^\tau \in Z^d} d_{i,0} z^{i},

then, for the second term in (24), we can write

\sum_{v^\tau \in S^*} \big( d_0(B^{-1})\, H_Y(v + i_n + i_m) \big)\, Y(v) / N = \sum_{v^\tau \in S^*} \big( d_0(B^{-1})\, Y(v + i_n + i_m) \big)\, Y(v) / N + o_P(1).
This comes straight from the fact that

E\Big| \frac{1}{N} \sum_{v^\tau \in S^*} \Big\{ \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} d_{i,0}\, Y(v + i_n + i_m + i) \Big\}\, Y(v) \Big|
\le \frac{1}{N} \sum_{v^\tau \in S^*} \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} |d_{i,0}|\, E| Y(v + i_n + i_m + i)\, Y(v) |
= (E|Y(v)|)^2\, \frac{1}{N} \sum_{v^\tau \in S^*} \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} |d_{i,0}| \to 0,
as N → ∞ and (C2)(i) holds. The limit comes from the same argument as for the proof of the consistency of the estimators. For example, if (C2)(ii) is true, we can write

\sum_{v^\tau \in S^*} \sum_{i^\tau \notin F_v} |d_{i,0}| = O(N^{(d-1)/d}),

since for any i^τ = (i_1, \cdots, i_d)^τ ∈ Z^d it holds that |d_{i,0}| ≤ C α^{\sum_{k=1}^{d} |i_k|} for constants C > 0 and α ∈ (0, 1). Similar action might be taken for the first term in (24).
For the auto-regression {X(v)}, as it was defined in (4), we can see immediately that Y(v) is uncorrelated with X(v + i), i > 0, since the latter is a linear function of 'future' error terms only. In general, we can write according to (6) that

E(Y(v)\, X(v - i)) = \sum_{j^\tau \in F} \gamma_{j,0}\, E(X(v - j)\, X(v - i)) = \sigma^2 \sum_{j^\tau \in F} \gamma_{j,0}\, c_{j - i, 0},
which brings us back to the general Yule-Walker equations. Thus,

E(Y(v)\, X(v)) = \sigma^2, \qquad E(Y(v)\, X(v - i)) = 0, \quad i \neq 0,

and Y(v) is uncorrelated with X(v − i) for any i ≠ 0. As a result, it holds for n, m = 1, · · · , q, that

E\big( (\theta_0(B^{-1})^{-1} c_0(B)\, Y(v + i_n + i_m))\, Y(v) \big) = E\big( (\theta_0(B^{-1})^{-1} X(v + i_n + i_m))\, Y(v) \big) = 0

and that

E\big( (\theta_0(B)^{-1} c_0(B)\, Y(v + i_n - i_m))\, Y(v) \big) = E\big( (\theta_0(B)^{-1} X(v + i_n - i_m))\, Y(v) \big) = \sigma^2\, \Theta_{i_n - i_m, 0}, \quad n \ge m.
The proof is completed when we see that both Y(v) and X(v) are linear functions of members of the sequence {ε(v)} and, thus,

\sum_{v^\tau \in S^*} \big( \theta_0(B^{-1})^{-1} c_0(B)\, Y(v + i_n + i_m) \big)\, Y(v) / N \xrightarrow{P} E\big( (\theta_0(B^{-1})^{-1} X(v + i_n + i_m))\, Y(v) \big)

and

\sum_{v^\tau \in S^*} \big( \theta_0(B)^{-1} c_0(B)\, Y(v + i_n - i_m) \big)\, Y(v) / N \xrightarrow{P} E\big( (\theta_0(B)^{-1} X(v + i_n - i_m))\, Y(v) \big).
A.4. Proof of Theorem 2
First, we can write from (13), (17), (18) and for n = 1, · · · , q,

N^{-1/2} \sum_{v^\tau \in S^*} \sum_{j^\tau \in Z^d} c_{j,0}\, H_Y(v + i_n - j)\, Y(v) = N^{-1/2} \sum_{v^\tau \in S^*} \sum_{j^\tau \in Z^d} c_{j,0}\, Y(v + i_n - j)\, Y(v) + o_P(1).
The convergence in probability to zero of the remainder might be justified by the fact that its expected value is equal to zero, as we explained at the beginning of Section 3, and that its variance is equal to

\mathrm{Var}\Big( N^{-1/2} \sum_{v^\tau \in S^*} \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j)\, Y(v) \Big) \equiv \mathrm{Var}\Big( \sum_{v^\tau \in S^*} \tilde{u}_n(v) \Big) / N,

where

\tilde{u}_n(v) \equiv \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j)\, Y(v), \quad n = 1, \cdots, q,
for v^τ ∈ Z^d and for the given sampling set S. First, we see that when {ε(v)} are independent and identically distributed, then

E(\tilde{u}_n(v)^2) = E(Y(v)^2)\, E\Big( \Big( \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j) \Big)^2 \Big)

without the assumption of a finite third or fourth moment. Under (C2)(ii), we can write that

\sum_{v^\tau \in S^*} \mathrm{Var}(\tilde{u}_n(v)) = \sum_{v^\tau \in S^*} E(\tilde{u}_n(v)^2) = O(N^{(d-1)/d})

and a similar argument can be written for the cross-terms, due to the Cauchy-Schwarz inequality. For the case d = 2 and observations on a rectangle, we may find a justification for this in Yao and Brockwell (9). Thus, we can write

\mathrm{Var}\Big( N^{-1/2} \sum_{v^\tau \in S^*} \tilde{u}_n(v) \Big) \to 0,
as N → ∞ and (C2) holds, which guarantees the convergence in probability to 0.
We can now re-write (17) as

N^{1/2} (\hat{\theta} - \theta_0) = (J/N)^{-1} \Big( N^{-1/2} \sum_{v^\tau \in S^*} \mathbf{U}(v) \Big) + o_P(1),    (25)

where

\mathbf{U}(v) \equiv \begin{pmatrix} X(v + i_1) \\ \vdots \\ X(v + i_q) \end{pmatrix} Y(v), \quad v^\tau \in Z^d.
It holds that Y(v) is a linear function of ε(v − i), i = 0, i_1, · · · , i_q, and X(v) is a function of ε(v + i), i ≥ 0. Then for n, m = 1, · · · , q, we can write that

E\big( X(v + i_n)\, Y(v)\, X(v + i_m + j)\, Y(v + j) \big)
= E\big( E\big( X(v + i_n)\, Y(v)\, X(v + i_m + j)\, Y(v + j) \mid \varepsilon(v + i_m + j + i), \, i \ge 0 \big) \big)
= E\big( Y(v)\, Y(v + j) \big)\, E\big( X(v + i_n)\, X(v + i_m + j) \big) = \sigma^4 \gamma_{j,0}\, c_{j + i_m - i_n, 0}

for any j ≥ 0. Thus, for

\mathbf{X}(v) \equiv (X(v + i_1), \cdots, X(v + i_q))^\tau, \qquad C_{j,0} \equiv E(\mathbf{X}(v)\, \mathbf{X}(v + j)^\tau) / \sigma^2,

we can write that E(\mathbf{U}(v)) = 0 and that

\mathrm{Cov}(\mathbf{U}(v), \mathbf{U}(v + j)) = \sigma^4 \gamma_{j,0}\, C_{j,0}, \quad j \ge 0.
Now, for any positive integer K, we define the set

B_K \equiv \{ (i_1, i_2, \cdots, i_d)^\tau : i_1 = 1, \cdots, K, \; i_k = 0, \pm 1, \cdots, \pm K, \; k = 2, \cdots, d \}
\cup \{ (0, i_2, \cdots, i_d)^\tau : i_2 = 1, \cdots, K, \; i_k = 0, \pm 1, \cdots, \pm K, \; k = 3, \cdots, d \}
\cup \cdots \cup \{ (0, 0, \cdots, i_d)^\tau : i_d = 1, \cdots, K \}.

According to the MA(∞) representation of X(v), we also define for fixed K the new process {X^{(K)}(v)} from

X^{(K)}(v) \equiv \varepsilon(v) + \sum_{i^\tau \in B_K} \Theta_{i,0}\, \varepsilon(v + i).
Similarly, we define

\mathbf{U}^{(K)}(v) \equiv \begin{pmatrix} X^{(K)}(v + i_1) \\ \vdots \\ X^{(K)}(v + i_q) \end{pmatrix} Y(v), \quad v^\tau \in Z^d,

and

\mathbf{X}^{(K)}(v) \equiv (X^{(K)}(v + i_1), \cdots, X^{(K)}(v + i_q))^\tau, \qquad C_{j,0}^{(K)} \equiv E(\mathbf{X}^{(K)}(v)\, \mathbf{X}^{(K)}(v + j)^\tau) / \sigma^2.

Then, for the same reasons as before, it holds that E(\mathbf{U}^{(K)}(v)) = 0 and that

\mathrm{Cov}(\mathbf{U}^{(K)}(v), \mathbf{U}^{(K)}(v + j)) = \sigma^4 \gamma_{j,0}\, C_{j,0}^{(K)}, \quad j \ge 0.
For any vector λ ∈ R^q, it holds that {λ^τ \mathbf{U}^{(K)}(v)} is a strictly stationary and K^*-dependent process, for a positive and finite integer K^*. The definition of K-dependent processes, as well as a theorem on the asymptotic normality of strictly stationary and K-dependent processes on Z^d, might be given similarly to the one-dimensional case in Brockwell and Davis (4). Then, we may write that

N^{-1/2} \sum_{v^\tau \in S^*} \lambda^\tau \mathbf{U}^{(K)}(v) \xrightarrow{D} N(0, \sigma^4 \lambda^\tau M_K \lambda),

as N → ∞ and under (C2), where

M_K \equiv \gamma_{0,0}\, C_{0,0}^{(K)} + \sum_{j^\tau \in F, \, j > 0} \gamma_{j,0}\, \big( C_{j,0}^{(K)} + C_{j,0}^{(K)\tau} \big).
Similarly, if we define

M \equiv \gamma_{0,0}\, C_{0,0} + \sum_{j^\tau \in F, \, j > 0} \gamma_{j,0}\, \big( C_{j,0} + C_{j,0}^\tau \big),    (26)

then it holds, as K → ∞, that

\lambda^\tau M_K \lambda \to \lambda^\tau M \lambda.

Using Chebyshev's inequality, we may verify that

P\Big( \Big| N^{-1/2} \sum_{v^\tau \in S^*} \lambda^\tau \mathbf{U}(v) - N^{-1/2} \sum_{v^\tau \in S^*} \lambda^\tau \mathbf{U}^{(K)}(v) \Big| > \epsilon \Big) \le \frac{1}{\epsilon^2}\, \frac{N^*}{N}\, \lambda^\tau\, \mathrm{Var}\big( \mathbf{U}(v) - \mathbf{U}^{(K)}(v) \big)\, \lambda \to 0

as K → ∞ and, thus, it holds that

N^{-1/2} \sum_{v^\tau \in S^*} \lambda^\tau \mathbf{U}(v) \xrightarrow{D} N(0, \sigma^4 \lambda^\tau M \lambda),

or

N^{-1/2} \sum_{v^\tau \in S^*} \mathbf{U}(v) \xrightarrow{D} N(0, \sigma^4 M),    (27)
as N → ∞ and under (C2). According to (26), the (n, m)-th element of M will be equal to

\gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F, \, j > 0} \big( \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \gamma_{j,0}\, c_{j + i_n - i_m, 0} \big)
= \gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F, \, j > 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \sum_{j^\tau \in F, \, j < 0} \gamma_{j,0}\, c_{-j + i_n - i_m, 0}
= \gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F, \, j > 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \sum_{j^\tau \in F, \, j < 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0}
= \sum_{j^\tau \in F} \gamma_{j,0}\, c_{j + i_m - i_n, 0},

which brings us back to the general Yule-Walker equations and

M = I_q,

with I_q the identity matrix of order q. We may re-write (27) as

N^{-1/2} \sum_{v^\tau \in S^*} \mathbf{U}(v) \xrightarrow{D} N(0, \sigma^4 I_q),    (28)
as N → ∞ and under (C2). If we combine (25), (28) and Proposition 2, we can write that

N^{1/2} (\hat{\theta} - \theta_0) \xrightarrow{D} N\big( 0, (\Theta_0^\tau \Theta_0)^{-1} \big),

as N → ∞ and under (C2). The proof will be completed when we show that \Theta_0^\tau \Theta_0 = W_q^*. Indeed, we may let the vector \mathbf{W} = (W(-i_1), \cdots, W(-i_q))^\tau and then write

\xi = \Theta_0^\tau \mathbf{W} + R,

where R is a (q × 1) random vector that is independent of \mathbf{W}, since it is a linear function of W(−i_1 − i), i > 0, i ≠ i_2 − i_1, · · · , i_q − i_1. The required result is then obtained, since

\mathrm{Var}\big( \mathbf{W} \mid W(-i_1 - i), \; i > 0, \; i \neq i_2 - i_1, \cdots, i_q - i_1 \big) = \mathrm{Var}(\mathbf{W}) = I_q.
Acknowledgements
This work is part of the PhD thesis titled 'Statistical Inference for Spatial and Spatio-Temporal Processes', which has been submitted to the University of London. The PhD studies were funded by the Leverhulme Trust.
References
[1] Anderson, B. D. O. and Jury, E. I. (1974). Stability of Multidimensional
Digital Filters. IEEE Trans. Circuits Syst. I Regul. Pap. 21 300–304.
[2] Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice
Systems (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 36 192–236.
[3] Besag, J. (1975). Statistical Analysis of Non-Lattice Data. Statistician 24 179–
195.
[4] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer–Verlag, New York.
[5] Dahlhaus, R. and Künsch, H. (1987). Edge Effects and Efficient Parameter
Estimation for Stationary Random Fields. Biometrika 74 877–882.
[6] Guyon, X. (1982). Parameter Estimation for a Stationary Process on a d-Dimensional Lattice. Biometrika 69 95–105.
[7] Whittle, P. (1954). On Stationary Processes in the Plane. Biometrika 41
434–449.
[8] Yao, Q. and Brockwell, P. J. (2006). Gaussian Maximum Likelihood Estimation for ARMA models I: Time Series. J. Time Ser. Anal. 27 857–875.
[9] Yao, Q. and Brockwell, P. J. (2006). Gaussian Maximum Likelihood Estimation for ARMA models II: Spatial Processes. Bernoulli 12 403–429.