Moments Estimators for Moving-Average Models on $\mathbb{Z}^d$

Chrysoula Dimitriou-Fakalou
chrysoula@stats.ucl.ac.uk

Starting from a causal auto-regression on $\mathbb{Z}^d$, we generalize the standard Yule-Walker equations in order to derive equations that involve the moments of the invertible moving-average model on $\mathbb{Z}^d$ with the same polynomial. With observations from the moving-average process, we imitate these equations and define new estimators. Under very weak conditions and for any number of dimensions $d$, the estimators are shown to be consistent and asymptotically normal, and we provide the form of their variance matrix.

Key words: Edge-effect; Method of moments estimation; Moving-average model; Yule-Walker equations

Research report No 287, Department of Statistical Science, University College London. Date: December 2007.

Introduction

The estimation of the parameters of a causal auto-regression on $\mathbb{Z}^d$, where $\mathbb{Z} = \{0, \pm 1, \dots\}$ and $d$ is any positive integer, is considered an easy task. This is because, for any number of dimensions $d$, a finite auto-regressive representation always allows a finite number of observations to be transformed into a finite number of uncorrelated random variables from the error sequence that drives the process of interest. Nevertheless, when $d \geq 2$, treating all other stationary processes through their AR($\infty$) representation during estimation resurrects the problem known as the 'edge-effect'. For example, Yao and Brockwell [9] suggested a way of confining the edge-effect for ARMA($p,q$) processes on $\mathbb{Z}^2$.

On the other hand, an invertible moving-average process possesses an AR($\infty$) representation, but it also has the privilege of a finite moving-average representation. Guyon [6] used a modification of the spectral-domain version of the Gaussian likelihood, as proposed by Whittle [7], in order to produce estimators that defeat the edge-effect for any stationary process on $\mathbb{Z}^d$. As one of its special cases, $N$ observations from a moving-average process could be used to compute estimators with a bias of the ideal order $N^{-1}$. Moreover, the fact that the moving-average model has a finite auto-covariance function implies that a simplified version of the Gaussian likelihood may be maximized.

Both the finite auto-regressive and the finite moving-average models possess exceptional second-order properties, which are reflected in the spectral densities of interest. An auto-regression has a spectral density whose denominator is a finite polynomial; for Gaussian processes on $\mathbb{Z}^2$, Besag [2] demonstrated how this translates into the conditional expectation of the value at one location depending only on the values at a finite set of neighboring locations. A moving-average process has a spectral density whose numerator is a finite polynomial and, thus, an auto-covariance function that takes non-zero values only on a finite set of lags of $\mathbb{Z}^d$.

The need to establish results for estimators of parameters of processes on $\mathbb{Z}^d$, for any positive integer $d$, stems directly from the growing use of spatial and spatio-temporal statistics. A spatial process often takes place on the two dimensions of a surface, and it has been claimed that a spatial ARMA model cannot be meaningful, as it is based on an unnatural, unilateral ordering of locations implied by the causality and invertibility conditions.
In that case, it might be useful to resort to the analysis introduced by Besag [2] or [3] for data observed on regular or irregular sites, respectively. Nevertheless, the spatio-temporal ARMA model, for processes taking place on more than two dimensions, can be an extremely useful tool for accommodating the second-order properties of interest; the inclusion of the time axis then gives a meaningful interpretation to the unilateral ordering of locations.

In this paper, we demonstrate a new way of defining estimators for invertible moving-average models of fixed order on $\mathbb{Z}^d$. The new estimators are consistent and asymptotically normal, as they escape the edge-effect. They are defined as solutions of equations, similarly to the Yule-Walker estimators for causal auto-regressions, which are also the least squares and maximum conditional Gaussian likelihood estimators. Unlike the modified likelihood estimators of Guyon [6], our method of moments estimators are defined especially for the case of moving-average models, and our methods follow the time domain. Our aim has been to highlight the special second-order characteristics of a moving-average process and to use them to best advantage for estimation, as is routinely done for an auto-regression of fixed order. While Guyon [6] requires the strong condition of a finite fourth moment in order to use the estimators on any number of dimensions, and Yao and Brockwell [9] relax this condition but deal with two-dimensional processes only, our new results are established on any number of dimensions under the weak condition of a finite second moment only. This is the same condition used when estimating the parameters of causal auto-regressions on $\mathbb{Z}^d$.

1 Theoretical moments equations

We write '$\tau$' for the transpose operator. For any $v^\tau \in \mathbb{Z}^d$, we consider the invertible moving-average process $\{Y(v)\}$ defined by the equation

\[
Y(v) = \varepsilon(v) + \sum_{n=1}^{q} \theta_{i_n}\, \varepsilon(v - i_n), \qquad (1)
\]

where $\{\varepsilon(v)\}$ is a sequence of uncorrelated zero-mean random variables with variance $\sigma^2$. In (1), we have assumed a unilateral ordering of the fixed lags, $0 < i_1 < \cdots < i_q$. When $d = 1$, this is the standard ordering on $\mathbb{Z}$, and when $d = 2$, we refer to the conventional unilateral ordering introduced by Whittle [7]; the ordering of two locations on $\mathbb{Z}^d$ when $d \geq 3$ was generalized by Guyon [6]. As for the invertibility condition, it is only imposed to make sure that we can write

\[
\varepsilon(v) = Y(v) + \sum_{i > 0} \Theta_i\, Y(v - i), \qquad \sum_{i > 0} |\Theta_i| < \infty,
\]

which also uses the unilateral ordering $i > 0$. For a sufficient condition for the invertibility of the filter of interest, we refer to Anderson and Jury [1].

We consider the complex numbers $z_1, \dots, z_d$ and the vector $z = (z_1, \dots, z_d)$, such that we can define the polynomial

\[
\theta(z) \equiv 1 + \sum_{n=1}^{q} \theta_{i_n} z^{i_n}. \qquad (2)
\]

In (2), we use the convention $z^{i} \equiv z^{(i_1,\dots,i_d)} = z_1^{i_1} \cdots z_d^{i_d}$. Similarly, for the backwards operators

\[
B_1^{i_1}\, \varepsilon(v_1, v_2, \dots, v_d) \equiv \varepsilon(v_1 - i_1, v_2, \dots, v_d), \quad \dots, \quad B_d^{i_d}\, \varepsilon(v_1, \dots, v_{d-1}, v_d) \equiv \varepsilon(v_1, \dots, v_{d-1}, v_d - i_d),
\]

we define the vector backwards operator $B^{i}\, \varepsilon(v) \equiv (B_1^{i_1}, \cdots, B_d^{i_d})\, \varepsilon(v) \equiv \varepsilon(v - i)$. We may now re-write (1) as

\[
Y(v) = \theta(B)\, \varepsilon(v). \qquad (3)
\]

From the same sequence $\{\varepsilon(v)\}$, we define the unilateral auto-regression $\{X(v)\}$ from the equation

\[
\theta(B^{-1})\, X(v) = X(v) + \sum_{n=1}^{q} \theta_{i_n}\, X(v + i_n) = \varepsilon(v). \qquad (4)
\]
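As a concrete illustration of the data-generating model (1), the following minimal Python sketch (not part of the paper; function and parameter names are ours) simulates a moving-average field on a rectangle of $\mathbb{Z}^2$, for two lags that are positive under the conventional unilateral ordering and with coefficients small enough for the filter to be invertible.

```python
import numpy as np

def simulate_ma_field(theta, lags, n1, n2, sigma=1.0, seed=0):
    """Y(v) = eps(v) + sum_n theta_n * eps(v - i_n) on an n1 x n2 grid of Z^2."""
    rng = np.random.default_rng(seed)
    p1 = max(a for a, b in lags)           # padding needed in the first coordinate
    p2 = max(abs(b) for a, b in lags)      # padding needed in the second coordinate
    eps = rng.normal(0.0, sigma, size=(n1 + p1, n2 + 2 * p2))
    Y = eps[p1:, p2:p2 + n2].copy()        # the eps(v) term
    for th, (a, b) in zip(theta, lags):    # add theta_n * eps(v - i_n)
        Y += th * eps[p1 - a:p1 - a + n1, p2 - b:p2 - b + n2]
    return Y

# lags (0,1) and (1,0) are 'positive' under Whittle's unilateral ordering on Z^2
Y = simulate_ma_field(theta=[0.4, -0.3], lags=[(0, 1), (1, 0)], n1=60, n2=60)
```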
The auto-regression is unilateral but not causal, in the sense that we can write

\[
X(v) = \varepsilon(v) + \sum_{i > 0} \Theta_i\, \varepsilon(v + i), \qquad \sum_{i > 0} |\Theta_i| < \infty,
\]

as a function of the 'future' terms $\varepsilon(v + i)$, $i \geq 0$. Moreover, if we write the two polynomials

\[
\gamma(z) = \theta(z)\, \theta(z^{-1}) \equiv \sum_{i^\tau \in F} \gamma_i z^{i}, \qquad c(z) = \gamma(z)^{-1} \equiv \sum_{i^\tau \in \mathbb{Z}^d} c_i z^{i}, \qquad (5)
\]

then it holds that

\[
Y(v) = \gamma(B)\, X(v), \qquad X(v) = c(B)\, Y(v). \qquad (6)
\]

In (5), the set $F \subset \mathbb{Z}^d$ is a set of finite cardinality, namely

\[
F \equiv \{ i^\tau : \ i = i_n,\ -i_n,\ i_n - i_m,\ n, m = 1, \dots, q \}.
\]

We write $(2 q^\ast + 1)$ for the cardinality of $F$, where $q^\ast$ is the cardinality of the set

\[
F_{+} \equiv \{ i^\tau : \ i = i_n,\ i_n - i_m,\ n, m = 1, \dots, q,\ n > m \}.
\]

The original Yule-Walker equations for the unilateral auto-regression $\{X(v)\}$ dictate

\[
c_0 + \sum_{n=1}^{q} \theta_{i_n} c_{i_n} = 1 \qquad (7)
\]

and

\[
c_i + \sum_{n=1}^{q} \theta_{i_n} c_{i - i_n} = 0, \qquad i > 0, \qquad (8)
\]

where $E(X(v) X(v - i)) = \sigma^2 c_i$. Indeed, the spectral density of the auto-regression can be written as

\[
g_X(\omega_1, \dots, \omega_d) \equiv \frac{\sigma^2}{(2\pi)^d}\, \frac{1}{\theta(z)\, \theta(z^{-1})} = \frac{\sigma^2}{(2\pi)^d}\, c(z), \qquad z_k = e^{-\mathrm{i}\omega_k}, \ \omega_k \in (-\pi, \pi), \ k = 1, \dots, d,
\]

according to (5). Similarly, the spectral density of the moving-average process is

\[
g_Y(\omega_1, \dots, \omega_d) \equiv \frac{\sigma^2}{(2\pi)^d}\, \theta(z)\, \theta(z^{-1}) = \frac{\sigma^2}{(2\pi)^d}\, \gamma(z), \qquad z_k = e^{-\mathrm{i}\omega_k}, \ \omega_k \in (-\pi, \pi), \ k = 1, \dots, d,
\]

which also implies that $E(Y(v) Y(v - i)) = \sigma^2 \gamma_i$.

According to (8), we may write for $i > 0$ the following $(q + 1)$ equations

\[
\begin{aligned}
c_i + \theta_{i_1} c_{i - i_1} + \theta_{i_2} c_{i - i_2} + \cdots + \theta_{i_q} c_{i - i_q} &= 0 \\
\theta_{i_1} c_{i + i_1} + \theta_{i_1}^2 c_i + \theta_{i_1}\theta_{i_2} c_{i + i_1 - i_2} + \cdots + \theta_{i_1}\theta_{i_q} c_{i + i_1 - i_q} &= 0 \\
&\ \ \vdots \\
\theta_{i_q} c_{i + i_q} + \theta_{i_q}\theta_{i_1} c_{i + i_q - i_1} + \theta_{i_q}\theta_{i_2} c_{i + i_q - i_2} + \cdots + \theta_{i_q}^2 c_i &= 0
\end{aligned}
\]

and their sum

\[
\sum_{j^\tau \in F} \gamma_j\, c_{i - j} = \sum_{j^\tau \in F} \gamma_j\, c_{j - i} = \sum_{-j^\tau \in F} \gamma_{-j}\, c_{-j - i} = \sum_{j^\tau \in F} \gamma_j\, c_{j + i} = 0.
\]

We may re-write it as

\[
\sum_{j^\tau + i^\tau \in F} c_j\, \gamma_{j + i} = 0, \qquad i \neq 0. \qquad (9)
\]

Similarly to (9), we can create from (7) and (8) the equation

\[
\sum_{j^\tau \in F} c_j\, \gamma_j = 1. \qquad (10)
\]

We may derive (9) and (10) explicitly from the fact that $\gamma(z)\, c(z) = 1$. However, the way we have chosen to derive these equations reveals that they are a more general version of the Yule-Walker equations; they have been constructed according to the coefficients $\theta_{i_1}, \dots, \theta_{i_q}$, similarly to the standard Yule-Walker equations. Nevertheless, they involve the non-zero auto-covariances of the process $\{Y(v)\}$, as well as auto-covariances of the process $\{X(v)\}$, rather than the coefficients $\theta_{i_1}, \dots, \theta_{i_q}$ themselves. Thus, for the parameters of a moving-average model, they will be used as theoretical prototypes to be imitated by their sample analogues for the sake of estimation.

2 Method of moments estimators

We have now recorded observations $\{Y(v),\ v^\tau \in S\}$ from the moving-average process defined by (3), where $S \subset \mathbb{Z}^d$ is a set of finite cardinality $N$. We are interested in estimating the $q$ moving-average parameters from the available recordings. Given the set $S$, we define for any $v^\tau \in \mathbb{Z}^d$ the set $F_v \subset \mathbb{Z}^d$: we consider $i^\tau \in F_v$ if and only if $v^\tau - i^\tau \in S$. Next, we consider the corrected set $S^\ast$ with $N^\ast$ elements, such that $v^\tau \in S^\ast$ if and only if $v^\tau + i^\tau \in S$ for every $i^\tau \in F$. This implies that, for any $v^\tau \in S^\ast$, it holds that $F \subseteq F_v$.

We imitate (9) and set the estimators $\hat{\theta} = (\hat{\theta}_{i_1}, \dots, \hat{\theta}_{i_q})^\tau$ to be the solutions of the equations

\[
\sum_{v^\tau \in S^\ast} \Big\{ \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j) \Big\}\, Y(v) \equiv 0, \qquad n = 1, \dots, q, \qquad (11)
\]

where, of course,

\[
\hat{\theta}(z) \equiv 1 + \sum_{n=1}^{q} \hat{\theta}_{i_n} z^{i_n}, \qquad \hat{c}(z) \equiv \sum_{i^\tau \in \mathbb{Z}^d} \hat{c}_i z^{i} \equiv \{ \hat{\theta}(z)\, \hat{\theta}(z^{-1}) \}^{-1}.
\]
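To make (5) and the generalized Yule-Walker relations (9)-(10) concrete, and to show how coefficients such as the $\hat{c}_j$ entering (11) may be obtained numerically, the following minimal Python sketch (illustration only, for $d = 1$ with lags $i_n = n$; the function name and the truncation level K are our own choices) expands $\theta(z)^{-1}$ by the usual recursion, forms $c_j = \sum_{k \geq 0} \Theta_k \Theta_{k+j}$, and then checks (9) and (10) for an MA(1).

```python
import numpy as np

def c_coefficients(theta, K=200):
    """c_j, j = 0,...,K-1, of c(z) = {theta(z) theta(1/z)}^{-1}; by symmetry c_{-j} = c_j."""
    Theta = np.zeros(K)
    Theta[0] = 1.0
    for k in range(1, K):                  # theta(z)^{-1} = sum_{i>=0} Theta_i z^i
        Theta[k] = -sum(t * Theta[k - n - 1] for n, t in enumerate(theta) if k > n)
    return np.array([Theta[:K - j] @ Theta[j:] for j in range(K)])

c = c_coefficients([0.5])                  # MA(1) on Z with theta = 0.5
gamma = {0: 1 + 0.5 ** 2, 1: 0.5}          # gamma_0 and gamma_1 = gamma_{-1}
print(c[0] * gamma[0] + 2 * c[1] * gamma[1])                 # relation (10): approx. 1
print(c[2] * gamma[1] + c[1] * gamma[0] + c[0] * gamma[1])   # relation (9), i = 1: approx. 0
```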
iτ ∈Zd However, based on the next proposition, we will additionally use the equations X vτ ∈S ∗ { X ĉj Y (v + in − im − j)}Y (v) ≡ 0, n, m = 1, · · · , q, n > m, (12) jτ −(iτn −iτm )∈Fv in order to make sure that the estimators are consistent later on. The next proposition guarantees that the use of q ? , instead of q equations, is sufficient, under the invertibility condition of the process of interest, to provide a unique solution for the non-linear equations, which have been used as prototypes for estimation. Finally, we need to make clear that our moments estimators are not identical to the estimators, say θ̃ (0) , derived by setting q ? unbiased sample auto-covariances with as many pairs as possible, say, P γ̃i = γ̃−i = v Y (v)Y (v − i) τ , i ∈ F+ Ni equal to the relevant functions of θ̃ (0) in γ̃(z) = θ̃(z)θ̃(z−1 ). An indication for that is that our equations use different pairs of observations for any two lags i and −i, where iτ ∈ F+ . In Section 2.1, this becomes obvious for the simple case of an MA(1) model on Z. Moreover, we always choose to use as many pairs as the elements of S ∗ , though more pairs might be available from the sample, for specific lag iτ ∈ F+ . The difference between the two methods might look minimal, especially when d = 1. However, the steps we suggest need to be 8 followed exactly, in order to take advantage later on, of the fact that Y (v − i) and X(v) are two independent random variables when i > 0 only, if {ε(v)} is a sequence of independent random variables; the same cannot be said for Y (v + i) and X(v) for i ≥ 0. Thus, using this property will allow us to relax our conditions, in order to establish the asymptotic normality. For example, if in (11) we had used Y (v − in − j), jτ + iτn ∈ Fv , vτ ∈ S ∗ , instead of Y (v + in − j), for at least one n = 1, · · · , q, we would not be able to proceed without a condition on a finite fourth moment, unless we could assume that {u(v)} is also a sequence of independent random variables, where we define Y (v) ≡ θ(B−1 )u(v) or θ(B)X(v) ≡ u(v). Furthermore, the passage from the coefficients of γ̃(z) to the estimators θ̃ (0) in θ̃(z), would not be immediate nor cheap; in the end of this paper, we propose a straightforward way of approximating our estimators θ̂, instead, using the equations according to which we have defined them. Proposition 1. For the given lags 0 < i1 < · · · < iq , for the set F, and for fixed numbers θi1 ,0 , · · · , θiq ,0 , such that it holds that θ0 (z) = 1 + q X θin ,0 zin , θ0 (z)−1 = 1 + n=1 X i>0 Θi,0 zi , X |Θi,0 | < ∞, i>0 we define the polynomials γ0 (z) = θ0 (z)θ0 (z−1 ) = X γj,0 zj , jτ ∈F c0 (z) = γ0 (z)−1 = X cj,0 zj . jτ ∈Zd Then, if we consider all the possible sets of numbers θi1 , · · · , θiq , such that it holds 9 that θ(z) = 1 + q X θin zin , θ(z)−1 = 1 + n=1 γ(z) = θ(z)θ(z−1 ) = X X i>0 γj z j , c(z) = γ(z)−1 = jτ ∈F Θi zi , X X |Θi | < ∞, i>0 cj zj , jτ ∈Zd and the q ? equations X γj,0 cj−in = 0, n = 1, · · · , q, jτ ∈F X γj,0 cj−(in −im ) = 0, n, m = 1, · · · , q, n > m, jτ ∈F then the unique set of solutions that satisfies the above equations is θin = θin ,0 , n = 1, · · · , q. 
For mathematical convenience, we define a new variable that depends on the sampling set,

\[
H_Y(v) \equiv \begin{cases} Y(v), & v^\tau \in S, \\ 0, & \text{otherwise}, \end{cases}
\]

and we may re-write (11) as

\[
\sum_{v^\tau \in S^\ast} \Big\{ \sum_{j^\tau \in \mathbb{Z}^d} c_{j,0}\, H_Y(v + i_n - j) \Big\}\, Y(v) - J_n\, (\hat{\theta} - \theta_0) = 0, \qquad n = 1, \dots, q, \qquad (13)
\]

or

\[
\sum_{v^\tau \in S^\ast} \{ c_0(B)\, H_Y(v + i_n) \}\, Y(v) - J_n\, (\hat{\theta} - \theta_0) = 0, \qquad n = 1, \dots, q, \qquad (14)
\]

where $\theta_0 = (\theta_{i_1,0}, \dots, \theta_{i_q,0})^\tau$ is the true parameter vector and we denote with a zero sub-index all the quantities that refer to it. In (13) and (14), we write $J_n = (J_{n,1}, \dots, J_{n,q})$ with elements

\[
J_{n,m} = \sum_{v^\tau \in S^\ast} \big\{ c_0(B) \big( \theta_0(B)^{-1} H_Y(v + i_n - i_m) + \theta_0(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v) + O_P(N \|\hat{\theta} - \theta_0\|). \qquad (15)
\]

Finally, if we imitate (10), we may also define the estimator of the error variance

\[
\hat{\sigma}^2 \equiv \sum_{j^\tau \in F} \hat{c}_j \sum_{v^\tau \in S^\ast} Y(v)\, Y(v - j) / N^\ast. \qquad (16)
\]

2.1 A special case

For the sake of example, we refer to one-dimensional processes. When $d = 1$, it holds that $q = q^\ast$, as we can always write $i_n - i_m = i_r$, $n, m = 1, \dots, q$, $n > m$, for some $r = 1, \dots, q$. For the simplest case where $q = 1$, we record observations $\{Y(v),\ v = 1, \dots, N\}$ from the process defined by

\[
Y(v) = e(v) + \theta\, e(v - 1), \qquad |\theta| < 1, \ \theta \neq 0,
\]

where $\{e(v)\}$ is a sequence of uncorrelated random variables with unit variance. Our estimator $\hat{\theta}$ comes as a solution of the quadratic equation

\[
\sum_{v=2}^{N-1} Y(v) Y(v+1) - \hat{\theta} \sum_{v=2}^{N-1} Y(v)^2 + \hat{\theta}^2 \sum_{v=2}^{N-1} Y(v) Y(v-1) = 0,
\]

which reduces to

\[
\hat{\theta} = \frac{ \sum_{v=2}^{N-1} Y(v)^2 \pm \sqrt{ \big( \sum_{v=2}^{N-1} Y(v)^2 \big)^2 - 4 \big( \sum_{v=2}^{N-1} Y(v) Y(v-1) \big) \big( \sum_{v=2}^{N-1} Y(v) Y(v+1) \big) } }{ 2 \sum_{v=2}^{N-1} Y(v) Y(v-1) }
\]

or

\[
\hat{\theta} = \frac{ 1 \pm \sqrt{ 1 - 4\, \hat{\rho}_1^{+} \hat{\rho}_1^{-} } }{ 2\, \hat{\rho}_1^{+} }, \qquad
\hat{\rho}_1^{+} = \frac{ \sum_{v=2}^{N-1} Y(v) Y(v-1) }{ \sum_{v=2}^{N-1} Y(v)^2 }, \qquad
\hat{\rho}_1^{-} = \frac{ \sum_{v=2}^{N-1} Y(v) Y(v+1) }{ \sum_{v=2}^{N-1} Y(v)^2 }.
\]

For the actual auto-correlation at lags $\pm 1$,

\[
\rho_1 = \frac{\theta}{1 + \theta^2},
\]

it holds that $|\rho_1| < 1/2$ and $D = 1 - 4 \rho_1^2 > 0$. As a result, if $0 < \theta < 1$ and $0 < \rho_1 < 1/2$, then $1/(2\rho_1) > 1$ and the value

\[
\frac{1 + \sqrt{1 - 4\rho_1^2}}{2\rho_1} = \frac{1}{2\rho_1} + \frac{\sqrt{D}}{2\rho_1}
\]

is bigger than 1. If, on the other hand, $-1 < \theta < 0$ and $-1/2 < \rho_1 < 0$, then $1/(2\rho_1) < -1$ and the same value is smaller than $-1$. Thus, we conclude with the estimator of the parameter

\[
\hat{\theta} = \frac{ \sum_{v=2}^{N-1} Y(v)^2 - \sqrt{ \big( \sum_{v=2}^{N-1} Y(v)^2 \big)^2 - 4 \big( \sum_{v=2}^{N-1} Y(v) Y(v-1) \big) \big( \sum_{v=2}^{N-1} Y(v) Y(v+1) \big) } }{ 2 \sum_{v=2}^{N-1} Y(v) Y(v-1) }.
\]

Though we cannot really guarantee the distribution of $\hat{\theta}$ for small sample sizes, as $|\hat{\rho}_1^{+}|, |\hat{\rho}_1^{-}|$ might be larger than or equal to $1/2$ with positive probability, we will establish next that our estimators are consistent and asymptotically normal. For $d = 1$, the use of maximum Gaussian likelihood arguments might improve dramatically the properties of the estimator presented above; however, the edge-effect does not allow the standard Gaussian likelihood estimators to be asymptotically normal for more than two dimensions. Thus, our methods are more useful for higher dimensionalities and for large sample sizes, thanks to the weak conditions required to establish the consistency and the asymptotic normality next.
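The closed-form estimator above is straightforward to compute; the following minimal Python sketch (illustration only; the function name is ours) implements it exactly as displayed, returning NaN when $1 - 4\hat{\rho}_1^{+}\hat{\rho}_1^{-} < 0$ and the estimator is undefined.

```python
import numpy as np

def ma1_moment_estimator(Y):
    """Closed-form MA(1) estimator of Section 2.1; Y[0],...,Y[N-1] hold Y(1),...,Y(N)."""
    s = Y[1:-1]                            # Y(v), v = 2,...,N-1
    sum_minus = np.dot(s, Y[:-2])          # sum of Y(v) Y(v-1)
    sum_plus = np.dot(s, Y[2:])            # sum of Y(v) Y(v+1)
    sum_sq = np.dot(s, s)                  # sum of Y(v)^2
    disc = sum_sq ** 2 - 4.0 * sum_minus * sum_plus
    if disc < 0 or sum_minus == 0:         # happens with positive probability in small samples
        return np.nan
    return (sum_sq - np.sqrt(disc)) / (2.0 * sum_minus)

rng = np.random.default_rng(2)
eps = rng.normal(size=1001)
Y = eps[1:] + 0.5 * eps[:-1]               # an MA(1) sample with theta = 0.5
print(ma1_moment_estimator(Y))
```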
3 Properties of estimators

In this section, we establish the consistency and the asymptotic normality of our estimators. While the consistency is a straightforward derivation from the definition of our estimators, the asymptotic normality of estimators of the parameters of stationary processes on $\mathbb{Z}^d$ is, in general, problematic when $d \geq 2$. As Guyon [6] demonstrated, there is a bias of order $N^{-1/d}$, while we would want the absolute bias, multiplied by $N^{1/2}$, to tend to zero asymptotically. This happens for sure only when $d = 1$; see, for example, Yao and Brockwell [8].

Regarding our estimators, in (13) we can see that the biases will come from the expected value of the quantities

\[
\sum_{v^\tau \in S^\ast} \Big\{ \sum_{j^\tau \in \mathbb{Z}^d} c_{j,0} \big( Y(v + i_n - j) - H_Y(v + i_n - j) \big)\, Y(v) \Big\} / N^\ast, \qquad n = 1, \dots, q,
\]

which express what is 'missing' from the sample. The good news is that, if the biases are multiplied by $N^{\ast 1/2}$, then they produce

\[
N^{\ast -1/2} \sum_{v^\tau \in S^\ast}\ \sum_{v^\tau + i_n^\tau - j^\tau \notin S} c_{j,0}\, E\{ Y(v + i_n - j)\, Y(v) \} = 0, \qquad n = 1, \dots, q,
\]

which are zero thanks to our selections in $S^\ast$. This is a consequence of the special characteristic of a moving-average process, namely that its auto-covariance function cuts off to zero outside a finite set of lags. If the auto-covariance function decayed at an exponential rate instead, a different sequel would have to be followed for each dimensionality $d$; for example, Yao and Brockwell [9] have proposed a modification of mathematical nature on the sampling set, which works for ARMA models on $\mathbb{Z}^2$ only.

While Yao and Brockwell [9] used a series of mathematical arguments to deal with the edge-effect for a specific number of dimensions, Guyon [6] cancelled the edge-effect for any positive integer number of dimensions, using instinctive, spectral-domain arguments. The spectral-domain version of the Gaussian likelihood, as given by Whittle [7], involves the sample auto-covariances and, thus, the unbiased estimators were plugged in there; this implies that it could only be a modification of the likelihood, as the auto-covariances used would not necessarily correspond to a non-negative definite sample variance matrix. Dahlhaus and Künsch [5] dealt with such problems, but paid the price of having to restrict the dimensionality $d$ in order to secure their results. Finally, the conditions used by Guyon [6] to obtain the asymptotic normality of the estimators are strong and require a finite fourth moment of the process of interest.

Our suggestion skips the Gaussian likelihood and uses the moments, or general Yule-Walker, equations for estimation. It follows the time domain and can be applied to moving-average models of fixed order. It holds for any number of dimensions, unlike the suggestion of Yao and Brockwell [9]. The conditions required are weak and relate to a finite second moment only, unlike the suggestion of Guyon [6].

We will also be using the following two conditions; the second was used by Guyon [6]. Its part (i) is needed for the consistency of the estimators; for the asymptotic normality, part (ii) is necessary too.

CONDITION C1. We consider the parameter space $\Theta \subset \mathbb{R}^q$ to be a compact set containing the true value $\theta_0$. Further, for any $\theta \in \Theta$, the moving-average model (3) is invertible.

CONDITION C2. (i) For a set $S \equiv S_N \subset \mathbb{Z}^d$ of cardinality $N$, we write $N \to \infty$ if the length $M$ of the minimal hypercube including $S$, say $S \subseteq C_M$, and the length $m$ of the maximal hypercube included in $S$, say $C_m \subseteq S$, are such that $M, m \to \infty$. (ii) Further, as $M, m \to \infty$, it holds that $M/m$ is bounded away from $\infty$.

Theorem 1. If $\{\varepsilon(v)\} \sim \mathrm{IID}(0, \sigma^2)$, then under Condition (C1) it holds that

\[
\hat{\sigma}^2 \xrightarrow{P} \sigma^2, \qquad \hat{\theta} \xrightarrow{P} \theta_0,
\]

as $N \to \infty$ and (C2)(i) holds.
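As a quick, informal check of Theorem 1 in the MA(1) special case of Section 2.1, the following Python sketch (illustration only; it reuses ma1_moment_estimator() from the sketch in Section 2.1) replicates the estimator over simulated samples of increasing length; by Theorem 1, the estimates should concentrate around the true value as $N$ grows.

```python
import numpy as np

def replicate(theta0, N, reps=500, seed=3):
    """Monte Carlo replications of the closed-form MA(1) estimator of Section 2.1."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(reps):
        eps = rng.normal(size=N + 1)
        Y = eps[1:] + theta0 * eps[:-1]
        out.append(ma1_moment_estimator(Y))   # defined in the Section 2.1 sketch
    return np.array(out)

for N in (100, 400, 1600):
    est = replicate(theta0=0.5, N=N)
    print(N, np.nanmean(est), np.nanstd(est))  # the spread should shrink as N grows
```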
From (13) and (14), we may stack all the $q$ equations together and write

\[
N^{1/2} (\hat{\theta} - \theta_0) = (J/N)^{-1} \Big( N^{-1/2} \sum_{v^\tau \in S^\ast} \mathbf{H}_Y(v) \Big), \qquad (17)
\]

where $J^\tau = (J_1^\tau, \dots, J_q^\tau)$ and

\[
\mathbf{H}_Y(v) \equiv \begin{pmatrix} c_0(B)\, H_Y(v + i_1) \\ \vdots \\ c_0(B)\, H_Y(v + i_q) \end{pmatrix} Y(v), \qquad v^\tau \in \mathbb{Z}^d, \qquad (18)
\]

which depends on the sampling set $S$. The next proposition reveals what happens to the factor $(J/N)$ in (17); Theorem 2 then establishes the asymptotic normality of the estimators.

Proposition 2. Let the polynomial

\[
\theta_0(z)^{-1} = \Big( 1 + \sum_{n=1}^{q} \theta_{i_n,0}\, z^{i_n} \Big)^{-1} \equiv \sum_{i \geq 0} \Theta_{i,0}\, z^{i}, \qquad \Theta_{0,0} = 1.
\]

If $\{\varepsilon(v)\} \sim \mathrm{IID}(0, \sigma^2)$, then under (C1) it holds that

\[
J/N \xrightarrow{P} \sigma^2\, \Theta_0 \equiv \sigma^2 \begin{pmatrix}
1 & 0 & \cdots & 0 \\
\Theta_{i_2 - i_1, 0} & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
\Theta_{i_q - i_1, 0} & \Theta_{i_q - i_2, 0} & \cdots & 1
\end{pmatrix}
\]

as $N \to \infty$ and (C2)(i) holds.

Theorem 2. Let $\{W(v)\} \sim \mathrm{IID}(0, 1)$, and let the auto-regression $\{\eta(v)\}$ be defined by $\theta_0(B)\, \eta(v) \equiv W(v)$. Also let the vector $\xi \equiv (\eta(-i_1), \dots, \eta(-i_q))^\tau$ and the variance matrix

\[
W_q^\ast \equiv \mathrm{Var}\big( \xi \mid W(-i_1 - i),\ i > 0,\ i \neq i_2 - i_1, \dots, i_q - i_1 \big).
\]

If $\{\varepsilon(v)\} \sim \mathrm{IID}(0, \sigma^2)$, then under (C1) it holds that

\[
N^{1/2} (\hat{\theta} - \theta_0) \xrightarrow{D} \mathcal{N}\big( 0,\ W_q^{\ast\, -1} \big)
\]

as $N \to \infty$ and (C2) holds.

4 Approximate moments estimators

Our estimators have not been defined as minimizers of a random quantity; if that were the case, we would be able to compute this quantity for as many values over the parameter space as necessary and select the values at which the minimum is attained. Instead, we have to propose a different way of getting close enough to our estimators and, consequently, to their properties.

We consider that we have initial estimators $\tilde{\theta}^{(0)}$, which need to be consistent. These estimates must either be already available, or they must be computed from the same set of observations before the moments estimators are derived; later, we refer to different ways of defining consistent estimators. For now, similarly to (17), we define

\[
\tilde{\theta} \equiv (\tilde{J})^{-1} \sum_{v^\tau \in S^\ast} \tilde{\mathbf{H}}_Y(v) + \tilde{\theta}^{(0)}, \qquad (19)
\]

where $\tilde{J}$ is a $(q \times q)$ matrix with $(n, m)$-th element equal to

\[
\tilde{J}_{n,m} \equiv \sum_{v^\tau \in S^\ast} \big\{ \tilde{c}(B) \big( \tilde{\theta}(B)^{-1} H_Y(v + i_n - i_m) + \tilde{\theta}(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v)
\]

and

\[
\tilde{\mathbf{H}}_Y(v) \equiv \begin{pmatrix} \tilde{c}(B)\, H_Y(v + i_1) \\ \vdots \\ \tilde{c}(B)\, H_Y(v + i_q) \end{pmatrix} Y(v).
\]

We consider the polynomials $\tilde{\theta}(z)$, $\tilde{c}(z)$ to refer to the estimators $\tilde{\theta}^{(0)}$. The estimators defined by (19) can be computed once the original estimators $\tilde{\theta}^{(0)}$ and the observations $\{Y(v),\ v^\tau \in S\}$ are available.

The fact that we may compute the estimators $\tilde{\theta}$ instead, but still derive the statistical properties of $\hat{\theta}$, comes straight from the following arguments. We can write another Taylor expansion from (11),

\[
\sum_{v^\tau \in S^\ast} \{ \tilde{c}(B)\, H_Y(v + i_n) \}\, Y(v) - \tilde{J}_n^\ast\, (\hat{\theta} - \tilde{\theta}^{(0)}) = 0, \qquad n = 1, \dots, q, \qquad (20)
\]

where we write $\tilde{J}_n^\ast = (\tilde{J}_{n,1}^\ast, \dots, \tilde{J}_{n,q}^\ast)$ with elements

\[
\tilde{J}_{n,m}^\ast = \tilde{J}_{n,m} + O_P(N \|\hat{\theta} - \tilde{\theta}^{(0)}\|), \qquad n, m = 1, \dots, q.
\]

Since both the estimators $\hat{\theta}$, using all the $q^\ast$ equations to define them, and $\tilde{\theta}^{(0)}$ are consistent, it holds that

\[
\tilde{J}_{n,m}^\ast / N - \tilde{J}_{n,m} / N \xrightarrow{P} 0, \qquad n, m = 1, \dots, q,
\]

as $N \to \infty$ and (C2)(i) holds. If we consider the $(q \times q)$ matrix $\tilde{J}^\ast$ with elements $\tilde{J}_{n,m}^\ast$, $n, m = 1, \dots, q$, we can put together all the equations (20) as

\[
\hat{\theta} = (\tilde{J}^\ast)^{-1} \sum_{v^\tau \in S^\ast} \tilde{\mathbf{H}}_Y(v) + \tilde{\theta}^{(0)}. \qquad (21)
\]

Thus, from (19) and (21), we may conclude that

\[
N^{1/2} (\hat{\theta} - \tilde{\theta}) = \big\{ (\tilde{J}^\ast / N)^{-1} - (\tilde{J} / N)^{-1} \big\}\, N^{-1/2} \sum_{v^\tau \in S^\ast} \tilde{\mathbf{H}}_Y(v),
\]

and all the elements of this vector tend to 0 in probability, as $N \to \infty$ and (C2) holds; this is exactly what we want.
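To indicate how an update of the form (19) may be organized in practice, the following Python sketch (our own illustration, assuming $d = 1$, an MA(1) with lag 1, observations on $S = \{0, \dots, N-1\}$, and the helper c_coefficients() from the earlier sketches) performs one step from an initial value. For simplicity, it replaces $\tilde{J}$ by its probability limit $N^\ast \sigma^2 \Theta_0 = N^\ast \sigma^2$ from Proposition 2, with $\sigma^2$ estimated as in (16); this is a deliberate simplification, not the exact matrix $\tilde{J}$ of (19).

```python
import numpy as np

def one_step_ma1(Y, theta_init):
    """One approximate update of the form (19) for an MA(1), starting from theta_init."""
    N = len(Y)
    c = c_coefficients([theta_init])            # c_j(theta_init); helper from earlier sketches
    cof = lambda j: c[abs(j)] if abs(j) < len(c) else 0.0
    S_star = range(1, N - 1)                    # v such that v - 1, v, v + 1 all lie in S
    n_star = len(S_star)
    # the 'score' term of (19): sum over S* of {c~(B) H_Y(v + 1)} Y(v)
    score = sum(Y[v] * sum(cof(v + 1 - w) * Y[w] for w in range(N)) for v in S_star)
    # sigma^2 estimated as in (16), with F = {-1, 0, 1}
    g0 = sum(Y[v] * Y[v] for v in S_star) / n_star
    gm = sum(Y[v] * Y[v - 1] for v in S_star) / n_star
    gp = sum(Y[v] * Y[v + 1] for v in S_star) / n_star
    sigma2 = cof(0) * g0 + cof(1) * gm + cof(-1) * gp
    return theta_init + score / (n_star * sigma2)   # J~ replaced by N* sigma^2 (Proposition 2)

rng = np.random.default_rng(4)
eps = rng.normal(size=401)
Y = eps[1:] + 0.5 * eps[:-1]                    # MA(1) sample with theta_0 = 0.5
print(one_step_ma1(Y, theta_init=0.3))          # should move from 0.3 towards 0.5
```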
Finally, if a set of initial estimates is not available, we will also need to define consistent estimators prior to finding our moments estimates. Consistency is a minimal statistical requirement, and the estimators of the parameters of models on $\mathbb{Z}^d$ are not deprived of it when $d > 1$; all the modifications that have been proposed to the standard estimators are there to maintain the asymptotic normality. The standard maximum Gaussian likelihood estimators are consistent; both Guyon [6] and Yao and Brockwell [9], for example, have demonstrated that. Again, no more than a finite second-moment condition is needed in order to secure the consistency.

Appendix: Outline proofs

A.1. Proof of Proposition 1

Under the invertibility condition for the polynomials $\theta(z)$, there is a one-to-one correspondence between the $q$ coefficients $\theta_{i_1}, \dots, \theta_{i_q}$ and the $q^\ast$ auto-correlations $\rho_{i_1}, \dots, \rho_{i_q}, \rho_{i_2 - i_1}, \dots, \rho_{i_q - i_1}$, where $\rho_j = \gamma_j / \gamma_0$, $j^\tau \in \mathbb{Z}^d$. We know that the coefficients $\theta_{i_n,0}$, $n = 1, \dots, q$, generate the numbers $c_{j,0}$, $j^\tau \in \mathbb{Z}^d$, and that they are a solution to the $q^\ast$ equations of interest. Let us also imagine that there is another solution, say $\theta_{i_n,1}$, $n = 1, \dots, q$, generating $c_{j,1}$, $j^\tau \in \mathbb{Z}^d$, for which it holds that

\[
\sum_{j^\tau \in F,\ j \neq 0} \rho_{j,0}\, c_{j - i_n, 1} = -c_{i_n, 1}, \quad n = 1, \dots, q, \qquad
\sum_{j^\tau \in F,\ j \neq 0} \rho_{j,0}\, c_{j - (i_n - i_m), 1} = -c_{i_m - i_n, 1}, \quad n, m = 1, \dots, q, \ n > m.
\]

On the other hand, the general Yule-Walker equations for this solution imply that

\[
\sum_{j^\tau \in F,\ j \neq 0} \rho_{j,1}\, c_{j - i_n, 1} = -c_{i_n, 1}, \quad n = 1, \dots, q, \qquad
\sum_{j^\tau \in F,\ j \neq 0} \rho_{j,1}\, c_{j - (i_n - i_m), 1} = -c_{i_m - i_n, 1}, \quad n, m = 1, \dots, q, \ n > m.
\]

Thus, we may derive the $q^\ast$ linear equations with $q^\ast$ unknowns

\[
\sum_{j^\tau \in F,\ j \neq 0} (\rho_{j,0} - \rho_{j,1})\, c_{j - i_n, 1} = 0, \quad n = 1, \dots, q, \qquad
\sum_{j^\tau \in F,\ j \neq 0} (\rho_{j,0} - \rho_{j,1})\, c_{j - (i_n - i_m), 1} = 0, \quad n, m = 1, \dots, q, \ n > m,
\]

with a unique solution, since the auto-covariances $c_{j,1}$, $j^\tau \in \mathbb{Z}^d$, refer to a (weakly) stationary process, say $\{X_1(v)\}$. Indeed, writing $l_n^\tau \in F_{+}$, $n = 1, \dots, q^\ast$, it then holds that the matrix

\[
\big[ c_{l_n + l_m, 1} + c_{l_n - l_m, 1} \big]_{n, m = 1}^{q^\ast} = \frac{C_1}{2}, \qquad
C_1 = \Big[ \mathrm{Cov}\big( X_1(v + l_n) + X_1(v - l_n),\ X_1(v + l_m) + X_1(v - l_m) \big) \Big]_{n, m = 1}^{q^\ast},
\]

has an inverse, and there is a unique solution to our equations, $\rho_{j,1} = \rho_{j,0}$, $j^\tau \in F$. Combined with the one-to-one correspondence mentioned above, this implies that $\theta_{i_n,1} = \theta_{i_n,0}$, $n = 1, \dots, q$, which completes the proof.

A.2. Proof of Theorem 1

We can re-write (11) as

\[
\begin{aligned}
\sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^\ast
&= \sum_{j^\tau - i_n^\tau \in \mathbb{Z}^d} \hat{c}_j \sum_{v^\tau \in S^\ast} Y(v + i_n - j)\, Y(v) / N^\ast \\
&\quad - \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^\ast = 0, \qquad n = 1, \dots, q. \qquad (22)
\end{aligned}
\]

Under the assumption that $\{\varepsilon(v)\}$ is a sequence of independent and identically distributed random variables, we can derive, as $N \to \infty$, that

\[
\sum_{v^\tau \in S^\ast} Y(v + i_n - j)\, Y(v) / N^\ast \xrightarrow{P} E\{ Y(v + i_n - j)\, Y(v) \} = \sigma^2 \gamma_{j - i_n, 0},
\]

according to Proposition 6.3.10, or to the Weak Law of Large Numbers and Proposition 7.3.5, of Brockwell and Davis [4], which can be extended to include the cases $d \geq 2$. Then, for $n = 1, \dots, q$ and for the first of the two terms in (22), we can write

\[
\sum_{j^\tau - i_n^\tau \in \mathbb{Z}^d} \hat{c}_j \sum_{v^\tau \in S^\ast} Y(v + i_n - j)\, Y(v) / N^\ast - \sigma^2 \sum_{j^\tau - i_n^\tau \in \mathbb{Z}^d} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0
\]

or

\[
\sum_{j^\tau - i_n^\tau \in \mathbb{Z}^d} \hat{c}_j \sum_{v^\tau \in S^\ast} Y(v + i_n - j)\, Y(v) / N^\ast - \sigma^2 \sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0.
\]

For the second term, we may write

\[
\begin{aligned}
E\Big| \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^\ast \Big|
&\leq \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} E| \hat{c}_j\, Y(v + i_n - j)\, Y(v) | / N^\ast \\
&\leq \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2}\, E\big( Y(v + i_n - j)^2\, Y(v)^2 \big)^{1/2} / N^\ast \\
&= E(Y(v)^2) \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2} / N^\ast,
\end{aligned}
\]

due to the Cauchy-Schwarz inequality and the independence of the random variables $Y(v)$, $Y(v - j)$, $j^\tau \notin F$. Now, for any $\theta \in \Theta$, $c_j \equiv c_j(\theta)$ is the corresponding auto-covariance function of a causal auto-regression.
This guarantees that the auto-covariance function decays at an exponential rate, so that we can find constants $C(\theta) > 0$ and $\alpha(\theta) \in (0, 1)$ such that

\[
c_j(\theta)^2 \leq C(\theta)\, \alpha(\theta)^{\sum_{k=1}^{d} |j_k|}, \qquad j = (j_1, \dots, j_d).
\]

Similarly, for the estimator $\hat{\theta}$ we can write

\[
\hat{c}_j^2 \leq C(\hat{\theta})\, \alpha(\hat{\theta})^{\sum_{k=1}^{d} |j_k|} \leq \sup_{\theta \in \Theta} C(\theta)\, \Big\{ \sup_{\theta \in \Theta} \alpha(\theta) \Big\}^{\sum_{k=1}^{d} |j_k|}
\]

with probability 1 and, thus,

\[
E(\hat{c}_j^2) \leq \sup_{\theta \in \Theta} C(\theta)\, \Big\{ \sup_{\theta \in \Theta} \alpha(\theta) \Big\}^{\sum_{k=1}^{d} |j_k|}.
\]

For the case of observations on a hyper-rectangle, if (C2)(ii) holds, we can easily verify that

\[
\sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2} = O(N^{(d-1)/d});
\]

for example, we can see the arguments of Yao and Brockwell [9] when $d = 2$. In general, we can write that

\[
\sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} E(\hat{c}_j^2)^{1/2} / N^\ast \to 0
\]

and, hence, that

\[
\sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^\ast \xrightarrow{P} 0,
\]

as (C2)(i) holds. After combining the two results for the terms of (22), we may write that

\[
\sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \in F_v} \hat{c}_j\, Y(v + i_n - j)\, Y(v) / N^\ast - \sigma^2 \sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0, \qquad n = 1, \dots, q,
\]

where the first term has been defined to be equal to 0. Thus,

\[
\sum_{j^\tau - i_n^\tau \in F} \hat{c}_j\, \gamma_{j - i_n, 0} \xrightarrow{P} 0,
\]

exactly as the theoretical analogue

\[
\sum_{j^\tau - i_n^\tau \in F} c_{j,0}\, \gamma_{j - i_n, 0} = 0
\]

would imply. Since we have used $q^\ast$ instead of $q$ equations, there is a unique solution $\theta_0$, according to Proposition 1, and

\[
\hat{\theta} \xrightarrow{P} \theta_0
\]

as $N \to \infty$ and (C2)(i) holds. Finally, the consistency of $\hat{\theta}$ implies, according to (16), that

\[
\hat{\sigma}^2 \xrightarrow{P} \sigma^2 \sum_{j^\tau \in F} c_{j,0}\, \gamma_{j,0} = \sigma^2, \qquad (23)
\]

since $\sum_{j^\tau \in F} c_{j,0}\, \gamma_{j,0} = 1$.

A.3. Proof of Proposition 2

According to (15), for the $(n, m)$-th element of $J/N$, it suffices to look at

\[
\sum_{v^\tau \in S^\ast} \big\{ c_0(B) \big( \theta_0(B)^{-1} H_Y(v + i_n - i_m) + \theta_0(B^{-1})^{-1} H_Y(v + i_n + i_m) \big) \big\}\, Y(v) / N + o_P(1), \qquad n, m = 1, \dots, q, \qquad (24)
\]

where the last term tends to 0 in probability, thanks to the consistency of the estimators from the use of all the $q^\ast$ equations. If we define the polynomial

\[
d_0(z) \equiv \theta_0(z)^{-1} c_0(z) = \theta_0(z)^{-1} c_0(z^{-1}) \equiv \sum_{i^\tau \in \mathbb{Z}^d} d_{i,0}\, z^{i},
\]

then, for the second term in (24), we can write

\[
\sum_{v^\tau \in S^\ast} \big( d_0(B^{-1})\, H_Y(v + i_n + i_m) \big)\, Y(v) / N = \sum_{v^\tau \in S^\ast} \big( d_0(B^{-1})\, Y(v + i_n + i_m) \big)\, Y(v) / N + o_P(1).
\]

This comes straight from the fact that

\[
\begin{aligned}
E\Big| \frac{1}{N} \sum_{v^\tau \in S^\ast} \Big\{ \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} d_{i,0}\, Y(v + i_n + i_m + i) \Big\}\, Y(v) \Big|
&\leq \frac{1}{N} \sum_{v^\tau \in S^\ast} \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} |d_{i,0}|\, E| Y(v + i_n + i_m + i)\, Y(v) | \\
&= (E|Y(v)|)^2\, \frac{1}{N} \sum_{v^\tau \in S^\ast} \sum_{-i_n^\tau - i_m^\tau - i^\tau \notin F_v} |d_{i,0}| \to 0,
\end{aligned}
\]

as $N \to \infty$ and (C2)(i) holds. The limit comes from the same argument as in the proof of the consistency of the estimators; for example, if (C2)(ii) is true, we can write $\sum_{v^\tau \in S^\ast} \sum_{i^\tau \notin F_v} |d_{i,0}| = O(N^{(d-1)/d})$, since for any $i^\tau = (i_1, \dots, i_d)^\tau \in \mathbb{Z}^d$ it holds that $|d_{i,0}| \leq C\, \alpha^{\sum_{k=1}^{d} |i_k|}$ for constants $C > 0$ and $\alpha \in (0, 1)$. Similar action may be taken for the first term in (24).

For the auto-regression $\{X(v)\}$, as defined in (4), we can see immediately that $Y(v)$ is uncorrelated with $X(v + i)$, $i > 0$, since the latter is a linear function of 'future' error terms only. In general, we can write, according to (6), that

\[
E(Y(v)\, X(v - i)) = \sum_{j^\tau \in F} \gamma_{j,0}\, E(X(v - j)\, X(v - i)) = \sigma^2 \sum_{j^\tau \in F} \gamma_{j,0}\, c_{j - i, 0},
\]

which brings us back to the general Yule-Walker equations. Thus,

\[
E(Y(v)\, X(v)) = \sigma^2, \qquad E(Y(v)\, X(v - i)) = 0, \quad i \neq 0,
\]

and $Y(v)$ is uncorrelated with $X(v - i)$ for any $i \neq 0$. As a result, it holds for $n, m = 1, \dots, q$ that

\[
E\big( ( \theta_0(B^{-1})^{-1} c_0(B)\, Y(v + i_n + i_m) )\, Y(v) \big) = E\big( ( \theta_0(B^{-1})^{-1} X(v + i_n + i_m) )\, Y(v) \big) = 0
\]

and that

\[
E\big( ( \theta_0(B)^{-1} c_0(B)\, Y(v + i_n - i_m) )\, Y(v) \big) = E\big( ( \theta_0(B)^{-1} X(v + i_n - i_m) )\, Y(v) \big) = \sigma^2\, \Theta_{i_n - i_m, 0}, \qquad n \geq m.
\]
The proof is completed when we see that both $Y(v)$ and $X(v)$ are linear functions of members of the sequence $\{\varepsilon(v)\}$ and, thus,

\[
\sum_{v^\tau \in S^\ast} \big( \theta_0(B^{-1})^{-1} c_0(B)\, Y(v + i_n + i_m) \big)\, Y(v) / N \xrightarrow{P} E\big( ( \theta_0(B^{-1})^{-1} X(v + i_n + i_m) )\, Y(v) \big)
\]

and

\[
\sum_{v^\tau \in S^\ast} \big( \theta_0(B)^{-1} c_0(B)\, Y(v + i_n - i_m) \big)\, Y(v) / N \xrightarrow{P} E\big( ( \theta_0(B)^{-1} X(v + i_n - i_m) )\, Y(v) \big).
\]

A.4. Proof of Theorem 2

First, we can write from (13), (17), (18) and for $n = 1, \dots, q$,

\[
N^{-1/2} \sum_{v^\tau \in S^\ast} \sum_{j^\tau \in \mathbb{Z}^d} c_{j,0}\, H_Y(v + i_n - j)\, Y(v) = N^{-1/2} \sum_{v^\tau \in S^\ast} \sum_{j^\tau \in \mathbb{Z}^d} c_{j,0}\, Y(v + i_n - j)\, Y(v) + o_P(1).
\]

The convergence in probability to zero of the remainder is justified by the fact that its expected value is equal to zero, as we explained at the beginning of Section 3, and that its variance is equal to

\[
\mathrm{Var}\Big( N^{-1/2} \sum_{v^\tau \in S^\ast} \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j)\, Y(v) \Big) \equiv \mathrm{Var}\Big( \sum_{v^\tau \in S^\ast} \tilde{u}_n(v) \Big) / N,
\]

where

\[
\tilde{u}_n(v) \equiv \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j)\, Y(v), \qquad n = 1, \dots, q,
\]

for $v^\tau \in \mathbb{Z}^d$ and for the given sampling set $S$. First, we see that, when the $\{\varepsilon(v)\}$ are independent and identically distributed,

\[
E(\tilde{u}_n(v)^2) = E(Y(v)^2)\ E\Big( \Big( \sum_{j^\tau - i_n^\tau \notin F_v} c_{j,0}\, Y(v + i_n - j) \Big)^2 \Big),
\]

without the assumption of a finite third or fourth moment. Under (C2)(ii), we can write that

\[
\sum_{v^\tau \in S^\ast} \mathrm{Var}(\tilde{u}_n(v)) = \sum_{v^\tau \in S^\ast} E(\tilde{u}_n(v)^2) = O(N^{(d-1)/d}),
\]

and a similar argument can be written for the cross-terms, due to the Cauchy-Schwarz inequality. For the case $d = 2$ and observations on a rectangle, a justification may be found in Yao and Brockwell [9]. Thus, we can write

\[
\mathrm{Var}\Big( N^{-1/2} \sum_{v^\tau \in S^\ast} \tilde{u}_n(v) \Big) \to 0,
\]

as $N \to \infty$ and (C2) holds, which guarantees the convergence in probability to 0.

We can now re-write (17) as

\[
N^{1/2} (\hat{\theta} - \theta_0) = (J/N)^{-1} \Big( N^{-1/2} \sum_{v^\tau \in S^\ast} \mathbf{U}(v) \Big) + o_P(1), \qquad (25)
\]

where

\[
\mathbf{U}(v) \equiv \begin{pmatrix} X(v + i_1) \\ \vdots \\ X(v + i_q) \end{pmatrix} Y(v), \qquad v^\tau \in \mathbb{Z}^d.
\]

It holds that $Y(v)$ is a linear function of $\varepsilon(v - i)$, $i = 0, i_1, \dots, i_q$, and that $X(v)$ is a function of $\varepsilon(v + i)$, $i \geq 0$. Then, for $n, m = 1, \dots, q$, we can write that

\[
\begin{aligned}
E\big( X(v + i_n)\, Y(v)\, X(v + i_m + j)\, Y(v + j) \big)
&= E\big( E\big( X(v + i_n)\, Y(v)\, X(v + i_m + j)\, Y(v + j) \mid \varepsilon(v + i_m + j + i),\ i \geq 0 \big) \big) \\
&= E\big( Y(v)\, Y(v + j) \big)\ E\big( X(v + i_n)\, X(v + i_m + j) \big) = \sigma^4\, \gamma_{j,0}\, c_{j + i_m - i_n, 0}
\end{aligned}
\]

for any $j \geq 0$. Thus, for $\mathbf{X}(v) \equiv (X(v + i_1), \dots, X(v + i_q))^\tau$ and $C_{j,0} \equiv E(\mathbf{X}(v)\, \mathbf{X}(v + j)^\tau) / \sigma^2$, we can write that $E(\mathbf{U}(v)) = 0$ and that

\[
\mathrm{Cov}\big( \mathbf{U}(v),\ \mathbf{U}(v + j) \big) = \sigma^4\, \gamma_{j,0}\, C_{j,0}, \qquad j \geq 0.
\]

Now, for any positive integer $K$, we define the set

\[
\begin{aligned}
B_K \equiv\ & \{ (i_1, i_2, \dots, i_d)^\tau : \ i_1 = 1, \dots, K,\ i_k = 0, \pm 1, \dots, \pm K,\ k = 2, \dots, d \} \\
& \cup\ \{ (0, i_2, \dots, i_d)^\tau : \ i_2 = 1, \dots, K,\ i_k = 0, \pm 1, \dots, \pm K,\ k = 3, \dots, d \} \\
& \cup \cdots \cup\ \{ (0, 0, \dots, i_d)^\tau : \ i_d = 1, \dots, K \}.
\end{aligned}
\]

According to the MA($\infty$) representation of $X(v)$, we also define, for fixed $K$, the new process $\{X^{(K)}(v)\}$ from

\[
X^{(K)}(v) \equiv \varepsilon(v) + \sum_{i^\tau \in B_K} \Theta_{i,0}\, \varepsilon(v + i).
\]

Similarly, we define

\[
\mathbf{U}^{(K)}(v) \equiv \begin{pmatrix} X^{(K)}(v + i_1) \\ \vdots \\ X^{(K)}(v + i_q) \end{pmatrix} Y(v), \qquad v^\tau \in \mathbb{Z}^d,
\]

and $\mathbf{X}^{(K)}(v) \equiv (X^{(K)}(v + i_1), \dots, X^{(K)}(v + i_q))^\tau$, $C^{(K)}_{j,0} \equiv E(\mathbf{X}^{(K)}(v)\, \mathbf{X}^{(K)}(v + j)^\tau) / \sigma^2$. Then, for the same reasons as before, it holds that $E(\mathbf{U}^{(K)}(v)) = 0$ and that

\[
\mathrm{Cov}\big( \mathbf{U}^{(K)}(v),\ \mathbf{U}^{(K)}(v + j) \big) = \sigma^4\, \gamma_{j,0}\, C^{(K)}_{j,0}, \qquad j \geq 0.
\]

For any vector $\lambda \in \mathbb{R}^q$, the process $\{\lambda^\tau \mathbf{U}^{(K)}(v)\}$ is strictly stationary and $K^\ast$-dependent, for a positive and finite integer $K^\ast$. The definition of $K$-dependent processes, as well as a theorem for the asymptotic normality of strictly stationary and $K$-dependent processes on $\mathbb{Z}^d$, may be given similarly to the one-dimensional case in Brockwell and Davis [4].
Then, we may write that

\[
N^{-1/2} \sum_{v^\tau \in S^\ast} \lambda^\tau \mathbf{U}^{(K)}(v) \xrightarrow{D} \mathcal{N}\big( 0,\ \sigma^4\, \lambda^\tau M_K \lambda \big),
\]

as $N \to \infty$ and under (C2), where

\[
M_K \equiv \gamma_{0,0}\, C^{(K)}_{0,0} + \sum_{j^\tau \in F,\ j > 0} \gamma_{j,0}\, \big( C^{(K)}_{j,0} + C^{(K)\tau}_{j,0} \big).
\]

Similarly, if we define

\[
M \equiv \gamma_{0,0}\, C_{0,0} + \sum_{j^\tau \in F,\ j > 0} \gamma_{j,0}\, \big( C_{j,0} + C^{\tau}_{j,0} \big), \qquad (26)
\]

then it holds, as $K \to \infty$, that $\lambda^\tau M_K \lambda \to \lambda^\tau M \lambda$. Using Chebyshev's inequality, we may verify that

\[
P\Big( \Big| N^{-1/2} \sum_{v^\tau \in S^\ast} \lambda^\tau \mathbf{U}(v) - N^{-1/2} \sum_{v^\tau \in S^\ast} \lambda^\tau \mathbf{U}^{(K)}(v) \Big| > \epsilon \Big)
\leq \frac{1}{\epsilon^2}\, \frac{N^\ast}{N}\, \lambda^\tau\, \mathrm{Var}\big( \mathbf{U}(v) - \mathbf{U}^{(K)}(v) \big)\, \lambda \to 0
\]

as $K \to \infty$ and, thus, it holds that

\[
N^{-1/2} \sum_{v^\tau \in S^\ast} \lambda^\tau \mathbf{U}(v) \xrightarrow{D} \mathcal{N}\big( 0,\ \sigma^4\, \lambda^\tau M \lambda \big),
\]

or

\[
N^{-1/2} \sum_{v^\tau \in S^\ast} \mathbf{U}(v) \xrightarrow{D} \mathcal{N}\big( 0,\ \sigma^4 M \big), \qquad (27)
\]

as $N \to \infty$ and under (C2). According to (26), the $(n, m)$-th element of $M$ is equal to

\[
\begin{aligned}
\gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F,\ j > 0} \big( \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \gamma_{j,0}\, c_{j + i_n - i_m, 0} \big)
&= \gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F,\ j > 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \sum_{j^\tau \in F,\ j < 0} \gamma_{j,0}\, c_{-j + i_n - i_m, 0} \\
&= \gamma_{0,0}\, c_{i_m - i_n, 0} + \sum_{j^\tau \in F,\ j > 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0} + \sum_{j^\tau \in F,\ j < 0} \gamma_{j,0}\, c_{j + i_m - i_n, 0} \\
&= \sum_{j^\tau \in F} \gamma_{j,0}\, c_{j + i_m - i_n, 0},
\end{aligned}
\]

which brings us back to the general Yule-Walker equations and gives $M = I_q$, with $I_q$ the identity matrix of order $q$. We may re-write (27) as

\[
N^{-1/2} \sum_{v^\tau \in S^\ast} \mathbf{U}(v) \xrightarrow{D} \mathcal{N}\big( 0,\ \sigma^4 I_q \big), \qquad (28)
\]

as $N \to \infty$ and under (C2). If we combine (25), (28) and Proposition 2, we can write that

\[
N^{1/2} (\hat{\theta} - \theta_0) \xrightarrow{D} \mathcal{N}\big( 0,\ (\Theta_0^\tau \Theta_0)^{-1} \big),
\]

as $N \to \infty$ and under (C2). The proof will be completed when we show that $\Theta_0^\tau \Theta_0 = W_q^\ast$. Indeed, we may let the vector $\mathbf{W} = (W(-i_1), \dots, W(-i_q))^\tau$ and then write

\[
\xi = \Theta_0^\tau\, \mathbf{W} + \mathbf{R},
\]

where $\mathbf{R}$ is a $(q \times 1)$ random vector that is independent of $\mathbf{W}$, since it is a linear function of $W(-i_1 - i)$, $i > 0$, $i \neq i_2 - i_1, \dots, i_q - i_1$. The required result is then obtained, since

\[
\mathrm{Var}\big( \mathbf{W} \mid W(-i_1 - i),\ i > 0,\ i \neq i_2 - i_1, \dots, i_q - i_1 \big) = \mathrm{Var}(\mathbf{W}) = I_q.
\]

Acknowledgements

This report forms part of the PhD thesis titled 'Statistical Inference for Spatial and Spatio-Temporal Processes', submitted to the University of London. The PhD studies were funded by the Leverhulme Trust.

References

[1] Anderson, B. D. O. and Jury, E. I. (1974). Stability of Multidimensional Digital Filters. IEEE Trans. Circuits Syst. 21 300–304.

[2] Besag, J. (1974). Spatial Interaction and the Statistical Analysis of Lattice Systems (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 36 192–236.

[3] Besag, J. (1975). Statistical Analysis of Non-Lattice Data. Statistician 24 179–195.

[4] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer-Verlag, New York.

[5] Dahlhaus, R. and Künsch, H. (1987). Edge Effects and Efficient Parameter Estimation for Stationary Random Fields. Biometrika 74 877–882.

[6] Guyon, X. (1982). Parameter Estimation for a Stationary Process on a d-Dimensional Lattice. Biometrika 69 95–105.

[7] Whittle, P. (1954). On Stationary Processes in the Plane. Biometrika 41 434–449.

[8] Yao, Q. and Brockwell, P. J. (2006). Gaussian Maximum Likelihood Estimation for ARMA Models I: Time Series. J. Time Ser. Anal. 27 857–875.

[9] Yao, Q. and Brockwell, P. J. (2006). Gaussian Maximum Likelihood Estimation for ARMA Models II: Spatial Processes. Bernoulli 12 403–429.