Improved Likelihood-Based Inference for the MA(1) Model
Fang Chang∗
Marie Rekkas†
Augustine Wong‡
Abstract
An improved likelihood-based method is proposed to test for the significance of the first-order moving average model. Compared with commonly used tests, which depend on the asymptotic properties of the maximum likelihood estimate and the likelihood ratio statistic, the proposed method is remarkably accurate. The method is applied to two historical data sets to demonstrate its implementation. Simulation studies are then performed to illustrate its accuracy relative to the traditional methods. Additionally, a simple and effective bias correction is used to deal with a boundary problem.
Keywords: Moving average model; Likelihood analysis; p-value
∗ Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J
1P3, email: changf@yorku.ca
† Corresponding Author. Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby,
British Columbia, Canada V5A 1S6, email: mrekkas@sfu.ca, phone: (778) 782-6793, fax: (778) 782-5944
‡ Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J
1P3, email: august@yorku.ca
1 Introduction
Consider the model

$$X_t = \mu + \sigma v_t, \quad t = 1, 2, \ldots, n, \qquad (1)$$

where the error term v_t is specified as a first-order moving average process

$$v_t = \varepsilon_t + \alpha\varepsilon_{t-1}.$$

The term σ is a scaling factor, the ε_t, t = 0, 1, ..., n, are independent standard normal errors, and α is the weight with which the error term ε_{t−1} contributes to v_t, with |α| < 1. This time series model is referred to as an MA(1) process.
Given the error process specified for v_t, if we let σ_s = Cov(v_t, v_{t−s}), s ≥ 0, then we have

$$\sigma_0 = 1 + \alpha^2, \qquad \sigma_1 = \alpha, \qquad \sigma_s = 0, \quad s \ge 2.$$
In contrast to an autocorrelated model specification, in which all error terms are correlated, the MA(1) model allows for correlation only between adjacent error terms. This feature makes the MA(1) model very attractive in situations where precisely this type of correlation structure is appropriate.
Given these results, we can compute the mean and variance of X_t as follows:

$$E(X_t) = E(\mu + \sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1}) = \mu$$

and

$$E(X_t - \mu)^2 = E(\sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1})^2 = \sigma^2(1 + \alpha^2).$$

In a similar fashion, we can compute the autocovariances:

$$E[(X_t - \mu)(X_{t-1} - \mu)] = E[(\sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1})(\sigma\varepsilon_{t-1} + \sigma\alpha\varepsilon_{t-2})] = \sigma^2\alpha$$

and

$$E[(X_t - \mu)(X_{t-2} - \mu)] = 0.$$
In fact, all higher-order covariances vanish as well:

$$E[(X_t - \mu)(X_{t-s} - \mu)] = 0, \quad s \ge 2.$$

As the mean and covariances do not depend on time, the MA(1) process is stationary regardless of the value of α.
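These moments are easy to verify numerically; the following short Python sketch (our illustration, not part of the original analysis) simulates a long MA(1) series and compares the sample moments with the theoretical values above:

```python
import numpy as np

# Simulate an MA(1) process X_t = mu + sigma*(eps_t + alpha*eps_{t-1})
# and compare sample moments with their theoretical counterparts.
rng = np.random.default_rng(0)
n, mu, sigma, alpha = 200_000, 2.0, 0.5, 0.6

eps = rng.standard_normal(n + 1)        # eps_0, ..., eps_n
v = eps[1:] + alpha * eps[:-1]          # MA(1) errors v_1, ..., v_n
x = mu + sigma * v

xc = x - x.mean()
print("mean:     ", x.mean(),            " theory:", mu)
print("variance: ", xc @ xc / n,         " theory:", sigma**2 * (1 + alpha**2))
print("lag-1 cov:", xc[1:] @ xc[:-1] / n, " theory:", sigma**2 * alpha)
print("lag-2 cov:", xc[2:] @ xc[:-2] / n, " theory: 0")
```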
The MA(1) model can also be presented in multivariate form. Let 1 denote an n × 1 column vector with all elements equal to 1. Then

$$X = \mu + \sigma v,$$

where

$$X = (X_1, X_2, \ldots, X_n)', \qquad \mu = (\mu, \mu, \ldots, \mu)' = \mu\mathbf{1}.$$

The vector v = (v₁, v₂, ..., v_n)' is a random vector distributed as an n-dimensional normal with mean 0 and variance-covariance matrix

$$\Sigma = (\omega_{ij}), \qquad \omega_{ij} = \sigma_{|i-j|}, \quad i, j = 1, 2, \ldots, n.$$
If we are interested in testing the parameter α, there are two commonly used statistics: the maximum likelihood estimate (MLE) and the likelihood ratio. The asymptotic distributions of these statistics were first investigated by Wald (1943) and Wilks (1938), and further studies surrounding them can be found in Shao (2003). The p-value functions for this parameter (or any other scalar parameter of the model) can be approximated using the results in Wald (1943) and Wilks (1938). These approximations are termed first-order methods, as their rates of convergence are O(n^{−1/2}). Note that the approximation based on the MLE is not invariant to reparameterization, in contrast to the approximation based on the likelihood ratio statistic. In addition to these two first-order methods, Wold (1949) proposed a test statistic, following a χ² distribution, for testing serial coefficients in a large-sample context; the method, however, is rather intricate and will not be used in this paper.
In this paper, we propose a likelihood-based method for inference concerning any scalar parameter of interest of the MA(1) model. Theoretically, the proposed method is a third-order method, which means that the rate of convergence is O(n^{−3/2}). Third-order inference concerning the significance of the lags of AR(1) and AR(2) models has been examined in Rekkas, Sun and Wong (2008) and Chang and Wong (2010), respectively; their proposed methods are extremely accurate for small and medium sample sizes.
In Section 2, the likelihood-based method is introduced for obtaining the p-value function of any
scalar parameter of interest from a general model. In Section 3, the method is applied to the MA(1)
model to obtain a p-value function for the primary parameter of interest α. We also implement in
this section a simple and effective bias correction that can be used when the MLE hits its boundary.
Numerical examples, including two real-life examples and three simulation studies, are presented in
Section 4 to illustrate the implementation and accuracy of the proposed method. Some concluding
remarks are given in Section 5.
2 An overview of a likelihood-based third-order method
Let x = (x₁, ..., x_n) be a random sample from a canonical exponential family model with log-likelihood function

$$\ell(\theta) = \ell(\theta; x) = \psi t + \lambda's + k(\theta),$$

where θ = (ψ, λ')' is a p-dimensional canonical parameter, ψ is a scalar parameter of interest, λ is the (p − 1)-dimensional nuisance parameter, and k(θ) is a known function of θ. Notice that ψ is a one-dimensional component of the canonical parameter. In this model, the statistic (t, s)' = (t(x), s(x))' is a minimal sufficient statistic. Let θ̂ = (ψ̂, λ̂')' be the overall maximum likelihood estimate of θ, which satisfies

$$\ell_\theta(\hat\theta) = \left.\frac{\partial\ell(\theta)}{\partial\theta}\right|_{\theta=\hat\theta} = 0.$$
The observed information matrix evaluated at θ̂ is

$$j_{\theta\theta'}(\hat\theta) = -\left.\frac{\partial^2\ell(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\hat\theta} = \begin{pmatrix} j_{\psi\psi}(\hat\theta) & j_{\psi\lambda'}(\hat\theta) \\ j_{\lambda\psi}(\hat\theta) & j_{\lambda\lambda'}(\hat\theta) \end{pmatrix},$$

and $j^{-1}_{\theta\theta'}(\hat\theta)$ is an estimate of the variance-covariance matrix of θ̂. A commonly used test statistic, (θ̂ − θ)'j_{θθ'}(θ̂)(θ̂ − θ), proposed by Wald (1943), has an asymptotic chi-square distribution with p degrees of freedom. For the scalar parameter of interest ψ, we have

$$q = (\hat\psi - \psi)\left[\widehat{\mathrm{var}}(\hat\psi)\right]^{-1/2},$$
asymptotically distributed as the standard normal distribution. Fraser, Reid and Wong (1991) applied the sequential saddlepoint method to obtain a version of q that takes into consideration the elimination of the nuisance parameter λ:

$$q = (\hat\psi - \psi)\left\{\frac{|j_{\theta\theta'}(\hat\theta)|}{|j_{\lambda\lambda'}(\hat\theta_\psi)|}\right\}^{1/2}, \qquad (2)$$

where θ̂_ψ = (ψ, λ̂'_ψ)' is the constrained maximum likelihood estimate of θ for a given ψ, satisfying

$$\ell_\lambda(\hat\theta_\psi) = \left.\frac{\partial\ell(\theta)}{\partial\lambda}\right|_{\theta=\hat\theta_\psi} = 0,$$
and j_{λλ'}(θ̂_ψ) is the nuisance information matrix evaluated at θ̂_ψ. Note that q is directly affected by a change in the parameterization. A familiar statistic for a scalar parameter of interest that is invariant to reparameterization is the signed log-likelihood ratio statistic, whose asymptotic distribution is derived in Wilks (1938) and takes the form

$$r = r(\psi) = \mathrm{sgn}(\hat\psi - \psi)\left\{2\left[\ell(\hat\theta) - \ell(\hat\theta_\psi)\right]\right\}^{1/2}. \qquad (3)$$

This statistic r is asymptotically distributed as the standard normal distribution. The p-value for a fixed ψ, denoted p(ψ), can then be approximated by either Φ(q) or Φ(r), where Φ(·) is the cumulative distribution function of the standard normal distribution. Both approximations have first-order accuracy, but the one based on the signed log-likelihood ratio statistic is widely viewed as more reliable, as in Doganaksoy and Schmee (1993). The function p(ψ) is referred to as the p-value function for ψ, as it provides the p-value for any given ψ.
To improve the accuracy of approximating the p-value function for ψ, Barndorff-Nielsen (1986, 1991) proposed

$$\Phi(r^*(\psi)) = \Phi\!\left(r + \frac{1}{r}\log\frac{q}{r}\right), \qquad (4)$$

where q and r are defined in (2) and (3), respectively. Barndorff-Nielsen's method adjusts the signed log-likelihood ratio statistic such that the p-value function obtained from r*(ψ) is close to the true p-value function. Fraser (1990) showed that this adjustment has third-order accuracy.
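To make the roles of r, q and r* concrete, here is a minimal Python sketch (our own illustration; the function names are ours) combining the two statistics into the first-order and third-order p-value approximations of (3) and (4):

```python
import numpy as np
from scipy.stats import norm

def signed_lr(psi_hat, psi, loglik_hat, loglik_constrained):
    """Signed log-likelihood ratio statistic r(psi) of equation (3)."""
    return np.sign(psi_hat - psi) * np.sqrt(2.0 * (loglik_hat - loglik_constrained))

def p_value_approximations(r, q):
    """Return (Phi(r), Phi(q), Phi(r*)), where r* = r + log(q/r)/r is the
    Barndorff-Nielsen adjustment of equation (4)."""
    r_star = r + np.log(q / r) / r
    return norm.cdf(r), norm.cdf(q), norm.cdf(r_star)
```

Note that r* is numerically unstable when ψ is close to ψ̂ (both r and q tend to 0), so in practice the p-value function is usually interpolated across a small neighborhood of ψ̂.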
Now consider a general exponential family model with canonical parameter φ = φ(θ) and scalar parameter of interest ψ = ψ(θ). Since the signed log-likelihood ratio statistic, r = r(ψ), is invariant to reparameterization, it remains unchanged and is defined as in (3). However, the quantity q = q(ψ) must now be expressed in the canonical parameter, φ(θ), scale. Let φ_θ(θ) and φ_λ(θ) be the derivatives of φ(θ) with respect to θ and λ, respectively. Denote by φ^ψ(θ) the row of φ_θ^{−1}(θ) that corresponds to ψ, and let ||φ^ψ(θ)|| denote the Euclidean norm of the vector φ^ψ(θ). Moreover, let

$$\chi(\theta) = \frac{\varphi^\psi(\hat\theta_\psi)}{\|\varphi^\psi(\hat\theta_\psi)\|}\,\varphi(\theta) \qquad (5)$$

be a rotated coordinate of φ(θ) that agrees with ψ(θ) at θ̂_ψ. Then χ(θ̂) − χ(θ̂_ψ) measures the maximum likelihood estimate departure, (ψ̂ − ψ), in the φ(θ) scale.
Since ℓ(θ) = ℓ(φ) under the reparameterization, chain rule differentiation gives

$$|j_{\varphi\varphi'}(\hat\theta)| = |j_{\theta\theta'}(\hat\theta)|\,|\varphi_\theta(\hat\theta)|^{-2} \quad\text{and}\quad |j_{(\lambda\lambda')}(\hat\theta_\psi)| = |j_{\lambda\lambda'}(\hat\theta_\psi)|\,|\varphi'_\lambda(\hat\theta_\psi)\varphi_\lambda(\hat\theta_\psi)|^{-1}.$$

As shown in Fraser, Reid and Wu (1999), an estimated variance for χ(θ̂) − χ(θ̂_ψ) in the φ(θ) scale is

$$\frac{|j_{(\lambda\lambda')}(\hat\theta_\psi)|}{|j_{\varphi\varphi'}(\hat\theta)|}.$$

Thus q = q(ψ) in the φ(θ) scale can be written as

$$q = q(\psi) = \mathrm{sgn}(\hat\psi - \psi)\,|\chi(\hat\theta) - \chi(\hat\theta_\psi)|\left\{\frac{|j_{\varphi\varphi'}(\hat\theta)|}{|j_{(\lambda\lambda')}(\hat\theta_\psi)|}\right\}^{1/2}. \qquad (6)$$

A third-order p-value p(ψ) can therefore be obtained using (4), with r and q defined in (3) and (6), respectively.
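Assembled from the pieces above, q of (6) reduces to a few array operations. A minimal sketch (our illustration; all arguments are the quantities defined in this section, computed elsewhere):

```python
import numpy as np

def q_statistic(psi_hat, psi, phi_hat, phi_psi_hat, phi_psi_row,
                det_j_phi, det_j_lam):
    """Standardized MLE departure q of equation (6) in the phi scale.

    phi_hat, phi_psi_hat: phi(theta) at the overall and constrained MLEs.
    phi_psi_row: row of the inverse Jacobian phi_theta^{-1} corresponding to
        psi, evaluated at the constrained MLE (used in equation (5)).
    det_j_phi, det_j_lam: |j_{phi phi'}(theta_hat)| and
        |j_{(lambda lambda')}(theta_hat_psi)| after the chain-rule rescaling.
    """
    u = phi_psi_row / np.linalg.norm(phi_psi_row)   # unit vector of eq. (5)
    chi_diff = u @ (phi_hat - phi_psi_hat)          # chi(theta_hat) - chi(theta_hat_psi)
    return np.sign(psi_hat - psi) * abs(chi_diff) * np.sqrt(det_j_phi / det_j_lam)
```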
For a general model, Fraser and Reid (1995) showed that φ(θ) can be obtained from

$$\varphi'(\theta) = \left.\frac{d\ell(\theta)}{dy}\right|_{y^0} V, \qquad (7)$$

where

$$V = \left.\frac{\partial y}{\partial\theta'}\right|_{(y^0,\hat\theta)} = -\left.\left(\frac{\partial z(\theta,y)}{\partial y'}\right)^{-1}\frac{\partial z(\theta,y)}{\partial\theta'}\right|_{(y^0,\hat\theta)}, \qquad (8)$$

with z(θ, y) being an n-dimensional pivotal quantity.
For the above asymptotic distributions to be valid, one imposed assumption is that the MLE exists as an interior point of the parameter space. However, for a small sample with a non-diagonal covariance structure, it is quite possible for the likelihood function to be ill-behaved, with its maximum attained at the boundary. The asymptotic behavior of the MLE and the likelihood ratio when the true parameter is on the boundary has been studied in Shapiro (1985) and Self and Liang (1987). Unfortunately, there is no single form for the distributions of these two statistics; they vary case by case. Self and Liang (1987) presented several examples in which the distribution of the likelihood ratio statistic, when the true parameter is at the boundary, is a mixture of χ² distributions, with the mixture depending on the nuisance parameter. We therefore need to be cautious in applying the above methods when the parameter space is bounded. In the next section we use a simple and effective bias correction to treat the boundary in the case where the likelihood function is monotone. This correction term was first derived by Galbraith and Zinde-Walsh (1994) as the bias of the point estimator when approximating an MA(1) model by an AR(p) model. One property of this correction term is that its sign is always opposite to that of the MLE, and this controls the volume of the covariance matrix to be o(n).
3 Applying the proposed method to the MA(1) model
Under our model specification, the moving average process X follows a multivariate normal distribution, which belongs to the location-scale family. In this context, Wong (1992) suggested working with the logarithm of the scale parameter instead of the original scale parameter. The log-likelihood function of the MA(1) model is

$$\ell(\theta; x) = -n\log\sigma - \frac{1}{2}\log(1 - \alpha^{2(n+1)}) + \frac{1}{2}\log(1 - \alpha^2) - \frac{1}{2}e^{-2\log\sigma}(X - \mu)'\Sigma^{-1}(X - \mu),$$

where θ = (α, µ, log σ)' following the recommendation of Wong (1992), and Σ^{−1} = A = {σ^{ij}}. According to Shaman (1969), the elements of the precision matrix A take the form

$$\sigma^{ij} = (-\alpha)^{j-i}\,\frac{(1 + \alpha^2 + \cdots + \alpha^{2(i-1)})(1 + \alpha^2 + \cdots + \alpha^{2(n-j)})}{1 + \alpha^2 + \cdots + \alpha^{2n}}, \qquad j \ge i.$$
Differentiating the log-likelihood gives

$$\ell_\alpha = \frac{(n+1)\alpha^{2n+1}}{1 - \alpha^{2n+2}} - \frac{\alpha}{1 - \alpha^2} - \frac{1}{2}e^{-2\log\sigma}(X - \mu)'A_\alpha(X - \mu),$$
$$\ell_\mu = e^{-2\log\sigma}\,\mathbf{1}'A(X - \mu),$$
$$\ell_{\log\sigma} = -n + e^{-2\log\sigma}(X - \mu)'A(X - \mu),$$

where A_α is the element-wise derivative of A with respect to α. The overall maximum likelihood estimate θ̂ solves the equations obtained by setting these first-order derivatives to zero; however, numerical optimization methods must be employed due to the lack of a closed-form expression for the MLE.
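To illustrate this numerical maximization, here is a minimal Python sketch (our own illustration, not the authors' Matlab code; the data file name is hypothetical). It evaluates the log-likelihood above directly from the tridiagonal covariance matrix of the MA(1) errors and maximizes it with a general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def ma1_loglik(theta, x):
    """MA(1) log-likelihood; theta = (alpha, mu, log_sigma).

    A direct O(n^3) implementation for illustration: build the tridiagonal
    correlation matrix Sigma of the MA(1) errors and evaluate the
    multivariate normal log-density, matching the expression in the text.
    """
    alpha, mu, log_sigma = theta
    n = len(x)
    Sigma = ((1 + alpha**2) * np.eye(n)
             + alpha * (np.eye(n, k=1) + np.eye(n, k=-1)))
    resid = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -n * log_sigma - 0.5 * logdet - 0.5 * np.exp(-2 * log_sigma) * quad

# Overall MLE by numerical optimization (no closed form exists).
x = np.loadtxt("series.txt")   # hypothetical data file
fit = minimize(lambda th: -ma1_loglik(th, x),
               x0=np.array([0.0, x.mean(), np.log(x.std())]),
               bounds=[(-0.99, 0.99), (None, None), (None, None)])
alpha_hat, mu_hat, log_sigma_hat = fit.x
```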
Suppose the scalar parameter of interest is ψ = ψ(θ) = α. We can then claim that the first-order moving average structure for the error sequence is not significant if we fail to reject H₀: ψ = 0. As defined in Section 2, θ̂_ψ is the constrained maximum likelihood estimate of θ for a given ψ, and r can be obtained from (3). Thus p(ψ) can be approximated by Φ(r) with first-order accuracy. From the results in Section 2, the statistic

$$(\hat\psi - \psi)\left[\widehat{\mathrm{var}}(\hat\psi)\right]^{-1/2} \qquad (9)$$

is asymptotically distributed as standard normal, where the estimated variance of ψ̂ is approximated by the delta method based on j_{θθ'}(θ̂), the observed information matrix evaluated at θ̂.
Throughout this paper we assume the inversion operator is applied prior to the differentiation operator. Using the Cholesky decomposition Σ^{−1} = (L^{−1})'L^{−1}, where L is the lower triangular matrix obtained from the decomposition Σ = LL', a pivotal quantity is readily available for the MA(1) model:

$$z(\theta, X) = \frac{L^{-1}(X - \mu)}{\sigma}, \qquad (10)$$
where L^{−1} = {l^{ij}} with

$$l^{ii} = \sqrt{\frac{\sum_{k=0}^{i-1}\alpha^{2k}}{\sum_{k=0}^{i}\alpha^{2k}}} = \sqrt{\frac{1 - \alpha^{2i}}{1 - \alpha^{2i+2}}}, \qquad
l^{ij} = \frac{(-\alpha)^{i-j}(1 - \alpha^{2j})}{\sqrt{(1 - \alpha^{2i+2})(1 - \alpha^{2i})}}, \quad i > j.$$

The detailed derivation of z, L and L^{−1} is given in the Appendix.
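The closed-form entries of L^{−1} are easy to verify numerically; the following Python sketch (our illustration) builds L^{−1} from the formulas above and compares it with the inverse of a numerical Cholesky factor of Σ:

```python
import numpy as np

def ma1_Linv(alpha, n):
    """Closed-form inverse Cholesky factor of the MA(1) correlation matrix,
    using the l^{ii} and l^{ij} expressions above (1-based i, j)."""
    Linv = np.zeros((n, n))
    for i in range(1, n + 1):
        Linv[i-1, i-1] = np.sqrt((1 - alpha**(2*i)) / (1 - alpha**(2*i + 2)))
        for j in range(1, i):
            Linv[i-1, j-1] = ((-alpha)**(i-j) * (1 - alpha**(2*j))
                              / np.sqrt((1 - alpha**(2*i + 2)) * (1 - alpha**(2*i))))
    return Linv

# Check against a numerical Cholesky factorization of Sigma.
alpha, n = 0.6, 6
Sigma = ((1 + alpha**2) * np.eye(n)
         + alpha * (np.eye(n, k=1) + np.eye(n, k=-1)))
L = np.linalg.cholesky(Sigma)
print(np.allclose(ma1_Linv(alpha, n), np.linalg.inv(L)))  # True
```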
From (7) and (8), the canonical parameter is φ'(θ) = (φ₁(θ), φ₂(θ), φ₃(θ)), with

$$\varphi_1(\theta) = -e^{-2\log\sigma}(X - \mu)'A\hat L\hat L_\alpha^{-1}(X - \hat\mu),$$
$$\varphi_2(\theta) = e^{-2\log\sigma}(X - \mu)'A\mathbf{1},$$
$$\varphi_3(\theta) = e^{-2\log\sigma}(X - \mu)'A(X - \hat\mu),$$

where L̂_α^{−1} is the element-wise derivative of L^{−1} with respect to α, evaluated at α̂. Moreover, the first-order derivative of the canonical parameter with respect to θ is

$$\varphi_\theta(\theta) = e^{-2\log\sigma}\begin{pmatrix}
-(X-\mu)'A_\alpha\hat L\hat L_\alpha^{-1}(X-\hat\mu) & \mathbf{1}'A\hat L\hat L_\alpha^{-1}(X-\hat\mu) & 2(X-\mu)'A\hat L\hat L_\alpha^{-1}(X-\hat\mu) \\
(X-\mu)'A_\alpha\mathbf{1} & -\mathbf{1}'A\mathbf{1} & -2(X-\mu)'A\mathbf{1} \\
(X-\mu)'A_\alpha(X-\hat\mu) & -\mathbf{1}'A(X-\hat\mu) & -2(X-\mu)'A(X-\hat\mu)
\end{pmatrix},$$

with rows corresponding to (φ₁, φ₂, φ₃) and columns to (α, µ, log σ). Hence

$$\varphi_\lambda(\theta) = e^{-2\log\sigma}\begin{pmatrix}
\mathbf{1}'A\hat L\hat L_\alpha^{-1}(X-\hat\mu) & 2(X-\mu)'A\hat L\hat L_\alpha^{-1}(X-\hat\mu) \\
-\mathbf{1}'A\mathbf{1} & -2(X-\mu)'A\mathbf{1} \\
-\mathbf{1}'A(X-\hat\mu) & -2(X-\mu)'A(X-\hat\mu)
\end{pmatrix},$$

which collects the derivatives of the canonical parameter with respect to the nuisance parameters λ, obtained by deleting the first column of φ_θ(θ).

Denote by φ^ψ(θ) the row of φ_θ^{−1}(θ) corresponding to ψ, which is the first row of φ_θ^{−1}(θ). Hence χ(θ) can be obtained from (5). Furthermore, we have

$$j_{\theta\theta'}(\theta) = \begin{pmatrix}
-\ell_{\alpha\alpha} & -\ell_{\alpha\mu} & -\ell_{\alpha\log\sigma} \\
-\ell_{\alpha\mu} & -\ell_{\mu\mu} & -\ell_{\mu\log\sigma} \\
-\ell_{\alpha\log\sigma} & -\ell_{\mu\log\sigma} & -\ell_{\log\sigma\log\sigma}
\end{pmatrix}
\quad\text{and}\quad
j_{\lambda\lambda'}(\theta) = \begin{pmatrix}
-\ell_{\mu\mu} & -\ell_{\mu\log\sigma} \\
-\ell_{\mu\log\sigma} & -\ell_{\log\sigma\log\sigma}
\end{pmatrix},$$

where

$$\ell_{\alpha\alpha} = \frac{(n+1)(2n+1)\alpha^{2n} + (n+1)\alpha^{4n+2}}{(1-\alpha^{2n+2})^2} - \frac{1+\alpha^2}{(1-\alpha^2)^2} - \frac{1}{2}e^{-2\log\sigma}(X-\mu)'A_{\alpha\alpha}(X-\mu),$$
$$\ell_{\alpha\mu} = e^{-2\log\sigma}\,\mathbf{1}'A_\alpha(X-\mu),$$
$$\ell_{\alpha\log\sigma} = e^{-2\log\sigma}(X-\mu)'A_\alpha(X-\mu),$$
$$\ell_{\mu\mu} = -e^{-2\log\sigma}\,\mathbf{1}'A\mathbf{1},$$
$$\ell_{\mu\log\sigma} = -2e^{-2\log\sigma}\,\mathbf{1}'A(X-\mu),$$
$$\ell_{\log\sigma\log\sigma} = -2e^{-2\log\sigma}(X-\mu)'A(X-\mu),$$
with A_{αα} being the element-wise second derivative of A with respect to α. Hence q can be obtained from (6), and p(ψ) can then be approximated with third-order accuracy by (4). Finally, writing γ for the nominal level (to avoid confusion with the MA(1) parameter α), a central (1 − γ) × 100% confidence interval for ψ is

$$\left(\min\{p^{-1}(\gamma/2),\, p^{-1}(1-\gamma/2)\},\ \max\{p^{-1}(\gamma/2),\, p^{-1}(1-\gamma/2)\}\right). \qquad (11)$$
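Numerically, (11) is obtained by inverting the p-value function. The following is a minimal sketch (our illustration), assuming p_value is a monotone callable built from (4) and that (lo, hi) brackets both crossings:

```python
from scipy.optimize import brentq

def central_interval(p_value, lo, hi, gamma=0.10):
    """Invert a monotone p-value function on (lo, hi) to obtain the central
    (1 - gamma)*100% confidence interval of equation (11)."""
    a = brentq(lambda psi: p_value(psi) - gamma / 2, lo, hi)
    b = brentq(lambda psi: p_value(psi) - (1 - gamma / 2), lo, hi)
    return min(a, b), max(a, b)
```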
For small or medium-sized data sets with a moving average structure, the observed likelihood function can be monotone; that is, the MLE may be attained at the boundary. If so, Monte Carlo simulation results would not reflect the true behavior of the methods, due to a large proportion of monotone observed likelihood functions. To improve the results in such a situation, we modify the MLE by the correction function

$$B(x, k) = x - \frac{x^{2k+1}}{1 + x^2 + \cdots + x^{2k}}. \qquad (12)$$

This function is taken from Galbraith and Zinde-Walsh (1994) and can be applied as a bias correction to prevent the boundary problem for the MLE. As k increases, the function increases monotonically for any fixed positive x and decreases monotonically for any fixed negative x. Given that x ∈ (−1, 1), the limit of B(x, k) as k → ∞ is x, and the larger |x| is, the slower B(x, k) converges. From the parameterization, the MLEs of the location parameter µ and the scale parameter σ depend on the MLE of α. It is adequate to correct only the MLE of α, since it is the only bounded parameter; the rest can be adjusted accordingly. For inference on α, we choose a large k (≥ 20), so that the correction effectively activates only for an MLE at the boundary. For inference on a parameter other than α, we need to be cautious in the choice of k: since the MLE is biased, correcting α does not necessarily correct the other parameters, but they can be corrected through α with an appropriate choice of k.
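The correction (12) is straightforward to implement; a minimal Python sketch (ours) with a numerical illustration of the boundary behavior:

```python
def bias_correct(x, k):
    """Correction function B(x, k) of equation (12), from Galbraith and
    Zinde-Walsh (1994): B(x, k) = x - x^(2k+1) / (1 + x^2 + ... + x^(2k))."""
    return x - x**(2 * k + 1) / sum(x**(2 * j) for j in range(k + 1))

# With a large k the correction is negligible in the interior of (-1, 1)
# but pulls a boundary MLE strictly inside the parameter space:
print(bias_correct(0.5, 20))   # ~0.5000 (essentially unchanged)
print(bias_correct(1.0, 20))   # 1 - 1/21 ~ 0.9524 (moved off the boundary)
```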
4 Numerical studies
We first apply the methods discussed in this paper to two real-life data sets and obtain the p-value curves as well as 90% confidence intervals for the various methods. Simulation studies are then carried out to compare the methods. Note that inference on any component of (α, µ, σ)' can be easily obtained. We use the following abbreviations for the three methods discussed in Section 3: "WALD" for the Wald test given in (2), "r" for the signed log-likelihood ratio statistic in (3), and "BN" for the third-order method based on equation (4).
4.1 Application to historical data sets

4.1.1 I.C.I. closing prices data
This data set records the Imperial Chemical Industries (I.C.I.) closing prices for 107 opening days from August 25, 1972 to January 19, 1973, published in Anderson (1976). To facilitate inference, we rescale the I.C.I. closing price data by a factor of 100. An MA(1) model specified as in (1) is applied to the rescaled data. The overall MLE for θ = (α, µ, log σ)' is obtained as θ̂ = (0.7577, 2.8370, −2.8437)'. Since α̂ is far from the boundary, the likelihood function is well-behaved and the bias correction is not necessary.
Figures 1-3 display the p-value functions p(α), p(µ) and p(σ) for the I.C.I. data, calculated for each of the three methods, against α, µ and σ, respectively. The two horizontal lines in each figure are the datum lines for the upper and lower 0.05 levels. In Figure 1, the p-value functions for α are displayed. From this figure it is evident that the two first-order methods (WALD, r) give indistinguishable results, while the third-order method (BN) produces a function that differs markedly from them. This result is even more pronounced in Figure 3, where the p-value functions are given for the parameter σ. On the other hand, in Figure 2 all three p-value functions for the location parameter µ are virtually coincident and thus produce very similar results. As a specific example from these figures, Table 1 reports the 90% confidence intervals for each element of (α, µ, σ)' obtained from each of the three methods. The table bears out the general results gleaned from the figures.
Figure 1: P-value function for α for I.C.I. data
Figure 2: P-value function for µ for I.C.I. data

4.1.2 Thermostat demand data
This data set records the weekly demand for thermostats over 65 consecutive weeks and is described in Montgomery and Johnson (1976). To facilitate inference, we rescale the data by a factor of 10. The MA(1) model specified in (1) is applied to the rescaled data. The overall MLE for θ = (α, µ, log σ)' is calculated as θ̂ = (0.0501, 23.5835, 1.6240)'. Once again, α̂ is far from the boundary, indicating that the likelihood function is well-behaved; the unadjusted results are therefore reported.
Figure 3: P-value function for σ for I.C.I. data
Table 1: 90% central confidence intervals for α, µ and σ for I.C.I. data

Method | α                | µ                | σ
WALD   | (0.6469, 0.8685) | (2.8208, 2.8532) | (0.0520, 0.0651)
r      | (0.6456, 0.8751) | (2.8207, 2.8534) | (0.0522, 0.0654)
BN     | (0.6384, 0.8656) | (2.8206, 2.8534) | (0.0528, 0.0664)
Figures 4-6 display the p-value functions p(α), p(µ) and p(σ), calculated from the three methods, against α, µ and σ, respectively. The two horizontal lines in each figure are the datum lines for the upper and lower 0.05 levels. The figures reveal results analogous to those of the previous example. In Figure 4, which plots the p-value functions for α, we once again see that the two first-order methods (WALD, r) produce nearly coincident functions, while the third-order method produces a quite different p-value function. The same pattern appears even more strongly for the scale parameter σ in Figure 6. For the location parameter µ, Figure 5 reveals that the three methods produce almost identical p-values for mid-range values of µ but discordant results in the tails. Table 2 reports the 90% confidence intervals for each element of (α, µ, σ)' obtained from the three methods. As anticipated from the figures, the discrepancy between the first- and third-order confidence intervals is largest for the scale parameter.
Figure 4: P-value function for α for thermostat data
Figure 5: P-value function for µ for thermostat data

4.2 Simulation studies
In this section, we perform simulations to assess the accuracy of the first-order and third-order methods. For each combination of the parameters, 10,000 Monte Carlo replications are performed. The proportions of replications in which the true ψ falls in the left tail and in the right tail of the rejection region are recorded as the "lower error" and the "upper error", respectively. Additionally, the proportion of replications in which the true ψ falls outside the rejection region is referred to as the "central coverage".
Figure 6: P-value function for σ for thermostat data
Table 2: 90% central confidence intervals for α, µ and σ for thermostat data

Method | α                 | µ                  | σ
WALD   | (−0.1406, 0.2409) | (22.4968, 24.6702) | (4.3922, 5.8610)
r      | (−0.1378, 0.2414) | (22.4624, 24.6930) | (4.4212, 5.9040)
BN     | (−0.1272, 0.2549) | (22.4246, 24.7252) | (4.5024, 6.0480)
We also introduce the term "average bias", which, for a 95% confidence interval, is defined as

$$\frac{|\text{Lower error} - 0.025| + |\text{Upper error} - 0.025|}{2}.$$

In the result tables that follow, the nominal values for the lower error, upper error, central coverage and average bias are 0.025, 0.025, 0.95 and 0, respectively.
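As a small worked check of this definition (our illustration, using a WALD entry from Table 3 below):

```python
def average_bias(lower_error, upper_error, nominal=0.025):
    """Average absolute deviation of the two tail errors from their nominal
    value, as defined above for a 95% confidence interval."""
    return (abs(lower_error - nominal) + abs(upper_error - nominal)) / 2

# WALD row of Table 3 for theta = (-0.6, log(1)), psi = alpha:
print(average_bias(0.0890, 0.0100))   # 0.0395, matching the table
```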
For the first simulation, we consider the model

$$y_t = \sigma(\varepsilon_t + \alpha\varepsilon_{t-1}), \quad t = 1, 2, \ldots, n,$$

where the ε_t are independent standard normal errors and ε₀ is initialized to 0. Simulation results are recorded in Table 3 for various combinations of α and log σ, with n fixed at 50. It is clear that the widely used WALD method is not satisfactory for inference on either α or σ, given its low central coverage and high average bias; overall, its central coverage is better for σ than for α. The signed log-likelihood ratio method (r) produces better central coverage than WALD, but both methods suffer from asymmetric errors. The third-order method (BN) gives excellent results overall.
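The tail errors reported in the tables are simple functions of the simulated p-values; a minimal sketch of the bookkeeping (our illustration) follows.

```python
import numpy as np

def tail_errors(p_at_true_psi, level=0.95):
    """Estimate lower error, upper error and central coverage from an array
    of p(psi_true) values over the Monte Carlo replications."""
    tail = (1 - level) / 2
    lower = np.mean(p_at_true_psi < tail)        # true psi in the left tail
    upper = np.mean(p_at_true_psi > 1 - tail)    # true psi in the right tail
    return lower, upper, 1 - lower - upper
```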
Table 3: Simulation results for Model 1 (n = 50)

                                       ψ = α                                     ψ = σ
θ = (α, log σ)    Method   Lower    Upper    Central   Average  |  Lower    Upper    Central   Average
                           Error    Error    Coverage  Bias     |  Error    Error    Coverage  Bias
(−0.6, log(1))    WALD     0.0890   0.0100   0.9010    0.0395   |  0.0527   0.0095   0.9378    0.0216
                  r        0.0410   0.0192   0.9398    0.0109   |  0.0404   0.0135   0.9461    0.0135
                  BN       0.0308   0.0240   0.9452    0.0034   |  0.0378   0.0212   0.9410    0.0083
(−0.2, log(1))    WALD     0.0455   0.0259   0.9286    0.0107   |  0.0519   0.0117   0.9364    0.0201
                  r        0.0302   0.0255   0.9443    0.0028   |  0.0399   0.0154   0.9447    0.0122
                  BN       0.0253   0.0252   0.9495    0.0002   |  0.0273   0.0243   0.9484    0.0015
(−0.2, log(10))   WALD     0.0480   0.0267   0.9253    0.0124   |  0.0494   0.0109   0.9397    0.0193
                  r        0.0317   0.0259   0.9424    0.0038   |  0.0391   0.0166   0.9443    0.0113
                  BN       0.0265   0.0269   0.9466    0.0017   |  0.0263   0.0249   0.9488    0.0007
(0, log(1))       WALD     0.0340   0.0334   0.9326    0.0087   |  0.0501   0.0114   0.9385    0.0193
                  r        0.0276   0.0259   0.9465    0.0017   |  0.0391   0.0164   0.9445    0.0114
                  BN       0.0255   0.0246   0.9499    0.0004   |  0.0255   0.0251   0.9494    0.0003
(0.2, log(1))     WALD     0.0268   0.0470   0.9262    0.0119   |  0.0502   0.0119   0.9379    0.0192
                  r        0.0246   0.0307   0.9447    0.0031   |  0.0390   0.0172   0.9438    0.0109
                  BN       0.0238   0.0267   0.9495    0.0014   |  0.0267   0.0255   0.9478    0.0011
(0.2, log(10))    WALD     0.0271   0.0471   0.9258    0.0121   |  0.0504   0.0128   0.9368    0.0188
                  r        0.0258   0.0299   0.9443    0.0028   |  0.0408   0.0177   0.9415    0.0116
                  BN       0.0257   0.0268   0.9475    0.0012   |  0.0284   0.0253   0.9463    0.0018
(0.6, log(1))     WALD     0.0136   0.0869   0.8995    0.0367   |  0.0522   0.0098   0.9380    0.0212
                  r        0.0223   0.0390   0.9387    0.0083   |  0.0402   0.0129   0.9469    0.0136
                  BN       0.0271   0.0310   0.9419    0.0040   |  0.0373   0.0214   0.9413    0.0080
For the second simulation, consider the model presented in (1), where once again ε₀ is set to 0. The sample size n is set equal to 60. The results from this simulation are recorded in Table 4 and show a similar pattern to the results in Table 3: the WALD method produces the poorest results, followed by the signed log-likelihood ratio method (r), while the superior performance of the third-order method (BN) is clearly evident.

The performance of the bias correction given in (12) is presented in Tables 5 and 6 for each of the two simulations. When the true value of α is close to the boundary, one tail error behaves well while the other deteriorates, which in turn affects the central coverage. For the Table 5 results, k = 60 and k = 5 were used to adjust α when testing α and σ, respectively; for the Table 6 results, the corresponding choices were k = 25 and k = 5. The adjusted simulation results demonstrate the advantage of the correction, which largely makes the confidence intervals more symmetric and closer to the nominal level.
Table 4: Simulation results for Model 2 (n = 60)

                                            ψ = α                                 ψ = µ                                 ψ = σ
θ = (α, µ, log σ)    Method   Lower    Upper    Central  Average |  Lower    Upper    Central  Average |  Lower    Upper    Central  Average
                              Error    Error    Coverage Bias    |  Error    Error    Coverage Bias    |  Error    Error    Coverage Bias
(−0.4, 0, log(1))    WALD     0.0931   0.0149   0.8920   0.0391  |  0.0539   0.0541   0.8920   0.0290  |  0.0602   0.0094   0.9304   0.0254
                     r        0.0625   0.0171   0.9204   0.0227  |  0.0351   0.0376   0.9273   0.0113  |  0.0475   0.0130   0.9395   0.0173
                     BN       0.0283   0.0266   0.9451   0.0024  |  0.0272   0.0264   0.9464   0.0018  |  0.0343   0.0239   0.9418   0.0052
(−0.4, 5, log(10))   WALD     0.0878   0.0133   0.8989   0.0372  |  0.0539   0.0522   0.8939   0.0281  |  0.0644   0.0103   0.9253   0.0270
                     r        0.0607   0.0169   0.9224   0.0219  |  0.0366   0.0372   0.9262   0.0119  |  0.0517   0.0138   0.9345   0.0190
                     BN       0.0290   0.0265   0.9445   0.0027  |  0.0265   0.0249   0.9486   0.0008  |  0.0363   0.0249   0.9388   0.0057
(0, 5, log(5))       WALD     0.0468   0.0242   0.9290   0.0113  |  0.0373   0.0377   0.9250   0.0125  |  0.0561   0.0094   0.9345   0.0233
                     r        0.0387   0.0187   0.9426   0.0100  |  0.0302   0.0289   0.9409   0.0045  |  0.0457   0.0134   0.9409   0.0161
                     BN       0.0241   0.0243   0.9516   0.0008  |  0.0250   0.0244   0.9506   0.0003  |  0.0265   0.0249   0.9486   0.0008
(0, 0, log(1))       WALD     0.0478   0.0272   0.9250   0.0125  |  0.0391   0.0387   0.9222   0.0139  |  0.0575   0.0100   0.9325   0.0238
                     r        0.0402   0.0213   0.9385   0.0095  |  0.0324   0.0321   0.9355   0.0072  |  0.0467   0.0132   0.9401   0.0168
                     BN       0.0245   0.0258   0.9497   0.0006  |  0.0273   0.0265   0.9462   0.0019  |  0.0280   0.0245   0.9475   0.0017
(0, −5, log(5))      WALD     0.0478   0.0263   0.9259   0.0121  |  0.0408   0.0372   0.9220   0.0140  |  0.0535   0.0096   0.9369   0.0220
                     r        0.0402   0.0209   0.9389   0.0097  |  0.0324   0.0312   0.9364   0.0068  |  0.0444   0.0139   0.9417   0.0153
                     BN       0.0260   0.0253   0.9487   0.0006  |  0.0270   0.0270   0.9460   0.0020  |  0.0263   0.0234   0.9503   0.0014
(0.4, 0, log(1))     WALD     0.0239   0.0493   0.9268   0.0127  |  0.0337   0.0325   0.9338   0.0081  |  0.0606   0.0114   0.9280   0.0246
                     r        0.0295   0.0285   0.9420   0.0040  |  0.0296   0.0287   0.9417   0.0041  |  0.0499   0.0163   0.9338   0.0168
                     BN       0.0241   0.0273   0.9486   0.0016  |  0.0263   0.0256   0.9481   0.0009  |  0.0257   0.0268   0.9475   0.0012
(0.4, 5, log(10))    WALD     0.0250   0.0476   0.9274   0.0113  |  0.0328   0.0321   0.9351   0.0074  |  0.0585   0.0090   0.9325   0.0248
                     r        0.0312   0.0266   0.9422   0.0039  |  0.0293   0.0279   0.9428   0.0036  |  0.0473   0.0115   0.9412   0.0179
                     BN       0.0258   0.0264   0.9478   0.0011  |  0.0260   0.0247   0.9493   0.0006  |  0.0276   0.0252   0.9472   0.0014
(0.65, −5, log(5))   WALD     0.0137   0.0836   0.9027   0.0349  |  0.0324   0.0291   0.9385   0.0057  |  0.0642   0.0080   0.9278   0.0281
                     r        0.0273   0.0366   0.9361   0.0069  |  0.0287   0.0270   0.9443   0.0028  |  0.0514   0.0108   0.9378   0.0203
                     BN       0.0260   0.0287   0.9453   0.0023  |  0.0271   0.0255   0.9474   0.0013  |  0.0384   0.0239   0.9377   0.0072
(0.65, 5, log(10))   WALD     0.0142   0.0817   0.9041   0.0337  |  0.0305   0.0275   0.9420   0.0040  |  0.0644   0.0092   0.9264   0.0276
                     r        0.0262   0.0354   0.9384   0.0058  |  0.0246   0.0282   0.9472   0.0018  |  0.0511   0.0135   0.9354   0.0188
                     BN       0.0250   0.0283   0.9467   0.0016  |  0.0264   0.0231   0.9505   0.0017  |  0.0387   0.0242   0.9371   0.0072
Table 5: Simulation results for Model 1 with and without adjustment (n = 50)

                                        ψ = α                                     ψ = σ
θ = (α, log σ)    Method   Lower    Upper    Central   Average  |  Lower    Upper    Central   Average
                           Error    Error    Coverage  Bias     |  Error    Error    Coverage  Bias
(−0.95, 0)        WALD     0.0000   0.0051   0.9949    0.0225   |  0.1348   0.1782   0.6870    0.1315
                  r        0.0000   0.0151   0.9849    0.0175   |  0.0407   0.0093   0.9500    0.0157
                  BN       0.5036   0.0319   0.4645    0.2428   |  0.3581   0.0336   0.6083    0.1708
  Adjusted        WALD     0.0000   0.0051   0.9949    0.0225   |  0.1307   0.1656   0.7037    0.1232
                  r        0.0000   0.0151   0.9849    0.0175   |  0.0232   0.0067   0.9701    0.0101
                  BN       0.0180   0.0348   0.9472    0.0084   |  0.0288   0.0215   0.9497    0.0037
(−0.5, 0)         WALD     0.0759   0.0147   0.9094    0.0306   |  0.1906   0.1293   0.6801    0.1350
                  r        0.0397   0.0220   0.9383    0.0089   |  0.0374   0.0164   0.9462    0.0105
                  BN       0.0301   0.0255   0.9444    0.0028   |  0.0282   0.0237   0.9481    0.0023
  Adjusted        WALD     0.0760   0.0147   0.9093    0.0307   |  0.1907   0.1295   0.6798    0.1351
                  r        0.0397   0.0220   0.9383    0.0089   |  0.0372   0.0164   0.9464    0.0104
                  BN       0.0297   0.0255   0.9448    0.0026   |  0.0237   0.0241   0.9522    0.0011
(0, 0)            WALD     0.0376   0.0356   0.9268    0.0116   |  0.1920   0.1412   0.6668    0.1416
                  r        0.0290   0.0288   0.9422    0.0039   |  0.0369   0.0177   0.9454    0.0096
                  BN       0.0273   0.0269   0.9458    0.0021   |  0.0239   0.0264   0.9497    0.0012
  Adjusted        WALD     0.0376   0.0356   0.9268    0.0116   |  0.1920   0.1412   0.6668    0.1416
                  r        0.0290   0.0288   0.9422    0.0039   |  0.0369   0.0177   0.9454    0.0096
                  BN       0.0273   0.0269   0.9458    0.0021   |  0.0239   0.0264   0.9497    0.0012
(0.5, 0)          WALD     0.0170   0.0725   0.9105    0.0277   |  0.1903   0.1342   0.6755    0.1373
                  r        0.0247   0.0395   0.9358    0.0074   |  0.0367   0.0157   0.9476    0.0105
                  BN       0.0278   0.0289   0.9433    0.0033   |  0.0293   0.0232   0.9475    0.0031
  Adjusted        WALD     0.0170   0.0728   0.9102    0.0279   |  0.1908   0.1342   0.6750    0.1375
                  r        0.0247   0.0395   0.9358    0.0074   |  0.0364   0.0157   0.9479    0.0104
                  BN       0.0278   0.0287   0.9435    0.0032   |  0.0251   0.0234   0.9515    0.0009
(0.95, 0)         WALD     0.0053   0.0000   0.9947    0.0224   |  0.1363   0.1809   0.6828    0.1336
                  r        0.0162   0.0000   0.9838    0.0169   |  0.0399   0.0131   0.9470    0.0134
                  BN       0.0353   0.5154   0.4493    0.2503   |  0.3558   0.0352   0.6090    0.1705
  Adjusted        WALD     0.0053   0.0000   0.9947    0.0224   |  0.1334   0.1652   0.7014    0.1243
                  r        0.0162   0.0000   0.9838    0.0169   |  0.0228   0.0094   0.9678    0.0089
                  BN       0.0391   0.0184   0.9425    0.0104   |  0.0246   0.0231   0.9523    0.0012
Table 6: Simulation results for Model 2 with and without adjustment (n = 60)

                                         ψ = α                                     ψ = σ
θ = (α, µ, log σ)  Method   Lower    Upper    Central   Average  |  Lower    Upper    Central   Average
                            Error    Error    Coverage  Bias     |  Error    Error    Coverage  Bias
(−0.8, 1, 0)       WALD     0.4460   0.0000   0.5540    0.2230   |  0.0874   0.0041   0.9085    0.0417
                   r        0.1800   0.0040   0.8160    0.0880   |  0.0732   0.0069   0.9199    0.0331
                   BN       0.4660   0.0260   0.5080    0.2210   |  0.3734   0.0288   0.5978    0.1761
  Adjusted         WALD     0.0400   0.0000   0.9600    0.0200   |  0.0874   0.0041   0.9085    0.0417
                   r        0.1340   0.0040   0.8620    0.0650   |  0.0732   0.0069   0.9199    0.0331
                   BN       0.0280   0.0260   0.9460    0.0020   |  0.0279   0.0339   0.9382    0.0059
(−0.5, 1, 0)       WALD     0.1218   0.0111   0.8671    0.0553   |  0.0688   0.0086   0.9226    0.0301
                   r        0.0808   0.0150   0.9042    0.0329   |  0.0553   0.0117   0.9330    0.0218
                   BN       0.0442   0.0264   0.9294    0.0103   |  0.0528   0.0229   0.9243    0.0150
  Adjusted         WALD     0.1156   0.0111   0.8733    0.0522   |  0.0688   0.0086   0.9226    0.0301
                   r        0.0801   0.0150   0.9049    0.0326   |  0.0553   0.0117   0.9330    0.0218
                   BN       0.0285   0.0264   0.9451    0.0024   |  0.0284   0.0241   0.9475    0.0022
(0, 1, 0)          WALD     0.0467   0.0288   0.9245    0.0127   |  0.0577   0.0091   0.9332    0.0243
                   r        0.0390   0.0220   0.9390    0.0085   |  0.0469   0.0122   0.9409    0.0173
                   BN       0.0251   0.0267   0.9482    0.0009   |  0.0277   0.0237   0.9486    0.0020
  Adjusted         WALD     0.0467   0.0288   0.9245    0.0127   |  0.0577   0.0091   0.9332    0.0243
                   r        0.0390   0.0220   0.9390    0.0085   |  0.0469   0.0122   0.9409    0.0173
                   BN       0.0251   0.0267   0.9482    0.0009   |  0.0277   0.0237   0.9487    0.0020
(0.5, 1, 0)        WALD     0.0194   0.0660   0.9146    0.0233   |  0.0604   0.0097   0.9299    0.0254
                   r        0.0286   0.0333   0.9381    0.0060   |  0.0493   0.0127   0.9380    0.0183
                   BN       0.0246   0.0279   0.9475    0.0017   |  0.0309   0.0233   0.9458    0.0038
  Adjusted         WALD     0.0194   0.0660   0.9146    0.0233   |  0.0604   0.0097   0.9299    0.0254
                   r        0.0286   0.0331   0.9383    0.0058   |  0.0493   0.0127   0.9380    0.0183
                   BN       0.0246   0.0275   0.9479    0.0014   |  0.0296   0.0235   0.9469    0.0031
(0.9, 1, 0)        WALD     0.0059   0.0511   0.9430    0.0226   |  0.0677   0.0073   0.9250    0.0302
                   r        0.0207   0.0000   0.9793    0.0147   |  0.0577   0.0106   0.9317    0.0236
                   BN       0.0283   0.2510   0.7207    0.1147   |  0.2221   0.0291   0.7488    0.1006
  Adjusted         WALD     0.0059   0.0000   0.9941    0.0221   |  0.0677   0.0073   0.9250    0.0302
                   r        0.0207   0.0000   0.9793    0.0147   |  0.0577   0.0106   0.9317    0.0236
                   BN       0.0287   0.0298   0.9415    0.0042   |  0.0309   0.0246   0.9445    0.0032
5 Discussion and Conclusion
This paper has investigated the performance of a third-order likelihood-based method against two
commonly used first-order methods for inference concerning a moving average model of order 1.
Based on the simulation studies, the third-order method outperformed the other methods based on
the criteria considered. A bias correction was also used in this paper to handle boundary problems
and was shown through simulations to be effective. The examples and simulations were implemented
in Matlab and the code is available from the authors upon request.
References

[1] Anderson, O.D., 1976, Time Series Analysis and Forecasting: The Box-Jenkins Approach, Butterworth & Co Publishers Ltd.
[2] Barndorff-Nielsen, O.E., 1986, Inference on Full or Partial Parameters Based on the Standardized Signed Log Likelihood Ratio, Biometrika 73, 307-322.
[3] Barndorff-Nielsen, O.E., 1991, Modified Signed Log-likelihood Ratio Statistic, Biometrika 78, 557-563.
[4] Chang, F., Wong, A., 2010, Improved Likelihood-Based Inference for the Stationary AR(2) Model, Journal of Statistical Planning and Inference 140, 2099-2110.
[5] Doganaksoy, N., Schmee, J., 1993, Comparisons of Approximate Confidence Intervals for Distributions Used in Life-Data Analysis, Technometrics 35, 175-184.
[6] Fraser, D.A.S., 1990, Tail Probabilities from Observed Likelihoods, Biometrika 77, 65-76.
[7] Fraser, D.A.S., Reid, N., 1995, Ancillaries and Third Order Significance, Utilitas Mathematica 47, 33-53.
[8] Fraser, D.A.S., Reid, N., Wong, A., 1991, Exponential Linear Models: A Two-Pass Procedure for Saddlepoint Approximation, Journal of the Royal Statistical Society, Series B 53(2), 483-492.
[9] Fraser, D.A.S., Reid, N., Wu, J., 1999, A Simple General Formula for Tail Probabilities for Frequentist and Bayesian Inference, Biometrika 86, 249-264.
[10] Galbraith, J.W., Zinde-Walsh, V., 1994, A Simple Noniterative Estimator for Moving Average Models, Biometrika 81, 143-155.
[11] Montgomery, D.C., Johnson, L.A., 1976, Forecasting and Time Series Analysis, New York, McGraw-Hill.
[12] Rekkas, M., Sun, Y., Wong, A.C.M., 2008, Improved Inference for First-Order Autocorrelation Using Likelihood Analysis, Journal of Time Series Analysis 29, 513-532.
[13] Self, S.G., Liang, K., 1987, Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions, Journal of the American Statistical Association 82, 605-610.
[14] Shaman, P., 1969, On the Inverse of the Covariance Matrix of a First Order Moving Average, Biometrika 56, 595-600.
[15] Shao, J., 2003, Mathematical Statistics (2nd ed.), Springer-Verlag, New York.
[16] Shapiro, A., 1985, Asymptotic Distribution of Test Statistics in the Analysis of Moment Structures under Inequality Constraints, Biometrika 72, 133-144.
[17] Wald, A., 1943, Tests of Statistical Hypotheses Concerning Several Parameters when the Number of Observations is Large, Transactions of the American Mathematical Society 54, 426-482.
[18] Wilks, S.S., 1938, The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Annals of Mathematical Statistics 9, 60-62.
[19] Wold, H., 1949, A Large-Sample Test for Moving Averages, Journal of the Royal Statistical Society B 11, 297-305.
[20] Wong, A., 1992, Converting Observed Likelihood to Levels of Significance for Transformation Models, Communications in Statistics: Theory and Methods 21, 2809-2823.
A Appendix

A.1 Derivation for L^{−1}
Recall that

$$v_t = \varepsilon_t + \alpha\varepsilon_{t-1},$$

where ε_t, t = 0, 1, ..., n, are normally distributed with mean 0 and variance 1. Denoting σ_k = Cov(v_t, v_{t±k}), k = 0, 1, ..., n, the elements of the covariance matrix of {v_t} take the form

$$\sigma_0 = E(\varepsilon_t^2) + 2\alpha E(\varepsilon_t\varepsilon_{t-1}) + \alpha^2 E(\varepsilon_{t-1}^2) = 1 + \alpha^2,$$
$$\sigma_1 = E(\varepsilon_t\varepsilon_{t-1}) + \alpha E(\varepsilon_{t-1}^2) + \alpha E(\varepsilon_t\varepsilon_{t-2}) + \alpha^2 E(\varepsilon_{t-1}\varepsilon_{t-2}) = \alpha,$$
$$\sigma_k = 0, \quad k \ge 2.$$

The covariance matrix for {v_t} of size n is then

$$\Sigma_n = \begin{pmatrix}
1+\alpha^2 & \alpha & \cdots & 0 \\
\alpha & 1+\alpha^2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \alpha \\
0 & \cdots & \alpha & 1+\alpha^2
\end{pmatrix}.$$

Additionally, from Shaman (1969), we have

$$D_n = |\Sigma_n| = 1 + \alpha^2 + \alpha^4 + \cdots + \alpha^{2n} = \frac{1 - \alpha^{2(n+1)}}{1 - \alpha^2}.$$
Assume the Cholesky decomposition for Σ_n is Σ_n = L_nL_n'. For n = 2, we have

$$L_2 = \begin{pmatrix}
\sqrt{1+\alpha^2} & 0 \\
\dfrac{\alpha}{\sqrt{1+\alpha^2}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2}}
\end{pmatrix}, \qquad
L_2^{-1} = \begin{pmatrix}
\dfrac{1}{\sqrt{1+\alpha^2}} & 0 \\
-\dfrac{\alpha}{\sqrt{(1+\alpha^2+\alpha^4)(1+\alpha^2)}} & \sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}}
\end{pmatrix}.$$

For n = 3, we have

$$L_3 = \begin{pmatrix}
\sqrt{1+\alpha^2} & 0 & 0 \\
\dfrac{\alpha}{\sqrt{1+\alpha^2}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2}} & 0 \\
0 & \alpha\sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4+\alpha^6}{1+\alpha^2+\alpha^4}}
\end{pmatrix},$$

$$L_3^{-1} = \begin{pmatrix}
\dfrac{1}{\sqrt{1+\alpha^2}} & 0 & 0 \\
-\dfrac{\alpha}{\sqrt{(1+\alpha^2+\alpha^4)(1+\alpha^2)}} & \sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}} & 0 \\
\dfrac{\alpha^2}{\sqrt{(1+\alpha^2+\alpha^4+\alpha^6)(1+\alpha^2+\alpha^4)}} & -\dfrac{\alpha(1+\alpha^2)}{\sqrt{(1+\alpha^2+\alpha^4+\alpha^6)(1+\alpha^2+\alpha^4)}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2+\alpha^4+\alpha^6}}
\end{pmatrix}.$$
The above computations indicate a general pattern for both L_n = {l_{ij}} and L_n^{−1} = {l^{ij}}, i = 1, ..., n, j = 1, ..., n, namely

$$l_{ij} = \begin{cases}
\sqrt{\dfrac{1-\alpha^{2i+2}}{1-\alpha^{2i}}} & i = j, \\[2mm]
\dfrac{\alpha\sqrt{1-\alpha^{2j}}}{\sqrt{1-\alpha^{2j+2}}} & i = j+1, \\[2mm]
0 & \text{otherwise},
\end{cases}
\qquad
l^{ij} = \begin{cases}
\sqrt{\dfrac{1-\alpha^{2i}}{1-\alpha^{2i+2}}} & i = j, \\[2mm]
\dfrac{(-\alpha)^{i-j}(1-\alpha^{2j})}{\sqrt{(1-\alpha^{2i+2})(1-\alpha^{2i})}} & i > j, \\[2mm]
0 & \text{otherwise}.
\end{cases}$$
A.2 Derivation for pivot z

Recall that the model under investigation is

$$X = \mu + \sigma v, \qquad v \sim N(0, \Sigma),$$

therefore X ∼ N(µ, σ²Σ). A natural choice for the pivotal quantity is

$$z(\theta, X) = \frac{L^{-1}(X - \mu)}{\sigma},$$

which is distributed as a multivariate standard normal with dimension n.