Improved Likelihood-Based Inference for the MA(1) Model

Fang Chang∗, Marie Rekkas†, Augustine Wong‡

Abstract

An improved likelihood-based method is proposed to test for the significance of the first-order moving average model. Compared with commonly used tests, which depend on the asymptotic properties of the maximum likelihood estimate and the likelihood ratio statistic, the proposed method has remarkable accuracy. Application of the method to two historical data sets is presented to demonstrate its implementation. Simulation studies are subsequently performed to illustrate the accuracy of the method compared to the traditional methods. Additionally, a simple and effective bias correction is used to deal with a boundary problem.

Keywords: Moving average model; Likelihood analysis; p-value

∗ Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3, email: changf@yorku.ca
† Corresponding Author. Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia, Canada V5A 1S6, email: mrekkas@sfu.ca, phone: (778) 782-6793, fax: (778) 782-5944
‡ Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3, email: august@yorku.ca

1 Introduction

Consider the model

$$X_t = \mu + \sigma v_t, \qquad t = 1, 2, \ldots, n, \tag{1}$$

where the error term $v_t$ is specified as a first-order moving average process, $v_t = \varepsilon_t + \alpha\varepsilon_{t-1}$. The term $\sigma$ is a scaling factor, the errors $\varepsilon_t$, $t = 0, 1, \ldots, n$, are independent standard normal, and $\alpha$ is the weight with which $\varepsilon_{t-1}$ contributes to $v_t$, with $|\alpha| < 1$. This time series model is referred to as an MA(1) process. Given the error process specified for $v_t$, and letting $\sigma_s = \mathrm{Cov}(v_t, v_{t-s})$, $s \ge 0$, we have

$$\sigma_0 = 1 + \alpha^2, \qquad \sigma_1 = \alpha, \qquad \sigma_s = 0, \quad s \ge 2.$$

In contrast to an autocorrelated model specification where all error terms are correlated, the MA(1) model allows for correlation only between adjacent error terms. This feature of the MA(1) model makes it very attractive in situations where precisely this type of correlation modeling is instructive.

Given these results, we are able to compute the mean and variance of $X_t$ as follows:

$$E(X_t) = E(\mu + \sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1}) = \mu$$

and

$$E(X_t - \mu)^2 = E(\sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1})^2 = \sigma^2(1 + \alpha^2).$$

In a similar fashion, we can compute the autocovariances:

$$E(X_t - \mu)(X_{t-1} - \mu) = E(\sigma\varepsilon_t + \sigma\alpha\varepsilon_{t-1})(\sigma\varepsilon_{t-1} + \sigma\alpha\varepsilon_{t-2}) = \sigma^2\alpha$$

and

$$E(X_t - \mu)(X_{t-2} - \mu) = 0.$$

In fact, $E(X_t - \mu)(X_{t-s} - \mu) = 0$ for all $s \ge 2$. As the mean and covariances do not depend on time, the MA(1) process is a stationary process regardless of the value of $\alpha$.

The MA(1) model can also be presented in multivariate form. Denoting by $\mathbf{1}$ an $n \times 1$ column vector with all elements equal to 1, we have

$$\mathbf{X} = \boldsymbol{\mu} + \sigma\mathbf{v},$$

where $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ and $\boldsymbol{\mu} = (\mu, \mu, \ldots, \mu)' = \mu\mathbf{1}$. The vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)'$ follows an $n$-dimensional normal distribution with mean $\mathbf{0}$ and variance-covariance matrix $\Sigma = (\omega_{ij})$, where $\omega_{ij} = \sigma_{|i-j|}$, $i, j = 1, 2, \ldots, n$.

If we are interested in testing the parameter $\alpha$, there are two commonly used statistics: the maximum likelihood estimate (MLE) and the likelihood ratio. The asymptotic distributions of these statistics were first investigated by Wald (1943) and Wilks (1938). Further studies surrounding these statistics can be found in Shao (2003).
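Before turning to testing, the moment structure derived above is easy to check numerically. The following minimal sketch (our own Python code with arbitrary parameter values, not part of the original analysis) simulates a long MA(1) series and compares sample moments with $\sigma^2(1+\alpha^2)$, $\sigma^2\alpha$ and 0.

```python
import numpy as np

# Minimal simulation check of the MA(1) moments; parameter values are arbitrary.
rng = np.random.default_rng(0)
alpha, mu, sigma, n = 0.5, 2.0, 1.5, 200000

eps = rng.standard_normal(n + 1)      # eps_0, eps_1, ..., eps_n iid N(0, 1)
v = eps[1:] + alpha * eps[:-1]        # v_t = eps_t + alpha * eps_{t-1}
x = mu + sigma * v                    # X_t = mu + sigma * v_t

xc = x - x.mean()
print(xc.var(), sigma**2 * (1 + alpha**2))          # Var(X_t)  ~ sigma^2 (1 + alpha^2)
print(np.mean(xc[1:] * xc[:-1]), sigma**2 * alpha)  # lag-1 autocovariance ~ sigma^2 * alpha
print(np.mean(xc[2:] * xc[:-2]))                    # lag-2 autocovariance ~ 0
```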
The p-value functions for $\alpha$ (or any other scalar parameter of the model) can be approximated using the results in Wald (1943) and Wilks (1938). These approximations are termed first-order methods, as their rates of convergence are $O(n^{-1/2})$. Note that the approximation based on the MLE is not invariant to reparameterization, in contrast to the approximation based on the likelihood ratio statistic. In addition to these two first-order methods, Wold (1949) proposed a test statistic that follows a $\chi^2$ distribution for testing serial coefficients in a large-sample context; the method, however, is rather intricate and will not be used in this paper.

In this paper, we propose a likelihood-based method for inference concerning any scalar parameter of interest of the MA(1) model. Theoretically, the proposed method is a third-order method, which means that its rate of convergence is $O(n^{-3/2})$. Third-order inference concerning the significance of the lags of AR(1) and AR(2) models has been examined in Rekkas, Sun and Wong (2008) and Chang and Wong (2010), respectively. Their proposed methods are extremely accurate in small and medium sample size cases.

In Section 2, the likelihood-based method is introduced for obtaining the p-value function of any scalar parameter of interest from a general model. In Section 3, the method is applied to the MA(1) model to obtain a p-value function for the primary parameter of interest $\alpha$. We also implement in this section a simple and effective bias correction that can be used when the MLE hits its boundary. Numerical examples, including two real-life examples and three simulation studies, are presented in Section 4 to illustrate the implementation and accuracy of the proposed method. Some concluding remarks are given in Section 5.

2 An overview of a likelihood-based third-order method

Let $x = (x_1, \ldots, x_n)$ be a random sample from a canonical exponential family model with log-likelihood function

$$\ell(\theta) = \ell(\theta; x) = \psi t + \lambda's + k(\theta),$$

where $\theta = (\psi, \lambda')'$ is a $p$-dimensional canonical parameter, $\psi$ is a scalar parameter of interest, $\lambda$ is the $(p-1)$-dimensional nuisance parameter, and $k(\theta)$ is a known function of $\theta$. Notice that $\psi$ is a one-dimensional component of the canonical parameter. In this model, the statistic $(t, s)' = (t(x), s(x))'$ is a minimal sufficient statistic. Let $\hat\theta = (\hat\psi, \hat\lambda')'$ be the overall maximum likelihood estimate of $\theta$, which satisfies

$$\ell_\theta(\hat\theta) = \left.\frac{\partial\ell(\theta)}{\partial\theta}\right|_{\theta=\hat\theta} = 0.$$

The observed information matrix evaluated at $\hat\theta$ is

$$j_{\theta\theta'}(\hat\theta) = -\left.\frac{\partial^2\ell(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\hat\theta} = \begin{pmatrix} j_{\psi\psi}(\hat\theta) & j_{\psi\lambda'}(\hat\theta) \\ j_{\lambda\psi}(\hat\theta) & j_{\lambda\lambda'}(\hat\theta) \end{pmatrix},$$

and $j^{-1}_{\theta\theta'}(\hat\theta)$ is an estimate of the variance-covariance matrix of $\hat\theta$. A commonly used test statistic, $(\hat\theta - \theta)'\,j_{\theta\theta'}(\hat\theta)\,(\hat\theta - \theta)$, proposed by Wald (1943), has an asymptotic chi-square distribution with $p$ degrees of freedom. For the scalar parameter of interest $\psi$, we have

$$q = (\hat\psi - \psi)\left[\widehat{\mathrm{var}}(\hat\psi)\right]^{-1/2},$$

asymptotically distributed as the standard normal distribution. Fraser, Reid and Wong (1991) applied the sequential saddlepoint method to obtain a version of $q$ that takes into consideration the elimination of the nuisance parameter $\lambda$:

$$q = (\hat\psi - \psi)\left\{\frac{|j_{\theta\theta'}(\hat\theta)|}{|j_{\lambda\lambda'}(\hat\theta_\psi)|}\right\}^{1/2}, \tag{2}$$

where $\hat\theta_\psi = (\psi, \hat\lambda'_\psi)'$ is the constrained maximum likelihood estimate of $\theta$ for a given $\psi$, satisfying

$$\ell_\lambda(\hat\theta_\psi) = \left.\frac{\partial\ell(\theta)}{\partial\lambda}\right|_{\theta=\hat\theta_\psi} = 0,$$

and $j_{\lambda\lambda'}(\hat\theta_\psi)$ is the nuisance information matrix evaluated at $\hat\theta_\psi$. Note that $q$ is directly affected by a change in the parameterization.
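To make these quantities concrete, the following sketch computes the overall MLE, the constrained MLE, and the $q$ of (2) for a toy model $X_i \sim N(\psi, \sigma^2)$ with $\theta = (\psi, \log\sigma)'$. The model choice, data and helper names are our own illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(1.0, 2.0, size=30)

def nll(theta):                       # negative log-likelihood of N(psi, sigma^2)
    psi, logs = theta
    return len(x) * logs + 0.5 * np.exp(-2.0 * logs) * np.sum((x - psi) ** 2)

def num_hessian(f, t, h=1e-5):        # observed information via central differences
    t = np.asarray(t, dtype=float)
    p = t.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = h
            ej = np.zeros(p); ej[j] = h
            H[i, j] = (f(t + ei + ej) - f(t + ei - ej)
                       - f(t - ei + ej) + f(t - ei - ej)) / (4.0 * h * h)
    return H

theta_hat = minimize(nll, x0=np.array([0.0, 0.0])).x            # overall MLE
psi0 = 0.0                                                      # hypothesized psi
lam_hat = minimize(lambda s: nll(np.array([psi0, s[0]])),
                   x0=np.array([0.0])).x                        # constrained MLE

j_full = num_hessian(nll, theta_hat)                            # j_{theta theta'}(theta_hat)
j_nuis = num_hessian(lambda s: nll(np.array([psi0, s[0]])), lam_hat)
q = (theta_hat[0] - psi0) * np.sqrt(np.linalg.det(j_full) / np.linalg.det(j_nuis))
print(q)
```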
A familiar statistic for a scalar parameter of interest that is invariant to reparameterization is the signed log-likelihood ratio statistic, whose asymptotic distribution is derived in Wilks (1938) and which takes the form

$$r = r(\psi) = \mathrm{sgn}(\hat\psi - \psi)\left\{2\left[\ell(\hat\theta) - \ell(\hat\theta_\psi)\right]\right\}^{1/2}. \tag{3}$$

This statistic $r$ is asymptotically distributed as the standard normal distribution. The p-value for a fixed $\psi$, denoted $p(\psi)$, can then be approximated by either $\Phi(q)$ or $\Phi(r)$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Both approximations have first-order accuracy, but the one based on the signed log-likelihood ratio statistic is widely viewed as more reliable, as in Doganaksoy and Schmee (1993). The function $p(\psi)$ is referred to as the p-value function for $\psi$, as it provides the p-value for any given $\psi$.

To improve the accuracy of the approximation of the p-value function for $\psi$, Barndorff-Nielsen (1986, 1991) proposed

$$\Phi(r^*(\psi)) = \Phi\left(r + \frac{1}{r}\log\frac{q}{r}\right), \tag{4}$$

where $q$ and $r$ are defined in (2) and (3), respectively. Barndorff-Nielsen's method adjusts the signed log-likelihood ratio statistic so that the p-value function obtained from $r^*(\psi)$ is close to the true p-value function. Fraser (1990) showed that this adjustment has third-order accuracy.

Now consider a general exponential family model with canonical parameter $\varphi = \varphi(\theta)$ and scalar parameter of interest $\psi = \psi(\theta)$. Since the signed log-likelihood ratio statistic, $r = r(\psi)$, is invariant to reparameterization, it remains unchanged and is defined as in (3). However, the quantity $q = q(\psi)$ must now be expressed in the canonical parameter, $\varphi(\theta)$, scale. Let $\varphi_\theta(\theta)$ and $\varphi_\lambda(\theta)$ be the derivatives of $\varphi(\theta)$ with respect to $\theta$ and $\lambda$, respectively. Denote by $\varphi^\psi(\theta)$ the row of $\varphi_\theta^{-1}(\theta)$ that corresponds to $\psi$, and let $\|\varphi^\psi(\theta)\|$ denote the Euclidean norm of the vector $\varphi^\psi(\theta)$. Moreover, let

$$\chi(\theta) = \frac{\varphi^\psi(\hat\theta_\psi)}{\|\varphi^\psi(\hat\theta_\psi)\|}\,\varphi(\theta) \tag{5}$$

be a rotated coordinate of $\varphi(\theta)$ that agrees with $\psi(\theta)$ at $\hat\theta_\psi$. Then $\chi(\hat\theta) - \chi(\hat\theta_\psi)$ measures the maximum likelihood estimate departure, $\hat\psi - \psi$, in the $\varphi(\theta)$ scale. Since $\ell(\theta) = \ell(\varphi)$ under the reparameterization, chain rule differentiation gives

$$|j_{\varphi\varphi'}(\hat\theta)| = |j_{\theta\theta'}(\hat\theta)|\,|\varphi_\theta(\hat\theta)|^{-2} \quad\text{and}\quad |j_{(\lambda\lambda')}(\hat\theta_\psi)| = |j_{\lambda\lambda'}(\hat\theta_\psi)|\,|\varphi'_\lambda(\hat\theta_\psi)\varphi_\lambda(\hat\theta_\psi)|^{-1}.$$

As shown in Fraser, Reid and Wu (1999), an estimated variance of $\chi(\hat\theta) - \chi(\hat\theta_\psi)$ in the $\varphi(\theta)$ scale is $|j_{(\lambda\lambda')}(\hat\theta_\psi)|\,/\,|j_{\varphi\varphi'}(\hat\theta)|$. Thus $q = q(\psi)$ in the $\varphi(\theta)$ scale can be written as

$$q = q(\psi) = \mathrm{sgn}(\hat\psi - \psi)\,\big|\chi(\hat\theta) - \chi(\hat\theta_\psi)\big|\left\{\frac{|j_{\varphi\varphi'}(\hat\theta)|}{|j_{(\lambda\lambda')}(\hat\theta_\psi)|}\right\}^{1/2}. \tag{6}$$

A third-order p-value $p(\psi)$ can therefore be obtained from (4) with $r$ and $q$ defined in (3) and (6), respectively. For a general model, Fraser and Reid (1995) showed that $\varphi(\theta)$ can be obtained from

$$\varphi'(\theta) = \left.\frac{d\ell(\theta)}{dy}\right|_{y^0} V, \tag{7}$$

where

$$V = \left.\frac{\partial y}{\partial\theta'}\right|_{(y^0,\hat\theta)} = -\left.\left(\frac{\partial z(\theta, y)}{\partial y'}\right)^{-1}\left(\frac{\partial z(\theta, y)}{\partial\theta'}\right)\right|_{(y^0,\hat\theta)}, \tag{8}$$

with $z(\theta, y)$ an $n$-dimensional pivotal quantity.
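Continuing the toy sketch from Section 2 (again purely illustrative, reusing the names defined there), $r$ in (3) and the Barndorff-Nielsen p-value in (4) follow in a few lines; note that (4) requires $q/r > 0$, which holds away from $\psi = \hat\psi$.

```python
import numpy as np
from scipy.stats import norm

# r in (3): nll is the *negative* log-likelihood, so
# l(theta_hat) - l(theta_hat_psi) = nll(theta_hat_psi) - nll(theta_hat).
r = np.sign(theta_hat[0] - psi0) * np.sqrt(
    2.0 * (nll(np.array([psi0, lam_hat[0]])) - nll(theta_hat)))

p_r = norm.cdf(r)                           # first-order p-value, Phi(r)
p_bn = norm.cdf(r + np.log(q / r) / r)      # third-order p-value, Phi(r*), as in (4)
```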
For the above asymptotic distributions to be valid, one imposed assumption is that the MLE exists as an interior point of the parameter space. However, for a small sample with a non-diagonal covariance structure, it is quite possible that the likelihood function will be ill-behaved, with its maximum attained at the boundary. The asymptotic behavior of the MLE and the likelihood ratio when the true parameter is on the boundary has been studied in Shapiro (1985) and Self and Liang (1987). Unfortunately, there is no specific form for the distributions of these two statistics; they vary case by case. Self and Liang (1987) presented several examples in which the distribution of the likelihood ratio statistic, when the true parameter is at the boundary, is a mixture of $\chi^2$ distributions, with the mixture depending on the nuisance parameter. We therefore need to be cautious in applying the above methods when the parameter space is bounded. In the next section we use a simple and effective bias correction to treat the boundary in the case where the likelihood function is monotone. This correction term was first derived by Galbraith and Zinde-Walsh (1994) as the bias of the point estimator when approximating an MA(1) model by an AR(p) model. One property of this correction term is that its sign is always opposite to that of the MLE, and this controls the volume of the covariance matrix to be $o(n)$.

3 Applying the proposed method to the MA(1) model

Under our model specification, the moving average process $\mathbf{X}$ follows a multivariate normal distribution, which belongs to the location-scale family. In this context, Wong (1992) suggested working with the logarithm of the scale parameter instead of the original scale parameter. The log-likelihood function of the MA(1) model is

$$\ell(\theta; x) = -n\log\sigma - \frac{1}{2}\log\left(1 - \alpha^{2(n+1)}\right) + \frac{1}{2}\log\left(1 - \alpha^2\right) - \frac{1}{2}e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu}),$$

where $\theta = (\alpha, \mu, \log\sigma)'$ follows the recommendation of Wong (1992) and $\Sigma^{-1} = A = \{\sigma^{ij}\}$. According to Shaman (1969), the elements of the precision matrix $A$ take the form

$$\sigma^{ij} = \frac{(-\alpha)^{j-i}\left(1 + \alpha^2 + \cdots + \alpha^{2(i-1)}\right)\left(1 + \alpha^2 + \cdots + \alpha^{2(n-j)}\right)}{1 + \alpha^2 + \cdots + \alpha^{2n}}, \qquad j \ge i.$$

Differentiating the log-likelihood gives

$$\ell_\alpha = \frac{(n+1)\alpha^{2n+1}}{1 - \alpha^{2n+2}} - \frac{\alpha}{1 - \alpha^2} - \frac{1}{2}e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A_\alpha(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_\mu = e^{-2\log\sigma}\,\mathbf{1}'A(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_{\log\sigma} = -n + e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A(\mathbf{X} - \boldsymbol{\mu}),$$

where $A_\alpha$ is the element-wise derivative of $A$ with respect to $\alpha$. The overall maximum likelihood estimate $\hat\theta$ solves the equations obtained by setting these first-order derivatives to zero; since no closed-form expression for the MLE exists, numerical optimization methods are employed.

Suppose the scalar parameter of interest is $\psi = \psi(\theta) = \alpha$. We can claim that the first-order moving average structure for the error sequence is not significant if we fail to reject $H_0\colon \psi = 0$. As defined in Section 2, $\hat\theta_\psi$ is the constrained maximum likelihood estimate of $\theta$ for a given $\psi$, and $r$ can then be obtained from (3). Thus $p(\psi)$ can be approximated by $\Phi(r)$ with first-order accuracy. From the result in Section 2, the statistic

$$(\hat\psi - \psi)\left[\widehat{\mathrm{var}}(\hat\psi)\right]^{-1/2} \tag{9}$$

is asymptotically distributed as standard normal, where $\widehat{\mathrm{var}}(\hat\psi)$ is approximated by the Delta method based on $j_{\theta\theta'}(\hat\theta)$, the observed information matrix evaluated at $\hat\theta$. Throughout this paper, the inversion operator is applied prior to the differentiation operator (so that, for example, $L^{-1}_\alpha$ denotes the derivative of $L^{-1}$ with respect to $\alpha$).

Using the Cholesky decomposition $\Sigma = LL'$, where $L$ is a lower triangular matrix, we have $\Sigma^{-1} = (L^{-1})'L^{-1}$, and a pivotal quantity is readily available for the MA(1) model:

$$z(\theta, \mathbf{X}) = \frac{L^{-1}(\mathbf{X} - \boldsymbol{\mu})}{\sigma}, \tag{10}$$

where $L^{-1} = \{l^{ij}\}$ with

$$l^{ii} = \left(\frac{\sum_{k=0}^{i-1}\alpha^{2k}}{\sum_{k=0}^{i}\alpha^{2k}}\right)^{1/2} = \frac{\sqrt{1 - \alpha^{2i}}}{\sqrt{1 - \alpha^{2i+2}}}, \qquad l^{ij} = \frac{(-\alpha)^{i-j}\left(1 - \alpha^{2j}\right)}{\sqrt{\left(1 - \alpha^{2i+2}\right)\left(1 - \alpha^{2i}\right)}}, \quad i > j.$$

The detailed derivations of $z$, $L$ and $L^{-1}$ are given in the Appendix.
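The closed-form elements of $L^{-1}$ above are easy to check numerically; the sketch below (our own code; any small $n$ and $|\alpha| < 1$ serve) compares them with the inverse of a numerical Cholesky factor of $\Sigma$.

```python
import numpy as np

alpha, n = 0.6, 6
Sigma = ((1 + alpha**2) * np.eye(n)
         + alpha * (np.eye(n, k=1) + np.eye(n, k=-1)))   # tridiagonal MA(1) covariance
L = np.linalg.cholesky(Sigma)                            # Sigma = L L'
Linv = np.linalg.inv(L)

def linv_closed(i, j):                                   # 1-based indices, i >= j
    if i == j:
        return np.sqrt((1 - alpha**(2*i)) / (1 - alpha**(2*i + 2)))
    return ((-alpha)**(i - j) * (1 - alpha**(2*j))
            / np.sqrt((1 - alpha**(2*i + 2)) * (1 - alpha**(2*i))))

closed = np.array([[linv_closed(i, j) if i >= j else 0.0
                    for j in range(1, n + 1)] for i in range(1, n + 1)])
print(np.max(np.abs(closed - Linv)))                     # ~ 1e-16
```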
From (7) and (8), the canonical parameter is $\varphi'(\theta) = (\varphi_1(\theta), \varphi_2(\theta), \varphi_3(\theta))$, with

$$\varphi_1(\theta) = -e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X} - \hat{\boldsymbol{\mu}}),$$
$$\varphi_2(\theta) = e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A\mathbf{1},$$
$$\varphi_3(\theta) = e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A(\mathbf{X} - \hat{\boldsymbol{\mu}}),$$

where $\hat{L}^{-1}_\alpha$ is the element-wise derivative of $L^{-1}$ with respect to $\alpha$, evaluated at $\hat\alpha$. Moreover, the first-order derivative of the canonical parameter with respect to $\theta$, with rows indexed by $\varphi_1, \varphi_2, \varphi_3$ and columns by $\alpha$, $\mu$ and $\log\sigma$, is

$$\varphi_\theta(\theta) = e^{-2\log\sigma}\begin{pmatrix}
-(\mathbf{X}-\boldsymbol{\mu})'A_\alpha\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) & \mathbf{1}'A\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) & 2(\mathbf{X}-\boldsymbol{\mu})'A\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) \\
(\mathbf{X}-\boldsymbol{\mu})'A_\alpha\mathbf{1} & -\mathbf{1}'A\mathbf{1} & -2(\mathbf{X}-\boldsymbol{\mu})'A\mathbf{1} \\
(\mathbf{X}-\boldsymbol{\mu})'A_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) & -\mathbf{1}'A(\mathbf{X}-\hat{\boldsymbol{\mu}}) & -2(\mathbf{X}-\boldsymbol{\mu})'A(\mathbf{X}-\hat{\boldsymbol{\mu}})
\end{pmatrix},$$

and hence

$$\varphi_\lambda(\theta) = e^{-2\log\sigma}\begin{pmatrix}
\mathbf{1}'A\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) & 2(\mathbf{X}-\boldsymbol{\mu})'A\hat{L}\hat{L}^{-1}_\alpha(\mathbf{X}-\hat{\boldsymbol{\mu}}) \\
-\mathbf{1}'A\mathbf{1} & -2(\mathbf{X}-\boldsymbol{\mu})'A\mathbf{1} \\
-\mathbf{1}'A(\mathbf{X}-\hat{\boldsymbol{\mu}}) & -2(\mathbf{X}-\boldsymbol{\mu})'A(\mathbf{X}-\hat{\boldsymbol{\mu}})
\end{pmatrix},$$

which collects the derivatives of the canonical parameters with respect to the nuisance parameter $\lambda = (\mu, \log\sigma)'$ and is obtained by deleting the first column of $\varphi_\theta(\theta)$.

Denote by $\varphi^\psi(\theta)$ the row of $\varphi_\theta^{-1}(\theta)$ corresponding to $\psi$, which here is the first row. Hence $\chi(\theta)$ can be obtained from (5). Furthermore, we have

$$j_{\theta\theta'}(\theta) = \begin{pmatrix} -\ell_{\alpha\alpha} & -\ell_{\alpha\mu} & -\ell_{\alpha\log\sigma} \\ -\ell_{\alpha\mu} & -\ell_{\mu\mu} & -\ell_{\mu\log\sigma} \\ -\ell_{\alpha\log\sigma} & -\ell_{\mu\log\sigma} & -\ell_{\log\sigma\log\sigma} \end{pmatrix} \quad\text{and}\quad j_{\lambda\lambda'}(\theta) = \begin{pmatrix} -\ell_{\mu\mu} & -\ell_{\mu\log\sigma} \\ -\ell_{\mu\log\sigma} & -\ell_{\log\sigma\log\sigma} \end{pmatrix},$$

where

$$\ell_{\alpha\alpha} = \frac{(n+1)(2n+1)\alpha^{2n} + (n+1)\alpha^{4n+2}}{\left(1 - \alpha^{2n+2}\right)^2} - \frac{1 + \alpha^2}{\left(1 - \alpha^2\right)^2} - \frac{1}{2}e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A_{\alpha\alpha}(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_{\alpha\mu} = e^{-2\log\sigma}\,\mathbf{1}'A_\alpha(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_{\alpha\log\sigma} = e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A_\alpha(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_{\mu\mu} = -e^{-2\log\sigma}\,\mathbf{1}'A\mathbf{1},$$
$$\ell_{\mu\log\sigma} = -2e^{-2\log\sigma}\,\mathbf{1}'A(\mathbf{X} - \boldsymbol{\mu}),$$
$$\ell_{\log\sigma\log\sigma} = -2e^{-2\log\sigma}(\mathbf{X} - \boldsymbol{\mu})'A(\mathbf{X} - \boldsymbol{\mu}),$$

with $A_{\alpha\alpha}$ the element-wise second derivative of $A$ with respect to $\alpha$. Hence $q$ can be obtained from (6), and $p(\psi)$ can then be approximated with third-order accuracy by (4). Finally, a central $(1 - \alpha) \times 100\%$ confidence interval for $\psi$ (here $\alpha$ denotes the nominal significance level rather than the MA(1) coefficient) is

$$\left(\min\left\{p^{-1}(\alpha/2),\, p^{-1}(1 - \alpha/2)\right\},\; \max\left\{p^{-1}(\alpha/2),\, p^{-1}(1 - \alpha/2)\right\}\right). \tag{11}$$

For small or medium-sized data sets with a moving average structure, the observed likelihood function can be monotone; that is, the MLE may be attained at the boundary. If so, Monte Carlo simulation results would not reflect the true behavior of the methods, owing to a large proportion of monotone observed likelihood functions. To improve the results in such a situation, we modify the MLE by the correction function

$$B(x, k) = x - \frac{x^{2k+1}}{1 + x^2 + \cdots + x^{2k}}. \tag{12}$$

This function is taken from Galbraith and Zinde-Walsh (1994) and can be applied as a bias correction to prevent the boundary problem for the MLE. As $k$ increases, the function increases monotonically for any fixed positive $x$ and decreases monotonically for any fixed negative $x$. For $x \in (-1, 1)$, the limit of $B(x, k)$ as $k \to \infty$ is $x$, and the larger $|x|$ is, the more slowly $B(x, k)$ converges. From the parameterization, the MLEs of the location parameter $\mu$ and the scale parameter $\sigma$ depend on the MLE of $\alpha$. It is adequate to correct only the MLE of $\alpha$, since it is the only bounded parameter; the rest can be adjusted accordingly. For inference on $\alpha$, we opt for a large $k$ ($\ge 20$), so that the correction effectively takes hold only when the MLE is obtained at the boundary. For inference on a parameter other than $\alpha$, we need to be cautious in the choice of $k$: since the MLE is biased, correcting $\alpha$ does not necessarily correct the others, but the other parameters can be corrected through the correction of $\alpha$ with an appropriate $k$.
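A direct implementation of the correction function (12) is immediate; the sketch below (our own code) shows that with a large $k$, $B(x, k)$ leaves an interior MLE essentially untouched while pulling a boundary MLE into the interior.

```python
import numpy as np

def B(x, k):
    # Correction function (12): B(x, k) = x - x^(2k+1) / (1 + x^2 + ... + x^(2k))
    denom = np.sum(x ** np.arange(0, 2 * k + 1, 2))
    return x - x ** (2 * k + 1) / denom

for x in (0.30, 0.90, 0.9999):           # 0.9999 mimics an MLE at the boundary
    print(x, B(x, 20))                   # interior values are nearly unchanged
```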
4 Numerical studies

We first apply the methods discussed in the paper to two real-life data sets and obtain the p-value curves as well as 90% confidence intervals for the various methods. Simulation studies are then carried out to compare the methods discussed in this paper. Note that inference on each component of $(\alpha, \mu, \sigma)'$ can be easily obtained. We use the following abbreviations to represent the three methods discussed in Section 3: "WALD" for the Wald test based on (2), "r" for the signed log-likelihood ratio statistic in (3), and "BN" for the third-order method based on equation (4).

4.1 Application to historical data sets

4.1.1 I.C.I. closing prices data

This data set records the Imperial Chemical Industries (I.C.I.) closing prices for 107 opening days from August 25, 1972 to January 19, 1973, published in Anderson (1976). To facilitate inference, we rescale the I.C.I. closing price data by a factor of 100. An MA(1) model specified as in (1) is applied to the rescaled data. The overall MLE for $\theta = (\alpha, \mu, \log\sigma)'$ is $\hat\theta = (0.7577, 2.8370, -2.8437)'$. Given that $\hat\alpha$ is far from the boundary, the likelihood function is likely well-behaved and the bias correction is not necessary.

Figures 1-3 display the p-value functions $p(\alpha)$, $p(\mu)$ and $p(\sigma)$ for the I.C.I. data, calculated for each of the three methods against $\alpha$, $\mu$ and $\sigma$, respectively. The two horizontal lines in each figure are the datum lines for the upper and lower 0.05 levels. In Figure 1, the p-value functions for $\alpha$ are displayed. From this figure it is clearly evident that the two first-order methods (WALD, r) give indistinguishable results, while the function based on the third-order method (BN) produces results that vary significantly from them. This result is even more pronounced in Figure 3, where the p-value functions are given for the parameter $\sigma$. On the other hand, in Figure 2, all three p-value functions for the location parameter $\mu$ are virtually coincident and thus produce very similar results. To take a specific example from these figures, Table 1 reports the 90% confidence intervals for each element of $(\alpha, \mu, \sigma)'$ obtained from each of the three methods. This table bears out the general results gleaned from the figures.

[Figure 1: p-value function for $\alpha$ for the I.C.I. data; WALD, r and BN curves plotted against the hypothesized value for $\alpha$.]

[Figure 2: p-value function for $\mu$ for the I.C.I. data; WALD, r and BN curves plotted against the hypothesized value for $\mu$.]

4.1.2 Thermostat demand data

This data set records the weekly demand for thermostats over 65 consecutive weeks and is described in Montgomery and Johnson (1976). To facilitate inference, we rescale the data by a factor of 10. The MA(1) model specified in (1) is applied to the rescaled data. The overall MLE for $\theta = (\alpha, \mu, \log\sigma)'$ is $\hat\theta = (0.0501, 23.5835, 1.6240)'$. Once again, $\hat\alpha$ is far from the boundary, indicating that the likelihood function is well-behaved; the unadjusted results are therefore reported.

[Figure 3: p-value function for $\sigma$ for the I.C.I. data; WALD, r and BN curves plotted against the hypothesized value for $\sigma$.]
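The confidence intervals reported in Tables 1 and 2 below are obtained by inverting a p-value function as in (11); a minimal sketch of that inversion (our own code; `psi_grid` and `pvals` are assumed to hold a grid of hypothesized values and the corresponding p-values) is:

```python
import numpy as np

def central_interval(psi_grid, pvals, level=0.10):
    # p(psi) is monotone over the grid, so invert it by interpolation at
    # level/2 and 1 - level/2, then order the endpoints as in (11).
    order = np.argsort(pvals)
    lo, hi = np.interp([level / 2.0, 1.0 - level / 2.0],
                       np.asarray(pvals)[order], np.asarray(psi_grid)[order])
    return min(lo, hi), max(lo, hi)
```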
Table 1: 90% central confidence intervals for α, µ and σ for the I.C.I. data

Method   α                  µ                  σ
WALD     (0.6469, 0.8685)   (2.8208, 2.8532)   (0.0520, 0.0651)
r        (0.6456, 0.8751)   (2.8207, 2.8534)   (0.0522, 0.0654)
BN       (0.6384, 0.8656)   (2.8206, 2.8534)   (0.0528, 0.0664)

Figures 4-6 display the p-value functions $p(\alpha)$, $p(\mu)$ and $p(\sigma)$ calculated from the above three methods against $\alpha$, $\mu$ and $\sigma$, respectively. The two horizontal lines in each figure are the datum lines for the upper and lower 0.05 levels. The figures reveal results analogous to those seen in the previous example. In Figure 4, which plots the p-value functions for $\alpha$, we once again see that the two first-order methods (WALD, r) produce functions that are nearly coincident, while the third-order method produces a quite different p-value function. This pattern is repeated, even more strongly, for the scale parameter $\sigma$ in Figure 6. In terms of the location parameter $\mu$, Figure 5 reveals that the three methods produce almost identical p-values for mid-range values of $\mu$ but discordant results in the tails. Table 2 reports the 90% confidence intervals for each element of $(\alpha, \mu, \sigma)'$ obtained from the three methods. As predicted from the figures, the discrepancy between the first- and third-order confidence intervals is largest for the scale parameter.

[Figure 4: p-value function for $\alpha$ for the thermostat data; WALD, r and BN curves plotted against the hypothesized value for $\alpha$.]

[Figure 5: p-value function for $\mu$ for the thermostat data; WALD, r and BN curves plotted against the hypothesized value for $\mu$.]

[Figure 6: p-value function for $\sigma$ for the thermostat data; WALD, r and BN curves plotted against the hypothesized value for $\sigma$.]

Table 2: 90% central confidence intervals for α, µ and σ for the thermostat data

Method   α                    µ                    σ
WALD     (−0.1406, 0.2409)    (22.4968, 24.6702)   (4.3922, 5.8610)
r        (−0.1378, 0.2414)    (22.4624, 24.6930)   (4.4212, 5.9040)
BN       (−0.1272, 0.2549)    (22.4246, 24.7252)   (4.5024, 6.0480)

4.2 Simulation studies

In this section, we perform simulations to assess the accuracy of the first-order methods and the third-order method. For each combination of the parameters, 10,000 Monte Carlo replications are performed. The proportions of replications in which the true $\psi$ falls in the left tail and in the right tail of the rejection region are recorded as the "lower error" and the "upper error", respectively. Additionally, the proportion of replications in which the true $\psi$ falls outside the rejection region is referred to as the "central coverage". We also introduce the term "average bias", which, for a 95% confidence interval, is defined as

$$\frac{|\text{lower error} - 0.025| + |\text{upper error} - 0.025|}{2}.$$

In the result tables that follow, the nominal values for the lower error, the upper error, the central coverage and the average bias are 0.025, 0.025, 0.95 and 0, respectively.

For the first simulation, we consider the model

$$y_t = \sigma(\varepsilon_t + \alpha\varepsilon_{t-1}), \qquad t = 1, 2, \ldots, n,$$

where the $\varepsilon_t$ are independent standard normal errors and $\varepsilon_0$ is initialized to be 0. Simulation results are recorded in Table 3 for various combinations of $\alpha$ and $\log\sigma$, with $n$ fixed at 50. It is clear that the widely used WALD method is not satisfactory for inference on either $\alpha$ or $\sigma$, given its low central coverage and high average bias.
Overall, it appears that WALD has better central coverage for $\sigma$ than for $\alpha$. The signed log-likelihood ratio method (r) produces better central coverage than WALD, but both methods suffer from asymmetric errors. The third-order method (BN) gives excellent results overall.

Table 3: Simulation results for Model 1 (n = 50)
(LE = lower error, UE = upper error, CC = central coverage, AB = average bias)

                                 ψ = α                            ψ = σ
θ = (α, log σ)   Method   LE      UE      CC      AB       LE      UE      CC      AB
(−0.6, log(1))   WALD     0.0890  0.0100  0.9010  0.0395   0.0527  0.0095  0.9378  0.0216
                 r        0.0410  0.0192  0.9398  0.0109   0.0404  0.0135  0.9461  0.0135
                 BN       0.0308  0.0240  0.9452  0.0034   0.0378  0.0212  0.9410  0.0083
(−0.2, log(1))   WALD     0.0455  0.0259  0.9286  0.0107   0.0519  0.0117  0.9364  0.0201
                 r        0.0302  0.0255  0.9443  0.0028   0.0399  0.0154  0.9447  0.0122
                 BN       0.0253  0.0252  0.9495  0.0002   0.0273  0.0243  0.9484  0.0015
(−0.2, log(10))  WALD     0.0480  0.0267  0.9253  0.0124   0.0494  0.0109  0.9397  0.0193
                 r        0.0317  0.0259  0.9424  0.0038   0.0391  0.0166  0.9443  0.0113
                 BN       0.0265  0.0269  0.9466  0.0017   0.0263  0.0249  0.9488  0.0007
(0, log(1))      WALD     0.0340  0.0334  0.9326  0.0087   0.0501  0.0114  0.9385  0.0193
                 r        0.0276  0.0259  0.9465  0.0017   0.0391  0.0164  0.9445  0.0114
                 BN       0.0255  0.0246  0.9499  0.0004   0.0255  0.0251  0.9494  0.0003
(0.2, log(1))    WALD     0.0268  0.0470  0.9262  0.0119   0.0502  0.0119  0.9379  0.0192
                 r        0.0246  0.0307  0.9447  0.0031   0.0390  0.0172  0.9438  0.0109
                 BN       0.0238  0.0267  0.9495  0.0014   0.0267  0.0255  0.9478  0.0011
(0.2, log(10))   WALD     0.0271  0.0471  0.9258  0.0121   0.0504  0.0128  0.9368  0.0188
                 r        0.0258  0.0299  0.9443  0.0028   0.0408  0.0177  0.9415  0.0116
                 BN       0.0257  0.0268  0.9475  0.0012   0.0284  0.0253  0.9463  0.0018
(0.6, log(1))    WALD     0.0136  0.0869  0.8995  0.0367   0.0522  0.0098  0.9380  0.0212
                 r        0.0223  0.0390  0.9387  0.0083   0.0402  0.0129  0.9469  0.0136
                 BN       0.0271  0.0310  0.9419  0.0040   0.0373  0.0214  0.9413  0.0080

For the second simulation, consider the model presented in (1), where once again $\varepsilon_0$ is set to 0. The sample size $n$ is set equal to 60. The results from this simulation are recorded in Table 4 and show a pattern similar to that of Table 3. The WALD method produces the poorest results, followed by the signed log-likelihood ratio method (r). The superior performance of the third-order method (BN) is clearly evidenced in this table.

The performance of the bias correction given in (12) is presented in Tables 5 and 6 for each of the two simulations. When the true value of $\alpha$ is close to the boundary, one side behaves well in terms of the lower or upper error, yet the other side, and with it the central coverage, is adversely affected. To obtain the Table 5 results, the values $k = 60$ and $k = 5$ were used in the adjustment of $\alpha$ when testing $\alpha$ and $\sigma$, respectively; to obtain the Table 6 results, the corresponding values were $k = 25$ and $k = 5$. The adjusted simulation results demonstrate the advantage of the correction, which largely makes the confidence intervals more symmetric and brings them closer to the nominal level.
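For reference, the quantities tabulated in Tables 3-6 can be computed from the Monte Carlo output as in the following sketch (our own code; `p_true` is assumed to hold the p-value of the true $\psi$ from each replication).

```python
import numpy as np

def summarize(p_true, level=0.05):
    p_true = np.asarray(p_true)
    lower = np.mean(p_true < level / 2.0)          # lower error
    upper = np.mean(p_true > 1.0 - level / 2.0)    # upper error
    coverage = 1.0 - lower - upper                 # central coverage
    avg_bias = (abs(lower - level / 2.0) + abs(upper - level / 2.0)) / 2.0
    return lower, upper, coverage, avg_bias
```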
Table 4: Simulation results for Model 2 (n = 60)
(LE = lower error, UE = upper error, CC = central coverage, AB = average bias)

                                       ψ = α                            ψ = µ                            ψ = σ
θ = (α, µ, log σ)    Method   LE      UE      CC      AB       LE      UE      CC      AB       LE      UE      CC      AB
(−0.4, 0, log(1))    WALD     0.0931  0.0149  0.8920  0.0391   0.0539  0.0541  0.8920  0.0290   0.0602  0.0094  0.9304  0.0254
                     r        0.0625  0.0171  0.9204  0.0227   0.0351  0.0376  0.9273  0.0113   0.0475  0.0130  0.9395  0.0173
                     BN       0.0283  0.0266  0.9451  0.0024   0.0272  0.0264  0.9464  0.0018   0.0343  0.0239  0.9418  0.0052
(−0.4, 5, log(10))   WALD     0.0878  0.0133  0.8989  0.0372   0.0539  0.0522  0.8939  0.0281   0.0644  0.0103  0.9253  0.0270
                     r        0.0607  0.0169  0.9224  0.0219   0.0366  0.0372  0.9262  0.0119   0.0517  0.0138  0.9345  0.0190
                     BN       0.0290  0.0265  0.9445  0.0027   0.0265  0.0249  0.9486  0.0008   0.0363  0.0249  0.9388  0.0057
(0, 5, log(5))       WALD     0.0468  0.0242  0.9290  0.0113   0.0373  0.0377  0.9250  0.0125   0.0561  0.0094  0.9345  0.0233
                     r        0.0387  0.0187  0.9426  0.0100   0.0302  0.0289  0.9409  0.0045   0.0457  0.0134  0.9409  0.0161
                     BN       0.0241  0.0243  0.9516  0.0008   0.0250  0.0244  0.9506  0.0003   0.0265  0.0249  0.9486  0.0008
(0, 0, log(1))       WALD     0.0478  0.0272  0.9250  0.0125   0.0391  0.0387  0.9222  0.0139   0.0575  0.0100  0.9325  0.0238
                     r        0.0402  0.0213  0.9385  0.0095   0.0324  0.0321  0.9355  0.0072   0.0467  0.0132  0.9401  0.0168
                     BN       0.0245  0.0258  0.9497  0.0006   0.0273  0.0265  0.9462  0.0019   0.0280  0.0245  0.9475  0.0017
(0, −5, log(5))      WALD     0.0478  0.0263  0.9259  0.0121   0.0408  0.0372  0.9220  0.0140   0.0535  0.0096  0.9369  0.0220
                     r        0.0402  0.0209  0.9389  0.0097   0.0324  0.0312  0.9364  0.0068   0.0444  0.0139  0.9417  0.0153
                     BN       0.0260  0.0253  0.9487  0.0006   0.0270  0.0270  0.9460  0.0020   0.0263  0.0234  0.9503  0.0014
(0.4, 0, log(1))     WALD     0.0239  0.0493  0.9268  0.0127   0.0337  0.0325  0.9338  0.0081   0.0606  0.0114  0.9280  0.0246
                     r        0.0295  0.0285  0.9420  0.0040   0.0296  0.0287  0.9417  0.0041   0.0499  0.0163  0.9338  0.0168
                     BN       0.0241  0.0273  0.9486  0.0016   0.0263  0.0256  0.9481  0.0009   0.0257  0.0268  0.9475  0.0012
(0.4, 5, log(10))    WALD     0.0250  0.0476  0.9274  0.0113   0.0328  0.0321  0.9351  0.0074   0.0585  0.0090  0.9325  0.0248
                     r        0.0312  0.0266  0.9422  0.0039   0.0293  0.0279  0.9428  0.0036   0.0473  0.0115  0.9412  0.0179
                     BN       0.0258  0.0264  0.9478  0.0011   0.0260  0.0247  0.9493  0.0006   0.0276  0.0252  0.9472  0.0014
(0.65, −5, log(5))   WALD     0.0137  0.0836  0.9027  0.0349   0.0324  0.0291  0.9385  0.0057   0.0642  0.0080  0.9278  0.0281
                     r        0.0273  0.0366  0.9361  0.0069   0.0287  0.0270  0.9443  0.0028   0.0514  0.0108  0.9378  0.0203
                     BN       0.0260  0.0287  0.9453  0.0023   0.0271  0.0255  0.9474  0.0013   0.0384  0.0239  0.9377  0.0072
(0.65, 5, log(10))   WALD     0.0142  0.0817  0.9041  0.0337   0.0305  0.0275  0.9420  0.0040   0.0644  0.0092  0.9264  0.0276
                     r        0.0262  0.0354  0.9384  0.0058   0.0246  0.0282  0.9472  0.0018   0.0511  0.0135  0.9354  0.0188
                     BN       0.0250  0.0283  0.9467  0.0016   0.0264  0.0231  0.9505  0.0017   0.0387  0.0242  0.9371  0.0072

Table 5: Simulation results for Model 1 with and without adjustment (n = 50)
(LE = lower error, UE = upper error, CC = central coverage, AB = average bias)

                             ψ = α                            ψ = σ
θ = (α, log σ)  Method  LE      UE      CC      AB       LE      UE      CC      AB
(−0.95, 0)      WALD    0.0000  0.0051  0.9949  0.0225   0.1348  0.1782  0.6870  0.1315
                r       0.0000  0.0151  0.9849  0.0175   0.0407  0.0093  0.9500  0.0157
                BN      0.5036  0.0319  0.4645  0.2428   0.3581  0.0336  0.6083  0.1708
Adjusted        WALD    0.0000  0.0051  0.9949  0.0225   0.1307  0.1656  0.7037  0.1232
                r       0.0000  0.0151  0.9849  0.0175   0.0232  0.0067  0.9701  0.0101
                BN      0.0180  0.0348  0.9472  0.0084   0.0288  0.0215  0.9497  0.0037
(−0.5, 0)       WALD    0.0759  0.0147  0.9094  0.0306   0.1906  0.1293  0.6801  0.1350
                r       0.0397  0.0220  0.9383  0.0089   0.0374  0.0164  0.9462  0.0105
                BN      0.0301  0.0255  0.9444  0.0028   0.0282  0.0237  0.9481  0.0023
Adjusted        WALD    0.0760  0.0147  0.9093  0.0307   0.1907  0.1295  0.6798  0.1351
                r       0.0397  0.0220  0.9383  0.0089   0.0372  0.0164  0.9464  0.0104
                BN      0.0297  0.0255  0.9448  0.0026   0.0237  0.0241  0.9522  0.0011
(0, 0)          WALD    0.0376  0.0356  0.9268  0.0116   0.1920  0.1412  0.6668  0.1416
                r       0.0290  0.0288  0.9422  0.0039   0.0369  0.0177  0.9454  0.0096
                BN      0.0273  0.0269  0.9458  0.0021   0.0239  0.0264  0.9497  0.0012
Adjusted        WALD    0.0376  0.0356  0.9268  0.0116   0.1920  0.1412  0.6668  0.1416
                r       0.0290  0.0288  0.9422  0.0039   0.0369  0.0177  0.9454  0.0096
                BN      0.0273  0.0269  0.9458  0.0021   0.0239  0.0264  0.9497  0.0012
(0.5, 0)        WALD    0.0170  0.0725  0.9105  0.0277   0.1903  0.1342  0.6755  0.1373
                r       0.0247  0.0395  0.9358  0.0074   0.0367  0.0157  0.9476  0.0105
                BN      0.0278  0.0289  0.9433  0.0033   0.0293  0.0232  0.9475  0.0031
Adjusted        WALD    0.0170  0.0728  0.9102  0.0279   0.1908  0.1342  0.6750  0.1375
                r       0.0247  0.0395  0.9358  0.0074   0.0364  0.0157  0.9479  0.0104
                BN      0.0278  0.0287  0.9435  0.0032   0.0251  0.0234  0.9515  0.0009
(0.95, 0)       WALD    0.0053  0.0000  0.9947  0.0224   0.1363  0.1809  0.6828  0.1336
                r       0.0162  0.0000  0.9838  0.0169   0.0399  0.0131  0.9470  0.0134
                BN      0.0353  0.5154  0.4493  0.2503   0.3558  0.0352  0.6090  0.1705
Adjusted        WALD    0.0053  0.0000  0.9947  0.0224   0.1334  0.1652  0.7014  0.1243
                r       0.0162  0.0000  0.9838  0.0169   0.0228  0.0094  0.9678  0.0089
                BN      0.0391  0.0184  0.9425  0.0104   0.0246  0.0231  0.9523  0.0012

Table 6: Simulation results for Model 2 with and without adjustment (n = 60)
(LE = lower error, UE = upper error, CC = central coverage, AB = average bias)

                             ψ = α                            ψ = σ
θ = (α, µ, log σ)  Method  LE      UE      CC      AB       LE      UE      CC      AB
(−0.8, 1, 0)    WALD    0.4460  0.0000  0.5540  0.2230   0.0874  0.0041  0.9085  0.0417
                r       0.1800  0.0040  0.8160  0.0880   0.0732  0.0069  0.9199  0.0331
                BN      0.4660  0.0260  0.5080  0.2210   0.3734  0.0288  0.5978  0.1761
Adjusted        WALD    0.0400  0.0000  0.9600  0.0200   0.0874  0.0041  0.9085  0.0417
                r       0.1340  0.0040  0.8620  0.0650   0.0732  0.0069  0.9199  0.0331
                BN      0.0280  0.0260  0.9460  0.0020   0.0279  0.0339  0.9382  0.0059
(−0.5, 1, 0)    WALD    0.1218  0.0111  0.8671  0.0553   0.0688  0.0086  0.9226  0.0301
                r       0.0808  0.0150  0.9042  0.0329   0.0553  0.0117  0.9330  0.0218
                BN      0.0442  0.0264  0.9294  0.0103   0.0528  0.0229  0.9243  0.0150
Adjusted        WALD    0.1156  0.0111  0.8733  0.0688   0.0522  0.0086  0.9226  0.0301
                r       0.0801  0.0150  0.9049  0.0326   0.0553  0.0117  0.9330  0.0218
                BN      0.0285  0.0264  0.9451  0.0024   0.0284  0.0241  0.9475  0.0022
(0, 1, 0)       WALD    0.0467  0.0288  0.9245  0.0127   0.0577  0.0091  0.9332  0.0243
                r       0.0390  0.0220  0.9390  0.0085   0.0469  0.0122  0.9409  0.0173
                BN      0.0251  0.0267  0.9482  0.0009   0.0277  0.0237  0.9486  0.0020
Adjusted        WALD    0.0467  0.0288  0.9245  0.0127   0.0577  0.0091  0.9332  0.0243
                r       0.0390  0.0220  0.9390  0.0085   0.0469  0.0122  0.9409  0.0173
                BN      0.0251  0.0267  0.9482  0.0009   0.0277  0.0237  0.9487  0.0020
(0.5, 1, 0)     WALD    0.0194  0.0660  0.9146  0.0233   0.0604  0.0097  0.9299  0.0254
                r       0.0286  0.0333  0.9381  0.0060   0.0493  0.0127  0.9380  0.0183
                BN      0.0246  0.0279  0.9475  0.0017   0.0309  0.0233  0.9458  0.0038
Adjusted        WALD    0.0194  0.0660  0.9146  0.0233   0.0604  0.0097  0.9299  0.0254
                r       0.0286  0.0331  0.9383  0.0058   0.0493  0.0127  0.9380  0.0183
                BN      0.0246  0.0275  0.9479  0.0014   0.0296  0.0235  0.9469  0.0031
(0.9, 1, 0)     WALD    0.0059  0.0511  0.9430  0.0226   0.0677  0.0073  0.9250  0.0302
                r       0.0207  0.0000  0.9793  0.0147   0.0577  0.0106  0.9317  0.0236
                BN      0.0283  0.2510  0.7207  0.1147   0.2221  0.0291  0.7488  0.1006
Adjusted        WALD    0.0059  0.0000  0.9941  0.0221   0.0677  0.0073  0.9250  0.0302
                r       0.0207  0.0000  0.9793  0.0147   0.0577  0.0106  0.9317  0.0236
                BN      0.0287  0.0298  0.9415  0.0042   0.0309  0.0246  0.9445  0.0032

5 Discussion and Conclusion

This paper has investigated the performance of a third-order likelihood-based method against two commonly used first-order methods for inference concerning a
moving average model of order 1. Based on the simulation studies, the third-order method outperformed the other methods under the criteria considered. A bias correction was also used in this paper to handle boundary problems and was shown through simulations to be effective. The examples and simulations were implemented in Matlab, and the code is available from the authors upon request.

References

[1] Anderson, O.D., 1976, Time Series Analysis and Forecasting: The Box-Jenkins Approach, Butterworth & Co Publishers Ltd.

[2] Barndorff-Nielsen, O.E., 1986, Inference on Full or Partial Parameters Based on the Standardized Signed Log Likelihood Ratio, Biometrika 73, 307-322.

[3] Barndorff-Nielsen, O.E., 1991, Modified Signed Log-likelihood Ratio Statistic, Biometrika 78, 557-563.

[4] Chang, F., Wong, A., 2010, Improved Likelihood-Based Inference for the Stationary AR(2) Model, Journal of Statistical Planning and Inference 140, 2099-2110.

[5] Doganaksoy, N., Schmee, J., 1993, Comparisons of Approximate Confidence Intervals for Distributions Used in Life-Data Analysis, Technometrics 35, 175-184.

[6] Fraser, D.A.S., 1990, Tail Probabilities from Observed Likelihoods, Biometrika 77, 65-76.

[7] Fraser, D.A.S., Reid, N., 1995, Ancillaries and Third Order Significance, Utilitas Mathematica 47, 33-53.

[8] Fraser, D.A.S., Reid, N., Wong, A., 1991, Exponential Linear Models: A Two-Pass Procedure for Saddlepoint Approximation, Journal of the Royal Statistical Society, Series B 53(2), 483-492.

[9] Fraser, D.A.S., Reid, N., Wu, J., 1999, A Simple General Formula for Tail Probabilities for Frequentist and Bayesian Inference, Biometrika 86, 249-264.

[10] Galbraith, J.W., Zinde-Walsh, V., 1994, A Simple Noniterative Estimator for Moving Average Models, Biometrika 81, 143-155.

[11] Montgomery, D.C., Johnson, L.A., 1976, Forecasting and Time Series Analysis, McGraw-Hill, New York.

[12] Rekkas, M., Sun, Y., Wong, A.C.M., 2008, Improved Inference for First-Order Autocorrelation Using Likelihood Analysis, Journal of Time Series Analysis 29, 513-532.

[13] Self, S.G., Liang, K., 1987, Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions, Journal of the American Statistical Association 82, 605-610.

[14] Shaman, P., 1969, On the Inverse of the Covariance Matrix of a First Order Moving Average, Biometrika 56, 595-600.

[15] Shao, J., 2003, Mathematical Statistics, 2nd ed., Springer-Verlag, New York.

[16] Shapiro, A., 1985, Asymptotic Distribution of Test Statistics in the Analysis of Moment Structures under Inequality Constraints, Biometrika 72, 133-144.

[17] Wald, A., 1943, Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations Is Large, Transactions of the American Mathematical Society 54, 426-482.

[18] Wilks, S.S., 1938, The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Annals of Mathematical Statistics 9, 60-62.

[19] Wold, H., 1949, A Large-Sample Test for Moving Averages, Journal of the Royal Statistical Society B 11, 297-305.

[20] Wong, A., 1992, Converting Observed Likelihood to Levels of Significance for Transformation Models, Communications in Statistics: Theory and Methods 21, 2809-2823.

A Appendix

A.1 Derivation of L⁻¹

Recall that $v_t = \varepsilon_t + \alpha\varepsilon_{t-1}$, where $\varepsilon_t$, $t = 0, 1, \ldots, n$, are normally distributed with mean 0 and variance 1.
Denote $\sigma_k = \mathrm{Cov}(v_t, v_{t\pm k})$, $k = 0, 1, \ldots, n$. The elements of the covariance matrix of $\{v_t\}$ take the following form:

$$\sigma_0 = E(\varepsilon_t^2) + 2\alpha E(\varepsilon_t\varepsilon_{t-1}) + \alpha^2 E(\varepsilon_{t-1}^2) = 1 + \alpha^2,$$
$$\sigma_1 = E(\varepsilon_t\varepsilon_{t-1}) + \alpha E(\varepsilon_{t-1}^2) + \alpha E(\varepsilon_t\varepsilon_{t-2}) + \alpha^2 E(\varepsilon_{t-1}\varepsilon_{t-2}) = \alpha,$$
$$\sigma_k = 0, \qquad k \ge 2.$$

The covariance matrix for $\{v_t\}$ of size $n$ is then the tridiagonal matrix

$$\Sigma_n = \begin{pmatrix}
1+\alpha^2 & \alpha & & 0 \\
\alpha & 1+\alpha^2 & \ddots & \\
& \ddots & \ddots & \alpha \\
0 & & \alpha & 1+\alpha^2
\end{pmatrix}.$$

Additionally, from Shaman (1969), we have

$$D_n = |\Sigma_n| = 1 + \alpha^2 + \alpha^4 + \cdots + \alpha^{2n} = \frac{1 - \alpha^{2(n+1)}}{1 - \alpha^2}.$$

Assume the Cholesky decomposition for $\Sigma_n$ is $\Sigma_n = L_nL_n'$. For $n = 2$ we have

$$L_2 = \begin{pmatrix} \sqrt{1+\alpha^2} & 0 \\ \dfrac{\alpha}{\sqrt{1+\alpha^2}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2}} \end{pmatrix}, \qquad
L_2^{-1} = \begin{pmatrix} \dfrac{1}{\sqrt{1+\alpha^2}} & 0 \\ -\dfrac{\alpha}{\sqrt{(1+\alpha^2+\alpha^4)(1+\alpha^2)}} & \sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}} \end{pmatrix}.$$

For $n = 3$ we have

$$L_3 = \begin{pmatrix} \sqrt{1+\alpha^2} & 0 & 0 \\ \dfrac{\alpha}{\sqrt{1+\alpha^2}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2}} & 0 \\ 0 & \alpha\sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4+\alpha^6}{1+\alpha^2+\alpha^4}} \end{pmatrix},$$

$$L_3^{-1} = \begin{pmatrix} \dfrac{1}{\sqrt{1+\alpha^2}} & 0 & 0 \\ -\dfrac{\alpha}{\sqrt{(1+\alpha^2+\alpha^4)(1+\alpha^2)}} & \sqrt{\dfrac{1+\alpha^2}{1+\alpha^2+\alpha^4}} & 0 \\ \dfrac{\alpha^2}{\sqrt{(1+\alpha^2+\alpha^4+\alpha^6)(1+\alpha^2+\alpha^4)}} & -\dfrac{\alpha(1+\alpha^2)}{\sqrt{(1+\alpha^2+\alpha^4+\alpha^6)(1+\alpha^2+\alpha^4)}} & \sqrt{\dfrac{1+\alpha^2+\alpha^4}{1+\alpha^2+\alpha^4+\alpha^6}} \end{pmatrix}.$$

The above computations indicate the pattern for both $L_n = \{l_{ij}\}$ and $L_n^{-1} = \{l^{ij}\}$, $i, j = 1, \ldots, n$, namely

$$l_{ij} = \begin{cases} \dfrac{\sqrt{1-\alpha^{2i+2}}}{\sqrt{1-\alpha^{2i}}} & i = j, \\[2mm] \dfrac{\alpha\sqrt{1-\alpha^{2j}}}{\sqrt{1-\alpha^{2j+2}}} & i = j+1, \\[2mm] 0 & \text{otherwise}, \end{cases} \qquad\qquad
l^{ij} = \begin{cases} \dfrac{\sqrt{1-\alpha^{2i}}}{\sqrt{1-\alpha^{2i+2}}} & i = j, \\[2mm] \dfrac{(-\alpha)^{i-j}(1-\alpha^{2j})}{\sqrt{(1-\alpha^{2i+2})(1-\alpha^{2i})}} & i > j, \\[2mm] 0 & \text{otherwise}. \end{cases}$$

A.2 Derivation of the pivot z

Recall that the model under investigation is

$$\mathbf{X} = \boldsymbol{\mu} + \sigma\mathbf{v}, \qquad \mathbf{v} \sim N(\mathbf{0}, \Sigma),$$

so that $\mathbf{X} \sim N(\boldsymbol{\mu}, \sigma^2\Sigma)$. A natural choice for the pivotal quantity is

$$z = \frac{L^{-1}(\mathbf{X} - \boldsymbol{\mu})}{\sigma},$$

which is distributed as a multivariate standard normal of dimension $n$.
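As a final sanity check on A.2 (our own code, with arbitrary parameter values), the pivot $z = L^{-1}(\mathbf{X} - \boldsymbol{\mu})/\sigma$ can be simulated and its sample covariance compared with the identity matrix.

```python
import numpy as np

alpha, mu, sigma, n, reps = 0.4, 1.0, 2.0, 5, 100000
Sigma = (1 + alpha**2) * np.eye(n) + alpha * (np.eye(n, k=1) + np.eye(n, k=-1))
Linv = np.linalg.inv(np.linalg.cholesky(Sigma))

rng = np.random.default_rng(2)
eps = rng.standard_normal((reps, n + 1))
v = eps[:, 1:] + alpha * eps[:, :-1]          # MA(1) errors, one series per row
x = mu + sigma * v
z = (x - mu) @ Linv.T / sigma                 # pivot z for each replication
print(np.round(np.cov(z, rowvar=False), 2))   # ~ n x n identity matrix
```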