An Interesting Application of a Likelihood-Based Asymptotic Method

M. Rekkas, Y. She, Y. Sun, A. Wong

Abstract

Fraser (1990) discussed how to obtain statistical inference for a scalar parameter of interest from the likelihood function. Since then, many authors have extended the method and applied it to various models. In this paper we consider the one-sample normal problem. Using the likelihood-based asymptotic method described in Fraser (1990), we obtain the p-value function for the mean parameter as well as for the variance parameter. By re-expressing the results, we derive simple and accurate normal approximations to the Student t and χ² cumulative distribution functions.

Keywords: Canonical parameter; Exponential family model; Modified signed log-likelihood ratio statistic; p-value function.

MSC classification codes: 62E20; 62F30

1 Introduction

Fraser (1990) showed how the statistical significance for a scalar canonical parameter of interest in an exponential family model can be obtained from the log-likelihood function of the model. Since then, many authors have extended the method to various models; Reid (2003) gives a summary of some of the major results. It is interesting to note that the existing literature on likelihood-based higher-order methods has rarely discussed or examined the approximation of the cumulative distribution function of a test statistic. In this paper we apply a likelihood-based third-order method to the one-sample normal problem. With a simple re-expression of the results, we obtain highly accurate approximations to the cumulative distribution functions of the Student t and χ² distributions.

In Section 2, we first review the approach set out by Fraser (1990) and then extend the method to non-canonical parameters of interest. In Section 3, we apply our method to approximate the cumulative distribution functions of the Student t and χ² distributions.
Numerical comparisons of the proposed method with some standard approximations are presented in Section 4. General discussion is given in Section 5.

2 Main Results

Let x = (x_1, \ldots, x_n) be a random sample from a canonical exponential family model with log-likelihood function

\ell(\theta) = \ell(\theta; x) = \psi t + \lambda's + k(\theta),   (2.1)

where \theta = (\psi, \lambda')' is the p-dimensional canonical parameter, with \psi being the scalar parameter of interest and \lambda being the (p-1)-dimensional nuisance parameter. The minimal sufficient statistic is (t, s) = (t(x), s(x)). It is well known that \hat\theta, the maximum likelihood estimate, which satisfies

\ell_\theta(\hat\theta) = \left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\theta=\hat\theta} = 0,   (2.2)

has mean \theta and variance-covariance matrix given by i_{\theta\theta'}^{-1}(\theta), where

i_{\theta\theta'}(\theta) = -E[\ell_{\theta\theta'}(\theta)] = -E\left(\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta'}\right)   (2.3)

is the expected Fisher information matrix. As i_{\theta\theta'}(\theta) can be difficult to evaluate, it is often approximated by

j_{\theta\theta'}(\hat\theta) = -\left.\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta'}\right|_{\theta=\hat\theta} = \begin{pmatrix} j_{\psi\psi}(\hat\theta) & j_{\psi\lambda'}(\hat\theta) \\ j_{\lambda\psi}(\hat\theta) & j_{\lambda\lambda'}(\hat\theta) \end{pmatrix},   (2.4)

which is the observed information matrix evaluated at \hat\theta. It is well known that, under moderate regularity conditions, (\hat\theta - \theta)' j_{\theta\theta'}(\hat\theta)(\hat\theta - \theta) is asymptotically distributed as \chi^2 with p degrees of freedom. For \psi = (1, 0, \ldots, 0)\theta, the quantity (\hat\psi - \psi)[j^{\psi\psi}(\hat\theta)]^{-1}(\hat\psi - \psi) is asymptotically distributed as \chi^2 with 1 degree of freedom, where

j_{\theta\theta'}^{-1}(\hat\theta) = \begin{pmatrix} j^{\psi\psi}(\hat\theta) & j^{\psi\lambda'}(\hat\theta) \\ j^{\lambda\psi}(\hat\theta) & j^{\lambda\lambda'}(\hat\theta) \end{pmatrix}.   (2.5)

Alternatively, we can present this result as

q = (\hat\psi - \psi)[j^{\psi\psi}(\hat\theta)]^{-1/2},   (2.6)

where q is asymptotically distributed as standard normal. Standardizing the maximum likelihood estimate in this way is generally referred to as Wald's method. Notice that this method is not parameterization invariant. An alternative likelihood-based method is based on the signed log-likelihood ratio statistic

r = r(\psi) = \mathrm{sgn}(\hat\psi - \psi)\{2[\ell(\hat\theta) - \ell(\hat\theta_\psi)]\}^{1/2},   (2.7)

which is asymptotically distributed as standard normal.
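To make the contrast between (2.6) and (2.7) concrete, the following sketch computes both statistics for an exponential-rate model, \ell(\theta) = n \log\theta - \theta s, where each has a closed form. This toy model, the function names, and the sample numbers are our own illustration, not part of the paper.

```python
import math

def wald_q(theta_hat, theta, n):
    """Wald statistic (2.6): (theta_hat - theta) standardized by the
    observed information j(theta_hat) = n / theta_hat**2 (exponential rate)."""
    return (theta_hat - theta) * math.sqrt(n) / theta_hat

def signed_llr_r(theta_hat, theta, n, s):
    """Signed log-likelihood ratio (2.7) for the exponential(rate) model
    with log-likelihood l(theta) = n*log(theta) - theta*s."""
    ll = lambda th: n * math.log(th) - th * s
    return math.copysign(math.sqrt(2.0 * (ll(theta_hat) - ll(theta))),
                         theta_hat - theta)

# Hypothetical data summary: sample size n and s = sum of the observations.
n, s = 10, 12.0
theta_hat = n / s                       # maximum likelihood estimate of the rate
q = wald_q(theta_hat, 0.5, n)           # Wald statistic at theta = 0.5
r = signed_llr_r(theta_hat, 0.5, n, s)  # signed log-likelihood ratio at theta = 0.5
```

The two first-order statistics give visibly different values here, reflecting the fact that q, unlike r, depends on the chosen parameterization.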
The vector \hat\theta_\psi = (\psi, \hat\lambda_\psi')' denotes the constrained maximum likelihood estimate of \theta for a given \psi, which satisfies

\ell_\lambda(\hat\theta_\psi) = \left.\frac{\partial \ell(\theta)}{\partial \lambda}\right|_{\theta=\hat\theta_\psi} = 0.   (2.8)

The signed log-likelihood ratio statistic is invariant to reparameterization. It is important to note that both Wald's method and the signed log-likelihood ratio method are first-order methods, meaning they achieve O(n^{-1/2}) accuracy. Based on the Wald and signed log-likelihood ratio statistics, the p-value function for \psi, p(\psi), can be approximated by \Phi(q) and \Phi(r), respectively, where \Phi(\cdot) is the cumulative distribution function of the standard normal distribution.

In the statistics literature, various likelihood-based small-sample asymptotic methods have been proposed that, in theory, achieve a higher order of accuracy. One of these methods is given by Barndorff-Nielsen (1986, 1991) and is known as the modified signed log-likelihood ratio method,

r^*(\psi) = r(\psi) + \frac{1}{r(\psi)} \log \frac{Q(\psi)}{r(\psi)},   (2.9)

where r(\psi) is the signed log-likelihood ratio statistic defined in (2.7) and Q(\psi) is a standardized maximum likelihood departure measured on a particular parameterization scale. Barndorff-Nielsen (1986, 1991) showed that r^*(\psi) is asymptotically distributed as standard normal with accuracy O(n^{-3/2}) provided that Q(\psi) is suitably chosen. Fraser (1990) showed that, for the canonical exponential family model with canonical parameter \theta = (\psi, \lambda')', Q(\psi) is the standardized maximum likelihood estimate of \psi, where the estimated variance of \hat\psi takes into account the elimination of the nuisance parameter \lambda. More specifically, Fraser (1990) showed that

Q(\psi) = (\hat\psi - \psi) \left\{ \frac{|j_{\theta\theta'}(\hat\theta)|}{|j_{\lambda\lambda'}(\hat\theta_\psi)|} \right\}^{1/2}.   (2.10)

Now consider a general exponential family model, with \varphi = \varphi(\theta) being the canonical parameter and \psi(\theta) being the scalar parameter of interest. The quantity Q(\psi) has to be expressed on the canonical parameter, \varphi(\theta), scale.
Let \varphi_\theta(\theta) be the derivative of \varphi(\theta) with respect to \theta and, similarly, \varphi_\lambda(\theta) the derivative of \varphi(\theta) with respect to \lambda. Then, by the chain rule, we have

|j_{\varphi\varphi'}(\hat\theta)| = |j_{\theta\theta'}(\hat\theta)| \, |\varphi_\theta(\hat\theta)|^{-2}   (2.11)

and

|j_{(\lambda\lambda')}(\hat\theta_\psi)| = |j_{\lambda\lambda'}(\hat\theta_\psi)| \, |\varphi_\lambda'(\hat\theta_\psi)\varphi_\lambda(\hat\theta_\psi)|^{-1}.   (2.12)

Let \chi(\theta) be a scalar linear version of \varphi(\theta) that corresponds to d\psi at \theta = \hat\theta_\psi. Then

\chi(\theta) = \frac{\psi_\varphi(\hat\theta_\psi)}{\|\psi_\varphi(\hat\theta_\psi)\|} \, \varphi(\theta),   (2.13)

where \psi_\varphi(\theta) is the row of \varphi_\theta^{-1}(\theta) that corresponds to \psi, and \|\psi_\varphi(\hat\theta_\psi)\| is the length of the vector \psi_\varphi(\hat\theta_\psi). Note that \chi(\theta) can operationally be viewed as the scalar parameter of interest on the \varphi(\theta) scale. Hence Q(\psi) can be rewritten as

Q(\psi) = \mathrm{sgn}(\hat\psi - \psi) \, |\chi(\hat\theta) - \chi(\hat\theta_\psi)| \left\{ \frac{|j_{\varphi\varphi'}(\hat\theta)|}{|j_{(\lambda\lambda')}(\hat\theta_\psi)|} \right\}^{1/2}.   (2.14)

Thus inference concerning \psi(\theta) for a general exponential family model with canonical parameter \varphi(\theta) can be obtained by using (2.14) in the r^*(\psi) formula given in (2.9). In particular, the p-value function for \psi, p(\psi), can be approximated by \Phi(r^*(\psi)) with O(n^{-3/2}) accuracy.

An alternative method for improving the accuracy of the approximate p-value function obtained from the likelihood ratio statistic is due to Lugannani and Rice (1980). The p-value function for \psi takes the form

p(\psi) \approx \Phi(r(\psi)) + \phi(r(\psi)) \left\{ \frac{1}{r(\psi)} - \frac{1}{Q(\psi)} \right\},   (2.15)

where r(\psi) and Q(\psi) are defined in (2.7) and (2.14) respectively, and \phi(\cdot) is the probability density function of the standard normal distribution. It is interesting to note that Barndorff-Nielsen's method adjusts the statistic r(\psi) so that the p-value function obtained from r^*(\psi) is close to the true p-value function, whereas Lugannani and Rice's method adjusts the p-value function obtained from r(\psi) so that it is close to the true p-value function. Fraser (1990) showed that these two adjustments are equivalent to O(n^{-3/2}) accuracy.
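Both tail formulas take only the pair (r(ψ), Q(ψ)) as input, so they can be packaged as small helpers. The sketch below (function names are ours) implements Φ(r*) from (2.9) and the Lugannani and Rice form (2.15) using only the standard normal Φ and φ, and illustrates the claimed near-agreement on hypothetical example values of (r, Q); it assumes r and Q are nonzero and share the same sign, so that log(Q/r) is defined.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def pvalue_bn(r, Q):
    """Barndorff-Nielsen: Phi(r*) with r* = r + (1/r) log(Q/r), eq. (2.9)."""
    return Phi(r + math.log(Q / r) / r)

def pvalue_lr(r, Q):
    """Lugannani-Rice: Phi(r) + phi(r) * (1/r - 1/Q), eq. (2.15)."""
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / Q)

# Illustrative (r, Q) pair; the two approximations should nearly coincide.
p_bn = pvalue_bn(1.4931, 1.1332)
p_lr = pvalue_lr(1.4931, 1.1332)
```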
3 Approximating the cumulative distribution functions of the t and χ² distributions

In Section 2, the p-value function for a scalar parameter of interest from a general exponential family model was obtained. We now consider a random sample x = (x_1, \ldots, x_n) from a normal distribution with mean \mu and variance \sigma^2. This is an exponential family model with log-likelihood function

\ell(\theta) = \ell(\mu, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.   (3.1)

A convenient version of the canonical parameter is

\varphi(\theta) = \left( \frac{\mu}{\sigma^2}, \frac{1}{\sigma^2} \right)'.   (3.2)

It is easy to obtain the following results:

\hat\theta = (\hat\mu, \hat\sigma^2)' = (\bar x, (n-1)s^2/n)' = \left( \frac{\sum x_i}{n}, \frac{\sum (x_i - \bar x)^2}{n} \right)' \quad \text{and} \quad |j_{\theta\theta'}(\hat\theta)| = \frac{n^2}{2\hat\sigma^6}.   (3.3)

If the parameter of interest is the mean, \psi(\theta) = \mu, it is well known that the exact p-value function of \mu is

p(\mu) = F_{t_{n-1}}(t),   (3.4)

where F_{t_{n-1}}(\cdot) is the cumulative distribution function of the Student t distribution with (n-1) degrees of freedom and

t = \frac{\bar x - \mu}{s/\sqrt{n}}.   (3.5)

By applying the method discussed in Section 2, we have the constrained maximum likelihood estimate of \theta,

\hat\theta_\mu = (\mu, \hat\sigma_\mu^2)' = \left( \mu, \frac{\sum (x_i - \mu)^2}{n} \right)' \quad \text{and} \quad |j_{\sigma^2\sigma^2}(\hat\theta_\mu)| = \frac{n}{2\hat\sigma_\mu^4}.   (3.6)

The signed log-likelihood ratio statistic can then be simplified to

r(\mu) = \mathrm{sgn}(\bar x - \mu) \left\{ n \log\left( 1 + \frac{t^2}{n-1} \right) \right\}^{1/2}.   (3.7)

Moreover, with

\varphi_\theta(\theta) = (\varphi_\mu(\theta), \varphi_{\sigma^2}(\theta)) = \begin{pmatrix} \dfrac{1}{\sigma^2} & -\dfrac{\mu}{\sigma^4} \\[4pt] 0 & -\dfrac{1}{\sigma^4} \end{pmatrix},   (3.8)

Q(\mu) can be simplified to

Q(\mu) = \sqrt{n(n-1)} \left[ \frac{t}{(n-1) + t^2} \right],   (3.9)

and r^*(\mu) can be obtained from (2.9). Thus the p-value function of \mu, or equivalently the cumulative distribution function of the Student t distribution with (n-1) degrees of freedom, can be approximated by \Phi(r^*(\mu)).
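The simplification in (3.7) can be verified numerically: the closed form in t must agree with the signed log-likelihood ratio computed directly from (2.7), since 2[\ell(\hat\theta) - \ell(\hat\theta_\mu)] = n \log(\hat\sigma_\mu^2/\hat\sigma^2). The sketch below checks this on a small illustrative sample (our own data, not from the paper) and also evaluates Q(\mu) from (3.9).

```python
import math

def t_stat(x, mu):
    """Student t statistic (3.5) for a sample x and hypothesized mean mu."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance s^2
    return (xbar - mu) / math.sqrt(s2 / n)

def r_mu(x, mu):
    """Signed log-likelihood ratio (3.7), expressed through t."""
    n, t = len(x), t_stat(x, mu)
    return math.copysign(math.sqrt(n * math.log(1.0 + t * t / (n - 1))), t)

def r_mu_direct(x, mu):
    """The same statistic computed directly from (2.7), using
    2[l(theta_hat) - l(theta_hat_mu)] = n * log(sigma_mu_hat^2 / sigma_hat^2)."""
    n = len(x)
    xbar = sum(x) / n
    s2_hat = sum((xi - xbar) ** 2 for xi in x) / n   # unconstrained MLE of sigma^2
    s2_mu = sum((xi - mu) ** 2 for xi in x) / n      # constrained MLE, eq. (3.6)
    return math.copysign(math.sqrt(n * math.log(s2_mu / s2_hat)), xbar - mu)

def Q_mu(x, mu):
    """Standardized maximum likelihood departure (3.9)."""
    n, t = len(x), t_stat(x, mu)
    return math.sqrt(n * (n - 1.0)) * t / ((n - 1.0) + t * t)

# Illustrative sample; any data with xbar != mu exercises both routes.
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
```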
Finally, by re-indexing the above result, the cumulative distribution function of the Student t distribution with \nu degrees of freedom can be approximated using the Barndorff-Nielsen formula

F_{t_\nu}(t) \approx \Phi\left( r + \frac{1}{r} \log \frac{Q}{r} \right),   (3.10)

or the asymptotically equivalent Lugannani and Rice formula

F_{t_\nu}(t) \approx \Phi(r) + \phi(r) \left\{ \frac{1}{r} - \frac{1}{Q} \right\},   (3.11)

where

r = \mathrm{sgn}(t) \left\{ (\nu+1) \log\left( 1 + \frac{t^2}{\nu} \right) \right\}^{1/2} \quad \text{and} \quad Q = \sqrt{\nu(\nu+1)} \left( \frac{t}{\nu + t^2} \right),

with O(n^{-3/2}) accuracy.

Consider now the case of a random sample x = (x_1, \ldots, x_n) from the normal distribution with mean 0 and variance \sigma^2, for which the parameter of interest is the variance, \psi(\theta) = \sigma^2. A convenient canonical parameter is \varphi(\theta) = 1/\sigma^2. The exact p-value function of \sigma^2 is given by

p(\sigma^2) = F_{\chi^2_n}(x^2),   (3.12)

where F_{\chi^2_n}(\cdot) is the cumulative distribution function of the \chi^2 distribution with n degrees of freedom and

x^2 = \frac{\sum_{i=1}^n x_i^2}{\sigma^2}.   (3.13)

By applying the method in Section 2, we obtain

r(\sigma^2) = \mathrm{sgn}(x^2 - n) \left\{ (x^2 - n) + n \log \frac{n}{x^2} \right\}^{1/2}   (3.14)

Q(\sigma^2) = \frac{x^2 - n}{\sqrt{2n}},   (3.15)

and r^*(\sigma^2) can be obtained from (2.9). Thus the p-value function of \sigma^2, or equivalently the cumulative distribution function of the \chi^2 distribution with n degrees of freedom, can be approximated by \Phi(r^*(\sigma^2)). In other words, the cumulative distribution function of the \chi^2 distribution with \nu degrees of freedom can be approximated using the Barndorff-Nielsen formula

F_{\chi^2_\nu}(x^2) \approx \Phi\left( r + \frac{1}{r} \log \frac{Q}{r} \right),   (3.16)

or the asymptotically equivalent Lugannani and Rice formula

F_{\chi^2_\nu}(x^2) \approx \Phi(r) + \phi(r) \left\{ \frac{1}{r} - \frac{1}{Q} \right\},   (3.17)

where

r = \mathrm{sgn}(x^2 - \nu) \left[ (x^2 - \nu) + \nu \log \frac{\nu}{x^2} \right]^{1/2}   (3.18)

and

Q = \frac{x^2 - \nu}{\sqrt{2\nu}},   (3.19)

with O(n^{-3/2}) accuracy.
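The re-indexed formulas translate directly into code. The sketch below (function names are ours) implements (3.10)-(3.11) for the t distribution and (3.16)-(3.17) for the \chi^2 distribution using only the standard library. Note that both formulas are singular at r = Q = 0, i.e. at t = 0 or x^2 = \nu; at t = 0 the exact value is 1/2, which the t routine returns directly.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def t_cdf(t, nu, formula="LR"):
    """Approximate F_{t_nu}(t) via (3.10) (BN) or (3.11) (LR).
    Both formulas are singular at t = 0, where the exact value is 1/2."""
    if t == 0.0:
        return 0.5
    r = math.copysign(math.sqrt((nu + 1.0) * math.log(1.0 + t * t / nu)), t)
    Q = math.sqrt(nu * (nu + 1.0)) * t / (nu + t * t)
    if formula == "BN":
        return Phi(r + math.log(Q / r) / r)
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / Q)

def chi2_cdf(x2, nu, formula="LR"):
    """Approximate F_{chi^2_nu}(x2) via (3.16) (BN) or (3.17) (LR).
    The formulas are singular at x2 = nu, where r = Q = 0."""
    r = math.copysign(math.sqrt((x2 - nu) + nu * math.log(nu / x2)), x2 - nu)
    Q = (x2 - nu) / math.sqrt(2.0 * nu)
    if formula == "BN":
        return Phi(r + math.log(Q / r) / r)
    return Phi(r) + phi(r) * (1.0 / r - 1.0 / Q)
```

For \nu = 2 the \chi^2 cumulative distribution function is exactly 1 - e^{-x^2/2}, which provides a convenient numerical check of the approximation.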
Note that, since the cumulant generating function of the chi-square distribution is known, we can directly apply the saddlepoint method and the Lugannani and Rice formula to obtain the following results:

r_{SP} = \mathrm{sgn}\left( \frac{x^2 - \nu}{2x^2} \right) \left[ (x^2 - \nu) + \nu \log \frac{\nu}{x^2} \right]^{1/2}   (3.20)

Q_{SP} = \frac{x^2 - \nu}{\sqrt{2\nu}}   (3.21)

and

F_{\chi^2_\nu}(x^2) \approx \Phi(r_{SP}) + \phi(r_{SP}) \left\{ \frac{1}{r_{SP}} - \frac{1}{Q_{SP}} \right\}.   (3.22)

Interestingly, the Q given in (3.19) is identical to the Q_{SP} given in (3.21). Moreover, since x^2 > 0, we have \mathrm{sgn}(x^2 - \nu) = \mathrm{sgn}\left( \frac{x^2 - \nu}{2x^2} \right), which implies that the r given in (3.18) is also identical to the r_{SP} given in (3.20). Thus, for approximating the cumulative distribution function of the \chi^2 distribution, the proposed method and the direct saddlepoint method give exactly the same results.

4 Numerical comparisons

To illustrate the accuracy of our proposed method, we compare it with some recent approximations. Jing et al. (2004) applied the saddlepoint method without using moment generating functions to approximate the cumulative distribution function of the Student t-statistic. While their approximation is based on the Lugannani and Rice result, its exact form is very complicated. They provide numerical results comparing their approximation with the exact Student t distribution with 5 degrees of freedom. Their results are reported for the survivor function rather than the cumulative distribution function. Table 1 contains the results from Jing et al. (2004) and the results of our approximations using both the Barndorff-Nielsen (BN) and Lugannani and Rice (LR) formulas given by equations (3.10) and (3.11), respectively. In Figure 1 we plot the relative errors of the three approximations. From Table 1 and Figure 1, we observe that the Jing et al. (2004) method and our methods are almost indistinguishable around the center of the distribution, but our approximations are much better towards the tail of the distribution, which is crucial for inference purposes.
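The equality of (3.18) and (3.20) asserted above is easy to confirm numerically: since x^2 > 0, the sign arguments agree and the two expressions coincide exactly. The short sketch below (our own check, with arbitrary test values) evaluates both forms.

```python
import math

def r_proposed(x2, nu):
    """r from (3.18): sgn(x2 - nu) * [(x2 - nu) + nu*log(nu/x2)]**(1/2)."""
    return math.copysign(math.sqrt((x2 - nu) + nu * math.log(nu / x2)), x2 - nu)

def r_saddlepoint(x2, nu):
    """r_SP from (3.20): same magnitude, but the sign of (x2 - nu)/(2*x2)."""
    return math.copysign(math.sqrt((x2 - nu) + nu * math.log(nu / x2)),
                         (x2 - nu) / (2.0 * x2))

# Since x2 > 0, sgn((x2 - nu)/(2*x2)) = sgn(x2 - nu): the two must be identical.
vals = [(0.5, 1), (3.0, 2), (2.0, 5), (12.0, 5)]
diffs = [abs(r_proposed(a, b) - r_saddlepoint(a, b)) for a, b in vals]
```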
Interestingly, the Lugannani and Rice formula seems to provide a better approximation than the Barndorff-Nielsen one. In Figure 2 we plot our proposed approximations for the extreme case of the Student t distribution with 1 degree of freedom. Even in this extreme case, our approximations are remarkably accurate, especially the Lugannani and Rice approximation.

Table 1: Comparisons of exact and approximate values for 1 − F_{t_5}(t)

   t      Exact    Jing     BN       LR
0.1001   0.4620   0.4621   0.4618   0.4618
0.2010   0.4243   0.4244   0.4238   0.4238
0.3034   0.3869   0.3872   0.3861   0.3861
0.4082   0.3500   0.3505   0.3489   0.3490
0.5164   0.3138   0.3146   0.3125   0.3126
0.6290   0.2785   0.2797   0.2771   0.2771
0.7473   0.2443   0.2460   0.2427   0.2427
0.8729   0.2113   0.2136   0.2097   0.2097
1.0078   0.1799   0.1829   0.1782   0.1783
1.1547   0.1502   0.1539   0.1485   0.1486
1.3171   0.1225   0.1268   0.1208   0.1209
1.5000   0.0970   0.1010   0.0954   0.0955
1.7107   0.0739   0.0793   0.0725   0.0727
1.9604   0.0536   0.0592   0.0524   0.0525
2.2678   0.0363   0.0417   0.0353   0.0355
2.6667   0.0223   0.0271   0.0215   0.0217
3.2271   0.0116   0.0154   0.0112   0.0113
4.1295   0.0045   0.0070   0.0043   0.0044
6.0849   0.0009   0.0018   0.0008   0.0008

Lin (1988) provides a very simple approximation to the cumulative distribution function of the chi-square distribution. Figures 3 and 4 plot approximations to the cumulative distribution function F_{\chi^2_\nu}(x^2) for \nu = 5 and \nu = 1, respectively. We observe that the Lin (1988) approximation is not at all satisfactory. We also observe that, even in the extreme case of the \chi^2 distribution with 1 degree of freedom, the proposed approximations are remarkably accurate.
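The BN and LR columns of Table 1 can be reproduced directly from (3.10) and (3.11). The sketch below recomputes the survivor function 1 − F_{t_5}(t) at a few of the tabulated t-values and measures the relative error of the LR approximation against the exact column (exact values taken from the table above; the function name is ours).

```python
import math

def survivor_t5(t, formula):
    """1 - F_{t_5}(t) via the BN (3.10) or LR (3.11) approximation."""
    nu = 5.0
    r = math.copysign(math.sqrt((nu + 1.0) * math.log(1.0 + t * t / nu)), t)
    Q = math.sqrt(nu * (nu + 1.0)) * t / (nu + t * t)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    if formula == "BN":
        return 1.0 - Phi(r + math.log(Q / r) / r)
    return 1.0 - (Phi(r) + phi(r) * (1.0 / r - 1.0 / Q))

# (t, exact survivor value) pairs taken from Table 1.
rows = [(0.5164, 0.3138), (1.5000, 0.0970), (2.6667, 0.0223), (6.0849, 0.0009)]
rel_err = {t: abs(survivor_t5(t, "LR") - exact) / exact for t, exact in rows}
```

The relative errors stay small even in the far tail, where the exact survivor probability is below 0.001.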
Figure 1: Relative error (percentage) of the Jing, BN, and LR approximations to 1 − F_{t_5}(t).

Figure 2: Approximations to F_{t_1}(t): Exact, BN, and LR.

Figure 3: Approximations to F_{\chi^2_5}(x^2): Exact, BN, LR, and Lin.

Figure 4: Approximations to F_{\chi^2_1}(x^2): Exact, BN, LR, and Lin.

5 Conclusion

When the likelihood-based asymptotic method is applied to the one-sample normal problem, the results can easily be transformed to produce simple yet highly accurate normal approximations to the cumulative distribution functions of the t and χ² distributions. A characteristic of these approximations is their known O(n^{-3/2}) accuracy together with resulting expressions that are remarkably simple.

References

[1] Barndorff-Nielsen, O., 1986, Inference on Full or Partial Parameters Based on the Standardized Signed Log Likelihood Ratio, Biometrika 73, 307-322.
[2] Barndorff-Nielsen, O., 1991, Modified Signed Log-Likelihood Ratio, Biometrika 78, 557-563.
[3] Fraser, D., 1990, Tail Probabilities from Observed Likelihood, Biometrika 77, 65-76.
[4] Jing, B.Y., Shao, Q.M., and Zhou, W., 2004, Saddlepoint Approximation for Student t-Statistic With No Moment Conditions, Annals of Statistics 32, 2679-2711.
[5] Lin, J.T., 1988, Approximating the Cumulative Chi-square Distribution and Its Inverse, The Statistician 37, 3-5.
[6] Lugannani, R., and Rice, S., 1980, Saddlepoint Approximation for the Distribution of the Sums of Independent Random Variables, Advances in Applied Probability 12, 475-490.
[7] Reid, N., 2003, Asymptotics and the Theory of Inference, Annals of Statistics 31, 1695-1731.

Contact Addresses:

Marie Rekkas, Department of Economics, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada

Y. She, Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada

Ye Sun, Department of Statistics, University of Toronto, 100 St. George Street, Toronto, ON M5S 1G1, Canada

Augustine Wong (Corresponding Author), Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada

Rekkas and Wong gratefully acknowledge the support of the Natural Sciences and Engineering Research Council of Canada.