On the Second Order Properties of Empirical Likelihood with Moment Restrictions

On the Second Order Properties of Empirical Likelihood with Moment Restrictions1 Song Xi Chen1 and Hengjian Cui2 1 2 Department of Statistics, Iowa State University and Department of Mathematics, Beijing Normal University Abstract. This paper considers the second order properties of empirical likelihood for a parameter defined by moment restrictions, which is the framework operated upon by the Generalized Method of Moments. It is shown that the empirical likelihood defined for this general framework still admits the delicate second order property of Bartlett correction, which represents a substantial extension of all the established cases of Bartlett correction for the empirical likelihood. An empirical Bartlett correction is proposed, which is shown to work effectively in improving the coverage accuracy of confidence regions for the parameter. Key words: Bartlett correction, Coverage accuracy, Empirical likelihood, Generalized Method of Moments. 1. Introduction Generalized Method of Moments (GMM) introduced by Hansen (1982) is an important inferential framework in econometric studies. GMM is based on, upon given a model, some known functions g(X, θ) of a random observation X ∈ Rd and an unknown parameter θ ∈ Rp , where g : Rd+p → Rr , such that E{g(X, θ)} = 0 which constitutes moment restrictions on the relationship between X and θ. The power of GMM is in its allowing r ≥ p, namely the 1 The research of this paper was supported by a National University of Singapore Academic Research Grant (R-155-000-018-112) and a RFDP of China grant (20020027010). The authors thank Dr Weidong Lin for computational support, who was supported under the National University of Singapore Academic Research Grant. 1 number of moment restrictions (instruments) can be larger than the number of parameter, which leads to a full exploration of inference opportunities provided by the given model. There is a vast pool of literatures on GMM. Here we only cite the latest reviews of Andrews (2002), Brown and Newey (2002), Imbens (2002) and Hansen and West (2002). Empirical likelihood (EL) introduced by Owen (1988) is a computer-intensive statistical method that facilitates a likelihood-type inference in a nonparametric or semiparametric setting. It is closely connected to the bootstrap as the EL effectively carries out the resampling implicitly. On certain aspects of inference, EL is more attractive than the bootstrap, for instance its ability of internal studentizing so as to avoid explicit variance estimation and producing confidence regions with natural shape and orientation; see Owen (2001) for an overview of EL. A key property of EL is that the log EL ratio is asymptotically chi-squared distributed, which resembles the Wilks’ theorem in parametric likelihood. The Wilks’ theorem was established in the original proposal of Owen (1988) for the means, in Hall and La Scala (1990) for smoothed function of means, Qin and Lawless (1994) for parameters defined by moment restrictions and Kitamura (1997) for weakly dependence observations. There have been comprehensive studies of EL in the context of GMM in econometrics. Imbens (1997) shows that the maximum EL estimator of θ is a one-step variation of the twostage GMM estimator in the over-identified case of r > p, and achieves the same asymptotic efficiency as the two-stage estimator. Testing is considered in Kitamura (2001) for moments restrictions, and Tripathi and Kitamura (2002) for conditional moment restrictions. Estimation and testing with conditional moment restrictions are studied in Donald, Imbens and Newey (2003) and Kitamura, Tripathi and Ahn (2002). They found that EL posses the attractive features of avoiding estimating optimal instruments and achieving asymptotic pivotalness. Tilted EL and other variations are studied in Kitamura and Stutzer (1997), Smith (1997) and Newey and Smith (2004). In particular, Newey and Smith (2004) find that the EL estimator is favorable in terms of the bias and the second order variance in comparison with the GMM estimator. 2 Another key property of the EL is Bartlett correction, which is a delicate second order property implying that a simple mean adjustment to the likelihood ratio can improve the approximation to the limiting chi-square distribution by one order of magnitude and hence can be used to enhance the coverage accuracy of likelihood-based confidence regions. In the context of testing hypotheses, the Bartlett correction reduces the errors between the nominal and actual significant levels of an EL test. Bartlett correction has been established for EL by DiCiccio, Hall and Romano (1991) for smoothed functions of means and Chen (1993, 1994) for linear regression. Baggerly (1998) shows that EL is the only member within the Cressie-Read power divergence family that is Bartlett correctable. Jing and Wood (1996) reveal that the exponentially tilted EL for the means is not Bartlett correction as the tilting alters the delicate second order mechanism of EL. In this paper we show that the EL with moment restrictions is still Bartlett correctable. The finding represents a substantial extension to the established cases of Bartlett correction, which almost all of them assume r = p corresponding to the just-identified case in GMM. The establishment of the Bartlett correction for the just-identified case is a lot easier as the log maximum EL takes a constant value −n log(n) (n is the sample size). However, in the over-identified case the maximum EL is no longer a constant, rather it introduces many extra terms into the log EL ratio and makes the study of Bartlett correction far more challenging as can be seen from the analysis carried out in this paper. The establishment of Bartlett correction in this general case indicates that EL inherits the delicate second order mechanism of the parametric likelihood in a much wider situation. This together with the findings of Imbens (1997), Kitamura (2001) and Newey and Smith (2004) and others suggests that the EL is an attractive inferential tool in the context of moment restrictions. The establishment of the Bartlett correction leads to a practical Bartlett correction, which is confirmed to work effectively for coverage restoration in our simulation studies reported in Section 4. The paper is organized as follows. Section 2 provides an expansion for the log EL ratio for parameters defined by moment restrictions. Bartlett correction and coverage errors 3 assessment of EL confidence regions are investigated in Section 3. Simulation results are reported in Section 4, followed by a general discussion in Section 5. All technical details are left in the appendix. 2. EL for generalized moment restrictions Let X1 , X2 , · · · , Xn be d−dimensional independent and identically distributed random sample whose distribution depends on a p−dimensional parameter θ which takes values in a compact parameter space Θ ⊆ Rp . The information about θ is summarized in the form of r ≥ p unbiased moment restrictions g j (x, θ), j = 1, 2, · · · , r, such that E[g j (X1 , θ0)] = 0 for a unique θ0 , which is the true value of θ. Let g(X, θ) = (g 1(X, θ), g 2 (X, θ), · · · , g r (X, θ))T and V = V ar{g(X1 , θ0)}. We assume the following regularity conditions: (2.1) (i) V is a r × r positive definite matrix and the rank of E[∂g(X1, θ0 )/∂θ] is p; (ii) For any j, 1 ≤ j ≤ p, all the partial derivatives of g j (x, θ) up to the third order with respect to θ are continuous in a neighborhood of θ0 and are bounded by some integrable functions respectively in the neighborhood; (iii) lim sup|t|→∞ |E[exp{itT g(X1 , θ0 )}]| < 1 and Ekg(X1 , θ0 )k15 < ∞. Conditions (i) and (ii) are standard requirements for establishing the Wilks’ theorem and higher order Taylor expansions of the EL ratio. The first part of the condition (iii) is just the Cramér0 s condition on the characteristic function of g(X, θ0 ). It and the requirement that Ekg(X, θ0 )k15 < ∞ are required for establishing the Edgeworth expansion. To facilitate simpler expressions, we transform g(Xi , θ) to wi (θ) = T V −1/2g(Xi , θ) where T is a r × r orthogonal matrix such that (2.2) TV Here U = ukl p×p −1/2 E ∂g(X i , θ0 ) ∂θ U= Λ, 0 τ . r×p is an orthogonal matrix and Λ = diag(λ1 , · · · , λp ) is non-singular. 4 Let p1 , p2 , · · · , pn be non-negative weights allocated to the observations. The EL for θ as proposed in Qin and Lawless (1994) is L(θ) = Pn i=1 Qn i=1 pi subject to Pn i=1 pi = 1 and pi wi (θ) = 0. Let `(θ) = −2 log{L(θ)/nn }. Standard derivations in EL show `(θ) = 2 n X log{1 + λT (θ)wi (θ)} i=1 where λ = λ(θ) is the solution of n−1 Pn wi (θ) i=1 1+λT wi (θ) = 0. According to Qin and Lawless (1994), the maximum EL estimator θ̂ and its corresponding λ, denoted as λ̂, are solutions of Q1n (λ, θ) = n−1 (2.3) (2.4) Q2n (λ, θ) = n −1 n X i=1 wi (θ) = 0 and 1 + λT wi (θ) n X (∂wi(θ)/∂θ)T λ = 0. 1 + λT wi (θ) i=1 Then the log EL ratio is r(θ) = `(θ) − `(θ̂). In the following we are to develop expansions to `(θ0 ) and `(θ̂) respectively. To expand `(θ0 ), define αj1 ...jk = E{wij1 (θ0 )...wijk (θ0)} and Aj1 ...jk = n−1 n X i=1 wij1 (θ0 )...wijk (θ0) − αj1 ...jk . j Here we use a to denote the j-th component of a vector a. Then, it may be shown that n−1 `(θ0 ) = Aj Aj − Aji Aj Ai + 23 αjih Aj AiAh + Aji Ahi Aj Ah + 23 Ajih Aj AiAh (2.5) − 2αjih Agh Aj AiAg + αjgf αihf Aj Ai Ah Ag − 12 αjihg Aj AiAh Ag + Op (n−5/2 ). We use here a convention where if a superscript is repeated a summation over that superscript is understood. This expansion has the same form as DiCiccio, Hall and Romano (1991) for the mean parameter when r = p and Chen (1993) for linear regression. It is quite challenging to expand `(θ̂) in the general case of r > p. Two new systems of notations are introduced to facilitate the expansion. Let η = (λ, θ), Q(η) = (Qτ1n (η), Qτ2n(η))τ , τ S21 = U (Λ, 0) and S12 = S21 . Due to the early transformation in (2.2), S =: E ∂Q(0, θ0) ∂η 5   =  −I S12  S21 0 . Put Γ(η) = S −1 Q(η). Now we can introduce the notations involving Γ(η) and their derivatives β j,j1 ...jk = E ∂ k Γj (0, θ 0) ∂ηj1 ...∂ηjk and B j,j1 ...jk = n 1X ∂ k Γj (0, θ0 ) − β j,j1 ...jk n i=1 ∂ηj1 ...∂ηjk and the notations involving wi (θ) and their derivatives ∂ n wip (θ0) ∂ l wij (θ0) ∂ m wik (θ0 ) ... and ∂θj1 ...∂θjl ∂θk1 ...∂θkm ∂θp1 ...∂θpn n 1X ∂ lwij (θ0) ∂ mwik (θ0) ∂ nwip (θ0) = ... − γ j,j1 ...jl ;k,k1 ...km ;...;p,p1...pn . n i=1 ∂θj1 ...∂θjl ∂θk1 ...∂θkm ∂θp1 ...∂θpn γ j,j1 ...jl ;k,k1 ...km ;...;p,p1...pn = E C j,j1 ...jl ;k,k1 ...km ;...;p,p1...pn Since η̂ = (λ̂, θ̂) is the solution of Γ(η̂) = 0, by inverting this equation, we have for j, k, l, m ∈ {1, 2, · · · , r + p}, η̂ j − η0j = −B j + B j,k B k − 12 β j,klB k B l − B j,k B k,l B l + 12 β k,lm B j,k B lB m + β j,kl B k,m B m B l (2.6) − 1 j,kl k,mn m n l β β B B B 2 − 12 B j,kl B k B l + 16 β j,klm B k B l B m + Op (n−2 ). Note that (2.6) contains expansions for λ̂j when j ≤ r and for θ̂j when j > r respectively. Note that (2.7) `(θ̂) = 2 n X i=1 {λ̂T wi (θ̂) − 12 [λ̂T wi (θ̂)]2 + 13 [λ̂T wi (θ̂)]3 − 41 [λ̂T wi(θ̂)]4} + Op (n−3/2). From now on, we fix the ranges of the superscripts a, b, c, d ∈ {1, 2, · · · , r − p}, f, g, h, i, j ∈ {1, 2, · · · , r}, k, l, m, n, o ∈ {1, 2, · · · , p} and q, s, t, u ∈ {1, 2, · · · , r + p}. It is shown in Appendix 2 by substituting (2.6) into (2.7), that n−1 l(θ̂) = −2B j Aj − B j B j + 2C i,k B i B r+k,q B q + 12 β j,uq β r+k,st γ j,k B u B q B s B t − β j,uq B u B q B r+k,s B s γ j,k − β r+k,uq B u B q C i,k B i − B j B i Aji − 23 αjih B j B i B h + 2C j,k {B j B r+k − B j,pB p B r+k [2, j, r + k] + 12 β j,uq B u B q B r+k [2, j, r + k]} + γ j,kl {−B j B r+k B r+l + B j B r+k B r+l,q B q [3, j, r + k, r + l] − 1 j,uq r+k r+l u q β B B B B [3, j, r 2 + k, r + l]} − C j,kl B j B r+k B r+l − 23 Ajih B j B iB h − B j,u B u B j,q B q − 14 β j,uq β j,st B u B q B s B t + β j,uq B u B q B j,s B s + 2γ j;i;h,k B j B iB h B r+k 6 + B j B i,q B q Aji[2, j, i] − 12 β j,uq B u B q B i Aji [2, j, i] + 13 γ j;k,lm B j B r+k B r+l B r+m + 2γ j;i,l {B j B iB r+l − B j B i B r+l,q B q + 21 β r+l,uq B j B i B u B q − B r+l B i B j,q B q [2, j, i] + (2.8) 1 j,uq u q i r+l β B B B B [2, j, i]} 2 + 2B j B iB r+l C j;i,l − (γ j;i,lk + γ j,l;i,k )B j B i B r+l B r+k + 2αjih B j B i B h,q B q − αjih β j,uq B u B q B i B h − 21 αjihg B j B i B h B g + Op (n−5/2 ) where [2, j, i] indicates there are two terms by exchanging the super-scripts i and j, and the same is understood for [3, i, j, k]. Expansion (2.8) for `(θ̂) is much more complicated than the just-identified case of r = p. In that case, all the B j = 0 from a result established in (A.1), which means `(θ̂) = 0 and r(θ0 ) = `(θ0 ). This is the situations of almost all the existing studies on Bartlett corretability of the EL. When r > p, the expansion of `(θ̂) contains more terms than that of `(θ0 ), which increases substantially the difficulty of the second order study. Combining (2.5) and (2.8), and carrying out further simplifications, n−1 r(θ0 ) = AlAl − Akl Ak Al − 2Al + 2αkl + 2 αl p+a p+a Ap+a Al + 32 αklm Ak AmAl + 2ω kl C p+a,k Ap+a Al − γ p+a,mn wmk wnl Ap+aAk Al + Aji (Ahi Aj Ah − B i,q B q B j [2, i, j]) p+a p+b − γ p+a;p+b,k ω kl Ap+a Ap+bAl + B j,u B j,q B u B q + 2C j,k B j,q B r+k B q − γ j,kl B r+k B r+l B j,q B q − 2γ j,kl B j B r+l B r+k,q B q − 2αjih B j B i B h,q B q + 2γ j;i,l (B j B iB r+l,q B q + B r+l B iB j,q B q [2, j, i]) + (γ j;i,lk + γ j,l;i,k )B j B i B r+l B r+k + ( 41 β j,uq β j,st − 12 β j,uq β r+k,st γ j,k )B u B q B s B t + (αjih β h,uq − γ j;i,l β r+l,uq )B j B i B u B q + (γ j,kl β r+l,uq − γ i;j,k β i,uq − γ j;i,k β i,uq )B u B q B j B r+k + 12 γ j,kl β j,uq B u B q B r+l B r+k − 1 j,klm j r+k r+l r+m γ B B B B 3 − 2γ j;i;h,k B j B i B h B r+k + 12 αjihg B j B iB h B g + C j,kl B j B r+k B r+l − 2C j;i,l B j B i B r+l + 23 Ajih (B j B i B h + Aj AiAh ) (2.9) − 2αjih Agh Aj Ai Ag + αjgf αihf Aj AiAh Ag − 12 αjihg Aj Ai Ah Ag + Op (n−5/2). This expansion leads to the following signed root decomposition n−1 r(θ0) = (R1 + R2 + R3 )j (R1 + R2 + R3 )j + Op (n−5/2) 7 where Ri = Op (n−i/2 ) for i = 1, 2 and 3. Clearly, the terms appeared in the first two lines of (2.9) fully determine R1 and R2 , namely Rl1 = Al , (2.10) Rl2 = − 12 AklAk − Al (2.11) + [αkl p+a p+a Ap+a + 13 αklm Ak Am + ω kl C p+a,k Ap+a − 21 γ p+a,mn wmk ω nl ]Ap+aAk + [αl p+a,p+b − γ p+a;p+b,k ω kl ]Ap+aAp+b and Rj1 = Rj2 = 0 for j ∈ {p + 1, · · · , r}. An expression for Rl3 is given in Appendix 2. d From (2.9), r(θ0 ) = nAl Al + op (1) which means that r(θ0 ) → χ2p and leads to an EL confidence region for θ with nominal confidence level 1 − α: Iα = {θ|r(θ) ≤ cα } where cα is the upper α-quantile of χ2p distribution. 3. The second order properties The coverage accuracy of the the EL confidence region Iα is evaluated in the following theorom. Theorem 1. Under conditions (2.1), P {r(θ0 ) < cα} = α − n−1 p−1 Bc cα fp (cα ) + O(n−2 ) where fp (·) is the density of χ2p distribution, Bc = p−1 ( defined in (A.18). Pp l=1 ∆ll + 1 lkk lmm α α ) 36 and ∆ll is Theorem 1 indicates that the coverage error of the EL confidence region Iα is O(n−1 ), which is the same order as a standard two sided confidence region based on the asymptotic normality of θ̂. The attractions of the EL confidence region are (i) there is no need to carry out any secondary estimation procedure in formulating the confidence region whereas the covariance matrix has to be estimated for the confidence region based on the asymptotic normality; and (ii) the shape and the orientation of the region are naturally determined by the likelihood surface, free of any subjective intervention. 8 Theorem 2. Under conditions (2.1), P {r(θ0) < cα (1 + n−1 Bc )} = α + O(n−2 ). The theorem shows that the Bartlett correction is maintained by the EL for the situation of general moment restrictions, despite that r may be larger than p and `(θ̂) has a rather complex expression. This indicates that the EL is resilient in sharing this delicate second order property with a parametric likelihood and the existence of certain internal mechanism in the EL that resembles that of the parametric likelihood. It can be seen from Appendix A.4. that the Bartlett factor Bc has a rather involved expression for a general over-identified case of r > p due to the lengthy expressions of ∆ ll. However, it admits simpler expression in two special situations. One is in the situation of just-identified moment restrictions with r = p. It may be easily checked from (A.18) that Bc = p−1 1 llkk α 2 − 13 αlkm αlkm , which is the Bartlett factor obtained in DiCiccio, Hall and Romano (1991) for smooth function of means and Chen (1993) for linear regression. The other situation is when r > p, but (i) Cov{g j (X, θ0 ), g p+a (X, θ0 )} = 0 for any j ≤ p and a ≤ r − p and (ii) g j (x, θ) = g j (x) does not depend on θ for j = p + 1, · · · , r. The assumption (i) means that the first p estimating equations are uncorrelated with the last r − p estimating equations at θ 0 and (ii) means that the last r − p estimating equations are free of parameters. In this case, Bc = p−1 1 llkk α 2 + αll p+a p+a − 13 αlkm αlkm − αllk αkp+a p+a − αlf p+a lf p+a α . e =: 1 + B n−1 To practically implement the Bartlett correction in a general situation, B c c e can be obtained by has to be estimated. It is noted that the direct plug-in estimator of B c substituting all the populations moments involved by their corresponding sample moments. e , we propose using the bootstrap to However, considering the rather lengthy forms of B c estimate Bec , which is based on the fact that (3.1) e + O(n−2 ) E{r(θ0 )} = p(1 + Bc n−1 ) + O(n−2 ) = pB c 9 by combining the expressions of E(Rli Rlj ) given in Appendix 3. The bootstrap procedure is Step 1: generate a bootstrap resample {Xi∗ }ni=1 by sampling with replacement from the original sample {Xi }ni=1 and compute r∗ (θ̂) = `∗ (θ̂) − `∗(θ̂∗ ), where `∗ and θ̂∗ are respectively the Log EL ratio and the maximum EL estimate based on the resample; Step 2: for a large integer N , repeat Step 1 N times and obtain r ∗1 (θ̂), · · · , r∗N (θ̂). As N −1 PN b=1 e is r∗b (θ̂) estimates E{r(θ0 )}, a bootstrap estimate of B c ê B c = (N p) −1 N X r∗b (θ̂). b=1 Let Xn = {X1 , · · · , Xn } be the original sample. It may be shown by standard bootstrap arguments, for instance those given in Hall (1992), that (3.2) ê |X } = (1 + B n−1 ){1 + O (n−1/2)} E{B c n c p √ which means that the bootstrap estimate of Bec is n-consistent. Now a practical Bartlett ê }. It can be shown from Theorem 2 and corrected confidence region is Iα,bc = {θ|r(θ) ≤ cα B c (3.2) that the coverage error of Iα,bc is O(n−3/2 ), which improves that of Iα. e naturally leads The above use of the bootstrap to estimate the Bartlett factor Bc or B c ones to think of using the bootstrap to calibrate directly on the distribution of the EL ratio ∗1 ∗2 ∗N r(θ0 ). Let us rank {r∗i (θ̂)}N (θ̂). Then a direct booti=1 such that r (θ̂) ≤ r (θ̂) ≤ · · · ≤ r strap confidence interval with a nominal level 1 − α is Iα,bt = (r∗[αN/2]+1(θ̂), r∗[(1−α)N/2]+1(θ̂)) where [·] is the integer truncation operator. The cumulants and expansions which are quite expensively derived for the purpose of establishing Bartlett correction are needed in assessing the coverage accuracy of the direct Bootstrap confidence interval Iα,bt. It may be shown that under conditions (2.1),2 (3.3) 2 P (θ0 ∈ Iα,bt) = α + O(n−3/2 ) We would not provide the proof here due to a limited space. It can be carried out by taking a similar route as in Hall (1992) and utilizing the Edgeworth expansion established in the proof of Theorem 1. 10 which indicates that the coverage error of the bootstrap confidence interval Iα,bt and the Bartlett corrected interval Iα,bc is at the same order. This is indeed confirmed by our simulation studies reported in the next section, although we observe that the performance of the Bartlett corrected interval is more robust. 4. Simulation Results We report in this section results of two simulation studies designed to confirm the theoretical finding of Bartlett correction of the EL by implementing the proposed empirical Bartlett correction. For comparison purposes, the bootstrap confidence intervals I α,bt is also evaluated. In the first simulation study, X1 , · · · , Xn are independent and identically N (θ, θ 2 + 1) distributed, as considered in an example of Qin and Lawless (1994). The relationship between the mean and variance leads to moment restrictions: g1 (X1 , θ) = X1 − θ and g2 (X1 , θ) = X12 − 2θ − 1. This is an over-identified case as there are two moment restrictions and one parameter of interest, i.e. r = 2 and p = 1. Like Qin and Lawless, the value of θ is chosen to be 0 and 1 respectively. The sample size used in the simulation study is n = 20, 30, 40 and 50 respectively. In the second simulation study, we consider the following autoregressive panel data model, which is an example considered in Brown and Newey (2002) (4.1) Xit = θXit−1 + αi + it, Xi0 = αi + ei , 1−ρ for t = 1, · · · , 4 and i = 1, · · · , n, where |θ| < 1, {it}4t=1 and αi are mutually independent standard normal random variables, vi ∼ N (0, (1 − θ2 )−1 ) and independent of {it}4t=1 and αi . Let Xi = (Xi1 , . . . , Xi4 ). The moment restrictions after taking time differencing are g1 (Xi , θ) = Xi1 (∆Xi3 − θ∆Xi2), g2 (Xi , θ) = Xi1 (∆Xi4 − θ∆Xi3) and g3 (Xi , θ) = Xi2 (∆Xi3 − θ∆Xi2 ) where ∆Xit = Xit −Xit−1 . It is easy to check from model (4.1) that E{gj (Xi , θ)} = 0. Hence, there are three constraints and one parameter, i.e. r = 3 and p = 1, another overidentified case. The parameter θ, which is the autoregressive coefficient is assigned values 11 of 0.5 and 0.9 to obtain different levels of correlations. The sample size is chosen at n = 50 and 100 respectively. In both simulation studies, the empirical coverage and length of the EL, Bartlett corrected EL and the direct bootstrap calibrated intervals are evaluated with nominal coverage levels of 90% and 95% respectively. The bootstrap resample size N used in the Bartlett correction is 250 and the number of simulation is 1000. Tables 1 and 2 contain the empirical coverage and the averaged length of the three types of confidence intervals, which can be summarized as follows. First of all, the need for carrying out the second order correction to the EL confidence interval Iα is quite obvious as the original EL interval has quite severe under coverage for all the cases considered even for a sample size of 100 for the panel data model. The under coverage is particularly severe when the sample size is small for the normal mean model N (1, 2) and for the panel data model, These are the situations where the Bartlett correction is needed. It is observed that in all the cases considered the Bartlett correction improves significantly the coverage of I α . The restoration of coverage by the Bartlett correction is very impressive. We also observed that, as anticipated in (3.3), the direct bootstrap confidence interval has similar performance with the Bartlett corrected intervals in most of the cases. However, in the normal mean models with 90% nominal coverage level, the coverage of the direct bootstrap intervals is not as good as the BC. The robust performance of the Bartlett corrected interval may be due to the fact that the estimation of the Bartlett factor B̃c , which involves simple bootstrap averaging, is more robust than the bootstrap estimation of the extreme quantiles of the distribution of r(θ̂). We also observed in passing that as the sample size increases the EL interval Iα improves both in its coverage and length whereas the improvemnet of the Bartlett intervals is shown in terms of shorter length. 5. Conclusions The main finding of the paper is that the EL with general moment restrictions are Bartlett correctable. This is a substantial extension of the previously established cases of Bartlett 12 correction of EL, including the case of smoothed functions of means by DiCiccio, Hall and Romano (1991) more than one decade ago. It shows that the dedicate Bartlett property of the EL is still preserved even in the case of over-identification. Although the Bartlett factor admits a very involved expression with over-identified moment restrictions, proving that the EL is Bartlett correctable in the general case provides the theoretical foundation to the proposed easily implementable empirical Bartlett correction. The use of the bootstrap to carry out the Bartlett correction empirically is due to a rather involved expression for the Bartlett factor. Although it may be expected that the direct bootstrap calibration would give the same effect as the Bartlett correction, the justification of the direct bootstrap method inevitably needs those cumulants and the Edgeworth expansions established in this paper. The results established in Theorems 1 and 2 can be extended to independent but not identically distributed samples, for instance those arisen in a regression study. We need to modify α, β and γ as follows: αj1 ...jk = n−1 n X E[wij1 (θ0)...wijk (θ0 )], β j,j1 ...jk = n−1 i=1 i=1 γ j,j1 ...jl;k,k1 ...km ;...;p,p1...pn = n X E ∂ k Γj (0, θ 0) ∂ηj1 ...∂ηjk and n ∂ lwj (θ ) ∂ mwk (θ ) 1X ∂ n wip (θ0 ) 0 0 i i E ... . n i=1 ∂θj1 ...∂θjl ∂θk1 ...∂θkm ∂θp1 ...∂θpn We need also to re-define Vn as n−1 Pn i=1 V ar{g(Xi , θ0 )}. These forms of α and Vn were employed in Chen (1993) to establish Bartlett correction for linear regression where r = p. The conditions (2.1) should be modified to reflect the independent but not identically distributed nature of data. Similar conditions as those given in Theorem 20.6 of Bhattacharya and Rao (1976) are required, as in Chen (1993, 1994). Then, it may be shown that Theorem 1 is true by employing Skovgaard (1981) on transformation of Edgeworth expansions. Theorem 2 is then a consequence of Theorem 1 as the calculation of the cumulants follows the same spirits given in the Appendix for independent and identically distributed samples. APPENDIX We provide some technical details on the log EL ratio r(θ0 ) in A.2, the sign root decom13 position in A.3., and the proofs of the two theorems in A.4. A.1. Basic formulae. We first present some basic formulae which will be used throughout the derivations. Let us define Ω = (ω kl )p×p =: U Λ−1 where ω kl = ukl λ−1 l . Please note here that no summation over the subscript l is carried out due to Λ being a diagonal matrix. Since Γ(η) = S −1 Q(η) where S −1 =     T T T −I + S12(S12 S12 )−1 S12 S12(S12 S12 )−1  T T (S12 S12)−1 S12  T (S12 S12)−1  =       0 0 Ω 0 −Ir−p Ω 0 T 0 ΩΩT     ,   it can be checked that β j,k ∂Γj (0, θ0) = E( ) = δ jk ∂ηk  and B =:  1  B       ··· Br      =S −1       0    A      =  −A2  .   0   ΩA1 Here AT = (A1 , · · · Ar )T =: (AT1 , AT2 )T , where A1 = (A1, · · · , Ap)T and A2 = (Ap+1 , · · · , Ar )T constitute a partition of vector A. Therefore for positive integers k and a, (A.1) B k = 0 for k ≤ p; B p+a = −Ap+a for a ≤ r − p, and B r+k = ω kl Al for k ≤ p. Let B1 = (B 1, · · · , B r )τ and B2 = (B r+1 , · · · B r+p )τ . Since SB = (Aτ , 0τp×1 )τ which means that −B1 + S12 B2 = A. As S12 = (γ j,k )r×p and from (A.1) we have γ j,k B r+k = Aj I(j ≤ p) (A.2) where I is the indicator function. Since (A.3)   (B p,q )(r+p)×(r+p) = S −1  we have S21 (B j,k )r×p = (C k,m )τp×r (A.4) ij i,l  −(A ) (C )  (C i,l)T 0 , and S21 (B j,r+a)r×p = 0. As S21 = (γ j,k )τ , these mean γ j,k B j,l = C l,k for l ≤ r and k ≤ p and γ j,k B j,r+a = 0. 14 Furthermore, (A.3) also implies the following which links the B s,t system with the Ajm s and the C j,m s        (A.5)  =       k,l (B ) (B k,p+b ) (B k,r+l  )    (B p+a,l ) (B p+a,p+b ) (B p+a,r+l )   (B r+k,l ) (B r+k,p+b ) (B r+k,r+l ) (ω mk (A C l,m p+a l )  (ω ) mk (A C p+b,m p+a p+b ) 0 ) (ω km [ω nm C l,n − Aml ]) (ω km [ω nm C p+b,n − Am  p+b −(C p+a,l ) ]) (ω km C m,l )    .   In a similar fashion, we can establish the links between β s,t and (αji , γ j,i ) systems where t and i contains either single or double superscripts. Define Q1(θ, λ) = (∂w(θ)/∂θ)T λ 1+λT w(θ) w(θ) 1+λT w(θ) and Q2 (θ, λ) = . For fixed h ∈ {1, · · · , r} and k ∈ {1, · · · , p} ∂Q1(θ, λ) w(θ)wT (θ) ∂Q1(θ, λ) ∂w(θ)/∂θ w(θ)∂wT (θ)λ/∂θ = − , = − , ∂λ (1 + λT w(θ))2 ∂θ 1 + λT w(θ) (1 + λT w(θ))2 ∂ ∂Q1(0, θ0 ) ∂ ∂Q1(0, θ0 ) ∂w(θ0) h ∂wh (θ0) T h = 2w(θ )w (θ )w (θ ), = − w (θ ) − w(θ ) , 0 0 0 0 0 ∂λh ∂λ ∂λh ∂θ ∂θ ∂θ ∂ ∂Q1(0, θ0 ) ∂w(θ0) T ∂wT (θ0 ) ∂ ∂Q1(0, θ0) ∂ 2w(θ0 ) = − w (θ ) − w(θ ) , = ; 0 0 ∂θk ∂λ ∂θk ∂θk ∂θk ∂θ ∂θ∂θk and ∂Q2(θ, λ) ∂λ ∂Q2(θ, λ) ∂θ ∂ ∂Q2(0, θ0 ) ∂λh ∂λ ∂ ∂Q2(0, θ0 ) ∂θk ∂λ ∂wT (θ)/∂θ (∂λT w(θ)/∂θ)wT (θ) − , (1 + λT w(θ)) (1 + λT w(θ))2 ∂ 2λT w(θ)/∂θ∂θT [∂λT w(θ)/∂θ][∂λT w(θ)/∂θ]T = − , 1 + λT w(θ) (1 + λT w(θ))2 ∂wT (θ0) h ∂wh (θ0) T ∂ ∂Q2(0, θ0 ) ∂ 2 wh (θ0 ) = − w (θ0) − w (θ0 ), = ∂θ ∂θ ∂λh ∂θ ∂θ∂θT 2 T ∂ w (θ0) ∂ ∂Q2(0, θ0 ) = , = 0. k ∂θ∂θ ∂θk ∂θ = These mean that for a fixed h ∈ {1, · · · , r}  (β s,th ) =  0p×p      0 0 −Ir−p Ω 0      ×E   ΩT  0 ΩΩT ∂ 2 Q1 ∂λh ∂λ ∂ 2 Q1 ∂λh ∂θ ∂ 2 Q2 ∂λh ∂λ ∂ 2 Q2 ∂λh ∂θ 15     =               T 0p×p 0 Ω 0 −Ir−p 0 Ω 0    ×   ΩΩT (2αklh )p×p (2αp+a lh )r−p×p (2αk p+b h )p×r−p (2αp+a p+b h )r−p×r−p  −(γ h;k,m + γ k;h,m )p×p       −(γ h;p+a,m + γ p+a;h,m )r−p×p −(γ h;k,m + γ k;h,m )Tp×p −(γ h;p+a,m + γ p+a;h,m )Tp×r−p (γ h,kl )p×p and for a fixed h = r + k where k ∈ {1, · · · , p}  (β s,tr+k ) =  0 0 Ω 0       =               0 −Ir−p 0 0 ΩΩT 0 Ω 0 −Ir−p Ω     ×E   ΩT  ∂ 2 Q1 ∂θ k ∂θ ∂ 2 Q2 ∂θ k ∂λ ∂ 2 Q2 ∂θ k ∂θ     T    ×   0 ΩΩT 0 ∂ 2 Q1 ∂θ k ∂λ −(γ l,k;m + γ l;m,k )p×p −(γ l,k;p+a + γ l;p+a,k )p×r−p (γ l,mk )p×p (γ l,mk )Tp×p (γ p+a,mk )Tp×r−p 0p×p −(γ p+a,k;m + γ p+a;m,k )r−p×p −(γ p+a,k;p+b + γ p+a;p+b,k )r−p×r−p (γ p+a,mk )r−p×p        where k, l, m ∈ {1, · · · , p} and a, b ∈ {1, · · · , r − p}. Hence we have (A.6) β l,p+a p+c = −ω ol [γ p+c;p+a,o + γ p+a;p+c,o ], β p+a,p+b p+c = −2αp+a β l,p+a p+n = ω ol γ p+a;on , β p+a,p+b p+n = γ p+a,n;p+b + γ p+a;p+b,n , β r+k,p+a p+c = 2ω ko αop+a β r+k,p+a p+n = ω ko ω mo γ p+a,mn − ω ko [γ o,n;p+a + γ o;p+a,n ], β r+k,p+m p+c = ω ko ω no γ p+c,nm − ω ko [γ p+c;o,m + γ o,p+c;m ], β r+k,p+m p+b p+c , β p+a,p+m β l,p+m p+c p+n β l,p+m p+c p+c = ω ol γ p+c;om , = γ p+c;p+a,m + γ p+a;p+c,m , = 0, β p+a,p+m p+n = −γ p+a,mn , − ω ko ω no [γ p+c;p+a,n + γ p+a;p+c,n ], A.2. Derivations of (2.6), (2.8) and (2.9) 16 p+n = ω ko γ o,mn . Derivation of (2.6). As shown in Appendix 1, β j,k = δ jk . Now expanding Γ(θ̂, λ̂) around (0, θ0 ), 0 = Γj (θ̂, λ̂) = B j + β j,k (η̂ k − η0k ) + B j,k (η̂ k − η0k ) + 1 j,kl k β (η̂ 2 + 1 j,klm k β (η̂ 6 + 1 j,klmn k β (η̂ 24 − η0k )(η̂ l − η0l )(η̂ m − η0m )(η̂ n − η0n ) + 1 B j,klmn (η̂ k 24 − η0k )(η̂ l − η0l )(η̂ m − η0m )(η̂ n − η0n ) + Op (n−3 ). − η0k )(η̂ l − η0l ) + 12 B j,kl (η̂ k − η0k )(η̂ l − η0l ) − η0k )(η̂ l − η0l )(η̂ m − η0m ) + 16 B j,klm (η̂ k − η0k )(η̂ l − η0l )(η̂ m − η0m ) Converting the above expansion, η̂ j − η0j = −B j + B j,k B k − B j,k B k,lB l + 12 β k,lmB j,k B l B m − 1 j,kl β [B k B l 2 − B k,m B mB l − B l,q B q B k + 12 β k,mn B m B n B l + 12 β l,qsB q B s B k ] − 1 j,kl k l B B B 2 + 61 β j,klm B k B lB m + Op (n−2 ) = −B j + B j,k B k − 12 β j,klB k B l − B j,k B k,l B l + 12 β k,lm B j,k B lB m + β j,kl B k,m B m B l − 1 j,kl k,mn m n l β β B B B 2 − 12 B j,kl B k B l + 16 β j,klm B k B l B m + Op (n−2 ), where j, k, l, m, ∈ {1, 2, · · · , r + p}. Thus, we have established (2.6). From (2.6), we have the following expansions for λ̂ and θ̂: λ̂j = −B j + B j,q B q − 21 β j,uq B u B q − B j,u B u,q B q + 12 β u,qsB j,u B q B s + β j,uq B u,s B s B q (A.7) − 1 j,uq u,st s t q β β B BB 2 − 12 B j,uq B u B q + 16 β j,uqs B u B q B s + Op (n−2 ), and θ̂k = −B r+k + B r+k,q B q − 12 β r+k,uq B u B q − B r+k,u B u,q B q (A.8) + 1 u,qs r+k,u q s β B B B 2 − 1 r+k,uq u q B B B 2 + β r+k,uq B u,s B s B q − 12 β r+k,uq β u,st B s B tB q + 16 β r+k,uqs B u B q B s + Op (n−2 ). Derivation of (2.8) and (2.9. We shall expand each term on the right of (2.7). By 17 ignoring terms of Op (n−5/2 ), the first term λ̂T n−1 n X wi (θ̂) = λ̂j n−1 i=1 n X [wij (θ0) + i=1 j ∂wij (θ0 ) k θ̂ + ∂θk j 2 1 ∂ wi (θ0 ) k l θ̂ θ̂ 2 ∂θ k ∂θ l + j 3 1 ∂ wi (θ0 ) k l m θ̂ θ̂ θ̂ ] k 6 ∂θ ∂θ l ∂θ m j = λ̂ A + γ j,k λ̂j θ̂k + λ̂j θ̂k C j,k + 12 γ j,kl λ̂j θ̂k θ̂l + 12 λ̂j θ̂k θ̂l C j,kl + 61 γ j,klm λ̂j θ̂k θ̂l θ̂m . Similarly, the second term T λ̂ n −1 n X wi (θ̂)wi (θ̂)T λ̂ i=1 j h −1 = λ̂ λ̂ n n X i=1 j i {wij (θ0 )wih (θ0 ) ji ji j i l + = λ̂ λ̂ (A + δ ) + λ̂ λ̂ θ̂ {(C ∂wij (θ0) l h wi (θ0) θ̂ [2, j, h] l j;i,l ∂θ +γ j;i,l + 12 wij (θ0) ∂ 2 wih (θ0 ) } ∂θ l ∂θ k )[2, j, i]} + 12 λ̂j λ̂i θ̂l θ̂k {(C j;i,lk + γ j;i,lk )[2, j, i]} + λ̂j λ̂i θ̂l θ̂k {(C j,l;i,k + γ j,l;i,k } = λ̂j λ̂j + λ̂j λ̂i Aji + γ j;i,l[2, j, i]λ̂j λ̂i θ̂l + C j;i,l[2, j, i]λ̂j λ̂i θ̂l + 1 j;i,lk γ [2, j, i]λ̂j λ̂i θ̂l θ̂k 2 + γ j,l;i,k λ̂j λ̂i θ̂l θ̂k + Op (n−5/2 ) = λ̂j λ̂j + λ̂j λ̂i Aji + 2γ j;i,l λ̂j λ̂i θ̂l + 2C j;i,l λ̂j λ̂i θ̂l + (γ j;i,lk + γ j,l;i,k )λ̂j λ̂i θ̂l θ̂k + Op (n−5/2 ) where [2, j, i] indicates there are two terms by swamping the super-scripts i and j, and the same is understood for similar notations. For the third term 2 −1 n 3 Finally, n−1 Pn i=1 [λ̂ Pn T i=1 [λ̂ T wi (θ̂)]3 = 2 j i h λ̂ λ̂ λ̂ (Ajih 3 + αjih ) + 23 {λ̂j λ̂i λ̂h θ̂k γ j;i;h,k [3, j, i, h]} = 2 j i h λ̂ λ̂ λ̂ (Ajih 3 + αjih ) + 2λ̂j λ̂i λ̂h θ̂k γ j;i;h,k + Op (n−5/2 ). wi(θ̂)]4 = λ̂j λ̂i λ̂h λ̂g αjihg + Op (n−5/2 ). We then have for a, b, c, d ∈ {1, 2, · · · , r − p}, f, g, h, i, j ∈ {1, 2, · · · , r}, k, l, m, n, o ∈ {1, 2, · · · , p} and q, s, t, u ∈ {1, 2, · · · , r + p} that n−1 l(θ̂) = −2B j Aj − B j B j + 2B j,q B q (Aj + B j ) − β j,uq B u B q (Aj + B j ) − 2B j,u B u,q B q (Aj + B j ) + β u,qs B j,u B q B s (Aj + B j ) − β j,uq β u,st B q B s B t(Aj + B j ) − B j,uq B u B q (Aj + B j ) + 13 β j,uqs B u B q B s (Aj + B j ) + 2β j,uq B u,s B s B q (Aj + B j ) + 2γ j,k {−B j,q B q B r+k + [ 12 β j,uq B u B q B r+k + B j,u B u,q B q B r+k 18 − 1 u,qs j,u q s r+k β B B B B 2 + 1 j,uq u q r+k B B B B 2 − β j,uq B u,s B q B s B r+k + 12 β j,uq β u,st B q B s B tB r+k − 61 β j,uqs B u B q B s B r+k − 12 β j,uq B u B q B r+k,s B s ][2, j, r + k] + B j,u B u B r+k,q B q + 14 β j,uq β r+k,stB u B q B s B t} + 2C j,k {B j B r+k − B j,q B q B r+k [2, j, r + k] + 12 β j,uq B u B q B r+k [2, j, r + k]} + γ j,kl {−B j B r+k B r+l + B r+k B r+l B j,q B q [3, j, r + k, r + l] − 1 j r+k r+l,uq u q B B β B B [3, j, r 2 + 1 j,klm j r+k r+l r+m γ B B B B 3 + k, r + l]} − C j,kl B j B r+k B r+l − B j,u B u B j,q B q − 14 β j,uq β j,st B u B q B s B t + β j,uq B u B q B j,s B s − B j B i Aji + B j B i,q B q Aji [2, j, i] − 12 β j,uq B u B q B iAji [2, j, i] + 2γ j;i,l {B j B i B r+l − B j B iB r+l,q B q + 21 β r+l,uq B j B iB u B q − B r+l B i B j,q B q [2, j, i] + 1 j,uq u q i r+l β B B B B [2, j, i]} 2 − 2 jih j i h α B BB 3 + 2B j B i B r+l C j;i,l − (γ j;i,lk + γ j,l;i,k )B j B iB r+l B r+k + 2αjih B j B iB h,q B q − αjih β j,uq B u B q B i B h − 23 Ajih B j B i B h + 2γ j;i;h,k B j B i B h B r+k − 21 αjihg B j B i B h B g + Op (n−5/2 ). Applying (A.1) and (A.4), it may be shown that the 3rd to the 18th terms on the right hand side cancel each other and the application of (A.4) simplifies the 20th term. Keep all the other terms, we have (2.8). Now bring in the expansion for `(θ0) in (2.5) we have n−1 r(θ0 ) = (Aj + B j )(Aj + B j ) − Aji (Aj Ai − B j B i ) − 2C j,k B j B r+k − 2γ j;i,l B j B iB r+l + 2 jih α [Aj AiAh 3 + B j B i B h ] + γ j,kl B j B r+k B r+l + Aji (Ahi Aj Ah − B i,q B q B j [2, i, j]) − β j,uq [C j,k B r+k − B r+k,s B s γ j,k + B j,s B s − AjiB i ]B u B q − 2αjih B j B i B h,q B q + B j,u B j,q B u B q + 2C j,k B j,q B r+k B q − γ j,kl B r+k B r+l B j,q B q − 2γ j,kl B j B r+l B r+k,q B q + 2γ j;i,l (B j B iB r+l,q B q + B r+l B iB j,q B q [2, j, i]) + ( 41 β j,uq β j,st − 12 β j,uq β r+k,st γ j,k )B u B q B s B t + 12 γ j,kl β j,uq B u B q B r+l B r+k + (γ j,kl β r+l,uq − γ i;j,k β i,uq − γ j;i,k β i,uq )B u B q B j B r+k − 2γ j;i;h,k B j B i B h B r+k + (γ j;i,lk + γ j,l;i,k )B j B iB r+l B r+k − 31 γ j,klm B j B r+k B r+l B r+m 19 + 1 jihg j i h g α B BB B 2 + (αjih β h,uq − γ j;i,l β r+l,uq )B j B iB u B q + C j,kl B j B r+k B r+l − 2C j;i,l B j B i B r+l + 23 Ajih (B j B i B h + Aj AiAh ) (A.9) − 2αjih Agh Aj Ai Ag + αjgf αihf Aj AiAh Ag − 12 αjihg Aj Ai Ah Ag + Op (n−5/2). Now the terms appeared on the 3rd line of the above equation cancel each other. To appropriate this, by applying the relationships implied by (A.5) together with (A.1) and (A.4), β j,uq [C j,k B r+k − B r+k,s B s γ j,k + B j,s B s − AjiB i ]B u B q = [β j,uq C j,k B r+k − β l,uq γ l,k B r+k,p+a B p+a − β l,uq γ l,k B r+k,r+m B r+m − β l,uq Alp+a B p+a − β p+b,uq Ap+bp+a B p+a + β l,uq B l,p+aB p+a + β p+b,uq B p+b,p+aB p+a + β p+a,uq B p+a,r+m B r+m ]B pB q = [β j,uq C j,k B r+k − β l,uq (B l,p+a − Al + β l,uq (B l,p+a − Al p+a p+a )B p+a − β l,uq C l,m B r+m )B p+a − β p+b,uq Ap+bp+a B p+a + β p+a,uq B p+b,p+aB p+a − β p+a,uq C p+a,m B r+m ]B u B q = 0. Applying again (A.5), we can express the terms appeared in the first two lines of (A.9) in term of As, that is (Aj + B j )(Aj + B j ) − Aji (Aj Ai − B j B i) − 2C j,k B j B r+k − γ j;i,lB j B i B r+l [2, j, i] + 32 αjih [Aj AiAh + B j B i B h ] + γ j,kl B j B r+k B r+l = AlAl − (Alk Al Ak + 2Al p+a Al Ap+a) + 2ω kl C p+a,k Ap+aAl − (γ p+a;p+b,k + γ p+b;p+a,k )ω kl Ap+a Ap+b Al + 32 αklm Ak AlAm + 2αkl p+a Ak AlAp+a + 2αl = AlAl − AklAk Al − 2Al p+a p+b p+a + 2ω kl C p+a,k Ap+aAl + [2αkl + 2[αl p+a p+b Al Ap+aAp+b − γ p+a,mn ω mk ω nl Ap+a Ak Al Ap+aAl + 23 αklm Ak Am Al p+a − γ p+a,mn ω mk ω nl ]Ap+a Ak Al − γ p+a;p+b,k ω kl ]Ap+aAp+b Al . Substitute these results into (A.9), we arrive at (2.9). 20 A.3. Expansion for R3 We subtract Rl2 Rl2 from all the terms appeared in line 4 and below in (2.9). Fortunately all the terms which do not have Al appeared cancel out with those appeared in Rl2Rl2 . Otherwise, a signed root decomposition of the EL ratio r(θ0) would not be possible. Hence the remaining terms can be written as 2Rl1 Rl3 . The pursuit for an expression of R3 is done by repeatedly employing the formulae (A.5) and (A.6) as well as (A.1), (A.2) and (A.4). For instance, the terms appeared in the 4th line of (2.9) Aji Ahi Aj Ah + B j,u B j,q B u B q − 2Aji B i,q B q B j + 2C j,k B j,q B r+k B q = Akl Akm Am Al + 2Akl Ak + Ak p+a Al p+a p+a Ap+aAl + 2Al p+a Ap+a p+b Ap+b Al Ak Al + Ap+al Ap+bl Ap+aAp+b + ω ml ω nl C p+a,m C p+b,n Ap+a Ap+b + ω nl ω km C p+a,n C p+a,k AmAl − 2ω mk C p+b,m Ap+a k Ap+b Ap+a − 2ω mn ω kl C n,k C p+a,mAp+a Al − 2ω kl C p+a,k Ap+a p+b Ap+b Al − 2ω ml ω kn C p+a,k C p+a,m An Al; and the terms in the fifth line −γ j,kl B r+k B r+l B j,q B q − 2γ j,kl B j B r+l B r+k,q B q = −γ m,kl B p+a B r+k B r+l B m,p+a − γ p+b,kl B p+aB r+k B r+l B p+b,p+a − γ p+b,kl B r+k B r+l B r+n B p+b,r+n − 2γ p+b,kl B p+a B p+b B r+l B r+k,p+a − 2γ p+a,kl B p+a B r+l B r+m B r+k,r+m = γ m,kl ω kn ω lo ω vm C p+a,v Ap+aAn Ao + γ p+b,kl ω kn ω lo Ap+b p+a Ap+a An Ao + γ p+b,kl ω kn ω lo ω mv C p+b,m An Ao Av − 2γ p+b,kl ω kn ω lo ω vn C p+a,v Ap+a Ap+b Ao + 2γ p+b,kl ω kn ω lo An p+a Ap+a Ap+b Ao + 2γ p+a,kl ω kv ω ln ω mo C v,m Ap+aAn Ao , and so on for the other terms. It may be shown after some quite involved algebra that that Rl3 = P6 i=1 Rl31 = Rl3i where 3 lm km k A A A 8 + 31 Alkm Ak Am − 5 lkm nm k n α A A A 12 21 − 5 knm lm k n α A A A 12 Rl32 = Alk p+a + 49 αlkn αomn Am Ak Ao − 14 αlknm Am Ak An , Ap+a Ak + Al p+a p+b Ap+a Ap+b − 21 ω kn ω ml C p+a,km Ap+a An − ω kl C p+a;p+b,k Ap+a Ap+b + 21 ω km C p+a,k Alm Ap+a + 12 Alk Ak + Alp+a Ap+a p+b Ap+a Ap+b + 21 Alp+aAp+ak Ak − 21 ω km ω nl C p+a,nC p+a,k Am − ω kl ω mn C n,k C p+a,mAp+a − ω kl C p+a,k Ap+a Rl33 = p+a 1 m,vo vn ol km [γ ω ω ω 2 p+b Ap+b , − 23 αnml ω km ]C p+a,k Ap+a An + 12 γ p+b,ko ω kn ω ol ω mv C p+b,m An Av + [ 12 γ p+b,ko ω kn ω ol − αlnp+b ]Ap+b p+a Ap+aAn + ω vk [ω nl (γ k;p+a,n + γ p+a;k,n ) − αlkp+b − 12 γ p+b,mn ω ml ω nk ]C p+a,v Ap+a Ap+b + γ p+a,ko ω on ω ml ω kv C v,m Ap+aAn + [ 12 γ p+b,mk ω ml ω kn − αlnp+b ]An p+a Ap+a Ap+b , Rl34 = γ p+a;p+b,n ω no ω ml C o,mAp+a Ap+b − αp+a + [(γ p+c;p+a,n + γ p+a;p+c,n )ω nl − 2αl p+b p+c p+a p+c Al ]Ap+c p+c p+b Ap+a Ap+b + (γ p+c;p+a,o + γ p+a;p+c,o )ω on ω kl C p+c,k Ap+aAn − αlk + αp+a p+b p+c − [ 32 αko − [αm p+a α kf p+a Am p+a Ak Am Ap+a Ak p+a p+b Alp+b Ap+a Ak + γ p+a;p+b,k ω km ]AlmAp+a Ap+b , p+a mn p+a + [2αp+a p+a + 41 γ p+a,mn ω mk ω no ]AloAp+a Ak − 2αk p+a p+b Rl35 = [ 12 αlk ω ml C p+c,m Ap+a Ap+b − 32 αlkm Am Ap+aAp+b − 81 ω l l ω m m ω n n ω k k γ p+a,l m γ p+a,n k ]Am An Ak 0 αlmf − αlkm 0 p+a 0 − 13 αlmn (αkn − 0 0 0 0 0 1 m 0 m k 0 k l0 l ω ω ω (ω l v γ p+a,m l γ v,l k 2 − 1 [3αlk p+a p+b 2 + 32 αklv (αv 0 0 0 0 0 − 21 γ p+a,m n ω m k ω n n ) 0 0 p+a 0 0 0 0 0 − 13 γ p+a,m k l )]Ap+aAk Am p+a p+b − 21 γ p+a;p+b,n ω nv ) + (αkvp+a − 12 γ p+a,mn ω mk ω nv )(αlvp+b − 12 γ p+b,m n ω m l ω n v )]Ap+aAp+b Ak 0 0 + [αl × (αk p+a f αp+b p+a p+b Rl36 = {2αl − p+c f − αl p+a p+b p+c − (αlk p+c − γ p+a;p+b,v )ω vk ]Ap+aAp+b Ap+c p+a f αmp+bf + αlmf αp+a 0 1 (γ p+c;p+a,m 2 p+b f 0 0 0 − 12 γ p+c,mn ω ml ω nk ) and 0 0 + ω m m ω n l [ 12 ω ok ω vk γ p+a,om γ p+b,vn 0 0 0 + γ p+a;p+c,m )(γ p+c;p+b,n + γ p+b;p+c,n ) 22 0 0 0 0 − ω ko γ p+b,n k (γ o,m ;p+a + γ o;p+a,m ) − 12 αp+a − 1 ko o,m0 n0 p+a;p+b,k ω γ γ 2 p+b p+c p+c,m0 n0 γ + 21 γ p+a;p+b,m n + 21 γ p+a,n;p+b,m ]}Ap+aAp+b Am . 0 0 0 A.4. Proof of Theorems 1 and 2. The proof of Theorem 1 is divided into two parts. In the first part, we derive the √ cumulants of nR. In the second part, we establish an Edgeworth expansion for the signed root which then leads to an Edgeworth expansion for the EL ratio r(θ0 ). Cumulants of the signed root R. Since the cumulants of order higher than four are of O(n−2 ) or smaller, we only need to derive the first four cumulants. As the first and the third cumulants are easier to derive than the second and the fourth, we present them first. From (2.10) and (2.11), and the fact that Rj3 is the product of four zero-mean averages, we have E(Rl1 ) = 0, E(Rl2 ) = n−1 µl and E(Rj3 ) = O(n−2 ) where µl = − 16 n−1 αlkk . Therefore, the first order cumulant is cum(Rl ) = n−1 µl + O(n−2 ). (A.10) The joint third-order cumulants cum(Rl , Ro , Rv ) = E(Rl Ro Rv ) − E(Rl )E(Ro Rv )[3] + 2E(Rl )E(Ro )E(Rv )[3] = E(Rl1 Ro1 Rv1 ) + E(Rl2 Ro1 Rv1 )[3] − E(Rl2 )E(Ro1 Rv1 )[3] + O(n−3 ). We note that E(Rl1 Ro1) = n−1 δ lo and E(Rl2 Ro1) = n−2 [− 12 (αlokk − δ lo ) − αlo + (αlk p+a p+a p+a − 21 ω mk ω nl γ p+a,mn )αok + 13 αlkm αokm + ω kl γ p+a;o;p+a,k p+a + (αl p+a p+b − ω kl γ p+a;p+b,k )αo p+a p+b ]. Write Rl2 = Rl21 + Rl22 where Rl21 = − 21 Akl Ak + 31 αklm Ak Am and Rl22 contains the rest of the terms appeared in (2.11). We have E(Rl21 ) = − 16 n−1 αlkk , 23 E(Rl22 Ro1 Rv1 ) = E(Rl22 )δ ov + O(n−3 ) = O(n−3 ) and E(Rl21 Ro1 Rv1 ) = n−2 (− 16 αlkk δ ov − 13 αlov ) + O(n−3 ). Thus, E(Rl2 Ro1Rv1 ) = E(Rl2 )E(Ro1 Rv1 ) − 13 E(Rl1 Ro1 Rv1 ) + O(n−3 ), (A.11) which means that cum(Rl , Ro , Rv ) = O(n−3 ). (A.12) To compute the second cumulants, we have to derive the expectation involving the following 21 terms in Rl2 Rl2: (1) 14 Akl Ak Aml Am ; (2) Al p+a Al p+b Ap+a Ap+b; (3) 91 αklm αnlo Ak Am An Ao ; (4) ω kl ω ml C p+a,k C p+b,m Ap+a Ap+b; (5) [αkl (6) [αl − 21 γ p+a,mn ω mk ω nl ][αol p+a p+a p+b (7) Akl Al p+a (10) −[αkl (11) −[αl − γ p+a;p+b,k ω kl ][αl Ap+a Ak ; p+a p+b − 12 γ p+b,m n ω m o ω n l ]Ap+a Ap+b Ak Ao ; p+c p+d 0 0 0 0 − γ p+c;p+d,o ω ol ]Ap+a Ap+bAp+c Ap+d ; (8) − 31 αklm Anl An Ak Am ; (9) −ω kl C p+a,k Aml Ap+aAm ; − 21 γ p+a,mn ω mk ω nl ]Ap+a AolAk Ao ; p+a p+b − γ p+a;p+b,k ω kl ]AmlAp+a Ap+b Am ; (12) − 32 αklm Alp+a Ap+a Ak Am ; (13) −2ω kl C p+a,k Alp+b Ap+a Ap+b ; (14) −2[αkl (15) −2[αl p+a − 21 γ p+a,mn ω mk ω nl ]Alp+bAp+b Ap+a Ak ; p+a p+b − γ p+a;p+b,k ω kl ]Alp+c Ap+c Ap+a Ap+b ; (16) 32 αnlm ω kl C p+a,k Ap+aAn Am ; (17) 23 αolv [αkl p+a − 21 γ p+a,mn ω mk ω nl ]Ap+aAk Ao Av ; (18) 23 αklo [αl p+a p+b (19) 2ω ol [αkl p+a (20) 2ω ol [αl (21) 2[αkl − 21 γ p+a,mn ω mk ω nl ]C p+b,o Ap+bAp+a Ak ; p+a p+b p+c − γ p+a;p+b,n ω nl ]Ap+aAp+b Ak Ao ; − γ p+a;p+b,k ω kl ]C p+c,oAp+c Ap+a Ap+b ; − 21 γ p+c,mn ω mk ω nl ][αl p+a p+b − γ p+a;p+b,o ω owl ]Ap+aAp+b Ap+c Ak . 24 Denote “[2]” for the permutation of l and o, the expectations for the above 21 terms are respectively (1) 14 [αlkk αomm + αlkm αokm + αlokk − δ lo ]; (2) αlo p+a p+a + αl p+a p+a o p+b p+b + αl α p+a p+b o p+a p+b α ]; (3) 19 [αlkk αonn + 2αlkm αokm ]; (4) ω kl ω mo [γ p+a,k;p+a,m + γ p+a,k;p+a γ p+b,m;p+b + γ p+a,k;p+b γ p+b,m;p+a ]; (5) [αkl (6) [αl p+a − 21 γ p+a,mn ω mk ω nl ][αko,p+a − 12 γ p+a,m n ω m k ω n o ]; 0 0 p+a p+a − γ p+a;p+a,k ω kl ][αo p+c p+c 0 0 − γ p+c;p+c,v ω vo ] +[αl p+a p+b − γ p+a;p+b,k ω kl ][αo p+a p+b − γ p+a;p+b,v ω vo ] +[αl p+a p+b − γ p+a;p+b,k ω kl ][αo p+a p+b − γ p+b;p+a,v ω vo ]δ adδ bc ]; (7) 21 [αlkk αo p+a p+a + αlk p+a αok p+a ][2]; (8) − 31 [αlkk αonn + 2αlkm αokm ]; (9) − 12 {ω kl [γ p+a,k;p+a αomm + γ p+a,k;m αom (10) − 12 {[αkl (11) − 12 {[αl p+a p+a p+a (15) −[αl p+a − γ p+a;p+a,k ω kl ]αomm }[2]; − γ p+a;p+b,k ω kl ][αo (16) 13 {αomm ω kl γ p+a,k;p+a }[2]; (19) ω vo [αkl (20) ω vo [αl p+b p+b − 12 γ p+a,mn ω mk ω nl ]αok p+a p+b p+a ]}[2]; − 21 γ p+a,mn ω mk ω nl ]αp+ako }[2]; (13) −ω kl [γ p+a,k;o;p+a + γ p+a,k;p+aαo (14) −[αkl p+a p+a (12) − 31 {αlkk αo + γ p+a,k;p+b αo p+a p+a p+a p+b }[2]; ][2]; }[2]; p+c p+c ab δ + 2αo (18) 31 αokk [αl p+a p+b p+a p+a ][2]; − γ p+a;p+a,nω nl ][2]; − 21 γ p+a,mn ω mk ω nl ]γ p+a,v;k [2]; p+a p+b − γ p+a;p+b,k ω kl ][γ p+c,v;p+c δ ab + γ p+b,v;p+a + γ p+a,v;p+b ][2]; and the expectation of terms (17) and (21) are of O(n−3 ). Now, let J1lo be the sum of the expectations of the terms (1)-(3) and (8) multiplied by n2 , J2lo be that of (5), (7), (10) and (14), J3lo be that of (11), (12), (16) and (18), J4lo be that of (6) and (20), J5lo be that of (13) and (15) and finally J6lo be that of (9) and (19). Extensive 25 derivations show that, with a remainder terms of O(n−1 ), J1lo = 1 (αlkk αomm 4 + αl = p+a p+b o p+a p+b α 1 (αlokk 4 + αl p+a p+a = −αlk α + αl p+a p+a o p+b p+b α + αlo p+a p+a + αl p+a p+a o p+b p+b α , + αlk p+a p+a ok p+a α − 12 γ p+a,mn ω mk ω nl ]αok p+a ok p+a + 12 αlkk αo − 21 γ p+a,m n ω m k ω n o ] 0 0 ][2] − 21 {[αkl p+a p+a p+a p+a 0 0 − 21 γ p+a,mn ω mk ω nl ]αp+ako }[2] }[2] [2] + 41 γ p+a,mn ω mk ω nl αok p+a [2] 1 p+a,mn p+a,m0 n0 mk nl m0 k n0 o γ γ ω ω ω ω , 4 J3lo = − 12 {[αl + 7 lkm okm α α 36 − − 21 γ p+a,mn ω mk ω nl ][αko 1 [αlkk αo p+a p+a 2 p+a p+a + 19 (αlkk αonn + 2αlkm αokm ) − 13 [αlkk αonn + 2αlkm αokm ] 1 lkk omm α α 36 α − [αkl + − δ lo ) + p+a p+b o p+a p+b J2lo = [αkl + + αlkm αokm + αlokk − δ lo) + αlo p+a p+a − γ p+a;p+a,k ω kl ]αomm }[2] − 31 αlkk αo 1 omm kl p+a;p+a,k α ω γ [2] 3 = − 21 αl p+a p+a omm α + 31 αokk [αl p+a p+a p+a p+a − γ p+a;p+a,n ω nl ][2] [2] + 21 γ p+a;p+a,k ω kl αomm [2], J4lo = [αl p+a p+a − γ p+a;p+a,k ω kl ][αo p+b p+b − γ p+b;p+b,v ω vo ] + [αl p+a p+b − γ p+a;p+b,k ω kl ][αo p+a p+b − γ p+a;p+b,v ω vo ] + [αl p+a p+b − γ p+a;p+b,k ω kl ][αo p+a p+b − γ p+b;p+a,v ω vo ] + ω vo [αl = αl p+a p+b [2] − γ p+a;p+b,k ω kl ][γ p+c;p+c,v δ ab + γ p+a;p+b,v + γ p+b;p+a,v ][2] p+a p+a o p+b p+b α + 2αl p+a p+b αo p+a p+b − γ p+a;p+a,k γ p+b;p+b,v ω vo ω kl + γ p+a;p+b,k γ p+a;p+b,v ω kl ω vo + γ p+a;p+b,k γ p+b;p+a,v ω kl ω vo − γ p+a;p+b,k ω kl ω vo [γ p+a;p+b,v + γ p+b;p+a,v ][2] = al p+a p+a o p+b p+b α + 2αl p+a p+b o p+a p+b α − [γ p+a;p+a,k γ p+b;p+b,v + γ p+a;p+b,k γ p+a;p+b,v + γ p+a;p+b,k γ p+b;p+a,v ]ω kl ω vo , J5lo = −ω kl [γ p+a,k;o;p+a + γ p+a,k;p+a αo − [αl p+a p+b − γ p+a;p+b,k ω kl ][αo p+b p+b + γ p+a,k;p+b αo p+c p+c ab δ + 2αo 26 p+a p+b p+a p+b ][2] ][2] = −ω kl γ o;p+a;p+a,k [2] − 2αl + γ p+a;p+b,k αo p+a p+b p+a p+a o p+b p+b α − 4αl p+a p+b o p+a p+b α ω kl [2, l, o] and J6lo = − 12 {ω kl [γ p+a;p+a,k αomm + γ m;p+a,k αom = 1 kl m;p+a,k om p+a ω γ α [2] 2 p+a ]}[2] + ω vo [αlk p+a − 12 γ p+a,mn ω mk ω nl ]γ k;p+a,v [2] − 12 ω kl γ p+a;p+a,k αomm [2] − 21 γ p+a,mn γ k;p+a,v ω mk ω nl ω vo [2]. Then, we have J2lo + J3lo + J6lo = −αlk p+a ok p+a α 1 p+a,mn mk nl ok p+a γ ω ω α [2] 4 + + 21 ω kl γ m;p+a,k αom p+a [2] + 14 γ p+a,mn γ p+a,m n ω mk ω nl ω m k ω n o − 12 γ p+a,mn γ k;p+a,v ω mk ω nl ω vo [2] 0 0 0 0 and J1lo + J4lo + J5lo + E(term (4)) = 14 (αlokk − δ lo) + − αl p+a p+b αo p+a p+b 1 lkk omm α α 36 − ω kl γ o;p+a;p+a,k [2] + γ p+a;p+b,k αo p+a p+b − 7 lkm okm α α 36 + αlo p+a p+a ω kl [2] + [γ p+a,k;p+a,v − γ p+a;p+b,k γ p+a;p+b,v ]ω kl ω vo . Let J lo be the leading order term of the expected value of all the 21 terms multiplied by n2 . Then, J lo = 1 (αlokk 4 − αl − δ lo ) + p+a p+b o p+a p+b α + γ p+a;p+b,k αo + (A.13) 1 lkk omm α α 36 p+a p+b − αlk − 7 lkm okm α α 36 p+a ok p+a α + αlo + 12 ω kl γ m;p+a,k αom ω kl [2] + 14 γ p+a,mn ω mk ω nl αok 1 p+a,mn p+a,m0 n0 mk nl m0 k n0 o γ γ ω ω ω ω 4 p+a p+a p+a p+a [2] [2] − 12 γ p+a,mn γ k;p+a,v ω mk ω nl ω vo [2] − ω kl γ o;p+a;p+a,k [2] + [γ p+a,k;p+a,v − γ p+a;p+b,k γ p+a;p+b,v ]ω kl ω vo . In summary, we have (A.14) E(Rl2 Ro2 ) = n−2 J lo + O(n−3 ). Please note in the expressions for J lo and Jilo , there are terms like ω kl . Although the superscript l is repeated, it does not imply summation over that superscript. 27 We also need to compute E(Rl2 Ro1 ) + E(Rl3Ro1 ). It may be shown that, with remainder terms of O(n−1 ), n2 E[Rl31Ro1 ] = 5 lokk α 8 n2 E(Rl32 Ro1 ) = − 38 δ lo − 5 lo p+a p+a α 2 29 lkm okm α α 72 + αlk − p+a ok p+a α + 1 lo p+a kk p+a α α 2 + 1 km ω (γ p+a,k;o αlm p+a 2 − 1 km nl p+a,n;m p+a,k;o ω ω (γ γ 2 1 lok kmm α α , 72 + 12 αlok αk p+a p+a + αl p+a p+b o p+a p+b α − 12 ω ko ω ml γ p+a,km;p+a − ω kl γ p+a;p+a,k;o + γ p+a,k;p+a αlom ) − 21 ω nl ω ko γ p+a,n;p+a,k + γ p+a,n;o γ p+a,k;m ) + αlo p+a p+a p+b p+b α − ω kl ω mn (γ n,k;p+a γ p+a,m;o + γ n,k;o γ p+a,m;p+a ) − ω kl (γ p+a,k;p+b αo n2 E(Rl33 Ro1 ) = −αlk p+a p+b p+a ok p+a α + γ p+a,k;o αp+a − αlop+b αp+a − 1 km p+a,k;p+a oml ω γ α 3 + 1 kn vl mo p+b,kv p+b,m;n ω ω ω γ γ 2 − ω vk αlk p+a p+a,v;o γ n2 E(Rl34 Ro1 ) = −4αl − αlo p+a p+b αo p+a kk p+a α ), + 12 ω vo ω nl ω km γ m,vn γ p+a,k;p+a + 21 ω ko ω nl γ p+b,kn αp+a p+a p+b + 12 ω ko ω nl ω mv γ p+b,kn γ p+b,m;v + ω vk ω nl (γ k;p+a,n + γ p+a;k,n )γ p+a,v;o 1 ml kn p+a,mk on p+a ω ω γ α 2 + p+a p+b p+b p+b p+a p+b − αlo + ω no ω ml ω kv γ p+a,kn γ v,m;p+a , − 53 αlom αm p+a p+a p+a p+a p+b p+b α + ω nl (γ p+b;p+a,n + γ p+a;p+b,n )αo − 27 αlk p+a ok p+a α + ω ml ω nk γ k,m;o γ p+a;p+a,n p+a p+b + ω no ω kl (γ p+b;p+a,n + γ p+a;p+b,n )γ p+b,k;p+a + ω ml γ p+b,m;o αp+a 1 mo nk p+a,mn lk p+a ω ω γ α 4 − n2 E(Rl35Ro1 ) = 1 lo p+a kk p+a α α 2 + αlk − ω km γ p+a;p+a,k αlom , p+a ok p+a α 1 nv lov p+a;p+a,n ω α γ 6 − 1 l0 l m0 o n0 k k 0 k p+a,l0 m0 p+a,n0 k 0 ω ω ω ω γ γ 8 − 3 mo nv m0 l n0 v p+a,mn p+a,m0 n0 ω ω ω ω γ γ , 8 p+a ok p+a α − 21 αlk − 32 αlo + n2 E(Rl36Ro1 ) = 2αlk + 2αl p+a p+b p+a ok p+a p+a p+b 28 α αo p+a p+a − 13 αlok αk p+a p+a + 41 ω mo ω nv αlvp+a γ p+a,mn [2] p+a p+b + αlok αk p+a p+b + αlo − p+a p+a p+b p+b α + 12 ω m o ω n l ω mk ω vk γ p+a,mm γ p+a,vn 0 1 m0 o n0 l p+b;p+a,m0 ω ω (γ 2 0 0 0 0 0 0 0 0 + γ p+a;p+b,m )(γ p+b;p+a,n + γ p+a;p+b,n ) 0 0 0 − ω m o ω n l ω km (γ m,m ;p+a + γ m;p+a,m )γ p+a,n k − 1 m0 o n0 l p+a p+a p+b p+b,m0 n0 ω ω α γ 2 + 1 m0 o n0 l p+a;p+a,m0 n0 ω ω (γ 2 n2 E(Rl2Ro1 ) = − 12 (αlokk − δ lo ) − αlo + αlk p+a ok p+a α − 21 ω m o ω n l ω km γ m,m n γ p+a;p+a,k 0 0 0 0 0 + γ p+a,n;p+a,m ), p+a p+a + 13 αlkm αokm + αl p+a p+b o p+a p+b α + ω kl γ p+a;o;p+a,k − 12 ω mk ω nl γ p+a,mn αok − ω kl γ p+a;p+b,k αo p+a p+b p+a . In summary, E(Rl2 Ro1 ) + E(Rl3 Ro1 ) = n−2 K lo + O(n−3 ) (A.15) where K lo = 1 (αlokk 8 + δ lo ) − 5 lkm okm α α 72 − 1 km p+a,k;o lm p+a ω γ α 2 + 1 nl km ω ω (γ p+a,k;o γ p+a,n;m 2 + 1 kn vl mo p+a,mv p+a,k;n ω ω ω (γ γ 2 − 1 lok kmm α α 72 − 12 αlok αk p+a p+a − 32 ω km γ p+a,k;p+a αlom + 41 ω ml ω nk αok − γ p+a,n;o γ p+a,k;m ) − γ p+a,kv γ p+a,m;n ) + ω mo ω vl ω kn (γ p+a,km γ p+a;n,v − γ p+a,kv γ p+a;n,m ) + 0 0 1 m0 l mo n0 k nk ω ω ω ω (γ p+a,mn γ p+a,m n 8 0 In light of (A.14) and (A.15), we have cum(Rl, Ro ) = n−1 δ lo + n−2 ∆lo + O(n−3 ) (A.16) where (A.17) ∆lo = K lo [2] + J lo − µl µo = 1 lokk α 2 + αlo p+a p+a − 13 αlkm αokm 29 0 − γ p+a,mm γ p+a,nn ). p+a p+a,mn γ − 1 lok kmm α α 36 − 1 km p+a,k;o lm p+a ω γ α [2] 2 − αlok αk + ω kl ( 21 γ m;p+a,k αom p+a p+a p+a − αl p+a p+b o p+a p+b α − αlk p+a ok p+a α − 32 ω km γ p+a,k;p+a αlom [2] + γ p+a;p+b,k αo + 1 ml nk ok p+a p+a,mn ω ω α γ [2] 2 + 1 nl km p+a,k;o p+a,n;m ω ω (γ γ 2 + 1 kn vl mo p+a,mv p+a,k;n ω ω ω (γ γ 2 p+a p+b − γ o;p+a;p+a,k )[2] + ω kl ω vo (γ p+a,k;p+a,v − γ p+a;p+b,k γ p+a;p+b,v ) − γ p+a,n;o γ p+a,k;m )[2] − 2γ p+a,kv γ p+a,m;n )[2] + ω mo ω vl ω kn (γ p+a,km γ p+a;n,v − γ p+a,kv γ p+a;n,m )[2] (A.18) + 0 0 1 m0 l mo n0 k nk ω ω ω ω (2γ p+a,mn γ p+a,m n 8 (A.19) + 1 p+a,mn p+a,m0 n0 mk nl m0 k n0 o γ γ ω ω ω ω 4 (A.20) − 1 p+a,mn k;p+a,v mk nl vo γ γ ω ω ω [2] 2 0 0 − γ p+a,mm γ p+a,nn )[2]. The joint fourth-order cumulants of R is cum(Rl , Rk , Rm , Rn ) = E(Rl Rk Rm Rn ) − E(Rl Rk )E(Rm Rn )[3] − E(Rl )E(Rk Rm Rn )[4] + 2E(Rl )E(Rk )E(Rm Rn )[6] − 6E(Rl )E(Rk )E(Rm )E(Rn ) (A.21) n l k m n l k m n = E(Rl1 Rk1 Rm 1 R1 ) + E(R2 R1 R1 R1 )[4] + E(R3 R1 R1 R1 )[4] n l k m n l k m n + E(Rl2 Rk2 Rm 1 R1 )[6] − E(R1 R1 ]E(R1 R1 )[3] − E(R2 R1 )E(R1 R1 )[12] l k m n n l k m n − E(Rl3 Rk1 )E(Rm 1 R1 )[12] − E(R2 R2 )E(R1 R1 )[6] − E(R2 )E(R1 R1 R1 )[4] n l k m n −4 − E(Rl2 )E(Rk2 Rm 1 R1 )[12] + 2E(R2 )E(R2 )E(R1 R1 )[6] + O(n ). From (A.11), we immediately have n k m n k m n −4 E(Rl2 ){E(Rk1 Rm 1 R1 )[4] + E(R2 R1 R1 )[12] − 2E(R2 )E(R1 R1 )[6]} = O(n ) which means the sum of the last three terms in (A.21) is negligible. To facilitate easy expressions, let us define t1 = αlkmn , t2 = δ lk δ mn + δ lm δ kn + δ ln δ km , 30 t3 = αlkm αnoo + αlkn αmoo + αlmn αkoo + αkmn αloo , t4 = αlko αmno + αlmo αkno + αlno αkmo , t5 = αkmn αl t6 = αlk p+a p+a p+a mn p+a α + αlmn αk p+a p+a + αlmo α + αlkn αm p+a kn p+a + αln p+a p+a + αlkm αn p+a km p+a α p+a p+a , . It is relatively easy to show that m n −3 −4 n l k E(Rl1 Rk1 Rm 1 R1 ) − E(R1 R1 )E(R1 R1 )[3] = n (t1 − t2 ) + O(n ). (A.22) n l k m n To derive E(Rl2Rk1 Rm 1 R1 )[4] − E(R2 R1 )E(R1 R1 )[12], we notice that − 12 {E(Alo Ao Ak AmAn )[4] − E(AloAo Ak ]E[AmAn )[12]} = n−3 I2 + O(n−4 ) where I2 = −6t1 + 2t2 − 12 t3 − 2t4, −{E(Alp+aAp+a Ak Am An )[4] − E(Al p+a Ap+aAk ]E[AmAn )[12]} = n−3 I3 + O(n−4 ), where I3 = −(t5 + 4t6 ), 1 lov α {E(AoAv Ak Am An )[4] 3 − E(AoAv Ak )E(Am An )[12]} = n−3 I4 + O(n−4 ), where I4 = 13 t3 + 83 t4 , ω ol {E(C p+a,oAp+aAk Am An )[4] − E(C p+a,o Ap+aAk )E(Am An )}[12] = n−3 I5 + O(n−4 ), where I5 = ω ol (γ p+a,o;p+a αkmn + γ p+a,o;k αmn [αlo p+a p+a + γ p+a,o;m αkn p+a + γ p+a,o;n αkm p+a )][4], − 12 γ p+a,m n ω m o ω n l ]{E(Ap+aAo Ak Am An )}[4] − E(Ap+aAo Ak )E(Am An )}[12] 0 0 0 0 = n−3 I6 + O(n−4 ), 0 0 0 0 where I6 = 4t6 − [ω m l ω n k γ p+a,m n αmn {[αl p+a p+b p+a ][6] and − γ p+a;p+b,o ω ol ]{E[Ap+aAp+b Ak Am An ]}[4] − E[Ap+aAp+b Ak ]E[AmAn ]}[12] = n−3 I7 + O(n−4 ), where I7 = t5 − γ p+a;p+a,o [ω olαkmn ][4]. 31 In summary, we have n l k m n E(Rl2 Rk1 Rm 1 R1 )[4] − E(R2 R1 )E(R1 R1 )[12] = n−3 {ω ol (γ p+a,o;k αmn (A.23) p+a + γ p+a,o;m αkn p+a + γ p+a,o;n αkm − 6t1 + 2t2 − 16 t3 + 23 t4 − [ω m l ω n k γ p+a,m n αmn 0 0 0 0 p+a p+a )][4] ][6]} + O(n−4 ). n l k m n l l Next we need to calculate E(Rl2 Rk2 Rm 1 R1 )[6] − E(R2 R2 )E(R1 R1 )[6]. Since R1 = A , the terms involved is closely related to the 21 terms in Rl2 Rk2 listed in Appendix 2. To simplify notations, let denote the expectation of each of the 21 terms as n−3 Ij+7 + O(n−4 ) for j = 1, · · · , 21. The expressions for Ij+7 are I8 = 3t1 − t2 + 32 t3 + 3t4, 0 0 I10 = 23 t3 + I9 = 4t6, 0 0 0 16 t, 9 4 0 I11 = [ω k l ω m k (γ p+a,m ;n γ p+a,k ;m + γ p+a,k ;n γ p+a,m ;m )][6], I12 = [(αlm + (αln p+a p+a − 12 γ p+a,ov ω om ω vl )(αkn − 12 γ p+a,ov ω on ω vl )(αkm 0 0 0 0 = 4t6 − 2{ω o n ω v k γ p+a,o v αlm + 0 0 0 0 1 {ω l l ω m m (ω n k ω k n 4 I13 = 0, p+a 0 p+a p+a − 12 γ p+a,o v ω o n ω v k ) 0 0 0 0 − 12 γ p+a,o v ω o m ω v k )][6] 0 0 0 0 }[6] 0 0 0 0 0 + ω n n ω k k )γ p+a,l m γ p+a,n k }[6], I14 = 3t5 + 4t6 , 16 t , 3 4 I15 = −2t3 − I16 = {−γ p+a,o;p+a(ω ol αkmn + ω ok αlmn )[6] − 21 [ω ol (γ p+a,o;mαkn + γ p+a,o;n αkm p+a ) + ω ok (γ p+a,o;m αln 0 0 0 0 I17 = −4t6 + 41 {γ p+a,m n [ω m m (ω n l αkn 0 0 + ω m n (ω n l αkm p+a 0 0 + ω n k αlm 0 0 0 p+a = −4t6 + [ω m m ω n l γ p+a,m n αkn p+a + γ p+a,o;n αlm 0 p+a + ω n k αln + ω ok (γ p+a,o;m αln p+a p+a p+a 0 0 + γ p+a,o;n αlm 0 0 32 ) ][6], + γ p+a,o;n αkm I21 = −8t6 + 12 γ p+a,m n [ω m m (ω n l αkn )][6]}, )]}[6] I18 = {−3t5 + γ p+a;p+a,o [ω olαkmn + ω ok αlmn ][6]}, I20 = {−[ω ol(γ p+a,o;m αkn p+a p+a p+a p+a p+a p+a I19 = −2t5 , ) )][6]}, 0 + ω n k αln p+a ) 0 0 + ω m n (ω n l αkm 0 p+a 0 + ω n k αlm 0 p+a 0 0 = −8t6 + 2[ω m m ω n l γ p+a,m n αkn )][6] p+a ][6], I23 = { 23 γ p+a,o;p+a(ω ok αlmn + ω ol αkmn )[6]}, I22 = 0, I24 = 0, I25 = {2t5 − 32 γ p+a,o;p+a(ω ok αlmn + ω ol αkmn )[6]}, I26 = {[ω ol(γ p+a,o;m αkn + ω ok (γ p+a,o;m αln p+a p+a 0 1 p+a,m0 n0 γ [(ω ol ω n k 2 − + γ p+a,o;n αkm + γ p+a,o;n αlm 0 p+a p+a ) )][6] 0 0 + ω ok ω n l )(ω m n γ p+a,o;m + ω m m γ p+a,o;n )][6]} and I27 = I28 = 0. In summary, we have n l k m n −3 E(Rl2Rk2 Rm 1 R1 )[6] − E(R2 R2 )E(R1 R1 )[6] = n −3 = n {3t1 − t2 + − 1 t 6 3 − 1 [ω ol (γ p+a,o;m αkn p+a 2 + ω ok (γ p+a,o;m αln 0 0 p+a 5 t 9 4 + [ω m0 m ω γ + γ p+a,o;n αkm + γ p+a,o;n αlm 0 n0 l p+a,m0 n0 p+a p+a 0 α 21 X Ij+7 + O(n−4 ) j=1 kn p+a ][6] ) )][6] 0 0 + [ω k l ω m k (γ p+a,m ;n γ p+a,k ;m + γ p+a,k ;n γ p+a,m ;m )][6] (A.24) − 0 1 p+a,m0 n0 γ [(ω ol ω n k 2 + 0 0 0 0 1 [ω l l ω m m (ω n k ω k n 4 0 0 0 + ω ok ω n l )(ω m n γ p+a,o;m + ω m m γ p+a,o;n )][6] 0 0 0 0 0 0 + ω n n ω k k )γ p+a,l m γ p+a,n k ][6]} + O(n−4 ). n l k m n Finally, we need to derive E(Rl3 Rk1 Rm 1 R1 )[4] − E(R3 R1 )E(R1 R1 )[12]. Due to the fact that E(Ap+a Al) = 0 for l ∈ {1, · · · , p}, there is no need to compute the terms in Rl3 involving Ap+a . In particularly, n l k m n −3 −4 E(Rl31Rk1 Rm 1 R1 )[4] − E(R31 R1 )E(R1 R1 )[12] = n I29 + O(n ) where I29 = 2t1 − 19 t4, n lp+a p+ao o k n −3 −4 E(Alp+a Ap+ao AoRk1 Rm A A R1 )E(Rm 1 R1 )[4] − E(A 1 R1 )[12] = n I30 + O(n ) where I30 = 4t6 , 33 n p+a,n n − 12 ω n l ω k m {E(C p+a,n C p+a,k Am Rk1 Rm C p+a,k Am Rk1 )E(Rm 1 R1 )[4] − E(C 1 R1 )[12]} 0 0 0 0 0 0 0 0 0 = n−3 I31 + O(n−4 ) 0 0 0 0 0 0 where I31 = {ω n l ω k k [γ p+a,k ;n γ p+a,n ;m + γ p+a,n ;n γ p+a,k ;m ]}[6], 0 0 0 1 k 0 n0 o0 l m0 v 0 p+a,k 0 o0 n ω ω ω γ {E(C p+a,m An Av Rk1 Rm 1 R1 )[4] 2 0 0 0 n − E(C p+a,m An Av Rk1 )E(Rm 1 R1 )[12]} = n−3 I32 + O(n−4 ), where I32 = 12 {γ p+a,m n (ω ol ω n k + ω ok ω n l )(ω m n γ p+a,o;m + ω m m γ p+a,o;n )}[6], 0 0 −αlk 0 p+a 0 {E(Am 0 0 p+a 0 0 0 0 Ak Am Ak Am An )[4] − E[Am 0 p+a 0 0 Ak Am Ak )E(Am An )[12]} = n−3 I33 + O(n−4 ), where I33 = −8t6 , [ 12 αlvp+a αoo 0 p+a − 81 ω l l ω m o ω n o ω k v γ p+a,l m γ p+a,n k ]{E(Ao Ao Av Ak Am An )[4] 0 0 0 0 0 0 0 0 0 0 0 −E(AoAo Av Ak )E(Am An )[12]} = n−3 I34 + O(n−4 ), where I34 = 4t6 − 14 {ω l l ω m m (ω n k ω k n + ω n n ω k k )γ p+a,l m γ p+a,n k }[6]. 0 0 0 0 0 0 0 0 0 0 In summary, we have n l k m n −3 E(Rl3Rk1 Rm 1 R1 )[4] − E(R3 R1 )E(R1 R1 )[12] = n −3 = n {2t1 − (A.25) 1 t 9 4 n0 l + [ω ω + 0 1 p+a,m0 n0 [γ (ω ol ω n k 2 − 0 0 0 0 1 [ω l l ω m m (ω n k ω k n 4 k0 k (γ p+a,k 0 ;n p+a,n0 ;m γ 0 +γ 6 X Ij+28 + O(n−4 ) j=1 p+a,n0 ;n p+a,k 0 ;m 0 γ )][6] 0 + ω ok ω n l )(ω m n γ p+a,o;m + ω m m γ p+a,o;n )][6] 0 0 0 0 0 0 + ω n n ω k k )γ p+a,l m γ p+a,n k ][6]} + O(n−4 ), Combining (A.22), (A.23), (A.24) and (A.25) it may be shown that (A.26) cum(Rl , Rk , Rm , Rn ) = O(n−4 ). Edgeworth Expansion for r(θ0). We first derive an Edgeworth expansion for the distribu1 tion of n1/2R. Let κj be the j-th order joint cumulant of n 2 R. From (A.10), (A.16), (A.12) and (A.26), κ1 = n−1/2 µ + O n−3/2 , κ2 = Ip + n−1 ∆ + O(n−3 ), 34 κ3 = O(n−3/2 ), κ4 = O(n−2 ), where Ip is the p × p identity matrix, µ = (µ1 , · · · , µp )τ with µl = − 16 αlkk and ∆ = ∆lo p×p . Let ŪA = ŪC = A1, · · · , Ar , A11, · · · , Arr , A111, · · · , Arrr τ , C 1,1, · · · , C 1,p, · · · , C r,1, · · · , C r,p, C 1;1,1, · · · , C r;r,p τ and Ū = (UAτ , UCτ )τ is a vector of means. From (2.10), (2.11) and the expansion for R3 given in Appendix A.3, the signed square root n1/2R can be expressed as a smooth function of U , namely there exists a smooth function h such that n1/2R = h(Ū ). We can then use the results given in Bhattacharya and Ghosh (1978) to formally establish Edgeworth expansion for the distribution of n1/2R under conditions (2.1). In particular, let B be a class of Boreal sets in Rp satisfying Z sup B∈B (∂B) φ(v)d v = O(), ↓ 0, where ∂B and (∂B) are the boundary of B and -neighborhood of ∂B respectively. A formal 1 Edgeworth expansion for the distribution function of n 2 R is sup |P (n B∈B 1/2 R ∈ B) − Z B π(v)φ(v)d v| = O(n−3/2 ), where π(v) = 1+n−1/2 µT v+ 12 n−1 {v T µµT + ∆ v−tr µµT + ∆ }, φ(v) is the p-dimensional standard normal density, and tr()˙ is the trace operation for square matrices. Let H = (hij )p×p =: µµT + ∆. By the symmetry of φ(v) we have 1 1 P {`(β) < cα} = P {(n 2 R)T (n 2 R) < cα } + O(n−3/2 ) = Z 1/2 kvk<cα π(v)φ(v)dv + O(n−3/2 ) = P (χ2p < cα) + 12 n−1 (A.27) +O(n−3/2 ) R 1/2 kvk<cα { Pp i=1 hii (vi2 − 1) + = α − p−1 Bc cα fp (cα ) n−1 + O(n−3/2 ) 35 P i6=j hij vi vj }φ(v)d v where Bc = Pp i=1 hii = Pp l=1 (µ l l µ + ∆ll). Due to fact that an even order Hermit polynomial is an even function, the error term in (A.27) is in fact O(n−2 ). This completes the proof. Proof of Theorem 2. Based on the Edgeworth expansion given in Theorem 1 and follow standard derivations, for instance those given in Chen (1993), we can readily establish the theorem. References Andrews, D. W. K. (2002): Generalized method of moments estimation when a parameter is on a boundary, Journal of Business and Economic Statistics, 20, 530-544. Baggerly, K. A. (1998): Empirical likelihood as a goodness-of-fit measure. Biometrika, 85, 535-547. BHATTACHARYA, R.N. and GHOSH, J.K. (1978): On the validity of the formal Edgeworth expansion. The Annals of Statistics, 6, 434–451. BHATTACHARYA, R.N. and RAO, R.R. (1976): Normal Approximation and Asymptotic Expansions. Wiley, New York. Brown, B. W. and Newey, W. K. (2002): “Generalized method of moments, efficient bootstrapping, and improved inference”, Journal of Business and Economic Statistics, 20, 507-517. CHEN S. X. (1993): On the coverage accuracy of empirical likelihood regions for linear regression model, The Annals of the Institute of Statistical Mathematics, 45, 621-637. CHEN, S. X. (1994). Empirical likelihood confidence intervals for linear regression coefficients. J. Multivariate Anal. 49, 24-40. DICICCIO, T. J., HALL, P. AND ROMANO, J. P. (1991): Empirical likelihood is Bartlett correctable. The Annals of Statistics, 19, 1053-1061. Donald, S. G., Imbens, G. W. AND Newey, W. K. (2003): Empirical likelihood estimation and consistent tests with conditional moment restrictions. Journal of Econo- 36 metrics, 117, 55-93. Hansen, B. E. and WEST, K. D. (2002): Generalized method of moments and macroeconomics, Journal of Business and Economic Statistics, 20, 460-469. Hansen, L. P. (1982): Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054. HALL, P. (1992): The bootstrap and Edgeworth Expansions Springer-Verlag. HALL, P. AND LA SCALA, B. (1990): Methodology and algorithms of empirical likelihood. International Statistical Reviews 58, 109–127. Imbens, G. W. (1997): One-step estimators for over-identified generalized method of moments models, Review of Economic Studies, 64, 359-383. Imbens, G. W. (2002): Generalized method of moments and empirical likelihood equations, Journal of Business and Economic Statistics, 20, 493-506. JING, B. Y. and WOOD, A. T. A. (1996). Exponential empirical likelihood is not Bartlett correctable. Ann. Statist. 24 365-369. KITAMURA, Y. (1997): Empirical likelihood methods with weakly dependent processes. The Annals of Statistics, 25, 2084-2102. KITAMURA, Y. (2001): Asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 69, 1661-1672. KITAMURA, Y. AND STUTZER, M. (1997): An information-theoretic alternative to generalized method of moments estimation. Econometrica, 65, 861-874. KITAMURA, Y., Tripathi, G. and Ahn, H. (2002): Empirical likelihood-based inference in conditional moment restriction models. Manuscript. Newey, W. K. and Smith, R. J. (2004): Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica to appear. OWEN, A. B. (1988): Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249. 37 OWEN, A. B. (2001). Empirical Likelihood. Chapman and Hall. QIN, J. AND LAWLESS, J. (1994): Empirical likelihood and general estimation equations, The Annals of Statistics, 22, 300-325. Smith, R. J. (1997): Alternative Semi-parametric likelihood approaches to generalized method of moments estimation, Economic Journal, 107, 503-519. SKOVGAARD,IB,M. (1981): Transformation of an Edgeworth expansion by a sequence of smooth functions. Scand. J. Statist. 8, 207-217. Tripathi, G. AND KITAMURA, Y. (2002): Testing conditional moment restrictions. Manuscript. Department of Statistics, Iowa State University, Ames, Iowa 50011-1210, USA Email: songchen@iastate.edu Department of Mathematics, Beijing Normal University, 100875, China Email: hjcui@bnu.edu.cn 38 Table 1. Empirical coverage (in percentage) and averaged length of the EL confidence interval Iα, the Bartlett corrected (BC) EL interval Iα,bc and the dircet Bootstrap (BT) calibrated confidence interval Iα,bt for the normal mean example. (a) N (0, 1) Nominal level Sample Size 20 30 40 60 90% 95% EL BC BT EL BCBT coverage 84.66 89.40 86.88 91.12 93.04 93.34 length 0.636 0.817 0.791 0.757 0.967 0.967 coverage 86.30 89.60 87.80 92.60 93.80 93.10 length 0.552 0.662 0.640 0.659 0.789 0.781 coverage 86.47 89.48 87.98 93.49 95.09 94.79 length 0.492 0.558 0.548 0.588 0.668 0.665 coverage 86.70 89.00 87.90 93.50 94.50 92.90 length 0.448 0.492 0.437 0.536 0.588 0.527 (b) N (1, 2) Nominal level Sample Size 20 30 40 60 90% 95% EL BC BT EL BC BT coverage 80.02 87.99 85.57 86.78 91.62 90.41 length 0.656 0.898 0.854 0.781 1.061 1.084 coverage 82.85 89.47 88.16 88.87 93.68 93.88 length 0.6547 0.7740 0.752 0.739 0.883 0.879 coverage 85.77 89.94 88.92 91.77 93.70 93.70 length 0.496 0.590 0.572 0.592 0.703 0.701 coverage 87.47 90.28 89.68 92.69 94.19 94.29 length 0.411 0.454 0.446 0.492 0.543 0.540 39 Table 2. Empirical coverage (in percentage) and averaged length of the EL confidence interval Iα and the Bartlett corrected (BC) EL interval Iα,bc for the panel data model (4.1) (a) θ = 0.5 Nominal level 90% 95% Sample Size EL BC BT EL BC BT 50 coverage 81.7 87.3 87.7 88.8 93.6 93.9 length 0.603 0.723 0.733 0.736 0.876 0.880 coverage 87.4 89.9 89.9 94.3 95.7 95.5 length 0.425 0.461 0.462 0.511 0.556 0.560 100 (b) θ = 0.9 Nominal level 90% 95% Sample Size EL BC BT EL BC BT 50 coverage 84.4 89.6 89.2 90.2 94.9 95.1 length 0.545 0.648 0.656 0.667 0.787 0.800 coverage 87.3 89.1 89.2 92.6 94.4 94.3 length 0.373 0.401 0.402 0.4494 0.4847 0.487 100 40

On the Second Order Properties of Empirical Likelihood with Moment Restrictions

Related documents

Products

Support

On the Second Order Properties of Empirical Likelihood with Moment Restrictions

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib