Some results of Composite Likelihoods Efficiency and Higher order asymptotics Zi Jin Department of Statistics, University of Toronto April 2008, The University of Warwick 1. Composite marginal likelihoods: Definition • Low-dimensional Marginal densities (Cox & Reid, 2004). • Univariate and bivariate distributions are often adopted, and this leads to the so-called pairwise likelihood. • For a single vector Y , the 1st and 2nd-order log-likelihoods contribute X l1 (θ; Y ) = log f (Ys ; θ), (1) s l2 (θ; Y ) = X log f (Ys , Yt ; θ) − al1 (θ; Y ). s>t a is a pre-specified constant. a = 0: taking all possible bivariate dist’ns and leading to the pairwise log-likelihood. a = 1/2: taking all possible conditional dist’ns of one component given another. (2) 2. Composite score functions and estimators Composite score functions For n i.i.d. vectors Y (1) , Y (2) , . . . , Y (n) the joint composite log-likelihoods are defined by addition: X (1) (2) (n) lν (θ; Y (i) ), ν = 1, 2. lν (θ; Y , Y , . . . , Y ) = i Composite score functions are defined in the usual way by the composite likelihood derivatives: Uν (θ; Y (1) , Y (2) , . . . , Y (n) ) = ∂lν (θ; Y (1) , Y (2) , . . . , Y (n) )/∂θ, ν = 1, 2. Maximum Composite likelihood estimator The estimating equations Uν (θ; Y (1) , Y (2) , . . . , Y (n) ) = 0 give out the maximum composite likelihood estimator θ̃. 3. Efficiency of Composite likelihoods Intraclass Correlation Normal Y = (Y (1) , . . . , Y (n) )T , and Y (1) , . . . , Y (n) ∼ i.i.d.Nq (µ, Σ) where 2 2 T Σ = σ ((1 − ρ)I + ρJ)) and θ = (µ, σ , ρ) . e • Full log-likelihood function of Y : l(θ, Y ) • Pairwise log-likelihood function: l2 (θ ; Y ) = e q XX i s>r log f (Yr(i) , Ys(i) ; θ) e • Pseudo log-likelihood function: l∗ = l2 − aql1 θ̂ = θ̃ = θ ∗ = (ȳ.. , 1 ȳ.. = nq S1 2S2 , )T nq (q − 1)S1 (3) P Pq P Pq (i) (i) (i) (i) − ȳ.. )2 , and S2 = − ȳ.. )(Ys − ȳ.. ). s>r (Yr r=1 (Yr r=1 Yr , S1 = P Pq 3.1 Introclass Correlation Normal: Information Matrices Question: I(θ) = H(θ)J −1 (θ)H(θ)? Smth we need to obtain to show the equivalence: • obtain Σ̇σ2 , Σ̈σ2 , Σ̇ρ , Σ̈ρ , derivatives of Σ with respect to σ 2 and ρ; 2 ∂ l2 • H(θ) = E(− ∂θ∂θ T ), easy one; 2 ∂l2 • J(θ) = E( ∂l ∂θ ∂θ T ), difficult one.; • need to obtain H(θ), J(θ) for both pairwise and pseudolikelihood. 2Z Z For example, Vrs = Zr2 + Zs2 − ρ r s q X X X X X 1 2 Vrs Vr′ s′ (Vrs ) = 4 r s6=r ′ ′ ′ s>r r = r 6=s XX X XX X X X 1 XX ′ ′ rssr + rsrs + [ rsss + rsrs 4 r s6=r r s6=r s′ 6=r,s r s6=r s′ 6=s,r r s6=r XX X XX X XX X X ′ ′ ′ ′ + rsr r + rsr s + rsr s ] r s6=r r′ 6=s,r r s6=r s′ 6=s,r r s6=r r′ 6=s,r s′ 6=r,r′ ,s ∗ I = HJ −1 H = H ∗ J −1 H ∗ , regardless of the value of a. 3.2 Symmetric Normal (i) (i) Y (i) = (Y1 , . . . , Yq )T ∼ Nq (0, V ) = Nq (0, (1 − ρ)I + ρJ) e 0.90 0.85 0.80 ARE 0.95 1.00 e 0.0 0.2 0.4 0.6 0.8 1.0 rho FIG 1 Ratio of asympotic variance of rhohat to rhotilde for q=2,3,5,8,10. 3.3 Unrestricted Multivariate Normal • In Mardia 2007, conditional pairwise pseudolikelihood is defined as PPL = q n Y Y f (yji |yki ; Σ) i=1 j6=k and has been showed to be fully efficient for unrestricted multivariate normal distributions. • Let a = 1/2, log CR = log P P L. • yji yki = CY ∼ N (0, CΣC ′ ) e \′ = C Σ̂C ′ = (Σ̂)jk . if Σ unrestricted, Σ̂jk = CΣC 3.4 AR(1) The working full likelihood of AR(1) process is given by 2 L(a, σ ) = f (x1 ) T Y f (xt |xt−1 ) t=2 The associated log-likelihood is 1 T 1 log(1 − a2 ) − log σ 2 − 2 [S1 + a2 S2 − 2aS12 ] 2 2 2σ P P −1 2 P −1 where S1 = Tt=1 x2t , S2 = Tt=2 xt , S12 = Tt=1 xt xt+1 . l(a, σ 2 ) = (4) Pairwise likelihood for AR(1) We propose a composite likelihood formed by adjacent pairs, denoted by L2 T Y L2 = f (xt , xt−1 ) (5) t=2 where (Xt , Xt−1 ) follows a bivariate normal distribution with mean zero variances σ 2 /(1 − a2 ) and correlation a. 2 L2 (a, σ ) = f (x1 ) T Y f (xt |xt−1 ) t=2 T Y 2 f (xt ) = L(a, σ ) t=2 T Y f (xt ) t=2 where L(a, σ 2 ) is the full likelihood function. That is, 2 l2 (a, σ ) = l(a, σ) + T X f (xt ) (6) t=2 3.4.1 Max likelihood and pairwise likelihood estimation MLE 0 = σ̂ 2 = 1 2 S1 )S2 â3 − (1 − )S12 â2 − (S2 + )â + S12 T T T (S1 + â2 S2 − 2âS12 )/T (1 − (7) (8) For large T , it can be show â ∼ = S12 /S2 . MPLE ã = σ̃ 2 = For large T , â = ã. 2S12 , S1 + S2 2 (S1 + S2 )2 − 4S12 . 2(T − 1)(S1 + S2 ) (9) (10) 3.4.2 Simulations Table I Maximum likelihood estimatiors and Maximum pairwise likelihood estimators for AR(1) model Xt = 0.55Xt−1 + ǫt , with ǫ ∼ N (0, 0.12 ). AR MLE MPLE T=200 a 0.5651 0.5579 σ 0.0970 0.0096 T=100 a 0.5604 0.5556 T=20 a 0.4978 0.5037 σ 0.1006 0.0103 σ 0.0661 0.0046 Sample Variance Table II Sample covariance matrices of Full and Pairwise likelihood for AR(1) model Xt = 0.55Xt−1 + ǫt , with ǫ ∼ N(0, 0.12 ) with different lengths T . N=100,T=200 ratio ˆ θ̂) I( \ −1 H(θ̂) HJ „ 4.778e „ 1.186e 4.796e 2.801e 0.2017444 − 03 1.186e − 05 2.318e − 03 2.801e − 06 9.030e N=200,T=100 0.1973112 « 6.260e − 03 −1.958e − 05 4.058e − 05 « „ −1.958e − 05 6.380e − 03 −3.087e − 06 −3.087e − 06 1.568e − 06 « − 05 − 05 « − 06 − 07 „ N=100,T=20 ratio ˆ θ̂) I( \ −1 H(θ̂) HJ „ 3.758e „ 4.150e 3.646e 5.658e 0.1999862 − 02 4.150e − 04 2.417e − 02 5.658e − 05 9.470e N=10,T=20 « − 04 − 04 « − 05 − 06 N=100,T=3 ratio ˆ θ̂) I( \ −1 H(θ̂) HJ 0.2449904 « 0.2730 −0.0024 0.0012 « „ −0.0024 0.2545 0.0004 0.0004 0.00006 „ „ 7.102e „ 9.829e 6.551e 1.449e 0.2137193 − 02 9.829e − 04 2.537e − 02 1.449e − 04 1.254e « − 04 − 04 « − 04 − 05 N=10,T=3 0.1826142 « 1.5296e − 01 −5.0205e − 03 1.4909e − 03 « „ −5.0205e − 03 1.42323e − 01 1.1454e − 03 1.1454e − 03 5.3282e − 05 „ Relative Efficiency The efficiency of pairwise likelihood compared to full likelihood for AR(1) model can be obtained as |H(θ)J −1 (θ)H(θ)| 1/2 (11) |I(θ)| where θ = (a, σ 2 )′ is a 2-dimensional parameter vector. E{(Xi − µi )(Xj − µj )(Xk − µk )(Xl − µl )} = σij σkl + σik σjl + σil σjk Table III Efficiency of pairwise likelihood for AR(1) model Xt = 0.55Xt−1 + ǫt , with ǫ ∼ N (0, 0.12 ) with different lengths T . Efficiency T=100 0.2296845 T=20 0.4101251 T=8 0.5005665 T=4 0.5636764 Figure 1: Efficiency against T 0.4 0.3 0.2 Efficiency 0.5 AR=0.55, sigma=0.1 0 50 100 T 150 200 Figure 2: Given T = 4, Efficiency against a and σ. Efficiency for T=4 0.8 cy Efficien 0.6 0.4 0.2 1.5 sig ma 1.0 0.2 0.4 AR coe 0.6 ffic ien t 0.5 0.8 Figure 3: Given T = 20, Efficiency against a and σ. Efficiency for T=20 0.8 0.4 0.2 1.5 1.0 sig ma cy Efficien 0.6 0.2 0.4 AR coe 0.6 ffic ien t 0.5 0.8 4. Higher order asymptotics • Higher accuracy inference in terms of r̃ under simple null hypothesis under the first scenario. • Expansions for the expectation and variance of the composite signed likelihood ratio root are found to the order n−3/2 and n−2 , respectively. • Ideally we would expect that E(r̃) = m(θ) + O(n−3/2 ), V ar(r̃) = 1 + v(θ) + O(n−2 ) where m(θ) is of order O(n−1 ) and v(θ) is of order O(n−1 ). • Then, the standard normal approximation to the distribution of r̃ − E(r̃) {V ar(r̃)}1/2 has an error of order O(n−3/2 ). Expansion of r̃2 with asymptotic magnitude of order n−3/2 r̃ 2 = 1 rs tu vw u u u νrtv l̃s l̃u l̃w 3 1 rs tu vw xy zp rs tu vw u u νrtv νsxz l̃u l̃w l̃y l̃p +u u u Hrt Hsv l̃u l̃w + u u u 4 1 rs tu vw rs tu vw xy Hrtv l̃s l̃u l̃w +u u u u νrtv Hsx l̃u l̃w l̃y + u u u 3 1 rs tu vw xy −3/2 u u u u νrtvx l̃s l̃u l̃w l̃y + Op (n ) + 12 u rs l̃r l̃s + u rs tu u Hrt l̃s l̃u + • The leading term urs l̃r ˜ls is the composite score statistic. • The 1st order result of the composite likelihood ratio statistic E[r̃2 ] = µrs E[l̃r l̃s ] + O(n−1/2 ) = µrs νr,s + O(n−1/2 ). (12) Expansion of the expectation of r̃2/r̃ Similar as defined in Peter McCullagh(1987, Ch.7)[10], let r̃t = 1 l̃t + urs (3Hrt l̃s + uvw vrtv l̃s l̃w ) 6 1 rs vw h 27HrtHsv ˜lw + 8uxy uzp vrtv vsxz l̃w ˜ly l̃p + u u 72 i xy xy ˜ +30u vrtv Hsx l̃w l̃y + 12Hrtv l̃s lw + 3u vrtvx l̃s l̃w l̃y (13) It can be verified that while E(r̃t ) = = r̃2 = r̃t r̃u utu + Op (n−3/2 ) 1 rs u {3E(l̃rt l̃s ) + uvw vrtv E(l̃s l̃w )} + O(n−3/2 ) 6 1 rs u {3vrt,s + uvw vrtv vs,w } + O(n−3/2 ) 6 (14) (15) Expansion of the expectation of r̃2/r̃ with Bartlett Identities If the second Bartlett Identity holds, equation (15) can be simplified into E(r̃t ) = 1 rs u {3vrt,s + vrst } + O(n−3/2 ) 6 consistent with the results shown in Barndorff & Cox 1994. It suffices to write the variance of r̃ as V ar(r̃) = c{1 + v(θ)} + O(n−2 ) where v(θ) is of order O(n−1 ), computed directly with sufficient algebraic diligence. (16) Comments • For certain distributions, composite marginal likelihoods may yield full MLE and are fully efficient. • Loss of efficiency is, in general, not ignorable for curved exponential family, or restricted normal distributions. • Though from time to time we tend to overlook the fact that composite likelihood estimators are not consistent or asymptotically normally distributed especially under the second scenario, their asymptotic behaviors are still worth of further consideration. • Profile based inference for AR(1) model, as σ 2 can be treated as a nuisance parameter. Performance of r̃ and its higher order approximation. Arnold, B.C. and Strauss, D. (1991), Pseudolikelihood estimation: some examples, Sankhya: The Indian Journal of Statistics, 53, 233-243. Barndorff-Nielsen, O.E. and Cox, D.R. (1994), Inference and Asymptotics, Chap. Besag, J.E. (1974), Spatial interaction and the statistical analysis of lattice systems (with discussion), Journal of the Royal Statistical Society, B 34, 192-236. Cox, D.R. and Reid, N. (2004), A note on pseudolikelihood constructed from marginal densities, Biometrika, 91, 727-737. DiCiccio, T.J. and Stern, S.E. (1994), Constructing Approximately Standard Normal Pivots from Signed Roots of Adjusted Likelihood Ratio Statistics, The Scandinavian Journal of Statistics, 21, 447-460. Geys, H., Molenberghs, G. and Ryan, L. M. (1999), Pseudolikelihood Modeling of Multivariate Outcomes in Developmental Toxicology, Journal of the American Statistical Association, 94, 734-745. Kent, J.T. (1982), Robust Properties of Likelihood Ratio Test, Biometrika, 69, 19-27. Lindsay, B.G. (1988). Composite likelihood methods, Contemporary Mathematics, 80, 221-240. Mardia K.V., Hughes G. and Taylor C.C. (2007), Efficiency of the pseudolikelihood for multivariate normal and von Mises distributions, Research reports, STAT07-02, Department of Statistics, University of Leeds. McCullagh, P. (1987), Tensor Methods in Statistics, Chapman and Hall. Severini, T.A. (2000), Likelihood Methods in Statistics, Oxford University Press. Stern, S.E.(2006), Simple and accurate one-sided inference based on a class of M -estimators, Biometrika, 93, 973-987. Varin, C.(2007), On composite marginal likelihoods, preprint.