Two-term Edgeworth expansion of the distributions of the maximum likelihood estimators in factor analysis under nonnormality

Haruhiko Ogasawara
Otaru University of Commerce, Otaru 047-8501 Japan
E-mail: hogasa@res.otaru-uc.ac.jp

1. Estimators in factor analysis

The purpose of this paper is to give the two-term Edgeworth expansion of the distributions of the parameter estimators obtained by maximum likelihood (ML) in factor analysis, possibly with factor rotation, under nonnormality (for the Edgeworth expansion, see, e.g., Hall, 1992). Ogasawara (2005a) gave the corresponding results for various least squares estimators and for the normal-theory (NT) Studentized estimators under nonnormality. For the non-NT Studentized estimators by ML, Ogasawara (2004, September; 2005, January; 2005b) provided similar results up to order $n^{-1/2}$, where $n + 1 = N$ is the number of observations.

Let $\Sigma = \Sigma(\boldsymbol{\theta})$ be the $p \times p$ structured covariance matrix of observable variables in factor analysis:

$$\Sigma = \Sigma(\boldsymbol{\theta}) = \Lambda\Lambda' + \Psi, \tag{1}$$

where $\Lambda$ is the $p \times u$ loading matrix; $\Psi$ is the diagonal covariance matrix for unique factors; and $\boldsymbol{\theta}$ is the $q \times 1$ vector of parameters. When the factor analysis model holds for standardized variables,

$$\Sigma = D(\Lambda\Lambda' + \Psi)D, \qquad \mathrm{Diag}(\Lambda\Lambda' + \Psi) = I_p, \tag{2}$$

where $D = \mathrm{Diag}^{1/2}(\Sigma)$; $\mathrm{Diag}(\cdot)$ is a diagonal matrix taking the diagonal elements of an argument; and $I_p$ is the $p \times p$ identity matrix. In the case of exploratory factor analysis, restrictions for model identification are imposed on (1) or (2), which are generally given by

$$\Lambda'\frac{\partial f_R(\Lambda)}{\partial\Lambda} - \frac{\partial f_R(\Lambda)}{\partial\Lambda'}\Lambda = O \tag{3}$$

(Archer & Jennrich, 1973), where $f_R(\Lambda)$ is a rotation criterion to be optimized with respect to $\Lambda$ (e.g., the raw-varimax criterion). The Wishart ML estimator $\hat{\boldsymbol{\theta}}$ is given by minimizing $F(\boldsymbol{\theta}, S) = \mathrm{tr}(\Sigma^{-1}S) + \ln|\Sigma|$ with possible restrictions (3) (and (2)), where $S$ is the unbiased sample covariance matrix. Note that the estimators in factor analysis are functions of sample variances and covariances.
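Usually, however, they are implicit functions, which require some special treatment (see Section 3).

2. The Edgeworth expansion

Let $w = n^{1/2}(\hat{\theta} - \theta)$, where $\theta$ is a population parameter (an element of $\boldsymbol{\theta}$) and $\hat{\theta}$ is its estimator. We assume that the following cumulants are available:

$$\begin{aligned}
\kappa_1(w) &= E(w) = n^{-1/2}\alpha_1 + o(n^{-1/2}),\\
\kappa_2(w) &= E[\{w - E(w)\}^2] = \alpha_2 + n^{-1}\Delta\alpha_2 + o(n^{-1}),\\
\kappa_3(w) &= E[\{w - E(w)\}^3] = n^{-1/2}\alpha_3 + o(n^{-1/2}),\\
\kappa_4(w) &= E[\{w - E(w)\}^4] - 3\{\kappa_2(w)\}^2 = n^{-1}\alpha_4 + o(n^{-1}).
\end{aligned} \tag{4}$$

The parameter estimators, and hence their asymptotic cumulants in (4), are obtained from the first-order conditions

$$\frac{\partial\hat{G}}{\partial\hat{\boldsymbol{\eta}}} = \left(\frac{\partial\hat{F}}{\partial\hat{\boldsymbol{\theta}}'} + \hat{\boldsymbol{\xi}}'\frac{\partial\hat{\mathbf{h}}}{\partial\hat{\boldsymbol{\theta}}'},\ \hat{\mathbf{h}}'\right)' = \mathbf{0}, \qquad \hat{G} = \hat{F} + \hat{\boldsymbol{\xi}}'\hat{\mathbf{h}}, \tag{5}$$

where $\boldsymbol{\eta} = (\boldsymbol{\theta}', \boldsymbol{\xi}')'$; $\boldsymbol{\xi}$ is the vector of Lagrange multipliers; $\hat{F} = F(\hat{\boldsymbol{\theta}}, S)$; and $\mathbf{h} = \mathbf{h}(\boldsymbol{\theta})$ is the $r \times 1$ vector of restrictions ($\mathbf{h}(\hat{\boldsymbol{\theta}}) = \mathbf{0}$). Note that in exploratory factor analysis, (5) reduces to $(\partial\hat{F}/\partial\hat{\boldsymbol{\theta}}', \hat{\mathbf{h}}')' = \mathbf{0}$ with $\hat{G} = \hat{F}$, since the restrictions are for identification. Let $\boldsymbol{\sigma} = v(\Sigma)$ and $\mathbf{s} = v(S)$, where $v(\cdot)$ is the vectorizing operator taking the non-duplicated elements of a symmetric matrix, and let $\Omega = n\,\mathrm{acov}(\mathbf{s})$, where $\mathrm{acov}(\cdot)$ denotes the asymptotic covariance matrix.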
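To make the notation concrete, the following minimal sketch builds $v(\cdot)$ and a normal-theory version of $\Omega$. It assumes multivariate normal data, for which $(\Omega)_{ab,cd} = \sigma_{ac}\sigma_{bd} + \sigma_{ad}\sigma_{bc}$; under nonnormality fourth-order cumulants enter as well. The function names `vech_pairs`, `vech` and `omega_normal`, and the row-by-row ordering of the pairs $(a, b)$ with $a \ge b$, are choices made for this illustration and are not taken from the paper.

```python
def vech_pairs(p):
    """Index pairs (a, b) with p >= a >= b >= 1, listed row by row (assumed ordering)."""
    return [(a, b) for a in range(1, p + 1) for b in range(1, a + 1)]


def vech(matrix):
    """v(.): stack the non-duplicated (lower-triangular) elements of a symmetric matrix."""
    return [matrix[a - 1][b - 1] for a, b in vech_pairs(len(matrix))]


def omega_normal(sigma):
    """Normal-theory Omega: entry (ab, cd) equals sigma_ac*sigma_bd + sigma_ad*sigma_bc."""
    pairs = vech_pairs(len(sigma))

    def sg(i, j):
        # 1-based access to the covariance matrix
        return sigma[i - 1][j - 1]

    return [[sg(a, c) * sg(b, d) + sg(a, d) * sg(b, c) for c, d in pairs]
            for a, b in pairs]


# Hypothetical 2 x 2 covariance matrix, used only to exercise the functions.
sigma_example = [[1.0, 0.5], [0.5, 1.0]]
print(vech(sigma_example))          # elements in the order (1,1), (2,1), (2,2)
print(omega_normal(sigma_example))  # 3 x 3 matrix indexed by the same pairs
```

With $p$ observed variables, $\mathbf{s}$ has $p(p+1)/2$ elements, so $\Omega$ is a square matrix of that order.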
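Then, from Ogasawara (2006), we have

$$\begin{aligned}
\alpha_1 &= \frac{1}{2}\,\mathrm{tr}\left(\frac{\partial^2\theta}{\partial\boldsymbol{\sigma}\,\partial\boldsymbol{\sigma}'}\,\Omega\right), \qquad \alpha_2 = \frac{\partial\theta}{\partial\boldsymbol{\sigma}'}\,\Omega\,\frac{\partial\theta}{\partial\boldsymbol{\sigma}},\\[4pt]
\Delta\alpha_2 &= -\sum_{a\ge b}\sum_{c\ge d}(\sigma_{abcd} - \sigma_{ab}\sigma_{cd} - \sigma_{ac}\sigma_{bd} - \sigma_{ad}\sigma_{bc})\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\\
&\quad + \sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\Bigg\{\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial^2\theta}{\partial\sigma_{cd}\,\partial\sigma_{ef}}(\sigma_{abcdef} - \sigma_{ab}\sigma_{cdef} - 2\sigma_{cd}\sigma_{abef}\\
&\qquad\quad - 2\sigma_{acd}\sigma_{bef} - 2\sigma_{abc}\sigma_{def} - 2\sigma_{abd}\sigma_{cef} + 2\sigma_{ab}\sigma_{cd}\sigma_{ef})\\
&\qquad + \sum_{g\ge h}\left(\frac{1}{2}\frac{\partial^2\theta}{\partial\sigma_{ab}\,\partial\sigma_{ef}}\frac{\partial^2\theta}{\partial\sigma_{cd}\,\partial\sigma_{gh}} + \frac{\partial\theta}{\partial\sigma_{gh}}\frac{\partial^3\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}\,\partial\sigma_{ef}}\right)(\Omega)_{ab,cd}(\Omega)_{ef,gh}\Bigg\},\\[4pt]
\alpha_3 &= \sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\frac{\partial\theta}{\partial\sigma_{ef}}(\sigma_{abcdef} - 3\sigma_{ab}\sigma_{cdef} - 6\sigma_{abc}\sigma_{def} + 2\sigma_{ab}\sigma_{cd}\sigma_{ef})\\
&\quad + 3\,\frac{\partial\theta}{\partial\boldsymbol{\sigma}'}\,\Omega\,\frac{\partial^2\theta}{\partial\boldsymbol{\sigma}\,\partial\boldsymbol{\sigma}'}\,\Omega\,\frac{\partial\theta}{\partial\boldsymbol{\sigma}},\\[4pt]
\alpha_4 &= \sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\sum_{g\ge h}\Bigg[\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\frac{\partial\theta}{\partial\sigma_{ef}}\frac{\partial\theta}{\partial\sigma_{gh}}\\
&\qquad\times\Bigg(\kappa_{abcdefgh} + \sum^{24}\kappa_{ac}\kappa_{bdefgh} + \sum^{32}\kappa_{ace}\kappa_{bdfgh} + \sum^{8}\kappa_{aceg}\kappa_{bdfh} + \sum^{24}\kappa_{abeg}\kappa_{cdfh}\\
&\qquad\quad + \sum^{96}\kappa_{ac}\kappa_{be}\kappa_{dfgh} + \sum^{48}\kappa_{ac}\kappa_{eg}\kappa_{bdfh} + \sum^{96}\kappa_{ac}\kappa_{beg}\kappa_{dfh}\\
&\qquad\quad + \sum^{48}\kappa_{bc}\kappa_{de}\kappa_{fg}\kappa_{ha} - \sum^{6}\kappa_{abcd}(\Omega)_{ef,gh}\Bigg)\\
&\quad + \sum_{j\ge k} 2\,\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\frac{\partial\theta}{\partial\sigma_{ef}}\frac{\partial^2\theta}{\partial\sigma_{gh}\,\partial\sigma_{jk}}\sum^{10}(\Omega)_{ab,cd}\,M(ef, gh, jk)\\
&\quad + \sum_{j\ge k}\sum_{l\ge m}\left(\frac{3}{2}\frac{\partial^2\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}}\frac{\partial^2\theta}{\partial\sigma_{ef}\,\partial\sigma_{gh}} + \frac{2}{3}\frac{\partial^3\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}\,\partial\sigma_{ef}}\frac{\partial\theta}{\partial\sigma_{gh}}\right)\\
&\qquad\times\frac{\partial\theta}{\partial\sigma_{jk}}\frac{\partial\theta}{\partial\sigma_{lm}}\sum^{15}(\Omega)_{ab,cd}(\Omega)_{ef,gh}(\Omega)_{jk,lm}\Bigg]\\
&\quad - (4\alpha_1\alpha_3 + 6\alpha_2\Delta\alpha_2 + 6\alpha_2\alpha_1^2),
\end{aligned} \tag{6}$$

where the partial derivatives denote those evaluated at the population values;

$$M(ab, cd, ef) = \kappa_{abcdef} + \sum^{12}\kappa_{abce}\kappa_{df} + \sum^{4}\kappa_{ace}\kappa_{bdf} + \sum^{8}\kappa_{ac}\kappa_{be}\kappa_{df}$$

$(p \ge a \ge b \ge 1;\ p \ge c \ge d \ge 1;\ p \ge e \ge f \ge 1)$; $\kappa_{a_1 a_2 \cdots a_t}$ ($\sigma_{a_1 a_2 \cdots a_t}$) is the $t$-th order multivariate cumulant (central moment) of $X_{a_1}, \ldots, X_{a_t}$; and $\sum^{k}$ denotes a summation of $k$ similar terms.

The Edgeworth expansion up to order $n^{-1}$ is given as follows, under the assumption of its validity:

$$\begin{aligned}
\Pr\left(\frac{w}{\alpha_2^{1/2}} \le z\right) &= \Phi(z) - n^{-1/2}\left\{\frac{\alpha_1}{\alpha_2^{1/2}} + \frac{\alpha_3}{6\alpha_2^{3/2}}(z^2 - 1)\right\}\phi(z)\\
&\quad - n^{-1}\left\{\frac{1}{2}(\Delta\alpha_2 + \alpha_1^2)\frac{z}{\alpha_2} + \left(\frac{\alpha_4}{24} + \frac{\alpha_1\alpha_3}{6}\right)\frac{z^3 - 3z}{\alpha_2^2} + \frac{\alpha_3^2(z^5 - 10z^3 + 15z)}{72\alpha_2^3}\right\}\phi(z) + o(n^{-1}),
\end{aligned} \tag{7}$$

where $\phi(z) = (1/\sqrt{2\pi})\exp(-z^2/2)$ and $\Phi(z) = \int_{-\infty}^{z}\phi(t)\,dt$.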
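Once the asymptotic cumulants are available, the expansion (7) is simple to evaluate. The sketch below is one possible transcription of (7) with the $o(n^{-1})$ remainder dropped; the function and argument names (`edgeworth_cdf`, `a1`, `a2`, `da2`, `a3`, `a4`) are hypothetical, and the numerical inputs in the usage lines are made-up values, not results from this paper.

```python
import math


def std_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def edgeworth_cdf(z, n, a1, a2, da2, a3, a4):
    """Two-term Edgeworth approximation (7) to Pr(w / a2**0.5 <= z).

    a1, a2, da2, a3, a4 stand for alpha_1, alpha_2, Delta alpha_2,
    alpha_3 and alpha_4; the o(1/n) remainder is ignored.
    """
    # Coefficient of n**(-1/2) * phi(z)
    term_half = a1 / math.sqrt(a2) + a3 / (6.0 * a2 ** 1.5) * (z * z - 1.0)
    # Coefficient of n**(-1) * phi(z)
    term_one = (0.5 * (da2 + a1 ** 2) * z / a2
                + (a4 / 24.0 + a1 * a3 / 6.0) * (z ** 3 - 3.0 * z) / a2 ** 2
                + a3 ** 2 * (z ** 5 - 10.0 * z ** 3 + 15.0 * z) / (72.0 * a2 ** 3))
    correction = term_half / math.sqrt(n) + term_one / n
    return std_normal_cdf(z) - correction * std_normal_pdf(z)


# Illustrative (made-up) cumulants with n = N - 1 = 399.
for z in (-1.645, 0.0, 1.645):
    print(z, std_normal_cdf(z), edgeworth_cdf(z, 399, -0.5, 2.0, 1.0, 0.3, 5.0))
```

Setting `a1 = da2 = a3 = a4 = 0` recovers the usual normal approximation, which is a convenient sanity check on any implementation.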
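
3. Partial derivatives

The partial derivatives up to the third order in (6) are obtained from the formulas for partial derivatives of implicit functions (see Ogasawara, 2004, October; 2005c) as follows:

$$\begin{aligned}
\frac{\partial\hat{\boldsymbol{\eta}}}{\partial s_{ab}} &= -\left(\frac{\partial^2\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\boldsymbol{\eta}}'}\right)^{-1}\frac{\partial^2\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial s_{ab}},\\[4pt]
\frac{\partial^2\hat{\boldsymbol{\eta}}}{\partial s_{ab}\,\partial s_{cd}} &= -\left(\frac{\partial^2\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\boldsymbol{\eta}}'}\right)^{-1}\Bigg\{\sum_i\sum_j\frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial\hat{\eta}_j}\frac{\partial\hat{\eta}_i}{\partial s_{ab}}\frac{\partial\hat{\eta}_j}{\partial s_{cd}} + \sum_i\frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial s_{ab}}\frac{\partial\hat{\eta}_i}{\partial s_{cd}}\\
&\qquad + \sum_i\frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial s_{cd}}\frac{\partial\hat{\eta}_i}{\partial s_{ab}} + \frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial s_{ab}\,\partial s_{cd}}\Bigg\},\\[4pt]
\frac{\partial^3\hat{\boldsymbol{\eta}}}{\partial s_{ab}\,\partial s_{cd}\,\partial s_{ef}} &= -\left(\frac{\partial^2\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\boldsymbol{\eta}}'}\right)^{-1}\Bigg[\sum_i\sum_j\sum_k\frac{\partial^4\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial\hat{\eta}_j\,\partial\hat{\eta}_k}\frac{\partial\hat{\eta}_i}{\partial s_{ab}}\frac{\partial\hat{\eta}_j}{\partial s_{cd}}\frac{\partial\hat{\eta}_k}{\partial s_{ef}}\\
&\qquad + \sum^{3}_{(U,V,W)}\sum_i\Bigg\{\sum_j\left(\frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial\hat{\eta}_j}\frac{\partial\hat{\eta}_i}{\partial s_U}\frac{\partial^2\hat{\eta}_j}{\partial s_V\,\partial s_W} + \frac{\partial^4\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial\hat{\eta}_j\,\partial s_U}\frac{\partial\hat{\eta}_i}{\partial s_V}\frac{\partial\hat{\eta}_j}{\partial s_W}\right)\\
&\qquad\quad + \frac{\partial^4\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial s_U\,\partial s_V}\frac{\partial\hat{\eta}_i}{\partial s_W} + \frac{\partial^3\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\eta}_i\,\partial s_U}\frac{\partial^2\hat{\eta}_i}{\partial s_V\,\partial s_W}\Bigg\} + \frac{\partial^4\hat{G}}{\partial\hat{\boldsymbol{\eta}}\,\partial s_{ab}\,\partial s_{cd}\,\partial s_{ef}}\Bigg]
\end{aligned} \tag{8}$$

$(p \ge a \ge b \ge 1;\ p \ge c \ge d \ge 1;\ p \ge e \ge f \ge 1)$, where $\sum^{3}_{(U,V,W)}$ denotes a summation over the range $(U, V, W) \in \{(ab, cd, ef), (ef, ab, cd), (cd, ef, ab)\}$.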
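A scalar special case may help in reading (8). The sketch below is an illustration constructed for this purpose, not an example from the paper: it takes a single variance $\sigma = \theta^2$ with no restrictions, so that $G = F = s/\theta^2 + 2\ln\theta$ and the minimizer is $\hat{\theta} = s^{1/2}$. The three formulas in (8) then reduce to scalar expressions in which the sum over $(U, V, W)$ contributes a factor of 3. The function name `implicit_derivatives` is hypothetical.

```python
import math


def implicit_derivatives(theta, s):
    """Scalar version of (8) for G(theta, s) = s / theta**2 + 2 * log(theta).

    Returns the first three derivatives of theta_hat(s), evaluated at a
    point satisfying the first-order condition (theta**2 == s).
    """
    # Partial derivatives of G; the leading subscript t refers to the
    # estimating equation dG/dtheta, further subscripts to theta (t) or s.
    g_tt = 6.0 * s / theta ** 4 - 2.0 / theta ** 2
    g_ttt = -24.0 * s / theta ** 5 + 4.0 / theta ** 3
    g_tttt = 120.0 * s / theta ** 6 - 12.0 / theta ** 4
    g_ts = -2.0 / theta ** 3
    g_tts = 6.0 / theta ** 4
    g_ttts = -24.0 / theta ** 5
    g_tss = g_ttss = g_tsss = 0.0  # G is linear in s

    d1 = -g_ts / g_tt
    d2 = -(g_ttt * d1 ** 2 + 2.0 * g_tts * d1 + g_tss) / g_tt
    d3 = -(g_tttt * d1 ** 3
           + 3.0 * (g_ttt * d1 * d2 + g_ttts * d1 ** 2)
           + 3.0 * (g_ttss * d1 + g_tts * d2)
           + g_tsss) / g_tt
    return d1, d2, d3


s = 4.0
theta_hat = math.sqrt(s)
d1, d2, d3 = implicit_derivatives(theta_hat, s)
# Closed-form derivatives of sqrt(s) for comparison
print(d1, 0.5 * s ** -0.5)
print(d2, -0.25 * s ** -1.5)
print(d3, 0.375 * s ** -2.5)
```

In the factor analysis model the same recursion applies with $\hat{\boldsymbol{\eta}}$ a vector and the scalar division replaced by the inverse of $\partial^2\hat{G}/\partial\hat{\boldsymbol{\eta}}\,\partial\hat{\boldsymbol{\eta}}'$.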
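The partial derivatives of $\hat{G}$ with respect to $\hat{\boldsymbol{\eta}}$ and $\mathbf{s}$ evaluated at the population values are given by Ogasawara (2005c) as follows:

$$\begin{aligned}
\frac{\partial^2 G}{\partial\theta_i\,\partial\theta_j} &= \mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\right), \qquad \frac{\partial^2 G}{\partial\theta_i\,\partial\xi_t} = \frac{\partial h_t}{\partial\theta_i},\\[4pt]
\frac{\partial^2 G}{\partial\theta_i\,\partial\sigma_{ab}} &= -\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\right)_{ab}(2 - \delta_{ab}),\\[4pt]
\frac{\partial^3 G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} &= \mathrm{tr}\Bigg(-4\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k} + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
&\qquad + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j} + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Bigg),\\[4pt]
\frac{\partial^3 G}{\partial\theta_i\,\partial\theta_j\,\partial\sigma_{ab}} &= \Bigg(-\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1} + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1} + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\Bigg)_{ab}(2 - \delta_{ab}),\\[4pt]
\frac{\partial^3 G}{\partial\theta_i\,\partial\theta_j\,\partial\xi_t} &= \frac{\partial^2 h_t}{\partial\theta_i\,\partial\theta_j},\\[4pt]
\frac{\partial^4 G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\theta_l} &= \mathrm{tr}\Bigg(6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l} + 6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
&\qquad + 6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
&\qquad - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l} - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\\
&\qquad - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l} - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
&\qquad - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k} - 4\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_k\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\\
&\qquad + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_k\,\partial\theta_l} + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_l} + \Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_l}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_k}\\
&\qquad + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial^3\Sigma}{\partial\theta_j\,\partial\theta_k\,\partial\theta_l} + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial^3\Sigma}{\partial\theta_i\,\partial\theta_k\,\partial\theta_l} + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial^3\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_l}\\
&\qquad + \Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial^3\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\Bigg),\\[4pt]
\frac{\partial^4 G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\sigma_{ab}} &= \Bigg(2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1} + 2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\\
&\qquad + 2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial^2\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1} - \Sigma^{-1}\frac{\partial^3\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\\
&\qquad - 2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1} - 2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\\
&\qquad - 2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\Bigg)_{ab+ba}\frac{2 - \delta_{ab}}{2},\\[4pt]
\frac{\partial^4 G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\xi_t} &= \frac{\partial^3 h_t}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}
\end{aligned} \tag{9}$$

$(i, j, k, l = 1, \ldots, q;\ t = 1, \ldots, r;\ p \ge a \ge b \ge 1)$, where $\delta_{ab}$ is the Kronecker delta and $(\cdot)_{ab+ba} = (\cdot)_{ab} + (\cdot)_{ba}$.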
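The first formula in (9) can be checked numerically on a small case. The sketch below is only an illustration: it uses a two-variable, one-factor covariance structure $\Sigma = \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathrm{Diag}(\psi_1, \psi_2)$ with hypothetical parameter values (such a model is not identified, but that does not matter for checking a derivative formula), sets $S$ equal to the population $\Sigma$, and compares $\mathrm{tr}(\Sigma^{-1}\,\partial\Sigma/\partial\theta_i\,\Sigma^{-1}\,\partial\Sigma/\partial\theta_j)$ with a central finite-difference Hessian of $F$. All function names are made up for this example.

```python
import math


def sigma(theta):
    """Sigma = lam lam' + diag(psi) for theta = (lam1, lam2, psi1, psi2)."""
    l1, l2, p1, p2 = theta
    return [[l1 * l1 + p1, l1 * l2], [l1 * l2, l2 * l2 + p2]]


def dsigma(theta, i):
    """Analytic derivative of Sigma with respect to theta[i]."""
    l1, l2 = theta[0], theta[1]
    derivs = [
        [[2.0 * l1, l2], [l2, 0.0]],   # d Sigma / d lam1
        [[0.0, l1], [l1, 2.0 * l2]],   # d Sigma / d lam2
        [[1.0, 0.0], [0.0, 0.0]],      # d Sigma / d psi1
        [[0.0, 0.0], [0.0, 1.0]],      # d Sigma / d psi2
    ]
    return derivs[i]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def trace(m):
    return m[0][0] + m[1][1]


def discrepancy(theta, s):
    """F(theta, S) = tr(Sigma^{-1} S) + ln|Sigma|."""
    sig = sigma(theta)
    return trace(matmul(inv2(sig), s)) + math.log(det2(sig))


def hessian_formula(theta, i, j):
    """tr(Sigma^{-1} dSigma_i Sigma^{-1} dSigma_j), the first formula in (9)."""
    a = inv2(sigma(theta))
    return trace(matmul(matmul(a, dsigma(theta, i)), matmul(a, dsigma(theta, j))))


def hessian_numeric(theta, s, i, j, h=1e-4):
    """Central finite-difference second derivative of F with S held fixed."""
    def f(di, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return discrepancy(t, s)
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)


theta0 = [0.8, 0.6, 0.36, 0.64]   # hypothetical population values
s_pop = sigma(theta0)             # S set to the population Sigma
max_gap = max(abs(hessian_formula(theta0, i, j) - hessian_numeric(theta0, s_pop, i, j))
              for i in range(4) for j in range(4))
print(max_gap)  # should be small relative to the Hessian entries
```

The agreement relies on $S = \Sigma$: away from the population values, terms involving second derivatives of $\Sigma$ no longer cancel.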

4. A numerical example

A numerical example using the raw-varimax solution for standardized variables (see (2)) is shown with the following population values:

$$\Lambda' = \begin{bmatrix} .8 & .7 & .6 & .3 & .2 & .0\\ .0 & .2 & .3 & .6 & .7 & .8 \end{bmatrix}, \qquad \Psi = \mathrm{Diag}(I_6 - \Lambda\Lambda'),$$

where the factors are independently normally or chi-square (df = 1) distributed. Table 1 gives the theoretical and simulated results, with Heywood cases included in the analysis.

Table 1. Results of the parameter estimators for the fourth and sixth observed variables in the case of the raw-varimax solution for standardized variables (N = 400; number of replications = 1,000,000)

| Distribution | Parameter | $\alpha_1$ (bias) Th. | $\alpha_1$ Sim. | $\alpha_3$ (skewness) Th. | $\alpha_3$ Sim. | $\alpha_4$ (kurtosis) Th. | $\alpha_4$ Sim. | HASE/ASE | SD/ASE |
|---|---|---|---|---|---|---|---|---|---|
| Normal | Ψ 4 | -.956 | -.964 | .259 | .141 | -5.6 | -2.8 | 1.0018 | 1.0022 |
| Normal | Ψ 6 | -1.533 | -1.619 | -6.586 | -9.192 | 159.8 | 292.8 | 1.0240 | 1.0284 |
| Normal | I 4 | -.277 | -.260 | .410 | .348 | 2.2 | 3.4 | 1.0074 | 1.0094 |
| Normal | I 6 | .146 | .170 | -.377 | -.409 | 6.8 | 7.8 | 1.0062 | 1.0071 |
| Normal | II 4 | -.383 | -.414 | -2.007 | -2.084 | 11.4 | 13.6 | 1.0116 | 1.0130 |
| Normal | II 6 | .254 | .280 | .045 | .347 | 18.7 | 28.6 | 1.0250 | 1.0286 |
| Chi-square, df = 1 | Ψ 4 | -1.680 | -1.653 | .53 | .75 | -172.7 | -171.0 | .9789 | .9822 |
| Chi-square, df = 1 | Ψ 6 | -.759 | -.896 | 15.95 | 9.82 | 65.4 | 242.1 | 1.0003 | 1.0055 |
| Chi-square, df = 1 | I 4 | -.226 | -.224 | 2.27 | 2.05 | 31.4 | 24.3 | 1.0013 | 1.0029 |
| Chi-square, df = 1 | I 6 | .220 | .243 | .50 | .49 | 14.0 | 15.1 | 1.0069 | 1.0088 |
| Chi-square, df = 1 | II 4 | -.681 | -.703 | -7.84 | -7.89 | 29.1 | 39.2 | .9937 | .9970 |
| Chi-square, df = 1 | II 6 | -.568 | -.516 | -8.82 | -8.25 | 132.9 | 145.6 | 1.0155 | 1.0195 |

Note. Th. (Sim.) = theoretical (simulated) values; Ψ = unique variances; I (II) = loadings of factor I (II); the number after each label is the observed variable; HASE $= \{(\alpha_2/n) + (\Delta\alpha_2/n^2)\}^{1/2}$; ASE $= (\alpha_2/n)^{1/2}$; SD = standard deviations from simulation.

The results are shown only for the parameters corresponding to the fourth and sixth observed variables to save space (see the symmetric pattern of $\Lambda$ in this example). The simulated cumulants are obtained from the k-statistics (unbiased estimators of cumulants) based on 1,000,000 estimates for each parameter, multiplied by appropriate powers of $n$ for comparison with the asymptotic values.
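The simulated quantities of the kind reported in Table 1 can be sketched as follows. The k-statistic formulas are the standard ones; the powers of $n$ used for rescaling ($n$, $n^2$ and $n^3$ for $\alpha_1$, $\alpha_3$ and $\alpha_4$) are inferred here from (4) with $w = n^{1/2}(\hat{\theta} - \theta)$ and should be read as an assumption about the scaling used in the simulation. The function names and the toy input are hypothetical.

```python
import math


def k_statistics(x):
    """First four k-statistics (unbiased cumulant estimators) of a sample x."""
    r = len(x)  # number of replications
    mean = sum(x) / r
    m2 = sum((v - mean) ** 2 for v in x) / r
    m3 = sum((v - mean) ** 3 for v in x) / r
    m4 = sum((v - mean) ** 4 for v in x) / r
    k2 = r * m2 / (r - 1)
    k3 = r ** 2 * m3 / ((r - 1) * (r - 2))
    k4 = (r ** 2 * ((r + 1) * m4 - 3 * (r - 1) * m2 ** 2)
          / ((r - 1) * (r - 2) * (r - 3)))
    return mean, k2, k3, k4


def simulated_alphas(estimates, theta, n):
    """Rescale the k-statistics of simulated estimates to the alpha scale of (4)."""
    k1, k2, k3, k4 = k_statistics(estimates)
    return {
        "alpha1": n * (k1 - theta),   # since E(theta_hat - theta) is about alpha_1 / n
        "alpha3": n ** 2 * k3,        # third cumulant of theta_hat is about alpha_3 / n**2
        "alpha4": n ** 3 * k4,        # fourth cumulant of theta_hat is about alpha_4 / n**3
        "sd": math.sqrt(k2),
    }


def hase_over_ase(alpha2, delta_alpha2, n):
    """Ratio HASE / ASE = {1 + delta_alpha2 / (n * alpha2)}**0.5."""
    return math.sqrt(1.0 + delta_alpha2 / (n * alpha2))


# Toy input only; a real study would pass the simulated estimates of one parameter.
print(k_statistics([1.0, 2.0, 3.0, 4.0, 5.0]))
print(hase_over_ase(2.0, 3.0, 399))
```

Because HASE/ASE depends only on $\Delta\alpha_2/(n\alpha_2)$, a ratio above (below) one corresponds to a positive (negative) $\Delta\alpha_2$.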
We find that the asymptotic values are reasonably similar to the corresponding simulated values and that HASE is closer than ASE to the true value given by SD.

References

Archer, C. O., & Jennrich, R. I. (1973). Standard errors for rotated factor loadings. Psychometrika, 38, 581-592.

Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer.

Ogasawara, H. (2004, September). Asymptotic expansion in factor analysis and structural equation modeling under nonnormality/normality. Proceedings of the 72nd annual meeting of Japan Statistical Society (pp. 101-102, with an error in Corollary 2A corrected at the presentation). Fuji University, Hanamaki, Japan.

Ogasawara, H. (2004, October). Higher-order estimation error in factor analysis and structural equation modeling under nonnormality. Proceedings of the 83rd symposium of behaviormetrics: "Factor analysis centennial symposium" organized by Y. Kano (pp. 39-53). Osaka University, Osaka, Japan.

Ogasawara, H. (2005, January). Asymptotic expansion of the distributions of the parameter estimators in structural equation modeling. Paper presented at the International conference on the future of statistical theory, practice and education. Hyderabad, India.

Ogasawara, H. (2005a). Asymptotic expansion of the distributions of the least squares estimators in factor analysis and structural equation modeling. To appear in C. R. Rao & R. Chakraborty (Eds.), Handbook of statistics: Bioinformatics. New York: Elsevier.

Ogasawara, H. (2005b). Asymptotic expansion of the distributions of the estimators in factor analysis under nonnormality. To appear in British Journal of Mathematical and Statistical Psychology.

Ogasawara, H. (2005c). Higher-order estimation error in structural equation modeling. Paper submitted for publication.

Ogasawara, H. (2006). Asymptotic expansion of the sample correlation coefficient under nonnormality. Computational Statistics and Data Analysis, 50, 891-910.