Two-term Edgeworth expansion of the distributions of the maximum likelihood estimators in factor analysis under nonnormality
Haruhiko Ogasawara
Otaru University of Commerce, Otaru 047-8501 Japan
E-mail: hogasa@res.otaru-uc.ac.jp
1. Estimators in factor analysis
The purpose of this paper is to derive the two-term Edgeworth expansion of the distributions of the maximum likelihood (ML) parameter estimators in factor analysis, possibly with factor rotation, under nonnormality (for the Edgeworth expansion, see, e.g., Hall, 1992). Ogasawara (2005a) gave the corresponding results for various least squares estimators and for the normal-theory (NT) Studentized estimators under nonnormality. For the non-NT Studentized estimators by ML, Ogasawara (2004, September; 2005, January; 2005b) provided similar results up to order n^{-1/2}, where n + 1 = N is the number of observations.
Let Σ = Σ(θ) be the p × p structured covariance matrix of the observable variables in factor analysis:

Σ = Σ(θ) = ΛΛ′ + Ψ,    (1)

where Λ is the p × u loading matrix; Ψ is the diagonal covariance matrix of the unique factors; and θ is the q × 1 vector of parameters. When the factor analysis model holds for standardized variables,

Σ = D(ΛΛ′ + Ψ)D,  Diag(ΛΛ′ + Ψ) = I_p,    (2)

where D = Diag^{1/2}(Σ); Diag(·) is the diagonal matrix formed from the diagonal elements of its argument; and I_p is the p × p identity matrix.
In the case of exploratory factor analysis, the restrictions for model identification are imposed on (1) or (2); they are generally given by

Λ′ ∂f_R(Λ)/∂Λ − (∂f_R(Λ)/∂Λ′) Λ = O    (3)

(Archer & Jennrich, 1973), where f_R(Λ) is a rotation criterion to be optimized with respect to Λ (e.g., the raw-varimax criterion). The Wishart ML estimator θ̂ is obtained by minimizing F(θ, S) = tr(Σ⁻¹S) + ln|Σ| under the restrictions (3) (and (2)) when they are imposed, where S is the unbiased sample covariance matrix. Note that the estimators in factor analysis are functions of the sample variances and covariances. Usually, however, they are implicit functions, which require some special treatment (see Section 3).
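To make the implicit definition concrete, the following minimal sketch (not the paper's code; the helper name fit_ml, the optimizer, and the log-parameterization of Ψ are illustrative assumptions) minimizes the Wishart ML fit function F(θ, S) = tr(Σ⁻¹S) + ln|Σ| for an unrotated loading matrix, ignoring the identification/rotation restrictions (2) and (3):

```python
# Minimal sketch: Wishart ML factor analysis by direct minimization of
# F(theta, S) = tr(Sigma^{-1} S) + ln|Sigma|, with Sigma = Lambda Lambda' + Psi.
# Rotation/identification restrictions are omitted; names are illustrative only.
import numpy as np
from scipy.optimize import minimize

def fit_ml(S, u):
    p = S.shape[0]

    def unpack(theta):
        Lam = theta[:p * u].reshape(p, u)
        Psi = np.diag(np.exp(theta[p * u:]))   # log-scale keeps uniquenesses positive
        return Lam, Psi

    def F(theta):
        Lam, Psi = unpack(theta)
        Sigma = Lam @ Lam.T + Psi
        _, logdet = np.linalg.slogdet(Sigma)
        return np.trace(np.linalg.solve(Sigma, S)) + logdet

    rng = np.random.default_rng(0)
    theta0 = np.concatenate([0.1 * rng.standard_normal(p * u),
                             np.log(0.5 * np.diag(S))])
    res = minimize(F, theta0, method="BFGS")
    return unpack(res.x)
```

Because F depends on the data only through S, the resulting θ̂ is an implicit function of the sample variances and covariances, which is what the expansions below exploit.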
2. The Edgeworth expansion
Let w = n^{1/2}(θ̂ − θ), where θ is a (scalar) population parameter in θ and θ̂ is its estimator. We assume that the following cumulants are available:

κ₁(w) = E(w) = n^{-1/2}α₁ + O(n^{-3/2}),
κ₂(w) = E[{w − E(w)}²] = α₂ + n^{-1}Δα₂ + O(n^{-2}),
κ₃(w) = E[{w − E(w)}³] = n^{-1/2}α₃ + O(n^{-3/2}),
κ₄(w) = E[{w − E(w)}⁴] − 3{κ₂(w)}² = n^{-1}α₄ + O(n^{-2}).    (4)
The parameter estimators and their asymptotic cumulants in (4) are obtained from the first-order conditions

∂Ĝ/∂η̂ = (∂F̂/∂θ̂′ + ξ̂′ ∂ĥ/∂θ̂′, ĥ′)′ = 0,  Ĝ = F̂ + ξ̂′ĥ,    (5)

where η = (θ′, ξ′)′; ξ is the vector of Lagrange multipliers; F̂ = F(θ̂, S); and h = h(θ) is the r × 1 vector of restrictions (h(θ̂) = 0). Note that in exploratory factor analysis, (5) reduces to (∂F̂/∂θ̂′, ĥ′)′ = 0 with Ĝ = F̂, since the restrictions are used only for identification.
Let σ = v(Σ) and s = v(S), where v(·) is the vectorizing operator taking the non-duplicated elements of a symmetric matrix, and let Ω = n acov(s). Then, from Ogasawara (2006), we have
$$
\alpha_1 = \frac{1}{2}\,\mathrm{tr}\!\left(\frac{\partial^2\theta}{\partial\sigma\,\partial\sigma'}\,\Omega\right), \qquad
\alpha_2 = \frac{\partial\theta}{\partial\sigma'}\,\Omega\,\frac{\partial\theta}{\partial\sigma},
$$

$$
\begin{aligned}
\Delta\alpha_2 ={}& -\sum_{a\ge b}\sum_{c\ge d}\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}
\,(\sigma_{abcd}-\sigma_{ab}\sigma_{cd}-\sigma_{ac}\sigma_{bd}-\sigma_{ad}\sigma_{bc})\\
&+\sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\Biggl\{\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial^2\theta}{\partial\sigma_{cd}\,\partial\sigma_{ef}}
\,(\sigma_{abcdef}-\sigma_{ab}\sigma_{cdef}-2\sigma_{cd}\sigma_{abef}
-2\sigma_{acd}\sigma_{bef}-2\sigma_{abc}\sigma_{def}-2\sigma_{abd}\sigma_{cef}+2\sigma_{ab}\sigma_{cd}\sigma_{ef})\\
&\qquad+\sum_{g\ge h}\left(\frac{1}{2}\frac{\partial^2\theta}{\partial\sigma_{ab}\,\partial\sigma_{ef}}\frac{\partial^2\theta}{\partial\sigma_{cd}\,\partial\sigma_{gh}}
+\frac{\partial\theta}{\partial\sigma_{gh}}\frac{\partial^3\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}\,\partial\sigma_{ef}}\right)
(\Omega)_{ab,cd}(\Omega)_{ef,gh}\Biggr\},
\end{aligned}
$$

$$
\alpha_3 = \sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\frac{\partial\theta}{\partial\sigma_{ef}}
\,(\sigma_{abcdef}-3\sigma_{ab}\sigma_{cdef}-6\sigma_{abc}\sigma_{def}+2\sigma_{ab}\sigma_{cd}\sigma_{ef})
+3\,\frac{\partial\theta}{\partial\sigma'}\,\Omega\,\frac{\partial^2\theta}{\partial\sigma\,\partial\sigma'}\,\Omega\,\frac{\partial\theta}{\partial\sigma},
$$

$$
\begin{aligned}
\alpha_4 ={}& \sum_{a\ge b}\sum_{c\ge d}\sum_{e\ge f}\sum_{g\ge h}\Biggl[
\frac{\partial\theta}{\partial\sigma_{ab}}\frac{\partial\theta}{\partial\sigma_{cd}}\frac{\partial\theta}{\partial\sigma_{ef}}\frac{\partial\theta}{\partial\sigma_{gh}}
\Bigl(\kappa_{abcdefgh}+\sum^{24}\kappa_{ac}\kappa_{bdefgh}+\sum^{32}\kappa_{ace}\kappa_{bdfgh}
+\sum^{8}\kappa_{aceg}\kappa_{bdfh}+\sum^{24}\kappa_{abeg}\kappa_{cdfh}\\
&\qquad+\sum^{96}\kappa_{ac}\kappa_{be}\kappa_{dfgh}+\sum^{48}\kappa_{ac}\kappa_{eg}\kappa_{bdfh}
+\sum^{96}\kappa_{ac}\kappa_{beg}\kappa_{dfh}+\sum^{48}\kappa_{bc}\kappa_{de}\kappa_{fg}\kappa_{ha}
-\sum^{6}\kappa_{abcd}(\Omega)_{ef,gh}\Bigr)\\
&\quad+\sum_{j\ge k}\frac{\partial^2\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}}\,
\frac{\partial\theta}{\partial\sigma_{ef}}\frac{\partial\theta}{\partial\sigma_{gh}}\frac{\partial\theta}{\partial\sigma_{jk}}
\sum^{10}(\Omega)_{ab,cd}\,M(ef,gh,jk)\\
&\quad+\sum_{j\ge k}\sum_{l\ge m}\left(\frac{3}{2}\,\frac{\partial^2\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}}\frac{\partial^2\theta}{\partial\sigma_{ef}\,\partial\sigma_{gh}}
+\frac{2}{3}\,\frac{\partial^3\theta}{\partial\sigma_{ab}\,\partial\sigma_{cd}\,\partial\sigma_{ef}}\frac{\partial\theta}{\partial\sigma_{gh}}\right)
\frac{\partial\theta}{\partial\sigma_{jk}}\frac{\partial\theta}{\partial\sigma_{lm}}
\sum^{15}(\Omega)_{ab,cd}(\Omega)_{ef,gh}(\Omega)_{jk,lm}\Biggr]\\
&-(4\alpha_1\alpha_3+6\alpha_2\Delta\alpha_2+6\alpha_2\alpha_1^2),
\qquad\qquad (6)
\end{aligned}
$$
where the partial derivatives denote those evaluated at the population values;

$$
M(ab, cd, ef) = \kappa_{abcdef}+\sum^{12}\kappa_{abce}\kappa_{df}+\sum^{4}\kappa_{ace}\kappa_{bdf}+\sum^{8}\kappa_{ac}\kappa_{be}\kappa_{df}
\qquad (p\ge a\ge b\ge 1;\ p\ge c\ge d\ge 1;\ p\ge e\ge f\ge 1);
$$

κ_{a₁a₂⋯a_t} (σ_{a₁a₂⋯a_t}) is the t-th order multivariate cumulant (central moment) of X_{a₁}, …, X_{a_t}; and Σ^k denotes a summation of k similar terms.
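The ingredients of (6) are the derivatives of θ with respect to σ and the moments and cumulants just defined. In particular, Ω = n acov(s) has elements (Ω)_{ab,cd} = σ_abcd − σ_ab σ_cd (a standard result, stated here for completeness); a rough sketch of its sample counterpart, with hypothetical helper names rather than the paper's code, is:

```python
# Sketch: estimate Omega = n * acov(s) from an (N x p) data matrix X, with
# rows/columns indexed by the non-duplicated pairs (a, b), a >= b.
import numpy as np

def omega_hat(X):
    N, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N                          # moment version of the covariances
    pairs = [(a, b) for a in range(p) for b in range(a + 1)]
    W = np.empty((len(pairs), len(pairs)))
    for r, (a, b) in enumerate(pairs):
        for c, (e, f) in enumerate(pairs):
            m4 = np.mean(Xc[:, a] * Xc[:, b] * Xc[:, e] * Xc[:, f])
            W[r, c] = m4 - S[a, b] * S[e, f]   # sigma_abef - sigma_ab * sigma_ef
    return W
```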
The Edgeworth expansion up to order n^{-1} is given as follows under the assumption of its validity:

$$
\begin{aligned}
\Pr\!\left(\frac{w}{\alpha_2^{1/2}}\le z\right)
 = \Phi(z) &- n^{-1/2}\left\{\frac{\alpha_1}{\alpha_2^{1/2}}+\frac{\alpha_3}{6\,\alpha_2^{3/2}}\,(z^2-1)\right\}\phi(z)\\
 &- n^{-1}\left\{\frac{1}{2}\,\frac{\Delta\alpha_2+\alpha_1^2}{\alpha_2}\,z
 +\left(\frac{\alpha_4}{24}+\frac{\alpha_1\alpha_3}{6}\right)\frac{z^3-3z}{\alpha_2^{2}}
 +\frac{\alpha_3^{2}\,(z^5-10z^3+15z)}{72\,\alpha_2^{3}}\right\}\phi(z)+o(n^{-1}),
\qquad (7)
\end{aligned}
$$

where φ(z) = (2π)^{-1/2} exp(−z²/2) and Φ(z) = ∫_{−∞}^{z} φ(t) dt.
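Once the asymptotic cumulants are available, (7) can be evaluated directly. The sketch below (a hypothetical helper, mirroring the right-hand side of (7) as reconstructed above) returns the approximate distribution function of the standardized estimator:

```python
# Two-term Edgeworth approximation (7) to Pr(w / alpha2^{1/2} <= z).
import numpy as np
from scipy.stats import norm

def edgeworth_cdf(z, n, a1, a2, da2, a3, a4):
    phi, Phi = norm.pdf(z), norm.cdf(z)
    t1 = a1 / a2**0.5 + a3 / (6 * a2**1.5) * (z**2 - 1)
    t2 = (0.5 * (da2 + a1**2) / a2 * z
          + (a4 / 24 + a1 * a3 / 6) * (z**3 - 3 * z) / a2**2
          + a3**2 * (z**5 - 10 * z**3 + 15 * z) / (72 * a2**3))
    return Phi - t1 * phi / np.sqrt(n) - t2 * phi / n
```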
3. Partial derivatives
The partial derivatives up to the third order in (6) are obtained using the partial derivatives of implicit functions (see Ogasawara, 2004, October; 2005c) as follows:
$$
\begin{aligned}
\frac{\partial\hat\eta}{\partial s_{ab}}
 &= -\left(\frac{\partial^{2}\hat G}{\partial\hat\eta\,\partial\hat\eta'}\right)^{-1}
 \frac{\partial^{2}\hat G}{\partial\hat\eta\,\partial s_{ab}},\\[4pt]
\frac{\partial^{2}\hat\eta}{\partial s_{ab}\,\partial s_{cd}}
 &= -\left(\frac{\partial^{2}\hat G}{\partial\hat\eta\,\partial\hat\eta'}\right)^{-1}
 \Biggl\{\sum_i\sum_j\frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial\hat\eta_j}
 \frac{\partial\hat\eta_i}{\partial s_{ab}}\frac{\partial\hat\eta_j}{\partial s_{cd}}
 +\sum_i\frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial s_{cd}}\frac{\partial\hat\eta_i}{\partial s_{ab}}
 +\sum_i\frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial s_{ab}}\frac{\partial\hat\eta_i}{\partial s_{cd}}
 +\frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial s_{ab}\,\partial s_{cd}}\Biggr\},
\qquad (8)
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial^{3}\hat\eta}{\partial s_{ab}\,\partial s_{cd}\,\partial s_{ef}}
 = -\left(\frac{\partial^{2}\hat G}{\partial\hat\eta\,\partial\hat\eta'}\right)^{-1}
 \Biggl[ &\sum_i\sum_j\sum_k
 \frac{\partial^{4}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial\hat\eta_j\,\partial\hat\eta_k}
 \frac{\partial\hat\eta_i}{\partial s_{ab}}\frac{\partial\hat\eta_j}{\partial s_{cd}}\frac{\partial\hat\eta_k}{\partial s_{ef}}\\
 &+\sum_{(U,V,W)}\sum_i\Biggl\{\sum_j\left(
 \frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial\hat\eta_j}
 \frac{\partial\hat\eta_i}{\partial s_{U}}\frac{\partial^{2}\hat\eta_j}{\partial s_{V}\,\partial s_{W}}
 +\frac{\partial^{4}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial\hat\eta_j\,\partial s_{U}}
 \frac{\partial\hat\eta_i}{\partial s_{V}}\frac{\partial\hat\eta_j}{\partial s_{W}}\right)\\
 &\qquad+\frac{\partial^{3}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial s_{U}}
 \frac{\partial^{2}\hat\eta_i}{\partial s_{V}\,\partial s_{W}}
 +\frac{\partial^{4}\hat G}{\partial\hat\eta\,\partial\hat\eta_i\,\partial s_{U}\,\partial s_{V}}
 \frac{\partial\hat\eta_i}{\partial s_{W}}\Biggr\}
 +\frac{\partial^{4}\hat G}{\partial\hat\eta\,\partial s_{ab}\,\partial s_{cd}\,\partial s_{ef}}\Biggr]\\
 &(p\ge a\ge b\ge 1;\ p\ge c\ge d\ge 1;\ p\ge e\ge f\ge 1),
\end{aligned}
$$

where $\sum_{(U,V,W)}^{3}$ denotes a summation over the range (U, V, W) ∈ {(ab, cd, ef), (ef, ab, cd), (cd, ef, ab)}.
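Numerically, the leading formula in (8) is a single linear solve against the Hessian of Ĝ with respect to η̂; the higher-order derivatives reuse the same inverse applied to the bracketed sums. A minimal sketch under assumed input conventions (not the paper's code):

```python
# d eta / d s_ab = -(d^2 G / d eta d eta')^{-1} (d^2 G / d eta d s_ab), cf. (8).
import numpy as np

def d_eta_d_sab(H, g_ab):
    # H: Hessian of G with respect to eta; g_ab: cross-derivative vector.
    return -np.linalg.solve(H, g_ab)
```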
The partial derivatives of Ĝ with respect to η̂ and s evaluated at the
population values are given by Ogasawara (2005c) as follows:
$$
\frac{\partial^{2}G}{\partial\theta_i\,\partial\theta_j}
 = \mathrm{tr}\!\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\right),\qquad
\frac{\partial^{2}G}{\partial\theta_i\,\partial\xi_t}=\frac{\partial h_t}{\partial\theta_i},\qquad
\frac{\partial^{2}G}{\partial\theta_i\,\partial\sigma_{ab}}
 = -\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\,\Sigma^{-1}\right)_{ab}(2-\delta_{ab}),
$$

$$
\begin{aligned}
\frac{\partial^{3}G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}
 = \mathrm{tr}\Biggl(&-4\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}
 +\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
 &+\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}
 +\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Biggr),
\qquad (9)
\end{aligned}
$$

$$
\frac{\partial^{3}G}{\partial\theta_i\,\partial\theta_j\,\partial\xi_t}=\frac{\partial^{2}h_t}{\partial\theta_i\,\partial\theta_j},
$$

$$
\frac{\partial^{3}G}{\partial\theta_i\,\partial\theta_j\,\partial\sigma_{ab}}
 = \left(-\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}
 +\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}
 +\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\right)_{ab}(2-\delta_{ab}),
$$
$$
\begin{aligned}
\frac{\partial^{4}G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\theta_l}
 = \mathrm{tr}\Biggl(&\;6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}
 +6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}
 +6\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
 &-4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}
 -4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}
 -4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\\
 &-4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}
 -4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}
 -4\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_k\,\partial\theta_l}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\\
 &+\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_k\,\partial\theta_l}
 +\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_l}
 +\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_l}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_k}\\
 &+\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial^{3}\Sigma}{\partial\theta_j\,\partial\theta_k\,\partial\theta_l}
 +\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial^{3}\Sigma}{\partial\theta_i\,\partial\theta_k\,\partial\theta_l}
 +\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial^{3}\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_l}
 +\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_l}\Sigma^{-1}\frac{\partial^{3}\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\Biggr),
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial^{4}G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\sigma_{ab}}
 = \Biggl(&\;2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_j\,\partial\theta_k}\Sigma^{-1}
 +2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_k}\Sigma^{-1}
 +2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial^{2}\Sigma}{\partial\theta_i\,\partial\theta_j}\Sigma^{-1}
 -\Sigma^{-1}\frac{\partial^{3}\Sigma}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}\Sigma^{-1}\\
 &-2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}
 -2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}
 -2\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_k}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_i}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}\Biggr)_{ab+ba}\frac{2-\delta_{ab}}{2},
\end{aligned}
$$

$$
\frac{\partial^{4}G}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k\,\partial\xi_t}
 =\frac{\partial^{3}h_t}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}
\qquad (i, j, k, l = 1,\dots,q;\ t = 1,\dots,r;\ p\ge a\ge b\ge 1),
$$

where δ_ab is the Kronecker delta and (·)_{ab+ba} = (·)_{ab} + (·)_{ba}.
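As an illustration, the first expression in (9) can be assembled from Σ and the derivative matrices ∂Σ/∂θ_i of the structural model. A minimal sketch with assumed argument conventions (dSigma is a list of the q matrices ∂Σ/∂θ_i; not the paper's code):

```python
# Second derivatives d^2 G / (d theta_i d theta_j) = tr(Sigma^{-1} dSigma_i Sigma^{-1} dSigma_j).
import numpy as np

def second_derivs_G(Sigma, dSigma):
    A = [np.linalg.solve(Sigma, D) for D in dSigma]   # A_i = Sigma^{-1} dSigma/dtheta_i
    q = len(dSigma)
    H = np.empty((q, q))
    for i in range(q):
        for j in range(q):
            H[i, j] = np.trace(A[i] @ A[j])
    return H
```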
4. A numerical example
A numerical example using the raw-varimax solution for standardized variables (see (2)) with the following population values is shown:

Λ′ = [ .8  .7  .6  .3  .2  .0
       .0  .2  .3  .6  .7  .8 ],   Ψ = Diag(I₆ − ΛΛ′),

where the factors are independently normally or chi-square (df = 1) distributed. Table 1 gives the theoretical and simulated results with Heywood cases included in the analysis.
Table 1. Results of the parameter estimators for the fourth and sixth observed variables in the case of the raw-varimax solution for standardized variables (N = 400; number of replications = 1,000,000)

                 α₁ (bias)        α₃ (skewness)     α₄ (kurtosis)     Standard error ratio
                 Th.     Sim.      Th.      Sim.     Th.      Sim.    HASE/ASE   SD/ASE
Normal
 Ψ    4        -.956   -.964     .259     .141      -5.6     -2.8     1.0018     1.0022
      6       -1.533  -1.619   -6.586   -9.192     159.8    292.8     1.0240     1.0284
 I    4        -.277   -.260     .410     .348       2.2      3.4     1.0074     1.0094
      6         .146    .170    -.377    -.409       6.8      7.8     1.0062     1.0071
 II   4        -.383   -.414   -2.007   -2.084      11.4     13.6     1.0116     1.0130
      6         .254    .280     .045     .347      18.7     28.6     1.0250     1.0286
Chi-square, df = 1
 Ψ    4       -1.680  -1.653     .75      .53     -172.7   -171.0      .9789      .9822
      6        -.759   -.896   15.95     9.82       65.4    242.1     1.0003     1.0055
 I    4        -.226   -.224    2.27     2.05       31.4     24.3     1.0013     1.0029
      6         .220    .243     .50      .49       14.0     15.1     1.0069     1.0088
 II   4        -.681   -.703   -7.84    -7.89       29.1     39.2      .9937      .9970
      6        -.568   -.516   -8.82    -8.25      132.9    145.6     1.0155     1.0195

Note. Th. (Sim.) = theoretical (simulated) values; I (II) = loadings of factor I (II); HASE = {(α₂/n) + (Δα₂/n²)}^{1/2}; ASE = (α₂/n)^{1/2}; SD = standard deviations from the simulation.
The results are shown only for the parameters corresponding to the fourth and sixth observed variables to save space (see the symmetric pattern of Λ in this example). The simulated cumulants are obtained from the k-statistics (unbiased estimators of cumulants) based on 1,000,000 estimates for each parameter, multiplied by appropriate powers of n for comparison with the asymptotic values. We find that the asymptotic values are reasonably similar to their corresponding simulated values and that HASE is closer to the true value given by SD than ASE is.
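The two standard-error approximations compared in Table 1 follow directly from the definitions in the Note; as a small sketch (hypothetical helper names):

```python
# ASE = (alpha2 / n)^{1/2};  HASE = (alpha2 / n + Delta alpha2 / n^2)^{1/2}.
def ase(alpha2, n):
    return (alpha2 / n) ** 0.5

def hase(alpha2, delta_alpha2, n):
    return (alpha2 / n + delta_alpha2 / n ** 2) ** 0.5
```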
References
Archer, C. O., & Jennrich, R. I. (1973). Standard errors for rotated factor loadings.
Psychometrika, 38, 581-592.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer.
Ogasawara, H. (2004, September). Asymptotic expansion in factor analysis and structural equation modeling under nonnormality/normality. Proceedings of the 72nd annual meeting of the Japan Statistical Society (pp. 101-102; with an error in Corollary 2A corrected at the presentation). Fuji University, Hanamaki, Japan.
Ogasawara, H. (2004, October). Higher-order estimation error in factor analysis
and structural equation modeling under nonnormality. Proceedings of the
83rd symposium of behaviormetrics: “Factor analysis centennial symposium”
organized by Y. Kano (pp.39-53). Osaka University, Osaka, Japan.
Ogasawara, H. (2005, January). Asymptotic expansion of the distributions of the
parameter estimators in structural equation modeling. Paper presented at International conference on the future of statistical theory, practice and education. Hyderabad, India.
Ogasawara, H. (2005a). Asymptotic expansion of the distributions of the least
squares estimators in factor analysis and structural equation modeling. To appear in C. R. Rao, & R. Chakraborty (Eds.), Handbook of statistics: Bioinformatics. New York: Elsevier.
Ogasawara, H. (2005b). Asymptotic expansion of the distributions of the estimators in factor analysis under nonnormality. To appear in British Journal of
Mathematical and Statistical Psychology.
Ogasawara, H. (2005c). Higher-order estimation error in structural equation
modeling. Paper submitted for publication.
Ogasawara, H. (2006). Asymptotic expansion of the sample correlation coefficient under nonnormality. Computational Statistics and Data Analysis, 50, 891-910.