Does the ADF fit function decrease when the kurtosis increases?
Ulf Henning Olsson
Norwegian School of Management
Tron Foss
Norwegian School of Management
Sigurd V. Troye
Norwegian School of Economics and Business Administration
July, 2002
Forthcoming in British Journal of Mathematical and Statistical
Psychology
Address correspondence to:
Ulf H. Olsson
Department of Economics
Norwegian School of Management BI
Sandvika, Norway
ulf.h.olsson@bi.no
Phone: +47 67557397
Fax: +47 67557675
____________________
The authors appreciate the constructive comments of the Editor, Dr. Patricia Lovie, and two anonymous
referees.
Abstract
In this study we demonstrate how the asymptotically distribution free (ADF) fit function
is affected by (excessive) kurtosis in the observed data. More specifically, we address
how different levels of univariate kurtosis affect fit values (and therefore fit indices)
for misspecified factor models. Using numerical calculations, we show (for 13 factor
models) that the probability limit $F_0$ of $\hat{F}$ for the ADF fit function decreases
considerably as the kurtosis increases. We also give a formal proof that the value of
$F_0$ decreases monotonically with the kurtosis for a whole class of structural equation
models.
Does the ADF fit function decrease when the kurtosis increases?
1. Introduction
It is well known that different estimation procedures such as Maximum Likelihood (ML),
Generalized Least Squares (GLS) and the Asymptotically Distribution Free estimator (ADF,
or WLS, Weighted Least Squares, which is the term used in the LISREL language) will
produce estimates for a structural equation or covariance structure model that converge
to the same optimum and have the same asymptotic properties (Browne, 1974, 1984) when
the model is correctly specified and the observed vector X has no kurtosis¹.
Under such ideal conditions the choice between methods is thus arbitrary. In the more
realistic cases of misspecification and/or non-normal data, ML, GLS and ADF will not in
general give asymptotically converging estimates (Arminger & Schoenberg, 1989). One
obvious reason for this is that the weight elements in the respective fit functions
(discrepancy functions) differ. Whereas the asymptotic values of the ML and GLS fit
functions depend on the type and degree of misspecification, they do not depend on the
fourth-order moments, as the ADF fit function does.
There is empirical evidence that ML and GLS are reasonably robust to moderate
deviations from normality with respect to parameter estimates and (empirical) fit
(Finch, West and MacKinnon, 1997; Chou, Bentler and Satorra, 1991).
However, Cudeck & Browne (1992), who found Maximum Likelihood estimates to be robust
with respect to several levels of misspecification (lack of fit), also observed
unexpectedly "... that there are situations where an incorrect assumption of normality
leads to an unjustified impression that the model under consideration fits well". When
the model did not hold, the fit improved with more extreme values of skewness and
kurtosis. It is known that excessive kurtosis can lead to incorrect chi-squares and an
incorrect asymptotic covariance matrix, $ACOV(\hat\theta)$ (Bollen, 1989).

¹ Browne (1984, p. 64): "If all fourth-order cumulants are equal to zero ... We shall
say that the multivariate distribution of X 'has no kurtosis'." If the distribution of
X has no kurtosis, the class of BGLS estimators includes GLS and ML (Browne, 1984).
ADF, as an asymptotically distribution free estimator, appears to be a natural choice in
large samples when the normality criterion is not met. Recent research indicates (Olsson,
Foss, Troye and Howell, 2000) that the performance of ADF, with respect to empirical fit
(i.e., the discrepancy between the predicted and the observed covariance matrix as
measured by e.g., RMSEA and chi-square) improves with increasing levels of peakedness
– especially when models are severely misspecified.
The purpose of the present paper is first to demonstrate how the minimum of the ADF fit
function is affected by kurtosis and misspecification. Second, we formally prove that,
under some additional assumptions, the probability limit $F_0$ of $\hat{F}$ for the ADF
fit function decreases monotonically with the kurtosis of the underlying random
variables.
The rationale for demonstrating the performance of ADF both numerically and formally is
that whereas the analytical proof establishes the general nature of the observed
pattern, the numerical demonstration indicates the magnitude of the effects for given
levels of misspecification and kurtosis.
2. Kurtosis and the Asymptotic Covariance Matrix
Let us start with some notational conventions about the asymptotic covariance matrix and
kurtosis.
Using the notation of Browne (1984), let $Z$ be a stochastic $q \times 1$ vector which
has a distribution with finite fourth-order moments, and with a $q \times q$ population
covariance matrix $\Sigma$. Let $S$ be an unbiased estimator of $\Sigma$ obtained from
$N$ independent observations. Let $s = \mathrm{vecs}(S) = (s_{11}, s_{12}, s_{22},
s_{13}, s_{23}, s_{33}, \ldots, s_{qq})'$ be a $u \times 1$ vector, where
$u = \frac{1}{2}q(q+1)$. The covariances $s_{ij}$ are the $u$ elements on and above the
diagonal of $S$.
The finite-sample distribution of $y_s = \sqrt{N}\,s$ has a $u \times u$ covariance
matrix $\mathrm{Cov}(y_s, y_s')$. By the term asymptotic covariance matrix associated
with the vector $Z$ we will mean $\mathrm{ACov}(s, s') = \lim_{N \to \infty}
\mathrm{Cov}(y_s, y_s')$.
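As a small illustration of the ordering convention for $\mathrm{vecs}(S)$, a Python sketch (the helper `vecs` is our own, not a library function):

```python
import numpy as np

def vecs(S):
    """Distinct elements of a symmetric matrix S, in the ordering
    (s11, s12, s22, s13, s23, s33, ...) used in the text."""
    q = S.shape[0]
    return np.array([S[i, j] for j in range(q) for i in range(j + 1)])

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 9.0, 2.0],
              [0.5, 2.0, 16.0]])
s = vecs(S)   # u = q(q+1)/2 = 6 elements
```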
Later in this paper the general vector $Z$ will take on different labels, such as $X$
and $\zeta$, depending on the situation.
Univariate kurtosis can be defined in the following manner. Let $X$ be a random variable
with population mean $\mu_1$; the $j$-th order central moment is then defined as
$\mu_j = E(X - \mu_1)^j$, $j > 1$. Univariate kurtosis is defined as²
$\eta_4 = \mu_4 / \mu_2^2$. We also use $\eta_4$ to denote kurtosis when $X$ is a
vector; $\eta_4$ is then a vector of univariate kurtosis values.
$\eta_4$ is a population parameter which can be estimated by $m_4 / m_2^2$, where
$m_j = (1/N)\sum (X - \bar{X})^j$, $j > 1$.
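The estimator $m_4/m_2^2$ is straightforward to compute; a minimal Python sketch (the function names are ours):

```python
import numpy as np

def central_moment(x, j):
    # m_j = (1/N) * sum((x - xbar)^j)
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** j)

def kurtosis(x):
    # eta_4 estimated by m4 / m2^2 (no "-3" correction, as in the paper)
    return central_moment(x, 4) / central_moment(x, 2) ** 2

x = [1.0, 2.0, 3.0, 4.0]
k = kurtosis(x)   # m4 = 2.5625, m2 = 1.25, so k = 2.5625 / 1.5625 = 1.64
```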
In order to understand how and why the performance of ADF may be affected by
kurtosis, it is useful to give a brief presentation of the weight elements in the
respective fit functions (discrepancy functions)³.
The ADF fit function can be expressed as

$F_{ADF}(\theta) = (s - \sigma(\theta))'\,\hat{U}_{ADF}^{-1}\,(s - \sigma(\theta))$,

where $s = \mathrm{vecs}(S)$ and $\sigma(\theta) = \mathrm{vecs}(\Sigma(\theta))$, and
where $\hat{U}_{ADF}$ is a consistent estimator of the asymptotic covariance matrix
$U_{ADF}$.
The ADF estimator uses a weight matrix with a typical element that is a combination of
estimates of second- and fourth-order moments:
² In order for the reference normal distribution to have kurtosis of zero, 3 is often
subtracted.
³ A fit (or discrepancy) function is a scalar-valued function $F(S, \Sigma)$ of two
symmetric $q \times q$ matrices $S$ and $\Sigma$ with the following properties:
(a) $F(S, \Sigma) \ge 0$; (b) $F(S, \Sigma) = 0 \iff S = \Sigma$; and (c) $F(S, \Sigma)$
is a twice continuously differentiable function of $S$ and $\Sigma$ (Browne, 1984).


$[\hat{U}_{ADF}]_{ij,kl} = s_{ijkl} - s_{ij}s_{kl}$, $i \le j$ and $k \le l$, where

$s_{ijkl} = \frac{1}{N}\sum_n (x_{ni} - \bar{x}_i)(x_{nj} - \bar{x}_j)(x_{nk} - \bar{x}_k)(x_{nl} - \bar{x}_l)$

is an estimate of

$\sigma_{ijkl} = E\{(x_i - Ex_i)(x_j - Ex_j)(x_k - Ex_k)(x_l - Ex_l)\}.$
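A sketch of how such a weight-matrix estimate could be assembled from raw data (illustrative code, not the LISREL/PRELIS implementation):

```python
import numpy as np

def adf_weight_matrix(X):
    """Estimate U_ADF with typical element s_ijkl - s_ij * s_kl, over the
    index pairs i <= j, k <= l in the paper's vecs ordering.
    X is an N x q data matrix."""
    X = np.asarray(X, dtype=float)
    N, q = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N                     # s_ij with 1/N normalisation
    pairs = [(i, j) for j in range(q) for i in range(j + 1)]
    u = len(pairs)
    U = np.empty((u, u))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            s_ijkl = np.mean(Xc[:, i] * Xc[:, j] * Xc[:, k] * Xc[:, l])
            U[a, b] = s_ijkl - S[i, j] * S[k, l]
    return U

rng = np.random.default_rng(0)
U = adf_weight_matrix(rng.normal(size=(500, 3)))   # u = 6 for q = 3
```

Note the O(u²N) cost of the double loop: the size of this matrix, and the sample size needed to estimate it stably, is what makes ADF demanding for large models.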
The differences in terms of the fit functions between the alternative estimators carry
over to estimates of empirical fit (see Olsson, Foss, Troye and Howell, 2000). The
reason for this is that all fit indices are directly derived from the minimum value of
the discrepancy function $F(S, \Sigma(\theta))$, where $S$ is the sample covariance
matrix, $\Sigma(\theta)$ is the covariance matrix implied by a specific theoretical
model and $\theta$ is a vector of all the free parameters. This minimum value, denoted
by $\hat{F} = F(S, \Sigma(\hat\theta))$, attempts to measure the "deviation" between
the sample covariance matrix $S$ and the estimated covariance matrix
$\Sigma(\hat\theta)$ (Jöreskog & Sörbom, 1993, p. 122).
Whether the fit indices are adjusted for degrees of freedom, for sample size, or both,
or take a baseline⁴ model into account, the estimate (or value) $\hat{F}$ has a central
place in the calculation of the specific index. Thus $\hat{F}$ not only determines the
solutions produced in terms of parameter estimates and estimated variance-covariance
matrices, but enters directly into the calculation of fit.
3. Numerical illustrations and analytical results of the Performance of ADF
as a Function of Misspecification and Kurtosis
In a study by Olsson, Foss, Troye & Howell (2000), misspecification and kurtosis were
found to produce a significant interaction effect on $F(S, \Sigma(\hat\theta))$ for ADF.
Empirical fit was demonstrated to improve with increasing kurtosis, and this effect
increased with higher levels of misspecification. This result is consistent with the
findings in a simulation study reported by Curran, West and Finch (1996, p. 25), who
observed: "The most surprising findings related to the behavior of SB (Satorra-Bentler
chi-square) and ADF test statistics was under the simultaneous conditions of
misspecification and multivariate nonnormality ...: The expected values of these test
statistics markedly decreased with increasing nonnormality. ... Although the specific
reason for this loss of power is currently not known, we theorize that it is due to the
inclusion of the fourth-order moment (kurtosis) in computation of SB and ADF test
statistics, ..".

⁴ Incremental fit indices (Bollen, 1989) measure how much better the model fits as
compared to a baseline model. Since the baseline model is more severely misspecified
than the hypothesized model, and since the effect of the fourth-order moment is more
pronounced for more severely misspecified models, the asymptotic values of some of
these fit indices would of course indicate worse fit with larger kurtosis.
Curran, West and Finch studied three distributional conditions: a normal distribution
(univariate skewness = 0, univariate kurtosis = 3), a moderately non-normal
distribution (univariate skewness = 2.0, univariate kurtosis = 10.0) and a severely
non-normal distribution (univariate skewness = 3.0, univariate kurtosis = 24.0).

The results above seem to indicate that the fit function value $F(S, \Sigma(\hat\theta))$
for ADF is somewhat deflated when the data show excessive kurtosis (see Appendix B for
illustrations using real data).
Before showing formally that this is to be expected, given some assumptions which will
be discussed in the next section, we demonstrate numerically, by re-investigating and
generalizing the results of Olsson, Foss, Troye & Howell (2000) using population data,
how fit is influenced by kurtosis and misspecification. We use a variety of general
factor models and different types and levels of misspecification.
3.1 Design and Methodology
In this study we focus on the confirmatory factor model (measurement model)
$X = \Lambda_x \xi + \delta$, where we use the conventional notation established for
the LISREL model (Jöreskog and Sörbom, 1989):
$X' = (x_1, x_2, \ldots, x_q)$ are the observed or measured variables,
$\Lambda_x$ is the matrix of factor loadings,
$\xi' = (\xi_1, \xi_2, \ldots, \xi_k)$ are the latent variables or factors,
 '  ( 1 ,  2 ,....,  q ) are the error terms (unique part).
It is assumed that the  's and  ' s are random variables with zero means, and the  ' s are
uncorrelated with  's. All observed variables are measured in deviations from their
mean.
The assumed model implies that the covariance matrix of X is
   x  x ' ; where  and  are the covariance matrices of  and  respectively.
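A minimal numerical illustration of the implied covariance structure, with made-up loading and error values (not taken from the paper's models):

```python
import numpy as np

# One-factor example: Sigma = Lambda_x Phi Lambda_x' + Theta_delta
Lam = np.array([[0.7], [0.7], [0.7]])    # factor loadings Lambda_x
Phi = np.array([[1.0]])                   # factor (co)variance, Var(xi) = 1
Theta = np.diag([0.51, 0.51, 0.51])       # error covariance Theta_delta

Sigma = Lam @ Phi @ Lam.T + Theta
# Unit observed variances by construction: 0.7^2 * 1 + 0.51 = 1.0,
# and common covariance 0.7 * 0.7 = 0.49 off the diagonal.
```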
In simulation studies the most common procedure is to generate "sample data", use these
for parameter estimation etc., and then replicate this procedure. For generating
non-normal sample data there are several approaches:
Fleishman (1978) presents a procedure for drawing non-normal data from a distribution
with prescribed expectation, variance, skewness and kurtosis. Tadikamalla (1980)
presents several methods for generating non-normal data with prescribed skewness and
kurtosis. For some of the methods it is also possible to calculate the probability density
functions and the cumulative distributions. Both of these studies deal only with
univariate distributions. Vale and Maurelli (1983) extend the method of Fleishman (1978)
to multivariate distributions. A method described in Ramberg, Tadikamalla, Dudewicz,
and Mykytka (1979) and further developed in Mattson (1997) for generating non-normal
data for structural equation models shows that by controlling univariate skewness and
kurtosis on pre-specified random latent variables and error terms, observed variables can
be made to have a wide range of univariate skewness and kurtosis characteristics
according to the pre-specified model.
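As an illustration of the first of these approaches, a rough Python sketch of Fleishman's power method (the moment equations are the standard published ones; the Newton solver is our own simplistic choice and is not robust for extreme targets):

```python
import numpy as np

def fleishman_moments(b, c, d):
    """Variance, skewness and excess kurtosis of Y = -c + b*Z + c*Z^2 + d*Z^3,
    Z ~ N(0,1).  The skewness/kurtosis expressions assume unit variance,
    which the solver enforces through the first equation."""
    var = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    exkurt = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                 + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return np.array([var, skew, exkurt])

def fleishman_coefficients(skew, exkurt, iters=100):
    """Solve for (b, c, d) by Newton iteration with a numerical Jacobian."""
    x = np.array([1.0, 0.0, 0.0])                  # start at the normal case
    target = np.array([1.0, skew, exkurt])
    for _ in range(iters):
        f = fleishman_moments(*x) - target
        J = np.empty((3, 3))
        for m in range(3):
            h = np.zeros(3)
            h[m] = 1e-7
            J[:, m] = (fleishman_moments(*(x + h))
                       - fleishman_moments(*(x - h))) / 2e-7
        x = x - np.linalg.solve(J, f)
    return x

b, c, d = fleishman_coefficients(0.5, 1.0)   # mildly skewed, peaked target
```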
Since the ADF method has been shown to be reliable only for large sample sizes
(N > 1000 for relatively simple models, Curran et al., 1996; N > 5000 for more complex
models, Hu, Bentler & Kano, 1992), we have chosen to calculate the asymptotic
covariance matrix for the population instead of generating large samples and then
estimating the asymptotic covariance matrix from them. Our approach is based on the
work of Ramberg, Tadikamalla, Dudewicz, and Mykytka (1979) and Mattson (1997).
Since the procedure is not straightforward, we describe it in more detail in the next
section.
We therefore limit our study to the population value
$F_0 = F(\Sigma, \Sigma(\theta_0), U_{ADF})$⁵, referred to as the discrepancy due to
approximation (Browne & Cudeck, 1992). $\Sigma(\theta_0)$ is the model-generated
matrix, i.e., the best fit of the model to the population covariance matrix $\Sigma$.
$\Sigma$, $\Sigma(\theta_0)$ and $U_{ADF}$ are usually fixed unknown matrices. In our
study they are known, since $\Sigma$ and $U_{ADF}$ are calculated from the true model,
i.e., the model specified to generate the data.
To assess and extend the generalizability of the findings reported by Olsson, Foss, Troye
and Howell (2000) a wide range of population models were defined and different
“theoretical” models (denoted as Ta to Tf in table 1) were applied. Consequently, the
degree and nature of misspecification varied as a function of which population model
each theoretical model was applied to. A total of eleven population models were
designed, covering six classes of population models (true models): Ma, Mb, Mc, Md, Me,
Mf (see table 1). As can be seen, the population models vary both with respect to size
and structure. The following types of misspecification were operationalized:
Misspecification in terms of parsimony. For this kind of misspecification we can
distinguish the following:
1) Misspecification by excluding whole sub-structures of the generating model,
that is by omitting factors and associated paths present in the population
model.
2) Misspecification by excluding single paths in the model.

⁵ $F(S, \Sigma(\hat\theta), U_{ADF})$ will converge to
$F(\Sigma, \Sigma(\theta_0), U_{ADF})$ in probability when $N \to \infty$.
We do not claim that the types of misspecification exhaust all relevant possibilities, but
we think the ones addressed in the study represent some of the more typical ways
population models are misrepresented in theoretical models.
****Insert table 1 about here****
The distributions of the population data were divided into three categories of
kurtosis. The following levels of kurtosis were specified:
1. Negligible and no kurtosis, with $\eta_4$ = 2.0 and 3.0;
2. Medium to severe kurtosis, with $\eta_4$ = 5.0, 8.0, 10.0 and 12.0; and
3. Extreme kurtosis, with $\eta_4$ = 18.0, 22.0 and 28.0.
Here $\eta_4$ is a vector ($\eta_4' = (\eta_{41}, \eta_{42}, \ldots, \eta_{4i},
\ldots)$). When we write $\eta_4 = k$ we mean that all the univariate kurtoses are
equal to $k$, i.e., $\eta_4' = (k, k, \ldots, k, \ldots)$.
(The fourth-order moment $\mu_4$ is equal to the kurtosis $\eta_4$ when all the
second-order moments $\mu_2 = 1$.)
3.2 Calculation of the asymptotic population covariance matrix for the ADF Fit
Function
From the definition of the ADF fit function in section 2 it follows that the population
fit function takes the form

(1) $F_{ADF}(\theta) = (\sigma - \sigma(\theta))'\,U_{ADF}^{-1}\,(\sigma - \sigma(\theta))$,

where $[U_{ADF}]_{ij,kl} = \sigma_{ijkl} - \sigma_{ij}\sigma_{kl}$, $i \le j$ and
$k \le l$; $\sigma_{ijkl}$ is a fourth-order population moment and
$\sigma_{ij}\sigma_{kl}$ is a product of second-order population moments.
$\sigma = \mathrm{vecs}(\Sigma) = (\sigma_{11}, \sigma_{12}, \sigma_{22}, \sigma_{13},
\sigma_{23}, \sigma_{33}, \ldots, \sigma_{qq})'$ is a $u \times 1$ vector, where the
covariances $\sigma_{ij}$ are the elements on and above the diagonal of $\Sigma$.
We start with the "true" model $X = \Lambda_x \xi + \delta$, where $\Lambda_x$, $\Phi$
and $\Theta_\delta$ are known.
As in the general model from section 3.1, we have assumed that $X$ is a $q \times 1$
vector of observed or measured variables, $\Lambda_x$ a $q \times k$ matrix of factor
loadings, $\xi$ a $k \times 1$ vector of latent variables and $\delta$ a $q \times 1$
vector of uncorrelated error terms. The $\delta$'s are uncorrelated with the $\xi$'s.
Following tradition, we assume that $E(\xi) = 0$ and $\mathrm{Var}(\xi_i) = 1$. The
covariance matrix of $\xi$, $E(\xi\xi') = \Phi$, will therefore be a $k \times k$
matrix with 1's along the diagonal. We also assume that $E(\delta) = 0$, and we have
$\mathrm{Var}(\delta) = \Theta_\delta$.
Since $\Phi$ is positive definite there exists a $k \times k$ matrix $P$ such that
$\Phi = PP'$.
Concerning $\xi$ and $\delta$ we make two further assumptions:
As in the simulation approach of Mattson (1997), we assume that $\xi = P\zeta_1$,
where $\zeta_1$ is a $k \times 1$ vector of independent stochastic variables with
$E(\zeta_1) = 0$ and $E(\zeta_1\zeta_1') = I$. The covariance matrix of $\xi$ is
therefore $\Phi$.
In the same way we assume that $\delta = D\zeta_2$, where $\zeta_2$ is a $q \times 1$
vector of independent stochastic variables with $E(\zeta_2) = 0$ and
$E(\zeta_2\zeta_2') = I$, and



$D = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_q)$

is a $q \times q$ matrix with the standard deviations of $\delta$ along the main
diagonal and zeros elsewhere. The covariance matrix of $\delta$ is therefore
$\Theta_\delta$.
The true model in this study can then be written as:
$X = \Lambda_x \xi + \delta$, where $\xi = P\zeta_1$ and $\delta = D\zeta_2$.
We will show that the elements of $U_{ADF}$ are functions of $\Lambda_x$, $\Phi$,
$\Theta_\delta$ and the fourth-order moment $\mu_4$, where $\mu_4' = (\mu_{41},
\mu_{42}, \ldots, \mu_{4,k+q})$. Note that $\mu_{4i}$ is the fourth-order moment of
element $i$ of the vector $\zeta = (\zeta_1', \zeta_2')'$.
To do this it is convenient to write the true model in a simpler form:

(2) $X = \Lambda_x \xi + \delta = A\zeta$,

where $A$ is a $q \times (q+k)$ matrix and $\zeta$ is a $(q+k) \times 1$ vector of
independent variables with $E(\zeta) = 0$ and $E(\zeta\zeta') = I$.
The argument for writing $X = \Lambda_x \xi + \delta = A\zeta$ is as follows:
$X = \Lambda_x \xi + \delta = \Lambda_x P \zeta_1 + D \zeta_2
= (\Lambda_x P \mid D)\,(\zeta_1', \zeta_2')' = A\zeta$.
$A = (\Lambda_x P \mid D)$ is composed of the $q \times k$ matrix $\Lambda_x P$ and
the $q \times q$ matrix $D$; $A$ is therefore a $q \times (q+k)$ matrix.
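The factorization $\Phi = PP'$ and the construction of $A$ can be sketched numerically; the parameter values below are invented for illustration, and numpy's Cholesky factor serves as $P$:

```python
import numpy as np

Lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.0, 0.9],
                [0.0, 0.6]])                  # Lambda_x, q x k
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                  # Phi = P P'
Theta = np.diag([0.36, 0.51, 0.19, 0.64])     # Theta_delta = D D'

P = np.linalg.cholesky(Phi)                   # a k x k root of Phi
D = np.sqrt(Theta)                            # std. deviations of delta
A = np.hstack([Lam @ P, D])                   # A = (Lambda_x P | D), q x (q+k)

Sigma = Lam @ Phi @ Lam.T + Theta
# Since X = A zeta with E(zeta zeta') = I, the covariance of X is A A',
# which must reproduce Sigma = Lambda_x Phi Lambda_x' + Theta_delta.
```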
Note that  is not a vector of latent variables (not a  -vector), but rather a vector of
random drawings from a given distribution. We have assumed the ’s to be independent.
But the fact that the ’s are independent does not imply that the  ’s (   P1 ) are
independent, they will even be correlated. But it does imply that  and  are independent
vectors and that the elements of  are independent.
Let U ( ) be the asymptotic covariance matrix associated with  and let U ( X ) the
asymptotic covariance matrix associated with X , and let p = q+k. The following two
lemmas show how U ( ) easily can be transformed to U ( X ) and that U ( ) will have zero
elements outside the main diagonal and that the elements on the main diagonal will be of
2
the form  4,i   2 or 2 2 .
Lemma 1
Let $\zeta$ be a $p \times 1$ vector of independent variables where all the elements
have mean zero and equal variance $\sigma^2$. Let $X = A\zeta$, where $A$ is a
$q \times p$ matrix. Then
$U(X) = B\,U(\zeta)\,B'$, where $B = K_q'(A \otimes A)K_p$, and where $K_q$ and $K_p$
are defined in Browne (1974).
Proof:
Let $\Sigma(\zeta)$ be the covariance matrix of $\zeta$ and let $\Sigma(X)$ be the
covariance matrix of $X$. Then $\sigma_\zeta = \mathrm{vec}(\Sigma(\zeta))$ is a
$p^2 \times 1$ vector, $\sigma_X = \mathrm{vec}(\Sigma(X))$ is a $q^2 \times 1$ vector,
$\omega_\zeta = \mathrm{vec}(\zeta\zeta')$ is a $p^2 \times 1$ vector and
$\omega_X = \mathrm{vec}(XX')$ is a $q^2 \times 1$ vector. Then we can write
$U(\zeta) = E(K_p'\omega_\zeta(K_p'\omega_\zeta)') - K_p'\sigma_\zeta(K_p'\sigma_\zeta)'
= E(K_p'(\omega_\zeta\omega_\zeta' - \sigma_\zeta\sigma_\zeta')K_p)$.
In the same manner we can write
$U(X) = E(K_q'\omega_X(K_q'\omega_X)') - K_q'\sigma_X(K_q'\sigma_X)'
= E(K_q'(\omega_X\omega_X' - \sigma_X\sigma_X')K_q)$.
Then
$B\,U(\zeta)\,B' = K_q'(A \otimes A)K_p\,E(K_p'(\omega_\zeta\omega_\zeta' -
\sigma_\zeta\sigma_\zeta')K_p)\,K_p'(A \otimes A)'K_q$
$= E(K_q'(A \otimes A)M_p(\omega_\zeta\omega_\zeta' -
\sigma_\zeta\sigma_\zeta')M_p(A \otimes A)'K_q)$
$= E(K_q'M_q(A \otimes A)(\omega_\zeta\omega_\zeta' -
\sigma_\zeta\sigma_\zeta')(A \otimes A)'M_qK_q)$
$= E(K_q'(A \otimes A)(\omega_\zeta\omega_\zeta' -
\sigma_\zeta\sigma_\zeta')(A \otimes A)'K_q)$
$= E(K_q'(A \otimes A)\omega_\zeta\omega_\zeta'(A \otimes A)'K_q) -
K_q'(A \otimes A)\sigma_\zeta\sigma_\zeta'(A \otimes A)'K_q$
$= E(K_q'\mathrm{vec}(A\zeta\zeta'A')(K_q'\mathrm{vec}(A\zeta\zeta'A'))') -
K_q'\mathrm{vec}(A\Sigma(\zeta)A')(K_q'\mathrm{vec}(A\Sigma(\zeta)A'))'$
$= E(K_q'\mathrm{vec}(XX')(K_q'\mathrm{vec}(XX'))') -
K_q'\mathrm{vec}(\Sigma(X))(K_q'\mathrm{vec}(\Sigma(X)))'$
$= E(K_q'(\omega_X\omega_X' - \sigma_X\sigma_X')K_q)$
$= U(X)$.
Here $M_p = K_pK_p'$ is a symmetric idempotent matrix with the properties
$(A \otimes A)M_p = M_q(A \otimes A)$ and $K_q'M_q = K_q'$. We have also used that
$(A \otimes A)\omega_\zeta = (A \otimes A)\mathrm{vec}(\zeta\zeta') =
\mathrm{vec}(A\zeta\zeta'A')$ (see Browne 1974, pp. 207-208). This proves Lemma 1.
Lemma 2
If $\zeta$ is a $p \times 1$ vector of independent variables where all the elements
have mean zero and equal variance $\sigma^2$, the asymptotic covariance matrix
$U(\zeta)$ will have zero elements outside the main diagonal. The elements on the main
diagonal will be of the form $\mu_{4i} - \sigma^4$ or $\sigma^4$.
Proof:
Since $\zeta$ is a $p \times 1$ vector of independent variables, we know (Cramér,
1946) that $E(\zeta_i^m \zeta_j^k) = E(\zeta_i^m)E(\zeta_j^k)$ for $i \ne j$. For
simplicity, let $U = U(\zeta)$.
We have $U_{ij,kl} = E(\zeta_i\zeta_j\zeta_k\zeta_l) -
E(\zeta_i\zeta_j)E(\zeta_k\zeta_l)$.
An element on the main diagonal of the asymptotic covariance matrix will either be of
the form $U_{ii,ii}$ or of the form $U_{ij,ij}$ with $i \ne j$:
$U_{ii,ii} = E(\zeta_i^4) - E(\zeta_i^2)E(\zeta_i^2) = \mu_{4i} - \sigma^4$.
Assume $i \ne j$; then
$U_{ij,ij} = E(\zeta_i^2\zeta_j^2) - (E(\zeta_i\zeta_j))^2
= E(\zeta_i^2)E(\zeta_j^2) - (E(\zeta_i)E(\zeta_j))^2 = \sigma^4 - 0 = \sigma^4$.
An element outside the main diagonal will be of the form $U_{ij,kl}$, where
$(i,j) \ne (k,l)$, $i \le j$ and $k \le l$. We show that this covariance is zero:
1) Let $i = j$, $k = l$ and $i \ne k$, i.e., we calculate the covariance between two
variances:
$U_{ij,kl} = U_{ii,kk} = E(\zeta_i^2\zeta_k^2) - E(\zeta_i^2)E(\zeta_k^2) = 0$.
2) Let $i = j$ and $k < l$, i.e., we calculate the covariance of a variance and a
"real" covariance. Since $k \ne l$, at least one of the indices $k$ and $l$ appears
only once; without loss of generality, assume it is $k$. Then
$U_{ij,kl} = U_{ii,kl} = E(\zeta_i^2\zeta_k\zeta_l) - E(\zeta_i^2)E(\zeta_k\zeta_l)
= E(\zeta_i^2\zeta_l)E(\zeta_k) - E(\zeta_i^2)E(\zeta_k)E(\zeta_l) = 0 - 0 = 0$.
3) Let $i < j$ and $k = l$. Then also $U_{ij,kl} = 0$; the proof is equivalent to the
one above.
4) Let $i < j$, $k < l$ and $(i,j) \ne (k,l)$. Then no two pairs of indices can be
equal: if $j = k$, then $i < j$ and $k < l$ imply $i \ne l$. Hence at least one index
appears only once, and the corresponding expectation factors out as zero, so that
$U_{ij,kl} = 0$. This proves Lemma 2.
Applying these results to the formula for the ADF population fit function in (1), we
can calculate the function value $F_0$ for different models. Given the true models,
with known parameter values and kurtosis defined with respect to $\zeta$, we can study
how $F_0$ is influenced by the kurtosis.
In the next section we present some results from applying the above formulas for the
relation between $U(\zeta)$ and $U(X)$ to calculate $F_0$ for different factor models.
The motivation for doing so is to give an indication of the magnitude of the effect due
to the kurtosis. We have chosen $\sigma^2 = 1$. The calculations were carried out using
a combination of LISREL 8.30, PRELIS 2.30 and C++.
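The population computation can be sketched in numpy for a hypothetical one-factor model with a single-parameter misspecified alternative (this is our illustrative reconstruction, not the authors' LISREL/PRELIS/C++ code, and the model values are invented):

```python
import numpy as np

# Hypothetical one-factor population model: X = A zeta, A = (Lambda_x P | D),
# with k = 1 and P = 1.  Error variances give unit observed variances.
lam_true = np.array([0.9, 0.7, 0.5])
theta = 1.0 - lam_true**2
A = np.hstack([lam_true[:, None], np.diag(np.sqrt(theta))])   # 3 x 4
Sigma = A @ A.T
q = 3
pairs = [(i, j) for j in range(q) for i in range(j + 1)]      # vecs ordering

def fourth_moment(mu4, i, j, k, l):
    # sigma_ijkl for X = A zeta with independent zeta, E(zeta) = 0, Var = 1
    t1 = np.sum(A[i] * A[j] * A[k] * A[l] * mu4)
    def pairsum(p1, p2, p3, p4):
        return (A[p1] @ A[p2]) * (A[p3] @ A[p4]) - np.sum(A[p1] * A[p2] * A[p3] * A[p4])
    return t1 + pairsum(i, j, k, l) + pairsum(i, k, j, l) + pairsum(i, l, j, k)

def U_adf(mu4_value):
    mu4 = np.full(A.shape[1], mu4_value)
    u = len(pairs)
    U = np.empty((u, u))
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            U[a, b] = fourth_moment(mu4, i, j, k, l) - Sigma[i, j] * Sigma[k, l]
    return U

def F0(mu4_value):
    # Misspecified model: one factor with equal loadings lam and error
    # variance 1 - lam^2, minimized by a simple grid search over lam.
    Uinv = np.linalg.inv(U_adf(mu4_value))
    best = np.inf
    for lam in np.linspace(0.0, 0.99, 991):
        Sig_m = np.full((q, q), lam * lam)
        np.fill_diagonal(Sig_m, 1.0)
        d = np.array([Sigma[i, j] - Sig_m[i, j] for (i, j) in pairs])
        best = min(best, float(d @ Uinv @ d))
    return best

f_low, f_high = F0(3.0), F0(15.0)   # F0 drops as the kurtosis of zeta rises
```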
3.3 Numerical Results
The kurtosis was defined with respect to the independent vector $\zeta$. Generally the
kurtosis of the observed vector $X$ is not identical to the kurtosis of the vector
$\zeta$; since each observed $x$-variable is a linear sum of the $\zeta$'s, its
distribution should, as follows from the central limit theorem, be closer to a normal
distribution (with $\eta_4 = 3$). In fact we observe that the kurtosis of the observed
variables is less extreme than that of $\zeta$. For example, for the two extreme values
$\eta_4 = 2$ and $\eta_4 = 28$, the kurtosis of the observed variables varied between
2.5 and 2.8, and between 8.7 and 15, respectively.
In figures 1, 2 and 3 we illustrate graphically how $F_0$ drops as a function of the
kurtosis for three of the models. Moreover, for all models included in this study we
observe the same pattern (see table 2): for a given level of misspecification, $F_0$
drops monotonically as kurtosis increases. As can be seen from the results in table 2,
the reduction in the $F_0$-value is substantial, between 47.4% and 96%, when the same
wrongly specified model is applied to moderately flat data ($\eta_4 = 2$) and to highly
peaked data ($\eta_4 = 28$).
All these numerical results support the hypothesis that $F_0$ drops monotonically as
the kurtosis increases.
****Insert figure 1, 2 and 3 and table 2 about here****
3.4 Formal Proof
Certainly, these numerical illustrations do not suffice to prove the generality of the
pattern. We therefore outline a formal approach, where we first prove that the
quadratic form $z'U_{ADF}^{-1}z$, for a suitable vector⁶ $z$, decreases with the
kurtosis for the factor model $X = A\zeta$ with $A = (\Lambda_x P \mid D)$. We also
show that the result is valid for any structural equation model, under some additional
assumptions.

⁶ For fit functions $z = (s - \sigma(\theta))$, or $z = (\sigma - \sigma(\theta))$ in
population studies. Since we are studying misspecified models,
$(\sigma - \sigma(\theta))$ will generally not be zero.
Proposition 1:
Let two hypothetical distributions have fourth-order moments $\mu_{4i}^{(1)}$ and
$\mu_{4i}^{(2)}$, with $\mu_{4i}^{(1)} \ge \mu_{4i}^{(2)}$ for all $i$. Then for the
corresponding matrices $U(X)$ we have
$z'U_1(X)^{-1}z \le z'U_2(X)^{-1}z$ for all $z$, where $z$ is a constant vector.
Proof:
From Lemma 2 we know that $\mu_{4i}^{(1)} \ge \mu_{4i}^{(2)}$ for all $i$ implies that
$U_1(\zeta) \ge U_2(\zeta)$ in the Löwner⁷ sense of inequality. Since we have seen that
$U(X) = BU(\zeta)B'$, it follows that
$U_1(X) \ge U_2(X) \Rightarrow U_1(X)^{-1} \le U_2(X)^{-1} \Rightarrow
z'U_1(X)^{-1}z \le z'U_2(X)^{-1}z$ for all $z$
(see Magnus and Neudecker, 1988, p. 22). This proves Proposition 1.
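The key step, that a Löwner ordering of positive definite matrices reverses under inversion, is easy to illustrate numerically (the matrices below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
U2 = M @ M.T + 4.0 * np.eye(4)     # an arbitrary positive definite matrix
c = rng.normal(size=(4, 1))
U1 = U2 + c @ c.T                  # U1 >= U2 in the Loewner sense

z = rng.normal(size=4)
q1 = float(z @ np.linalg.inv(U1) @ z)
q2 = float(z @ np.linalg.inv(U2) @ z)
# U1 >= U2 implies inv(U1) <= inv(U2), hence q1 <= q2 for every z
```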
Let z ( ) = (    ) and let F0  min  z( )'U ( X ) 1 z( ) denote the minimum of the
population fit function.
Proposition 2:
Let two hypothetical distributions have fourth-order moments  4i
(1)
and  4i
( 2)
, with
 4i (1)   4i ( 2) then for the corresponding minimum values of the fit functions
F0
(r )
 min  z( )'U r ( X ) 1 z( ); r  1, 2 ; we have F0
(1)
 F0
( 2)
.
Proof:
From proposition 1 it is immediately clear that
F0
(1)
 min  z( )'U1 ( X ) 1 z( )  z(1 )'U1 ( X ) 1 z(1 )  z(1 )'U 2 ( X ) 1 z(1 )
 min  z( )'U 2 ( X ) 1 z( )  z( 2 )'U 2 ( X ) 1 z( 2 )  F0 . This proves proposition 2.
( 2)
7
See Browne, 1974, p. 213
Note that the propositions also cover the whole class of structural equation models,
under some additional assumptions. For example, for these general models Bentler
(1983; 1995, pp. 206-207) uses the representation $Z = C\xi$, where $Z$ is the vector
of observed variables, $C$ is a matrix function of the parameters, and $\xi$ is the
vector of latent exogenous variables. Writing $\xi = P\zeta$, as in section 3.2, under
the assumption that the elements of $\zeta$ are independent, gives the desired result
$Z = CP\zeta = A\zeta$, where $A$ is implicitly given.
4. Conclusions and discussion
We have shown analytically that $F_0 = \min_\theta\, z(\theta)'U(X)^{-1}z(\theta)$, for
a whole class of structural models, is a non-increasing function of the fourth-order
moment. These results are valid for any structural equation model as long as the
elements of $\zeta$ are independent. Furthermore, the results extend to any structural
equation model if "increase in the fourth-order moment" is taken in the Löwner sense of
inequality: $U_1(\zeta) \ge U_2(\zeta)$. But the results do not extend to any
structural equation model if "increase in the fourth-order moment" is taken in the
univariate sense.
The numerical results indicate that the more misspecified a model is, the more
substantial the drop in $F_0$ when the fourth-order moment increases. For some of the
models the magnitude of this effect was considerable. A low chi-square may therefore
reflect not only good fit, but also a badly fitting model combined with low power.
We feel that we have addressed an important issue in the debate concerning model fit.
Satorra (1990) concludes that ADF is asymptotically optimal under a variety of
distributions for models that are "structurally correct". For misspecified models,
however, $(\sigma - \sigma(\theta))$ will generally not be zero, and most of the
decrease in the function value is then attributable to the increasing kurtosis
reflected in the weight matrix. In Appendix A we show an example where
$(\sigma - \sigma(\theta))$ is held fixed and only the kurtosis changes, so that only
the weight matrix can affect the function value. The drop in $F_0$ shows that the
impact of $(\sigma - \sigma(\theta))$ is almost negligible compared to the effect of
the weight matrix.
References
Arminger, G. and Schoenberg, R.J. (1989). Pseudo Maximum Likelihood Estimation and
a Test for Misspecification in Mean and Covariance Structure Models.
Psychometrika 54, 3, 409-425.
Bentler, P.M. (1983). Some contributions to efficient statistics for structural models:
Specification and estimation of moment structures. Psychometrika, 48, 493-517.
Bentler, P.M. (1995). EQS Structural Equations Program Manual. Encino, CA:
Multivariate Software, Inc.
Bollen, K.A. (1989). Structural Equations with latent variables. New York: Wiley.
Browne, M.W. (1974). Generalized Least-Squares estimators in the analysis of
covariance structures. South African Statistical Journal, 8, 1-24.
Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of
covariance structures. British Journal of Mathematical and Statistical
Psychology, 37, 62-83.
Browne, M. W. & Cudeck, R. (1992). Alternative ways of assessing model fit.
Sociological Methods & Research, 21,2, 230-258.
Chou, C.P., Bentler, P., and Satorra, A (1991): Scaled test statistics and robust standard
errors for non-normal data in covariance structure analysis: A Monte Carlo study.
British Journal of Mathematical and Statistical Psychology, 44, 347-357.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University
Press.
Curran, P.J., West, S.G. and Finch , J.F. (1996). The robustness of test statistics to
nonnormality and specification error in confirmatory factor analysis.
Psychological Methods, 1, 16-29.
Cudeck, R., Browne, M.W. (1992). Constructing a covariance matrix that yields a
specified minimizer and a specified minimum discrepancy function value.
Psychometrika, 57, 3,357-369.
Finch, J.F., West, S.G., and MacKinnon D.P. (1997). Effects of Sample Size and
nonnormality on the Estimation of Mediated Effects in Latent variable Models,
Structural Equation Modeling, 2, 87-105.
Fleishman, A.I. (1978). A method for simulating non-normal distributions.
Psychometrika, 43, 4, 521-532.
Graybill, F.A. (1983). Matrices with applications in statistics (2nd edition).
Belmont, CA: Wadsworth International Group.
Hu, L., Bentler, P.M., Kano, Y., (1992). Can test statistics in covariance structure
analysis be trusted? Psychological Bulletin, 112, 351-362.
Jöreskog, K. G. and Sörbom, D. (1989). LISREL 7: A guide to the program and
applications (2nd edition). New York: SPSS.
Jöreskog, K. G. and Sörbom, D. (1993). LISREL 8: Structural Equation Modeling with
the Simplis Command Language. Chicago: Scientific Software International.
Jöreskog, K. G. and Sörbom, D. (2000). LISREL 8.30 and PRELIS 2.30. Scientific
Software International, Inc.
Magnus, J.R. and Neudecker, H. (1988). Matrix differential calculus with applications
in statistics and econometrics. New York: Wiley.
Mattson S. (1997). How to Generate Non-normal data for Simulation of Structural
Equation Models. Multivariate Behavioral Research, 32, 4, 355-373.
Olsen, S.O., Wilcox, J., Olsson, U.H. (2002). Consequences of Ambivalence on
Satisfaction and Loyalty. Working paper, Texas Tech University, USA
22
Olsson Ulf H., Sigurd Villads Troye, Tron Foss and Roy D. Howell (2000).
The performance of ML, GLS, and WLS Estimation in Structural Equation
Modeling under conditions of Misspecification and Nonnormality, Structural
Equation Modeling,7,4, 557-595.
Ramberg J.S., Tadikamalla P.R., Dudewicz E.J. and Mykytka E.F. (1979). A Probability
Distribution and Its Use in Fitting Data. Technometrics, 21,2, 201- 215.
Satorra Albert (1990). Robustness issues in Structural equation modeling: a review of
recent developments. Quality and Quantity 2, 367-386.
Tadikamalla P. R .(1980). On Simulating non-normal distributions
Psychometrika 45, 2, 273-278.
Vale C .David and Maurelli Vincent A.(1983). Simulating nonnormal distributions
Psychometrika 48, 3, 465-471.
Yuan, K.H., and Bentler P.M. (1997). Improving parameter tests in covariance structure
analysis. Computational Statistics & Data Analysis, 26, 177-198.
23
Table 1

Population model: Ma
Twelve indicators (x1-x12), three major factors (1, 2, 3) and four minor factors (4, 5, 6, 7).
Factors 1, 2 and 3 are correlated at 0.2; factors 4, 5, 6 and 7 are orthogonal; no cross-loadings.
λ11 = λ21 = λ31 = λ41 = 0.7
λ52 = λ62 = λ72 = λ82 = 0.8
λ93 = λ10,3 = λ11,3 = λ12,3 = 0.6
λ14 = λ54 = λ94 = 0.2
λ25 = λ65 = λ10,5 = 0.3
λ36 = 0.2, λ46 = 0.4, λ11,6 = 0.1, λ77 = 0.3, λ87 = 0.4, λ12,7 = 0.1

Theoretical models: Ta1, Ta2, Ta3
Nature of misspecification:
Ta1: four minor factors missing.
Ta2: four minor factors missing; orthogonal factors.
Ta3: four minor factors missing; orthogonal factors; two missing paths.

Population model: Mb
Six non-orthogonal factors, all correlated at 0.5. The number of indicators varies from 2 to 5 per factor:
λ11 = λ21 = 0.7
λ32 = λ42 = λ52 = λ62 = λ72 = 0.7
λ83 = λ93 = 0.7
λ10,4 = λ11,4 = 0.7
λ12,5 = λ13,5 = λ14,5 = λ15,5 = λ16,5 = 0.7
λ17,6 = λ18,6 = 0.7

Theoretical model: Tb
Nature of misspecification: three indicators per factor (non-existing paths included and
existing paths excluded, i.e., four λ's in the wrong column).

Population models: Mc1, Mc2 and Mc3
Eighteen indicators (x1-x18), six non-orthogonal factors (correlated at 0.5) and two
orthogonal, "subgroup-specific" method factors. All loadings equal 0.7, except for the
six method loadings, which are set to 0.3, 0.5 and 0.7 for Mc1, Mc2 and Mc3
respectively. The factor structure:
λ11 = λ21 = λ31 = 0.7; λ42 = λ52 = λ62 = 0.7
λ73 = λ83 = λ93 = 0.7; λ10,4 = λ11,4 = λ12,4 = 0.7
λ13,5 = λ14,5 = λ15,5 = 0.7; λ16,6 = λ17,6 = λ18,6 = 0.7
The method loadings (λm):
λ37 = λ57 = λ77 = λ12,8 = λ14,8 = λ16,8 = 0.3, 0.5, 0.7

Theoretical model: Tc
Nature of misspecification: the two method factors are omitted.

Population models: Md1 and Md2
Eighteen indicators (x1-x18), six non-orthogonal factors (correlated at 0.5) and three
orthogonal "method" factors. All loadings equal 0.7 (see Mc), except for the 18 method
loadings, which are set to 0.3 and 0.5 for Md1 and Md2 respectively. All observed
variables load on the method factors (MTMM model):
λ17 = λ47 = λ77 = λ10,7 = λ13,7 = λ16,7 = 0.3, 0.5
λ28 = λ58 = λ88 = λ11,8 = λ14,8 = λ17,8 = 0.3, 0.5
λ39 = λ69 = λ99 = λ12,9 = λ15,9 = λ18,9 = 0.3, 0.5

Theoretical model: Td
Nature of misspecification: the three method factors are omitted.

Population models: Me1, Me2, Mf1 and Mf2
Fifteen indicators (x1-x15), three non-orthogonal factors (correlated at 0.5) and one
orthogonal "method" factor. All loadings equal 0.7, except for the "method loadings",
which are set to +/-0.3 and +/-0.5. The structure:
λ11 = λ21 = λ31 = λ41 = λ51 = 0.7
λ62 = λ72 = λ82 = λ92 = λ10,2 = 0.7
λ11,3 = λ12,3 = λ13,3 = λ14,3 = λ15,3 = 0.7
λ14 = λ54 = λ64 = λ10,4 = λ11,4 = λ15,4 = 0.3 and 0.7 for Me1 and Me2 respectively.
λ14 = λ64 = λ11,4 = 0.3 and 0.7, and λ54 = λ10,4 = λ15,4 = -0.3 and -0.7, for Mf1 and
Mf2 respectively.

Theoretical models: Te and Tf
Nature of misspecification: the method factor is omitted.

The error terms are not included in the figures.
Table 2: F0 for ADF as a function of kurtosis for all models *)

Model\α4     2       3       5       8       10      12      18      22      28     **) %ΔF0
Ta1        0.115   0.111   0.103   0.094   0.089   0.084   0.073   0.067   0.060    47.4%
Ta2        0.203   0.169   0.138   0.115   0.105   0.098   0.082   0.073   0.066    67.5%
Ta3        0.746   0.612   0.462   0.345   0.297   0.262   0.196   0.168   0.140    81.3%
Tb         0.571   0.503   0.405   0.315   0.276   0.246   0.187   0.162   0.134    76.5%
Tc1        0.108   0.098   0.082   0.066   0.058   0.052   0.040   0.034   0.028    73.7%
Tc2        0.382   0.380   0.180   0.117   0.095   0.078   0.054   0.044   0.035    90.8%
Tc3        0.629   0.424   0.230   0.136   0.107   0.088   0.058   0.047   0.037    94.2%
Td1        0.954   0.842   0.678   0.529   0.463   0.412   0.312   0.269   0.223    76.6%
Td2        1.599   1.491   1.052   0.739   0.619   0.533   0.378   0.317   0.256    83.9%
Te1        0.090   0.083   0.072   0.059   0.053   0.048   0.037   0.032   0.027    70.0%
Te2        0.670   0.400   0.220   0.130   0.110   0.087   0.057   0.047   0.036    94.6%
Tf1        0.170   0.150   0.110   0.085   0.073   0.064   0.046   0.039   0.032    81.0%
Tf2        0.970   0.490   0.250   0.140   0.110   0.091   0.059   0.048   0.037    96.0%

*) E.g., Tc2 means that the theoretical model Tc is fitted to the true model Mc2.
**) Reduction in F0 (in %) when the kurtosis increases from 2 to 28.
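The percentage reduction reported in the last column can be reproduced directly from the tabulated F0 values. A minimal Python check is shown below; because the printed F0 entries are rounded to three decimals, the recomputed percentage may differ in the last digit from the printed one for some rows, so only two rows where the figures agree exactly are used here.

```python
# Percentage reduction in F0 when the kurtosis rises from alpha4 = 2 to
# alpha4 = 28, computed from the (rounded) Table 2 entries.

def pct_reduction(f0_at_2, f0_at_28):
    """Reduction in F0, in percent, between the two extreme kurtosis levels."""
    return 100.0 * (f0_at_2 - f0_at_28) / f0_at_2

# Tb:  0.571 -> 0.134, printed as 76.5%
# Td1: 0.954 -> 0.223, printed as 76.6%
assert round(pct_reduction(0.571, 0.134), 1) == 76.5
assert round(pct_reduction(0.954, 0.223), 1) == 76.6
```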
Figure 1: F0 for ADF as a function of kurtosis for models Ta1, Ta2, Ta3.
[Line plot: F0 (vertical axis, 0 to 0.8) against α4 (horizontal axis, 0 to 30), one curve each for Ta1, Ta2 and Ta3.]
Figure 2: F0 for ADF as a function of kurtosis for models Tc1, Tc2, Tc3.
[Line plot: F0 (vertical axis, 0 to 0.7) against α4 (horizontal axis, 0 to 30), one curve each for Tc1, Tc2 and Tc3.]
Figure 3: F0 for ADF as a function of kurtosis for models Td1, Td2.
[Line plot: F0 (vertical axis, 0 to 1.8) against α4 (horizontal axis, 0 to 30), one curve each for Td1 and Td2.]
Appendix A
The difference (σ − σ(θ))
Table A1 shows F0 for the ADF fit of model Ta1 as a function of the kurtosis of Ma,
when the difference (σ − σ(θ)) is held fixed at the value it takes for α4 = 10. F0 drops
from 0.1198 to 0.0619, a reduction of about 48%. These results can be compared with
those in Table 2, where (σ − σ(θ)) is not fixed. Evidently, the impact of (σ − σ(θ)) is
almost negligible compared with the effect of the weight matrix.
Table A1

α4     2        3        5        8        10       12       18       22       28
F0   0.1198   0.1134   0.1042   0.0942   0.0889   0.0843   0.0737   0.0683   0.0619
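The mechanism isolated in Appendix A can be illustrated in the simplest possible setting. In the one-variable case, the asymptotic variance of a sample variance is (α4 − 1)σ⁴, so the ADF weight grows with the kurtosis and, with the residual (σ² − σ²(θ)) held fixed, the fit value must fall as α4 increases. The toy Python calculation below is an assumption-laden simplification for illustration only, not the paper's 13-model computation; it shows the monotone decrease over the same kurtosis grid as Table 2.

```python
# One-variable sketch: F0 = delta^2 / ((alpha4 - 1) * sigma^4), where
# delta = sigma^2 - sigma^2(theta) is the (fixed) model misfit and
# (alpha4 - 1) * sigma^4 plays the role of the ADF weight.

def f0_adf(delta, sigma2, alpha4):
    """Probability limit of the one-variable ADF fit function."""
    return delta**2 / ((alpha4 - 1.0) * sigma2**2)

kurtosis_grid = [2, 3, 5, 8, 10, 12, 18, 22, 28]  # same alpha4 grid as Table 2
f0 = [f0_adf(delta=0.5, sigma2=1.0, alpha4=a4) for a4 in kurtosis_grid]

# With the misfit delta fixed, F0 decreases monotonically in the kurtosis.
assert all(f0[i] > f0[i + 1] for i in range(len(f0) - 1))
```

This mirrors the formal result in the paper: the kurtosis enters F0 only through the weight matrix, so inflating the fourth-order moments deflates the fit value for a fixed misfit.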
Appendix B
Olsen et. al (2002) tested a nine items measurement model for the constructs
ambivalence, satisfaction and loyalty. The kurtosis of the measured variables ranged from
2.0 to 5.84. The sample size was 1194. The model was estimated both with ML and ADF.
The ADF chi-square was 19.76 (df = 24). The model was also estimated using normal
scores (univariate kurtosis ranging from 2.4 to 3.0), and the ADF chi-square now became
32.86 (df = 24).