2. Kurtosis and the Asymptotic Covariance Matrix

Does the ADF fit function decrease when the kurtosis increases?

Ulf Henning Olsson
Norwegian School of Management

Tron Foss
Norwegian School of Management

Sigurd V. Troye
Norwegian School of Economics and Business Administration

July, 2002

Forthcoming in British Journal of Mathematical and Statistical Psychology

Address correspondence to: Ulf H. Olsson, Department of Economics, Norwegian School of Management BI, Sandvika, Norway. E-mail: ulf.h.olsson@bi.no. Phone: +47 67557397. Fax: +47 67557675.

The authors appreciate the constructive comments of the Editor, Dr. Patricia Lovie, and two anonymous referees.

Abstract

In this study we demonstrate how the asymptotically distribution free (ADF) fit function is affected by (excessive) kurtosis in the observed data. More specifically we address how different levels of univariate kurtosis affect fit-values (and therefore fit-indices) for misspecified factor models. By using numerical calculation, we show (for 13 factor models) that the probability limit $F_0$ of $\hat{F}$ for the ADF fit function decreases considerably as the kurtosis increases. We also give a formal proof that the value of $F_0$ decreases monotonically with the kurtosis for a whole class of structural equation models.

1. Introduction

It is well known that different estimation procedures such as Maximum Likelihood (ML), Generalized Least Squares (GLS) and Asymptotically Distribution Free Estimator (ADF, or WLS, Weighted Least Squares, which is used in the LISREL language) will produce estimates for a structural equation or covariance structure model that will converge to the same optimum and have the same asymptotic properties (Browne 1974, 1984) when models are correctly specified and the observed vector X has no kurtosis (see footnote 1). Under such ideal conditions the choice between methods is thus arbitrary.
In the more realistic cases of misspecification and/or non-normal data, ML, GLS and ADF will not generally give asymptotically converging estimates (Arminger & Schoenberg, 1989). One obvious reason for this is that the weight elements in the respective fit functions (discrepancy functions) differ. Whereas the asymptotic values of the ML and GLS fit functions depend on the type and the degree of misspecification, they do not depend on the 4th order moments, as the asymptotic value of the ADF fit function does. There is empirical evidence that ML and GLS are reasonably robust to moderate deviations from normality with respect to parameter estimates and (empirical) fit (Finch, West and MacKinnon, 1997; Chou, Bentler and Satorra, 1991). However, Cudeck & Browne (1992), who found Maximum Likelihood estimates to be robust with respect to several levels of misspecification (lack of fit), also unexpectedly observed "... that there are situations where an incorrect assumption of normality leads to an unjustified impression that the model under consideration fits well". When the model did not hold, the fit improved with more extreme values of skewness and kurtosis. It is known that excessive kurtosis can lead to incorrect chi-squares and an incorrect asymptotic covariance matrix, ACOV($\hat{\theta}$) (Bollen, 1989).

Footnote 1: Browne (1984, p. 64): "If all fourth-order cumulants are equal to zero, ... We shall say that the multivariate distribution of X 'has no kurtosis'." If the distribution of X has no kurtosis, the class of BGLS estimators includes GLS and ML (Browne, 1984).

ADF, as an asymptotically distribution free estimator, appears to be a natural choice in large samples when the normality criterion is not met.
Recent research indicates (Olsson, Foss, Troye and Howell, 2000) that the performance of ADF, with respect to empirical fit (i.e., the discrepancy between the predicted and the observed covariance matrix as measured by e.g., RMSEA and chi-square), improves with increasing levels of peakedness, especially when models are severely misspecified. The purpose of the present paper is first to demonstrate how the minimum of the ADF fit function is affected by kurtosis and misspecification. Second, we will formally prove that, under some additional assumptions, the probability limit $F_0$ of $\hat{F}$ for the ADF fit function decreases monotonically with the kurtosis of the underlying random variables. The rationale for demonstrating the performance of ADF both numerically and formally is that whereas an analytical proof will show the general nature of the observed pattern, the numerical demonstration will indicate the magnitude of the effects for given levels of misspecification and kurtosis.

2. Kurtosis and the Asymptotic Covariance Matrix

Let us start with some notational conventions about the asymptotic covariance matrix and kurtosis. Using the notation of Browne (1984), let Z be a stochastic $q \times 1$ vector, which has a distribution with finite fourth order moments, with a $q \times q$ population covariance matrix $\Sigma$. Let S be an unbiased estimator of $\Sigma$ obtained from N independent observations. Let

$$ s = \mathrm{vecs}(S) = (s_{11}, s_{12}, s_{22}, s_{13}, s_{23}, s_{33}, \ldots, s_{qq})' $$

be a $u \times 1$ vector, where $u = \tfrac{1}{2} q(q+1)$. The covariances $s_{ij}$ are the u elements we find above and on the diagonal in S. Then the finite sample distribution for $\delta_s = \sqrt{N}\, s$ will have a $u \times u$ covariance matrix $\mathrm{cov}(\delta_s, \delta_s')$. By the term asymptotic covariance matrix associated with the vector Z we will mean

$$ \mathrm{cov}_L(\delta_s, \delta_s') = \lim_{N \to \infty} \mathrm{cov}(\delta_s, \delta_s'). $$

Later in this paper the general vector Z will take on different labels like X and $\nu$ depending on the situation.
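The ordering of the elements in $\mathrm{vecs}(S)$ and the count $u = \tfrac{1}{2} q(q+1)$ can be illustrated with a small sketch. The function name `vecs` and the matrix values below are illustrative assumptions, not part of the original study.

```python
def vecs(S):
    """Stack the elements on and above the diagonal of a symmetric matrix,
    column by column: s11, s12, s22, s13, s23, s33, ..."""
    q = len(S)
    return [S[i][j] for j in range(q) for i in range(j + 1)]


# Illustrative 3 x 3 covariance matrix (made-up values).
S = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]

s = vecs(S)
q = len(S)
u = q * (q + 1) // 2          # number of non-duplicated elements

print(s)
print(u)
```

For q = 3 the vector has u = 6 elements, so the associated asymptotic covariance matrix is 6 x 6.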
Univariate kurtosis can be defined in the following manner: Let X be a random variable with a population mean of $\mu_1$; then the j-th order central moment is defined as $\mu_j = E(X - \mu_1)^j$, $j > 1$. Univariate kurtosis is defined as (see footnote 2)

$$ \beta_4 = \frac{\mu_4}{\mu_2^2}. $$

We also use $\beta_4$ to denote kurtosis when X is a vector. Then $\beta_4$ will be a vector of univariate kurtosis values. $\beta_4$ is a population parameter which can be estimated by $m_4 / m_2^2$, where $m_j = (1/N) \sum (X - \bar{X})^j$, $j > 1$.

Footnote 2: In order for the reference normal distribution to have kurtosis of zero, 3 is often subtracted.

In order to understand how and why the performance of ADF may be affected by kurtosis, it is useful to give a brief presentation of the weight elements in the respective fit function (discrepancy function; see footnote 3). The ADF fit function can be expressed as

$$ F_{ADF}(\theta) = (s - \sigma(\theta))' \, \hat{U}_{ADF}^{-1} \, (s - \sigma(\theta)), $$

where $s = \mathrm{vecs}(S)$ and $\sigma(\theta) = \mathrm{vecs}(\Sigma(\theta))$, and where $\hat{U}_{ADF}$ is a consistent estimator of the asymptotic covariance matrix $U_{ADF}$.

Footnote 3: A fit or discrepancy function is a scalar valued function $F(S, \Sigma)$ of two symmetric $q \times q$ matrices S and $\Sigma$ with the following properties: (a) $F(S, \Sigma) \ge 0$, (b) $F(S, \Sigma) = 0 \Leftrightarrow S = \Sigma$, and (c) $F(S, \Sigma)$ is a twice continuously differentiable function of S and $\Sigma$ (Browne, 1984).

The ADF estimator uses a weight matrix with a typical element that is a combination of estimates of second and fourth order moments:

$$ [\hat{U}_{ADF}]_{ij,kl} = s_{ijkl} - s_{ij} s_{kl}, \qquad i \le j, \; k \le l, $$

where

$$ s_{ijkl} = \frac{1}{N} \sum (x_i - \bar{x}_i)(x_j - \bar{x}_j)(x_k - \bar{x}_k)(x_l - \bar{x}_l) $$

is an estimate of

$$ \sigma_{ijkl} = E\{(x_i - E x_i)(x_j - E x_j)(x_k - E x_k)(x_l - E x_l)\}. $$

The differences in terms of the fit functions between the alternative estimators are carried over to estimations of empirical fit (see Olsson, Foss, Troye and Howell, 2000).
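As a minimal sketch of the two estimators just defined, the following computes $m_4 / m_2^2$ for a single variable and one element $s_{ijkl} - s_{ij} s_{kl}$ of the ADF weight matrix. The function names and the toy data are hypothetical, and as a simplifying assumption every moment (including $s_{ij}$) uses the divisor N rather than an unbiased divisor.

```python
def central_moment(x, j):
    """m_j = (1/N) * sum((x - mean)^j)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** j for v in x) / n


def kurtosis(x):
    """Estimate of beta_4 = mu_4 / mu_2^2 (3 is NOT subtracted)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def adf_weight_element(data, i, j, k, l):
    """s_ijkl - s_ij * s_kl from the columns of `data` (rows = observations).
    Assumption: all moments use the divisor N."""
    n = len(data)
    q = len(data[0])
    means = [sum(row[c] for row in data) / n for c in range(q)]
    dev = [[row[c] - means[c] for c in range(q)] for row in data]
    s_ijkl = sum(d[i] * d[j] * d[k] * d[l] for d in dev) / n
    s_ij = sum(d[i] * d[j] for d in dev) / n
    s_kl = sum(d[k] * d[l] for d in dev) / n
    return s_ijkl - s_ij * s_kl


# Toy data: one variable, five observations (made-up values).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
beta4_hat = kurtosis(x)
w_1111 = adf_weight_element([[v] for v in x], 0, 0, 0, 0)

print(beta4_hat, w_1111)
```

For a single variable the diagonal weight element reduces to $m_4 - m_2^2$, which shows directly how a larger fourth order moment inflates the weight matrix.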
The reason for this carry-over to empirical fit is that all fit indices are directly derived from the minimum value of the discrepancy function $F(S, \Sigma(\theta))$, where S is the sample covariance matrix, $\Sigma(\theta)$ is the covariance matrix implied by a specific theoretical model and $\theta$ is a vector of all the free parameters. This minimum value, denoted by $\hat{F} = F(S, \Sigma(\hat{\theta}))$, attempts to measure the "deviation" between the sample covariance matrix S and the estimated covariance matrix $\Sigma(\hat{\theta})$ (Jöreskog & Sörbom, 1993, p. 122). Whether the fit indices are adjusted for degrees of freedom, for the sample size, or for both, or take a baseline model (see footnote 4) into account, the estimate (or value) $\hat{F}$ has a central place in the calculation of the specific index. Thus $\hat{F}$ not only determines the solutions produced in terms of parameter estimates and estimated variance-covariance matrices, but enters directly into the calculation of fit.

Footnote 4: Incremental fit indices (Bollen, 1989) measure how much better the model fits as compared to a baseline model. Since the baseline model is more severely misspecified than the hypothetical model, and if the effect from the 4th order moment is more pronounced for more severely misspecified models, the asymptotic values of some of these fit indices would of course indicate worse fit with larger kurtosis.

3. Numerical illustrations and analytical results of the Performance of ADF as a Function of Misspecification and Kurtosis

In a study by Olsson, Foss, Troye & Howell (2000), misspecification and kurtosis were found to produce a significant interaction effect on $F(S, \Sigma(\hat{\theta}))$ for ADF. Empirical fit was demonstrated to improve with increasing kurtosis, and this effect increased with higher levels of misspecification. This result is consistent with the findings in a simulation study reported by Curran, West and Finch (1996, p. 25), who observed: "The most surprising findings related to the behavior of SB (Satorra Bentler chi-square) and ADF test statistics was under the simultaneous conditions of misspecification and multivariate nonnormality ...: The expected values of these test statistics markedly decreased with increasing nonnormality. ... Although the specific reason for this loss of power is currently not known, we theorize that it is due to the inclusion of the fourth-order moment (kurtosis) in computation of SB and ADF test statistics, ...". Curran, West and Finch studied three distributional conditions: normal distribution (univariate skewness = 0, univariate kurtosis = 3), moderately non-normal distribution (univariate skewness = 2.0, univariate kurtosis = 10.0) and severely non-normal distribution (univariate skewness = 3.0, univariate kurtosis = 24.0).

The results above seem to indicate that the fit function value $F(S, \Sigma(\hat{\theta}))$ for ADF is somewhat deflated when the data show excessive kurtosis (see appendix B for illustrations using real data). Before showing formally that this is to be expected, given some assumptions which will be discussed in the next section, we will demonstrate numerically how fit is influenced by kurtosis and misspecification, by re-investigating and generalizing the results of Olsson, Foss, Troye & Howell (2000) using population data. We will use a variety of general factor models and different types and levels of misspecification.

3.1 Design and Methodology

In this study we will focus on the confirmatory factor model (measurement model)

$$ X = \Lambda_x \xi + \delta, $$

where we use the conventional notation established for the LISREL model (Jöreskog and Sörbom, 1989):

$X' = (x_1, x_2, \ldots, x_q)$ are the observed or measured variables
$\Lambda_x$ is the matrix of factor loadings
$\xi' = (\xi_1, \xi_2, \ldots, \xi_k)$ are the latent variables or factors
$\delta' = (\delta_1, \delta_2, \ldots, \delta_q)$ are the error terms (unique part).

It is assumed that the $\xi$'s and $\delta$'s are random variables with zero means, and the $\delta$'s are uncorrelated with the $\xi$'s. All observed variables are measured in deviations from their mean.
The assumed model implies that the covariance matrix of X is

$$ \Sigma = \Lambda_x \Phi \Lambda_x' + \Theta_\delta, $$

where $\Phi$ and $\Theta_\delta$ are the covariance matrices of $\xi$ and $\delta$ respectively.

In simulation studies the most common procedure is to generate "sample data", use these for parameter estimation etc., and then replicate this procedure. For generating non-normal sample data there are several approaches. Fleishman (1978) presents a procedure for drawing non-normal data from a distribution with prescribed expectation, variance, skewness and kurtosis. Tadikamalla (1980) presents several methods for generating non-normal data with prescribed skewness and kurtosis. For some of the methods it is also possible to calculate the probability density functions and the cumulative distributions. Both of these studies deal only with univariate distributions. Vale and Maurelli (1983) extend the method of Fleishman (1978) to multivariate distributions. A method described in Ramberg, Tadikamalla, Dudewicz, and Mykytka (1979) and further developed in Mattson (1997) for generating non-normal data for structural equation models shows that by controlling univariate skewness and kurtosis on pre-specified random latent variables and error terms, observed variables can be made to have a wide range of univariate skewness and kurtosis characteristics according to the pre-specified model.

Since the ADF method has been shown to be reliable only for large sample sizes (N > 1000 for relatively simple models, Curran et al., 1996; N > 5000 for more complex models, Hu, Bentler & Kano, 1992), we have chosen to calculate the asymptotic covariance matrix for the population instead of generating large samples and then estimating the asymptotic covariance matrix from these samples. Our approach is based on the work of Ramberg, Tadikamalla, Dudewicz, and Mykytka (1979) and Mattson (1997). Since the procedure is not straightforward we will describe it in more detail in the next section.
We therefore limit our study to the population value $F_0 = F(\Sigma, \Sigma(\theta_0), U_{ADF})$ (see footnote 5), referred to as the discrepancy due to approximation (Browne & Cudeck, 1992). $\Sigma(\theta_0)$ is the model-generated matrix, i.e., the best fit of the model to the population covariance matrix $\Sigma$. $\Sigma$, $\Sigma(\theta_0)$ and $U_{ADF}$ are usually fixed unknown matrices. In our study they will be known, since $\Sigma$ and $U_{ADF}$ will be calculated from the true model, i.e., the model specified to generate the data.

Footnote 5: $F(S, \Sigma(\hat{\theta}), \hat{U}_{ADF})$ will converge to $F(\Sigma, \Sigma(\theta_0), U_{ADF})$ in probability when $N \to \infty$.

To assess and extend the generalizability of the findings reported by Olsson, Foss, Troye and Howell (2000), a wide range of population models were defined and different "theoretical" models (denoted as Ta to Tf in table 1) were applied. Consequently, the degree and nature of misspecification varied as a function of which population model each theoretical model was applied to. A total of eleven population models were designed covering six classes of population models (true models): Ma, Mb, Mc, Md, Me, Mf (see table 1). As we see, the population models vary both with respect to size and structure.

The following types of misspecification were operationalized: misspecification in terms of parsimony. For this kind of misspecification we can distinguish the following:

1) Misspecification by excluding whole sub-structures of the generating model, that is, by omitting factors and associated paths present in the population model.
2) Misspecification by excluding single paths in the model.

We do not claim that these types of misspecification exhaust all relevant possibilities, but we think the ones addressed in the study represent some of the more typical ways population models are misrepresented in theoretical models.

****Insert table 1 about here****

The distributions of the population data were divided into three different categories of kurtosis.
The following levels of kurtosis were specified:

1. Negligible and no kurtosis, with $\beta_4$ = 2.0 and 3.0.
2. Medium to severe kurtosis, with $\beta_4$ = 5.0, 8.0, 10.0 and 12.0.
3. Extreme kurtosis, with $\beta_4$ = 18.0, 22.0 and 28.0.

Here $\beta_4$ is a vector, $\beta_4' = (\beta_{41}, \beta_{42}, \ldots, \beta_{4i}, \ldots)$. When we write $\beta_4 = k$ we mean that all the univariate kurtosis values are equal to k, i.e., $\beta_4' = (k, k, \ldots, k, \ldots)$. (The 4th order moment $\mu_4$ is equal to the kurtosis $\beta_4$ when all the second order moments $\mu_2 = 1$.)

3.2 Calculation of the asymptotic population covariance matrix for the ADF Fit Function

From the definition of the ADF fit function in section 2 it follows that the population fit function takes the form

$$ F_{ADF}(\theta) = (\sigma - \sigma(\theta))' \, U_{ADF}^{-1} \, (\sigma - \sigma(\theta)), \qquad (1) $$

where

$$ [U_{ADF}]_{ij,kl} = \sigma_{ijkl} - \sigma_{ij}\sigma_{kl}, \qquad i \le j \text{ and } k \le l, $$

and $\sigma_{ijkl}$ is the 4th order population moment and $\sigma_{ij}\sigma_{kl}$ is the product of second order population moments. $\sigma = \mathrm{vecs}(\Sigma) = (\sigma_{11}, \sigma_{12}, \sigma_{22}, \sigma_{13}, \sigma_{23}, \sigma_{33}, \ldots, \sigma_{qq})'$ is a $u \times 1$ vector where the covariances $\sigma_{ij}$ are the elements we find above and on the diagonal in $\Sigma$.

We start with the "true" model $X = \Lambda_x \xi + \delta$, where $\Lambda_x$, $\Phi$ and $\Theta_\delta$ are known. As in the general model from section 3.1 we have assumed that X is a $q \times 1$ vector of observed or measured variables, $\Lambda_x$ a $q \times k$ matrix of factor loadings, $\xi$ a $k \times 1$ vector of latent variables and $\delta$ a $q \times 1$ vector of uncorrelated error terms. The $\delta$'s are uncorrelated with the $\xi$'s. Following the tradition we assume that $E(\xi) = 0$ and $\mathrm{Var}(\xi_i) = 1$ for each factor. The covariance matrix of $\xi$, $E(\xi\xi') = \Phi$, will therefore be a $k \times k$ matrix with 1's along the diagonal. We also assume that $E(\delta) = 0$, and we have $\mathrm{Var}(\delta) = \Theta_\delta$. Since $\Phi$ is positive definite there will exist a $k \times k$ matrix P so that $\Phi = PP'$.

Concerning $\xi$ and $\delta$ we make two further assumptions. First, as in the simulation approach by Mattson (1997), we assume that $\xi = P\nu_1$, where $\nu_1$ is a $k \times 1$ vector of independent stochastic variables with $E(\nu_1) = 0$ and $E(\nu_1\nu_1') = I$.
The covariance matrix of $\xi$ is therefore $PP' = \Phi$. Second, in the same way we will assume that $\delta = D\nu_2$, where $\nu_2$ is a $q \times 1$ vector of independent stochastic variables with $E(\nu_2) = 0$ and $E(\nu_2\nu_2') = I$, and

$$ D = \begin{pmatrix} \sqrt{\theta_{\delta_1}} & 0 & \cdots & 0 \\ 0 & \sqrt{\theta_{\delta_2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\theta_{\delta_q}} \end{pmatrix} $$

is a $q \times q$ matrix with the standard deviations of $\delta$ along the main diagonal and zero elsewhere. The covariance matrix of $\delta$ is therefore $DD' = \Theta_\delta$. Then the true model in this study can be written as

$$ X = \Lambda_x \xi + \delta, \quad \text{where } \xi = P\nu_1 \text{ and } \delta = D\nu_2. $$

We will show that the elements of $U_{ADF}$ are functions of $\Lambda_x$, $\Phi$, $\Theta_\delta$ and the 4th order moment $\mu_4$, where $\mu_4' = (\mu_{41}, \mu_{42}, \ldots, \mu_{4,k+q})$. Note that $\mu_{4i}$ is the 4th order moment for element i of the vector $\nu = (\nu_1', \nu_2')'$.

To do this it is convenient to write the true model in a simpler form:

$$ X = \Lambda_x \xi + \delta = A\nu, \qquad (2) $$

where A is a $q \times (q+k)$ matrix and $\nu$ is a $(q+k) \times 1$ vector of independent variables with $E(\nu) = 0$ and $E(\nu\nu') = I$. The argument for writing $X = \Lambda_x \xi + \delta = A\nu$ is as follows:

$$ X = \Lambda_x \xi + \delta = \Lambda_x P \nu_1 + D \nu_2 = (\Lambda_x P \mid D) \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix} = A\nu. $$

$A = (\Lambda_x P \mid D)$ is composed of the $q \times k$ matrix $\Lambda_x P$ and the $q \times q$ matrix D. A is therefore a $q \times (q+k)$ matrix. Note that $\nu$ is not a vector of latent variables (not a $\xi$-vector), but rather a vector of random drawings from a given distribution. We have assumed the $\nu$'s to be independent. The fact that the $\nu$'s are independent does not imply that the $\xi$'s ($\xi = P\nu_1$) are independent; in general they will be correlated. But it does imply that $\xi$ and $\delta$ are independent vectors and that the elements of $\delta$ are independent.

Let $U(\nu)$ be the asymptotic covariance matrix associated with $\nu$, let $U(X)$ be the asymptotic covariance matrix associated with X, and let $p = q + k$. Lemma 1 shows how $U(\nu)$ can easily be transformed to $U(X)$, and Lemma 2 shows that $U(\nu)$ has zero elements outside the main diagonal and that the elements on the main diagonal are of the form $\mu_{4i} - \mu_2^2$ or $\mu_2^2$.
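The construction $A = (\Lambda_x P \mid D)$ can be checked numerically with a short sketch: with $E(\nu\nu') = I$, the product $AA'$ should reproduce $\Lambda_x \Phi \Lambda_x' + \Theta_\delta$. The two-factor model, its parameter values and the helper names below are hypothetical, and the Cholesky factor is only one possible choice of P with $PP' = \Phi$.

```python
import math


def cholesky(M):
    """Lower-triangular P with P P' = M for a symmetric positive definite M."""
    n = len(M)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(P[i][t] * P[j][t] for t in range(j))
            P[i][j] = math.sqrt(acc) if i == j else acc / P[j][j]
    return P


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Hypothetical two-factor model with four indicators (illustrative values).
Lambda_x = [[0.7, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.8]]
Phi = [[1.0, 0.5], [0.5, 1.0]]
theta_delta = [0.51, 0.51, 0.36, 0.36]      # error variances

P = cholesky(Phi)
LP = matmul(Lambda_x, P)
q = len(Lambda_x)

# A = (Lambda_x P | D): append the diagonal matrix of error standard deviations.
A = [LP[i] + [math.sqrt(theta_delta[i]) if i == c else 0.0 for c in range(q)]
     for i in range(q)]

# Covariance matrix implied by X = A nu with E(nu nu') = I ...
Sigma_from_A = matmul(A, transpose(A))

# ... compared with Lambda_x Phi Lambda_x' + Theta_delta.
LPhiL = matmul(matmul(Lambda_x, Phi), transpose(Lambda_x))
Sigma_model = [[LPhiL[i][j] + (theta_delta[i] if i == j else 0.0) for j in range(q)]
               for i in range(q)]

max_diff = max(abs(Sigma_from_A[i][j] - Sigma_model[i][j])
               for i in range(q) for j in range(q))
print(max_diff)
```

Here A has q = 4 rows and q + k = 6 columns, matching the dimensions stated for equation (2).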
Lemma 1

Let $\nu$ be a $p \times 1$ vector of independent variables where all the elements have mean zero and equal variance $\mu_2$. Let $X = A\nu$ where A is a $q \times p$ matrix. Then

$$ U(X) = B \, U(\nu) \, B', \quad \text{where } B = K_q' (A \otimes A) K_p^{-\prime}, $$

and where $K_q$ and $K_p$ are defined in Browne (1974).

Proof: Let $\Sigma(\nu)$ be the covariance matrix of $\nu$ and let $\Sigma(X)$ be the covariance matrix of X. $\sigma_\nu = \mathrm{vec}(\Sigma(\nu))$ is a $p^2 \times 1$ vector, $\sigma_X = \mathrm{vec}(\Sigma(X))$ is a $q^2 \times 1$ vector, $\tau_\nu = \mathrm{vec}(\nu\nu')$ is a $p^2 \times 1$ vector and $\tau_X = \mathrm{vec}(XX')$ is a $q^2 \times 1$ vector. Then we can write

$$ U(\nu) = E\big(K_p' \tau_\nu (K_p' \tau_\nu)'\big) - K_p' \sigma_\nu (K_p' \sigma_\nu)' = E\big(K_p' (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') K_p\big). $$

In the same manner we can write

$$ U(X) = E\big(K_q' \tau_X (K_q' \tau_X)'\big) - K_q' \sigma_X (K_q' \sigma_X)' = E\big(K_q' (\tau_X \tau_X' - \sigma_X \sigma_X') K_q\big). $$

Then

$$
\begin{aligned}
B \, U(\nu) \, B' &= K_q' (A \otimes A) K_p^{-\prime} \, E\big(K_p' (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') K_p\big) \, K_p^{-} (A \otimes A)' K_q \\
&= E\big(K_q' (A \otimes A) K_p^{-\prime} K_p' (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') K_p K_p^{-} (A \otimes A)' K_q\big) \\
&= E\big(K_q' (A \otimes A) M_p' (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') M_p (A \otimes A)' K_q\big) \\
&= E\big(K_q' M_q' (A \otimes A) (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') (A \otimes A)' M_q K_q\big) \\
&= E\big(K_q' (A \otimes A) (\tau_\nu \tau_\nu' - \sigma_\nu \sigma_\nu') (A \otimes A)' K_q\big) \\
&= E\big(K_q' (A \otimes A) \tau_\nu \tau_\nu' (A \otimes A)' K_q - K_q' (A \otimes A) \sigma_\nu \sigma_\nu' (A \otimes A)' K_q\big) \\
&= E\big(K_q' \mathrm{vec}(A\nu\nu'A') (K_q' \mathrm{vec}(A\nu\nu'A'))' - K_q' \mathrm{vec}(A\Sigma(\nu)A') (K_q' \mathrm{vec}(A\Sigma(\nu)A'))'\big) \\
&= E\big(K_q' \mathrm{vec}(XX') (K_q' \mathrm{vec}(XX'))' - K_q' \mathrm{vec}(\Sigma(X)) (K_q' \mathrm{vec}(\Sigma(X)))'\big) \\
&= E\big(K_q' (\tau_X \tau_X' - \sigma_X \sigma_X') K_q\big) = U(X).
\end{aligned}
$$

Here $M_p = K_p K_p^{-}$ is a symmetric idempotent matrix with the property $(A \otimes A) M_p = M_q (A \otimes A)$. We have also used that $(A \otimes A)\tau_\nu = (A \otimes A)\mathrm{vec}(\nu\nu') = \mathrm{vec}(A\nu\nu'A')$ (see Browne 1974, pp. 207-208). This proves lemma 1.

Lemma 2

If $\nu$ is a $p \times 1$ vector of independent variables where all the elements have mean zero and equal variance $\mu_2$, the asymptotic covariance matrix $U(\nu)$ will have zero elements outside the main diagonal.
The elements on the main diagonal will be of the form $\mu_{4i} - \mu_2^2$ or $\mu_2^2$.

Proof: From Cramér (1946) we know that if $\nu$ is a $p \times 1$ vector of independent variables then $E(\nu_i^m \nu_j^k) = E(\nu_i^m) E(\nu_j^k)$ for $i \ne j$. Let for simplicity $U = U(\nu)$. We have

$$ U_{ij,kl} = E(\nu_i \nu_j \nu_k \nu_l) - E(\nu_i \nu_j) E(\nu_k \nu_l). $$

An element on the main diagonal of the asymptotic covariance matrix will either be of the form $U_{ii,ii}$ or of the form $U_{ij,ij}$ where $i \ne j$. For the first form,

$$ U_{ii,ii} = E(\nu_i \nu_i \nu_i \nu_i) - E(\nu_i \nu_i) E(\nu_i \nu_i) = \mu_{4i} - \mu_2^2. $$

Assume $i \ne j$; then

$$ U_{ij,ij} = E(\nu_i \nu_j \nu_i \nu_j) - E(\nu_i \nu_j) E(\nu_i \nu_j) = E(\nu_i^2 \nu_j^2) - \big(E(\nu_i) E(\nu_j)\big)\big(E(\nu_i) E(\nu_j)\big) = E(\nu_i^2) E(\nu_j^2) - 0 = \mu_2^2. $$

An element outside the main diagonal will be of the form $U_{ij,kl}$ where $(i,j) \ne (k,l)$, $i \le j$ and $k \le l$. We will show that this covariance is zero:

1) Let $i = j$, $k = l$ and $i \ne k$, i.e., we calculate the covariance between two variances:

$$ U_{ij,kl} = U_{ii,kk} = E(\nu_i \nu_i \nu_k \nu_k) - E(\nu_i \nu_i) E(\nu_k \nu_k) = E(\nu_i^2) E(\nu_k^2) - E(\nu_i^2) E(\nu_k^2) = 0. $$

2) Let $i = j$ and $k \ne l$, i.e., we calculate the covariance of a variance and a "real" covariance. Here at least one of the indexes k or l must appear only once, and whenever an index appears only once we have $U_{ij,kl} = 0$. Without loss of generality we can assume that the index k appears only once. Then we have

$$ U_{ij,kl} = U_{ii,kl} = E(\nu_i \nu_i \nu_k \nu_l) - E(\nu_i^2) E(\nu_k \nu_l) = E(\nu_i^2 \nu_l) E(\nu_k) - E(\nu_i^2) E(\nu_k) E(\nu_l) = 0 - 0 = 0. $$

3) Let $i \ne j$ and $k = l$. Then also $U_{ij,kl} = 0$. The proof is equivalent to the proof above.

4) Let $i \ne j$, $k \ne l$ and $(i,j) \ne (k,l)$. The two pairs of indexes cannot coincide: if, for example, $j = k$, then $i < j$ and $k < l$ imply $i \ne l$. Hence at least one index appears only once, the expectation factors as in case 2, and because $E(\nu) = 0$ the implication is that $U_{ij,kl} = 0$.

This proves lemma 2.

Applying these results to the formula for the ADF population fit function in (1) we can calculate the function value $F_0$ for different models.
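The two lemmas can be combined in a compact numerical sketch. The code below is not the LISREL, PRELIS and C++ implementation used in the study; it is an illustration under the assumption that B may be written elementwise, using $\mathrm{vecs}(XX') = B\,\mathrm{vecs}(\nu\nu')$ when $X = A\nu$. The matrix A, the helper names and the kurtosis values are hypothetical.

```python
import math


def pairs(n):
    """Index pairs (i, j) with i <= j in vecs order: (0,0), (0,1), (1,1), (0,2), ..."""
    return [(i, j) for j in range(n) for i in range(j + 1)]


def u_nu_diagonal(mu4, mu2=1.0):
    """Diagonal of U(nu) as in Lemma 2: mu4_a - mu2^2 for the pair (a, a),
    mu2^2 for a pair (a, b) with a != b. Off-diagonal elements are zero."""
    return [mu4[a] - mu2 ** 2 if a == b else mu2 ** 2 for a, b in pairs(len(mu4))]


def b_matrix(A):
    """Elementwise form of B (assumed equivalent to the Kronecker expression
    in Lemma 1): row (i, j), column (a, b)."""
    q, p = len(A), len(A[0])
    B = []
    for i, j in pairs(q):
        row = []
        for a, b in pairs(p):
            if a == b:
                row.append(A[i][a] * A[j][a])
            else:
                row.append(A[i][a] * A[j][b] + A[i][b] * A[j][a])
        B.append(row)
    return B


def u_x(A, mu4, mu2=1.0):
    """U(X) = B U(nu) B' with diagonal U(nu)."""
    B = b_matrix(A)
    d = u_nu_diagonal(mu4, mu2)
    return [[sum(r[t] * d[t] * c[t] for t in range(len(d))) for c in B] for r in B]


# Hypothetical model: one factor, two indicators, loadings 0.7, unit variances.
sd = math.sqrt(0.51)
A = [[0.7, sd, 0.0],
     [0.7, 0.0, sd]]

U_normal = u_x(A, [3.0, 3.0, 3.0])      # mu4 = 3: no excess kurtosis
U_peaked = u_x(A, [28.0, 28.0, 28.0])   # mu4 = 28: extreme kurtosis

for row in U_normal:
    print([round(v, 4) for v in row])
print(round(U_peaked[0][0], 4))
```

With $\mu_4 = 3$ and $\mu_2 = 1$ the result can be compared with the familiar normal-theory form $\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}$, which gives a simple check on the elementwise B. Raising $\mu_4$ changes only the diagonal entries of $U(\nu)$ that belong to pairs $(a, a)$, and this change propagates to $U(X)$ through B.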
Given the true models with the known parameter values and kurtosis defined with respect to $\nu$, we can study how $F_0$ is influenced by the kurtosis. In the next section we present some results from applying the above formulas for the relation between $U(\nu)$ and $U(X)$ to calculate $F_0$ for different factor models. The motivation for doing so is to give an indication of the magnitude of the effect due to the kurtosis. We have chosen $\mu_2 = 1$. The calculations are carried out by using a combination of LISREL 8.30, PRELIS 2.30 and C++.

3.3 Numerical Results

The kurtosis was defined with respect to the independent vector $\nu$. Generally the kurtosis of the observed vector X is not identical to the kurtosis of the vector $\nu$. Since each observed x-variable is a linear sum of the $\nu$'s, its distribution should, as follows from the central limit theorem, be closer to a normal distribution (with $\beta_4 = 3$). In fact we observe that the kurtosis of the observed variables is less extreme than for $\nu$. For example, for the two extreme values $\beta_4 = 2$ and $\beta_4 = 28$, the kurtosis of the observed variables varied between 2.5 and 2.8 and between 8.7 and 15 respectively.

In figures 1, 2 and 3 we have illustrated graphically how $F_0$ drops as a function of the kurtosis for three of the models. However, for all models included in this study we observe the same pattern (see table 2): for a given level of misspecification, $F_0$ drops monotonically as kurtosis increases. As can be seen from the results in table 2, the reduction in the $F_0$ value is substantial, between 47.4% and 96%, when the same wrongly specified model is applied to moderately flat data ($\beta_4 = 2$) and to highly peaked data ($\beta_4 = 28$). All these numerical results support the hypothesis that $F_0$ drops monotonically as the kurtosis increases.

****Insert figure 1, 2 and 3 and table 2 about here****

3.4 Formal Proof

Certainly, these numerical illustrations do not sufficiently prove the generality of the pattern.
Therefore we will outline a formal approach where we first prove that the quadratic form $z' U_{ADF}^{-1} z$, for a suitable vector z (see footnote 6), decreases with the kurtosis for the factor model $X = A\nu$ where $A = (\Lambda_x P \mid D)$. We also show that the results are valid for any structural equation model, under some additional assumptions.

Footnote 6: For fit functions, $z = (s - \sigma(\theta))$, or $z = (\sigma - \sigma(\theta))$ in population studies. Since we are studying misspecified models, $(\sigma - \sigma(\theta))$ will generally not be zero.

Proposition 1: Let two hypothetical distributions have fourth-order moments $\mu_{4i}^{(1)}$ and $\mu_{4i}^{(2)}$, with $\mu_{4i}^{(1)} \le \mu_{4i}^{(2)} \; \forall i$. Then for the corresponding matrices $U(X)$ we have

$$ z' U_1(X)^{-1} z \ge z' U_2(X)^{-1} z \quad \forall z, $$

where z is a constant vector.

Proof: From lemma 2 we know that $\mu_{4i}^{(1)} \le \mu_{4i}^{(2)} \; \forall i$ will imply that $U_1(\nu) \le U_2(\nu)$ in the Löwner sense of inequality (see footnote 7). Since we have seen that $U(X) = B U(\nu) B'$, it follows that

$$ U_1(X) \le U_2(X) \;\Rightarrow\; U_1(X)^{-1} \ge U_2(X)^{-1} \;\Rightarrow\; z' U_1(X)^{-1} z \ge z' U_2(X)^{-1} z \quad \forall z $$

(see Magnus and Neudecker, 1988, p. 22). This proves proposition 1.

Footnote 7: See Browne, 1974, p. 213.

Let $z(\theta) = (\sigma - \sigma(\theta))$ and let $F_0 = \min_\theta z(\theta)' U(X)^{-1} z(\theta)$ denote the minimum of the population fit function.

Proposition 2: Let two hypothetical distributions have fourth-order moments $\mu_{4i}^{(1)}$ and $\mu_{4i}^{(2)}$, with $\mu_{4i}^{(1)} \le \mu_{4i}^{(2)}$. Then for the corresponding minimum values of the fit functions, $F_0^{(r)} = \min_\theta z(\theta)' U_r(X)^{-1} z(\theta)$, $r = 1, 2$, we have $F_0^{(1)} \ge F_0^{(2)}$.

Proof: Let $\theta_1$ and $\theta_2$ denote the minimizers under $U_1(X)$ and $U_2(X)$ respectively. From proposition 1 it is immediately clear that

$$ F_0^{(1)} = \min_\theta z(\theta)' U_1(X)^{-1} z(\theta) = z(\theta_1)' U_1(X)^{-1} z(\theta_1) \ge z(\theta_1)' U_2(X)^{-1} z(\theta_1) \ge \min_\theta z(\theta)' U_2(X)^{-1} z(\theta) = z(\theta_2)' U_2(X)^{-1} z(\theta_2) = F_0^{(2)}. $$

This proves proposition 2.

Note that the propositions also include the whole class of structural equation models, under some additional assumptions. For example, for these general models Bentler (1983; 1995, pp. 206-207) uses the representation $Z = C\xi$, where Z is the vector of observed variables, C is a matrix function of the parameters, and $\xi$ is the vector of latent exogenous variables. Writing $\xi = P\nu$, as in section 3.2, under the assumption that the elements of $\nu$ are independent, gives the desired result $Z = CP\nu = A\nu$, where A is implicitly given.

4. Conclusions and discussion

We have shown analytically that $F_0 = \min_\theta z(\theta)' U(X)^{-1} z(\theta)$, for a whole class of structural models, will be a non-increasing function of the fourth order moment. These results are valid for any structural equation model as long as the elements of $\nu$ are independent. Furthermore, the results extend to any structural equation model if "increase in the fourth order moment" is taken in the Löwner sense of inequality: $U_1(\nu) \le U_2(\nu)$. But the results do not extend to any structural equation model if "increase in the fourth order moment" is taken in the univariate sense.

The numerical results seemed to indicate that the more misspecified a model is, the more substantial the drop in $F_0$ is when the fourth order moment increases. For some of the models the magnitude of this effect was considerable. Therefore a low chi-square may point not only to good fit, but also to bad fit combined with low power. We feel that we have addressed an important issue in the debate concerning model fit.

Satorra (1990) concludes that the ADF is asymptotically optimal under a variety of distributions for models that are "structurally correct". For misspecified models, however, $(\sigma - \sigma(\theta))$ will generally not be zero; therefore most of the decrease in the function value is attributable to the increasing kurtosis reflected in the weight matrix. In appendix A we show an example where $(\sigma - \sigma(\theta))$ is fixed and only the kurtosis changes. This means that only the weight matrix will affect the function value.
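A comparison with a fixed residual vector can also be sketched numerically. The following is not the appendix A example; it is a hypothetical one-factor model with a made-up vector z, and it assumes the elementwise form of B implied by $\mathrm{vecs}(XX') = B\,\mathrm{vecs}(\nu\nu')$ with $\mu_2 = 1$. By Proposition 1 the printed values of $z' U(X)^{-1} z$ should not increase as $\mu_4$ grows.

```python
import math


def pairs(n):
    """Index pairs (i, j) with i <= j in vecs order."""
    return [(i, j) for j in range(n) for i in range(j + 1)]


def u_x(A, mu4):
    """U(X) = B U(nu) B' for X = A nu with independent, unit-variance nu."""
    q, p = len(A), len(A[0])
    d = [mu4[a] - 1.0 if a == b else 1.0 for a, b in pairs(p)]
    B = [[A[i][a] * A[j][a] if a == b else A[i][a] * A[j][b] + A[i][b] * A[j][a]
          for a, b in pairs(p)] for i, j in pairs(q)]
    return [[sum(r[t] * d[t] * c[t] for t in range(len(d))) for c in B] for r in B]


def solve(M, z):
    """Solve M y = z by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [z[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for t in range(c, n + 1):
                aug[r][t] -= f * aug[c][t]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][t] * y[t] for t in range(r + 1, n))) / aug[r][r]
    return y


def quad_form(U, z):
    """z' U^{-1} z computed without forming the inverse explicitly."""
    return sum(a * b for a, b in zip(z, solve(U, z)))


# Hypothetical one-factor model with three indicators: A = (Lambda_x P | D), P = 1.
lam = 0.7
sd = math.sqrt(1.0 - lam ** 2)
A = [[lam, sd, 0.0, 0.0],
     [lam, 0.0, sd, 0.0],
     [lam, 0.0, 0.0, sd]]

# Fixed, made-up residual vector in vecs order (s11, s12, s22, s13, s23, s33).
z = [0.0, 0.1, 0.0, -0.1, 0.05, 0.0]

levels = [2.0, 3.0, 5.0, 10.0, 28.0]
values = [quad_form(u_x(A, [k] * 4), z) for k in levels]
for k, v in zip(levels, values):
    print(f"mu4 = {k:5.1f}   z'U(X)^-1 z = {v:.6f}")
```

Because z is held fixed, any change in the printed values comes from the weight matrix alone, which is the mechanism the propositions describe.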
The drop in $F_0$ in that appendix example shows that the impact from $(\sigma - \sigma(\theta))$ is almost negligible compared to the effect from the weight matrix.

References

Arminger, G. and Schoenberg, R.J. (1989). Pseudo Maximum Likelihood Estimation and a Test for Misspecification in Mean and Covariance Structure Models. Psychometrika, 54, 3, 409-425.

Bentler, P.M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-517.

Bentler, P.M. (1995). EQS Structural Equations Program Manual. Encino, CA: Multivariate Software, Inc.

Bollen, K.A. (1989). Structural Equations with Latent Variables. New York: Wiley.

Browne, M.W. (1974). Generalized Least-Squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1-24.

Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.

Browne, M.W. and Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21, 2, 230-258.

Chou, C.P., Bentler, P. and Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347-357.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press.

Cudeck, R. and Browne, M.W. (1992). Constructing a covariance matrix that yields a specified minimizer and a specified minimum discrepancy function value. Psychometrika, 57, 3, 357-369.

Curran, P.J., West, S.G. and Finch, J.F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.

Finch, J.F., West, S.G. and MacKinnon, D.P. (1997). Effects of Sample Size and Nonnormality on the Estimation of Mediated Effects in Latent Variable Models. Structural Equation Modeling, 2, 87-105.

Fleishman, A.I. (1978). A Method for Simulating Non-Normal Distributions. Psychometrika, 43, 4, 521-532.

Graybill, F.A. (1983). Matrices with Applications in Statistics. 2nd Edition. Belmont, California: Wadsworth International Group.

Hu, L., Bentler, P.M. and Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351-362.

Jöreskog, K.G. and Sörbom, D. (1989). LISREL 7: A Guide to the Program and Applications (2nd edition). New York: SPSS.

Jöreskog, K.G. and Sörbom, D. (1993). LISREL 8: Structural Equation Modeling with the SIMPLIS Command Language. Chicago: Scientific Software International.

Jöreskog, K.G. and Sörbom, D. (2000). LISREL 8.30 and PRELIS 2.30. Scientific Software International, Inc.

Magnus, J.R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: Wiley.

Mattson, S. (1997). How to Generate Non-normal Data for Simulation of Structural Equation Models. Multivariate Behavioral Research, 32, 4, 355-373.

Olsen, S.O., Wilcox, J. and Olsson, U.H. (2002). Consequences of Ambivalence on Satisfaction and Loyalty. Working paper, Texas Tech University, USA.

Olsson, U.H., Troye, S.V., Foss, T. and Howell, R.D. (2000). The Performance of ML, GLS, and WLS Estimation in Structural Equation Modeling under Conditions of Misspecification and Nonnormality. Structural Equation Modeling, 7, 4, 557-595.

Ramberg, J.S., Tadikamalla, P.R., Dudewicz, E.J. and Mykytka, E.F. (1979). A Probability Distribution and Its Use in Fitting Data. Technometrics, 21, 2, 201-215.

Satorra, A. (1990). Robustness issues in structural equation modeling: a review of recent developments. Quality and Quantity, 2, 367-386.

Tadikamalla, P.R. (1980). On Simulating Non-normal Distributions. Psychometrika, 45, 2, 273-278.
Vale, C.D. and Maurelli, V.A. (1983). Simulating Nonnormal Distributions. Psychometrika, 48(3), 465-471.
Yuan, K.H. and Bentler, P.M. (1997). Improving parameter tests in covariance structure analysis. Computational Statistics & Data Analysis, 26, 177-198.

Table 1

Population model: Ma (12 observed variables, x1 to x12)
Factors 1, 2 and 3 are correlated at 0.2. Factors 4, 5, 6 and 7 are orthogonal, no cross-loadings.
$\lambda_{11} = \lambda_{21} = \lambda_{31} = \lambda_{41} = 0.7$
$\lambda_{52} = \lambda_{62} = \lambda_{72} = \lambda_{82} = 0.8$
$\lambda_{93} = \lambda_{10,3} = \lambda_{11,3} = \lambda_{12,3} = 0.6$
$\lambda_{14} = \lambda_{54} = \lambda_{94} = 0.2$
$\lambda_{25} = \lambda_{65} = \lambda_{10,5} = 0.3$
$\lambda_{36} = 0.2$, $\lambda_{46} = 0.4$, $\lambda_{11,6} = 0.1$, $\lambda_{77} = 0.3$, $\lambda_{87} = 0.4$, $\lambda_{12,7} = 0.1$
Theoretical model Ta1. Nature of misspecification: four minor factors missing.
Theoretical model Ta2. Nature of misspecification: four minor factors missing; orthogonal factors.
Theoretical model Ta3. Nature of misspecification: four minor factors missing; orthogonal factors; two missing paths.

Population model: Mb (18 observed variables, x1 to x18)
Non-orthogonal factors. All factors are correlated at 0.5. The number of indicators varies from 2 to 5 per factor:
$\lambda_{11} = \lambda_{21} = 0.7$
$\lambda_{32} = \lambda_{42} = \lambda_{52} = \lambda_{62} = \lambda_{72} = 0.7$
$\lambda_{83} = \lambda_{93} = 0.7$
$\lambda_{10,4} = \lambda_{11,4} = 0.7$
$\lambda_{12,5} = \lambda_{13,5} = \lambda_{14,5} = \lambda_{15,5} = \lambda_{16,5} = 0.7$
$\lambda_{17,6} = \lambda_{18,6} = 0.7$
Theoretical model Tb. Nature of misspecification: three indicators per factor (non-existing paths included and existing paths excluded, i.e., four $\lambda$'s in the wrong column).

Table 1 cont.

Population models: Mc1, Mc2 and Mc3 (18 observed variables, x1 to x18)
Six non-orthogonal factors (correlated at 0.5) and two orthogonal, "subgroup specific" method factors. All loadings are equal to 0.7, except for the six method loadings, which are set to 0.3, 0.5 and 0.7 for Mc1, Mc2 and Mc3 respectively.
The factor structure:
$\lambda_{11} = \lambda_{21} = \lambda_{31} = 0.7$; $\lambda_{42} = \lambda_{52} = \lambda_{62} = 0.7$
$\lambda_{73} = \lambda_{83} = \lambda_{93} = 0.7$; $\lambda_{10,4} = \lambda_{11,4} = \lambda_{12,4} = 0.7$
$\lambda_{13,5} = \lambda_{14,5} = \lambda_{15,5} = 0.7$; $\lambda_{16,6} = \lambda_{17,6} = \lambda_{18,6} = 0.7$
The method loadings ($\lambda_m$):
$\lambda_{37} = \lambda_{57} = \lambda_{77} = \lambda_{12,8} = \lambda_{14,8} = \lambda_{16,8} = 0.3, 0.5, 0.7$
Theoretical model Tc. Nature of misspecification: the two method factors are omitted.

Population models: Md1 and Md2 (18 observed variables, x1 to x18)
Six non-orthogonal factors (correlated at 0.5) and three orthogonal "method" factors. All loadings are equal to 0.7 (see Mc), except for the 18 method loadings, which are set to 0.3 and 0.5 for Md1 and Md2 respectively. All the observed variables load on the method factors (MTMM model):
$\lambda_{17} = \lambda_{47} = \lambda_{77} = \lambda_{10,7} = \lambda_{13,7} = \lambda_{16,7} = 0.3, 0.5$
$\lambda_{28} = \lambda_{58} = \lambda_{88} = \lambda_{11,8} = \lambda_{14,8} = \lambda_{17,8} = 0.3, 0.5$
$\lambda_{39} = \lambda_{69} = \lambda_{99} = \lambda_{12,9} = \lambda_{15,9} = \lambda_{18,9} = 0.3, 0.5$
Theoretical model Td. Nature of misspecification: the three method factors are omitted.

Table 1 cont.

Population models: Me1, Me2, Mf1 and Mf2 (15 observed variables, x1 to x15)
Three non-orthogonal factors (correlated at 0.5) and one orthogonal "method" factor. All loadings are equal to 0.7, except for the "method loadings", which are set to +/-0.3 and +/-0.5.
The structure:
$\lambda_{11} = \lambda_{21} = \lambda_{31} = \lambda_{41} = \lambda_{51} = 0.7$
$\lambda_{62} = \lambda_{72} = \lambda_{82} = \lambda_{92} = \lambda_{10,2} = 0.7$
$\lambda_{11,3} = \lambda_{12,3} = \lambda_{13,3} = \lambda_{14,3} = \lambda_{15,3} = 0.7$
$\lambda_{14} = \lambda_{54} = \lambda_{64} = \lambda_{10,4} = \lambda_{11,4} = \lambda_{15,4} = 0.3$ and $0.7$ for Me1 and Me2 respectively.
$\lambda_{14} = \lambda_{64} = \lambda_{11,4} = 0.3$ and $0.7$, and $\lambda_{54} = \lambda_{10,4} = \lambda_{15,4} = -0.3$ and $-0.7$, for Mf1 and Mf2 respectively.
Theoretical models Te and Tf. Nature of misspecification: the method factor is omitted.

The error terms are not included in the figures.

Table 2: F0 ADF as a function of kurtosis for all models *)

Model\$\alpha_4$   2      3      5      8      10     12     18     22     28     Reduction in F0 **)
Ta1              0.115  0.111  0.103  0.094  0.089  0.084  0.073  0.067  0.060  47.4%
Ta2              0.203  0.169  0.138  0.115  0.105  0.098  0.082  0.073  0.066  67.5%
Ta3              0.746  0.612  0.462  0.345  0.297  0.262  0.196  0.168  0.140  81.3%
Tb               0.571  0.503  0.405  0.315  0.276  0.246  0.187  0.162  0.134  76.5%
Tc1              0.108  0.098  0.082  0.066  0.058  0.052  0.040  0.034  0.028  73.7%
Tc2              0.382  0.380  0.180  0.117  0.095  0.078  0.054  0.044  0.035  90.8%
Tc3              0.629  0.424  0.230  0.136  0.107  0.088  0.058  0.047  0.037  94.2%
Td1              0.954  0.842  0.678  0.529  0.463  0.412  0.312  0.269  0.223  76.6%
Td2              1.599  1.491  1.052  0.739  0.619  0.533  0.378  0.317  0.256  83.9%
Te1              0.090  0.083  0.072  0.059  0.053  0.048  0.037  0.032  0.027  70%
Te2              0.670  0.400  0.220  0.130  0.110  0.087  0.057  0.047  0.036  94.6%
Tf1              0.170  0.150  0.110  0.085  0.073  0.064  0.046  0.039  0.032  81%
Tf2              0.970  0.490  0.250  0.140  0.110  0.091  0.059  0.048  0.037  96%

*) E.g., Tc2 means that the theoretical model Tc is fitted to the true model Mc2.
**) Reduction in F0 (in %) when the kurtosis increases from 2 to 28.

Figure 1: F0 ADF as a function of kurtosis for models Ta1, Ta2, Ta3.
[Line plot: F0 on the vertical axis, $\alpha_4$ from 0 to 30 on the horizontal axis; series Ta1, Ta2, Ta3.]

Figure 2: F0 ADF as a function of kurtosis for models Tc1, Tc2, Tc3.
[Line plot: F0 on the vertical axis, $\alpha_4$ from 0 to 30 on the horizontal axis; series Tc1, Tc2, Tc3.]

Figure 3: F0 ADF as a function of kurtosis for models Td1, Td2.
[Line plot: F0 on the vertical axis, $\alpha_4$ from 0 to 30 on the horizontal axis; series Td1, Td2.]

Appendix A: The difference term

Table A1 shows F0 for model Ta1 under ADF as a function of kurtosis for Ma when the difference term (population minus model-implied covariance elements) is held fixed at the value it takes for $\alpha_4 = 10$. F0 drops from 0.1198 to 0.0619, which is about 48%. One can compare these results with those in Table 2, where the difference term is not fixed. Evidently, the impact of the difference term is almost negligible compared to the effect of the weight matrix.

Table A1

$\alpha_4$   2       3       5       8       10      12      18      22      28
F0         0.1198  0.1134  0.1042  0.0942  0.0889  0.0843  0.0737  0.0683  0.0619

Appendix B

Olsen et al. (2002) tested a nine-item measurement model for the constructs ambivalence, satisfaction and loyalty. The kurtosis of the measured variables ranged from 2.0 to 5.84. The sample size was 1194. The model was estimated with both ML and ADF. The ADF chi-square was 19.76 (df = 24). The model was also estimated using normal scores (univariate kurtosis ranging from 2.4 to 3.0), and the ADF chi-square then became 32.86 (df = 24).
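The percentage reductions quoted in Table 2 and Appendix A, and the comparison in Appendix B, can be checked with a few lines of arithmetic. The following is a minimal sketch for illustration only and is not part of the original analysis: the helper name `pct_reduction` is hypothetical, only a few rows of Table 2 are included, and the conversion from a chi-square statistic to a minimum fit-function value assumes the convention chi-square = (N - 1) F, which may differ between programs.

```python
# Illustrative check of arithmetic reported in Table 2, Table A1 and Appendix B.
# All names here are hypothetical helpers, not part of the original paper.

def pct_reduction(start: float, end: float) -> float:
    """Return the relative drop from `start` to `end`, in percent."""
    return 100.0 * (start - end) / start

# F0 at alpha4 = 2 and alpha4 = 28, copied from Table 2 (three-decimal values).
table2_endpoints = {
    "Ta1": (0.115, 0.060),
    "Ta3": (0.746, 0.140),
    "Tc3": (0.629, 0.037),
    "Tf2": (0.970, 0.037),
}
for model, (f_low, f_high) in table2_endpoints.items():
    print(f"{model}: {pct_reduction(f_low, f_high):.1f}% reduction")

# Table A1: the difference term is held fixed, only the weight matrix varies.
drop_fixed = pct_reduction(0.1198, 0.0619)
print(f"Ta1 with fixed difference term: {drop_fixed:.1f}% reduction")

# Appendix B: ADF chi-squares for raw data and for normal scores, N = 1194.
# ASSUMPTION: chi-square = (N - 1) * F_min; some programs use N instead.
n = 1194
f_raw = 19.76 / (n - 1)            # kurtosis between 2.0 and 5.84
f_normal_scores = 32.86 / (n - 1)  # kurtosis between 2.4 and 3.0
print(f"Implied F (raw data):      {f_raw:.4f}")
print(f"Implied F (normal scores): {f_normal_scores:.4f}")
```

Because Table 2 reports F0 to three decimals, percentages recomputed this way can differ slightly from the printed column (Ta1 is one example), which was presumably computed from unrounded values.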
