Methods of Psychological Research Online 2001, Vol. 6, No. 1
Internet: http://www.mpr-online.de
© 2001 Institute for Science Education (IPN), Kiel

On the generalizability of factors: The influence of changing contexts of variables on different methods of factor extraction

André Beauducel1

Abstract

The influence of changing contexts of variables on results is often mentioned as a main problem of exploratory factor analysis that limits the generalizability of factors. In the present study, the influence of changing contexts of variables on the results of different methods of factor extraction (principal component analysis, principal axis factor analysis, alpha factor analysis, and maximum likelihood factor analysis) was investigated by means of artificial data. In the first simulation study, four-factor solutions with pronounced simple structure were created on the basis of artificial data with both 200 and 1000 cases. These four-factor solutions represented the context of variables in which a factor was identified. In the second simulation study, a context of variables was created which completely dissolved one of the four factors, composed of four marker variables, from the previous study. These data were then analyzed by means of principal component analysis, principal axis factor analysis, alpha factor analysis, and maximum likelihood factor analysis. The factor was less dissolved in principal axis factor analysis, alpha factor analysis, and maximum likelihood factor analysis than in principal component analysis. Moreover, a slight overextraction may also be favorable for the identification of a dissolved factor. On the basis of the results, some recommendations are given for performing factor extraction in a way that maximizes the generalizability of factors.

Keywords: Generalizability, principal component analysis, factor analysis

1 Author's address: André Beauducel, Technische Universität Dresden, Mommsenstr. 13, 01062 Dresden, Germany (beauduce@rcs.urz.tu-dresden.de)
1. Introduction

The problem of the influence of variable selection on the results of exploratory factor analysis has been discussed several times (e.g. Block, 1995; Brocke, 2000; Guilford, 1975; Holz-Ebeling, 1995; Saucier, 1997). Cattell (1988) criticized the use of factor analysis in what he called private universes of variables, i.e. contexts of variables without any marker variables from prior research. In order to avoid what one could call private factors, it is important to bring factors into a context of variables which relates them to existing knowledge. However, whether a factor can be replicated within another context of variables depends on theoretical decisions concerning the selection of variables and, as a methodological question, on the sensitivity of factor analysis to changing contexts of variables. It is generally accepted that the results of factor analysis depend on the variables selected (e.g. Block, 1995), but the question whether different ways of performing factor analysis lead to different sensitivities to changing contexts of variables has rarely been investigated. In order to close this gap, the present study investigated the sensitivity of different methods of factor extraction to changing contexts of variables. Thus, different methods of factor extraction were compared with respect to the generalizability of factors. This topic was probably rarely investigated because of the opinion that different methods of factor extraction, like principal component analysis of the unreduced correlation matrix (PCA) and principal axis factor analysis of the reduced correlation matrix (PAF), are similar and that differences between these methods only occur in cases of overextraction (e.g. Velicer & Jackson, 1990). However, Snook and Gorsuch (1989) and Widaman (1993) demonstrated that PCA and PAF also differ when the correct number of factors is extracted and that PCA leads to biased factor loadings.
Moreover, Widaman (1993) demonstrated that factor loadings based on PAF were more generalizable than loadings based on PCA. The generalizability of factors was also addressed by Kaiser and Caffrey (1965), who developed alpha factor analysis (AF) as a method of factor extraction aimed at maximal generalizability of factors. Kaiser and Caffrey (1965) as well as Kaiser and Derflinger (1990) consider generalization across variables to be most important in factor analysis. If a factor has high generalizability, it would probably be less sensitive to changing contexts of variables, since it might be represented in different sets of variables. However, no direct empirical results are available comparing AF with other methods of factor extraction with regard to the generalizability of factors. Of course, it would never be possible to reach complete independence of factors from the context of variables, since factors just represent the most important relations between a given set of variables. However, one should differentiate between the dependence of factors on variables which is due to theoretical relations and the robustness or generalizability of results as a methodological question. Only the latter is addressed here. However, when the generalizability of factors suffers from methodological problems, which may, for example, be caused by suboptimal methods of factor extraction, the theoretical generalizability of the constructs indicated by the factors would suffer as well. Therefore, it seems important to avoid problems of generalizability that are due to suboptimal methods of data analysis. In the present study, the influence of the context of variables on results in factor analysis was discussed with reference to the research process: Typically, a researcher would establish some factor on the basis of a specific, more or less clearly defined set of marker variables (i.e.
a more or less private universe of variables). Then, other researchers would perhaps replicate parts of it in other contexts of variables. This does not imply that the original context of variables must be preferred over the others. All the different sets of variables may be justified on theoretical grounds. However, it would be interesting to know how robust a factor can be when different sets of variables are used and whether this robustness also depends on the method of factor extraction. This type of generalizability is investigated in the present study. The effect of changing contexts of variables was explored by means of the following procedure: First, a factor was well established in a favorable context of variables. It is assumed that the optimal condition for the identification of a factor would be high correlations between the variables representing a common factor and low correlations between the variables representing different factors. Then, the highly correlating marker variables of the well-established factor were embedded in a less favorable context of variables, and it was explored whether the factor could still be identified. The most critical situation concerning changing contexts of variables would be that a factor which was perfectly defined by marker variables in one context is more or less dissolved in another context of variables. This dissolution could occur in a context of variables with considerable overlap between variables representing different factors. More specifically, the unfavorable condition for the identification of a factor which is marked by highly correlated variables would be high correlations of the marker variables of the factor with the marker variables of other factors. In addition, the marker variables of the other factors should have high intercorrelations when they belong to the same factor and low correlations when they belong to different factors.
Thus, the factor of interest will overlap considerably with at least two factors which do not overlap with each other. This could lead to some dissolution of the factor of interest. The aim of the study was to explore how well different methods of factor extraction can identify, under these unfavorable conditions, a factor which was first clearly identified in a favorable context of variables. As methods of factor extraction, PCA and PAF were considered because of their frequent use in psychometric research. In addition, AF was considered in the present study because it was developed in order to produce factors with optimal generalizability in the sense of Cronbach's alpha. It was therefore interesting to investigate whether AF leads to factors which are less dependent on changing contexts of variables. In contrast to the psychometric inference of AF, maximum likelihood factor analysis (ML) was originally developed as a method for the statistical inference of factors (Lawley & Maxwell, 1963; Jöreskog, 1967). It was investigated whether the statistical inference of ML or the psychometric inference of AF provides better identification of a factor in changing contexts of variables. Therefore, ML was also included in the present analyses. The focus of the present study was on the different methods of factor extraction, so that other aspects of factor analysis could not be investigated here. However, the number of factors to extract should be determined precisely, because otherwise the information on the performance of the extraction procedures might be distorted. There is no simple convention on the number of factors to extract (e.g. Gorsuch, 1983).
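The distinction between PCA of the unreduced correlation matrix and PAF of the reduced correlation matrix can be made concrete with a small numerical sketch. The study itself used SPSS; the following numpy code is only an illustration under my own assumptions (function names and the toy correlation matrix are not from the article): PCA decomposes R with unities on the diagonal, while PAF iterates communality estimates on the diagonal, which removes the upward bias of PCA loadings noted by Snook and Gorsuch (1989) and Widaman (1993).

```python
import numpy as np

def pca_loadings(R, n_factors):
    """Principal components of the unreduced correlation matrix R
    (unities on the diagonal)."""
    vals, vecs = np.linalg.eigh(R)
    top = np.argsort(vals)[::-1][:n_factors]
    return vecs[:, top] * np.sqrt(vals[top])

def paf_loadings(R, n_factors, n_iter=100):
    """Principal axis factoring of the reduced correlation matrix:
    communality estimates replace the unities on the diagonal and are
    iterated, starting from squared multiple correlations (SMC)."""
    Rr = R.copy()
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # SMC start values
    for _ in range(n_iter):
        np.fill_diagonal(Rr, h2)
        vals, vecs = np.linalg.eigh(Rr)
        top = np.argsort(vals)[::-1][:n_factors]
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = np.sum(L ** 2, axis=1)  # updated communalities
    return L

# Toy one-factor matrix: 6 variables, true loadings .70 (r = .49 off-diagonal)
R = np.full((6, 6), 0.49)
np.fill_diagonal(R, 1.0)
print(np.abs(pca_loadings(R, 1))[0, 0])  # inflated above the true .70
print(np.abs(paf_loadings(R, 1))[0, 0])  # close to the true .70
```

With six variables of true loading .70, the PCA loading comes out near .76 while the iterated PAF loading converges to .70, which mirrors the PCA/PAF differences reported in the tables below.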
However, simulation studies in which several methods were compared indicated that parallel analysis (Horn, 1965) is superior to conventional methods like the eigenvalue-greater-one rule (or Kaiser-Guttman rule) and Cattell's (1966) scree-test (Zwick & Velicer, 1986; Hubbard & Allen, 1987; Velicer, Eaton & Fava, 2000). However, some problems with parallel analysis have also been demonstrated (Glorfeld, 1995; Turner, 1998). Moreover, it has been argued that one can have some confidence in the results when the Kaiser-Guttman rule and the scree-test converge (Buley, 1995). Therefore, in the present study the Kaiser-Guttman rule, the scree-test, and parallel analysis will be used to determine the number of factors to extract. Parallel analysis will be based on PCA eigenvalues, since this is the way it is often performed (e.g. Zwick & Velicer, 1986; Velicer, Eaton & Fava, 2000). For ML, a χ²-test for the significance of residuals has been proposed (e.g. Lawley & Maxwell, 1963). Although the χ²-test is interesting, it occasionally seems to overestimate the number of factors to extract (e.g. Harris & Harris, 1971; Schönemann & Wang, 1972). Gorsuch (1983) proposed that the χ²-test may be used as an upper bound on the number of factors. Since ML will be employed in the present study, the corresponding χ²-test will also be performed. Even though some research is still necessary concerning parallel analysis and other criteria for determining the number of factors (Turner, 1998; Zoski & Jurs, 1996), it is assumed that the present combination of criteria will yield sufficient information for the present purposes. As methods of factor rotation, Varimax (Kaiser, 1958) and Oblimin (Jennrich & Sampson, 1966) were used in the present context.
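Parallel analysis as used here compares observed PCA eigenvalues with the mean eigenvalues of uncorrelated random data of the same dimensions. A minimal sketch, not the study's SPSS implementation (function name, toy data, and the number of random data sets are my own assumptions):

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: retain components as long as the observed
    PCA eigenvalue exceeds the mean eigenvalue of uncorrelated random
    normal data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    noise = np.zeros(p)
    for _ in range(n_sims):
        sim = rng.standard_normal((n, p))
        noise += np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    noise /= n_sims
    k = 0
    for o, m in zip(obs, noise):  # count leading eigenvalues above noise
        if o <= m:
            break
        k += 1
    return k

# Toy data: 6 variables built from 2 common parts plus specific noise
rng = np.random.default_rng(1)
n = 1000
c = rng.standard_normal((n, 2))
X = np.hstack([c[:, [0]] + 0.8 * rng.standard_normal((n, 3)),
               c[:, [1]] + 0.8 * rng.standard_normal((n, 3))])
print(parallel_analysis(X))  # 2: both common factors recovered
```

Counting stops at the first eigenvalue that fails to exceed its noise counterpart, which matches the sequential comparison with the noise eigenvalues of Table 1 performed in the Results sections below.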
Since the focus of the present study was on the sensitivity of different methods of factor extraction, only these two methods of factor rotation were considered here, because of their frequent use in research. Of course, the influence of changing contexts of variables on factor rotation would need more detailed consideration in further research.

2. Simulation Study 1: Data set with pronounced simple structure

The aim of simulation study 1 was to demonstrate one factor within a clear simple structure of other factors. This factor would serve as a basis for further analyses.

2.1. Method

Here and in the following, variables were created and analyzed with SPSS for Windows 9.0 (1999) software through aggregation of z-standardized, normally distributed random variables. Different correlations between variables can be produced by different weights for the common variables (e.g. Knuth, 1981; Schweizer, Boller & Braun, 1996). The aggregates were z-standardized again. First, data sets with a clear oblique four-factor simple structure were created on the basis of highly correlated variables. Every factor had four marker variables, so that 16 variables were created for every data set. Since the aim of the first study was only to demonstrate one factor within a clear simple structure of other factors, it would also have been possible to create data sets with a number of factors other than four. The four marker variables forming a common factor shared one common variable c1 to c4. One variable s1 to s16 formed the specific part of each variable. Here and in the following, the specific variables were not systematically related. Thus, only random correlations between the specific variables occurred. Obliqueness was achieved through aggregation of a third random variable c5 which was common to all 16 variables.
Since the aggregated variables were z-standardized, the weighting functions for the highly correlating variables H1 to H4 (H stands for high correlations) can be written as:

Hi = (1/√3)·c1 + (1/√3)·si + (1/√3)·c5,  for i = 1 to 4    (1)

The expected value of the correlation between two of the variables H1 to H4 due to their common part c1 was (1/√3)² = .33. However, the correlation between these variables was enhanced because of the oblique part c5. The expected value for the complete intercorrelation between two variables forming a common factor was therefore .33 + .33 = .66. H5 to H8 had the common variable c2, H9 to H12 had the common variable c3, and H13 to H16 had the common variable c4. Due to the variable c5, which is common to H1 to H16, the correlation between variables which do not form a common factor was approximately .33. Thus, the aggregation process resulted in a correlation matrix in which the variables forming a common factor were correlated about .66 and the variables representing different factors were correlated about .33. In order to produce stable estimates for the eigenvalues and loadings, 100 solutions were created on the basis of 200 cases and 100 solutions on the basis of 1000 cases. However, a correlation between variables as high as .66 is not often reached in the area of psychology. In order to produce additional solutions on the basis of moderate correlations, which are more typical in the field of psychology, the variables M1 to M16 (M for moderate correlations) were created. The variables M1 to M4 were created like the variables H1 to H4 through the aggregation of common and specific random variables. The only difference was that the weight for the specific term was larger (see Equation 2).

Mi = (1/√6)·c1 + √(2/3)·si + (1/√6)·c5,  for i = 1 to 4    (2)

The variables M5 to M8 were composed of the common variables c2 and c5 and the specific variables s5 to s8. M9 to M12 were composed of the common variables c3 and c5 and the specific variables s9 to s12.
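The aggregation logic of Equation 1 can be reproduced in a few lines. This is a hedged sketch, not the original SPSS procedure (the sample size and variable names are my own choices); it checks that the weights do yield intercorrelations of about .66 within a factor and .33 between factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000          # large n, so empirical correlations sit near expectation
w3 = 1 / np.sqrt(3)  # the common weight of Equation 1
c = rng.standard_normal((n, 5))    # common parts c1..c4 and oblique part c5
s = rng.standard_normal((n, 16))   # specific parts s1..s16

# Equation 1: H_i = (1/sqrt(3))*c_k + (1/sqrt(3))*s_i + (1/sqrt(3))*c5,
# where c_k is the common part of the block of four that H_i belongs to
H = np.column_stack([w3 * c[:, i // 4] + w3 * s[:, i] + w3 * c[:, 4]
                     for i in range(16)])
R = np.corrcoef(H, rowvar=False)
print(round(R[0, 1], 2))  # H1-H2, same factor: about .33 + .33 = .66
print(round(R[0, 4], 2))  # H1-H5, different factors: about .33
```

Replacing the weights with those of Equation 2 would give the moderate-correlation variables M1 to M16 in the same way.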
M13 to M16 were composed of c4 and c5 and s13 to s16. The expected value for the intercorrelation between the marker variables of a factor was .33. Due to the variable c5, which is common to the variables M1 to M16, the intercorrelation between variables which compose different factors was about .17. As in the previous data sets, four factors with four marker variables each were expected. Since the proportion of noise or specific variance was larger than in the previous data set, the estimates for loadings and eigenvalues were based on 500 runs for 200 as well as for 1000 cases.

2.2. Results

In order to evaluate the number of factors to extract, the mean eigenvalues for both highly and moderately correlated variables are given in Table 1.

Table 1: Mean PCA, PAF, and PCA noise eigenvalues for data with four 4-variable factors

                       N = 200                               N = 1000
         high corr.c      moderate corr.d   PCA      high corr.c      moderate corr.d   PCA
Factor   PCA     PAFb     PCA     PAFb      noisea   PCA     PAFb     PCA     PAFb      noisea
   1     6.99    6.76     4.05    3.61      1.52     7.02    6.74     4.01    3.43      1.23
   2     1.93    1.70     1.57    1.13      1.40     1.76    1.48     1.43     .86      1.17
   3     1.69    1.46     1.39     .95      1.32     1.66    1.38     1.34     .77      1.14
   4     1.47    1.24     1.23     .78      1.24     1.57    1.29     1.27     .69      1.11
   5      .48     .24      .93     .48      1.18      .40     .11      .79     .22      1.08
   6      .44     .20      .86     .40      1.12      .38     .09      .76     .18      1.06
   7      .40     .17      .80     .34      1.06      .37     .08      .73     .16      1.03
   8      .38     .14      .75     .28      1.01      .36     .07      .71     .14      1.01
   9      .35     .12      .70     .23       .95      .35     .06      .69     .11       .98
  10      .33     .09      .65     .18       .90      .34     .05      .67     .09       .96

Note. In order to save space, only the first 10 of the 15/16 eigenvalues are presented in the Table. a For parallel analysis, the PCA noise eigenvalues can only be compared with the PCA eigenvalues. b The eigenvalues of AF factors were omitted here, since they were similar to the PAF eigenvalues. c The mean eigenvalues for high correlations were based on 100 runs. d The mean eigenvalues for moderate correlations were based on 500 runs.
According to the Kaiser-Guttman rule, the scree-test, and parallel analysis (based on PCA eigenvalues), four factors should be extracted in the data sets based on highly correlated variables. Moreover, for these data sets the χ²-test indicated the following: The mean of the p values was .51 (SD = .29, with three out of 100 solutions with p ≤ .05) for the solutions based on 200 cases and .47 (SD = .31, with ten out of 100 solutions with p ≤ .05) for the solutions based on 1000 cases. Since the χ²-test for the number of factors tends toward overextraction (e.g. Harris & Harris, 1971), it was not surprising that it was significant in ten of the 100 solutions, indicating that more than four factors should be extracted in these solutions. However, the remaining criteria converged in that four factors should be extracted. In the data sets with moderately correlated variables, the Kaiser-Guttman rule and the scree-test indicated that four factors should be extracted both for data sets based on 200 and on 1000 cases. Parallel analysis indicated that three factors should be extracted in the data sets based on 200 cases and that four factors should be extracted in the data sets based on 1000 cases. The mean of the p values in the χ²-test was .58 (SD = .27, with 14 out of 500 solutions with p ≤ .05) for the solutions based on 200 cases and .52 (SD = .30, with 26 out of 500 solutions with p ≤ .05) for the solutions based on 1000 cases. Thus, the convergence of the criteria for the number of factors to extract was less pronounced in the solutions based on moderate correlations. There was convergence on four-factor solutions in the data sets based on high correlations and moderate convergence on four-factor solutions for the data sets based on moderate correlations. Therefore, the means and standard deviations of the loadings for the four-factor PCA and PAF solutions based on 200 cases are presented in Table 2.
Since the magnitude of salient and non-salient loadings was nearly the same for every group of marker variables, only the loadings of the first variable of every group are presented in Table 2.

Table 2: Mean loadings for Varimax- and Oblimin-solutions with 16 variables based on 200 cases

High intercorrelations, 100 runs

Varimax    PCA-factors                                 PAF-factors
           Factor 1  Factor 2  Factor 3  Factor 4     Factor 1  Factor 2  Factor 3  Factor 4
H1         .81(.03)  .16(.05)  .17(.05)  .18(.05)     .76(.04)  .17(.05)  .18(.05)  .19(.05)
H5         .17(.05)  .82(.02)  .17(.05)  .16(.04)     .18(.05)  .76(.03)  .18(.05)  .17(.04)
H9         .17(.05)  .17(.05)  .82(.03)  .17(.05)     .17(.05)  .18(.05)  .76(.03)  .18(.04)
H13        .17(.05)  .17(.05)  .17(.05)  .81(.03)     .18(.05)  .18(.05)  .18(.05)  .76(.04)
Oblimin*
H1         .85(.04)  .00(.06)  .01(.06)  .01(.05)     .80(.05)  .00(.05)  .01(.06)  .01(.05)
H5         .01(.06)  .86(.04)  .00(.05)  .00(.05)     .01(.06)  .80(.05)  .00(.05)  .00(.05)
H9         .00(.06)  .00(.05)  .85(.04)  .01(.05)     .00(.05)  .01(.05)  .80(.05)  .01(.05)
H13        .00(.06)  .01(.05)  .01(.06)  .85(.04)     .00(.05)  .01(.05)  .00(.05)  .80(.05)

Moderate intercorrelations, 500 runs

Varimax    PCA-factors                                 PAF-factors
           Factor 1  Factor 2  Factor 3  Factor 4     Factor 1  Factor 2  Factor 3  Factor 4
M1         .67(.12)  .11(.10)  .11(.10)  .12(.10)     .53(.12)  .12(.08)  .13(.08)  .13(.08)
M5         .11(.10)  .66(.12)  .11(.10)  .11(.10)     .13(.08)  .53(.13)  .13(.08)  .13(.08)
M9         .11(.10)  .11(.10)  .67(.12)  .11(.09)     .13(.08)  .12(.08)  .53(.12)  .13(.08)
M13        .11(.10)  .11(.10)  .11(.10)  .67(.13)     .13(.08)  .12(.08)  .13(.08)  .54(.13)
Oblimin*
M1         .67(.15)  .03(.11)  .03(.11)  .04(.11)     .53(.15)  .04(.08)  .04(.09)  .03(.09)
M5         .04(.11)  .66(.15)  .03(.11)  .03(.11)     .02(.08)  .55(.18)  .04(.09)  .04(.09)
M9         .03(.11)  .03(.10)  .67(.14)  .03(.10)     .04(.09)  .03(.09)  .54(.15)  .04(.09)
M13        .03(.11)  .03(.10)  .03(.11)  .67(.15)     .03(.09)  .03(.08)  .03(.08)  .54(.15)

Notes. Only the loadings of the first variable of every block of four variables are presented. The largest difference between mean loadings within groups of items representing the same factor was .03. Standard deviations are given in brackets. Loadings ≥ .30 were given in bold face. * Oblimin was performed with δ = 0; the factor pattern is presented here. The Fisher's Z transformed, averaged, and retransformed mean intercorrelation between Oblimin-rotated factors was .41 for the PCA and .46 for the PAF solutions. The standard deviations of these correlations were .05 for both the PCA and the PAF solutions.

The simple structure of the Varimax and Oblimin solutions was very pronounced both in the solutions based on high and in those based on moderate correlations. This indicates that four-factor solutions were indeed created by the aggregation procedure described above. Of course, the salient loadings were considerably lower in the solutions based on moderate correlations (especially for the PAF factors). Since there was some obliqueness in the data, there were some secondary loadings in the Varimax solutions which disappeared in the factor pattern of the Oblimin solutions. For the highly correlated variables, the mean loadings of the PCA and PAF four-factor solutions based on 1000 cases were the same as for the 100 solutions based on 200 cases. Only the standard deviations of the loadings were smaller in the solutions based on 1000 cases (maximum SD = .07). The mean main and secondary loadings of the AF and ML solutions were so close to those of the PAF solutions that they need not be presented here. For the AF and ML solutions based on high correlations and 200 cases, the mean of the main loadings was .76 (SD = .03) for Varimax and .80 (SD = .05) for Oblimin.
The mean of the non-salient Varimax loadings of the AF and ML solutions was .18 (SD = .05), and the mean of the Oblimin loadings was zero (SD = .05). The loadings of the AF and ML solutions based on high correlations and 1000 cases had the same means and smaller standard deviations (maximum SD = .02). For the solutions based on moderate correlations, the mean of the main loadings of both the AF and ML solutions based on 200 cases was .53 (SD = .18) for Varimax and .54 (SD = .15) for Oblimin. The mean of the non-salient loadings of the AF and ML solutions was .13 (SD = .08) for the Varimax and .03 (SD = .09) for the Oblimin solutions. The loadings of the AF and ML solutions based on 1000 cases had slightly larger means for salient loadings and smaller standard deviations (maximum SD = .07). Overall, for the solutions based on moderate correlations the simple structure was a bit more pronounced when based on 1000 cases than when based on 200 cases, as presented in Table 2. The main result of this simulation was that four factors with a pronounced simple structure could be created by the aggregation of random variables based on the weights in Equation 1 for the variables with high intercorrelations and in Equation 2 for the variables with moderate intercorrelations. This result corresponds to the establishment of a factor in one specific context of variables. In the next step, it was investigated how a factor which was established in this favorable context could be replicated in another, unfavorable context, and to what extent the identification of the factor depends on the method of factor extraction.

3. Simulation Study 2: Data set with reduced simple structure

The first factor which was demonstrated in simulation study 1 was embedded within another context of variables in simulation study 2.

3.1. Method

First, the aggregation procedure for the variables with high intercorrelations is described.
Four marker variables were created by means of the same weights and variables as in Equation 1. Thus, the variables H1 to H4 were created in the same way as the marker variables in the previous data sets (see Equation 1). It has already been demonstrated that H1 to H4 had enough common variance (an intercorrelation of about .66) to form a factor with marked main loadings (see Table 2), even in an oblique context. The first set of four new context variables with high intercorrelations, NH5 to NH8 (N for new, H for high correlations), was created by aggregation of one variable c2 forming the common part of these variables, one variable s5 to s8 forming the specific part, and the variable c1, which also corresponds to a common part of the variables H1 to H4. The weights for the variables NH5 to NH8 were:

NHi = (1/√6)·c2 + (1/√6)·si + √(2/3)·c1,  for i = 5 to 8    (3)

The intercorrelations between these variables were large; their expected value was 1/6 + 2/3 = .83. The expected value of the correlations of the variables NH5 to NH8 with the variables H1 to H4 was (1/√3)·√(2/3) = .47. The next set of four new context variables, NH9 to NH12, was similar to the variables NH5 to NH8. The only differences were that the common variable c3 was used, that the variables s9 to s12 were used for the specific part, and that, instead of the common variable c1, the variable c5 was used (see Equation 4).

NHi = (1/√6)·c3 + (1/√6)·si + √(2/3)·c5,  for i = 9 to 12    (4)

Like c1, the variable c5 is common to the variables H1 to H4 (see Equation 1). The intercorrelations of NH9 to NH12 were approximately the same as the intercorrelations of NH5 to NH8. The correlations of NH9 to NH12 with H1 to H4 were approximately the same as for the variables NH5 to NH8, i.e. about .47. The correlations of the variables NH9 to NH12 with the variables NH5 to NH8 were about zero, since they shared no common variables in the aggregation process.
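Because all building blocks are uncorrelated with unit variance and the aggregates are standardized, the expected correlation between any two aggregates is simply the dot product of their weight vectors over the shared components. The following check of the quoted values is my own sketch (the vector encoding is not from the article); one representative variable per block is encoded over the basis [c1, c2, c3, c4, c5, s1, s5, s9, s13]:

```python
import numpy as np

# weights from Equations 1, 3, 4, and 5, one variable per block
r3, r6, r7, r23 = 1/np.sqrt(3), 1/np.sqrt(6), 1/np.sqrt(7), np.sqrt(2/3)

H1   = np.array([r3,  0,  0,  0,  r3, r3,  0,  0,    0])  # Equation 1
NH5  = np.array([r23, r6, 0,  0,   0,  0, r6,  0,    0])  # Equation 3
NH9  = np.array([0,   0, r6,  0, r23,  0,  0, r6,    0])  # Equation 4
NH13 = np.array([r7,  0,  0, r7,  r7,  0,  0,  0, 2*r7])  # Equation 5

# expected correlation = dot product of the weight vectors
print(round(H1 @ NH5, 2))    # .47, via the shared c1
print(round(H1 @ NH13, 2))   # .44, via the shared c1 and c5
print(round(NH5 @ NH9, 2))   # 0.0: no shared component
print(round(NH5 @ NH13, 2))  # .31, via the shared c1
```

These dot products reproduce the expected correlations stated in the text (.47, .44, zero, and .31), including those for the third set of context variables NH13 to NH16 introduced below.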
Thus, two sets of uncorrelated new context variables were created, both highly correlated with H1 to H4. The third set of new context variables, NH13 to NH16, was created by aggregation of one common variable c4, one specific variable s13 to s16, and the common variables c1 and c5. The weights for these variables were:

NHi = (1/√7)·c4 + (2/√7)·si + (1/√7)·c1 + (1/√7)·c5,  for i = 13 to 16    (5)

The expected value of the intercorrelations of the variables NH13 to NH16 was 3 · 1/7 = .43. The correlations of NH13 to NH16 with H1 to H4 were about .44. The correlations of NH13 to NH16 with NH5 to NH12 were about .31. All new context variables NH5 to NH16 had considerable correlations with H1 to H4, so that the factor marked by H1 to H4 could be dissolved within this context. The retransformed mean of the Fisher's Z transformed correlations between variables of the same group was .73. Correlations of this magnitude are rarely reached in psychology. Therefore, as in the first simulation study, additional data sets with moderate correlations were created. The variables with moderate correlations were created like the variables with high correlations, but in every set of aggregates the weight of the specific variable was larger than in the corresponding aggregates with high intercorrelations. Thus, the variables M1 to M4 were created as in Equation 2. The first set of four new context variables with moderate intercorrelations, NM5 to NM8 (N for new, M for moderate correlations), was created by aggregation of one variable c2 forming the common part of these variables, one variable s5 to s8 forming the specific part, and the variable c1, which corresponds to a common part of the variables M1 to M4. The weights for the variables NM5 to NM8 were:

NMi = (1/3)·c2 + (2/3)·si + (2/3)·c1,  for i = 5 to 8    (6)

The intercorrelations between NM5 to NM8 were larger than .33, the expected value of the intercorrelations of M1 to M4; their expected value was 1/9 + 4/9 = .56.
The expected value of the correlations of the variables NM5 to NM8 with the variables M1 to M4 was (1/√6)·√(4/9) = .27. The next set of four new context variables, NM9 to NM12, was similar to the variables NM5 to NM8. The only differences were that the common variable c3 was used, that the variables s9 to s12 were used for the specific part, and that, instead of the common variable c1, the variable c5 was used (see Equation 7).

NMi = (1/3)·c3 + (2/3)·si + (2/3)·c5,  for i = 9 to 12    (7)

Like c1, the variable c5 was common to the variables M1 to M4 (see Equation 2). The intercorrelations of NM9 to NM12 were approximately the same as the intercorrelations of NM5 to NM8. The third set of new context variables, NM13 to NM16, was created by aggregation of one common variable c4, one specific variable s13 to s16, and the common variables c1 and c5. The weights for these variables were:

NMi = (1/√19)·c4 + (4/√19)·si + (1/√19)·c1 + (1/√19)·c5,  for i = 13 to 16    (8)

The expected value of the intercorrelations of the variables NM13 to NM16 was 3 · 1/19 = .16. The correlations of NM13 to NM16 with M1 to M4 were about .19. The correlations of NM13 to NM16 with NM5 to NM12 were about .15. All new context variables NM5 to NM16 had low to moderate correlations with M1 to M4, so that the factor marked by M1 to M4 could be dissolved within this context. The retransformed mean of the Fisher's Z transformed correlations between variables of the same group was .44. Correlations of this magnitude are regularly reached in psychology.

3.2. Results

The mean PCA and PAF eigenvalues for the analyses with both highly and moderately correlated variables are presented in Table 3. In order to perform parallel analysis, the mean eigenvalues in Table 3 were compared with the mean eigenvalues of the noise factors in Table 1.
For the solutions based on high correlations and 200 cases, the mean of the second eigenvalue (3.55, second column in Table 3) was larger than the corresponding mean of the noise eigenvalues in Table 1 (1.93, second column in Table 1). The mean of the third eigenvalue (1.09, second column in Table 3) was smaller than the corresponding mean of the noise eigenvalues (1.69, second column in Table 1). Thus, parallel analysis indicated that two factors should be extracted. In the same way, parallel analysis indicated that two factors should be extracted in the solutions based on 1000 cases and high correlations as well as in the solutions based on 200 and 1000 cases with moderately correlating variables.

Table 3: Mean PCA and PAF eigenvalues for the solutions with reduced simple structure

                   N = 200                          N = 1000
         high corr.b      moderate corr.c    high corr.b      moderate corr.c
Factor   PCA     PAFa     PCA     PAFa       PCA     PAFa     PCA     PAFa
   1     7.14    6.93     4.39    4.01       7.14    6.88     4.36    3.88
   2     3.55    3.45     2.71    2.42       3.53    3.39     2.68    2.31
   3     1.09     .75     1.15     .64       1.07     .66     1.05     .40
   4      .68     .37      .99     .49        .62     .30      .90     .25
   5      .61     .24      .91     .41        .58     .11      .86     .19
   6      .54     .18      .83     .34        .55     .08      .81     .16
   7      .48     .15      .75     .28        .52     .07      .75     .13
   8      .37     .11      .69     .24        .35     .06      .69     .11
   9      .32     .09      .63     .19        .33     .05      .66     .10
  10      .28     .07      .57     .16        .31     .04      .62     .08

Note. In order to save space, only the first 10 of the 15/16 eigenvalues are presented in the Table. For noise eigenvalues see Table 1. a The eigenvalues of AF factors were omitted here, since they were similar to the PAF eigenvalues. b The mean eigenvalues for high correlations were based on 100 runs. c The mean eigenvalues for moderate correlations were based on 500 runs.

According to the Kaiser-Guttman rule, three factors should be extracted in the solutions based on high and on moderate correlations. The scree-test indicated in these solutions that two or three factors should be extracted.
For all solutions based on high correlations the χ²-test indicated that more than two factors should be extracted (df = 89; p < .001 for every solution). Thus, the two-factor solutions which were indicated by parallel analysis were within the upper bound of the χ²-test. For the solutions based on moderate correlations the χ²-test was significant in 24 percent of the solutions based on 200 cases and in 97 percent of the solutions based on 1000 cases (df = 89; p < .05). This indicates that for some of the solutions based on moderate correlations the upper bound for the number of factors to extract was already reached with two factors. Since parallel analysis indicated that two factors should be extracted, the loadings of the two-factor solutions based on 200 cases were presented first in Table 4.

Table 4: Means and standard deviations of Varimax- and Oblimin-loadings for two-factor PCA-, PAF-, AF-, and ML-solutions (200 cases).

high intercorrelations, 100 runs

           PCA                     PAF                     AF                      ML
Varimax    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2
H1         .58 (.07)   .57 (.08)   .56 (.07)   .56 (.07)   .56 (.07)   .55 (.08)   .55 (.07)   .54 (.07)
NH1        .91 (.01)  -.03 (.07)   .89 (.02)  -.02 (.06)   .88 (.02)  -.03 (.07)   .91 (.01)  -.01 (.06)
NH5       -.03 (.07)   .90 (.01)  -.02 (.07)   .89 (.02)  -.03 (.07)   .87 (.02)   .00 (.06)   .91 (.02)
NH9        .42 (.08)   .43 (.08)   .39 (.07)   .40 (.07)   .41 (.08)   .42 (.08)   .37 (.07)   .37 (.07)

Oblimin*   Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2
H1         .51 (.10)   .50 (.11)   .49 (.11)   .49 (.11)   .48 (.12)   .49 (.11)   .48 (.09)   .46 (.11)
NH1        .93 (.02)   .06 (.18)   .91 (.04)   .05 (.19)   .89 (.04)   .07 (.19)   .93 (.04)   .08 (.18)
NH5        .06 (.18)   .93 (.02)   .05 (.18)   .91 (.03)   .07 (.19)   .89 (.05)   .07 (.17)   .93 (.04)
NH9        .39 (.11)   .36 (.11)   .36 (.11)   .33 (.10)   .37 (.12)   .35 (.11)   .33 (.10)   .29 (.11)

Mean inter-factor correlations: .21 (.15)   .20 (.19)   .23 (.13)   .17 (.23)

moderate intercorrelations, 500 runs

           PCA                     PAF                     AF                      ML
Varimax    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2
M1         .43 (.10)   .43 (.10)   .39 (.09)   .39 (.09)   .39 (.10)   .40 (.10)   .38 (.09)   .38 (.09)
NM1        .78 (.08)  -.03 (.07)   .73 (.09)  -.01 (.07)   .71 (.11)  -.02 (.08)   .74 (.09)   .00 (.07)
NM5       -.03 (.07)   .78 (.08)  -.01 (.07)   .73 (.09)  -.02 (.07)   .71 (.10)   .00 (.07)   .74 (.10)
NM9        .27 (.10)   .26 (.10)   .23 (.08)   .22 (.09)   .25 (.09)   .24 (.09)   .22 (.08)   .22 (.09)

Oblimin*   Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2    Factor 1    Factor 2
M1         .38 (.12)   .38 (.13)   .34 (.11)   .35 (.11)   .34 (.11)   .35 (.12)   .34 (.10)   .33 (.10)
NM1        .80 (.10)  -.13 (.11)   .76 (.11)  -.11 (.10)   .73 (.12)  -.12 (.11)   .76 (.11)  -.11 (.09)
NM5       -.13 (.10)   .80 (.10)  -.12 (.09)   .75 (.12)  -.12 (.10)   .73 (.12)  -.11 (.09)   .76 (.11)
NM9        .25 (.12)   .23 (.12)   .20 (.09)   .20 (.09)   .22 (.11)   .21 (.10)   .20 (.09)   .19 (.09)

Mean inter-factor correlations: .25 (.05)   .29 (.05)   .27 (.06)   .29 (.05)

Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .03. The standard deviations were given in parentheses. Loadings ≥ .30 were given in bold face. * Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.

The only important difference for the solutions based on 1000 cases was that the standard deviations of the loadings were smaller (the maximal standard deviation of the loadings based on 1000 cases was .04 for the Varimax-solutions and .13 for the Oblimin-solutions).
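The averaging of inter-factor correlations reported in the table notes (Fisher's Z transform, average, retransform) amounts to the following small helper; the function name is illustrative. Applied to the four mean inter-factor correlations of the high-correlation block above, it gives a value near .20.

```python
import numpy as np

def mean_correlation(rs):
    """Average correlations via Fisher's Z: transform, average, retransform."""
    return float(np.tanh(np.mean(np.arctanh(rs))))

# Mean Oblimin inter-factor correlations (high correlations, PCA/PAF/AF/ML)
print(round(mean_correlation([0.21, 0.20, 0.23, 0.17]), 2))
```

The Z transform stabilizes the sampling distribution of correlations, so averaging on the Z scale is less biased than averaging the raw coefficients directly.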
Since the differences between the mean loadings of every block of four variables defined by the same equation were very small (maximum: .03), only the mean loadings for the first of the four variables defined by the same equation were presented in Table 4. It can be seen in Table 4 that for all methods of factor extraction the Varimax- and Oblimin-solutions yielded a factor which was primarily marked by the variables H1 to H4 in the solutions based on high correlations and by M1 to M4 in the solutions based on moderate correlations. The second factor was primarily marked by the variables NH5 to NH8 in the solutions based on high correlations and by NM5 to NM8 in the solutions based on moderate correlations. The factor which had such clear main loadings in Table 2 was completely dissolved in these two factors, i.e., the marker variables of this factor (H1 to H4 and M1 to M4) loaded equally high on both factors in the PCA-, PAF-, AF-, and ML-solutions. Moreover, orthogonal simple structure in these solutions was low and could not be substantially improved by Oblimin-rotation. The factor which could be demonstrated clearly within the context of the first simulation study could not be demonstrated in the present context. Because the scree-test and the Kaiser-Guttman rule indicated that three factors should be extracted, the three-factor solutions were presented in Table 5. For the three-factor solutions based on highly correlated variables and 200 cases the χ²-value was significant in 50 percent of the solutions (df = 75; p < .05). For the solutions based on highly correlated variables and 1000 cases the χ²-value was significant in all solutions (df = 75; p < .05). This demonstrated the sensitivity of the χ²-test to sample size and that, for the solutions based on 1000 cases, at least four factors would constitute the upper bound for the number of factors.
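The extraction methods compared here differ mainly in how the diagonal of the correlation matrix is treated. As a minimal sketch (not the SPSS implementation used in the study), principal axis factoring can be written as an eigendecomposition of the correlation matrix with iteratively re-estimated communalities in the diagonal:

```python
import numpy as np

def paf(R, k, n_iter=100):
    """Principal axis factoring: like PCA, but with communalities instead of
    unities in the diagonal, re-estimated until they stabilize."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))  # start: squared multiple correlations
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                    # reduced correlation matrix
        eigval, eigvec = np.linalg.eigh(Rr)
        idx = np.argsort(eigval)[::-1][:k]          # k largest eigenvalues
        loadings = eigvec[:, idx] * np.sqrt(np.clip(eigval[idx], 0, None))
        h2 = np.sum(loadings ** 2, axis=1)          # updated communalities
    return loadings

# Toy example: two blocks of four variables, r = .6 within and .2 between blocks.
R = np.full((8, 8), 0.2)
R[:4, :4] = R[4:, 4:] = 0.6
np.fill_diagonal(R, 1.0)
L = paf(R, 2)
# With two factors this structure is reproduced exactly off the diagonal.
```

Keeping unities in the diagonal instead (PCA) makes the same decomposition try to reproduce each variable's unique variance as well, which is the source of the differences discussed in the following tables.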
For the three-factor solutions based on moderately correlated variables and 200 cases the χ²-value was significant in 3 percent of the solutions (df = 75; p < .05). For the three-factor solutions based on moderately correlated variables and 1000 cases the χ²-value was significant in 22 percent of the solutions (df = 75; p < .05). This again demonstrated the sensitivity of the χ²-test to sample size. The means and standard deviations of the Varimax-loadings for the three-factor PCA-, PAF-, AF-, and ML-solutions based on 200 cases were presented in Table 5. The solutions based on 1000 cases had nearly the same mean loadings. The only important difference for the solutions based on 1000 cases was that the standard deviations of the loadings were smaller (for Varimax: maximum SD = .09; for Oblimin: maximum SD = .16). Therefore the solutions based on 1000 cases were not presented here.

Table 5: Means and standard deviations of Varimax- and Oblimin-loadings for three-factor PCA-, PAF-, AF-, and ML-solutions (N = 200; high intercorrelations, 100 runs; moderate intercorrelations, 500 runs). [The body of this table was garbled in extraction and is not reproduced here.] Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .03. The standard deviations were given in parentheses and loadings ≥ .30 in bold face. Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.

An effect of the method of factor extraction was observed for the variables H1 to H4 as well as for M1 to M4. These variables were factorially complex in all solutions, but in PCA they had their highest loadings on the first two factors, whereas in PAF and AF they had very similar loadings on all three factors. In ML these variables had their highest loading on the third factor. Thus, with respect to the loadings of H1 to H4 and M1 to M4, the ML-solutions were in a sense the reverse of the PCA-solutions. The factor composed of the variables H1 to H4 and M1 to M4 was less dissolved in the ML-solutions. However, in the ML-solutions based on moderate correlations, large standard deviations occurred for the salient loadings of the third factor (see Table 5). In contrast, the standard deviations of the salient loadings on the third factor were only .08 in the ML-solutions based on 1000 cases. This indicates that a large number of cases is needed for ML. The same pattern was observed in the Oblimin-solutions (see Table 5). However, the lowest loadings of the variables H1 to H4 and M1 to M4 were slightly reduced. For the PCA- and AF-solutions the loadings of these variables were reduced especially on the third factor.
For the ML-solutions based on high correlations, the loadings of the variables H1 to H4 on the first and second factor were reduced. Therefore, the differences between the ML-, PCA-, and AF-solutions were more pronounced in the Oblimin- than in the Varimax-solutions. However, the effect was less pronounced in the solutions based on moderate correlations. The standard deviations of the loadings were larger in the Oblimin-solutions than in the corresponding Varimax-solutions (see Table 5), especially in the ML-solutions based on moderate correlations. However, the standard deviations of the loadings were again reduced in the solutions based on 1000 cases (for Oblimin: maximum SD = .16), while the means of the loadings remained nearly unchanged. Therefore, the Oblimin-solutions based on 1000 cases were not presented here. When four factors were extracted, which should be considered a slight overextraction, the fourth factor was marked only by the variables H1 to H4 for the solutions based on high correlations and by M1 to M4 for the solutions based on moderate correlations in the PAF-, AF-, and ML-solutions (see Table 6). In the PCA-solutions, only one or two variables loaded on the fourth factor, so that this factor could not be interpreted within the PCA-solutions (both for 200 and 1000 cases). The fact that the fourth factor was loaded by only one or two variables in PCA led to large standard deviations of the loadings, especially on the fourth factor, since the loadings were near zero in some solutions and near .90 in others. In order to demonstrate that the large standard deviations of the loadings in the PCA-solutions were not due to the small sample size (as they were in the three-factor ML-solutions) but to the oscillations in main loadings, the four-factor solutions were presented for the sample of 1000 cases. The Varimax-solutions were presented first (see Table 6).
Table 6: Mean loadings for Varimax-solutions with four 4-variable factors with reduced simple structure (1000 cases; high intercorrelations, 100 runs; moderate intercorrelations, 500 runs). [The body of this table was garbled in extraction and is not reproduced here.] Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .04. The standard deviations were given in parentheses. Loadings ≥ .30 were given in bold face.

The standard deviations of the main loadings on the third and especially on the fourth factor were large, because the fourth factor was mostly a singlet factor in PCA. The variables H1 to H4 and M1 to M4 loaded on the first two PCA-factors, but on all factors in the PAF-, AF-, and ML-solutions. Even though H1 to H4 and M1 to M4 were more complex in the PAF-, AF-, and ML-solutions than in the PCA-solutions, it should be recognized that in these solutions the fourth factor was constituted only by the four variables which represent the dissolved factor. In this sense, a part of the variance of the variables H1 to H4 and M1 to M4 could be isolated with PAF, AF, and ML, but not with PCA. The effect was more pronounced in the Oblimin-solutions (see Table 7): there was a perfect simple structure in the Oblimin-rotated PAF-, AF-, and ML-solutions, while the PCA-solutions had a low simple structure. Thus, it is interesting to note that for PAF, AF, and ML the extraction of four factors, which might be considered a slight overextraction, led to the identification of the previously dissolved factor. However, in the solutions based on moderate correlations the standard deviations of the loadings of the ML-solutions were again very large, indicating some instability of these solutions.
Table 7: Mean loadings for Oblimin-solutions with four 4-variable factors with reduced simple structure (1000 cases; high intercorrelations, 100 runs; moderate intercorrelations, 500 runs). [The body of this table was garbled in extraction and is not reproduced here.] Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .06. The standard deviations were given in parentheses. Loadings ≥ .30 were given in bold face. Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.

It should also be noted that many inter-factor correlations in the PCA-, PAF-, AF-, and ML-Oblimin-solutions were substantially different (see Table 7), indicating an influence of the method of factor extraction on the inter-factor correlations. These differences might have consequences when hierarchical factor analyses are performed.

4. Discussion

The effect of changing contexts of variables on results in exploratory factor analysis was analyzed with respect to different methods of factor extraction (PCA, PAF, AF, and ML) and rotation (Varimax and Oblimin). In the first simulation study, data sets with pronounced simple structure and four marker variables for every factor were created in order to identify a factor in a favorable context. The pronounced simple structure could be demonstrated for data sets based on 200 and 1000 cases and with high as well as moderate intercorrelations between the variables.
In the second simulation study, data sets with lower simple structure were created in order to produce a less favorable context for one of the factors previously identified in the favorable context. Thus, in the second study, four of the marker variables were created exactly as in the first study in order to represent one factor of the first study. These four variables were now embedded in a context less favorable for the identification of the factor they marked: they were analyzed together with a set of 12 substantially correlated variables. Again the analyses were performed on the basis of 200 and 1000 cases and with high and moderate correlations between variables. Parallel analysis indicated that two factors should be extracted. In the two-factor solutions for PCA, PAF, AF, and ML, the four variables which marked a factor in the first study loaded on both factors equally. Thus, irrespective of the method of factor extraction, a factor which represents these variables could not be demonstrated. However, parallel analysis has been shown to underextract when there is a large first eigenvalue (Turner, 1998), as was the case in the present data. Therefore, and because the scree-test and the Kaiser-Guttman rule indicated that three factors should be extracted, the three-factor solutions were also compared for PCA, PAF, AF, and ML. While the four marker variables loaded about equally on all three Varimax-rotated factors in PCA, PAF, and AF, they had marked loadings on the third factor in ML in the solutions based on high intercorrelations. In the Oblimin-solutions the four variables loaded on all three factors in the PAF-solutions and on the first two factors in the PCA- and AF-solutions. They had marked loadings only on the third factor in the ML-solutions based on high correlations.
Moreover, in the ML-Oblimin-solutions the loading pattern of the variables representing the dissolved factor was the reverse of the pattern in the PCA-Oblimin-solutions, with the patterns of the PAF- and AF-Oblimin-solutions lying in between. This corresponds to the results of Velicer (1977), who found that PCA and ML were most dissimilar. However, in the Oblimin-solutions based on low intercorrelations, large standard deviations of loadings, indicating some instability of the solutions, occurred for the salient ML-loadings on the third factor. Then four factors were extracted, which must be regarded as a slight overextraction, given the course of the eigenvalues. In the PCA-solutions the fourth factor was loaded by only one or two variables and was thus not interpretable (Velicer & Fava, 1987). The fact that the most important difference between PCA and the remaining methods of factor extraction occurred in the case of a slight overextraction is in line with Velicer and Jackson (1990). In the Varimax-rotated PAF-, AF-, and ML-solutions, the four marker variables of the previous data sets loaded substantially on all factors, but the fourth factor was composed only of these variables. In the Oblimin-rotated solutions, a perfect simple structure emerged for PAF, AF, and ML, but not for PCA, where the fourth factor was mostly a singlet. According to Gorsuch (1983), the tendency of PCA to produce singlet factors is related to the fact that PCA tends to reproduce all the variance of the variables. A singlet reproduces the unity in the diagonal of the correlation matrix for a variable. Therefore, the tendency of PCA to reproduce the units in the diagonal may reduce its sensitivity to the more relevant aspects of a structure, which are typically not in the diagonal. Especially when the correlations are low, the units in the diagonal may obscure the more relevant information contained in the off-diagonal elements of the correlation matrix.
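Gorsuch's point about the unities in the diagonal can be made concrete with a small numerical sketch (the correlation values are illustrative, not the study's data): a variable that is nearly uncorrelated with the rest keeps an eigenvalue near 1 when unities are analyzed, as in PCA, but contributes almost nothing once communalities replace the unities, as in common factor analysis.

```python
import numpy as np

# A correlation matrix in which variable 5 shares almost no variance with 1-4.
R = np.full((5, 5), 0.5)
np.fill_diagonal(R, 1.0)
R[4, :4] = R[:4, 4] = 0.05

# PCA analyzes R with unities in the diagonal.
pca_eigs = np.sort(np.linalg.eigvalsh(R))[::-1]

# Common factor methods analyze the reduced matrix with communality
# estimates (here: squared multiple correlations) in the diagonal.
h2 = 1 - 1 / np.diag(np.linalg.inv(R))
Rr = R.copy()
np.fill_diagonal(Rr, h2)
paf_eigs = np.sort(np.linalg.eigvalsh(Rr))[::-1]

# The second PCA eigenvalue stays near 1 (its component is essentially the
# isolated variable 5, a singlet), while the second eigenvalue of the
# reduced matrix collapses toward 0.
print(np.round(pca_eigs, 2))
print(np.round(paf_eigs, 2))
```

This is why, in the four-factor solutions above, the extra PCA component tended to pick up a single variable's unique variance, whereas PAF, AF, and ML did not.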
The result of lower generalizability of PCA parameters is in line with Widaman (1993), who found more pronounced changes in PCA loadings than in PAF loadings when the context of variables was changed. With regard to the generalizability of factor solutions, the following conclusions should be drawn with caution, since replication with different types of data sets would be necessary: 1) In contexts of extreme overlap between variables, a factor might most easily be dissolved with PCA and least easily with ML, with PAF and AF lying in between. However, especially with low sample sizes and moderate intercorrelations between variables, ML-solutions might have some instability. Therefore, PAF and AF may be regarded as an optimal compromise between sensitivity to dissolved factors on the one hand and stability of results on the other. 2) Slight overfactoring, especially in combination with PAF and AF, and in large samples perhaps also with ML, can further help to (re-)produce dissolved factors. In combination with PCA, overfactoring may more easily lead to factors which are loaded by only one or two variables and which can therefore not be interpreted. Thus, when factors are to be identified in changing contexts of variables, a good strategy seems to be to perform several tests for the number of factors to extract and to make a choice which avoids underextraction. This strategy should by preference be combined with PAF or AF, or perhaps with ML, but not with PCA. Perhaps the bootstrap approach in factor analysis (e.g., Thompson, 1988) could help to increase the robustness of ML-solutions across cases while maintaining their sensitivity to the structure of the variables. Of course, the results and conclusions of the present study are only a first step in dealing with a problem which seems to be important for research based on exploratory factor analysis. They should therefore not be generalized too much.
In further studies it should be investigated how far the present results can be generalized to a larger number of factors, different types of overlap between variables, different numbers of variables per factor, and different component saturations. In addition, it would be interesting to investigate the role of different (especially oblique) methods of factor rotation with respect to the identification of factors in changing contexts of variables. Concerning oblique rotation, it should be noted that many inter-factor correlations differed between the PCA-, PAF-, AF-, and ML-solutions. This would be of importance for hierarchical factor analysis. On the basis of the present results, it could be expected that quite different hierarchical factor solutions may emerge when different methods of factor extraction are employed. The differences between hierarchical solutions based on different methods of factor extraction should be explored in further research. For the more general discussion it should be kept in mind that in the present study the results of factor analysis were not simply the effect of variable selection, as was suggested by Block (1995). When the above-mentioned strategy of avoiding underfactoring combined with PAF-, AF-, or ML-extraction was used, an impressive independence of the results from the context of variables was obtained. Therefore, it might be possible to replicate factors even in unfavorable contexts of variables, which could in turn enhance the importance of the results obtained with exploratory factor analysis.

References

[1] Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological Bulletin, 117(2), 187-215.

[2] Brocke, B. (2000). Das bemerkenswerte Comeback der Differentiellen Psychologie: Glückwünsche und Warnungen vor einem neuen Desaster [The remarkable comeback of differential psychology: Congratulations and warnings against a new disaster].
Zeitschrift für Differentielle und Diagnostische Psychologie, 21(1), 5-30.

[3] Buley, J.L. (1995). Evaluating exploratory factor analysis: Which initial-extraction techniques provide the best factor fidelity? Human Communication Research, 21, 478-493.

[4] Cattell, R.B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.

[5] Cattell, R.B. (1988). The meaning and strategic use of factor analysis. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed., pp. 131-201). New York: Plenum Press.

[6] Glorfeld, L.W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377-393.

[7] Gorsuch, R.L. (1983). Factor analysis. Hillsdale, NJ: Lawrence Erlbaum.

[8] Guilford, J.P. (1975). Factors and factors of personality. Psychological Bulletin, 82(5), 802-814.

[9] Harris, M.L. & Harris, C.W. (1971). A factor analytic interpretation strategy. Educational and Psychological Measurement, 31(3), 589-606.

[10] Holz-Ebeling, F. (1995). Faktorenanalysen und was dann? Zur Frage der Validität von Dimensionsinterpretationen [Factor analysis and then what? The validity of dimensional interpretations]. Psychologische Rundschau, 46, 18-35.

[11] Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.

[12] Hubbard, R. & Allen, S.J. (1987). An empirical comparison of alternative methods for principal components extraction. Journal of Business Research, 15, 173-190.

[13] Jennrich, R.I. & Sampson, P.F. (1966). Rotation for simple loadings. Psychometrika, 31, 313-323.

[14] Jöreskog, K.G. (1967). A computer program for unrestricted maximum likelihood factor analysis. Research Bulletin. Princeton, NJ: Educational Testing Service.

[15] Kaiser, H.F. (1958).
The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200.

[16] Kaiser, H.F. & Caffrey, J. (1965). Alpha factor analysis. Psychometrika, 30(1), 1-14.

[17] Kaiser, H.F. & Derflinger, G. (1990). Some contrasts between maximum likelihood factor analysis and alpha factor analysis. Applied Psychological Measurement, 14(1), 29-32.

[18] Knuth, D.E. (1981). The art of computer programming, Vol. II (2nd ed.). Reading, MA: Addison-Wesley.

[19] Lawley, D.N. & Maxwell, A.E. (1963). Factor analysis as a statistical method. London: Butterworth.

[20] Saucier, G. (1997). Effects of variable selection on the factor structure of person descriptors. Journal of Personality and Social Psychology, 73, 1296-1312.

[21] Schönemann, P.H. & Wang, M.M. (1972). Some new results on factor indeterminacy. Psychometrika, 37, 61-91.

[22] Schweizer, K., Boller, E., & Braun, G. (1996). Der Einfluß von Klassifikationsverfahren, Stichprobengröße und strukturellen Datenmerkmalen auf die Klassifizierbarkeit von Variablen [The influence of classification procedures, sample size, and structural properties of data on the classification of variables]. MPR-online: Methods of Psychological Research, 1, 93-105.

[23] Snook, S.C. & Gorsuch, R.L. (1989). Component analysis versus common factor analysis: A Monte Carlo study. Psychological Bulletin, 106, 148-154.

[24] SPSS, Inc. (1999). SPSS for Windows, Release 9.0. Chicago: Author.

[25] Thompson, B. (1988). Program FACSTRAP: A program that computes bootstrap estimates of factor structure. Educational and Psychological Measurement, 48, 681-686.

[26] Turner, N.E. (1998). The effect of common variance and structure pattern on random data eigenvalues: Implications for the accuracy of parallel analysis. Educational and Psychological Measurement, 58, 541-568.

[27] Velicer, W.F. (1977). An empirical comparison of the similarity of principal component, image, and factor analysis.
Multivariate Behavioral Research, 12, 3-22.

[28] Velicer, W.F. & Fava, J.L. (1987). An evaluation of the effects of variable sampling on component, image, and factor analysis. Multivariate Behavioral Research, 22, 193-209.

[29] Velicer, W.F., Eaton, C.A. & Fava, J.L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R.D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy (pp. 41-72). Norwell, MA: Kluwer Academic Publishers.

[30] Velicer, W.F. & Jackson, D.N. (1990). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research, 25, 1-28.

[31] Widaman, K.F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263-311.

[32] Zwick, W.R. & Velicer, W.F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.