On the generalizability of factors: The influence of changing contexts of variables on different methods of factor extraction

André Beauducel1

Methods of Psychological Research Online 2001, Vol. 6, No. 1
Internet: http://www.mpr-online.de
© 2001 IPN Kiel, Institute for Science Education
Abstract
The influence of changing contexts of variables on results is often mentioned as a main problem of exploratory factor analysis, limiting the generalizability of factors. In the present study, the influence of changing contexts of variables on the results of different methods of factor extraction (principal component analysis, principal axis factor analysis, alpha factoring, and maximum likelihood factor analysis) was investigated by means of artificial data. In the first simulation study, four-factor solutions with pronounced simple structure were created on the basis of artificial data with both 200 and 1000 cases. These four-factor solutions represented the context of variables in which a factor was identified. In the second simulation study, a context of variables was created which completely dissolved one of the four factors composed of four marker variables in the previous study. These data were then analyzed by means of principal component analysis, principal axis factor analysis, alpha factor analysis, and maximum likelihood factor analysis. The factor was less dissolved in principal axis factor analysis, alpha factor analysis, and maximum likelihood factor analysis than in principal component analysis. Moreover, a slight overextraction may also be favorable for the identification of a dissolved factor. On the basis of the results, some recommendations are given for performing factor extraction in a way that maximizes the generalizability of factors.
Keywords: Generalizability, principal component analysis, factor analysis
1 Author's address: André Beauducel, Technische Universität Dresden, Mommsenstr. 13, 01062 Dresden, Germany (beauduce@rcs.urz.tu-dresden.de)
1. Introduction
The problem of the influence of variable selection on the results of exploratory factor analysis has been discussed several times (e.g. Block, 1995; Brocke, 2000; Guilford, 1975; Holz-Ebeling, 1995; Saucier, 1997). Cattell (1988) criticized the use of factor analysis in what he called "private universes" of variables, i.e. contexts of variables without any marker variables from prior research. In order to avoid what one could call "private factors", it is important to bring factors into a context of variables that relates them to existing knowledge. However, the question whether a factor can be replicated within another context of variables depends both on theoretical decisions concerning the selection of variables and, as a methodological question, on the sensitivity of factor analysis to changing contexts of variables.
It is generally accepted that the results of factor analysis depend on the variables selected (e.g. Block, 1995), but the question whether different ways of performing factor analysis lead to different sensitivities to changing contexts of variables has rarely been investigated. In order to close this gap, the present study investigated the sensitivity of different methods of factor extraction to changing contexts of variables. Thus, different methods of factor extraction were compared with respect to the generalizability of factors. This topic was probably rarely investigated because of the opinion that different methods of factor extraction, like principal component analysis of the unreduced correlation matrix (PCA) and principal axis factor analysis of the reduced correlation matrix (PAF), are similar, and that differences between these methods occur only in cases of overextraction (e.g. Velicer & Jackson, 1990). However, Snook and Gorsuch (1989) and Widaman (1993) demonstrated that PCA and PAF also differ when the correct number of factors is extracted and that PCA leads to biased factor loadings. Moreover, Widaman (1993) demonstrated that factor loadings based on PAF were more generalizable than loadings based on PCA.
The generalizability of factors was also addressed by Kaiser and Caffrey (1965), who developed Alpha factor analysis (AF) as a method of factor extraction aimed at maximal generalizability of factors. Kaiser and Caffrey (1965) as well as Kaiser and Derflinger (1990) consider generalization across variables to be most important in factor analysis. If a factor has high generalizability, it would probably be less sensitive to changing contexts of variables, since it might be represented in different sets of variables. However, no direct empirical results are available that compare AF to other methods of factor extraction with regard to the generalizability of factors.
Of course, it would never be possible to achieve complete independence of factors from the context of variables, since factors simply represent the most important relations within a given set of variables. However, one should differentiate between the dependence of factors on variables that is due to theoretical relations and the robustness or generalizability of results as a methodological question. Only the latter is addressed here. However, when the generalizability of factors suffers from methodological problems, which may, for example, be caused by suboptimal methods of factor extraction, the theoretical generalizability of the constructs indicated by the factors would suffer as well. Therefore, it seems important to avoid problems of generalizability that are due to suboptimal methods of data analysis.
In the present study, the influence of the context of variables on factor-analytic results is discussed with reference to the research process: Typically, a researcher would establish some factor on the basis of a specific, more or less clearly defined set of marker variables (i.e. a more or less "private universe" of variables). Other researchers would then perhaps replicate parts of it in other contexts of variables. This does not imply that the original context of variables must be preferred over the others; all the different sets of variables may be justified on theoretical grounds. However, it would be interesting to know how robust a factor can be when different sets of variables are used and whether this robustness also depends on the method of factor extraction. This type of generalizability is investigated in the present study.
The effect of changing contexts of variables was explored by means of the following procedure: First, a factor was well established in a favorable context of variables. It was assumed that the optimal condition for the identification of a factor would be high correlations between the variables representing a common factor and low correlations between the variables representing different factors. Then, the highly correlating marker variables of the well-established factor were embedded in a less favorable context of variables, and it was explored whether the factor could still be identified. The most critical situation concerning changing contexts of variables would be that a factor which was perfectly defined by marker variables in one context is more or less dissolved in another context of variables. This dissolution could occur in a context of variables with considerable overlap between variables representing different factors. More specifically, the unfavorable condition for the identification of a factor marked by highly correlated variables would be high correlations of the marker variables of the factor with the marker variables of other factors. In addition, the marker variables of the other factors should have high intercorrelations when they belong to the same factor and low
correlations when they belong to different factors. Thus, the factor of interest will overlap considerably with at least two factors which do not overlap with each other. This could lead to some dissolution of the factor of interest. The aim of the study was to explore how well different methods of factor extraction can identify, under these unfavorable conditions, a factor which was first clearly identified in a favorable context of variables.
As methods of factor extraction, PCA and PAF were considered because of their frequent use in psychometric research. In addition, AF was considered in the present study because it was developed to produce factors with optimal generalizability in the sense of Cronbach's alpha. It was therefore interesting to investigate whether AF leads to factors that are less dependent on changing contexts of variables. In contrast to the psychometric inference of AF, maximum likelihood factor analysis (ML) was originally developed as a method for the statistical inference of factors (Lawley & Maxwell, 1963; Jöreskog, 1967). It was investigated whether the statistical inference of ML or the psychometric inference of AF provides better identification of a factor in changing contexts of variables. Therefore, ML was also included in the present analyses.
The focus of the present study was on the different methods of factor extraction, so that other aspects of factor analysis could not be investigated here. However, the number of factors to extract had to be determined precisely, because otherwise the information on the performance of the extraction procedures might be distorted. There is no simple convention on the number of factors to extract (e.g. Gorsuch, 1983). However, simulation studies in which several methods have been compared indicated that parallel analysis (Horn, 1965) is superior to conventional methods like the eigenvalue-greater-one rule (or Kaiser-Guttman rule) and Cattell's (1966) scree-test (Zwick & Velicer, 1986; Hubbard & Allen, 1987; Velicer, Eaton & Fava, 2000). However, some problems with parallel analysis have also been demonstrated (Glorfeld, 1995; Turner, 1998). Moreover, it has been argued that one can have some confidence in the results when the Kaiser-Guttman rule and the scree-test converge (Buley, 1995). Therefore, in the present study the Kaiser-Guttman rule, the scree-test, and parallel analysis were used to determine the number of factors to extract. Parallel analysis was based on PCA eigenvalues, since this is the way it is often performed (e.g. Zwick & Velicer, 1986; Velicer, Eaton & Fava, 2000).
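As a concrete illustration, parallel analysis based on PCA eigenvalues can be sketched as follows. This is not the software used in the article (SPSS); it is a minimal Python/NumPy sketch, and the function name, defaults, and number of noise replications are chosen here for illustration.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's (1965) parallel analysis based on PCA eigenvalues:
    retain factors whose observed eigenvalue exceeds the mean
    eigenvalue obtained from uncorrelated noise data with the
    same dimensions (n cases x p variables)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # observed PCA eigenvalues of the correlation matrix, descending
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # mean eigenvalues of pure-noise correlation matrices
    noise = np.zeros(p)
    for _ in range(n_sims):
        x = rng.standard_normal((n, p))
        noise += np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    noise /= n_sims
    # count the leading eigenvalues that stay above the noise baseline
    k = 0
    while k < p and obs[k] > noise[k]:
        k += 1
    return k
```

Because sampling error pushes the leading noise eigenvalues above 1, this criterion is more conservative than the Kaiser-Guttman rule, which compares the observed eigenvalues with the constant 1.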
For ML, a χ²-test for the significance of residuals has been proposed (e.g. Lawley & Maxwell, 1963). Although the χ²-test is interesting, it occasionally seems to overestimate the number of factors to extract (e.g. Harris & Harris, 1971; Schönemann & Wang, 1972). Gorsuch (1983) proposed that the χ²-test may be used as an upper bound for the number of factors. Since ML was employed in the present study, the corresponding χ²-test was also performed. Even though some research is still necessary concerning parallel analysis and other criteria for determining the number of factors (Turner, 1998; Zosky & Jurs, 1996), it is assumed that the present combination of criteria yields sufficient information for the present purposes.
As methods of factor rotation, Varimax (Kaiser, 1958) and Oblimin (Jennrich & Sampson, 1966) were used in the present context. Since the focus of the present study was on the sensitivity of different methods of factor extraction, only these two methods of factor rotation were considered here, because of their frequent use in research. Of course, the influence of changing contexts of variables on factor rotation would need more detailed consideration in further research.
2. Simulation Study 1: Data set with pronounced simple structure
The aim of simulation study 1 was to demonstrate one factor within a clear simple
structure of other factors. This factor would serve as a basis for further analyses.
2.1. Method
Here and in the following, variables were created and analyzed with SPSS for Windows 9.0 (1999) software through aggregation of z-standardized, normally distributed random variables. Different correlations between variables can be produced by different weights for the common variables (e.g. Knuth, 1981; Schweizer, Boller & Braun, 1996). The aggregates were z-standardized again. First, data sets with a clear oblique four-factor simple structure were created on the basis of highly correlated variables. Every factor had four marker variables, so that 16 variables were created for every data set. Since the aim of the first study was only to demonstrate one factor within a clear simple structure of other factors, it would also have been possible to create data sets with a number of factors other than four.

The four marker variables forming a common factor shared one common variable c1 to c4. One variable s1 to s16 formed the specific part of each variable. Here and in the following, the specific variables were not systematically related. Thus, only random correlations between the specific variables occurred. Obliqueness was achieved through aggregation of a third random variable c5, which was common to all 16 variables. Since the aggregated variables were z-standardized, the weighting function for the highly correlating variables H1 to H4 ("H" stands for high correlations) can be written as:
Hi = (1/√3)·c1 + (1/√3)·si + (1/√3)·c5,   for i = 1 to 4        (1)
The expected value of the correlations between H1 to H4 due to the common part c1 was (1/√3)² = .33. However, the correlation between these variables was enhanced by the oblique part c5. The expected value for the complete intercorrelation between two variables forming a common factor was therefore .33 + .33 = .66. H5 to H8 had the common variable c2, H9 to H12 had the common variable c3, and H13 to H16 the common variable c4. Due to the variable c5, which is common to H1 to H16, the correlation between variables which do not form a common factor was approximately .33. Thus, the aggregation process resulted in a correlation matrix in which the variables forming a common factor were correlated at about .66 and the variables representing different factors were correlated at about .33. In order to produce stable estimates for the eigenvalues and loadings, 100 solutions were created on the basis of 200 cases and 100 solutions on the basis of 1000 cases.
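The aggregation scheme of Equation 1 can be sketched in code. This is an illustrative Python/NumPy reconstruction (the article used SPSS); all names are chosen here, and the sample correlations only approximate the expected values .66 and .33.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # cases

c = rng.standard_normal((5, n))   # common parts c1..c4 plus oblique part c5
s = rng.standard_normal((16, n))  # specific parts s1..s16
w = 1 / np.sqrt(3)                # weight from Equation 1

H = np.empty((16, n))
for i in range(16):
    k = i // 4                    # which common variable c1..c4 this marker shares
    H[i] = w * c[k] + w * s[i] + w * c[4]

# z-standardize the aggregates again, as described above
H = (H - H.mean(1, keepdims=True)) / H.std(1, keepdims=True)

R = np.corrcoef(H)
# same factor (c_k and c5 shared):      R[0, 1] is about .66
# different factors (only c5 shared):   R[0, 4] is about .33
```

Repeating this construction over many seeds corresponds to the 100 runs per condition used for the mean eigenvalues and loadings reported below.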
However, a correlation between variables as high as .66 is not often reached in the area of psychology. In order to produce additional solutions on the basis of moderate correlations, which are more typical in the field, the variables M1 to M16 ("M" for moderate correlations) were created. The variables M1 to M4 were created like the variables H1 to H4, through the aggregation of common and specific random variables. The only difference was that the weight for the specific term was larger (see Equation 2).
Mi = (1/√6)·c1 + √(2/3)·si + (1/√6)·c5,   for i = 1 to 4        (2)
The variables M5 to M8 were composed of the common variables c2 and c5 and the specific variables s5 to s8. M9 to M12 were composed of the common variables c3 and c5 and the specific variables s9 to s12. M13 to M16 were composed of c4 and c5 and s13 to s16. The expected value for the intercorrelation between the marker variables of a factor was .33. Due to the variable c5, which is common to the variables M1 to M16, the intercorrelation between variables composing different factors was about .17. As in the previous data, four factors with four marker variables each were expected. Since the proportion of noise or specific variance was larger than in the previous data set, the estimates for loadings and eigenvalues were based on 500 runs both for 200 and for 1000 cases.
2.2. Results
In order to evaluate the number of factors to extract, the mean eigenvalues for both the highly and the moderately correlated variables are given in Table 1.
Table 1: Mean PCA, PAF, and PCA noise eigenvalues for data with four 4-variable factors

                           N = 200                                       N = 1000
         high corr.^c    moderate corr.^d    PCA        high corr.^c    moderate corr.^d    PCA
Factor   PCA    PAF^b    PCA    PAF^b      noise^a      PCA    PAF^b    PCA    PAF^b      noise^a
1        6.99   6.76     4.05   3.61       1.52         7.02   6.74     4.01   3.43       1.23
2        1.93   1.70     1.57   1.13       1.40         1.76   1.48     1.43    .86       1.17
3        1.69   1.46     1.39    .95       1.32         1.66   1.38     1.34    .77       1.14
4        1.47   1.24     1.23    .78       1.24         1.57   1.29     1.27    .69       1.11
5         .48    .24      .93    .48       1.18          .40    .11      .79    .22       1.08
6         .44    .20      .86    .40       1.12          .38    .09      .76    .18       1.06
7         .40    .17      .80    .34       1.06          .37    .08      .73    .16       1.03
8         .38    .14      .75    .28       1.01          .36    .07      .71    .14       1.01
9         .35    .12      .70    .23        .95          .35    .06      .69    .11        .98
10        .33    .09      .65    .18        .90          .34    .05      .67    .09        .96

Note. In order to save space, only the first 10 of the 15/16 eigenvalues are presented in the Table.
^a For parallel analysis, the PCA noise eigenvalues can only be compared with the PCA eigenvalues.
^b The eigenvalues of AF factors were omitted here, since they were similar to the PAF eigenvalues.
^c The mean eigenvalues for high correlations were based on 100 runs.
^d The mean eigenvalues for moderate correlations were based on 500 runs.
According to the Kaiser-Guttman rule, the scree-test, and parallel analysis (based on PCA eigenvalues), four factors should be extracted in the data sets based on highly correlated variables. Moreover, for these data sets the χ²-test indicated the following: The mean of the p values was .51 (SD = .29, with three out of 100 solutions with p ≤ .05) for the solutions based on 200 cases and .47 (SD = .31, with ten out of 100 solutions with p ≤ .05) for the solutions based on 1000 cases. Since the χ²-test for the number of factors tends toward overextraction (e.g. Harris & Harris, 1971), it was not surprising that it was significant in ten of the 100 solutions, indicating that more than four factors should be extracted in these solutions. However, the remaining criteria converged in that four factors should be extracted.
In the data sets with moderately correlated variables, the Kaiser-Guttman rule and the scree-test indicated that four factors should be extracted, both for data sets based on 200 and on 1000 cases. Parallel analysis indicated that three factors should be extracted in the data sets based on 200 cases and four factors in the data sets based on 1000 cases. The mean of the p values in the χ²-test was .58 (SD = .27, with 14 out of 500 solutions with p ≤ .05) for the solutions based on 200 cases and .52 (SD = .30, with 26 out of 500 solutions with p ≤ .05) for the solutions based on 1000 cases. Thus, the convergence of the criteria for the number of factors to extract was less pronounced in the solutions based on moderate correlations.
There was convergence on four-factor solutions in the data sets based on high correlations and moderate convergence on four-factor solutions in the data sets based on moderate correlations. Therefore, the means and standard deviations of the loadings for the four-factor PCA and PAF solutions based on 200 cases are presented in Table 2. Since the magnitude of salient and non-salient loadings was nearly the same for every group of marker variables, only the loadings of the first variable of every group are presented in Table 2.
Table 2: Mean loadings for Varimax- and Oblimin-solutions with 16 variables based on 200 cases

High intercorrelations, 100 runs
                       PCA-factors                                      PAF-factors
Varimax    Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
H1         .81 (.03)  .16 (.05)  .17 (.05)  .18 (.05)    .76 (.04)  .17 (.05)  .18 (.05)  .19 (.05)
H5         .17 (.05)  .82 (.02)  .17 (.05)  .16 (.04)    .18 (.05)  .76 (.03)  .18 (.05)  .17 (.04)
H9         .17 (.05)  .17 (.05)  .82 (.03)  .17 (.05)    .17 (.05)  .18 (.05)  .76 (.03)  .18 (.04)
H13        .17 (.05)  .17 (.05)  .17 (.05)  .81 (.03)    .18 (.05)  .18 (.05)  .18 (.05)  .76 (.04)
Oblimin*   Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
H1         .85 (.04)  .00 (.06)  .01 (.06)  .01 (.05)    .80 (.05)  .00 (.05)  .01 (.06)  .01 (.05)
H5         .01 (.06)  .86 (.04)  .00 (.05)  .00 (.05)    .01 (.06)  .80 (.05)  .00 (.05)  .00 (.05)
H9         .00 (.06)  .00 (.05)  .85 (.04)  .01 (.05)    .00 (.05)  .01 (.05)  .80 (.05)  .01 (.05)
H13        .00 (.06)  .01 (.05)  .01 (.06)  .85 (.04)    .00 (.05)  .01 (.05)  .00 (.05)  .80 (.05)

Moderate intercorrelations, 500 runs
                       PCA-factors                                      PAF-factors
Varimax    Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
M1         .67 (.12)  .11 (.10)  .11 (.10)  .12 (.10)    .53 (.12)  .12 (.08)  .13 (.08)  .13 (.08)
M5         .11 (.10)  .66 (.12)  .11 (.10)  .11 (.10)    .13 (.08)  .53 (.13)  .13 (.08)  .13 (.08)
M9         .11 (.10)  .11 (.10)  .67 (.12)  .11 (.09)    .13 (.08)  .12 (.08)  .53 (.12)  .13 (.08)
M13        .11 (.10)  .11 (.10)  .11 (.10)  .67 (.13)    .13 (.08)  .12 (.08)  .13 (.08)  .54 (.13)
Oblimin*   Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
M1         .67 (.15)  .03 (.11)  .03 (.11)  .04 (.11)    .53 (.15)  .04 (.08)  .04 (.09)  .03 (.09)
M5         .04 (.11)  .66 (.15)  .03 (.11)  .03 (.11)    .02 (.08)  .55 (.18)  .04 (.09)  .04 (.09)
M9         .03 (.11)  .03 (.10)  .67 (.14)  .03 (.10)    .04 (.09)  .03 (.09)  .54 (.15)  .04 (.09)
M13        .03 (.11)  .03 (.10)  .03 (.11)  .67 (.15)    .03 (.09)  .03 (.08)  .03 (.08)  .54 (.15)

Notes. Only the loadings of the first variable of every block of four variables are presented. The largest difference between mean loadings within groups of items representing the same factor was .03. The standard deviations are given in parentheses. Loadings ≥ .30 were given in bold face.
* Oblimin was performed with δ = 0; the factor pattern is presented here. The Fisher's Z transformed, averaged, and retransformed mean intercorrelation between Oblimin-rotated factors was .41 for PCA and .46 for PAF solutions. The standard deviations of the correlations were .05 both for the PCA and the PAF solutions.
The simple structure of the Varimax- and Oblimin-solutions was very pronounced, both in the solutions based on high and in those based on moderate correlations. This indicates that four-factor solutions were indeed created by the aggregation procedure described above. Of course, the salient loadings were considerably lower in the solutions based on moderate correlations (especially for the PAF-factors). Since there was some obliqueness in the data, there were some secondary loadings in the Varimax solutions, which disappeared in the factor pattern of the Oblimin solutions.

For the highly correlated variables, the mean loadings of the PCA and PAF four-factor solutions based on 1000 cases were the same as for the 100 solutions based on 200 cases. Only the standard deviations of the loadings were smaller in the solutions based on 1000 cases (maximum SD = .07). The mean main and secondary loadings of the AF- and ML-solutions were so close to those of the PAF-solutions that they did not need to be presented here. For the AF- and ML-solutions based on high correlations and 200 cases, the mean of the main loadings was .76 (SD = .03) for Varimax and .80 (SD = .05) for Oblimin. The mean of the non-salient loadings of the AF- and ML-solutions was .18 (SD = .05) for Varimax and zero (SD = .05) for Oblimin. The loadings of the AF- and ML-solutions based on high correlations and 1000 cases had the same means and smaller standard deviations (maximum SD = .02). For the solutions based on moderate correlations, the mean of the main loadings of both the AF- and ML-solutions based on 200 cases was .53 (SD = .18) for Varimax and .54 (SD = .15) for Oblimin. The mean of the non-salient loadings of the AF- and ML-solutions was .13 (SD = .08) for Varimax and .03 (SD = .09) for Oblimin. The loadings of the AF- and ML-solutions based on 1000 cases had slightly larger means for the salient loadings and smaller standard deviations (maximum SD = .07). Overall, for the solutions based on moderate correlations, the simple structure was somewhat more pronounced when based on 1000 cases than the simple structure based on 200 cases presented in Table 2.
The main result of this simulation was that four factors with a pronounced simple structure could be created by the aggregation of random variables, based on the weights in Equation 1 for the variables with high intercorrelations and in Equation 2 for the variables with moderate intercorrelations. This result corresponds to the establishment of a factor in one specific context of variables. In the next step, it was investigated how a factor established in this favorable context could be replicated in another, unfavorable context, and to what extent the identification of the factor depends on the method of factor extraction.
3. Simulation Study 2: Data set with reduced simple structure
The first factor which could be demonstrated in simulation study 1 was embedded
within another context of variables in simulation study 2.
3.1. Method
First, the aggregation procedure for the variables with high intercorrelations is described. Four marker variables were created by means of the same weights and variables as in Equation 1. Thus, the variables H1 to H4 were created in the same way as the marker variables in the previous data sets (see Equation 1).

It has already been demonstrated that H1 to H4 had enough common variance (an intercorrelation of about .66) to form a factor with marked main loadings (see Table 2), even in an oblique context. The first set of four new context variables with high intercorrelations, NH5 to NH8 ("N" for new, "H" for high correlations), was created by aggregation of one variable c2 forming the common part of these variables, one variable s5 to s8 forming the specific part, and the variable c1, which also corresponds to a common part of the variables H1 to H4. The weights for the variables NH5 to NH8 were:
NHi = (1/√6)·c2 + (1/√6)·si + √(2/3)·c1,   for i = 5 to 8        (3)
The intercorrelations between these variables were large; their expected value was 1/6 + 2/3 = .83. The expected value of the correlations of the variables NH5 to NH8 with the variables H1 to H4 was (1/√3)·√(2/3) = .47. The next set of four new context variables, NH9 to NH12, was similar to the variables NH5 to NH8. The only differences were that the common variable c3 was used, that the variables s9 to s12 were used for the specific part, and that the variable c5 was used instead of the common variable c1 (see Equation 4).
NHi = (1/√6)·c3 + (1/√6)·si + √(2/3)·c5,   for i = 9 to 12        (4)
Like c1, the variable c5 is common to the variables H1 to H4 (see Equation 1). The intercorrelations of NH9 to NH12 were approximately the same as the intercorrelations of NH5 to NH8. The correlations of NH9 to NH12 with H1 to H4 were approximately the same as for the variables NH5 to NH8, i.e. about .47. The correlations of the variables NH9 to NH12 with the variables NH5 to NH8 were about zero, since they shared no common variables in the aggregation process. Thus, two sets of mutually uncorrelated new context variables were created, both highly correlated with H1 to H4.

The third set of new context variables, NH13 to NH16, was created by aggregation of one common variable c4, one specific variable s13 to s16, and the common variables c1 and c5. The weights for these variables were:
NHi = (1/√7)·c4 + (2/√7)·si + (1/√7)·c1 + (1/√7)·c5,   for i = 13 to 16        (5)
The expected value of the intercorrelations of the variables NH13 to NH16 was 3 · 1/7 = .43. The correlations of NH13 to NH16 with H1 to H4 were about .44. The correlations of NH13 to NH16 with NH5 to NH12 were about .31. All new context variables NH5 to NH16 had considerable correlations with H1 to H4, so that the factor marked by H1 to H4 could be dissolved within this context. The retransformed mean of the Fisher's Z transformed correlations between variables of the same group was .73. Correlations of this magnitude are rarely reached in psychology. Therefore, as in the first simulation study, additional data sets with moderate correlations were created.
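The construction of this unfavorable high-correlation context (Equations 1 and 3 to 5) can be sketched as follows; again, this is an illustrative Python/NumPy reconstruction rather than the article's SPSS procedure, with 1-based helper arrays chosen here to match the variable labels.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
c = rng.standard_normal((6, n))   # c[1]..c[5] used below; c[0] unused (1-based labels)
s = rng.standard_normal((17, n))  # s[1]..s[16]; s[0] unused

X = np.empty((16, n))
for i in range(1, 5):    # markers H1..H4 (Equation 1)
    X[i-1] = (c[1] + s[i] + c[5]) / np.sqrt(3)
for i in range(5, 9):    # NH5..NH8 (Equation 3): overlap with the markers via c1
    X[i-1] = c[2]/np.sqrt(6) + s[i]/np.sqrt(6) + np.sqrt(2/3)*c[1]
for i in range(9, 13):   # NH9..NH12 (Equation 4): overlap via c5
    X[i-1] = c[3]/np.sqrt(6) + s[i]/np.sqrt(6) + np.sqrt(2/3)*c[5]
for i in range(13, 17):  # NH13..NH16 (Equation 5): overlap via both c1 and c5
    X[i-1] = (c[4] + 2*s[i] + c[1] + c[5]) / np.sqrt(7)

R = np.corrcoef(X)
# R[4, 5]  (NH5, NH6):  about .83 (within the new group)
# R[0, 4]  (H1, NH5):   about .47
# R[4, 8]  (NH5, NH9):  about .00 (no shared common variable)
# R[0, 12] (H1, NH13):  about .44
```

The key property is visible in the correlation matrix: the marker block H1 to H4 correlates substantially with two mutually uncorrelated groups (via c1 and via c5), which is what allows the marker factor to be dissolved.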
The variables with moderate correlations were created like the variables with high correlations, but in every set of aggregates the weight of the specific variable was larger than in the corresponding aggregates with high intercorrelations. Thus, the variables M1 to M4 were created as in Equation 2. The first set of four new context variables with moderate intercorrelations, NM5 to NM8 ("N" for new, "M" for moderate correlations), was created by aggregation of one variable c2 forming the common part of these variables, one variable s5 to s8 forming the specific part, and the variable c1, which corresponds to a common part of the variables M1 to M4. The weights for the variables NM5 to NM8 were:
NMi = (1/3)·c2 + (2/3)·si + (2/3)·c1,   for i = 5 to 8        (6)
The intercorrelations between NM5 to NM8 were larger than .33, the expected value of the intercorrelations of M1 to M4; their expected value was 1/9 + 4/9 = .56. The expected value of the correlations of the variables NM5 to NM8 with the variables M1 to M4 was (1/√6)·√(4/9) = .27. The next set of four new context variables, NM9 to NM12, was similar to the variables NM5 to NM8. The only differences were that the common variable c3 was used, that the variables s9 to s12 were used for the specific part, and that the variable c5 was used instead of the common variable c1 (see Equation 7).
NMi = (1/3)·c3 + (2/3)·si + (2/3)·c5,   for i = 9 to 12        (7)
Like c1, the variable c5 was common to the variables M1 to M4 (see Equation 2). The intercorrelations of NM9 to NM12 were approximately the same as the intercorrelations of NM5 to NM8. The third set of new context variables, NM13 to NM16, was created by aggregation of one common variable c4, one specific variable s13 to s16, and the common variables c1 and c5. The weights for these variables were:
NMi = (1/√19)·c4 + (4/√19)·si + (1/√19)·c1 + (1/√19)·c5,   for i = 13 to 16        (8)
The expected value of the intercorrelations of the variables NM13 to NM16 was 3 · 1/19 = .16. The correlations of NM13 to NM16 with M1 to M4 were about .19. The correlations of NM13 to NM16 with NM5 to NM12 were about .15. All new context variables NM5 to NM16 were correlated lowly or moderately with M1 to M4, so that the factor marked by M1 to M4 could be dissolved within this context. The retransformed mean of the Fisher's Z transformed correlations between variables of the same group was .44. Correlations of this magnitude are regularly reached in psychology.
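Analogously, the moderate-correlation context (Equations 2 and 6 to 8) can be sketched in code; this is again an illustrative Python/NumPy reconstruction with names chosen here, not the article's SPSS procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
c = rng.standard_normal((6, n))   # c[1]..c[5]; c[0] unused (1-based labels)
s = rng.standard_normal((17, n))  # s[1]..s[16]; s[0] unused

X = np.empty((16, n))
for i in range(1, 5):    # markers M1..M4 (Equation 2)
    X[i-1] = c[1]/np.sqrt(6) + np.sqrt(2/3)*s[i] + c[5]/np.sqrt(6)
for i in range(5, 9):    # NM5..NM8 (Equation 6): overlap with the markers via c1
    X[i-1] = c[2]/3 + 2*s[i]/3 + 2*c[1]/3
for i in range(9, 13):   # NM9..NM12 (Equation 7): overlap via c5
    X[i-1] = c[3]/3 + 2*s[i]/3 + 2*c[5]/3
for i in range(13, 17):  # NM13..NM16 (Equation 8): overlap via both c1 and c5
    X[i-1] = (c[4] + 4*s[i] + c[1] + c[5]) / np.sqrt(19)

R = np.corrcoef(X)
# R[4, 5]   (NM5, NM6):   about .56
# R[0, 4]   (M1, NM5):    about .27
# R[12, 13] (NM13, NM14): about .16
```

Squaring the weights in each equation confirms that every aggregate has unit expected variance, so the expected correlations are simply the sums of the products of the shared weights.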
3.2. Results
The mean PCA eigenvalues for the analyses with highly and with moderately correlated variables are presented in Table 3. In order to perform parallel analysis, the mean eigenvalues in Table 3 were compared with the mean eigenvalues of the noise factors in Table 1. For the solutions based on high correlations and 200 cases, the mean of the second eigenvalue (3.55 in Table 3) was larger than the corresponding mean noise eigenvalue in Table 1 (1.40, PCA noise for N = 200), whereas the mean of the third eigenvalue (1.09 in Table 3) was smaller than the corresponding mean noise eigenvalue (1.32). Thus, parallel analysis indicated that two factors should be extracted. In the same way, parallel analysis indicated that two factors should be extracted in the solutions based on 1000 cases and high correlations, as well as in the solutions based on 200 and 1000 cases with moderately correlating variables.
Table 3: Mean PCA and PAF eigenvalues for the solutions with reduced simple structure

                     N = 200                            N = 1000
         high corr.^b   moderate corr.^c    high corr.^b   moderate corr.^c
Factor   PCA    PAF^a   PCA    PAF^a        PCA    PAF^a   PCA    PAF^a
1        7.14   6.93    4.39   4.01         7.14   6.88    4.36   3.88
2        3.55   3.45    2.71   2.42         3.53   3.39    2.68   2.31
3        1.09    .75    1.15    .64         1.07    .66    1.05    .40
4         .68    .37     .99    .49          .62    .30     .90    .25
5         .61    .24     .91    .41          .58    .11     .86    .19
6         .54    .18     .83    .34          .55    .08     .81    .16
7         .48    .15     .75    .28          .52    .07     .75    .13
8         .37    .11     .69    .24          .35    .06     .69    .11
9         .32    .09     .63    .19          .33    .05     .66    .10
10        .28    .07     .57    .16          .31    .04     .62    .08

Note. In order to save space, only the first 10 of the 15/16 eigenvalues are presented in the Table. For noise eigenvalues see Table 1.
^a The eigenvalues of AF factors were omitted here, since they were similar to the PAF eigenvalues.
^b The mean eigenvalues for high correlations were based on 100 runs.
^c The mean eigenvalues for moderate correlations were based on 500 runs.
According to the Kaiser-Guttman rule, three factors should be extracted in the solutions based on high and moderate correlations. For these solutions, the scree-test indicated that two or three factors should be extracted. For all solutions based on high correlations, the χ²-test indicated that more than two factors should be extracted (df=89; p<.001 for every solution). Thus, the two factor solutions which were indicated by parallel analysis were within the upper bound of the χ²-test. For the solutions based on moderate correlations, the χ²-test was significant in 24 percent of the solutions based on 200 cases and in 97 percent of the solutions based on 1000 cases (df=89; p<.05). This indicates that for some of the solutions based on moderate correlations the upper bound for the number of factors to extract was already reached with two factors. Since parallel analysis indicated that two factors should be extracted, the loadings of the two factor solutions based on 200 cases were presented first in Table 4.
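The Kaiser-Guttman count can be reproduced directly from the mean PCA eigenvalues of Table 3 (a minimal sketch, not the author's code):

```python
# Kaiser-Guttman rule: retain as many factors as there are eigenvalues > 1.
# Mean PCA eigenvalues from Table 3, N = 200.
eigs_high = [7.14, 3.55, 1.09, .68, .61, .54, .48, .37, .32, .28]
eigs_moderate = [4.39, 2.71, 1.15, .99, .91, .83, .75, .69, .63, .57]

for eigs in (eigs_high, eigs_moderate):
    print(sum(e > 1 for e in eigs))   # 3 factors in both conditions
```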
A. Beauducel: On the generalizability of factors...
Table 4: Means and standard deviations of Varimax- and Oblimin-loadings for two factor PCA-, PAF-, AF-, and ML-solutions (200 cases).

high intercorrelations, 100 runs

          Principal component     Principal axis          Alpha factor            Maximum likelihood
          analysis (PCA)          factoring (PAF)         analysis (AF)           factor analysis (ML)
Varimax   Factor 1   Factor 2     Factor 1   Factor 2     Factor 1   Factor 2     Factor 1   Factor 2
H1        .58 (.07)  .57 (.08)    .56 (.07)  .56 (.07)    .56 (.07)  .55 (.08)    .55 (.07)  .54 (.07)
NH1       .91 (.01)  -.03 (.07)   .89 (.02)  -.02 (.06)   .88 (.02)  -.03 (.07)   .91 (.01)  -.01 (.06)
NH5       -.03 (.07) .90 (.01)    -.02 (.07) .89 (.02)    -.03 (.07) .87 (.02)    .00 (.06)  .91 (.02)
NH9       .42 (.08)  .43 (.08)    .39 (.07)  .40 (.07)    .41 (.08)  .42 (.08)    .37 (.07)  .37 (.07)
Oblimin*
H1        .51 (.10)  .50 (.11)    .49 (.11)  .49 (.11)    .48 (.12)  .49 (.11)    .48 (.09)  .46 (.11)
NH1       .93 (.02)  .06 (.18)    .91 (.04)  .05 (.19)    .89 (.04)  .07 (.19)    .93 (.04)  .08 (.18)
NH5       .06 (.18)  .93 (.02)    .05 (.18)  .91 (.03)    .07 (.19)  .89 (.05)    .07 (.17)  .93 (.04)
NH9       .39 (.11)  .36 (.11)    .36 (.11)  .33 (.10)    .37 (.12)  .35 (.11)    .33 (.10)  .29 (.11)
Mean inter-factor correlations:
          .21 (.15)               .20 (.19)               .23 (.13)               .17 (.23)

moderate intercorrelations, 500 runs

Varimax   Factor 1   Factor 2     Factor 1   Factor 2     Factor 1   Factor 2     Factor 1   Factor 2
M1        .43 (.10)  .43 (.10)    .39 (.09)  .39 (.09)    .39 (.10)  .40 (.10)    .38 (.09)  .38 (.09)
NM1       .78 (.08)  -.03 (.07)   .73 (.09)  -.01 (.07)   .71 (.11)  -.02 (.08)   .74 (.09)  .00 (.07)
NM5       -.03 (.07) .78 (.08)    -.01 (.07) .73 (.09)    -.02 (.07) .71 (.10)    .00 (.07)  .74 (.10)
NM9       .27 (.10)  .26 (.10)    .23 (.08)  .22 (.09)    .25 (.09)  .24 (.09)    .22 (.08)  .22 (.09)
Oblimin*
M1        .38 (.12)  .38 (.13)    .34 (.11)  .35 (.11)    .34 (.11)  .35 (.12)    .34 (.10)  .33 (.10)
NM1       .80 (.10)  -.13 (.11)   .76 (.11)  -.11 (.10)   .73 (.12)  -.12 (.11)   .76 (.11)  -.11 (.09)
NM5       -.13 (.10) .80 (.10)    -.12 (.09) .75 (.12)    -.12 (.10) .73 (.12)    -.11 (.09) .76 (.11)
NM9       .25 (.12)  .23 (.12)    .20 (.09)  .20 (.09)    .22 (.11)  .21 (.10)    .20 (.09)  .19 (.09)
Mean inter-factor correlations:
          .25 (.05)               .29 (.05)               .27 (.06)               .29 (.05)

Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .03. The standard deviations were given in brackets. Loadings ≥ .30 were given in bold face.
* Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.
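The averaging procedure described in the table note can be sketched as follows (illustrative values, not values from the tables): each correlation is Fisher's Z transformed, the Z values are averaged, and the mean is transformed back.

```python
import math

# Fisher's Z averaging of correlations: atanh -> mean -> tanh.
def mean_correlation(rs):
    z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(z)

print(round(mean_correlation([.40, .44, .48]), 2))   # 0.44
```

Averaging in the Z metric avoids the downward bias that a plain arithmetic mean of correlations would introduce.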
The only important difference for the solutions based on 1000 cases was that the
standard deviations of the loadings were smaller (the maximal standard deviation of the
loadings based on 1000 cases was .04 for the Varimax-solutions and .13 for Oblimin-solutions). Since the differences between the mean loadings of every block of four variables defined by the same equation were very small (maximum: .03), only the mean
loadings for the first of the four variables defined by the same equation were presented
in Table 4.
It can be seen in Table 4 that for all methods of factor extraction the Varimax- and Oblimin-solutions yielded a factor which was primarily marked by the variables H1 to H4 in the solutions based on high correlations and by M1 to M4 in the solutions based on moderate correlations. The second factor was primarily marked by the variables NH5 to NH8 in the solutions based on high correlations and by NM5 to NM8 in the solutions based on moderate correlations. The factor which had such clear main loadings in Table 2 was completely "dissolved" in these two factors, i.e. the marker variables of this factor (H1 to H4 and M1 to M4) loaded equally high on both factors in PCA-, PAF-, AF- and ML-solutions. Moreover, orthogonal simple structure in these solutions was low and could not be substantially improved by Oblimin-rotation. The factor which could be demonstrated clearly within the context of the first simulation study could not be demonstrated in the present context.
Because the scree-test and Kaiser-Guttman rule indicated that three factors should be extracted, the three factor solutions were presented in Table 5. For the three factor solutions based on highly correlated variables and 200 cases the χ²-value was significant in 50 percent of the solutions (df=75; p<.05). For the solutions based on highly correlated variables and 1000 cases the χ²-value was significant in all solutions (df=75; p<.05). This demonstrated the sensitivity of the χ²-test to sample size and that for the solutions based on 1000 cases at least four factors would constitute the upper bound for the number of factors. For the three factor solutions based on moderately correlated variables and 200 cases the χ²-value was significant in 3 percent of the solutions (df=75; p<.05). For the three factor solutions based on moderately correlated variables and 1000 cases the χ²-value was significant in 22 percent of the solutions (df=75; p<.05). This again demonstrated the sensitivity of the χ²-test to sample size.
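The degrees of freedom reported for these tests follow from the standard formula for the ML χ²-test of m common factors on p observed variables, df = ((p − m)² − (p + m)) / 2 (Lawley & Maxwell, 1963); with the 16 variables of the present design this yields the reported values (a sketch, not the author's code):

```python
# Degrees of freedom of the ML chi-square test for m common factors
# and p observed variables: df = ((p - m)^2 - (p + m)) / 2.
def ml_df(p, m):
    return ((p - m) ** 2 - (p + m)) // 2

print(ml_df(16, 2))   # 89, the df reported for the two factor solutions
print(ml_df(16, 3))   # 75, the df reported for the three factor solutions
```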
The means and standard deviations of the Varimax-loadings for the PCA-, PAF-, AF-, and ML- three factor solutions based on 200 cases were presented in Table 5. The solutions based on 1000 cases had nearly the same mean loadings. The only important difference for the solutions based on 1000 cases was that the standard deviations of the loadings were smaller (for Varimax: maximum SD = .09; for Oblimin: maximum SD = .16). Therefore the solutions based on 1000 cases were not presented here.
Table 5: Means and standard deviations of Varimax- and Oblimin-loadings for three factor PCA-, PAF-, AF-, and ML-solutions (N = 200).
[Table 5 reported, for high intercorrelations (100 runs) and moderate intercorrelations (500 runs), the Varimax- and Oblimin-rotated loadings of H1, NH1, NH5, and NH9 (respectively M1, NM1, NM5, and NM9) on three factors for PCA, PAF, AF, and ML, together with the mean inter-factor correlations of the Oblimin-solutions.]
Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .03. The standard deviations were given in brackets and loadings ≥ .30 in bold face. Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.
An effect of the method of factor extraction was observed for the variables H1 to H4 as well as for M1 to M4. These variables were factorially complex in all solutions, but in PCA they had their highest loadings on the first two factors, whereas in PAF and AF they had very similar loadings on all three factors. In ML these variables had their highest loading on the third factor. Thus, with respect to the loadings of H1 to H4 and M1 to M4, the ML-solutions were in a sense the reverse of the PCA-solutions. The factor composed of the variables H1 to H4 and M1 to M4 was less dissolved in the ML-solutions. However, in the ML-solutions based on moderate correlations, large standard deviations occurred for the salient loadings of the third factor (see Table 5). In contrast, the standard deviations of the salient loadings on the third factor were only .08 in the ML-solutions based on 1000 cases. This indicates that a large number of cases is needed for ML.
The same pattern was observed in the Oblimin-solutions (see Table 5). However, the
lowest loadings of the variables H1 to H4 and M1 to M4 were slightly reduced. For the
PCA- and AF-solutions the loadings of these variables were reduced especially on the
third factor. For the ML-solutions based on high correlations, the loadings of the variables H1 to H4 on the first and second factor were reduced. Therefore, the differences
between ML-, PCA-, and AF-solutions were more pronounced in the Oblimin- than in
the Varimax-solutions. However, the effect was less pronounced in the solutions based
on moderate correlations. The standard deviations of the loadings were larger in the
Oblimin-solutions than in the corresponding Varimax-solutions (see Table 5), especially
in the ML-solutions based on moderate correlations. However, again the standard deviations of loadings were reduced in the solutions based on 1000 cases (for Oblimin: Maximum SD= .16), while the means of the loadings remained nearly unchanged. Therefore,
the Oblimin-solutions based on 1000 cases were not presented here.
When four factors were extracted, which should be considered as a slight overextraction, the fourth factor was marked only by the variables H1 to H4 for the solutions based on high correlations and M1 to M4 for the solutions based on moderate correlations in PAF-, AF-, and ML-solutions (see Table 6). In the PCA-solutions, only one or two variables loaded on the fourth factor, so that this factor could not be interpreted within PCA-solutions (both for 200 and 1000 cases). The fact that the fourth factor was loaded by only one or two variables in PCA led to large standard deviations of the loadings, especially on the fourth factor, since the loadings were near zero in some solutions and near .90 in others. In order to demonstrate that the large standard deviations of the loadings in the PCA-solutions were not due to the small sample size (as they were in the three factor ML-solutions) but to the oscillations in main loadings, the four factor solutions were presented for the sample of 1000 cases. The Varimax-solutions were presented first (see Table 6).
Table 6: Mean loadings for Varimax-solutions with four 4-variable factors with reduced simple structure (1000 cases)

high intercorrelations, 100 runs

       PCA-factors                                    PAF-factors
       Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
H1     .50 (.05)  .49 (.05)  .29 (.10)  .23 (.18)    .40 (.02)  .39 (.02)  .41 (.03)  .43 (.05)
NH1    .90 (.01)  -.03 (.02) .15 (.03)  .11 (.04)    .88 (.01)  -.04 (.02) .23 (.02)  .11 (.03)
NH5    -.03 (.02) .91 (.01)  .14 (.02)  .11 (.04)    -.04 (.02) .88 (.01)  .23 (.02)  .11 (.03)
NH9    .20 (.04)  .19 (.04)  .55 (.23)  .41 (.31)    .19 (.03)  .20 (.03)  .58 (.04)  .10 (.04)

       AF-factors                                     ML-factors
H1     .40 (.02)  .39 (.02)  .41 (.03)  .43 (.05)    .40 (.02)  .39 (.02)  .41 (.03)  .43 (.05)
NH1    .88 (.01)  -.04 (.02) .23 (.02)  .11 (.03)    .88 (.01)  -.04 (.02) .23 (.02)  .11 (.03)
NH5    -.04 (.02) .88 (.01)  .23 (.02)  .11 (.03)    -.04 (.02) .88 (.01)  .23 (.02)  .11 (.03)
NH9    .19 (.03)  .20 (.03)  .58 (.04)  .10 (.04)    .19 (.03)  .20 (.03)  .58 (.04)  .10 (.04)

moderate intercorrelations, 500 runs

       PCA-factors                                    PAF-factors
M1     .41 (.05)  .41 (.05)  .09 (.10)  .19 (.12)    .28 (.04)  .28 (.05)  .23 (.09)  .33 (.12)
NM1    .79 (.04)  -.02 (.03) .09 (.04)  .06 (.04)    .72 (.05)  -.03 (.03) .15 (.05)  .12 (.07)
NM5    -.03 (.03) .79 (.04)  .08 (.04)  .06 (.06)    -.03 (.03) .72 (.06)  .15 (.05)  .12 (.07)
NM9    .12 (.05)  .12 (.05)  .44 (.40)  .36 (.51)    .12 (.03)  .12 (.04)  .33 (.13)  .13 (.12)

       AF-factors                                     ML-factors
M1     .27 (.06)  .27 (.06)  .19 (.09)  .31 (.11)    .30 (.05)  .30 (.05)  .22 (.12)  .35 (.26)
NM1    .72 (.05)  -.02 (.03) .13 (.06)  .12 (.06)    .70 (.13)  -.03 (.03) .15 (.06)  .12 (.08)
NM5    -.03 (.03) .72 (.05)  .13 (.05)  .12 (.06)    -.03 (.03) .71 (.11)  .15 (.07)  .12 (.09)
NM9    .12 (.03)  .12 (.04)  .31 (.16)  .15 (.14)    .13 (.04)  .12 (.06)  .33 (.28)  .12 (.12)

Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .04. The standard deviations were given in brackets. Loadings ≥ .30 were given in bold face.
The standard deviations of the main loadings on the third and especially on the fourth factor were large, because the fourth factor was mostly a "singlet" factor in PCA. The variables H1 to H4 and M1 to M4 loaded on the first two PCA-factors and on all factors in the PAF-, AF-, and ML-solutions. Even though H1 to H4 and M1 to M4 were more complex in the PAF-, AF-, and ML-solutions than in the PCA-solutions, it should be recognized that in these solutions the fourth factor was constituted only by the four variables which represented the dissolved factor. In this sense a part of the variance of the variables H1 to H4 and M1 to M4 could be isolated with PAF, AF, and ML, but not with PCA. The effect was more pronounced in the Oblimin-solutions (see Table 7): There was a perfect simple structure in the Oblimin-rotated PAF-, AF-, and ML-solutions, while the PCA-solutions had a low simple structure. Thus, it is interesting to note that for PAF, AF and ML, the extraction of four factors, which might be considered as a slight overextraction, led to the identification of the previously dissolved factor. However, in the solutions based on moderate correlations the standard deviations of the loadings of the ML-solutions were again very large, indicating some instability of these solutions.
Table 7: Mean loadings for Oblimin-solutions with four 4-variable factors with reduced simple structure (1000 cases)

high intercorrelations, 100 runs

       PCA-factors                                    PAF-factors
       Factor 1   Factor 2   Factor 3   Factor 4     Factor 1   Factor 2   Factor 3   Factor 4
H1     .42 (.07)  .41 (.07)  .15 (.12)  .11 (.19)    .04 (.04)  .03 (.04)  .03 (.05)  .74 (.08)
NH1    .93 (.01)  -.08 (.02) .00 (.03)  .01 (.04)    .90 (.02)  -.01 (.02) .01 (.03)  .01 (.04)
NH5    -.08 (.02) .93 (.01)  .00 (.03)  .01 (.04)    -.01 (.03) .90 (.03)  .01 (.02)  .01 (.05)
NH9    -.01 (.06) -.01 (.06) .53 (.34)  .34 (.40)    .01 (.04)  .00 (.04)  .62 (.07)  .04 (.08)

       AF-factors                                     ML-factors
H1     .04 (.05)  .04 (.04)  .03 (.05)  .74 (.09)    .04 (.04)  .03 (.04)  .03 (.05)  .74 (.08)
NH1    .90 (.02)  -.01 (.02) .01 (.02)  .01 (.04)    .90 (.02)  .00 (.02)  .01 (.03)  .01 (.04)
NH5    -.01 (.03) .90 (.03)  .01 (.02)  .01 (.05)    -.01 (.03) .90 (.03)  .01 (.02)  .01 (.05)
NH9    .01 (.04)  .00 (.04)  .62 (.07)  .04 (.08)    .01 (.04)  .00 (.04)  .62 (.07)  .04 (.08)

moderate intercorrelations, 500 runs

       PCA-factors                                    PAF-factors
M1     .36 (.06)  .36 (.06)  .03 (.12)  .11 (.11)    .14 (.08)  .14 (.08)  .08 (.10)  .34 (.19)
NM1    .80 (.05)  -.09 (.03) .02 (.04)  -.01 (.04)   .74 (.09)  -.03 (.04) .00 (.05)  .01 (.07)
NM5    -.09 (.03) .80 (.05)  .02 (.04)  .00 (.04)    -.03 (.04) .74 (.11)  .01 (.05)  .01 (.07)
NM9    .01 (.06)  .01 (.07)  .46 (.42)  .29 (.59)    .02 (.05)  .02 (.05)  .33 (.19)  .09 (.15)

       AF-factors                                     ML-factors
M1     .20 (.10)  .20 (.10)  .06 (.09)  .27 (.19)    .14 (.09)  .14 (.08)  .10 (.13)  .34 (.54)
NM1    .74 (.08)  -.04 (.04) .01 (.05)  .01 (.06)    .73 (.15)  -.03 (.04) .00 (.06)  .01 (.07)
NM5    -.05 (.04) .74 (.09)  .01 (.05)  .01 (.06)    -.03 (.04) .73 (.15)  .01 (.05)  .01 (.07)
NM9    .02 (.05)  .02 (.05)  .30 (.25)  .12 (.18)    .02 (.05)  .02 (.05)  .33 (.46)  .09 (.14)

Mean inter-factor correlations among the four factors were also reported for each method and condition; they differed substantially between the methods (see text).

Notes. Only the loadings of the first variable of every block of four variables were presented. The largest difference between mean loadings within groups of items representing the same factor was .06. The standard deviations were given in brackets. Loadings ≥ .30 were given in bold face. Oblimin was performed with δ = 0; the factor pattern was presented here. The inter-factor correlations were Fisher's Z transformed, averaged, and then retransformed.
It should also be noted that many inter-factor correlations in the PCA-, PAF-, AF-,
and ML-Oblimin-solutions were substantially different (see Table 7), indicating an influence of the method of factor extraction on the inter-factor correlations. These differences might have consequences when hierarchical factor analyses are performed.
4. Discussion
The effect of changing contexts of variables on results in exploratory factor analysis
was analyzed with respect to different methods of factor extraction (PCA, PAF, AF,
and ML) and rotation (Varimax and Oblimin). In the first simulation study, data sets
with pronounced simple structure and four marker variables for every factor were created in order to identify a factor in a favorable context. The pronounced simple structure could be demonstrated for data sets based on 200 and 1000 cases and with high as
well as moderate intercorrelations between the variables. In the second simulation study,
data sets with lower simple structure were created in order to produce a less favorable
context for one of the factors previously identified in the favorable context. Thus, in the second study, four marker variables were created exactly as in the first study in order to represent one of its factors. These four variables were then embedded in a context less favorable for the identification of the factor they marked: they were analyzed together with a set of 12 substantially correlated variables. Again the
analyses were performed on the basis of 200 and 1000 cases and with high and moderate
correlations between variables.
Parallel analysis indicated that two factors should be extracted. In the two factor solutions for PCA, PAF, AF, and ML, the four variables which marked a factor in the first study loaded equally on both factors. Thus, irrespective of the method of factor extraction, a factor representing these variables could not be demonstrated. However, parallel analysis has been shown to underextract when there is a large first eigenvalue (Turner, 1998), as was the case in the present data. Therefore, and because the scree-test and the Kaiser-Guttman rule indicated that three factors should be extracted, the three factor solutions were also compared for PCA, PAF, AF, and ML. While the four marker variables loaded quite equally on all three Varimax-rotated factors in PCA, PAF, and AF, they had marked loadings on the third factor in ML in the solutions based on high intercorrelations. In the Oblimin-solutions the four variables loaded on all three factors in the PAF-solutions and on the first two factors in the PCA- and AF-solutions. They had marked loadings only on the third factor in the ML-solutions based on high
correlations. Moreover, in the ML-Oblimin-solutions the loading pattern of the variables
representing the dissolved factor was the reverse of the pattern in the PCA-Oblimin-solutions, with the patterns of the PAF- and AF-Oblimin-solutions lying in between. This
corresponds to the results of Velicer (1977) who found that PCA and ML were most
dissimilar. However, in the Oblimin-solutions based on low intercorrelations, large standard deviations of loadings, indicating some instability of the solutions, occurred for the
salient ML-loadings on the third factor.
Then, four factors were extracted, which must be regarded as a slight overextraction, given the course of the eigenvalues. In the PCA-solutions the fourth factor was loaded by only one or two variables and was thus not interpretable (Velicer & Fava, 1987). The fact that the most important difference between PCA and the remaining methods of factor extraction occurred in the case of a slight overextraction is in line with Velicer and Jackson (1990). In the Varimax-rotated PAF-, AF-, and ML-solutions, the four marker variables of the previous data sets loaded substantially on all factors, but the fourth factor was composed only of these variables. In the Oblimin-rotated solutions, a perfect simple structure emerged for PAF, AF, and ML, but not for PCA, where the fourth factor was mostly a "singlet". According to Gorsuch (1983) the tendency of PCA to produce "singlet" factors is related to the fact that PCA tends to reproduce all the variances of the variables. A "singlet" reproduces the unity in the diagonal of the correlation matrix for a variable. Therefore, the tendency of PCA to reproduce the units in the diagonal may reduce its sensitivity to the more relevant aspects of a structure, which are typically not in the diagonal. Especially when the correlations are low, the units in the diagonal may obscure the more relevant information contained in the off-diagonal elements of a correlation matrix. The result of lower generalizability of PCA parameters is in line with Widaman (1993), who found more pronounced changes in PCA loadings compared to PAF loadings when the context of variables was changed.
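The role of the unit diagonal can be made concrete with a small numerical sketch (not the author's analysis). For the equicorrelation block of the four dissolved-factor markers (r = .16), replacing the unit diagonal with squared multiple correlations, a common communality estimate used here purely for illustration, shows how much of the leading PCA eigenvalue stems from the diagonal itself rather than from the off-diagonal structure:

```python
import numpy as np

# Toy correlation matrix: four variables intercorrelated at .16.
R = np.full((4, 4), .16)
np.fill_diagonal(R, 1.0)

pca_first = np.linalg.eigvalsh(R)[-1]          # leading eigenvalue, unit diagonal

smc = 1 - 1 / np.diag(np.linalg.inv(R))        # squared multiple correlations
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)               # communalities in the diagonal
common_first = np.linalg.eigvalsh(R_reduced)[-1]

# The leading eigenvalue shrinks from 1.48 to about .54: most of the PCA
# eigenvalue reflects the unit diagonal, not the weak common structure.
print(round(pca_first, 2), round(common_first, 2))
```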
With regard to the generalizability of factor solutions, the following conclusions
should be drawn with caution, since replication with different types of data sets would
be necessary:
1) In contexts of extreme overlap between variables a factor might most easily be dissolved with PCA and least easily with ML, with PAF and AF lying in between. However, especially with small sample sizes and moderate intercorrelations between variables, ML-solutions might show some instability. Therefore, PAF and AF may be regarded as an optimal compromise between sensitivity to dissolved factors on the one hand and stability of results on the other.
2) Slight overfactoring, especially in combination with PAF and AF, and in large samples perhaps also with ML, can further help to (re-)produce dissolved factors. In combination with PCA, overfactoring may more easily lead to factors which are loaded by only one or two variables and which can therefore not be interpreted.
Thus, when factors are to be identified in changing contexts of variables, a good strategy seems to be to perform several tests for the number of factors to extract and to make a choice which avoids underextraction. This strategy should preferably be combined with PAF or AF, or perhaps with ML, but not with PCA. Perhaps the "bootstrap" approach in factor analysis (e.g. Thompson, 1988) could help to increase the robustness of ML-solutions across cases, while maintaining their sensitivity to the structure of variables.
Of course, the results and conclusions of the present study are only a first step towards dealing with a problem which seems to be important for research based on exploratory factor analysis. They should therefore not be generalized too much. In further studies it should be investigated how far the present results can be generalized to a larger number of factors, different types of overlap between variables, different numbers of variables per factor, and different component saturations. In addition, it would be interesting to investigate the role of different (especially oblique) methods of factor rotation with respect to the identification of factors in changing contexts of variables.
Concerning oblique rotation it should be noted that many inter-factor correlations
were different between the PCA-, PAF-, AF-, and ML-solutions. This would be of importance for hierarchical factor analysis. On the basis of the present results, it could be
expected that quite different hierarchical factor solutions may emerge when different
methods of factor extraction are employed. The differences between hierarchical solutions based on different methods of factor extraction should be explored in further research.
For the more general discussion it should be noted that, in the present study, the results of factor analysis were not simply the effect of variable selection, as was suggested by Block (1995). When the above-mentioned strategy of avoiding underfactoring was combined with PAF-, AF-, or ML-extraction, an impressive independence of the results from the context of variables was obtained. Therefore, it might be possible to
replicate factors even in unfavorable contexts of variables, which could in turn enhance
the importance of the results obtained with exploratory factor analysis.
References
[1] Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological Bulletin, 117(2), 187-215.
[2] Brocke, B. (2000). Das bemerkenswerte Comeback der Differentiellen Psychologie: Glückwünsche und Warnungen vor einem neuen Desaster [The remarkable comeback of differential psychology: Congratulations and warnings against a new disaster]. Zeitschrift für Differentielle und Diagnostische Psychologie, 21(1), 5-30.
[3] Buley, J. L. (1995). Evaluating exploratory factor analysis: Which initial-extraction techniques provide the best factor fidelity? Human Communication Research, 21, 478-493.
[4] Cattell, R.B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
[5] Cattell, R.B. (1988). The meaning and strategic use of factor analysis. In J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of multivariate experimental psychology
(2nd Edition) (pp. 131-201). New York: Plenum Press.
[6] Glorfeld, L.W. (1995). An improvement on Horn's parallel analysis methodology for
selecting the correct number of factors to retain. Educational and Psychological
Measurement, 55, 377-393.
[7] Gorsuch, R.L. (1983). Factor Analysis. Hillsdale, NJ: Lawrence Erlbaum.
[8] Guilford, J.P. (1975). Factors and Factors of Personality. Psychological Bulletin,
82(5), 802-814.
[9] Harris, M.L. & Harris, C.W. (1971). A factor analytic interpretation strategy. Educational and Psychological Measurement, 31(3), 589-606.
[10] Holz-Ebeling, F. (1995). Faktorenanalysen und was dann? Zur Frage der Validität
von Dimensionsinterpretationen [Factor analysis and then what? The validity of
dimensional interpretations]. Psychologische Rundschau, 46, 18-35.
[11] Horn, J. (1965). A rationale and test for the number of factors in factor analysis.
Psychometrika, 30, 179-185.
[12] Hubbard, R. & Allen, S.J. (1987). An empirical comparison of alternative methods
for principal components extraction. Journal of Business Research, 15, 173-190.
[13] Jennrich, R.I. & Sampson, P.F. (1966). Rotation for simple loadings. Psychometrika, 31, 313-323.
[14] Jöreskog, K.G. (1967). A computer program for unrestricted maximum likelihood
factor analysis. Research Bulletin, Princeton, Educational Testing Service.
[15] Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis.
Psychometrika, 23, 187-200.
[16] Kaiser, H.F. & Caffrey, J. (1965). Alpha factor analysis. Psychometrika, 30(1), 1-14.
[17] Kaiser, H.F. & Derflinger, G. (1990). Some contrasts between maximum likelihood
factor analysis and alpha factor analysis. Applied psychological measurement, 14(1),
29-32.
[18] Knuth, D.E. (1981). The art of computer programming. Vol. II (2nd ed.). Reading,
Mass.: Addison-Wesley.
[19] Lawley, D.N. & Maxwell, A.E. (1963). Factor analysis as a statistical method. London: Butterworth.
[20] Saucier, G. (1997). Effects of variable selection on the factor structure of person descriptors. Journal of Personality and Social Psychology, 73, 1296-1312.
[21] Schönemann, P.H. & Wang, M.M. (1972). Some new results on factor indeterminacy. Psychometrika, 37, 61-91.
[22] Schweizer, K., Boller, E., & Braun, G. (1996). Der Einfluß von Klassifikationsverfahren, Stichprobengröße und strukturellen Datenmerkmalen auf die Klassifizierbarkeit von Variablen [The influence of classification procedures, sample size, and
structural properties of data on the classification of variables]. MPR-online: Methods
for Psychological Research, 1, 93-105.
[23] Snook, S.C. & Gorsuch, R.L. (1989). Component analysis versus common factor
analysis: A Monte Carlo study. Psychological Bulletin, 106, 148-154.
[24] SPSS, Inc. (1999). SPSS for Windows, Release 9.0. Chicago: Author.
[25] Thompson, B. (1988). Program FACSTRAP: A program that computes bootstrap
estimates of factor structure. Educational and Psychological Measurement, 48, 681686.
[26] Turner, N.E. (1998). The effect of common variance and structure pattern on random data eigenvalues: Implications for the accuracy of parallel analysis. Educational and Psychological Measurement, 58, 541-568.
[27] Velicer, W.F. (1977). An empirical comparison of the similarity of principal component, image, and factor analysis. Multivariate Behavioral Research, 12, 3-22.
[28] Velicer, W.F. & Fava, J.L. (1987). An evaluation of the effects of variable sampling on component, image, and factor analysis. Multivariate Behavioral Research, 22, 193-209.
[29] Velicer, W.F., Eaton, C.A. & Fava, J.L. (2000). Construct explication through factor or component analysis: A review and evaluation of alternative procedures for determining the number of factors or components. In R.D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy (pp. 41-72). Norwell, MA: Kluwer Academic Publishers.
[30] Velicer, W.F. & Jackson, D.N. (1990). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research, 25, 1-28.
[31] Widaman, K.F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate Behavioral Research, 28, 263-311.
[32] Zwick, W.R. & Velicer, W.F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.