Asking the right questions, getting the wrong answers
On the risks of using factor analysis on survey data
Cees van der Eijk and Jonathan Rose
Workshop on Issues in latent structure analysis using survey data
A Nottingham-Warwick-Birmingham Advanced Quantitative Methods
(AQM) Workshop – organised by the Nottingham ESRC DTC in
collaboration with the Methods and Data Institute
Nottingham, 8 March 2013
The Right Question
“Which items (if any) from a pool can be regarded as indicators
of the same underlying phenomenon?”
This is the right question when analysing survey data, because
– Questionnaire design often includes multiple operationalisations
– Even when not explicitly intended in the design of a questionnaire,
many seemingly different behavioural, opinion or attitudinal items
relate to a smaller number of underlying orientations
– If a set of items can validly be regarded as indicators of a latent
variable, respondents’ positions on that variable can be estimated
better on the basis of a composite measure than by any single item
(better: more reliably, and with more discrimination)
– The conceptual framework of a study becomes simpler and more
general (and generalizable) if a set of items can validly be regarded as
indicators of a latent variable.
2
Multiple indicators in surveys
The most popular forms of multiple indicators for probing attitudes,
orientations, predispositions, etc. in surveys are
• Dichotomies
– yes/no
– agree/disagree
– does/does not apply, etc.
• Likert items: stimuli for which the respondent has to indicate the
extent of agreement or support, e.g., with response categories
ranging from ‘disagree strongly’ to ‘agree strongly’. The number of
categories mostly ranges between 4 and 7, depending on the
number of gradations and the inclusion (or not) of a ‘neither’
category. Often used in the form of item ‘batteries’.
3
A popular solution: Factor Analysis
Many researchers routinely use factor analytic methods to
diagnose the number and character of the meaning dimensions
underlying a pool of items.
Reasons for this popularity:
– Apparent simplicity of the method
– Availability in virtually all commonly used statpacks, including
sophisticated-looking tests of applicability such as ‘tests of sphericity’
– It is the recommended approach in many textbooks and
methodological sites
– ‘Everyone else does it’, and I don’t have to spend time explaining other
methods to referees or readers
– ‘I can always interpret the results’
4
But, are the answers right?
Most texts refer to the assumption of factor analysis that items
are measured at interval level.
This raises the question of whether dichotomous items and Likert
items lend themselves to factor analysis: how robust are factor
analytic results to violations of the assumption of interval data?
5
Can dichotomous items be FA’d?
The use of factor analysis on dichotomous items is often
frowned upon.
Yet, numerous texts and sites state that dichotomous items can
be factor analysed ‘with caution’. The argument is generally that
FA uses product-moment correlations as input, and that these can
validly be calculated on dichotomies, since the product-moment
correlation is numerically identical to phi, the designated measure
of association for 2×2 tables (a quick numerical check follows below).
Occasionally, the argument is made that FA on dichotomous
items is justified if based on tetrachoric correlations.
6
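To illustrate, a minimal numerical check of this equivalence in Python; the cell counts are made up for the example and are not from the slides:

```python
import numpy as np

# Hypothetical 2x2 table for two yes/no items (counts are made up):
#              item2 = 0   item2 = 1
# item1 = 0        40          10
# item1 = 1        15          35
a, b, c, d = 40, 10, 15, 35

# Phi computed directly from the cell counts
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson product-moment correlation on the corresponding 0/1 scores
x = np.repeat([0, 0, 1, 1], [a, b, c, d])  # item1
y = np.repeat([0, 1, 0, 1], [a, b, c, d])  # item2
r = np.corrcoef(x, y)[0, 1]

print(phi, r)  # identical up to floating-point error
```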
Can Likert items be FA’d?
Ordinal items of the Likert type are often seen as amenable to
factor analysis, particularly when it is thought that
‘assignment of ordinal categories to the data does not
seriously distort the underlying metric scaling’
(Mueller and Kim).
Likert-type items are more often than not regarded as
conforming to this requirement. They are then referred to as
quasi-interval and deemed appropriate for factor analysis.
7
But are the answers right? – 2
Yet, some critics warn against factor analysing dichotomous or
ordinal (Likert type) items, mainly on the following grounds:
• factor analysis on dichotomous or Likert items would lead to
‘over-dimensionalisation’, i.e., a tendency of factor analysis to
suggest a larger number of factors than is warranted. This has
obvious (negative) consequences for conceptualisation and
theory development.
Small-scale simulations and didactic examples demonstrate
that this can indeed occur (Van Schuur; Van der Eijk). It is,
however, unclear how large this risk is and under what
conditions it is most likely to occur.
8
Risk assessment
Researchers are caught between conflicting recommendations,
and they have little, if any, support in assessing how frequent
and how serious the alleged risks are.
The aim of this study is to assess the risk that factor analysis of
ordinal (Likert-type) survey items will lead to incorrect
diagnosis of the latent structure underlying those items.
The verdict ‘incorrect’ requires knowledge of the true underlying
structure. In empirical studies this is generally absent (barring
very strong and contestable assumptions). But data
simulation allows unequivocal knowledge about the true
latent structure, and thus the possibility to evaluate the risk of
incorrect results from factor analysis.
9
Data simulation strategy
• Define a large pool of Likert-type survey items all of which express a
single underlying continuum, plus random measurement error.
As a consequence, the true underlying dimensionality is 1.
• The items may vary in terms of the number of response categories
(although we use only 5-category items here), and in terms of
difficulty (i.e., the location on the underlying continuum of the
boundaries between adjacent response categories)
• Specify a distribution of a population of respondents on the same
underlying continuum.
• Simulate individual-level responses to items (sketched in code after
this slide) on the basis of
– The position of the category boundaries of the item
– The position of the simulated respondent
– A perturbation for each response, based on a random draw from a standard
normal distribution
10
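A minimal sketch of such a generating process in Python; the boundary values, noise scale and seed are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_likert(theta, boundaries, noise_sd=1.0):
    """Simulate one Likert item.

    theta      : array of respondent positions on the latent continuum
    boundaries : sorted category boundaries of the item (its 'difficulty');
                 len(boundaries) + 1 response categories result
    noise_sd   : scale of the random perturbation added to each response
    """
    perturbed = theta + rng.normal(0.0, noise_sd, size=theta.shape)
    # searchsorted maps each perturbed position to a category index 0..k-1
    return np.searchsorted(boundaries, perturbed) + 1  # categories coded 1..k

# Example: 10,000 respondents from a normal population answer one
# 5-category item with boundaries at -1.5, -0.5, 0.5 and 1.5.
theta = rng.normal(0.0, 1.0, size=10_000)
responses = simulate_likert(theta, np.array([-1.5, -0.5, 0.5, 1.5]))
```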
Data simulation strategy - 2
• Sample at random sets of items from the pool of available
items
• Simulate responses from a sample from the population
• Factor analyse the generated data
• Assess the proportion of incorrect inferences about the latent
structure when following standard recommendations for conducting
and interpreting factor analyses (see the sketch after this slide).
Keep in mind that the true latent structure is known: the entire
pool of items, and hence every sample from it, is stochastically
uni-dimensional
11
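A sketch of the assessment step for the EFA part: the number of eigenvalues of the item correlation matrix that exceed 1 is what Kaiser's criterion would report as the number of factors (function and variable names are ours):

```python
import numpy as np

def kaiser_count(responses):
    """Number of factors retained by Kaiser's criterion (eigenvalues > 1).

    responses : (n_respondents, n_items) array of Likert scores.
    Because the simulated pool is unidimensional by construction, any
    return value above 1 counts as an incorrect inference.
    """
    corr = np.corrcoef(responses, rowvar=False)  # item-by-item correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)       # real spectrum of a symmetric matrix
    return int(np.sum(eigenvalues > 1.0))
```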
Simulated data
We constructed:
– A pool of 27 items ranging from very difficult to very easy on the
underlying continuum. From this pool sets of items are randomly
drawn consisting of, respectively, 3, 5, 8 and 10 items
– 4 different distributions of the population on the latent dimension:
uniform, bimodal, normal, and skewed normal. From each of these
distributions 800 different samples of 10,000 respondents are drawn.
All respondents in a sample respond to all items from one particular
sample of items
– In half the samples we employ a large random perturbation that
affects the responses; in the other half we specify a smaller
perturbation
12
The pool of available items (k=27)
13
What is an ‘incorrect’ finding?
In Exploratory Factor Analysis (EFA) the findings hinge on the
criteria used for the choice of the number of factors. Most
commonly used in the literature (in spite of warnings not to do
so mechanically) is Kaiser’s criterion (eigenvalues>1).
Therefore we look at the magnitude of the eigenvalues, and
particularly the magnitude of the second eigenvalue (whether or
not it is >1).
In Confirmatory Factor Analysis (CFA) a unidimensional model is
specified and subjected to a test of fit, e.g., chi-square. Hence,
an ‘incorrect’ finding would consist of a significant value of this
test statistic, as that would imply rejection of the unidimensional model.
14
Analyses of the simulated data
Each simulated dataset of 10,000 cases responding to k items is
subjected to:
– An exploratory factor analysis (EFA)
– A confirmatory factor analysis (CFA), in which the model is tested that all
k items are indicators of a single underlying dimension
From each set of analyses the following data are harvested:
– The magnitude of the 1st and 2nd eigenvalues (EFA), chi-square (CFA)
– The range of the difficulties of the items; each item is characterised by
the interpolated median of the 10,000 responses (sketched below), and
the range of the set is expressed as the inter-quartile range
– The population distribution from which respondents were drawn
– Whether the perturbation affecting responses was small or large
– The number of items in the sample of items
The total number of simulated datasets is 3,200
15
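For the item difficulty measure, an illustrative implementation of the interpolated median; the slides do not give the exact formula, so this is the standard grouped-data version that treats category c as the interval [c - 0.5, c + 0.5]:

```python
import numpy as np

def interpolated_median(responses, n_categories=5):
    """Interpolated median of ordinal responses coded 1..n_categories."""
    counts = np.bincount(responses, minlength=n_categories + 1)[1:]
    n = counts.sum()
    cum = np.cumsum(counts)
    m = int(np.searchsorted(cum, n / 2))   # index of the median category
    below = cum[m - 1] if m > 0 else 0     # number of cases below that category
    lower = (m + 1) - 0.5                  # lower boundary of the category interval
    return lower + (n / 2 - below) / counts[m]
```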
Degrees of risk
When applying the EFA ‘default’ option of extracting only
dimensions with eigenvalues > 1, we find the following risks of
incorrect indications of 2 underlying factors in the data, for
different numbers of items and different population
distributions:
Analyses
Regressing the number of eigenvalues > 1 (Kaiser’s criterion for
determining the dimensionality of the latent space, which is nearly
universally recommended and used as a default) shows it to be
strongly dependent on the number of items, the population
distribution, and the difficulty range of the item set.

Variable                           Coefficient   Std. Error   t-value   p-value
Intercept                            0.594173      0.0186      31.944    <2e-16
Bi-modal distribution dummy         -0.013233      0.014322    -0.924    0.3556
Skewed normal distribution dummy     0.35468       0.01497     23.693    <2e-16
Normal distribution dummy            0.169761      0.014322    11.853    <2e-16
Difficulty range                     0.164071      0.0124      13.232    <2e-16
Small perturbance dummy             -0.024643      0.010232    -2.408    0.0161
Number of items                      0.04227       0.001963    21.536    <2e-16

R2 adjusted: 0.3306
17
Analyses - 2
Regressing the magnitude of the 2nd eigenvalue yields the
following:

Variable                           Coefficient   Std. Error   t-value   p-value
Intercept                           -0.022871      0.009614    -2.379    0.0174
Bi-modal distribution dummy         -0.110159      0.007403   -14.881    <2e-16
Skewed normal distribution dummy     0.443042      0.007737    57.261    <2e-16
Normal distribution dummy            0.269762      0.007402    36.443    <2e-16
Difficulty range                     0.223663      0.006409    34.899    <2e-16
Small perturbance dummy             -0.109247      0.005288   -20.658    <2e-16
Number of items                      0.052782      0.001014    52.031    <2e-16

R2 adjusted: 0.7867
18
Density 2nd EV (3 items)
see also ‘notes’ page associated with this slide
19
Density 2nd EV (5 items)
see also ‘notes’ page associated with this slide
20
Density 2nd EV (8 items)
see also ‘notes’ page associated with this slide
21
Density 2nd EV (10 items)
see also ‘notes’ page associated with this slide
22
Why over-dimensionalising?
Likert items that express the same underlying dimension are
strongly correlated when similar in terms of difficulty (→ high
loadings on the same factor), but they may correlate weakly
when different in terms of difficulty (→ not loading high on the
same factor). This is particularly so for normal or skewed
normal, and much less so for bimodal and uniform
distributions.
Consider items 1 (easiest) and 15 (most difficult) from slide 13.
The following 2 slides present the cross-tabulation of these
items for each of the four population distributions that we
distinguish. Note the (Pearson) correlation values.
Conclusion: Pearson product-moment correlations are inadequate
to express the association between these items (a simulated
illustration follows below).
Effect of ‘difficulty’ - 1
Effect of ‘difficulty’ - 2
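The mechanism can be reproduced with the simulation set-up sketched earlier; under a normal population, two perfectly unidimensional items far apart in difficulty correlate much more weakly than two items of equal difficulty. The boundary shifts and seed below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(0.0, 1.0, size=10_000)  # normally distributed population

def item(theta, shift):
    """5-category Likert item; `shift` moves all boundaries (its difficulty)."""
    bounds = np.array([-1.5, -0.5, 0.5, 1.5]) + shift
    return np.searchsorted(bounds, theta + rng.normal(size=theta.shape)) + 1

easy, hard = item(theta, -2.0), item(theta, +2.0)  # very easy vs. very difficult
mid1, mid2 = item(theta, 0.0), item(theta, 0.0)    # two items of equal difficulty

print(np.corrcoef(easy, hard)[0, 1])  # clearly lower than ...
print(np.corrcoef(mid1, mid2)[0, 1])  # ... the equal-difficulty correlation
```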
CFA: Assessing model fit (chi-square)
The simulated data for sets of 5, 8 and 10 items were analysed
with CFA, testing a uni-dimensional model. Model fit is
expressed in terms of Chi-square values. The Chi-square values
of these models have, respectively, 5, 20 and 35 df (a one-factor
model for k items has k(k+1)/2 observed variances and covariances
and 2k free parameters, k loadings plus k residual variances,
leaving k(k-3)/2 degrees of freedom).
In 100% of instances this model was rejected at p < .001 (while we
know from the structure of the simulations that the true
latent structure is uni-dimensional)
A regression model with Chi-square as DV shows the same
variables as drivers as in the model for the number of
significant eigenvalues (slides 17/18).
26
Conclusions and implications
• Empirical research that is predominantly based on survey data
(and hence on categorical data) should refrain from factor
analysis (EFA as well as CFA) for charting the latent structure
of pools of items, or as an approach to multiple item
measurement
• Existing literature that uses EFA/CFA for these purposes will
contain a large number of unjustified conceptual distinctions
and multiple item measurements in the form of suboptimal
factor-scores
• The specific way in which unjustified conceptual distinctions
appear depends on the distributions of the sampled populations
on the underlying dimensions. Comparisons of different
populations in terms of latent variables will therefore yield
many apparent differences that are entirely artefactual.
27
Alternatives for FA?
IRT models (Item Response Theory)
• For cumulative items (appropriate for responses
expressing dominance relations between items and
subjects):
– Mokken (see, e.g., Van Schuur 2011)
– Rasch (see, e.g. Engelhard 2013; Bond and Fox 2007)
• For ‘point’ items (appropriate for responses
expressing proximity between items and subjects):
– IRT unfolding (e.g., MUDFOLD)
28
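For orientation (not on the original slide): the dichotomous Rasch model expresses the probability that respondent v at position θ_v endorses item i with difficulty b_i as

```latex
P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}
```

so that responses express dominance relations between subjects and items: endorsement becomes more likely as θ_v exceeds b_i.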
Q&A
??
29
References
• Bond, T.G. and Fox, C.M. 2007. Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed.). Routledge.
• Engelhard, G. 2013. Invariant Measurement: Using Rasch Models in the Social, Behavioral, and Health Sciences. Routledge.
• Mokken, R.J. 1997. “Nonparametric Models for Dichotomous Responses”, in W.J. van der Linden and R.K. Hambleton (eds.) Handbook of Modern Item Response Theory. Springer. pp. 351-368.
• van Schuur, W.H. 2011. Ordinal Item Response Theory: Mokken Scale Analysis. Sage.
• Van der Linden, W.J. and R.K. Hambleton (eds.) 1997. Handbook of Modern Item Response Theory. Springer.
• van Schuur, W.H. 1992. “Nonparametric unidimensional unfolding for multicategory data”. Political Analysis 4, 41-74.
• van Schuur, W.H. and H.A.L. Kiers. 1994. “Why factor analysis is often the wrong model for analyzing bipolar concepts, and what model to use instead”. Applied Psychological Measurement 18, 97-110.
30