Asking the right questions, getting the wrong answers
On the risks of using factor analysis on survey data

Cees van der Eijk and Jonathan Rose

Workshop on Issues in Latent Structure Analysis Using Survey Data
A Nottingham-Warwick-Birmingham Advanced Quantitative Methods (AQM) Workshop, organised by the Nottingham ESRC DTC in collaboration with the Methods and Data Institute
Nottingham, 8 March 2013

The Right Question (slide 2)
"Which items (if any) from a pool can be regarded as indicators of the same underlying phenomenon?"
This is the right question when analysing survey data, because:
– Questionnaire design often includes multiple operationalisations.
– Even when not explicitly intended in the design of a questionnaire, many seemingly different behavioural, opinion or attitudinal items relate to a smaller number of underlying orientations.
– If a set of items can validly be regarded as indicators of a latent variable, respondents' positions on that variable can be estimated better from a composite measure than from any single item (better: more reliably, and with more discrimination).
– The conceptual framework of a study simplifies, and becomes more general (and generalisable), if a set of items can validly be regarded as indicators of a latent variable.

Multiple indicators in surveys (slide 3)
The most popular forms of multiple indicators for probing attitudes, orientations, predispositions etc. in surveys are:
• Dichotomies: yes/no, agree/disagree, does/does not apply, etc.
• Likert items: stimuli for which the respondent has to indicate the extent of agreement or support, e.g. with response categories ranging from 'disagree strongly' to 'agree strongly'. The number of categories mostly ranges between 4 and 7, depending on the number of gradations and on whether a 'neither' category is included. Likert items are often used in the form of item 'batteries'.

A popular solution: Factor Analysis (slide 4)
Researchers ubiquitously use factor analytic methods to diagnose the number and character of the meaning dimensions underlying a pool of items. Reasons for this popularity:
– the apparent simplicity of the method
– its availability in virtually all commonly available statistics packages, including sophisticated-looking tests of applicability such as 'tests of sphericity'
– it is the recommended approach in many textbooks and on many methodological sites
– 'everyone else does it', and I don't have to spend time explaining other methods to referees or readers
– 'I can always interpret the results'

But, are the answers right? (slide 5)
Most texts mention the assumption of factor analysis that the items are interval-level. This raises the question whether dichotomous items and Likert items lend themselves to factor analysis: how robust are factor analytic results to violations of the assumption of interval data?

Can dichotomous items be FA'd? (slide 6)
The use of factor analysis on dichotomous items is often frowned upon. Yet numerous texts and sites state that dichotomous items can be factor analysed 'with caution'. The argument is generally that FA uses product-moment correlations as input, and that these can be validly calculated on dichotomies, since the product-moment correlation is numerically identical to phi, the designated measure of association for 2×2 tables (a quick numerical check of this identity follows below). Occasionally the argument is made that FA on dichotomous items is justified if it is based on tetrachoric correlations.
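The phi identity invoked on slide 6 is easy to verify numerically. A minimal sketch, not from the slides: the two dichotomous items and the 30% disagreement rate below are made up purely for illustration.

```python
import numpy as np

# Two made-up dichotomous items that partly agree (the 30% flip rate is
# arbitrary, chosen only to produce a non-trivial association).
rng = np.random.default_rng(42)
n = 1000
x = rng.integers(0, 2, n)
y = np.where(rng.random(n) < 0.3, 1 - x, x)

# Pearson product-moment correlation
r = np.corrcoef(x, y)[0, 1]

# phi computed directly from the 2x2 table
a = np.sum((x == 1) & (y == 1))   # both 'yes'
b = np.sum((x == 1) & (y == 0))
c = np.sum((x == 0) & (y == 1))
d = np.sum((x == 0) & (y == 0))   # both 'no'
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

assert np.isclose(r, phi)         # numerically identical, as the slide argues
print(r, phi)
```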
Can Likert items be FA'd? (slide 7)
Ordinal items of the Likert type are often seen as amenable to factor analysis, particularly when it is thought that 'assignment of ordinal categories to the data does not seriously distort the underlying metric scaling' (Mueller and Kim). Likert-type items are more often than not regarded as conforming to this requirement. They are then referred to as quasi-interval, and seen as appropriate for factor analysis.

But are the answers right? – 2 (slide 8)
Yet some critics warn against factor analysing dichotomous or ordinal (Likert-type) items, mainly on the following ground:
• Factor analysis of dichotomous or Likert items would lead to 'over-dimensionalisation', i.e. a tendency of factor analysis to suggest a larger number of factors than is warranted. This has obvious (negative) consequences for conceptualisation and theory development. Small-scale simulations and didactic examples demonstrate that this can indeed occur (Van Schuur; Van der Eijk). It is, however, unclear how large this risk is and under what conditions it is most likely to occur.

Risk assessment (slide 9)
Researchers are caught between conflicting recommendations, and have little, if any, support in assessing how frequent and how serious the alleged risks are. The aim of this study is to assess the risk that factor analysis of ordinal (Likert-type) survey items leads to an incorrect diagnosis of the latent structure underlying those items. The verdict 'incorrect' requires knowledge of the true underlying structure. In empirical studies such knowledge is generally absent (barring very strong and contestable assumptions). Data simulation, however, provides unequivocal knowledge of the true latent structure, and thus makes it possible to evaluate the risk of incorrect results from factor analysis.

Data simulation strategy (slide 10)
• Define a large pool of Likert-type survey items, all of which express a single underlying continuum plus random measurement error. As a consequence, the true underlying dimensionality is 1.
• The items may vary in the number of response categories (although we use only 5-category items here) and in difficulty (i.e., the location on the underlying continuum of the boundaries between adjacent response categories).
• Specify a distribution of a population of respondents on the same underlying continuum.
• Simulate individual-level responses to the items on the basis of:
– the positions of the category boundaries of the item
– the position of the simulated respondent
– a perturbation for each response, based on a random draw from a standard normal distribution

Data simulation strategy – 2 (slide 11)
• Sample sets of items at random from the pool of available items.
• Simulate responses from a sample from the population.
• Factor analyse the generated data.
• Assess the proportion of incorrect inferences about the latent structure when following standard recommendations for conducting and interpreting factor analyses. Keep in mind that the true latent structure is known: the entire pool of items, and hence every sample from it, is stochastically uni-dimensional.
A sketch of this response-generation step follows below.
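The following is a minimal reconstruction of the generation step described on slides 10-11, not the authors' code; the 0..4 response coding and the specific cut-point values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_likert(theta, boundaries, noise_sd=1.0):
    """Simulate one 5-category Likert item.

    theta      : latent positions of the simulated respondents
    boundaries : 4 increasing cut-points on the latent continuum
                 (their location is the item's 'difficulty')
    noise_sd   : sd of the random perturbation added to each response
    Returns responses coded 0..4 (the number of cut-points passed).
    """
    perturbed = theta + rng.normal(0.0, noise_sd, size=theta.shape)
    return np.searchsorted(boundaries, perturbed)

# Hypothetical example: a normal population and one fairly 'easy' item
theta = rng.normal(0.0, 1.0, size=10_000)       # respondent positions
easy_item = np.array([-2.0, -1.2, -0.4, 0.4])   # cut-points shifted left
responses = simulate_likert(theta, easy_item)
print(np.bincount(responses, minlength=5))      # category frequencies
```

Varying `noise_sd` corresponds to the deck's large versus small perturbation conditions; shifting the cut-points along the continuum varies an item's difficulty.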
Simulated data (slide 12)
We constructed:
– A pool of 27 items, ranging from very difficult to very easy on the underlying continuum. From this pool, sets of 3, 5, 8 and 10 items respectively are drawn at random.
– 4 different distributions of the population on the latent dimension: uniform, bimodal, normal, and skewed normal. From each of these distributions, 800 different samples of 10,000 respondents are drawn. All respondents in a sample respond to all items from one particular sample of items.
– In half the samples we apply a large random perturbation to the responses; in the other half we specify a smaller perturbation.

The pool of available items (k=27) (slide 13)
[figure: the 27 items, ordered by difficulty on the underlying continuum]

What is an 'incorrect' finding? (slide 14)
In exploratory factor analysis (EFA), the findings hinge on the criteria used to choose the number of factors. The criterion most commonly used in the literature (in spite of warnings not to apply it mechanically) is Kaiser's criterion (eigenvalues > 1). We therefore look at the magnitude of the eigenvalues, and in particular at whether or not the second eigenvalue exceeds 1. In confirmatory factor analysis (CFA), a uni-dimensional model is specified and subjected to a test of fit, e.g. chi-square. Here an 'incorrect' finding consists of a significant value of this test statistic, as that would imply rejection of the uni-dimensional model. A sketch of how Kaiser's rule is applied to such simulated data follows below.
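To make the EFA criterion concrete, here is a minimal sketch (our illustration, not the authors' code) of applying Kaiser's rule to one uni-dimensional item set; the 8-item set and its difficulty spread are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(7)
theta = rng.normal(0.0, 1.0, size=10_000)       # one simulated sample

def simulate_likert(boundaries, noise_sd=1.0):
    # response = number of category boundaries passed (coded 0..4)
    return np.searchsorted(boundaries, theta + rng.normal(0.0, noise_sd, theta.size))

# Hypothetical 8-item set: identical spacing, difficulties spread along the trait
base = np.array([-1.2, -0.4, 0.4, 1.2])
data = np.column_stack(
    [simulate_likert(base + shift) for shift in np.linspace(-1.5, 1.5, 8)]
)

# Kaiser's rule inspects the eigenvalues of the inter-item correlation matrix
ev = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]  # descending
print("1st and 2nd eigenvalues:", ev[:2])
print("factors retained (eigenvalue > 1):", int(np.sum(ev > 1.0)))
# Any count above 1 is an 'incorrect' finding here: by construction these
# data are stochastically uni-dimensional.
```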
Analyses of the simulated data (slide 15)
Each simulated dataset of 10,000 cases responding to k items is subjected to:
– an exploratory factor analysis (EFA)
– a confirmatory factor analysis (CFA), testing the model that all k items are indicators of a single underlying dimension
From each set of analyses the following data are harvested:
– the magnitudes of the 1st and 2nd eigenvalues (EFA), and chi-square (CFA)
– the range of the difficulties of the items: each item is characterised by the interpolated median of the 10,000 responses, and the range of the set is expressed as the inter-quartile range
– the population distribution from which the respondents were drawn
– whether the perturbation affecting the responses was small or large
– the number of items in the sample of items
The total number of simulated datasets is 3200.

Degrees of risk (slide 16)
When applying the EFA 'default' option of extracting only dimensions with eigenvalues > 1, we find the following risks of incorrect indications of 2 underlying factors in the data, for different numbers of items and different population distributions:
[table of these risks, by number of items and population distribution]

Analyses (slide 17)
Regressing the number of eigenvalues > 1 (Kaiser's criterion for determining the dimensionality of the latent space, which is nearly universally recommended and used as a default) shows it to be strongly dependent on the number of items, the population distribution, and the difficulty range of the item set:

Variable                            Coefficient   Std. Error   t-value   p-value
Intercept                            0.594173     0.0186        31.944   <2e-16
Bi-modal distribution dummy         -0.013233     0.014322      -0.924   0.3556
Skewed normal distribution dummy     0.35468      0.01497       23.693   <2e-16
Normal distribution dummy            0.169761     0.014322      11.853   <2e-16
Difficulty range                     0.164071     0.0124        13.232   <2e-16
Small perturbation dummy            -0.024643     0.010232      -2.408   0.0161
Number of items                      0.04227      0.001963      21.536   <2e-16
Adjusted R²: 0.3306

Analyses – 2 (slide 18)
Regressing the magnitude of the 2nd eigenvalue yields the following:

Variable                            Coefficient   Std. Error   t-value   p-value
Intercept                           -0.022871     0.009614      -2.379   0.0174
Bi-modal distribution dummy         -0.110159     0.007403     -14.881   <2e-16
Skewed normal distribution dummy     0.443042     0.007737      57.261   <2e-16
Normal distribution dummy            0.269762     0.007402      36.443   <2e-16
Difficulty range                     0.223663     0.006409      34.899   <2e-16
Small perturbation dummy            -0.109247     0.005288     -20.658   <2e-16
Number of items                      0.052782     0.001014      52.031   <2e-16
Adjusted R²: 0.7867

Density of the 2nd eigenvalue, 3-item sets (slide 19)
[figure; see the notes page associated with this slide]

Density of the 2nd eigenvalue, 5-item sets (slide 20)
[figure; see the notes page associated with this slide]

Density of the 2nd eigenvalue, 8-item sets (slide 21)
[figure; see the notes page associated with this slide]

Density of the 2nd eigenvalue, 10-item sets (slide 22)
[figure; see the notes page associated with this slide]

Why over-dimensionalising? (slide 23)
Likert items that express the same underlying dimension correlate strongly when they are similar in difficulty (→ high loadings on the same factor), but may correlate weakly when they differ in difficulty (→ not loading highly on the same factor). This is particularly so for normal or skewed normal population distributions, and much less so for bimodal and uniform ones. Consider items 1 (easiest) and 15 (most difficult) from slide 13. The following two slides present the cross-tabulation of these items for each of the four population distributions that we distinguish. Note the (Pearson) correlation values. Conclusion: Pearson product-moment correlations are inadequate to express the association between such items. A small numerical illustration of this attenuation follows after the two figure slides.

Effect of 'difficulty' – 1 (slide 24)
[figure: cross-tabulations of items 1 and 15]

Effect of 'difficulty' – 2 (slide 25)
[figure: cross-tabulations of items 1 and 15, continued]
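A minimal numerical illustration of the mechanism behind slide 23 (our sketch, not the authors' analysis; the cut-points are invented, and the items are generated as in the earlier simulation sketch):

```python
import numpy as np

rng = np.random.default_rng(11)
theta = rng.normal(0.0, 1.0, size=10_000)       # normal population

def item(boundaries):
    # every item is driven by the SAME latent trait; only cut-points differ
    return np.searchsorted(boundaries, theta + rng.normal(0.0, 1.0, theta.size))

easy      = item(np.array([-2.6, -1.8, -1.0, -0.2]))  # most respondents score high
easy_twin = item(np.array([-2.6, -1.8, -1.0, -0.2]))  # same difficulty as 'easy'
difficult = item(np.array([ 0.2,  1.0,  1.8,  2.6]))  # most respondents score low

print("similar difficulty:  ", np.corrcoef(easy, easy_twin)[0, 1])
print("different difficulty:", np.corrcoef(easy, difficult)[0, 1])
# The second Pearson correlation is attenuated even though all three items
# express the same dimension -- the driver of over-dimensionalisation.
```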
CFA: Assessing model fit (chi-square) (slide 26)
The simulated data for sets of 5, 8 and 10 items were analysed with CFA, testing a uni-dimensional model. Model fit is expressed in terms of chi-square values; the chi-square statistics of these models have 5, 20 and 35 df, respectively. In 100% of instances this model was rejected at p < .001, while we know from the structure of the simulations that the true latent structure is uni-dimensional. A regression model with chi-square as the dependent variable shows the same variables as drivers as in the models for the eigenvalues (slides 17/18).

Conclusions and implications (slide 27)
• Empirical research that is predominantly based on survey data (and hence on categorical data) should refrain from using factor analysis (EFA as well as CFA) to chart the latent structure of pools of items, or as an approach to multiple-item measurement.
• The existing literature that uses EFA/CFA for these purposes will contain a large number of unjustified conceptual distinctions, and multiple-item measurements in the form of suboptimal factor scores.
• The specific way in which unjustified conceptual distinctions appear depends on the distributions of the sampled populations on the underlying dimensions. Comparisons of different populations in terms of latent variables will therefore show many apparent differences that are entirely artefactual.

Alternatives for FA? (slide 28)
IRT (Item Response Theory) models:
• For cumulative items (appropriate for responses expressing dominance relations between items and subjects):
– Mokken (see, e.g., van Schuur 2011)
– Rasch (see, e.g., Engelhard 2013; Bond and Fox 2007)
• For 'point' items (appropriate for responses expressing proximity between items and subjects):
– IRT unfolding (e.g., MUDFOLD)

Q&A (slide 29)

References (slide 30)
• Bond, T.G. and Fox, C.M. 2007. Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed.). Routledge.
• Engelhard, G. 2013. Invariant Measurement: Using Rasch Models in the Social, Behavioral, and Health Sciences. Routledge.
• Mokken, R.J. 1997. "Nonparametric Models for Dichotomous Responses", in W.J. van der Linden and R.K. Hambleton (eds.), Handbook of Modern Item Response Theory. Springer, pp. 351-368.
• van der Linden, W.J. and R.K. Hambleton (eds.). 1997. Handbook of Modern Item Response Theory. Springer.
• van Schuur, W.H. 1992. "Nonparametric unidimensional unfolding for multicategory data". Political Analysis 4, 41-74.
• van Schuur, W.H. 2011. Ordinal Item Response Theory: Mokken Scale Analysis. Sage.
• van Schuur, W.H. and H.A.L. Kiers. 1994. "Why factor analysis is often the wrong model for analyzing bipolar concepts, and what model to use instead". Applied Psychological Measurement 18, 97-110.