AN INTRODUCTION TO FACTOR ANALYSIS
Philip Hyland
University of Ulster
philipehyland@gmail.com
www.philiphyland.webs.com

Workshop Outline
1. General Introduction to Factor Analysis
2. Exploratory Factor Analysis
3. Confirmatory Factor Analysis
4. Bi-Factor Modelling

FACTOR ANALYSIS
• Learning statistics can be stressful.
• Latent Variable Modelling, Structural Equation Modelling, Factor Analysis, Path Analysis... Ahhhhhhh!
• No equations – No maths

FACTOR ANALYSIS
• You probably already know everything you need in order to understand factor analysis.
• It’s all about correlation.
• Hands-on learning is what it takes!

LET’S TALK ABOUT STRESS
• Stress is well known to be associated with poor health.
• Cancer, heart disease, poor immune functioning.
• Little was known about the mechanisms by which stress affects the body physically.
• Stress affects the body at a cellular level – telomere length & telomerase activity.
Epel, E. S. et al. (2004). Accelerated telomere shortening in response to life stress. PNAS, 101, 17312-17315.

STRESS & TELOMERE DEGRADATION
[Figure: Perceived Psychological Stress and Telomere Length, r = -.31, p < .01; Perceived Psychological Stress and Telomerase Activity, r = -.24, p < .01]

LET’S TALK ABOUT STRESS
• Incredible findings! Our psychological state appears to affect our physiology at the most basic level.
• A limitation with this study!
• You will understand what it is by the end of this lecture.
• Clue – It has to do with how we measure psychological variables!

THE HARDEST SCIENCE
• “Psychology is the really hard science” ~ Michael Shermer
• Why is this so?
• Psychologists are interested in things that are not directly observable.
• How can we possibly study that which we cannot directly see?
• The answer lies in factor analysis!
http://www.michaelshermer.com/2007/10/really-hard-science/

FACTOR ANALYSIS
WHAT IS IT ALL ABOUT?

KEEPING THINGS SIMPLE
• Factor analysis is all about simplification.
• A method that allows us to understand large quantities of observable variables in terms of a smaller number of unobservable variables.
• Like it or not, we are all factor analysts!

WHAT’S PETER GRIFFIN LIKE?

KEEPING THINGS SIMPLE
• Well, he sticks his tongue in fans.
• He mixes his cereal with Red Bull.
• He loiters in the wrong places.
• He wires his nipples to jumper leads.
These are all directly observable phenomena. It might be easier just to say he’s stupid! Stupidity is a latent variable.

KEEPING THINGS SIMPLE
• We’ve just conducted a factor analysis!
• We explained a range of observable characteristics in terms of something simpler which isn’t directly observable.
• The thing to notice is that all of these individual observable characteristics seem to be highly related to each other.

KEEPING THINGS SIMPLE
• A more psychological example!
• How has John been feeling recently?
• He feels sad all the time.
• He talks of committing suicide.
• He has lost interest in activities he used to enjoy.
• He has no motivation.
• In other words, he is depressed.

STUDYING THE INVISIBLE
• As psychologists we are interested in studying the human mind.
• We are usually interested in studying elusive phenomena – anxiety, psychological stress, social identity, irrationality, personality etc.
• All unobservable constructs – factor analysis is the psychologist’s most important tool.

COVARIATION
• Whether we are talking as regular people or as stuffy statistically-minded psychologists, the method of factor analysis is identical – it’s all about covariation!
• What’s covariation? It’s about the level of association between a set of variables!
• A correlation coefficient is a standardised covariance.

COVARIATION
• The relationships that we are interested in when it comes to Factor Analysis are the relationships between the latent variable (e.g. Psychopathy) and the observed variables employed to measure the latent construct.
• Factor Loadings (and measurement error)
• We estimate these relationships (latent to observed) by looking at the correlations among observed variables.

FACTOR LOADINGS & ERROR
• The relationship between the observed indicators of Psychopathy and the latent construct is expressed in terms of a regression coefficient, known as a “factor loading”.
• Why a regression coefficient and not a correlation coefficient?
• The FA model assumes that the latent variable influences or determines the nature of the observed indicators.
• As Psychopathy intensifies, the level of endorsement of each observable indicator should increase.
[Figure: a latent variable (LV) linked to an observed variable (OV) with a factor loading of .8]

COVARIATION
• An observed variable can take many forms: an indicator on a self-report measure, a score on a test, a physiological measure, reaction time etc.
• Psychologists tend not to distinguish between an observed variable and the latent variable.
• For example, the total score on the Psychopathy Checklist is treated as equal to the true score.
• Why is this a concern? It has to do with measurement error.

MEASUREMENT ERROR
• The observed level of Psychopathy is extremely unlikely to be a perfect representation of the true level of Psychopathy.
• Self-report measures are fallible, imperfect methods of capturing the psychological construct under investigation.
• Observed scores will be related to the true level of that variable, but the relationship will hardly be perfect.
• Not a problem in the physical/“hard” sciences when all you deal with is observed variables.

MEASUREMENT ERROR
• Measurement error comprises two forms: random error and systematic error.
• Random error is that which occurs due to chance or innocuous factors – lack of concentration, or forgetting that 1 is strong endorsement and circling a 5 instead.
• Systematic error is the result of the particular indicator inadvertently tapping into some other variable – for example, borderline personality disorder.

MEASUREMENT ERROR
• Imperfect measurement means that our observable indicators are not only measuring the construct we are interested in, but they are also measuring things we are not interested in.
• Measurement error has the consequence of pulling the observed correlation below the true correlation that exists between two variables.
• Random measurement error does not artificially increase the correlation between two variables; it only weakens it.

FACTOR ANALYSIS
• Factor Analysis involves estimating the relationship between the observed indicators and the latent variable by determining the covariation among observable indicators.
• The variation among observable indicators can be due to two factors:
1. The influence of a latent variable - Psychopathy
2. Other unwanted factors – Measurement error
• These unwanted factors are independent of (or unrelated to) the latent variable.

LATENT VARIABLES
HOW DO WE MEASURE THEM?

LATENT VARIABLES
• When psychologists seek to measure an unobserved variable, we generally try to capture that variable using multiple indicators.
• It is unlikely on theoretical grounds that a single question can capture the complexity of a psychological construct (psychopathy, social anxiety etc.).
• Methodologically it is also preferable to use multiple indicators because it allows for greater reliability.
• With numerous indicators we can obtain greater confidence that the intended latent construct is being reliably measured.
[Figure: a latent variable (LV) with four observed indicators x1, x2, x3, x4]

LATENT VARIABLES
• So where are we?
We’ve determined the unobserved psychological construct that we are interested in measuring and we have carefully selected a number of directly observable variables that we believe will effectively capture that latent construct. Now for the FA!
• FA is simply about estimating the strength of the relationships from the latent variable to each of the indicators (factor loadings) and estimating the amount of variation in the observable indicator not explained by the latent variable (measurement error).
• Remember I said if you understand correlation, you understand factor analysis. Here’s why.

FACTOR LOADINGS
• Factor loadings can range from -1 to +1, just like in a correlational analysis.
• The closer to ±1 the better: higher factor loadings demonstrate a higher degree of association between the latent variable and that indicator.
• More of the variance in responses to that indicator is attributable to the latent factor than to measurement error.
• Simply put, high factor loadings signify that the indicator is effectively capturing the construct we are most interested in.

FACTOR LOADINGS
• If we had a factor loading of .80 it means 64% of the variance in responses to that indicator is attributable to the latent variable (.80 squared is .64).
• The remaining 36% of the variance is due to either systematic or random sources of variation.
• Factor loadings above .6 are desirable (Hair, Anderson, Tatham, & Black, 1998).
• Factor loadings > .4 are acceptable.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate Data Analysis with Readings (5th ed.). Englewood Cliffs, NJ: Prentice Hall.

FACTOR LOADINGS
• Based on this notion of variance explained, Comrey and Lee (1992) have proposed the following conventions.
• 0.32 = 10% Variance Explained: Poor
• 0.45 = 20% Variance Explained: Fair
• 0.55 = 30% Variance Explained: Good
• 0.63 = 40% Variance Explained: Very Good
• 0.71+ = 50%+ Variance Explained: Excellent
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.).
Routledge: London.

CONCLUSION
• Factor analysis, being about simplification, is an invaluable tool to the scientifically minded psychologist.
• The goal of science is to develop testable theories to explain natural phenomena.
• Our models or theories that explain complex observable phenomena need to be parsimonious.
• We want to explain as much about that complex variable as we can with the simplest model possible.

CONCLUSION
• Factor analysis allows for the development of parsimonious theoretical models.
• It does this by simplifying large amounts of data into fewer and more meaningful variables.
• Factor analysis also facilitates more accurate assessments of relationships between variables.
• It does this by creating latent variables which take into account measurement error.

CONCLUSION
• Back to our study at the start of this lecture.
• How might this study have been improved?
[Figure: Perceived Psychological Stress and Telomere Length, r = -.31, p < .01]
• Perceived psychological stress measured as an observed variable?
• What if we created a latent variable?
• What might the effect be on the relationship between these two variables?

Thank you for your time! Questions?
Philip Hyland
philipehyland@gmail.com
www.philiphyland.webs.com
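
SUPPLEMENTARY SKETCH: LOADINGS, VARIANCE EXPLAINED & ATTENUATION
For anyone who wants to try the closing question hands on, here is a minimal simulation sketch. It is not taken from the workshop materials or from the Epel et al. study. It assumes a single standardised latent variable, one indicator with a factor loading of .80 (the value used on the factor loadings slide), purely random measurement error, and an invented "true" correlation of -.50 between the latent variable and an outcome. The function names pearson and simulate and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, loading=0.8, true_r=-0.5, seed=1):
    """Simulate a latent variable, one noisy indicator of it, and an outcome.

    loading: assumed factor loading of the indicator on the latent variable.
    true_r:  assumed (invented) correlation between latent variable and outcome.
    """
    rng = random.Random(seed)

    # The latent variable: standardised, never observed directly in practice.
    latent = [rng.gauss(0, 1) for _ in range(n)]

    # Observed indicator = loading * latent + random error.
    # The error SD is chosen so the indicator has a variance of about 1.
    err_sd = math.sqrt(1 - loading ** 2)
    indicator = [loading * f + err_sd * rng.gauss(0, 1) for f in latent]

    # Outcome that correlates with the latent variable at roughly true_r.
    out_sd = math.sqrt(1 - true_r ** 2)
    outcome = [true_r * f + out_sd * rng.gauss(0, 1) for f in latent]

    return {
        "loading_estimate": pearson(latent, indicator),
        "latent_outcome_r": pearson(latent, outcome),
        "observed_outcome_r": pearson(indicator, outcome),
    }


result = simulate()
# Squaring the loading gives the share of indicator variance due to the latent variable.
variance_explained = result["loading_estimate"] ** 2

print("Estimated loading:", round(result["loading_estimate"], 2))
print("Variance explained:", round(variance_explained, 2))
print("Latent variable with outcome:", round(result["latent_outcome_r"], 2))
print("Observed indicator with outcome:", round(result["observed_outcome_r"], 2))
```

Under these assumptions the correlation computed from the observed indicator should come out weaker than the one computed from the latent variable, by a factor of roughly the loading. That is the sense in which modelling stress as a latent variable, rather than as a single observed score, might reveal a stronger relationship than the one reported. Real data will also contain systematic error, which this sketch deliberately leaves out.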