1. Intro To Factor Analysis

AN INTRODUCTION TO
FACTOR ANALYSIS
Philip Hyland
University of Ulster
philipehyland@gmail.com
www.philiphyland.webs.com
Workshop Outline
1. General Introduction to Factor Analysis
2. Exploratory Factor Analysis
3. Confirmatory Factor Analysis
4. Bi-Factor Modelling
FACTOR ANALYSIS
• Learning statistics can be
stressful.
• Latent Variable Modelling,
Structural Equation
Modelling, Factor Analysis,
Path Analysis.......Ahhhhhhh!
• No equations – No maths
FACTOR ANALYSIS
• You probably already know
everything you need in order to
understand factor analysis.
• It’s all about correlation.
• Hands-on learning is what it
takes!
LET’S TALK ABOUT STRESS
• Stress is well known to be associated
with poor health.
• Cancer, heart disease, poor immune
functioning.
• Little was known about the
mechanisms by which stress affects the
body physically.
• Stress affects the body at a cellular
level – telomere length & telomerase
activity.
Epel, E. S. et al. (2004). Accelerated telomere shortening in response to
life stress. PNAS, 101, 17312-17315.
STRESS & TELOMERE DEGRADATION
[Diagram: correlations reported by Epel et al. (2004)]
• Perceived Psychological Stress and Telomere Length: r = -.31, p < .01
• Perceived Psychological Stress and Telomerase Activity: r = -.24, p < .01
LET’S TALK ABOUT STRESS
• Incredible findings! - Our psychological
state appears to affect our physiology at
the most basic level.
• A limitation with this study!
• You will understand what it is by the end
of this lecture
• Clue – It has to do with how we
measure psychological variables!
THE HARDEST SCIENCE
• “Psychology is the really hard science”
~ Michael Shermer
• Why is this so?
• Psychologists are interested in things
that are not directly observable.
• How can we possibly study that which
we cannot directly see?
• The answer lies in factor analysis!
http://www.michaelshermer.com/2007/10/really-hard-science/
FACTOR ANALYSIS
WHAT IS IT ALL ABOUT?
KEEPING THINGS SIMPLE
• Factor analysis is all about
simplification.
• A method that allows us to understand
large quantities of observable variables
in terms of a smaller number of
unobservable variables.
• Like it or not, we are all factor analysts!
WHAT’S PETER GRIFFIN LIKE?
KEEPING THINGS SIMPLE
• Well, he sticks his tongue in fans.
• He mixes his cereal with Red Bull.
• He loiters in the wrong places.
• He wires his nipples to jumper leads.
These are all directly observable phenomena.
It might be easier just to say he’s stupid!
Stupidity is a latent variable.
KEEPING THINGS SIMPLE
• We’ve just conducted a factor analysis!
• We explained a range of observable characteristics
in terms of something simpler which isn’t directly
observable.
• The thing to notice is that all of these individual
observable characteristics seem to be highly related
to each other.
KEEPING THINGS SIMPLE
• A more psychological example!
• How has John been feeling recently?
• He feels sad all the time
• He talks of committing suicide
• Lost interest in activities he used to enjoy
• No motivation
• In other words, he is depressed.
STUDYING THE INVISIBLE
• As psychologists we are interested in studying the
human mind.
• We are usually interested in studying elusive
phenomena – anxiety, psychological stress, social
identity, irrationality, personality etc.
• All unobservable constructs – factor analysis is the
psychologist’s most important tool.
COVARIATION
• Whether we are talking as regular people or as stuffy
statistically-minded psychologists the method of factor
analysis is identical – it’s all about covariation!
• What’s covariation? It’s about the level of association
between a set of variables!
• A correlation coefficient is a standardised covariance.
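To make "a correlation coefficient is a standardised covariance" concrete, here is a minimal Python sketch. The item scores are invented for illustration and the helper names (`covariance`, `correlation`) are our own, not part of any package. It shows that rescaling an item changes the covariance but leaves the correlation untouched.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Hypothetical item scores for five people (invented for illustration).
sadness = [1, 2, 3, 4, 5]
low_motivation = [2, 1, 4, 3, 5]

# The same sadness item rescored on a 10-50 scale instead of 1-5.
sadness_rescaled = [10 * score for score in sadness]

cov_original = covariance(sadness, low_motivation)
cov_rescaled = covariance(sadness_rescaled, low_motivation)
r_original = correlation(sadness, low_motivation)
r_rescaled = correlation(sadness_rescaled, low_motivation)

print(cov_original, cov_rescaled)                  # covariance depends on the scale
print(round(r_original, 2), round(r_rescaled, 2))  # correlation does not
```

Because the correlation ignores the scale of measurement, it is the more convenient way to compare the strength of association across different items.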
COVARIATION
• The relationships that we are interested in when it
comes to Factor Analysis are the relationships
between the latent variable (e.g. Psychopathy) and
the observed variables employed to measure the
latent construct.
• Factor Loadings (and measurement error)
• We estimate these relationships (latent to observed)
by looking at the correlations among observed
variables.
FACTOR LOADINGS & ERROR
• The relationship between the observed
indicators of Psychopathy and the latent
construct is expressed in terms of a regression
coefficient – known as a “factor loading”.
• Why a regression coefficient and not a
correlation coefficient?
• The FA model assumes that the latent variable
influences or determines the nature of the
observed indicators.
• As Psychopathy intensifies, the level of
endorsement for each observable
indicator should increase.
[Diagram: latent variable (LV) pointing to an observed variable (OV), factor loading = .8]
COVARIATION
• An observed variable can take many forms:
an indicator on a self-report measure, a
score on a test, a physiological measure,
reaction time etc.
• Psychologists tend not to distinguish
between an observed variable and the
latent variable.
• Total score on the Psychopathy Checklist is
considered equal to the true score.
• Why is this a concern? It has to do with
measurement error.
MEASUREMENT ERROR
• The observed level of Psychopathy is
extremely unlikely to be a perfect
representation of the true level of Psychopathy.
• Self-report measures are fallible, imperfect
methods of capturing the psychological
construct under investigation.
• Observed scores will be related to the true level
of that variable, but the relationship will rarely be perfect.
• Not a problem in the physical/“hard” sciences
when all you deal with is observed variables.
MEASUREMENT ERROR
• Measurement error takes two
forms: random error and systematic error.
• Random error is that which occurs due to
chance or innocuous factors – lack of
concentration, or forgetting that 1 is strong
endorsement and circling a 5 instead.
• Systematic error is the result of a
particular indicator inadvertently tapping
into some other variable (e.g. borderline
personality disorder).
MEASUREMENT ERROR
• Imperfect measurement means that our observable
indicators are not only measuring the construct we are
interested in, but they are also measuring things we are
not interested in.
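• Random measurement error has the consequence of reducing
(attenuating) the observed correlation between two variables
relative to the true correlation.
• Random measurement error does not systematically inflate the
correlation between two variables; it only weakens it. (Systematic
error shared by both measures is the exception, because it can
inflate a correlation.)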
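The weakening effect of random error can be seen in a small simulation. This is only a sketch under stated assumptions: the true correlation of .50, the sample size, and the amount of noise (chosen so that each observed score has a reliability of .50) are all invented for illustration. Under classical test theory the observed correlation should then be about .50 × √(.50 × .50) = .25.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size
true_r = 0.5             # hypothetical true correlation

# True scores: two standard-normal variables correlated at true_r.
true_x = [rng.gauss(0, 1) for _ in range(n)]
true_y = [true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
          for x in true_x]

# Observed scores: true score plus random error with the same variance,
# which gives each observed variable a reliability of 0.5.
obs_x = [x + rng.gauss(0, 1) for x in true_x]
obs_y = [y + rng.gauss(0, 1) for y in true_y]

r_true_scores = correlation(true_x, true_y)
r_observed = correlation(obs_x, obs_y)

print(round(r_true_scores, 2))  # close to the true value of 0.5
print(round(r_observed, 2))     # noticeably weaker, close to 0.25
```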
FACTOR ANALYSIS
• Factor Analysis involves estimating the relationship
between the observed indicators and the latent variable
by determining the covariation among observable
indicators.
• The variation among observable indicators can be due
to two sources:
1. The influence of a latent variable - Psychopathy
2. Other unwanted factors – Measurement error
• These unwanted factors are independent of (or
unrelated to) the latent variable.
LATENT VARIABLES
HOW DO WE MEASURE THEM?
LATENT VARIABLES
• When psychologists seek to measure an
unobserved variable, we generally try to
capture that variable using multiple
indicators.
• Unlikely on theoretical grounds that a single
question can capture the complexity of a
psychological construct (psychopathy, social
anxiety etc.).
• Methodologically it is also preferable to use
multiple indicators because it allows for
greater reliability.
• With numerous indicators we can obtain
greater confidence that the intended latent
construct is being reliably measured.
[Diagram: latent variable (LV) with four observed indicators: x1, x2, x3, x4]
LATENT VARIABLES
• So where are we? We’ve determined the unobserved
psychological construct that we are interested in measuring
and we have carefully selected a number of directly
observable variables that we believe will effectively capture
that latent construct. Now for the FA!
• FA is simply about estimating the strength of the relationships
from the latent variable to each of the indicators (factor
loadings) and estimating the amount of variation in the
observable indicator not explained by the latent variable
(measurement error).
• Remember I said if you understand correlation, you
understand factor analysis. Here’s why.
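One way to see why correlation is enough is to simulate a single latent variable with three indicators and then work backwards from the correlations to the loadings. This is a sketch, not how FA software actually estimates a model: the loadings (.8, .7, .6), the sample size and the helper names are all hypothetical, and the "triad" formula used at the end only applies to one factor with three indicators and uncorrelated errors.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)      # fixed seed so the run is repeatable
n = 20000                   # hypothetical sample size
loadings = [0.8, 0.7, 0.6]  # hypothetical "true" factor loadings

# The latent variable: standardised, never observed directly.
latent = [rng.gauss(0, 1) for _ in range(n)]

def make_indicator(loading):
    """Indicator = loading * latent score + independent measurement error."""
    error_sd = math.sqrt(1 - loading ** 2)  # keeps indicator variance near 1
    return [loading * lv + error_sd * rng.gauss(0, 1) for lv in latent]

x1, x2, x3 = (make_indicator(loading) for loading in loadings)

# All we "see" as researchers are the correlations among the indicators.
r12 = correlation(x1, x2)  # expected near 0.8 * 0.7 = 0.56
r13 = correlation(x1, x3)  # expected near 0.8 * 0.6 = 0.48
r23 = correlation(x2, x3)  # expected near 0.7 * 0.6 = 0.42

# Work backwards from the correlations to the loadings.
est_loading_1 = math.sqrt(r12 * r13 / r23)
est_loading_2 = math.sqrt(r12 * r23 / r13)
est_loading_3 = math.sqrt(r13 * r23 / r12)

print(round(est_loading_1, 2), round(est_loading_2, 2), round(est_loading_3, 2))
```

The correlation between any two indicators is approximately the product of their loadings, which is why the loadings can be recovered without ever observing the latent variable.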
FACTOR LOADINGS
• Standardised factor loadings can range from -1 to +1,
just like a correlation coefficient.
• The closer to 1 in absolute value the better - higher factor
loadings demonstrate a higher degree of association between
the latent variable and that indicator.
• More of the variance in responses to that indicator is
attributable to the latent factor than to measurement
error.
• Simply put, high factor loadings signify that the
indicator is effectively capturing the construct we are
most interested in.
FACTOR LOADINGS
• If we had a factor loading of .80, it means 64% (.80 squared = .64)
of the variance in responses to that indicator is attributable
to the latent variable.
• The remaining 36% of the variance is due to either systematic or
random sources of variation.
• Factor loadings above .6 are desirable (Hair,
Anderson, Tatham, & Black, 1998).
• Factor loadings > .4 are acceptable.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate Data Analysis
with Readings (5th ed.). Englewood Cliffs, NJ: Prentice Hall.
FACTOR LOADINGS
• Based on this notion of variance explained, Comrey
and Lee (1992) have proposed the following
conventions.
• 0.32 = 10% Variance Explained (Poor)
• 0.45 = 20% Variance Explained (Fair)
• 0.55 = 30% Variance Explained (Good)
• 0.63 = 40% Variance Explained (Very Good)
• 0.71+ = 50%+ Variance Explained (Excellent)
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). London: Routledge.
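The arithmetic behind these conventions is simply the loading squared. A minimal Python sketch follows; the function name `rate_loading` and the label for loadings below .32 are our own assumptions, while the cutoffs are the ones listed above.

```python
# Cutoffs as listed above (attributed there to Comrey & Lee, 1992),
# checked from the highest to the lowest.
CUTOFFS = [
    (0.71, "Excellent"),
    (0.63, "Very Good"),
    (0.55, "Good"),
    (0.45, "Fair"),
    (0.32, "Poor"),
]

def rate_loading(loading):
    """Return (proportion of variance explained, verbal label) for a loading."""
    size = abs(loading)  # the sign only gives the direction of the relationship
    variance_explained = round(size ** 2, 2)
    for cutoff, label in CUTOFFS:
        if size >= cutoff:
            return variance_explained, label
    # Assumption: the source gives no label for loadings under .32.
    return variance_explained, "Below listed cutoffs"

print(rate_loading(0.80))  # (0.64, 'Excellent')
print(rate_loading(0.50))  # (0.25, 'Fair')
```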
CONCLUSION
• Factor analysis, being about simplification, is
an invaluable tool to the scientifically
minded psychologist.
• The goal of science is to develop testable
theories to explain natural phenomena.
• Our models or theories that explain
complex observable phenomena need to be
parsimonious.
• We want to explain as much about that
complex variable as we can with the
simplest model possible.
CONCLUSION
• Factor analysis allows for the development of
parsimonious theoretical models.
• By simplifying large amounts of data into fewer and
more meaningful variables.
• Factor analysis also facilitates more accurate
assessments of relationships between variables.
• Does this by creating latent variables which take into
account measurement error.
CONCLUSION
• Back to our study at the start of this lecture.
• How might this study have been improved?
[Diagram: Perceived Psychological Stress and Telomere Length, r = -.31, p < .01]
• Perceived psychological stress measured as an
observed variable?
• What if we created a latent variable?
• What might the effect be on the relationship between
these two variables?
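A rough feel for the answer comes from the classical correction for attenuation: divide the observed correlation by the square root of the product of the two reliabilities. This is only a stand-in for what a latent variable model does, and the reliability of .80 for the stress measure is purely hypothetical (it is not taken from Epel et al.). Telomere length is treated as perfectly reliable, which is also an assumption.

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y=1.0):
    """Classical correction for attenuation (an estimate, not a latent model)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return r_observed / math.sqrt(reliability_x * reliability_y)

r_reported = -0.31                 # stress and telomere length, as reported above
assumed_stress_reliability = 0.80  # HYPOTHETICAL value, for illustration only

r_corrected = disattenuate(r_reported, assumed_stress_reliability)
print(round(r_corrected, 2))  # -0.35
```

Under these assumptions the relationship would be somewhat stronger than the observed one, which is the direction of change we would expect once measurement error is taken into account.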
Thank you for
your time!
Questions?
Philip Hyland
philipehyland@gmail.com
www.philiphyland.webs.com