FACTOR ANALYSIS

CONTINUING PROFESSIONAL DEVELOPMENT: INTRODUCTION TO RESEARCH STATISTICS
MULTIVARIATE STATISTICS: Factor Analysis
These notes provide the technical background to Factor Analysis and
accompany the material covered in the final session, which describes the
conceptual aspects of the technique.
1. Introduction
Exploratory Factor Analysis (EFA) is a technique which allows us to reduce a
large number of correlated variables to a smaller number of ‘super variables’. It
does this by attempting to account for the pattern of correlations between the
variables in terms of a much smaller number of latent variables or factors. A
latent variable is one that cannot be measured directly, but is assumed to be
related to a number of measurable, observable, manifest variables. For
example, in order to find out about extraversion (latent variable) we ask
questions about how many friends people have, how much they enjoy going to
social gatherings and what kind of hobbies they enjoy (manifest variables).
These factors can be either orthogonal (independent and uncorrelated) or
oblique (correlated, sharing some variance between them). EFA is
used when we want to understand the relationships between a set of variables
and to summarise them, rather than to test whether one variable has a
significant effect on another.

Example
In the field of personality, a questionnaire might contain several hundred
questions. How people respond to these questions is thought to be governed by
five or so underlying factors, or ‘traits’ as they are called by personality
theorists. According to the ‘Big Five Theory of Personality’ (Costa & McCrae,
1980), these factors are neuroticism, extraversion, agreeableness, openness to
experience and conscientiousness, which are supposed to reflect major traits,
each of which is a bundle of several specific facets of behaviour. The advantage
of grouping variables together in this way is that instead of having to measure
and consider several hundred aspects of behaviour, we may choose a smaller
representative subset of indicators for each factor. This achieves considerable
economy, and makes any description of behaviour more efficient and easier to
understand.

Basic Issues
If you ever come to use EFA in serious research, then you will have to study it
in considerably more detail than is reported here. Factor Analysis is not just a
single technique, but part of a whole family of techniques which includes
Principal Components Analysis (PCA), Exploratory Factor Analysis (EFA –
we’ll be covering this technique in some detail) and Confirmatory Factor
Analysis (CFA – a more advanced technique used for testing hypotheses). There
are also many different methods of identifying (‘extracting’) factors, for
example Principal Axis Factoring (PAF – we’ll concentrate on this method
here) or Maximum Likelihood Factoring. Furthermore, there are ways that we
can ensure that we get as good a ‘fit’ for our factors as possible, what is known
as rotation. What you should get out of this course is a general understanding
of EFA: what it is, when it is appropriate to use it, what kinds of data are
suitable, how big a ratio of participants to variables is required, what makes a
‘good’ solution (there are no ‘right’ solutions, just good and bad ones), and how
to interpret the SPSS output. But be warned – EFA is a vast and controversial
topic and there are contradictory views on almost every aspect of it!
One more important point to make is this: it is very easy to use EFA once you
have mastered the SPSS commands, but the same rule applies here as in other
complex methods of analysis – GIGO – ‘garbage in, garbage out’. A researcher
should never just throw a whole load of variables into EFA and expect to get
something sensible out of it. You must always start with a set of variables that
adequately sample some theoretical domain, and which contains enough items
to ‘anchor’ any factors that you hope to reveal. Keep this in mind when you are
reading research articles!
2. Provisional answers to questions

What is exploratory FA?
As described above, EFA is a technique used to summarise or describe a large,
complex group of variables, using a relatively small number of dimensions or
latent variables. These latent variables represent the relationships between sets
of interrelated manifest variables. In fact, EFA involves a statistical model,
which attempts to reproduce the observed correlation matrix. The differences
between the observed and predicted correlations are called residuals, a term
you should be familiar with from the Multiple Regression part of today’s
course. In a good solution, these residuals should be small.

When should EFA be used?
We should only use EFA when there are good theoretical reasons to suspect that
some set of variables (eg: questionnaire or test items, such as personality scale
statements) will be represented by a smaller set of latent variables. There should
be a good number of substantial correlations in the correlation matrix; otherwise
EFA may not succeed in finding an acceptable solution (what makes an
acceptable solution is described later). There are various ‘rules of thumb’ about
what the ratio of cases (participants) to variables (eg: questionnaire items)
should be, ranging from 5:1 to 10:1. Large samples are needed to obtain a
‘stable’ solution, but there is no absolute criterion for deciding what is ‘large’.
Estimates range from 150 (minimum) to 500 when there are 40+ variables
(Cliff, 1982). It depends to some extent on the specific aims of the analysis, the
properties of the data, the number of factors extracted, the size of the
correlations and so on (Wolins, 1982).
How do we perform EFA?
You will be carrying out EFA on a number of different datasets. We have
already looked at some personality variables, and a set of data from this
questionnaire will be used in the first practical class. The steps in performing
EFA are as follows:
• Select and measure variables (test scores, questionnaire items, etc.)
• Prepare the correlation matrix (SPSS does this for you)
• Determine the number of factors (SPSS has a default method for this)
• Extract factors from the correlation matrix (SPSS does this for you)
• Rotate factors to improve interpretation (SPSS will do this)
• Interpret results (you must do this yourself)
3. Output
One feature of EFA in SPSS is that you get a series of output matrices that give
different kinds of information. Depending on the method of extraction used, you
may also get the results of some statistical tests on the correlation matrix that
was derived from the data. Here are some explanations of some of the terms and
output.

Correlation matrix
A matrix that lists the correlations between each pair of variables in the
analysis. Every value appears twice (as it must – think about this!) and gives the
degree to which each variable is associated with every other variable.
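For readers who like to see this concretely, here is a minimal sketch in Python (pandas) of building and inspecting such a matrix. The item names and data are invented for illustration; SPSS produces the equivalent matrix for you as part of its output.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100                                   # imaginary respondents
latent = rng.normal(size=n)               # a shared latent influence

data = pd.DataFrame({
    "item1": latent + rng.normal(scale=0.5, size=n),
    "item2": latent + rng.normal(scale=0.5, size=n),
    "item3": latent + rng.normal(scale=0.5, size=n),
    "item4": rng.normal(size=n),          # an unrelated item
})

r = data.corr()           # symmetric, so every value appears twice
print(r.round(2))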

Factor
A latent variable that the analysis (using SPSS) has identified as describing a
significant proportion of the variance in the data. A large number of variables
may contribute to the effectiveness of a particular factor in describing this
variance.

What makes a ‘good’ solution?
One kind of ‘good’ solution is one which makes psychological sense in terms of
some developed or developing theory. It is one that helps the researcher to
understand the patterns in the data. It helps if each variable has a high loading
(>0.3) on a single factor and low (or zero) loadings on all the others (see
examples, later). This ideal is known as the simple structure, and there are
several methods of rotating the initial solution so that a simple structure is
obtained.

Factor loadings
A factor loading is the degree to which a variable correlates with a factor.
This is an important concept, as all the following terms are calculated from the
table of factor loadings. If a factor loading is high (above 0.3) or very high
(above 0.6), then the relevant variable helps to describe that factor quite well.
Factor loadings below 0.3 may be ignored.

Communalities
In EFA, there are several kinds of variance. Any given variable (eg:
questionnaire item responses) will have some variance that it shares with the
factors and this is called its communality, a value between 0 and 1 that is
inserted in the diagonal of the correlation matrix which SPSS derives from the
data and on which the analysis is performed. As a property of a particular
variable, a very low communality (eg: 0.002) indicates that the variable has so
little in common with the other variables in the dataset that it is not worth
having it in the analysis. The programme starts out with estimates of the
communalities and then keeps re-running the analysis (called ‘iterating’),
adjusting these values until it cannot improve the fit of the factors to the patterns
in the data. The sum of the communalities is the variance in the data that is
distributed among the factors. This figure is always less than the total variance
in the dataset, because the ‘unique’ and ‘error’ variance are omitted, so that a
linear combination of the factors will not reproduce exactly the original scores
on the observed variables (or ‘indicators’).
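As a sketch of the arithmetic (with an invented loading matrix, not output from any real analysis): a variable’s communality is the sum of its squared loadings across the factors, and the communalities sum to the variance distributed among the factors.

import numpy as np

# Invented loadings: rows = variables, columns = factors.
loadings = np.array([
    [0.75, 0.10],
    [0.68, 0.05],
    [0.02, 0.71],
    [0.08, 0.66],
])

communalities = (loadings ** 2).sum(axis=1)   # one value per variable, between 0 and 1
print(communalities.round(3))
print(communalities.sum().round(3))           # variance distributed among the factors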
Factor matrix
The values in this matrix are for unrotated factors. This matrix is also known as
the ‘factor pattern’ matrix, consisting essentially of coefficients or weights for
the regression of each variable on the factors. This is not the matrix that we
usually use to interpret the factors.

Rotated factor matrix
Because all of the variables (‘indicators’) have some degree of association with
all the factors, there is an infinite number of ways that these associations could
be represented. Therefore, no factor solution gives you ‘the truth’ – a single
definitive answer about how best to represent the patterns of relationships in the
data. This is often referred to as the problem of ‘factor indeterminacy’. Rotating
the factors is simply a way to distribute the factor loadings in such a way as to
make the job of interpreting the ‘meaning’ of the factors easier. The aim is to
ensure that each variable (indicator) loads highly on only one factor, thus
ensuring a simple structure (see earlier).

Eigenvalues
An eigenvalue is equal to the sum of the squared loadings of all the variables
on the factor with which the eigenvalue is associated. In simple terms,
the larger the eigenvalue, the larger the proportion of variance in the data
accounted for by that factor. The plot of eigenvalues against the number of
factors (scree plot) was proposed by Cattell (1966) as an aid in deciding on the
optimum number of factors to extract. Deciding how many factors will best
represent the patterns of correlations in the data is one of the main problems in
EFA because, although SPSS has a default method for doing this, it is really up
to the analyst to decide. By default, the programme will only extract
eigenvalues greater than 1, but this is not always the optimum. If the scree plot
shows a clear distinction between the first few factors and the rest (which pile
up like the rubble at the bottom of a steep slope, hence ‘scree’ plot), then this is
an indication that the first few are the ones that matter, and the analysis can be
re-run with only this number of factors requested. Sometimes there are good
theoretical reasons to expect that the data can be adequately represented by a
certain number of factors, and the analyst can specify this number to be
extracted.
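The following sketch (reusing the invented data frame from the correlation-matrix example above) shows the calculations behind these decisions: the eigenvalues of the correlation matrix, the ‘eigenvalues greater than 1’ default, and a scree plot.

import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(data.corr().to_numpy())[::-1]

print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()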

Interpreting and naming factors
In order to interpret the factors, you have to see which collection of variables
loads most highly on a factor, and then see what that set of items has in
common. For example, if the items are questionnaire statements, then you could
see what it is that the statements are asking, what they all have in common and
interpret the factor accordingly. This helps us to decide what a factor represents.
A positive loading (eg: 0.52) will indicate a positive relationship with the factor,
whereas one with a negative sign will suggest an inverse relationship. As a ‘rule
of thumb’, regardless of whether they are positive or negative, we consider
loadings above 0.6 to be very high, above 0.3 to be high, and less than 0.3 to be
irrelevant and thus ignored. Having decided what each factor might represent,
we can then assign suitable names or labels to them.
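As a sketch of this rule of thumb (the loadings below are invented), blanking out every loading smaller than 0.3 in absolute value leaves only the loadings worth interpreting – which is what SPSS’s ‘suppress’ option does for you:

import pandas as pd

loadings_table = pd.DataFrame(
    {"Factor1": [0.72, 0.65, -0.08, 0.12],
     "Factor2": [0.05, -0.11, 0.58, -0.49]},
    index=["item1", "item2", "item3", "item4"],
)

# Keep only |loading| >= 0.3; everything else becomes blank (NaN).
print(loadings_table.where(loadings_table.abs() >= 0.3).round(2))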

Orthogonal vs Oblique rotation
In an orthogonal rotation (eg: ‘varimax’), it is assumed that factors are not
correlated with one another, and therefore the axes (in the solution) are kept at
90 degrees to each other. However, for many concept domains in psychology, it
is more likely that factors will be correlated with one another to some degree.
For example, your beliefs about equality may be correlated with your beliefs
about taxation. If you think that the factors may be correlated, you can ask for
an oblique rotation (eg: ‘oblimin’), rather than an orthogonal one. Oblique
solutions are a little harder to interpret because you have to take into account the
correlations between factors when looking at the factor loadings. However, for
some datasets, the two types of rotation lead to similar results, and when that is
the case, it is easier to report the results of the orthogonal rotation. The main
thing to bear in mind is that in an orthogonal rotation you interpret the rotated
factor matrix, whereas in an oblique rotation, people usually base their
interpretation on the pattern matrix.

Residuals
Whenever you simplify ‘the story’ that the data are telling you, you ‘throw
away’ some of the information in the dataset. Since EFA aims to explain all the
many correlations between variables in a simpler way, there will inevitably be
some variance left over that is unexplained. The usefulness of a factor solution
can be assessed by how well it manages to reproduce the original values in the
data. Two more pieces of SPSS output will give you this information: the
reproduced correlation matrix and a matrix of residuals (what is left after the
factors have been extracted). A good fit is indicated when the residuals are
small (say, less than 0.05).
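A sketch of this check, under an orthogonal factor model and reusing the invented ‘data’ and ‘loadings’ from the earlier sketches purely to show the mechanics: the reproduced correlations are the loadings multiplied by their own transpose, and the residuals are observed minus reproduced.

import numpy as np

observed = data.corr().to_numpy()       # observed correlations (4 variables here)
reproduced = loadings @ loadings.T      # loadings: variables x factors

residuals = observed - reproduced
off_diagonal = residuals[~np.eye(len(residuals), dtype=bool)]
print("Largest absolute off-diagonal residual:",
      round(float(np.abs(off_diagonal).max()), 3))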
4. How many factors?
Determining the number of factors to extract and rotate is not always easy. If
too few are extracted, then the factors may be too ‘broad brush’ to be useful; if
too many are extracted, then factors can start to break up into useless fragments.
There are a number of measures of ‘goodness-of-fit’, but the ultimate test of a
good solution is whether:
• it captures a reasonable amount of variance (all factors together)
• it has a relatively simple structure (each item loading mainly on only one of the factors)
• it can be interpreted in the light of the theory that motivated the research in the first place.
5. A simple example
In the practical sessions, you’ll be using the menus in SPSS to select variables
from the dataset and commands to run an EFA. However, in order to introduce
you to the SPSS output, the following example is from an imaginary dataset that
gave the output below. There are six items in a questionnaire on self-concept
(X1 “I enjoy parties”, X2 “I’d rather share a flat than live alone”, X3 “I find it
easy to talk to people in Halls”, Y1 “I’m quite good at maths”, Y2 “I really
enjoy writing essays” and Y3 “My favourite place on campus is the Library”).
After entering the data from 100 respondents, an EFA is performed and the
following output produced: first, a table of communalities (including the
starting values, or ‘initial estimates’, and the final values for the solution).
None of the communalities is too small (all are above about 0.05).
The scree plot confirms that no more than two factors are needed to represent
the data well.
Remember that we don’t usually try to interpret the Factor Matrix because the
factors do not usually look very clear at this stage. However, for this particular
‘made-up’ example, we can see that two factors have clearly emerged, with the
X variables loading on Factor 2 and the Y variables loading on Factor 1.
As you can see, very little rotation has been needed to clarify the nature of the
factors – the loadings have hardly been changed (redistributed) at all.
Having identified the variables that correlate with one another (identified by the
EFA procedure), you could now add up the scores for each respondent on each
of the three variables for the first factor (ie: the values for Y1, Y2 and Y3 in the
original, raw data). If we look at the nature of the items, we can see that the first
factor represents ‘academic’ aspects of self-concept. Doing the same for the second
factor (adding together an individual respondent’s scores on the three X items)
gives an overall score of ‘social self-concept’ for that respondent.
EXERCISES
The following exercises are to be done over the two practical sessions. You should
be familiar with some of the early procedures. For the EFA procedures, please
refer to the notes earlier in this handout.
6. Saving a copy of the data file
To start with, you should save a copy of the file ‘CPD-driving.sav’ into your
file space. You can find ‘CPD-driving.sav’ by visiting the web-page at:
http://www.ex.ac.uk/~cnwburge
• Go to the ‘Teaching’ section of the page and click on “CPD Statistics”
• You will find other relevant files on this page (including the questionnaire from which this data set is derived and copies of all the slides and handouts for the ‘Introduction to Research Statistics’ courses), which you are free to download
• Save the file to a folder on your PC by clicking with the right mouse button and selecting “save target/file as”.
Whenever you need the file again, you now have a copy from which to work.
7. Exploring the data set
[i] Cross-tabulation
Before you start any kind of analysis of a new data set, you should explore the
data so that you know what the variables are and what each number actually
means. If you move the cursor to the grey cell at the top of a column, a label
will appear, telling you what the variable is.
The variables in the file are as follows: gender, age, area in which respondent
lives, whether respondent drives for a living, length of time respondent has held
a driving licence (in years and months), annual mileage, preferred speed on a
variety of different roads by day and at night (motorways, dual carriageways,
A-roads, country lanes, residential roads and busy high streets) and, finally, a series
of scores (var01–var20) relating to items on a personality trait inventory.
You can use the Descriptives and Frequencies commands to investigate the
data, but they cannot tell you everything. If we wanted to find out how many
women there are in the dataset who live in rural areas, we must use a Crosstabs
(cross-tabulation) command:
• Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs
• Select the two variables that you want to compare (in this case gender and area), put one in the Row box, and one in the Column box
• Click on Statistics, and check the Chi-Square box. Click on Continue
• Click on OK
The output tells us how many men and women in the data set come from each
type of area, and the chi-square option tells us whether there are significantly
different numbers in each cell. However, it is not clear where these differences
lie, so:
• Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs (so long as you haven’t done anything else since the first Crosstabs analysis above, the gender and area variables should still be in the correct boxes; if not, move them into the Row and Column boxes, click on Statistics, check the Chi-Square box and click on Continue)
• Click on Cells and check the ‘Expected’ counts box (also, try selecting the ‘Row’, ‘Column’ and ‘Total’ percentage boxes). Click on Continue
• Click on OK
Comparing the expected count with the observed count will tell you whether or
not there is a higher observed frequency than expected in that particular cell.
This will then tell you where the significant differences lie.
Using the Crosstabs procedure, how many female respondents live in a rural
area, and what percentage of the total sample do they make up? (10, 4.6%).
How many male respondents are between the ages of 36 and 40? What
percentage of the total sample do they constitute? (35, 16.1%).
[ii] Creating a scale
You might want to sum people’s scores on several items to create a kind of
index of their attitude, for example. We know that some of the personality
inventory items in the data set relate to the Thrill-Sedate Driver Scale
(Meadows, 1994). These items are numbered 7 to 13 in the questionnaire
(‘CPD-driving_questionnaire.doc’) and var07 to var13 in the data set. How do
we create a single scale score?
First of all, some of the items may have been counterbalanced, so we have to
reverse the scoring on these variables before we add them together to give a
single scale score. Currently, a high score may indicate strong positive tendency
on some of the items, whereas the opposite is true of other items. We need to
ensure that scores of '5' represent the same tendencies throughout the scale (in
this case, high ‘Thrill’ driving style), so that the item scores may be added
together to create a meaningful overall score.
Missing values
Make sure before computing a new variable like this that you have already
defined missing values, otherwise these will be included in the scale score. For
example, you would not want the values ‘99’ for ‘no response' included in your
scales, so defining these as missing will mean that that particular respondent
will not be included in the analysis.
• Double-click on the grey cell at the top of the relevant column of data
• Click on Missing Values
• Select Discrete Missing Values and type 99 into one of the boxes
• Click on Continue and OK
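For comparison, here is roughly the same step in Python. The file name and variable names follow the dataset described above; reading an SPSS .sav file with pandas relies on the pyreadstat package being installed, so treat this as a sketch rather than part of the exercises.

import numpy as np
import pandas as pd

df = pd.read_spss("CPD-driving.sav")             # needs pyreadstat installed
items = [f"var{i:02d}" for i in range(7, 14)]    # var07 ... var13
df[items] = df[items].replace(99, np.nan)        # 99 = 'no response' becomes missing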
Recoding variable scores
It is usually fairly clear which items need to be recoded. If strong agreement
with the item statement indicates positive tendency, then that item is okay to
include in the scale without recoding. However, if disagreement with the
statement indicates positive tendency, that item's scores must be recoded.
Looking at the actual questions in the questionnaire, it is clear that items var07,
var08, var09 and var10 should all be recoded (var11-var13 are okay, because
strong agreement implies a high ‘Thrill’ driving style).
Follow these steps to recode each item and then compute a scale composed of
all the item variables:
• Go to Transform… …Recode… …Into Different Variables and select the first item variable (var07) that requires recoding
• Give a name for the new recoded variable, such as var07r, and label it as ‘reversed var07’
• Set the new values by clicking Old and New Values and entering the old and new values in the appropriate boxes (adding each transformation as you go along), so that you finish up with 1→5, 2→4, 3→3, 4→2 and 5→1
• Click Continue and then Change, and check that the transformation has worked by getting a frequency table for the old and new variables – var07 and var07r. Have the values reversed properly? If not, then you may need to do it again!
• Follow the same procedure for the other items in the scale that need to be reversed
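In Python the same recoding (together with the scale computation described in the next step) is one line per item: on a 1–5 scale, reversing is simply 6 minus the original score. This sketch continues from the earlier one, assuming the 99s have already been set to missing.

reverse_items = ["var07", "var08", "var09", "var10"]
for item in reverse_items:
    df[item + "r"] = 6 - df[item]                # eg: var07 -> var07r (1<->5, 2<->4)

scale_items = ["var07r", "var08r", "var09r", "var10r", "var11", "var12", "var13"]
df["Thrill"] = df[scale_items].sum(axis=1, skipna=False)   # any missing item -> missing total
print(df["Thrill"].describe())                   # min should be >= 7, max <= 35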
Scale calculation
Once you have successfully reversed the counterbalanced item variables, you
can compute your scale.
• Click on Transform… …Compute, type a name for the scale (eg: “Thrill”) in the Target Variable box, and type the following in the Numeric Expression box:
var07r + var08r + var09r + var10r + var11 + var12 + var13
• Click on OK
Now take a look at your new variable (it will have appeared in a column on the
far right of your data sheet) and get a ‘descriptives’ analysis on it. You should find
that the maximum and minimum values make sense in terms of the original
values. The seven ‘Thrill-Sedate’ items are scored between 1 and 5, so there
should be no scores lower than 7 (ie: 1 x 7) and none higher than 35 (ie: 5 x 7).
If there are scores outside these limits, perhaps you forgot to exclude missing
values.

[iii] Checking the scale’s internal reliability
Checking the internal reliability of a scale is vital. It assesses how much each
item score is correlated with the overall scale score (a simplified version of the
correlation matrix that I talked about in the lecture).
To check scale reliability:

• Click on Analyze… …Scale… …Reliability Analysis
• Select the items that you want to include in the scale (in this case, all the items between var07 and var13 that didn’t require recoding in the earlier step, plus all the recoded ones – in other words, those listed in the previous ‘scale calculation’ step), and move them into the Items box
• Click on Statistics
• Select ‘Scale if item deleted’ and Inter-item ‘Correlations’
• Click on Continue… …OK
In the output, you can first see the correlations between the items in the proposed
scale. This is just like the correlation matrix referred to in the lectures.
Secondly, you will see a list of the items in the scale with a certain amount of
information about the items and the overall scale. The statistic that SPSS uses to
check reliability is Cronbach’s Alpha, which takes values between zero and 1.
The closer to 1 the value, the better, with acceptable reliability if Alpha exceeds
about 0.7. The column on the far right will tell us if there are any items currently
in the scale that don’t correlate with the rest. If any of the values in that column
exceed the value for ‘Alpha’ at the bottom of the table, then the scale would be
better without that item – it should be removed from the scale and the
Reliability Analysis run again. For this example, you should get a value for
Alpha of 0.7793, with none of the seven items requiring removal from the scale.
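For the curious, Cronbach’s Alpha is straightforward to compute by hand from its standard formula: Alpha = (k / (k − 1)) × (1 − sum of the item variances / variance of the total score), where k is the number of items. A sketch, continuing from the earlier ones (run on the real data, this should agree with the SPSS value):

complete = df[scale_items].dropna()     # listwise deletion, as in SPSS
k = len(scale_items)

item_variances = complete.var(ddof=1)               # variance of each item
total_variance = complete.sum(axis=1).var(ddof=1)   # variance of the total score

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's Alpha = {alpha:.4f}")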
8. Factor analysis of ‘CPD-Driving.sav’
[i] Orthogonal (Varimax) rotation (uncorrelated factors)
An orthogonal (varimax) analysis will identify factors that are entirely
independent of each other. Using the data in CPD-Driving.sav we will run a
factor analysis on the personality trait items (var01 to var20).
Use the following procedure to carry out the analysis:
• Analyze… …Data Reduction… …Factor
• Select all the items from var01 to var20 and move them into the Variables box
• Click on Extraction
• Click on the arrow button next to the Method box and select Principal Axis Factoring from the drop-down list
• Make sure there is a tick in the Scree Plot option
• Click on Continue
• Click on Rotation and select Varimax (make sure the circle is checked)
• Click on Options, select Sort by size and Suppress absolute values less than 0.1, and then change the value to 0.3 (instead of 0.1)
• Click on Continue… …OK.
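A rough Python equivalent of this run, using the third-party factor_analyzer package (pip install factor-analyzer). Its ‘principal’ method approximates Principal Axis Factoring, so treat this as a sketch rather than an exact replica of the SPSS output; it continues from the earlier sketches (df holding the dataset).

import pandas as pd
from factor_analyzer import FactorAnalyzer

trait_items = [f"var{i:02d}" for i in range(1, 21)]      # var01 ... var20
X = df[trait_items].dropna()

fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
fa.fit(X)

loadings = pd.DataFrame(fa.loadings_, index=trait_items,
                        columns=["Factor1", "Factor2", "Factor3", "Factor4"])
print(loadings.where(loadings.abs() >= 0.3).round(2))    # suppress loadings below 0.3
print(fa.get_communalities().round(2))                   # one communality per item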
Output
First, you have the Communalities. These are all okay, as there are none lower
than about 0.2 (anything less than 0.1 should prompt you to drop that particular
variable, as it clearly does not have enough in common with the factors in the
solution to be useful; if you drop a variable, you should run the analysis again
without the problem variable).
The next table displays the Eigenvalues for each potential factor. You will have
as many factors as there were variables to begin with, but this does not result in
any kind of data reduction – not very useful. The first four factors have
eigenvalues greater than 1, so SPSS will extract these factors by default (SPSS
automatically extracts all factors with eigenvalues greater than 1, unless you tell
it to do otherwise). In column 2 you have the amount of variance ‘explained’
by each factor, and in the next column, the cumulative variance explained by
each successive factor. In this example, the cumulative variance explained by
the first four factors is 52.4%. You can ignore the remaining columns.
The Scree plot is displayed next. You can see that in this example, although
four factors have been extracted (using the SPSS default criteria – see later), the
scree plot shows that a 3-factor solution might be better – the big difference in
the slope of the line comes after three factors have been extracted. You can see
this more clearly if you place a ruler along the slope in the scree plot. The
discontinuity between the first three factors and the remaining set is clear – they
have a far ‘steeper’ slope than the later factors. Perhaps three factors may be
better than four? See section [iii] below for further discussion of this issue.
Next comes the factor matrix, showing the loadings for each of the variables
on each of the four factors. Remember that this is for unrotated factors, so move
on to look at the rotated factor matrix below it, which will be easier to
interpret. Each factor has a number of variables with higher loadings, and the
rest have lower ones. Remember that we asked SPSS to ‘suppress’ values below
0.3, so these will be represented by blank spaces. Concentrate on the values that
remain: loadings below 0.3 can safely be ignored, and suppressing them keeps
the rotated factor matrix clean and easier to interpret.
Finally comes the factor rotation matrix, which can also be ignored (it simply
specifies the rotation that has been applied to the factors).

[ii] Correlated factors – oblique (oblimin) rotation
You may have noticed that some of the questions in the questionnaire seem to
measure similar things (for example, ‘the law’ is mentioned in variable items
that do not appear to load heavily on the same factor). Two or more of the
factors identified in the last exercise may well correlate with one another, as
personality variables have a habit of doing, so an orthogonal analysis may not
be the most logical procedure to carry out. Using the data in CPD-Driving.sav
we will run an oblique factor analysis on the personality trait items (var01 to
var20), which will identify factors that may be correlated to some degree.
Use the procedure described above, but when you click on the Rotation button,
instead of checking the Varimax option, check the Direct Oblimin option
instead. Compare the output from this analysis with the output from the varimax
analysis. The first few sections will look the same, because both analyses use
the same process to extract the factors. The difference comes once the initial
solution has been identified and SPSS rotates it in order to clarify the solution
(by redistributing the variance across the factors). Instead of a rotated factor matrix,
you will have ‘pattern’ and ‘structure’ matrices.
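The equivalent sketch with an oblique rotation only changes the rotation argument (again using the third-party factor_analyzer package, and continuing from the last sketch; for oblique rotations I am assuming its loadings_ attribute corresponds to the pattern matrix):

import pandas as pd
from factor_analyzer import FactorAnalyzer

fa_oblique = FactorAnalyzer(n_factors=4, rotation="oblimin", method="principal")
fa_oblique.fit(X)                       # X: the var01-var20 items from the last sketch

pattern = pd.DataFrame(fa_oblique.loadings_, index=trait_items,
                       columns=["Factor1", "Factor2", "Factor3", "Factor4"])
print(pattern.where(pattern.abs() >= 0.3).round(2))
# To try the three-factor solution suggested in section [iii], set n_factors=3.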
Look at the factors and the loadings in the pattern matrix (concentrate on
loadings greater than +/- 0.3). Do they look the same as the varimax solution?
One thing that has changed is that, although the factors look similar, the
loadings will have changed a bit, and not all load in the same way as before. For
example, var02 (“These days a person doesn’t really know quite who he can
count on”) no longer loads on factor 3, but only on factor 4. How does this
change the interpretation of factors 3 and 4?
Finally comes the factor correlation matrix. In a varimax solution, you can usually
ignore the plus or minus signs in front of the factor loadings, because the factors
are entirely independent of one another. However, in oblique (oblimin)
analyses, we have to take these into account because the factors correlate with
one another, and therefore we need to know if there is a positive or a negative
relationship between factors. The relationship between correlated factors must
inherently take into account the sign of the loadings. In this example, the
negative correlations are so small as to be unimportant (correlations less than
0.1 are usually non-significant), and so this is not an issue. However, you
should be aware that this may not always be the case. It may seem confusing at
first, but working out the logic behind the relationships between factors makes
sense when you look at the variable items that represent the factors.
[iii] Extracting a specific number of factors
Up to now, you have been letting SPSS decide how many factors to extract, and
it has been using the default criterion (called the ‘Kaiser’ criterion) of extracting
factors with eigenvalues greater than 1. Look at the second table in your output:
four factors have eigenvalues greater than 1, so SPSS extracts and rotates four
factors. However, this criterion doesn’t always guarantee the optimal solution.
We may have an idea of how many factors we should extract – the scree plot
can give some heavy hints (as mentioned earlier). The scree plot is not exact –
there is a degree of judgement in drawing these lines and judging where the
major change in slope comes – but with larger samples it is usually pretty reliable.
I reckon that three factors would lead to a better solution than four, so try
running the analysis again, but this time specify that you want a 3-factor
solution by setting the Number of factors to extract to 3 in the Extraction
options window. The solution fits quite well, with all variable items loading
quite high on only one factor, while still explaining a fairly high proportion of
variance (46.8%), thus revealing a good simple structure.

9. Where to go from here?
Once the EFA procedures have identified the variables that load on each factor,
we can use the scale-building and reliability procedures described earlier in
these exercises to produce internally-reliable scales which we may then use to
describe differences between people. You could use an unrelated samples t-test
to look at the differences between genders in the factor 2 ‘Thrill-Sedate’ items,
or ANOVA to investigate differences between age-groups in the factor 1 ‘future
orientation’ items.

Cris Burgess (2006)