FACTOR ANALYSIS

CONTINUING PROFESSIONAL DEVELOPMENT: INTRODUCTION TO RESEARCH STATISTICS
MULTIVARIATE STATISTICS: Factor Analysis
These notes provide the technical background to Factor Analysis and
accompany the material covered in the final session, which describes the
conceptual aspects of the technique.
1. Introduction
Exploratory Factor Analysis (EFA) is a technique which allows us to reduce a
large number of correlated variables to a smaller number of ‘super variables’. It
does this by attempting to account for the pattern of correlations between the
variables in terms of a much smaller number of latent variables or factors. A
latent variable is one that cannot be measured directly, but is assumed to be
related to a number of measurable, observable, manifest variables. For
example, in order to find out about extraversion (latent variable) we ask
questions about how many friends people have, how much they enjoy going to
social gatherings and what kind of hobbies they enjoy (manifest variables).
These factors can be either orthogonal (independent and uncorrelated) or
oblique (correlated, sharing some variance between them). EFA is
used when we want to understand the relationships between a set of variables
and to summarise them, rather than to test whether one variable has a
significant effect on another.

Example
In the field of personality, a questionnaire might contain several hundred
questions. How people respond to these questions is thought to be governed by
five or so underlying factors, or ‘traits’ as they are called by personality
theorists. According to the ‘Big Five Theory of Personality’ (Costa & McCrae,
1980), these factors are neuroticism, extraversion, agreeableness, openness to
experience and conscientiousness, which are supposed to reflect major traits,
each of which is a bundle of several specific facets of behaviour. The advantage
of grouping variables together in this way is that instead of having to measure
and consider several hundred aspects of behaviour, we may choose a smaller
representative subset of indicators for each factor. This achieves considerable
economy, and makes any description of behaviour more efficient and easier to
understand.

Basic Issues
If you ever come to use EFA in serious research, then you will have to study it
in considerably more detail than is reported here. Factor Analysis is not just a
single technique, but part of a whole family of techniques which includes
Principal Components Analysis (PCA), Exploratory Factor Analysis (EFA –
we’ll be covering this technique in some detail) and Confirmatory Factor
Analysis (CFA – a more advanced technique used for testing hypotheses). There
are also many different methods of identifying (‘extracting’) factors, for
example Principal Axis Factoring (PAF – we’ll concentrate on this method
here) or Maximum Likelihood Factoring. Furthermore, there are ways that we
can ensure that we get as good a ‘fit’ for our factors as possible, what is known
as rotation. What you should get out of this course is a general understanding
of EFA: what it is, when it is appropriate to use it, what kinds of data are
suitable, how big a ratio of participants to variables is required, what makes a
‘good’ solution (there are no ‘right’ solutions, just good and bad ones), and how
to interpret the SPSS output. But be warned – EFA is a vast and controversial
topic and there are contradictory views on almost every aspect of it!
One more important point to make is this: it is very easy to use EFA once you
have mastered the SPSS commands, but the same rule applies here as in other
complex methods of analysis – GIGO – ‘garbage in, garbage out’. A researcher
should never just throw a whole load of variables into EFA and expect to get
something sensible out of it. You must always start with a set of variables that
adequately sample some theoretical domain, and which contains enough items
to ‘anchor’ any factors that you hope to reveal. Keep this in mind when you are
reading research articles!
2. Provisional answers to questions

What is exploratory FA?
As described above, EFA is a technique used to summarise or describe a large,
complex group of variables, using a relatively small number of dimensions or
latent variables. These latent variables represent the relationships between sets
of interrelated manifest variables. In fact, EFA involves a statistical model,
which attempts to reproduce the observed correlation matrix. The differences
between the observed and predicted correlations are called residuals, a term
you should be familiar with from the Multiple Regression part of today’s
course. In a good solution, these residuals should be small.

When should EFA be used?
We should only use EFA when there are good theoretical reasons to suspect that
some set of variables (eg: questionnaire or test items, such as personality scale
statements) will be represented by a smaller set of latent variables. There should
be a good number of substantial correlations in the correlation matrix; otherwise
EFA may not succeed in finding an acceptable solution (what makes an
acceptable solution is described later). There are various ‘rules of thumb’ about
what the ratio of cases (participants) to variables (eg: questionnaire items)
should be, ranging from 5:1 to 10:1. Large samples are needed to obtain a
‘stable’ solution, but there is no absolute criterion for deciding what is ‘large’.
Estimates range from 150 (minimum) to 500 when there are 40+ variables
(Cliff, 1982). It depends to some extent on the specific aims of the analysis, the
properties of the data, the number of factors extracted, the size of the
correlations and so on (Wolins, 1982).
How do we perform EFA?
You will be carrying out EFA on a number of different datasets. We have
already looked at some personality variables, and a set of data from this
questionnaire will be used in the first practical class. The steps in performing
EFA are as follows:
• Select and measure variables (test scores, questionnaire items, etc.)
• Prepare the correlation matrix (SPSS does this for you)
• Determine the number of factors (SPSS has a default method for this)
• Extract factors from the correlation matrix (SPSS does this for you)
• Rotate factors to improve interpretation (SPSS will do this)
• Interpret results (you must do this yourself)
3. Output
One feature of EFA in SPSS is that you get a series of output matrices that give
different kinds of information. Depending on the method of extraction used, you
may also get the results of some statistical tests on the correlation matrix that
was derived from the data. Here are some explanations of some of the terms and
output.

Correlation matrix
A matrix that lists the correlations between each pair of variables in the
analysis. Every value appears twice (as it must – think about this!) and gives the
degree to which each variable is associated with every other variable.
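For readers who like to see this concretely, here is a minimal sketch in Python (pandas) of building and inspecting such a matrix. The item names and data are invented for illustration; SPSS produces the equivalent matrix for you as part of its output.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100                                   # imaginary respondents
latent = rng.normal(size=n)               # a shared latent influence

data = pd.DataFrame({
    "item1": latent + rng.normal(scale=0.5, size=n),
    "item2": latent + rng.normal(scale=0.5, size=n),
    "item3": latent + rng.normal(scale=0.5, size=n),
    "item4": rng.normal(size=n),          # an unrelated item
})

r = data.corr()           # symmetric, so every value appears twice
print(r.round(2))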

Factor
A latent variable that the analysis (using SPSS) has identified as describing a
significant proportion of the variance in the data. A large number of variables
may contribute to the effectiveness of a particular factor in describing this
variance.

What makes a ‘good’ solution?
One kind of ‘good’ solution is one which makes psychological sense in terms of
some developed or developing theory. It is one that helps the researcher to
understand the patterns in the data. It helps if each variable has a high loading
(>0.3) on a single factor and low (or zero) loadings on all the others (see
examples, later). This ideal is known as the simple structure, and there are
several methods of rotating the initial solution so that a simple structure is
obtained.

Factor loadings
A factor loading is the degree to which a variable correlates with a factor.
This is an important concept, as all the following terms are calculated from the
table of factor loadings. If a factor loading is high (above 0.3) or very high
(above 0.6), then the relevant variable helps to describe that factor quite well.
Factor loadings below 0.3 may be ignored.

Communalities
In EFA, there are several kinds of variance. Any given variable (eg:
questionnaire item responses) will have some variance that it shares with the
factors and this is called its communality, a value between 0 and 1 that is
inserted in the diagonal of the correlation matrix which SPSS derives from the
data and on which the analysis is performed. As a property of a particular
variable, a very low communality (eg: 0.002) indicates that the variable has so
little in common with the other variables in the dataset that it is not worth
having it in the analysis. The programme starts out with estimates of the
communalities and then keeps re-running the analysis (called ‘iterating’),
adjusting these values until it cannot improve the fit of the factors to the patterns
in the data. The sum of the communalities is the variance in the data that is
distributed among the factors. This figure is always less than the total variance
in the dataset, because the ‘unique’ and ‘error’ variance are omitted, so that a
linear combination of the factors will not reproduce exactly the original scores
on the observed variables (or ‘indicators’).
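As a sketch of the arithmetic (with an invented loading matrix, not output from any real analysis): a variable’s communality is the sum of its squared loadings across the factors, and the communalities sum to the variance distributed among the factors.

import numpy as np

# Invented loadings: rows = variables, columns = factors.
loadings = np.array([
    [0.75, 0.10],
    [0.68, 0.05],
    [0.02, 0.71],
    [0.08, 0.66],
])

communalities = (loadings ** 2).sum(axis=1)   # one value per variable, between 0 and 1
print(communalities.round(3))
print(communalities.sum().round(3))           # variance distributed among the factors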
Factor matrix
The values in this matrix are for unrotated factors. This matrix is also known as
the ‘factor pattern’ matrix, consisting essentially of coefficients or weights for
the regression of each variable on the factors. This is not the matrix that we
usually use to interpret the factors.

Rotated factor matrix
Because all of the variables (‘indicators’) have some degree of association with
all the factors, there is an infinite number of ways that these associations could
be represented. Therefore, no factor solution gives you ‘the truth’ – a single
definitive answer about how best to represent the patterns of relationships in the
data. This is often referred to as the problem of ‘factor indeterminacy’. Rotating
the factors is simply a way to distribute the factor loadings in such a way as to
make the job of interpreting the ‘meaning’ of the factors easier. The aim is to
ensure that each variable (indicator) loads highly on only one factor, thus
ensuring a simple structure (see earlier).

Eigenvalues
An eigenvalue is equal to the sum of the squared loadings of all the variables
on the factor with which the eigenvalue is associated. In simple terms,
the larger the eigenvalue, the larger the proportion of variance in the data
accounted for by that factor. The plot of eigenvalues against the number of
factors (scree plot) was proposed by Cattell (1966) as an aid in deciding on the
optimum number of factors to extract. Deciding how many factors will best
represent the patterns of correlations in the data is one of the main problems in
EFA because, although SPSS has a default method for doing this, it is really up
to the analyst to decide. By default, the programme will only extract
eigenvalues greater than 1, but this is not always the optimum. If the scree plot
shows a clear distinction between the first few factors and the rest (which pile
up like the rubble at the bottom of a steep slope, hence ‘scree’ plot), then this is
an indication that the first few are the ones that matter, and the analysis can be
re-run with only this number of factors requested. Sometimes there are good
theoretical reasons to expect that the data can be adequately represented by a
certain number of factors, and the analyst can specify this number to be
extracted.
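The following sketch (reusing the invented data frame from the correlation-matrix example above) shows the calculations behind these decisions: the eigenvalues of the correlation matrix, the ‘eigenvalues greater than 1’ default, and a scree plot.

import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(data.corr().to_numpy())[::-1]

print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()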

Interpreting and naming factors
In order to interpret the factors, you have to see which collection of variables
loads most highly on a factor, and then see what that set of items has in
common. For example, if the items are questionnaire statements, then you could
see what it is that the statements are asking, what they all have in common and
interpret the factor accordingly. This helps us to decide what a factor represents.
A positive loading (eg: 0.52) will indicate a positive relationship with the factor,
whereas one with a negative sign will suggest an inverse relationship. As a ‘rule
of thumb’, regardless of whether they are positive or negative, we consider
loadings above 0.6 to be very high, above 0.3 to be high, and less than 0.3 to be
irrelevant and thus ignored. Having decided what each factor might represent,
we can then assign suitable names or labels to them.
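As a sketch of this rule of thumb (the loadings below are invented), blanking out every loading smaller than 0.3 in absolute value leaves only the loadings worth interpreting – which is what SPSS’s ‘suppress’ option does for you:

import pandas as pd

loadings_table = pd.DataFrame(
    {"Factor1": [0.72, 0.65, -0.08, 0.12],
     "Factor2": [0.05, -0.11, 0.58, -0.49]},
    index=["item1", "item2", "item3", "item4"],
)

# Keep only |loading| >= 0.3; everything else becomes blank (NaN).
print(loadings_table.where(loadings_table.abs() >= 0.3).round(2))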

Orthogonal vs Oblique rotation
In an orthogonal rotation (eg: ‘varimax’), it is assumed that factors are not
correlated with one another, and therefore the axes (in the solution) are kept at
90 degrees to each other. However, for many concept domains in psychology, it
is more likely that factors will be correlated with one another to some degree.
For example, your beliefs about equality may be correlated with your beliefs
about taxation. If you think that the factors may be correlated, you can ask for
an oblique rotation (eg: ‘oblimin’), rather than an orthogonal one. Oblique
solutions are a little harder to interpret because you have to take into account the
correlations between factors when looking at the factor loadings. However, for
some datasets, the two types of rotation lead to similar results, and when that is
the case, it is easier to report the results of the orthogonal rotation. The main
thing to bear in mind is that in an orthogonal rotation you interpret the rotated
factor matrix, whereas in an oblique rotation, people usually base their
interpretation on the pattern matrix.

Residuals
Whenever you simplify ‘the story’ that the data are telling you, you ‘throw
away’ some of the information in the dataset. Since EFA aims to explain all the
many correlations between variables in a simpler way, there will inevitably be
some variance left over that is unexplained. The usefulness of a factor solution
can be assessed by how well it manages to reproduce the original values in the
data. Two more pieces of SPSS output will give you this information: the
reproduced correlation matrix and a matrix of residuals (what is left after the
factors have been extracted). A good fit is indicated when the residuals are
small (say, less than 0.05).
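A sketch of this check, under an orthogonal factor model and reusing the invented ‘data’ and ‘loadings’ from the earlier sketches purely to show the mechanics: the reproduced correlations are the loadings multiplied by their own transpose, and the residuals are observed minus reproduced.

import numpy as np

observed = data.corr().to_numpy()       # observed correlations (4 variables here)
reproduced = loadings @ loadings.T      # loadings: variables x factors

residuals = observed - reproduced
off_diagonal = residuals[~np.eye(len(residuals), dtype=bool)]
print("Largest absolute off-diagonal residual:",
      round(float(np.abs(off_diagonal).max()), 3))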
4. How many factors?
Determining the number of factors to extract and rotate is not always easy. If
too few are extracted, then the factors may be too ‘broad brush’ to be useful; if
too many are extracted, then factors can start to break up into useless fragments.
There are a number of measures of ‘goodness-of-fit’, but the ultimate test of a
good solution is whether:
• it captures a reasonable amount of variance (all factors together)
• it has a relatively simple structure (each item loading mainly on only one of the factors)
• it can be interpreted in the light of the theory that motivated the research in the first place.
5. A simple example
In the practical sessions, you’ll be using the menus in SPSS to select variables
from the dataset and commands to run an EFA. However, in order to introduce
you to the SPSS output, the following example is from an imaginary dataset that
gave the output below. There are six items in a questionnaire on self-concept
(X1 “I enjoy parties”, X2 “I’d rather share a flat than live alone”, X3 “I find it
easy to talk to people in Halls”, Y1 “I’m quite good at maths”, Y2 “I really
enjoy writing essays” and Y3 “My favourite place on campus is the Library”).
After entering the data from 100 respondents, an EFA is performed and the
following output produced: first, a table of communalities (including the
starting values, or ‘initial estimates’, and the final values for the solution).
None of the communalities is too small (all are above about 0.05).
The scree plot confirms that no more than two factors are needed to represent
the data well.
Remember that we don’t usually try to interpret the Factor Matrix because the
factors do not usually look very clear at this stage. However, for this particular
‘made-up’ example, we can see that two factors have clearly emerged, with the
X variables loading on Factor 2 and the Y variables loading on Factor 1.
As you can see, very little rotation has been needed to clarify the nature of the
factors – the loadings have hardly been changed (redistributed) at all.
Having identified the variables that correlate with one another (identified by the
EFA procedure), you could now add up the scores for each respondent on each
of the three variables for the first factor (ie: the values for Y1, Y2 and Y3 in the
original, raw data). If we look at the nature of the items, we can see that the first
factor represents ‘academic’ aspects of self-concept. Doing the same for the second
factor (adding together an individual respondent’s scores on the three X items)
gives an overall score of ‘social self-concept’ for that respondent.
EXERCISES
The following exercises are to be done over the two practical sessions. You should
be familiar with some of the early procedures. For the EFA procedures, please
refer to the notes earlier in this handout.
6. Saving a copy of the data file
To start with, you should save a copy of the file ‘CPD-driving.sav’ into your
file space. You can find ‘CPD-driving.sav’ by visiting the web-page at:
http://www.ex.ac.uk/~cnwburge
• Go to the ‘Teaching’ section of the page and click on “CPD Statistics”
• You will find other relevant files on this page (including the questionnaire from which this data set is derived and copies of all the slides and handouts for the ‘Introduction to Research Statistics’ courses), which you are free to download
• Save the file to a folder on your PC by clicking with the right mouse button and selecting “save target/file as”.
Whenever you need the file again, you now have a copy from which to work.
7. Exploring the data set
[i] Cross-tabulation
Before you start any kind of analysis of a new data set, you should explore the
data so that you know what the variables are and what each number actually
means. If you move the cursor to the grey cell at the top of a column, a label
will appear, telling you what the variable is.
The variables in the file are as follows: gender, age, area in which respondent
lives, whether respondent drives for a living, length of time respondent has held
a driving licence (in years and months), annual mileage, preferred speed on a
variety of different roads by day and at night (motorways, dual carriageways,
A-roads, country lanes, residential roads and busy high streets) and, finally, a series
of scores (var01–var20) relating to items on a personality trait inventory.
You can use the Descriptives and Frequencies commands to investigate the
data, but they cannot tell you everything. If we wanted to find out how many
women there are in the dataset who live in rural areas, we must use a Crosstabs
(cross-tabulation) command:
• Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs
• Select the two variables that you want to compare (in this case gender and area), put one in the Row box, and one in the Column box
• Click on Statistics, and check the Chi-Square box. Click on Continue
• Click on OK
The output tells us how many men and women in the data set come from each
type of area, and the chi-square option tells us whether there are significantly
different numbers in each cell. However, it is not clear where these differences
lie, so:
• Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs (so long as you haven’t done anything else since the first Crosstabs analysis above, the gender and area variables should still be in the correct boxes; if not, move them into the Row and Column boxes, click on Statistics, check the Chi-Square box and click on Continue)
• Click on Cells and check the ‘Expected’ counts box (also, try selecting the ‘Row’, ‘Column’ and ‘Total’ percentage boxes). Click on Continue
• Click on OK
Comparing the expected count with the observed count will tell you whether or
not there is a higher observed frequency than expected in that particular cell.
This will then tell you where the significant differences lie.
Using the Crosstabs procedure, how many female respondents live in a rural
area, and what percentage of the total sample do they make up? (10, 4.6%).
How many male respondents are between the ages of 36 and 40? What
percentage of the total sample do they constitute? (35, 16.1%).
[ii] Creating a scale
You might want to sum people’s scores on several items to create a kind of
index of their attitude, for example. We know that some of the personality
inventory items in the data set relate to the Thrill-Sedate Driver Scale
(Meadows, 1994). These items are numbered 7 to 13 in the questionnaire
(‘CPD-driving_questionnaire.doc’) and var07 to var13 in the data set. How do
we create a single scale score?
First of all, some of the items may have been counterbalanced, so we have to
reverse the scoring on these variables before we add them together to give a
single scale score. Currently, a high score may indicate strong positive tendency
on some of the items, whereas the opposite is true of other items. We need to
ensure that scores of '5' represent the same tendencies throughout the scale (in
this case, high ‘Thrill’ driving style), so that the item scores may be added
together to create a meaningful overall score.
Missing values
Make sure before computing a new variable like this that you have already
defined missing values, otherwise these will be included in the scale score. For
example, you would not want the values ‘99’ for ‘no response' included in your
scales, so defining these as missing will mean that that particular respondent
will not be included in the analysis.
• Double-click on the grey cell at the top of the relevant column of data
• Click on Missing Values
• Select Discrete Missing Values and type 99 into one of the boxes
• Click on Continue and OK
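For comparison, here is roughly the same step in Python. The file name and variable names follow the dataset described above; reading an SPSS .sav file with pandas relies on the pyreadstat package being installed, so treat this as a sketch rather than part of the exercises.

import numpy as np
import pandas as pd

df = pd.read_spss("CPD-driving.sav")             # needs pyreadstat installed
items = [f"var{i:02d}" for i in range(7, 14)]    # var07 ... var13
df[items] = df[items].replace(99, np.nan)        # 99 = 'no response' becomes missing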
Recoding variable scores
It is usually fairly clear which items need to be recoded. If strong agreement
with the item statement indicates positive tendency, then that item is okay to
include in the scale without recoding. However, if disagreement with the
statement indicates positive tendency, that item's scores must be recoded.
Looking at the actual questions in the questionnaire, it is clear that items var07,
var08, var09 and var10 should all be recoded (var11-var13 are okay, because
strong agreement implies a high ‘Thrill’ driving style).
Follow these steps to recode each item and then compute a scale composed of
all the item variables:
• Go to Transform… …Recode… …Into Different Variables and select the first item variable (var07) that requires recoding
• Give a name for the new recoded variable, such as var07r, and label it as ‘reversed var07’
• Set the new values by clicking Old and New Values and entering the old and new values in the appropriate boxes (adding each transformation as you go along), so that you finish up with 1→5, 2→4, 3→3, 4→2 and 5→1
• Click Continue and then Change, and check that the transformation has worked by getting a frequency table for the old and new variables – var07 and var07r. Have the values reversed properly? If not, then you may need to do it again!
• Follow the same procedure for the other items in the scale that need to be reversed
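In Python the same recoding (together with the scale computation described in the next step) is one line per item: on a 1–5 scale, reversing is simply 6 minus the original score. This sketch continues from the earlier one, assuming the 99s have already been set to missing.

reverse_items = ["var07", "var08", "var09", "var10"]
for item in reverse_items:
    df[item + "r"] = 6 - df[item]                # eg: var07 -> var07r (1<->5, 2<->4)

scale_items = ["var07r", "var08r", "var09r", "var10r", "var11", "var12", "var13"]
df["Thrill"] = df[scale_items].sum(axis=1, skipna=False)   # any missing item -> missing total
print(df["Thrill"].describe())                   # min should be >= 7, max <= 35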
Scale calculation
Once you have successfully reversed the counterbalanced item variables, you
can compute your scale.
• Click on Transform… …Compute, type a name for the scale (eg: “Thrill”) in the Target Variable box, and type the following in the Numeric Expression box:
var07r + var08r + var09r + var10r + var11 + var12 + var13
• Click on OK
Now take a look at your new variable (it will have appeared in a column on the
far right of your data sheet) and get a ‘descriptives’ analysis on it. You should find
that the maximum and minimum values make sense in terms of the original
values. The seven ‘Thrill-Sedate’ items are scored between 1 and 5, so there
should be no scores lower than 7 (ie: 1 x 7) and none higher than 35 (ie: 5 x 7).
If there are scores outside these limits, perhaps you forgot to exclude missing
values.

[iii] Checking the scale’s internal reliability
Checking the internal reliability of a scale is vital. It assesses how much each
item score is correlated with the overall scale score (a simplified version of the
correlation matrix that I talked about in the lecture).
To check scale reliability:

• Click on Analyze… …Scale… …Reliability Analysis
• Select the items that you want to include in the scale (in this case, all the items between var07 and var13 that didn’t require recoding in the earlier step, plus all the recoded ones – in other words, those listed in the previous ‘scale calculation’ step), and move them into the Items box
• Click on Statistics
• Select ‘Scale if item deleted’ and Inter-item ‘Correlations’
• Click on Continue… …OK
In the output, you can first see the correlations between the items in the proposed
scale. This is just like the correlation matrix referred to in the lectures.
Secondly, you will see a list of the items in the scale with a certain amount of
information about the items and the overall scale. The statistic that SPSS uses to
check reliability is Cronbach’s Alpha, which takes values between zero and 1.
The closer to 1 the value, the better, with acceptable reliability if Alpha exceeds
about 0.7. The column on the far right will tell us if there are any items currently
in the scale that don’t correlate with the rest. If any of the values in that column
exceed the value for ‘Alpha’ at the bottom of the table, then the scale would be
better without that item – it should be removed from the scale and the
Reliability Analysis run again. For this example, you should get a value for
Alpha of 0.7793, with none of the seven items requiring removal from the scale.
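For the curious, Cronbach’s Alpha is straightforward to compute by hand from its standard formula: Alpha = (k / (k − 1)) × (1 − sum of the item variances / variance of the total score), where k is the number of items. A sketch, continuing from the earlier ones (run on the real data, this should agree with the SPSS value):

complete = df[scale_items].dropna()     # listwise deletion, as in SPSS
k = len(scale_items)

item_variances = complete.var(ddof=1)               # variance of each item
total_variance = complete.sum(axis=1).var(ddof=1)   # variance of the total score

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's Alpha = {alpha:.4f}")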
8. Factor analysis of ‘CPD-Driving.sav’
[i] Orthogonal (Varimax) rotation (uncorrelated factors)
An orthogonal (varimax) analysis will identify factors that are entirely
independent of each other. Using the data in CPD-Driving.sav we will run a
factor analysis on the personality trait items (var01 to var20).
Use the following procedure to carry out the analysis:
• Analyze… …Data Reduction… …Factor
• Select all the items from var01 to var20 and move them into the Variables box
• Click on Extraction
• Click on the arrow button next to the Method box and select Principal Axis Factoring from the drop-down list
• Make sure there is a tick in the Scree Plot option
• Click on Continue
• Click on Rotation and select Varimax (make sure the circle is checked)
• Click on Options, select Sort by size and Suppress absolute values less than 0.1, and then change the value to 0.3 (instead of 0.1)
• Click on Continue… …OK.
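A rough Python equivalent of this run, using the third-party factor_analyzer package (pip install factor-analyzer). Its ‘principal’ method approximates Principal Axis Factoring, so treat this as a sketch rather than an exact replica of the SPSS output; it continues from the earlier sketches (df holding the dataset).

import pandas as pd
from factor_analyzer import FactorAnalyzer

trait_items = [f"var{i:02d}" for i in range(1, 21)]      # var01 ... var20
X = df[trait_items].dropna()

fa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
fa.fit(X)

loadings = pd.DataFrame(fa.loadings_, index=trait_items,
                        columns=["Factor1", "Factor2", "Factor3", "Factor4"])
print(loadings.where(loadings.abs() >= 0.3).round(2))    # suppress loadings below 0.3
print(fa.get_communalities().round(2))                   # one communality per item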
Output
First, you have the Communalities. These are all okay, as there are none lower
than about 0.2 (anything less than 0.1 should prompt you to drop that particular
variable, as it clearly does not have enough in common with the factors in the
solution to be useful; if you drop a variable, you should run the analysis again
without the problem variable).
The next table displays the Eigenvalues for each potential factor. You will have
as many factors as there were variables to begin with, but this does not result in
any kind of data reduction – not very useful. The first four factors have
eigenvalues greater than 1, so SPSS will extract these factors by default (SPSS
automatically extracts all factors with eigenvalues greater than 1, unless you tell
it to do otherwise). In column 2 you have the amount of variance ‘explained’
by each factor, and in the next column, the cumulative variance explained by
each successive factor. In this example, the cumulative variance explained by
the first four factors is 52.4%. You can ignore the remaining columns.
The Scree plot is displayed next. You can see that in this example, although
four factors have been extracted (using the SPSS default criteria – see later), the
scree plot shows that a 3-factor solution might be better – the big difference in
the slope of the line comes after three factors have been extracted. You can see
this more clearly if you place a ruler along the slope in the scree plot. The
discontinuity between the first three factors and the remaining set is clear – they
have a far ‘steeper’ slope than the later factors. Perhaps three factors may be
better than four? See section [iii] below for further discussion of this issue.
Next comes the factor matrix, showing the loadings for each of the variables
on each of the four factors. Remember that this is for unrotated factors, so move
on to look at the rotated factor matrix below it, which will be easier to
interpret. Each factor has a number of variables with higher loadings, and the
rest have lower ones. Remember that we asked SPSS to ‘suppress’ values below
0.3, so these will be represented by blank spaces. Concentrate on the values that
remain: loadings below 0.3 can safely be ignored, and suppressing them keeps
the rotated factor matrix clean and easier to interpret.
Finally comes the factor rotation matrix, which can also be ignored (it simply
specifies the rotation that has been applied to the factors).

[ii] Correlated factors – oblique (oblimin) rotation
You may have noticed that some of the questions in the questionnaire seem to
measure similar things (for example, ‘the law’ is mentioned in variable items
that do not appear to load heavily on the same factor). Two or more of the
factors identified in the last exercise may well correlate with one another, as
personality variables have a habit of doing, so an orthogonal analysis may not
be the most logical procedure to carry out. Using the data in CPD-Driving.sav
we will run an oblique factor analysis on the personality trait items (var01 to
var20), which will identify factors that may be correlated to some degree.
Use the procedure described above, but when you click on the Rotation button,
instead of checking the Varimax option, check the Direct Oblimin option
instead. Compare the output from this analysis with the output from the varimax
analysis. The first few sections will look the same, because both analyses use
the same process to extract the factors. The difference comes once the initial
solution has been identified and SPSS rotates it in order to clarify the solution
(by redistributing the variance across the factors). Instead of a rotated factor matrix,
you will have ‘pattern’ and ‘structure’ matrices.
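The equivalent sketch with an oblique rotation only changes the rotation argument (again using the third-party factor_analyzer package, and continuing from the last sketch; for oblique rotations I am assuming its loadings_ attribute corresponds to the pattern matrix):

import pandas as pd
from factor_analyzer import FactorAnalyzer

fa_oblique = FactorAnalyzer(n_factors=4, rotation="oblimin", method="principal")
fa_oblique.fit(X)                       # X: the var01-var20 items from the last sketch

pattern = pd.DataFrame(fa_oblique.loadings_, index=trait_items,
                       columns=["Factor1", "Factor2", "Factor3", "Factor4"])
print(pattern.where(pattern.abs() >= 0.3).round(2))
# To try the three-factor solution suggested in section [iii], set n_factors=3.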
Look at the factors and the loadings in the pattern matrix (concentrate on
loadings greater than +/- 0.3). Do they look the same as the varimax solution?
One thing that has changed is that, although the factors look similar, the
loadings will have changed a bit, and not all load in the same way as before. For
example, var02 (“These days a person doesn’t really know quite who he can
count on”) no longer loads on factor 3, but only on factor 4. How does this
change the interpretation of factors 3 and 4?
Finally comes the factor correlation matrix. In a varimax solution, you can usually
ignore the plus or minus signs in front of the factor loadings, because the factors
are entirely independent of one another. However, in oblique (oblimin)
analyses, we have to take these into account because the factors correlate with
one another, and therefore we need to know if there is a positive or a negative
relationship between factors. The relationship between correlated factors must
inherently take into account the sign of the loadings. In this example, the
negative correlations are so small as to be unimportant (correlations less than
0.1 are usually non-significant), and so this is not an issue. However, you
should be aware that this may not always be the case. It may seem confusing at
first, but working out the logic behind the relationships between factors makes
sense when you look at the variable items that represent the factors.
[iii] Extracting a specific number of factors
Up to now, you have been letting SPSS decide how many factors to extract, and
it has been using the default criterion (called the ‘Kaiser’ criterion) of extracting
factors with eigenvalues greater than 1. Look at the second table in your output:
four factors have eigenvalues greater than 1, so SPSS extracts and rotates four
factors. However, this criterion doesn’t always guarantee the optimal solution.
We may have an idea of how many factors we should extract – the scree plot
can give some heavy hints (as mentioned earlier). The scree plot is not exact –
there is a degree of judgement in drawing these lines and judging where the
major change in slope comes – but with larger samples it is usually pretty reliable.
I reckon that three factors would lead to a better solution than four, so try
running the analysis again, but this time specify that you want a 3-factor
solution by setting the Number of factors to extract to 3 in the Extraction
options window. The solution fits quite well, with all variable items loading
quite high on only one factor, while still explaining a fairly high proportion of
variance (46.8%), thus revealing a good simple structure.

9. Where to go from here?
Once the EFA procedures have identified the variables that load on each factor,
we can use the scale-building and reliability procedures described earlier in
these exercises to produce internally-reliable scales which we may then use to
describe differences between people. You could use an unrelated samples t-test
to look at the differences between genders in the factor 2 ‘Thrill-Sedate’ items,
or ANOVA to investigate differences between age-groups in the factor 1 ‘future
orientation’ items.

Cris Burgess (2006)