Ninness, C., Lauter, J., Coffee, M., Clary, L., Kelly, E., Rumph, M., Rumph, R., Kyle, R., & Ninness, S. (2012). Behavioral and Biological N [title truncated in original]

EPS 651 Multivariate Analysis
Factor Analysis, Principal Components Analysis, and Neural Network Analysis (Self-Organizing Maps)

For next week: Continue with T&F Chapter 13 (through 13.5.3, page 642) and please read the study cited above, posted on our webpage.

Several slides are based on material from the UCLA SPSS Academic Technology Services:
http://www.ats.ucla.edu/stat/spss/output/factor1.htm

FA and PCA: Data reduction methods

Principal components analysis (PCA) and factor analysis (FA) are methods of data reduction. Suppose that you have a dozen variables that are correlated. You might use principal components analysis to reduce your 12 measures to a few principal components. For example, you may be most interested in obtaining the component scores (which are variables added to your data set) and/or in looking at the dimensionality of the data. If two components are extracted and those two components account for 68% of the total variance, then we would say that two dimensions in the component space account for 68% of the variance. Unlike factor analysis, principal components analysis is not usually used to identify underlying latent variables. [direct quote from http://www.ats.ucla.edu/stat/spss/output/factor1.htm]

If raw data are used, the procedure will create the original correlation matrix or covariance matrix, as specified by the user. If the correlation matrix is used, the variables are standardized and the total variance will equal the number of variables used in the analysis (because each standardized variable has a variance equal to 1). If the covariance matrix is used, the variables remain in their original metric; however, one must take care to use variables whose variances and scales are similar. Unlike factor analysis, which analyzes the common variance, principal components analysis analyzes the total variance. Also, principal components analysis assumes that each original measure is collected without measurement error. [direct quote from http://www.ats.ucla.edu/stat/spss/output/factor1.htm]

Spin Control

Factor analysis is also a method of data reduction – forgiving relative to PCA. Factor analysis seeks to find underlying, unobservable (latent) variables that are reflected in the observed (manifest) variables. There are many different methods that can be used to conduct a factor analysis (such as principal axis factoring, maximum likelihood, generalized least squares, and unweighted least squares). There are also many different types of rotations that can be done after the initial extraction of factors, including orthogonal rotations, such as varimax and equamax, which impose the restriction that the factors cannot be correlated, and oblique rotations, such as promax, which allow the factors to be correlated with one another. You also need to determine the number of factors that you want to extract. Given the number of factor analytic techniques and options, it is not surprising that different analysts could reach very different results analyzing the same data set. However, all analysts are looking for simple structure: a pattern of results such that each variable loads highly onto one and only one factor. [direct quote from http://www.ats.ucla.edu/stat/spss/output/factor1.htm]
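If you want to see the correlation-versus-covariance point above in action outside SPSS, here is a minimal NumPy sketch. The data are made up purely for illustration: after standardizing, every variable has variance 1, so the total variance equals the number of variables.

```python
# Minimal sketch (made-up data, not the ski data) of the correlation-vs-covariance point:
# standardized variables each have variance 1, so total variance = number of variables.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=[50, 100, 0], scale=[5, 20, 1], size=(200, 3))  # 3 variables on very different scales

cov = np.cov(X, rowvar=False)       # covariance matrix: variables keep their original metric
cor = np.corrcoef(X, rowvar=False)  # correlation matrix: variables are standardized

print(np.diag(cov))   # three very different variances
print(np.diag(cor))   # [1. 1. 1.]
print(np.trace(cor))  # 3.0 = number of variables
```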
FA vs. PCA conceptually

FA produces factors; PCA produces components. In the usual path diagram, the FA factor points to the indicators (I1, I2, I3), whereas in PCA the indicators (I1, I2, I3) point to the component.

Kinds of research questions re PCA and FA
• What does each factor mean? Interpretation? Your call.
• What is the percentage of variance in the data accounted for by the factors? SPSS & psyNet will show you.
• Which factors account for the most variance? SPSS & psyNet.
• How well does the factor structure fit a given theory? Your call.
• What would each subject's score be if they could be measured directly on the factors? Excellent question!

Before you can even start to answer these questions using FA: the KMO measure should be > .6, and Bartlett's test significance should be < .05.

Kaiser-Meyer-Olkin Measure of Sampling Adequacy – This measure varies between 0 and 1, and values closer to 1 are better. A value of .6 is a suggested minimum. It answers the question: is there enough data relative to the number of variables?

Bartlett's Test of Sphericity – This tests the null hypothesis that the correlation matrix is an identity matrix: a matrix in which all of the diagonal elements are 1 and all off-diagonal elements are 0. Ostensibly, you want to reject this null hypothesis. This, of course, is psychobabble. Taken together, these two tests provide a minimum standard that should be passed before a factor analysis (or a principal components analysis) is conducted. A sketch of Bartlett's test computed by hand appears after the ski data below.

What is a common factor? It is an abstraction, a "hypothetical construct," that relates at least two of our measured variables to one another. In FA, psychometricians / statisticians try to estimate the common factors that contribute to the variance in a set of variables. Is this an act of logical inference, a creation, or a figment of a psychometrician's imagination? Depends on who you ask.

What is a unique factor? It is a factor that contributes to the variance in only one variable. There is one unique factor for each variable. The unique factors are unrelated to one another and unrelated to the common factors. We want to exclude these unique factors from our solution. Seems reasonable... right?

Assumptions
• Factor analysis needs large samples; this is one of its few drawbacks.
• The more reliable the correlations are, the smaller the number of subjects needed.
• You need enough subjects for stable estimates. How many is enough?
• Take-home hint: 50 is very poor, 100 poor, 200 fair, 300 good, 500 very good, and 1000+ excellent. Shoot for a minimum of 300, usually.
• The more highly correlated the markers, the fewer subjects needed.
• No outliers: their obvious influence on the correlations would bias the results.
• Multicollinearity: in PCA it is not a problem, because no matrix inversion is required. In FA, if det(R) or any eigenvalue approaches 0, multicollinearity is likely.

The above assumptions at work: note that the metric for all of these variables is the same (since they employed a rating scale). So do we run the FA on the correlation matrix or the covariance matrix, and does it matter?

Sample data set from Chapter 13 (p. 617) of Tabachnick and Fidell, Principal Components and Factor Analysis:

Skiers   Cost   Lift   Depth   Powder
S1        32     64     65      67
S2        61     37     62      65
S3        59     40     45      43
S4        36     62     34      35
S5        62     46     43      40

Keep in mind, multivariate normality is assumed when statistical inference is used to determine the number of factors. The above data set is far too small to fulfill the normality assumption. However, even large data sets frequently violate this assumption and compromise the analysis. Multivariate normality also implies that relationships among pairs of variables are linear.
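For those who want to check the KMO/Bartlett panel by hand, here is a rough sketch of Bartlett's test of sphericity for the ski data, assuming the standard chi-square approximation chi2 = -[(n - 1) - (2p + 5)/6] * ln(det(R)) with p(p - 1)/2 degrees of freedom. With only five skiers this is purely illustrative.

```python
# Rough sketch of Bartlett's test of sphericity for the ski data, assuming the
# standard chi-square approximation (n = 5 is far too small for this to mean much):
#   chi2 = -[(n - 1) - (2p + 5) / 6] * ln(det(R)),  df = p(p - 1) / 2
import numpy as np
from scipy.stats import chi2

X = np.array([[32, 64, 65, 67],    # S1: Cost, Lift, Depth, Powder
              [61, 37, 62, 65],    # S2
              [59, 40, 45, 43],    # S3
              [36, 62, 34, 35],    # S4
              [62, 46, 43, 40]], dtype=float)  # S5

n, p = X.shape
R = np.corrcoef(X, rowvar=False)   # reproduces the correlation matrix shown below
stat = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
print(np.round(R, 6))
print("chi-square:", stat, " df:", df, " p:", chi2.sf(stat, df))
```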
The analysis is degraded when linearity fails, because correlation measures linear relationships and does not reflect nonlinear ones. Linearity among variables is assessed through visual inspection of scatterplots.

Equations – Extraction – Components

Correlation matrix (with 1s on the diagonal):

          Cost        Lift        Depth       Powder
Cost      1          -0.952990   -0.055276   -0.129999
Lift     -0.952990    1          -0.091107   -0.036248
Depth    -0.055276   -0.091107    1           0.990174
Powder   -0.129999   -0.036248    0.990174    1

There is a large correlation between Cost and Lift and another between Depth and Powder. Looks like two possible factors – why? Are you sure about this?

L = V'RV
(eigenvalue matrix = transposed eigenvector matrix * correlation matrix * eigenvector matrix)

We are reducing to a few factors which duplicate the matrix? Does this seem reasonable?

Equations – Extraction – Obtaining components

With a two-by-two matrix we derive two eigenvalues, with two eigenvectors each containing two elements. With a four-by-four matrix we derive four eigenvalues, with eigenvectors each containing four elements.

L = V'RV, where L is the eigenvalue matrix and V is the eigenvector matrix. It is important to know how L is constructed. This diagonalizes the R matrix and reorganizes the variance into eigenvalues, so a 4 x 4 matrix can be summarized by 4 numbers instead of 16.

Remember this? For the matrix

    | 5  4 |
    | 1  2 |

the characteristic equation comes from setting the determinant of (R - λI) to zero:

    (5 - λ)(2 - λ) - (4)(1) = 0
    λ² - 7λ + 6 = 0

With a four-by-four matrix it simply becomes a longer polynomial. This is an equation of the second degree with two roots (the eigenvalues). Using the quadratic formula with a = 1, b = -7, and c = 6:

    λ = [-b ± sqrt(b² - 4ac)] / (2a)
    λ = [7 + sqrt(49 - 24)] / 2 = 6
    λ = [7 - sqrt(49 - 24)] / 2 = 1

From eigenvalues to eigenvectors: R = VLV'.

Equations – Extraction – Obtaining components (SPSS matrix output)

The ski data (Skiers S1–S5 on Cost, Lift, Depth, and Powder, shown above) are used to obtain L, the eigenvalue matrix, from V', our original correlation matrix R, and V. Careful here: 1.91 is correct, but it appears as a "2" in the text.

Other than that magic "2," this is a decent example: with an eigenvalue of 1.91, we have "extracted" two factors from four variables using a small data set.

Following SPSS extraction and rotation and all that jazz... in this case there is not much difference (other data sets show big changes):

          Factor 1   Factor 2
Cost      -0.401      0.907
Lift       0.251     -0.954
Depth      0.933      0.351
Powder     0.957      0.288

Here we see that Factor 1 is mostly Depth and Powder (a Snow Condition factor), and Factor 2 is mostly Cost and Lift, which is a Resort factor. Both factors have complex loadings.

Using SPSS 12, SPSS 20, and psyNet.SOM: this is a variation on your homework. Just use your own numbers and replicate the process.
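If you would rather let the computer do the diagonalization, here is a short NumPy sketch of the extraction step L = V'RV: first the 2 x 2 "remember this?" matrix (eigenvalues 6 and 1), then the 4 x 4 ski correlation matrix.

```python
# NumPy sketch of the extraction step L = V'RV.
# First the 2 x 2 "remember this?" matrix, whose characteristic equation
# lambda^2 - 7*lambda + 6 = 0 gives eigenvalues 6 and 1; then the 4 x 4 ski
# correlation matrix R.
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])
print(np.linalg.eigvals(A))        # eigenvalues 6 and 1 (order may vary)

X = np.array([[32, 64, 65, 67],
              [61, 37, 62, 65],
              [59, 40, 45, 43],
              [36, 62, 34, 35],
              [62, 46, 43, 40]], dtype=float)
R = np.corrcoef(X, rowvar=False)

vals, V = np.linalg.eigh(R)        # R is symmetric; eigh returns eigenvalues in ascending order
L = V.T @ R @ V                    # L = V'RV is diagonal (up to rounding error)
print(np.round(np.diag(L), 4))     # the four eigenvalues; only two exceed 1
print(np.allclose(R, V @ np.diag(vals) @ V.T))  # R = VLV' reconstructs the matrix -> True
```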
(We may use this hypothetical data as part of a study.)

Here is an easier way than doing it by hand. Arrange the ski data (Skiers S1–S5 on Cost, Lift, Depth, and Powder, as shown above) in Excel format, then in SPSS 12 or SPSS 20:

• Select Data Reduction.
• Select the variables and the Descriptives you want.
• Start with a basic run using Principal Components: either eigenvalues over 1 or a fixed number of factors.
• Select Varimax rotation.
• Under Options, select "exclude cases listwise" and "sorted by size."
• Under Scores, select "Save as variables" and "Display factor score coefficient matrix."

Watch what pops out of your oven – a real time saver.

Matching the psyNet PCA correlation matrix with the SPSS FA output: this part is the same, but the rest of PCA goes in an entirely different direction.

Kaiser's measure of sampling adequacy: values of .6 and above are required for a good FA. Remember these guys? An MSA of .9 is marvelous; .4 is not too impressive – hey, it was a small sample. Normally, variables with small MSAs should be deleted.

Looks like two factors can be isolated/extracted. Which ones, and what shall we call them? Here they are again – they have eigenvalues > 1. We are reducing to a few factors which duplicate the matrix? Fairly close.

Rotations – nice hints here: SPSS will provide an orthogonal rotation without your help – look at the iterations.

Extraction, Rotation, and Meaning of Factors
• Orthogonal rotation (assumes no correlation among the factors): the loading matrix gives the correlation between each variable and the factor.
• Oblique rotation (assumes possible correlations among the factors): the factor correlation matrix gives the correlations between the factors, and the structure matrix gives the correlations between factors and variables. Oblique rotations – fun, but not today.

Factor extraction is usually followed by rotation in order to maximize large correlations and minimize small correlations. Rotation usually increases simple structure and interpretability. The most commonly used is the varimax variance-maximizing procedure, which maximizes the variance of the factor loadings.

Rotating your axes "orthogonally" ~ sounds painfully chiropractic. Where are your components located on these graphs? What are the upper and lower limits on each of these axes? Cost and Lift may be a factor, but they are polar opposites.

Abbreviated Equations

The factor weight matrix B is found by "dividing" the loading matrix A by the correlation matrix R, that is, multiplying A by the inverse of R (see the matrix output):

    B = R⁻¹A

Factor scores F are found by multiplying the standardized scores Z for each individual by the factor weight matrix B and adding them up:

    F = ZB

The specific goals of PCA or FA are to summarize patterns of correlations among observed variables, to reduce a large number of observed variables to a smaller number of factors, to provide an operational definition (a regression equation) for an underlying process by using observed variables, and to test a theory about the nature of underlying processes.

You can also estimate what each subject would score on the standardized variables:

    Z = FA'

This is a revealing procedure that is often overlooked. A short sketch of these equations appears below.
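Here is a sketch of the abbreviated equations in NumPy, plugging in the rotated loadings quoted above as the matrix A (an assumption on my part; SPSS's own standardization and rotation will make its printed factor scores differ somewhat, so treat the output as approximate).

```python
# Sketch of the abbreviated equations B = R^{-1}A and F = ZB, plugging in the
# rotated loadings quoted above as A (an assumption; SPSS's own standardization
# and rotation will make its factor scores differ somewhat).
import numpy as np

X = np.array([[32, 64, 65, 67],
              [61, 37, 62, 65],
              [59, 40, 45, 43],
              [36, 62, 34, 35],
              [62, 46, 43, 40]], dtype=float)     # rows S1..S5; columns Cost, Lift, Depth, Powder

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized scores (sample SD)
R = np.corrcoef(X, rowvar=False)                  # correlation matrix

A = np.array([[-0.401,  0.907],                   # loading matrix from the slide:
              [ 0.251, -0.954],                   # rows = Cost, Lift, Depth, Powder
              [ 0.933,  0.351],                   # columns = Factor 1, Factor 2
              [ 0.957,  0.288]])

B = np.linalg.solve(R, A)   # factor weight matrix, B = R^{-1}A (solve avoids forming the inverse)
F = Z @ B                   # factor scores for the five skiers
Z_hat = F @ A.T             # Z is approximately F A': reproduced standard scores
print(np.round(F, 3))
print(np.round(Z_hat, 3))
```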
Standardized variables as factors

Predictions based on factor analysis – standard scores for the five skiers on each factor:

Skiers   Factor 1    Factor 2    Factor 3
S1        1.1447      1.18534     0.39393
S2        0.96637    -0.90355    -0.59481
S3       -0.41852    -0.70694    -0.73794
S4       -1.11855     0.98342    -0.64991
S5       -0.574      -0.55827     1.58873

Factor 2 is interesting stuff... but what about Cost? And what is Factor 3 supposed to represent?

SOM Classification of Ski Data

Transpose the data before saving it as a CSV file (transpose to analyze by class/factors): 4 rows (the variables) by 5 columns (the skiers) in CSV format.

Raw and standardized (z-score) ski data:

Skiers   Cost   Lift   Depth   Powder     zCost       zLift       zDepth      zPowder
S1        32     64     65      67       -1.36772     1.27029     1.285737    1.275638
S2        61     37     62      65        0.835832   -1.14505     1.031973    1.125563
S3        59     40     45      43        0.683862   -0.87668    -0.40602    -0.52526
S4        36     62     34      35       -1.06379     1.091376   -1.33649    -1.12556
S5        62     46     43      40        0.911816   -0.33994    -0.5752     -0.75038

[SOM output plots:]
SOM classification 1: Depth and Powder across the 5 skiers – a nice match with FA Factor 1.
SOM classification 2: Cost across the 5 skiers – a Cost class/factor??
SOM classification 3: Lift across the 5 skiers – a Lift class/factor, a near match with FA Factor 2.

Factor 1 appears to address Depth and Powder (SOM classification 1 is a nice match with FA Factor 1). This could be placed into a logistic regression and predict with reasonable accuracy. Factor 2 appears to address Lift (SOM classification 3). Factor 3: ?? (compare SOM classification 2, Cost).

The Iris data (UCI Center for Machine Learning and Intelligent Systems): Iris setosa, Iris versicolour, and Iris virginica. This is a foundational data set for multivariate statistics and machine learning. Transpose the data before saving as a CSV file (transpose to analyze by class/factors): 4 rows by 150 columns in CSV format.

[Plots: Factor Analysis Factor 1 and Factor 2, and SOM Neural Network Class 1 and Class 2, each plotted against sepal length (cm), sepal width (cm), petal length (cm), and petal width (cm), with Factor 1 shown alongside SOM Class 1.]

This could be placed into a logistic regression and predict with near perfect accuracy. Really?? Look at the original. Everybody but psychologists seems to understand this.
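psyNet.SOM is not something you can pip-install, so here is a from-scratch sketch of a tiny one-dimensional self-organizing map trained on the standardized ski data. The grid size, learning rate, neighborhood width, and iteration count are arbitrary choices, and with only five cases the class assignments can shift from run to run; it is meant only to illustrate the kind of classification shown above.

```python
# From-scratch sketch of a tiny 1-D self-organizing map on the standardized ski
# data. psyNet.SOM is not publicly available, so this only illustrates the idea;
# the grid size (3 units), learning rate, neighborhood width, and iteration
# count are arbitrary choices, and with only 5 cases results vary by seed.
import numpy as np

raw = np.array([[32, 64, 65, 67],
                [61, 37, 62, 65],
                [59, 40, 45, 43],
                [36, 62, 34, 35],
                [62, 46, 43, 40]], dtype=float)   # S1..S5 x (Cost, Lift, Depth, Powder)

# z-scores with the population SD (ddof=0), which reproduces the standardized
# values in the table above (e.g., Cost for S1 = -1.36772)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

rng = np.random.default_rng(0)
n_units = 3
weights = rng.normal(size=(n_units, z.shape[1]))  # one weight vector per map unit

for t in range(200):                               # simple online training loop
    x = z[rng.integers(len(z))]                    # pick a random skier
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # best-matching unit
    lr = 0.5 * np.exp(-t / 100)                    # decaying learning rate
    sigma = 1.0 * np.exp(-t / 100)                 # decaying neighborhood width
    grid_dist = np.abs(np.arange(n_units) - bmu)   # distance along the 1-D grid
    h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)     # pull the BMU (and neighbors) toward x

# assign each skier to the unit (class) with the closest weight vector
classes = [int(np.argmin(np.linalg.norm(weights - x, axis=1))) for x in z]
print(dict(zip(["S1", "S2", "S3", "S4", "S5"], classes)))
```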