Factor Analysis
Assistant Professor Dr. Sangdao Wongsai
Faculty of Science and Technology, Thammasat University, Rangsit Campus

Handouts today
- What is factor analysis?
- Applications of factor analysis
- Steps in factor analysis
- An example

What is factor analysis?
Factor analysis is a data reduction tool that
- describes the interrelationships among the variables;
- represents the correlated variables with a smaller set of "natural grouping" variables (i.e. derived factors) that are relatively independent of one another;
- removes redundancy or duplication from a set of correlated variables;
- assists the theoretical interpretation of complex datasets.

Differences between MLR, PCA and FA
- Multiple regression analysis is a method that explains the total variability of a dependent (response) variable using independent (predictor) variables.
- Principal component analysis is a technique that selects the m components that explain as much of the total variance in a set of high-dimensional variables as possible.
- Factor analysis is a method that selects the m underlying factors that capture the common variance shared among the original (measured) variables.

Applications of factor analysis
- Identification of underlying factors: create new variables (i.e. factors) that describe the correlations among the observed variables.
- Sampling of variables: select a small set of representative variables from a larger set.
- Uses in further analyses: reduce the multicollinearity problem in regression analysis; group variables into homogeneous sets in cluster analysis.

Steps in factor analysis
1. Collect data or obtain them from a database
2. Explore the data
3. Obtain the correlation matrix
4. Select an appropriate number of factors
5. Select an estimation method
6. If necessary, drop variable(s) and repeat steps 3 to 6
7. If necessary, rotate the factors
8. Interpret the (rotated) factors

As a flowchart: collect data → explore data → obtain the correlation matrix → select an appropriate number of factors → select an appropriate estimation method → if needed, drop variable(s) and return to the correlation matrix → rotate factors, if needed → interpret factors.

An example: Water quality
Twelve water quality variables were measured monthly between 2001 and 2007 from three reservoirs in NSW, Australia. Seventy-one observations were recorded.
Research question: Are there any underlying natural groupings (i.e. factors) in the water quality dataset?

[Correlation matrix of the 12 variables: not reproduced here.]

Factor analysis
If the pattern of high and low correlations in the correlation matrix is such that the variables in a particular subset have high correlations among themselves but low correlations with all the other variables, then there may be a single underlying factor that gives rise to the variables in that subset. If the other variables can be similarly grouped into subsets with a like pattern of correlations, then a few factors can represent these groups of variables.

Factor analysis
Three factors are retained. The values in the factor1-factor3 columns are the factor loadings; the last column is the uniqueness of each variable. If any variable has a uniqueness > 0.7, it will be dropped from the factor analysis.

variable   factor1   factor2   factor3   uniqueness
Sec           0.07     -0.21      0.02         0.95
SiO2          0.11      0.93     -0.14         0.15
Con          -0.20      0.16     -0.38         0.74
Tem          -0.96     -0.01     -0.02         0.07
DO            0.93      0.10     -0.09         0.19
pH            0.55     -0.42     -0.06         0.46
NO3           0.41      0.49      0.41         0.40
NH4          -0.11     -0.10      0.38         0.86
TP           -0.16      0.23      0.36         0.81
TN           -0.02     -0.22      0.98         0.01
Fe            0.02      0.68     -0.30         0.45
OC           -0.12     -0.01      0.20         0.96

Questions to answer in what follows:
- How many factors?
- What is a factor loading?
- What is uniqueness?
- How do we calculate these values?
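As an aside, an analysis like this can be sketched in a few lines of Python. The sketch below is illustrative only: the file name water_quality.csv and the use of the third-party factor_analyzer package are assumptions, not part of the handout.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

water = pd.read_csv("water_quality.csv")   # hypothetical file: 71 rows, 12 columns

R = water.corr()                           # step 3: correlation matrix

# steps 4-5: three factors, maximum likelihood estimation, no rotation yet
fa = FactorAnalyzer(n_factors=3, method="ml", rotation=None)
fa.fit(water)

loadings = pd.DataFrame(fa.loadings_, index=water.columns,
                        columns=["factor1", "factor2", "factor3"])
loadings["uniqueness"] = fa.get_uniquenesses()
print(loadings.round(2))                   # a table like the one above
```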
Factor analysis
Five variables with high uniqueness (Sec, Con, NH4, TP and OC) are dropped from the factor analysis. Then, factor analysis is performed on the correlation matrix of the remaining variables. Each variable now has a uniqueness of less than 0.7.

variable   factor1   factor2   factor3   uniqueness
Tem          -0.89      0.00     -0.18         0.09
DO            0.92      0.13      0.04         0.16
pH            0.62     -0.38     -0.19         0.41
SiO2          0.03      0.87      0.18         0.24
Fe            0.03      0.77     -0.07         0.41
NO3           0.19      0.34      0.82         0.16
TN           -0.08     -0.46      0.65         0.36

Interpretation of derived factors
- Factor 1 accounts for physical characteristics, comprising positive loadings for water pH and dissolved oxygen, and a negative loading for temperature.
- Factor 2 reflects mineral elements, consisting of positive loadings for silicate and filterable iron.
- Factor 3 characterises nitrogen elements, comprising positive loadings for total nitrogen and nitrates.
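The drop-and-refit cycle (steps 3 to 6) used in this example can be sketched as a small loop, under the same assumptions as the earlier sketch (the factor_analyzer package and a hypothetical CSV file):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

def fit_dropping_high_uniqueness(data, n_factors=3, cutoff=0.7):
    """Refit the factor model, dropping variables with uniqueness > cutoff."""
    cols = list(data.columns)
    while True:
        fa = FactorAnalyzer(n_factors=n_factors, method="ml", rotation=None)
        fa.fit(data[cols])
        uniq = dict(zip(cols, fa.get_uniquenesses()))
        too_unique = [c for c, u in uniq.items() if u > cutoff]
        if not too_unique:            # every uniqueness <= cutoff: stop
            return fa, cols
        cols = [c for c in cols if c not in too_unique]

water = pd.read_csv("water_quality.csv")        # hypothetical file, as before
fa, kept = fit_dropping_high_uniqueness(water)
print(kept)   # for the water data this should leave the seven variables above
```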
FA: Shared variance and error
The total variance of each $X_j$ decomposes as: total variance = communality + uniqueness. For example, suppose we fit the factor analysis model to the correlation matrix of a data set of five variables $X_1, \ldots, X_5$, and two factors $F^1$ and $F^2$ are created by maximum likelihood estimation. Then $\mathrm{Var}(X_j) = h_j^2 + u_j^2$, where the communality $h_j^2 = (l_j^1)^2 + (l_j^2)^2$ collects the squared loadings and $u_j^2$ is the uniqueness. In the accompanying diagram (not reproduced), factor 1 represents the correlation of the variables $X_1$, $X_4$ and $X_5$, since they load more highly on it than on the other factor, while factor 2 comprises the correlation of the variables $X_2$ and $X_3$. These terms are defined formally below.

Factor analysis model
A data matrix $X$ comprises $n$ observations in rows and $p$ variables in columns, where $n > p$. Each element $x_{ij}$ represents observation $i$ for variable $j$:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

Factor analysis model
Then, a correlation matrix of the original variables is generated. The correlation matrix can also be interpreted as the covariance matrix of the standardized variables. It is scale invariant: it does not change if we change the units of measurement.

Factor analysis model
The correlation coefficient is a measure of both direction and strength. Its magnitude measures the strength: the closer it is to 1 or -1, the stronger the relationship. If the correlation coefficient is > 0, there is a positive relationship between the two variables; if it is < 0, the relationship is negative.

Factor analysis model
The factor analysis model expresses each variable as a linear combination of underlying common factors with an accompanying error term,

$$X = \Lambda F + \varepsilon$$

where
- $X$ is the vector of observed variables $X_j$, $j = 1, 2, \ldots, p$;
- $F$ is the vector of factors $F^k$, $k = 1, 2, \ldots, m$ ($m < p$);
- $\Lambda$ is the $p \times m$ matrix of coefficients of the factors $F^k$ on the variables $X_j$, the so-called factor loadings;
- $\varepsilon$ is the vector of random errors of the variables $X_j$.

Factor analysis model
Written out in matrix form,

$$\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} =
\begin{pmatrix} l_1^1 & l_1^2 & \cdots & l_1^m \\ l_2^1 & l_2^2 & \cdots & l_2^m \\ \vdots & \vdots & & \vdots \\ l_p^1 & l_p^2 & \cdots & l_p^m \end{pmatrix}
\begin{pmatrix} F^1 \\ F^2 \\ \vdots \\ F^m \end{pmatrix} +
\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_p \end{pmatrix}$$

where
- $X_j$ is the $j$th observed variable, $j = 1, 2, \ldots, p$;
- $l_j^k$ is the loading of the $j$th variable on the $k$th factor, $k = 1, 2, \ldots, m$ ($m < p$);
- $F^k$ is the $k$th factor;
- $e_j$ is the random error of the $j$th variable.

Factor analysis model
For $p$ variables and $m$ common factors, the model written variable by variable is

$$\begin{aligned}
X_1 &= l_1^1 F^1 + l_1^2 F^2 + \cdots + l_1^m F^m + e_1 \\
X_2 &= l_2^1 F^1 + l_2^2 F^2 + \cdots + l_2^m F^m + e_2 \\
&\;\vdots \\
X_p &= l_p^1 F^1 + l_p^2 F^2 + \cdots + l_p^m F^m + e_p
\end{aligned}$$

Factor analysis model
Note that the number of factors, $m$, is less than the number of original variables, $p$. The $m$ common factors are assumed to have zero means and unit variances (variance = 1). This is because the factor analysis model is usually carried out on the correlation matrix of the original variables.

Factor analysis model
Each variable is standardized to have a mean of 0, and the variance of each standardized variable is 1. Thus the total variance of all the standardized variables, being the sum of their variances, equals the number of original variables, $p$. The variance of each standardized variable is composed of a part due to the $m$ common factors and a part due to its own random error. The total variance of all the standardized variables is therefore the sum of the variance explained by the $m$ common factors and the variance from the random errors of the $p$ original variables.

Factor analysis model
The factor loading $l_j^k$ is the loading of $X_j$ on $F^k$.
- For the $k$th factor, the sum of squared (ss) factor loadings over all $p$ variables is the variance of the $k$th factor. It specifies the amount of variance in the standardized data that can be explained by the $k$th factor, and is called the common variance of the $k$th factor.
- For the $j$th variable, the sum of squared factor loadings over the $m$ common factors is the variance of the $j$th variable accounted for by the $m$ factors. It is called the communality of the $j$th variable, $h_j^2$.

Schematically, squaring the entries of the loading matrix and summing across row $j$ gives the communality $h_j^2$, while summing down column $k$ gives the common variance $ss_k$:

$$\begin{array}{c|cccc|c}
 & F^1 & F^2 & \cdots & F^m & \\
\hline
X_1 & (l_1^1)^2 & (l_1^2)^2 & \cdots & (l_1^m)^2 & h_1^2 \\
X_2 & (l_2^1)^2 & (l_2^2)^2 & \cdots & (l_2^m)^2 & h_2^2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
X_p & (l_p^1)^2 & (l_p^2)^2 & \cdots & (l_p^m)^2 & h_p^2 \\
\hline
 & ss_1 & ss_2 & \cdots & ss_m &
\end{array}$$
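To make the model concrete, the following sketch simulates data from a known two-factor model with invented loadings and checks the standard identity that the correlation matrix is approximately $\Lambda \Lambda^\top + \Psi$, where $\Psi$ is the diagonal matrix of uniquenesses:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 5000, 5, 2

# Invented loading matrix Lambda (p x m), chosen so that each row satisfies
# h_j^2 + u_j^2 = 1, as for standardized variables.
L = np.array([[0.9, 0.0],
              [0.0, 0.8],
              [0.1, 0.7],
              [0.8, 0.1],
              [0.7, 0.2]])
psi = 1.0 - (L**2).sum(axis=1)                    # uniqueness u_j^2 = 1 - h_j^2

F = rng.standard_normal((n, m))                   # factors: mean 0, variance 1
e = rng.standard_normal((n, p)) * np.sqrt(psi)    # errors with variance psi
X = F @ L.T + e                                   # the model X = Lambda F + e

R = np.corrcoef(X, rowvar=False)                  # sample correlation matrix
print(np.round(R - (L @ L.T + np.diag(psi)), 2))  # approximately zero
```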
Communality and uniqueness
Total variance = communality + uniqueness.

The communality $h_j^2$ is the proportion of the variance of each $X_j$ accounted for by the $m$ common factors. It is the sum of squared loadings of the $j$th variable over the $m$ factors:

$$h_j^2 = \sum_{k=1}^{m} (l_j^k)^2 = (l_j^1)^2 + (l_j^2)^2 + \cdots + (l_j^m)^2$$

The uniqueness $u_j^2$ is the unexplained variance of each $X_j$, due to the random errors; it indicates how distinctive (specific) each $X_j$ is from the remaining variables:

$$u_j^2 = 1 - h_j^2 = 1 - \left[(l_j^1)^2 + (l_j^2)^2 + \cdots + (l_j^m)^2\right]$$

Variance of the jth variable
The communality of the $j$th variable is the sum of squared loadings along row $j$:

$$\text{for } j = 1: \;\; \sum_{k=1}^{m} (l_1^k)^2 = (l_1^1)^2 + (l_1^2)^2 + \cdots + (l_1^m)^2$$
$$\text{for } j = 2: \;\; \sum_{k=1}^{m} (l_2^k)^2 = (l_2^1)^2 + (l_2^2)^2 + \cdots + (l_2^m)^2$$
$$\vdots$$
$$\text{for } j = p: \;\; \sum_{k=1}^{m} (l_p^k)^2 = (l_p^1)^2 + (l_p^2)^2 + \cdots + (l_p^m)^2$$

The sum of the communalities equals the cumulative part of the total variance in the standardized variables that can be accounted for by the $m$ common factors:

$$\sum_{j=1}^{p} h_j^2 = \sum_{j=1}^{p} \sum_{k=1}^{m} (l_j^k)^2$$

Variance of the jth variable
The proportion of the total variance in the standardized variables accounted for by the $m$ common factors equals the sum of the communalities divided by the total variance of the standardized data (which equals the number of variables, $p$):

$$\frac{\sum_{j=1}^{p} h_j^2}{p}$$

Variance of the kth factor
The common variance of the $k$th factor is the sum of squared loadings down column $k$:

$$\text{for } k = 1: \;\; \sum_{j=1}^{p} (l_j^1)^2 = (l_1^1)^2 + (l_2^1)^2 + \cdots + (l_p^1)^2$$
$$\text{for } k = 2: \;\; \sum_{j=1}^{p} (l_j^2)^2 = (l_1^2)^2 + (l_2^2)^2 + \cdots + (l_p^2)^2$$
$$\vdots$$
$$\text{for } k = m: \;\; \sum_{j=1}^{p} (l_j^m)^2 = (l_1^m)^2 + (l_2^m)^2 + \cdots + (l_p^m)^2$$

The sum of the common variances equals the cumulative part of the total variance in the standardized variables that can be explained by the $m$ common factors:

$$\sum_{k=1}^{m} \sum_{j=1}^{p} (l_j^k)^2$$

Variance of the kth factor
The proportion of the total variance in the standardized variables explained by the $m$ common factors equals the sum of the common variances divided by the total variance of the standardized data ($= p$):

$$\frac{\sum_{k=1}^{m} \sum_{j=1}^{p} (l_j^k)^2}{p}$$

The shared variance of the m factors
The sum of the common variances explained by the $m$ factors equals the sum of the communalities of the standardized variables. This sum is called the shared variance of the $m$ factors:

$$\sum_{j=1}^{p} h_j^2 = \sum_{k=1}^{m} \sum_{j=1}^{p} (l_j^k)^2$$

The shared variance of the m factors
The proportion of the shared variance contributed by the $k$th factor equals the common variance explained by the $k$th factor divided by the shared variance of the $m$ factors:

$$\frac{\sum_{j=1}^{p} (l_j^k)^2}{\sum_{j=1}^{p} h_j^2} = \frac{\sum_{j=1}^{p} (l_j^k)^2}{\sum_{k=1}^{m} \sum_{j=1}^{p} (l_j^k)^2}$$
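All of these quantities are simple row and column operations on the loading matrix. The sketch below applies them to the seven-variable loading table from the water quality example (loadings copied from the table above; small discrepancies from the printed uniquenesses are due to rounding and the ML fit):

```python
import numpy as np

# Loadings of the seven retained water quality variables on three factors
# (rows: Tem, DO, pH, SiO2, Fe, NO3, TN)
L = np.array([[-0.89,  0.00, -0.18],
              [ 0.92,  0.13,  0.04],
              [ 0.62, -0.38, -0.19],
              [ 0.03,  0.87,  0.18],
              [ 0.03,  0.77, -0.07],
              [ 0.19,  0.34,  0.82],
              [-0.08, -0.46,  0.65]])
p = L.shape[0]                 # total variance of the standardized data

h2 = (L**2).sum(axis=1)        # communalities: ss loadings by row
u2 = 1 - h2                    # uniquenesses
ss = (L**2).sum(axis=0)        # common variances: ss loadings by column

print("communalities:", h2.round(2))
print("uniquenesses: ", u2.round(2))          # approx. the printed column
print("ss loadings:  ", ss.round(2))
print("shared variance:", h2.sum().round(2))  # equals ss.sum()
print("share of each factor:", (ss / h2.sum()).round(2))
print("proportion of total variance:", (h2.sum() / p).round(2))
```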
The total variance of the standardized data
The proportion of the total variance in the standardized variables accounted for by the $m$ common factors equals the shared variance from the $m$ common factors divided by the total variance of the standardized data ($= p$):

$$\frac{\sum_{j=1}^{p} h_j^2}{p} = \frac{\sum_{k=1}^{m} \sum_{j=1}^{p} (l_j^k)^2}{p}$$

Uniqueness
If the communality of a variable is less than the total variance of that variable, then the discrepancy between these two quantities is the uniqueness. If the communality equals the total variance, then there is no part of the total variance that cannot be accounted for by the factor model (uniqueness = 0). The uniqueness is separated into two types: the specificity and the measurement error.
- The specificity is the variance that is specific to a particular variable. It is systematic variance that is unshared with the other variables.
- The error comes from errors of measurement and, basically, anything unexplained by common or specific variance.

Uniqueness
The uniqueness is the part of the variance that is due to the unique factor $e_j$, the random error of the $j$th variable. Therefore, any variable with a high uniqueness should be dropped from the factor analysis.
- If communality > uniqueness: $X$ should be considered a linear combination of the underlying factors. It is more likely to be highly correlated with other variables in the data set.
- If communality < uniqueness: $X$ has its own specific contribution. It is less likely to be correlated with other variables in the data set.

Shared variance and error
Returning to the earlier diagram: the total variance of $X_j$ = communality + uniqueness. Assuming we perform the factor analysis on the correlation matrix of a data set of five variables $X_1, \ldots, X_5$, and two factors $F^1$ and $F^2$ are created by maximum likelihood estimation,

$$\mathrm{Var}(X_j) = h_j^2 + u_j^2 = 1, \qquad h_j^2 = (l_j^1)^2 + (l_j^2)^2$$

Factor 1 represents the correlation of the variables $X_1$, $X_4$ and $X_5$, since they have higher loadings on this factor than on the other; factor 2 comprises the correlation of the variables $X_2$ and $X_3$.

How do we estimate factor loadings?
The factor loading $l_j^k$ is the loading of $X_j$ on $F^k$. The factor loading turns out to be the correlation between the standardized variable and the underlying factor. The matrix of factor loadings is called the pattern matrix. Estimation method options for the factor loadings are:
- the principal component model (also called principal component analysis or extraction; it may go by different names);
- the principal factor model (principal factor analysis, or principal axis factor analysis);
- weighted least squares;
- the maximum likelihood method.
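The claim that a loading equals the correlation between a standardized variable and its factor can be checked numerically; a sketch with the same invented two-factor model used earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50000
L = np.array([[0.9, 0.0], [0.0, 0.8], [0.1, 0.7], [0.8, 0.1], [0.7, 0.2]])
psi = 1.0 - (L**2).sum(axis=1)
F = rng.standard_normal((n, 2))
X = F @ L.T + rng.standard_normal((n, 5)) * np.sqrt(psi)

# correlation of each variable X_j with each factor F^k
corr_XF = np.array([[np.corrcoef(X[:, j], F[:, k])[0, 1] for k in range(2)]
                    for j in range(5)])
print(np.round(corr_XF - L, 2))   # approximately zero: loading = correlation
```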
Estimation: PCA
Principal component analysis (PCA) estimation is computed from the eigenvalues and eigenvectors of the correlation matrix of the original variables. The first component is generated so as to achieve the maximum (largest) variance of the measured variables. The relationship between the factors and the components is

$$F^k = \frac{C^k}{\sqrt{\lambda_k}}, \qquad C^k = F^k \sqrt{\lambda_k}, \qquad k = 1, 2, \ldots, m$$

Estimation: PCA
Recall that the equation of the linear combination of variables for the $k$th component,

$$C^k = a_1^k X_1 + a_2^k X_2 + \cdots + a_p^k X_p,$$

can be inverted to express the variable $X_j$ as a function of the components,

$$X_j = a_j^1 C^1 + a_j^2 C^2 + \cdots + a_j^p C^p,$$

so that, substituting $C^k = F^k \sqrt{\lambda_k}$, a factor model is written as

$$X_j = a_j^1 F^1 \sqrt{\lambda_1} + a_j^2 F^2 \sqrt{\lambda_2} + \cdots + a_j^p F^p \sqrt{\lambda_p}$$

Estimation: PCA
In the factor analysis model, only $m$ factors are selected. Thus the model is composed of two terms: the underlying unobserved factors, weighted by the loadings, and the random errors:

$$X_j = l_j^1 F^1 + l_j^2 F^2 + \cdots + l_j^m F^m + e_j, \qquad l_j^k = a_j^k \sqrt{\lambda_k},$$

where

$$e_j = a_j^{m+1} F^{m+1} \sqrt{\lambda_{m+1}} + \cdots + a_j^p F^p \sqrt{\lambda_p}$$

Estimation: PCA
The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of the performance of the factor model. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the $m$ factors. In other words, if we perform a multiple regression of the $j$th variable against the $m$ common factors, the $R^2$ will equal the communality of that variable. The communalities and the specific variances depend on the number of factors in the model.
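In code, the PCA route is a single eigendecomposition. A minimal sketch on an invented 4-variable correlation matrix (both the matrix and the choice m = 2 are illustrative assumptions):

```python
import numpy as np

# PCA estimation of loadings from a correlation matrix R: the eigendecomposition
# gives eigenvalues lambda_k and eigenvectors a^k, and l_j^k = a_j^k * sqrt(lambda_k).
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.0],
              [0.1, 0.2, 0.0, 1.0]])

lam, A = np.linalg.eigh(R)           # eigenvalues in ascending order
order = np.argsort(lam)[::-1]        # reorder: largest eigenvalue first
lam, A = lam[order], A[:, order]

m = 2                                # keep m factors
L = A[:, :m] * np.sqrt(lam[:m])      # loadings l_j^k = a_j^k * sqrt(lambda_k)

h2 = (L**2).sum(axis=1)              # communalities under the m-factor model
print(np.round(L, 2))
print("communalities:", h2.round(2), " uniquenesses:", (1 - h2).round(2))
```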
Estimation: PFA
Principal factor analysis (PFA) considers maximizing the total communality a more attractive objective than maximizing the total proportion of explained variance (as is done in PCA). The estimation uses the communalities in place of the original variances and works through many iterations. The initial communalities have to be obtained from other estimation methods, since they are not known prior to the PFA.

Estimation: Maximum likelihood
Maximum likelihood estimation requires that the data are sampled from a multivariate normal distribution. This is a drawback of the method: some data may not have such a distribution. Note that the normality assumption is important only if you wish to generalize the results of your analysis beyond the sample collected. Computationally, the process is complex; in general there is no closed-form solution to the maximization problem, so iterative methods are applied.

How many factors?
The number $m$ of common factors is chosen prior to the analysis and is smaller than the number $p$ of original variables. It is often not the case that $m$ is known in advance, so it is useful to allow the data themselves to determine this number. Unlike factor analysis, PCA creates as many components as there are original variables ($p$), although only $m$ PCs are recommended to be retained for further applications. The $m$ retained components are therefore selected after the analysis, according to their proportion of the total variability in the original data.

How many factors?
Methods of choosing the initial number of factors (the first and third are sketched in code below):
- Kaiser's criterion (1970). Principal component analysis is chosen as the estimation method for finding the initial number of factors, so the Kaiser rule is based on the eigenvalues of the correlation matrix of the measured variables. The rule retains as many factors as there are eigenvalues greater than 1.
- Scree plot. Again, PCA is chosen as the estimation method. The scree plot graphs the eigenvalues against their number (Cattell, 1966); look for the elbow between the cliff and the scree (the run of roughly constant eigenvalues).
- Cumulative percentage of total variation. A criterion is set, typically between 60% and 90%; $m$ factors are retained once the cumulative percentage exceeds the criterion.
- Chi-squared test statistic, to test the hypothesis that $m$ factors are adequate to fit the model.
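A sketch of the Kaiser rule and the cumulative-percentage rule, reusing the invented correlation matrix from the PCA sketch (the 80% criterion is an illustrative choice):

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.0],
              [0.1, 0.2, 0.0, 1.0]])

lam = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, descending

m_kaiser = int((lam > 1).sum())               # Kaiser: eigenvalues > 1
cum = np.cumsum(lam) / lam.sum()              # cumulative proportion of variance
m_cum = int(np.argmax(cum >= 0.8)) + 1        # e.g. an 80% criterion

print("eigenvalues:", lam.round(2))
print("Kaiser rule m =", m_kaiser, "; 80% rule m =", m_cum)
# For a scree plot, plot lam against 1..p and look for the elbow.
```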
Criteria for selecting variables
- Communality indicates the variance in each variable explained by the extracted factors; ideally above 0.5 for each variable.
- Factor loading indicates how strongly each variable loads on each factor; it should generally be above 0.5 in absolute value for each variable.
- A reliability measure checks the internal consistency of the variables included in each factor, using Cronbach's alpha; it should be above 0.7 for each factor.
- The residual correlation matrix: the closer to zero, the better the model fits the data. The values on the main diagonal are the common variances (communalities) of the measured variables on the $m$ factors.

Interpretation about the total variability
- The proportion of the total variance in the original set of $p$ standardized variables accounted for by the $k$th factor equals the ss loadings (common variance) of the $k$th factor divided by the total variance of all $p$ standardized variables.
- The proportion of the variance within the $m$ common factors of the factor model accounted for by the $k$th factor equals the ss loadings (common variance) of the $k$th factor divided by the total variance from the $m$ factors in the factor model (that is, the sum of the communalities).

Interpretation about the new factors
Give an appropriate name (label) to each of the factors, as each represents the mutually correlated effect of many variables that was not seen before performing the factor analysis. The interpretation of each factor depends on the loadings (weights) of the variables on that factor. We then determine what those variables have in common; whatever they have in common will indicate the meaning of the factor.

Interpretation about the new factors
Ideally, any given variable has a high loading (correlation) on only one factor. A factor loading is a correlation between a variable and a factor, indicating the strength (how strongly a factor influences a measured variable) and the direction of a factor's effect on an observed variable. In practice, the loadings are often difficult to interpret; for example, a particular variable may have nearly equal weights on all $m$ factors. It is therefore recommended to rotate the new factors to ease interpretation; the results are called the rotated factors.

Factor rotations
Theoretically, factor rotations can be done in an infinite number of ways. Factor rotation consists of finding new axes to represent the factors. These new axes are selected so that they go through the clusters or subgroups of points representing the data variables in a plot on the two-dimensional factor axes. Two typical options for factor rotation are
- orthogonal rotation (e.g. varimax);
- oblique rotation (e.g. promax, oblimin, quartimin), which permits a small departure from orthogonality so as to be more realistic.

[Illustration of orthogonal versus oblique rotation: https://www.slideshare.net/ssuser1ab9f7/factor-analysis-7113647]

Factor rotations
Varimax rotation creates uncorrelated rotated factors. It restricts the new axes to being orthogonal (perpendicular) to each other; the angle between the axes representing two rotated factors is 90°. Variables that lie along rotated factor 1 have high loadings on axis 1 and nearly zero loadings on rotated factor 2 (axis 2); variables that lie along rotated factor 2 have high loadings on axis 2 and nearly zero loadings on rotated factor 1 (axis 1), and so on.

Factor rotations
The varimax rotation is achieved by maximizing the variance of the squared loadings within each factor, making the large loadings larger and the small loadings smaller. That simplifies the columns of the factor loading matrix by minimizing the number of variables that have high loadings on each factor. It helps in the interpretation of the factors by highlighting the particular variables appearing in each factor.

Factor rotations
The factor loadings of each factor change after the rotation. Note that the communality of each variable is unchanged, but the common variance of each factor changes in achieving the maximum variance criterion for a given factor. In other words, the sum of ss loadings by row is not altered, but the sum of ss loadings by column changes, subject to the conditions of the orthogonal rotation. The pattern matrix is composed of the correlations between the variables and the factors, that is, the factor loadings. (The sketch below verifies these invariances numerically.)
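A compact implementation of varimax (the widely used SVD-based algorithm, sketched from its standard description rather than taken from any particular library) makes these invariances easy to check on invented loadings:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    """Rotate a loading matrix L (p x m) by the varimax criterion."""
    p, m = L.shape
    R = np.eye(m)                        # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        B = L @ R
        u, s, vt = np.linalg.svd(L.T @ (B**3 - B @ np.diag((B**2).sum(0)) / p))
        R = u @ vt                       # nearest orthogonal rotation
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R

L = np.array([[0.8, 0.4], [0.7, 0.5], [0.3, 0.8], [0.2, 0.7]])  # invented
Lr = varimax(L)
print(np.round(Lr, 2))
# Row sums of squares (communalities) are unchanged by the rotation:
print((L**2).sum(1).round(3), (Lr**2).sum(1).round(3))
# Column sums of squares (common variances) do change:
print((L**2).sum(0).round(3), (Lr**2).sum(0).round(3))
```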
Factor rotations
Oblique rotation allows the rotated factors to be correlated, that is, non-orthogonal (not perpendicular) to each other. It produces estimates of the correlations among the rotated factors. The factor structure matrix comprises the correlations between the variables and the rotated factors, while the pattern matrix comprises the loadings (weights) of the variables on the factors; the two differ because the factors themselves are correlated. Several oblique rotation procedures are commonly used: oblimin, promax, quartimin, etc.

Assumptions for factor analysis: Normality
The normality assumption is important only if you wish to generalize the results of your analysis beyond the sample collected. Check the skewness and kurtosis statistics (Kline, 2011): skewness statistic < 3 and kurtosis statistic < 10. Whether normality matters also depends on the estimation method used: principal factor analysis does not assume multivariate normality, whereas maximum likelihood estimation does.

Assumptions for factor analysis: Linear relations
Before conducting a factor analysis, we should check the inter-correlations between the variables. If any variables do not correlate with any other variables, they should be excluded before performing the factor analysis. Also, if some variables correlate too highly (extreme multicollinearity) or are perfectly correlated (singularity), then factor analysis may not be suitable for such data. A simple check is to calculate the correlation matrix, with entries $-1 < r < +1$:
- if |r| is very close to zero, there is little relationship among the variables;
- if |r| is very close to 1, there is a strong relationship among the variables;
- a positive sign indicates a directly proportional relationship, a negative sign an inversely proportional relationship.

Assumptions for factor analysis: Factorability (a degree of collinearity among the variables)
Bartlett's test of sphericity. Null hypothesis: the correlation matrix of the data is an identity matrix (there are no relationships between the variables). The identity matrix of size n is the n × n square matrix with ones on the main diagonal and zeros elsewhere. Bartlett's test statistic approximately follows a chi-squared distribution. Very small significance values (below 0.05) indicate that the data are appropriate for factor analysis.
Reference: Bartlett, M. S. (1954). A note on the multiplying factors for various chi-squared approximations. Journal of the Royal Statistical Society, Series B, 16, 296-298.

Assumptions for factor analysis: Factorability (a degree of collinearity among the variables)
The Kaiser-Meyer-Olkin (KMO) statistic is a measure of sampling adequacy (MSA): a descriptive measure, not a hypothesis test, of how suited the data are to factor analysis. It measures sampling adequacy for each variable in the model and for the complete model. The statistic is a measure of the proportion of variance among the variables that might be common variance, that is, indicative of underlying or latent common factors. The higher this proportion, the more suited the data are to factor analysis.

KMO ranges between 0 and 1. A rule of thumb for interpreting the statistic:
- KMO values between 0.8 and 1 indicate the sampling is highly adequate;
- KMO values between 0.5 and 0.7 indicate the sampling is moderately adequate;
- KMO values less than 0.5 indicate the sampling is not adequate (consider collecting more data, or rethink which variables to include).
A value near 0 indicates that the sum of partial correlations is large relative to the sum of correlations, indicating dispersion in the pattern of correlations; factor analysis is then likely to be inappropriate. A value near 1 indicates that the patterns of correlations are relatively compact, so factor analysis should yield distinct and reliable factors.
Reference: Kaiser, H. (1974). An index of factorial simplicity. Psychometrika, 39, 31-36.
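Both factorability checks can be computed directly from the correlation matrix using their standard formulas; the data below are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0: the correlation matrix is identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)        # test statistic and p-value

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    Rinv = np.linalg.inv(R)
    # partial correlations from the inverse of the correlation matrix
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    P = -Rinv / d
    np.fill_diagonal(R, 0)                # use off-diagonal entries only
    np.fill_diagonal(P, 0)
    return (R**2).sum() / ((R**2).sum() + (P**2).sum())

rng = np.random.default_rng(1)
F = rng.standard_normal((200, 2))                       # two latent factors
X = F @ rng.standard_normal((2, 6)) + 0.5 * rng.standard_normal((200, 6))

stat, pval = bartlett_sphericity(X)
print(f"Bartlett chi2 = {stat:.1f}, p = {pval:.3g}")    # small p: factorable
print(f"KMO = {kmo(X):.2f}")                            # closer to 1: adequate
```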