PRINCIPAL COMPONENTS ANALYSIS AND FACTOR ANALYSIS

Principal components analysis and factor analysis are used to identify underlying constructs or factors that explain the correlations among a set of items. They are often used to summarize a large number of items with a smaller number of derived variables, called factors.

BASIC DEFINITIONS

Communality – Denoted h². It is the proportion of the variance of an item that is accounted for by the common factors in a factor analysis. The unique variance of an item is given by 1 − h² = item-specific variance + item error variance (random error).

Eigenvalue – The standardized variance associated with a particular factor. The sum of the eigenvalues cannot exceed the number of items in the analysis, since each item contributes 1 to the sum of variances.

Factor – A linear combination of items (in a regression sense, where the total test score is the dependent variable and the items are the independent variables).

Factor loading – For a given item and a given factor, this is the correlation between the vector of subjects' responses to that item and the vector of subjects' predicted scores from a regression equation treating the entire set of items as independent variables. The factor loading expresses the correlation of the item with the factor, and its square indicates the proportion of variance the item shares with the factor.

Factor pattern matrix – A matrix containing the coefficients or "loadings" used to express each item in terms of the factors. This is the same as the structure matrix if the factors are orthogonal (uncorrelated).

Factor structure matrix – A matrix containing the correlations of each item with each of the factors. This is the same as the pattern matrix if the factors are orthogonal, i.e., uncorrelated (as in principal components analysis).

Rotated factor solution – A factor solution in which the axes of the factor plot are rotated for the purpose of uncovering a more meaningful pattern of item factor loadings.

Scree plot – A plot of the obtained eigenvalue for each factor.

CORRELATION MATRIX INFORMATION

Reproduced – The estimated correlation matrix implied by the factor solution. Residuals (differences between observed and reproduced correlations) are also displayed.

Anti-image – The anti-image correlation matrix contains the negatives of the partial correlation coefficients, and the anti-image covariance matrix contains the negatives of the partial covariances. Most of the off-diagonal elements should be small in a good factor model.

KMO and Bartlett's test of sphericity – The Kaiser-Meyer-Olkin measure of sampling adequacy indicates the extent to which the partial correlations among items are small. Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix, which would indicate that the factor model is inappropriate.

EXTRACTION METHODS IN FACTOR ANALYSIS

Principal components refers to the principal components model, in which items are assumed to be exact linear combinations of factors. The principal components method assumes that components ("factors") are uncorrelated. It also assumes that the communality of each item, summed over all components (factors), equals 1, implying that each item has zero unique variance. The remaining factor extraction methods allow the variance of each item to be a function of both the item's communality and a nonzero unique item variance. (A numeric sketch of principal components extraction follows below.)
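To make these quantities concrete, the following is a minimal numpy sketch of principal components extraction from a correlation matrix, using simulated data. All names (X, n_factors, and so on) are illustrative and do not refer to any particular software package.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))         # toy data: 200 subjects, 6 items
    X[:, 1] += X[:, 0]                    # induce correlation among some items
    X[:, 3] += X[:, 2]

    R = np.corrcoef(X, rowvar=False)      # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh, since R is symmetric
    order = np.argsort(eigvals)[::-1]     # sort components by eigenvalue, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    n_factors = int(np.sum(eigvals > 1))  # retain eigenvalues > 1 (Kaiser rule)
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    communality = (loadings ** 2).sum(axis=1)  # h² for each item

    print(np.round(eigvals, 2))           # the values a scree plot displays
    print(round(eigvals.sum(), 2))        # equals the number of items (6)
    print(np.round(communality, 2))       # below 1 once components are dropped

With all six components retained, each item's communality would equal 1, matching the principal components assumption of zero unique variance; dropping the small-eigenvalue components leaves 1 − h² of each item's variance unexplained.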
The following are methods of common factor analysis:

Principal axis factoring uses squared multiple correlations as initial estimates of the communalities. These communalities are entered into the diagonal of the correlation matrix before factors are extracted from this matrix.

Maximum likelihood produces the parameter estimates that are most likely to have produced the observed correlations, if the sample is from a multivariate normal population.

Minimum residual factor analysis extracts factors from the correlation matrix, ignoring the diagonal elements.

Alpha factoring treats the items as a sample from the universe of possible items. It selects factors with the intent of maximizing coefficient alpha reliability.

Image factoring is based on the concept of an "image" of an item, obtained from the multiple regression of one item (dependent variable) on all the other items (independent variables).

Unweighted least squares minimizes the squared differences between the observed and reproduced correlation matrices.

Generalized least squares also minimizes the squared differences between observed and reproduced correlations, weighting them by the uniqueness of the items involved.

EXTRACTION CRITERIA

You can either retain all factors whose eigenvalues exceed a specified value (e.g., > 1), or retain a specific number of factors.

FACTOR ROTATION METHODS

Varimax rotates the axes such that they remain at 90 degrees (perpendicular) to each other. It assumes uncorrelated factors and is also referred to as "orthogonal" rotation.

Oblique rotation (Direct Oblimin) rotates the axes such that the angle between them can be other than 90 degrees, allowing factors to be correlated. One can specify Delta to control the extent to which factors can be correlated among themselves. Delta should be 0 or negative, with 0 yielding the most highly correlated factors and large negative numbers yielding nearly orthogonal solutions.

FACTOR SCORES

A factor score coefficient matrix shows the coefficients by which items are multiplied to obtain factor scores.

Regression scores – Regression factor scores have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values. They can be correlated even when the factors are assumed to be orthogonal. The sum of squared discrepancies between true and estimated factors over individuals is minimized.

Bartlett scores – Bartlett factor scores have a mean of 0. The sum of squares of the unique factors over the range of items is minimized.

Anderson-Rubin scores – Anderson-Rubin factor scores are a modification of Bartlett scores that ensures orthogonality of the estimated factors. They have a mean of 0 and a standard deviation of 1.
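As a companion sketch, the factor score coefficient matrix for the regression method can be computed as W = R⁻¹A, where A is the loading (structure) matrix; this is the standard regression (Thurstone) estimator. The numpy version below continues the earlier example, and the function and variable names are illustrative.

    import numpy as np

    def regression_factor_scores(X, loadings):
        """Regression-method factor scores: F = Z @ (R^-1 @ A)."""
        Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the items
        R = np.corrcoef(X, rowvar=False)          # item correlation matrix
        W = np.linalg.solve(R, loadings)          # factor score coefficient matrix
        return Z @ W                              # one column of scores per factor

    # e.g., scores = regression_factor_scores(X, loadings), using the
    # X and loadings from the earlier extraction sketch

These scores have mean 0, and their variance equals the squared multiple correlation described above; it reaches 1 only when the factors are exact linear combinations of the items, as in principal components.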