Factor Analysis - Introduction • Factor analysis is used to draw inferences about unobservable quantities, such as intelligence, musical ability, patriotism, or consumer attitudes, that cannot be measured directly. • The goal of factor analysis is to describe correlations between p measured traits in terms of variation in a few underlying, unobservable factors. • Changes across subjects in the value of one or more unobserved factors could affect the values of an entire subset of measured traits and cause them to be highly correlated. 654 Factor Analysis - Example • A marketing firm wishes to determine how consumers choose to patronize certain stores. • Customers at various stores were asked to complete a survey with about p = 80 questions. • Marketing researchers postulate that consumer choices are based on a few underlying factors such as: friendliness of personnel, level of customer service, store atmosphere, product assortment, product quality and general price level. • A factor analysis would use correlations among responses to the 80 questions to determine if they can be grouped into six sub-groups that reflect variation in the six postulated factors. 655 Orthogonal Factor Model • X = (X1, X2, . . . , Xp)′ is a p-dimensional vector of observable traits distributed with mean vector µ and covariance matrix Σ. • The factor model postulates that X can be written as a linear combination of a set of m common factors F1, F2, ..., Fm and p additional unique factors ε1, ε2, ..., εp, so that

X1 − µ1 = ℓ11F1 + ℓ12F2 + · · · + ℓ1mFm + ε1
X2 − µ2 = ℓ21F1 + ℓ22F2 + · · · + ℓ2mFm + ε2
...
Xp − µp = ℓp1F1 + ℓp2F2 + · · · + ℓpmFm + εp

where ℓij is called a factor loading, or the loading of the ith trait on the jth factor. 656 Orthogonal Factor Model • In matrix notation: (X − µ)p×1 = Lp×mFm×1 + εp×1, where L is the matrix of factor loadings and F is the vector of values for the m unobservable common factors. • Notice that the model looks very much like an ordinary linear model.
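The moment structure implied by this model (derived on the next few slides) can be checked by simulation. The sketch below is in Python/numpy rather than the R/SAS used later in these notes, and the loadings L, specific variances Ψ and mean µ are made-up illustrative values, not estimates from any data set here:

```python
import numpy as np

# Simulate the orthogonal factor model X - mu = L F + eps for p = 4 traits
# and m = 2 factors. L, Psi and mu are made-up illustrative values.
rng = np.random.default_rng(0)
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.8]])
Psi = np.diag([0.35, 0.47, 0.15, 0.35])       # specific variances
mu = np.array([10.0, 20.0, 30.0, 40.0])

n = 200_000
F = rng.standard_normal((n, 2))                            # Var(F) = I
eps = rng.standard_normal((n, 4)) * np.sqrt(np.diag(Psi))  # Var(eps) = Psi
X = mu + F @ L.T + eps                                     # one row per subject

# The sample covariance matrix should approach LL' + Psi
Sigma_model = L @ L.T + Psi
Sigma_hat = np.cov(X, rowvar=False)
print(np.abs(Sigma_hat - Sigma_model).max())   # near 0 for large n
```

With n this large, the elementwise gap between the sample covariance and LL′ + Ψ is on the order of a few thousandths.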
Since we do not observe anything on the right-hand side, however, we cannot do anything with this model unless we impose some more structure. The orthogonal factor model assumes that E(F) = 0, Var(F) = E(FF′) = I, E(ε) = 0, Var(ε) = E(εε′) = Ψ = diag{ψi}, i = 1, ..., p, and that F and ε are independent, so that Cov(F, ε) = 0. 657 Orthogonal Factor Model • Assuming that the variances of the factors are all one is not a restriction, as it can be achieved by properly scaling the factor loadings. • Assuming that the common factors are uncorrelated and the unique factors are uncorrelated are the defining restrictions of the orthogonal factor model. • The assumptions of the orthogonal factor model have implications for the structure of Σ. If (X − µ)p×1 = Lp×mFm×1 + εp×1, then it follows that (X − µ)(X − µ)′ = (LF + ε)(LF + ε)′ = (LF + ε)((LF)′ + ε′) = LFF′L′ + εF′L′ + LFε′ + εε′. 658 Orthogonal Factor Model • Taking expectations of both sides of the equation we find that: Σ = E(X − µ)(X − µ)′ = LE(FF′)L′ + E(εF′)L′ + LE(Fε′) + E(εε′) = LL′ + Ψ, since E(FF′) = Var(F) = I and E(εF′) = Cov(ε, F) = 0. Also, (X − µ)F′ = LFF′ + εF′, so Cov(X, F) = E[(X − µ)F′] = LE(FF′) + E(εF′) = L. 659 Orthogonal Factor Model • Under the orthogonal factor model, therefore: Var(Xi) = σii = ℓ²i1 + ℓ²i2 + · · · + ℓ²im + ψi, and Cov(Xi, Xk) = σik = ℓi1ℓk1 + ℓi2ℓk2 + · · · + ℓimℓkm. • The portion of the variance σii that is contributed by the m common factors is the communality, and the portion that is not explained by the common factors is called the uniqueness (or the specific variance): σii = ℓ²i1 + ℓ²i2 + · · · + ℓ²im + ψi = h²i + ψi. 660 Orthogonal Factor Model • The model assumes that the p(p + 1)/2 variances and covariances of X can be reproduced from the pm factor loadings and the p specific variances (pm + p parameters in all). • Factor analysis works best in situations in which m is small relative to p.
If, for example, p = 12 and m = 2, the 78 distinct elements of Σ can be reproduced from the 12 × 2 + 12 = 36 parameters in the factor model. • Not all covariance matrices can be factored as LL′ + Ψ. 661 Example: No Proper Solution • Let p = 3 and m = 1 and suppose that the covariance matrix of X1, X2, X3 is

Σ = [ 1    0.9  0.7
      0.9  1    0.4
      0.7  0.4  1   ]

• The orthogonal factor model requires that Σ = LL′ + Ψ. Under the one-factor model assumption, we get:

1 = ℓ²11 + ψ1      0.90 = ℓ11ℓ21
1 = ℓ²21 + ψ2      0.70 = ℓ11ℓ31
1 = ℓ²31 + ψ3      0.40 = ℓ21ℓ31

662 Example: No Proper Solution • From the equations above, we see that ℓ21 = (0.4/0.7)ℓ11. • Since 0.90 = ℓ11ℓ21, substituting for ℓ21 implies that ℓ²11 = 1.575, or ℓ11 = ±1.255. • Here is where the problem starts. Since (by assumption) Var(F1) = 1 and also Var(X1) = 1, and since therefore Cov(X1, F1) = Corr(X1, F1) = ℓ11, we notice that we have a solution inconsistent with the assumptions of the model, because a correlation cannot be larger than 1 or smaller than -1. 663 Example: No Proper Solution • Further, 1 = ℓ²11 + ψ1 −→ ψ1 = 1 − ℓ²11 = −0.575, which cannot be true because ψ1 = Var(ε1). Thus, for m = 1 we get a numerical solution that is not consistent with the model or with the interpretation of its parameters. 664 Rotation of Factor Loadings • When m > 1, there is no unique set of loadings, and thus there is ambiguity associated with the factor model. • Consider any m × m orthogonal matrix T such that TT′ = T′T = I. We can rewrite our model: X − µ = LF + ε = LTT′F + ε = L∗F∗ + ε, with L∗ = LT and F∗ = T′F. 665 Rotation of Factor Loadings • Since E(F∗) = T′E(F) = 0 and Var(F∗) = T′Var(F)T = T′T = I, it is impossible to distinguish between the loadings L and L∗ from just a set of data, even though in general they will be different. • Notice that the two sets of loadings generate the same covariance matrix Σ: Σ = LL′ + Ψ = LTT′L′ + Ψ = L∗L∗′ + Ψ. 666 Rotation of Factor Loadings • How to resolve this ambiguity?
• Typically, we first obtain the matrix of loadings (recognizing that it is not unique) and then rotate it by multiplying by an orthogonal matrix. • We choose the orthogonal matrix using some desired criterion. For example, a varimax rotation of the factor loadings results in a set of loadings with maximum variability. • Other criteria for arriving at a unique set of loadings have also been proposed. • We consider rotations in more detail in a little while. 667 Estimation in Orthogonal Factor Models • We begin with a sample of size n of p-dimensional vectors x1, x2, ..., xn and (based on our knowledge of the problem) choose a small number m of factors. • For the chosen m, we want to estimate the factor loading matrix L and the specific variances in the model Σ = LL′ + Ψ. • We use S as an estimator of Σ and first investigate whether the correlations among the p variables are large enough to justify the analysis. If rij ≈ 0 for all i, j, the unique factors will dominate and we will not be able to identify common underlying factors. 668 Estimation in Orthogonal Factor Models • Common estimation methods are: – The principal component method – The iterative principal factor method – Maximum likelihood estimation (assumes normality) • The last two methods focus on using variation in common factors to describe correlations among measured traits. Principal component analysis gives more attention to variances. • Estimated factor loadings from any of those methods can be rotated, as explained later, to facilitate interpretation of results. 669 The Principal Component Method • Let (λi, ei) denote the eigenvalues and eigenvectors of Σ and recall the spectral decomposition, which establishes that Σ = λ1e1e′1 + λ2e2e′2 + · · · + λpepe′p. • Use L to denote the p × p matrix with columns equal to √λi ei, i = 1, ..., p.
Then the spectral decomposition of Σ is given by Σ = LL′ + 0 = LL′, which corresponds to a factor model in which there are as many factors as variables (m = p) and in which the specific variances are all ψi = 0. The loadings on the jth factor are just the coefficients of the jth principal component multiplied by √λj. 670 The Principal Component Method • The principal component solution just described is not interesting, because we have as many factors as we have variables. • We really want m < p so that we can explain the covariance structure in the measurements using just a small number of underlying factors. • If the last p − m eigenvalues are small, we can ignore the last p − m terms in the spectral decomposition and write Σ ≈ Lp×mL′m×p. 671 The Principal Component Method • The communality for the ith observed variable is the amount of its variance that can be attributed to variation in the m factors: h²i = ℓ²i1 + ℓ²i2 + · · · + ℓ²im, for i = 1, 2, ..., p. • The variances ψi of the specific factors can then be taken to be the diagonal elements of the difference matrix Σ − LL′, where L is p × m. That is, ψi = σii − (ℓ²i1 + · · · + ℓ²im) for i = 1, ..., p, or Ψ = diag(Σ − Lp×mL′m×p). 672 The Principal Component Method • Note that using m < p factors will produce an approximation Lp×mL′m×p + Ψ for Σ that exactly reproduces the variances of the p measured traits but only approximates the correlations. • If variables are measured on very different scales, we work with the standardized variables, as we did when extracting PCs. This is equivalent to modeling the correlation matrix P rather than the covariance matrix Σ. 673 Principal Component Estimation • To implement the PC method, we use S to estimate Σ (or use R if the observations are standardized) and use X̄ to estimate µ. • The principal component estimate of the loading matrix for the m-factor model is L̃ = [√λ̂1 ê1  √λ̂2 ê2  ...  √λ̂m êm], where (λ̂i, êi) are the eigenvalues and eigenvectors of S (or of R if the observations are standardized).
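A minimal numpy sketch of the principal component recipe just described; S below is an arbitrary illustrative 3 × 3 covariance matrix, not one taken from these notes:

```python
import numpy as np

def pc_factor(S, m):
    """Principal component estimates: loadings sqrt(lambda_j) * e_j for
    the m largest eigenvalues, specific variances from diag(S - LL')."""
    vals, vecs = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    idx = np.argsort(vals)[::-1][:m]      # indices of the m largest
    L = vecs[:, idx] * np.sqrt(vals[idx])
    psi = np.diag(S) - np.sum(L**2, axis=1)
    return L, psi

# Arbitrary illustrative covariance matrix
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L_t, psi_t = pc_factor(S, m=1)
fitted = L_t @ L_t.T + np.diag(psi_t)
print(np.diag(fitted))    # the diagonal of S is reproduced exactly
```

As the slides note, the fitted matrix L̃L̃′ + Ψ̃ matches the diagonal of S exactly by construction, while the off-diagonal elements are only approximated.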
674 Principal Component Estimation • The estimated specific variances are given by the diagonal elements of S − L̃L̃′, so Ψ̃ = diag{ψ̃1, ψ̃2, ..., ψ̃p}, with ψ̃i = sii − (ℓ̃²i1 + · · · + ℓ̃²im). • Communalities are estimated as h̃²i = ℓ̃²i1 + ℓ̃²i2 + · · · + ℓ̃²im. • If variables are standardized, then we substitute R for S and substitute 1 for each sii. 675 Principal Component Estimation • In many applications of factor analysis, m, the number of factors, is decided prior to the analysis. • If we do not know m, we can try to determine the 'best' m by looking at the results from fitting the model with different values of m. • Examine how well the off-diagonal elements of S (or R) are reproduced by the fitted model L̃L̃′ + Ψ̃; by the definition of ψ̃i, the diagonal elements of S are reproduced exactly, but the off-diagonal elements are not. The chosen m is appropriate if the residual matrix S − (L̃L̃′ + Ψ̃) has small off-diagonal elements. 676 Principal Component Estimation • Another approach to deciding on m is to examine the contribution of each potential factor to the total variance. • The contribution of the kth factor to the sample variance sii of the ith trait is estimated as ℓ̃²ik. • The contribution of the kth factor to the total sample variance s11 + s22 + ... + spp is estimated as ℓ̃²1k + ℓ̃²2k + ... + ℓ̃²pk = (√λ̂k êk)′(√λ̂k êk) = λ̂k. 677 Principal Component Estimation • As in the case of PCs, in general, (Proportion of total sample variance due to jth factor) = λ̂j / (s11 + s22 + · · · + spp), or equals λ̂j/p if factors are extracted from R. • Thus, a 'reasonable' number of factors is indicated by the minimum number of PCs that explain a suitably large proportion of the total variance. 678 Strategy for PC Factor Analysis • First, center observations (and perhaps standardize). • If m is determined by subject matter knowledge, fit the m-factor model by: – Extracting the p PCs from S or from R.
– Constructing the p × m matrix of loadings L̃ by keeping the PCs associated with the largest m eigenvalues of S (or R). • If m is not known a priori, examine the estimated eigenvalues to determine the number of factors that account for a suitably large proportion of the total variance, and examine the off-diagonal elements of the residual matrix. 679 Example: Stock Price Data • Data are weekly gains in stock prices for 100 consecutive weeks for five companies: Allied Chemical, Du Pont, Union Carbide, Exxon and Texaco. • Note that the first three are chemical companies and the last two are oil companies. • The data are first standardized and the sample correlation matrix R is computed. • Fit an m = 2 orthogonal factor model. 680 Example: Stock Price Data • The sample correlation matrix R is

R = [ 1.00  0.58  0.51  0.39  0.46
      0.58  1.00  0.60  0.39  0.32
      0.51  0.60  1.00  0.44  0.42
      0.39  0.39  0.44  1.00  0.52
      0.46  0.32  0.42  0.52  1.00 ]

• The eigenvalues of R are λ̂ = (2.86, 0.81, 0.54, 0.45, 0.34)′, and the first two eigenvectors are ê1 = (0.464, 0.457, 0.470, 0.416, 0.421)′ and ê2 = (−0.241, −0.509, −0.261, 0.525, 0.582)′. 681 Example: Stock Price Data • Recall that the method of principal components results in factor loadings equal to √λ̂j êj, so in this case:

Variable      Loadings on        Loadings on        Specific variances
              factor 1 (ℓ̃i1)     factor 2 (ℓ̃i2)     (ψ̃i = 1 − h̃²i)
Allied Chem   0.784              -0.217             0.34
Du Pont       0.773              -0.458             0.19
Union Carb    0.794              -0.234             0.31
Exxon         0.713              0.472              0.27
Texaco        0.712              0.524              0.22

• The proportions of total variance accounted for by the first and second factors are λ̂1/p = 0.571 and λ̂2/p = 0.162. 682 Example: Stock Price Data • The first factor appears to be a market-wide effect on weekly stock price gains, whereas the second factor reflects industry-specific effects on chemical and oil stock price returns. • The residual matrix is

R − (L̃L̃′ + Ψ̃) = [  0     -0.13  -0.16  -0.07   0.02
                    -0.13   0     -0.12   0.06   0.01
                    -0.16  -0.12   0     -0.02  -0.02
                    -0.07   0.06  -0.02   0     -0.23
                     0.02   0.01  -0.02  -0.23   0    ]

• By construction, the diagonal elements of the residual matrix are zero. Are the off-diagonal elements small enough?
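The loadings and specific variances in the table above can be recovered (to roughly two decimals, since the reported eigenpairs are rounded) directly from λ̂1, λ̂2, ê1 and ê2. A short Python/numpy check:

```python
import numpy as np

# Eigenvalues and eigenvectors of R as reported on slide 680 (rounded)
lam1, lam2 = 2.86, 0.81
e1 = np.array([0.464, 0.457, 0.470, 0.416, 0.421])
e2 = np.array([-0.241, -0.509, -0.261, 0.525, 0.582])

l1 = np.sqrt(lam1) * e1        # loadings on factor 1
l2 = np.sqrt(lam2) * e2        # loadings on factor 2
psi = 1 - l1**2 - l2**2        # specific variances (standardized variables)

print(np.round(l1, 2))         # approximately the tabled first-factor loadings
print(np.round(psi, 2))        # approximately (0.34, 0.19, 0.31, 0.27, 0.22)
```

The small discrepancies (e.g., for Exxon's first-factor loading) come entirely from the rounding of λ̂ and ê; SAS's Factor Pattern below uses full precision.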
683 Example: Stock Price Data • In this example, most of the residuals appear to be small, with the exception of the {4, 5} element and perhaps also the {1, 2}, {1, 3}, {2, 3} elements. • Since the {4, 5} element of the residual matrix is negative, we know that L̃L̃′ is producing a correlation value between Exxon and Texaco that is larger than the observed value. • When the off-diagonals in the residual matrix are not small, we might consider changing the number m of factors to see whether we can reproduce the correlations between the variables better. • In this example, we would probably not change the model. 684 SAS code: Stock Price Data

/* This program is posted as stocks.sas. It performs factor
   analyses for the weekly stock return data from Johnson &
   Wichern. The data are posted as stocks.dat */

data set1;
  infile "c:\stat501\data\stocks.dat";
  input x1-x5;
  label x1 = ALLIED CHEM
        x2 = DUPONT
        x3 = UNION CARBIDE
        x4 = EXXON
        x5 = TEXACO;
run;

685

/* Compute principal components */

proc factor data=set1 method=prin scree nfactors=2 simple
            corr ev res msa nplot=2 out=scorepc outstat=facpc;
  var x1-x5;
run;

Factor Pattern

                       Factor1    Factor2
x1  ALLIED CHEM        0.78344   -0.21665
x2  DUPONT             0.77251   -0.45794
x3  UNION CARBIDE      0.79432   -0.23439
x4  EXXON              0.71268    0.47248
x5  TEXACO             0.71209    0.52373

686 R code: Stock Price Data

# This code creates scatter plot matrices and does factor
# analysis for 100 consecutive weeks of gains in prices for
# five stocks.
# This code is posted as stocks.R
# The data are posted as stocks.dat
# There is one line of data for each week and the weekly
# gains are represented as
#   x1 = ALLIED CHEMICAL
#   x2 = DUPONT
#   x3 = UNION CARBIDE
#   x4 = EXXON
#   x5 = TEXACO

687

stocks <- read.table("c:/stat501/data/stocks.dat", header=F,
                     col.names=c("x1", "x2", "x3", "x4", "x5"))

# Print the first six rows of the data file

stocks[1:6, ]
         x1        x2        x3        x4        x5
1  0.000000  0.000000  0.000000  0.039473  0.000000
2  0.027027 -0.044855 -0.003030 -0.014466  0.043478
3  0.122807  0.060773  0.088146  0.086238  0.078124
4  0.057031  0.029948  0.066808  0.013513  0.019512
5  0.063670 -0.003793 -0.039788 -0.018644 -0.024154
6  0.003521  0.050761  0.082873  0.074265  0.049504

688

# Create a scatter plot matrix of the standardized data

stockss <- scale(stocks, center=T, scale=T)
pairs(stockss, labels=c("Allied","Dupont","Carbide","Exxon","Texaco"),
      panel=function(x,y){panel.smooth(x,y)
                          abline(lsfit(x,y),lty=2)})

689

[Scatter plot matrix of the standardized weekly returns for Allied, Dupont, Carbide, Exxon and Texaco]
690

# Compute principal components from the correlation matrix

s.cor <- var(stockss)
s.cor
          x1        x2        x3        x4        x5
x1 1.0000000 0.5769308 0.5086555 0.3867206 0.4621781
x2 0.5769308 1.0000000 0.5983817 0.3895188 0.3219545
x3 0.5086555 0.5983817 1.0000000 0.4361014 0.4256266
x4 0.3867206 0.3895188 0.4361014 1.0000000 0.5235293
x5 0.4621781 0.3219545 0.4256266 0.5235293 1.0000000

s.pc <- prcomp(stocks, scale=T, center=T)

# List component coefficients

691

s.pc$rotation
         PC1        PC2        PC3        PC4        PC5
x1 0.4635414 -0.2408580  0.6133475 -0.3813591  0.4533066
x2 0.4570769 -0.5090981 -0.1778906 -0.2113173 -0.6749813
x3 0.4699796 -0.2605708 -0.3370501  0.6641056  0.3957057
x4 0.4216766  0.5252677 -0.5390141 -0.4728058  0.1794471
x5 0.4213290  0.5822399  0.4336136  0.3812200 -0.3874650

# Estimate proportion of variation explained by each
# principal component, and cumulative proportions

s <- var(s.pc$x)
pvar <- round(diag(s)/sum(diag(s)), digits=6)
pvar
     PC1      PC2      PC3      PC4      PC5
0.571298 0.161824 0.108008 0.090270 0.068600

cpvar <- round(cumsum(diag(s))/sum(diag(s)), digits=6)
cpvar
     PC1      PC2      PC3      PC4      PC5
0.571298 0.733122 0.841130 0.931400 1.000000

692

# Plot component scores

par(fin=c(5,5))
plot(s.pc$x[,1], s.pc$x[,2], xlab="PC1: Overall Market",
     ylab="PC2: Oil vs. Chemical", type="p")

693

[Scatter plot of the component scores: PC1 (Overall Market) versus PC2 (Oil vs. Chemical)]
Principal Factor Method • The principal factor method for estimating factor loadings can be viewed as an iterative modification of the principal components method that allows for greater focus on explaining correlations among observed traits. • The principal factor approach begins with a guess about the communalities and then iteratively updates those estimates until some convergence criterion is satisfied. (Try several different sets of starting values.) • The principal factor method chooses factor loadings that more closely reproduce correlations. Principal components are more heavily influenced by accounting for variances and will explain a higher proportion of the total variance. 694 Principal Factor Method • Consider the estimated model for the correlation matrix, R ≈ L∗(L∗)′ + Ψ∗. • The estimated loading matrix should provide a good approximation for all of the correlations and part of the variances, as follows:

L∗(L∗)′ ≈ R − Ψ∗ ≈ [ (h∗1)²   r12      r13     · · ·   r1p
                      r21      (h∗2)²   r23     · · ·   r2p
                      ...      ...      ...     . . .   ...
                      rp1      rp2      rp3     · · ·   (h∗p)² ]

where (h∗i)² = 1 − ψ∗i is the estimated communality for the ith variable. • Find a factor loading matrix L∗ so that L∗(L∗)′ is a good approximation for R − Ψ∗, rather than trying to make L∗(L∗)′ a good approximation for R. 695 Principal Factor Method • Start by obtaining initial estimates of the communalities, (h∗1)², (h∗2)², ..., (h∗p)². • Then the estimated loading matrix should provide a good approximation to the reduced correlation matrix R − Ψ∗, which replaces the ith diagonal element of R by (h∗i)². • Use the spectral decomposition of R − Ψ∗ to find that approximation. 696 Principal Factor Method • Use the initial estimates to compute the reduced correlation matrix with rows

(h∗1)²   r12      r13     · · ·   r1p
r21      (h∗2)²   r23     · · ·   r2p
...      ...      ...     . . .   ...
rp1      rp2      rp3     · · ·   (h∗p)²

The estimated loading matrix is obtained from the eigenvalues and eigenvectors of this matrix as L∗ = [√λ∗1 e∗1 · · · √λ∗m e∗m]. • Update the specific variances: ψ∗i = 1 − λ∗1(e∗i1)² − · · · − λ∗m(e∗im)². 697 Principal Factor Method • Use the updated communalities to re-evaluate the reduced correlation matrix R − Ψ∗ and compute a new estimate of the loading matrix, L∗ = [√λ∗1 e∗1 · · · √λ∗m e∗m]. • Repeat this process until it converges. 698 Principal Factor Method • Note that R − Ψ∗ is generally not positive definite and some eigenvalues can be negative. • The results are sensitive to the choice of the number of factors m. • If m is too large, some communalities can be larger than one, which would imply that variation in factor values accounts for more than 100 percent of the variation in the measured traits. SAS has two options to deal with this: – HEYWOOD: Set any estimated communality larger than one equal to one and continue iterations with the remaining variables. – ULTRAHEYWOOD: Continue iterations with all of the variables and hope that iterations eventually bring you back into the allowable parameter space. 699 Stock Price Example • The estimated loadings from the principal factor method are displayed below.

Variable      Loadings on        Loadings on        Specific variances     Communalities
              factor 1 (ℓ∗i1)    factor 2 (ℓ∗i2)    (ψ∗i = 1 − (h∗i)²)     ((h∗i)²)
Allied Chem   0.70               -0.09              0.50                   0.50
Du Pont       0.71               -0.25              0.44                   0.56
Union Carb    0.72               -0.11              0.47                   0.53
Exxon         0.62               0.23               0.57                   0.43
Texaco        0.62               0.28               0.54                   0.46

• The proportions of total variance accounted for by the first and second factors are λ∗1/p = 0.45 and λ∗2/p = 0.04. 700 Stock Price Example • Interpretations are similar. The first factor appears to be a market-wide effect on weekly stock price gains, whereas the second factor reflects industry-specific effects on chemical and oil stock price returns.
• The loadings tend to be smaller than those obtained from principal component analysis. Why? 701 Stock Price Example • The residual matrix is

R − [(L∗)(L∗)′ + Ψ∗] = [  0      0.057  -0.005  -0.029   0.053
                          0.057  0      0.060   0.007   -0.049
                         -0.005  0.060  0       0.015    0.010
                         -0.029  0.007  0.015   0        0.073
                          0.053 -0.049  0.010   0.073    0     ]

• Since the principal factor method gives greater emphasis to describing correlations, the off-diagonal elements of the residual matrix tend to be smaller than the corresponding elements for principal component estimation. 702

/* SAS code for the principal factor method */

proc factor data=set1 method=prinit scree nfactors=2 simple
            corr ev res msa nplot=2 out=scorepf outstat=facpf;
  var x1-x5;
  priors smc;
run;

To obtain the initial communality for the ith variable, "smc" in the priors statement uses the R² value for the regression of the ith variable on the other p − 1 variables in the analysis. 703 Maximum Likelihood Estimation • To implement the ML method, we need to include some assumptions about the distribution of the p-dimensional vector Xj and the m-dimensional vector Fj: Xj ∼ Np(µ, Σ), Fj ∼ Nm(0, Im), εj ∼ Np(0, Ψp×p), where Xj = LFj + εj, Σ = LL′ + Ψ, and Fj is independent of εj. Also, Ψ is a diagonal matrix. • Because the L that maximizes the likelihood for this model is not unique, we need another set of restrictions that lead to a unique maximum of the likelihood function: L′Ψ⁻¹L = ∆, where ∆ is a diagonal matrix. 704 Maximum Likelihood Estimation • Maximizing the likelihood function with respect to (µ, L, Ψ) is not easy, because the likelihood depends in a very non-linear fashion on the parameters. • Recall that (L, Ψ) enter the likelihood through Σ, which in turn enters the likelihood as an inverse matrix in a quadratic form and also as a determinant. • Efficient numerical algorithms exist to maximize the likelihood iteratively and obtain ML estimates L̂p×m = {ℓ̂ij}, Ψ̂ = diag{ψ̂i}, µ̂ = x̄.
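The role of the identifiability restriction L′Ψ⁻¹L = ∆ can be illustrated numerically: any candidate loading matrix can be rotated to satisfy it without changing LL′ + Ψ. The L and Ψ below are made up for illustration, not estimates from the stock data:

```python
import numpy as np

# Made-up illustrative loadings and specific variances (p = 4, m = 2)
L = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.3, 0.8],
              [0.2, 0.7]])
Psi_inv = np.diag(1.0 / np.array([0.2, 0.3, 0.25, 0.4]))

# Diagonalize L' Psi^{-1} L with an orthogonal matrix T, then rotate L
M = L.T @ Psi_inv @ L
_, T = np.linalg.eigh(M)        # columns of T are orthonormal eigenvectors
L_star = L @ T

Delta = L_star.T @ Psi_inv @ L_star
print(np.round(Delta, 8))       # diagonal: the restriction is satisfied
print(np.allclose(L_star @ L_star.T, L @ L.T))  # True: LL' is unchanged
```

Because T is orthogonal, L∗(L∗)′ = LTT′L′ = LL′, so the fitted covariance matrix is identical; the restriction merely picks one member of the equivalence class of rotations.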
705 Maximum Likelihood Estimation • The MLEs of the communalities for the p variables (with m factors) are ĥ²i = ℓ̂²i1 + · · · + ℓ̂²im, i = 1, ..., p, and the proportion of the total variance accounted for by the jth factor is given by (ℓ̂²1j + ℓ̂²2j + · · · + ℓ̂²pj)/trace(S) if using S, or that sum divided by p if using R. 706 Maximum Likelihood Estimation: Stock Prices • Results:

Variable        Loadings for    Loadings for    Specific
                factor 1        factor 2        variances
Allied Chem     0.684           0.189           0.50
Du Pont         0.694           0.517           0.25
Union Carb.     0.681           0.248           0.47
Exxon           0.621           -0.073          0.61
Texaco          0.792           -0.442          0.18
Prop. of var.   0.485           0.113

707 Maximum Likelihood Estimation: Stock Prices • The interpretation of the two factors remains more or less the same (although Exxon is hardly loading on the second factor now). • What has improved considerably is the residual matrix, since most of the entries are essentially zero now:

R − (L̂L̂′ + Ψ̂) = [  0       0.005  -0.004  -0.024  -0.004
                     0.005   0      -0.003  -0.004   0.000
                    -0.004  -0.003   0       0.031  -0.004
                    -0.024  -0.004   0.031   0      -0.000
                    -0.004   0.000  -0.004  -0.000   0     ]

708 Likelihood Ratio test for Number of Factors • We wish to test whether the m-factor model appropriately describes the covariances among the p variables. • We test H0 : Σp×p = Lp×mL′m×p + Ψp×p versus Ha : Σ is any positive definite matrix. • A likelihood ratio test for H0 can be constructed as the ratio of the maximum of the likelihood function under H0 to the maximum of the likelihood function under no restrictions (i.e., under Ha). 709 Likelihood Ratio test for Number of Factors • Under Ha, the multivariate normal likelihood is maximized at µ̂ = x̄ and Σ̂ = (1/n) Σⁿi=1 (xi − x̄)(xi − x̄)′. • Under H0, the multivariate normal likelihood is maximized at µ̂ = x̄ and L̂L̂′ + Ψ̂, the MLE of Σ for the orthogonal factor model with m factors.
710 Likelihood Ratio test for Number of Factors • It is easy to show that the maximized likelihood under H0 is proportional to

|L̂L̂′ + Ψ̂|^(−n/2) exp{ −(n/2) tr[(L̂L̂′ + Ψ̂)⁻¹Σ̂] }.

Then, the log-likelihood ratio statistic for testing H0 is

−2 ln Λ = −2 ln[ ( |L̂L̂′ + Ψ̂| / |Σ̂| )^(−n/2) ] + n[ tr((L̂L̂′ + Ψ̂)⁻¹Σ̂) − p ].

• Since the second term in the expression above can be shown to be zero, the test statistic simplifies to

−2 ln Λ = n ln( |L̂L̂′ + Ψ̂| / |Σ̂| ).

711 Likelihood Ratio test for Number of Factors • The degrees of freedom for the large sample chi-square approximation to this test statistic are (1/2)[(p − m)² − p − m]. Where does this come from? • Under Ha we estimate p means and p(p + 1)/2 elements of Σ. • Under H0, we estimate p means, p diagonal elements of Ψ, and mp elements of L. • An additional set of m(m − 1)/2 restrictions is imposed on the mp elements of L by the identifiability restriction L′Ψ⁻¹L = ∆. 712 Likelihood Ratio test for Number of Factors • Putting it all together, we have

df = [p + p(p + 1)/2] − [p + pm + p − m(m − 1)/2] = (1/2)[(p − m)² − p − m]

degrees of freedom. • Bartlett showed that the χ² approximation to the sampling distribution of −2 ln Λ can be improved if the test statistic is computed as

−2 ln Λ = (n − 1 − (2p + 4m + 5)/6) ln( |L̂L̂′ + Ψ̂| / |Σ̂| ).

713 Likelihood Ratio test for Number of Factors • We reject H0 at level α if

−2 ln Λ = (n − 1 − (2p + 4m + 5)/6) ln( |L̂L̂′ + Ψ̂| / |Σ̂| ) > χ²df,

with df = (1/2)[(p − m)² − p − m], for large n and large n − p. • To have df > 0, we must have m < (1/2)(2p + 1 − √(8p + 1)). • If the data are not a random sample from a multivariate normal distribution, this test tends to indicate the need for too many factors. 714 Stock Price Data • We carry out the log-likelihood ratio test at level α = 0.05 to see whether the two-factor model is appropriate for these data. • We have estimated the factor loadings and the unique factors using R rather than S, but the exact same test statistic (with R in place of Σ̂) can be used for the test.
• Since we have evaluated L̂, Ψ̂ and R, we can easily compute |L̂L̂′ + Ψ̂| / |R| = 0.194414 / 0.193163 = 1.0065. 715 Stock Price Data • Then, the test statistic is [100 − 1 − (10 + 8 + 5)/6] ln(1.0065) = 0.58. • Since 0.58 < χ²1 = 3.84, we fail to reject H0 and conclude that the two-factor model adequately describes the correlations among the five stock prices (p-value = 0.4482). • Another way to say it is that the data do not contradict the two-factor model. 716 Stock Price Data • Test the null hypothesis that one factor is sufficient: [100 − 1 − (10 + 4 + 5)/6] ln( |L̂L̂′ + Ψ̂| / |Σ̂| ) = 15.49 with 5 degrees of freedom. • Since the p-value = 0.0085, we reject the null hypothesis and conclude that a one-factor model is not sufficient to describe the correlations among the weekly returns for the five stocks. 717 Maximum Likelihood Estimation: Stock Prices • We implemented the ML method for the stock price example. • The same SAS program can be used, but now use 'method = ml' instead of 'prinit' in the proc factor statement.

proc factor data=set1 method=ml scree nfactors=2 simple
            corr ev res msa nplot=2 out=scorepf outstat=facpf;
  var x1-x5;
  priors smc;
run;

718 Initial Factor Method: Maximum Likelihood

Prior Communality Estimates: SMC
        x1           x2           x3           x4           x5
0.43337337   0.46787840   0.44606336   0.34657438   0.37065882

Preliminary Eigenvalues: Total = 3.56872  Average = 0.713744

     Eigenvalue    Difference    Proportion    Cumulative
1    3.93731304    3.59128674        1.1033        1.1033
2    0.34602630    0.42853688        0.0970        1.2002
3    -.08251059    0.15370829       -0.0231        1.1771
4    -.23621888    0.15967076       -0.0662        1.1109
5    -.39588964                     -0.1109        1.0000

2 factors will be retained by the NFACTOR criterion.
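The arithmetic of the two-factor test above can be verified directly; the small gap between this value and SAS's 0.5752 comes from the rounded determinants reported on slide 715:

```python
import numpy as np

n, p = 100, 5
det_ratio = 0.194414 / 0.193163     # |LL' + Psi| / |R| from slide 715

def bartlett_stat(n, p, m, det_ratio):
    """Bartlett-corrected LRT statistic and its degrees of freedom."""
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(det_ratio)
    df = ((p - m) ** 2 - p - m) // 2
    return stat, df

stat2, df2 = bartlett_stat(n, p, m=2, det_ratio=det_ratio)
print(round(stat2, 2), df2)    # roughly 0.6 on 1 df; below 3.84, so do not reject
```

The same helper with m = 1 and the appropriate determinant ratio reproduces the 5 degrees of freedom of the one-factor test.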
719

Iteration   Criterion    Ridge    Change    Communalities
    1       0.0160066   0.0000    0.4484    0.49395 0.74910 0.52487 0.28915 0.81911
    2       0.0060474   0.0000    0.1016    0.50409 0.74984 0.52414 0.39074 0.82178
    3       0.0060444   0.0000    0.0014    0.50348 0.74870 0.52547 0.39046 0.82316
    4       0.0060444   0.0000    0.0004    0.50349 0.74845 0.52555 0.39041 0.82357
    5       0.0060444   0.0000    0.0001    0.50349 0.74840 0.52557 0.39038 0.82371
    6       0.0060444   0.0000    0.0000    0.50349 0.74839 0.52557 0.39037 0.82375

Convergence criterion satisfied.

720 Significance Tests Based on 100 Observations

Test                                  DF    Chi-Square    Pr > ChiSq
H0: No common factors                 10      158.6325        <.0001
HA: At least one common factor
H0: 2 Factors are sufficient           1        0.5752        0.4482
HA: More factors are needed

Chi-Square without Bartlett's Correction             0.5983972
Akaike's Information Criterion                      -1.4016028
Schwarz's Bayesian Criterion                        -4.0067730
Tucker and Lewis's Reliability Coefficient           1.0285788

Squared Canonical Correlations
Factor1    0.88925225
Factor2    0.70421149

721 Eigenvalues of the Weighted Reduced Correlation Matrix: Total = 10.4103228  Average = 2.08206456

     Eigenvalue    Difference    Proportion    Cumulative
1    8.02952889    5.64873495        0.7713        0.7713
2    2.38079393    2.29739649        0.2287        1.0000
3    0.08339744    0.09540872        0.0080        1.0080
4    -.01201128    0.05937488       -0.0012        1.0069
5    -.07138616                     -0.0069        1.0000

Factor Pattern

                       Factor1    Factor2
x5  TEXACO             0.79413    0.43944
x2  DUPONT             0.69216   -0.51894
x1  ALLIED CHEM        0.68315   -0.19184
x3  UNION CARBIDE      0.68015   -0.25092
x4  EXXON              0.62087    0.07000

722 Variance Explained by Each Factor

Factor     Weighted      Unweighted
Factor1    8.02952889    2.42450727
Factor2    2.38079393    0.56706851

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576

Variable    Communality    Weight
x1          0.50349452     2.01407634
x2          0.74838963     3.97439996
x3          0.52556852     2.10778594
x4          0.39037462     1.64035160
x5          0.82374847     5.67370899

723 Residual Correlations With Uniqueness on the Diagonal

                           x1         x2         x3         x4         x5
x1  ALLIED CHEM        0.49651    0.00453
                            -0.00413   -0.02399    0.00397
x2     0.00453    0.25161   -0.00261   -0.00389    0.00033
x3    -0.00413   -0.00261    0.47443    0.03138   -0.00424
x4    -0.02399   -0.00389    0.03138    0.60963   -0.00028
x5     0.00397    0.00033   -0.00424   -0.00028    0.17625

Root Mean Square Off-Diagonal Residuals: Overall = 0.01286055
x1 0.01253908   x2 0.00326122   x3 0.01602114   x4 0.01984784   x5 0.00291394
724

Maximum Likelihood Estimation with R

stocks.fac <- factanal(stocks, factors=2, method='mle',
                       scale=T, center=T)
stocks.fac

Call:
factanal(x = stocks, factors = 2, method = "mle", scale = T, center = T)

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
725

Loadings:
   Factor1  Factor2
x1  0.601    0.378
x2  0.849    0.165
x3  0.643    0.336
x4  0.365    0.507
x5  0.207    0.884

                Factor1  Factor2
SS loadings       1.671    1.321
Proportion Var    0.334    0.264
Cumulative Var    0.334    0.598

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
726

Measure of Sampling Adequacy

• In constructing a survey or other type of instrument to examine "factors" such as political attitudes, social constructs, mental abilities, or consumer confidence that cannot be measured directly, it is common to include a set of questions or items that should vary together as the value of the factor varies across subjects.

• In a job attitude assessment, for example, you might include questions of the following type in the survey:
  1. How well do you like your job?
  2. How eager are you to go to work in the morning?
  3. How professionally satisfied are you?
  4. What is your level of frustration with assignment to meaningless tasks?
727

Measure of Sampling Adequacy

• If the responses to these and other questions have strong positive or negative correlations, you may be able to identify a "job satisfaction" factor.
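As a quick arithmetic check on the factanal output shown above (a Python sketch, since the course code itself is in SAS and R): for standardized variables, each printed uniqueness should equal one minus the communality implied by the two loadings in the same row.

```python
import numpy as np

# Loadings and uniquenesses as printed by factanal for the stock data
L = np.array([
    [0.601, 0.378],   # x1 ALLIED CHEM
    [0.849, 0.165],   # x2 DUPONT
    [0.643, 0.336],   # x3 UNION CARBIDE
    [0.365, 0.507],   # x4 EXXON
    [0.207, 0.884],   # x5 TEXACO
])
psi = np.array([0.497, 0.252, 0.474, 0.610, 0.176])

# Communality h_i^2 = sum of squared loadings in row i;
# for a standardized trait, uniqueness psi_i = 1 - h_i^2.
h2 = (L ** 2).sum(axis=1)
print(np.round(1 - h2, 3))   # close to the printed uniquenesses
```

The small discrepancies (third decimal place) are just rounding in the printed loadings.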
• In marketing, education, and behavioral research, the existence of highly correlated responses to a battery of questions is used to defend the notion that an important factor exists and to justify the use of the survey form or measurement instrument.

• Measures of sampling adequacy are based on correlations and partial correlations among responses to different questions (or items).
728

Measure of Sampling Adequacy

• Kaiser (1970, Psychometrika) proposed a statistic called the Measure of Sampling Adequacy (MSA).

• MSA compares the relative sizes of the pairwise correlations and the partial correlations between all pairs of variables:

  MSA = 1 − [Σ_j Σ_{k≠j} q_jk²] / [Σ_j Σ_{k≠j} r_jk² + Σ_j Σ_{k≠j} q_jk²]

  where r_jk is the marginal sample correlation between variables j and k, and q_jk is the partial correlation between the two variables after accounting for all other variables in the set.
729

Measure of Sampling Adequacy

• If the r's are relatively large and the q's are relatively small, MSA tends to 1. This indicates that more than two variables are changing together as the values of the "factor" vary across subjects.

• Side note: if p = 3, for example, q_12 is given by

  q_12 = (r_12 − r_13 r_23) / √((1 − r_13²)(1 − r_23²)).
730

Measure of Sampling Adequacy

• The MSA can take on values between 0 and 1, and Kaiser proposed the following guidelines:

  MSA range     Interpretation
  0.9 to 1.0    Marvelous data
  0.8 to 0.9    Meritorious data
  0.7 to 0.8    Middling
  0.6 to 0.7    Mediocre data
  0.5 to 0.6    Miserable
  0.0 to 0.5    Unacceptable

• SAS computes MSAs for each of the p variables.

  Kaiser's Measure of Sampling Adequacy: Overall MSA = 0.78185677
  x1 0.80012154   x2 0.74270277   x3 0.80925875   x4 0.80426564   x5 0.75573970
731

Tucker and Lewis Reliability Coefficient

• Tucker and Lewis (1973), Psychometrika, 38, pp. 1-10.
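All of the partial correlations q_jk needed for the MSA can be obtained at once from the inverse of the correlation matrix, since q_jk = −p_jk/√(p_jj p_kk), where p_jk are elements of R⁻¹. A small Python sketch with an illustrative 3 × 3 correlation matrix (made-up values, not the stock data):

```python
import numpy as np

def msa(R):
    """Kaiser's overall Measure of Sampling Adequacy for a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    # Partial correlation of each pair given all remaining variables
    Q = -P / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off] ** 2).sum()
    q2 = (Q[off] ** 2).sum()
    return r2 / (r2 + q2)   # equals 1 - q2 / (r2 + q2)

# Toy 3-variable correlation matrix
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(msa(R))
```

For p = 3 the partial correlations produced this way agree with the explicit formula q_12 = (r_12 − r_13 r_23)/√((1 − r_13²)(1 − r_23²)) on the slide.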
• From the MLEs for m factors compute
  – L̂_m, the estimates of the factor loadings
  – Ψ̂_m = diag(R − L̂_m L̂_m′)
  – G_m = Ψ̂_m^{−1/2} (R − L̂_m L̂_m′) Ψ̂_m^{−1/2}
732

Tucker and Lewis Reliability Coefficient

• The (i, j) element g_m(ij) of G_m is called the partial correlation between variables X_i and X_j controlling for the m common factors.

• The sum of the squared partial correlations controlling for the m common factors is

  F_m = Σ Σ_{i<j} g_m(ij)²

• The mean square is

  M_m = F_m / df_m

  where df_m = 0.5[(p − m)² − p − m] are the degrees of freedom for the likelihood ratio test of the null hypothesis that m factors are sufficient.
733

Tucker and Lewis Reliability Coefficient

• The mean square for the model with zero common factors is

  M_0 = [Σ Σ_{i<j} r_ij²] / [p(p − 1)/2]

• The reliability coefficient is

  ρ̂_m = (M_0 − M_m) / (M_0 − n_m⁻¹)

  where n_m = (n − 1) − (2p + 5)/6 − 2m/3 is a Bartlett correction factor.

For the weekly stock gains data SAS reports
  Tucker and Lewis's Reliability Coefficient   1.0285788
734

Cronbach's Alpha

• A set of observed variables X = (X1, X2, . . . , Xp)′ that all "measure" the same latent trait should all have high positive pairwise correlations.

• Define

  r̄ = [Σ Σ_{i<j} Ŝ_ij / (p(p − 1)/2)] / [(1/p) Σ_i Ŝ_ii],

  the average of the p(p − 1)/2 sample covariances Ĉov(X_i, X_j) divided by the average of the p sample variances V̂ar(X_i).

• Cronbach's alpha is

  α = p r̄ / (1 + (p − 1) r̄) = [p/(p − 1)] [1 − Σ_i Var(X_i) / Var(Σ_i X_i)]
735

Cronbach's Alpha

• For standardized variables, Σ_{i=1}^p Var(X_i) = p.

• If ρ_ij = 1 for all i ≠ j, then

  Var(Σ_i X_i) = Σ_i Var(X_i) + 2 Σ Σ_{i<j} √(Var(X_i) Var(X_j)) = (Σ_i √Var(X_i))² = p²
736

Cronbach's Alpha

• In the extreme case where all pairwise correlations are 1 we have

  Σ_i Var(X_i) / Var(Σ_i X_i) = p / p² = 1/p

  and

  α = [p/(p − 1)] [1 − 1/p] = 1.

• When r̄ = 0, then α = 0.

• Be sure that the scores for all items are oriented in the same direction so that all correlations are positive.
737

Factor Rotation

• As mentioned earlier, multiplying a matrix of factor loadings by any orthogonal matrix leads to the same approximation to the covariance (or correlation) matrix.
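This invariance is easy to verify numerically. A Python sketch with made-up loadings (not an estimate from any dataset): rotating L̂ by any orthogonal T leaves L̂L̂′ + Ψ̂, and hence the fitted correlation matrix, unchanged.

```python
import numpy as np

# Made-up 4x2 loading matrix and implied specific variances
L = np.array([[0.8, 0.2],
              [0.7, 0.3],
              [0.3, 0.8],
              [0.2, 0.7]])
Psi = np.diag(1 - (L ** 2).sum(axis=1))

theta = 0.7                        # any angle gives the same fit
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal: T T' = I
L_star = L @ T                     # rotated loadings

fit1 = L @ L.T + Psi
fit2 = L_star @ L_star.T + Psi     # identical because T T' = I
print(np.allclose(fit1, fit2))     # True
```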
• This means that, mathematically, it does not matter whether we estimate the loadings as L̂ or as L̂* = L̂T, where T′T = TT′ = I.

• The estimated residual matrix also remains unchanged:

  S_n − L̂L̂′ − Ψ̂ = S_n − L̂TT′L̂′ − Ψ̂ = S_n − L̂*L̂*′ − Ψ̂

• The specific variances ψ̂_i, and therefore the communalities, also remain unchanged.
738

Factor Rotation

• Since only the loadings change by rotation, we rotate factors to see if we can better interpret results.

• There are many choices for a rotation matrix T; to choose one, we first establish a mathematical criterion and then see which T best satisfies the criterion.

• One possible objective is to have each one of the p variables load highly on only one factor and have moderate to negligible loadings on all other factors.

• It is not always possible to achieve this type of result.

• This can be examined with a varimax rotation.
739

Varimax Rotation

• Define ℓ̃*_ij = ℓ̂*_ij / ĥ_i as the "scaled" loading of the i-th variable on the j-th rotated factor.

• Compute the variance of the squares of the scaled loadings for the j-th rotated factor:

  (1/p) Σ_{i=1}^p [ (ℓ̃*_ij)² − (1/p) Σ_{k=1}^p (ℓ̃*_kj)² ]²
      = (1/p) Σ_{i=1}^p (ℓ̃*_ij)⁴ − [ (1/p) Σ_{k=1}^p (ℓ̃*_kj)² ]²

• The varimax procedure finds the orthogonal transformation of the loading matrix that maximizes the sum of those variances, summing across all m rotated factors.

• After rotation each of the p variables should load highly on at most one of the rotated factors.
740

Maximum Likelihood Estimation: Stock Prices

• Varimax rotation of m = 2 factors

Variable        MLE factor 1   MLE factor 2   Rotated factor 1   Rotated factor 2
Allied Chem         0.68           0.19             0.60               0.38
Du Pont             0.69           0.52             0.85               0.16
Union Carb.         0.68           0.25             0.64               0.34
Exxon               0.62          -0.07             0.36               0.51
Texaco              0.79          -0.44             0.21               0.88
Prop. of var.       0.485          0.113            0.334              0.264
741

Quartimax Rotation

• The varimax rotation will destroy an "overall" factor.

• The quartimax rotation tries to:
1.
Preserve an overall factor such that each of the p variables has a high loading on that factor
2. Create other factors such that each of the p variables has a high loading on at most one factor
742

Quartimax Rotation

• Define ℓ̃*_ij = ℓ̂*_ij / ĥ_i as the "scaled" loading of the i-th variable on the j-th rotated factor.

• Compute the variance of the squares of the scaled loadings for the i-th variable:

  (1/m) Σ_{j=1}^m [ (ℓ̃*_ij)² − (1/m) Σ_{k=1}^m (ℓ̃*_ik)² ]²
      = (1/m) Σ_{j=1}^m (ℓ̃*_ij)⁴ − [ (1/m) Σ_{k=1}^m (ℓ̃*_ik)² ]²

• The quartimax procedure finds the orthogonal transformation of the loading matrix that maximizes the sum of those variances, summing across all p variables.
743

Maximum Likelihood Estimation: Stock Prices

• Quartimax rotation of m = 2 factors

Variable        MLE factor 1   MLE factor 2   Rotated factor 1   Rotated factor 2
Allied Chem         0.68           0.19             0.71               0.05
Du Pont             0.69           0.52             0.83              -0.26
Union Carb.         0.68           0.25             0.72              -0.01
Exxon               0.62          -0.07             0.56               0.27
Texaco              0.79          -0.44             0.60               0.68
Prop. of var.       0.485          0.113            0.477              0.121
744

PROMAX Transformation

• The varimax and quartimax rotations produce uncorrelated factors.

• PROMAX is a non-orthogonal (oblique) transformation that
  1. is not a rotation
  2. can produce correlated factors
745

PROMAX Transformation

1. First perform a varimax rotation to obtain loadings L*.

2. Construct another p × m matrix Q such that

   q_ij = |ℓ*_ij|^{k−1} ℓ*_ij   for ℓ*_ij ≠ 0
   q_ij = 0                     for ℓ*_ij = 0

   where k > 1 is selected by trial and error, usually k < 4.

3. Find a matrix U such that each column of L*U is close to the corresponding column of Q. Choose the j-th column of U to minimize

   (q_j − L*u_j)′(q_j − L*u_j)

   This yields U = [(L*)′(L*)]⁻¹(L*)′Q.
746

PROMAX Transformation

1. Rescale U so that the transformed factors have unit variance. Compute D² = diag[(U′U)⁻¹] and M = UD.

2.
The PROMAX factors are obtained from

   L*F = L*M M⁻¹F = L_P F_P

• The PROMAX transformation yields factors F_P = M⁻¹F with loadings L_P = L*M.
747

PROMAX Transformation

• The correlation matrix for the transformed factors is (M′M)⁻¹.

  Derivation:
  Var(F_P) = Var(M⁻¹F) = M⁻¹ Var(F) (M⁻¹)′ = M⁻¹ I (M⁻¹)′ = (M′M)⁻¹
748

Maximum Likelihood Estimation: Stock Prices

• PROMAX transformation of m = 2 factors

Variable       MLE factor 1   MLE factor 2   PROMAX factor 1   PROMAX factor 2
Allied Chem        0.68           0.19            0.56              0.24
Du Pont            0.69           0.52            0.90             -0.08
Union Carb.        0.68           0.25            0.62              0.18
Exxon              0.62          -0.07            0.26              0.45
Texaco             0.79          -0.44           -0.02              0.92

Estimated correlation between the factors is 0.49
749

/* Varimax Rotation */
proc factor data=set1 method=ml nfactors=2 res msa mineigen=0
            maxiter=75 conv=.0001 outstat=facml2
            rotate=varimax reorder;
  var x1-x5;
  priors smc;
run;

proc factor data=set1 method=prin nfactors=3 mineigen=0 res msa
            maxiter=75 conv=.0001 outstat=facml3
            rotate=varimax reorder;
  var x1-x5;
  priors smc;
run;
750

The FACTOR Procedure
Rotation Method: Varimax

Orthogonal Transformation Matrix
           1          2
1    0.67223   -0.74034
2    0.74034    0.67223

Rotated Factor Pattern
                      Factor1    Factor2
x2  DUPONT            0.84949    0.16359
x3  UNION CARBIDE     0.64299    0.33487
x1  ALLIED CHEM       0.60126    0.37680
x5  TEXACO            0.20851    0.88333
x4  EXXON             0.36554    0.50671
751

Variance Explained by Each Factor
Factor     Weighted      Unweighted
Factor1    4.93345489    1.67367719
Factor2    5.47686792    1.31789858

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323   Unweighted = 2.991576
Variable    Communality    Weight
x1          0.50349452     2.01407634
x2          0.74838963     3.97439996
x3          0.52556852     2.10778594
x4          0.39037462     1.64035160
x5          0.82374847     5.67370899
752

/* Apply other rotations to the two factor solution */
proc factor data=facml2 nfactors=2 conv=.0001
            rotate=quartimax reorder;
  var x1-x5;
  priors smc;
run;
753

The FACTOR Procedure
Rotation Method: Quartimax

Orthogonal Transformation Matrix
           1          2
1    0.88007    0.47484
2   -0.47484    0.88007

Rotated Factor Pattern
(variables reordered: x2, x3, x1,
x4, x5)
                      Factor1    Factor2
x2  DUPONT            0.82529   -0.25940
x3  UNION CARBIDE     0.72488   -0.01060
x1  ALLIED CHEM       0.70807    0.04611
x4  EXXON             0.56231    0.27237
x5  TEXACO            0.60294    0.67839
754

Variance Explained by Each Factor
Factor     Weighted      Unweighted
Factor1    7.40558481    2.38765329
Factor2    3.00473801    0.60392248

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323   Unweighted = 2.991576
Variable    Communality    Weight
x1          0.50349452     2.01407634
x2          0.74838963     3.97439996
x3          0.52556852     2.10778594
x4          0.39037462     1.64035160
x5          0.82374847     5.67370899
755

proc factor data=facml2 nfactors=2 conv=.0001
            rotate=promax reorder score;
  var x1-x5;
  priors smc;
run;
756

The FACTOR Procedure
Prerotation Method: Varimax

Orthogonal Transformation Matrix
           1          2
1    1.00000   -0.00000
2    0.00000    1.00000

Rotated Factor Pattern
                      Factor1    Factor2
x2  DUPONT            0.84949    0.16359
x3  UNION CARBIDE     0.64299    0.33487
x1  ALLIED CHEM       0.60126    0.37680
x5  TEXACO            0.20851    0.88333
x4  EXXON             0.36554    0.50671
757

Variance Explained by Each Factor
Factor     Weighted      Unweighted
Factor1    4.93345489    1.67367719
Factor2    5.47686792    1.31789858

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323   Unweighted = 2.991576
Variable    Communality    Weight
x1          0.50349452     2.01407634
x2          0.74838963     3.97439996
x3          0.52556852     2.10778594
x4          0.39037462     1.64035160
x5          0.82374847     5.67370899
758

The FACTOR Procedure
Rotation Method: Promax (power = 3)

Target Matrix for Procrustean Transformation
                      Factor1    Factor2
x2  DUPONT            1.00000    0.00733
x3  UNION CARBIDE     0.73686    0.10691
x1  ALLIED CHEM       0.64258    0.16242
x5  TEXACO            0.01281    1.00000
x4  EXXON             0.21150    0.57860

Procrustean Transformation Matrix
             1            2
1    1.24806851   -0.3302485
2    -0.3149179   1.20534878
759

Normalized Oblique Transformation Matrix
             1            2
1    1.11385973   -0.2810538
2    -0.3035593   1.10793795

Inter-Factor Correlations
           Factor1    Factor2
Factor1    1.00000    0.49218
Factor2    0.49218    1.00000
760

Rotated Factor Pattern (Standardized Regression Coefficients)
(variables reordered: x2, x3, x1, x5,
x4)
                      Factor1    Factor2
x2  DUPONT            0.90023   -0.07663
x3  UNION CARBIDE     0.62208    0.17583
x1  ALLIED CHEM       0.56382    0.23495
x5  TEXACO           -0.01601    0.91538
x4  EXXON             0.26475    0.45044
761

Reference Structure (Semipartial Correlations)
                      Factor1    Factor2
x2  DUPONT            0.78365   -0.06670
x3  UNION CARBIDE     0.54152    0.15306
x1  ALLIED CHEM       0.49081    0.20452
x5  TEXACO           -0.01394    0.79683
x4  EXXON             0.23046    0.39211

Variance Explained by Each Factor Eliminating Other Factors
Factor     Weighted      Unweighted
Factor1    3.63219833    1.20154790
Factor2    4.00599560    0.85839588
762

Factor Structure (Correlations)
                      Factor1    Factor2
x2  DUPONT            0.86252    0.36645
x3  UNION CARBIDE     0.70862    0.48200
x1  ALLIED CHEM       0.67946    0.51245
x5  TEXACO            0.43452    0.90750
x4  EXXON             0.48644    0.58074

Variance Explained by Each Factor Ignoring Other Factors
Factor     Weighted      Unweighted
Factor1    6.40432722    2.13317989
Factor2    6.77812448    1.79002787
763

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323   Unweighted = 2.991576
Variable    Communality    Weight
x1          0.50349452     2.01407634
x2          0.74838963     3.97439996
x3          0.52556852     2.10778594
x4          0.39037462     1.64035160
x5          0.82374847     5.67370899
764

Scoring Coefficients Estimated by Regression

Squared Multiple Correlations of the Variables with Each Factor
Factor1  0.83601404    Factor2  0.84825886

Standardized Scoring Coefficients
                      Factor1    Factor2
x2  DUPONT            0.58435   -0.01834
x3  UNION CARBIDE     0.21791    0.06645
x1  ALLIED CHEM       0.18991    0.08065
x5  TEXACO            0.02557    0.78737
x4  EXXON             0.07697    0.11550
765

R code posted in stocks.R

# Compute maximum likelihood estimates for factors
stocks.fac <- factanal(stocks, factors=2, method='mle',
                       scale=T, center=T)
stocks.fac

Call:
factanal(x = stocks, factors = 2, method = "mle", scale = T, center = T)

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
766

Loadings:
   Factor1  Factor2
x1  0.601    0.378
x2  0.849    0.165
x3  0.643    0.336
x4  0.365    0.507
x5  0.207    0.884

                Factor1  Factor2
SS loadings       1.671    1.321
Proportion Var    0.334    0.264
Cumulative Var    0.334    0.598

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
767

# Apply the Varimax rotation (this was the default)
stocks.fac <- factanal(stocks, factors=2, rotation="varimax",
                       method="mle", scores="regression")
stocks.fac

Call:
factanal(x = stocks, factors = 2, scores = "regression",
         rotation = "varimax", method = "mle")

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
768

Loadings:
   Factor1  Factor2
x1  0.601    0.378
x2  0.849    0.165
x3  0.643    0.336
x4  0.365    0.507
x5  0.207    0.884

                Factor1  Factor2
SS loadings       1.671    1.321
Proportion Var    0.334    0.264
Cumulative Var    0.334    0.598

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
769

# Compute the residual matrix
pred <- stocks.fac$loadings %*% t(stocks.fac$loadings) +
        diag(stocks.fac$uniqueness)
resid <- s.cor - pred
resid

             x1           x2           x3           x4           x5
x1  -8.4905e-07   4.5256e-03  -4.1267e-03  -2.3991e-02   3.9726e-03
x2   4.5256e-03   1.8641e-07  -2.6074e-03  -3.8942e-03   3.2772e-04
x3  -4.1267e-03  -2.6074e-03   7.4009e-08   3.1384e-02  -4.2413e-03
x4  -2.3991e-02  -3.8942e-03   3.1384e-02  -1.7985e-07  -2.8193e-04
x5   3.9726e-03   3.2772e-04  -4.2413e-03  -2.8193e-04   2.7257e-08
770

# List factor scores
stocks.fac$scores

        Factor1       Factor2
1   -0.05976839    0.02677225
2   -1.22226529    1.44753390
3    1.62739601    2.47578797
.        .             .
.        .             .
99   0.71440818   -0.02122186
100  -0.59289113    0.34344739
771

# You could use the following code in Splus to apply the
# quartimax rotation, but this is not available in R:
#   stocks.fac <- factanal(stocks, factors=2, rotation="quartimax",
#                          method="mle", scores="regression")
#   stocks.fac

# Apply the Promax rotation
promax(stocks.fac$loadings, m=3)
772

Loadings:
   Factor1  Factor2
x1  0.570    0.195
x2  0.965   -0.174
x3  0.639    0.125
x4  0.226    0.456
x5 -0.128    0.984

                Factor1  Factor2
SS loadings       1.732    1.260
Proportion Var    0.346    0.252
Cumulative Var    0.346    0.598

$rotmat
           [,1]       [,2]
[1,]  1.2200900 -0.4410072
[2,] -0.4315772  1.2167132
773

# You could try to apply the promax rotation with the
# factanal function. It selects the power as m=4
> stocks.fac <- factanal(stocks, factors=2, rotation="promax",
+                        method="mle", scores="regression")
> stocks.fac

Call:
factanal(x = stocks, factors = 2, scores = "regression", rotation = "

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
774

Loadings:
   Factor1  Factor2
x1  0.576    0.175
x2  1.011   -0.232
x3  0.653
x4  0.202    0.466
x5 -0.202    1.037

                Factor1  Factor2
SS loadings       1.863    1.387
Proportion Var    0.373    0.277
Cumulative Var    0.373    0.650

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
775

Example: Test Scores

• The sample correlation matrix between p = 6 test scores collected from n = 220 students is given below (lower triangle):

      1.0
      0.439   1.0
  R = 0.410   0.351   1.0
      0.288   0.354   0.164   1.0
      0.329   0.320   0.190   0.595   1.0
      0.248   0.329   0.181   0.470   0.464   1.0

• An m = 2 factor model was fitted to this correlation matrix using ML.
776

Example: Test Scores

• Estimated factor loadings and communalities are

Variable      Loadings on factor 1   Loadings on factor 2   Communality ĥ_i²
Gaelic              0.553                  0.429                 0.490
English             0.568                  0.288                 0.406
History             0.392                  0.450                 0.356
Arithmetic          0.740                 -0.273                 0.623
Algebra             0.724                 -0.211                 0.569
Geometry            0.595                 -0.132                 0.372
777

Example: Test Scores

• All variables load highly on the first factor.
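The fit can be checked directly: with the estimated loadings and communalities above, L̂L̂′ + Ψ̂ should approximately reproduce R. A Python sketch of that check (the course code for this example is in SAS and R):

```python
import numpy as np

# Estimated loadings from the m = 2 ML fit (table above)
L = np.array([
    [0.553,  0.429],   # Gaelic
    [0.568,  0.288],   # English
    [0.392,  0.450],   # History
    [0.740, -0.273],   # Arithmetic
    [0.724, -0.211],   # Algebra
    [0.595, -0.132],   # Geometry
])
Psi = np.diag(1 - (L ** 2).sum(axis=1))   # uniquenesses = 1 - communalities

# Observed correlation matrix (lower triangle from the slide, symmetrized)
R = np.array([
    [1.000, 0.439, 0.410, 0.288, 0.329, 0.248],
    [0.439, 1.000, 0.351, 0.354, 0.320, 0.329],
    [0.410, 0.351, 1.000, 0.164, 0.190, 0.181],
    [0.288, 0.354, 0.164, 1.000, 0.595, 0.470],
    [0.329, 0.320, 0.190, 0.595, 1.000, 0.464],
    [0.248, 0.329, 0.181, 0.470, 0.464, 1.000],
])

resid = R - (L @ L.T + Psi)   # small off the diagonal if the model fits
print(np.abs(resid).max())
```

The largest absolute residual is about 0.03, so the two-factor model reproduces these correlations quite well.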
We call that a 'general intelligence' factor.

• Half of the loadings are positive and half are negative on the second factor. The positive loadings correspond to the 'verbal' scores and the negative correspond to the 'math' scores.

• Correlations between scores on verbal tests and math tests tend to be lower than correlations among scores on math tests or correlations among scores on verbal tests, so this is a 'math versus verbal' factor.

• We plot the six loadings for each factor (ℓ̂_i1, ℓ̂_i2) on the original coordinate system and also on a rotated set of coordinates chosen so that one axis goes through the loadings (ℓ̂_41, ℓ̂_42) of the fourth variable on the two factors.
778

Factor Rotation for Test Scores

[Figure: the six loading pairs (ℓ̂_i1, ℓ̂_i2) plotted on the original and rotated coordinate axes]
779

Varimax Rotation for Test Scores

• Loadings for rotated factors using the varimax criterion are as follows:

Variable      Loadings on F1*   Loadings on F2*   Communality ĥ_i²
Gaelic             0.232             0.660             0.490
English            0.321             0.551             0.406
History            0.085             0.591             0.356
Arithmetic         0.770             0.173             0.623
Algebra            0.723             0.215             0.569
Geometry           0.572             0.213             0.372

• F1* is primarily a 'mathematics ability' factor and F2* is a 'verbal ability' factor.
780

PROMAX Rotation for Test Scores

• F1* is more clearly a 'mathematics ability' factor and F2* is a 'verbal ability' factor.

Variable      Loadings on F1*   Loadings on F2*   Communality ĥ_i²
Gaelic             0.059            -0.668             0.490
English            0.191             0.519             0.406
History           -0.084             0.635             0.356
Arithmetic         0.809             0.041             0.623
Algebra            0.743             0.021             0.569
Geometry           0.575             0.064             0.372

• These factors have correlation 0.505
• Code is posted as Lawley.sas and Lawley.R
781

Factor Scores

• Sometimes, we require estimates of the factor values for each respondent in the sample.

• For j = 1, ..., n, the estimated m-dimensional vector of factor scores is f̂_j, the estimate of the values f_j attained by F_j.

• To estimate f_j we act as if L̂ (or L̃) and Ψ̂ (or Ψ̃) are the true values.

• Recall the model (X_j − µ) = LF_j + ε_j.
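To make the factor-score problem concrete, here is a small Python simulation sketch (made-up loadings, not course data): data generated according to the orthogonal factor model X = µ + LF + ε have covariance matrix close to LL′ + Ψ.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, m = 50000, 4, 2

L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.6]])               # made-up loadings
psi = 1 - (L ** 2).sum(axis=1)           # specific variances (standardized traits)
mu = np.array([1.0, 2.0, 3.0, 4.0])

F = rng.standard_normal((n, m))          # E(F) = 0, Var(F) = I
eps = rng.standard_normal((n, p)) * np.sqrt(psi)   # Var(eps) = Psi, diagonal
X = mu + F @ L.T + eps                   # the orthogonal factor model

Sigma = L @ L.T + np.diag(psi)           # covariance implied by the model
S = np.cov(X, rowvar=False)
print(np.abs(S - Sigma).max())           # small for large n
```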
782

Weighted Least Squares for Factor Scores

• From the model above, ε = x − µ − Lf, and we can try to minimize the weighted sum of squares

  Σ_{i=1}^p ε_i²/ψ_i = ε′Ψ⁻¹ε = (x − µ − Lf)′Ψ⁻¹(x − µ − Lf).

• The weighted least squares solution is

  f̂ = (L′Ψ⁻¹L)⁻¹L′Ψ⁻¹(x − µ).

• Substituting the MLEs L̂, Ψ̂, x̄ in place of the true values, we have

  f̂_j = (L̂′Ψ̂⁻¹L̂)⁻¹L̂′Ψ̂⁻¹(x_j − x̄).
783

Weighted Least Squares for Factor Scores

• When factor loadings are estimated with the PC method, ordinary least squares is sometimes used to get the factor scores. In this case

  f̃_j = (L̃′L̃)⁻¹L̃′(x_j − x̄).

• Implicitly, this assumes that the ψ̂_i are equal or almost equal.

• Since L̃ = [√λ̂_1 ê_1   √λ̂_2 ê_2   ...   √λ̂_m ê_m], we find that

  f̃_j = ( λ̂_1^{−1/2} ê_1′(x_j − x̄),  λ̂_2^{−1/2} ê_2′(x_j − x̄),  ...,  λ̂_m^{−1/2} ê_m′(x_j − x̄) )′.
784

Weighted Least Squares for Factor Scores

• Note that the factor scores estimated from the principal component solution are nothing more than the scores for the first m principal components scaled by λ̂_i^{−1/2}, for i = 1, ..., m.

• The estimated factor scores have zero mean and zero covariances.
785

Regression Method for Factor Scores

• We again start from the model (X − µ) = LF + ε.

• If F and ε are normally distributed, we note that

  ( X − µ )
  (   F   )  ~  N_{p+m}(0, Σ*),   with   Σ*_{(p+m)×(p+m)} = [ Σ = LL′ + Ψ    L ]
                                                            [ L′             I ]

• The conditional distribution of F given X = x is therefore also normal, with

  E(F|x) = L′Σ⁻¹(x − µ) = L′(LL′ + Ψ)⁻¹(x − µ)
  Cov(F|x) = I − L′Σ⁻¹L = I − L′(LL′ + Ψ)⁻¹L.
786

Regression Method for Factor Scores

• An estimate of f_j is obtained by substituting estimates for the unknown quantities in the expression above, so that

  f̂_j = Ê(F_j|x_j) = L̂′(L̂L̂′ + Ψ̂)⁻¹(x_j − x̄).

• Sometimes, to reduce the errors that may be introduced if the number of factors m is not quite appropriate, S is used in place of (L̂L̂′ + Ψ̂) in the expression for f̂_j.

• If loadings are rotated as L̂* = L̂T, the relationship between the resulting factor scores and the factor scores arising from unrotated loadings is f̂_j* = T′f̂_j.
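Both score estimators can be sketched in a few lines of Python (made-up loadings; the function names are ours for illustration, not from any package). One useful contrast: for a noiseless, centered observation x = Lf, the weighted least squares scores recover f exactly, while the regression scores are shrunk toward zero.

```python
import numpy as np

L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.2, 0.9],
              [0.1, 0.6]])                  # made-up estimated loadings
psi = 1 - (L ** 2).sum(axis=1)              # estimated specific variances
Psi_inv = np.diag(1 / psi)
xbar = np.zeros(4)                          # work with centered data

def wls_scores(x):
    # f_hat = (L' Psi^-1 L)^-1 L' Psi^-1 (x - xbar)
    A = L.T @ Psi_inv @ L
    return np.linalg.solve(A, L.T @ Psi_inv @ (x - xbar))

def regression_scores(x):
    # f_hat = L' (L L' + Psi)^-1 (x - xbar)
    Sigma = L @ L.T + np.diag(psi)
    return L.T @ np.linalg.solve(Sigma, x - xbar)

f = np.array([1.0, -0.5])
x = L @ f                                   # a noiseless observation
print(wls_scores(x))                        # recovers f exactly
print(regression_scores(x))                 # shrunk toward 0
```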
787

Factor Scores for the Stock Price Data

• We estimated the values of the two factor scores for each of the 100 observations in the stock price example. The scatter plot of the scores shows approximate mean 0 and variance 1 and approximate bivariate normality.
788

Heywood Cases and Other Potential Problems

• When loadings are estimated iteratively (as in the Principal Factor or ML methods), some problems may arise:
  – Some of the estimated eigenvalues of R − Ψ̂ may become negative.
  – Some of the communalities may become larger than one.

• Since communalities estimate variances, they must be positive. If any communality exceeds 1.0, the variance of the corresponding specific factor is negative.
789

Heywood Cases and Other Potential Problems

• When can these things happen?
  1. When we start with bad prior communality estimates for iterating.
  2. When there are too many common factors.
  3. When we do not have enough data to obtain stable estimates.
  4. When the common factor model is just not appropriate for the data.

• These problems can occur even in seemingly 'good' datasets (see example below) and create serious problems when we try to interpret results.
790

Heywood Cases and Other Potential Problems

• When a final communality estimate equals 1, we refer to this as a Heywood case (a solution on the boundary of the parameter space).

• When a final communality estimate exceeds 1, we have an ultra-Heywood case.

• An ultra-Heywood case implies that some unique factor has negative variance, a huge red flag that something is wrong and that results from the factor analysis are invalid.
791

Heywood Cases and Other Potential Problems

• There is some concern about the validity of results in Heywood cases (boundary solutions where the factors account for all of the variation in at least one trait). It is difficult to interpret results (for example, how can a confidence interval be constructed for a parameter at the boundary?)
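A trivial sketch of this diagnostic (the helper name is ours, illustrative only): classify a converged solution by comparing its communality estimates to 1.

```python
import numpy as np

def heywood_status(communalities, tol=1e-8):
    """Classify an iterative factor solution by its communality estimates.
    (Illustrative helper, not part of SAS or R output.)"""
    h2 = np.asarray(communalities)
    if np.any(h2 > 1 + tol):
        return "ultra-Heywood case: some specific variance is negative"
    if np.any(np.abs(h2 - 1) <= tol):
        return "Heywood case: solution on the boundary"
    return "interior solution"

print(heywood_status([0.50, 0.75, 0.53]))   # interior solution
print(heywood_status([0.60, 1.00, 0.40]))   # Heywood case
print(heywood_status([0.60, 1.07, 0.40]))   # ultra-Heywood case
```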
• Large sample results may not be valid for Heywood cases, in particular large sample chi-square approximations for likelihood ratio tests (e.g., the test for the number of factors).
792

Heywood Cases and Other Potential Problems

• When iterative algorithms in SAS fail to converge, we can include a heywood or an ultraheywood option in the proc factor statement.
  – The heywood option fixes to 1 any communality that goes above 1 and continues iterating on the rest.
  – The ultraheywood option continues iterating on all variables, regardless of the size of the communalities, hoping that the solution eventually comes back inside the parameter space (this rarely happens).

• When factor loadings are estimated using the principal components method, none of these problems arise (assuming that R is of full rank).
793

Example: track records for men and women

• Use maximum likelihood estimation to fit factor models to standardized observations on national track records for men and women in n = 54 countries (Table 8.6 and Table 1.9).

• The dataset for women includes seven variables x1, ..., x7 that correspond to national records in 100m, 200m, 400m, 800m, 1500m, 3000m and marathon races.

• The dataset for men includes eight variables x1, ..., x8 that correspond to national records in 100m, 200m, 400m, 800m, 1500m, 5000m, 10000m and marathon races.

• The first three times are in seconds and the remaining times are in minutes.
794

Example: track records for men and women

• First examine scatterplots and correlation matrices.

• Test for the appropriate number of factors using likelihood ratio tests.

• The HEYWOOD option is needed in the FACTOR procedure in SAS for the three factor model for both men and women.

• SAS code to carry out these analyses is posted as records.sas and R code is posted as records.R on the course web page.
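The degrees of freedom quoted in the SAS significance tests earlier follow df_m = [(p − m)² − p − m]/2, and the Bartlett correction factor n_m appeared in the Tucker-Lewis slide. A small Python sketch reproducing those counts for the stock example (p = 5, n = 100):

```python
def lrt_df(p, m):
    # Degrees of freedom for H0: m common factors are sufficient
    return ((p - m) ** 2 - p - m) // 2

def bartlett_multiplier(n, p, m):
    # n_m = (n - 1) - (2p + 5)/6 - 2m/3, the Bartlett correction factor
    return (n - 1) - (2 * p + 5) / 6 - 2 * m / 3

# Stock example: p = 5 variables, n = 100 weekly observations
print(lrt_df(5, 0))   # 10, matching the DF of the "no common factors" test
print(lrt_df(5, 2))   # 1, matching the DF of the "2 factors are sufficient" test
```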
795

Example: track records for men

• The scatter plot matrix reveals a couple of extreme countries with slow times, but they conform to the correlation pattern. All relationships are approximately straight line relationships.

• Results indicate that the two factor model may be reasonable (p-value = .0173). MLEs for the factor loadings are inside the parameter space and the two rotated (varimax) factors are easy to interpret.

• Since the MLE for the three factor solution is on the boundary of the parameter space, we are not justified in using a large sample chi-square approximation for the likelihood ratio test, and the reported p-value (.2264) is unreliable.
796

[Figure: scatter plot matrix of men's national records for the 100m, 200m, 400m, 800m, 1500m, 5000m, 10000m and marathon races]
797

Example: track records for men

Race       Varimax Factor 1   Varimax Factor 2   Promax Factor 1   Promax Factor 2   ĥ_i²
100m            0.402              0.839              0.097             0.868        0.865
200m            0.409              0.892              0.068             0.933        0.963
400m            0.515              0.711              0.300             0.643        0.772
800m            0.671              0.581              0.568             0.393        0.787
1500m           0.748              0.554              0.682             0.316        0.866
5000m           0.886              0.450              0.915             0.109        0.987
10000m          0.900              0.424              0.946             0.069        0.989
Marathon        0.865              0.402              0.910             0.063        0.912

• The Promax factors have correlation 0.697

• There is a distance race factor and a sprint factor
798

Example: track records for women

• The two factor model does not appear to fit for women (p-value < .0001).

• The algorithm for fitting the three-factor model needed the Heywood option to converge to a boundary solution. (The reported p-value of 0.0295 for the likelihood ratio test is unreliable.)
• Two varimax rotated factors have a similar interpretation to the rotated factors for men, but the marathon does not load as highly on the distance factor.

799

Example: track records for women

• Three varimax rotated factors allow for a sprint factor, a 1500m and 3000m factor, and a third factor on which the 800m and the marathon have relatively high loadings.
• A four-factor solution does not split up the loadings on the third factor for the 800m and marathon races.

800

[Figure: scatterplot matrix of estimated factor scores for the women's track records; panels labeled 100m, 200m, 400m, 800m, 1500m, 3000m, marathon]
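The p-values quoted for the women's fit come from the likelihood ratio test that m common factors suffice. Its degrees of freedom follow from the standard parameter count for the orthogonal factor model: p(p+1)/2 distinct variances and covariances versus pm loadings plus p uniquenesses, less m(m−1)/2 constraints that fix the rotation. A quick Python check using p = 7 (the women's data record seven races):

```python
# Degrees of freedom of the likelihood ratio (goodness-of-fit) test for the
# orthogonal m-factor model with p traits: the standard count is
# df = p(p+1)/2 - [pm + p - m(m-1)/2] = ((p - m)^2 - (p + m)) / 2.
def lrt_df(p: int, m: int) -> int:
    return ((p - m) ** 2 - (p + m)) // 2

p = 7  # women's data: 7 races
for m in (2, 3):
    n_cov = p * (p + 1) // 2                  # distinct entries of Sigma
    n_par = p * m + p - m * (m - 1) // 2      # free model parameters
    print(f"m = {m}: {n_cov} covariances vs {n_par} parameters, LRT df = {lrt_df(p, m)}")
```

With p = 7 this gives df = 8 for m = 2 and only df = 3 for m = 3, so the three-factor model is barely testable, consistent with the convergence difficulties noted above.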
801

Example: track records for women

Race       Varimax Factor 1   Varimax Factor 2   Promax Factor 1   Promax Factor 2    ĥ²ᵢ
100m            0.455              0.836              0.151             0.841         0.906
200m            0.449              0.880              0.120             0.901         0.976
400m            0.395              0.832              0.075             0.867         0.848
800m            0.728              0.571              0.639             0.359         0.856
1500m           0.879              0.460              0.891             0.139         0.984
3000m           0.915              0.367              0.985             0.000         0.972
Marathon        0.670              0.432              0.662             0.200         0.662

• The Promax factors have correlation 0.693.
• There is a distance race factor and a sprint factor.
• When a third factor is included, it reflects the three countries with very slow 800m and marathon times.

802
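The varimax solutions in these tables are obtained by rotating an initial (e.g. maximum likelihood) loading matrix. As a sketch only, the standard SVD-based varimax iteration can be written in a few lines of numpy; applying it to the women's varimax loadings from the table above also illustrates that an orthogonal rotation leaves every row's communality unchanged:

```python
import numpy as np

# Minimal sketch of the varimax criterion (gamma = 1 gives raw varimax),
# using the standard SVD-based iteration; statistical software applies this
# to the initial unrotated loading matrix.
def varimax(Phi, gamma=1.0, max_iter=50, tol=1e-8):
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vh = np.linalg.svd(
            Phi.T @ (Lam ** 3 - (gamma / p) * Lam @ np.diag(np.diag(Lam.T @ Lam)))
        )
        R = u @ vh  # orthogonal rotation matrix
        d_new = s.sum()
        if d > 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return Phi @ R

# Women's varimax loadings from the table above (7 races x 2 factors).
L = np.array([
    [0.455, 0.836], [0.449, 0.880], [0.395, 0.832], [0.728, 0.571],
    [0.879, 0.460], [0.915, 0.367], [0.670, 0.432],
])
Lrot = varimax(L)
# The rotation is orthogonal, so row sums of squared loadings (communalities)
# are preserved exactly.
print(np.allclose((L ** 2).sum(axis=1), (Lrot ** 2).sum(axis=1)))  # True
```

This invariance is why the ĥ²ᵢ column is the same whether it is computed before or after a varimax rotation; an oblique rotation such as promax does not share this property.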