Factor Analysis - Introduction

• Factor analysis is used to draw inferences on unobservable
quantities, such as intelligence, musical ability, patriotism, or
consumer attitudes, that cannot be measured directly.
• The goal of factor analysis is to describe correlations between
p measured traits in terms of variation in a few underlying and
unobservable factors.
• Changes across subjects in the value of one or more
unobserved factors could affect the values of an entire
subset of measured traits and cause them to be highly
correlated.
654
Factor Analysis - Example
• A marketing firm wishes to determine how consumers choose
to patronize certain stores.
• Customers at various stores were asked to complete a survey
with about p = 80 questions.
• Marketing researchers postulate that consumer choices are
based on a few underlying factors such as: friendliness of
personnel, level of customer service, store atmosphere,
product assortment, product quality and general price level.
• A factor analysis would use correlations among responses to
the 80 questions to determine if they can be grouped into six
sub-groups that reflect variation in the six postulated factors.
655
Orthogonal Factor Model
• X = (X1, X2, . . . , Xp)′ is a p-dimensional vector of
observable traits distributed with mean vector µ and
covariance matrix Σ.
• The factor model postulates that X can be written as a linear
combination of a set of m common factors F1, F2, ..., Fm and
p additional unique factors ε1, ε2, ..., εp, so that
    X1 − µ1 = ℓ11F1 + ℓ12F2 + · · · + ℓ1mFm + ε1
    X2 − µ2 = ℓ21F1 + ℓ22F2 + · · · + ℓ2mFm + ε2
    ...
    Xp − µp = ℓp1F1 + ℓp2F2 + · · · + ℓpmFm + εp
where ℓij is called a factor loading or the loading of the ith
trait on the jth factor.
656
Orthogonal Factor Model
• In matrix notation: (X − µ)p×1 = Lp×mFm×1 + εp×1, where L
is the matrix of factor loadings and F is the vector of values
for the m unobservable common factors.
• Notice that the model looks very much like an ordinary linear
model. Since we do not observe anything on the right-hand
side, however, we cannot do anything with this model unless
we impose some more structure. The orthogonal factor
model assumes that
    E(F) = 0,
    Var(F) = E(FF′) = I,
    E(ε) = 0,
    Var(ε) = E(εε′) = Ψ = diag{ψi}, i = 1, ..., p,
and F, ε are independent, so that Cov(F, ε) = 0.
657
Orthogonal Factor Model
• Assuming that the variances of the factors are all one is not
a restriction, as it can be achieved by properly scaling the
factor loadings.
• Assuming that the common factors are uncorrelated and the
unique factors are uncorrelated are the defining restrictions
of the orthogonal factor model.
• The assumptions of the orthogonal factor model have
implications for the structure of Σ. If
    (X − µ)p×1 = Lp×mFm×1 + εp×1,
then it follows that
    (X − µ)(X − µ)′ = (LF + ε)(LF + ε)′
                    = (LF + ε)((LF)′ + ε′)
                    = LFF′L′ + εF′L′ + LFε′ + εε′.
658
Orthogonal Factor Model
• Taking expectations of both sides of the equation we find
that:
    Σ = E(X − µ)(X − µ)′
      = LE(FF′)L′ + E(εF′)L′ + LE(Fε′) + E(εε′)
      = LL′ + Ψ,
since E(FF′) = Var(F) = I and E(εF′) = Cov(ε, F) = 0.
Also,
    (X − µ)F′ = LFF′ + εF′,
    Cov(X, F) = E[(X − µ)F′] = LE(FF′) + E(εF′) = L.
659
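The identity Σ = LL′ + Ψ can be checked by simulation. A minimal sketch with made-up loadings and specific variances (all numbers below are hypothetical, not from the example data):

```python
import numpy as np

# Simulation sketch: with F ~ N(0, I) and eps ~ N(0, Psi) independent,
# X = L F + eps has covariance LL' + Psi.
rng = np.random.default_rng(1)
L = np.array([[0.8, 0.1],
              [0.7, -0.4],
              [0.6, 0.5]])                     # made-up 3 x 2 loading matrix
psi = np.array([0.3, 0.2, 0.4])                # made-up specific variances
n = 200_000
F = rng.normal(size=(n, 2))                    # common factors, Var(F) = I
eps = rng.normal(size=(n, 3)) * np.sqrt(psi)   # unique factors
X = F @ L.T + eps
Sigma = L @ L.T + np.diag(psi)                 # implied covariance LL' + Psi
print(np.allclose(np.cov(X, rowvar=False), Sigma, atol=0.03))  # True
```

With n this large, the sample covariance of the simulated X matches LL′ + Ψ to within sampling error.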
Orthogonal Factor Model
• Under the orthogonal factor model, therefore:
    Var(Xi) = σii = ℓi1² + ℓi2² + · · · + ℓim² + ψi
    Cov(Xi, Xk) = σik = ℓi1ℓk1 + ℓi2ℓk2 + · · · + ℓimℓkm.
• The portion of the variance σii that is contributed by the m
common factors is the communality and the portion that is
not explained by the common factors is called the uniqueness
(or the specific variance):
    σii = ℓi1² + ℓi2² + · · · + ℓim² + ψi = hi² + ψi.
660
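The decomposition σii = hi² + ψi can be illustrated with a tiny numeric check; the loadings below are hypothetical and the trait is taken to be standardized (σii = 1):

```python
# Tiny check of sigma_ii = h_i^2 + psi_i for one trait with m = 2 factors.
loadings_row = [0.8, 0.3]                  # hypothetical loadings l_i1, l_i2
h2 = sum(l ** 2 for l in loadings_row)     # communality: 0.64 + 0.09
psi = 1.0 - h2                             # uniqueness for a standardized trait
print(round(h2, 2), round(psi, 2))         # 0.73 0.27
```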
Orthogonal Factor Model
• The model assumes that the p(p + 1)/2 variances and
covariances of X can be reproduced from pm + p parameters:
the pm factor loadings and the variances of the p unique
factors.
• Factor analysis works best when m is small relative to p.
If, for example, p = 12 and m = 2, the 78 elements of Σ can
be reproduced from 2 × 12 + 12 = 36 parameters in the
factor model.
• Not all covariance matrices can be factored as LL′ + Ψ.
661
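The parameter counting above can be written as a quick sketch, evaluated for the p = 12, m = 2 case on the slide:

```python
# Compare the p(p+1)/2 distinct elements of Sigma with the pm loadings
# plus p specific variances of the m-factor model.
def sigma_elements(p):
    # distinct variances and covariances in a p x p covariance matrix
    return p * (p + 1) // 2

def factor_parameters(p, m):
    # pm factor loadings plus p specific variances
    return p * m + p

print(sigma_elements(12), factor_parameters(12, 2))  # 78 36
```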
Example: No Proper Solution
• Let p = 3 and m = 1 and suppose that the covariance of
X1, X2, X3 is

    Σ = [ 1    0.9  0.7
          0.9  1    0.4
          0.7  0.4  1   ].

• The orthogonal factor model requires that Σ = LL′ + Ψ.
Under the one-factor model assumption, we get:
    1 = ℓ11² + ψ1        0.90 = ℓ11ℓ21
    1 = ℓ21² + ψ2        0.70 = ℓ11ℓ31
    1 = ℓ31² + ψ3        0.40 = ℓ21ℓ31
662
Example: No Proper Solution
• From the equations above, we see that ℓ21 = (0.4/0.7)ℓ11.
• Since 0.90 = ℓ11ℓ21, substituting for ℓ21 implies that
ℓ11² = 1.575 or ℓ11 = ±1.255.
• Here is where the problem starts. Since (by assumption)
Var(F1) = 1 and also Var(X1) = 1, and since therefore
Cov(X1, F1) = Corr(X1, F1) = ℓ11, we notice that we have a
solution inconsistent with the assumptions of the model
because a correlation cannot be larger than 1 or smaller
than -1.
663
Example: No Proper Solution
• Further,
    1 = ℓ11² + ψ1 −→ ψ1 = 1 − ℓ11² = −0.575,
which cannot be true because ψ1 = Var(ε1). Thus, for m = 1
we get a numerical solution that is not consistent with the
model or with the interpretation of its parameters.
664
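The computation in the example can be reproduced in a few lines:

```python
import math

# The m = 1 computations from the example: solving the loading equations
# for the 3 x 3 correlation matrix yields an improper solution.
l11_sq = 0.90 * 0.70 / 0.40     # from 0.90 = l11*l21 and l21 = (0.4/0.7)*l11
l11 = math.sqrt(l11_sq)         # 1.255..., impossible for a correlation
psi1 = 1.0 - l11_sq             # -0.575, a negative "variance"
print(round(l11_sq, 3), round(l11, 3), round(psi1, 3))  # 1.575 1.255 -0.575
```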
Rotation of Factor Loadings
• When m > 1, there is no unique set of loadings and thus
there is ambiguity associated with the factor model.
• Consider any m × m orthogonal matrix T such that TT′ =
T′T = I. We can rewrite our model:
    X − µ = LF + ε = LTT′F + ε = L∗F∗ + ε,
with L∗ = LT and F∗ = T′F.
665
Rotation of Factor Loadings
• Since
    E(F∗) = T′E(F) = 0,
and
    Var(F∗) = T′Var(F)T = T′T = I,
it is impossible to distinguish between loadings L and L∗
from just a set of data even though in general they will be
different.
• Notice that the two sets of loadings generate the same
covariance matrix Σ:
    Σ = LL′ + Ψ = LTT′L′ + Ψ = L∗L∗′ + Ψ.
666
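The rotation indeterminacy can be verified numerically; the loadings and specific variances below are made up for illustration:

```python
import numpy as np

# Numerical illustration: rotating L by an orthogonal T leaves LL' + Psi,
# and hence the implied Sigma, unchanged.
rng = np.random.default_rng(0)
L = rng.normal(size=(5, 2))                     # arbitrary p x m loadings
Psi = np.diag(rng.uniform(0.1, 1.0, size=5))    # specific variances
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # 2 x 2 rotation, T T' = I
L_star = L @ T
same = np.allclose(L @ L.T + Psi, L_star @ L_star.T + Psi)
print(same)  # True
```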
Rotation of Factor Loadings
• How to resolve this ambiguity?
• Typically, we first obtain the matrix of loadings (recognizing
that it is not unique) and then rotate it by multiplying by an
orthogonal matrix.
• We choose the orthogonal matrix using some desired
criterion. For example, a varimax rotation chooses the
rotation that maximizes the variance of the squared loadings
within each column, pushing each loading toward 0 or ±1.
• Other criteria for arriving at a unique set of loadings have
also been proposed.
• We consider rotations in more detail in a little while.
667
Estimation in Orthogonal Factor Models
• We begin with a sample of size n of p−dimensional vectors
x1, x2, ..., xn and (based on our knowledge of the problem)
choose a small number m of factors.
• For the chosen m, we want to estimate the factor loading
matrix L and the specific variances in the model
    Σ = LL′ + Ψ.
• We use S as an estimator of Σ and first investigate whether
the correlations among the p variables are large enough to
justify the analysis. If rij ≈ 0 for all i ≠ j, the unique factors
εi will dominate and we will not be able to identify common
underlying factors.
668
Estimation in Orthogonal Factor Models
• Common estimation methods are:
– The principal component method
– The iterative principal factor method
– Maximum likelihood estimation (assumes normality)
• The last two methods focus on using variation in common
factors to describe correlations among measured traits. Principal component analysis gives more attention to variances.
• Estimated factor loadings from any of those methods can
be rotated, as explained later, to facilitate interpretation of
results.
669
The Principal Component Method
• Let (λi, ei) denote the eigenvalues and eigenvectors of Σ and
recall the spectral decomposition, which establishes that
    Σ = λ1e1e1′ + λ2e2e2′ + · · · + λpepep′.
• Use L to denote the p × p matrix with columns equal to √λi ei,
i = 1, ..., p. Then the spectral decomposition of Σ is given by
    Σ = LL′ + 0 = LL′,
and corresponds to a factor model in which there are as many
factors as variables (m = p) and where the specific variances
ψi = 0. The loadings on the jth factor are just the
coefficients of the jth principal component multiplied by √λj.
670
The Principal Component Method
• The principal component solution just described is not
interesting because we have as many factors as we have
variables.
• We really want m < p so that we can explain the covariance
structure in the measurements using just a small number of
underlying factors.
• If the last p − m eigenvalues are small, we can ignore the last
p − m terms in the spectral decomposition and write
    Σ ≈ Lp×mL′m×p.
671
The Principal Component Method
• The communality for the i-th observed variable is the amount
of its variance that can be attributed to the variation in the
m factors:
    hi² = ℓi1² + ℓi2² + · · · + ℓim²   for i = 1, 2, ..., p.
• The variances ψi of the specific factors can then be taken to
be the diagonal elements of the difference matrix Σ − LL′,
where L is p × m. That is,
    ψi = σii − (ℓi1² + ℓi2² + · · · + ℓim²)   for i = 1, ..., p,
or Ψ = diag(Σ − Lp×mL′m×p).
672
The Principal Component Method
• Note that using m < p factors will produce an approximation
    Lp×mL′m×p + Ψ
for Σ that exactly reproduces the variances of the p measured
traits but only approximates the correlations.
• If variables are measured in very different scales, we work
with the standardized variables as we did when extracting
PCs. This is equivalent to modeling the correlation matrix
P rather than the covariance matrix Σ.
673
Principal Component Estimation
• To implement the PC method, we must use S to estimate
Σ (or use R if the observations are standardized) and use X̄
to estimate µ.
• The principal component estimate of the loading matrix for
the m-factor model is
    L̃ = [ √λ̂1 ê1   √λ̂2 ê2   ...   √λ̂m êm ],
where (λ̂i, êi) are the eigenvalues and eigenvectors of S (or
of R if the observations are standardized).
674
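The principal component estimate of L̃ can be sketched directly from an eigendecomposition. The matrix below is the rounded stock-return correlation matrix used later in the notes:

```python
import numpy as np

# PC estimate of the loading matrix: eigendecompose R and scale the first
# m eigenvectors by the square roots of their eigenvalues.
R = np.array([[1.00, 0.58, 0.51, 0.39, 0.46],
              [0.58, 1.00, 0.60, 0.39, 0.32],
              [0.51, 0.60, 1.00, 0.44, 0.42],
              [0.39, 0.39, 0.44, 1.00, 0.52],
              [0.46, 0.32, 0.42, 0.52, 1.00]])
lam, vec = np.linalg.eigh(R)                # eigh returns ascending order
order = np.argsort(lam)[::-1]               # sort eigenvalues descending
lam, vec = lam[order], vec[:, order]
m = 2
L_tilde = vec[:, :m] * np.sqrt(lam[:m])     # p x m matrix of loadings
print(L_tilde.shape, round(lam[0] / R.shape[0], 2))
```

The first factor accounts for λ̂1/p ≈ 0.57 of the total (standardized) variance, consistent with the example below.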
Principal Component Estimation
• The estimated specific variances are given by the diagonal
elements of S − L̃L̃′, so
    Ψ̃ = diag{ψ̃1, ψ̃2, ..., ψ̃p},
with
    ψ̃i = sii − (ℓ̃i1² + ℓ̃i2² + · · · + ℓ̃im²).
• Communalities are estimated as
    h̃i² = ℓ̃i1² + ℓ̃i2² + · · · + ℓ̃im².
• If variables are standardized, then we substitute R for S and
substitute 1 for each sii.
675
Principal Component Estimation
• In many applications of factor analysis, m, the number of
factors, is decided prior to the analysis.
• If we do not know m, we can try to determine the 'best' m
by looking at the results from fitting the model with different
values for m.
• Examine how well the off-diagonal elements of S (or R) are
reproduced by the fitted model L̃L̃′ + Ψ̃. By definition of ψ̃i,
the diagonal elements of S are reproduced exactly, but the
off-diagonal elements are not. The chosen m is appropriate
if the residual matrix
    S − (L̃L̃′ + Ψ̃)
has small off-diagonal elements.
676
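This model-checking step can be sketched as a small function; the correlation matrix is the rounded one from the stock example:

```python
import numpy as np

# Fit the m-factor PC solution and form the residual matrix; its diagonal
# is zero by construction of Psi-tilde.
def pc_factor_residual(S, m):
    lam, vec = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]
    lam, vec = lam[order], vec[:, order]
    L = vec[:, :m] * np.sqrt(lam[:m])
    Psi = np.diag(np.diag(S - L @ L.T))     # estimated specific variances
    return S - (L @ L.T + Psi)

R = np.array([[1.00, 0.58, 0.51, 0.39, 0.46],
              [0.58, 1.00, 0.60, 0.39, 0.32],
              [0.51, 0.60, 1.00, 0.44, 0.42],
              [0.39, 0.39, 0.44, 1.00, 0.52],
              [0.46, 0.32, 0.42, 0.52, 1.00]])
res = pc_factor_residual(R, 2)
print(np.allclose(np.diag(res), 0.0))   # True: variances reproduced exactly
print(round(np.abs(res).max(), 2))      # largest off-diagonal residual
```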
Principal Component Estimation
• Another approach to deciding on m is to examine the
contribution of each potential factor to the total variance.
• The contribution of the k-th factor to the sample variance
for the i-th trait, sii, is estimated as ℓ̃ik².
• The contribution of the k-th factor to the total sample
variance s11 + s22 + ... + spp is estimated as
    ℓ̃1k² + ℓ̃2k² + · · · + ℓ̃pk² = (√λ̂k êk)′(√λ̂k êk) = λ̂k.
677
Principal Component Estimation
• As in the case of PCs, in general,
    (Proportion of total sample variance due to jth factor)
        = λ̂j / (s11 + s22 + · · · + spp),
or equals λ̂j/p if factors are extracted from R.
• Thus, a ’reasonable’ number of factors is indicated by
the minimum number of PCs that explain a suitably large
proportion of the total variance.
678
Strategy for PC Factor Analysis
• First, center observations (and perhaps standardize)
• If m is determined by subject matter knowledge, fit the m
factor model by:
– Extracting the p PCs from S or from R.
– Constructing the p × m matrix of loadings L̃ by keeping
the PCs associated with the largest m eigenvalues of S (or
R).
• If m is not known a priori, examine the estimated
eigenvalues to determine the number of factors that
account for a suitably large proportion of the total
variance, and examine the off-diagonal elements of the
residual matrix.
679
Example: Stock Price Data
• Data are weekly gains in stock prices for 100 consecutive
weeks for five companies: Allied Chemical, Du Pont, Union
Carbide, Exxon and Texaco.
• Note that the first three are chemical companies and the last
two are oil companies.
• The data are first standardized and the sample correlation
matrix R is computed.
• Fit an m = 2 orthogonal factor model.
680
Example: Stock Price Data
• The sample correlation matrix R is

    R = [ 1     0.58  0.51  0.39  0.46
          0.58  1     0.60  0.39  0.32
          0.51  0.60  1     0.44  0.42
          0.39  0.39  0.44  1     0.52
          0.46  0.32  0.42  0.52  1    ].
• Eigenvalues and eigenvectors of R are

    λ̂ = ( 2.86, 0.81, 0.54, 0.45, 0.34 )′,

    ê1 = ( 0.464, 0.457, 0.470, 0.416, 0.421 )′,

    ê2 = ( −0.241, −0.509, −0.261, 0.525, 0.582 )′.
681
Example: Stock Price Data
• Recall that the method of principal components results in
factor loadings equal to √λ̂j êj, so in this case

    Variable       Loadings      Loadings      Specific
                   on factor 1   on factor 2   variances
                   ℓ̃i1           ℓ̃i2           ψ̃i = 1 − h̃i²
    Allied Chem     0.784        -0.217        0.34
    Du Pont         0.773        -0.458        0.19
    Union Carb      0.794        -0.234        0.31
    Exxon           0.713         0.472        0.27
    Texaco          0.712         0.524        0.22

• The proportions of total variance accounted for by the first
and second factors are λ̂1/p = 0.571 and λ̂2/p = 0.162.
682
Example: Stock Price Data
• The first factor appears to be a market-wide effect on weekly
stock price gains whereas the second factor reflects industry
specific effects on chemical and oil stock price returns.
• The residual matrix is given by

    R − (L̃L̃′ + Ψ̃) = [ 0  −0.13  −0.16  −0.07   0.02
                            0   −0.12   0.06   0.01
                                 0     −0.02  −0.02
                                        0     −0.23
                                               0    ],

with the lower triangle filled in by symmetry.
• By construction, the diagonal elements of the residual matrix
are zero. Are the off-diagonal elements small enough?
683
Example: Stock Price Data
• In this example, most of the residuals appear to be small,
with the exception of the {4, 5} element and perhaps also
the {1, 2}, {1, 3}, {2, 3} elements.
• Since the {4, 5} element of the residual matrix is negative,
we know that L̃L̃0 is producing a correlation value between
Exxon and Texaco that is larger than the observed value.
• When the off-diagonals in the residual matrix are not small,
we might consider changing the number m of factors to see
whether we can reproduce the correlations between the variables better.
• In this example, we would probably not change the model.
684
SAS code: Stock Price Data
/*
This program is posted as stocks.sas. It performs
factor analyses for the weekly stock return data from
Johnson & Wichern. The data are posted as stocks.dat */
data set1;
infile "c:\stat501\data\stocks.dat";
input x1-x5;
label x1 = ALLIED CHEM
x2 = DUPONT
x3 = UNION CARBIDE
x4 = EXXON
x5 = TEXACO;
run;
685
/* Compute principal components */
proc factor data=set1 method=prin scree nfactors=2
simple corr ev res msa nplot=2
out=scorepc outstat=facpc;
var x1-x5;
run;
                              Factor Pattern

                               Factor1    Factor2
    x1   ALLIED CHEM           0.78344   -0.21665
    x2   DUPONT                0.77251   -0.45794
    x3   UNION CARBIDE         0.79432   -0.23439
    x4   EXXON                 0.71268    0.47248
    x5   TEXACO                0.71209    0.52373
686
R code: Stock Price Data
# This code creates scatter plot matrices and does
# factor analysis for 100 consecutive weeks of gains
# in prices for five stocks. This code is posted as
# stocks.R
#
# The data are posted as stocks.dat
#
# There is one line of data for each week and the
# weekly gains are represented as
#
#   x1 = ALLIED CHEMICAL
#   x2 = DUPONT
#   x3 = UNION CARBIDE
#   x4 = EXXON
#   x5 = TEXACO
687
stocks <- read.table("c:/stat501/data/stocks.dat", header=F,
col.names=c("x1", "x2", "x3", "x4", "x5"))
# Print the first six rows of the data file
stocks[1:6, ]

        x1        x2        x3        x4        x5
1 0.000000  0.000000  0.000000  0.039473  0.000000
2 0.027027 -0.044855 -0.003030 -0.014466  0.043478
3 0.122807  0.060773  0.088146  0.086238  0.078124
4 0.057031  0.029948  0.066808  0.013513  0.019512
5 0.063670 -0.003793 -0.039788 -0.018644 -0.024154
6 0.003521  0.050761  0.082873  0.074265  0.049504
688
# Create a scatter plot matrix of the standardized data
stockss <- scale(stocks, center=T, scale=T)
pairs(stockss,labels=c("Allied","Dupont","Carbide", "Exxon",
"Texaco"), panel=function(x,y){panel.smooth(x,y)
abline(lsfit(x,y),lty=2) })
689
[Figure: scatter plot matrix of the standardized weekly returns for the
five stocks (panels labeled Allied, Dupont, Carbide, Exxon, Texaco), with
a smooth curve and a least-squares line in each panel.]
690
# Compute principal components from the correlation matrix
s.cor <- var(stockss)
s.cor

          x1        x2        x3        x4        x5
x1 1.0000000 0.5769308 0.5086555 0.3867206 0.4621781
x2 0.5769308 1.0000000 0.5983817 0.3895188 0.3219545
x3 0.5086555 0.5983817 1.0000000 0.4361014 0.4256266
x4 0.3867206 0.3895188 0.4361014 1.0000000 0.5235293
x5 0.4621781 0.3219545 0.4256266 0.5235293 1.0000000

s.pc <- prcomp(stocks, scale=T, center=T)
# List component coefficients
691
s.pc$rotation

         PC1        PC2        PC3        PC4        PC5
x1 0.4635414 -0.2408580  0.6133475 -0.3813591  0.4533066
x2 0.4570769 -0.5090981 -0.1778906 -0.2113173 -0.6749813
x3 0.4699796 -0.2605708 -0.3370501  0.6641056  0.3957057
x4 0.4216766  0.5252677 -0.5390141 -0.4728058  0.1794471
x5 0.4213290  0.5822399  0.4336136  0.3812200 -0.3874650

# Estimate proportion of variation explained by each
# principal component, and cumulative proportions
s <- var(s.pc$x)
pvar <- round(diag(s)/sum(diag(s)), digits=6)
pvar

     PC1      PC2      PC3      PC4      PC5
0.571298 0.161824 0.108008 0.090270 0.068600

cpvar <- round(cumsum(diag(s))/sum(diag(s)), digits=6)
cpvar

     PC1      PC2      PC3      PC4      PC5
0.571298 0.733122 0.841130 0.931400 1.000000
692
# Plot component scores
par(fin=c(5,5))
plot(s.pc$x[,1],s.pc$x[,2],
xlab="PC1: Overall Market",
ylab="PC2: Oil vs. Chemical",type="p")
693
[Figure: scatter plot of the first two principal component scores, with
PC1 ("Overall Market") on the horizontal axis and PC2 ("Oil vs. Chemical")
on the vertical axis.]
Principal Factor Method
• The principal factor method for estimating factor loadings
can be viewed as an iterative modification of the principal
component method that allows for greater focus on
explaining correlations among observed traits.
• The principal factor approach begins with a guess about the
communalities and then iteratively updates those estimates
until some convergence criterion is satisfied. (Try several
different sets of starting values.)
• The principal factor method chooses factor loadings that
more closely reproduce correlations. Principal components
are more heavily influenced by accounting for variances and
will explain a higher proportion of the total variance.
694
Principal Factor Method
• Consider the estimated model for the correlation matrix:
    R ≈ L∗(L∗)′ + Ψ∗.
• The estimated loading matrix should provide a good
approximation for all of the correlations and part of the
variances, as follows:

    L∗(L∗)′ ≈ R − Ψ∗ ≈ [ (h∗1)²   r12     r13    · · ·  r1p
                          r21     (h∗2)²  r23    · · ·  r2p
                          ...      ...     ...    ...    ...
                          rp1      rp2     rp3   · · ·  (h∗p)² ]

where (h∗i)² = 1 − ψ∗i is the estimated communality for the
i-th variable.
• Find a factor loading matrix L∗ so that L∗(L∗)′ is a good
approximation for R − Ψ∗, rather than trying to make L∗(L∗)′
a good approximation for R.
695
Principal Factor Method
• Start by obtaining initial estimates of the communalities,
(h∗1)², (h∗2)², ..., (h∗p)².
• Then the estimated loading matrix should provide a good
approximation to

    R − Ψ∗ ≈ [ (h∗1)²   r12     r13    · · ·  r1p
                r21     (h∗2)²  r23    · · ·  r2p
                ...      ...     ...    ...    ...
                rp1      rp2     rp3   · · ·  (h∗p)² ]

where (h∗i)² = 1 − ψ∗i is the estimated communality for the
i-th variable.
• Use the spectral decomposition of R − Ψ∗ to find a good
rank-m approximation L∗(L∗)′ to R − Ψ∗.
696
Principal Factor Method
• Use the initial estimates to compute

    R − Ψ∗ = [ (h∗1)²   r12     r13    · · ·  r1p
                r21     (h∗2)²  r23    · · ·  r2p
                ...      ...     ...    ...    ...
                rp1      rp2     rp3   · · ·  (h∗p)² ].

The estimated loading matrix is obtained from the
eigenvalues and eigenvectors of this matrix as
    L∗ = [ √λ∗1 e∗1  · · ·  √λ∗m e∗m ].
• Update the specific variances
    ψ∗i = 1 − Σⱼ₌₁ᵐ λ∗j (e∗ij)².
697
Principal Factor Method
• Use the updated communalities to re-evaluate

    R − Ψ∗ = [ (h∗1)²   r12     r13    · · ·  r1p
                r21     (h∗2)²  r23    · · ·  r2p
                ...      ...     ...    ...    ...
                rp1      rp2     rp3   · · ·  (h∗p)² ]

and compute a new estimate of the loading matrix,
    L∗ = [ √λ∗1 e∗1  · · ·  √λ∗m e∗m ].
• Repeat this process until it converges.
698
Principal Factor Method
• Note that R − Ψ∗ is generally not positive definite and some
eigenvalues can be negative.
• The results are sensitive to the choice of the number of
factors m.
• If m is too large, some communalities can be larger than
one, which would imply that variation in factor values
accounts for more than 100 percent of the variation in the
measured traits. SAS has two options to deal with this:
– HEYWOOD: Set any estimated communality larger than
one equal to one and continue iterations with the
remaining variables.
– ULTRAHEYWOOD: Continue iterations with all of the
variables and hope that iterations eventually bring you
back into the allowable parameter space.
699
Stock Price Example
• The estimated loadings from the principal factor method are
displayed below.
    Variable      Loadings      Loadings      Specific           Communalities
                  on factor 1   on factor 2   variances
                  ℓ∗i1           ℓ∗i2          ψ∗i = 1 − (h∗i)²    (h∗i)²
    Allied Chem    0.70         -0.09         0.50               0.50
    Du Pont        0.71         -0.25         0.44               0.56
    Union Carb     0.72         -0.11         0.47               0.53
    Exxon          0.62          0.23         0.57               0.43
    Texaco         0.62          0.28         0.54               0.46

• The proportions of total variance accounted for by the first
and second factors are λ∗1/p = 0.45 and λ∗2/p = 0.04.
700
Stock Price Example
• Interpretations are similar. The first factor appears to be a
market-wide effect on weekly stock price gains whereas the
second factor reflects industry specific effects on chemical
and oil stock price returns.
• The loadings tend to be smaller than those obtained from
principal component analysis. Why?
701
Stock Price Example
• The residual matrix is

    R − [(L∗)(L∗)′ + Ψ∗] = [ 0   0.057  −0.005  −0.029   0.053
                                  0      0.060   0.007  −0.049
                                         0       0.015   0.010
                                                 0       0.073
                                                         0     ].
• Since the principal factor methods give greater emphasis
to describing correlations, the off-diagonal elements of
the residual matrix tend to be smaller than corresponding
elements for principal component estimation.
702
/* SAS code for the principal factor method */
proc factor data=set1 method=prinit scree nfactors=2
simple corr ev res msa nplot=2
out=scorepf outstat=facpf;
var x1-x5;
priors smc;
run;
To obtain the initial communality for the i-th variable, "smc" in
the priors statement uses the R² value for the regression of the
i-th variable on the other p − 1 variables in the analysis.
703
Maximum Likelihood Estimation
• To implement the ML method, we need to include some
assumptions about the distribution of the p-dimensional
vector Xj and the m-dimensional vector Fj:
    Xj ∼ Np(µ, Σ),  Fj ∼ Nm(0, Im),  εj ∼ Np(0, Ψp×p),
where Xj = µ + LFj + εj, Σ = LL′ + Ψ, and Fj is independent
of εj. Also, Ψ is a diagonal matrix.
• Because the L that maximizes the likelihood for this model
is not unique, we need another set of restrictions that lead
to a unique maximum of the likelihood function:
    L′Ψ⁻¹L = ∆,
where ∆ is a diagonal matrix.
704
Maximum Likelihood Estimation
• Maximizing the likelihood function with respect to (µ, L, Ψ) is
not easy because the likelihood depends in a very non-linear
fashion on the parameters.
• Recall that (L, Ψ) enter the likelihood through Σ which in
turn enters the likelihood as an inverse matrix in a quadratic
form and also as a determinant.
• Efficient numerical algorithms exist to maximize the
likelihood iteratively and obtain ML estimates
    L̂p×m = {ℓ̂ij},  Ψ̂ = diag{ψ̂i},  µ̂ = x̄.
705
Maximum Likelihood Estimation
• The MLEs of the communalities for the p variables (with m
factors) are
    ĥi² = ℓ̂i1² + ℓ̂i2² + · · · + ℓ̂im²,  i = 1, ..., p,
and the proportion of the total variance accounted for by the
jth factor is given by
    (ℓ̂1j² + ℓ̂2j² + · · · + ℓ̂pj²) / trace(S)   if using S,
or dividing by p instead of trace(S) if using R.
706
Maximum Likelihood Estimation: Stock Prices
• Results:

    Variable           Loadings       Loadings       Specific
                       for factor 1   for factor 2   variances
    Allied Chem         0.684          0.189         0.50
    Du Pont             0.694          0.517         0.25
    Union Carb.         0.681          0.248         0.47
    Exxon               0.621         -0.073         0.61
    Texaco              0.792         -0.442         0.18
    Prop. of variance   0.485          0.113
707
Maximum Likelihood Estimation: Stock Prices
• The interpretation of the two factors remains more or less
the same (although Exxon is hardly loading on the second
factor now).
• What has improved considerably is the residual matrix, since
most of the entries are essentially zero now:

    R − (L̂L̂′ + Ψ̂) = [ 0   0.005  −0.004  −0.024  −0.004
                            0     −0.003  −0.004   0.000
                                   0       0.031  −0.004
                                           0      −0.000
                                                   0     ].
708
Likelihood Ratio test for Number of Factors
• We wish to test whether the m factor model appropriately
describes the covariances among the p variables.
• We test
    H0 : Σp×p = Lp×mL′m×p + Ψp×p
versus
    Ha : Σ is a positive definite matrix.
• A likelihood ratio test for H0 can be constructed as the ratio
of the maximum of likelihood function under H0 and the
maximum of the likelihood function under no restrictions
(i.e., under Ha).
709
Likelihood Ratio test for Number of Factors
• Under Ha the multivariate normal likelihood is maximized at
    µ̂ = x̄  and  Σ̂ = (1/n) Σᵢ₌₁ⁿ (xi − x̄)(xi − x̄)′.
• Under H0, the multivariate normal likelihood is maximized at
µ̂ = x̄ and L̂L̂′ + Ψ̂, the MLE of Σ for the orthogonal factor
model with m factors.
710
Likelihood Ratio test for Number of Factors
• It is easy to show that the maximized likelihood under H0 is
proportional to
    |L̂L̂′ + Ψ̂|⁻ⁿ/² exp{ −(n/2) tr[(L̂L̂′ + Ψ̂)⁻¹ Σ̂] }.
Then, the log-likelihood ratio statistic for testing H0 is
    −2 ln Λ = n ln( |L̂L̂′ + Ψ̂| / |Σ̂| ) + n[ tr((L̂L̂′ + Ψ̂)⁻¹ Σ̂) − p ].
• Since the second term in the expression above can be shown
to be zero, the test statistic simplifies to
    −2 ln Λ = n ln( |L̂L̂′ + Ψ̂| / |Σ̂| ).
711
Likelihood Ratio test for Number of Factors
• The degrees of freedom for the large sample chi-square
approximation to this test statistic are
    (1/2)[(p − m)² − p − m].
Where does this come from?
• Under Ha we estimate p means and p(p + 1)/2 elements of Σ.
• Under H0, we estimate p means, p diagonal elements of Ψ,
and mp elements of L.
• An additional set of m(m − 1)/2 restrictions is imposed
on the mp elements of L by the identifiability restriction
L′Ψ⁻¹L = ∆.
712
Likelihood Ratio test for Number of Factors
• Putting it all together, we have
    df = [p + p(p + 1)/2] − [p + pm + p − m(m − 1)/2]
       = (1/2)[(p − m)² − p − m]
degrees of freedom.
• Bartlett showed that the χ² approximation to the sampling
distribution of −2 ln Λ can be improved if the test statistic is
computed as
    −2 ln Λ = (n − 1 − (2p + 4m + 5)/6) ln( |L̂L̂′ + Ψ̂| / |Σ̂| ).
713
Likelihood Ratio test for Number of Factors
• We reject H0 at level α if
    −2 ln Λ = (n − 1 − (2p + 4m + 5)/6) ln( |L̂L̂′ + Ψ̂| / |Σ̂| ) > χ²df,
with df = (1/2)[(p − m)² − p − m], for large n and large n − p.
• To have df > 0, we must have m < (1/2)(2p + 1 − √(8p + 1)).
• If the data are not a random sample from a multivariate
normal distribution, this test tends to indicate the need for
too many factors.
714
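The degrees-of-freedom bookkeeping above reduces to a quick computation; it is evaluated here for the stock example (p = 5, with m = 2 and m = 1):

```python
import math

# Degrees of freedom for the likelihood ratio test of the m-factor model.
def lrt_df(p, m):
    return ((p - m) ** 2 - p - m) // 2

def max_m(p):
    # largest number of factors with df > 0: m < (2p + 1 - sqrt(8p + 1))/2
    return (2 * p + 1 - math.sqrt(8 * p + 1)) / 2

print(lrt_df(5, 2), lrt_df(5, 1))   # 1 5
print(round(max_m(5), 2))           # 2.3, so at most m = 2 factors
```

These match the df = 1 and df = 5 values used in the two tests on the following slides.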
Stock Price Data
• We carry out the log-likelihood ratio test at level α = 0.05
to see whether the two-factor model is appropriate for these
data.
• We have estimated the factor loadings and the unique factors
using R rather than S, but the exact same test statistic (with
R in place of Σ̂) can be used for the test.
• Since we evaluated L̂, Ψ̂ and R, we can easily compute
    |L̂L̂′ + Ψ̂| / |R| = 0.194414 / 0.193163 = 1.0065.
715
Stock Price Data
• Then, the test statistic is
    [100 − 1 − (10 + 8 + 5)/6] ln(1.0065) = 0.58.
• Since 0.58 < χ²1 = 3.84, we fail to reject H0 and conclude
that the two-factor model adequately describes the
correlations among the five stock prices (p-value = 0.4482).
• Another way to say it is that the data do not contradict the
two-factor model.
716
Stock Price Data
• Test the null hypothesis that one factor is sufficient:
    [100 − 1 − (10 + 4 + 5)/6] ln( |L̂L̂′ + Ψ̂| / |Σ̂| ) = 15.49
with 5 degrees of freedom.
• Since p − value = 0.0085, we reject the null hypothesis and
conclude that a one factor model is not sufficient to describe
the correlations among the weekly returns for the five stocks.
717
Maximum Likelihood Estimation: Stock Prices
• We implemented the ML method for the stock price example.
• The same SAS program can be used, but now use ’method
= ml’ instead of ’prinit’ in the proc factor statement.
proc factor data=set1 method=ml scree nfactors=2
simple corr ev res msa nplot=2
out=scorepf outstat=facpf;
var x1-x5;
priors smc;
run;
718
Initial Factor Method: Maximum Likelihood

Prior Communality Estimates: SMC
        x1         x2         x3         x4         x5
0.43337337 0.46787840 0.44606336 0.34657438 0.37065882

Preliminary Eigenvalues: Total = 3.56872  Average = 0.713744

     Eigenvalue    Difference   Proportion   Cumulative
1    3.93731304    3.59128674       1.1033       1.1033
2    0.34602630    0.42853688       0.0970       1.2002
3    -.08251059    0.15370829      -0.0231       1.1771
4    -.23621888    0.15967076      -0.0662       1.1109
5    -.39588964                    -0.1109       1.0000

2 factors will be retained by the NFACTOR criterion.
719
Iteration   Criterion   Ridge   Change              Communalities
    1       0.0160066   0.0000  0.4484   0.49395 0.74910 0.52487 0.28915 0.81911
    2       0.0060474   0.0000  0.1016   0.50409 0.74984 0.52414 0.39074 0.82178
    3       0.0060444   0.0000  0.0014   0.50348 0.74870 0.52547 0.39046 0.82316
    4       0.0060444   0.0000  0.0004   0.50349 0.74845 0.52555 0.39041 0.82357
    5       0.0060444   0.0000  0.0001   0.50349 0.74840 0.52557 0.39038 0.82371
    6       0.0060444   0.0000  0.0000   0.50349 0.74839 0.52557 0.39037 0.82375

Convergence criterion satisfied.
720
Significance Tests Based on 100 Observations
Test                              DF   Chi-Square   Pr > ChiSq
H0: No common factors             10     158.6325       <.0001
HA: At least one common factor
H0: 2 Factors are sufficient       1       0.5752       0.4482
HA: More factors are needed

Chi-Square without Bartlett's Correction       0.5983972
Akaike's Information Criterion                -1.4016028
Schwarz's Bayesian Criterion                  -4.0067730
Tucker and Lewis's Reliability Coefficient     1.0285788

Squared Canonical Correlations
     Factor1       Factor2
  0.88925225    0.70421149
721
Eigenvalues of the Weighted Reduced Correlation Matrix:
Total = 10.4103228  Average = 2.08206456

     Eigenvalue    Difference   Proportion   Cumulative
1    8.02952889    5.64873495       0.7713       0.7713
2    2.38079393    2.29739649       0.2287       1.0000
3    0.08339744    0.09540872       0.0080       1.0080
4    -.01201128    0.05937488      -0.0012       1.0069
5    -.07138616                    -0.0069       1.0000
Factor Pattern
                        Factor1    Factor2
x5  TEXACO              0.79413    0.43944
x2  DUPONT              0.69216   -0.51894
x1  ALLIED CHEM         0.68315   -0.19184
x3  UNION CARBIDE       0.68015   -0.25092
x4  EXXON               0.62087    0.07000
722
Variance Explained by Each Factor
Factor       Weighted     Unweighted
Factor1    8.02952889     2.42450727
Factor2    2.38079393     0.56706851

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576
Variable   Communality       Weight
x1          0.50349452   2.01407634
x2          0.74838963   3.97439996
x3          0.52556852   2.10778594
x4          0.39037462   1.64035160
x5          0.82374847   5.67370899
723
Residual Correlations With Uniqueness on the Diagonal

                           x1         x2         x3         x4         x5
x1  ALLIED CHEM       0.49651    0.00453   -0.00413   -0.02399    0.00397
x2  DUPONT            0.00453    0.25161   -0.00261   -0.00389    0.00033
x3  UNION CARBIDE    -0.00413   -0.00261    0.47443    0.03138   -0.00424
x4  EXXON            -0.02399   -0.00389    0.03138    0.60963   -0.00028
x5  TEXACO            0.00397    0.00033   -0.00424   -0.00028    0.17625

Root Mean Square Off-Diagonal Residuals: Overall = 0.01286055
        x1           x2           x3           x4           x5
  0.01253908   0.00326122   0.01602114   0.01984784   0.00291394
724
Maximum Likelihood Estimation with R
stocks.fac <- factanal(stocks, factors=2,
method=’mle’, scale=T, center=T)
stocks.fac
Call:
factanal(x = stocks, factors = 2, method = "mle", scale=T, center=T)
Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
725
Loadings:
   Factor1 Factor2
x1   0.601   0.378
x2   0.849   0.165
x3   0.643   0.336
x4   0.365   0.507
x5   0.207   0.884

               Factor1 Factor2
SS loadings      1.671   1.321
Proportion Var   0.334   0.264
Cumulative Var   0.334   0.598
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
726
Measure of Sampling Adequacy
• In constructing a survey or other type of instrument to
examine ”factors” like political attitudes, social constructs,
mental abilities, consumer confidence, that cannot be
measured directly, it is common to include a set of questions
or items that should vary together as the values of the factor
vary across subjects.
• In a job attitude assessment, for example, you might include
questions of the following type in the survey:
1. How well do you like your job?
2. How eager are you to go to work in the morning?
3. How professionally satisfied are you?
4. What is your level of frustration with assignment to
meaningless tasks?
727
Measure of Sampling Adequacy
• If the responses to these and other questions had strong
positive or negative correlations, you may be able to identify
a ”job satisfaction” factor.
• In marketing, education, and behavioral research, the
existence of highly correlated responses to a battery of
questions is used to defend the notion that an important
factor exists and to justify the use of the survey form or
measurement instrument.
• Measures of sampling adequacy are based on correlations
and partial correlations among responses to different
questions (or items)
728
Measure of Sampling Adequacy
• Kaiser (1970, Psychometrika) proposed a statistic called the
Measure of Sampling Adequacy (MSA).
• MSA measures the relative sizes of the pairwise correlations
to the partial correlations between all pairs of variables as
follows:
  MSA = 1 − [ Σ_j Σ_{k≠j} q²jk ] / [ Σ_j Σ_{k≠j} r²jk ]
where rjk is the marginal sample correlation between
variables j and k and qjk is the partial correlation between
the two variables after accounting for all other variables in
the set.
729
Measure of Sampling Adequacy
• If the r’s are relatively large and the q’s are relatively small,
MSA tends to 1. This indicates that more than two variables
are changing together as the values of the ”factor” vary
across subjects.
• Side note: If p = 3 for example, q12 is given by
  q12 = (r12 − r13 r23) / √[(1 − r²13)(1 − r²23)].
730
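The partial correlations qjk controlling for all remaining variables can be read off the inverse of the correlation matrix, which gives a compact way to evaluate the MSA formula above. A minimal Python sketch (illustrative only; the course software is SAS and R, and `msa` is a name chosen here):

```python
import numpy as np

def msa(R):
    """MSA as defined on the slide: 1 minus the ratio of the sum of
    squared partial correlations to the sum of squared marginal
    correlations, over all pairs j != k."""
    P = np.linalg.inv(R)                          # precision matrix
    d = np.sqrt(np.outer(np.diag(P), np.diag(P)))
    Q = -P / d                                    # partial correlations q_jk
    off = ~np.eye(R.shape[0], dtype=bool)         # off-diagonal mask
    return 1 - (Q[off] ** 2).sum() / (R[off] ** 2).sum()
```

For a correlation matrix driven by one strong common factor the q's are small relative to the r's, so the MSA is close to 1.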
Measure of Sampling Adequacy
• The MSA can take on values between 0 and 1 and Kaiser
proposed the following guidelines:
MSA range    Interpretation        MSA range    Interpretation
0.9 to 1     Marvelous data        0.6 to 0.7   Mediocre data
0.8 to 0.9   Meritorious data      0.5 to 0.6   Miserable
0.7 to 0.8   Middling              0.0 to 0.5   Unacceptable
• SAS computes MSAs for each of the p variables.
Kaiser's Measure of Sampling Adequacy: Overall MSA = 0.78185677
        x1           x2           x3           x4           x5
  0.80012154   0.74270277   0.80925875   0.80426564   0.75573970
731
Tucker and Lewis Reliability Coefficient
• Tucker and Lewis (1973), Psychometrika, 38, pp. 1-10.
• From the MLEs for m factors compute
  – L̂m, the estimates of the factor loadings
  – Ψ̂m = diag(R − L̂mL̂′m)
  – Gm = Ψ̂m^(−1/2) (R − L̂mL̂′m) Ψ̂m^(−1/2)
732
Tucker and Lewis Reliability Coefficient
• gm(ij) is called the partial correlation between variables Xi
and Xj controlling for the m common factors.
• The sum of the squared partial correlations controlling for the
m common factors is
  Fm = ΣΣ_{i<j} g²m(ij)
• The mean square is
  Mm = Fm / dfm
where dfm = 0.5[(p − m)² − p − m] are the degrees of freedom
for the likelihood ratio test of the null hypothesis that m
factors are sufficient.
733
Tucker and Lewis Reliability Coefficient
• The mean square for the model with zero common factors is
  M0 = [ ΣΣ_{i<j} r²ij ] / [ p(p − 1)/2 ]
• The reliability coefficient is
  ρ̂m = (M0 − Mm) / (M0 − 1/nm)
where nm = (n − 1) − (2p + 5)/6 − 2m/3 is a Bartlett
correction factor.
For the weekly stock gains data SAS reports
  Tucker and Lewis's Reliability Coefficient   1.0285788
734
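Putting the pieces from the last three slides together, the coefficient can be sketched numerically as follows (Python for illustration; the off-diagonal entries of Gm are used as the partial correlations gm(ij), and `tucker_lewis` is a name chosen here):

```python
import numpy as np

def tucker_lewis(R, L, n):
    """Tucker-Lewis reliability for an m-factor solution with loadings L,
    following the slides' formulas (R is the p x p correlation matrix)."""
    p, m = L.shape
    resid = R - L @ L.T                       # residual matrix
    psi = np.diag(resid)                      # Psi_m = diag(R - L L')
    G = resid / np.sqrt(np.outer(psi, psi))   # G_m; off-diagonals = g_m(ij)
    iu = np.triu_indices(p, k=1)              # pairs with i < j
    F_m = (G[iu] ** 2).sum()                  # sum of squared partials
    df_m = 0.5 * ((p - m) ** 2 - p - m)
    M_m = F_m / df_m                          # mean square for m factors
    M_0 = (R[iu] ** 2).sum() / (p * (p - 1) / 2)   # zero-factor mean square
    n_m = (n - 1) - (2 * p + 5) / 6 - 2 * m / 3    # Bartlett correction
    return (M_0 - M_m) / (M_0 - 1 / n_m)
```

When the m-factor model fits exactly, Fm = 0 and the coefficient slightly exceeds 1, which is why SAS can report a value like 1.0285788 for the stock data.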
Cronbach's Alpha
• A set of observed variables X = (X1, X2, . . . , Xp)′ that all
"measure" the same latent trait should all have high positive
pairwise correlations.
• Define
  r̄ = [ (1/(p(p−1)/2)) ΣΣ_{i<j} Ĉov(Xi, Xj) ] / [ (1/p) Σ_{i=1..p} V̂ar(Xi) ]
    = [ ΣΣ_{i<j} Sij / (p(p−1)/2) ] / [ (1/p) Σ_{i=1..p} Sii ]
• Cronbach's alpha is
  α = p r̄ / [1 + (p − 1) r̄] = [p/(p − 1)] [ 1 − Σi V̂ar(Xi) / V̂ar(Σi Xi) ]
735
Cronbach's Alpha
• For standardized variables, Σ_{i=1..p} Var(Xi) = p.
• If ρij = 1 for all i ≠ j, then
  Var(Σi Xi) = Σi Var(Xi) + 2 ΣΣ_{i<j} √[Var(Xi)Var(Xj)]
             = [ Σi √Var(Xi) ]²
             = p²
Cronbach's Alpha
• In the extreme case where all pairwise correlations are 1 we
have
  Σi Var(Xi) / Var(Σi Xi) = p/p² = 1/p
and
  α = [p/(p − 1)] [ 1 − Σi Var(Xi)/Var(Σi Xi) ]
    = [p/(p − 1)] [ 1 − 1/p ] = 1
• When r̄ = 0, then α = 0.
• Be sure that the scores for all items are oriented in the
same direction so all correlations are positive.
737
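The variance-ratio form of alpha translates directly into code. A small Python sketch (illustrative only; `cronbach_alpha` is a name chosen here):

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an n x p data matrix X (rows = subjects):
    alpha = p/(p-1) * (1 - sum_i Var(X_i) / Var(sum_i X_i))."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # Var(X_i) for each item
    total_var = X.sum(axis=1).var(ddof=1)    # Var(sum_i X_i)
    return p / (p - 1) * (1 - item_vars.sum() / total_var)
```

Items that are shifted copies of one another (pairwise correlations 1, equal variances) give alpha = 1, matching the extreme case worked out on the previous slide.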
Factor Rotation
• As mentioned earlier, multiplying a matrix of factor loadings
by any orthogonal matrix leads to the same approximation
to the covariance (or correlation) matrix.
• This means that mathematically, it does not matter whether
we estimate the loadings as L̂ or as L̂∗ = L̂T, where T′T =
TT′ = I.
• The estimated residual matrix also remains unchanged:
  Sn − L̂L̂′ − Ψ̂ = Sn − L̂T T′L̂′ − Ψ̂ = Sn − L̂∗L̂∗′ − Ψ̂
• The specific variances ψ̂i and therefore the communalities
also remain unchanged.
738
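This rotation invariance is easy to verify numerically. A small Python sketch with made-up loadings and specific variances (not the stock-price estimates):

```python
import numpy as np

L = np.array([[0.7,  0.3],
              [0.8,  0.1],
              [0.6, -0.4]])            # made-up 3 x 2 loading matrix
Psi = np.diag([0.2, 0.3, 0.25])        # made-up specific variances

theta = 0.5                            # any angle gives an orthogonal T
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_star = L @ T                         # rotated loadings

# L L' + Psi and L* L*' + Psi are identical, so the fit is unchanged
assert np.allclose(L @ L.T + Psi, L_star @ L_star.T + Psi)
```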
Factor Rotation
• Since only the loadings change by rotation, we rotate factors
to see if we can better interpret results.
• There are many choices for a rotation matrix T , and to
choose, we first establish a mathematical criterion and then
see which T can best satisfy the criterion.
• One possible objective is to have each one of the p variables
load highly on only one factor and have moderate to
negligible loadings on all other factors.
• It is not always possible to achieve this type of result.
• This can be examined with a varimax rotation.
739
Varimax Rotation
• Define ℓ̃∗ij = ℓ̂∗ij/ĥi as the "scaled" loading of the i-th variable
on the j-th rotated factor.
• Compute the variance of the squares of the scaled loadings
for the j-th rotated factor:
  (1/p) Σ_{i=1..p} [ (ℓ̃∗ij)² − (1/p) Σ_{k=1..p} (ℓ̃∗kj)² ]²
    = (1/p) Σ_{i=1..p} (ℓ̃∗ij)⁴ − [ (1/p) Σ_{k=1..p} (ℓ̃∗kj)² ]²
• The varimax procedure finds the orthogonal transformation
of the loading matrix that maximizes the sum of those
variances, summing across all m rotated factors.
• After rotation each of the p variables should load highly on
at most one of the rotated factors
740
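The quantity being maximized can be written as a short function. A Python sketch (illustrative only; `varimax_criterion` is a name chosen here, and the search over orthogonal T is omitted):

```python
import numpy as np

def varimax_criterion(L, h):
    """Sum over the m factors of the variance (down each column) of the
    squared scaled loadings (l*_ij / h_i)^2 -- what varimax maximizes."""
    sq = ((L.T / h).T) ** 2        # squared scaled loadings, p x m
    return sq.var(axis=0).sum()    # population variance down each column
```

Loadings with "simple structure" (each variable loading on a single factor) score higher than loadings spread evenly across factors.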
Maximum Likelihood Estimation: Stock Prices
• Varimax rotation of m = 2 factors
Variable        MLE        MLE        Rotated    Rotated
                factor 1   factor 2   factor 1   factor 2
Allied Chem     0.68       0.19       0.60       0.38
Du Pont         0.69       0.52       0.85       0.16
Union Carb.     0.68       0.25       0.64       0.34
Exxon           0.62      -0.07       0.36       0.51
Texaco          0.79      -0.44       0.21       0.88
Prop. of var.   0.485      0.113      0.334      0.264
741
Quartimax Rotation
• The varimax rotation will destroy an "overall" factor.
• The quartimax rotation tries to
  1. Preserve an overall factor such that each of the p variables
     has a high loading on that factor
  2. Create other factors such that each of the p variables has
     a high loading on at most one factor
742
Quartimax Rotation
• Define ℓ̃∗ij = ℓ̂∗ij/ĥi as the "scaled" loading of the i-th variable
on the j-th rotated factor.
• Compute the variance of the squares of the scaled loadings
for the i-th variable:
  (1/m) Σ_{j=1..m} [ (ℓ̃∗ij)² − (1/m) Σ_{k=1..m} (ℓ̃∗ik)² ]²
    = (1/m) Σ_{j=1..m} (ℓ̃∗ij)⁴ − [ (1/m) Σ_{k=1..m} (ℓ̃∗ik)² ]²
• The quartimax procedure finds the orthogonal transformation
of the loading matrix that maximizes the sum of those
variances, summing across all p variables.
743
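The only change from the varimax criterion is the direction of averaging: quartimax takes the variance of the squared scaled loadings across the m factors within each variable, then sums over the p variables. A Python sketch (illustrative only; `quartimax_criterion` is a name chosen here):

```python
import numpy as np

def quartimax_criterion(L, h):
    """Sum over the p variables of the variance (across the m factors)
    of the squared scaled loadings -- what quartimax maximizes."""
    sq = ((L.T / h).T) ** 2        # squared scaled loadings, p x m
    return sq.var(axis=1).sum()    # population variance along each row
```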
Maximum Likelihood Estimation: Stock Prices
• Quartimax rotation of m = 2 factors
Variable        MLE        MLE        Rotated    Rotated
                factor 1   factor 2   factor 1   factor 2
Allied Chem     0.68       0.19       0.71       0.05
Du Pont         0.69       0.52       0.83      -0.26
Union Carb.     0.68       0.25       0.72      -0.01
Exxon           0.62      -0.07       0.56       0.27
Texaco          0.79      -0.44       0.60       0.68
Prop. of var.   0.485      0.113      0.477      0.121
744
PROMAX Transformation
• The varimax and quartimax rotations produce uncorrelated
factors
• PROMAX is a non-orthogonal (oblique) transformation that
1. is not a rotation
2. can produce correlated factors
745
PROMAX Transformation
1. First perform a varimax rotation to obtain loadings L∗
2. Construct another p × m matrix Q such that
   qij = |ℓ∗ij|^(k−1) ℓ∗ij   for ℓ∗ij ≠ 0
       = 0                   for ℓ∗ij = 0
   where k > 1 is selected by trial and error, usually k < 4.
3. Find a matrix U such that each column of L∗U is close to
   the corresponding column of Q. Choose the j-th column of
   U to minimize
   (qj − L∗uj)′(qj − L∗uj)
   This yields
   U = [(L∗)′(L∗)]⁻¹(L∗)′Q
746
PROMAX Transformation
4. Rescale U so that the transformed factors have unit variance.
   Compute D² = diag[(U′U)⁻¹] and M = UD.
5. The PROMAX factors are obtained from
   L∗F = L∗M M⁻¹F = LP FP
• The PROMAX transformation yields factors with loadings
  LP = L∗M
747
PROMAX Transformation
• The correlation matrix for the transformed factors is (M′M)⁻¹
Derivation:
  Var(FP) = Var(M⁻¹F)
          = M⁻¹ Var(F) (M⁻¹)′
          = M⁻¹ I (M⁻¹)′
          = (M′M)⁻¹
748
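The steps above can be sketched directly (Python for illustration, with k = 3; this follows the slides' construction, not SAS's exact normalized implementation, and `promax_transform` is a name chosen here):

```python
import numpy as np

def promax_transform(L_star, k=3):
    """PROMAX from varimax loadings L_star: build the target Q, solve the
    least squares problem for U, rescale to M = U D, and return the
    oblique loadings L_P = L_star M and factor correlations (M'M)^{-1}."""
    Q = np.abs(L_star) ** (k - 1) * L_star            # target matrix
    U, *_ = np.linalg.lstsq(L_star, Q, rcond=None)    # U = (L*'L*)^{-1} L*' Q
    D = np.sqrt(np.diag(np.linalg.inv(U.T @ U)))      # D^2 = diag[(U'U)^{-1}]
    M = U * D                                         # rescale columns: M = U D
    return L_star @ M, np.linalg.inv(M.T @ M)         # loadings, correlations

# Varimax loadings for the stock data, rounded values from the earlier slide
L_star = np.array([[0.60, 0.38],
                   [0.85, 0.16],
                   [0.64, 0.34],
                   [0.36, 0.51],
                   [0.21, 0.88]])
L_P, phi = promax_transform(L_star)
```

The rescaling step gives the transformed factors unit variance, so the diagonal of (M′M)⁻¹ is exactly 1 by construction.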
Maximum Likelihood Estimation: Stock Prices
• PROMAX transformation of m = 2 factors
Variable        MLE        MLE        PROMAX     PROMAX
                factor 1   factor 2   factor 1   factor 2
Allied Chem     0.68       0.19       0.56       0.24
Du Pont         0.69       0.52       0.90      -0.08
Union Carb.     0.68       0.25       0.62       0.18
Exxon           0.62      -0.07       0.26       0.45
Texaco          0.79      -0.44      -0.02       0.92
Estimated correlation between the factors is 0.49
749
/* Varimax Rotation */
proc factor data=set1 method=ml nfactors=2
res msa mineigen=0 maxiter=75 conv=.0001
outstat=facml2 rotate=varimax reorder;
var x1-x5;
priors smc;
run;
proc factor data=set1 method=prin nfactors=3
mineigen=0 res msa maxiter=75 conv=.0001
outstat=facml3 rotate=varimax reorder;
var x1-x5;
priors smc;
run;
750
The FACTOR Procedure
Rotation Method: Varimax
Orthogonal Transformation Matrix
            1          2
1     0.67223    0.74034
2    -0.74034    0.67223

Rotated Factor Pattern
                        Factor1    Factor2
x2  DUPONT              0.84949    0.16359
x3  UNION CARBIDE       0.64299    0.33487
x1  ALLIED CHEM         0.60126    0.37680
x5  TEXACO              0.20851    0.88333
x4  EXXON               0.36554    0.50671
751
Variance Explained by Each Factor
Factor       Weighted     Unweighted
Factor1    4.93345489     1.67367719
Factor2    5.47686792     1.31789858

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576
Variable   Communality       Weight
x1          0.50349452   2.01407634
x2          0.74838963   3.97439996
x3          0.52556852   2.10778594
x4          0.39037462   1.64035160
x5          0.82374847   5.67370899
752
/* Apply other rotations to the two factor solution */
proc factor data=facml2 nfactors=2
conv=.0001 rotate=quartimax reorder;
var x1-x5;
priors smc;
run;
753
The FACTOR Procedure
Rotation Method: Quartimax
Orthogonal Transformation Matrix
            1          2
1     0.88007   -0.47484
2     0.47484    0.88007

Rotated Factor Pattern
                        Factor1    Factor2
x2  DUPONT              0.82529   -0.25940
x3  UNION CARBIDE       0.72488   -0.01060
x1  ALLIED CHEM         0.70807    0.04611
x4  EXXON               0.56231    0.27237
x5  TEXACO              0.60294    0.67839
754
Variance Explained by Each Factor
Factor       Weighted     Unweighted
Factor1    7.40558481     2.38765329
Factor2    3.00473801     0.60392248

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576
Variable   Communality       Weight
x1          0.50349452   2.01407634
x2          0.74838963   3.97439996
x3          0.52556852   2.10778594
x4          0.39037462   1.64035160
x5          0.82374847   5.67370899
755
proc factor data=facml2 nfactors=2
conv=.0001 rotate=promax reorder
score;
var x1-x5;
priors smc;
run;
756
The FACTOR Procedure
Prerotation Method: Varimax
Orthogonal Transformation Matrix
            1          2
1     1.00000    0.00000
2    -0.00000    1.00000

Rotated Factor Pattern
                        Factor1    Factor2
x2  DUPONT              0.84949    0.16359
x3  UNION CARBIDE       0.64299    0.33487
x1  ALLIED CHEM         0.60126    0.37680
x5  TEXACO              0.20851    0.88333
x4  EXXON               0.36554    0.50671
757
Variance Explained by Each Factor
Factor       Weighted     Unweighted
Factor1    4.93345489     1.67367719
Factor2    5.47686792     1.31789858

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576
Variable   Communality       Weight
x1          0.50349452   2.01407634
x2          0.74838963   3.97439996
x3          0.52556852   2.10778594
x4          0.39037462   1.64035160
x5          0.82374847   5.67370899
758
The FACTOR Procedure
Rotation Method: Promax (power = 3)
Target Matrix for Procrustean Transformation
                        Factor1    Factor2
x2  DUPONT              1.00000    0.00733
x3  UNION CARBIDE       0.73686    0.10691
x1  ALLIED CHEM         0.64258    0.16242
x5  TEXACO              0.01281    1.00000
x4  EXXON               0.21150    0.57860

Procrustean Transformation Matrix
               1             2
1     1.24806851    -0.3149179
2     -0.3302485    1.20534878
759
Normalized Oblique Transformation Matrix
               1             2
1     1.11385973    -0.3035593
2     -0.2810538    1.10793795

Inter-Factor Correlations
           Factor1    Factor2
Factor1    1.00000    0.49218
Factor2    0.49218    1.00000
760
Rotated Factor Pattern (Standardized Regression Coefficients)
                        Factor1    Factor2
x2  DUPONT              0.90023   -0.07663
x3  UNION CARBIDE       0.62208    0.17583
x1  ALLIED CHEM         0.56382    0.23495
x5  TEXACO             -0.01601    0.91538
x4  EXXON               0.26475    0.45044
761
Reference Structure (Semipartial Correlations)
                        Factor1    Factor2
x2  DUPONT              0.78365   -0.06670
x3  UNION CARBIDE       0.54152    0.15306
x1  ALLIED CHEM         0.49081    0.20452
x5  TEXACO             -0.01394    0.79683
x4  EXXON               0.23046    0.39211

Variance Explained by Each Factor Eliminating Other Factors
Factor       Weighted     Unweighted
Factor1    3.63219833     1.20154790
Factor2    4.00599560     0.85839588
762
Factor Structure (Correlations)
                        Factor1    Factor2
x2  DUPONT              0.86252    0.36645
x3  UNION CARBIDE       0.70862    0.48200
x1  ALLIED CHEM         0.67946    0.51245
x5  TEXACO              0.43452    0.90750
x4  EXXON               0.48644    0.58074

Variance Explained by Each Factor Ignoring Other Factors
Factor       Weighted     Unweighted
Factor1    6.40432722     2.13317989
Factor2    6.77812448     1.79002787
763
Final Communality Estimates and Variable Weights
Total Communality: Weighted = 10.410323  Unweighted = 2.991576
Variable   Communality       Weight
x1          0.50349452   2.01407634
x2          0.74838963   3.97439996
x3          0.52556852   2.10778594
x4          0.39037462   1.64035160
x5          0.82374847   5.67370899
764
Scoring Coefficients Estimated by Regression
Squared Multiple Correlations of the Variables with Each Factor
     Factor1       Factor2
  0.83601404    0.84825886

Standardized Scoring Coefficients
                        Factor1    Factor2
x2  DUPONT              0.58435   -0.01834
x3  UNION CARBIDE       0.21791    0.06645
x1  ALLIED CHEM         0.18991    0.08065
x5  TEXACO              0.02557    0.78737
x4  EXXON               0.07697    0.11550
765
R code posted in stocks.R
# Compute maximum likelihood estimates for factors
stocks.fac <- factanal(stocks, factors=2,
                       method='mle', scale=T, center=T)
stocks.fac

Call: factanal(x = stocks, factors = 2, method = "mle",
      scale = T, center = T)

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
766
Loadings:
   Factor1 Factor2
x1   0.601   0.378
x2   0.849   0.165
x3   0.643   0.336
x4   0.365   0.507
x5   0.207   0.884

               Factor1 Factor2
SS loadings      1.671   1.321
Proportion Var   0.334   0.264
Cumulative Var   0.334   0.598
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
767
# Apply the Varimax rotation (this was the default)
stocks.fac <- factanal(stocks, factors=2, rotation="varimax",
method="mle", scores="regression")
stocks.fac
Call: factanal(x = stocks, factors = 2, scores = "regression",
rotation = "varimax", method = "mle")
Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
768
Loadings:
   Factor1 Factor2
x1   0.601   0.378
x2   0.849   0.165
x3   0.643   0.336
x4   0.365   0.507
x5   0.207   0.884

               Factor1 Factor2
SS loadings      1.671   1.321
Proportion Var   0.334   0.264
Cumulative Var   0.334   0.598
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
769
# Compute the residual matrix
pred <- stocks.fac$loadings %*% t(stocks.fac$loadings) +
        diag(stocks.fac$uniqueness)
resid <- s.cor - pred
resid

             x1           x2           x3           x4           x5
x1  -8.4905e-07   4.5256e-03  -4.1267e-03  -2.3991e-02   3.9726e-03
x2   4.5256e-03   1.8641e-07  -2.6074e-03  -3.8942e-03   3.2772e-04
x3  -4.1267e-03  -2.6074e-03   7.4009e-08   3.1384e-02  -4.2413e-03
x4  -2.3991e-02  -3.8942e-03   3.1384e-02  -1.7985e-07  -2.8193e-04
x5   3.9726e-03   3.2772e-04  -4.2413e-03  -2.8193e-04   2.7257e-08
770
# List factor scores
stocks.fac$scores
        Factor1      Factor2
1   -0.05976839   0.02677225
2   -1.22226529   1.44753390
3    1.62739601   2.47578797
.             .            .
.             .            .
99   0.71440818  -0.02122186
100 -0.59289113   0.34344739
771
# You could use the following code in Splus to apply
# the quartimax rotation, but this is not available in R
#
#   stocks.fac <- factanal(stocks, factors=2, rotation="quartimax",
#                          method="mle", scores="regression")
#   stocks.fac

# Apply the Promax rotation
promax(stocks.fac$loadings, m=3)
772
Loadings:
   Factor1 Factor2
x1   0.570   0.195
x2   0.965  -0.174
x3   0.639   0.125
x4   0.226   0.456
x5  -0.128   0.984

               Factor1 Factor2
SS loadings      1.732   1.260
Proportion Var   0.346   0.252
Cumulative Var   0.346   0.598

$rotmat
           [,1]       [,2]
[1,]  1.2200900 -0.4410072
[2,] -0.4315772  1.2167132
773
# You could try to apply the promax rotation with the
# factanal function. It selects the power as m=4
> stocks.fac <- factanal(stocks, factors=2, rotation="promax",
+                        method="mle", scores="regression")
> stocks.fac

Call:
factanal(x = stocks, factors = 2, scores = "regression", rotation = "

Uniquenesses:
   x1    x2    x3    x4    x5
0.497 0.252 0.474 0.610 0.176
774
Loadings:
   Factor1 Factor2
x1   0.576   0.175
x2   1.011  -0.232
x3   0.653
x4   0.202   0.466
x5  -0.202   1.037

               Factor1 Factor2
SS loadings      1.863   1.387
Proportion Var   0.373   0.277
Cumulative Var   0.373   0.650
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 0.58 on 1 degree of freedom.
The p-value is 0.448
775
Example: Test Scores
• The sample correlation matrix for p = 6 test scores collected
from n = 220 students is given below:

  R =
    1.0   0.439  0.410  0.288  0.329  0.248
          1.0    0.351  0.354  0.320  0.329
                 1.0    0.164  0.190  0.181
                        1.0    0.595  0.470
                               1.0    0.464
                                      1.0
• An m = 2 factor model was fitted to this correlation matrix
using ML.
776
Example: Test Scores
• Estimated factor loadings and communalities are
Variable      Loadings on   Loadings on   Communalities
              factor 1      factor 2      ĥ²i
Gaelic        0.553         0.429         0.490
English       0.568         0.288         0.406
History       0.392         0.450         0.356
Arithmetic    0.740        -0.273         0.623
Algebra       0.724        -0.211         0.569
Geometry      0.595        -0.132         0.372
777
Example: Test Scores
• All variables load highly on the first factor. We call that a
'general intelligence' factor.
• Half of the loadings are positive and half are negative on the
second factor. The positive loadings correspond to the 'verbal'
scores and the negative loadings correspond to the 'math' scores.
• Correlations between scores on verbal tests and math tests
tend to be lower than correlations among scores on math
tests or correlations among scores on verbal tests, so this is
a 'math versus verbal' factor.
• We plot the six loadings for each factor (ℓ̂i1, ℓ̂i2) on the
original coordinate system and also on a rotated set of
coordinates chosen so that one axis goes through the loadings
(ℓ̂41, ℓ̂42) of the fourth variable on the two factors.
778
Factor Rotation for Test Scores
779
Varimax Rotation for Test Scores
• Loadings for rotated factors using the varimax criterion are
as follows:
Variable      Loadings on   Loadings on   Communalities
              F1∗           F2∗           ĥ²i
Gaelic        0.232         0.660         0.490
English       0.321         0.551         0.406
History       0.085         0.591         0.356
Arithmetic    0.770         0.173         0.623
Algebra       0.723         0.215         0.569
Geometry      0.572         0.213         0.372
• F1∗ is primarily a ’mathematics ability factor’ and F2∗ is a
’verbal ability factor’.
780
PROMAX Rotation for Test Scores
• F1∗ is more clearly a ’mathematics ability factor’ and F2∗ is a
’verbal ability factor’.
Variable      Loadings on   Loadings on   Communalities
              F1∗           F2∗           ĥ²i
Gaelic        0.059        -0.668         0.490
English       0.191         0.519         0.406
History      -0.084         0.635         0.356
Arithmetic    0.809         0.041         0.623
Algebra       0.743         0.021         0.569
Geometry      0.575         0.064         0.372
• These factors have correlation 0.505
• Code is posted as Lawley.sas and Lawley.R
781
Factor Scores
• Sometimes, we require estimates of the factor values for each
respondent in the sample.
• For j = 1, ..., n, the estimated m−dimensional vector of factor
scores is
  f̂j = estimate of the values fj attained by Fj.
• To estimate fj we act as if L̂ (or L̃) and Ψ̂ (or Ψ̃) are the
true values.
• Recall the model
  (Xj − µ) = LFj + εj.
782
Weighted Least Squares for Factor Scores
• From the model above, we can try to minimize the weighted
sum of squares
  Σ_{i=1..p} ε²i/ψi = ε′Ψ⁻¹ε = (x − µ − Lf)′ Ψ⁻¹ (x − µ − Lf).
• The weighted least squares solution is
  f̂ = (L′Ψ⁻¹L)⁻¹L′Ψ⁻¹(x − µ).
• Substituting the MLEs L̂, Ψ̂, x̄ in place of the true values, we
have
  f̂j = (L̂′Ψ̂⁻¹L̂)⁻¹L̂′Ψ̂⁻¹(xj − x̄).
783
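The weighted least squares scores take only a few lines to compute. A Python sketch (illustrative only; `wls_factor_scores` is a name chosen here, and the estimated L̂ and ψ̂ are passed in):

```python
import numpy as np

def wls_factor_scores(X, L, psi):
    """Weighted least squares factor scores:
    f_j = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (x_j - xbar)."""
    Xc = X - X.mean(axis=0)                      # center at the sample mean
    Psi_inv = np.diag(1.0 / psi)
    A = np.linalg.solve(L.T @ Psi_inv @ L, L.T @ Psi_inv)  # m x p
    return Xc @ A.T                              # n x m matrix of scores
```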
Weighted Least Squares for Factor Scores
• When factor loadings are estimated with the PC method,
ordinary least squares is sometimes used to get the factor
scores. In this case
  f̃j = (L̃′L̃)⁻¹L̃′(xj − x̄).
• Implicitly, this assumes that the ψ̂i are equal or almost equal.
• Since L̃ = [√λ̂1 ê1  √λ̂2 ê2  ...  √λ̂m êm], we find that
  f̃j = ( λ̂1^(−1/2) ê′1(xj − x̄),  λ̂2^(−1/2) ê′2(xj − x̄),  ...,
         λ̂m^(−1/2) ê′m(xj − x̄) )′.
784
Weighted Least Squares for Factor Scores
• Note that the factor scores estimated from the principal
component solution are nothing more than the scores for
the first m principal components scaled by λ̂i^(−1/2),
for i = 1, ..., m.
• The estimated factor scores have zero mean and zero
covariances.
785
Regression Method for Factor Scores
• We again start from the model (X − µ) = LF + ε.
• If the F, ε are normally distributed, we note that
  ( (X − µ)′, F′ )′ ∼ N_{p+m}(0, Σ∗),
with
  Σ∗ (p+m)×(p+m) = [ Σ = LL′ + Ψ    L ]
                   [ L′             I ]
• The conditional distribution of F given X = x is therefore
also normal, with
  E(F|x) = L′Σ⁻¹(x − µ) = L′(LL′ + Ψ)⁻¹(x − µ)
  Cov(F|x) = I − L′Σ⁻¹L = I − L′(LL′ + Ψ)⁻¹L.
786
Regression Method for Factor Scores
• An estimate of fj is obtained by substituting estimates for
the unknown quantities in the expression above, so that
  f̂j = Ê(Fj|xj) = L̂′(L̂L̂′ + Ψ̂)⁻¹(xj − x̄).
• Sometimes, to reduce the errors that may be introduced if
the number of factors m is not quite appropriate, S is used
in place of (L̂L̂′ + Ψ̂) in the expression for f̂j.
• If loadings are rotated using T, the relationship between the
resulting factor scores and the factor scores arising from
unrotated loadings is f̂∗j = T f̂j.
787
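As a companion to the weighted least squares formula, the regression-method scores can be sketched similarly (Python for illustration; `regression_factor_scores` is a name chosen here):

```python
import numpy as np

def regression_factor_scores(X, L, psi):
    """Regression-method factor scores:
    f_j = L' (L L' + Psi)^{-1} (x_j - xbar)."""
    Xc = X - X.mean(axis=0)
    Sigma_hat = L @ L.T + np.diag(psi)     # fitted covariance L L' + Psi
    B = np.linalg.solve(Sigma_hat, L)      # (L L' + Psi)^{-1} L, p x m
    return Xc @ B                          # n x m matrix of scores
```

Relative to the weighted least squares scores, these are shrunk toward zero, since they are conditional expectations rather than unbiased estimates.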
Factor Scores for the Stock Price Data
• We estimated the values of the two factor scores for each of
the 100 observations in the stock price example. The scatter
plot shows approximate mean 0, variance 1, and approximate
bivariate normality.
788
Heywood Cases and Other Potential Problems
• When loadings are estimated iteratively (as in the Principal
Factor or ML methods) some problems may arise:
– Some of the estimated eigenvalues of R − Ψ may become
negative.
– Some of the communalities may become larger than one.
• Since communalities estimate variances, they must be
positive. If any communality exceeds 1.0, the variance
of the corresponding specific factor is negative.
789
Heywood Cases and Other Potential Problems
• When can these things happen?
1. When we start with bad prior communality estimates for
iterating.
2. When there are too many common factors.
3. When we do not have enough data to obtain stable
estimates.
4. When the common factor model is just not appropriate
for the data.
• These problems can occur even in seemingly ’good’ datasets
(see example below) and create serious problems when we
try to interpret results.
790
Heywood Cases and Other Potential Problems
• When a final communality estimate equals 1, we refer to
this as a Heywood case (a solution on the boundary of the
parameter space).
• When a final communality estimate exceeds 1, we have an
Ultra Heywood case.
• An Ultra Heywood case implies that some unique factor has
negative variance, a huge red flag that something is
wrong and that results from the factor analysis are invalid.
791
Heywood Cases and Other Potential Problems
• There is some concern about the validity of results in
Heywood cases (boundary solutions where the factors
account for all of the variation in at least one trait).
It is difficult to interpret results (for example, how
can a confidence interval be constructed for a parameter
at the boundary?)
• Large sample results may not be valid for Heywood cases,
in particular the large sample chi-square approximations for
likelihood ratio tests (e.g. the test for the number of factors).
792
Heywood Cases and Other Potential Problems
• When iterative algorithms in SAS fail to converge, we can
include a heywood or an ultra heywood option in the proc
factor statement.
– The heywood option fixes to 1 any communality that
goes above 1 and continues iterating on the rest.
– The ultraheywood option continues iterating on all
variables, regardless of the size of the communalities,
hoping that the solution eventually comes back inside
the parameter space (this rarely happens).
• When factor loadings are estimated using the principal
components method, none of these problems arise (assuming
that R is of full column rank).
793
Example: track records for men and women
• Use maximum likelihood estimation to fit factor models to
standardized observations on national track records for men
and women in n = 54 countries (Table 8.6 and Table 1.9).
• The dataset for women includes seven variables x1, ..., x7 that
correspond to national records on 100m, 200m, 400m, 800m,
1500m, 3000m and marathon races.
• The dataset for men includes eight variables x1, ..., x8 that
correspond to national records on 100m, 200m, 400m, 800m,
1500m, 5000m, 10000m and marathon races.
• The first three times are in seconds and the remaining times
are in minutes.
794
Example: track records for men and women
• First examine scatterplots and correlation matrices.
• Test for the appropriate number of factors using
likelihood ratio tests.
• The HEYWOOD option is needed in the FACTOR procedure
in SAS for the three factor model for both men and women.
• SAS code to carry out these analyses is posted as records.sas
and R code is posted as records.R on the course web page.
795
Example: track records for men
• The scatter plot matrix reveals a couple of extreme countries
with slow times but they conform to the correlation pattern.
All relationships are approximately straight line relationships.
• Results indicate that the two factor model may be reasonable
(p-value=.0173). MLE’s for the factor loadings are inside the
parameter space and the two rotated (varimax) factors are
easy to interpret.
• Since the MLE for the three factor solution is on the
boundary of the parameter space, we are not justified in
using a large sample chi-square approximation for the
likelihood ratio test and the reported p-value (.2264) is
unreliable.
796
[Scatter plot matrix of the standardized national records for men:
100m, 200m, 400m, 800m, 1500m, 5000m, 10000m, marathon]
797
Example: track records for men

Race        Varimax    Varimax    Promax     Promax     ĥ²ᵢ
            Factor 1   Factor 2   Factor 1   Factor 2
100m         0.402      0.839      0.097      0.868     0.865
200m         0.409      0.892      0.068      0.933     0.963
400m         0.515      0.711      0.300      0.643     0.772
800m         0.671      0.581      0.568      0.393     0.787
1500m        0.748      0.554      0.682      0.316     0.866
5000m        0.886      0.450      0.915      0.109     0.987
10000m       0.900      0.424      0.946      0.069     0.989
Marathon     0.865      0.402      0.910      0.063     0.912

• The Promax factors have correlation 0.697.
• There is a distance race factor and a sprint factor.
798
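For an orthogonal rotation such as varimax, the communality of each trait is the sum of its squared loadings, ĥ²ᵢ = Σⱼ ℓ̂²ᵢⱼ. A minimal numerical check of the men's table (loadings transcribed from the slide; small discrepancies reflect rounding of the displayed values):

```python
import numpy as np

# Varimax loadings for the men's track records (rows: 100m ... marathon)
L = np.array([
    [0.402, 0.839],  # 100m
    [0.409, 0.892],  # 200m
    [0.515, 0.711],  # 400m
    [0.671, 0.581],  # 800m
    [0.748, 0.554],  # 1500m
    [0.886, 0.450],  # 5000m
    [0.900, 0.424],  # 10000m
    [0.865, 0.402],  # marathon
])

# Orthogonal factors: communality = row sum of squared loadings
h2 = (L ** 2).sum(axis=1)

# Communalities reported in the table
h2_reported = np.array([0.865, 0.963, 0.772, 0.787,
                        0.866, 0.987, 0.989, 0.912])

# Agreement to within rounding of the three-decimal loadings
assert np.allclose(h2, h2_reported, atol=0.005)
```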
Example: track records for women
• The two-factor model does not appear to fit for women
(p-value < 0.0001).
• The algorithm for fitting the three-factor model needed the
Heywood option to converge to a boundary solution. (The
reported p-value of 0.0295 for the likelihood ratio test is unreliable.)
• Two varimax rotated factors have a similar interpretation to
the rotated factors for men, but the marathon does not load
as highly on the distance factor.
799
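The adequacy of an m-factor model is judged with a likelihood ratio test whose degrees of freedom are ½[(p − m)² − p − m]. A quick sketch of that count for the women's data (p = 7 events; formula only, not the full test):

```python
def lr_test_df(p, m):
    """Degrees of freedom for the likelihood ratio test that an
    m-factor model is adequate for p observed traits:
    df = ((p - m)^2 - p - m) / 2."""
    return ((p - m) ** 2 - p - m) // 2

# Women's track records: p = 7 events
assert lr_test_df(7, 2) == 8   # two-factor model
assert lr_test_df(7, 3) == 3   # three-factor model
```

A positive df is required for the test to be meaningful, which bounds how many factors can be fit to a given number of traits.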
Example: track records for women
• Three varimax rotated factors allow for a sprint factor, a
1500m and 3000m factor, and a third factor for which the
800m and the marathon have relatively high loadings.
• A four-factor solution does not split up the loadings on the
third factor for the 800m and marathon races.
800
[Figure: scatterplot matrix for the women's track records data; panels include 100m, 200m, 400m, 800m, 1500m, 3000m, and marathon.]
801
Example: track records for women

Race        Varimax    Varimax    Promax     Promax     ĥ²ᵢ
            Factor 1   Factor 2   Factor 1   Factor 2
100m         0.455      0.836      0.151      0.841     0.906
200m         0.449      0.880      0.120      0.901     0.976
400m         0.395      0.832      0.075      0.867     0.848
800m         0.728      0.571      0.639      0.359     0.856
1500m        0.879      0.460      0.891      0.139     0.984
3000m        0.915      0.367      0.985      0.000     0.972
Marathon     0.670      0.432      0.662      0.200     0.662

• The Promax factors have correlation 0.693.
• There is a distance race factor and a sprint factor.
• When a third factor is included it reflects the three countries
with very slow 800m and marathon times.
802
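For an oblique rotation such as promax the factors are correlated, so the communality is the quadratic form ĥ²ᵢ = ℓ̂ᵢ′ Φ̂ ℓ̂ᵢ, where Φ̂ is the estimated factor correlation matrix. A rough check against the women's table, assuming a two-factor Φ̂ with off-diagonal entry 0.693 as reported on the slide:

```python
import numpy as np

# Promax loadings for the women's track records (rows: 100m ... marathon)
L = np.array([
    [0.151, 0.841],  # 100m
    [0.120, 0.901],  # 200m
    [0.075, 0.867],  # 400m
    [0.639, 0.359],  # 800m
    [0.891, 0.139],  # 1500m
    [0.985, 0.000],  # 3000m
    [0.662, 0.200],  # marathon
])

# Estimated factor correlation matrix (off-diagonal from the slide)
Phi = np.array([[1.000, 0.693],
                [0.693, 1.000]])

# Oblique factors: communality = l' Phi l for each row of loadings
h2 = np.einsum('ij,jk,ik->i', L, Phi, L)

# Communalities reported in the table
h2_reported = np.array([0.906, 0.976, 0.848, 0.856, 0.984, 0.972, 0.662])

# Agreement to within rounding of the three-decimal loadings
assert np.allclose(h2, h2_reported, atol=0.005)
```

Note the contrast with the varimax case: dropping the cross term 2·0.693·ℓ̂ᵢ₁ℓ̂ᵢ₂ (i.e., treating the promax factors as uncorrelated) would understate most of these communalities.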