CSE474/574: Factor Models

Varun Chandola <chandola@buffalo.edu>

Introduction to Machine Learning
Contents

1 Latent Linear Models

2 Factor Analysis Models
    2.1 Marginalized Probabilities in Factor Models
    2.2 Interpreting Latent Factors
    2.3 Issue of Unidentifiability with Factor Analysis Model
    2.4 Learning Factor Analysis Model Parameters

3 Extending Factor Analysis
1 Latent Linear Models

Mixture Models

• One latent variable: zi ∈ {1, 2, . . . , K}
• P(zi = k) = πk

    p(xi | θ) = Σ_{k=1}^{K} P(zi = k) pk(xi | θ)

Mixture models assume that every observed data point xi comes from one of the K mixture components, indexed by zi. So, in some way, each multi-dimensional vector is represented as a discrete category.

• What if zi ∈ ℝᴸ?
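Before moving to a continuous zi, here is a minimal sketch of the discrete-mixture marginal above, assuming a hypothetical two-component Gaussian mixture (the parameters pi, mus, Sigmas are made up for illustration):

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical 2-component Gaussian mixture over D = 2 dimensions.
    pi = np.array([0.3, 0.7])                      # P(z = k)
    mus = [np.zeros(2), np.array([3.0, 3.0])]      # component means
    Sigmas = [np.eye(2), 2.0 * np.eye(2)]          # component covariances

    def mixture_density(x):
        """p(x | theta) = sum_k P(z = k) * N(x; mu_k, Sigma_k)."""
        return sum(pi[k] * multivariate_normal.pdf(x, mus[k], Sigmas[k])
                   for k in range(len(pi)))

    print(mixture_density(np.array([1.0, 1.0])))   # marginal density at one point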
2 Factor Analysis Models

• Each xi has a corresponding zi
• Each zi is a multivariate Gaussian random variable (an L × 1 vector):

    p(zi) = N(zi | µ0, Σ0)

• Assumption: xi is also a multivariate Gaussian random variable
• Its mean is a function of zi
• Its covariance matrix is fixed:

    p(xi | zi, θ) = N(Wzi + µ, Ψ)

• W is a D × L matrix (the loading matrix)
• Ψ is a D × D covariance matrix
    – Assumed to be diagonal
• What does W do? The role of the loading matrix is to convert the L-length vector zi into a D-length vector. This “transformed” vector is then added to another vector µ and used as a mean: the actual observation xi is considered a sample from a multivariate Gaussian with mean equal to the vector thus obtained and covariance matrix Ψ.
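The generative process can be written out directly. A minimal sketch, assuming small hypothetical dimensions (D = 4, L = 2) and randomly drawn parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    D, L = 4, 2                                    # observed and latent dimensions

    # Hypothetical FA parameters, made up for illustration.
    W = rng.normal(size=(D, L))                    # loading matrix (D x L)
    mu = rng.normal(size=D)                        # offset vector
    Psi = np.diag(rng.uniform(0.1, 0.5, size=D))   # diagonal noise covariance
    mu0, Sigma0 = np.zeros(L), np.eye(L)           # prior on zi (often 0 and I)

    def sample_fa(n):
        """Draw n samples: zi ~ N(mu0, Sigma0), xi | zi ~ N(W zi + mu, Psi)."""
        Z = rng.multivariate_normal(mu0, Sigma0, size=n)        # n x L latents
        E = rng.multivariate_normal(np.zeros(D), Psi, size=n)   # n x D noise
        return Z, Z @ W.T + mu + E                               # latents, observations

    Z, X = sample_fa(10000)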
2.1 Marginalized Probabilities in Factor Models

    p(xi | θ) = ∫ p(xi | zi, θ) p(zi) dzi
              = ∫ N(Wzi + µ, Ψ) N(µ0, Σ0) dzi
              = N(Wµ0 + µ, Ψ + WΣ0Wᵀ)

• Every xi has a multivariate Gaussian distribution with the same parameters!
• Often µ0 is set to 0 and Σ0 = I
• What is the mean and covariance of x (dropping the subscript)?

    mean(x) = µ
    cov(x) = Ψ + WWᵀ

• How many parameters are needed to specify the covariance?
    – Original: D²
    – Factor analysis model: LD + D (remember, Ψ is a diagonal matrix)
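A quick numerical sanity check of these moments (a sketch with hypothetical parameters, taking µ0 = 0 and Σ0 = I): the empirical mean and covariance of sampled x should approach µ and Ψ + WWᵀ.

    import numpy as np

    rng = np.random.default_rng(0)
    D, L, n = 4, 2, 200000

    # Hypothetical parameters, with mu0 = 0 and Sigma0 = I.
    W = rng.normal(size=(D, L))
    mu = rng.normal(size=D)
    Psi = np.diag(rng.uniform(0.1, 0.5, size=D))

    Z = rng.standard_normal((n, L))                              # zi ~ N(0, I)
    X = Z @ W.T + mu + rng.multivariate_normal(np.zeros(D), Psi, size=n)

    print(np.allclose(X.mean(axis=0), mu, atol=0.05))            # mean(x) = mu
    print(np.allclose(np.cov(X.T), Psi + W @ W.T, atol=0.1))     # cov(x) = Psi + W W^T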
2.2 Interpreting Latent Factors

• What is the original intent behind LVMs?
    – Richer models of p(x)
• But they can also be used as a lower-dimensional representation of x
• Mixture models? Factor analysis model?
• What is p(zi | xi, θ)? It is again a multivariate Gaussian:

    p(zi | xi, θ) = N(mi, Σ)
    Σ ≜ (Σ0⁻¹ + WᵀΨ⁻¹W)⁻¹
    mi ≜ Σ(WᵀΨ⁻¹(xi − µ) + Σ0⁻¹µ0)

• The posterior mean mi is an L × 1 vector, so one can “embed” xi (a D × 1 vector) into an L × 1 space, as sketched below
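A minimal sketch of this embedding, reusing hypothetical parameters and taking µ0 = 0 and Σ0 = I (so mi = ΣWᵀΨ⁻¹(xi − µ)):

    import numpy as np

    rng = np.random.default_rng(0)
    D, L = 4, 2

    # Hypothetical FA parameters (mu0 = 0, Sigma0 = I).
    W = rng.normal(size=(D, L))
    mu = rng.normal(size=D)
    Psi = np.diag(rng.uniform(0.1, 0.5, size=D))

    # Posterior covariance Sigma = (I + W^T Psi^{-1} W)^{-1}; same for every xi.
    Psi_inv = np.linalg.inv(Psi)
    Sigma = np.linalg.inv(np.eye(L) + W.T @ Psi_inv @ W)

    def embed(X):
        """Posterior means mi = Sigma W^T Psi^{-1} (xi - mu): D-dim -> L-dim."""
        return (X - mu) @ (Sigma @ W.T @ Psi_inv).T

    X = rng.normal(size=(5, D))                    # five observations (rows)
    print(embed(X))                                # 5 x L matrix of embeddings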
2.3 Issue of Unidentifiability with Factor Analysis Model
• Consider an orthogonal rotation matrix R, i.e., RRᵀ = I
• Let Ŵ = WR
• The FA model with Ŵ will have the same result, i.e., the pdf of the observed x will still be the same, since the marginal covariance is unchanged: Ψ + ŴŴᵀ = Ψ + WRRᵀWᵀ = Ψ + WWᵀ (see the numerical check below)
• Thus the FA model can have multiple solutions
• The predictive power of the model does not change
• But interpreting latent factors can be an issue
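A quick numerical check of this invariance (hypothetical W and Ψ; the orthogonal R comes from a QR decomposition, so RRᵀ = I up to floating-point error):

    import numpy as np

    rng = np.random.default_rng(0)
    D, L = 4, 2

    W = rng.normal(size=(D, L))                    # hypothetical loading matrix
    Psi = np.diag(rng.uniform(0.1, 0.5, size=D))   # hypothetical diagonal Psi

    R, _ = np.linalg.qr(rng.normal(size=(L, L)))   # orthogonal L x L rotation
    W_hat = W @ R

    # Both parameterizations induce the same marginal covariance of x.
    print(np.allclose(Psi + W @ W.T, Psi + W_hat @ W_hat.T))   # True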
2.4 Learning Factor Analysis Model Parameters
• FA model parameters: W, Ψ, µ
    – µ0 and Σ0 can be “absorbed” into µ and W, respectively, since the marginal mean Wµ0 + µ and covariance Ψ + WΣ0Wᵀ depend on them only through those parameters
• A simple extension of the mixture model EM algorithm will work here (a library-based sketch follows the example below)
Factor Analysis - A Real World Example
• 2004 Cars Data
• Original - 11 features
• Factor analysis results in 2 factors
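In practice one typically relies on an off-the-shelf EM implementation. A minimal sketch using scikit-learn's FactorAnalysis; the data here is a synthetic stand-in, since the 2004 cars dataset itself is not reproduced in these notes:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 11))        # synthetic stand-in: 500 samples, 11 features

    fa = FactorAnalysis(n_components=2)   # L = 2 latent factors, as in the cars example
    Z = fa.fit_transform(X)               # posterior-mean embeddings (500 x 2)

    print(fa.components_.shape)           # (2, 11): learned loadings, i.e., Wᵀ
    print(fa.noise_variance_.shape)       # (11,): diagonal entries of Psi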
3 Extending Factor Analysis
• If we use a non-Gaussian distribution for p(zi), we arrive at Independent Component Analysis (ICA)
• If Ψ = σ²I and W is orthonormal ⇒ FA is equivalent to Probabilistic Principal Components Analysis (PPCA)
• If, in addition, σ² → 0, FA is equivalent to PCA
• What is PCA?