CSE474/574: Introduction to Machine Learning
Factor Models

Varun Chandola <chandola@buffalo.edu>

Contents

1 Latent Linear Models
2 Factor Analysis Models
  2.1 Marginalized Probabilities in Factor Models
  2.2 Interpreting Latent Factors
  2.3 Issue of Unidentifiability with Factor Analysis Model
  2.4 Learning Factor Analysis Model Parameters
3 Extending Factor Analysis

1 Latent Linear Models

Mixture Models

• One latent variable zi ∈ {1, 2, . . . , K}

    P(zi = k) = πk

    p(xi | θ) = Σ_{k=1}^{K} P(zi = k) pk(xi | θ)

  Mixture models assume that every observed data point xi comes from one of K mixture components, indexed by zi. In this sense, each multi-dimensional vector is represented by a single discrete category.

• What if zi ∈ ℝ^L?

    p(zi) = N(zi | µ0, Σ0)

    p(xi | θ) = ∫ p(xi | zi, θ) p(zi) dzi

2 Factor Analysis Models

• Assumption: xi is a multivariate Gaussian random variable
• Each xi has a corresponding zi
• Each zi is a multivariate Gaussian random variable (an L × 1 vector) with prior p(zi) = N(zi | µ0, Σ0)
• The mean of xi is a function of zi; the covariance matrix is fixed:

    p(xi | zi, θ) = N(W zi + µ, Ψ)

• W is a D × L matrix (the loading matrix)
• Ψ is a D × D covariance matrix, assumed to be diagonal
• What does W do? The role of the loading matrix is to convert the L-length vector zi into a D-length vector. This "transformed" vector is then added to another vector µ and used as a mean. The actual observation xi is treated as a sample from a multivariate Gaussian with mean equal to the vector thus obtained and covariance matrix Ψ.

2.1 Marginalized Probabilities in Factor Models

    p(xi | θ) = ∫ p(xi | zi, θ) p(zi) dzi
              = ∫ N(W zi + µ, Ψ) N(zi | µ0, Σ0) dzi
              = N(W µ0 + µ, Ψ + W Σ0 W^T)

• Marginally, every xi is a multivariate Gaussian with the same parameters!
• What are the mean and covariance of x (dropping the subscript)?
• Often µ0 is set to 0 and Σ0 = I, in which case:

    mean(x) = µ
    cov(x) = Ψ + W W^T

• How many parameters are needed to specify the covariance?
  – Original: D^2 (a full D × D covariance matrix)
  – Factor analysis model: LD + D (remember, Ψ is a diagonal matrix)

2.2 Interpreting Latent Factors

• What is the original intent behind LVMs?
  – Richer models of p(x)
• But they can also be used as a lower-dimensional representation of x.
• Mixture models? Each xi is assigned to a mixture component, i.e., represented by a discrete category zi.
• Factor analysis model?
  – What is p(zi | xi, θ)?

    p(zi | xi, θ) = N(mi, Σ)
    Σ ≜ (Σ0^{-1} + W^T Ψ^{-1} W)^{-1}
    mi ≜ Σ (W^T Ψ^{-1} (xi − µ) + Σ0^{-1} µ0)

• The posterior mean mi is an L × 1 vector, so one can "embed" xi (a D × 1 vector) into an L-dimensional space.

2.3 Issue of Unidentifiability with Factor Analysis Model

• Consider an orthogonal rotation matrix R, with R R^T = I
• Let Ŵ = W R
• The FA model with Ŵ will have the same result, i.e., the pdf of the observed x stays the same: with µ0 = 0 and Σ0 = I, the marginal covariance is Ψ + Ŵ Ŵ^T = Ψ + W R R^T W^T = Ψ + W W^T
• Thus the FA model can have multiple equivalent solutions
• The predictive power of the model does not change
• But interpreting the latent factors can be an issue

2.4 Learning Factor Analysis Model Parameters

• FA model parameters: W, Ψ, µ
  – µ0 and Σ0 can be "absorbed" into W and µ, respectively
• A simple extension of the mixture model EM algorithm will work here

Factor Analysis - A Real World Example

• 2004 Cars Data
• Original: 11 features
• Factor analysis results in 2 factors
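To make these formulas concrete, here is a minimal NumPy sketch; the dimensions, seed, and rotation angle are arbitrary illustrative choices. It samples from the FA generative model with µ0 = 0 and Σ0 = I, checks that the sample covariance of x matches Ψ + W W^T from Section 2.1, and verifies the rotation-invariance argument from Section 2.3.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, n = 5, 2, 200_000            # observed dim, latent dim, sample count

W = rng.normal(size=(D, L))        # D x L loading matrix
mu = rng.normal(size=D)            # mean offset
psi = rng.uniform(0.1, 1.0, D)     # diagonal entries of Psi

# Generative model: z_i ~ N(0, I_L), x_i | z_i ~ N(W z_i + mu, Psi)
Z = rng.normal(size=(n, L))
X = Z @ W.T + mu + rng.normal(size=(n, D)) * np.sqrt(psi)

# Marginal covariance of x should equal Psi + W W^T
C_model = np.diag(psi) + W @ W.T
C_sample = np.cov(X, rowvar=False)
print(np.abs(C_sample - C_model).max())          # small (sampling noise only)

# Unidentifiability: an orthogonal R leaves the marginal covariance unchanged
theta = 0.7                                      # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W_hat = W @ R
print(np.abs(W_hat @ W_hat.T - W @ W.T).max())   # ~0 (floating point error)
```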
3 Extending Factor Analysis

• If we use a non-Gaussian distribution for p(zi), we arrive at Independent Component Analysis (ICA).
• If Ψ = σ^2 I and W is orthonormal, FA is equivalent to Probabilistic Principal Component Analysis (PPCA).
• As σ^2 → 0, PPCA reduces to classical PCA.
• What is PCA?
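As a brief answer to the last question: PCA represents each (centered) observation by its coordinates along the orthonormal directions of maximal variance, the top eigenvectors of the sample covariance matrix. The sketch below illustrates the σ^2 → 0 claim numerically, under the PPCA assumptions above (µ0 = 0, Σ0 = I, Ψ = σ^2 I, W orthonormal): the posterior mean from Section 2.2 simplifies to mi = (W^T W + σ^2 I)^{-1} W^T (xi − µ), which converges to the orthogonal PCA projection onto the column space of W as σ^2 → 0. The dimensions and seed are again arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
D, L = 5, 2                                     # illustrative dimensions
W, _ = np.linalg.qr(rng.normal(size=(D, L)))    # orthonormal loadings (PPCA setting)
x = rng.normal(size=D)                          # one observation, assumed centered

# Orthogonal projection coordinates: W orthonormal => (W^T W)^{-1} W^T = W^T
pca_coords = W.T @ x

for sigma2 in (1.0, 0.1, 1e-8):
    M = W.T @ W + sigma2 * np.eye(L)
    ppca_coords = np.linalg.solve(M, W.T @ x)   # PPCA posterior mean of z given x
    print(sigma2, np.linalg.norm(ppca_coords - pca_coords))
# The gap shrinks to 0 as sigma^2 -> 0: PPCA recovers classical PCA.
```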