Dimensionality Reduction
Slides by Jure Leskovec

Dimensionality Reduction
• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents:
    • Features: thousands of words, millions of word pairs
  – Surveys
  – Netflix: 480k users x 17.7k movies
• Compress / reduce dimensionality:
  – $10^6$ rows; $10^3$ columns; no updates
  – Random access to any cell(s); small error: OK
• Assumption: data lies on or near a low d-dimensional subspace
• Axes of this subspace are an effective representation of the data

Why Reduce Dimensions?
• Discover hidden correlations/topics
  – Words that commonly occur together
• Remove redundant and noisy features
  – Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data

SVD - Definition
$$A_{[m \times n]} = U_{[m \times r]} \, \Sigma_{[r \times r]} \, (V_{[n \times r]})^T$$
• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)

[Figure: block diagram of A (m x n) = U Σ V^T]

• Equivalently, A is a sum of rank-1 terms:
$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$$
  – $\sigma_i$ … scalar, $u_i$ … vector, $v_i$ … vector

SVD - Properties
It is always possible to decompose a real matrix A into $A = U \Sigma V^T$, where
• U, Σ, V: unique (Σ always; the singular vectors only up to sign, and only when the singular values are distinct)
• U, V: column orthonormal:
  – $U^T U = I$; $V^T V = I$ (I: identity matrix)
  – (Columns are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive, and sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$)

SVD – Example: Users-to-Movies
• Columns of A are movies: Matrix, Alien, Serenity, Casablanca, Amelie
• Rows of A are users: the first four are SciFi fans, the last three are Romance fans
• $A = U \Sigma V^T$ - example:
$$
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix}
9.64 & 0 \\
0 & 5.29
\end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
$$
• U is the “user-to-concept” similarity matrix: column 1 is the SciFi-concept, column 2 the Romance-concept
• Σ: 9.64 is the ‘strength’ of the SciFi-concept, 5.29 that of the Romance-concept
• V is the “movie-to-concept” similarity matrix: the first row of $V^T$ is the SciFi-concept, the second the Romance-concept
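The users-to-movies decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the rank threshold `1e-10` and the sign-flipping step are choices made here for illustration, not something the slides prescribe.

```python
import numpy as np

# Users-to-movies ratings from the example.
# Columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# full_matrices=False gives the "thin" SVD: U is 7x5, s has 5 entries, Vt is 5x5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r = rank(A) non-zero singular values, as in the definition.
# The 1e-10 threshold is an assumption chosen for this small example.
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

# Singular vectors are only determined up to sign, so flip each pair
# (u_i, v_i) until the largest-magnitude entry of v_i is positive.
for i in range(r):
    if Vt[i, np.argmax(np.abs(Vt[i]))] < 0:
        U[:, i] *= -1
        Vt[i] *= -1

print("singular values:", np.round(s, 2))
print("U:\n", np.round(U, 2))
print("V^T:\n", np.round(Vt, 2))
```

Up to rounding, the printed factors should match the U, Σ and V^T shown in the example, with r = 2 concepts.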
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept

SVD - Interpretation #2
SVD gives the best axis to project on:
• ‘best’ = minimum sum of squares of projection errors
• i.e., minimum reconstruction error
[Figure: users plotted by Movie 1 rating and Movie 2 rating, with the first right singular vector v1 drawn through the points]
• In the users-to-movies example $A = U \Sigma V^T$:
  – $v_1$ = [0.58 0.58 0.58 0 0], the first row of $V^T$, is the first right singular vector
  – $\sigma_1$ = 9.64: variance (‘spread’) on the $v_1$ axis

SVD - Interpretation #2: More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero
  – In the example, $\sigma_2 = 5.29$ becomes 0, so the second column of U and the second row of $V^T$ drop out:
$$
A \approx
\begin{bmatrix}
0.18 \\ 0.36 \\ 0.18 \\ 0.90 \\ 0 \\ 0 \\ 0
\end{bmatrix}
\times
\begin{bmatrix} 9.64 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0
\end{bmatrix}
\approx
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
= B
$$
• Frobenius norm:
  – $\|M\|_F = \sqrt{\sum_{ij} M_{ij}^2}$
  – $\|A - B\|_F = \sqrt{\sum_{ij} (A_{ij} - B_{ij})^2}$ is “small”

[Figure: A = U Σ V^T next to B = U S V^T; B is approx. A]

SVD – Best Low Rank Approx.
• Theorem: Let $A = U \Sigma V^T$ ($\sigma_1 \ge \sigma_2 \ge \dots$, rank(A) = r). Then $B = U S V^T$
  – S = diagonal r x r matrix where $s_i = \sigma_i$ (i = 1…k), else $s_i = 0$
  is a best rank-k approximation to A:
  – B is a solution to $\min_B \|A - B\|_F$ where rank(B) = k
• We will need 2 facts:
  – $\|M\|_F^2 = \sum_i q_{ii}^2$ where M = P Q R is the SVD of M
    (we apply: P column orthonormal, R row orthonormal, Q diagonal)
  – $U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$
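As a numerical sanity check of the truncation step and of the Frobenius-norm fact, here is a minimal sketch assuming NumPy. The helper name `best_rank_k` is hypothetical and introduced only for illustration; it zeroes all but the k largest singular values and rebuilds B.

```python
import numpy as np

# Users-to-movies ratings (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

def best_rank_k(A, k):
    """Return (B, s): B = U S V^T with only the k largest singular values kept."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[k:] = 0.0              # set the smallest singular values to zero
    B = (U * s_trunc) @ Vt         # U * s_trunc scales column i of U by s_i
    return B, s

B, s = best_rank_k(A, k=1)

# Reconstruction error in Frobenius norm, computed two ways:
# directly from A - B, and from the singular values that were dropped.
err_direct = np.linalg.norm(A - B, ord="fro")
err_from_sigmas = np.sqrt(np.sum(s[1:] ** 2))

print(np.round(B, 2))
print(err_direct, err_from_sigmas)
```

Both error values should agree, and the squared Frobenius norm of A itself should equal the sum of all squared singular values.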
• $A = U \Sigma V^T$, $B = U S V^T$ ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$, rank(A) = r)
  – S = diagonal r x r matrix where $s_i = \sigma_i$ (i = 1…k), else $s_i = 0$
  then B is a solution to $\min_B \|A - B\|_F$, rank(B) = k
• Why?
$$
\min_{B,\ \mathrm{rank}(B)=k} \|A - B\|_F^2
= \min_{S} \|\Sigma - S\|_F^2
= \min_{s_i} \sum_{i=1}^{r} (\sigma_i - s_i)^2
$$
  – using $U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$ and $\|M\|_F^2 = \sum_i q_{ii}^2$
• We want to choose $s_i$ to minimize $\sum_i (\sigma_i - s_i)^2$. Since rank(B) = k allows at most k non-zero $s_i$, we set $s_i = \sigma_i$ (i = 1…k), else $s_i = 0$:
$$
\min_{s_i} \left[ \sum_{i=1}^{k} (\sigma_i - s_i)^2 + \sum_{i=k+1}^{r} \sigma_i^2 \right]
= \sum_{i=k+1}^{r} \sigma_i^2
$$

SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix:
$$
A =
\begin{bmatrix} u_1 & u_2 \end{bmatrix}
\times
\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
\times
\begin{bmatrix} v_1 & v_2 \end{bmatrix}^T
= \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots
$$
• For an m x n matrix A, each $u_i$ is m x 1 and each $v_i^T$ is 1 x n; dimensionality reduction keeps the first k terms
• Assume: $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots \ge 0$
• Why is setting small $\sigma_i$ to 0 the right thing to do?
  – Vectors $u_i$ and $v_i$ are unit length, so $\sigma_i$ scales them
  – So, zeroing small $\sigma_i$ introduces less error
• Q: How many $\sigma_i$ to keep?
• A: Rule of thumb: keep 80-90% of the ‘energy’ ($= \sum_i \sigma_i^2$)

SVD - Complexity
• To compute SVD:
  – $O(nm^2)$ or $O(n^2 m)$ (whichever is less)
• But:
  – Less work, if we just want singular values
  – or if we want the first k singular vectors
  – or if the matrix is sparse
• Implemented in linear algebra packages like
  – LINPACK, Matlab, SPlus, Mathematica ...

SVD - Conclusions so far
• SVD: $A = U \Sigma V^T$: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept
• Dimensionality reduction:
  – keep the few largest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
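The 80-90% energy rule of thumb can be turned into a small selection routine. This is only a sketch: the function name `choose_k` and its `energy` parameter are hypothetical, and the input values are the rounded singular values from the users-to-movies example.

```python
def choose_k(singular_values, energy=0.9):
    """Smallest k whose leading singular values retain `energy` of sum(sigma_i^2)."""
    squares = [s * s for s in sorted(singular_values, reverse=True)]
    total = sum(squares)
    kept = 0.0
    for k, sq in enumerate(squares, start=1):
        kept += sq
        if kept >= energy * total:
            return k
    return len(squares)

# Rounded singular values from the users-to-movies example.
sigmas = [9.64, 5.29]

share_of_first = sigmas[0] ** 2 / sum(s * s for s in sigmas)
print(share_of_first)                  # fraction of energy carried by sigma_1
print(choose_k(sigmas, energy=0.75))   # threshold below the share of sigma_1
print(choose_k(sigmas, energy=0.9))    # threshold above the share of sigma_1
```

In this tiny example the first concept alone carries roughly 77% of the energy (about 93 out of 121), so an 80% or 90% threshold keeps both concepts, while any threshold below that share keeps only the SciFi-concept.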
Case study: How to query?
• Q: Find users that like ‘Matrix’
• A: Map the query into a ‘concept space’ – how?
• The query is a rating vector over the movies (Matrix, Alien, Serenity, Casablanca, Amelie):
  – q = [5 0 0 0 0]
• Project into concept space: inner product with each ‘concept’ vector $v_i$ (e.g., $q \cdot v_1$)
[Figure: q drawn in the Matrix/Alien rating plane together with v1 and v2; the projection of q onto v1 is q*v1]
• Compactly, we have: $q_{concept} = q V$
• E.g.:
$$
q V =
\begin{bmatrix} 5 & 0 & 0 & 0 & 0 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0 \\
0.58 & 0 \\
0.58 & 0 \\
0 & 0.71 \\
0 & 0.71
\end{bmatrix}
=
\begin{bmatrix} 2.9 & 0 \end{bmatrix}
$$
  – V: movie-to-concept similarities; the first entry of the result is the SciFi-concept
• How would the user d that rated (‘Alien’, ‘Serenity’) be handled? $d_{concept} = d V$
• E.g.:
$$
d V =
\begin{bmatrix} 0 & 4 & 5 & 0 & 0 \end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0 \\
0.58 & 0 \\
0.58 & 0 \\
0 & 0.71 \\
0 & 0.71
\end{bmatrix}
=
\begin{bmatrix} 5.22 & 0 \end{bmatrix}
$$
• Observation: User d that rated (‘Alien’, ‘Serenity’) will be similar to query “user” q that rated (‘Matrix’), although d did not rate ‘Matrix’!
  – d = [0 4 5 0 0] maps to [5.22 0]
  – q = [5 0 0 0 0] maps to [2.9 0]
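The mapping of q and d into concept space can be sketched in plain Python. The slides only speak of "similarity"; cosine similarity is used below as an assumed, illustrative measure, and the helper names `to_concept_space` and `cosine` are hypothetical.

```python
import math

# Movie-to-concept matrix V from the example.
# Rows: Matrix, Alien, Serenity, Casablanca, Amelie. Columns: SciFi, Romance.
V = [
    [0.58, 0.0],
    [0.58, 0.0],
    [0.58, 0.0],
    [0.0, 0.71],
    [0.0, 0.71],
]

def to_concept_space(ratings, V):
    """Row vector times V: one inner product per concept column."""
    n_concepts = len(V[0])
    return [sum(r * row[j] for r, row in zip(ratings, V)) for j in range(n_concepts)]

def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

q = [5, 0, 0, 0, 0]   # query "user": rated only 'Matrix'
d = [0, 4, 5, 0, 0]   # user d: rated 'Alien' and 'Serenity'

q_concept = to_concept_space(q, V)
d_concept = to_concept_space(d, V)

print(q_concept, d_concept)
print("similarity in movie space:  ", cosine(q, d))
print("similarity in concept space:", cosine(q_concept, d_concept))
```

Because q and d share no rated movie, their similarity in the original movie space is zero, while in concept space both vectors point along the SciFi-concept axis.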
• In the original movie space, q and d have zero ratings in common: similarity = 0
• In concept space, both load on the SciFi-concept: similarity ≠ 0

SVD: Drawbacks
+ Optimal low-rank approximation:
  • in Frobenius norm
- Interpretability problem:
  – A singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
  – Singular vectors are dense!
[Figure: A = U Σ V^T]
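The lack-of-sparsity drawback is easy to observe on a toy input. The sketch below assumes NumPy; the bidiagonal test matrix, the `density` helper and its tolerance are all illustrative choices made here, not taken from the slides.

```python
import numpy as np

# A small sparse matrix: ones on the diagonal and the superdiagonal only.
n = 6
A = np.eye(n) + np.eye(n, k=1)

U, s, Vt = np.linalg.svd(A)

def density(M, tol=1e-9):
    """Fraction of entries whose magnitude exceeds tol."""
    return float(np.mean(np.abs(M) > tol))

print("density of A:  ", density(A))
print("density of U:  ", density(U))
print("density of V^T:", density(Vt))
```

Only 11 of the 36 entries of A are non-zero, yet the singular vectors fill in most or all of their entries, which is what makes the factors costly to store and hard to interpret for large sparse data.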