Dimension Independent Matrix Square
Reza Zadeh
Consulting Professor, Stanford University
June 18 2014

Outline: Introduction, First Pass, DIMSUM, Analysis, Experiments, More Results

Notation for matrix A

Given an $m \times n$ matrix $A$, with $m \gg n$:

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$

A is tall and skinny, example values $m = 10^{12}$, $n = \{10^4, 10^6\}$.
A has sparse rows, each row has at most $L$ nonzeros.
A is stored across hundreds of machines and cannot be streamed through a single machine.

Computing $A^T A$

We compute $A^T A$.
$A^T A$ is $n \times n$, considerably smaller than $A$.
$A^T A$ is dense.
Holds dot products between all pairs of columns of $A$.

Guarantees

There is a knob $\gamma$ which can be turned to preserve similarities and singular values.
Paying $O(nL\gamma)$ communication cost and $O(\gamma)$ computation cost.
With a low setting of $\gamma$, preserve similar entries of $A^T A$ (via Cosine, Dice, Overlap, and Jaccard similarity).
With a high setting of $\gamma$, preserve singular values of $A^T A$.
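
To make concrete the claim that $A^T A$ holds the dot products between all pairs of columns, here is a minimal single-machine sketch in Python. It assumes each sparse row is a dict from column index to value; the names `gram_from_sparse_rows`, `column_dot`, and the toy `rows` are illustrative, not from the talk.

```python
from collections import defaultdict


def gram_from_sparse_rows(rows):
    """Accumulate A^T A by summing the outer product of each sparse row with itself."""
    gram = defaultdict(float)
    for row in rows:
        for j, a_ij in row.items():
            for k, a_ik in row.items():
                gram[(j, k)] += a_ij * a_ik
    return dict(gram)


def column_dot(rows, j, k):
    """Dot product of columns j and k, computed directly for comparison."""
    return sum(row.get(j, 0.0) * row.get(k, 0.0) for row in rows)


# Toy 4 x 3 matrix; each row is {column index: value}.
rows = [
    {0: 1.0, 1: 2.0},
    {1: 3.0, 2: 1.0},
    {0: 2.0, 2: 4.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
]
gram = gram_from_sparse_rows(rows)
for (j, k), value in sorted(gram.items()):
    print(j, k, value, column_dot(rows, j, k))
```

Each row with at most $L$ nonzeros contributes at most $L^2$ products, so the result has at most $n^2$ entries however many rows there are.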
Computing All Pairs of Cosine Similarities

We have to find dot products between all pairs of columns of $A$.
We prove results for general matrices, but can do better for those entries with $\cos(i, j) \ge s$.
Cosine similarity: a widely used definition for "similarity" between two vectors:

$$\cos(i, j) = \frac{c_i^T c_j}{\|c_i\| \, \|c_j\|}$$

$c_i$ is the $i$'th column of $A$.

Example matrix

Rows: users. Columns: movies.

Distributed Computing Environment

With such large datasets, we must use many machines.
Algorithm code available in Spark and Scalding.
Described with Maps and Reduces so that the framework takes care of distributing the computation.

Naive Implementation

1. Given row $r_i$, Map with NaiveMapper (Algorithm 1)
2. Reduce using the NaiveReducer (Algorithm 2)

Algorithm 1 NaiveMapper($r_i$)
  for all pairs $(a_{ij}, a_{ik})$ in $r_i$ do
    Emit $((j, k) \to a_{ij} a_{ik})$
  end for

Algorithm 2 NaiveReducer($(i, j), \langle v_1, \ldots, v_R \rangle$)
  output $c_i^T c_j \to \sum_{i=1}^{R} v_i$

Analysis for First Pass

Very easy analysis:
1) Shuffle size: $O(mL^2)$
2) Largest reduce-key: $O(m)$
Both depend on $m$, the larger dimension, and are intractable for $m = 10^{12}$, $L = 100$.
We'll bring both down via clever sampling.
Assuming column norms are known or estimates available.

Dimension Independent Matrix Square using MapReduce

Algorithm 3 DIMSUMMapper($r_i$)
  for all pairs $(a_{ij}, a_{ik})$ in $r_i$ do
    With probability $\min\left(1, \gamma \frac{1}{\|c_j\| \, \|c_k\|}\right)$ emit $((j, k) \to a_{ij} a_{ik})$
  end for

Algorithm 4 DIMSUMReducer($(i, j), \langle v_1, \ldots, v_R \rangle$)
  if $\frac{\gamma}{\|c_i\| \, \|c_j\|} > 1$ then
    output $b_{ij} \to \frac{1}{\|c_i\| \, \|c_j\|} \sum_{i=1}^{R} v_i$
  else
    output $b_{ij} \to \frac{1}{\gamma} \sum_{i=1}^{R} v_i$
  end if

Analysis for DIMSUM

The algorithm outputs $b_{ij}$, which is a matrix of cosine similarities, call it $B$.
Four things to prove:
1. Shuffle size: $O(nL\gamma)$
2. Largest reduce-key: $O(\gamma)$
3. The sampling scheme preserves similarities when $\gamma = \Omega(\log(n)/s)$
4. The sampling scheme preserves singular values when $\gamma = \Omega(n/\epsilon^2)$

Shuffle size for DIMSUM

$O(nL\gamma)$ has no dependence on the dimension $m$; this is the heart of DIMSUM.
Happens because higher magnitude columns are sampled with lower probability:

$$\gamma \frac{1}{\|c_1\| \, \|c_2\|}$$

Shuffle size for DIMSUM

For matrices with real entries, we can still get a bound.
Let $H$ be the smallest nonzero entry in magnitude, after all entries of $A$ have been scaled to be in $[-1, 1]$.
E.g. for $\{0, 1\}$ matrices, we have $H = 1$.
Shuffle size is bounded by $O(nL\gamma/H^2)$.

Largest reduce key for DIMSUM

Each reduce key receives at most $\gamma$ values (the oversampling parameter).
Immediately get that reduce-key complexity is $O(\gamma)$.
Also independent of dimension $m$. Happens because high magnitude columns are sampled with lower probability.

Correctness

Since higher magnitude columns are sampled with lower probability, are we guaranteed to obtain correct results w.h.p.?
Yes. By setting $\gamma$ correctly.
Preserve similarities when $\gamma = \Omega(\log(n)/s)$.
Preserve singular values when $\gamma = \Omega(n/\epsilon^2)$.

Correctness

Theorem. Let $A$ be an $m \times n$ tall and skinny ($m > n$) matrix. If $\gamma = \Omega(n/\epsilon^2)$ and $D$ is a diagonal matrix with entries $d_{ii} = \|c_i\|$, then the matrix $B$ output by DIMSUM satisfies

$$\|DBD - A^T A\|_2 \le \epsilon \, \|A^T A\|_2$$

with probability at least 1/2.
Relative error guaranteed to be low with constant probability.
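
The DIMSUMMapper and DIMSUMReducer (Algorithms 3 and 4) can be sketched on a single machine as follows. This is an illustration under stated assumptions, not the Spark or Scalding implementation: rows are dicts from column index to value, the shuffle is an in-memory dict, only pairs with $j < k$ are emitted, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def column_norms(rows):
    """Exact column norms ||c_j||; the talk assumes these (or estimates) are known."""
    squares = defaultdict(float)
    for row in rows:
        for j, a in row.items():
            squares[j] += a * a
    return {j: math.sqrt(v) for j, v in squares.items()}


def dimsum_mapper(row, norms, gamma, rng):
    """Emit ((j, k), a_ij * a_ik) with probability min(1, gamma / (||c_j|| ||c_k||))."""
    for j, k in combinations(sorted(row), 2):  # assumption: upper triangle only
        p = min(1.0, gamma / (norms[j] * norms[k]))
        if rng.random() < p:
            yield (j, k), row[j] * row[k]


def dimsum_reducer(key, values, norms, gamma):
    """Rescale the summed samples into a cosine similarity estimate b_jk."""
    j, k = key
    scale = norms[j] * norms[k]
    if gamma / scale > 1.0:
        return sum(values) / scale  # pair was never subsampled: exact cosine
    return sum(values) / gamma      # pair was subsampled: undo the sampling rate


def dimsum(rows, gamma, seed=0):
    """Run mapper and reducer, with an in-memory dict standing in for the shuffle."""
    rng = random.Random(seed)
    norms = column_norms(rows)
    shuffle = defaultdict(list)
    for row in rows:
        for key, value in dimsum_mapper(row, norms, gamma, rng):
            shuffle[key].append(value)
    return {key: dimsum_reducer(key, vals, norms, gamma) for key, vals in shuffle.items()}


rows = [
    {0: 1.0, 1: 2.0},
    {1: 3.0, 2: 1.0},
    {0: 2.0, 2: 4.0},
    {0: 1.0, 1: 1.0, 2: 1.0},
]
similarities = dimsum(rows, gamma=1e9)       # gamma so large that nothing is dropped
sampled = dimsum(rows, gamma=1.0, seed=42)   # aggressive sampling; some pairs may vanish
```

With a very large `gamma` every pair is emitted and the reducer returns the exact cosine similarities; lowering `gamma` shrinks the shuffle at the cost of noisier estimates.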
Proof

Uses Latala's theorem, bounds 2nd and 4th central moments of entries of $B$.
Really need extra power of moments.

Latala's Theorem

Theorem (Latala's theorem). Let $X$ be a random matrix whose entries $x_{ij}$ are independent centered random variables with finite fourth moment. Denoting $\|X\|_2$ as the matrix spectral norm, we have

$$E\|X\|_2 \le C\left[\max_i \left(\sum_j E x_{ij}^2\right)^{1/2} + \max_j \left(\sum_i E x_{ij}^2\right)^{1/2} + \left(\sum_{i,j} E x_{ij}^4\right)^{1/4}\right].$$

Proof

Prove two things:

$$E[(b_{ij} - E b_{ij})^2] \le \frac{1}{\gamma} \quad \text{(easy)}$$

$$E[(b_{ij} - E b_{ij})^4] \le \frac{2}{\gamma^2} \quad \text{(not easy)}$$

Details in paper.

Correctness

Theorem. For any two columns $c_i$ and $c_j$ having $\cos(c_i, c_j) \ge s$, let $B$ be the output of DIMSUM with entries $b_{ij} = \frac{1}{\gamma} \sum_{k=1}^{m} X_{ijk}$, with $X_{ijk}$ as the indicator for the $k$'th coin in the call to DIMSUMMapper. Now if $\gamma = \Omega(\alpha/s)$, then we have

$$\Pr\left[\|c_i\| \, \|c_j\| \, b_{ij} > (1 + \delta)[A^T A]_{ij}\right] \le \left(\frac{e^{\delta}}{(1 + \delta)^{(1+\delta)}}\right)^{\alpha}$$

and

$$\Pr\left[\|c_i\| \, \|c_j\| \, b_{ij} < (1 - \delta)[A^T A]_{ij}\right] < \exp(-\alpha\delta^2/2).$$

Relative error guaranteed to be low with high probability.

Correctness

Proof.
In the paper.
Uses standard concentration inequality for sums of indicator random variables.
Ends up requiring that the oversampling parameter $\gamma$ be set to $\gamma = \log(n^2)/s = 2\log(n)/s$.

Observations

DIMSUM helpful when there are some popular columns,
e.g. the Netflix Matrix (some columns way more popular than others).
Power-law columns are effectively neutralized.

In practice

Forget about theoretical settings for $\gamma$.
Crank up $\gamma$ until application works.
Estimates for $\|c_i\|$ can be used; expectations still hold, but concentration isn't guaranteed.
If using for singular values, watch for ill-conditioned matrices.

Experiments

Large scale production live at twitter.com.
Smaller scale experiment with columns as words, and rows as tweets:
$m = 200\text{M}$, $n = 1000$, $L = 10$.

Experiments

Figure: DISCO Cosine Similarity. Average relative error (vertical axis) against similarity threshold (horizontal axis). Average error for all pairs with similarity threshold $s$. Error decreases for more similar pairs. $\gamma = 200$.

Other Similarity Measures

Picking out similar columns works for some other similarity measures.

| Similarity | Definition | Shuffle Size | Reduce-key size |
| Cosine | $\frac{\#(x,y)}{\sqrt{\#(x)}\sqrt{\#(y)}}$ | $O(nL\log(n)/s)$ | $O(\log(n)/s)$ |
| Jaccard | $\frac{\#(x,y)}{\#(x)+\#(y)-\#(x,y)}$ | $O((n/s)\log(n/s))$ | $O(\log(n/s)/s)$ |
| Overlap | $\frac{\#(x,y)}{\min(\#(x),\#(y))}$ | $O(nL\log(n)/s)$ | $O(\log(n)/s)$ |
| Dice | $\frac{2\#(x,y)}{\#(x)+\#(y)}$ | $O(nL\log(n)/s)$ | $O(\log(n)/s)$ |

Table: All sizes are independent of $m$, the dimension.

Locality Sensitive Hashing

MinHash from the Locality-Sensitive-Hashing family can have its vanilla implementation greatly improved by DIMSUM.
Another set of theorems for shuffle size and correctness in DISCO paper.
stanford.edu/~rezab/papers/disco.pdf

Conclusion

Consider DIMSUM if you ever need to compute $A^T A$ for large sparse $A$.
Many more experiments and results in paper at stanford.edu/~rezab
Thank you!

Shuffle size for DIMSUM

Theorem. For $\{0, 1\}$ matrices, the expected shuffle size for DIMSUMMapper is $O(nL\gamma)$.

Proof. The expected contribution from each pair of columns will constitute the shuffle size:

$$\begin{aligned}
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=1}^{\#(c_i, c_j)} \Pr[\text{DIMSUMEmit}(c_i, c_j)]
&= \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j) \Pr[\text{DIMSUMEmit}(c_i, c_j)] \\
&\le \sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma \frac{\#(c_i, c_j)}{\sqrt{\#(c_i)}\sqrt{\#(c_j)}} \\
\text{(by AM-GM)} \quad &\le \frac{\gamma}{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j)\left(\frac{1}{\#(c_i)} + \frac{1}{\#(c_j)}\right) \\
&\le \gamma \sum_{i=1}^{n} \frac{1}{\#(c_i)} \sum_{j=1}^{n} \#(c_i, c_j) \\
&\le \gamma \sum_{i=1}^{n} \frac{1}{\#(c_i)} L \#(c_i) = \gamma L n
\end{aligned}$$

Combiners

All bounds are without combining: can only get better with combining.
For similarities, DIMSUM (without combiners) beats naive with combining outright.
For singular values, DIMSUM (without combiners) beats naive with combining if the number of machines is large (e.g. 1000).
DIMSUM with combining empirically beats naive with combining.
Difficult to analyze combiners since they happen at many levels. Optimizations break models.
DIMSUM with combiners is best of both.

Combiners

With $k$ machines:
DIMSUM shuffle with combiner: $O(\min(nL\gamma, kn^2))$
DIMSUM reduce-key with combiner: $O(\min(\gamma, k))$
Naive shuffle with combiner: $O(kn^2)$
Naive reduce-key with combiner: $O(k)$
DIMSUM with combiners is best of both.
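
The $\gamma L n$ bound on expected shuffle size for $\{0, 1\}$ matrices (before any combining) can be checked numerically on a toy example. The sketch below reads $\#(c_i)$ as the number of nonzeros in column $i$ and $\#(c_i, c_j)$ as the number of rows where both columns are nonzero; that reading is an interpretation, since the slides do not spell it out. Rows are represented as sets of column indices, and `expected_shuffle_size` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import combinations


def expected_shuffle_size(rows, gamma):
    """Expected number of pairs emitted by the DIMSUM sampling rule on a {0,1} matrix."""
    col_count = Counter()    # #(c_i): nonzeros in column i
    pair_count = Counter()   # #(c_i, c_j): rows where columns i < j are both nonzero
    for row in rows:
        col_count.update(row)
        pair_count.update(combinations(sorted(row), 2))
    # For {0,1} entries, ||c_i|| = sqrt(#(c_i)), so the emit probability is
    # min(1, gamma / sqrt(#(c_i) * #(c_j))).
    return sum(
        count * min(1.0, gamma / math.sqrt(col_count[i] * col_count[j]))
        for (i, j), count in pair_count.items()
    )


rows = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2}, {0, 1}, {0, 3}]
n = 4                                   # number of columns
L = max(len(row) for row in rows)       # maximum nonzeros per row
naive = sum(len(row) * (len(row) - 1) // 2 for row in rows)  # pairs emitted without sampling

for gamma in (0.5, 1.0, 2.0, 10.0):
    print(gamma, expected_shuffle_size(rows, gamma), gamma * L * n, naive)
```

On a matrix this small the bound is loose; the point of the theorem is that the right-hand side does not grow with the number of rows $m$.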

Experiments

Figure: DISCO Cosine shuffle size vs accuracy tradeoff. DISCO Shuffle / Naive Shuffle and average relative error (vertical axes) against $\log(p/\epsilon)$ (horizontal axis). As $\gamma = p/\epsilon$ increases, shuffle size increases and error decreases. There is no thresholding for highly similar pairs here.
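
A rough way to explore the same shuffle-versus-accuracy tradeoff on synthetic data is sketched below. It is not the experiment from the talk: the random $\{0, 1\}$ matrix, the seeds, and the helper name `tradeoff` are all assumptions made for illustration, and the numbers it prints bear no relation to the Twitter data.

```python
import math
import random
from collections import Counter
from itertools import combinations


def tradeoff(rows, gamma, seed=0):
    """One sampled run: (emitted pairs / naive pairs, average relative error of b_ij)."""
    rng = random.Random(seed)
    col_count = Counter()
    pair_count = Counter()
    for row in rows:
        col_count.update(row)
        pair_count.update(combinations(sorted(row), 2))

    # Mapper side: keep each co-occurrence with the DIMSUM sampling probability.
    emitted = Counter()
    for row in rows:
        for i, j in combinations(sorted(row), 2):
            p = min(1.0, gamma / math.sqrt(col_count[i] * col_count[j]))
            if rng.random() < p:
                emitted[(i, j)] += 1

    # Reducer side: rescale and compare with the exact cosine similarity.
    errors = []
    for (i, j), true_count in pair_count.items():
        norm = math.sqrt(col_count[i] * col_count[j])
        estimate = emitted[(i, j)] / (norm if gamma / norm > 1.0 else gamma)
        exact = true_count / norm
        errors.append(abs(estimate - exact) / exact)

    ratio = sum(emitted.values()) / sum(pair_count.values())
    return ratio, sum(errors) / len(errors)


# Synthetic {0,1} matrix: 500 rows, 20 columns, 3 nonzeros per row.
data_rng = random.Random(1)
rows = [set(data_rng.sample(range(20), 3)) for _ in range(500)]

for gamma in (1.0, 4.0, 16.0, 64.0, 1e9):
    ratio, err = tradeoff(rows, gamma)
    print(gamma, round(ratio, 3), round(err, 3))
```

At the largest `gamma` nothing is dropped, so the shuffle ratio is 1 and the error is 0; smaller values trade accuracy for a smaller shuffle.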