Matrix-Vector Multiplication in Sub-Quadratic Time
(Some Preprocessing Required)

Ryan Williams
Carnegie Mellon University


Introduction

Matrix-vector multiplication: a fundamental operation in scientific computing.

How fast can n × n matrix-vector multiplication be?
Θ(n^2) steps just to read the matrix!

Main Result: If we allow O(n^{2+ε}) preprocessing, then matrix-vector multiplication over any finite semiring can be done in O(n^2/(ε log n)^2) time.
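To get a concrete feel for these bounds, here is a small back-of-the-envelope sketch (not from the talk; the values of n and ε below are chosen purely for illustration, and constants are ignored):

```python
import math

def stated_bounds(n, eps):
    """Rough operation counts implied by the stated bounds (constants ignored)."""
    block = math.ceil(eps * math.log2(n))              # chunk/block side, about eps*log n
    part_size = 2 ** block                              # possible chunks per block, about n^eps
    preprocessing = n ** (2 + eps)                      # one-time O(n^(2+eps)) cost
    per_product = n ** 2 / (eps * math.log2(n)) ** 2    # O(n^2/(eps log n)^2) per product
    return block, part_size, preprocessing, per_product

# Example: n = 2**16 and eps = 0.25 give 4-bit blocks, 2^4 = 16 possible chunks per
# block, and each later product costs roughly n^2/16 operations instead of n^2.
print(stated_bounds(2 ** 16, 0.25))
```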
Better Algorithms for Matrix Multiplication

Three of the major developments:

• Arlazarov et al., a.k.a. "Four Russians" (1960s): O(n^3/log n) operations
  Uses table lookups
  Good for hardware with short vector operations as primitives

• Strassen (1969): n^{log 7/log 2} = O(n^{2.81}) operations
  Asymptotically fast, but overhead in the big-O
  Experiments in practice are inconclusive about Strassen vs. Four Russians for Boolean matrix multiplication (Bard, 2006)

• Coppersmith and Winograd (1990): O(n^{2.376}) operations
  Not yet practical


Focus: Combinatorial Matrix Multiplication Algorithms

• Also called non-algebraic; let's call them non-subtractive
  E.g. Four Russians is combinatorial, Strassen isn't

More Non-Subtractive Boolean Matrix Multiplication Algorithms:

• Atkinson and Santoro: O(n^3/log^{3/2} n) on a (log n)-word RAM
• Rytter and Basch-Khanna-Motwani: O(n^3/log^2 n) on a RAM
• Chan: Four Russians can be implemented in O(n^3/log^2 n) on a pointer machine


Main Result

The O(n^3/log^2 n) matrix multiplication algorithm can be "de-amortized".

More precisely, we can:
• Preprocess an n × n matrix A over a finite semiring in O(n^{2+ε}) time,
• Such that vector multiplications with A can be done in O(n^2/(ε log n)^2) time.

This allows "non-subtractive" matrix multiplication to be done on-line, and can be implemented on a pointer machine.

This Talk: The Boolean case


Preprocessing Phase: The Boolean Case

Partition the input matrix A into blocks A_{i,j} of size ⌈ε log n⌉ × ⌈ε log n⌉, for i, j = 1, ..., n/(ε log n).
[Figure: A drawn as an (n/(ε log n)) × (n/(ε log n)) grid of blocks A_{i,j}.]

Build a graph G with parts P_1, ..., P_{n/(ε log n)} and Q_1, ..., Q_{n/(ε log n)}.
Each part has 2^{ε log n} vertices, one for each possible (ε log n)-bit vector.

Edges of G: each vertex v in each P_i has exactly one edge into each Q_j, namely the edge from v to the vertex A_{j,i}·v in Q_j.

Time to build the graph:
  (n/(ε log n)) · (n/(ε log n)) · 2^{ε log n} · (ε log n)^2 = O(n^{2+ε})
  [number of Q_j] · [number of P_i] · [nodes in P_i] · [matrix-vector mult of A_{j,i} and v]
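A minimal Python sketch of this preprocessing step for the Boolean case, storing the graph implicitly as lookup tables (one entry per block pair and per possible chunk) rather than as explicit vertices and edges; all function and variable names below are my own, not from the talk:

```python
import math

def preprocess(A, eps):
    """A is an n x n 0/1 matrix (list of lists). Returns (b, nb, tables) where
    tables[j][i][v] equals the Boolean product A_{j,i} * v, packed into a b-bit
    integer, for every block pair (j, i) and every possible b-bit chunk v."""
    n = len(A)
    b = max(1, math.ceil(eps * math.log2(n)))      # block side, about eps*log n
    nb = (n + b - 1) // b                          # number of blocks per side
    # cols[j][i][c] = column c of block A_{j,i}, packed into a b-bit integer
    cols = [[[0] * b for _ in range(nb)] for _ in range(nb)]
    for r in range(n):
        for c in range(n):
            if A[r][c]:
                cols[r // b][c // b][c % b] |= 1 << (r % b)
    # For each block pair, tabulate its product with every possible chunk.
    tables = [[[0] * (1 << b) for _ in range(nb)] for _ in range(nb)]
    for j in range(nb):
        for i in range(nb):
            for v in range(1 << b):
                prod = 0
                for c in range(b):
                    if (v >> c) & 1:
                        prod |= cols[j][i][c]      # OR in column c of A_{j,i}
                tables[j][i][v] = prod
    return b, nb, tables
```

The triple loop mirrors the count above: (n/(ε log n))^2 block pairs, 2^{ε log n} chunks each, and a small per-chunk cost (here O(ε log n) word operations on packed columns, where the slide counts (ε log n)^2 bit operations).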
How to Do Fast Vector Multiplications

Let v be a column vector. Want: A·v.

(1) Break up v into chunks v_1, v_2, ..., v_{n/(ε log n)}, each of size ε log n.

(2) For each i = 1, ..., n/(ε log n), look up v_i in P_i.
    Takes Õ(n) time.

(3) Look up the neighbors of each v_i, marking each neighbor found: looking up v_i marks the vertex A_{j,i}·v_i in Q_j, for every j.
    Takes O((n/(ε log n))^2) time.

(4) For each Q_j, define v'_j as the OR of all marked vectors in Q_j.
    Takes Õ(n^{1+ε}) time.

(5) Output v' := (v'_1, v'_2, ..., v'_{n/(ε log n)}).
    Total time: O(n^2/(ε log n)^2).

Claim: v' = A·v.

Proof: By definition, v'_j = ⋁_{i=1}^{n/(ε log n)} A_{j,i}·v_i. Writing A in block form, the j-th chunk of A·v is exactly ⋁_{i=1}^{n/(ε log n)} A_{j,i}·v_i, so

  A·v = ( ⋁_i A_{1,i}·v_i , ..., ⋁_i A_{n/(ε log n),i}·v_i ) = v'.
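Continuing the sketch from the preprocessing phase, a query then reduces to table lookups and ORs; the representation (packed integers standing in for marked graph vertices) and the names are again mine:

```python
def multiply(b, nb, tables, x):
    """Boolean product A*x using the tables built by preprocess(); x is a 0/1 list."""
    n = len(x)
    # (1)-(2) Break x into b-bit chunks, packed as integers: the lookup keys.
    chunks = [0] * nb
    for c in range(n):
        if x[c]:
            chunks[c // b] |= 1 << (c % b)
    # (3)-(4) For each output block j, OR together the looked-up products A_{j,i}*x_i.
    out_blocks = [0] * nb
    for j in range(nb):
        acc = 0
        for i in range(nb):
            acc |= tables[j][i][chunks[i]]
        out_blocks[j] = acc
    # (5) Unpack the blocks into an n-entry 0/1 result vector.
    return [(out_blocks[r // b] >> (r % b)) & 1 for r in range(n)]
```

The double loop over (j, i) is the (n/(ε log n))^2 term from step (3); in this word-level sketch the ORs of step (4) happen inside the same loop.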
Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets.

Let A be the adjacency matrix of G = (V, E). Let v_S be the indicator vector for a set S ⊆ V.

Proposition: A·v_S is the indicator vector for N(S), the neighborhood of S.

Corollary: After O(n^{2+ε}) preprocessing, can determine the neighborhood of any vertex subset in O(n^2/(ε log n)^2) time. (One level of BFS in o(n^2) time.)


Graph Queries

Corollary: After O(n^{2+ε}) preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in O(n^2/(ε log n)^2) time.

Proof: Let S ⊆ V.
• S is dominating ⇐⇒ S ∪ N(S) = V.
• S is independent ⇐⇒ S ∩ N(S) = ∅.
• S is a vertex cover ⇐⇒ V − S is independent.
Each can be quickly determined from knowing S and N(S).


Triangle Detection

Problem: Triangle Detection
Given: Graph G and vertex i.
Question: Does i participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take Θ(n^2) time to check all pairs of neighbors of i.

Corollary: After O(n^{2+ε}) preprocessing on G, can solve triangle detection for arbitrary vertices in O(n^2/(ε log n)^2) time.

Proof: Given vertex i, let S be its set of neighbors (obtained in O(n) time). S is not independent ⇐⇒ i participates in a triangle.
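As a usage illustration, each of these queries needs only S and one neighborhood N(S) = A·v_S, computed with the preprocessed multiply() sketched earlier (pre is assumed to be the tuple returned by preprocess(adj, eps); the helper names are hypothetical):

```python
def neighborhood(pre, n, S):
    """Indicator vector of N(S): one preprocessed Boolean matrix-vector product."""
    b, nb, tables = pre
    vS = [1 if u in S else 0 for u in range(n)]
    return multiply(b, nb, tables, vS)

def is_independent(pre, n, S):
    NS = neighborhood(pre, n, S)
    return all(NS[u] == 0 for u in S)                      # S ∩ N(S) = ∅

def is_vertex_cover(pre, n, S):
    return is_independent(pre, n, set(range(n)) - set(S))  # V − S independent

def is_dominating(pre, n, S):
    NS = neighborhood(pre, n, S)
    return all(u in S or NS[u] == 1 for u in range(n))     # S ∪ N(S) = V

def triangle_through(pre, n, adj, i):
    S = {u for u in range(n) if adj[i][u]}                 # neighbors of i, O(n) time
    return not is_independent(pre, n, S)                   # i is in a triangle iff S is not independent
```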
Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup table techniques.

• Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in e.g. O(m/poly(log n) + n) time, where m = number of nonzeroes?

• Can the algebraic matrix multiplication algorithms (Strassen, etc.) be applied to this problem?

• Can our ideas be extended to achieve non-subtractive Boolean matrix multiplication in o(n^3/log^2 n)?


Thank you!