Matrix-Vector Multiplication in Sub-Quadratic Time
(Some Preprocessing Required)
Ryan Williams
Carnegie Mellon University
Introduction

Matrix-Vector Multiplication: Fundamental Operation in Scientific Computing

How fast can n × n matrix-vector multiplication be?

Θ(n²) steps just to read the matrix!

Main Result: If we allow O(n^{2+ε}) preprocessing, then matrix-vector
multiplication over any finite semiring can be done in O(n²/(ε log n)²) time.
Better Algorithms for Matrix Multiplication

Three of the major developments:

• Arlazarov et al., a.k.a. “Four Russians” (1960’s): O(n³/log n) operations
  Uses table lookups
  Good for hardware with short vector operations as primitives

• Strassen (1969): n^{log 7 / log 2} = O(n^{2.81}) operations
  Asymptotically fast, but overhead in the big-O
  Experiments in practice are inconclusive about Strassen vs. Four Russians
  for Boolean matrix multiplication (Bard, 2006)

• Coppersmith and Winograd (1990): O(n^{2.376}) operations
  Not yet practical
Focus: Combinatorial Matrix Multiplication Algorithms

• Also called non-algebraic; let’s call them non-subtractive
  E.g. Four Russians is combinatorial, Strassen isn’t

More Non-Subtractive Boolean Matrix Mult. Algorithms:

• Atkinson and Santoro: O(n³/log^{3/2} n) on a (log n)-word RAM
• Rytter and Basch-Khanna-Motwani: O(n³/log² n) on a RAM
• Chan: Four Russians can be implemented in O(n³/log² n) on a pointer machine
Main Result

The O(n³/log² n) matrix multiplication algorithm can be “de-amortized”

More precisely, we can:

Preprocess an n × n matrix A over a finite semiring in O(n^{2+ε}) time,
such that vector multiplications with A can be done in O(n²/(ε log n)²) time.

Allows “non-subtractive” matrix multiplication to be done on-line

Can be implemented on a pointer machine

This Talk: The Boolean case
Preprocessing Phase: The Boolean Case

Partition the input matrix A into blocks of size ⌈ε log n⌉ × ⌈ε log n⌉:

        [ A_{1,1}              A_{1,2}    ···    A_{1, n/(ε log n)}            ]
    A = [ A_{2,1}                ⋱                      ⋮                      ]
        [   ⋮                            A_{i,j}        ⋮                      ]
        [ A_{n/(ε log n), 1}     ···      ···    A_{n/(ε log n), n/(ε log n)}  ]

where A_{i,j} denotes the block in block-row i and block-column j.
Preprocessing Phase: The Boolean Case

Build a graph G with parts P_1, ..., P_{n/(ε log n)}, Q_1, ..., Q_{n/(ε log n)}.

Each part has 2^{ε log n} vertices, one for each possible (ε log n)-length Boolean vector.

[Figure: parts P_1, P_2, ..., P_{n/(ε log n)} on one side and Q_1, Q_2, ..., Q_{n/(ε log n)} on the other, each containing 2^{ε log n} vertices]
Preprocessing Phase: The Boolean Case

Edges of G: Each vertex v in each P_i has exactly one edge into each Q_j,
namely to the vertex A_{j,i} · v.

[Figure: a vertex v in P_i with its single edge into Q_j, ending at the vertex A_{j,i} · v]

Time to build the graph:

    (n/(ε log n)) · (n/(ε log n)) · 2^{ε log n} · (ε log n)² = O(n^{2+ε})

    (number of Q_j) · (number of P_i) · (number of nodes in P_i) · (matrix-vector mult of A_{j,i} and v)
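For concreteness, here is a minimal Python sketch of this preprocessing for the Boolean case, representing each possible chunk as a b-bit integer, where b stands in for ⌈ε log n⌉. The function name preprocess, the parameter b, and the table layout are illustrative choices, not taken from the talk.

def preprocess(A, b):
    """For every b x b block A_{j,i} of the 0/1 matrix A and every possible
    b-bit chunk u, precompute the Boolean product A_{j,i} * u as a b-bit integer.
    table[j][i][u] plays the role of the edge from vertex u of P_i into Q_j."""
    n = len(A)
    blocks = (n + b - 1) // b                    # number of block rows/columns, ~ n/(eps log n)
    table = [[[0] * (1 << b) for _ in range(blocks)] for _ in range(blocks)]
    for j in range(blocks):                      # block row  (part Q_j)
        for i in range(blocks):                  # block column (part P_i)
            for u in range(1 << b):              # one entry per vertex of P_i
                out = 0
                for r in range(b):               # row r of block A_{j,i}
                    row = j * b + r
                    if row >= n:
                        break
                    for c in range(b):           # Boolean inner product with chunk u
                        col = i * b + c
                        if col < n and A[row][col] and (u >> c) & 1:
                            out |= 1 << r
                            break
                table[j][i][u] = out
    return table

Its cost matches the count above: (n/b)² · 2^b · O(b²), which is O(n^{2+ε}) when b = ε log n.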
How to Do Fast Vector Multiplications

Let v be a column vector. Want: A · v.

(1) Break up v into (ε log n)-sized chunks:

    v = ( v_1, v_2, ..., v_{n/(ε log n)} )ᵀ
How to Do Fast Vector Multiplications

(2) For each i = 1, ..., n/(ε log n), look up v_i in P_i.

[Figure: each chunk v_i is located as a vertex in the corresponding part P_i]

Takes Õ(n) time.
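Steps (1) and (2) amount to cutting v into b-bit chunks and reading each chunk as an integer, which serves directly as the index of v_i within P_i. A small sketch, with the same illustrative parameter b as above:

def chunk_indices(v, b):
    """Split a 0/1 vector v into chunks of length b and encode each chunk as a
    b-bit integer, so chunk i can be used to 'look up v_i in P_i' directly."""
    n = len(v)
    blocks = (n + b - 1) // b
    idx = []
    for i in range(blocks):
        u = 0
        for c in range(b):
            pos = i * b + c
            if pos < n and v[pos]:
                u |= 1 << c
        idx.append(u)
    return idx

# Example: v = [1,0,1, 0,1,1] with b = 3 gives indices [0b101, 0b110] = [5, 6].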
How to Do Fast Vector Multiplications

(3) Look up the neighbors of v_i, mark each neighbor found.

[Figure: for each chunk v_i, following its edges marks the vertex A_{j,i} · v_i in every part Q_j,
for j = 1, ..., n/(ε log n)]

Takes O((n/(ε log n))²) time.
How to Do Fast Vector Multiplications

(4) For each Q_j, define v'_j as the OR of all marked vectors in Q_j.

[Figure: in each part Q_j, the marked vectors are OR'd together to produce v'_j]

Takes Õ(n^{1+ε}) time.
How to Do Fast Vector Multiplications

(5) Output v' := ( v'_1, v'_2, ..., v'_{n/(ε log n)} )ᵀ.   Claim: v' = A · v.

Proof: By definition, v'_j = ⋁_{i=1}^{n/(ε log n)} A_{j,i} · v_i.

         [ A_{1,1}              ···   A_{1, n/(ε log n)}             ]   [ v_1              ]
    Av = [   ⋮                   ⋱       ⋮                           ] · [  ⋮               ]
         [ A_{n/(ε log n), 1}   ···   A_{n/(ε log n), n/(ε log n)}   ]   [ v_{n/(ε log n)}  ]

       = ( ⋁_{i=1}^{n/(ε log n)} A_{1,i} · v_i ,  ...,  ⋁_{i=1}^{n/(ε log n)} A_{n/(ε log n), i} · v_i )ᵀ = v'.
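Putting steps (1)-(5) together: a hedged sketch of the multiplication phase, assuming the illustrative preprocess and chunk_indices helpers from the earlier sketches; as in the talk, it uses only table lookups and bitwise ORs.

def mat_vec(table, v, b, n):
    """Multiply the preprocessed A by the 0/1 vector v.
    Assumes table = preprocess(A, b) and chunk_indices as sketched above."""
    blocks = (n + b - 1) // b
    idx = chunk_indices(v, b)                 # steps (1)-(2): locate v_i in P_i
    out_chunks = [0] * blocks
    for i in range(blocks):                   # step (3): follow the edge from v_i into each Q_j
        u = idx[i]
        for j in range(blocks):
            out_chunks[j] |= table[j][i][u]   # step (4): OR the marked vectors in Q_j
    result = [0] * n                          # step (5): concatenate v'_1, ..., v'_{n/b}
    for j in range(blocks):
        for r in range(b):
            row = j * b + r
            if row < n:
                result[row] = (out_chunks[j] >> r) & 1
    return result

# Sanity check against the naive O(n^2) Boolean product:
#   A = [[1,0,1],[0,1,0],[1,1,0]]; v = [1,0,0]; b = 2
#   mat_vec(preprocess(A, b), v, b, 3) equals
#   [1 if any(A[r][c] and v[c] for c in range(3)) else 0 for r in range(3)]  ->  [1, 0, 1]

The two nested loops over i and j dominate: (n/b)² table lookups, matching the O((n/(ε log n))²) cost of step (3).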
Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets.

Let A be the adjacency matrix of G = (V, E).
Let v_S be the indicator vector for a set S ⊆ V.

Proposition: A · v_S is the indicator vector for N(S), the neighborhood of S.

Corollary: After O(n^{2+ε}) preprocessing, can determine the neighborhood of
any vertex subset in O(n²/(ε log n)²) time.

(One level of BFS in o(n²) time)
Graph Queries

Corollary: After O(n^{2+ε}) preprocessing, can determine if a given vertex
subset is an independent set, a vertex cover, or a dominating set, all in
O(n²/(ε log n)²) time.

Proof: Let S ⊆ V.

S is dominating ⇐⇒ S ∪ N(S) = V.
S is independent ⇐⇒ S ∩ N(S) = ∅.
S is a vertex cover ⇐⇒ V − S is independent.

Each can be quickly determined from knowing S and N(S).
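A small sketch of these three reductions, assuming a 0/1 adjacency matrix A (list of lists) and a vertex subset S given as a set of indices. For clarity, the neighborhood is computed here with a plain O(n²) scan; in the talk, N(S) would come from the preprocessed product A · v_S.

def neighborhood(A, S):
    """N(S): vertices with at least one neighbor in S."""
    n = len(A)
    return {u for u in range(n) if any(A[u][w] for w in S)}

def is_dominating(A, S):
    return (S | neighborhood(A, S)) == set(range(len(A)))     # S ∪ N(S) = V

def is_independent(A, S):
    return not (S & neighborhood(A, S))                       # S ∩ N(S) = ∅

def is_vertex_cover(A, S):
    return is_independent(A, set(range(len(A))) - S)          # V − S is independent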
Triangle Detection

Problem: Triangle Detection
Given: Graph G and vertex i.
Question: Does i participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take Θ(n²) time to check all pairs of neighbors of i.

Corollary: After O(n^{2+ε}) preprocessing on G, can solve triangle detection
for arbitrary vertices in O(n²/(ε log n)²) time.

Proof: Given vertex i, let S be its set of neighbors (obtained in O(n) time).
S is not independent ⇐⇒ i participates in a triangle.
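A minimal sketch of this reduction, reusing the illustrative is_independent helper from the previous sketch:

def in_triangle(A, i):
    """Vertex i is in a triangle iff its neighborhood is not an independent set."""
    S = {w for w in range(len(A)) if A[i][w]}   # neighbors of i, O(n) time
    return not is_independent(A, S)

# Example: on the triangle graph A = [[0,1,1],[1,0,1],[1,1,0]], in_triangle(A, 0) is True.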
Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that
builds on lookup table techniques.

• Is there a preprocessing/multiplication algorithm for sparse matrices? Can
  we do multiplication in e.g. O(m/poly(log n) + n) time,
  where m = number of nonzeroes?

• Can the algebraic matrix multiplication algorithms (Strassen, etc.) be
  applied to this problem?

• Can our ideas be extended to achieve non-subtractive Boolean matrix
  multiplication in o(n³/log² n)?
Thank you!