Dimension Independent Matrix Square

advertisement
Dimension
Independent
Matrix Square
Reza Zadeh
Dimension Independent Matrix Square
Introduction
First Pass
Reza Zadeh
Consulting Professor, Stanford University
DIMSUM
Analysis
Experiments
More Results
June 18 2014
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
1 / 33
Notation for matrix A
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
More Results
Given m × n matrix A, with m n.


a1,1 a1,2 · · · a1,n
 a2,1 a2,2 · · · a2,n 


A= .
..
.. 
.
.
.
 .
.
.
. 
am,1 am,2 · · · am,n
A is tall and skinny, example values
m = 1012 , n = {104 , 106 }.
A has sparse rows, each row has at most L nonzeros.
A is stored across hundreds of machines and cannot
be streamed through a single machine.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
2 / 33
Computing AT A
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
We compute AT A.
First Pass
DIMSUM
Analysis
AT A is n × n, considerably smaller than A.
Experiments
AT A is dense.
More Results
Holds dot products between all pairs of columns of A.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
3 / 33
Guarantees
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
More Results
There is a knob γ which can be turned to preserve
similarities and singular values. Paying O(nLγ)
communication cost and O(γ) computation cost.
With a low setting of γ, preserve similar entries of AT A
(via Cosine, Dice, Overlap, and Jaccard similarity).
With a high setting of γ, preserve singular values of
AT A.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
4 / 33
Computing All Pairs of Cosine Similarities
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
We have to find dot products between all pairs of
columns of A
We prove results for general matrices, but can do better
for those entries with cos(i, j) ≥ s
First Pass
DIMSUM
Analysis
Experiments
More Results
Cosine similarity: a widely used definition for “similarity"
between two vectors
cos(i, j) =
ciT cj
||ci ||||cj ||
ci is the i 0 th column of A
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
5 / 33
Example matrix
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
Rows: users.
Columns: movies.
First Pass
DIMSUM
Analysis
Experiments
More Results
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
6 / 33
Distributed Computing Environment
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
With such large datasets, we must use many machines.
DIMSUM
Algorithm code available in Spark and Scalding.
Analysis
Experiments
More Results
Described with Maps and Reduces so that the
framework takes care of distributing the computation.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
7 / 33
Naive Implementation
Dimension
Independent
Matrix Square
1
Given row ri , Map with NaiveMapper (Algorithm 1)
Reza Zadeh
2
Reduce using the NaiveReducer (Algorithm 2)
Introduction
First Pass
Algorithm 1 NaiveMapper(ri )
DIMSUM
Analysis
Experiments
More Results
for all pairs (aij , aik ) in ri do
Emit ((j, k ) → aij aik )
end for
Algorithm 2 NaiveReducer((i, j), hv1 , . . . , vR i)
P
output ciT cj → R
i=1 vi
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
8 / 33
Analysis for First Pass
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
Very easy analysis
1) Shuffle size: O(mL2 )
First Pass
2) Largest reduce-key: O(m)
DIMSUM
Analysis
Experiments
More Results
Both depend on m, the larger dimension, and are
intractable for m = 1012 , L = 100.
We’ll bring both down via clever sampling
Assuming column norms are known or estimates
available
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
9 / 33
Dimension Independent Matrix Square using MapReduce
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Algorithm 3 DIMSUMMapper(ri )
for all pairs (aij , aik ) in ri do
1
With probability min 1, γ ||cj ||||c
k ||
emit ((j, k ) → aij aik )
end for
Experiments
More Results
Algorithm 4 DIMSUMReducer((i, j), hv1 , . . . , vR i)
if
γ
||ci ||||cj ||
> 1 then
output bij →
1
||ci ||||cj ||
PR
i=1 vi
else
output bij →
end if
Reza Zadeh (Stanford)
1
γ
PR
i=1 vi
Dimension Independent Matrix Square
18 June 2014
10 / 33
Analysis for DIMSUM
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
The algorithm outputs bij , which is a matrix of cosine
similarities, call it B.
Four things to prove:
1
Shuffle size: O(nLγ)
2
Largest reduce-key: O(γ)
3
The sampling scheme preserves similarities when
γ = Ω(log(n)/s)
4
The sampling scheme preserves singular values when
γ = Ω(n/2 )
Analysis
Experiments
More Results
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
11 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
O(nLγ) has no dependence on the dimension m, this is
the heart of DIMSUM.
Happens because higher magnitude columns are
sampled with lower probability:
More Results
γ
Reza Zadeh (Stanford)
1
||c1 ||||c2 ||
Dimension Independent Matrix Square
18 June 2014
12 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
For matrices with real entries, we can still get a bound
First Pass
DIMSUM
Let H be the smallest nonzero entry in magnitude, after
all entries of A have been scaled to be in [−1, 1]
Analysis
Experiments
More Results
E.g. for {0, 1} matrices, we have H = 1
Shuffle size is bounded by O(nLγ/H 2 )
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
13 / 33
Largest reduce key for DIMSUM
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
DIMSUM
Each reduce key receives at most γ values (the
oversampling parameter)
Analysis
Immediately get that reduce-key complexity is O(γ)
Experiments
Also independent of dimension m. Happens because
high magnitude columns are sampled with lower
probability.
First Pass
More Results
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
14 / 33
Correctness
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
Analysis
Since higher magnitude columns are sampled with
lower probability, are we guaranteed to obtain correct
results w.h.p.?
Experiments
Yes. By setting γ correctly.
More Results
Preserve similarities when γ = Ω(log(n)/s)
First Pass
DIMSUM
Preserve singular values when γ = Ω(n/2 )
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
15 / 33
Correctness
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Theorem
Let A be an m × n tall and skinny (m > n) matrix. If
γ = Ω(n/2 ) and D a diagonal matrix with entries dii = ||ci ||,
then the matrix B output by DIMSUM satisfies,
Experiments
||DBD − AT A||2
≤
||AT A||2
More Results
with probability at least 1/2.
Relative error guaranteed to be low with constant probability.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
16 / 33
Proof
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
Uses Latala’s theorem, bounds 2nd and 4th central
moments of entries of B.
DIMSUM
Analysis
Experiments
More Results
Really need extra power of moments.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
17 / 33
Latala’s Theorem
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
More Results
Theorem
(Latala’s theorem). Let X be a random matrix whose entries
xij are independent centered random variables with finite
fourth moment. Denoting ||X ||2 as the matrix spectral norm,
we have
1/2

!1/2
X
X
2
2
E xij
E ||X ||2 ≤ C[max 
E xij  + max
i
j
j
1/4

+
i
X
E xij4 
].
i,j
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
18 / 33
Proof
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
Prove two things
First Pass
DIMSUM
E[(bij − Ebij )2 ] ≤
Analysis
Experiments
More Results
E[(bij − Ebij )4 ] ≤
1
γ (easy)
2
(not easy)
γ2
Details in paper.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
19 / 33
Correctness
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Theorem
For any two columns ci and cj having cos(ci , cj ) ≥ s, let B
P
be the output of DIMSUM with entries bij = γ1 m
k =1 Xijk with
Xijk as the indicator for the k ’th coin in the call to
DIMSUMMapper. Now if γ = Ω(α/s), then we have,
Experiments
h
T
i
Pr ||ci ||||cj ||bij > (1 + δ)[A A]ij ≤
More Results
eδ
(1 + δ)(1+δ)
α
and
h
i
Pr ||ci ||||cj ||bi,j < (1 − δ)[AT A]ij < exp(−αδ 2 /2)
Relative error guaranteed to be low with high probability.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
20 / 33
Correctness
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
More Results
Proof.
In the paper
Uses standard concentration inequality for sums of
indicator random variables.
Ends up requiring that the oversampling parameter γ
be set to γ = log(n2 )/s = 2 log(n)/s.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
21 / 33
Observations
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
DIMSUM helpful when there are some popular columns
DIMSUM
Analysis
Experiments
More Results
e.g. The Netflix Matrix (some columns way more
popular than others)
Power-law columns are effectively neutralized
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
22 / 33
In practice
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
First Pass
Forget about theoretical settings for γ
Crank up γ until application works
DIMSUM
Analysis
Experiments
More Results
Estimates for ||ci || can be used, expectations still hold,
but concentration isn’t guaranteed
If using for singular values, watch for ill-conditioned
matrices
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
23 / 33
Experiments
Dimension
Independent
Matrix Square
Large scale production live at twitter.com
Reza Zadeh
Introduction
First Pass
DIMSUM
Analysis
Experiments
More Results
Smaller scale experiment with columns as words, and
rows as tweets
m = 200M, n = 1000, L = 10
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
24 / 33
Experiments
Dimension
Independent
Matrix Square
DISCO Cosine Similarity
2
1.8
Reza Zadeh
1.6
Introduction
1.4
avg relative err
First Pass
DIMSUM
Analysis
Experiments
More Results
1.2
1
0.8
0.6
0.4
0.2
0
0
0.1
0.2
0.3
0.4
0.5
similarity threshold
Figure : Average error for all pairs with similarity threshold s.
Error decreases for more similar pairs. γ = 200
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
25 / 33
Other Similarity Measures
Dimension
Independent
Matrix Square
Reza Zadeh
Picking out similar columns work for some other similarity
measures.
Introduction
First Pass
DIMSUM
Analysis
Similarity
Cosine
Definition
√ #(x,y
√)
Shuffle Size
O(nL log(n)/s)
Reduce-key size
O(log(n)/s)
Jaccard
#(x,y )
#(x)+#(y )−#(x,y )
O((n/s) log(n/s))
O(log(n/s)/s)
Overlap
#(x,y )
min(#(x),#(y ))
O(nL log(n)/s)
O(log(n)/s)
Dice
2#(x,y )
#(x)+#(y )
O(nL log(n)/s)
O(log(n)/s)
#(x)
#(y )
Experiments
More Results
Table : All sizes are independent of m, the dimension.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
26 / 33
Locality Sensitive Hashing
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
MinHash from the Locality-Sensitive-Hashing family
can have its vanilla implementation greatly improved by
DIMSUM.
First Pass
DIMSUM
Analysis
Experiments
More Results
Another set of theorems for shuffle size and
correctness in DISCO paper.
stanford.edu/~rezab/papers/disco.pdf
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
27 / 33
Conclusion
Dimension
Independent
Matrix Square
Reza Zadeh
Introduction
Consider DIMSUM if you ever need to compute AT A for
large sparse A
First Pass
DIMSUM
Analysis
Experiments
More Results
Many more experiments and results in paper at
stanford.edu/~rezab
Thank you!
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
28 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Reza Zadeh
Theorem
For {0, 1} matrices, the expected shuffle size for
DIMSUMMapper is O(nLγ).
Proof.
The expected contribution from each pair of columns will
constitute the shuffle size:
=
n X
n
X
#(ci ,cj )
i=1 j=i+1
k =1
n X
n
X
X
Pr[DIMSUMEmit(ci , cj )]
#(ci , cj )Pr[DIMSUMEmit(ci , cj )]
i=1 j=i+1
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
29 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Proof.
Reza Zadeh
Reza Zadeh (Stanford)
≤
n X
n
X
#(ci , cj )
γp
p
#(ci ) #(cj )
i=1 j=i+1
Dimension Independent Matrix Square
18 June 2014
30 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Proof.
Reza Zadeh
≤
n X
n
X
#(ci , cj )
γp
p
#(ci ) #(cj )
i=1 j=i+1
n
n
γX X
1
1
(by AM-GM) ≤
#(ci , cj )(
+
)
2
#(ci ) #(cj )
i=1 j=i+1
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
30 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Proof.
Reza Zadeh
≤
n X
n
X
#(ci , cj )
γp
p
#(ci ) #(cj )
i=1 j=i+1
n
n
γX X
1
1
(by AM-GM) ≤
#(ci , cj )(
+
)
2
#(ci ) #(cj )
i=1 j=i+1
≤γ
n
X
i=1
Reza Zadeh (Stanford)
n
1 X
#(ci , cj )
#(ci )
j=1
Dimension Independent Matrix Square
18 June 2014
30 / 33
Shuffle size for DIMSUM
Dimension
Independent
Matrix Square
Proof.
Reza Zadeh
≤
n X
n
X
#(ci , cj )
γp
p
#(ci ) #(cj )
i=1 j=i+1
n
n
γX X
1
1
(by AM-GM) ≤
#(ci , cj )(
+
)
2
#(ci ) #(cj )
i=1 j=i+1
≤γ
n
X
i=1
≤γ
n
X
i=1
Reza Zadeh (Stanford)
n
1 X
#(ci , cj )
#(ci )
j=1
1
L#(ci ) = γLn
#(ci )
Dimension Independent Matrix Square
18 June 2014
30 / 33
Combiners
Dimension
Independent
Matrix Square
Reza Zadeh
All bounds are without combining: can only get better
with combining
For similarities, DIMSUM (without combiners) beats
naive with combining outright
For singular values, DIMSUM (without combiners)
beats naive with combining if the number of machines
is large (e.g. 1000)
DIMSUM with combining empirically beats naive with
combining
Difficult to analyze combiners since they happen at
many levels. Optimizations break models.
DIMSUM with combiners is best of both.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
31 / 33
Combiners
Dimension
Independent
Matrix Square
Reza Zadeh
With k machines
DIMSUM shuffle with combiner: O(min(nLγ, kn2 ))
DIMSUM reduce-key with combiner: O(min(γ, k ))
Naive shuffle with combiner: O(kn2 )
Naive reduce-key with combiner: O(k )
DIMSUM with combiners is best of both.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
32 / 33
Experiments
Dimension
Independent
Matrix Square
DISCO Cosine shuffle size vs accuracy tradeoff
1
2.5
DISCO Shuffle / Naive Shuffle
Reza Zadeh
0.8
2
0.6
1.5
0.4
1
0.2
0.5
0
0
2
4
6
8
10
12
14
16
avg relative err
DISCO Shuffle / Naive Shuffle
avg relative err
0
18
log(p / ε)
Figure : As γ = p/ increases, shuffle size increases and error
decreases. There is no thresholding for highly similar pairs here.
Reza Zadeh (Stanford)
Dimension Independent Matrix Square
18 June 2014
33 / 33
Download