Reconciling Differences: towards
a theory of cloud complexity
George Varghese
UCSD, visiting at Yahoo! Labs
1
Part 1: Reconciling Sets across a link
Joint with D. Eppstein, M. Goodrich, F. Uyeda
Appeared in SIGCOMM 2011
2
Motivation 1: OSPF Routing (1990)
• After partition forms and heals, R1 needs
updates at R2 that arrived during partition.
R1
R2
Partition heals
Must solve the Set-Difference Problem!
3
Motivation 2: Amazon S3 storage (2007)
• Synchronizing replicas.
S1
S2
Periodic Anti-entropy Protocol
between replicas
Set-Difference across cloud again!
4
What is the Set-Difference problem?
Host 1: {A, B, E, F}        Host 2: {A, C, D, F}
• What objects are unique to host 1?
• What objects are unique to host 2?
5
Use case 1: Data Synchronization
[Figure: after identifying and transferring the missing blocks, Host 1 and Host 2 both hold {A, B, C, D, E, F}]
• Identify missing data blocks
• Transfer blocks to synchronize sets
6
Use case 2: Data De-duplication
Host 1: {A, B, E, F}        Host 2: {A, C, D, F}
• Identify all unique blocks.
• Replace duplicate data with pointers
7
Prior work versus ours
• Trade a sorted list of keys.
– Let n be size of sets, U be size of key space
– O(n log U) communication, O(n log n) computation
– Bloom filters can improve to O(n) communication.
• Polynomial Encodings (Minsky, Trachtenberg)
– Let d be the size of the difference
– O(d log U) communication, O(dn + d³) computation
• Invertible Bloom Filter (our result)
– O(d log U) communication, O(n+d) computation
8
Difference Digests
• Efficiently solves the set-difference problem.
• Consists of two data structures:
– Invertible Bloom Filter (IBF)
• Efficiently computes the set difference.
• Needs the size of the difference
– Strata Estimator
• Approximates the size of the set difference.
• Uses IBF’s as a building block.
9
IBFs: main idea
• Sum over random subsets: Summarize a set by
“checksums” over O(d) random subsets.
• Subtract: Exchange and subtract checksums.
• Eliminate: Hashing for subset choice →
common elements disappear after subtraction
• Invert fast: O(d) equations in d unknowns;
randomness allows expected O(d) inversion.
10
“Checksum” details
• Array of IBF cells that form “checksum” words
– For set difference of size d, use αd cells (α > 1)
• Each element ID is assigned to many IBF cells
• Each cell contains:
– idSum: XOR of all IDs assigned to the cell
– hashSum: XOR of hash(ID) for all IDs assigned to the cell
– count: number of IDs assigned to the cell
11
IBF Encode
Each element (e.g., A) is assigned to several of the αd cells by Hash1, Hash2, Hash3.
In each chosen cell: idSum ⊕= A, hashSum ⊕= H(A), count++
All hosts use the same hash functions
12
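The encode step above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: `NUM_CELLS` stands in for the paper's αd cells, `K` for the number of hash functions, and all names are assumptions.

```python
import hashlib

# Illustrative IBF encode sketch (not the paper's code): NUM_CELLS
# stands in for alpha*d cells, K for the number of hash functions.
NUM_CELLS = 40
K = 3

def h(x, seed):
    """Deterministic integer hash of an ID, parameterized by a seed."""
    return int(hashlib.sha256(f"{seed}:{x}".encode()).hexdigest(), 16)

def cells_for(elem):
    """The (up to K) cells an element is assigned to; every host must
    use the same hash functions so common elements line up."""
    return {h(elem, s) % NUM_CELLS for s in range(K)}

def encode(elements):
    """Each cell holds [idSum, hashSum, count]."""
    table = [[0, 0, 0] for _ in range(NUM_CELLS)]
    for e in elements:
        for i in cells_for(e):
            table[i][0] ^= e            # idSum  ^= ID
            table[i][1] ^= h(e, "chk")  # hashSum ^= H(ID)
            table[i][2] += 1            # count++
    return table
```

Because both hosts use identical hash functions, encoding the same set on two hosts yields identical tables, which is exactly the property the subtraction step relies on.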
Invertible Bloom Filters (IBF)
Host 1 encodes {A, B, E, F} into IBF 1; Host 2 encodes {A, C, D, F} into IBF 2.
• Trade IBF’s with remote host
13
Invertible Bloom Filters (IBF)
Host 1: {A, B, E, F} → IBF 1    Host 2: {A, C, D, F} → IBF 2    IBF (2 - 1)
• “Subtract” IBF structures
– Produces a new IBF containing only unique objects
14
IBF Subtract
15
Disappearing act
• After subtraction, elements common to both sets
disappear because:
– Any common element (e.g W) is assigned to same cells on
both hosts (same hash functions on both sides)
– On subtraction, W XOR W = 0. Thus, W vanishes.
• While elements in the set difference remain, they may
be randomly mixed → need a decode procedure.
16
IBF Decode
Test for purity: does H( idSum ) = hashSum?
Pure cell (only V): H( idSum ) = H(V) = hashSum ✓
Mixed cell (V, X, Z): H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z) ✗
17
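The subtract-and-peel procedure of the last few slides can be sketched end to end. Same illustrative cell layout as the encode sketch; a minimal stand-in for the paper's algorithm, not its actual code.

```python
import hashlib

# Illustrative IBF subtract + peel decode (a sketch, not the paper's
# code); each of NUM_CELLS cells holds [idSum, hashSum, count].
NUM_CELLS = 40
K = 3

def h(x, seed):
    return int(hashlib.sha256(f"{seed}:{x}".encode()).hexdigest(), 16)

def cells_for(e):
    return {h(e, s) % NUM_CELLS for s in range(K)}

def encode(elements):
    t = [[0, 0, 0] for _ in range(NUM_CELLS)]
    for e in elements:
        for i in cells_for(e):
            t[i][0] ^= e
            t[i][1] ^= h(e, "chk")
            t[i][2] += 1
    return t

def subtract(t1, t2):
    """Cell-wise XOR of sums, difference of counts: common elements cancel."""
    return [[a[0] ^ b[0], a[1] ^ b[1], a[2] - b[2]] for a, b in zip(t1, t2)]

def decode(t):
    """Peel pure cells (count = +/-1 and hashSum matching H(idSum))."""
    only_a, only_b = set(), set()
    while True:
        i = next((i for i, c in enumerate(t)
                  if c[2] in (1, -1) and h(c[0], "chk") == c[1]), None)
        if i is None:
            break                      # done (or an undecodable remainder)
        e, sign = t[i][0], t[i][2]
        (only_a if sign == 1 else only_b).add(e)
        for j in cells_for(e):         # remove e everywhere it was added
            t[j][0] ^= e
            t[j][1] ^= h(e, "chk")
            t[j][2] -= sign
    return only_a, only_b
```

Subtracting the two hosts' IBFs and peeling recovers both sides of the difference; each peeled element may expose new pure cells, which is what makes inversion expected O(d).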
How many IBF cells?
[Plot: space overhead α needed to decode at >99% success, versus set difference, for hash counts 3 and 4. Small differences: 1.4x - 2.3x; large differences: 1.25x - 1.4x.]
21
How many hash functions?
• 1 hash function produces many pure cells initially but
nothing to undo when an element is removed.
22
How many hash functions?
• 1 hash function produces many pure cells initially but
nothing to undo when an element is removed.
• Many (say 10) hash functions: too many collisions.
23
How many hash functions?
• 1 hash function produces many pure cells initially but
nothing to undo when an element is removed.
• Many (say 10) hash functions: too many collisions.
• We find by experiment that 3 or 4 hash functions
work well. Is there some theoretical reason?
24
Theory
• Let d = difference size, k = # hash functions.
• Theorem 1: With (k + 1)·d cells, failure probability
falls exponentially with k.
– For k = 3, this implies a 4x tax on storage; a bit weak.
• [Goodrich, Mitzenmacher]: Failure is equivalent to
finding a 2-core (loop) in a random hypergraph
• Theorem 2: With c_k·d cells, failure probability falls
exponentially with k.
– c_4 = 1.3x tax, agrees with experiments
25
Recall experiments
[Plot repeated: space overhead to decode at >99%, hash counts 3 and 4, versus set difference; large differences need only 1.25x - 1.4x.]
26
Connection to Coding
• Mystery: IBF decode similar to peeling procedure
used to decode Tornado codes. Why?
• Explanation: Set Difference is equivalent to coding
with insert-delete channels
• Intuition: Given a code for set A, send checkwords
only to B. Think of B as a corrupted form of A.
• Reduction: If code can correct D insertions/deletions,
then B can recover A and the set difference.
Reed Solomon <---> Polynomial Methods
LDPC (Tornado) <---> Difference Digest
27
Random Subsets → Fast Elimination
The αd equations are sparse: pure cells give equations like X = .., which substitute into equations like X + Y + Z = .. The system is roughly upper triangular and sparse, so peeling inverts it in expected O(d) time.
28
Difference Digests
• Consists of two data structures:
– Invertible Bloom Filter (IBF)
• Efficiently computes the set difference.
• Needs the size of the difference
– Strata Estimator
• Approximates the size of the set difference.
• Uses IBF’s as a building block.
29
Strata Estimator
Consistent partitioning divides the keys into sampled subsets: IBF 1 holds ~1/2 of the keys, IBF 2 ~1/4, IBF 3 ~1/8, IBF 4 ~1/16, and so on.
• Divide keys into sampled subsets, stratum k containing ~1/2^k of the keys
• Encode each subset into an IBF of small fixed size
– log(n) IBF’s of ~20 cells each
30
Strata Estimator
Host 1 and Host 2 exchange their estimators (IBF 1, IBF 2, IBF 3, IBF 4, …) and then:
• Attempt to subtract & decode IBF’s at each level.
• If level k decodes, then return:
2^k × (the number of ID’s recovered)
31
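The estimation logic above can be sketched compactly. For clarity, exact per-stratum set difference stands in for the paper's ~20-cell IBFs; `NUM_STRATA`, the `decodable` threshold, and all names are illustrative assumptions.

```python
import hashlib

# Sketch of the strata-estimator logic (stand-in for the small IBFs).
NUM_STRATA = 16

def stratum(key):
    """Consistent partitioning: a key lands in stratum k with
    probability ~1/2^(k+1), via trailing zero bits of its hash."""
    v = int(hashlib.sha256(str(key).encode()).hexdigest(), 16)
    k = 0
    while v % 2 == 0 and k < NUM_STRATA - 1:
        v //= 2
        k += 1
    return k

def partition(keys):
    strata = [set() for _ in range(NUM_STRATA)]
    for key in keys:
        strata[stratum(key)].add(key)
    return strata

def estimate(strata_a, strata_b, decodable=8):
    """Find the densest level whose sampled difference is small enough
    to 'decode', then scale up by the sampling rate 2^(k+1)."""
    for k in range(NUM_STRATA):
        diff = strata_a[k] ^ strata_b[k]
        if 0 < len(diff) <= decodable:
            return (2 ** (k + 1)) * len(diff)
    return 0
```

The returned value is only a rough multiple of the true difference, but that is all the IBF sizing step (choosing αd cells) needs.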
KeyDiff Service
API: Add( key ), Remove( key ), Diff( host1, host2 )
[Figure: several applications, each backed by a local Key Service]
• Promising Applications:
– File Synchronization
– P2P file sharing
– Failure Recovery
32
Difference Digest Summary
• Strata Estimator
– Estimates Set Difference.
– For 100K sets, 15KB estimator has <15% error
– O(log n) communication, O(n) computation.
• Invertible Bloom Filter
– Identifies all ID’s in the Set Difference.
– 16 to 28 Bytes per ID in Set Difference.
– O(d) communication, O(n+d) computation
– Worth it if set difference is < 20% of set sizes
33
Connection to Sparse Recovery?
• If we forget about subtraction, in the end we
are recovering a d-sparse vector.
• Note that the hash check is key for figuring
out which cells are pure after differencing.
• Is there a connection to compressed sensing?
Could sensors do the random summing? The
hash summing?
• Connection the other way: could we use
compressed sensing for differences?
34
Comparison with Information Theory
and Coding
• Worst case complexity versus average
• It emphasizes communication complexity, not
computation complexity: we focus on both.
• Existence versus Constructive: some similar
settings (Slepian-Wolf) are existential
• Estimators: We want bounds based on
difference and so start by efficiently
estimating difference.
35
Aside: IBFs in Digital Hardware
Stream of set elements
a , b, x, y
Hash 1
Logic (Read, hash, Write)
Hash 2
Hash 3
Strata Hash
Bank 1
Bank 2
Bank 3
Hash to separate banks for parallelism, slight
cost in space needed. Decode in software
36
Part 2: Towards a theory of
Cloud Complexity
O2
O1
?
O3
Complexity of reconciling “similar” objects?
37
Example: Synching Files
X.ppt.v2
X.ppt.v3
?
X.ppt.v1
Measures: Communication bits, computation
38
So far: Two sets, one link, set difference
{a,b,c}
{d,a,c}
39
Mild Sensitivity Analysis: One
set much larger than other
Small difference d
?
Set A
Set B
Ω(|A|) bits needed, not O(d): Patrascu 2008
Simpler proof: DKS 2011
40
Asymmetric set difference in
LBFS File System (Mazieres)
File A: C1 C2 C3    File B: C97 C98 C99    (1 chunk difference)
Chunk Set B at Server: C1 C5 C3 . . . C97 C98 C99
LBFS sends all chunk hashes in File A: O(|A|)
41
More Sensitivity Analysis: small
intersection: database joins
Small intersection d
?
Set A
Set B
Ω(|A|) bits needed, not O(d): follows from
results on hardness of set disjointness
42
Sequences under Edit Distance
(Files for example)
File A: A B C D E F    File B: A C D E F G    (edit distance 2)
Insert/delete can renumber all file blocks . . .
43
Sequence reconciliation
(with J. Ullman)
File A: A B C D E F, with piece hashes H1 H2 H3
File B: A C D E F (edit distance 1); only H2 and H3 match
Send 2d+1 piece hashes. Clump unmatched
pieces and recurse. O( d log²(N) )
44
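The recursion can be sketched for the substitution-only case, where pieces stay aligned; the 2d+1-hash clumping that handles inserts and deletes is not shown. Piece count, minimum piece size, and all names are illustrative assumptions.

```python
import hashlib

# Illustrative sketch: compare piece hashes level by level and recurse
# only into pieces whose hashes differ. Handles substitutions on
# equal-length files; inserts/deletes need the clumping trick.
PIECES = 3

def phash(chunk):
    return hashlib.sha256(chunk).hexdigest()

def split(data, p=PIECES):
    n = max(1, len(data) // p)
    return [data[i:i + n] for i in range(0, len(data), n)]

def unmatched(a, b, min_len=4):
    """Regions of `a` whose aligned piece in `b` has a different hash."""
    if phash(a) == phash(b):
        return []          # hashes agree: nothing to send for this region
    if len(a) <= min_len:
        return [a]         # small enough: ship the raw piece
    out = []
    for pa, pb in zip(split(a), split(b)):
        out.extend(unmatched(pa, pb, min_len))
    return out
```

Each recursion level exchanges O(d · PIECES) hashes and there are O(log N) levels, which is where the d · polylog(N) communication behavior comes from.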
21 years of Sequence Reconciliation!
• Schwartz, Bowdidge, Burkhard (1990): recurse
on unmatched pieces, not aggregate.
• Rsync: widely used tool that breaks the file into
roughly √N piece hashes, where N is the file length.
45
Sets on graphs?
{b,c,d}
{a,b,c}
{d,c,e}
{a,f,g}
46
Generalizes rumor spreading which has
disjoint singleton sets
{b}
{a}
{d}
{g}
CLP10, G11: O( E n log n / conductance )
47
Generalized Push-Pull
(with N. Goyal and R. Kannan)
[Graph: nodes holding sets {b,c,d}, {a,b,c}, {d,c,e}]
• Pick a random edge
• Do 2-party set reconciliation across it
Complexity: C + D, with C as before and D = Σ_i |U - S_i|
48
Sets on Steiner graphs?
R1
{a} ∪ S
{b} ∪ S
Only terminals need sets. Push-pull wasteful!
49
Butterfly example for Sets
[Butterfly figure: terminals hold S1 and S2; the middle node computes D = Diff(S1, S2) and forwards D to both sides]
Set difference instead of XOR within network
50
How does reconciliation on Steiner
graphs relate to network coding?
• Objects in general, not just bits.
• Routers do not need objects but can
transform/code objects.
• What transformations within network allow
efficient communication close to lower bound?
51
Sequences with d mutations:
VM code pages (with Ramjee et al)
VM A: A B C D E    VM B: A X C D Y    (2 “errors”)
Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)}
and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}
52
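The reduction on this slide is direct enough to sketch, with plain Python sets standing in for the IBF exchange:

```python
# The slide's reduction: a page sequence with mutations becomes a set
# of (page, position) pairs, turning VM diff into a set-difference
# instance. Plain set operations stand in for the IBF subtract/decode.
def as_set(pages):
    return {(p, i + 1) for i, p in enumerate(pages)}

vm_a = as_set(["A", "B", "C", "D", "E"])
vm_b = as_set(["A", "X", "C", "D", "Y"])

only_a = vm_a - vm_b   # {("B", 2), ("E", 5)}
only_b = vm_b - vm_a   # {("X", 2), ("Y", 5)}
```

Because positions are part of each element, two mutations produce a set difference of exactly 2d pairs, which an IBF sized for 2d recovers.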
Twist: IBFs for error correction?
(with M. Mitzenmacher)
• Write message M[1..n] of n words as the set S =
{(M[1],1), (M[2],2), …, (M[n],n)}.
• Calculate IBF(S) and transmit M, IBF(S)
• Receiver uses the received message M’ to compute
IBF(S’); subtracts it from the received IBF(S) to locate errors.
• Protect the IBF itself using Reed-Solomon coding or redundancy
• Why: potentially O(e) decoding for e errors;
Raptor codes achieve this for erasure channels.
53
The Cloud Complexity Milieu
                                   2 Node    Graph    Steiner Nodes
Sets (key, values)                 EGUV11    GKV11    ?
Sequence, edit distance (files)    SBB90     ?        ?
Sequence, errors only (VMs)        MV11      ?        ?
Sets of sets (database tables)     ?         ?        ?
Streams (movies)                   ?         ?        ?
Other dimensions: approximate, secure, . . .
54
Conclusions: Got Diffs?
• Resiliency and fast recoding of random sums → set
reconciliation; and error correction?
• Sets on graphs
– All terminals: generalizes rumor spreading
– Routers + terminals: resemblance to network coding.
• Cloud complexity: Some points covered, many remain
• Practical, may be useful to synch devices across cloud.
55
Comparison to Logs/Incremental
Updates
• IBF’s work with no prior context.
• Logs work with prior context, BUT
– Redundant information when sync’ing with multiple parties.
– Logging must be built into the system for each write.
• IBF’s may out-perform logs when:
– Logging adds overhead at runtime.
– Multiple parties are being synchronized.
– Synchronizations happen infrequently.
– Logging requires non-volatile storage, often not present in network devices.
56