Reconciling Differences: towards a theory of cloud complexity
George Varghese, UCSD, visiting at Yahoo! Labs

Part 1: Reconciling Sets across a link
Joint with D. Eppstein, M. Goodrich, F. Uyeda. Appeared in SIGCOMM 2011.

Motivation 1: OSPF Routing (1990)
• After a partition forms and heals, router R1 needs the updates that arrived at R2 during the partition.
• Must solve the Set-Difference Problem!

Motivation 2: Amazon S3 storage (2007)
• Synchronizing replicas: a periodic anti-entropy protocol runs between replicas S1 and S2.
• Set-Difference across the cloud again!

What is the Set-Difference problem?
(Figure: Host 1 holds {A, B, E, F}; Host 2 holds {A, C, D, F}.)
• What objects are unique to host 1?
• What objects are unique to host 2?

Use case 1: Data Synchronization
(Figure: Host 1 holds {A, B, C, D, E}; Host 2 holds {A, B, C, D, E, F}; the missing block F is transferred.)
• Identify the missing data blocks.
• Transfer blocks to synchronize the sets.

Use case 2: Data De-duplication
(Figure: Host 1 holds {A, B, E, F}; Host 2 holds {A, C, D, F}.)
• Identify all unique blocks.
• Replace duplicate data with pointers.

Prior work versus ours
• Trade a sorted list of keys.
  – Let n be the size of the sets and U the size of the key space.
  – O(n log U) communication, O(n log n) computation.
  – Bloom filters can improve this to O(n) communication.
• Polynomial encodings (Minsky, Trachtenberg)
  – Let d be the size of the difference.
  – O(d log U) communication, O(dn + d^3) computation.
• Invertible Bloom Filter (our result)
  – O(d log U) communication, O(n + d) computation.

Difference Digests
• Efficiently solves the set-difference problem.
• Consists of two data structures:
  – Invertible Bloom Filter (IBF)
    • Efficiently computes the set difference.
    • Needs the size of the difference.
  – Strata Estimator
    • Approximates the size of the set difference.
    • Uses IBFs as a building block.

IBFs: main idea
• Sum over random subsets: summarize a set by "checksums" over O(d) random subsets.
• Subtract: exchange and subtract the checksums.
• Eliminate: because hashing determines the subset choice, common elements disappear after subtraction.
• Invert fast: O(d) equations in d unknowns; randomness allows expected O(d) inversion.

"Checksum" details
• An array of IBF cells forms the "checksum" words.
  – For a set difference of size d, use αd cells (α > 1).
• Each element ID is assigned to many IBF cells.
• Each cell contains:
  – idSum: XOR of all IDs assigned to the cell.
  – hashSum: XOR of hash(ID) for all IDs assigned to the cell.
  – count: number of IDs assigned to the cell.

IBF Encode
• Each ID (e.g., A) is assigned to several of the αd cells by hash functions Hash1, Hash2, Hash3.
• Each chosen cell is updated as: idSum ⊕= A, hashSum ⊕= H(A), count++.
• All hosts use the same hash functions.

Invertible Bloom Filters (IBF)
• Host 1 encodes its set {A, B, E, F} into IBF 1; Host 2 encodes {A, C, D, F} into IBF 2.
• Trade IBFs with the remote host.
• "Subtract" the IBF structures, giving IBF(2 - 1).
  – This produces a new IBF containing only the unique objects.

IBF Subtract
(Figure: the two IBFs are combined cell by cell to form IBF(2 - 1).)

Disappearing act
• After subtraction, elements common to both sets disappear because:
  – Any common element (e.g., W) is assigned to the same cells on both hosts (same hash functions on both sides).
  – On subtraction, W XOR W = 0, so W vanishes.
• The elements of the set difference remain, but they may be randomly mixed together, so we need a decode procedure.

IBF Decode
• Test for purity: a cell is pure if H(idSum) = hashSum.
  – A cell holding only V passes: H(V) = H(V).
  – A cell holding V, X, and Z fails (with high probability): H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z).
(Figure sequence: repeatedly find a pure cell, recover its ID, and remove that ID from its other cells, exposing new pure cells, until the IBF is empty.)

How many IBF cells?
(Plot: space overhead α needed to decode with >99% success versus set-difference size, for 3 and 4 hash functions.)
• Small differences: 1.4x – 2.3x space overhead.
• Large differences: 1.25x – 1.4x space overhead.
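To make the encode/subtract/decode pipeline above concrete, here is a minimal Python sketch of an IBF. It follows the slides (idSum, hashSum, count per cell; same hash functions everywhere; peel pure cells), but the 3-hash choice, the SHA-256-based hash, and the cell count in the example are illustrative assumptions, not the exact implementation from the SIGCOMM 2011 paper.

import hashlib

NUM_HASHES = 3  # the talk finds 3 or 4 hash functions work well

def _h(x, salt):
    # Illustrative hash; the talk does not pin down a particular function.
    return int.from_bytes(hashlib.sha256(f"{salt}:{x}".encode()).digest()[:8], "big")

class IBF:
    def __init__(self, num_cells):          # num_cells is roughly alpha * d
        self.n = num_cells
        self.id_sum = [0] * num_cells        # XOR of IDs assigned to each cell
        self.hash_sum = [0] * num_cells      # XOR of hash(ID) for those IDs
        self.count = [0] * num_cells         # number of IDs assigned to each cell

    def _cells(self, key):
        # Same hash functions on every host, so a key maps to the same cells.
        return {_h(key, i) % self.n for i in range(NUM_HASHES)}

    def add(self, key):
        for c in self._cells(key):
            self.id_sum[c] ^= key
            self.hash_sum[c] ^= _h(key, "check")
            self.count[c] += 1

    def subtract(self, other):
        # Cell-by-cell: XOR the sums, subtract the counts. Common elements cancel.
        out = IBF(self.n)
        for c in range(self.n):
            out.id_sum[c] = self.id_sum[c] ^ other.id_sum[c]
            out.hash_sum[c] = self.hash_sum[c] ^ other.hash_sum[c]
            out.count[c] = self.count[c] - other.count[c]
        return out

    def decode(self):
        # Peel pure cells: count is +/-1 and H(idSum) matches hashSum.
        only_left, only_right = set(), set()   # left/right operand of subtract()
        queue = list(range(self.n))
        while queue:
            c = queue.pop()
            if self.count[c] not in (1, -1):
                continue
            key = self.id_sum[c]
            if self.hash_sum[c] != _h(key, "check"):
                continue                        # mixed cell, not pure
            sign = self.count[c]
            (only_left if sign == 1 else only_right).add(key)
            for c2 in self._cells(key):         # remove the key everywhere it lives
                self.id_sum[c2] ^= key
                self.hash_sum[c2] ^= _h(key, "check")
                self.count[c2] -= sign
                queue.append(c2)                # that cell may now be pure
        ok = all(v == 0 for v in self.count) and all(v == 0 for v in self.id_sum)
        return only_left, only_right, ok

For example, with a = IBF(30) encoding {1, 2, 5} and b = IBF(30) encoding {1, 3, 4, 6}, a call to a.subtract(b).decode() returns ({2, 5}, {3, 4, 6}, True) whenever the peeling succeeds.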
How many hash functions?
• 1 hash function produces many pure cells initially, but removing a recovered element then undoes nothing elsewhere, so collided cells stay stuck.
• Many (say 10) hash functions: too many collisions.
• We find by experiment that 3 or 4 hash functions work well. Is there some theoretical reason?

Theory
• Let d = difference size, k = number of hash functions.
• Theorem 1: With (k + 1)d cells, the failure probability falls exponentially with k.
  – For k = 3, this implies a 4x tax on storage, which is a bit weak.
• [Goodrich, Mitzenmacher]: Failure is equivalent to finding a 2-core (loop) in a random hypergraph.
• Theorem 2: With c_k · d cells, the failure probability falls exponentially with k.
  – c_4 gives a 1.3x tax, which agrees with the experiments.

Recall experiments
(Plot repeated: space overhead to decode with >99% success versus set-difference size, for 3 and 4 hash functions; large differences need only 1.25x – 1.4x.)

Connection to Coding
• Mystery: IBF decode looks like the peeling procedure used to decode Tornado codes. Why?
• Explanation: set difference is equivalent to coding over insertion/deletion channels.
• Intuition: given a code for set A, send the check words only to B. Think of B as a corrupted form of A.
• Reduction: if the code can correct d insertions/deletions, then B can recover A and hence the set difference.
  – Reed-Solomon <---> polynomial methods
  – LDPC (Tornado) <---> Difference Digest

Random subsets, fast elimination
(Figure: the αd cell equations, e.g. X + Y + Z = ..., Y = ..., X = ..., form a sparse, roughly upper-triangular system; pure cells are the already-solved rows.)

Difference Digests (recap)
• Invertible Bloom Filter (IBF): efficiently computes the set difference, but needs the size of the difference.
• Strata Estimator: approximates the size of the set difference; uses IBFs as a building block.

Strata Estimator
• Divide the keys into sampled subsets by consistent partitioning, so that stratum k holds roughly 1/2^k of the keys (IBF 1 ~1/2, IBF 2 ~1/4, IBF 3 ~1/8, IBF 4 ~1/16, ...).
• Encode each subset into an IBF of small fixed size.
  – log(n) IBFs of ~20 cells each.
• The two hosts exchange estimators and attempt to subtract and decode the IBFs at each level.
• If level k decodes, then return 2^k × (the number of IDs recovered). (A concrete sketch appears at the end of Part 1, below.)

KeyDiff Service
• Interface offered to applications: Add(key), Remove(key), Diff(host1, host2); a key service runs on each host.
• Promising applications:
  – File synchronization
  – P2P file sharing
  – Failure recovery

Difference Digest Summary
• Strata Estimator
  – Estimates the size of the set difference.
  – For 100K-element sets, a 15KB estimator has <15% error.
  – O(log n) communication, O(n) computation.
• Invertible Bloom Filter
  – Identifies all IDs in the set difference.
  – 16 to 28 bytes per ID in the set difference.
  – O(d) communication, O(n + d) computation.
  – Worth it if the set difference is < 20% of the set sizes.

Connection to Sparse Recovery?
• If we forget about subtraction, in the end we are recovering a d-sparse vector.
• Note that the hash check is the key to figuring out which cells are pure after differencing.
• Is there a connection to compressed sensing? Could sensors do the random summing? The hash summing?
• Connection the other way: could we use compressed sensing for differences?
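As promised above, here is a minimal sketch of the Strata Estimator on top of the IBF sketch from earlier in Part 1. The stratum count, the ~20-cell IBF size, the partition-by-trailing-zero-bits rule, and the accumulate-then-scale decoding loop are illustrative assumptions; the slides themselves only state consistent ~1/2^k sampling and the "scale by 2^k" rule.

# Minimal Strata Estimator sketch, reusing the IBF class and _h() above.
# Assumptions: 32 strata, ~20 cells per stratum (as on the slide), and
# consistent partitioning by trailing zero bits of a key hash, so stratum k
# holds roughly a 1/2^(k+1) sample of the keys on every host.

NUM_STRATA = 32
CELLS_PER_STRATUM = 20

def stratum_of(key):
    h = _h(key, "stratum")
    if h == 0:                                  # vanishingly unlikely edge case
        return NUM_STRATA - 1
    trailing_zeros = (h & -h).bit_length() - 1  # deeper strata sample fewer keys
    return min(trailing_zeros, NUM_STRATA - 1)

def build_estimator(keys):
    strata = [IBF(CELLS_PER_STRATUM) for _ in range(NUM_STRATA)]
    for key in keys:
        strata[stratum_of(key)].add(key)
    return strata

def estimate_difference(strata1, strata2):
    # Decode from the most sparsely sampled stratum downward; when a stratum
    # fails to decode, scale everything recovered so far by its sampling rate.
    total = 0
    for level in range(NUM_STRATA - 1, -1, -1):
        only1, only2, ok = strata1[level].subtract(strata2[level]).decode()
        if not ok:
            return total * (2 ** (level + 1))
        total += len(only1) + len(only2)
    return total                                # every stratum decoded: exact count

In use, each host would build build_estimator(keys) over its own key set, ship the small fixed-size estimator to the other side, and use the returned estimate to size the single large IBF for the actual reconciliation.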
Comparison with Information Theory and Coding
• Worst-case complexity versus average-case.
• Information theory emphasizes communication complexity, not computation complexity; we focus on both.
• Existence versus constructive: some similar settings (Slepian-Wolf) are existential.
• Estimators: we want bounds based on the difference, and so we start by efficiently estimating the difference.

Aside: IBFs in Digital Hardware
• A stream of set elements (a, b, x, y, ...) is read, hashed (Hash 1, Hash 2, Hash 3, plus the strata hash), and written to separate memory banks (Bank 1, Bank 2, Bank 3).
• Hashing to separate banks buys parallelism at a slight cost in space.
• Decode in software.

Part 2: Towards a theory of Cloud Complexity
(Figure: objects O1, O2, O3 at different sites.)
• What is the complexity of reconciling "similar" objects?

Example: Synching Files
(Figure: X.ppt.v1, X.ppt.v2, X.ppt.v3 at different sites.)
• Measures: communication bits, computation.

So far: Two sets, one link, set difference
(Figure: two nodes holding {a,b,c} and {d,a,c}, joined by one link.)

Mild Sensitivity Analysis: one set much larger than the other
• Small difference d between Set A and Set B.
• Ω(|A|) bits are needed, not O(d): Patrascu 2008. Simpler proof: DKS 2011.

Asymmetric set difference in the LBFS file system (Mazieres)
• File A has chunks C1, C2, C3, ..., C97, C98, C99; the chunk set B at the server differs in just one chunk.
• LBFS sends all chunk hashes in File A: O(|A|).

More Sensitivity Analysis: small intersection (database joins)
• Small intersection d between Set A and Set B.
• Ω(|A|) bits are needed, not O(d): follows from results on the hardness of set disjointness.

Sequences under Edit Distance (files, for example)
• File A = A B C D E F, File B = A C D E F G: edit distance 2.
• An insert or delete can renumber all of the file blocks ...

Sequence reconciliation (with J. Ullman)
• File A = A B C D E F with piece hashes H1, H2, H3; File B = A C D E F: edit distance 1.
• Send 2d + 1 piece hashes. Clump the unmatched pieces and recurse: O(d log N).

21 years of Sequence Reconciliation!
• Schwartz, Bowdidge, Burkhard (1990): recurse on the unmatched pieces, not on the aggregate.
• Rsync: widely used tool that breaks the file into roughly √N piece hashes, where N is the file length.

Sets on graphs?
(Figure: graph nodes holding the sets {b,c,d}, {a,b,c}, {d,c,e}, {a,f,g}.)

This generalizes rumor spreading, which has disjoint singleton sets ({a}, {b}, {d}, {g}).
• CLP10, G11: O(E n log n / conductance).

Generalized Push-Pull (with N. Goyal and R. Kannan)
• Pick a random edge and do 2-party set reconciliation across it.
• Complexity: C + D, with C as before and D = Sum_i (|U| - |S_i|).

Sets on Steiner graphs?
(Figure: terminals holding {a} ∪ S and {b} ∪ S, connected through relay nodes such as R1.)
• Only the terminals need the sets. Push-pull is wasteful!

Butterfly example for sets
(Figure: butterfly network with sources S1 and S2; the interior node computes D = Diff(S1, S2) and forwards D on both outgoing links.)
• Set difference instead of XOR within the network.

How does reconciliation on Steiner graphs relate to network coding?
• Objects in general, not just bits.
• Routers do not need the objects but can transform/code them.
• What transformations within the network allow efficient communication close to the lower bound?

Sequences with d mutations: VM code pages (with Ramjee et al.)
• VM A = A B C D E, VM B = A X C D Y: 2 "errors".
• Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)} and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}.

Twist: IBFs for error correction? (with M. Mitzenmacher)
• Write a message M[1..n] of n words as the set S = {(M[1],1), (M[2],2), ..., (M[n],n)}.
• Calculate IBF(S) and transmit M together with IBF(S).
• The receiver uses the received message M' to build IBF(S'), and subtracts it from IBF(S) to locate the errors.
• Protect the IBF itself using Reed-Solomon coding or redundancy.
• Why: potentially O(e) decoding for e errors; Raptor codes achieve this for erasure channels.
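A minimal sketch of this error-correction twist, reusing the IBF sketch from Part 1. Message words are keyed as (word, index) pairs, exactly like the VM page example above. The (word << 32 | index) packing and the 64-cell IBF size are assumptions for illustration, and protecting the IBF itself in transit (Reed-Solomon or replication, as the slide suggests) is left out.

def word_set(message):
    # Encode word i as a single integer key representing the pair (M[i], i).
    return {(word << 32) | index for index, word in enumerate(message)}

def encode_for_send(message, cells=64):
    # Sender: build IBF(S) over the (word, index) pairs; ship message plus IBF.
    ibf = IBF(cells)
    for key in word_set(message):
        ibf.add(key)
    return message, ibf

def correct_errors(received, sender_ibf):
    # Receiver: build IBF(S') from the received words, subtract, and peel.
    mine = IBF(sender_ibf.n)
    for key in word_set(received):
        mine.add(key)
    only_sender, _only_mine, ok = sender_ibf.subtract(mine).decode()
    if not ok:
        raise ValueError("too many errors for this IBF size")
    corrected = list(received)
    for key in only_sender:          # (correct word, index) pairs the receiver lacks
        corrected[key & 0xFFFFFFFF] = key >> 32
    return corrected

For instance, encode_for_send([10, 20, 30, 40]) yields the message plus its IBF, and correct_errors([10, 99, 30, 40], ibf) returns [10, 20, 30, 40]; the common (word, index) pairs cancel in the subtraction, so only the corrupted positions must be peeled.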
The Cloud Complexity Milieu

                                     2 Node     Graph     Steiner Nodes
  Sets (Key, values)                 EGUV11     GKV11     ?
  Sequence, Edit Distance (Files)    SBB90      ?         ?
  Sequence, errors only (VMs)        MV11       ?         ?
  Sets of sets (database tables)     ?          ?         ?
  Streams (movies)                   ?          ?         ?

• Other dimensions: approximate, secure, ...

Conclusions: Got Diffs?
• Resiliency and fast recoding of random sums give set reconciliation; and error correction?
• Sets on graphs
  – All terminals: generalizes rumor spreading.
  – Routers and terminals: resemblance to network coding.
• Cloud complexity: some points covered, many remain.
• Practical: may be useful for synching devices across the cloud.

Comparison to Logs/Incremental Updates
• IBFs work with no prior context.
• Logs work with prior context, BUT:
  – they carry redundant information when syncing with multiple parties;
  – logging must be built into the system for each write.
• IBFs may out-perform logs when:
  – logging adds overhead at runtime,
  – multiple parties must be synchronized,
  – synchronizations happen infrequently, or
  – logging would require non-volatile storage, often not present in network devices.