PEER TO PEER AND DISTRIBUTED HASH TABLES (CS 271)

Distributed Hash Tables
• Challenge: design and implement a robust and scalable distributed system composed of inexpensive, individually unreliable computers in unrelated administrative domains.
(Partial thanks to Idit Keidar.)

Searching for distributed data
• Goal: make billions of objects available to millions of concurrent users
  – e.g., music files
• Need a distributed data structure to keep track of objects on different sites
  – map objects to locations
• Basic operations:
  – Insert(key)
  – Lookup(key)

Searching
[Figure: a publisher inserts (key = "title", value = MP3 data…) at one node of an overlay of nodes N1–N6 spanning the Internet; a client elsewhere issues Lookup("title").]

Simple Solution
• First there was Napster
  – Centralized server/database for lookup
  – Only the file sharing is peer-to-peer; the lookup is not
• Launched in 1999, peaked at 1.5 million simultaneous users, and shut down in July 2001.

Napster: Publish
[Figure: a peer at 123.2.21.23 announces "I have X, Y, and Z!", and the central server records insert(X, 123.2.21.23), …]

Napster: Search
[Figure: a client asks the central server "Where is file A?"; the server replies search(A) --> 123.2.0.18, and the client fetches the file directly from 123.2.0.18.]

Overlay Networks
• A virtual structure imposed over the physical network (e.g., the Internet)
  – A graph, with hosts as nodes and some chosen edges
[Figure: a hash function maps keys and node IDs into the overlay network.]

Unstructured Approach: Gnutella
• Build a decentralized, unstructured overlay
  – Each node has several neighbors
  – Each node holds several keys in its local database
• When asked to find a key X:
  – Check the local database to see if X is known
  – If yes, return; if not, ask your neighbors
• Use a limiting threshold to bound propagation.

Gnutella: Search
[Figure: the query "Where is file A?" floods from neighbor to neighbor; nodes that have file A send replies back.]

Structured vs. Unstructured
• The examples we described are unstructured
  – There is no systematic rule for how edges are chosen; each node "knows some" other nodes
  – Any node can store any data, so the searched data might reside at any node
• Structured overlay:
  – The edges are chosen according to some rule
  – Data is stored at a pre-defined place
  – Tables define the next hop for lookup

Hashing
• Data structure supporting the operations:
  – void insert(key, item)
  – item search(key)
• Implementation uses a hash function to map keys to array cells
• Expected search time O(1)
  – provided that there are few collisions

Distributed Hash Tables (DHTs)
• Nodes store table entries
• lookup(key) returns the location of the node currently responsible for this key
• We will mainly discuss Chord [Stoica, Morris, Karger, Kaashoek, and Balakrishnan, SIGCOMM 2001]
• Other examples: CAN (Berkeley), Tapestry (Berkeley), Pastry (Microsoft Cambridge), etc.

CAN [Ratnasamy et al.]
• Map nodes and keys to coordinates in a multi-dimensional Cartesian space
[Figure: a source node routes through intermediate zones toward the zone containing the key.]
• Routing follows the shortest Euclidean path
• For d dimensions, routing takes O(d·n^(1/d)) hops

Chord Logical Structure (MIT)
• m-bit ID space (2^m IDs), usually m = 160.
• Nodes are organized in a logical ring according to their IDs.
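A minimal Python sketch of the ring idea just described and of the placement rule formalized on the next slides: node addresses and keys are hashed onto the same m-bit circle, and each key is stored at its successor. The SHA-1 hash matches the usual m = 160 choice above; the node addresses and the key name are made-up examples, and the sketch is centralized (one process sees every node ID), whereas a real DHT resolves the same lookup using only local routing state.

```python
import hashlib
from bisect import bisect_left

M = 160                    # Chord's usual ID size: an m-bit space with 2^m IDs
RING = 2 ** M

def chord_id(name: str) -> int:
    """Hash a node address or key name onto the 2^m ID circle (SHA-1, as in Chord)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

class ConsistentHashRing:
    """Centralized toy ring: each key is stored at its successor,
    i.e., the node with the next-higher (or equal) ID, wrapping around."""

    def __init__(self, node_addrs):
        self.ids = sorted(chord_id(a) for a in node_addrs)
        self.addr = {chord_id(a): a for a in node_addrs}

    def successor(self, key_id: int) -> str:
        i = bisect_left(self.ids, key_id)              # first node ID >= key ID
        return self.addr[self.ids[i % len(self.ids)]]  # wrap past the top of the ring

    def lookup(self, key: str) -> str:
        return self.successor(chord_id(key))

# Hypothetical node addresses and key name, purely for illustration.
ring = ConsistentHashRing(["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"])
print(ring.lookup("title.mp3"))   # address of the node responsible for this key
```

With this placement rule, adding or removing one node only moves the keys in that node's arc of the circle, which is the guarantee quantified on the Consistent Hashing Guarantees slide below.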
[Figure: example ring with nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56 placed by ID.]

DHT: Consistent Hashing
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; e.g., Key 5 hashes to K5 and Node 105 to N105.]
• A key is stored at its successor: the node with the next-higher ID.
(Animation thanks to CMU.)

Consistent Hashing Guarantees
• For any set of N nodes and K keys:
  – A node is responsible for at most (1 + ε)K/N keys
  – When an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands

DHT: Chord Basic Lookup
[Figure: on a ring containing N10, N32, N60, N90, N105, and N120, node N10 asks "Where is key 80?"; the query is passed from successor to successor until the answer "N90 has K80" comes back.]
• Each node knows only its successor.
• Routing goes around the circle, one node at a time.

DHT: Chord "Finger Table"
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 1/2^(m-i) of the way around the ring

DHT: Chord Join
• Assume an identifier space [0..8), i.e., m = 3, drawn as a ring with positions 0–7
• Node n1 joins
  – n1's successor table (i, n1 + 2^i, succ): (0, 2, 1), (1, 3, 1), (2, 5, 1)

DHT: Chord Join
• Node n2 joins
  – n1's table (i, n1 + 2^i, succ): (0, 2, 2), (1, 3, 1), (2, 5, 1)
  – n2's table (i, n2 + 2^i, succ): (0, 3, 1), (1, 4, 1), (2, 6, 1)

DHT: Chord Join
• Nodes n0 and n6 join
• Successor tables (i, n + 2^i, succ):
  – n0: (0, 1, 1), (1, 2, 2), (2, 4, 0)
  – n1: (0, 2, 2), (1, 3, 6), (2, 5, 6)
  – n2: (0, 3, 6), (1, 4, 6), (2, 6, 6)
  – n6: (0, 7, 0), (1, 0, 0), (2, 2, 2)

DHT: Chord Join
• Nodes: n1, n2, n0, n6
• Items: f7, f1
  – Each item is stored at its successor: f7 at n0, f1 at n1
  – Successor tables as on the previous slide

DHT: Chord Routing
• Upon receiving a query for item id, a node:
  – Checks whether it stores the item locally
  – If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: query(7) is routed across the ring to n0, which stores item f7.]

Chord Data Structures
• Finger table
• First finger is the successor
• Predecessor
• What if each node knew all other nodes?
  – O(1) routing
  – Expensive updates

Routing Time
• Node n looks up a key stored at node p
• p is in n's i-th interval: p ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
• n contacts f = finger[i]
  – The interval is not empty, so: f ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
• f is at least 2^(i-1) away from n
• p is at most 2^(i-1) away from f
• The distance is halved at each hop.
[Figure: positions n, n + 2^(i-1), f = finger[i], p, and n + 2^i along the ring.]

Routing Time
• Assuming a uniform node distribution around the circle, the number of nodes in the search space is halved at each step:
  – Expected number of steps: log N
• Note that:
  – m = 160
  – For 1,000,000 nodes, log N ≈ 20

P2P Lessons
• Decentralized architecture: avoid centralization.
• Flooding can work.
• Logical overlay structures provide strong performance guarantees.
• Churn is a problem.
• Useful in many distributed contexts.
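As a closing sketch tying together the finger-table, join, and routing-time slides: the Python simulation below builds each node's finger table (entry i = first node that succeeds or equals n + 2^i, as above) on a toy m = 6 ring whose node IDs are taken from the earlier ring figure, and routes a lookup by repeatedly jumping to the farthest finger that does not pass the key. The small ring size, the specific key value, and the helper names are illustrative assumptions; a real Chord node would learn the next hop through messages rather than by reading other nodes' tables directly.

```python
M = 6                      # toy ring: 2^6 = 64 IDs (the real system uses m = 160)
RING = 2 ** M

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b] of the ID ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b     # interval wraps around 0 (a == b covers the whole ring)

class Node:
    def __init__(self, node_id, all_ids):
        self.id = node_id
        ids = sorted(all_ids)
        # finger[i] = first node whose ID succeeds or equals n + 2^i (mod 2^m)
        self.finger = [self._successor((node_id + 2 ** i) % RING, ids) for i in range(M)]

    @staticmethod
    def _successor(start, ids):
        return min((x for x in ids if x >= start), default=ids[0])

def lookup(start, key, nodes):
    """Route a query for `key` from node `start`; return (owner ID, forwarding hops)."""
    n, hops = start, 0
    while not in_interval(key, n.id, n.finger[0]):   # our successor doesn't own the key
        # Jump to the farthest finger that does not overshoot the key;
        # each jump at least halves the remaining distance around the ring.
        nxt = n.finger[0]
        for f in n.finger:
            if in_interval(f, n.id, key):
                nxt = f
        n, hops = nodes[nxt], hops + 1
    return n.finger[0], hops

# Node IDs from the ring figure: N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56
ids = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
nodes = {i: Node(i, ids) for i in ids}
owner, hops = lookup(nodes[8], 54, nodes)
print(f"key 54 is stored at N{owner} after {hops} forwarding hops")  # N56, ~log N hops
```

The per-node state here is just M fingers (plus the successor and predecessor in the full protocol), which is exactly the trade-off the Chord Data Structures and Routing Time slides describe: small tables, updates touching few nodes, and lookups in an expected O(log N) hops.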