Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Xiaozhou Li
COS 461: Computer Networks (precept 04/06/12)
Princeton University

Background
• We studied P2P file sharing in class
  – Napster
  – Gnutella
  – KaZaA
  – BitTorrent
• Today, let's learn more!
  – Chord: a scalable P2P lookup protocol
  – CFS: a distributed file system built on top of Chord
  – http://pdos.csail.mit.edu/chord

Review: distributed hash table
[Figure: a distributed application calls put(key, data) and get(key) on a distributed hash table spanning many nodes]
A DHT provides the information lookup service for P2P applications.
• Nodes are uniformly distributed across the key space
• Nodes form an overlay network
• Nodes maintain a list of neighbors in a routing table
• Decoupled from the physical network topology

The lookup problem
[Figure: a publisher inserts Key = "beat it", Value = MP3 data… somewhere among nodes N1–N6 in the Internet; a client issues Lookup("beat it") — which node holds it?]

Centralized lookup (Napster)
[Figure: the publisher at N4 registers SetLoc("beat it", N4) with a central DB; the client's Lookup("beat it") queries that DB]
• Simple, but O(N) state and a single point of failure

Flooded queries (Gnutella)
[Figure: the client floods Lookup("beat it") to its neighbors until it reaches the publisher at N4]
• Robust, but worst case O(N) messages per lookup

Routed queries (Chord)
[Figure: the client's Lookup("beat it") is forwarded hop by hop through the overlay toward the node responsible for the key]

Routing challenges
• Define a useful key nearness metric
• Keep the hop count small
• Keep the tables small
• Stay robust despite rapid change
• Chord: emphasizes efficiency and simplicity

Chord properties
• Efficient: O(log N) messages per lookup
  – N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive failures
• Proofs are in the paper / tech report
  – Assuming no malicious participants

Chord overview
• Provides peer-to-peer hash lookup:
  – Lookup(key) returns the IP address of the node responsible for the key
  – Chord does not store the data
• How does Chord route lookups?
• How does Chord maintain routing tables?

Chord IDs
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space
• How to map key IDs to node IDs?
  – The heart of the Chord protocol is "consistent hashing"
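As a bridge to the review below, here is a minimal sketch (my own illustration, not part of the deck) of the ID mapping just described: keys and node IP addresses are hashed with SHA-1 into one circular ID space, and a key is stored at its successor. The small 8-bit ring, helper names, and example IPs are assumptions for readability; real Chord IDs are full 160-bit SHA-1 values.

```python
import hashlib
from bisect import bisect_left

M = 8  # assumed tiny ID space (2^8) for illustration; Chord uses 160-bit SHA-1 IDs

def chord_id(value):
    """Hash a key or a node's IP address into the same circular ID space."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(key_id, node_ids):
    """A key is stored at its successor: the first node ID >= key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]  # wrap to the smallest node ID if key_id exceeds the largest

# Example: hash node IPs and a key into one ID space, then find the responsible node.
nodes = [chord_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
print(successor(chord_id("beat it"), nodes))
```

With uniformly distributed IDs, each node is responsible for roughly an equal arc of the ring, and adding or removing a node only moves keys to or from its immediate neighbors.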
Review: consistent hashing for data partitioning and replication
[Figure: nodes A–F placed on a circular ID space from 0 to 1; hash(key1) and hash(key2) map onto points of the circle, with replication factor N = 3]
• A key is stored at its successor: the node with the next-higher ID

Identifier to node mapping example
[Figure: Chord ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]
• Node 8 maps [5, 8]
• Node 15 maps [9, 15]
• Node 20 maps [16, 20]
• …
• Node 4 maps [59, 4]
• Each node maintains a pointer to its successor

Lookup
[Figure: lookup(37) issued at node 4 travels along successor pointers until it reaches node 44, which is responsible for ID 37]
• Each node maintains its successor
• Route a packet (ID, data) to the node responsible for ID using successor pointers

Join Operation
[Figure: node 50, with succ=nil and pred=nil, joins the ring between nodes 44 and 58]
• Node with id = 50 joins the ring via node 15
• Node 50: sends join(50) to node 15
• Node 44: returns node 58
• Node 50 updates its successor to 58

Periodic Stabilize
• Node 50: runs periodic stabilize; sends stabilize(node=50) to its successor 58
• Node 58 returns its predecessor (44); node 50 then sends notify(node=50) to 58
• Node 58 updates its predecessor from 44 to 50

Periodic Stabilize
• Node 44: runs periodic stabilize; asks 58 for its predecessor (now 50)
• Node 44 updates its successor from 58 to 50

Periodic Stabilize
• Node 44 has a new successor (50)
• Node 44 sends a notify(node=44) message to node 50
• Node 50 sets its predecessor to 44

Periodic Stabilize Converges!
• This completes the joining operation: the ring is now 44 → 50 → 58 with consistent successor and predecessor pointers

Achieving Efficiency: finger tables
[Figure: finger table at node 80 on a ring with m = 7; e.g. entry 6 points to the successor of (80 + 2^6) mod 2^7 = 16, which is node 20]
  i     : 0   1   2   3   4   5   6
  ft[i] : 96  96  96  96  96  112 20
• The ith entry at the peer with id n is the first peer with id >= (n + 2^i) mod 2^m
• Each node only stores O(log N) entries
• Each lookup takes at most O(log N) hops (a routing sketch follows the last slide)

Achieving Robustness
• What if nodes FAIL?
• Ring robustness: each node maintains the k (> 1) immediate successors instead of only one successor
  – If the closest successor does not respond, substitute the second entry in its successor list
  – Unlikely all successors fail simultaneously
• Modifications to the stabilize protocol (see paper!)

Example of Chord-based storage system
[Figure: a client and servers each run a storage application over a distributed hashtable layer, which runs over Chord]

Cooperative File System (CFS)
[Figure: layered design — block storage, availability/replication, authentication, caching, consistency, server selection, and keyword search sit above the DHash distributed block store, which sits above Chord lookup]
• Powerful lookup simplifies other mechanisms

Cooperative File System (cont.)
• Block storage
  – Split each file into blocks and distribute those blocks over many servers
  – Balance the load of serving popular files
• Data replication
  – Replicate each block on k servers
  – Increase availability
  – Reduce latency (fetch from the server with the least latency)

Cooperative File System (cont.)
• Caching
  – Cache blocks along the lookup path
  – Avoid overloading servers that hold popular data
• Load balance
  – Different servers may have different capacities
  – A real server may act as multiple virtual servers, by being hashed to several different IDs

Q&A
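To complement the finger-table and lookup slides, here is a minimal single-process Python sketch of how a node could build its finger table and route find_successor in O(log N) hops. The ring membership is read off the slide's figure, and the class layout and helper names are my own illustrative assumptions, not the paper's pseudocode.

```python
# Toy, single-process model of Chord's finger-table routing (illustrative only).
M = 7                                   # number of ID bits from the finger-table slide (2^7 = 128 IDs)

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b] of the ID ring."""
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, node_id, ring):
        self.id = node_id
        self.ring = sorted(ring)        # global view, used only to build fingers in this sketch
        # finger[i] = first node with id >= (n + 2^i) mod 2^M, as stated on the slide
        self.finger = [self._successor((node_id + 2 ** i) % 2 ** M) for i in range(M)]

    def _successor(self, ident):
        for n in self.ring:
            if n >= ident:
                return n
        return self.ring[0]             # wrap around past the highest ID

    def closest_preceding_finger(self, ident):
        # Scan fingers from farthest to nearest for one strictly between us and the target.
        for f in reversed(self.finger):
            if f != ident and in_interval(f, self.id, ident):
                return f
        return self.id

    def find_successor(self, ident, nodes):
        """Route toward ident; each hop roughly halves the remaining ring distance."""
        if in_interval(ident, self.id, self.finger[0]):
            return self.finger[0]       # our immediate successor owns the ID
        return nodes[self.closest_preceding_finger(ident)].find_successor(ident, nodes)

ring = [20, 32, 45, 80, 96, 112]            # nodes assumed from the slide's figure
nodes = {n: Node(n, ring) for n in ring}
print(nodes[80].finger)                     # [96, 96, 96, 96, 96, 112, 20], matching the slide's table
print(nodes[80].find_successor(37, nodes))  # 45: the first node clockwise from ID 37 on this ring
```

The real protocol builds and repairs these tables with join, stabilize, and fix_fingers messages rather than a global view, but the routing rule is the one the slides describe: forward to the closest preceding finger until the target ID falls between a node and its successor.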