Lecture 2: Distributed Hash Table

Information search in P2P
• Suppose we have a P2P system with N nodes, and a file "F" is stored in one of them.
• How can an arbitrary node find "F" in the system?

P2P: centralized index (original "Napster" design)
1. When a peer connects, it informs the central directory server of its IP address and its content.
2. Alice queries the server for "Hey Jude".
3. Alice requests the file directly from Bob.
[Figure: Alice and Bob attached to a centralized directory server, with steps 1-3 marked on the links.]

P2P: problems with a centralized directory
• Single point of failure.
• Performance bottleneck.
• Copyright infringement: the "target" of a lawsuit is obvious.
• File transfer is decentralized, but locating content is highly centralized.

Query flooding
• Fully distributed: no central server. Used by Gnutella.
• Each peer indexes the files it makes available for sharing (and no other files).
• Overlay network: a graph with an edge between peers X and Y if there is a TCP connection between them.
  - All active peers and edges form the overlay network.
  - An edge is a virtual (not physical) link.
  - A given peer is typically connected to fewer than 10 overlay neighbors.

Query flooding (continued)
• A Query message is sent over existing TCP connections, and peers forward the Query message.
• A QueryHit is sent back over the reverse Query path.
• The file transfer itself uses HTTP.
• Scalability: limited-scope flooding.
[Figure: a Query flooding the overlay and QueryHit messages returning along the reverse path.]

Gnutella: peer joining
1. Joining peer Alice must find another peer already in the Gnutella network: she uses a list of candidate peers.
2. Alice sequentially attempts TCP connections with candidate peers until a connection is set up with Bob.
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping to his overlay neighbors (who then forward it to their neighbors, ...). Peers receiving the Ping respond to Alice with a Pong message.
4. Alice receives many Pong messages and can then set up additional TCP connections.

Hierarchical overlay
• A hybrid of the centralized-index and query-flooding approaches.
• Each peer is either a super node or assigned to a super node.
  - TCP connection between a peer and its super node.
  - TCP connections between some pairs of super nodes.
• A super node tracks the content of its children.
[Figure: ordinary peers, group-leader (super node) peers, and neighboring relationships in the overlay network.]

Distributed Hash Table (DHT)
• DHT = a distributed P2P database.
• The database holds (key, value) pairs, e.g.
  - key: social security number; value: human name
  - key: content type; value: IP address
• Peers query the database with a key; the database returns the values that match the key.
• Peers can also insert (key, value) pairs into the database.
• Finding "needles" requires that the P2P system be structured.

The principle of distributed hash tables
• A dynamic distribution of a hash table onto a set of cooperating nodes.
    Key   Value
    1     Frozen
    9     Tangled
    11    Mulan
    12    Lion King
    21    Cinderella
    22    Doraemon
• Basic service: the lookup operation, with key resolution from any node.
  [Figure: nodes A, B, C, D; a lookup(9) issued at any node resolves to node D.]
• Each node has a routing table with pointers to some other nodes, typically a constant or a logarithmic number of pointers (why?).

DHT desirable properties
1. Keys are mapped evenly to all nodes in the network.
2. Each node maintains information about only a few other nodes.
3. A key can be found efficiently by querying the system.
4. Node arrivals/departures affect only a few nodes.

Chord identifiers
• Assign an integer identifier to each peer in the range [0, 2^n - 1]; each identifier can be represented by n bits.
• Require each key to be an integer in the same range.
• To get integer keys, hash the original key, e.g., key = h("Led Zeppelin IV").
• This is why the database is called a distributed "hash" table. (A small illustrative sketch of this mapping follows below.)
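The following is a minimal sketch of how names and node addresses could be mapped onto an m-bit identifier space; it is not code from the lecture. The function name id_for, the use of SHA-1 truncated by a modulo, and the choice m = 6 are assumptions made purely for illustration.

```python
# Sketch (assumed helper, not the lecture's code): hash strings to m-bit Chord IDs.
import hashlib

M = 6  # number of identifier bits, so IDs lie in [0, 2**M - 1]

def id_for(name: str, m: int = M) -> int:
    """Hash an arbitrary string (file name, "IP:port", ...) to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

if __name__ == "__main__":
    print(id_for("Led Zeppelin IV"))    # identifier of a key
    print(id_for("198.51.100.7:8001"))  # identifier of a node, from its address
```

Both keys and nodes are hashed into the same identifier space, which is what lets the "closest peer" rule on the next slides assign every key to some peer.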
Assigning keys to peers
• Each key must be stored in some node. The central issue: assigning (key, value) pairs to peers.
• Rule: assign each key to the peer whose ID is closest to the key.
• Convention in this lecture: "closest" means the immediate successor of the key (or the peer whose ID equals the key).
• Example: 4 bits; peers 1, 3, 4, 5, 8, 10, 12, 14.
  - key = 13: successor peer = 14.
  - key = 15: successor peer = 1 (wrapping around the ring).

Chord [MIT]
• Consistent hashing (SHA-1) assigns each node and each object an m-bit ID.
• IDs are ordered on an identifier circle ranging from 0 to 2^m - 1.
• New nodes take slots on the ID circle according to their IDs.
• Key k is assigned to the first node whose ID ≥ k, written successor(k).

Consistent hashing: successor nodes
[Figure: identifier circle with m = 3 (identifiers 0-7), nodes 0, 1, and 3, and keys 1, 2, and 6; successor(1) = 1, successor(2) = 3, successor(6) = 0.]

Consistent hashing: join and departure
• When a node n joins the network, certain keys previously assigned to n's successor become assigned to n.
• When node n leaves the network, all of its assigned keys are reassigned to n's successor.
[Figures: the example circle before and after a node join (keys move from the successor to the new node) and after a node departure (the departed node's keys move to its successor).]

Consistent hashing: a larger example
• For n = 6 there are 2^6 = 64 identifiers. The DHT ring below has 10 nodes and stores 5 keys.
• The successor of key 10 is node 14.
[Figure: a 10-node ring in the 64-identifier space storing 5 keys.]

Circular DHT (1)
• Each peer is aware only of its immediate successor and predecessor.
[Figure: a ring of peers 1, 3, 4, 5, 8, 10, 12, 15.]

Circular DHT (2)
• O(N) messages on average to resolve a query when there are N peers.
• Define "closest" as the closest successor.
[Figure: a ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; the query "Who is responsible for key 1110?" is forwarded peer by peer around the ring until peer 1111 answers "I am".]

Circular DHT with shortcuts
• Each peer keeps track of the IP addresses of its predecessor, its successor, and some shortcuts.
• In the example, the query cost is reduced from 6 messages to 3 messages.
• Shortcuts can be designed so that each peer has O(log N) neighbors and each query takes O(log N) messages.
[Figure: the same ring with shortcut links; the query for key 1110 now takes 3 hops.]

Scalable key location: finger tables
• The i-th finger of node n starts at (n + 2^(i-1)) mod 2^m and points to the successor of that start, for i = 1, ..., m.
[Figure: identifier circle with m = 3, nodes 0, 1, 3, and keys 6, 1, 2 stored at nodes 0, 1, 3 respectively.]
    Node 0: start 1 -> succ. 1; start 2 -> succ. 3; start 4 -> succ. 0 (stores key 6)
    Node 1: start 2 -> succ. 3; start 3 -> succ. 3; start 5 -> succ. 0 (stores key 1)
    Node 3: start 4 -> succ. 0; start 5 -> succ. 0; start 7 -> succ. 0 (stores key 2)

Chord key location
• Look up in the finger table the furthest node that precedes the key: O(log N) hops. (A sketch of this lookup follows below.)
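Below is a minimal sketch of the finger-table lookup under simplifying assumptions: a single-process, in-memory ring where every node object is directly reachable (no RPCs, no failures). The class and method names (Node, find_successor, closest_preceding_finger) mirror the Chord paper and are not code from the lecture.

```python
# Sketch (assumed, in-memory only): Chord-style lookup using finger tables.
M = 3  # identifier bits, matching the 8-identifier example above

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.fingers = [self] * M  # finger[i] should be successor(id + 2**i)

    def find_successor(self, key):
        # If the key falls between us and our successor, the successor owns it.
        if in_interval(key, self.id, self.successor.id):
            return self.successor
        # Otherwise forward to the closest finger that precedes the key.
        return self.closest_preceding_finger(key).find_successor(key)

    def closest_preceding_finger(self, key):
        for finger in reversed(self.fingers):
            if finger.id != key and in_interval(finger.id, self.id, key):
                return finger
        return self.successor

if __name__ == "__main__":
    # Wire up the 3-node example (nodes 0, 1, 3) from the finger-table figure.
    n0, n1, n3 = Node(0), Node(1), Node(3)
    n0.successor, n1.successor, n3.successor = n1, n3, n0
    n0.fingers = [n1, n3, n0]   # successors of 1, 2, 4
    n1.fingers = [n3, n3, n0]   # successors of 2, 3, 5
    n3.fingers = [n0, n0, n0]   # successors of 4, 5, 7
    print(n0.find_successor(6).id)  # -> 0, i.e. successor(6) = 0 as in the figure
```

Each hop to the closest preceding finger at least halves the remaining identifier distance to the key, which is where the O(log N) hop bound comes from.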
Peer churn
• To handle peer churn, require each peer to know the IP addresses of its two successors.
• Each peer periodically pings its two successors to check that they are still alive.
• This is a limited solution, adequate only for a single join or a single failure at a time.

Node joins and stabilization
• The most important piece of state is the successor pointer. If the successor pointers are kept up to date, lookups are guaranteed to be correct, and the finger tables can always be verified and repaired.
• Each node runs a "stabilization" protocol periodically in the background to update its successor pointer and finger table.
• The stabilization protocol consists of six functions: create(), join(), stabilize(), notify(), fix_fingers(), and check_predecessor().
• When node n first starts, it calls n.join(n'), where n' is any known Chord node. The join() function asks n' to find the immediate successor of n. (A minimal sketch of this loop appears at the end of these notes.)

Node joins: stabilize()
• Each time node n runs stabilize(), it asks its successor for the successor's predecessor p and decides whether p should be n's successor instead.
• stabilize() also notifies n's successor of n's existence, giving the successor the chance to change its predecessor to n. The successor does so only if it knows of no closer predecessor than n.

Node joins: join and stabilization, step by step
• n joins: its predecessor is nil, and it acquires ns as its successor via some node n'.
• n runs stabilize: n notifies ns that it is the new predecessor, so ns acquires n as its predecessor.
• np runs stabilize: np asks ns for its predecessor (now n), acquires n as its successor, and notifies n; n then acquires np as its predecessor.
• At this point all predecessor and successor pointers are correct. The fingers still need to be fixed, but the old fingers will still work.
[Figure: nodes np, n, ns on the ring; initially succ(np) = ns and pred(ns) = np, and after stabilization succ(np) = n and pred(ns) = n.]

Node failures
• The key step in failure recovery is maintaining correct successor pointers.
• To help achieve this, each node maintains a successor list of its r nearest successors on the ring.
• If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor.
• Successor lists are stabilized as follows: node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it.

Handling failures: redundancy
• Each node knows the IP addresses of the next r nodes.
• Each key is replicated at the next r nodes.

Evaluation results (10,000-node network)
[Figures: load distribution (probability density function); failure rate; path length; failed lookups vs. churn rate, starting with 500 nodes.]

Chord's main problem
• Chord does not handle churn well; it merely achieves "correctness".
• A "correct" Chord only requires each node to maintain its predecessor and successor, which lets a query eventually arrive at the key's location, but the lookup may take O(N) hops, not the O(log N) the original design claimed.
• There is no good solution for keeping the finger tables both scalable and consistent under churn, so Chord is not practical for highly dynamic P2P systems.
• A paper on achieving high consistency: Simon S. Lam and Huaiyu Liu, "Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation," Proceedings of ACM SIGMETRICS 2004.

Chord problem 2
• Chord only supports exact search; it cannot support range search or approximate search.

BitTorrent's solution
• Maintain the trackers (servers) as a DHT; trackers are more reliable than ordinary peers.
• Users query the trackers to get the locations of a file.
• The file sharing itself is not structured.

DHT in a cloud
• Architecture: servers are hosted in a cloud, data are distributed among the servers, and the user is a device outside the cloud.
• The user sends a query for a key (a webpage, a file, a data item, etc.) to the cloud.
• The query first arrives at an arbitrary server and is routed among the servers using the DHT until it reaches the server that holds the data.
• That server replies to the user.

End of Lecture 2
• Next paper (read it and write a review): Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris, "Vivaldi: A Decentralized Network Coordinate System," Proceedings of ACM SIGCOMM 2004.
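Sketch referenced in the "Node joins and stabilization" slides above: a minimal, single-process illustration of create()/join()/stabilize()/notify(). The assumptions are not from the lecture: no networking, no failures, and fix_fingers()/check_predecessor() omitted, since correctness here depends only on the successor pointers.

```python
# Sketch (assumed, in-memory only): Chord's join/stabilize/notify loop.
def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if requested."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)

class ChordNode:
    def __init__(self, node_id):
        self.id = node_id
        self.predecessor = None
        self.successor = self

    def create(self):
        # Start a new ring containing only this node.
        self.predecessor = None
        self.successor = self

    def join(self, known):
        # Join the ring that 'known' belongs to; only the successor is set here.
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def find_successor(self, key):
        # Correct but slow O(N) walk along successor pointers
        # (finger tables, omitted here, would make this O(log N)).
        if in_interval(key, self.id, self.successor.id, inclusive_right=True):
            return self.successor
        return self.successor.find_successor(key)

    def stabilize(self):
        # Adopt the successor's predecessor if it sits between us and the successor,
        # then tell the successor about ourselves.
        x = self.successor.predecessor
        if x is not None and in_interval(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        # 'other' believes it might be our predecessor; accept it if it is closer.
        if self.predecessor is None or in_interval(other.id, self.predecessor.id, self.id):
            self.predecessor = other

if __name__ == "__main__":
    a = ChordNode(10); a.create()
    b = ChordNode(40); b.join(a)
    c = ChordNode(25); c.join(a)
    for _ in range(3):                   # a few stabilization rounds
        for n in (a, b, c):
            n.stabilize()
    print([n.successor.id for n in (a, b, c)])  # -> [25, 10, 40]
```

After a few rounds every node's successor and predecessor pointers are correct, matching the slides' point that up-to-date successor pointers alone are enough to guarantee that lookups eventually succeed.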