Routing and Location in P2P Networks
Klaus Marius Hansen, University of Aarhus, 2003/09/23

Routing and location in previously introduced systems. Routing in structured overlay networks: Pastry and Chord.

Outline
- Routing and location in introduced P2P systems
- Pastry
- Chord
- Summary

Material
- Previous systems introduced in the course
- (Rowstron & Druschel 2001) Rowstron, A. & Druschel, P. (2001), Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems, in IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), pp. 329-350.
  P2P routing and location in a structured overlay network, taking network locality into account.
- (Stoica, Morris, Liben-Nowell, Karger, Kaashoek, Dabek & Balakrishnan 2003) Stoica, I., Morris, R., Liben-Nowell, D., Karger, D. R., Kaashoek, M. F., Dabek, F. & Balakrishnan, H. (2003), Chord: A scalable peer-to-peer lookup protocol for internet applications, IEEE/ACM Transactions on Networking 11(1), 17-32.
  Algorithms for location in P2P networks, with provable correctness and performance.

Routing and Location in Introduced P2P Systems

Background
- Routing: the process of moving a data packet to a location
- Central questions:
  - How do we efficiently locate a peer?
  - Once a peer is located, how do we efficiently route messages to and from that peer?
- ...all, of course, in the context of no global network knowledge and frequent joins and leaves by peers...
A Traceroute Example

Indexed
- Using index servers for location, IP for routing
- Examples: Napster, SETI@home, ICQ
Walking/flooding
- (Unstructured) walks based on neighbor sets
- Examples: Gnutella, Kazaa (hybrid), (JXTA)
Key proximity
- Route based on (unstructured) narrowing down of the difference in keys
- Examples: Freenet, Windows P2P

Weaknesses
- Single point of failure for indexed systems
- Potentially low performance for walking/flooding
- Hard to prove correctness/performance/space requirements for the routing protocols

Pastry

Pastry Overview
- Effective, distributed object location and routing substrate for P2P networks
  - "Effective": O(log N) routing hops
  - "Distributed": no servers; routing and location are distributed to nodes, each holding only limited knowledge (routing tables of size O(log N))
  - "Substrate": not an application itself; rather, it provides an Application Programming Interface (API) to be used by applications
- Runs on all nodes joined in a Pastry network
- Each node has a unique identifier (nodeId)
- Given a key and a message, Pastry routes the message to the node with nodeId numerically closest to the key
- Takes network locality into account

Pastry API
- Pastry exports:
  - nodeId = pastryInit(Credentials, Application): makes the local node join/create a Pastry network. Credentials are used for authorization; an object used for callbacks is passed through the Application parameter
  - route(msg, key): routes a message to the live node with nodeId numerically closest to the key (at the time of delivery)
- Application interface to be implemented by applications using Pastry:
  - deliver(msg, key): called on the application at the destination node for the given key
  - forward(msg, key, nextId): invoked on applications when the underlying node is about to forward the given message to the node with nodeId = nextId
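The API contract above can be illustrated with a toy, single-process sketch (hypothetical names, not the FreePastry API): route() delivers a message to the node whose nodeId is numerically closest to the key. Real Pastry reaches that node in O(log N) hops over a 128-bit circular id space; the toy picks it directly from a global table and uses plain absolute difference.

```python
# Toy sketch of the Pastry API shape (hypothetical names; not FreePastry).
# A single-process "network" maps nodeIds to applications; route() delivers
# the message to the node with nodeId numerically closest to the key.

class Application:
    """Callback interface implemented by applications using the substrate."""
    def deliver(self, msg, key):
        pass
    def forward(self, msg, key, next_id):
        pass

class ToyPastryNetwork:
    def __init__(self):
        self.nodes = {}  # nodeId -> Application

    def pastry_init(self, node_id, app):
        """Join the toy network with the given nodeId and callback object."""
        self.nodes[node_id] = app
        return node_id

    def route(self, msg, key):
        # Real Pastry routes hop by hop in O(log N) steps; here we select
        # the numerically closest nodeId directly to illustrate the contract.
        dest = min(self.nodes, key=lambda n: abs(n - key))
        self.nodes[dest].deliver(msg, key)
        return dest
```

For example, with nodes 10, 20 and 30 joined, routing a message with key 24 delivers it at node 20.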
(Actually, using the FreePastry 1.3 open source Java implementation is slightly more involved.)

Assumptions and Guarantees
- Each node is assigned a 128-bit nodeId
  - nodeIds are assumed to be uniformly distributed in the 128-bit id space => numerically close nodeIds belong to diverse nodes
  - This can be achieved, e.g., by using a cryptographic hash of a node's IP address
- Pastry can route to the numerically closest node in ceiling(log_{2^b}(N)) steps (b is a configuration parameter)
- If fewer than |L|/2 nodes with adjacent nodeIds fail concurrently (|L| is a configuration parameter), eventual delivery is guaranteed
- Join and leave in O(log N)
- Maintains locality based on an application-defined scalar proximity metric

Example Applications
- SCRIBE: group communication/event notification
  - Groups can be created and joined; members of a group may multicast messages to all members (delivered using best effort)
  - Each group has a unique id, groupId (from a hash of the group name and the creator's name)
  - The node with nodeId numerically closest to groupId acts as rendezvous for the group
    - Group creation is handled by sending a CREATE message to the node with id groupId
    - Nodes wishing to join send a JOIN message to this node
    - To send a message, a node sends a MULTICAST message to the rendezvous
  - In principle, the rendezvous might then simply send messages to all joined nodes (SCRIBE actually builds a multicast tree rooted in the rendezvous as an optimization)
- PAST: archival storage
  - Each inserted file gets a 160-bit fileId (from a hash of the file name, the owner's public key, and a random salt)
  - Pastry routes the file to the k nodes that are numerically closest to the first 128 bits of the fileId
  - Lookup ensures that the file is found as long as 1 of the k nodes is alive
- SQUIRREL: co-operative web caching
- SplitStream: high-bandwidth content distribution
- ...
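Deriving a nodeId from a cryptographic hash of the IP address, as suggested above, can be sketched as follows (SHA-1 truncated to 128 bits is one plausible choice; the slides do not fix a specific hash function):

```python
# Sketch: a 128-bit nodeId from a cryptographic hash of a node's IP address.
# SHA-1 (160 bits) truncated to 128 bits is one choice, matching PAST's use
# of SHA-1-sized fileIds; the exact hash function is an assumption here.
import hashlib

ID_BITS = 128

def node_id(ip_address: str) -> int:
    digest = hashlib.sha1(ip_address.encode()).digest()
    # Keep the first 128 bits of the 160-bit digest.
    return int.from_bytes(digest[:ID_BITS // 8], "big")
```

Because the hash output is (approximately) uniform, numerically adjacent nodeIds tend to belong to unrelated, geographically diverse nodes, which is exactly the property the leaf set relies on.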
Routing Table

Routing Example

Pastry Routing Algorithm

Pastry Routing Algorithm - Analysis
- Observation: either case 1), 2), or 3) must hold
  - For 3): the leaf set must contain nodes numerically closer to the key with the same shared prefix as us (otherwise, we are the closest node) - unless |L|/2 nodes in the leaf set have failed... simultaneously
- Termination:
  1) Routing terminates directly at the chosen node
  2) The node routed to shares a longer prefix with the key
  3) The node routed to shares a prefix of the same length but is numerically closer to the key
- (Expected) performance:
  1) The destination is one hop away
  2) The set of possible nodes with a longer prefix match is reduced by a factor of 2^b
  3) Only one extra routing step is needed (with high probability)
- Given accurate routing tables, the probability of case 3) is the probability that a node with the given prefix does not exist and that the key is not covered by the leaf set => expected performance is O(log N)

Self-Organization - Node Arrival
- A new node, X, needs to know an existing, nearby node, A (this can be achieved using, e.g., multicast on the local network)
- X asks A to route a "join" message with key equal to X's nodeId
  - Pastry routes this message to the node Z with nodeId numerically closest to X
  - All nodes en route to Z return their state to X
- X updates its state based on the returned state:
  - its neighborhood set is the neighborhood set of A
  - its leaf set is based on the leaf set of Z (since Z has the nodeId closest to that of X)
  - the rows of its routing table are initialized from the rows of the routing tables of the nodes visited en route to Z (since these share common prefixes with X)
- X calibrates its routing table and neighborhood set based on data from the nodes referenced therein
- X sends its state to all the nodes mentioned in its leaf set, routing table, and neighborhood set
- O(log_{2^b}(N)) messages exchanged

Self-Organization - Node Departure
- Assumption: a node that can no longer be communicated with has failed
- Repair of the leaf set:
  - Contact the live node with the largest index on the side of the failed node and get the leaf set from that node
  - The returned leaf set will contain an appropriate node to insert
  - Contacting works unless |L|/2 nodes with adjacent nodeIds have failed
- Repair of the routing table:
  - Contact another node on the same row to check whether it has a replacement node (the contacted node may have a replacement on the same row of its routing table)
  - If not, contact a node on the next row of the routing table
- Repair of the neighborhood set:
  - The neighborhood set is normally not used in routing => contact its members periodically to check for liveness
  - If a neighbor is not responding, check with other neighbors for close nodes

Locality
- Routing performance rests on a small number of routing hops - and on "good" locality of routing with respect to the underlying network
- Scalar proximity metric (e.g., number of IP routing hops, geographical distance, or available bandwidth)
  - Applications are responsible for providing proximity metrics
- The join protocol maintains the closeness invariant

Handling Malicious Nodes?
- Choose randomly between nodes satisfying the criteria of the routing protocol...
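The three cases of the routing algorithm analysed above can be sketched as a single routing-step function. This is a simplified toy (small ids, a linear rather than circular leaf-set range check, and a flat routing-table layout that is an assumption, not FreePastry's): case 1 delivers within the leaf set, case 2 forwards via the routing-table entry matching one more digit, case 3 falls back to any known node with an equally long prefix that is numerically closer.

```python
# Simplified sketch of Pastry's per-hop routing decision (toy layout, not
# FreePastry). Ids are DIGITS digits in base 2^B; real Pastry uses 128/b
# digits and a circular id space.
B = 2                 # configuration parameter b
BASE = 2 ** B
DIGITS = 8            # toy: 16-bit ids

def to_digits(x):
    return [(x >> (B * (DIGITS - 1 - i))) & (BASE - 1) for i in range(DIGITS)]

def shared_prefix_len(a, b_):
    da, db = to_digits(a), to_digits(b_)
    n = 0
    while n < DIGITS and da[n] == db[n]:
        n += 1
    return n

def next_hop(node_id, key, routing_table, leaf_set):
    """routing_table[row][digit] -> nodeId or None; leaf_set: known nodeIds."""
    nodes = leaf_set + [node_id]
    if min(nodes) <= key <= max(nodes):
        # Case 1: key within leaf-set range -> deliver at numerically closest.
        return min(nodes, key=lambda n: abs(n - key))
    row = shared_prefix_len(node_id, key)
    entry = routing_table[row][to_digits(key)[row]]
    if entry is not None:
        # Case 2: forward to a node sharing one more digit with the key.
        return entry
    # Case 3: forward to any known node with at least as long a shared
    # prefix that is numerically closer to the key than we are.
    known = [e for r in routing_table for e in r if e is not None] + leaf_set
    cands = [n for n in known
             if shared_prefix_len(n, key) >= row and abs(n - key) < abs(node_id - key)]
    return min(cands, key=lambda n: abs(n - key)) if cands else node_id
```

For example, node 100 with leaf set {90, 110} delivers key 104 locally (case 1), while node 0 forwards a key starting with digit 3 via its routing-table entry for that digit (case 2).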
Experimental Results - Routing Performance

Experimental Results - Routing Distance

Experimental Results - Quality of Routing Tables

Summary
- Pastry is a P2P content location and routing substrate
  - A structured overlay network
  - Usable for building various P2P applications
- Space and time requirements (expected) in O(log N), N = number of nodes in the network
- Takes locality into account

Chord

Overview
- One operation: IP address = lookup(key). Given a key, find the node responsible for that key
- Goals: load balancing, decentralization, scalability, availability, flexible naming
- Performance and space usage:
  - Lookup in O(log N)
  - Each node needs information about O(log N) other nodes

Example Applications
- Cooperative File System (CFS)
  - Builds a distributed hash table on top of Chord (DHash)
  - Blocks are stored using DHash; lookup uses Chord
- Distributed indices
  - Derive keys from desired keywords
  - Let values be servers holding documents matching the desired keywords

Use of Consistent Hashing in Chord (1)
- Keys are assigned to nodes with consistent hashing
  - The hash function balances load
  - Rebalancing (when a node joins or leaves) requires moving only an O(1/N) fraction of the keys
- Nodes and keys are assigned m-bit identifiers
  - Using SHA-1 on nodes' IP addresses and on keys
  - m should be big enough to make collisions improbable
- "Ring-based" assignment of keys to nodes
  - Identifiers are ordered on an identifier circle modulo 2^m
  - A key k is assigned to the first node n whose identifier is equal to or follows k: n = successor(k)
- Chord improves on consistent hashing by requiring knowledge about only O(log N) other nodes at each node

Use of Consistent Hashing in Chord (2)

Use of Consistent Hashing in Chord (3)
- Designed to let nodes enter and leave the network easily
  - Node n leaves: all of n's assigned keys are assigned to successor(n)
  - Node n joins: keys k <= n previously assigned to successor(n) become assigned to n
- Compare "traditional hashing", e.g., h(x) = ax + b (mod p), in which p changes...
- Example: node 26 joins => key 24 becomes assigned to node 26
- (Each physical node runs a number of virtual nodes, each with its own identifier, to balance load)

Simple Key Location
- Simple key location (each node knows only its successor) can be implemented in time O(N) and space O(1)
- Example: node 8 performs a lookup for key 54

Scalable Key Location (1)
- Uses finger tables: n.finger[i] = successor(n + 2^(i-1)), 1 <= i <= m

Scalable Key Location (2)
- If the successor is not known directly, search the finger table for the node n' whose id most immediately precedes the sought id
  - Rationale: of all nodes in the finger table, n' will know the most about the part of the identifier circle just before the sought id

Scalable Key Location (3)
- Performance is O(log N) with high probability
  - Each node can forward a query at least halfway along the remaining distance => fewer than m steps to find the node
  - After 2 log N steps, the remaining distance is at most 2^m/2^(2 log N) = 2^m/N^2 - and the probability that two nodes lie within such an interval is 1/N, i.e., negligible
- Space required is O(log N) with high probability
  - As above: for i <= m - 2 log N, the i'th finger of a node will be the node's immediate successor with high probability

Self-Organization - Node Failures
- Chord maintains successor lists to cope with node failures
- A node leaving could be viewed as a failure
  - If a node leaves voluntarily, it may notify its successor and predecessor

Experimental Results - Path Length

Summary
- Decentralized lookup of the nodes responsible for storing keys
- Based on distributed, consistent hashing
- Performance and space in O(log N) for stable networks
- Simple; provable performance and correctness

Summary

Summary
- "First generation" routing and location in P2P networks
  - Largely application-specific
  - Hard to analyse
- "Second generation" routing and location in P2P networks
  - Based on structured network overlays
  - Typically expected O(log N) time and space requirements
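As a concrete recap of Chord's scalable key location, the definitions above (m-bit ids on a circle modulo 2^m, finger[i] = successor(n + 2^(i-1)), routing via the closest preceding finger) can be sketched as a toy, single-process ring. Computing finger tables from a global node list is a simplification; a real Chord node maintains them incrementally with no global knowledge, and joins and failures are not handled here.

```python
# Toy, single-process sketch of Chord's scalable key location.
# Ids live on a circle modulo 2^M; finger[i] = successor(n + 2^(i-1)).
M = 6  # toy id-space size: ids in [0, 2^6)

def in_interval(x, a, b):
    """True if x lies in the half-open circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps around 0

def in_open(x, a, b):
    """True if x lies in the open circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)

    def successor(self, k):
        """First node whose id is equal to or follows k on the circle."""
        k %= 2 ** M
        for n in self.nodes:
            if n >= k:
                return n
        return self.nodes[0]  # wrap around the circle

    def _next_node(self, n):
        return self.nodes[(self.nodes.index(n) + 1) % len(self.nodes)]

    def fingers(self, n):
        # Computed globally here for simplicity; real nodes maintain these.
        return [self.successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

    def find_successor(self, n, key):
        """Route from node n by hopping to the closest preceding finger
        until the key falls between a node and its immediate successor."""
        key %= 2 ** M
        while not in_interval(key, n, self._next_node(n)):
            cands = [f for f in self.fingers(n) if in_open(f, n, key)]
            if cands:
                # Closest preceding finger: furthest along the circle from n.
                n = max(cands, key=lambda f: (f - n) % 2 ** M)
            else:
                n = self._next_node(n)  # fall back to the plain successor
        return self._next_node(n)
```

With the ring {1, 8, 14, 21, 32, 38, 42, 48, 51, 56} used in the Chord paper's running example, a lookup for key 54 started at node 8 hops 8 -> 42 -> 51 and returns node 56, i.e., each hop roughly halves the remaining circular distance.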