An Exploration of Algorithms Used in Message Routing and Node Addressing in Peer-to-Peer Networks

Abstract

A peer-to-peer network is a network with completely decentralized control. Each node in the network assumes equal responsibilities and capabilities. Because of this, peer-to-peer networks have the potential to significantly reduce the impact of DoS attacks, and to self-organize and adapt while remaining scalable. These properties are made possible by the addressing and routing algorithms used and by properties inherent in the network. In this project I aim to design, implement and analyze alternative addressing and routing models for Pastry, a scalable distributed object location and routing substrate for peer-to-peer applications, and to evaluate the efficiency of these algorithms with regard to the network’s scalability, robustness and adaptability.

Author: Adrian Gasparini
Date: 20th August 2003
Supervisors: Katrina Falkner and Dave Munro

1. Introduction

It has been estimated that among all the Internet-connected PCs in the world, there exists an “aggregate ten billion MHz of processing power and ten thousand terabytes of storage, assuming only 100 million PC’s among the net’s 300 million users, and only a 100MHz chip and 100 megabyte drive in the average PC.” [3] The potential to exploit this resource is currently limited by how well any system that tries to harness it can scale. With Internet routing tables growing steadily since the Internet’s birth, attention has recently focused on peer-to-peer networks, which offer a novel way to reduce the amount of global knowledge each node requires.

A peer-to-peer network is a completely decentralized network. Each node in such a network has equal responsibilities and capabilities. Each node behaves as both a client and a server, which reduces the network’s dependence on the availability of centralized servers.
The peer-to-peer network is regarded as an overlay network whose underlying nodes are untrusted and are assumed liable to fail silently or to act maliciously. For this reason, much of the data within the overlay is replicated to ensure availability.

The potential benefits of peer-to-peer networks are considerable. One is global collaboration between interested parties. For example, in a project or research effort that spans the globe, a large-scale peer-to-peer distributed persistent storage application such as OceanStore [1] gives groups in different geographic locations access to the most current data.

Peer-to-peer networks also allow the use of spare computing resources in an organization’s network. An organization with large-scale computing needs can distribute its large computational jobs over multiple computers, taking advantage of idle resources in its existing infrastructure. An example would be sending a large computation over a peer-to-peer network spanning the globe, such that the part of the network involved in the computation is operating between 8pm and 8am local time, when the computing resources of a corporate network are largely idle. The computation then does not degrade the organization’s networks and computing resources during the more productive hours of 9am to 5pm.

The efficiency and scalability of a peer-to-peer overlay application that provides persistent storage and computation rely on the design, implementation and locality properties of the network, and on the addressing and routing protocols needed to route messages quickly and efficiently. Many different families of routing protocols exist today.
The most widely known family consists of protocols based on Plaxton’s algorithm for distributed searching [3]. These routing algorithms tend to be slow but reliable. Another family of routing algorithms uses probabilistic methods for data location. An example is the use of Attenuated Bloom filters in the initial stage of data location in the OceanStore system [1]. These algorithms locate data quickly, but they produce a certain percentage of false positives, which makes them unreliable. However, all peer-to-peer routing protocols have the same aim: to route messages between nodes efficiently and reliably over an untrusted underlay network such as the Internet.

The underlying peer-to-peer network infrastructure is difficult to abstract over. Underlying nodes are not trusted, and can be separated by many thousands of kilometers. Routing protocols are therefore integral to the viability of the system. They must deal with three fundamental problems of peer-to-peer networks: scalability, robustness and adaptability.

In this project, we will design alternative routing mechanisms for the Pastry system, aiming to improve its efficiency, with particular attention to scalability, robustness and adaptability.

2. Literature Review

2.1 Historical Background

Interest in peer-to-peer networks began with the creation of Napster, a peer-to-peer network overlay that enabled its users to share files with each other. Napster attracted a lot of attention, but for all the wrong reasons. Its users traded music files, which raised copyright issues over the music that was exchanged. Napster was also not a true peer-to-peer network, because it was not completely decentralized. File exchange occurred between peers, but only after a file/peer lookup at a centralized server where the file indexes were stored [3].
Other music-swapping peer-to-peer overlays that are completely decentralized, such as Gnutella and FreeNet, have been created since Napster, but these overlays are not without problems of their own.

Since Napster, the distributed computing community has been actively researching possible applications of peer-to-peer networks, and many have now been created. In the following paragraphs, I describe Pastry and then categorize some of the systems and applications whose routing protocols differ from Pastry’s.

2.2.1 Pastry: Based on the Plaxton routing mechanism

Pastry [4] is a member of the Plaxton family of algorithms. It is a self-organizing overlay of network nodes that routes messages from node to node based on a nodeId and a key. The nodeId is a 128-bit node identifier, assigned when a node joins the system. NodeIds are assigned randomly with a uniform distribution, and can be based on a cryptographic hash of the node’s public key or IP address. Random assignment of this kind makes it likely that nodes with adjacent nodeIds are diverse in geography, ownership, jurisdiction and so on. This property is needed to keep the network available during proximity-related failures.

Each Pastry node stores some state about the network around it. Each node stores a leaf set, L, containing the nodes whose nodeIds are numerically closest to its own. Half of the leaf set have nodeIds larger than the node’s own, and half have smaller nodeIds. Each Pastry node also stores a routing table to enable more efficient routing. Each routing table entry holds the nodeId and IP address of a node whose nodeId shares a prefix with the current node’s own nodeId; where several nodes qualify, the closest one is kept. Finally, each Pastry node stores a set of its M closest nodes according to some proximity metric, such as round-trip time or IP hops.
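
As a rough illustration of the state described above, the following Python sketch derives a 128-bit nodeId from a node’s identity, splits it into base-2^b digits, and selects a leaf set from a list of known nodeIds. The use of SHA-1 truncated to 128 bits, the value b = 4, and all function names are assumptions made for this example and are not taken from Pastry’s implementation. For simplicity the sketch also ignores the wrap-around of the circular nodeId space.

```python
import hashlib

B = 4                   # bits per digit (the configuration parameter b); value assumed
ID_BITS = 128           # nodeId length given in the text
DIGITS = ID_BITS // B   # number of base-2^b digits in a nodeId


def node_id_from(identity: bytes) -> int:
    """Derive a 128-bit nodeId from a public key or IP address (hash choice assumed)."""
    digest = hashlib.sha1(identity).digest()              # 160-bit digest
    return int.from_bytes(digest[: ID_BITS // 8], "big")  # keep the first 128 bits


def digit(node_id: int, i: int) -> int:
    """Return the i-th base-2^b digit of an identifier, most significant first."""
    shift = (DIGITS - 1 - i) * B
    return (node_id >> shift) & ((1 << B) - 1)


def shared_prefix_len(a: int, b: int) -> int:
    """Count the leading base-2^b digits that two identifiers have in common."""
    n = 0
    while n < DIGITS and digit(a, n) == digit(b, n):
        n += 1
    return n


def leaf_set(own_id: int, known_ids: list[int], size: int) -> list[int]:
    """Pick the size/2 closest smaller and size/2 closest larger nodeIds (no wrap-around)."""
    smaller = sorted((i for i in known_ids if i < own_id), reverse=True)[: size // 2]
    larger = sorted(i for i in known_ids if i > own_id)[: size // 2]
    return sorted(smaller + larger)
```

The shared prefix length is the quantity on which prefix-based routing tables are organized, which is why it is computed digit by digit rather than bit by bit.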
The nodes in this proximity-based set are not generally used in routing, but are useful in maintaining locality properties.

2.2.2 Pastry routing procedure

The Pastry routing procedure is as follows [4]. Given a message, a node first tries to route the message within its leaf set. If the key is not covered by the leaf set, the node tries to forward the message to a routing table entry whose nodeId shares a prefix with the key that is at least one digit longer than the prefix the key shares with the current node’s nodeId. If there is no such routing table entry, the node forwards the message to a node whose nodeId shares a prefix with the key at least as long as the current node’s does, but is numerically closer to the key than the current node’s nodeId.

This procedure gives an expected number of routing steps of O(log_{2^b} N), where N is the number of nodes and b is a configuration parameter. This is because each of the three cases forwards the message closer to its destination. In case 1, where the message is routed to a leaf node, the message is within one hop of its destination, as nodes are in direct contact with their leaf sets. In case 2, where the routing table is used, each forward reduces the set of possible destination nodes by a factor of 2^b, so the destination is reached in O(log_{2^b} N) steps. In case 3, the message is more than one hop away, because no node with the desired prefix is known to the current node, which indicates that such a node has failed or does not exist. This case is rare, because the uniform distribution of nodeIds reduces the likelihood of the network partitioning.

2.3 OceanStore: a probabilistic routing mechanism

OceanStore [1] is a distributed peer-to-peer persistent storage application. Each persistent object in OceanStore is associated with at least one globally unique identifier (GUID) [1]. These persistent objects are replicated on multiple servers to improve their availability in the face of network attacks, failures and so on.
These objects are free to reside on any server within the OceanStore overlay. To support location-independent addressing, OceanStore messages are labeled with a destination GUID, a random number, and a small predicate. The goal of routing is to deliver the message to the closest node that matches the predicate and has the desired GUID. Routing has two phases: in the first, the message is routed from peer to peer until a destination is discovered; in the second, messages are routed directly from the source to the destination.

This method of data location and routing has a number of advantages. Locating the destination through multiple nodes enables fault-tolerant routing, lessening the impact of malicious or failed nodes on the availability of objects. Also, once the destination has been found, messages are routed directly to it, possibly cutting down the number of hops per message.

Routing in OceanStore is based on two mechanisms. A fast probabilistic algorithm using Attenuated Bloom filters attempts to locate objects near the node that initiates contact with the object. This is backed up by a slower but more reliable hierarchical routing mechanism based on Plaxton’s algorithm [3]. The reasoning behind the two mechanisms is that objects used frequently by a node are most likely to be cached near that node, in which case the fast probabilistic algorithm locates the object quickly. When the probabilistic method fails to locate the object, the slower but reliable hierarchical algorithm is used.

The probabilistic algorithm uses Attenuated Bloom filters to locate objects [1]. It is based on a hill-climbing algorithm. The basic idea is that if a node cannot satisfy a query, it uses local information to route the query to a neighbor that is likely to be able to answer it.
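
To make the idea of a probabilistic membership test concrete, the following Python sketch implements a plain Bloom filter and uses one filter per neighbor to choose where to forward a query. It is a simplification and not OceanStore’s implementation: as I understand it, the attenuated variant keeps several such filters per outgoing link, one for each hop distance, which is not shown here. The class and function names, the filter size m, the hash count k, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Iterator, Optional


class BloomFilter:
    """Fixed-size Bloom filter; m (bits) and k (hash count) are illustrative values."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions by salting a single hash function with an index.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> p) & 1 for p in self._positions(item))


def pick_neighbor(filters: dict[str, BloomFilter], guid: bytes) -> Optional[str]:
    """Return the first neighbor whose filter claims the GUID, or None to fall back."""
    for name, bloom in filters.items():
        if bloom.might_contain(guid):
            return name
    return None
```

A result of None corresponds to the case where the probabilistic method fails and the slower hierarchical mechanism has to take over. A non-None result can still be a false positive, which is the source of the unreliability noted earlier.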
An Attenuated Bloom filter is used to decide which neighbor is the likely one.

The hierarchical routing method uses a randomized hierarchical distributed data structure, like that used in Pastry, in which multiple trees are embedded in the network. Each tree represents a directory structure specified by the clients of the system, and each tree has a root node. The links between the root node and sub-directories are determined by differences in the GUIDs of different nodes. Each object is mapped to the single node whose nodeId matches the object’s GUID in the most bits, starting from the least significant. This is the object’s root node. Routing a message is then a matter of climbing the tree from the initiating node until the desired object’s root node (or a pointer to a replica) is encountered, and then routing directly to the object. Because of this data structure, routing is achieved in O(log N) steps [1].

2.4 Kademlia: A simple routing mechanism based on the XOR metric

Another routing protocol we will look at is the one employed in Kademlia [2]. It is like Pastry in that it is hierarchical, but it reduces the number of configuration messages nodes must send to learn about each other. Any configuration information a node needs about its surroundings is gathered from key lookups and query resolution via piggybacking: a node includes its nodeId in every message it transmits, so that recipients can record the node’s existence if they have not already done so.

Kademlia uses an XOR metric for the distance between points in the key space: the distance between two identifiers is their bitwise exclusive or, interpreted as an integer. For each i, a Kademlia node keeps a list of {IP_address, UDP_port, Node_id} triples for nodes at a distance between 2^i and 2^(i+1) from itself. These lists are called k-buckets. Each k-bucket is kept sorted by time last seen, least recently seen first, and is limited to size k.
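
The XOR metric and the mapping from a distance to a k-bucket can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this example; identifiers are represented as plain integers.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers: their bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the k-bucket covering distances d with 2^i <= d < 2^(i+1)."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in a k-bucket")
    return d.bit_length() - 1
```

For example, identifiers 0b1010 and 0b0110 are at distance 0b1100 = 12, which lies between 2^3 and 2^4, so each node would place the other in its bucket with index 3. The metric is symmetric, so both nodes compute the same index.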
When a Kademlia node receives any message from another node, it updates the appropriate k-bucket for the sender’s nodeId. If the sender is already in the k-bucket, it is moved to the most recently seen position. If it is not, and the k-bucket is not full, it is added. If the k-bucket is full, the node sends a PING to the least recently seen node to decide whether to evict that node’s entry. If there is no response, the entry is evicted and the sender is added in its place. Otherwise, the least recently seen node is moved to the most recently seen position, and the sender’s entry is discarded.

The Kademlia protocol consists of four RPCs: PING, which probes a node to see whether it is online; STORE, which instructs a node to store a {key, value} pair; FIND_NODE, which takes a key as an argument and returns triples for the nodes the recipient knows of that are closest to that key; and FIND_VALUE, which behaves like FIND_NODE except that, if the recipient has stored a value for the key, it simply returns the value.

In Kademlia, dedicated configuration messages are not needed. The k-buckets are kept up to date by piggybacking node existence on each message being routed. This property allows failed nodes to be detected without resorting to network timeouts. The actual routing mechanism is similar to Pastry’s in that it is hierarchical, but it is based on a different distance metric.

Pastry requires O(log_{2^b} N) messages for a node joining the overlay, periodic keep-alive messages between nodes in each other’s leaf sets, and further communication and timeout delays in the face of node failures [4]. We propose that by employing some aspects of Kademlia’s routing mechanisms in Pastry, the number of configuration messages Pastry needs could be reduced.

3. Project Proposal

The aim of this project is to increase the efficiency of Pastry’s existing protocols, while keeping in mind issues such as network scalability, robustness in the face of node failures, and the ability of the network to adapt to patterns of data location.
More specifically, we aim to increase the efficiency of the existing Pastry routing protocols by implementing a Bloom filter to enable quick data location; by piggybacking network configuration messages on routed messages, as Kademlia does; and by building applications on top of Pastry to assess the new algorithms, while keeping in mind the problems associated with peer-to-peer networks, such as scalability, robustness and adaptability.

3.1 Problems with peer-to-peer networks

3.1.1 Scalability

The biggest problem associated with peer-to-peer networks is scaling efficiently. To be practical, peer-to-peer overlay networks must have fast data location and routing mechanisms. OceanStore’s two-phase process, in which Attenuated Bloom filters locate objects close to the requesting node, helps here: it optimizes the location and routing of the most commonly requested objects. Coupled with its migrating cache mechanism, this gives OceanStore very efficient location and routing for commonly accessed data. In this project, the use of Bloom filters will be evaluated in conjunction with Pastry’s routing mechanisms to see whether it offers any gains in efficiency.

Kademlia’s piggybacking mechanism also reduces the amount of communication needed to keep routing table data consistent and current. This mechanism could be incorporated into Pastry to reduce the number of communication messages needed.

Another factor related to scalability is the size of each node’s routing table. Each system discussed above limits the size of the tables it uses, but an interesting question to evaluate is how large a routing table Pastry requires to ensure fault tolerance and resistance to attacks by malicious nodes while remaining scalable.
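
To give a sense of the table sizes involved, the short Python sketch below estimates the number of routing hops and populated routing table entries for a given network size N and parameter b. It assumes, following my reading of [4], that the routing table has roughly ceil(log_{2^b} N) populated rows of 2^b - 1 entries each. The function names are hypothetical, and the figures are estimates and not measurements.

```python
import math


def expected_hops(n_nodes: int, b: int) -> int:
    """Upper estimate of routing hops: the ceiling of log base 2^b of N."""
    return math.ceil(math.log2(n_nodes) / b)


def routing_table_entries(n_nodes: int, b: int) -> int:
    """Approximate populated entries: one row per hop, 2^b - 1 entries per row."""
    return expected_hops(n_nodes, b) * (2 ** b - 1)
```

With b = 4 and one million nodes, this gives 5 hops and 75 routing table entries. Raising b shortens routes but enlarges each row, which is the trade-off the routing table experiments in this project would need to explore.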
3.1.2 Network robustness and fault tolerance

As mentioned above, a peer-to-peer network must also ensure high availability of the addressable components within the overlay, even though its underlay is inherently unreliable. The network must make partitioning unlikely. However, in the event of a network partition, efficient algorithms must be employed to detect and repair it.

The fact that nodeIds are uniformly distributed greatly enhances the robustness of the network against node failures and partitioning. The delivery of a message is guaranteed unless L/2 nodes with consecutive nodeIds fail simultaneously. The chances of this happening are slim, because the uniform distribution of nodeIds reduces the impact of localized outages or malicious actions on nodes within the network: nodes with adjacent nodeIds are unlikely to be adjacent in the underlying physical network. However, the probability of L/2 such nodes failing together depends on the size of L, the leaf set.

3.1.3 Network adaptability

Note that with each routing step, the number of nodes that could be the message’s destination decreases exponentially. This implies that the expected distance covered by each successive routing hop increases exponentially [4]. This property allows quick route convergence when two distinct nodes route messages to a single destination. It allows cached data to be spread and accessed efficiently, as data can be cached at common convergence points along the routes. Accessing cached data at these convergence points also avoids the final routing hops to the destination, which are typically the longest. The autonomous identification of common convergence points by the network, and thereby of ideal sites for object replication, could also be explored using Kademlia’s piggybacking mechanism.
3.2 Project plan

The stages of this project are defined as:

- Install and experiment with FreePastry, the open source version of Pastry implemented by Rice University, Houston, USA.
- Experiment with the size of the Pastry routing table and leaf set to evaluate the propensity of the network to partition, and to determine the number of messages needed to rejoin a partitioned network.
- Implement an Attenuated Bloom filter, or a variation of Bloom filters, to enable faster data location and routing in the Pastry network.
- Implement piggybacking of node configuration and availability information on the routing of ordinary Pastry messages, and re-evaluate the number of messages required for a network to rejoin.
- Build applications, such as a simple distributed persistent store, to evaluate the changes made to the Pastry network.

4. Conclusions

Peer-to-peer networks are interesting because of their potential to exploit the vast reserves of computing power and memory, spanning the globe, that are not fully utilized in the Internet. The ability to use these reserves efficiently lies in the efficiency of the routing and addressing protocols used in the network. In conclusion, I hope to investigate the performance of Pastry’s addressing and routing properties and protocols, and to design and implement alternative addressing and routing mechanisms drawn from the similar areas of distributed computing discussed above, in order to evaluate their impact on Pastry’s performance. I also aim to investigate the robustness of the network in the face of node failures, and the efficiency with which a partitioned network can rejoin.

5. References

[1] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. OceanStore: An Architecture for Global-Scale Persistent Storage. University of California, Berkeley, 2000.

[2] Petar Maymounkov and David Mazières.
Kademlia: A Peer-to-peer Information System Based on the XOR Metric. New York University, Secure Computer Systems Group, 2002.

[3] Roberto Rinaldi and Marcel Waldvogel. Routing and Data Location in Overlay Peer-to-Peer Networks. Research Report. IBM Research, Zurich Research Laboratory, 8803 Rüschlikon, Switzerland. July 2002.

[4] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proc. of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001). Heidelberg, Germany, November 2001.