An Exploration of Algorithms Used in Message Routing and
Node Addressing in Peer-to-Peer Networks
Abstract
A peer-to-peer network is a network with completely decentralized control. Each node in
the network assumes equal responsibilities and capabilities. Because of this, peer-to-peer
networks have the potential to significantly reduce the impact of denial-of-service (DoS)
attacks, and to self-organize and adapt while remaining scalable. These properties are
made possible by certain addressing and routing algorithms, and by properties inherent in
the network. In this project I aim to design, implement and analyze alternative addressing
and routing models for Pastry, a scalable distributed object location and routing substrate
for peer-to-peer applications, and to evaluate the efficiency of these algorithms with regard
to the network’s scalability, robustness and adaptability.
Author: Adrian Gasparini
Date: 20th August 2003
Supervisors: Katrina Falkner and Dave Munro
1. Introduction
It has been estimated that among all the Internet-connected PCs in the world, there exists
an “aggregate ten billion MHz of processing power and ten thousand terabytes of storage,
assuming only 100 million PC’s among the net’s 300 million users, and only a 100MHz
chip and 100 megabyte drive in the average PC.” [3] The potential to exploit this resource
is currently limited by how well any system that attempts to harness it can scale. With
Internet routing tables growing steadily since the Internet’s birth, attention has recently
focused on peer-to-peer networks, which provide a novel way to reduce the amount of
global knowledge required in such networks.
A peer-to-peer network is a completely decentralized network. Each node in such a
network has equal responsibilities and capabilities. Each node behaves like both a client
and a server, thus reducing the network’s dependence on the availability of centralized
servers. The peer-to-peer network is considered an overlay network whose underlying
nodes are untrusted: they are assumed to be capable of failing silently or acting
maliciously. For this reason, much data within the overlay peer-to-peer network is
replicated to ensure availability.
The potential benefits of peer-to-peer networks are huge. One such benefit is the global
sharing of information between interested parties. For example, in a project or research
effort that spans the globe, a large-scale peer-to-peer distributed persistent storage
application like OceanStore [1] gives groups in different geographic locations access to
the most current data.
Peer-to-peer networks also allow the utilization of spare computer resources in an
organization’s network. An organization with large-scale computing needs can use a
peer-to-peer network to distribute its large computational jobs over multiple computers,
taking advantage of idle computer resources in its current infrastructure. An example
would be sending a large computation over a peer-to-peer network spanning the globe,
such that the section of the peer-to-peer network involved in the computation is operating
between the hours of 8pm and 8am, when computing resources in a corporate network are
largely left idle. As a result, the large computation does not degrade the networks and
computing resources of the organization during the more productive hours of the day,
between 9am and 5pm.
The efficiency and scalability of a peer-to-peer overlay application providing persistent
storage and computation rely on the design, implementation and locality properties of
the network, and on the addressing and routing protocols needed to route messages
quickly and efficiently.
There are many different families of routing protocols in existence today. The most widely
known family consists of protocols based on Plaxton’s algorithm for distributed searching
[3]. These routing algorithms tend to be slow but reliable. Another family of routing
algorithms uses probabilistic methods for data location. An example is the use of
Attenuated Bloom filters in the initial stage of data location in the OceanStore system [1].
These algorithms locate data quickly, but suffer from a certain percentage of false
positives, making them unreliable. However, all peer-to-peer routing protocols have the
same aim, which is to route messages between nodes efficiently and reliably over an
untrusted underlay network such as the Internet.
The underlying peer-to-peer network infrastructure is difficult to abstract over.
Underlying nodes are not trusted, and can be separated by many thousands of kilometers.
Routing protocols are therefore integral to the viability of the system. They must deal
with three fundamental problems of peer-to-peer networks: scalability, robustness and
adaptability. In this project, we will design alternative routing mechanisms for the Pastry
system, aiming to improve its efficiency with respect to these three issues.
2. Literature Review
2.1 Historical Background
Interest in peer-to-peer networks began with the creation of Napster, a peer-to-peer
network overlay that enabled its users to share files with each other. Napster attracted a
lot of attention, but for all the wrong reasons. Users of Napster traded music files with
each other, which raised copyright issues regarding the music that was exchanged.
Napster was also not a true peer-to-peer network, because it was not completely
decentralized. File exchange occurred between peers, but only after a file/peer lookup at
a centralized server where file indexes were stored [3]. Other music-swapping
peer-to-peer network overlays that are completely decentralized have been created since
Napster, like Gnutella and FreeNet, but these overlays are not without problems of their
own.
Since Napster, the distributed computing community has been actively researching
possible applications of peer-to-peer networks, and many have been created. In the
following sections, I describe Pastry, and then describe and categorize some systems and
applications whose routing protocols differ from Pastry’s.
2.2.1 Pastry: Based on the Plaxton routing mechanism:
Pastry [4] is a member of the Plaxton family of algorithms. Pastry is a self-organizing
overlay of network nodes that routes messages from node to node based on a nodeId and
a key. The nodeId is a 128-bit node identifier, assigned when a node joins the system.
NodeIds are assigned randomly with a uniform distribution, and can be based on a
cryptographic hash of the node’s public key or IP address. Assigning nodeIds randomly in
this way ensures that nodes with adjacent nodeIds are diverse geographically, as well as
diverse in ownership, jurisdiction, etc. This property is needed in order to ensure the
availability of the network in times of proximity-related failures.
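The nodeId assignment described above can be sketched as follows. This is a minimal illustration, not Pastry's actual implementation: the choice of SHA-256 truncated to 128 bits, and the helper names node_id_from_bytes and to_digits, are assumptions made for the example. The text only requires "a cryptographic hash" of the public key or IP address.

```python
import hashlib
from typing import List

NODE_ID_BITS = 128  # nodeId width stated in the text


def node_id_from_bytes(material: bytes) -> int:
    """Derive a 128-bit nodeId from identifying material (public key or IP address).

    Assumption: SHA-256 truncated to its first 128 bits stands in for
    "a cryptographic hash"; the hash function is not specified in the text.
    """
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[: NODE_ID_BITS // 8], "big")


def to_digits(node_id: int, b: int = 4) -> List[int]:
    """Split a nodeId into base-2^b digits, most significant digit first.

    Assumes b divides NODE_ID_BITS evenly.
    """
    n_digits = NODE_ID_BITS // b
    mask = (1 << b) - 1
    return [(node_id >> (b * (n_digits - 1 - i))) & mask for i in range(n_digits)]


if __name__ == "__main__":
    nid = node_id_from_bytes(b"192.0.2.17")  # example address, for illustration only
    print(format(nid, "032x"), to_digits(nid)[:4])
```

Because the hash output is effectively uniform, two nodes with neighbouring IP addresses receive unrelated nodeIds, which is the diversity property the paragraph above relies on.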
Each Pastry node stores some state about the part of the network around it. Each node
stores a leaf set, L, containing the |L| nodes whose nodeIds are numerically closest to its
own; half of the leaf set have nodeIds larger than the node’s and half have smaller
nodeIds. Each Pastry node also stores a routing table of nodes, to enable more efficient
routing. Each routing table entry pairs the nodeId of a node that shares a prefix with the
current node’s own nodeId with that node’s IP address; where several nodes qualify, the
closest one is chosen. Each Pastry node also stores a set of its M closest nodes based on a
proximity metric, such as round-trip time or IP hops. These nodes are not generally used
in routing, but are useful in maintaining locality properties.
2.2.2 Pastry routing procedure
The Pastry routing procedure is as follows [4]:
Given a message, a node first checks whether the key falls within the range of its leaf
set; if so, it forwards the message directly to the leaf set node whose nodeId is closest to
the key. If the key is not covered by the leaf set, the node then tries to forward the
message to an entry in its routing table whose nodeId shares a common prefix with the
key that is at least one digit longer than the prefix the key shares with the current node’s
nodeId. If there is no such routing table entry, the node forwards the message to a known
node whose nodeId shares a prefix with the key at least as long as the current node’s, but
is numerically closer to the key than the current node’s nodeId.
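The three cases above can be sketched as a single next-hop decision. This is a simplified illustration under stated assumptions, not FreePastry code: the function names, the representation of the routing table as a dictionary keyed by (row, digit), and the treatment of the nodeId space as a plain integer range (ignoring wrap-around) are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

B = 4          # bits per digit (the configuration parameter b)
ID_BITS = 128  # nodeId width


def digit_at(x: int, i: int, b: int = B, bits: int = ID_BITS) -> int:
    """Return the i-th base-2^b digit of x, counting from the most significant."""
    shift = b * (bits // b - 1 - i)
    return (x >> shift) & ((1 << b) - 1)


def shared_prefix_len(a: int, c: int, b: int = B, bits: int = ID_BITS) -> int:
    """Number of leading base-2^b digits that a and c have in common."""
    n_digits = bits // b
    for i in range(n_digits):
        if digit_at(a, i, b, bits) != digit_at(c, i, b, bits):
            return i
    return n_digits


def next_hop(local_id: int, key: int, leaf_set: Iterable[int],
             routing_table: Dict[Tuple[int, int], int],
             b: int = B, bits: int = ID_BITS) -> int:
    """Pick the node to forward to; returning local_id means "deliver here"."""
    leaf_set = list(leaf_set)
    if key == local_id:
        return local_id
    # Case 1: the key lies within the leaf set's range (wrap-around ignored).
    candidates = leaf_set + [local_id]
    if min(candidates) <= key <= max(candidates):
        return min(candidates, key=lambda n: abs(n - key))
    # Case 2: a routing table entry that matches the key in one more digit.
    p = shared_prefix_len(local_id, key, b, bits)
    entry = routing_table.get((p, digit_at(key, p, b, bits)))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is numerically closer.
    known = set(leaf_set) | set(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key, b, bits) >= p
              and abs(n - key) < abs(local_id - key)]
    return min(better, key=lambda n: abs(n - key)) if better else local_id
```

For example, with 16-bit identifiers and b = 4, a node 0x1234 holding the routing table entry {(1, 0xA): 0x1A55} forwards a message for key 0x1A00 to 0x1A55 via the second case, since that entry matches the key in one more digit than the local nodeId does.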
This method for routing gives an expected number of routing steps of O(log_{2^b} N),
where N is the number of nodes and b is a configuration parameter. This is because each
of the three cases of the routing procedure forwards the message closer to the destination.
In case 1, where the message is routed to a leaf node, the message is within 1 hop of its
destination, as nodes are in direct contact with their leaf sets. In case 2, where the routing
table is used, each forward via the routing table reduces the set of nodes that could be the
destination by a factor of 2^b. Therefore, using the routing table, the destination is reached
in O(log_{2^b} N) steps. In case 3, the message is more than 1 hop away, as the node with
the desired prefix is not known to the current node. This indicates that the node has failed
or does not exist. This case is rare, because the uniform distribution of nodeIds reduces
the possibility of the network partitioning.
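As a rough worked example of the O(log_{2^b} N) bound, the following sketch computes ceil(log_{2^b} N) for a given network size. It ignores constant factors and the leaf set shortcut, so it is only an order-of-magnitude estimate; the function name expected_hops is an assumption made for the example.

```python
import math


def expected_hops(n_nodes: int, b: int) -> int:
    """Rough hop estimate ceil(log_{2^b} N), using log_{2^b} N = log2(N) / b."""
    return math.ceil(math.log2(n_nodes) / b)


if __name__ == "__main__":
    for b in (1, 2, 4):
        print(b, expected_hops(10 ** 6, b))
```

With N = 1,000,000 nodes, log2(N) is about 19.93, so the estimate is 20 hops for b = 1, 10 hops for b = 2 and 5 hops for b = 4. Larger values of b shorten routes at the cost of larger routing tables.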
2.3 OceanStore: a probabilistic based routing mechanism:
OceanStore [1] is a distributed peer-to-peer persistent storage application. Each persistent
object in OceanStore is associated with at least one globally unique identifier [1]. These
persistent objects are replicated on multiple servers to enhance availability of the object
in the face of network attacks, failures, etc. These objects are free to reside on any servers
within the OceanStore overlay. To support location independent addressing, OceanStore
messages are labeled with a destination GUID, a random number, and a small predicate.
The goal of routing a message in OceanStore is to route the message to the closest node
that matches the predicate and has the desired GUID. The process of routing messages to
a destination has two phases; the first phase is the routing of the message from peer to
peer until a destination is discovered; then messages are routed directly from the source
to the destination.
This method of data location and routing has a number of advantages. The location of the
destination using multiple nodes enables fault tolerant routing, lessening the impact of
malicious or failed nodes on the availability of objects. Also, once the destination has
been found, messages are routed directly to the destination, possibly cutting down on the
number of hops per message.
Routing in OceanStore is based on two mechanisms. A fast probabilistic algorithm using
Attenuated Bloom filters attempts to locate objects near the node initiating contact with
the object. This probabilistic algorithm is backed up by a slower but more reliable
hierarchical routing mechanism, based on Plaxton’s algorithm [3]. The reasoning behind
the two mechanisms is that objects used frequently by a node are most likely to be
cached near the node. In this case, the fast probabilistic algorithm quickly locates the
object. If the faster probabilistic method fails to locate the object, the slower but reliable
hierarchical algorithm is employed.
The probabilistic algorithm uses Attenuated Bloom Filters in order to locate objects [1]. It
is based on a hill-climbing algorithm. The basic idea is if a node cannot satisfy a query,
local information is used to route the query to a likely neighbor who might be able to
answer the query. An Attenuated Bloom filter is used in deciding the likely neighbor.
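The neighbor selection step can be illustrated with a small sketch. It assumes that an attenuated Bloom filter is kept per neighbor link as a list of ordinary Bloom filters, where the filter at depth i summarizes objects reachable i hops away through that neighbor. The class and function names, the filter size and the hashing scheme are assumptions made for the example, not OceanStore's implementation.

```python
import hashlib
from typing import Dict, Iterator, List, Optional


class BloomFilter:
    """A plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits: int = 256, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # bit array stored in one integer

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive n_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.n_hashes):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def first_matching_depth(levels: List[BloomFilter], guid: bytes) -> Optional[int]:
    """Smallest depth (1-based) whose filter claims to contain guid, if any."""
    for depth, level in enumerate(levels, start=1):
        if guid in level:
            return depth
    return None


def pick_neighbor(filters: Dict[str, List[BloomFilter]], guid: bytes) -> Optional[str]:
    """Choose the neighbor whose attenuated filter matches guid at the smallest depth."""
    best = None
    for neighbor, levels in filters.items():
        depth = first_matching_depth(levels, guid)
        if depth is not None and (best is None or depth < best[0]):
            best = (depth, neighbor)
    return best[1] if best else None
```

When pick_neighbor returns None, no neighbor appears to hold the object nearby, which corresponds to the point at which the query falls back to the slower hierarchical algorithm. A match may still be a false positive, which is why the probabilistic stage alone is not reliable.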
The hierarchical routing method used is a randomized hierarchical distributed data
structure, like that used in Pastry, in which multiple trees are embedded in the network.
Each tree symbolizes a directory structure that is specified by the clients of the system.
Each tree has a root node. The links between the root node and sub-directories are
specified by differences in GUIDs for different nodes. Each object is mapped to a single
node whose nodeId matches the object’s GUID in the most bits, beginning with the least
significant. This is the object’s root node. Routing a message is then a matter of
climbing the tree from the initiating node until the desired object’s root node (or a pointer
to a replica) is encountered, and then routing directly to the object. Because of this routing
data structure, routing is achieved in O(log N) steps [1].
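The mapping from an object's GUID to its root node can be sketched as a longest common suffix match. This is a minimal illustration that assumes a global view of all nodeIds, which a real node does not have, and breaks ties by choosing the smallest nodeId; both the tie-breaking rule and the function names are assumptions made for the example.

```python
from typing import Iterable


def common_suffix_bits(a: int, b: int, bits: int = 160) -> int:
    """Count how many low-order bits a and b share, starting from the least significant."""
    diff = a ^ b
    if diff == 0:
        return bits
    # (diff & -diff) isolates the lowest set bit, i.e. the first differing position.
    return (diff & -diff).bit_length() - 1


def root_for(guid: int, node_ids: Iterable[int], bits: int = 160) -> int:
    """Return the nodeId matching guid in the most low-order bits.

    Assumption: ties are broken in favour of the smallest nodeId.
    """
    return max(sorted(node_ids), key=lambda n: common_suffix_bits(n, guid, bits))
```

For instance, among the 4-bit nodeIds 1011, 0110 and 1110, the GUID 1111 is rooted at 1011, because that nodeId shares its two lowest bits with the GUID while the other two share none.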
2.4 Kademlia: A simple routing mechanism based on the XOR metric.
Another routing protocol we will look at is the one employed in Kademlia [2]. It is like
Pastry in that it is hierarchical, but it reduces the number of configuration messages
nodes must send to learn about each other. Any configuration information nodes need
about their surroundings is gathered from key lookups and query resolution via
piggybacking.
This is done by a node including its nodeId with every message it transmits, so that
recipients of the message can record the node’s existence if not previously recorded.
Kademlia uses an XOR metric for distance between points in the key space.
Each Kademlia node keeps, for each i, a list of {IP_address, UDP_port, Node_id} triples
for nodes at a distance between 2^i and 2^(i+1) from itself. These lists are called
k-buckets. Each k-bucket is kept sorted by the time each node was last seen, with the
least recently seen node first. The lists are limited to size k.
When a Kademlia node receives any message from another node, it updates the
appropriate k-bucket for the sender’s nodeId. If the sender already exists in the k-bucket,
it is moved to the most recently seen position. If the sender is not in the k-bucket and the
k-bucket is not full, the sender is added. If the k-bucket is full, the node sends a PING
message to the least recently seen node to decide whether to evict that node’s entry. If
there is no response, the entry is evicted and the sender is added. Otherwise, the least
recently seen node is moved to the most recently seen position, and the sender’s entry is
discarded.
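The k-bucket update rule can be sketched as follows. This is a simplified, single-threaded illustration: the KBucket class, the Contact tuple layout and the ping callback (which returns True when the pinged node responds) are assumptions made for the example rather than part of any Kademlia implementation.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Contact = Tuple[str, int, int]  # (IP_address, UDP_port, Node_id)


class KBucket:
    """One k-bucket, ordered from least recently seen to most recently seen."""

    def __init__(self, k: int, ping: Callable[[Contact], bool]) -> None:
        self.k = k
        self.ping = ping  # hypothetical callback: True if the node answers a PING
        self.contacts = OrderedDict()  # Node_id -> Contact

    def update(self, sender: Contact) -> None:
        node_id = sender[2]
        if node_id in self.contacts:
            # Already known: refresh the entry and move it to the most recent end.
            self.contacts[node_id] = sender
            self.contacts.move_to_end(node_id)
            return
        if len(self.contacts) < self.k:
            # Room left: append as the most recently seen contact.
            self.contacts[node_id] = sender
            return
        # Bucket full: probe the least recently seen contact.
        oldest_id, oldest = next(iter(self.contacts.items()))
        if self.ping(oldest):
            # It answered: keep it, mark it as recently seen, discard the sender.
            self.contacts.move_to_end(oldest_id)
        else:
            # No answer: evict it and admit the sender.
            del self.contacts[oldest_id]
            self.contacts[node_id] = sender
```

A consequence of this policy is that long-lived contacts are never displaced by new ones while they keep responding, so a bucket tends to fill with nodes that have stayed online the longest.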
The Kademlia protocol consists of four RPCs: PING, which probes a node to see if it is
online; STORE, which instructs a node to store a {key, value} pair; FIND_NODE, which
takes a key as an argument and returns the {IP_address, UDP_port, Node_id} triples of
the nodes closest to that key that the recipient knows of; and FIND_VALUE, which
behaves like FIND_NODE, except that if the recipient has stored a value for the key, it
just returns the value.
In Kademlia, separate configuration messages are not needed. The k-buckets are kept
up-to-date by piggybacking node existence information on each message being routed.
This property allows the detection of failed nodes without resorting to network timeouts.
The actual routing mechanism is similar to Pastry in that it is hierarchical, except that it is
based on a different distance metric.
Pastry requires O(log_{2^b} N) messages for a node joining the overlay, constant
keep-alive messages between nodes in each other’s leaf sets, and further communication
and timeout delays in the face of node failures [4]. We propose that, by employing some
aspects of Kademlia’s routing mechanisms in Pastry, the number of configuration
messages needed in Pastry could be reduced.
3. Project Proposal
The aim of this project is to increase the efficiency of Pastry’s existing protocols, while
keeping in mind issues such as network scalability, robustness in the face of node
failures, and the ability of the network to adapt to patterns of data location.
More specifically, we aim to increase the efficiency of the existing Pastry routing
protocols by implementing a Bloom filter to enable quick data location; implementing
piggybacking of network configuration messages on message routing, similar to
Kademlia; and building applications on top of Pastry to assess the new algorithms
against the problems associated with peer-to-peer networks: scalability, robustness and
adaptability.
3.1 Problems with peer-to-peer networks
3.1.1 Scalability
The biggest problem associated with peer-to-peer networks is scaling efficiently. To be
practical, peer-to-peer overlay networks must have fast data location and routing
mechanisms. The two-phase process that OceanStore employs, in which Attenuated
Bloom filters locate objects close to the requesting node, helps this cause by optimizing
the location and routing of the most commonly requested objects. Coupled with
OceanStore’s migrating cache mechanism, this allows for very efficient location and
routing of commonly accessed data. The use of Bloom filters will be evaluated in
conjunction with Pastry’s routing mechanisms in this project, to see if it offers any gains
in efficiency. In addition, Kademlia’s piggybacking mechanism reduces the amount of
communication needed to maintain consistent and current routing table data. This
mechanism could be incorporated into Pastry in order to reduce the number of
configuration messages needed.
Another factor related to scalability is routing table sizes for each node. Each system
discussed above limits the size of tables used, but an interesting factor to be evaluated
could be the size of routing table required by Pastry in order to ensure fault tolerance and
resistance to attacks by malicious nodes, while still being scalable.
3.1.2 Network robustness and fault tolerance
As mentioned above, a peer-to-peer network must also ensure high availability of
addressable components within the overlay even though its underlay is inherently
unreliable. The network must make partitioning unlikely. However, if network partitions
do occur, efficient algorithms must be employed to detect and correct them.
The fact that nodeIds are uniformly distributed greatly enhances the robustness of the
network against node failures and network partitioning. The delivery of a message is
guaranteed unless |L|/2 nodes with consecutive nodeIds fail simultaneously, where |L| is
the size of the leaf set L. The chances of this happening are slim, because the uniform
distribution of nodeIds reduces the impact of localized outages or malicious actions on
nodes within the network: nodes with adjacent nodeIds are unlikely to be adjacent in the
underlying physical network. However, the probability of |L|/2 such nodes failing
depends on the size of the leaf set.
3.1.3 Network adaptability
Note that with each routing step, the number of nodes that could be the message’s
destination node decreases exponentially. This implies that the expected distance
traveled at each successive routing step increases exponentially [4]. This property allows
the routes of two distinct nodes sending messages to a single destination to converge
quickly. This in turn allows cached data to be spread and accessed efficiently, as data can
be cached at common convergence points along the routes. Accessing the cached data at
these convergence points also avoids the final routing hops to the destination, which are
typically the longest. Having the network autonomously identify common convergence
points, and thereby ideal sites for object replication, could also be explored using
Kademlia’s piggybacking mechanism.
3.2 Project plan
The stages of this project are defined as:
- Install and experiment with FreePastry, the open source version of Pastry
  implemented by Rice University, Houston, USA.
- Experiment with the size of the Pastry routing table and leaf set to evaluate the
  propensity of the network to partition, and to determine the number of messages
  needed to rejoin a partitioned network.
- Implement an Attenuated Bloom filter, or a variation of Bloom filters, to enable
  faster data location and routing in the Pastry network.
- Implement piggybacking of node configuration and availability information on the
  routing of ordinary Pastry messages, and reevaluate the number of messages
  required for a partitioned network to rejoin.
- Build applications, such as a simple distributed persistent store, to evaluate the
  changes made to the Pastry network.
4. Conclusions
Peer-to-peer networks are interesting because of their potential to exploit the vast
reserves of computing power and memory that are not being fully utilized across the
Internet. The ability to utilize these reserves efficiently lies in the efficiency of the routing
and addressing protocols used in the network.
In conclusion, I hope to investigate the performance of Pastry’s addressing and routing
properties and protocols, and design and implement alternative addressing and routing
mechanisms from similar areas of distributed computing discussed above, to evaluate
their impact on Pastry’s performance.
I also aim to investigate the robustness properties of the network in the face of node
failures, and the efficiency with which a partitioned network can rejoin.
5. References
[1] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton,
    Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon,
    Westley Weimer, Chris Wells, and Ben Zhao. OceanStore: An Architecture for
    Global-Scale Persistent Storage. University of California, Berkeley, 2000.
[2] Petar Maymounkov and David Mazières. Kademlia: A Peer-to-peer Information
    System Based on the XOR Metric. New York University, Secure Computer
    Systems Group, 2002.
[3] Roberto Rinaldi and Marcel Waldvogel. Routing and Data Location in Overlay
    Peer-to-Peer Networks. Research Report. IBM Research, Zurich Research
    Laboratory, 8803 Rüschlikon, Switzerland. July 2002.
[4] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object
    location and routing for large-scale peer-to-peer systems. In Proc. of the
    18th IFIP/ACM International Conference on Distributed Systems Platforms
    (Middleware 2001). Heidelberg, Germany, November 2001.