A Study of Discovery Mechanisms For Peer-to-Peer Applications

Mandar Kelaskar 2, Vincent Matossian 3, Preeti Mehra 4, Dennis Paul 5, Anand Vaidhyanathan 6 and Manish Parashar 1

Department of Electrical and Computer Engineering

Rutgers University

94 Brett Road

Piscataway, NJ 08854-8058 USA

kelaskar 2 @paul.rutgers.edu, {parashar 1, vincentm 3, mehra 4, dennisp 5, anandv 6} @ece.rutgers.edu

Abstract

Peer-to-peer applications allow peers to connect to or disconnect from a network at any time and are based on a loosely coupled resource distribution model. In this context, robust and efficient discovery mechanisms are central to the efficient functioning of such applications.

In this paper, we will evaluate several discovery mechanisms (flooding and forward routing algorithms such as Chord [1], Pastry [2] and CAN [3]) against a canonical set of cardinal requirements for three prevalent categories of peer-to-peer applications and will investigate the suitability of these mechanisms for one or more application categories. Each algorithm will be implemented and evaluated in a simulated network graph; the results will be presented and summarized in tables matching application requirement sets with appropriate discovery mechanisms.

Keywords: Discovery, content location, P2P, peer-to-peer, Chord, CAN, Pastry, flooding, forward routing

1. Introduction

With the continuing integration of the Internet with technology and business commerce, there has been an unprecedented growth of network-based services that must deliver to an ever-growing number of consumers anywhere and at any time. Large-scale distributed systems form the foundation of modern network-based services.

Peer-to-peer systems, which provide network-based services, are examples of such distributed systems. In a peer-to-peer network, peers need mechanisms to discover other peers, agree on the type of communication and negotiate the transfer of information before ending a connection. Typically, such systems are composed of a very large number of heterogeneous platforms and consumers of resources, making the discovery of resources complex. Centralized discovery mechanisms do not scale well in very large and completely distributed peer-to-peer systems and present a single point of failure.

In this paper, we will first investigate the problem of discovery in a decentralized system and specifically the requirements of three representative peer-to-peer application categories. We will then study and evaluate mechanisms such as flooding and forward routing algorithms as implemented in Chord [1], Pastry [2] and Content-Addressable Network (CAN) [3]. Finally, we will match these discovery mechanisms with the requirements of the application categories studied.

2. Application Categories

We categorize peer-to-peer applications into three sets, namely, distributed file sharing systems, person-to-person messaging systems and distributed computing systems. This section summarizes the requirements of representative applications.

2.1. Distributed File Sharing

Distributed file sharing applications such as the popular Freenet, LimeWire or Morpheus allow peers to share files with every other peer of a group in an application-based virtual network. A distributed file sharing system can tolerate relatively slow discovery mechanisms as long as they return one or more target nodes with optimal transfer time.

2.2. Person-to-Person Messaging

Instant messaging systems such as Jabber or Yahoo! Messenger allow peers to exchange text as well as whiteboard-type messages. A person-to-person messaging system requires fast and unique discovery of peers and a fast update of the list of online users in order to maintain a reliable sense of online presence.

2.3. Distributed Computing

In a peer-to-peer distributed computing system (e.g. SETI@home), a node gathers results from the computations on raw information that was scattered across several tightly or loosely coupled processors. As in file sharing systems, the same information might be distributed to several processors, here for result validation.

The discovery mechanism for distributed computing needs to return a reference to all the processors working on similar raw information.

3. Discovery approaches

Among the various possible approaches to discovery, such as centralized and distributed directories, flooding, selective routing and request forwarding based on hashed indexes, we will focus in this study on the flooding and request forwarding approaches because of their inherent decentralized nature.

3.1. Flooding Protocols

Flooding protocols are decentralized peer-to-peer discovery protocols. One such flooding protocol, Gnutella [4], has been implemented in the file-sharing application LimeWire. Peers form an overlay network over the physical Internet. To discover a resource, a peer floods this overlay network with queries. Remote peers providing the resource hear this query through gossip and respond to the inquiring peer. In practice, flooding is limited to a peer's neighborhood by limiting query propagation to a fixed number of hops, called Time-To-Live (TTL) in the Gnutella protocol. Robustness and an extensive scope of discovery make flooding protocols attractive discovery mechanisms. Although flooding might give optimal results in a network with a small to average number of peers, it does not scale well, because the number of query messages grows rapidly with the number of peers reached. Furthermore, accurate discovery of peers is not guaranteed in flooding mechanisms, since a resource located beyond the TTL horizon is never found.
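The TTL-limited flooding described above can be illustrated with a minimal sketch. This is not the Gnutella wire protocol; the function name `flood_query`, the dictionary layout of the overlay, and the six-peer example network are assumptions made only for illustration. The sketch counts every forwarded copy of the query as one message, which makes the cost of flooding visible alongside the peers it discovers.

```python
from collections import deque

def flood_query(neighbors, resources, start, wanted, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    neighbors: dict peer -> list of neighbor peers (hypothetical overlay)
    resources: dict peer -> set of resource names held by that peer
    Returns (peers holding `wanted`, number of query messages sent).
    """
    seen = {start}                      # peers that already handled the query
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        peer, remaining = frontier.popleft()
        if wanted in resources.get(peer, ()):
            hits.append(peer)
        if remaining == 0:
            continue                    # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            messages += 1               # each forwarded copy costs one message
            if nxt not in seen:         # duplicate queries are dropped
                seen.add(nxt)
                frontier.append((nxt, remaining - 1))
    return hits, messages

# Illustrative six-peer overlay (assumed, not from the paper).
neighbors = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D", "F"], "F": ["E"],
}
resources = {"D": {"song.mp3"}, "F": {"song.mp3"}}

print(flood_query(neighbors, resources, "A", "song.mp3", ttl=2))
print(flood_query(neighbors, resources, "A", "song.mp3", ttl=4))
```

With a TTL of 2 the query never reaches peer F, so the copy held there is not discovered; raising the TTL finds it at the price of more messages.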

3.2. Chord

The Chord [1] project is part of an ongoing large distributed secure file system project (self-certifying file system) and addresses key location and routing in a virtual network represented as a one-dimensional circular identifier space. Each key can be located in O(lg N) hops, where N is the total number of nodes in the system. The guarantee of finding the desired node makes Chord a very interesting algorithm to evaluate in our study.
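A simplified sketch of lookup on a circular identifier space may help make the hop count concrete. It assumes a static ring with global knowledge of the node list, from which each node's finger table is recomputed on demand; the 6-bit identifier size, the node identifiers, and the helper names (`in_interval`, `successor`, `fingers`, `lookup`) are illustrative assumptions, and node joins, failures and stabilization are left out entirely.

```python
M = 6                 # identifier bits (illustrative); ring size is 2**M
RING = 2 ** M

def in_interval(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if requested."""
    x, a, b = x % RING, a % RING, b % RING
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    # Interval wraps past zero; a == b denotes the whole ring except a.
    return x > a or x < b or (inclusive_right and x == b)

def successor(nodes, ident):
    """First node at or clockwise after `ident` (nodes must be sorted)."""
    ident %= RING
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def fingers(nodes, n):
    """Finger i of node n points at successor(n + 2**i)."""
    return [successor(nodes, n + 2 ** i) for i in range(M)]

def lookup(nodes, start, key):
    """Return (node responsible for key, number of forwarding hops)."""
    n, hops = start, 0
    while True:
        succ = successor(nodes, n + 1)
        if in_interval(key, n, succ, inclusive_right=True):
            return succ, hops
        # Forward to the closest finger that precedes the key.
        nxt = succ
        for f in reversed(fingers(nodes, n)):
            if in_interval(f, n, key):
                nxt = f
                break
        n, hops = nxt, hops + 1

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # assumed node identifiers
print(lookup(nodes, 8, 54))
```

Because each forwarding step uses the farthest finger that does not overshoot the key, the remaining distance on the ring shrinks quickly, which is the intuition behind the logarithmic hop bound.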

3.3. Pastry

Pastry [2] is a decentralized application-level object location and routing infrastructure. In a network of N nodes, each node is identified by a unique identifier (NodeID), and each message carries a key that is used to route it. NodeIDs are assigned randomly, and routing table entries are chosen by taking network proximity into account. At each hop, the message is forwarded to the node in the routing table whose NodeID is closest to the key of the message, so that it reaches its destination in O(lg N) steps. Every node keeps track of its live neighbors and adapts to the arrival, departure and failure of nodes by informing other nodes in the network. Pastry is a generic self-organizing routing algorithm that can be used for storage, data sharing, group communication and naming systems.
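The hop-by-hop forwarding described above can be sketched as follows. This is a deliberately simplified model, not Pastry's actual data structures: each node's routing state is reduced to a flat list of known NodeIDs (the hypothetical `tables` dictionary), identifiers are short hexadecimal strings, and "closest" is taken to mean the longest shared prefix with the key, with numerical distance as a tie-breaker.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits that two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def score(node_id, key):
    """Higher is closer: longer shared prefix first, then smaller distance."""
    distance = abs(int(node_id, 16) - int(key, 16))
    return (shared_prefix_len(node_id, key), -distance)

def route(tables, start, key):
    """Forward a message toward `key`; return the list of nodes visited."""
    path = [start]
    current = start
    while True:
        candidates = tables.get(current, [])
        best = max(candidates, key=lambda n: score(n, key), default=current)
        if score(best, key) <= score(current, key):
            return path            # no known node is closer: deliver here
        current = best
        path.append(current)

# Hypothetical routing state: node -> NodeIDs it knows about.
tables = {
    "65a1": ["d13d", "2b07"],
    "d13d": ["d421", "65a1"],
    "d421": ["d462", "d13d"],
    "d462": ["d467", "d421"],
    "d467": ["d462"],
    "2b07": ["65a1"],
}
print(route(tables, "65a1", "d46a"))
```

Since the score strictly increases at every hop, the loop always terminates; in this example each hop matches at least as many leading digits of the key as the previous one.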

3.4. Content-Addressable-Networks

A Content-Addressable Network [3] is a mesh of n nodes in a virtual d-dimensional coordinate space. This virtual coordinate space is dynamically partitioned among all the nodes such that every node owns its distinct zone within this space. The coordinate space is used to store (key, value) pairs such that every key k is deterministically mapped onto a point P in the coordinate space using a uniform hash function. The corresponding key-value pair is stored at the node in whose zone point P lies. To retrieve a value corresponding to key k, any node can apply the same deterministic hash function to map k onto point P and then retrieve the value from point P either directly or via neighboring nodes in O(n^(1/d)) hops.

According to its authors, the CAN design is fault-tolerant, robust, and self-organizing and provides a more scalable alternative to flooding.
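The mapping from a key to a point and then to the owning zone can be sketched in a few lines. The choice of SHA-256 as the uniform hash, the unit square as the coordinate space, the fixed four-zone partition, and the names `key_to_point`, `owner`, `put` and `get` are all assumptions for illustration; neighbor-to-neighbor routing and dynamic zone splitting are not modeled.

```python
import hashlib

def key_to_point(key, d=2):
    """Deterministically map a key to a point in the unit cube [0, 1)^d."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Four digest bytes per coordinate, so this sketch supports d <= 8.
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2 ** 32
        for i in range(d)
    )

def owner(zones, point):
    """Return the node whose zone (a box given by lo and hi corners) holds point."""
    for node, (lo, hi) in zones.items():
        if all(l <= x < h for l, x, h in zip(lo, point, hi)):
            return node
    raise ValueError("point is not covered by any zone")

def put(zones, store, key, value):
    """Store (key, value) at the node that owns the key's point."""
    node = owner(zones, key_to_point(key))
    store.setdefault(node, {})[key] = value
    return node

def get(zones, store, key):
    """Recompute the key's point and read the value from the owning node."""
    node = owner(zones, key_to_point(key))
    return store.get(node, {}).get(key)

# Assumed static partition of the unit square among four nodes.
zones = {
    "n1": ((0.0, 0.0), (0.5, 0.5)),
    "n2": ((0.5, 0.0), (1.0, 0.5)),
    "n3": ((0.0, 0.5), (0.5, 1.0)),
    "n4": ((0.5, 0.5), (1.0, 1.0)),
}
store = {}
holder = put(zones, store, "report.pdf", "held by peer 17")
print(holder, get(zones, store, "report.pdf"))
```

Any node that applies the same hash arrives at the same point, which is why a lookup needs no directory: the key alone determines where the pair lives.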

4. Evaluation Approach

The discovery mechanisms described in Section 3 will be implemented in a simulated network environment. The algorithms will be evaluated on the basis of normalized performance metrics such as Time-To-Discover, Scope (percentage of relevant information in the network actually searched), Accuracy, and Fault-tolerance.

Experiments will be set up to measure/compute these metrics for each discovery mechanism. Finally, the performance metrics values for all the discovery mechanisms will be used to compare their performances and to assess their suitability for one or more application categories.
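As one possible reading of the Scope metric defined above, the percentage can be computed from two sets: the peers holding relevant information and the peers a query actually reached. The function name `scope_percent` and the peer labels are hypothetical, and the way the other metrics are measured or normalized is not specified here.

```python
def scope_percent(relevant, searched):
    """Percentage of relevant peers that the query actually reached."""
    relevant = set(relevant)
    if not relevant:
        return 0.0                      # nothing relevant exists to be searched
    reached = relevant & set(searched)
    return 100.0 * len(reached) / len(relevant)

# Illustrative sets only: four relevant peers, two of them reached.
print(scope_percent({"a", "b", "c", "d"}, {"a", "b", "x"}))
```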

5. Bibliography

[1] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, MIT.

[2] Antony Rowstron and Peter Druschel, Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems.

[3] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker, A Scalable Content-Addressable Network.

[4] The Gnutella Protocol Specification v0.4, Revision 1.2.
