An Introduction to Peer-to-Peer Networks
Presentation for CSE620: Advanced Networking
Anh Le + Tuong Nguyen
Nov. 4

Outline
- Overview of P2P
- Classification of P2P
  - Unstructured P2P systems
    - Napster (centralized)
    - Gnutella (distributed)
    - Kazaa/FastTrack (super-node)
  - Structured P2P systems (DHTs)
    - Chord
  - YAPPERS (hybrid)
- Conclusions

What is a P2P system?
- Clay Shirky: "P2P refers to applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet."
  The "litmus test":
  - Does it allow for variable connectivity and temporary network addresses?
  - Does it give the nodes at the edges of the network significant autonomy?
- P2P Working Group (a standardization effort): P2P computing is:
  - The sharing of computer resources and services by direct exchange between systems.
  - Peer-to-peer computing takes advantage of existing computing power and networking connectivity, allowing economical clients to leverage their collective power to benefit the entire enterprise.

What is a P2P system?
- Multiple sites (at the edge)
- Distributed resources
- Sites are autonomous (different owners)
- Sites are both clients and servers ("servent")
- Sites have equal functionality

P2P benefits
- Efficient use of resources
- Scalability
  - Consumers of resources also donate resources
  - Aggregate resources grow naturally with utilization
- Reliability
  - Replicas
  - Geographic distribution
  - No single point of failure
- Ease of administration
  - Nodes self-organize
  - No need to deploy servers to satisfy demand
  - Built-in fault tolerance, replication, and load balancing

Napster
- Was used primarily for file sharing
- NOT a pure P2P network => a hybrid system
- Modes of operation:
  - Client sends the server a query; the server asks everyone and responds to the client
  - Client gets a list of clients from the server
  - All clients send the IDs of the data they hold to the server; when a client asks for data, the server responds with specific addresses
  - The peer then downloads directly from other peer(s)
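
To make the centralized index concrete, here is a minimal sketch with assumed names and data structures (this is not Napster's actual protocol): a server records which peers hold which files and answers queries with peer addresses, and the download then happens directly between peers.

    # Illustrative Napster-style central index (sketch, not the real protocol).
    from collections import defaultdict

    class CentralIndex:
        def __init__(self):
            # file name -> set of peer addresses advertising that file
            self.files = defaultdict(set)

        def register(self, peer_addr, file_names):
            # A client uploads the IDs of the data it holds.
            for name in file_names:
                self.files[name].add(peer_addr)

        def unregister(self, peer_addr):
            # Remove a departing client from every file's peer list.
            for peers in self.files.values():
                peers.discard(peer_addr)

        def query(self, file_name):
            # The server responds with specific addresses; the client then
            # downloads directly from one of those peers.
            return sorted(self.files.get(file_name, set()))

    index = CentralIndex()
    index.register("10.0.0.5:6699", ["song.mp3", "talk.pdf"])
    index.register("10.0.0.7:6699", ["song.mp3"])
    print(index.query("song.mp3"))   # ['10.0.0.5:6699', '10.0.0.7:6699']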

Napster
- Further services: chat program, instant messaging service, tracking program, ...
- Centralized system
  - Single point of failure => limited fault tolerance
  - Limited scalability (server farms with load balancing)
- Query is fast and an upper bound on its duration can be given

Napster
[Figure: peers connected to a central DB; 1. Query, 2. Response, 3. Download Request, 4. File; the file transfer happens directly between peers.]

Gnutella
- Pure peer-to-peer
- Very simple protocol
- No routing "intelligence"
- Constrained broadcast
  - Lifetime of packets limited by TTL (typically set to 7)
  - Packets have unique IDs to detect loops
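
As a rough illustration of constrained broadcast, the following sketch (a simplification, not the Gnutella wire format) floods a query to neighbors, decrements a TTL, and uses the message's unique ID to detect loops and drop duplicates.

    # Simplified Gnutella-style constrained flood (illustrative sketch).
    import uuid

    class Peer:
        def __init__(self, name):
            self.name = name
            self.neighbors = []      # directly connected peers
            self.seen = set()        # message IDs already handled (loop detection)
            self.files = set()

        def connect(self, other):
            self.neighbors.append(other)
            other.neighbors.append(self)

        def query(self, keyword, ttl=7):
            msg_id = uuid.uuid4().hex
            return self._flood(msg_id, keyword, ttl, hits=[])

        def _flood(self, msg_id, keyword, ttl, hits):
            if msg_id in self.seen or ttl < 0:
                return hits              # duplicate or expired packet: drop it
            self.seen.add(msg_id)
            if keyword in self.files:
                hits.append(self.name)   # would be returned as a QueryHit
            for n in self.neighbors:
                n._flood(msg_id, keyword, ttl - 1, hits)
            return hits

    a, b, c = Peer("A"), Peer("B"), Peer("C")
    a.connect(b); b.connect(c); c.connect(a)   # small cycle to show loop detection
    c.files.add("song.mp3")
    print(a.query("song.mp3"))                 # ['C']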

Gnutella - PING/PONG
[Figure: node 1 floods a Ping to its neighbors; each node forwards the Ping and returns Pongs for the hosts it knows (e.g., Pongs for 3, 4, 5 and for 6, 7, 8), so node 1 learns about the other known hosts. Query/Response works analogously.]

Free riding
- File sharing networks rely on users sharing data
- Two types of free riding:
  - Downloading but not sharing any data
  - Not sharing any interesting data
- On Gnutella:
  - 15% of users contribute 94% of the content
  - 63% of users never responded to a query
    - They didn't have "interesting" data

Gnutella: summary
- Hit rates are high
- High fault tolerance
- Adapts well and dynamically to changing peer populations
- High network traffic
- No estimates on the duration of queries
- No probability of successful queries
- Topology is unknown => algorithms cannot exploit it
- Free riding is a problem
  - A significant portion of Gnutella peers are free riders
  - Free riders are distributed evenly across domains
  - Often hosts share files nobody is interested in

Gnutella discussion
- Search types: any possible string comparison
- Scalability
  - Search very poor with respect to the number of messages
  - Probably search time O(log n) due to the small-world property
  - Updates excellent: nothing to do
  - Routing information: low cost
- Robustness: high, since many paths are explored
- Autonomy
  - Storage: no restriction, peers store the keys of their files
  - Routing: peers are the target of all kinds of requests
- Global knowledge: none required

iMesh, Kazaa
- Hybrid of centralized Napster and decentralized Gnutella
- Super-peers act as local search hubs
  - Each super-peer is similar to a Napster server for a small portion of the network
  - Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)
- Users upload their list of files to a super-peer
- Super-peers periodically exchange file lists
- Queries are sent to a super-peer for files of interest

Structured Overlay Networks / DHTs
Chord, Pastry, Tapestry, CAN, Kademlia, P-Grid, Viceroy
[Figure: keys of nodes and keys of values are hashed into a common identifier space (node identifiers and value identifiers), and the nodes are then connected "smartly" over that space.]

The Principle of Distributed Hash Tables
- A dynamic distribution of a hash table onto a set of cooperating nodes
[Figure: example key/value pairs (1: Algorithms, 9: Routing, 11: DS, 12: Peer-to-Peer, 21: Networks, 22: Grids) distributed over nodes A, B, C, D; lookup(9) from any node resolves to node D.]
- Basic service: lookup operation
- Key resolution from any node
- Each node has a routing table
  - Pointers to some other nodes
  - Typically, a constant or a logarithmic number of pointers

DHT Desirable Properties
- Keys mapped evenly to all nodes in the network
- Each node maintains information about only a few other nodes
- Messages can be routed to a node efficiently
- Node arrivals/departures only affect a few nodes

Chord [MIT]
- Consistent hashing (SHA-1) assigns each node and object an m-bit ID
- IDs are ordered in an ID circle ranging from 0 to 2^m - 1
- New nodes assume slots in the ID circle according to their ID
- Key k is assigned to the first node whose ID ≥ k: this node is successor(k)
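
A minimal sketch of the ID-circle idea, assuming m = 6 and SHA-1 truncated to m bits; the helper names are illustrative, not Chord's API.

    # Sketch: mapping nodes and keys onto a 2^m identifier circle (m = 6 here).
    import hashlib
    from bisect import bisect_left

    M = 6
    RING = 2 ** M

    def chord_id(name):
        # Reduce SHA-1 to an m-bit identifier on the circle.
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

    node_ids = sorted(chord_id(f"10.0.0.{i}:4000") for i in range(1, 6))

    def successor(k):
        # First node whose ID >= k, wrapping around the circle.
        i = bisect_left(node_ids, k % RING)
        return node_ids[i % len(node_ids)]

    key = chord_id("song.mp3")
    print(f"key {key} is stored at node {successor(key)}")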

Consistent Hashing - Successor Nodes
[Figure: a 3-bit identifier circle (0-7) with nodes 0, 1, and 3; key 1 is assigned to successor(1) = 1, key 2 to successor(2) = 3, and key 6 to successor(6) = 0.]

Consistent Hashing – Join and Departure
- When a node n joins the network, certain keys previously assigned to n's successor now become assigned to n.
- When node n leaves the network, all of its assigned keys are reassigned to n's successor.

Consistent Hashing – Node Join
[Figure: node join example on the identifier circle; the joining node takes over from its successor the keys that now fall between its predecessor and itself.]

Consistent Hashing – Node Departure
[Figure: node departure example on the identifier circle; the departing node's keys are reassigned to its successor.]

Scalable Key Location – Finger Tables
- To accelerate lookups, Chord maintains additional routing information.
- This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor.
- Each node n maintains a routing table with up to m entries (m is the number of bits in identifiers), called the finger table.
- The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle: s = successor(n + 2^(i-1)).
- s is called the ith finger of node n, denoted by n.finger(i).
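
A small sketch of finger-table construction under the convention above (the ith finger points to successor(n + 2^(i-1))); the 3-bit circle with nodes 0, 1, 3 is the example used in the figure on the next slide.

    def finger_table(n, node_ids, m):
        # Finger i (1-indexed) points to successor(n + 2^(i-1)) on a 2^m circle.
        ring = 2 ** m
        table = []
        for i in range(1, m + 1):
            start = (n + 2 ** (i - 1)) % ring
            # successor(start): the first node ID >= start, wrapping around
            succ = min((x for x in node_ids if x >= start), default=min(node_ids))
            table.append((start, succ))
        return table

    # The 3-bit circle with nodes 0, 1, 3:
    print(finger_table(0, [0, 1, 3], 3))   # [(1, 1), (2, 3), (4, 0)]
    print(finger_table(1, [0, 1, 3], 3))   # [(2, 3), (3, 3), (5, 0)]
    print(finger_table(3, [0, 1, 3], 3))   # [(4, 0), (5, 0), (7, 0)]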

Scalable Key Location – Finger Tables
[Figure: finger tables and key assignments for the 3-bit identifier circle with nodes 0, 1, and 3.]
  Node 0 (stores key 6): start 1 -> succ. 1, start 2 -> succ. 3, start 4 -> succ. 0
  Node 1 (stores key 1): start 2 -> succ. 3, start 3 -> succ. 3, start 5 -> succ. 0
  Node 3 (stores key 2): start 4 -> succ. 0, start 5 -> succ. 0, start 7 -> succ. 0

Chord key location
- Look up in the finger table the furthest node that precedes the key
- => O(log n) hops
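
A sketch of the lookup rule above: at each hop, jump to the finger that most closely precedes the key, until the key falls between the current node and its successor. The helper names are illustrative, not Chord's RPC interface, and the finger tables are the ones from the 3-bit example.

    # Sketch: iterative Chord-style lookup over finger tables (illustrative).
    def in_interval(x, a, b, ring):
        # True if x lies strictly inside the circular interval (a, b).
        return x != a and (x - a) % ring < (b - a) % ring

    def lookup(key, start, fingers, m):
        ring = 2 ** m
        n, hops = start, [start]
        while True:
            succ = fingers[n][0][1]                  # the first finger is n's successor
            if key == succ or in_interval(key, n, succ, ring):
                hops.append(succ)                    # succ is responsible for the key
                return succ, hops
            nxt = n
            for _, f in reversed(fingers[n]):        # furthest finger preceding the key
                if in_interval(f, n, key, ring):
                    nxt = f
                    break
            if nxt == n:                             # no progress possible; settle on succ
                hops.append(succ)
                return succ, hops
            n = nxt
            hops.append(n)

    # Finger tables for the 3-bit example (nodes 0, 1, 3).
    fingers = {
        0: [(1, 1), (2, 3), (4, 0)],
        1: [(2, 3), (3, 3), (5, 0)],
        3: [(4, 0), (5, 0), (7, 0)],
    }
    print(lookup(6, 1, fingers, m=3))                # key 6 resolves to node 0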

Node Joins and Stabilizations
- The most important thing is the successor pointer.
- If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified.
- Each node runs a "stabilization" protocol periodically in the background to update its successor pointer and finger table.

Node Joins and Stabilizations
- The "stabilization" protocol contains 6 functions:
  - create()
  - join()
  - stabilize()
  - notify()
  - fix_fingers()
  - check_predecessor()
- When node n first starts, it calls n.join(n'), where n' is any known Chord node.
- The join() function asks n' to find the immediate successor of n.

Node Joins – stabilize()
- Each time node n runs stabilize(), it asks its successor for that node's predecessor p, and decides whether p should be n's successor instead.
- stabilize() also notifies node n's successor of n's existence, giving the successor the chance to change its predecessor to n.
- The successor does this only if it knows of no closer predecessor than n.
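
The sketch below loosely restates the join/stabilize/notify interplay in runnable form. It models only successor and predecessor pointers (no RPCs, finger tables, or failures); the Node class and the simplified linear-walk join are illustrative.

    # Sketch: Chord-style stabilize()/notify() on local objects (illustrative).
    RING = 2 ** 6

    def between(x, a, b):
        if a == b:                      # single-node ring: the interval is the whole circle
            return x != a
        return x != a and (x - a) % RING < (b - a) % RING

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self
            self.predecessor = None

        def join(self, known):
            # Ask an existing node for our immediate successor (simplified linear walk).
            n = known
            while not (self.id == n.successor.id or between(self.id, n.id, n.successor.id)):
                n = n.successor
            self.successor = n.successor

        def stabilize(self):
            p = self.successor.predecessor
            if p is not None and between(p.id, self.id, self.successor.id):
                self.successor = p               # p is a closer successor than we thought
            self.successor.notify(self)

        def notify(self, n):
            # Adopt n as predecessor if it is closer than the current one.
            if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
                self.predecessor = n

    a, b = Node(10), Node(40)
    b.join(a)
    n = Node(25)
    n.join(a)
    for _ in range(3):                  # a few rounds of periodic stabilization
        for node in (a, b, n):
            node.stabilize()
    print(a.successor.id, n.successor.id, b.successor.id)   # 25 40 10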

Node Joins – Join and Stabilization
[Figure: node n joins between np and ns; succ(np) changes from ns to n and pred(ns) changes from np to n.]
- n joins
  - predecessor = nil
  - n acquires ns as successor via some n'
- n runs stabilize
  - n notifies ns that it is the new predecessor
  - ns acquires n as its predecessor
- np runs stabilize
  - np asks ns for its predecessor (now n)
  - np acquires n as its successor
  - np notifies n
  - n will acquire np as its predecessor
- All predecessor and successor pointers are now correct
- Fingers still need to be fixed, but old fingers will still work

Node Failures
- The key step in failure recovery is maintaining correct successor pointers
- To help achieve this, each node maintains a successor-list of its r nearest successors on the ring
- If node n notices that its successor has failed, it replaces it with the first live entry in the list
- Successor lists are stabilized as follows:
  - Node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it.
  - If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor.
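
A sketch of the successor-list maintenance just described, assuming r = 3 and a simple liveness predicate standing in for failure detection; all names are illustrative.

    # Sketch: successor-list failover and reconciliation (r nearest successors).
    R = 3

    def reconcile(succ, succ_list):
        # Node n rebuilds its own list by copying its successor's list,
        # removing the last entry, and prepending the successor itself.
        return ([succ] + succ_list[:-1])[:R]

    def repair(succ_list, is_alive):
        # If the successor has failed, fall back to the first live entry in the list.
        live = [s for s in succ_list if is_alive(s)]
        if not live:
            raise RuntimeError("all r successors failed; cannot repair locally")
        return live[0]

    # Example: node 10's successor list is [25, 40, 51] and node 25 has just failed.
    alive = {40: True, 51: True, 25: False, 60: True, 3: True}
    new_succ = repair([25, 40, 51], lambda s: alive.get(s, False))
    print(new_succ)                             # 40 is the new successor
    print(reconcile(new_succ, [51, 60, 3]))     # node 10's list becomes [40, 51, 60]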

Handling failures: redundancy
- Each node knows the IP addresses of the next r nodes.
- Each key is replicated at the next r nodes.

Chord – simulation result
[Figure: simulation results from Stoica et al., SIGCOMM 2001.]

Chord – "failure" experiment
[Figure: the fraction of lookups that fail as a function of the fraction of nodes that fail (Stoica et al., SIGCOMM 2001).]

Chord discussion
- Search types: only equality; exact keys need to be known
- Scalability
  - Search: O(log n)
  - Update requires a search, thus O(log n)
  - Construction: O(log^2 n) if a new node joins
- Robustness: replication might be used by storing replicas at successor nodes
- Autonomy: storage and routing: none
- Global knowledge: mapping of IP addresses and data keys to a common key space

YAPPERS: a P2P lookup service over arbitrary topology
- Motivation:
  - Gnutella-style systems
    - Work on arbitrary topology, flood for query
    - Robust but inefficient
    - Support partial query, good for popular resources
  - DHT-based systems
    - Efficient lookup but expensive maintenance
    - By nature, no support for partial query
- Solution: a hybrid system
  - Operate on arbitrary topology
  - Provide DHT-like search efficiency

Design Goals
- Impose no constraints on topology
  - No underlying structure for the overlay network
- Optimize for partial lookups for popular keys
  - Observation: many users are satisfied with a partial lookup
- Contact only nodes that can contribute to the search results
  - No blind flooding
- Minimize the effect of topology changes
  - Maintenance overhead is independent of system size

Basic Idea
- The keyspace is partitioned into a small number of buckets. Each bucket corresponds to a color.
- Each node is assigned a color.
  - # of buckets = # of colors
- Each node sends its <key, value> pairs to the node with the same color as the key within its Immediate Neighborhood.
  - IN(N): all nodes within h hops of node N.
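
A minimal sketch of the bucket/color idea, assuming colors are derived from a hash of a node's IP address (as on the following "Partition Nodes" slide) and of the key, and that IN(X) is given as a precomputed set; the names are illustrative.

    # Sketch: YAPPERS-style coloring and content registration (illustrative).
    import hashlib

    NUM_COLORS = 4          # number of buckets/colors (small by design)

    def color(s):
        # Same hash-based color function for node IPs and for keys.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % NUM_COLORS

    def register(node_ip, key, value, immediate_neighborhood, store):
        # Send <key, value> to a node of the key's color inside IN(node);
        # if no such node exists, fall back to a backup choice.
        c = color(key)
        candidates = [n for n in immediate_neighborhood if color(n) == c]
        target = candidates[0] if candidates else node_ip
        store.setdefault(target, {})[key] = value
        return target

    store = {}
    in_x = ["10.0.0.2", "10.0.0.3", "10.0.0.4"]      # IN(X): nodes within h hops of X
    owner = register("10.0.0.1", "song.mp3", "10.0.0.1:6346", in_x, store)
    print(color("song.mp3"), owner, store[owner])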

Partition Nodes
Given any overlay, first partition the nodes into buckets (colors) based on a hash of their IP addresses.

Partition Nodes (2)
Around each node there is at least one node of each color; this may require backup color assignments.
[Figure: an overlay with two example nodes X and Y and the colors around them.]

Register Content
Partition the content space into buckets (colors) and register a pointer at "nearby" nodes.
[Figure: the nodes around node Z form a small hash table; Z registers red content locally and registers yellow content at a yellow node.]

Searching Content
Start at a "nearby" node of the right color, then search the other nodes of the same color.
[Figure: search propagation among same-colored nodes (U, V, W, X, Y, Z in the example).]

Searching Content (2)
This yields a smaller overlay for each color; use a Gnutella-style flood within it.
Fan-out = degree of nodes in the smaller overlay.

More...
- When node X is inserting <key, value>:
  - What if multiple nodes in IN(X) have the same color?
  - What if no node in IN(X) has the same color as key k?
- Solution:
  - P1: randomly select one
  - P2: backup scheme: use the node with the next color
    - Primary color (unique) & secondary colors (zero or more)
- Problems coming with this solution:
  - No longer consistent and stable
  - But the effect is isolated within the Immediate Neighborhood

Extended Neighborhood
- IN(A): Immediate Neighborhood
- F(A): Frontier of node A
  - All nodes that are directly connected to IN(A) but not in IN(A)
- EN(A): Extended Neighborhood
  - The union of IN(v) for each v in F(A)
  - In fact, EN(A) includes all nodes within 2h + 1 hops
- Each node needs to maintain these three sets of nodes for query routing.
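
The three sets can be computed from the overlay graph by breadth-first search; below is a sketch under the definitions above, where h is the immediate-neighborhood radius and the overlay is a plain adjacency dict.

    # Sketch: computing IN(A), F(A), and EN(A) from an adjacency list.
    from collections import deque

    def within_hops(graph, src, h):
        # All nodes within h hops of src, excluding src itself (BFS).
        dist, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == h:
                continue
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return {v for v in dist if v != src}

    def neighborhoods(graph, a, h):
        IN = within_hops(graph, a, h)
        # Frontier: directly connected to IN(A) but not in IN(A) (and not A itself).
        F = {w for v in IN for w in graph[v]} - IN - {a}
        # Extended neighborhood: union of IN(v) for v in the frontier.
        EN = set().union(*(within_hops(graph, v, h) for v in F)) if F else set()
        return IN, F, EN

    graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}  # a chain
    print(neighborhoods(graph, 1, h=2))   # ({2, 3}, {4}, {2, 3, 5, 6})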

The network state information for node A (h = 2)
[Figure: node A's immediate neighborhood IN(A), frontier F(A), and extended neighborhood EN(A) for h = 2.]

Searching with Extended Neighborhood
- When node A wants to look up a key k of color C(k), it picks a node B of color C(k) in IN(A)
  - If there are multiple such nodes, it randomly picks one
  - If there are none, it picks the backup node
- B, using its EN(B), sends the request to all nodes of color C(k).
- The other nodes do the same thing as B.
- Duplicate message problem:
  - Each node caches the unique query identifier.

More on Extended Neighborhood
- All <key, value> pairs are stored among IN(X) (the nodes within h hops of node X).
- Why does each node need to keep EN(X)?
- Advantages:
  - The forwarding node is chosen based on local knowledge
  - Completeness: a query of color C(k) can reach all nodes of color C(k) without touching any nodes of other colors (not counting backup nodes)

Maintaining Topology
- Edge deletion X-Y:
  - The deletion message needs to be propagated to all nodes that have X and Y in their EN set
  - Necessary adjustments:
    - Change the IN, F, EN sets
    - Move <key, value> pairs if X/Y is in IN(A)
- Edge insertion:
  - The insertion message needs to include the neighbor info
  - So other nodes can update their IN and EN sets

Maintaining Topology
- Node departure: a node X with w edges is leaving
  - Just like w edge deletions
  - The neighbors of X initiate the propagation
- Node arrival: X joins the network
  - Ask its new neighbors for their current topology view
  - Build its own extended neighborhood
  - Insert w edges

Problems with the basic design
- Fringe nodes: a low-connectivity node allocates a large number of secondary colors to its high-connectivity neighbors.
- Large fan-out:
  - The forwarding fan-out degree at A is proportional to the size of F(A)
  - This is desirable for partial lookup, but not good for full lookup
[Figure: node A is overloaded by secondary colors from B, C, D, E.]

Solutions
- Prune fringe nodes:
  - If the degree of a node is too small, find a proxy node.
- Biased backup node assignment:
  - X assigns a secondary color to Y only when a * |IN(X)| > |IN(Y)|
- Reducing the forward fan-out, basic idea:
  - Try the backup node
  - Try common nodes

Experiment
- h = 2 (1 is too small, > 2 makes EN too large)
- Topology: a Gnutella snapshot
- Exp. 1: search efficiency
[Result figures: distribution of colors per node; fan-out; effect of the number of colors on search; effect of the number of colors on fan-out.]

Conclusion and discussion
- Each search only disturbs a small fraction of the nodes in the overlay.
- No restructuring of the overlay.
- Each node has only local knowledge => scalable.
- Discussion:
  - A hybrid (unstructured plus local DHT) system
  - ...

ZIGZAG and CAN
- ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming
- CAN: Content-Addressable Network
Tuong Nguyen

ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming (INFOCOM 2003)

Outline for ZIGZAG
- Main problem
- Sub-problems
- Proposed solution
  - Structure and protocol
  - Dynamic issues (node join/leave)
- Performance optimization
- Performance evaluation

Main Problem
Streaming live bandwidth-intensive media from a single source to a large number of receivers on the Internet.
Possible solutions:
- An individual connection to stream the content to each receiver
- IP multicast
- A new P2P-based technique called ZIGZAG

Main Idea of ZIGZAG
- ZIGZAG distributes media content to many clients by organizing them into an appropriate tree.
- This tree is rooted at the server and includes all and only the receivers.
- A subset of the receivers gets the content directly from the source; the others get it from receivers upstream.
What's the problem with this technique?

Sub-problems
- High end-to-end delay: content has to go through intermediate nodes
- The behavior of receivers is unpredictable: the dynamic nature of a P2P network
- Efficient use of network resources: nodes have different bandwidths, etc.

Proposed Solution
- Administrative organization
  - Logical relationships among the peers
- The multicast tree
  - Physical relationships among the peers
- The control protocol
  - Peers exchange state information
- Client join/departure
- Performance optimization

Administrative Design
Cluster division rules:
- Layer 0 contains all peers.
- Peers in layer j < H - 1 are partitioned into clusters of sizes in [k, 3k]. Layer H - 1 has only one cluster, which has a size in [2, 3k].
- A peer in a cluster at layer j < H is selected to be the head of that cluster. This head becomes a member of layer j + 1 if j < H - 1. The server S is the head of any cluster it belongs to.

Administrative Design (cont.)
- H = Θ(log_k N), where N = # of peers
- Any peer at a layer j > 0 must be the head of the cluster it belongs to at every lower layer

Connectivity Design
Some important terms:
- Subordinate: non-head peers of a cluster headed by a peer X are called "subordinates" of X.
- Foreign head: a non-head (or server) clustermate of a peer X at layer j > 0 is called a "foreign head" of the layer-(j-1) subordinates of X.
- Foreign subordinate: the layer-(j-1) subordinates of X are called "foreign subordinates" of any layer-j clustermate of X.
- Foreign cluster: the layer-(j-1) cluster of X is called a "foreign cluster" of any layer-j clustermate of X.

Multicast Tree
Rules to which the multicast tree must conform:
(1) A peer, when not at its highest layer, cannot have any link to or from any other peer (e.g., peer 4 at layer 1).
(2) A peer, when at its highest layer, can only link to its foreign subordinates. The only exception is the server: at the highest layer, the server links to each of its subordinates (e.g., peer 4 at layer 2).
(3) At a layer j < H-1, non-head members of a cluster get the content directly from a foreign head (e.g., peers 1, 2, 3).

Multicast Tree (cont.)
- The worst-case node degree of the multicast tree is O(k^2)
- The height of the multicast tree is O(log_k N), where N = # of peers

Key idea of the protocol
Use a foreign head to forward the content instead of the head (the "zigzag").
Main benefit:
- Much better node degree than a head-based design: if the head forwarded the content, a node X whose highest layer is j would have links to its subordinates at each layer j-1, j-2, ..., 0 that it belongs to. Since j can be H - 1, the worst-case node degree would be H × (3k - 1) = Ω(log_k N).

Control protocol
- Each node X in a layer-j cluster periodically communicates with its layer-j clustermates and with its children and parent in the multicast tree
- For peers within a cluster, the exchanged information is just the peer degree
- If the recipient is the cluster head, X also sends a list L = {[X1, d1], [X2, d2], ...}, where [Xi, di] means that X is currently forwarding the content to di peers in the foreign cluster whose head is Xi

Control protocol (cont.)
- If the recipient is the parent, X instead sends the following information:
  - A Boolean flag Reachable(X): true iff there exists a path from X to a layer-0 peer (e.g., Reachable(7) = false, Reachable(4) = true)
  - A Boolean flag Addable(X): true iff there exists a path from X to a layer-0 peer whose cluster's size is in [k, 3k-1]
- Although the worst-case control overhead of a node is O(k log_k N), the amortized worst-case overhead is O(k)

Client Join
- If the administrative organization has only one layer, the new client P simply connects to S
- D(Y) denotes the current end-to-end delay from the server observed by a peer Y
- d(Y, P) is the delay from Y to P, measured during the contact between Y and P

  1. If X is a leaf
  2.   Add P to the only cluster of X
  3.   Make P a new child of the parent of X
  4. Else
  5.   If Addable(X)
  6.     Select a child Y: Addable(Y) and D(Y) + d(Y, P) is min
  7.     Forward the join request to Y
  8.   Else
  9.     Select a child Y: Reachable(Y) and D(Y) + d(Y, P) is min
  10.    Forward the join request to Y
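
A runnable restatement of the join-request routing above; the Peer class, its fields, and the delay function are illustrative stand-ins for the paper's D(Y), d(Y, P), Addable, and Reachable.

    # Sketch: routing a ZIGZAG join request down the tree (illustrative).
    class Peer:
        def __init__(self, name, delay_from_server, addable=True, reachable=True):
            self.name = name
            self.children = []
            self.D = delay_from_server     # end-to-end delay from the server, D(Y)
            self.addable = addable         # Addable(Y)
            self.reachable = reachable     # Reachable(Y)

    def route_join(x, d_to_new_peer):
        # Return the leaf whose cluster the new peer P should join;
        # d_to_new_peer(Y) plays the role of d(Y, P).
        while x.children:                                      # until X is a leaf
            flag = "addable" if x.addable else "reachable"
            candidates = [c for c in x.children if getattr(c, flag)]
            # Pick the child minimizing D(Y) + d(Y, P) and forward the request.
            x = min(candidates, key=lambda c: c.D + d_to_new_peer(c))
        return x

    root = Peer("S", 0)
    a, b = Peer("A", 20), Peer("B", 35)
    a.children = [Peer("A1", 45), Peer("A2", 30)]
    root.children = [a, b]
    delays = {"A": 10, "B": 5, "A1": 8, "A2": 40}
    print(route_join(root, lambda y: delays[y.name]).name)     # A1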

Client Join (cont.)
- The join overhead is O(log_k N) in terms of the number of nodes to contact
- If the number of nodes in a cluster grows larger than 3k, the cluster is split
  - The worst-case split overhead is O(k^2)

Client Departure
- Consider a peer X that departs.
- If X's highest layer is layer 0, no further overhead emerges.
- Suppose that X's highest layer is j > 0.
- For each layer-(j-1) cluster whose non-head members are children of X, the head Y of that cluster is responsible for finding a new parent for them.
- Y selects Z, a layer-j non-head clustermate that has the minimum degree.

Client Departure (cont.)
- Furthermore, since X used to be the head of j clusters at layers 0, 1, ..., j-1:
  - Let X' be a random subordinate of X at layer 0
  - X' will replace X as the new head of each of those clusters
Comment: in the worst case, the number of peers that need to reconnect due to a failure is O(k^2).

Client Departure – merge
- If a cluster becomes undersized, the merge procedure can be used.
- The merge procedure is called periodically to reduce overhead.

Performance Optimization
If a peer X, in its highest-layer cluster j > 0, is busy serving many children, it might consider switching the parenthood of some children to another non-head clustermate which is less busy.
Two main methods:
- Degree-based switch
- Capacity-based switch

Performance Evaluation
- Use the GT-ITM generator to create a 3240-node transit-stub graph as the underlying network topology
- 2000 clients located randomly
- k = 5

Join Evaluation
Comment: the join-overhead curve keeps going up slowly as more clients join, until a point where it falls back down to a very low value. This behavior repeats, making the join algorithm scalable with the client population.

Degree and Control Overhead Evaluation
Comments:
1. The node degrees in a ZIGZAG multicast tree are not only small but also quite balanced. In the worst case, a peer has to transmit the content to 22 others, which is tiny compared to the client population of 2000.
2. Most peers have to exchange control states with only 12 others. Peers at high layers do not have a heavy control overhead either; most of them communicate with around 30 peers, only 1.5% of the population.

Failure and Merge Overhead Evaluation
Comments:
1. Most failures do not affect the system because they happen to layer-0 peers (illustrated by a thick line at the bottom of the graph).
2. For failures happening to higher-layer peers, the overhead to recover each of them is small, mostly fewer than 20 reconnections (no more than 2% of the client population).
3. The overhead to recover a failure does not depend on the number of clients in the system.
4. In the worst case, only 17 peers need to reconnect, which accounts for no more than 1.7% of the client population.

Conclusions
The key to ZIGZAG's design is the use of a foreign head, rather than the head of a cluster, to forward the content to the other members of that cluster.
The benefits of designing the algorithm with that idea in mind:
1. Short end-to-end delay: ZIGZAG keeps the end-to-end delay small because the multicast tree height is at most logarithmic in the client population and each client needs to forward the content to at most a constant number of peers.
2. Low control overhead: since a cluster is bounded in size and the client degree is bounded by a constant, the control overhead at a client is small. On average, the overhead is a constant regardless of the client population.
3. Efficient join and failure recovery: a join can be accomplished without asking more than O(log N) existing clients, where N is the client population. In particular, a failure can be recovered quickly and regionally with a constant number of reconnections and no effect on the server.
4. Low maintenance overhead: maintenance procedures (merge, split, and performance refinement) are invoked periodically with very low overhead.

Content-Addressable Network (CAN)
Proc. ACM SIGCOMM (San Diego, CA, August 2001)

Motivation
- The primary scalability issue in peer-to-peer systems is the indexing scheme used to locate the peer containing the desired content
- Content-Addressable Network (CAN) is a scalable indexing mechanism
- This is also a central issue in large-scale storage management systems

Basic Design
Basic idea:
- A virtual d-dimensional coordinate space
- Each node owns a zone in the virtual space
- Data is stored as (key, value) pairs
- hash(key) -> a point P in the virtual space
- The (key, value) pair is stored on the node within whose zone the point P lies
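
A sketch of the basic mapping for d = 2, assuming two hash functions hx and hy derived from SHA-1 and axis-aligned rectangular zones; the zone layout and names are illustrative.

    # Sketch: mapping a key to a point in a 2-d CAN and finding the owning zone.
    import hashlib

    def _h(key, salt):
        # Hash to a coordinate in [0, 1) along one axis.
        digest = hashlib.sha1((salt + key).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2 ** 64

    def key_to_point(key):
        return (_h(key, "x"), _h(key, "y"))          # hx(K), hy(K)

    # Zones as (xmin, xmax, ymin, ymax) rectangles covering the unit square.
    zones = {
        "node1": (0.0, 0.5, 0.0, 1.0),
        "node2": (0.5, 1.0, 0.0, 0.5),
        "node3": (0.5, 1.0, 0.5, 1.0),
    }

    def owner(point):
        x, y = point
        for node, (x0, x1, y0, y1) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                return node

    p = key_to_point("song.mp3")
    print(p, "is stored on", owner(p))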

An Example of CAN
[Figures: nodes 1, 2, 3, and 4 join one after another; each join splits an existing zone in half, so the 2-d coordinate space ends up partitioned among the nodes.]

An Example of CAN (cont.)
node I::insert(K,V)
  (1) a = hx(K); b = hy(K)
  (2) route (K,V) to the point (a,b)
  (3) the node whose zone contains (a,b) stores (K,V)
node J::retrieve(K)
  (1) a = hx(K); b = hy(K)
  (2) route "retrieve(K)" to (a,b)

Important note: data stored in a CAN is addressable by name (i.e., key), not by location (i.e., IP address).

Conclusion about CAN (part 1)
- Supports basic hash table operations on key-value pairs (K,V): insert, search, delete
- CAN is composed of individual nodes
- Each node stores a chunk (zone) of the hash table
  - A subset of the (K,V) pairs in the table
- Each node stores state information about neighboring zones

Routing in CAN
[Figure: a request for the point (x,y) is routed from the node owning (a,b) through neighboring zones toward the zone that contains (x,y).]
Important note: a node only maintains state for its immediate neighboring nodes.
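
A sketch of greedy routing in a 2-d CAN: at each hop, forward to the neighbor whose zone center is closest to the destination point. The zone layout is the illustrative one from the earlier sketch; in the real system each node knows only its immediate neighbors.

    # Sketch: greedy routing toward a destination point in a 2-d CAN (illustrative).
    import math

    # Zones as (xmin, xmax, ymin, ymax); neighbors are zones sharing a border.
    zones = {
        "A": (0.0, 0.5, 0.0, 1.0),
        "B": (0.5, 1.0, 0.0, 0.5),
        "C": (0.5, 1.0, 0.5, 1.0),
    }
    neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

    def center(zone):
        x0, x1, y0, y1 = zone
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    def contains(zone, p):
        x0, x1, y0, y1 = zone
        return x0 <= p[0] < x1 and y0 <= p[1] < y1

    def route(start, dest):
        # Forward to the neighbor whose zone center is closest to the destination.
        path, node = [start], start
        while not contains(zones[node], dest):
            node = min(neighbors[node],
                       key=lambda n: math.dist(center(zones[n]), dest))
            path.append(node)
        return path

    print(route("A", (0.9, 0.2)))   # ['A', 'B']: the point lies in B's zone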

Node Insertion in CAN
1) The new node discovers some node "I" already in the CAN
2) The new node picks a random point (p,q) in the space
3) I routes to (p,q) and discovers node J, the owner of that zone
4) J's zone is split in half; the new node owns one half
Important note: inserting a new node affects only a single other node and its immediate neighbors.
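
A sketch of step 4: split an axis-aligned rectangular zone in half along its longer side and hand one half, together with the (K,V) pairs whose points fall in it, to the new node. The data structures are illustrative.

    # Sketch: splitting node J's zone in half when a new node joins (illustrative).
    def split_zone(zone):
        # zone = (xmin, xmax, ymin, ymax); split along the longer side.
        x0, x1, y0, y1 = zone
        if x1 - x0 >= y1 - y0:
            mid = (x0 + x1) / 2
            return (x0, mid, y0, y1), (mid, x1, y0, y1)
        mid = (y0 + y1) / 2
        return (x0, x1, y0, mid), (x0, x1, mid, y1)

    def hand_over(j_zone, j_data):
        # j_data maps points (x, y) -> (K, V); the new node takes the pairs in its half.
        keep_zone, new_zone = split_zone(j_zone)
        inside = lambda z, p: z[0] <= p[0] < z[1] and z[2] <= p[1] < z[3]
        keep = {p: kv for p, kv in j_data.items() if inside(keep_zone, p)}
        moved = {p: kv for p, kv in j_data.items() if inside(new_zone, p)}
        return keep_zone, keep, new_zone, moved

    j_zone = (0.5, 1.0, 0.0, 0.5)
    j_data = {(0.6, 0.1): ("K1", "V1"), (0.9, 0.4): ("K2", "V2")}
    print(hand_over(j_zone, j_data))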

Review about CAN (part 2)
- Requests (insert, lookup, or delete) for a key are routed by intermediate nodes using a greedy routing algorithm
- Requires no centralized control (completely distributed)
- The small per-node state is independent of the number of nodes in the system (scalable)
- Nodes can route around failures (fault-tolerant)

CAN: node failures
- Need to repair the space
  - Recover the database (weak point)
    - Soft-state updates
    - Use replication, rebuild the database from replicas
  - Repair routing
    - Takeover algorithm

CAN: takeover algorithm
- Simple failures
  - Know your neighbor's neighbors
  - When a node fails, one of its neighbors takes over its zone
- More complex failure modes
  - Simultaneous failure of multiple adjacent nodes
  - Scoped flooding to discover neighbors
  - Hopefully, a rare event

CAN: node failures
Important note: only the failed node's immediate neighbors are required for recovery.

CAN Improvements
- CAN provides a tradeoff between per-node state, O(d), and path length, O(d n^(1/d))
  - Path length is measured in application-level hops
  - Neighbor nodes may be geographically distant
- Want to achieve a lookup latency that is comparable to the underlying IP path latency
  - Several optimizations that reduce lookup latency also improve robustness in terms of routing and data availability
- Approach: reduce the path length, reduce the per-hop latency, and add load balancing
- Simulated the CAN design on Transit-Stub (TS) topologies using the GT-ITM topology generator (Zegura et al.)

Adding Dimensions
- Increasing the number of dimensions of the coordinate space reduces the routing path length (and latency)
  - Small increase in the size of the routing table at each node
- The increase in the number of neighbors improves routing fault-tolerance
  - More potential next-hop nodes
- Simulated path lengths follow O(d n^(1/d))

Multiple independent coordinate spaces (realities)
- Nodes can maintain multiple independent coordinate spaces (realities)
- For a CAN with r realities, a single node is assigned r zones and holds r independent neighbor sets
  - The contents of the hash table are replicated for each reality
- Example: for three realities, a (K,V) pair mapping to a point P:(x,y,z) may be stored at three different nodes
  - (K,V) is only unavailable when all three copies are unavailable
  - Route using the neighbor on the reality closest to (x,y,z)

Dimensions vs. Realities
- Increasing the number of dimensions and/or realities decreases path length and increases per-node state
- More dimensions has a greater effect on path length
- More realities provide stronger fault-tolerance and increased data availability
- The authors do not quantify the different storage requirements
  - More realities require replicating (K,V) pairs

RTT Ratio & Zone Overloading
- Incorporate RTT into the routing metric
  - Each node measures the RTT to each neighbor
  - Forward messages to the neighbor with the maximum ratio of progress to RTT
- Overload coordinate zones
  - Allow multiple nodes to share the same zone, bounded by a threshold MAXPEERS
  - Nodes maintain peer state, but no additional neighbor state
  - Periodically poll a neighbor for its list of peers, measure the RTT to each peer, and retain the lowest-RTT node as the neighbor
  - (K,V) pairs may be divided among peer nodes or replicated

Multiple Hash Functions
- Improve data availability by using k hash functions to map a single key to k points in the coordinate space
- Replicate (K,V) and store it at k distinct nodes
- (K,V) is only unavailable when all k replicas are simultaneously unavailable
- The authors suggest querying all k nodes in parallel to reduce the average lookup latency

Topology-sensitive construction
- Use landmarks for topologically-sensitive construction
- Assume the existence of well-known machines such as DNS servers
- Each node measures its RTT to each landmark
  - Order the landmarks by increasing RTT
  - For m landmarks there are m! possible orderings
- Partition the coordinate space into m! equal-size partitions
- Nodes join the CAN at a random point in the partition corresponding to their landmark ordering
- Latency stretch is the ratio of CAN latency to IP network latency

Other optimizations
- Run a background load-balancing technique to offload from densely populated bins to sparsely populated bins (partitions of the space)
- Volume balancing for more uniform partitioning
  - When a JOIN is received, examine the zone volume and the neighbors' zone volumes
  - Split the zone with the largest volume
  - Results in 90% of nodes having zones of equal volume
- Caching and replication for "hot spot" management

Strengths
- More resilient than flooding broadcast networks
- Efficient at locating information
- Fault-tolerant routing
- High node and data availability (with the improvements)
- Manageable routing table size and network traffic

Weaknesses
- Impossible to perform a fuzzy search
- Susceptible to malicious activity
- Must maintain coherence of all the indexed data (network overhead, efficient distribution)
- Still relatively high routing latency
- Poor performance without the improvements

Summary
- CAN
  - An Internet-scale hash table
  - A potential building block in Internet applications
- Scalability
  - O(d) per-node state
- Low-latency routing
  - Simple heuristics help a lot
- Robust
  - Decentralized, can route around trouble

Some Main Research Areas in P2P
- Efficient search, queries, and topologies (Chord, CAN, YAPPERS, ...)
- Data delivery (ZIGZAG, ...)
- Resource management
- Security

Resource Management
Problems:
- The autonomous nature of peers: essentially selfish peers must be given an incentive to contribute resources.
- The scale of the system: it is hard to get a complete picture of what resources are available.
An approach:
- Use concepts from economics to construct a resource marketplace, where peers can buy and sell or trade resources as necessary.

Security Problem
Problem:
- Malicious attacks: nodes in a P2P system operate in an autonomous fashion, and any node that speaks the system protocol may participate in the system.
An approach:
- Mitigate attacks by nodes that abuse the P2P network by exploiting the implicit trust peers place on them, realized by building some ...

References
- Kien A. Hua, Duc A. Tran, and Tai Do, "ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming", IEEE INFOCOM 2003.
- Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker, "A Scalable Content-Addressable Network", Proc. ACM SIGCOMM, San Diego, CA, August 2001.
- Mayank Bawa, Brian F. Cooper, Arturo Crespo, Neil Daswani, Prasanna Ganesan, Hector Garcia-Molina, Sepandar Kamvar, Sergio Marti, Mario Schlosser, Qi Sun, Patrick Vinograd, and Beverly Yang, "Peer-to-Peer Research at Stanford".
- Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", ACM SIGCOMM 2001.
- Prasanna Ganesan, Qixiang Sun, and Hector Garcia-Molina, "YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology", IEEE INFOCOM 2003.