Lecture 2: Distributed Hash Table
Information search in P2P
 Suppose we have a P2P system with N nodes.
 A file “F” is stored in one node.
 How could an arbitrary node find “F” in the system?
P2P: centralized index
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries the server for “Hey Jude”
3) Alice requests the file from Bob
[Figure: peers Alice and Bob connected to the centralized directory server]
P2P: problems with centralized directory
 single point of failure
 performance bottleneck
 copyright infringement:
“target” of lawsuit is
obvious
 file transfer is decentralized, but locating content is highly centralized
Query flooding
 fully distributed
 no central server
 used by Gnutella
 Each peer indexes the
files it makes available
for sharing (and no
other files)
overlay network: graph
 edge between peer X
and Y if there’s a TCP
connection
 all active peers and
edges form overlay net
 edge: virtual (not
physical) link
 given peer typically
connected with < 10
overlay neighbors
Query flooding
 Query message sent over existing TCP connections
 peers forward Query message
 QueryHit sent over reverse Query path
 File transfer: HTTP
 Scalability: limited-scope flooding
[Figure: Query messages flood the overlay; QueryHit messages travel back along the reverse Query path]
Gnutella: Peer joining
1. joining peer Alice must find another peer in the Gnutella network: use a list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until connection setup with Bob
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping message to his overlay neighbors (who then forward to their neighbors…)
    peers receiving the Ping message respond to Alice with a Pong message
4. Alice receives many Pong messages, and can then set up additional TCP connections
Hierarchical Overlay
 Hybrid of centralized index and query flooding approaches
 each peer is either a super node or assigned to a super node
   TCP connection between a peer and its super node
   TCP connections between some pairs of super nodes
 Super node tracks the content in its children
[Figure: overlay network with ordinary peers, group-leader (super node) peers, and neighboring relationships]
Distributed Hash Table (DHT)
 DHT = distributed P2P database
 Database has (key, value) pairs;
 key: social security number; value: human name
 key: content type; value: IP address
 Peers query the database with a key
 the database returns the values that match the key
 Peers can also insert (key, value) pairs into the database
 Finding “needles” requires that the P2P system be structured
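To make the (key, value) interface above concrete, here is a minimal Python sketch of the two operations a DHT exposes; the class name ToyDHT and the method names put/get are illustrative choices rather than the API of any particular system, and a real DHT would spread the pairs across many peers instead of one in-memory dictionary.

    # Toy stand-in for the distributed database: one in-memory dict.
    # A real DHT partitions these pairs over many peers, but the
    # operations peers issue look the same: insert and query by key.
    class ToyDHT:
        def __init__(self):
            self.table = {}                      # key -> list of values

        def put(self, key, value):
            """Insert a (key, value) pair into the database."""
            self.table.setdefault(key, []).append(value)

        def get(self, key):
            """Query with a key; return all values that match it."""
            return self.table.get(key, [])

    dht = ToyDHT()
    dht.put("Hey Jude", "198.51.100.7")          # value: IP address of a peer holding the file
    dht.put("Hey Jude", "203.0.113.20")
    print(dht.get("Hey Jude"))                   # ['198.51.100.7', '203.0.113.20']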
The Principle Of Distributed Hash Tables
 A dynamic distribution of a hash table onto a set of cooperating nodes

    Key   Value
    1     Frozen
    9     Tangled
    11    Mulan
    12    Lion King
    21    Cinderella
    22    Doraemon

• Basic service: lookup operation
• Key resolution from any node, e.g. → Node D: lookup(9)
• Each node has a routing table
• Pointers to some other nodes
• Typically, a constant or a logarithmic number of pointers (why?)
[Figure: the (key, value) pairs above distributed over four nodes A, B, C, and D arranged on a ring]
DHT Desirable Properties
1. Keys mapped evenly to all nodes in the
network
2. Each node maintains information about
only a few other nodes
3. A key can be found efficiently by
querying the system
4. Node arrival/departures only affect a
few nodes
Chord Identifiers
 Assign an integer identifier to each peer in the range [0, 2^n - 1].
 Each identifier can be represented by n bits.
 Require each key to be an integer in the same range.
 To get integer keys, hash the original key.
 e.g., key = h(“Led Zeppelin IV”)
 This is why the database is called a distributed “hash” table
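As a hedged sketch of how original keys become integer identifiers, the snippet below hashes a string with SHA-1 (the hash Chord uses) and truncates it to n bits; the helper name chord_id and the choice n = 16 are my own, not from the lecture.

    import hashlib

    def chord_id(name, n_bits=16):
        """Hash an arbitrary string to an integer identifier in [0, 2^n - 1]."""
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % (2 ** n_bits)

    # Peers and keys are hashed into the same identifier space.
    print(chord_id("Led Zeppelin IV"))            # some integer in [0, 65535]
    print(chord_id("peer 198.51.100.7:6346"))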
Each key must be stored in a node
 Central issue: assigning (key, value) pairs to peers.
 Rule: assign each pair to the peer that has the ID closest to the key.
 Convention in this lecture: closest is the immediate successor of the key (or the peer whose ID equals the key).
 Example: 4 bits; peers: 1, 3, 4, 5, 8, 10, 12, 14
 key = 13, then successor peer = 14
 key = 15, then successor peer = 1 (wraps around)
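The successor rule can be written in a few lines; this sketch (the function name successor_peer is mine) reproduces the 4-bit example above.

    from bisect import bisect_left

    def successor_peer(key, peer_ids):
        """Return the peer whose ID is the immediate successor of key (or equal to it)."""
        peers = sorted(peer_ids)
        i = bisect_left(peers, key)               # first peer with ID >= key
        return peers[i] if i < len(peers) else peers[0]   # wrap around the circle

    peers = [1, 3, 4, 5, 8, 10, 12, 14]
    print(successor_peer(13, peers))              # 14
    print(successor_peer(15, peers))              # 1 (wraps around)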
Chord [MIT]
 consistent hashing (SHA-1) assigns each node and object an m-bit ID
 IDs are ordered on an ID circle ranging from 0 to 2^m - 1
 new nodes assume slots on the ID circle according to their ID
 key k is assigned to the first node whose ID ≥ k, i.e., successor(k)
Consistent Hashing - Successor Nodes
[Figure: 3-bit identifier circle (0–7) with nodes 0, 1, 3 and keys 1, 2, 6: successor(1) = 1, successor(2) = 3, successor(6) = 0]
Consistent Hashing – Join and
Departure
 When a node n joins the network, certain keys
previously assigned to n’s successor now
become assigned to n.
 When node n leaves the network, all of its
assigned keys are reassigned to n’s successor.
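A small worked example of these rules, using the 3-bit ring with nodes {0, 1, 3} and keys {1, 2, 6} from the earlier figure; the successor helper is a sketch of mine, and it shows that when node 6 joins, only key 6 (previously held by its successor, node 0) moves.

    def successor(key, nodes):
        """First node on the ring whose ID >= key, wrapping past the top."""
        for n in sorted(nodes):
            if n >= key:
                return n
        return min(nodes)

    keys = [1, 2, 6]
    before = {k: successor(k, [0, 1, 3]) for k in keys}
    after  = {k: successor(k, [0, 1, 3, 6]) for k in keys}      # node 6 joins
    print(before)   # {1: 1, 2: 3, 6: 0}
    print(after)    # {1: 1, 2: 3, 6: 6}  -> only key 6 is reassigned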
Consistent Hashing – Node Join
[Figure: identifier circle 0–7 illustrating a node join; keys previously assigned to the joining node's successor are reassigned to the new node]
Consistent Hashing – Node Dep.
[Figure: identifier circle 0–7 illustrating a node departure; the departing node's keys are reassigned to its successor]
Consistent Hashing: more example
 For n = 6, the number of identifiers is 64.
 The following DHT ring has 10 nodes and stores 5 keys.
 The successor of key 10 is node 14.
[Figure: identifier circle with 10 nodes and 5 keys]
Circular DHT (1)
[Figure: circular DHT with peers 1, 3, 4, 5, 8, 10, 12, 15 arranged on a ring]
 Each peer is only aware of its immediate successor and predecessor.
Circular DHT (2)
 O(N) messages on average to resolve a query, when there are N peers
 Define “closest” as the closest successor
[Figure: the query “Who's responsible for key 1110?” starts at peer 0001 and is forwarded peer by peer around the ring until it reaches peer 1111, which answers “I am”]
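A hedged sketch of this successor-only routing, on an 8-peer ring like the one in the figure: the query is handed from successor to successor until it reaches the responsible peer, so the cost grows linearly with the number of peers. The dictionaries and helper names are illustrative only.

    # Each 4-bit peer knows only its immediate successor (and predecessor).
    peers = [0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111]
    succ = {p: peers[(i + 1) % len(peers)] for i, p in enumerate(peers)}

    def is_responsible(peer, key):
        """peer is responsible if it is the closest successor of key on the ring."""
        pred = peers[(peers.index(peer) - 1) % len(peers)]
        if pred < peer:
            return pred < key <= peer
        return key > pred or key <= peer          # interval wraps past 0

    def lookup(start, key):
        peer, hops = start, 0
        while not is_responsible(peer, key):
            peer = succ[peer]                     # forward the query to the successor
            hops += 1
        return peer, hops

    print(lookup(0b0001, 0b1110))                 # key 1110 is stored at peer 1111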
Circular DHT with Shortcuts
[Figure: the same ring (peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111) with shortcut links; the query “Who's responsible for key 1110?” now reaches the responsible peer in fewer hops]
 Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts.
 Reduced from 6 to 3 messages in this example.
 Can design shortcuts such that each peer has O(log N) neighbors and each query takes O(log N) messages
Scalable Key Location – Finger Tables
[Figure: 3-bit identifier circle with nodes 0, 1, 3 and keys 1, 2, 6, each node showing its finger table]
Finger table entries: start = (n + 2^i) mod 2^3 for i = 0, 1, 2; each entry points to successor(start).

    Node 0 (keys: 6)   start 1 -> succ 1,  start 2 -> succ 3,  start 4 -> succ 0
    Node 1 (keys: 1)   start 2 -> succ 3,  start 3 -> succ 3,  start 5 -> succ 0
    Node 3 (keys: 2)   start 4 -> succ 0,  start 5 -> succ 0,  start 7 -> succ 0
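The finger tables above can be regenerated with a short sketch (assumptions: a global view of the node set {0, 1, 3} and m = 3; a real node builds its table by querying the ring).

    def successor(key, nodes):
        for n in sorted(nodes):
            if n >= key:
                return n
        return min(nodes)                         # wrap around the ring

    def finger_table(n, nodes, m=3):
        """Entry i: start = (n + 2^i) mod 2^m, pointing at successor(start)."""
        return [((n + 2 ** i) % (2 ** m), successor((n + 2 ** i) % (2 ** m), nodes))
                for i in range(m)]

    for n in [0, 1, 3]:
        print(n, finger_table(n, [0, 1, 3]))
    # 0 [(1, 1), (2, 3), (4, 0)]
    # 1 [(2, 3), (3, 3), (5, 0)]
    # 3 [(4, 0), (5, 0), (7, 0)]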
Chord key location
 Look up in the finger table the furthest node that precedes the key
 → O(log N) hops
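A minimal Python rendering of that lookup, assuming the same 3-bit ring with nodes {0, 1, 3}: at each step the query jumps to the furthest finger that precedes the key, roughly halving the remaining distance. For brevity all routing state sits in module-level dicts; in real Chord each step would be a remote call to another peer.

    M, NODES = 3, [0, 1, 3]

    def in_interval(x, a, b):
        """True if x lies in the circular interval (a, b]."""
        return a < x <= b if a < b else (x > a or x <= b)

    def succ_of(key):
        for n in sorted(NODES):
            if n >= key:
                return n
        return min(NODES)

    SUCC = {n: succ_of((n + 1) % 2 ** M) for n in NODES}
    FINGERS = {n: [succ_of((n + 2 ** i) % 2 ** M) for i in range(M)] for n in NODES}

    def closest_preceding_finger(n, key):
        for f in reversed(FINGERS[n]):            # try the furthest finger first
            if f != key and in_interval(f, n, key):
                return f
        return n

    def find_successor(n, key):
        hops = 0
        while not in_interval(key, n, SUCC[n]):
            nxt = closest_preceding_finger(n, key)
            n = SUCC[n] if nxt == n else nxt      # fall back to successor if no finger helps
            hops += 1
        return SUCC[n], hops

    print(find_successor(3, 1))                   # (1, 1): key 1 is stored at node 1
    print(find_successor(0, 6))                   # (0, 1): key 6 is stored at node 0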
Peer Churn
•To handle peer churn, require each peer to know
the IP address of its two successors
•Each peer periodically pings its two successors
to see if they are still alive
•This is only a limited solution: it handles a single join or a single failure at a time
Node Joins and Stabilizations
 The most important thing is the successor
pointer.
 If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified and corrected.
 Each node runs a “stabilization” protocol periodically in the background to update its successor pointer and finger table.
Node Joins and Stabilizations
 The “stabilization” protocol contains 6 functions:
 create()
 join()
 stabilize()
 notify()
 fix_fingers()
 check_predecessor()
 When node n first starts, it calls n.join(n’), where n’ is any known Chord node.
 The join() function asks n’ to find the immediate successor of n.
Node Joins – stabilize()
 Each time node n runs stabilize(), it asks its successor for that node’s predecessor p, and decides whether p should be n’s successor instead.
 stabilize() also notifies node n’s successor of n’s existence, giving the successor the chance to change its predecessor to n.
 The successor does this only if it knows of no closer predecessor than n.
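A toy in-process sketch of join(), stabilize(), and notify() following the description above (and the pseudocode in the Chord paper); nodes are plain Python objects and every "remote call" is a direct method call, an assumption made purely to keep the example short.

    def between(x, a, b):
        """True if x lies strictly between a and b on the identifier circle."""
        return a < x < b if a < b else (x > a or x < b)

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self                 # a lone node is its own successor
            self.predecessor = None

        def join(self, known):
            """Join via any known node: ask it to find our immediate successor."""
            self.predecessor = None
            self.successor = known.find_successor(self.id)

        def find_successor(self, ident):          # naive walk, enough for this sketch
            node = self
            while not (node.successor is node or node.successor.id == ident or
                       between(ident, node.id, node.successor.id)):
                node = node.successor
            return node.successor

        def stabilize(self):
            """Ask our successor for its predecessor; adopt it if it is closer, then notify."""
            p = self.successor.predecessor
            if p is not None and between(p.id, self.id, self.successor.id):
                self.successor = p
            self.successor.notify(self)

        def notify(self, candidate):
            """candidate thinks it may be our predecessor."""
            if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    # Example: node 26 joins a ring of nodes 1 -> 12 -> 32.
    a, b, c = Node(1), Node(12), Node(32)
    a.successor, b.successor, c.successor = b, c, a
    a.predecessor, b.predecessor, c.predecessor = c, a, b
    n = Node(26)
    n.join(a)                                     # n learns its successor (32)
    for _ in range(2):                            # a couple of stabilize rounds fix all pointers
        for node in (a, b, c, n):
            node.stabilize()
    print(n.successor.id, c.predecessor.id, b.successor.id)   # 32 26 26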
Node Joins – Join and Stabilization
 n joins
   predecessor = nil
   n acquires ns as its successor via some n’
 n runs stabilize
   n notifies ns that it is the new predecessor
   ns acquires n as its predecessor
 np runs stabilize
   np asks ns for its predecessor (now n)
   np acquires n as its successor
   np notifies n
   n will acquire np as its predecessor
 all predecessor and successor pointers are now correct
 fingers still need to be fixed, but old fingers will still work
[Figure: ring segment np → n → ns, showing succ(np) and pred(ns) being updated to n at each step]
Node Failures
 Key step in failure recovery is maintaining correct successor
pointers
 To help achieve this, each node maintains a successor-list of
its r nearest successors on the ring
 If node n notices that its successor has failed, it replaces it
with the first live entry in the list
 Successor lists are stabilized as follows:
   node n reconciles its list with its successor s by copying s’s successor list, removing its last entry, and prepending s to it
   if node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor
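A hedged sketch of the successor-list maintenance just described, with r = 3; the helper names and the alive() check are illustrative stand-ins for what would really be remote probes of other peers.

    R = 3                                         # length of the successor list

    def reconcile(successor, successors_list):
        """Copy the successor's list, drop its last entry, and prepend the successor."""
        return ([successor] + successors_list[:-1])[:R]

    def replace_failed_successor(successor_list, alive):
        """On failure, fall back to the first live entry in the list."""
        for candidate in successor_list:
            if alive(candidate):
                return candidate
        raise RuntimeError("all r successors failed")

    # Node 8's view of the ring 8 -> 10 -> 12 -> 14 -> 1 -> ...
    my_list = reconcile(10, [12, 14, 1])          # [10, 12, 14]
    print(my_list)

    # Suppose node 10 fails: switch to 12 and re-reconcile with 12's list.
    new_succ = replace_failed_successor(my_list, alive=lambda n: n != 10)
    print(new_succ)                               # 12
    print(reconcile(new_succ, [14, 1, 3]))        # [12, 14, 1]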
Handling failures: redundancy
 Each node knows the IP addresses of the next r nodes.
 Each key is replicated at the next r nodes.
Evaluation results
10,000 node network
Load distribution
 Probability density function
Failure rate
Path length
Failed lookups vs churn rate
 Start with 500 nodes
Chord main problem
 No good churn-handling solution
 It only barely achieves “correctness”
 A Chord ring is defined to be correct as long as each node maintains its predecessor and successor pointers
 This allows a query to eventually arrive at the key’s location, but…
 it may take up to O(N) hops to find the key!
 Not O(log N) as the original design claimed.
Chord main problem
 No good solution for maintaining a finger table that is both scalable and consistent under churn
 Not practical for P2P systems that are highly dynamic
 Paper on achieving high consistency: Simon S. Lam and Huaiyu Liu, “Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation,” Proceedings of ACM SIGMETRICS 2004.
Chord problem 2
 Only supports exact-match search
 Cannot support range search or approximate search
Solution of BitTorrent
 Maintain the trackers (servers) as a DHT; trackers are more reliable than ordinary peers
 Users query the trackers to get the locations of the file
 The file sharing itself is not structured
DHT in a cloud
 Architecture
   Servers are hosted in a cloud
   Data are distributed among the servers
   The user is a device outside the cloud
 The user sends a query for a key (webpage, file, data, etc.) to the cloud
 The query first arrives at an arbitrary server and is routed among the servers using the DHT; it finally arrives at the server that has the data
 That server replies to the user
End of Lecture 2
Next paper: read and write a review of
Vivaldi: A Decentralized Network Coordinate System, Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris, Proceedings of ACM SIGCOMM 2004.