PEER TO PEER AND
DISTRIBUTED HASH TABLES
CS 271
Distributed Hash Tables
Challenge: To design and implement a robust
and scalable distributed system composed of
inexpensive, individually unreliable computers
in unrelated administrative domains.
(Partial thanks to Idit Keidar)
Searching for distributed data
• Goal: Make billions of objects available to
millions of concurrent users
– e.g., music files
• Need a distributed data structure to keep
track of objects on different sites.
– map object to locations
• Basic Operations:
– Insert(key)
– Lookup(key)
Searching
[Figure: a publisher stores (key = "title", value = MP3 data…) at one of the nodes N1–N6; a client elsewhere on the Internet issues Lookup("title") to find it.]
Simple Solution
• First there was Napster
– Centralized server/database for lookup
– Only file-sharing is peer-to-peer, lookup is not
• Launched in 1999, peaked at 1.5 million
simultaneous users, and shut down in July
2001.
Napster: Publish
[Figure: a peer at 123.2.21.23 announces "I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), … for each published file.]
Napster: Search
[Figure: a client asks the central server "Where is file A?"; the server answers search(A) → 123.2.0.18, and the client fetches the file directly from that peer.]
Overlay Networks
• A virtual structure imposed over the
physical network (e.g., the Internet)
– A graph, with hosts as nodes, and some edges
[Figure: a hash function maps both keys and node IDs into the same identifier space, over which the overlay network is built.]
Unstructured Approach: Gnutella
• Build a decentralized unstructured overlay
– Each node has several neighbors
– Holds several keys in its local database
• When asked to find a key X
– Check the local database to see whether X is known
– If yes, return it; if not, ask your neighbors
• Use a limiting threshold for propagation.
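A minimal sketch of this TTL-limited flooding; the Node class, its fields, and the default TTL value are assumptions for illustration, not Gnutella's actual protocol:

```python
# Sketch of unstructured (Gnutella-style) search: check the local database,
# otherwise flood the query to neighbors until a TTL threshold is reached.

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []      # other Node objects in the overlay
        self.local_keys = set()  # keys stored at this node

    def search(self, key, ttl=4, visited=None):
        """Return the name of a node holding `key`, or None if not found."""
        visited = visited if visited is not None else set()
        if self.name in visited:
            return None          # already asked along another flooding path
        visited.add(self.name)
        if key in self.local_keys:
            return self.name     # found in the local database
        if ttl == 0:
            return None          # propagation threshold reached
        for nbr in self.neighbors:
            found = nbr.search(key, ttl - 1, visited)
            if found is not None:
                return found
        return None
```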
Gnutella: Search
[Figure: the query "Where is file A?" floods from the requester to its neighbors and onward; the nodes holding file A send replies back along the query path.]
Structured vs. Unstructured
• The examples we described are unstructured
– There is no systematic rule for how edges are chosen;
each node just "knows some" other nodes
– Any node can store any data, so the searched data
might reside at any node
• Structured overlay:
– The edges are chosen according to some rule
– Data is stored at a pre-defined place
– Tables define next-hop for lookup
Hashing
• Data structure supporting the operations:
– void insert( key, item )
– item search( key )
• Implementation uses hash function for
mapping keys to array cells
• Expected search time O(1)
– provided that there are few collisions
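A minimal chained hash table sketch showing these two operations (the bucket count of 16 is an arbitrary choice for the example):

```python
# Minimal chained hash table sketch: insert(key, item) and search(key).

class HashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, item):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                # overwrite an existing key
                bucket[i] = (key, item)
                return
        bucket.append((key, item))

    def search(self, key):
        for k, item in self._bucket(key):
            if k == key:
                return item
        return None                     # key not present
```

With few collisions each bucket stays short, which is where the expected O(1) search time comes from.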
Distributed Hash Tables (DHTs)
• Nodes store table entries
• lookup( key ) returns the location of the node
currently responsible for this key
• We will mainly discuss Chord (Stoica, Morris,
Karger, Kaashoek, and Balakrishnan, SIGCOMM 2001)
• Other examples: CAN (Berkeley), Tapestry
(Berkeley), Pastry (Microsoft Cambridge), etc.
CAN [Ratnasamy, et al]
• Map nodes and keys to coordinates in a multidimensional Cartesian space
[Figure: the space is partitioned into zones, one per node; a lookup is routed from the source zone toward the zone that owns the key's coordinates.]
• Routing through the shortest Euclidean path
• For d dimensions, routing takes O(d·n^(1/d)) hops
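A toy sketch of greedy geometric forwarding in this spirit; real CAN partitions the space into zones owned by nodes, while here each node is reduced to a point with neighbor links, purely for illustration:

```python
# Toy sketch of greedy geometric routing: forward to the neighbor closest
# (in Euclidean distance) to the key's coordinates until no neighbor is
# closer than the current node, which then owns the key.
import math

def dist(p, q):
    """Euclidean distance between two d-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def greedy_route(start, key_point, neighbors_of):
    node, path = start, [start]
    while True:
        best = min(neighbors_of[node], key=lambda nbr: dist(nbr, key_point),
                   default=node)
        if dist(best, key_point) >= dist(node, key_point):
            return node, path            # no neighbor is closer: stop here
        node = best
        path.append(node)
```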
Chord Logical Structure (MIT)
• m-bit ID space (2^m IDs), usually m = 160.
• Nodes organized in a logical ring according
to their IDs.
[Figure: nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, and N56 arranged clockwise on the identifier ring.]
DHT: Consistent Hashing
[Figure: keys K5, K20, and K80 and nodes N32, N90, and N105 on the circular ID space; each key sits between its predecessor and successor node.]
A key is stored at its successor: node with next higher ID
(Thanks to CMU for the animation)
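A possible sketch of this mapping, assuming SHA-1 IDs as in Chord but a small ring (m = 8) so the numbers stay readable; the node and key names are made up for the example:

```python
# Sketch of consistent hashing: hash nodes and keys onto an m-bit ring and
# store each key at its successor (the first node ID at or after the key,
# wrapping around the ring).
import hashlib
from bisect import bisect_left

M = 8                      # Chord uses m = 160; 8 keeps the example small
RING = 2 ** M

def ring_id(name):
    """Map a node name or key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING

def successor(key_id, node_ids):
    """Node responsible for key_id: the first node clockwise from it."""
    ids = sorted(node_ids)
    return ids[bisect_left(ids, key_id) % len(ids)]

nodes = [ring_id(n) for n in ("node-A", "node-B", "node-C")]
print(successor(ring_id("some-file.mp3"), nodes))
```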
Consistent Hashing Guarantees
• For any set of N nodes and K keys:
– A node is responsible for at most (1 + ε)K/N keys
– When an (N + 1)st node joins or leaves,
responsibility for O(K/N) keys changes hands
DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is passed from node to node around the circle until it reaches N90, the successor of K80, which replies "N90 has K80".]
• Each node knows only its successor
• Routing around the circle, one node at a time.
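A sketch of this successor-only routing; `successor_of` (a map from each node ID to its successor's ID) and the small default ring size are assumptions for the example:

```python
# Sketch of Chord's basic lookup: each node knows only its successor, so the
# query walks around the circle one node at a time.

def in_interval(x, a, b, ring=2 ** 8):
    """True if x lies in the half-open ring interval (a, b]."""
    a, b, x = a % ring, b % ring, x % ring
    return (a < x <= b) if a < b else (x > a or x <= b)

def basic_lookup(start, key_id, successor_of, ring=2 ** 8):
    node, hops = start, 0
    while not in_interval(key_id, node, successor_of[node], ring):
        node = successor_of[node]       # forward to the next node on the ring
        hops += 1
    return successor_of[node], hops     # node responsible for key_id
```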
DHT: Chord “Finger Table”
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.]
• Entry i in the finger table of node n is the first node that succeeds or
equals n + 2^i
• In other words, the i-th finger points 1/2^(m−i) of the way around the ring
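A sketch of filling the finger table from a globally known node set; real Chord does not know the full set and instead learns fingers via lookups and stabilization, so this only illustrates what the table contains:

```python
# Sketch: finger[i] of node n is the first node whose ID is at or after
# (n + 2^i) mod 2^m.
from bisect import bisect_left

def finger_table(n, node_ids, m=8):
    ring, ids = 2 ** m, sorted(node_ids)

    def succ(x):
        """First node at or after x, wrapping around the ring."""
        return ids[bisect_left(ids, x) % len(ids)]

    return [succ((n + 2 ** i) % ring) for i in range(m)]
```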
DHT: Chord Join
• Assume an identifier space [0..8]
• Node n1 joins
Succ. Table of n1:
i   id+2^i   succ
0   2        1
1   3        1
2   5        1
[Figure: identifier ring 0–7 with node n1.]
DHT: Chord Join
• Node n2 joins
Succ. Table of n1:
i   id+2^i   succ
0   2        2
1   3        1
2   5        1

Succ. Table of n2:
i   id+2^i   succ
0   3        1
1   4        1
2   6        1
[Figure: identifier ring 0–7 with nodes n1 and n2.]
DHT: Chord Join
• Nodes n0, n6 join

Succ. Table of n0:
i   id+2^i   succ
0   1        1
1   2        2
2   4        6

Succ. Table of n1:
i   id+2^i   succ
0   2        2
1   3        6
2   5        6

Succ. Table of n2:
i   id+2^i   succ
0   3        6
1   4        6
2   6        6

Succ. Table of n6:
i   id+2^i   succ
0   7        0
1   0        0
2   2        2
[Figure: identifier ring 0–7 with nodes n0, n1, n2, and n6.]
DHT: Chord Join
• Nodes: n1, n2, n0, n6
• Items: f7, f1

Succ. Table of n0 (stores item f7):
i   id+2^i   succ
0   1        1
1   2        2
2   4        6

Succ. Table of n1 (stores item f1):
i   id+2^i   succ
0   2        2
1   3        6
2   5        6

Succ. Table of n2:
i   id+2^i   succ
0   3        6
1   4        6
2   6        6

Succ. Table of n6:
i   id+2^i   succ
0   7        0
1   0        0
2   2        2
[Figure: identifier ring 0–7; item f7 is stored at n0 and item f1 at n1.]
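A quick way to re-derive the successor tables shown above. This is a plain recomputation from the final node set, not Chord's actual join protocol, which fills the tables incrementally and repairs them by stabilization:

```python
# Successor tables of the example: identifier space [0..8) (m = 3),
# nodes {0, 1, 2, 6}.

def succ(x, nodes, ring=8):
    """First node at or after x, clockwise on the ring."""
    return min(nodes, key=lambda n: (n - x) % ring)

nodes = [0, 1, 2, 6]
for n in nodes:
    table = [(i, (n + 2 ** i) % 8, succ((n + 2 ** i) % 8, nodes))
             for i in range(3)]
    print(n, table)
```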
DHT: Chord Routing
• Upon receiving a query for item id, a node:
– Checks whether it stores the item locally
– If not, forwards the query to the largest node in its
successor table that does not exceed id
(Successor tables and stored items as on the previous slide.)
[Figure: query(7), issued at n1, is forwarded to n6 and then to n0, which stores item f7.]
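A sketch of this forwarding rule on the example above; the `tables` and `store` dictionaries are filled in from the slides, and the clockwise-distance arithmetic is one way to read "does not exceed id" on the ring:

```python
# Sketch of routing with successor (finger) tables on the ring [0, 8):
# forward to the largest table entry that does not overshoot the item's ID,
# measured clockwise from the current node.

def clockwise(a, b, ring=8):
    """Clockwise distance from a to b on the identifier ring."""
    return (b - a) % ring

def route(start, item_id, tables, store, ring=8):
    node, path = start, [start]
    for _ in range(ring):                  # bound the number of hops
        if item_id in store.get(node, set()):
            return node, path              # reached the node storing the item
        # Table entries that do not overshoot item_id (clockwise from node).
        candidates = [c for c in tables[node]
                      if clockwise(node, c, ring) <= clockwise(node, item_id, ring)]
        node = min(candidates, key=lambda c: clockwise(c, item_id, ring),
                   default=tables[node][0])  # else fall back to the successor
        path.append(node)
    return None, path                      # give up after `ring` hops

tables = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
store = {0: {7}, 1: {1}}                   # item f7 at n0, item f1 at n1
print(route(1, 7, tables, store))          # query(7): hops n1 -> n6 -> n0
```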
Chord Data Structures
• Finger table
• First finger is successor
• Predecessor
• What if each node knew all other nodes?
– O(1) routing
– Expensive updates
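A minimal sketch of the per-node state just listed; the field names are illustrative:

```python
# Per-node Chord state: finger table (whose first entry is the successor)
# and a pointer to the predecessor.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ChordNode:
    node_id: int
    finger: List[int] = field(default_factory=list)  # finger[0] = successor
    predecessor: Optional[int] = None
```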
Routing Time
• Node n looks up a key stored at
node p
• p is in n's i-th interval:
p ∈ ((n + 2^(i−1)) mod 2^m, (n + 2^i) mod 2^m]
• n contacts f = finger[i]
– The interval is not empty, so:
f ∈ ((n + 2^(i−1)) mod 2^m, (n + 2^i) mod 2^m]
• f is at least 2^(i−1) away from n
• p is at most 2^(i−1) away from f
• The distance is halved at each hop.
[Figure: arc of the ring from n through n + 2^(i−1), f = finger[i], p, and n + 2^i.]
Routing Time
• Assuming uniform node distribution around
the circle, the number of nodes in the search
space is halved at each step:
– Expected number of steps: log N
• Note that:
– m = 160
– For 1,000,000 nodes, log N = 20
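The slide's numbers follow from the halving argument; as a quick check, under the uniform-distribution assumption stated above:

```latex
% After k hops at most N/2^k candidate nodes remain, so the lookup ends once
\frac{N}{2^{k}} \le 1 \;\Longleftrightarrow\; k \ge \log_2 N ,
% and for N = 10^6 this gives
\log_2 10^{6} \approx 19.9 \approx 20 \text{ hops},
% even though the identifier space itself has 2^{160} points.
```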
P2P Lessons
• Decentralized architecture.
• Avoid centralization.
• Flooding can work.
• Logical overlay structures provide strong performance guarantees.
• Churn is a problem.
• Useful in many distributed contexts.