P2P Guest lecture

Peer To Peer
Distributed Systems
Pete Keleher
Why Distributed Systems?

Aggregate resources!
– memory
– disk
– CPU cycles

Proximity to physical stuff
– things with sensors
– things that print
– things that go boom
– other people

Fault tolerance!
– Don’t want one tsunami to take everything down
Why Peer To Peer Systems?

What’s peer to peer?

(Traditional) Client-Server
[diagram: one central server, many clients]

Peer To Peer
– Lots of reasonable machines
• No one machine loaded more than others
• No one machine irreplaceable!
Peer-to-Peer (P2P)

Where do the machines come from?
– “found” resources
• SETI@home
• BOINC
– existing resources
• computing “clusters” (32, 64, …)

What good is a peer to peer system?
– all those things mentioned before, including
• Storage: files, MP3s, leaked documents, porn
…
The lookup problem

[diagram: a publisher stores (key = “title”, value = MP3 data…) on one of the nodes N1–N6 scattered across the Internet; a client issues Lookup(“title”): which node holds the key?]
Centralized lookup (Napster)

[diagram: the publisher at N4 registers its content with a central DB server via SetLoc(“title”, N4); the client resolves Lookup(“title”) by asking the DB, while nodes N1–N3 and N6–N9 keep no lookup state]

Simple, but O(N) state at the server and a single point of failure.
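A minimal sketch of the centralized-directory idea in Python (illustrative class and method names, not Napster’s actual protocol):

```python
# Minimal sketch of a centralized directory: one server holds the
# whole key -> node map, so every lookup is one round trip to it.
class Directory:
    def __init__(self):
        self.locations = {}            # key -> node that stores the value

    def set_loc(self, key, node):      # publisher registers its content
        self.locations[key] = node

    def lookup(self, key):             # every client asks this one server
        return self.locations.get(key)

db = Directory()
db.set_loc("title", "N4")              # SetLoc("title", N4)
print(db.lookup("title"))              # -> N4
```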
Flooded queries (Gnutella)

[diagram: the publisher at N4 holds (key = “title”, value = MP3 data…); the client’s Lookup(“title”) is flooded from neighbor to neighbor across N1–N9 until it reaches N4]

Robust, but worst case O(N) messages per lookup.
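A sketch of the flooding idea, with a made-up neighbor topology and a TTL to bound the flood (Gnutella’s real wire protocol differs in the details):

```python
# Sketch of flooding: forward the query to all neighbors, hop by hop,
# until a hit or the TTL expires. Topology and data here are made up.
neighbors = {"N1": ["N2", "N3"], "N2": ["N1", "N4"],
             "N3": ["N1"], "N4": ["N2"]}
data = {"N4": {"title": "MP3 data..."}}

def flood_lookup(start, key, ttl=7):
    frontier, seen = [start], {start}
    while frontier and ttl >= 0:
        nxt = []
        for node in frontier:
            if key in data.get(node, {}):
                return node                  # found a node holding the key
            for peer in neighbors[node]:
                if peer not in seen:         # each node is queried only once
                    seen.add(peer)
                    nxt.append(peer)
        frontier, ttl = nxt, ttl - 1
    return None                              # worst case O(N) messages sent

print(flood_lookup("N1", "title"))           # -> N4
```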
Routed queries (Freenet, Chord, etc.)

[diagram: the client’s Lookup(“title”) is forwarded hop by hop through intermediate nodes N1–N9 toward the publisher holding (key = “title”, value = MP3 data…)]

Bad load balance.
Routing challenges

Define a useful key nearness metric.

Keep the hop count small.
– O(log N)

Keep the routing tables small.
– O(log N)

Stay robust despite rapid changes.
Distributed Hash Tables to the Rescue!

Load Balance: Distributed hash function spreads keys evenly over the nodes (consistent hashing).

Decentralization: Fully distributed (robustness).

Scalability: Lookup cost grows as the log of the number of nodes.

Availability: Automatically adjusts internal tables to reflect changes.

Flexible Naming: No constraints on key structure.
What’s a Hash?

Wikipedia: “any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer.”

Example. Assume:
– N is a large prime
– ‘a’ means the ASCII code for the letter ‘a’ (it’s 97)

H(“pete”) = H(“pet”) x N + ‘e’
          = (H(“pe”) x N + ‘t’) x N + ‘e’
          = ((H(“p”) x N + ‘e’) x N + ‘t’) x N + ‘e’
          = 451845518507

H(“pete”) mod 1000 = 507
H(“peter”) mod 1000 = 131
H(“petf”) mod 1000 = 986

It’s a deterministic random number generator!
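The slide’s hash, written out in Python. The prime below is an illustrative choice; the slide’s exact constant 451845518507 depends on which prime it used:

```python
# The slide's hash, written out. P is an assumed "large prime".
P = 4591

def H(s):
    h = 0
    for ch in s:               # H("pete") = (((0*P+'p')*P+'e')*P+'t')*P+'e'
        h = h * P + ord(ch)    # ord('a') == 97, as on the slide
    return h

for w in ("pete", "peter", "petf"):
    print(w, H(w) % 1000)      # similar inputs land in very different buckets
```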
Chord (a DHT)

m-bit identifier space for both keys and nodes.

Key identifier = SHA-1(key).

Node identifier = SHA-1(IP address).

Both are uniformly distributed.

How to map key IDs to node IDs?
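A sketch of how these identifiers might be computed, using Python’s standard hashlib; m = 7 here to match the pictures that follow, while real Chord uses all 160 bits of SHA-1:

```python
import hashlib

# Sketch: SHA-1 identifiers truncated to an m-bit circular ID space.
M = 7

def chord_id(text, m=M):
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)   # uniform over [0, 2^m)

print(chord_id("title"))          # key identifier
print(chord_id("10.0.0.7"))       # node identifier (made-up IP address)
```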
Consistent hashing [Karger 97]

[diagram: a circular 7-bit ID space; “Key 5” maps to identifier K5 and “Node 105” to identifier N105; nodes N32, N90, N105 sit on the ring along with keys K5, K20, K80]

A key is stored at its successor: the node with the next-higher ID.
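A sketch of the successor rule over the node IDs in the picture (bisect is Python’s standard binary-search module):

```python
import bisect

# "A key is stored at its successor": the first node whose ID is >= the
# key's ID, wrapping around the circle.
nodes = [32, 90, 105]                      # sorted node IDs on the 7-bit ring

def successor(key_id):
    i = bisect.bisect_left(nodes, key_id)  # first node with ID >= key_id
    return nodes[i % len(nodes)]           # past the top? wrap to N32

for k in (5, 20, 80):
    print(f"K{k} -> N{successor(k)}")      # K5 -> N32, K20 -> N32, K80 -> N90
```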
Basic lookup

[diagram, shown over several animation frames: the query “Where is key 80?” starts at N10 and follows successor pointers around the ring (N10, N32, N60, …) until it reaches N90; the answer “N90 has K80” is returned]
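A sketch of this successor-pointer walk over the ring in the picture; with only successor pointers, a lookup can take O(N) hops:

```python
# Forward the query around the ring one node at a time until the key
# falls between a node and its successor. Correct, but O(N) hops.
succ = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}

def between(x, a, b):                 # x in the circular interval (a, b]?
    return a < x <= b if a < b else x > a or x <= b

def basic_lookup(start, key_id):
    n, hops = start, 0
    while not between(key_id, n, succ[n]):
        n, hops = succ[n], hops + 1   # forward to our successor
    return succ[n], hops

print(basic_lookup(10, 80))           # -> (90, 2): "N90 has K80"
```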
“Finger table” allows log(N)-time lookups

[diagram: from N80, fingers jump ½, ¼, 1/8, 1/16, 1/32, 1/64, 1/128 of the way around the ring]

Every node knows m other nodes in the ring.
Finger i points to the successor of n + 2^(i-1).
[diagram: the same finger structure from N80, with the finger for ID 112 pointing to its successor N120]

Each node knows more about the portion of the circle close to it.
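A sketch of finger-table construction under these definitions (node IDs taken from the lookup picture that follows):

```python
# Finger i of node n points to successor(n + 2^(i-1)) on the m-bit ring,
# so the jump sizes double from one finger to the next.
M = 7
nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(x):
    return next((n for n in nodes if n >= x), nodes[0])  # wrap past the top

def finger_table(n, m=M):
    return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]

print(finger_table(80))   # N80's first fingers point nearby; the last
                          # entries jump across the ring
```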
Lookups take O(log(N)) hops

[diagram, shown over several animation frames: starting at N32, Lookup(K19) hops via fingers across the ring of nodes N5, N10, N20, N32, N60, N80, N99, N110, roughly halving the remaining distance each hop, until K19’s successor N20 is reached]
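A self-contained sketch of finger-based lookup over the same example ring; each hop forwards to the closest finger preceding the key:

```python
# Each hop jumps to the finger closest below the key, so the remaining
# ring distance roughly halves: O(log N) hops.
M = 7
nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(x):
    return next((n for n in nodes if n >= x), nodes[0])

def fingers(n):
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

def between(x, a, b, closed=False):       # circular interval (a, b) or (a, b]
    if a < b:
        return a < x <= b if closed else a < x < b
    return x > a or (x <= b if closed else x < b)

def lookup(n, key, hops=0):
    nxt_succ = successor((n + 1) % 2 ** M)
    if between(key, n, nxt_succ, closed=True):
        return nxt_succ, hops                 # n's successor owns the key
    nxt = next((f for f in reversed(fingers(n)) if between(f, n, key)),
               nxt_succ)
    return lookup(nxt, key, hops + 1)         # jump via the closest finger

print(lookup(32, 19))    # N32 resolves K19 in a few hops, as in the picture
```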
Joining: linked list insert

[diagram: ring segment N25 → N40, where N40 stores K30 and K38; a new node N36 runs step 1, Lookup(36), to find where it belongs]

Invariants to maintain:
1. Each node’s successor is correctly maintained.
2. For every key k, node successor(k) is responsible for k.
Join (2)

[diagram: step 2, N36 sets its own successor pointer to N40; K30 and K38 are still at N40]

Initialize the new node’s finger table.
Join (3)

[diagram: step 3, N25’s successor pointer is set to N36]

Update finger pointers of existing nodes.
Join (4)

[diagram: step 4, keys 26..36 are copied from N40 to N36, so K30 moves to N36 and K38 stays at N40]

Transferring keys.
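A minimal sketch of the four join steps as a linked-list insert (illustrative Node class; real Chord finds the successor via a lookup and also fills in finger tables):

```python
# The four join steps from the slides, on a tiny two-node ring.
class Node:
    def __init__(self, ident):
        self.id, self.successor, self.keys = ident, None, {}

n25, n40 = Node(25), Node(40)
n25.successor, n40.successor = n40, n25      # two-node ring N25 -> N40
n40.keys = {30: "K30", 38: "K38"}            # N40 is responsible for both

def join(new, pred):
    succ = pred.successor                    # 1. Lookup(36) finds N40
    new.successor = succ                     # 2. N36 sets its successor
    pred.successor = new                     # 3. set N25's successor
    new.keys = {k: v for k, v in succ.keys.items()
                if pred.id < k <= new.id}    # 4. copy keys 26..36 to N36
    for k in new.keys:                       # (ring wraparound ignored here)
        del succ.keys[k]

join(Node(36), n25)
print(n25.successor.id, n25.successor.keys, n40.keys)
# -> 36 {30: 'K30'} {38: 'K38'}
```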
Stabilization Protocol

To handle concurrent node joins/fails/leaves.

Keep successor pointers up to date, then verify and correct finger table entries.

Incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure.

Nodes periodically run the stabilization protocol.

Won’t correct a Chord system that has split into multiple disjoint cycles, or a single cycle that loops multiple times around the identifier space.
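A simplified sketch of the periodic stabilize/notify pair (function names follow the Chord paper; the ring here is tiny and failures are ignored):

```python
# Each node periodically checks whether a new node has slipped in
# between it and its successor, and tells its successor about itself.
class Node:
    def __init__(self, ident):
        self.id, self.successor, self.predecessor = ident, self, None

def between(x, a, b):                     # circular open interval (a, b)
    return a < x < b if a < b else x > a or x < b

def stabilize(n):
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x                   # adopt the node that joined between
    notify(n.successor, n)

def notify(n, cand):                      # "cand thinks it precedes n"
    if n.predecessor is None or between(cand.id, n.predecessor.id, n.id):
        n.predecessor = cand

n25, n40 = Node(25), Node(40)
n25.successor, n40.successor, n40.predecessor = n40, n25, n25
n36 = Node(36)
n36.successor = n40                       # a joining node sets only this
for node in (n36, n25, n36):              # a few stabilization rounds
    stabilize(node)
print(n25.successor.id, n36.successor.id) # -> 36 40: ring repaired
```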
Take Home Points

Hashing is used to uniformly distribute data and nodes across a range.

Random distribution balances load.

Awesome systems paper:
– identify commonality across algorithms
– restrict work to implementing that one simple abstraction
– use as building block