File Distribution: Server-Client vs P2P

Question: How much time does it take to distribute a file of size F from one server to N peers?

 us: server upload bandwidth
 ui: peer i upload bandwidth
 di: peer i download bandwidth

[Figure: a server with upload bandwidth us sends the file through a network with abundant bandwidth to N peers with upload bandwidths u1..uN and download bandwidths d1..dN]
File distribution time: server-client

 server sequentially sends N copies: NF/us time
 client i takes F/di time to download

Time to distribute F to N clients using the client/server approach:

  dcs = max { NF/us , F/min_i di }

dcs increases linearly in N (for large N).
File distribution time: P2P

 server must send one copy: F/us time
 client i takes F/di time to download
 NF bits must be downloaded (aggregate)
 fastest possible aggregate upload rate: us + Σi ui

  dP2P = max { F/us , F/min_i di , NF/(us + Σi ui) }
Server-client vs. P2P: example

Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us

[Figure: minimum distribution time (hours) vs. number of peers N (0 to 35); the client-server time grows linearly in N, reaching 3.5 hours at N = 35, while the P2P time stays below F/u = 1 hour]
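The two distribution-time formulas can be compared numerically. A minimal sketch (function and variable names are illustrative, not from the slides), assuming all N clients share the same upload rate u and the same minimum download rate d_min:

```python
# Minimum distribution times from the slides (all peers assumed to have
# the same upload rate u; d_min is the slowest client download rate).

def d_cs(F, us, d_min, N):
    # client/server: server sends N copies; slowest client must finish
    return max(N * F / us, F / d_min)

def d_p2p(F, us, d_min, N, u):
    # P2P: server copy time, slowest download, and aggregate capacity
    return max(F / us, F / d_min, N * F / (us + N * u))

# slide example: F/u = 1 hour, us = 10u, d_min >= us
u, F = 1.0, 1.0
us, d_min = 10 * u, 10 * u

for N in (5, 15, 35):
    print(N, d_cs(F, us, d_min, N), round(d_p2p(F, us, d_min, N, u), 2))
```

At N = 35 this gives 3.5 hours for client/server versus roughly 0.78 hours for P2P, matching the shape of the plot above.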
BitTorrent-like systems

 File split into many smaller pieces
 Pieces are downloaded from both seeds and downloaders
 Distribution paths are dynamically determined
 Based on data availability

[Figure: a torrent with x downloaders and y seeds; arriving peers join as downloaders, finish the download, remain as seeds for some residence time, then depart]
File distribution: BitTorrent

 P2P file distribution
 torrent: group of peers exchanging chunks of a file
 tracker: tracks peers participating in torrent

[Figure: a joining peer contacts the tracker to obtain a list of peers, then starts trading chunks with peers in the torrent]
Background
Multi-tracked torrents

 Torrent file
 "announce-list" URLs
 Trackers
 Register torrent file
 Maintain state information
 Peers
 Obtain torrent file
 Choose one tracker at random
 Announce
 Report status
 Peer exchange (PEX)
 Issue
 Multiple smaller swarms (SwarmTorrent)
Download using BitTorrent
Background: Incentive mechanism

 Establish connections to a large set of peers
 At each time, only upload to a small (changing) set of peers
 Rate-based tit-for-tat policy
 Downloaders give upload preference to the downloaders that provide the highest download rates
 Highest download rates: pick top four
 Optimistic unchoke: pick one at random
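The unchoking policy above can be sketched as follows (a simplified illustration with assumed names, not real client code): upload to the four peers currently providing the best download rates, plus one randomly chosen optimistic unchoke.

```python
import random

# Rate-based tit-for-tat sketch: unchoke the top four peers by the
# download rate they give us, plus one random "optimistic unchoke"
# so newcomers get a chance to prove their upload rate.

def choose_unchoked(download_rates, num_top=4, rng=random):
    """download_rates: dict peer_id -> download rate observed from peer."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:num_top]          # pick top four
    rest = ranked[num_top:]
    if rest:                             # pick one at random
        unchoked.append(rng.choice(rest))
    return unchoked
```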
Download using BitTorrent
Background: Piece selection

[Figure: availability of pieces 1..K across peers 1..N in the neighbor set; each piece's count in the neighbor set, e.g. (1), (2), (3), measures how rare it is]

 Rarest-first piece selection policy
 Achieves high piece diversity
 Request pieces that
 the uploader has;
 the downloader is interested in (wants); and
 are the rarest among this set of pieces
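The rarest-first rule can be sketched as follows (illustrative names, not the real client code): among pieces the uploader has and the downloader still wants, request the one held by the fewest peers in the neighbor set.

```python
from collections import Counter

# Rarest-first piece selection sketch.

def rarest_first(uploader_has, downloader_wants, neighbor_sets):
    counts = Counter()
    for pieces in neighbor_sets:       # availability across the neighbor set
        counts.update(pieces)
    candidates = uploader_has & downloader_wants
    if not candidates:
        return None                    # nothing we want that they have
    return min(candidates, key=lambda p: counts[p])
```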
Peer-assisted VoD streaming
Some research questions ...

 Can BitTorrent-like protocols provide scalable on-demand streaming?
 How sensitive is the performance to the application configuration parameters?
 Piece selection policy
 Peer selection policy
 Upload/download bandwidth
 What is the user-perceived performance?
 Start-up delay
 Probability of disrupted playback

ACM SIGMETRICS 2008; IFIP Networking 2007/2009; IEEE/ACM ToN 2012
Live Streaming using BT-like systems

[Figure: pieces are uploaded/downloaded over the Internet and fed into the media player's queue/buffer; exchanges are limited to a buffer window]

 Live streaming (e.g., CoolStreaming)
 All peers at roughly the same play/download position
• High-bandwidth peers can easily contribute more ...
 (relatively) small buffer window
• Within which pieces are exchanged
More p2p slides …
Client-server architecture

server:
 always-on host
 permanent IP address
 server farms for scaling

clients:
 communicate with server
 may be intermittently connected
 may have dynamic IP addresses
 do not communicate directly with each other
Pure P2P architecture

 no always-on server
 arbitrary end systems directly communicate
 peers are intermittently connected and change IP addresses
 Three topics:
 File sharing
 File distribution
 Searching for information
 Case studies: BitTorrent and Skype
Hybrid of client-server and P2P

Skype
 voice-over-IP P2P application
 centralized server: finding address of remote party
 client-client connection: direct (not through server)

Instant messaging
 chatting between two users is P2P
 centralized service: client presence detection/location
• user registers its IP address with central server when it comes online
• user contacts central server to find IP addresses of buddies
P2P file sharing

Example
 Alice runs P2P client application on her notebook computer
 intermittently connects to Internet; gets new IP address for each connection
 asks for "Hey Jude"
 application displays other peers that have a copy of Hey Jude
 Alice chooses one of the peers, Bob
 file is copied from Bob's PC to Alice's notebook: HTTP
 while Alice downloads, other users upload from Alice
 Alice's peer is both a Web client and a transient Web server

All peers are servers = highly scalable!
P2P: centralized directory

original "Napster" design
1) when peer connects, it informs the central server of its IP address and content
2) Alice queries the directory server for "Hey Jude"
3) Alice requests the file from Bob

[Figure: Bob and other peers register with the centralized directory server (step 1); Alice's query (step 2) identifies Bob, and she requests the file directly from him (step 3)]
P2P: problems with centralized directory

 single point of failure
 performance bottleneck
 copyright infringement: "target" of lawsuit is obvious

File transfer is decentralized, but locating content is highly centralized.
Query flooding: Gnutella

 fully distributed: no central server
 public-domain protocol
 many Gnutella clients implementing the protocol

overlay network: graph
 edge between peer X and Y if there's a TCP connection
 all active peers and edges form the overlay net
 edge: virtual (not physical) link
 given peer typically connected to < 10 overlay neighbors
Gnutella: protocol

 Query message sent over existing TCP connections
 peers forward Query message
 QueryHit sent over reverse path
 file transfer: HTTP

Scalability: limited-scope flooding

[Figure: a Query floods outward across the overlay edges; each matching peer's QueryHit travels back along the reverse path]
Gnutella: peer joining

1. joining peer Alice must find another peer in the Gnutella network: use list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until connection setup with Bob
3. flooding: Alice sends Ping message to Bob; Bob forwards Ping message to his overlay neighbors (who then forward to their neighbors...)
 peers receiving Ping message respond to Alice with Pong message
4. Alice receives many Pong messages, and can then set up additional TCP connections
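Steps 3 and 4 can be sketched as a TTL-limited flood (the structure and names below are assumptions for illustration): a Ping propagates to overlay neighbors until its TTL runs out, and every peer that receives it answers the originator with a Pong.

```python
# Limited-scope flooding sketch for Gnutella's Ping/Pong discovery.

def flood_ping(origin, first_hop, neighbors, ttl):
    """Return the set of peers whose Pong reaches the origin."""
    ponged = set()
    seen = {origin, first_hop}
    frontier = [(first_hop, ttl)]
    while frontier:
        peer, t = frontier.pop()
        ponged.add(peer)               # this peer sends a Pong to origin
        if t > 0:                      # forward to overlay neighbors
            for nb in neighbors[peer]:
                if nb not in seen:
                    seen.add(nb)
                    frontier.append((nb, t - 1))
    return ponged

# a small chain overlay: Alice - Bob - Carol - Dave
overlay = {"Alice": ["Bob"], "Bob": ["Alice", "Carol"],
           "Carol": ["Bob", "Dave"], "Dave": ["Carol"]}
```

Raising the TTL widens the discovered set, which is exactly the scalability trade-off noted on the previous slide.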
Hierarchical Overlay

 between the centralized-index and query-flooding approaches
 each peer is either a group leader or assigned to a group leader
 TCP connection between peer and its group leader
 TCP connections between some pairs of group leaders
 group leader tracks content in its children

[Figure: ordinary peers attach to group-leader peers; neighboring relationships in the overlay network connect the group leaders]
Distributed Hash Table (DHT)

 DHT = distributed P2P database
 Database has (key, value) pairs, e.g.:
 key: SS number; value: human name
 key: content type; value: IP address
 Peers query DB with key
 DB returns values that match the key
 Peers can also insert (key, value) pairs
DHT Identifiers

 Assign integer identifier to each peer in range [0, 2^n - 1]
 Each identifier can be represented by n bits
 Require each key to be an integer in the same range
 To get integer keys, hash the original key
 e.g., key = h("Led Zeppelin IV")
 This is why it is called a distributed "hash" table
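The hashing step can be sketched as follows; the slide does not specify h, so SHA-1 is used here purely as an illustrative choice.

```python
import hashlib

# Map an arbitrary key name into the identifier space [0, 2^n - 1].

def dht_key(name, n):
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** n)

# dht_key("Led Zeppelin IV", 4) is a deterministic integer in [0, 15]
```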
How to assign keys to peers?

 Central issue: assigning (key, value) pairs to peers
 Rule: assign key to the peer that has the closest ID
 Convention in lecture: closest is the immediate successor of the key
 Ex: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
 key = 13, then successor peer = 14
 key = 15, then successor peer = 1
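The closest-ID rule can be written as picking the peer with the smallest clockwise distance from the key on the identifier circle (a minimal sketch with assumed names):

```python
# A key is assigned to its immediate successor on the circle [0, 2^n - 1].

def successor_peer(key, peers, n):
    space = 2 ** n
    # smallest clockwise distance from key to a peer wins
    return min(peers, key=lambda p: (p - key) % space)

peers = [1, 3, 4, 5, 8, 10, 12, 14]   # example from the slide, n = 4
# successor_peer(13, peers, 4) -> 14
# successor_peer(15, peers, 4) -> 1  (wraps around the circle)
```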
Circular DHT (1)

[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged on a circle]

 Each peer only aware of immediate successor and predecessor
 "Overlay network"
Circular DHT (2)

O(N) messages on average to resolve a query, when there are N peers.

[Figure: peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111 on a circle; the query "Who's responsible for key 1110?" is forwarded from successor to successor around the circle until peer 1111 answers "I am"; closest is defined as the closest successor]
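The successor-only forwarding can be sketched as follows (the successor map and names are illustrative). Using the circle of peers 1, 3, 4, 5, 8, 10, 12, 15 from the earlier slide, resolving key 14 starting at peer 1 costs 6 forwarding messages:

```python
# Query resolution when each peer knows only its immediate successor:
# forward clockwise until the key falls between a peer and its successor.

def in_cw_interval(key, a, b, space):
    # True if key lies in the clockwise interval (a, b]
    return key != a and (key - a) % space <= (b - a) % space

def resolve(key, start, succ, n):
    space, peer, msgs = 2 ** n, start, 0
    while not in_cw_interval(key, peer, succ[peer], space):
        peer = succ[peer]          # forward the query one hop
        msgs += 1
    return succ[peer], msgs        # (responsible peer, messages used)

succ = {1: 3, 3: 4, 4: 5, 5: 8, 8: 10, 10: 12, 12: 15, 15: 1}
# resolve(14, 1, succ, 4) -> (15, 6)
```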
Circular DHT with Shortcuts

[Figure: the same circle of peers 1, 3, 4, 5, 8, 10, 12, 15, now with shortcut links; the query "Who's responsible for key 1110?" is resolved in 2 messages]

 Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts
 Reduced from 6 to 2 messages
 Possible to design shortcuts so that each peer has O(log N) neighbors and queries take O(log N) messages
Peer Churn

[Figure: circle of peers 1, 3, 4, 5, 8, 10, 12, 15]

To handle peer churn, require each peer to know the IP address of its two successors.
 Each peer periodically pings its two successors to see if they are still alive.

 Peer 5 abruptly leaves
 Peer 4 detects; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8's immediate successor its second successor
 What if peer 13 wants to join?
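The recovery step above can be sketched as follows (the Peer class and names are hypothetical, for illustration only): when a peer's immediate successor leaves, it promotes its second successor and asks that peer for a replacement second successor.

```python
# Churn-recovery sketch: each peer tracks two successors.

class Peer:
    def __init__(self, ident, successors):
        self.id = ident
        self.successors = successors   # [immediate, second]

def handle_successor_failure(peer, network):
    peer.successors[0] = peer.successors[1]      # promote second successor
    new_succ = network[peer.successors[0]]
    # ask the new immediate successor who its own successor is
    peer.successors[1] = new_succ.successors[0]

# the slide's example: peer 5 leaves, peer 4 repairs its successor list
network = {4: Peer(4, [5, 8]), 8: Peer(8, [10, 12])}
handle_successor_failure(network[4], network)
# network[4].successors is now [8, 10]
```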
P2P Case study: Skype

 inherently P2P: pairs of users communicate
 proprietary application-layer protocol (inferred via reverse engineering)
 hierarchical overlay with supernodes (SNs)
 index maps usernames to IP addresses; distributed over SNs

[Figure: Skype clients (SC) attach to supernodes (SN); a Skype login server sits alongside the overlay]
Peers as relays

 Problem: both Alice and Bob are behind "NATs"
 NAT prevents an outside peer from initiating a call to an inside peer
 Solution:
 using Alice's and Bob's SNs, a relay is chosen
 each peer initiates a session with the relay
 peers can now communicate through NATs via the relay