File Distribution: Server-Client vs P2P

Question: how much time does it take to distribute a file of size F from one server to N peers?

- u_s: server upload bandwidth
- u_i: peer i upload bandwidth
- d_i: peer i download bandwidth
- The network core is assumed to have abundant bandwidth

File distribution time: server-client

- The server sequentially sends N copies: NF/u_s time
- Client i takes F/d_i time to download

Time to distribute F to N clients using the client/server approach:

  d_cs = max { NF/u_s , F/min_i(d_i) }

This increases linearly in N (for large N).

File distribution time: P2P

- The server must send one copy: F/u_s time
- Client i takes F/d_i time to download
- NF bits must be downloaded in aggregate; the fastest possible aggregate upload rate is u_s + sum_i(u_i)

  d_P2P = max { F/u_s , F/min_i(d_i) , NF/(u_s + sum_i(u_i)) }

Server-client vs. P2P: example

Client upload rate u, F/u = 1 hour, u_s = 10u, d_min >= u_s.

[Plot: minimum distribution time (hours) vs. N for N = 0..35. The client-server curve grows linearly in N, while the P2P curve stays below one hour, approaching F/u as N grows.]

BitTorrent-like systems

- The file is split into many smaller pieces
- Pieces are downloaded from both seeds and downloaders
- Distribution paths are determined dynamically, based on data availability

[Diagram: a torrent with x downloaders and y seeds; peers arrive as downloaders and depart after their download time plus, for seeds, a seed residence time.]

File distribution: BitTorrent

P2P file distribution terminology:
- torrent: the group of peers exchanging chunks of a file
- tracker: tracks the peers participating in the torrent
- a joining peer obtains a list of peers from the tracker and starts trading chunks

Background: multi-tracked torrents

- Torrent file: "announce-list" URLs
- Trackers: register the torrent file; maintain state information
- Peers: obtain the torrent file, choose one tracker at random, announce themselves, and report status
- Peer exchange (PEX); issue: the torrent may split into multiple smaller swarms (SwarmTorrent)

Download using BitTorrent — background: incentive mechanism

- Establish connections to a large set of peers
- At each time, upload only to a small (changing) set of peers
- Rate-based tit-for-tat policy: downloaders give upload preference to the downloaders that provide them the highest download rates
  - Pick the top four by download rate
  - Optimistic unchoke: additionally pick one peer at random

Download using BitTorrent — background: piece selection

[Diagram: the piece bitfields of peers 1..N in the neighbor set, with a per-piece count of how many neighbors hold each of the K pieces.]

- Rarest-first piece selection policy: request a piece that the uploader has, that the downloader is interested in (wants), and that is rarest among this set of pieces
- Achieves high piece diversity

Peer-assisted VoD streaming

Some research questions:
- Can BitTorrent-like protocols provide scalable on-demand streaming?
- How sensitive is the performance to the application configuration parameters? (piece selection policy, peer selection policy, upload/download bandwidth)
- What is the user-perceived performance? (start-up delay, probability of disrupted playback)

ACM SIGMETRICS 2008; IFIP Networking 2007/2009; IEEE/ACM ToN 2012

Live streaming using BT-like systems

- Live streaming (e.g., CoolStreaming): all peers are at roughly the same play/download position
- High-bandwidth peers can easily contribute (relatively) more
- Pieces are exchanged within a small buffer window of the media player queue/buffer

More P2P slides ...

Client-server architecture

- Server: always-on host; permanent IP address; server farms for scaling
- Clients: communicate with the server; may be intermittently connected; may have dynamic IP addresses; do not communicate directly with each other

Pure P2P architecture

- No always-on server
- Arbitrary end systems communicate directly
- Peers are intermittently connected and change IP addresses

Three topics: file sharing, file distribution, and searching for information. Case studies: BitTorrent and Skype.
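The two distribution-time bounds from the beginning of this section can be checked numerically. A minimal sketch (function and variable names are my own), reproducing the example with u_s = 10u and F/u = 1 hour:

```python
def d_client_server(N, F, u_s, d_min):
    """Client-server lower bound: the max of the server's serialized
    upload time (N copies) and the slowest client's download time."""
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, peer_uploads):
    """P2P lower bound: the server pushes one copy, the slowest client
    must receive F bits, and NF bits must flow through the aggregate
    upload capacity u_s + sum(u_i)."""
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))

# Example from the slides: u = 1, F = u * 1 hour, u_s = 10u, d_min >= u_s.
u, F, u_s, d_min = 1.0, 1.0, 10.0, 10.0
for N in (5, 10, 30):
    print(N, d_client_server(N, F, u_s, d_min), d_p2p(N, F, u_s, d_min, [u] * N))
# Client-server time grows linearly in N; the P2P time stays below F/u = 1 hour.
```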
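The rate-based tit-for-tat unchoking described earlier (top four by download rate, plus one optimistic unchoke at random) can be sketched as follows; this is a simplification assuming each peer already tracks the download rate received from every neighbor:

```python
import random

def choose_unchoked(download_rates, top_k=4, rng=random):
    """Pick the top_k neighbors by the download rate they provide
    (tit-for-tat), plus one optimistic unchoke chosen at random
    from the remaining neighbors."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:top_k]
    rest = ranked[top_k:]
    if rest:
        unchoked.append(rng.choice(rest))  # optimistic unchoke
    return unchoked

rates = {"p1": 50, "p2": 10, "p3": 80, "p4": 30, "p5": 60, "p6": 5}
print(choose_unchoked(rates))  # top four by rate, then one random pick
```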
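The rarest-first policy described earlier can be sketched directly from its three conditions (uploader has the piece, downloader wants it, it is rarest in the neighbor set); the tie-breaking by random choice is my assumption:

```python
import random
from collections import Counter

def rarest_first(my_pieces, uploader_pieces, neighbor_bitfields, rng=random):
    """Among pieces the uploader has and we still want, pick one that is
    rarest across the neighbor set (ties broken randomly)."""
    counts = Counter()
    for bitfield in neighbor_bitfields:
        counts.update(bitfield)
    candidates = uploader_pieces - my_pieces
    if not candidates:
        return None
    min_count = min(counts[p] for p in candidates)
    return rng.choice([p for p in candidates if counts[p] == min_count])

neighbors = [{1, 2, 3}, {1, 2}, {1}]         # piece 3 is held by only one neighbor
print(rarest_first({1}, {2, 3}, neighbors))  # -> 3
```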
Hybrid of client-server and P2P

Skype, a voice-over-IP P2P application:
- centralized server: finds the address of the remote party
- client-client connection: direct (not through the server)

Instant messaging:
- chatting between two users is P2P
- centralized service: client presence detection/location
  - a user registers its IP address with the central server when it comes online
  - a user contacts the central server to find the IP addresses of buddies

P2P file sharing

Example:
- Alice runs a P2P client application on her notebook computer
- She intermittently connects to the Internet, getting a new IP address for each connection
- She asks for "Hey Jude"; the application displays other peers that have a copy of it
- Alice chooses one of the peers, Bob; the file is copied from Bob's PC to Alice's notebook via HTTP
- While Alice downloads, other users upload from Alice
- Alice's peer is both a Web client and a transient Web server

All peers are servers = highly scalable!

P2P: centralized directory

Original "Napster" design:
1) when a peer connects, it informs the central directory server of its IP address and content
2) Alice queries the directory server for "Hey Jude"
3) Alice requests the file directly from Bob

P2P: problems with a centralized directory

- Single point of failure
- Performance bottleneck
- Copyright infringement: the "target" of a lawsuit is obvious
- File transfer is decentralized, but locating content is highly centralized

Query flooding: Gnutella

- Fully distributed: no central server
- Public-domain protocol; many Gnutella clients implement it
- Overlay network (graph): an edge between peers X and Y if there is a TCP connection between them; all active peers and edges form the overlay net; an edge is a virtual (not physical) link
- A given peer is typically connected to fewer than 10 overlay neighbors

Gnutella: protocol

- A Query message is sent over existing TCP connections
- Peers forward the Query message
- A QueryHit is sent back over the reverse path
- File transfer: HTTP
- Scalability: limited-scope flooding
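The limited-scope flooding of Gnutella's Query/QueryHit exchange can be sketched over an overlay graph. The TTL-based scoping below is my assumption for how the scope is limited (the slides say only "limited scope flooding"):

```python
def flood_query(overlay, start, have_file, ttl=3):
    """Forward a Query from `start` over the overlay for `ttl` hops.
    Returns the set of peers that generate a QueryHit, i.e., peers
    reached within ttl hops that have the file."""
    frontier, seen = {start}, {start}
    hits = set()
    for _ in range(ttl):
        nxt = set()
        for peer in frontier:
            for neighbor in overlay.get(peer, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    nxt.add(neighbor)
                    if neighbor in have_file:
                        hits.add(neighbor)  # QueryHit returns on the reverse path
        frontier = nxt
    return hits

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
print(flood_query(overlay, "A", have_file={"D", "E"}, ttl=2))  # -> {'D'}
```

With ttl=2 the query reaches D but not E, which illustrates the scalability trade-off: a small TTL limits traffic but may miss content.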
Gnutella: peer joining

1. The joining peer, Alice, must find another peer already in the Gnutella network, e.g., from a list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until a connection is set up with some peer, Bob
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping to his overlay neighbors (who then forward it to their neighbors, and so on); peers receiving the Ping respond to Alice with a Pong message
4. Alice receives many Pong messages and can then set up additional TCP connections

Hierarchical overlay

- In between the centralized-index and query-flooding approaches
- Each peer is either a group leader or assigned to a group leader
  - TCP connection between an ordinary peer and its group leader
  - TCP connections between some pairs of group leaders (neighboring relationships in the overlay network)
- A group leader tracks the content of its children

Distributed Hash Table (DHT)

- DHT = distributed P2P database
- The database holds (key, value) pairs, e.g.:
  - key: social security number; value: human name
  - key: content type; value: IP address
- Peers query the database with a key; the database returns the values that match the key
- Peers can also insert (key, value) pairs

DHT identifiers

- Assign an integer identifier to each peer in the range [0, 2^n - 1]; each identifier can be represented by n bits
- Require each key to be an integer in the same range
- To get integer keys, hash the original key, e.g., key = h("Led Zeppelin IV") — this is why it is called a distributed "hash" table

How to assign keys to peers?

- Central issue: assigning (key, value) pairs to peers
- Rule: assign each key to the peer with the closest ID
- Convention in this lecture: "closest" is the immediate successor of the key
- Example: n = 4; peers: 1, 3, 4, 5, 8, 10, 12, 14
  - key = 13: successor peer = 14
  - key = 15: successor peer = 1 (wrapping around)

Circular DHT (1)

- Peers 1, 3, 4, 5, 8, 10, 12, 15 are arranged in a ring
- Each peer is aware only of its immediate successor and predecessor
- This forms an "overlay network"
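The successor rule above can be sketched directly; `hashlib` is used to show hashing an original key into the identifier space (the choice of SHA-1 here is my assumption, not specified in the slides):

```python
import bisect
import hashlib

def successor_peer(peers, key, n_bits=4):
    """Assign `key` to its immediate successor on the ring [0, 2^n - 1]."""
    key %= 2 ** n_bits
    peers = sorted(peers)
    i = bisect.bisect_left(peers, key)
    return peers[i % len(peers)]  # wrap around past the largest peer ID

def int_key(name, n_bits=4):
    """Hash an arbitrary original key into the identifier space."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    return int(digest, 16) % (2 ** n_bits)

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor_peer(peers, 13))  # -> 14
print(successor_peer(peers, 15))  # -> 1  (wraps around)
print(successor_peer(peers, int_key("Led Zeppelin IV")))
```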
Circular DHT (2)

- O(N) messages on average to resolve a query when there are N peers
- Example: peer 0001 asks "who is responsible for key 1110?"; the query is forwarded around the ring until it reaches the responsible peer
- "Closest" is defined as the closest successor

Circular DHT with shortcuts

- Same example: peer 1 asks "who is responsible for key 1110?"
- Each peer keeps track of the IP addresses of its predecessor, its successor, and some shortcut peers
- In the example, the number of messages is reduced from 6 to 2
- It is possible to design the shortcuts such that each peer has O(log N) neighbors and queries take O(log N) messages

Peer churn

- To handle peer churn, require each peer to know the IP addresses of its two successors
- Each peer periodically pings its two successors to see if they are still alive
- Example: peer 5 abruptly leaves; peer 4 detects this, makes 8 its immediate successor, asks 8 for its immediate successor, and makes that peer its second successor
- What if peer 13 wants to join?

P2P case study: Skype

- Inherently P2P: pairs of users communicate
- Proprietary application-layer protocol (inferred via reverse engineering)
- Hierarchical overlay with supernodes (SNs)
- An index maps usernames to IP addresses; it is distributed over the SNs
- Skype clients (SC) log in through a Skype login server

Peers as relays

- Problem: both Alice and Bob are behind "NATs"; a NAT prevents an outside peer from initiating a connection to an inside peer
- Solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay; the peers can then communicate through the NATs via the relay
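The O(N)-versus-O(log N) lookup contrast can be sketched with Chord-style shortcuts, where each peer keeps a shortcut to the successor of (its ID + 2^i). This particular shortcut layout is an assumption on my part; the slides leave the shortcut design open:

```python
import bisect

class Ring:
    """Circular DHT over the identifier space [0, 2^bits - 1]."""

    def __init__(self, peers, bits=4):
        self.bits, self.size = bits, 2 ** bits
        self.peers = sorted(peers)

    def successor(self, k):
        i = bisect.bisect_left(self.peers, k % self.size)
        return self.peers[i % len(self.peers)]

    def _between(self, x, a, b):
        """True if x lies in the open ring interval (a, b)."""
        return a < x < b if a < b else x > a or x < b

    def owner(self, key):
        return self.successor(key)  # closest successor is responsible

    def hops(self, start, key, shortcuts=False):
        """Count forwarding messages from `start` to the owner of `key`."""
        cur, count = start, 0
        while cur != self.owner(key):
            nxt = self.successor(cur + 1)  # immediate successor
            if shortcuts:
                for i in range(self.bits):
                    f = self.successor(cur + 2 ** i)
                    # take the farthest shortcut that does not overshoot the key
                    if self._between(f, cur, key) and \
                       (f - cur) % self.size > (nxt - cur) % self.size:
                        nxt = f
            cur, count = nxt, count + 1
        return count

ring = Ring([1, 3, 4, 5, 8, 10, 12, 15])
print(ring.owner(14))                    # -> 15
print(ring.hops(1, 14))                  # successor-only forwarding: O(N) steps
print(ring.hops(1, 14, shortcuts=True))  # with shortcuts: far fewer steps
```

On this 8-peer ring the successor-only lookup takes 7 messages while the shortcut lookup takes 3, illustrating the O(N) to O(log N) improvement.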