TDTS04/09: Peer-to-peer

Pure P2P architecture
❒ no always-on server
❒ arbitrary end systems directly communicate
❒ peers are intermittently connected and change IP addresses
❒ Three topics:
   ❍ File sharing
   ❍ File distribution
   ❍ Searching for information
   ❍ Case Studies: BitTorrent and Skype
[Figure: peer-peer]

P2P: centralized directory
original “Napster” design
1) when peer connects, it informs central server:
   ❍ IP address
   ❍ content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
[Figure: centralized directory server, peers, Alice, and Bob, with arrows numbered 1 to 3 for the steps above]

P2P: problems with centralized directory
❒ single point of failure
❒ performance bottleneck
❒ copyright infringement: “target” of lawsuit is obvious
file transfer is decentralized, but locating content is highly centralized

Query flooding: Gnutella
❒ fully distributed
   ❍ no central server
❒ public domain protocol
❒ many Gnutella clients implementing protocol
overlay network: graph
❒ edge between peer X and Y if there’s a TCP connection
❒ all active peers and edges form overlay net
❒ edge: virtual (not physical) link
❒ given peer typically connected with < 10 overlay neighbors

Gnutella: protocol
❒ Query message sent over existing TCP connections
❒ peers forward Query message
❒ QueryHit sent over reverse path
File transfer: HTTP
Scalability: limited scope flooding
[Figure: Query messages flooding the overlay, QueryHit messages returning along the reverse path]

Hierarchical Overlay
❒ between centralized index, query flooding approaches
❒ each peer is either a group leader or assigned to a group leader.
   ❍ TCP connection between peer and its group leader.
   ❍ TCP connections between some pairs of group leaders.
❒ group leader tracks content in its children
[Figure legend: ordinary peer; group-leader peer; neighboring relationships in overlay network]

Distributed Hash Table (DHT)
❒ DHT = distributed P2P database
❒ Database has (key, value) pairs;
   ❍ key: ss number; value: human name
   ❍ key: content type; value: IP address
❒ Peers query DB with key
   ❍ DB returns values that match the key
❒ Peers can also insert (key, value) pairs

DHT Identifiers
❒ Assign integer identifier to each peer in range $[0, 2^n - 1]$.
   ❍ Each identifier can be represented by n bits.
❒ Require each key to be an integer in same range.
❒ To get integer keys, hash original key.
   ❍ e.g., key = h(“Led Zeppelin IV”)
   ❍ This is why they call it a distributed “hash” table

How to assign keys to peers?
❒ Central issue:
   ❍ Assigning (key, value) pairs to peers.
❒ Rule: assign key to the peer that has the closest ID.
❒ Convention in lecture: closest is the immediate successor of the key.
❒ Ex: n=4; peers: 1,3,4,5,8,10,12,14;
   ❍ key = 13, then successor peer = 14
   ❍ key = 15, then successor peer = 1

Circular DHT (1)
❒ Each peer only aware of immediate successor and predecessor.
❒ “Overlay network”
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]

Circle DHT (2)
❒ O(N) messages on avg to resolve query, when there are N peers
❒ Define closest as closest successor
[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111; the query “Who’s responsible for key 1110?” is forwarded around the ring until a peer answers “I am”]

Circular DHT with Shortcuts
❒ Each peer keeps track of IP addresses of predecessor, successor, short cuts.
❒ Reduced from 6 to 2 messages.
❒ Possible to design shortcuts so O(log N) neighbors, O(log N) messages in query
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15 with shortcut links; query “Who’s responsible for key 1110?”]

Peer Churn
❒ To handle peer churn, require each peer to know the IP address of its two successors.
❒ Each peer periodically pings its two successors to see if they are still alive.
❒ Peer 5 abruptly leaves
❒ Peer 4 detects; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
❒ What if peer 13 wants to join?
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
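The successor rule and the two-successor bookkeeping used for peer churn can be tried out with a short Python sketch. It is only an illustration under stated assumptions: it uses a global sorted list of peer IDs (a real DHT peer only knows its neighbors), it assumes a key equal to a peer ID is stored at that peer, and the names `responsible_peer` and `two_successors` are hypothetical, not from the lecture.

```python
from bisect import bisect_left


def responsible_peer(peer_ids, key, n_bits):
    """Return the immediate successor of `key` on the identifier ring.

    Assumption: a key equal to a peer ID is assigned to that peer.
    """
    if not 0 <= key < 2 ** n_bits:
        raise ValueError("key must lie in [0, 2^n - 1]")
    ring = sorted(peer_ids)
    index = bisect_left(ring, key)      # first peer ID >= key
    return ring[index % len(ring)]      # wrap around past the largest ID


def two_successors(peer_ids, peer):
    """Return the (first, second) successor of `peer` on the ring."""
    ring = sorted(peer_ids)
    i = ring.index(peer)
    return ring[(i + 1) % len(ring)], ring[(i + 2) % len(ring)]


if __name__ == "__main__":
    # Key assignment example from the slides: n = 4.
    peers = [1, 3, 4, 5, 8, 10, 12, 14]
    print(responsible_peer(peers, 13, 4))   # 14
    print(responsible_peer(peers, 15, 4))   # 1 (wraps around)

    # Churn example: peer 5 leaves the ring 1, 3, 4, 5, 8, 10, 12, 15.
    ring = [1, 3, 4, 5, 8, 10, 12, 15]
    print(two_successors(ring, 4))          # (5, 8)
    ring.remove(5)
    print(two_successors(ring, 4))          # (8, 10)
```

Recomputing the successors from the full list is a shortcut for the sketch; in the protocol described above, peer 4 learns its new second successor by asking peer 8.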
P2P Case study: Skype
❒ inherently P2P: pairs of users communicate.
❒ proprietary application-layer protocol (inferred via reverse engineering)
❒ hierarchical overlay with Supernodes (SNs)
❒ Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), Supernode (SN), Skype login server]

Peers as relays
❒ Problem when both Alice and Bob are behind “NATs”.
   ❍ NAT prevents an outside peer from initiating a call to insider peer
❒ Solution:
   ❍ Using Alice’s and Bob’s SNs, Relay is chosen
   ❍ Each peer initiates session with relay.
   ❍ Peers can now communicate through NATs via relay

File Distribution: Server-Client vs P2P
Question: How much time to distribute file from one server to N peers?
   $u_s$: server upload bandwidth
   $u_i$: peer i upload bandwidth
   $d_i$: peer i download bandwidth
[Figure: server holding a file of size F and N peers, attached to a network (with abundant bandwidth)]

File distribution time: server-client
❒ server sequentially sends N copies:
   ❍ $NF/u_s$ time
❒ client i takes $F/d_i$ time to download
Time to distribute F to N clients using client/server approach:
   $d_{cs} = \max\{ NF/u_s,\ F/\min_i d_i \}$
increases linearly in N (for large N)

File distribution time: P2P
❒ server must send one copy: $F/u_s$ time
❒ client i takes $F/d_i$ time to download
❒ NF bits must be downloaded (aggregate)
❒ fastest possible upload rate: $u_s + \sum_i u_i$
   $d_{P2P} = \max\{ F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i) \}$

Server-client vs. P2P: example
Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: Minimum Distribution Time versus N, one curve each for P2P and Client-Server]

File distribution: BitTorrent
❒ P2P file distribution
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains the list of peers from the tracker, then trades chunks with peers in the torrent]

BitTorrent-like systems
❒ File split into many smaller pieces
❒ Pieces are downloaded from both seeds and downloaders
❒ Distribution paths are dynamically determined
   ❍ Based on data availability
[Figure: torrent (x downloaders; y seeds) with arrivals and departures; a peer is a downloader for its download time, then a seed for its seed residence time]

Download using BitTorrent
Background: Incentive mechanism
❒ Establish connections to large set of peers
   ❍ At each time, only upload to a small (changing) set of peers
❒ Rate-based tit-for-tat policy
   ❍ Downloaders give upload preference to the downloaders that provide the highest download rates
[Figure: Highest download rates: pick top four. Optimistic unchoke: pick one at random]

Download using BitTorrent
Background: Piece selection
❒ Rarest first piece selection policy
   ❍ Achieves high piece diversity
❒ Request pieces that
   ❍ the uploader has;
   ❍ the downloader is interested (wants); and
   ❍ is the rarest among this set of pieces
[Figure: pieces 1, 2, 3, … k … K held by Peer 1, Peer 2, … Peer N, with the number of copies of each piece in the neighbor set]

Live Streaming using BT-like systems
❒ Live streaming (e.g., CoolStreaming)
   ❍ All peers at roughly the same play/download position
      • High bandwidth peers can easily contribute more …
   ❍ (relatively) Small buffer window
      • Within which pieces are exchanged
[Figure: piece upload/downloads from and to the Internet; media player queue/buffer; buffer window]

Peer-assisted VoD streaming
Some research questions ...
❒ Can BitTorrent-like protocols provide scalable on-demand streaming?
❒ How sensitive is the performance to the application configuration parameters?
   ❍ Piece selection policy
   ❍ Peer selection policy
   ❍ Upload/download bandwidth
❒ What is the user-perceived performance?
   ❍ Start-up delay
   ❍ Probability of disrupted playback
ACM SIGMETRICS 2008; IFIP Networking 2007; IFIP Networking 2009
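The piece selection and peer selection policies listed above correspond to the two BitTorrent background mechanisms in this lecture: rarest-first piece selection, and rate-based tit-for-tat with an optimistic unchoke. A minimal Python sketch of both follows. The function names, the dictionary and set representations, and the lowest-index tie-break are assumptions made for illustration and are not taken from any BitTorrent client.

```python
import random
from collections import Counter


def choose_unchoked(download_rates, top_n=4, rng=random):
    """Rate-based tit-for-tat plus one optimistic unchoke.

    download_rates maps a peer name to the rate at which that peer
    currently uploads to us. Returns (top_peers, optimistic_peer);
    optimistic_peer is None when no other peer is left to pick from.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    top_peers = ranked[:top_n]           # "pick top four"
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None  # "pick one at random"
    return top_peers, optimistic


def rarest_first(wanted, uploader_has, neighbor_pieces):
    """Pick the rarest piece that the uploader has and the downloader wants.

    neighbor_pieces is a list of sets, one per neighbor. Ties go to the
    lowest piece index (an assumption, to keep the sketch deterministic).
    """
    candidates = set(wanted) & set(uploader_has)
    if not candidates:
        return None
    counts = Counter(p for pieces in neighbor_pieces for p in pieces)
    return min(candidates, key=lambda p: (counts[p], p))


if __name__ == "__main__":
    rates = {"a": 50, "b": 10, "c": 80, "d": 30, "e": 70, "f": 5}
    print(choose_unchoked(rates, rng=random.Random(0)))
    print(rarest_first({2, 3}, {1, 2, 3}, [{1, 2, 3}, {1, 2}, {2}]))
```

In the second call, piece 3 is held by one neighbor and piece 2 by three, so piece 3 is requested first.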
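The upload/download bandwidth question can be made concrete with the two distribution-time bounds from the file distribution part of the lecture, d_cs = max{NF/u_s, F/min d_i} and d_P2P = max{F/u_s, F/min d_i, NF/(u_s + Σ u_i)}. The sketch below evaluates them for the lecture's example parameters (client upload rate u, F/u = 1 hour, u_s = 10u). It assumes d_min = u_s, the smallest value the example allows, and the function names are hypothetical.

```python
def client_server_time(n, file_size, u_server, d_min):
    """d_cs = max{ N*F/u_s, F/min_i d_i }."""
    return max(n * file_size / u_server, file_size / d_min)


def p2p_time(file_size, u_server, d_min, peer_uploads):
    """d_P2P = max{ F/u_s, F/min_i d_i, N*F/(u_s + sum_i u_i) }."""
    n = len(peer_uploads)
    total_upload = u_server + sum(peer_uploads)
    return max(file_size / u_server,
               file_size / d_min,
               n * file_size / total_upload)


if __name__ == "__main__":
    u = 1.0        # client upload rate, normalized
    F = 1.0        # file size chosen so that F/u = 1 hour
    u_s = 10 * u   # server upload rate
    d_min = u_s    # assumption: smallest value allowed by d_min >= u_s
    for n in (5, 10, 20, 30):
        cs = client_server_time(n, F, u_s, d_min)
        p2p = p2p_time(F, u_s, d_min, [u] * n)
        print(f"N={n:2d}  client-server={cs:.2f} h  P2P={p2p:.2f} h")
```

With these parameters, N = 30 gives 3 hours for client-server and 0.75 hours for P2P: the client-server time grows linearly in N, while the P2P time N/(10 + N) stays below one hour.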