Chapter 2 Application Layer
Part 5: P2P

Computer Networking: A Top Down Approach, 6th edition
Jim Kurose, Keith Ross
Addison-Wesley, March 2012

2: Application Layer

Chapter 2: Application layer
• 2.1 Principles of network applications
• 2.2 Web and HTTP
• 2.3 FTP
• 2.4 Electronic Mail: SMTP, POP3, IMAP
• 2.5 DNS
• 2.6 P2P applications
• 2.7 Socket programming with UDP
• 2.8 Socket programming with TCP

Pure P2P architecture
• no always-on server
• arbitrary end systems communicate directly (peer-peer)
• peers are intermittently connected and change IP addresses
• in the basic model, a peer sends the entire file to another peer; some systems (like BitTorrent) instead break files up into pieces

Pure P2P architecture
We'll consider three examples:
• File distribution
• Searching for information
• Case study: Skype

File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to N peers?
Distribution time: the time it takes to get a copy of the file to all N peers.
Assumptions:
1. The Internet core has abundant bandwidth, so all bottlenecks are in the access networks.
2. The server and peers are not doing anything else on the Internet.

File Distribution: Server-Client vs P2P
Notation:
• F: file size
• u_s: server upload bandwidth
• u_i: peer i upload bandwidth
• d_i: peer i download bandwidth
[Figure: a server holding a file of size F and N peers, each with upload rate u_i and download rate d_i, attached to a network with abundant bandwidth.]

File Distribution: Server-Client vs P2P
• Time to upload file from server for first peer: F/u_s
• Time to upload file from server for second peer: F/u_s
• …
• Time to upload file from server for Nth peer: F/u_s
• Time to download file to first peer: F/d_1 (could be long)
  • A peer could start downloading before the entire file is uploaded by the server
  • The smaller d_i, the larger F/d_i and the longer the download time.
• Time to download file to second peer: F/d_2

File distribution time: server-client
• server sequentially sends N copies: NF/u_s time
• client i takes F/d_i time to download
[Figure: a server with file F and upload rate u_s, and N clients with rates u_i and d_i, attached to a network with abundant bandwidth.]
Time to distribute F to N clients using the client/server approach:
$d_{cs} \ge \max\{ NF/u_s,\ F/\min_i d_i \}$
This bound increases linearly in N (for large N).

File distribution time: P2P
• server must send one copy: F/u_s time
• client i takes F/d_i time to download
• NF bits must be downloaded (aggregate)
• fastest possible upload rate: $u_s + \sum_i u_i$
In other words, if every node had a copy of the file and every node uploaded simultaneously, the total upload rate would be the sum of all the individual rates.
[Figure: a server with file F and upload rate u_s, and N peers with rates u_i and d_i, attached to a network with abundant bandwidth.]

File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to N peers?
• Time to upload file from server for first peer: F/u_s
• Time to upload file from server for second peer: F/u_s
• …
• Time to upload file from server for Nth peer: F/u_s
• Total time to download file to first peer (from many other peers): F/d_1 (all the bits have to be downloaded)
  • The smaller d_i, the larger F/d_i and the longer the download time.
• Time to download file to second peer: F/d_2

File distribution time: P2P
Combining the three constraints (the server must send one copy, the slowest peer must download the whole file, and NF bits must be uploaded in aggregate):
$d_{P2P} \ge \max\{ F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i) \}$

Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s
[Figure: Minimum Distribution Time versus N, for N from 0 to 35, with one curve for P2P and one for Client-Server.]
For example, at N = 30 the client-server bound is NF/u_s = 30F/(10u) = 3 hours, while the P2P bound is NF/(u_s + Nu) = 30F/(40u) = 0.75 hours.

File distribution: BitTorrent
P2P file distribution
• tracker: tracks peers participating in torrent
• torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers from the tracker, then trades chunks with peers in the torrent.]

BitTorrent (1)
• file divided into 256 KB chunks
• peer joining torrent:
  • has no chunks, but will accumulate them over time
  • registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
• while downloading, peer uploads chunks to other peers
• peers may come and go
• once peer has entire file, it may (selfishly) leave or (altruistically) remain

BitTorrent (2)
Pulling chunks
• at any given time, different peers have different subsets of file chunks
• periodically, a peer (Alice) asks each neighbor for the list of chunks that they have
• Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
• Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  • re-evaluate top 4 every 10 secs
• every 30 secs: randomly select another peer and start sending it chunks (“optimistically unchoke”)
  • newly chosen peer may join top 4

BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
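The two BitTorrent selection rules described above (request missing chunks rarest first; send to the top four neighbors plus one randomly chosen "optimistically unchoked" peer) can be sketched in a few lines of Python. This is a minimal illustration, not the real client logic: the function names `rarest_first` and `choose_unchoked`, the dictionary layout, and the peer names are all hypothetical, and the 10-second and 30-second timers are left out.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks we are missing so the rarest among our neighbors come first.

    my_chunks: set of chunk indices we already hold.
    neighbor_chunks: dict mapping neighbor name -> set of chunk indices it holds.
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    missing = [c for c in counts if c not in my_chunks]
    # Fewest holders first; ties broken by chunk index to keep the order stable.
    return sorted(missing, key=lambda c: (counts[c], c))


def choose_unchoked(rates_to_us, top_n=4, rng=random):
    """Pick who we send chunks to (tit-for-tat).

    rates_to_us: dict mapping neighbor name -> rate at which it is sending to us.
    Returns (top, optimistic): the top_n fastest senders, plus one randomly
    chosen other neighbor (or None if there is no other neighbor).
    """
    ranked = sorted(rates_to_us, key=lambda p: (-rates_to_us[p], p))
    top, others = ranked[:top_n], ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


# Hypothetical example: Alice holds chunk 0 and has three neighbors.
wanted = rarest_first({0}, {"bob": {0, 1, 2}, "carol": {1, 2}, "dave": {2, 3}})
print(wanted)  # chunk 3 has one holder, chunk 1 has two, chunk 2 has three
```

In this sketch a randomly chosen peer only "joins the top 4" if, at the next re-evaluation, its measured rate places it among the four fastest senders, which is what calling `choose_unchoked` again with updated rates would show.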
Distributed Hash Table (DHT)
• DHT = distributed P2P database
• Database has (key, value) pairs:
  • key: social security number; value: human name
    E.g., (123-45-6789, George Bush)
  • key: content type; value: IP address
    E.g., (Newsboys Shine, 203.17.123.38)
• Peers query DB with key
  • DB returns values that match the key
  • Example: query with 123-45-6789, get “George Bush”
  • Example: query with “Newsboys Shine”, get 203.17.123.38
• Peers can also insert (key, value) pairs

How do we store the DB?
• One central server
  • Early Napster
  • Defeats the purpose in some ways
• Randomly distribute pieces
  • Each peer maintains a list of the IP addresses of all participating peers.
  • Not scalable

DHT Identifiers
• Better: assign an integer identifier to each peer in the range [0, 2^n - 1].
  • Each identifier can be represented by n bits.
• Require each key in the DB to be an integer in the same range.
  • There are different types of keys, e.g., SSN or band name
  • Doesn’t matter: every key will be an integer in this range.
• Problem: keys are not necessarily integers!
• To get integer keys, hash the original key.
  • e.g., key = h(“Led Zeppelin IV”) = some integer.
  • The hash function can ensure the result is in our range
  • This is why they call it a distributed “hash” table

How to assign keys to peers?
• Central issue: assigning (key, value) pairs to peers,
  i.e., which peer will store which (key, value) pairs?
• Recall each peer is assigned an identifier
• Rule: assign key to the peer that has the closest ID.
• Convention in lecture: closest is the immediate successor of the key.
• Example: n = 4; then all peer identifiers are in range [0, 15]
  • peers: 1, 3, 4, 5, 8, 10, 12, 14
  • key = 13, then successor peer = 14
  • key = 8, then successor peer = 8
  • key = 15, then successor peer = 1 (the range wraps around)

Circular DHT (1)
[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged in a circle.]
• Each peer is only aware of its immediate successor and predecessor.
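The identifier and successor rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a real DHT: the names `key_id`, `successor`, `in_interval` and `lookup` are hypothetical, SHA-1 is just one possible choice for the hash h, and the ring is a sorted list inside one process instead of peers exchanging network messages. Each peer in `lookup` decides using only its own ID and its predecessor's ID, then forwards to its successor.

```python
import hashlib
from bisect import bisect_left


def key_id(name, n_bits):
    """Hash an arbitrary key (e.g. a band name) to an integer in [0, 2**n_bits - 1]."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()  # SHA-1 is an assumption
    return int.from_bytes(digest, "big") % (2 ** n_bits)


def successor(peers, key):
    """Return the peer ID that is the immediate successor of key on the ring."""
    ring = sorted(peers)
    i = bisect_left(ring, key)   # index of the first peer ID >= key
    return ring[i % len(ring)]   # past the largest ID, wrap to the smallest


def in_interval(k, a, b, size):
    """True if k lies in the clockwise ring interval (a, b], modulo size."""
    return 0 < (k - a) % size <= (b - a) % size


def lookup(peers, start, key, n_bits):
    """Forward a query from successor to successor, starting at peer `start`.

    Returns (responsible_peer, messages), where messages counts the forwards.
    """
    size = 2 ** n_bits
    ring = sorted(peers)
    i = ring.index(start)
    messages = 0
    while True:
        pred = ring[i - 1]  # index -1 wraps to the largest ID
        if len(ring) == 1 or in_interval(key, pred, ring[i], size):
            return ring[i], messages
        i = (i + 1) % len(ring)  # hand the query to the immediate successor
        messages += 1


peers = [1, 3, 4, 5, 8, 10, 12, 14]  # the n = 4 example from the slides
print(successor(peers, 13), successor(peers, 8), successor(peers, 15))  # 14 8 1
```

Because each peer knows only its neighbors, the number of forwards grows with the distance around the ring between the querying peer and the responsible peer.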
• “Overlay network”: the peers form their own “network”
  • A successor/predecessor may be many hops away

Circular DHT (2)
[Figure: peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111 arranged in a circle. The query “Who’s responsible for key 1110?” is passed from successor to successor until peer 1111 answers “I am”.]
• Define closest as closest successor
• Disadvantage: O(N) messages on average (actually N/2) to resolve a query when there are N peers
• Advantage: little info kept in each peer

Circular DHT with Shortcuts
[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged in a circle with additional shortcut links; the query is “Who’s responsible for key 1110?”]
• Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts.
• Reduced from 6 to 2 messages.
• Possible to design shortcuts so that each peer has O(log N) neighbors and a query takes O(log N) messages

Peer Churn
[Figure: peers 1, 3, 4, 5, 8, 10, 12, 15 arranged in a circle.]
• To handle peer churn, require each peer to know the IP address of its two successors.
• Each peer periodically pings its two successors to see if they are still alive.
• Example: peer 5 abruptly leaves
  • Peer 4 detects this; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
• What if peer 13 wants to join?

P2P Case study: Skype
• inherently P2P: pairs of users communicate.
• proprietary application-layer protocol (inferred via reverse engineering)
• hierarchical overlay with supernodes (SNs)
• Index maps usernames to IP addresses; distributed over SNs
• Skype is proprietary, but the guess is that it uses a DHT
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server.]

Peers as relays
• Problem when both Alice and Bob are behind “NATs”.
  • NAT prevents an outside peer from initiating a call to an inside peer
• Solution:
  • Using Alice’s and Bob’s SNs, a relay is chosen
  • Each peer initiates a session with the relay.
  • Peers can now communicate through NATs via the relay