Peer-to-Peer Algorithms and System CS600 Assignment #5 Nov. 22. 2006 Byungchul Park fates@postech.ac.kr DPNM Lab., Dept. of CSE, POSTECH - 1/25 - DP&NM Lab. CSE, POSTECH Presentation Outline History of P2P Motivation Definition of P2P Most popular P2P application - Skype Overlay Network – Unstructured P2P – Structured P2P Comparison of P2P models Application Services & technology Open Issues Conclusion - 2/25 - DP&NM Lab. CSE, POSTECH History of P2P Origin of Internet – ARPANET was a P2P system (not client/server model) – FTP : killer application of early Internet – Usenet and DNS Revolution of network model – Napster – Developed by Shawn Fanning – MP3 file sharing – Centralized P2P concept Obstacles of P2P – Firewall – Dynamic IP – NAT Success of Skype - 3/25 - DP&NM Lab. CSE, POSTECH Motivation Requirements of Internet application – Scalability – Security & Reliability – Flexibility & QoS • Client/Server model can not satisfies the requirements =>DOS attack, Bottleneck P2P Network – Self-organizing – More flexible than Client/Server model - 4/25 - DP&NM Lab. CSE, POSTECH Definition of P2P [1] Significant autonomy from central servers Exploits resources at the edges of the Internet – Storage and contents – Computing power – Connectivity and presence Resources at edge have intermittent connectivity, being added & removed - 5/25 - DP&NM Lab. CSE, POSTECH Skype [12] One of the most popular VoIP application Skype provide VoIP and IM services that works behind NAT and firewalls 50 million users Valued at $2.6 billon Encrypted protocol & proprietary network How P2P VoIP traffic in Skype differs from other P2P traffic and from traffic in traditional voicecommunication? Resource consumption? - 6/25 - DP&NM Lab. CSE, POSTECH Skype Overview Skype’s three services – – – – VoIP : two-way audio stream, conferences of up to 4user IM (instant messaging) : small text message File-transfer Paid service : initiate & receive calls via regular telephone number through VoIP-PSTN gateway Related to KaZaA – Supernode-based P2P network Inherent Structure Supernode : plenty of spare bandwidth & publicly reachable - 7/25 - DP&NM Lab. CSE, POSTECH Overlay Network Figure 1. from [1] - 8/25 - DP&NM Lab. CSE, POSTECH Overlay Network (cont) Virtual edge – TCP connection or a pointer to an IP address Maintenance of Overlay networks (typical) – – – – Periodically ping to make sure neighbor is still alive verify liveness while messaging If neighbor goes down, may establish new edge New node needs to bootstrap - 9/25 - DP&NM Lab. CSE, POSTECH Overlay Network (cont) Design flexibility – Topology – Protocol – Messaging over TCP, UDP, ICMP Underlying physical net is transparent to developer - 10/25 - DP&NM Lab. CSE, POSTECH Unstructured P2P [2] Unstructured P2P – No relations between peer and contents – Searching : query flooding to central server or neighbor peer Centralized Server model – Central server keeps peer information – Scalability problem and copyright problem Pure P2P – No central server (or directory) – Searching : query flooding to neighbor peer – excessive traffic generation Hybrid P2P – Super node – Hierarchical architecture - 11/25 - DP&NM Lab. CSE, POSTECH Unstructured P2P Applications Napster (Centralized model) [6] Application-level, clientserver point-to-point TCP Steps: – – – – connect to Napster upload list of files give server search query receive “best” of correct answers (ping) - 12/25 - DP&NM Lab. CSE, POSTECH Unstructured P2P Applications (cont) Gnutella [7,8] Focus: decentralized method of searching for files – Central directory server no longer problem (bottleneck) – Harder control of the system Each application instances serves to: – store/serve files – route queries to its neighbors – respond to request queries Limited scope query – if you don’t have the file you want, query 7 of your neighbors. – if they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10. – reverse path forwarding for responses (not files) - 13/25 - DP&NM Lab. CSE, POSTECH Unstructured P2P Applications (cont) Overlay management of Pure P2P New node uses bootstrap node to get IP addresses of existing Gnutella nodes New node establishes neighboring relations by sending join messages - 14/25 - DP&NM Lab. CSE, POSTECH Unstructured P2P Applications (cont) KaZaA & Skype [3] Every peer is either a supernode or is assigned to a supernode Each supernode knows where the other supernodes are (Mesh structure) Cross between Napster and Gnutella - 15/25 - DP&NM Lab. CSE, POSTECH Structured P2P Structured P2P – Contents-peer mapping – Content-addressing tech. – Searching complexity : O(logN) -> scalability What is DHT? – Distributed Hash Table – files are associated with a key (produced, by hashing the file name) and each node in the system is responsible for storing a certain range of keys [3] Example – Hash (“The Devil Wears Prada”) = 8045 – Pastry, Tapestry, Chord, CAN, Viceroy - 16/25 - DP&NM Lab. CSE, POSTECH Comparison of P2P models [5] - 17/25 - DP&NM Lab. CSE, POSTECH Application Service & Tech. 1. 2. 3. 4. Information Sharing File Sharing Bandwidth efficiency Computing power sharing - 18/25 - DP&NM Lab. CSE, POSTECH Information Sharing Presence information – Basic operation of P2P application – e.g. buddy list of Skype or MSN Document Management – Most documents are stored with personal PCs – P2P networking provides the repository that can coordinates distributed local data – e.g. NextPage-NXT 4 [9] Collaboration – P2P networking provides collaboration service without central management system – e.g. Groove Virtual Office [10] (ad-hoc collaboration system) • Instance messaging, File sharing • Real-time DB synchronization, etc - 19/25 - DP&NM Lab. CSE, POSTECH File Sharing Representative P2P application service Over 70% of Internet traffic generated by P2P file sharing File searching mode – Flooding model (e.g. Gnutella) – Central directory model (e.g. Napster) – Document routing model (e.g. Freenet) Document routing model – A file saved on anonymous peer – An identifier assigned to each file and peer – A file saved on a peer that has similar identifier - 20/25 - DP&NM Lab. CSE, POSTECH Bandwidth Efficiency Client/Server model involved with bottleneck problem and queuing delay Load balancing – Multimedia streaming service and VoD service – A contents copied to numerous peers – e.g. PeerCast, Peer-to-Peer-Radio Improving download speed – If file is found in multiple nodes, user can select parallel downloading – Most likely HTTP byte-range header used to request different portions of the file from different nodes – Automatic recovery when server peer stops sending file - 21/25 - DP&NM Lab. CSE, POSTECH Computing Power Sharing seti@home [11] – – – – – Search for ET intelligence Central site collects radio telescope data Data is divided into work chunks of 300Kbytes User obtains client, which runs in background Peer sets up TCP connection to central computer, downloads chunk – Peer does FFT on chunk, uploads results, gets new chunk » Not peer to peer, but exploits » resources at network edge - 22/25 - DP&NM Lab. CSE, POSTECH Open Issues Copyright issues Avoid Firewall or detection – Dynamic port allocation – NAT traversal - 23/25 - DP&NM Lab. CSE, POSTECH Conclusion P2P networking is a promising paradigm for services operating at the edges of the network [4] Significant autonomy from central servers Scalability, Security & Reliability, Flexibility & QoS P2P networks still have a long way to go - 24/25 - DP&NM Lab. CSE, POSTECH References [1] K.W. Ross and Dan Rubenstein, "Tutorial on P2P Systems " presented at Infocom 2003, http://cis.poly.edu/~ross/p2pTheory/P2Preading.htm. [2] Peer-to-peer information systems: concepts and models, state-of-the-art, and future systems, Karl Aberer, Manfred Hauswirth Tutorial at the 18th International Conference on Data Engineering, February 26-March 1, 2002, San Jose, California. [3]Nathaniel Leibowitz, Matei Ripeanu, and Adam Wierzbicki, "Deconstructing the Kazaa Network", 3rd IEEE Workshop on Internet Applications (WIAPP'03), June 2003. [4] Management of Peer-to-Peer Networks, K. Tutschku., http://www3.informatik.uniuerzburg.de/ITG/2001/vortrag/vortrag13.pdf. [5] 전자통신동향분석 제 21권 제 5호, ETRI, 2006년 10월 [6] Napster, “Napster Messages,” http://opennap.sourceforge.net/napster.txt, 2000. M. Ripeanu, I. Foster, and A. Iamnitchi, “Mapping [7]the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design,” IEEE Internet Computing Journal, Vol.6, No.1, 2002. [8] M. Ripeanu, “Peer-to-Peer Architecture Case Study: Gnutella Network,” In Proc. of IEEE 1st Int’l Conf. on Peer-to-Peer Computing, 2001. [9] Next Page Inc, http://www.nextpage.com/pdfs/collateral/wp/nxt4 build wp090903lo.pdf, 2004. [10] Groove Networks, http://www.groove.net, 2004. [11] D. Anderson, SETI@home, chapter 5, O’Reilly, Sebastopol, 2001, pp.67–76. [12] Saikat Guha, Neil Daswani, Ravi Jain, “An Experimental Study of the Skype Peerto-Peer VoIP System”, IPTPS 2006. - 25/25 - DP&NM Lab. CSE, POSTECH