High-Level Workshop on “Challenges Facing Computer Science”
Topological Research on New-Generation Peer-to-Peer Network Systems
▅ Why P2P?
▅ What problems do first-generation unstructured P2P systems have?
▅ What benefits do new-generation structured P2P systems offer?
▅ A topological model of new-generation P2P systems
▅ Research problems of new-generation P2P systems
State Key Laboratory for Novel Software Technology, Nanjing University
Guihai Chen
December 27, 2003

What Is a P2P Network? One Version
(Source: M. Ripeanu, A. Iamnitchi, and I. Foster, “Mapping the Gnutella Network”, IEEE Internet Computing, no. 1, 2002.)
[Dynamic operability] P2P applications must keep operating transparently even though hosts join and leave the network frequently.
[Performance and scalability] P2P applications exhibit what economists call the “network effect”, in which a network’s value to an individual user scales with the total number of participants.
[Reliability] External attacks should not cause significant data or performance loss.
[Anonymity] The application should protect the privacy of people seeking or providing sensitive information.

What Is a P2P Network? My Version
[Equality] All peers play equal roles.
[Non-centralized] There is no centralized server in the system.
[Robust] Highly robust, resilient, and self-organizing.
[Zero hardware cost] No further investment in hardware or bandwidth.
[A hot topic] But a huge investment in research, e.g., IRIS received $12M.

How Did It Start?
A killer application: Napster, free music over the Internet.
Key idea: share the storage and bandwidth of individual (home) users across the Internet.

Napster: Example
[Figure: a central Napster server holds the directory m1:A, m2:B, m3:C, m4:D, m5:E, m6:F; peer m1 asks the server “E?”, the server answers “m5”, and m1 then fetches E directly from m5.]

Napster: History
- 5/99: Shawn Fanning (freshman, Northeastern University) founds the Napster online music service
- 12/99: first lawsuit
- 3/00: Napster accounts for 25% of UWisc traffic
- 2000: an estimated 60M users
- 2/01: the US Circuit Court of Appeals rules that Napster knew its users were violating copyright laws
- 7/01: simultaneous online users: Napster 160K, Gnutella 40K
- Now: trying to come back: http://www.napster.com

Napster: Problems
Centralized server:
- single logical point of failure
- can load-balance among servers using DNS rotation
- potential for congestion
- Napster is “in control” (freedom is an illusion)
No security:
- passwords in plain text
- no authentication
- no anonymity

Gnutella
Distribute file location and decentralize lookup.
Idea: multicast the request.
How to find a file:
- Send the request to all neighbors.
- Neighbors recursively multicast the request.
- Eventually a machine that has the file receives the request and sends back the answer.
Advantages:
- Totally decentralized, highly robust.
Disadvantages:
- Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL).

Gnutella: Example
Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;…
[Figure: the query “E?” spreads from m1 to its neighbors and onward until it reaches m5, which stores file E.]

Gnutella: Problems
- Not scalable: the entire network can be swamped with requests (to alleviate this problem, each request has a TTL).
- Not anonymous: the person you are getting the file from knows who you are. In short, it is little more than non-centralized.
What we care about:
- How much traffic does one query generate?
- How many hosts can it support at once?
- What is the latency associated with querying?
- Is there a bottleneck?
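The flooding search described for Gnutella, and the first question of “what we care about” (traffic per query), can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the Gnutella protocol: the function `flood_query` and the data `EXAMPLE_NEIGHBORS` / `EXAMPLE_HOLDERS` are hypothetical names; the overlay contains only the neighbor relations stated on the example slide (the elided rest of the topology is omitted); m5 storing file E is taken from the figure; and duplicate suppression is modeled by remembering which peers have already seen the query, roughly what real servents achieve with message IDs.

```python
from collections import deque


def flood_query(neighbors, holders, start, wanted, ttl):
    """TTL-limited flooding search in the spirit of Gnutella (a sketch, not the real protocol).

    neighbors: dict peer -> list of overlay neighbors (assumed symmetric)
    holders:   dict peer -> set of file names stored at that peer
    Returns (peers holding `wanted` that the query reached, number of query messages sent).
    """
    seen = {start}                       # peers that have already received the query
    queue = deque([(start, None, ttl)])  # (peer, peer it came from, remaining TTL)
    hits, messages = [], 0
    while queue:
        peer, sender, remaining = queue.popleft()
        if wanted in holders.get(peer, set()):
            hits.append(peer)            # this peer would send the answer back toward start
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for nb in neighbors.get(peer, []):
            if nb == sender:
                continue                 # never echo the query back to where it came from
            messages += 1                # each forwarded copy costs one message, duplicates included
            if nb not in seen:           # duplicates are dropped instead of re-forwarded
                seen.add(nb)
                queue.append((nb, peer, remaining - 1))
    return hits, messages


# Hypothetical topology from the example slide: m1's neighbors are m2 and m3;
# m3's neighbors are m4 and m5. The elided remainder of the overlay is omitted.
EXAMPLE_NEIGHBORS = {
    "m1": ["m2", "m3"],
    "m2": ["m1"],
    "m3": ["m1", "m4", "m5"],
    "m4": ["m3"],
    "m5": ["m3"],
}
# Assumption taken from the figure: m5 stores file E.
EXAMPLE_HOLDERS = {"m5": {"E"}}

if __name__ == "__main__":
    print(flood_query(EXAMPLE_NEIGHBORS, EXAMPLE_HOLDERS, "m1", "E", ttl=2))  # (['m5'], 4)
    print(flood_query(EXAMPLE_NEIGHBORS, EXAMPLE_HOLDERS, "m1", "E", ttl=1))  # ([], 2)
```

With TTL 2 the query from m1 reaches m5 after 4 messages; with TTL 1 it stops at m2 and m3 after 2 messages and finds nothing. A larger TTL widens the search but, in a densely connected overlay, multiplies the message count, which is exactly the scalability problem listed above.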
New Solutions to the Location Problem
Overlay networks:
- applications running at various sites
- create “logical” links (e.g., TCP or UDP connections) pairwise between each other
- each logical link spans multiple physical links, with routing defined by native Internet routing
Goals: scalability, resilience, security.
Abstraction: a distributed hash table (DHT) data structure plus a routing table:
- key = hash(data); node key = hash(IP)
- data = lookup(key)
Note: data can be anything: a data object, document, file, pointer to a file…
Proposals:
- CAN (ACIRI/Berkeley), Chord (MIT), Pastry (Rice), Tapestry (Berkeley)
- Koorde [MIT], Viceroy [Weizmann], Cycloid [Nanjing University]

Overlay Networks: Consistent Hashing
David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, Rina Panigrahy, “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”, ACM Symposium on Theory of Computing, 1997.
SHA-1: http://www.w3.org/PICS/DSig/SHA1_1_0.html

Overlay Networks: Typical Systems 1
Ring: Chord [MIT]
- Persons: Dabek, Kaashoek, Stoica
- Applications: CFS
- Key space: 1-dimensional cycle
- Space-time complexity: O(log N) space, O(log N) time
- Data distribution: each node holds the segment of keys between its predecessor and itself
- Data location: lookup(k) → successor(k)
- Routing table: successor set + O(log N) fingers
Mesh: CAN [Berkeley]
- Persons: Ratnasamy, Shenker, Stoica (formerly at MIT)
- Applications: none listed
- Key space: 2- or d-dimensional torus
- Space-time complexity: O(d) space, O(d N^{1/d}) time
- Data distribution: each node holds the zone of keys in which it resides
- Data location: lookup(k) → region(k)
- Routing table: O(d) neighbors
Hypercube: Pastry [Rice], Tapestry [Berkeley]
- Persons: Druschel, Rowstron
- Applications: PAST, SCRIBE, OceanStore
- Key space: 1-dimensional cycle
- Space-time complexity: O(log N) space, O(log N) time
- Data distribution: each node holds the keys numerically closest to it
- Data location: lookup(k) → nearest(k)
- Routing table: O(|L|) leaf set + O(|M|) proximity set + O(log N) neighbors

Overlay Networks: Typical Systems 2
DeBruijn: Koorde [MIT], ODRI [Texas A&M]
- Persons: Kaashoek, Karger (Koorde); Loguinov, Kumar, Rai (ODRI)
- Applications: ???
- Key space: 1-dimensional cycle
- Space-time complexity: O(d) space, O(log N) time
- Data distribution: each node holds the segment of keys between its predecessor and itself
- Data location: lookup(k) → successor(k)
- Routing table: successor set + O(d) fingers
Butterfly: Viceroy [Weizmann]
- Persons: Malkhi, Naor, Ratajczak
- Applications: ???
- Key space: 1-dimensional cycle
- Space-time complexity: O(d) space, O(log N) time
- Data distribution: each node holds the keys numerically closest to it
- Data location: lookup(k) → nearest(k)
- Routing table: 7 neighbors
CCC: Cycloid [NJU, Wayne]
- Persons: Guihai Chen, Chengzhong Xu
- Applications: ???
- Key space: 2-dimensional cycle
- Space-time complexity: O(d) space, O(log N) time
- Data distribution: each node holds the keys numerically closest to it
- Data location: lookup(k) → nearest(k)
- Routing table: 5 neighbors

Overlay Networks: A Generic Model
[Figure: A Generic Topological Model of P2P Systems. An overlay network of peers 1 through n, each consisting of a routing and locating algorithm, a routing table, data storage, and a data cache, running on the Internet as the supporting network.]

Overlay Networks: Criteria, Issues, or Topics
Can a given network be turned into a P2P overlay network? It requires:
- Ordered key space: a necessary measure of distance.
- Convergent routing algorithm: reaches the destination within a fixed number of steps.
- Resilient connection pattern: nodes maintain continuous connections to their neighbors.
Factors affecting the performance of P2P systems:
- Degree: the number of neighbors.
- Routing length: the number of hops.
- Fault tolerance: what fraction of nodes can fail.
- Maintenance overhead: how many messages are passed to maintain coherence.
- Load balance: how evenly keys are distributed, and how often each node serves as an intermediate node for other routes.

P2P Issues
Security and Protection: Trust, Anonymity, Reputation
Business and Legal Issues: Business Models, Intellectual Property Rights
Sociometry: Small-World Phenomena, Power-Law Networks
P2P Distributed Databases: Query Decomposition, Query Distribution, Mediation
Network Architecture and Design: Network Topology, Routing, Overlay Networks
Intelligent Agents / Web-based Services: Matchmaking, Service Description
Distributed Data Structures: Distributed Hash Tables, Scalable Distributed Data Structures

P2P Issues
1) What topologies can be used for P2P systems?
2) How should the dimension of an overlay network be determined?
3) What is the tradeoff between degree and routing length?
4) How can fault tolerance be formulated quantitatively?
5) Can more flexible hash functions be designed?
6) How can the proximity problem be addressed?
7) How can the big-peer/small-peer problem be addressed?

Conclusion
Next generation of the Internet is the Grid;
Next generation of the Grid is P2P;
Next generation of P2P is structured.