Client/Server Distributed Systems
240-322, Semester 1, 2005-2006

3. Peer-to-Peer Technologies (P2P)

Objectives
   – introduce P2P, discuss some current systems, highlight key issues

240-322 Cli/Serv.: P2P/3

Contents
1. What is P2P (briefly)?
2. The Early Days of the Internet
3. The Internet Now
4. Reader-Centric P2P
5. Publisher-Centric P2P
6. P2P Meme Map
7. P2P Issues
8. More Information

1. What is P2P (briefly)?

Peer-to-peer (P2P) allows everyone on the Internet to share their resources with others.

P2P is a class of applications that uses resources from the 'edges' of the Internet
   – e.g. storage space, CPU time, files

Some P2P applications:
   – Napster, Gnutella, FreeNet, FreeHaven

Client/Server vs. P2P

[Figure: a client/server network (one server, many clients) compared with a P2P network]

2. The Early Days of the Internet

Many users think that P2P ideas are new.

In fact, the Internet was originally designed to be P2P
   – but the Web and browsers changed it into a client/server model during the 1990s

The Arpanet (late 1960s)

Arpanet's aim was to share computing resources among the main USA universities and Government installations.

'Killer' applications of the 1970s and 1980s:
   – FTP, telnet, e-mail, chat
   – mostly client/server but...
   – usage patterns were symmetric: most machines were both clients and servers

Usenet (since 1979)

A decentralized file sharing model for distributing news.

[Figure: interconnected news servers, each with its own readers (and senders)]

No central authority
   – authority is localised in each news server

But an unofficial Usenet backbone has developed:
   – these are servers which store more newsgroups, have faster processing, better connectivity, etc.
   – server inequalities cause a hierarchy to appear

The news topic hierarchy (the ordering of the news groups) is controlled by users in the news.admin news group
   – but new users find it hard to influence decisions

There is an "anything goes" alt.* news group hierarchy.

Main problem: lots of useless news items

DNS (Domain Name System)

Since 1983

DNS allows host names to be mapped to IP addresses
   – used by almost all network applications

Names are organised into a hierarchy:
   – psu.ac.th, ait.ac.th, foo.org.th

The name hierarchy has led to a mixed P2P and hierarchical server model

[Figure: name server tree. The th name server sits above the ac and org name servers; ac sits above the psu and ait name servers, and org sits above the foo name server. Lookups and requests pass along the links of the tree.]

Each name server deals with part of the namespace and passes other requests on.

Each name server does caching to reduce network load.

A hierarchy makes search easier.

3. The Internet Now

Most people use a browser to surf the Web
   – the Web encourages a client/server model: request a page, get it

Most users do not run Web servers
   – hard to set up
   – many ISPs do not allow them

The present Web makes it hard for an ordinary user to publish (serve) Web pages
   – dynamic addresses
   – firewalls
   – asymmetric bandwidth (e.g. cable modems)

Today's Web in summary:
   – easy to read, difficult to publish

Accountability

Many of the restrictions on users (e.g. firewalls) are quite recent
   – they started appearing in the mid 1990s

The reason is lack of accountability
   – an Internet user can send spam, attack machines, etc. due to the 'poor' design of the Internet protocol
   – it assumes that users are responsible

P2P Aims

P2P has a political and social component
   – it aims to allow everyone to share resources
   – this is quite different from today's Web, where business/government/university servers present information, and ordinary users read it

There are many technological, political, and social problems to be dealt with.

4. Reader-Centric P2P

Reader-centric P2P systems distribute content (information) by anyone, for anyone to read.

Example systems:
   – Napster, Gnutella, FreeNet

4.1. Napster

Napster used to allow users to publish music files which other users could download for free
   – publishing is not the same as authoring

Napster is a hybrid of P2P and client/server, since a Napster server stores who is logged onto the system and details about the files they are publishing.

Using Napster

[Figure: John's Napster client, the Napster server, and the peers Bob, Carol and Ted]
   0. every client logs in and uploads details of its files; the server's index holds:
         Hey Jude/Beatles: John
         Yesterday/Beatles: Bob
         Sgt. Pepper/Beatles: Carol
         Yesterday/Beatles: Ted
   1. John's client requests "Yesterday" from the server
   2. the server sends the matching info.
   3. John's client requests the music file from a peer
   4. the peer sends the file

Features

Downloading a file makes it available from a new machine (your machine)
   – decentralizes file storage
   – increases redundancy (good for reliability)
   – reduces search time for nearby users

Napster can be 'attacked' easily
   – music lawyers sued the Napster server owners, and closed it down (changed it)

The peers (clients) are not equal
   – some do not publish (they are freeloaders)
   – some clients are recognized as being 'better', e.g. more songs, better quality recordings; people choose those clients first

Client inequality means business opportunities
   – e.g. for the music industry

4.2. Gnutella

A network of shared file stores (servents) that can be searched.

[Figure: my servent searches by broadcasting to the other servents; info. replies come back]

Features

No central server (authority)
   – so much harder to 'switch off' than Napster

Illustrates how to access dynamic, heterogeneous file systems

Queries can be interpreted differently by each servent
   – results can be anything

Gnutella servents have been ported to iMode mobile phones in Japan
   – servents are meant to run on anything

InfraSearch
   – a prototype search engine for Gnutella
   – not fully developed as yet
   – uses a broadcast model

Broadcast Model Details

Broadcasting is done with TCP
   – can avoid some firewall problems
   – delivery is best effort: packets are discarded if the network is too loaded (this dropping happens in the overloaded servents, since TCP itself is a reliable transport)

Rebroadcasting/looping is avoided by each packet having a unique ID
   – a servent does not transmit a packet with the same ID more than once

Packets have a TTL (Time-To-Live) value of 7 to get rid of old stuff
   – TTL is the number of machines a packet can travel between before 'dying'

A servent replies to the node that sent it the packet
   – answers are routed back along the transmission path of the query

Network Shape

A new servent will search for connections to nodes with similar bandwidths or higher
   – this will cause the 'shape' of the Gnutella network to change over time
   – a backbone topology (shape) will develop with high bandwidth nodes in the center, surrounded by slower nodes

This dynamic behaviour is not implemented in all servents.
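
The broadcast rules described for Gnutella (a unique packet ID to stop looping, a TTL of 7, and replies routed back along the query path) can be sketched as a small simulation. This is only an illustration under stated assumptions, not the Gnutella wire protocol: the function name flood_search, the dictionary-based network and the example node names are all hypothetical, and a single came_from table stands in for the per-servent record of packet IDs already seen.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl=7):
    """Simulate one broadcast query (hypothetical helper, not real Gnutella code).

    graph: dict mapping each servent name to a list of neighbour names
    files: dict mapping a servent name to the set of file names it shares
    Returns a list of (holder, reply_path) pairs, where reply_path runs
    from the holder back to the searching servent.
    """
    # came_from models "have I seen this packet ID before?" for every servent,
    # and remembers which neighbour delivered the packet first.
    came_from = {start: None}
    queue = deque((neighbour, start, ttl) for neighbour in graph[start])
    hits = []
    while queue:
        node, sender, ttl_left = queue.popleft()
        if node in came_from:
            continue  # same packet ID seen already: do not retransmit
        came_from[node] = sender
        if wanted in files.get(node, ()):
            # The answer travels back along the path the query took.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits.append((node, path))
        if ttl_left > 1:  # the packet 'dies' once its TTL is used up
            for neighbour in graph[node]:
                if neighbour != sender:
                    queue.append((neighbour, node, ttl_left - 1))
    return hits


# A made-up five-servent network; only servent "d" shares the wanted file.
graph = {
    "me": ["a", "b"],
    "a": ["me", "c"],
    "b": ["me", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
files = {"d": {"yesterday.mp3"}}

print(flood_search(graph, files, "me", "yesterday.mp3"))
print(flood_search(graph, files, "me", "yesterday.mp3", ttl=2))
```

With the default TTL the query reaches d, three hops away; with ttl=2 it dies at c and nothing is found, which is the price of using TTL to limit broadcast traffic.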
Problems

Node overloading
   – many servents are hardwired to connect to the same nodes
        these nodes can easily become overloaded
        these nodes can be attacked to affect the Gnutella network
   – many servents cannot be reconfigured to look for other nodes

Too much broadcasting
   – some servents use broadcasting when direct node-to-node communication is possible
   – degrades the network

Once a servent has found a node with the file it wants, the file is downloaded using ordinary HTTP
   – no security or anonymity

4.3. FreeNet

FreeNet supports disk space sharing
   – it creates a geographically distributed collection of hard drives

[Figure: a search by key passes from node to node; the reply comes back and new links are created]

Features

Documents are encrypted, so the owners of the hard drives do not (easily) know what they are storing
   – prevents document censorship
   – provides owner deniability, e.g. if they are sued

An encrypted file comes with a unique key which is used by search tools

A search returns links to the answer nodes, so a search node collects new connections to the network over time.

Often-requested documents are cached (for a certain time) by the search nodes
   – increases network reliability
   – decreases search time in the future for that doc.

Nodes only have a certain amount of space
   – less popular files are deleted to make way for the caching of new ones
   – any unpopular files can be deleted, even ones put there by the node owner
   – may mean that FreeNet will end up storing music and pornography while more serious information disappears!

Key Formats

FreeNet has a complex range of different keys for documents:
   – keys using a hash function on the data
   – keys using data keywords (metadata)
   – keys using public/private key encryption

These are likely to change/evolve as FreeNet is developed.
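
The first key format, a key produced by a hash function on the data, can be illustrated with a short sketch. It is not FreeNet's actual key format: the use of SHA-256 and the names content_key and StoreNode are assumptions made for the example. It shows why such a key is useful: anyone who fetches the data can recompute the hash and check that it matches the key they asked for.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a key from the document bytes alone (hypothetical helper).

    SHA-256 is an assumption for this sketch, not FreeNet's real choice.
    """
    return hashlib.sha256(data).hexdigest()


class StoreNode:
    """A toy storage node: a dictionary from key to stored bytes."""

    def __init__(self):
        self.store = {}

    def insert(self, data: bytes) -> str:
        """Store the data under its content key and return that key."""
        key = content_key(data)
        self.store[key] = data
        return key

    def fetch(self, key: str):
        """Return the data for key, or None if this node does not hold it."""
        data = self.store.get(key)
        if data is None:
            return None
        if content_key(data) != key:
            # The stored bytes no longer hash to the key they are filed under.
            raise ValueError("stored data does not match its key")
        return data


node = StoreNode()
key = node.insert(b"encrypted document bytes")
print(key)
print(node.fetch(key))
```

Because the key depends only on the bytes, the same document inserted at two different nodes gets the same key, and a node that swaps in different bytes is caught by the check in fetch.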
Keys are very important to FreeNet
   – they allow document contents to be hidden
   – they allow more efficient search than the Gnutella broadcast model
   – they allow fake documents to be detected

FreeNet search engines are still being developed
   – early versions use a broadcast model like the one in Gnutella

4.4. Gnutella and FreeNet Compared

Availability of files:
   – Gnutella nodes do not delete files
   – the caching behaviour of FreeNet means no guarantees about what will be stored on a node

Node control
   – Gnutella nodes only contain what their owners put there
   – FreeNet nodes may end up containing anything

Anonymity and Deniability
   – Gnutella does not hide document contents
   – FreeNet encrypts documents

Scalability
   – the caching and dynamic connectivity of FreeNet nodes mean it will probably scale better than Gnutella
   – Gnutella's broadcast search model will not scale

4.5. MojoNation

A distributed file sharing system, but searches use a complex micropayment system
   – a searcher must 'compensate' the node containing the content it wants; this might be in the form of digital money or some other resource such as disk space or CPU cycles

Uses of Micropayments

For supporting P2P business applications.

Micropayments can solve many hacker problems:
   – spam, distributed denial of service, freeloaders

Problems with Micropayments

If a node is based on a slow machine, its network link is slow, and/or it does not contain much space then:
   – no one will use the node, and
   – the node has nothing to give in compensation when it uses other nodes (except money)

One solution is for a network of machines to become a single MojoNation node.

5. Publisher-Centric P2P

Publisher-centric systems concentrate on anonymously preserving information
   – derived from the 'Eternity Service' idea
   – examples: FreeHaven, Publius

Most systems do not process queries quickly.

Most systems assume a fixed network topology (shape).

5.1. FreeHaven

A system of distributed anonymous storage.

Anonymity for everything!
   – authors, publishers, readers, servers, documents

Documents can only be deleted/changed by their publishers, not by the servers

FreeHaven does allow servers to be dynamically added/removed.

Uses complex reputation and accountability systems
   – to detect if documents are fakes/rubbish
   – prevents over-publication
   – the complexity is because author and publisher anonymity must be maintained

5.2. Publius

A Web-based publishing system that resists censorship and tampering.

A file is replicated among many servers
   – combats distributed denial of service attacks

Files are encrypted, but come with a 'share'
   – each copy of the document has its own unique share

A document search returns one copy of the encrypted doc and several shares
   – the shares are combined to create a key which is used to decrypt the file
   – not all the shares for a document are needed to create the key

6. P2P Meme Map

[Figure: the P2P meme map]

Projects, actions, apps that define P2P:
   – file sharing and caching
   – networked devices
   – open source
   – email routing
   – IP routing
   – instant messaging
   – distributed computation

Strategic Positioning: an Internet OS

User Positioning: make a more capable computer

Core Competencies:
   * metadata management
   * seamless connectivity and communication
   * self-organizing systems, zero administration
   * security

Ideas supported by P2P:
   – client and server
   – simple joining
   – decoupling from machine
   – allow unreliability
   – user power
   – P2P is more fundamental
   – we create communities
   – use edge resources
   – decentralized

7. Some P2P Issues

Decentralization
Metadata
Accountability
Trust

7.1. Decentralization

Most models are a hybrid with some hierarchy and/or central servers
   – e.g. Napster, DNS, ICQ

ICQ (from 1996) allows direct client-to-client communication where possible, but has a server as a fallback.

7.2. Metadata

Metadata is "information (data) about data"
   – e.g. the column headings of a database
   – e.g. Napster's metadata is the artist and song names used for searching for music files

Why is Metadata Useful?

Metadata about a resource (e.g. about a Web page, a music file, a video) gives extra information about the resource
   – explains the resource in a clear way

Metadata can be used by search engines to search faster, and give more accurate results.

Metadata can be used for information addressing and more clever routing
   – e.g. use country information about a music file to direct a request to better servers

Metadata in the Web

HTML added support for metadata late in 1997:
   – the <meta> tags
   – not widely used, can be misused
   – meant to contain information such as a description (keywords) about the file, the creator name, date, etc.
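
As an illustration of how a search tool might read <meta> information, here is a minimal sketch using Python's standard html.parser module. The class name MetaCollector, the helper read_meta and the sample page are made up for the example; real pages use many other metadata conventions that this sketch ignores.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html: str) -> dict:
    """Return the <meta> name/content pairs found in an HTML string."""
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


# A made-up page, used only to exercise the parser.
PAGE = """<html><head>
<title>P2P notes</title>
<meta name="description" content="Lecture notes on peer-to-peer systems">
<meta name="keywords" content="P2P, Gnutella, FreeNet">
<meta name="author" content="A. Lecturer">
</head><body></body></html>"""

for name, content in read_meta(PAGE).items():
    print(name, "=", content)
```

Nothing forces the tag contents to be truthful, which is one way the tags can be misused: a page can list keywords that have nothing to do with what it contains.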
XML
   – allows the creation of new tags which more accurately reflect the meaning of the data, e.g. <author>, <publisher>, <owner>, ...

RDF (Resource Description Framework)
   – talks about the properties of resources
   – e.g. the meaning of a link can be made more accurate: was_written_by, is_interesting

7.3. Accountability

"The Tragedy of the Commons"
   – a commonly owned resource will be overused until it degrades, because its users put self-interest first

One solution:
   – give ownership of the resource to its users so that caring for the resource prolongs the user's income

Accountability Problems in P2P

Peers are often anonymous / hard to track / transient
   – so there is no reason for them to look after the resources they use

Assigning IDs to users to encourage responsibility often fails
   – e.g. pseudospoofing

Pseudospoofing
   – a person creates many fake IDs in the system (e.g. at eBay)
        each fake ID is used to give a high positive rating to the other fake IDs
        then one of the IDs is used to steal
   – also used to get extra free disk space on GeoCities Web sites

Some Solutions

A micropayments model
   – based on digital cash or compensation of resources
   – e.g. MojoNation

A reputation system
   – relatively new idea
   – based on encrypted 'signatures' and third party verifiers

Tolerate a certain amount of bad behaviour
   – use mirroring of resources, and other forms of redundancy

7.4. Trust (between peers/servers)

Trust increases based on the reputation of peers and servers
   – how to implement reputation?

Trust increases when fewer people are involved in the transaction
   – e.g. buying directly without a middleman

Trust increases when the environment contains less risk
   – e.g. applet execution inside a JVM sandbox

Trust Issues in Censorship-resistant Publication Systems

   Risk                        Solution
   logging your requests       use secure channels
   server-altered content      use multiple servers
   fake updating of doc.       encryption / multiple copies
   multiple fake docs.         impose pub. limits / micropayments
   legal deletion              multiple copies

8. More Information

Peer-to-Peer: Harnessing the Power of Disruptive Technologies
Andy Oram (ed.), O'Reilly, 2001 (in our library)
   – very good non-technical overview
   – contains chapters on the main projects (e.g. Gnutella, FreeNet) and main issues
   – the P2P meme map is explained in chapter 3

O'Reilly's P2P Web site:
http://www.openp2p.com/
   – articles about P2P

Google's long list of P2P links (~150):
http://directory.google.com/Top/Computers/Software/Internet/Clients/File_Sharing/