FreeDWeb: A Distributed Peer-to-Peer Web Server

Student1, Student2, Student3, PC Member
Anonymous Institution

ABSTRACT
The Internet's primary purpose is to share information with other people. Today, most people use e-mail, peer-to-peer systems, and the WWW to share information. However, each of these mechanisms has flaws, from size limitations to software requirements, that hinder information sharing. We have developed a solution to these problems. FreeDWeb is a distributed peer-to-peer (P2P) Web service that supports file sharing by publishing files on the World Wide Web (WWW). Devices running the FreeDWeb software are mutually untrusting, yet they form a network on top of the Bamboo [2] peer-to-peer substrate to replicate and serve files. This network provides cheap, simple, and robust file-sharing and Web services. Published data is highly available and accessible to anyone with a Web browser. A set of highly dynamic domain name service (DNS) servers allows users without FreeDWeb software to access sites served on the network using standard Web browsers.

Categories and Subject Descriptors
H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia – architectures.
C.2.4 [Computer-Communication Networks]: Distributed Systems – distributed applications.

General Terms
Management, Performance, Design, Reliability.

Keywords
P2P, WWW, Distributed Filesystem, Distributed Web Server, Dynamic DNS, File Sharing.

1. INTRODUCTION
File sharing is one of the most popular activities of Internet users. The most accessible ways to share files with other users are HTML publishing via the WWW, e-mail, and peer-to-peer networks. However, sharing content on the Web still requires expensive hardware and software, or (in the case of free services like Geocities [10] and Tripod [7]) confines users to a specific Web domain and a relatively small amount of storage space. Similarly, many e-mail services limit file sizes and the number of attachments, and using e-mail for content sharing involves creating a separate message every time a file is requested. File-sharing services like KaZaA [5] and WinMX [8] require specific software on both the client and the server side, which inconveniences users and limits the operating systems that can access the network.

As a solution, we offer FreeDWeb,¹ a distributed, peer-to-peer Web and DNS server that provides file sharing via a network of untrusted computers. To share files, a user must install and run the FreeDWeb software on her computer; to view files, a user needs only a Web browser. FreeDWeb allows a user to publish information via a Web site under a domain of her choosing without the hassle, cost, or technical knowledge required to obtain professional hosting services. For users who do not share hypertext markup language (HTML) files, FreeDWeb automatically generates HTML pages that link to the files they actually share. As a result of this Web-based approach, users' content is accessible from any Web-enabled device. The peer-to-peer structure of the network reduces the risk of data or service loss because there is no single point of failure. Finally, by replicating files over a number of nodes, the bandwidth required to host a site is distributed over the network, thereby reducing the cost of sharing.

The rest of this paper is organized as follows: Section 2 discusses projects related to FreeDWeb, Section 3 describes the architecture of FreeDWeb, Sections 4 and 5 give overviews of the DNS server and the filesystem, and Section 6 concludes.

¹ Free, because it is open source; D, because it is distributed; Web, because it is a Web server.
2. RELATED WORK
FreeDWeb brings together several technologies, most of which have been researched independently but are rarely combined into a single package. The primary technologies involved in FreeDWeb are a distributed filesystem, dynamic DNS, and a peer-to-peer network.

The software most similar to FreeDWeb is YouServ [1], developed by IBM. YouServ is a proprietary system that allows users to share content on a peer-to-peer network and access it through a Web browser. It implements a dynamic DNS server to allow resolution of YouServ names to IP addresses. Going beyond FreeDWeb, YouServ allows users to password-protect pages, but does so with the use of a single sign-on server. Since it was originally developed for internal use, the system relies upon a single YouServ Coordinator server to authenticate and register nodes and to manage the network. The network's reliance on a single server makes the system somewhat centralized and very susceptible to failure. Furthermore, YouServ does not automatically replicate sites: the owner of a site must specify who may replicate it, and those replicators must explicitly agree to replicate that site.

Looking at the individual technologies used in FreeDWeb yields a long list of other projects. One important distributed file system is Freenet [9], a peer-to-peer network application that provides anonymous, decentralized publishing and retrieval of data. The Freenet system is built to maintain the privacy of the data and users on the network; unfortunately, it sacrifices performance to do so.

Another project similar to FreeDWeb is DC-Apache [6], a distributed co-operative Web server. Developed to meet the demands of high-traffic Web sites, this server provides automatic load balancing and file replication. The system meets the high demands of some pages by replicating those resources more often than other, lower-traffic sites or pages. However, the goal of the DC-Apache project is to assist high-traffic Web hosts, not to support a peer-to-peer system. Thus, it implements neither a dynamic DNS service nor a DHT.

In the area of dynamic DNS, DynDNS [3] provides a service much like FreeDWeb's DNS servers. DynDNS maps domain names to rapidly changing Internet Protocol (IP) addresses. Clients are given access to their own records on the DNS servers so that they can update those records whenever IP addresses are reassigned. FreeDWeb implements a different solution for dynamic DNS for the reasons described in Section 4.

3. ARCHITECTURE
FreeDWeb uses Bamboo as its network substrate. The Bamboo network topology is quite robust and can handle a large number of simultaneous joins or disconnections. Under normal operating conditions, it can route messages to any node in the network in O(log N) time, where N is the number of nodes in the network. Bamboo also supports a Distributed Hash Table (DHT), whose get and put operations also run in O(log N) time. When nodes disconnect, neighboring nodes soon recognize the loss and route network messages appropriately.

Once a node has joined the Bamboo network, the FreeDWeb software running on that node has access to the DHT. The DHT stores information about which Web sites have been published, the owners of those sites, and the locations of current replicas.

Although the Bamboo network is efficient, FreeDWeb does not use it exclusively. Sending messages over a peer-to-peer network requires that those messages be routed through other nodes, which slows delivery. Operations such as file replication would be too cumbersome for the Bamboo network and would use too much bandwidth. Instead, many FreeDWeb operations use Bamboo only to locate nodes and find their IP addresses, then use direct transmission control protocol (TCP) sockets to communicate with those nodes. One drawback to this design choice is that devices sitting behind firewalls will be unable to serve FreeDWeb content.
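To make the split between overlay lookups and bulk transfer concrete, the following Python sketch illustrates the pattern: consult the DHT for a site's replica records, then fetch data over a direct TCP connection. The dht_get interface, the record format, the request encoding, and the port are assumptions introduced for illustration; the paper does not specify Bamboo's client API or FreeDWeb's wire protocol.

    # Illustrative only: locate replicas via the DHT, move bulk data over direct TCP.
    # dht_get(), the record format, and REPLICA_PORT are assumptions, not FreeDWeb APIs.
    import json
    import socket

    REPLICA_PORT = 8642  # assumed FreeDWeb file-transfer port


    def dht_get(key):
        """Placeholder for a get() against the local Bamboo node's DHT interface."""
        return [{"ip": "203.0.113.7"}]  # example record; a real node would query Bamboo


    def fetch_archive(site, archive_name):
        """Find nodes replicating `site` in the DHT, then pull the archive directly."""
        for record in dht_get("replicas:" + site):
            try:
                with socket.create_connection((record["ip"], REPLICA_PORT), timeout=5) as sock:
                    request = {"op": "get", "site": site, "file": archive_name}
                    sock.sendall(json.dumps(request).encode() + b"\n")
                    chunks = []
                    while True:
                        data = sock.recv(65536)
                        if not data:
                            break
                        chunks.append(data)
                    return b"".join(chunks)
            except OSError:
                continue  # replica unreachable; try the next one
        raise FileNotFoundError(archive_name)

Bulk data therefore never traverses the overlay; Bamboo is consulted only to find candidate endpoints.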
4. DNS SERVER
A key requirement for FreeDWeb is that users without the FreeDWeb software be able to access content served by nodes on the network. When a user accesses a traditional Web site, her Web browser resolves a mnemonic name to an IP address. In the FreeDWeb network, however, the Web sites a particular node serves may change over time, as may the IP addresses of the nodes. As a solution, the FreeDWeb DNS server maps domain names to the rapidly changing IP addresses of the nodes hosting the resource. When a Web client tries to resolve the name of a Web site on the FreeDWeb network, it is given the IP address of any working node on the network. The client then requests the site from that node. In the likely event that the node does not host that Web site, it returns a hypertext transfer protocol (HTTP) redirect response, which the client browser follows to a node hosting the requested site.

Although slightly convoluted, this method has two advantages. First, it relies upon the nodes in the network, rather than the DNS servers, to find the current location of a Web site. If the DNS servers were used to locate a node that actually hosted a Web site, the network would have a single point of failure; with the method outlined here, users who know the IP address of one working node can still access Web sites on the FreeDWeb network by simply directing their requests to that node. Second, most Web browsers expect DNS queries to be fast; if the servers were required to find the node that actually hosted a site, they would need to perform the substantially slower operation of querying the DHT for the site's location.
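The serving-or-redirecting behaviour just described can be sketched with nothing more than Python's standard HTTP server. The hosted_sites set, the lookup_replica_ip helper, and the port below are hypothetical names introduced for the example; the DHT lookup is reduced to a stub.

    # Sketch of a node's HTTP front end: serve sites held locally, answer every
    # other FreeDWeb name with a redirect to a node that does host the site.
    # hosted_sites, lookup_replica_ip(), and the port are illustrative only.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    hosted_sites = {"alice.freedweb.org"}  # names in this node's local store or cache


    def lookup_replica_ip(site):
        """Stub: a real node would query the DHT for a current replicator of `site`."""
        return "203.0.113.7"  # documentation-range address used as an example


    class FreeDWebHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            site = (self.headers.get("Host") or "").split(":")[0]
            if site in hosted_sites:
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(b"<html><body>served from the local store</body></html>")
            else:
                # Not hosted here: point the browser at a node that replicates the site.
                self.send_response(302)
                self.send_header("Location", f"http://{lookup_replica_ip(site)}{self.path}")
                self.end_headers()


    if __name__ == "__main__":
        HTTPServer(("", 8080), FreeDWebHandler).serve_forever()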
To create the list of live nodes on the network, the DNS servers expect nodes to register themselves with the servers. Upon startup, nodes find the DNS servers by querying the local DNS resolver for the name servers responsible for a certain fixed domain (e.g., freedweb.org). They then register with each of the name servers through a process that uses the Bamboo network to verify that the node is actually on the network. Nodes must register with every server so that the number and liveness of the DNS servers can change freely.

Each server also maintains its list of nodes by periodically sending messages to the registered nodes; a node must reply in order to remain listed. These methods may seem inefficient, as they require a good deal of communication between servers and nodes, but they are necessary for the FreeDWeb network to remain as decentralized as possible. The DNS servers merely adapt the decentralized structure of the FreeDWeb network to the hierarchical structure of the Internet's DNS system; without them, the network would still function.

5. FILESYSTEM
The FreeDWeb filesystem provides an interface for accessing files in a global store. This global store is made up of the files held in the local stores and local caches of all the nodes. The local store of a node holds the files of the site published by the owner of that node; each node can thus be thought of as a master, since it is responsible for propagating changes to the files in its local store to all replicas of those files. The local cache keeps replicas of files from the local stores of some other nodes; each node is therefore also a replica server, in that it hosts copies of files from other nodes' local stores and serves them when requested. Making masters responsible for maintaining the replicas of their files is one way in which users of FreeDWeb are encouraged to keep their nodes alive for as long as possible, so as to maintain the integrity of the network.

The filesystem takes several approaches to ensuring that file replication does not strain network resources. First, the number of nodes to which a file is replicated depends inversely on the size of the file. Second, the size of the local cache on each node is set to at least some multiple n, greater than 1, of the size of that node's local store; this gives users an incentive not to post very large files, or, if they do, to be prepared to host even larger collections of data. Third, the master of a site must send a message to its replica servers every t minutes informing them that it is alive; if a replica server does not receive this "ping" regularly, it may delete the site.

The filesystem implements most FreeDWeb operations. When a node receives notification from the FreeDWeb user interface to publish a site for the first time, it prepares the site for replication. The site is packaged into one or more compressed archives: if the site contains a large file, the file is split into multiple parts and each part is replicated independently of the others; if the site is prohibitively large, the files are packaged into separate archives. In either case, a note of the actions taken is placed in each package.

To replicate, the node creates a list of other nodes onto which it intends to publish. It contacts each node in turn and attempts to publish the site; if the contacted node does not have enough free space, the master node continues searching for other nodes. Any node that successfully replicates the site is added to the DHT as a replicator of the site. When the user wishes to update a site, the node contacts each of the servers replicating that site and sends only the new or updated files. To provide a further incentive for masters to host other users' content in addition to enjoying the benefit of publishing their own, master servers are required to maintain an appropriate number of replicas; replica servers do not automatically replicate other Web sites.
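A sketch of the publish loop, with the replication policy of this section made explicit, is shown below in Python. Only the policy itself comes from the text: the replica count depends inversely on archive size, nodes without free space are skipped, and successful replicators are recorded in the DHT. The constants and the try_replicate/dht_put hooks are assumptions introduced for illustration.

    # Publish-loop sketch. The constants and the try_replicate/dht_put hooks are
    # illustrative; only the policy they implement is taken from the paper.
    MIN_REPLICAS = 3
    MAX_REPLICAS = 20
    BUDGET_BYTES = 2 * 1024 ** 3  # assumed ceiling on the replica traffic a master creates


    def target_replica_count(archive_size):
        """Replicate small archives widely and large archives sparsely."""
        return max(MIN_REPLICAS, min(MAX_REPLICAS, BUDGET_BYTES // max(archive_size, 1)))


    def publish(site, archive, candidate_ips, try_replicate, dht_put):
        """Walk candidate nodes until enough of them hold a copy of `archive`."""
        wanted = target_replica_count(len(archive))
        replicators = []
        for node_ip in candidate_ips:
            if len(replicators) >= wanted:
                break
            if try_replicate(node_ip, site, archive):  # returns False if the node lacks space
                dht_put("replicas:" + site, node_ip)   # register this node as a replicator
                replicators.append(node_ip)
        return replicators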
How files are stored on a local machine is important both for preventing tampering and for limiting the free space they consume. To reduce the number of entries in the DHT, groups of small files are compressed into a single larger file. Very large files (hundreds of megabytes) are broken into smaller pieces to make transferring them easier. It is also important that the files of another site stored on a node not be vulnerable to tampering; thus, every file in the local store is encrypted with public-key cryptography and given a random name. The public keys for each site are stored in the DHT upon site creation. When files need to be opened, they are decrypted, renamed, and returned.

Since nodes have only a limited amount of space in which to store files, they track a metric for each site in their local cache. This metric determines how likely the site's files are to be deleted to make room for others. The deletion ranking takes the following into account: (1) the master's uptime; (2) the time elapsed since the last ping from the master; and (3) the time elapsed since the last file-open request for that site.

When a node receives an HTTP GET request, it searches its local store for the file and returns it. If the file does not exist locally (for example, because the site contains a large file or the site itself was split into multiple parts), the node consults a local cache of IP addresses of other nodes replicating files for that site. Once the file is found on another server, the node generates an HTTP redirect for that content. If the node holds no cached information about other servers, it queries the DHT for the other replica servers and then searches for the file. By using the DHT, FreeDWeb can handle the loss of replica servers.

6. CONCLUSION
Many Internet-based information-sharing mechanisms already exist, but most of them require monetary investment or significant technical knowledge, and only the WWW is widely accessible. The accessibility and popularity of the WWW make it the best method for sharing content with others. FreeDWeb gives users the opportunity to share content easily, cheaply, and without client software requirements. It also employs automatic data replication to give users the redundancy offered by large Web hosting providers. As Web technologies become more complex and expensive to maintain, an easy yet reliable system should exist for sharing information with others.

7. REFERENCES
[1] R. J. Bayardo Jr., A. Somani, D. Gruhl, and R. Agrawal. YouServ: A Web Hosting and Content Sharing Tool for the Masses. In Proceedings of the 11th International World Wide Web Conference (WWW 2002), 2002.
[2] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker. Spurring Adoption of DHTs with OpenHash, a Public DHT Service. http://www.eecs.berkeley.edu/~kwei/readings/DHT/openhash.pdf.
[3] Dynamic DNS. http://www.dyndns.org/services/dyndns/.
[4] ISC Domain Survey: Number of Internet Hosts. http://www.isc.org/index.pl?/ops/ds/host-count-history.php.
[5] KaZaA Web site. http://www.kazaa.com.
[6] Q. Li and B. Moon. Distributed Cooperative Apache Web Server. In Proceedings of the Tenth International Conference on World Wide Web, 2001.
[7] Tripod Web site. http://www.tripod.lycos.com.
[8] WinMX Web site. http://www.winmx.com.
[9] Wiley et al. Freenet. http://www.freenetproject.org/freenet.
[10] Yahoo Geocities. http://geocities.yahoo.com.