FreeDWeb: A Distributed Peer-to-Peer Web Server
Student1, Student2, Student3, PC Member
Anonymous Institution
1st line of address
Telephone number, incl. country code
ABSTRACT
The Internet’s primary purpose is to share information with
other people. Today, most people use e-mail, peer-to-peer
systems, and the WWW to share information. However,
these mechanisms each contain flaws, from size limitations
to software requirements, which hinder the process of
information sharing. We have developed a solution to
these problems. FreeDWeb is a distributed peer-to-peer
(P2P) Web service that supports file-sharing by publishing
files on the World Wide Web (WWW). Devices running
FreeDWeb software are mutually untrusting yet form a
network on top of the Bamboo [2] peer-to-peer substrate to
replicate and serve files. This network provides cheap,
simple, and robust file-sharing and Web services. Published
data is also highly available and accessible to anyone with
a Web browser. A set of highly dynamic domain name
service (DNS) servers allows users without FreeDWeb
software to access sites served on the network using
standard Web browsers.
Categories and Subject Descriptors
H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia – architectures; C.2.4 [Computer-Communication Networks]: Distributed Systems – distributed applications.
General Terms
Management, Performance, Design, Reliability.
Keywords
P2P, WWW, Distributed Filesystem, Distributed Web Server, Dynamic DNS, File Sharing.
1. INTRODUCTION
File sharing appears to be one of the most popular activities of Internet users. The most accessible methods for sharing files with other users are HTML publishing via the WWW, e-mail, and peer-to-peer networks. However, sharing content on the Web still requires expensive hardware and software, or (in the case of free services like Geocities [10] and Tripod [7]) acceptance of a specific Web domain and/or a relatively small amount of storage space. Similarly, many e-mail services limit file sizes and the number of attachments. Using e-mail for content sharing also involves creating a separate message every time a file is requested. File-sharing services like KaZaA [5] and WinMX [8] require specific software on both the client and the server side, posing an inconvenience to users and limiting the operating systems that can access the network.
As a solution, we offer FreeDWeb,1 a distributed, peer-to-peer Web and DNS server that provides file sharing via a network of untrusted computers. To share files, a user must install and run the FreeDWeb software on her computer. However, to view files, a user needs only a Web browser. FreeDWeb allows a user to publish information via a Web site under a domain of their choosing without the hassle, cost, or technical knowledge required to obtain professional hosting services. For users who do not share HTML files, FreeDWeb automatically generates hypertext markup language (HTML) pages that link to the files they actually share. As a result of this Web-based approach, users' content is accessible from any Web-enabled device. The peer-to-peer structure of the network reduces the risk of data or service loss because there is no single point of failure. Finally, by replicating files over a number of nodes, the bandwidth required to host a site is distributed over the network, thereby reducing the cost of sharing.
The rest of this paper is organized as follows: Section 2 discusses projects related to FreeDWeb, Section 3 describes the architecture of FreeDWeb, Sections 4 and 5 give an overview of the DNS server and filesystem, and Section 6 concludes.
1 Free, because it is open source; D, because it is distributed; Web, because it is a Web server.
2. RELATED WORK
FreeDWeb brings together many technologies, most of which
have been researched independently, but are generally not
combined into a single package. The primary technologies
involved in FreeDWeb are a distributed filesystem, dynamic DNS,
and a peer-to-peer network.
The software most similar to FreeDWeb has been developed by IBM under the name YouServ [1]. YouServ is a proprietary system that allows users to share content on a peer-to-peer network and access it through a Web browser. It implements a dynamic DNS server to allow resolution of YouServ names to IP addresses. Going beyond FreeDWeb, it does allow users to password-protect pages, but does so with the use of a single sign-on server. Since it was originally developed for internal use, the system relies upon a single YouServ Coordinator server to authenticate and register nodes, and to manage the network. The network's reliance on a single server makes the system somewhat centralized and very susceptible to failure. Furthermore, YouServ does not automatically replicate sites: the owner of a site must specify who can replicate the site, and those replicators must explicitly allow the replication of that site.
Looking at the individual technologies used in FreeDWeb yields a long list of other projects. One important distributed filesystem is Freenet [9], a peer-to-peer network application that provides anonymous, decentralized publishing and retrieval of data. The Freenet system is built to maintain the privacy of the data and users on the network. Unfortunately, in order to maintain privacy, it sacrifices performance.
Another project similar to FreeDWeb is DC-Apache [6], a
distributed co-operative Web server. Developed to meet the
demands of high-traffic web sites, this server provides automatic
load-balancing and file replication. The system meets the high
demands of some pages by replicating these resources more often
than other, lower-traffic sites or pages. However, the goal of the
DC-Apache project is to assist high-traffic Web hosts – not to
support a peer-to-peer system. Thus, it does not implement a
dynamic DNS service or a DHT.
In the area of dynamic DNS, DynDNS [3] provides a service
much like FreeDWeb's DNS servers. DynDNS maps domain
names to rapidly changing Internet Protocol (IP) addresses.
Clients are given access to their own records on the DNS servers
for the purpose of updating those records whenever IP addresses
are reassigned. FreeDWeb implements a different solution for dynamic DNS for the reasons described in Section 4.
3. ARCHITECTURE
FreeDWeb uses Bamboo as its network substrate. The Bamboo
network topology is quite robust and can handle a large number of
simultaneous joins or disconnections. Under normal operating conditions, it can route messages to any node in the network in O(log N) time, where N represents the number of nodes in the network. Bamboo also supports a Distributed Hash Table (DHT), which also operates in O(log N) for get and put operations.
nodes disconnect, neighboring nodes soon recognize the loss and
route network messages appropriately. Once a node has joined
the Bamboo network, the FreeDWeb software running on that
node has access to the DHT. The DHT stores information about
which Web sites have been published, the owners of those sites,
and the locations of current replicas.
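As a rough illustration of the kind of records involved, the following sketch stores and retrieves per-site metadata of this sort. The key scheme, field names, and the dht client object (with put/get methods) are assumptions of our own, not FreeDWeb's actual API.

import hashlib
import json

def site_key(domain: str) -> bytes:
    """Derive a fixed-length DHT key from a published domain name (illustrative)."""
    return hashlib.sha1(domain.encode("utf-8")).digest()

def publish_record(dht, domain: str, owner_id: str, replica_ips: list[str]) -> None:
    """Store who owns a site and where its current replicas live."""
    record = {"owner": owner_id, "replicas": replica_ips}
    dht.put(site_key(domain), json.dumps(record).encode("utf-8"))

def lookup_record(dht, domain: str) -> dict:
    """Fetch the owner and replica locations recorded for a site."""
    return json.loads(dht.get(site_key(domain)).decode("utf-8"))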
Although the Bamboo network is efficient, the project does not
use it exclusively. Sending messages over a peer-to-peer network
requires that those messages be routed through other nodes, thus
slowing the delivery process. Operations such as file replication
would be too cumbersome for the Bamboo network and would use
too much bandwidth. Instead, many FreeDWeb operations use
Bamboo to locate nodes and find their IP addresses, then use
direct transmission control protocol (TCP) sockets to
communicate with those nodes. One drawback to this design
choice is that devices sitting behind firewalls will be unable to
serve FreeDWeb content.
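A minimal sketch of this split between overlay lookup and direct transfer is below; the lookup_record helper from the previous sketch, the port number, and the simple newline-terminated request format are illustrative assumptions only.

import socket

CHUNK = 64 * 1024  # transfer buffer size (illustrative)

def fetch_from_replica(dht, domain: str, filename: str, port: int = 8942) -> bytes:
    """Locate a replica via the DHT, then pull the file over a direct TCP socket."""
    record = lookup_record(dht, domain)      # the overlay is used only for discovery
    ip = record["replicas"][0]               # pick any known replica
    with socket.create_connection((ip, port)) as sock:
        sock.sendall(f"{domain}/{filename}\n".encode("utf-8"))
        data = bytearray()
        while chunk := sock.recv(CHUNK):     # bulk data bypasses the Bamboo overlay
            data.extend(chunk)
    return bytes(data)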
4. DNS SERVER
A key component of the FreeDWeb project is allowing users
without FreeDWeb software to access content served by nodes on
the network. When a user attempts to access a traditional Web
site, their Web browser resolves a mnemonic name to an IP
address. However, in the FreeDWeb network, the Web sites a
particular node serves may change over time, as might the IP
addresses of the nodes. As a solution, the FreeDWeb DNS server
maps domain names to the rapidly changing IP addresses of the
nodes hosting the resource.
When a Web client tries to resolve the name of a Web site
on the FreeDWeb network, it is given an IP address of any
working node on the network. The client can then request
the site from that node. In the likely event that the node
does not host that Web site, it will return a hypertext transfer
protocol (HTTP) redirect response, which the client browser
will then follow to a node hosting the requested site.
Although slightly convoluted, this method is advantageous. First, it relies upon the nodes in the network, rather than the DNS servers, to find the current location of a Web site. If the DNS servers were used to locate a node that actually hosted a Web site, the network would have a single point of failure; with the method outlined, users who know the IP address of one working node could still access Web sites on the FreeDWeb network by simply directing their site requests to that node. Second, most Web browsers expect DNS queries to be fast; if the servers were required to find the node that actually hosted a site, the DNS servers would need to perform the substantially slower operation of querying the DHT for the site's location.
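Reduced to its selection logic, the name-to-address step can therefore be very simple. In the sketch below, the live_nodes registry, the example addresses, and the indifference to the queried name are assumptions drawn from the description above, not the actual server code.

import random

# Populated by the registration protocol described next; contents are illustrative.
live_nodes: set[str] = {"203.0.113.7", "198.51.100.21", "192.0.2.105"}

def answer_query(queried_name: str) -> str:
    """Return the address of any live node; the name itself is ignored because
    the chosen node will redirect the browser to a node hosting the site."""
    return random.choice(sorted(live_nodes))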
To create the list of live nodes on the network, the DNS servers
expect nodes to register themselves with the servers. Upon
startup, nodes find the DNS servers by querying the local DNS
resolver for name servers responsible for a certain fixed domain
(e.g. freedweb.org). They then register with each of the name
servers through a process that uses the Bamboo network to ensure
the node is actually on the network. Nodes must register with each server in order to allow flexibility in the number and liveness of the DNS servers. Each server also maintains its list of nodes by
periodically sending messages to the nodes and requiring a reply
in order for the node to remain listed.
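A sketch of the two halves of this bookkeeping follows, assuming the third-party dnspython package for the NS lookup; the register_with and ping callables are hypothetical stand-ins for the actual registration and liveness messages.

import dns.resolver  # third-party 'dnspython' package (assumed available)

FIXED_DOMAIN = "freedweb.org"   # the fixed domain named in the text

def discover_name_servers() -> list[str]:
    """Node side: find every DNS server responsible for the fixed domain."""
    answer = dns.resolver.resolve(FIXED_DOMAIN, "NS")
    return [rr.target.to_text() for rr in answer]

def register_everywhere(node_ip: str, register_with) -> None:
    """Node side: register with each name server (register_with is hypothetical)."""
    for server in discover_name_servers():
        register_with(server, node_ip)

def prune_dead_nodes(live_nodes: set[str], ping) -> None:
    """Server side: drop any registered node that fails to answer a periodic ping."""
    for ip in list(live_nodes):
        if not ping(ip):
            live_nodes.discard(ip)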
These methods may seem somewhat inefficient, as they require
much communication between servers and nodes. However, they
are necessary for the FreeDWeb network to remain as
decentralized as possible. The DNS servers solely adapt the
decentralized structure of the FreeDWeb network to the
hierarchical structure of the Internet's DNS system; without them,
the network would still function.
5. FILESYSTEM
The FreeDWeb filesystem provides an interface for accessing files
in a global store. This global store is made up of the files held in
the local stores and local caches of all the nodes. The local store
of a node holds the files of the site published by the owner of that
node. Thus, each node can be thought of as a master since it is
responsible for propagating changes to the files in its local store to
all replicas of those files. The local cache keeps replicas of files
from the local stores of some other nodes. Each node is also a
replica server in that it hosts some copies of files from other
nodes' local stores and serves them when requested. Making
masters responsible for maintaining the replicas of their files is
one way in which users of FreeDWeb are encouraged to keep
their nodes alive for as long as possible so as to maintain the
integrity of the network.
The filesystem takes a few approaches to ensuring that file
replication does not strain the network resources. First, the
number of nodes to which a file is replicated depends inversely on
the size of the file. Second, the size of the local cache on each
node is set to at least some multiple n, greater than 1, of the size
of that node's local store. This provides an incentive to users not
to post very large files, or if they do, to be prepared to host even
larger collections of data. Third, the master of a site must send a
message to its replica servers every t minutes informing them that
it is alive. If a replica server does not receive this “ping” regularly,
the replica server may delete the site.
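The text fixes only the relationships (replica count inversely related to file size, cache at least n > 1 times the local store, a ping every t minutes), not the values; a sketch with invented constants might be:

MAX_REPLICAS = 16             # illustrative policy constants
MIN_REPLICAS = 2
CACHE_MULTIPLE_N = 2          # local cache >= n * local store, with n > 1
PING_PERIOD_T = 10 * 60       # t minutes, expressed in seconds

def replica_count(file_size_bytes: int) -> int:
    """More replicas for small files, fewer for large ones."""
    megabytes = max(1, file_size_bytes // (1024 * 1024))
    return max(MIN_REPLICAS, MAX_REPLICAS // megabytes)

def cache_budget(local_store_bytes: int) -> int:
    """A node must offer at least n times its own published data as cache space."""
    return CACHE_MULTIPLE_N * local_store_bytes

def replica_should_drop(seconds_since_last_ping: float) -> bool:
    """A replica may delete a site whose master has missed several pings."""
    return seconds_since_last_ping > 3 * PING_PERIOD_T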
The filesystem implements most FreeDWeb operations. When a
node receives notification from the FreeDWeb user interface to
publish a site for the first time, it prepares the site for replication.
The site is packaged into one or more compressed archives; if the
site contains a large file, the file is split into multiple parts and
each part is replicated independently of the other parts; if the site
is prohibitively large, the files are packaged in different archives.
If either the site is split into multiple archives or the site contains
large files, a note of the actions taken is placed in each package.
To replicate, the node creates a list of other nodes onto which it
intends to publish. The node contacts each node in turn, and
attempts to publish the site. If the desired node does not have enough free space, the master node continues searching for other
nodes. Any node that successfully replicates the site is added to
the DHT as a replicator of the site.
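A compressed sketch of that publication path is below. The part size, the node handle with its has_space and store_archive methods, and the reuse of the publish_record helper from Section 3 are illustrative assumptions rather than the real messages.

import os
import tarfile

def split_large_file(path: str, part_bytes: int = 64 * 1024 * 1024) -> list[str]:
    """Split a large file into independently replicated parts (size is illustrative)."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while chunk := src.read(part_bytes):
            part_path = f"{path}.part{index}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts

def package_site(site_dir: str, out_path: str) -> str:
    """Compress the site into one archive; a prohibitively large site would
    instead be packaged as several archives, with a note in each package."""
    with tarfile.open(out_path, "w:gz") as archive:
        archive.add(site_dir, arcname=os.path.basename(site_dir))
    return out_path

def publish(dht, domain, owner_id, archive, candidate_nodes, wanted_replicas):
    """Offer the archive to candidate nodes until enough replicas accept."""
    replicas = []
    size = os.path.getsize(archive)
    for node in candidate_nodes:
        if len(replicas) == wanted_replicas:
            break
        if node.has_space(size):                       # skip nodes without room
            node.store_archive(domain, archive)
            replicas.append(node.ip)
    publish_record(dht, domain, owner_id, replicas)    # record replicators in the DHT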
When the user wishes to update a site, the node contacts each of
the servers responsible for replicating that site and sends only the
new or updated files to the replica server. In order to provide
another incentive for master servers to host other content as well
as receiving the benefit of publishing their own content, the
master servers are required to maintain an appropriate number of
replicas. Replica servers do not automatically replicate other Web
sites.
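One plausible way to detect "new or updated" files is to keep a manifest of content hashes from the previous publish, as in this sketch; the manifest format and the send_file helper are assumptions of our own.

import hashlib
import os

def content_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def changed_files(site_dir: str, old_manifest: dict[str, str]) -> list[str]:
    """Return paths whose content differs from, or is absent in, the last publish."""
    changed = []
    for root, _dirs, names in os.walk(site_dir):
        for name in names:
            path = os.path.join(root, name)
            if old_manifest.get(path) != content_hash(path):
                changed.append(path)
    return changed

def push_update(replica_servers, site_dir, old_manifest, send_file):
    """Send only the new or updated files to every replica server."""
    for path in changed_files(site_dir, old_manifest):
        for server in replica_servers:
            send_file(server, path)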
How files are stored on a local machine matters both for preventing tampering and for not consuming too much free space. In order to reduce the number of entries in the DHT, groups of small files are compressed into a larger file. Very large files (in the hundreds of megabytes) are broken up into small pieces in order to make
transferring them easier.
It is important that files of another site stored on a node not be
vulnerable to tampering. Thus, every file in the local store is
encrypted with public-key cryptography and given a random
name. The public keys for each site are stored in the DHT upon
site creation. When files need to be opened, they are decrypted,
renamed, and returned.
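The text does not spell out the exact construction, but a hybrid scheme is the usual way to apply public-key cryptography to whole files. The sketch below, using the third-party cryptography package, shows one possible interpretation; the hybrid design, the key roles, and the random 32-hex-character naming are our own assumptions.

import os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

def store_encrypted(plaintext: bytes, site_public_key, store_dir: str) -> str:
    """Encrypt a replica file under the site's public key and give it a random name."""
    file_key = Fernet.generate_key()                      # per-file symmetric key
    ciphertext = Fernet(file_key).encrypt(plaintext)      # bulk encryption
    wrapped_key = site_public_key.encrypt(                # key wrapped with RSA-OAEP
        file_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    random_name = os.urandom(16).hex()                    # name reveals nothing about the site
    path = os.path.join(store_dir, random_name)
    with open(path, "wb") as out:
        out.write(len(wrapped_key).to_bytes(2, "big") + wrapped_key + ciphertext)
    return path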
Since nodes have only a limited amount of space in which to store
files, they keep track of a metric for each site in their local cache.
This metric determines the likelihood of deleting any file to make
room for others. This deletion ranking takes the following into
account: (1) Master's uptime; (2) Time elapsed since the last ping
from the master; (3) Time elapsed since the last file open request
for that site.
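One way to combine those three signals into a single deletion score is sketched below, with a higher score meaning a better candidate for eviction; the weights and normalization are purely illustrative, since the text names only the inputs.

def deletion_score(master_uptime_s: float,
                   since_last_ping_s: float,
                   since_last_open_s: float) -> float:
    """Higher score => this cached site is more likely to be deleted."""
    hour = 3600.0
    reliability = master_uptime_s / hour      # long-lived masters earn protection
    staleness = since_last_ping_s / hour      # silent masters lose protection
    coldness = since_last_open_s / hour       # unrequested sites are expendable
    return 2.0 * staleness + 1.0 * coldness - 0.5 * reliability

def pick_victim(cached_sites: dict[str, tuple[float, float, float]]) -> str:
    """Choose the cached site with the highest deletion score."""
    return max(cached_sites, key=lambda site: deletion_score(*cached_sites[site]))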
When a node receives an HTTP GET request, it searches its local
store for the file and returns it. If the file does not exist locally (in the event that the site contains a large file, or the site itself was split into multiple parts), the node queries a local cache of IP addresses of other nodes replicating files for that site. Once the file is found on another server, the node generates an HTTP redirect for that content. If the node does not hold any cache
information for other servers, it queries the DHT for the other
replica servers, and then searches for the file. By using the DHT,
FreeDWeb can handle the loss of replica servers.
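Put together, the request path reduces to a three-step lookup. The sketch below captures it with hypothetical structures (local_store, replica_ip_cache, and the lookup_record helper from Section 3) and returns either file contents or a redirect target.

def handle_get(dht, domain: str, filename: str,
               local_store: dict, replica_ip_cache: dict):
    """Serve a file locally if possible; otherwise answer with an HTTP 302 target."""
    # 1. Local store: the common case.
    key = (domain, filename)
    if key in local_store:
        return ("200", local_store[key])

    # 2. Cached replica addresses for this site.
    cached_ips = replica_ip_cache.get(domain)
    if cached_ips:
        return ("302", f"http://{cached_ips[0]}/{filename}")

    # 3. Fall back to the DHT, which survives the loss of individual replicas.
    record = lookup_record(dht, domain)
    replica_ip_cache[domain] = record["replicas"]
    return ("302", f"http://{record['replicas'][0]}/{filename}")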
6. CONCLUSION
Many Internet-based information-sharing mechanisms already exist. Most of these require monetary investment or significant technical knowledge. Furthermore, only the WWW is widely accessible. The accessibility and popularity of the WWW make it the best method for sharing content with others. FreeDWeb gives users the opportunity
to share content easily, cheaply, and without client software
requirements. It also employs automatic data replication to
give users the benefits of redundancy offered by large Web
hosting providers. As Web technologies become more
complex and expensive to maintain, an easy yet reliable
system should exist to share information with others.
7. REFERENCES
[1] R. J. Bayardo Jr., A. Somani, D. Gruhl, and R. Agrawal. YouServ: A Web Hosting and Content Sharing Tool for the Masses. In Proc. of the 11th Int'l World Wide Web Conference (WWW-2002), 2002.
[2] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker. Spurring Adoption of DHTs with OpenHash, a Public DHT Service. http://www.eecs.berkeley.edu/~kwei/readings/DHT/openhash.pdf.
[3] Dynamic DNS. http://www.dyndns.org/services/dyndns/.
[4] ISC Domain Survey: Number of Internet Hosts. http://www.isc.org/index.pl?/ops/ds/host-count-history.php.
[5] KaZaA Web site. http://www.kazaa.com.
[6] Q. Li and B. Moon. Distributed Cooperative Apache Web Server. In Proceedings of the Tenth International Conference on World Wide Web, 2001.
[7] Tripod Web site. http://www.tripod.lycos.com.
[8] WinMX Web site. http://www.winmx.com.
[9] B. Wiley et al. Freenet. http://www.freenetproject.org/freenet.
[10] Yahoo! GeoCities. http://geocities.yahoo.com.