Bilkent University
Department of Computer Engineering
Senior Project

AD HOC PEER-TO-PEER FILE SHARING SYSTEM FOR POCKET PC

Meltem ÇELEBİ
E. Büşra ÇELİKKAYA
Hayrettin GÜRKÖK
Fatma SÜTCÜ

Supervisor: Asst. Prof. Dr. İbrahim KÖRPEOĞLU

High-Level Design Report
December 30th, 2005

This report is submitted to the Department of Computer Engineering of Bilkent University in partial fulfillment of the requirements of the Senior Projects course CS491.

TABLE OF CONTENTS

1. INTRODUCTION
1.1 Purpose of the System
1.2 Design Goals
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
2. CURRENT SOFTWARE ARCHITECTURE
3. PROPOSED SOFTWARE ARCHITECTURE
3.1 Overview
3.2 Subsystem decomposition
3.2.1. Link monitoring subsystem
3.2.2 Data transfer subsystem
3.2.3 Application
3.3 Hardware/Software mapping
3.3.1 Pocket PC specifications
3.3.2 Windows Mobile 2002 Pocket PC specifications
3.3.3 D-Link Specifications
3.3.4 .NET Platform Microsoft Visual Studio .NET 2003
3.4 Persistent data management
3.5 Access control and security
3.6 Global software control
3.7 Boundary conditions
4. SUBSYSTEM SERVICES
4.1 Application subsystem services
4.2 Link monitoring subsystem services
4.3 Data transfer subsystem services
5. APPENDIX

1. INTRODUCTION

1.1 Purpose of the System

The purpose of this project is to provide a portable file and message sharing system to people with Pocket PCs.
There exist systems that provide instant messaging and file transfer, but most of them require users to register with a network beforehand, and this procedure is time consuming. We aim to provide a system that connects devices without requiring an infrastructure.

1.2 Design Goals

Reliability: Since TCP will be used for major packet transfers, loss of information will be prevented. When the connection between two users is lost, the active file transfers are paused; they can be resumed later, even from another user, if the same files exist there.

Scalability: The system should handle excessive connection and transfer requests with a queuing method. For large data transfers, the system may either refuse the transfer or allow it with the maximum possible reliability.

Maintainability: The system should be designed so as to allow future modifications such as the implementation of multi-hop communication.

Functionality: The system should function correctly on PDAs running the Pocket PC OS. It should be robust, preventing invalid actions and warning the user about abnormal, unexpected and malicious inputs. In addition, the system should continue to work as nodes join and leave.

Usability: The system should have an interface whose menu elements users can recognize rather than having to recall them. The interface design should be consistent with most current P2P systems and grouped into logical subsections. Help and shortcuts should be available to users. All tasks should be reachable within a reasonable access time, which visible and easy-to-use menus can provide. The system should give the user feedback about its status on important actions.

Operability: The system will be easy to install on PDAs. In case of a crash, the user should be able to recover without any problem.

1.3 Definitions, Acronyms, and Abbreviations

Ad-hoc network: A mobile ad-hoc network (MANET) is a self-configuring network of mobile routers (and associated hosts) connected by wireless links, the union of which forms an arbitrary topology. The routers are free to move randomly and organize themselves arbitrarily; thus, the network's wireless topology may change rapidly and unpredictably. Such a network may operate in a standalone fashion, or may be connected to a larger network.

Data packet: A unit of data sent over a network. It usually includes a header, a destination address and the data itself.

IEEE 802.11a/b/g: A family of wireless RF communication standards used in the PC industry.

IP Address (Internet Protocol Address): Every machine on a network (a local network or the Internet) has a unique IP number. A machine without an IP address cannot be on a network.

Layer: In networking, layers refer to software protocols. Each layer performs services for the layer above it.

Node: Any device connected to a network. PCs, servers, and printers are all nodes on the network.

Peer: In networking, any functional unit in the same layer as another entity.

Ping: A basic Internet program that lets you verify that a particular Internet address exists and can accept requests.

Pocket PC: A small portable computer with a 320x240 resolution screen running the Windows CE 3.0 operating system. Typical models are the Compaq iPAQ, HP Jornada, and Casio Cassiopeia.

Pong: The reply of a node that receives a “ping” request from a newly connected computer on the network. The pong lists the host’s IP address, network port, and the number of files available for sharing and their combined size.

Port: A system or network access point for data entry or exit.

Single-hop access: Each node can reach only the nodes within its own coverage area.

TCP: The Transmission Control Protocol (TCP) is a connection-oriented transport layer protocol that reliably delivers a stream of data between applications.

UDP: The User Datagram Protocol (UDP) is a connectionless transport layer protocol that sends individual datagrams without delivery guarantees.

XML (Extensible Markup Language): A W3C initiative that allows information and services to be encoded with meaningful structure and semantics that computers and humans can understand. XML is well suited to information exchange, and can easily be extended to include user-specified and industry-specified tags.

1.4 References

[1] Project Bridge; Requirements Analysis Report; http://www.ug.bcc.bilkent.edu.tr/~gurkok/senior/index.htm.
[2] Stephanos Androutsellis-Theotokis; A survey of peer-to-peer file sharing technologies; December 2004.
[3] Bernd Bruegge, Allen H. Dutoit; Object-Oriented Software Engineering; Prentice Hall; 2000.
[4] D-Link Corporation; D-Link DCF-660W Compact Flash Adapter Quick Installation Guide; 2002.

2. CURRENT SOFTWARE ARCHITECTURE

Current P2P file sharing architectures can be classified by their “degree of centralization”. There are three categories, according to the extent to which they rely on one or more servers to facilitate the interaction between peers:

• Purely decentralized P2P architectures (such as the original Gnutella architecture and Freenet). All nodes in the network perform exactly the same tasks, acting both as servers and clients, and there is no central coordination of their activities. The nodes of such networks are termed “servents” (SERVers + cliENTS).

• Partially centralized systems (such as Kazaa, Morpheus and, more recently, Gnutella). The basis is the same as with purely decentralized systems. However, some of the nodes assume a more “important” role than the rest, acting as local central indexes for files shared by local peers. These nodes are called “supernodes”, and the way in which they are selected for these special tasks varies from system to system.
It is important to note that these supernodes do not constitute single points of failure for a P2P network, since they are dynamically assigned; if they fail or come under malicious attack, the network will take action to replace them with others.

• Hybrid decentralized architectures (such as Napster). A central server facilitates the interaction between peers by maintaining directories of the shared files stored on the PCs of the users registered to the network, in the form of meta-data. The end-to-end interaction is between two peer clients; however, the central servers facilitate this interaction by performing the lookups and identifying the nodes of the network (i.e. the computers) where the files are located. The terms “peer-through-peer” or “broker mediated” are sometimes used for such systems.

A classification of peer-to-peer file-sharing systems is shown in Figure 1. Structured and loosely structured systems are inherently purely decentralized, while unstructured systems can be purely decentralized, hybrid decentralized or partially centralized.

Unstructured Systems

Gnutella

In unstructured networks (such as Gnutella), the placement of data (files) is completely unrelated to the overlay topology. Since there is no information about which nodes are likely to have the relevant files, searching essentially amounts to random search, in which various nodes are probed and asked whether they have any files matching the query. Unstructured P2P systems differ in the way in which they construct the overlay topology, and the way in which they distribute queries from node to node. The advantage of such systems is that they can easily accommodate a highly transient node population. The disadvantage is that it is hard to find the desired files without distributing queries widely. For this reason unstructured P2P systems are considered to be unscalable. However, work is being done towards increasing the scalability of unstructured systems.
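
To make the scalability concern concrete, the following is a minimal sketch of one common way queries are distributed in an unstructured overlay: hop-limited flooding. It is an illustration in Python, not taken from any particular client; the function name `flood_search`, the `ttl` hop limit and the example overlay are all assumptions introduced here.

```python
from collections import deque


def flood_search(overlay, shared_files, start, filename, ttl):
    """Return the peers within `ttl` hops of `start` that hold `filename`.

    `overlay` maps each node to its neighbors; `shared_files` maps a node
    to the set of file names it shares. Both are hypothetical inputs.
    """
    hits = set()
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if filename in shared_files.get(node, ()):
            hits.add(node)
        if hops_left == 0:
            continue  # the query is not forwarded any further
        for neighbor in overlay.get(node, ()):
            if neighbor not in visited:  # each node handles a query once
                visited.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits


# Hypothetical five-node overlay: A - B - D - E, plus A - C.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
           "D": ["B", "E"], "E": ["D"]}
shared_files = {"D": {"song.mp3"}, "E": {"song.mp3"}}

print(flood_search(overlay, shared_files, "A", "song.mp3", ttl=2))  # {'D'}
```

With a hop limit of 2 the query started at A finds the copy on D but not the one on E, which shows the trade-off described above: a file is only found if the query is spread widely enough, and spreading it widely costs more messages.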

Partially centralized unstructured systems

Kazaa, Morpheus

Kazaa and Morpheus are two similar partially centralized systems which use the concept of “supernodes”: nodes that are dynamically assigned the task of servicing a small subpart of the peer network by indexing and caching the files contained in that part of the network. Both Kazaa and Morpheus are proprietary, and there is no detailed documentation on how they operate. Proprietary algorithms automatically elect peers to become supernodes if they have sufficient bandwidth and processing power. In Morpheus, a central server provides new peers with a list of one or more supernodes to which they can connect. Supernodes index the files shared by the peers connected to them, and proxy search requests on behalf of these peers. Queries are therefore sent to supernodes, not to other peers.

The advantage of partially centralized systems is that discovery time is reduced in comparison with purely decentralized systems, while there is still no unique point of failure such as a single central server. If one or more supernodes go down, the nodes connected to them can open new connections with other supernodes, and the network will continue to operate. In the event that a very large number of supernodes, or even all of them, go down, the existing peers can become supernodes themselves.

Gnutella (more recent architecture)

The concept of supernodes has also been proposed in a more recent version of the Gnutella protocol. A mechanism for dynamically selecting supernodes organizes the Gnutella network into an interconnection of SuperPeers (as they are referred to) and client nodes. When a node with enough CPU power joins the network, it immediately becomes a SuperPeer and establishes connections with other SuperPeers, forming a flat unstructured network of SuperPeers. It also sets the number of clients required for it to remain a SuperPeer.
If it receives at least the required number of connections from client nodes within a specified time, it remains a SuperPeer. Otherwise it turns into a regular client node. If no other SuperPeer is available, it tries to become a SuperPeer again for another probation period.

There are not many peer-to-peer file-sharing systems for handheld devices. One of them is “tunA”, which enables audio streaming between handheld devices.

tunA

The overall architecture of the tunA platform is illustrated in two diagrams. The first, Figure 2, shows how individual devices communicate with other peers on the same ad-hoc network in two ways: a UDP multicast channel shared between all of them, and separate each-to-each TCP/IP connections between all peers. The second, Figure 3, shows an expanded view of a single tunA peer and the interaction between each of the major subsystems.

Broadly speaking, tunA peers discover each other by periodically multicasting UDP packets announcing their presence to all nearby devices, and by maintaining a list of those peers from which they have detected similar packets within a specified time. When a user selects a local audio track, the system begins to multicast packets consisting of some timing information and frames of MP3 data to all interested peers (itself included). A separate “listening” process marshals this data into a buffer from which the MP3 decoder reads. The timing information is used to regulate the contents of the buffer and the requests of the decoder, to provide a synchronized audio experience between peers. The IM component exchanges profile data, text messages, and graphical avatars over separate TCP/IP connections. A database maintains a record of all peers, events, audio, and messages encountered by the system.

3. PROPOSED SOFTWARE ARCHITECTURE

3.1 Overview

The main characteristics of the proposed system are as follows: Ad-hoc peer-to-peer network structure allows all nodes to act as both servers and clients.
In a P2P network, a node simultaneously requests services from other nodes and provides services to them (Figure 3.1).

Figure 3.1 - A peer is both a server and a client, requesting and providing a series of services (service1(), service2(), …, serviceN()) simultaneously.

One of the main features of the system will be to discover the neighbors of each node and to maintain this information for the sake of consistency. Furthermore, by combining the neighbor information of each node, the system can discover the network topology. Because the devices are mobile and the medium is wireless, direct links between devices are subject to frequent change. That is why the system should not only maintain the neighbor information but also provide reliability during transmission. The direct links between the devices can be broken, and the nodes can move to other parts of the network.

3.2 Subsystem decomposition

The application level consists of three subsystems: application, data transfer and link monitoring. The kernel level includes the standard layers: transport layer (TCP and UDP), network layer (IP) and hardware level (IEEE 802.11).

Figure 3.2 - The layers of the system. Application level: Application, Data transfer, Link monitoring. Kernel level: TCP, UDP, IP, IEEE 802.11.

3.2.1. Link monitoring subsystem

This subsystem forms the foundation of the program. For the other subsystems to function properly, this subsystem must first detect the peers within range and maintain the topology. The main services provided by this subsystem are neighbor detection and link maintenance.

Neighbor detection

The initial connection between the peers will be created with HELLO messages broadcast by newly arriving nodes to all the existing peers in the network. The message basically contains the IP address and port of the newly joined node.
Then all the peers receiving the broadcast message reply with a message containing their IPs. The new node is now connected to the network and can reach the other peers whose IP addresses it has stored.

Figure 3.3 - New node ‘A’ broadcasts a HELLO message to the peers, and all the peers add the IP of ‘A’ to their lists

Figure 3.4 - All peers reply to node ‘A’ with their IPs, and node ‘A’ adds them to its list

Link maintenance

The proper way of leaving the network is to broadcast a GOODBYE message so that all the peers can remove the leaving node's IP from their lists. This way, no unnecessary REQUEST messages will be sent to non-existing nodes.

Figure 3.5 - Leaving node ‘A’ broadcasts a GOODBYE message to all peers so that they can remove its IP from their lists

However, there will be cases where nodes leave the network without a GOODBYE message, due to a crash or an abrupt close of the program. To compensate for this, the list of peers in the network will be maintained by periodically sending PING messages to the nodes whose IP addresses are stored in the list. If the PING packet is dropped or no successful response is received, the node is assumed to have disconnected from the network without a GOODBYE message. This maintenance is essential for consistency and reliability.

Figure 3.6 - Node ‘A’ periodically pings the others. Node ‘2’ does not pong back, so node ‘A’ removes the IP address of node ‘2’ from its list

3.2.2 Data transfer subsystem

The main function of this subsystem is to provide reliable data transfer between the nodes. Since the network structure is dynamic, nodes can join and leave during a file transfer. Hence, the system should handle the loss of the connection between the two nodes involved, and the file being transferred should be protected.
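
The HELLO, GOODBYE and PING/PONG rules of Section 3.2.1 can be sketched as a small peer table. This is a hedged Python illustration rather than the project's C# code: the class name `PeerList`, its method names and the `timeout_seconds` parameter are hypothetical, the network layer is left out entirely, and "no successful response" is modelled as "no PONG seen within the timeout", which is one possible reading of the rule.

```python
import time


class PeerList:
    """In-memory list of known peers, keyed by (ip, port)."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock          # injectable so the sketch is testable
        self._last_seen = {}         # (ip, port) -> time of last HELLO/PONG

    def on_hello(self, ip, port):
        # A HELLO (or a reply to our own HELLO) adds the peer to the list.
        self._last_seen[(ip, port)] = self._clock()

    def on_pong(self, ip, port):
        # A PONG only refreshes peers that are already known.
        if (ip, port) in self._last_seen:
            self._last_seen[(ip, port)] = self._clock()

    def on_goodbye(self, ip, port):
        # A GOODBYE removes the peer immediately.
        self._last_seen.pop((ip, port), None)

    def drop_silent_peers(self):
        # Peers that have not answered within the timeout are treated as
        # having left without a GOODBYE.
        now = self._clock()
        silent = [peer for peer, seen in self._last_seen.items()
                  if now - seen > self._timeout]
        for peer in silent:
            del self._last_seen[peer]
        return silent

    def peers(self):
        return sorted(self._last_seen)
```

In use, a periodic task would send PING to every entry of `peers()`, feed incoming PONGs to `on_pong`, and call `drop_silent_peers()` once per period; the choice of period and timeout is left open here.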

The data transfer protocol has the following features. Each data packet has a unique message ID, which includes the sender and receiver information, the packet sequence number and the packet resend count. The receiving node sends an acknowledgement message to the sender; if the sender does not receive the acknowledgement within a given time, it retransmits the message. During the data transfer, if the receiver detects that a packet is missing, it requests that packet by sending its sequence number to the sender. The sender then retransmits the missing packet to the receiver. In this way, the data can be transferred completely and reliably.

3.2.3 Application

The actual user application (user interface) is going to be implemented after the core layers have been successfully installed on the wireless devices. The application will run on top of the layers, using their services. Thus the application is considered a separate subsystem.

Given the results of data requests and transfers, the user interface provides visualization and accessibility tools. It is the module of the system that interacts with the user directly and provides an abstraction for accessing the results of operations. The design of the UI subsystem provides simple access to complex data. Figures 5, 6 and 7 are screenshots of the proposed UI, demonstrating the main windows for each sub-menu.

3.3 Hardware/Software mapping

The system model is mapped onto hardware and software by mapping objects to processor, memory and input/output resources, and by mapping associations to connectivity. This system is a wireless peer-to-peer ad hoc system; therefore, each part of the system has different hardware and software requirements. However, since the system consists of individual nodes only, the tasks are not distributed across different locations.
The system requirements can be analyzed in terms of the Pocket PC hardware specifications, the Pocket PC operating system specifications, performance issues, the specifications of the connection tools and the development environment.

Figure 3.7 - Deployment diagram for the system

3.3.1 Pocket PC specifications

We are using Asus A620 Pocket PCs during the development of our project. The system is expected to run on devices with at least these specifications. The device has the following hardware specifications:

Processor: 400 MHz Intel® PXA255 Processor
Memory: 64 MB RAM
Operating System: Microsoft® Pocket PC 2002
Display: 3.5” Trans-reflective TFT LCD Display (65k colors), 255 levels of brightness, 320x240 resolution
Expansion Slot: Compact Flash Type II slot
Notification System: Event Notification, Charge status
Audio: Integrated Microphone and Speaker, 3.5mm stereo headphone jack
IrDA: FIR/SIR (infrared)
Battery: 1300mAH Lithium Rechargeable Battery
AC Power: AC Input: 100-240VAC, 50/60Hz; Output: 5VDC, 2V (Typical)
Size: 125mm x 76.8mm x 13.3mm (L x W x H)
Weight: 141 grams

3.3.2 Windows Mobile 2002 Pocket PC specifications

Asus A620 Pocket PCs use the Windows Mobile 2002 Pocket PC operating system, which provides multimedia applications and network connection services. Windows Pocket PC 2002 has features that will be used by the system, such as 802.11 support and the .NET Compact Framework.

3.3.3 D-Link Specifications

The wireless Compact Flash card used in this project is the D-Link DCF-660W, which is IEEE 802.11b compliant. This card can connect to an existing wireless network in Infrastructure mode, and it can also create connections in Ad-Hoc mode. Our system is an ad-hoc system that does not need an access point. The card can reach a maximum signal rate of 11 Mbps. Moreover, it provides an Auto Fall-Back mechanism that automatically adjusts the speed of the adapter according to the distance.
The DCF-660W requires a Compact Flash type I or II interface, which is available in the Pocket PC we are using. In addition, the current consumption of the DCF-660W is 80mA in power save mode and less than 350mA in transmission mode. [4]

3.3.4 .NET Platform Microsoft Visual Studio .NET 2003

The development platform of the project is Microsoft Visual Studio .NET 2003 and the implementation language is C#. Visual Studio .NET 2003 supports the .NET Compact Framework, so applications can be developed for devices such as the Pocket PC, as well as for other devices powered by the Microsoft Windows CE .NET operating system. The .NET Compact Framework brings the .NET platform to these devices and allows machine-independent code development. In addition, the implementation language, C#, supports object-oriented programming in an efficient way.

3.4 Persistent data management

When a new node connects to the network, it receives the IP addresses and port numbers of the existing nodes in the network so that it can send requests to them. To support link maintenance, this information must be stored. Since the nodes in range are volatile and can easily join or leave the system, a text file or database would not be very suitable for storing their information. Instead, a dynamic data structure such as a vector is better suited to easy search, update, add and remove operations.

Moreover, each node will keep a list of its shared files and their attributes, to send to other nodes when requested. A text file would not be efficient for transferring and handling various attributes. An XML file, however, is easy to maintain, as it can be structured according to the requirements.

3.5 Access control and security

Providing security in ad-hoc networks is difficult, because access to the system cannot be restricted by a characteristic such as the IP address, since everything is dynamic. All users can access the system.
However, users can see only the shared folders of the peers through the program.

3.6 Global software control

In the software, the application and the core layers work independently. However, the application should use the services of the layers, and the layers can send data to the application. Therefore, the layers and the application will run as different threads and use a shared global data structure.

The sequencing of actions in the layers is controlled by an event-driven main loop. The loop waits for an event to occur, such as a request from the application or a message received from the environment. When an event becomes available, it is dispatched to a thread according to the type of the message, so that other events are not blocked.

3.7 Boundary conditions

Initialization: The program reaches its steady state as soon as it is run. To perform use-case operations, the user has to “Connect” to the network. There is no need for an existing network of other devices; the device can be the first node. The only condition is to have a Pocket PC with an 802.11 wireless solution.

Termination: The preferred method of terminating the program is to click the “Disconnect” and then the “Exit” button. In this way, other users will be informed that the user has left, and the temporary files will be removed.

Failure: The program may quit due to a crash, bug or external error (e.g. power supply). We should implement the system so as to avoid internal errors. The user may also exit the program improperly, without disconnecting from the network. This is compensated for by the periodic ping messages.

4. SUBSYSTEM SERVICES

4.1 Application subsystem services

When the graphical user interface is started, the Pocket PC becomes a node of the system. The connection is established if there are existing nodes in range, and the user interface displays the available services that the user can control.
If there are no existing Pocket PCs running this program, a new connection is created and the user can see the responses.

4.2 Link monitoring subsystem services

To create links to other nodes, UDP sockets are created first; they determine the type of communication. The new node connects to the other nodes by sending them its connection data. Once the node has introduced itself to the other nodes, the neighboring nodes are connected to each other. These connection services are maintained whenever a node joins or leaves the system. The links are preserved until the connected node leaves or the connection fails.

4.3 Data transfer subsystem services

Data transfer services use TCP connections in order to provide reliability. These services are required during file and message transfers between the nodes. The system provides lossless transfer of the data and handles the storage of this data.

5. APPENDIX

Figure 1 - Table classifying the commonly used P2P programs
Figure 2 - How individual devices communicate with other peers
Figure 3 - Expanded view of a peer and the interaction between each major subsystem
Figure 4 - Single-hop access example. The yellow node searches for a file and only the blue node has the file.
Figure 5 - Screenshot of main window with main menu options displayed
Figure 6 - Screenshot of main users window and submenu options
Figure 7 - Screenshot of a file collection window and file options
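
As a supplement to the figures, the acknowledgement and retransmission behaviour described for the data transfer protocol in Section 3.2.2 can be sketched as follows. This is a minimal Python simulation under stated assumptions, not the project's implementation: the names `Packet`, `Receiver`, `send_file`, `lossy_channel` and the `max_resends` limit are hypothetical, the network is replaced by a function call, and a lost packet or expired acknowledgement timer is modelled as the channel returning `None`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Together these fields play the role of the message ID.
    sender: str
    receiver: str
    seq: int            # packet sequence number
    resend_count: int   # how many times this packet has been resent
    payload: bytes


class Receiver:
    def __init__(self):
        self.chunks = {}  # seq -> payload

    def deliver(self, packet):
        # Duplicates simply overwrite the same slot; the ack is the seq.
        self.chunks[packet.seq] = packet.payload
        return packet.seq

    def missing(self, total):
        # Sequence numbers the receiver would request from the sender.
        return [seq for seq in range(total) if seq not in self.chunks]

    def assemble(self, total):
        return b"".join(self.chunks[seq] for seq in range(total))


def send_file(chunks, receiver, channel, max_resends=3):
    """Send each chunk, resending until it is acknowledged."""
    for seq, payload in enumerate(chunks):
        for resend_count in range(max_resends + 1):
            packet = Packet("A", "B", seq, resend_count, payload)
            if channel(packet, receiver) == seq:  # acknowledgement received
                break
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")


def lossy_channel(packet, receiver):
    # Hypothetical channel that loses the first transmission of every packet.
    if packet.resend_count == 0:
        return None
    return receiver.deliver(packet)
```

Sending three chunks through `lossy_channel` delivers every chunk on its first resend, after which `Receiver.missing(3)` is empty and `Receiver.assemble(3)` returns the original bytes. A real implementation would replace the function call with sockets and a timer.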