MobiDFS: A Lightweight Mobile Distributed File System

Jilong Liao, Chi Zhang
University of Tennessee, Knoxville, USA
jliao2@utk.edu; czhang24@utk.edu

Abstract: Although mobile computing has become increasingly popular in recent years, the development of distributed file systems has not kept pace with the rapid growth of mobile platforms. This paper proposes a new mobile distributed file system, mobiDFS, as a step toward solving this problem. The general idea of mobiDFS is to reduce computation on the mobile device by transferring computing requirements to a server, which is achieved with a client-server model. Mobile devices, such as Android smart phones, tablets and pads, have multiple communication interfaces, including Wi-Fi, 3G, Bluetooth and Near Field Communication. The file system chooses the optimal way to transfer requested files in order to reduce energy consumption. The implementation allows users to connect to the whole distributed file system directory tree regardless of the hardware platform of the mobile device. In addition, user privileges, which separate different users in the file system, are taken into consideration. A user shell serves as the user interface and was used to gather demo snapshots. Finally, this paper opens a discussion of a new direction for distributed file systems: the socially oriented file system.

I. INTRODUCTION

Equipped with good computing abilities, storage capacities, sensors (microphone, camera, GPS, gravitational sensor, ambient light sensor, etc.) and various wireless network interfaces, smart phones can easily communicate with each other or with remote servers over the Internet, the world's largest network. Thus, they provide a convenient platform for mobile computing.
The rising demand for mobile computing calls for an improved network distributed file system that supports mobile devices and offers portability, energy efficiency, extensibility, consistency and security. Given the sharply increasing number of smart phone users, it is natural to take advantage of these pervasive computing resources. The file system is the crux of our design. The basic idea of our proposal is to abstract each mobile device, such as a smart phone or PDA, as a directory mounted in the overall mobile distributed file system, which works in a client-server model. In this way, most of the complicated low-level computing, e.g. locating resources and discovering the optimal data transfer route, is moved to the server, so that the mobile device can devote as much of its computing resources as possible to mobile applications.

[Fig. 1. Topology of the Mobile Distributed File System]

II. RELATED WORK

Previous work has proposed several frameworks to address this issue. For example, [1] proposed a framework for cloud computing on mobile devices and implemented Hyrax, a platform derived from Hadoop that supports cloud computing on Android smart phones. Hyrax allows client applications to utilize data and execute computing jobs on networks of smart phones and on heterogeneous networks of phones and servers. However, Hadoop is designed for clusters and is heavyweight, so energy became one of the main constraints in that project. Their testbed is a cluster of ten HTC Dream smart phones with Wi-Fi (802.11g) as the network interface. M-DFS [2] is an NFS-based mobile distributed file system for ephemeral sharing in proximity networks over Bluetooth and 802.11b.
That work studies NFS in the mobile environment and overcomes several shortcomings of the protocol under difficult network conditions, such as disconnections, packet losses and low signal reception zones, which are typical in proximity networks. Caching is used mainly to improve performance (e.g. on low-bandwidth links), not to guarantee disconnected operation. MFS [3] is a client cache manager for a distributed file system. It focuses on adaptation techniques for managing data accessed and modified by mobile hosts, through prioritized communication and an efficient cache consistency protocol that uses file access information to improve performance.

However, those solutions are not suitable for a real mobile distributed file system. We propose a lightweight but full-featured mobile distributed file system that supports mobile applications, such as file sharing, with energy efficiency, mobility, consistency and security.

III. DESIGN OVERVIEW

A. System Overview

Our mobile distributed file system is transparent with respect to both communication interface and device, which is fully discussed in [4]. Each device exists in the file system as a directory that contains its own subdirectories, so a user can browse a remote device just like descending into a folder. The basic idea is to map each device to a directory in the overall system so that any other user can browse the system directory as they would a local directory on their own mobile device. All device directories mapped into the system reside under the remote system root, /mobiHome. This directory appears on each device as the system mount directory and is the entry point of the system.

The other general goal of the system is to be lightweight and energy efficient on the mobile device. Therefore, a client-server working mode was chosen.
We expect the energy-limited mobile device to perform as little computation as possible and to ask the server to do as much as possible. This mechanism both saves energy on the mobile device and speeds up responses from clients and the server. Clients are the devices that join the system, including mobile phones, tablets, pads, laptops and desktops. The server is the central machine that controls the system, handles clients' requests and chooses the optimal transmission method.

Mobile smart phones are clustered or distributed in wireless networks and connect to the server via three different network interfaces: Wi-Fi, 3G and Bluetooth. The server is a desktop or laptop that is connected to the Internet, or that can be reached directly by mobile devices, depending on the location and network availability of the smart phones. As illustrated in Figure 1, mobile devices can communicate with each other through ad hoc networks that they set up themselves, connect to an access point via 802.11, contact base stations over the 3G network, or use Bluetooth to build a low-power link. In this complex network topology, mobile devices are abstracted as directories mounted in the distributed file system so that users can easily handle files on different mobile devices. Users only need to concentrate on files in directories, much as on a PC, rather than being tripped up by network switching. The client-server model is chosen to cope with the lower-layer operations.

As shown in Figure 2, the server maintains a resource table, a mobile device profile table, and a resource request ranking table. The server receives updated information (e.g. location and local resource directory) and requests from devices, and sends commands and link decisions to the clients so that they can set up a file transmission route. On the server side, as illustrated in Figure 2, user device information and user resource information are stored in the database.
Incoming control messages from any of the communication interfaces are parsed by an Interpreter, which can be regarded as the file system scheduler. Each message is then handled by the Analyzer, which runs optimization algorithms. The Analyzer usually chooses the interface and vendor that minimize the expected energy cost, using information such as battery level, IP address and GPS signal that each device reports in its heartbeat message.

The network interfaces in this system are Wi-Fi, 3G, Bluetooth and wired Internet. A mobile device may have several network interfaces, but only the most energy-efficient one is used at a time. On top of these interfaces, socket communication is established between the server and the client. Both the server and the clients run three listening threads on three different sockets, on which they receive messages: a TCP listen socket, a UDP listen socket and a file transfer listen socket. The TCP listen socket is the main channel for exchanging control messages, while UDP is a backup. However, some messages, such as heartbeats, are exchanged on the UDP channel because they do not belong to the group of essential messages whose correct arrival must be ensured. The file socket is used only when file sharing is needed. The server sends out control messages that govern the behavior of the mobile devices. In addition, mobile devices are connected to each other via sockets, which are used for file sharing and file deployment.

The database on the server stores user information and resource information, which the Analyzer and Interpreter use to identify the optimal resource or to respond to resource information requests. Two tables are created and maintained in the server database: the user table and the resource table. The user table contains the user name, user IP address, network interfaces, GPS signal and battery level.
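As a rough illustration of the user table described above, the following sketch models one row and refreshes it from a heartbeat report. The field names, the `apply_heartbeat` helper and the dictionary layout of the report are assumptions made for illustration only; the actual schema and message format are not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UserRecord:
    """One row of a hypothetical server-side user table."""
    user_name: str
    ip_address: str
    interfaces: Tuple[str, ...] = ()           # e.g. ("wifi", "3g", "bluetooth")
    gps: Optional[Tuple[float, float]] = None  # (latitude, longitude), if reported
    battery_level: Optional[int] = None        # percent, 0-100
    alive: bool = True


def apply_heartbeat(record: UserRecord, report: dict) -> UserRecord:
    """Copy the fields present in a heartbeat report (assumed to be a dict) into the row."""
    for key in ("ip_address", "interfaces", "gps", "battery_level"):
        if key in report:
            setattr(record, key, report[key])
    record.alive = True  # any report implies the device is reachable
    return record
```

Fields missing from a report are left unchanged, so a device that only reports its battery level does not erase a previously stored GPS position.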
The resource table records each file that each user wants to store in the file system, together with its metadata. In addition, the resource table records each file's popularity so that the server can cache popular files and serve the corresponding file requests directly to clients.

The client side also maintains a lightweight database that records directory and file modifications made during daily use. The client program attaches a time stamp to each modification record so that the next heartbeat message can compress the directory updates and submit the changes to the server. Once the local directories are synchronized with the server, this database updates the time stamp to the current time. If a file is deleted, the database handles it in three steps: record, report, and delete the item from the database.

A user interface is provided on the mobile device. Users use this UI to access the system and to start browsing resources. All user operations and requests are handled in the background so as to offer a smooth operating environment.

[Fig. 2. System architecture]

[Fig. 3. Join system and synchronization. Messages between Client and Server: Join, ACK0|Info, Info|…, ACK1|dirUpdate, Dir|…, ACK2]

B. Communication Interface and Device Transparency

Considering that users may use various devices, such as smart phones, tablets, laptops and pads, the system needs to be device transparent so that users are not bothered by switching back and forth between client software. To achieve device transparency, the file system server needs full information about the communication interfaces of the user's device. Typically, Wi-Fi, 3G and Bluetooth are the primary concerns. In the current design, however, Wi-Fi is the priority communication network, for the following reasons. First, Wi-Fi is free and is the most frequently used network on different mobile devices in the campus-wide region.
Second, Wi-Fi signal and link quality are stable, which reduces the resynchronizations caused by client-to-server contact failures. Third, the IP address assigned by the Wi-Fi DHCP server is a unique external IP address, so the server and client can listen to each other directly. Even though Wi-Fi is relatively energy consuming, the file system itself is not the major energy consumer; other network-dependent software, such as Gtalk, Skype and Google Voice, is the major power sink on mobile devices.

3G is a network that covers the whole country, with both a lower transmission rate and a lower energy cost. However, this network service is not free for many users, and latency may occur when users request large files. Another drawback of the 3G interface is that current carriers use NAT to assign IP addresses, which means that packets sent from the server are blocked at the gateway. Mobile devices are assigned internal IP addresses that are only meaningful behind the same gateway, which is completely different from the addresses assigned by a Wi-Fi network. In other words, clients are blind to the server under this network condition, so a client regards itself as offline even though it is online from the server's point of view. Therefore, 3G is only a backup plan for this system until the NAT issue is addressed; after that, 3G could serve users without Wi-Fi access.

Bluetooth is a power-efficient message transmission protocol that can be used for device-to-device communication. This communication mode is used when clients are instructed to do so by the server's control message. Because of the limited coverage of Bluetooth, either a GPS location algorithm or a mobile network location algorithm is required to decide whether two clients are suitable for Bluetooth communication. Thus, Bluetooth is only used when mobile devices are close to each other or energy is limited.
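The interface preferences described above can be sketched as a small decision function. This is only one possible reading of the rules, not the Analyzer's actual algorithm: the function name, the interface labels and all thresholds (`near_m`, `bt_range_m`, `low_battery_pct`) are hypothetical values chosen for illustration.

```python
from typing import Iterable, Optional


def choose_interface(shared: Iterable[str],
                     distance_m: Optional[float],
                     battery_pct: float,
                     near_m: float = 5.0,
                     bt_range_m: float = 10.0,
                     low_battery_pct: float = 20.0) -> Optional[str]:
    """Pick an interface for a device-to-device transfer.

    shared      -- interfaces both peers support
    distance_m  -- estimated distance from GPS or network location, None if unknown
    battery_pct -- battery level of the more constrained peer
    """
    shared = set(shared)
    in_range = distance_m is not None and distance_m <= bt_range_m
    if "bluetooth" in shared and in_range:
        close = distance_m <= near_m
        energy_limited = battery_pct <= low_battery_pct
        # Bluetooth only when the peers are close or energy is limited.
        if close or energy_limited:
            return "bluetooth"
    # Otherwise Wi-Fi has priority and 3G is the backup.
    for name in ("wifi", "3g"):
        if name in shared:
            return name
    return None
```

For example, `choose_interface({"wifi", "bluetooth"}, 8.0, 90)` selects Wi-Fi, while the same pair of devices with a 15% battery level would be directed to Bluetooth.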
Device transparency requires the system to take network interfaces into consideration so that users do not need to decide manually which interface should be used. Additionally, users may use different platforms, but the different communication interfaces of those platforms should not disturb users' operations. This should be handled by a system background process once the system server knows the interface information of the devices.

C. System Scheduling

System scheduling is the essential topic in the system. It includes six major scheduling events: join system, query resource, search directory, read file or directory status, server cache request, and server heartbeat request. In this section, the scheduling protocol for each scenario is discussed in detail.

1) Join system and synchronization: This event happens first, when the user starts the program. As shown in Figure 3, the client program sends a join message to the server. Once the server receives the message, it requests the client's hardware information, such as device name, network interfaces, GPS and battery. The client follows the server's command, reads the device hardware information and battery level, and sends this information back. After receiving the hardware information packets, the server stores the information in its database and issues a directory update request. Finally, the server receives the directory update and responds with an acknowledgment to finish the start session. At this point, the server and client are synchronized with each other and both of their databases have been updated.

2) Query Resource: Query resource is one of the most frequent sessions in the file system. The detailed protocol is illustrated in Figure 4. Suppose client 1 starts querying a resource by using the copy command. The client initiates the exchange by sending a query message to the server.
When the server receives this message, it spawns a new thread to handle the request in the background. This process usually consists of resource finding, resource location optimization, vendor contact and confirmation of the information exchange. An asynchronous mode is applied in order to avoid disturbing users. After identifying the resource and the optimal method for transmitting it, the server informs the requesting client how to contact the resource vendor. In the last step, the requesting client contacts the resource vendor directly and retrieves the resource.

It is possible that the server has already cached the resource that client 1 wants to copy. In this case, the server returns a message saying that it has the resource and asks client 1 to contact it on the file socket. Client 1 then communicates with the server even though client 2 is the resource owner.

[Fig. 4. Query Resource. Messages among Server, Client #1 and Client #2: Q|…, Open|WiFi|…, Ongoing, ACK4|WiFi|…, Try|WiFi|…, ACK5, FTP|…, Data…. Adjacent Client/Server exchange: Search|…, ACK3|…]

Another possible situation is that the server contacts client 2 for a specific resource but client 2 fails to configure its network interface and returns failure information to the server. In this case, the server tries the other interfaces in the order Wi-Fi, 3G and Bluetooth. If all of these interfaces fail, the server informs client 1 that the request failed because the resource owner cannot configure its file socket.

In addition, in the same scenario, client 2 may have quit the system before the server knows it, because the heartbeat message has a detection period of 30 minutes. The server therefore keeps trying to contact client 2 until the attempt times out. Finally, the server has to send a failure-and-try-again message to client 1.
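The interface fallback described above (try Wi-Fi, then 3G, then Bluetooth, and report failure if none works) can be sketched as follows. The function name `open_vendor_link` and the `try_open` callback are hypothetical; the callback stands in for whatever the server does to ask the resource owner to configure its file socket on a given interface.

```python
from typing import Callable, Optional, Sequence

# Order in which the server tries the resource owner's interfaces.
FALLBACK_ORDER = ("wifi", "3g", "bluetooth")


def open_vendor_link(try_open: Callable[[str], bool],
                     order: Sequence[str] = FALLBACK_ORDER) -> Optional[str]:
    """Return the first interface on which the vendor could be reached, else None.

    try_open(interface) is assumed to return True on success, and to return
    False or raise OSError (which includes socket timeouts) on failure.
    """
    for interface in order:
        try:
            if try_open(interface):
                return interface
        except OSError:
            continue  # treat a timeout or network error as a failed attempt
    return None  # caller should send the failure message to the requester
```

A `None` result corresponds to the case in which the server tells client 1 that the request failed and should be tried again later.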
Although this is the worst case, once client 2 joins the system again, the server will immediately cache the resources on client 2, because the server considers client 2's link unstable and this instability would damage the file system's consistency.

3) Search Directory: The search command invokes this message event, as described in Figure 5. The server searches for the resource in its database and returns the full path for retrieving the resource to the requester. This message is transmitted over the UDP socket for two reasons: 1) searches are infrequent and do not require a continuous connection; 2) if a transmission fails, retransmitting costs less than maintaining a TCP connection for a period of time.

Failures may happen in a congested wireless network. UDP packets may be lost in both directions, from client to server and from server to client. This can be compensated for by retrying multiple times on the client side. It is not worth maintaining a long-lived TCP connection in a busy network to ensure the arrival of a simple resource search request packet. A user can tolerate such a failure by trying again later rather than wasting precious energy on the mobile device.

[Fig. 5. Search]

4) Read File or Directory Status: These two scenarios usually arise from the ls and stat commands. The client sends the file or directory request to the server, and the server receives the request and responds with the corresponding information. For the same two reasons mentioned above, we use UDP instead of TCP. Figure 6 shows the connection flow.

[Fig. 6. Read File or Directory Status. Messages between Client and Server: R|File|…, S|File|…, R|Dir|…, S|Dir|…]

In the worst case, the resource vendor deletes the file just after a user requests its status. The current file system allows this to happen because it assumes that the probability of this extreme case is very low. Even if it happens, the user will not be troubled for long, because the resource becomes invisible to everyone after 30 minutes or as soon as a file transfer request is generated.
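The client-side retry used for the UDP-based search, ls and stat requests can be sketched as below. This is a minimal illustration rather than the prototype's code: the function names, the timeout, the number of attempts and the 4096-byte receive buffer are all assumptions.

```python
import socket
from typing import Callable, Optional, Tuple


def udp_exchange(server: Tuple[str, int], payload: bytes, timeout_s: float) -> bytes:
    """Send one datagram and wait for one reply; raises socket.timeout if none arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(payload, server)
        reply, _addr = sock.recvfrom(4096)
        return reply


def request_with_retries(exchange: Callable[[bytes], bytes],
                         payload: bytes,
                         attempts: int = 3) -> Optional[bytes]:
    """Retry a lossy request a few times, then give up and let the user try later."""
    for _ in range(attempts):
        try:
            return exchange(payload)
        except socket.timeout:
            continue  # request or reply was lost; resend
    return None
```

A caller could combine the two as `request_with_retries(lambda p: udp_exchange(("192.0.2.1", 9000), p, 1.0), b"Search|notes.txt")`, where the address, port and message body are placeholders. No connection state is kept between attempts, which matches the argument that retransmitting is cheaper than holding a TCP connection open.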
5) Server Cache Request: This event is initiated by the server, which periodically scans its database to see whether any resource is becoming more popular. If such a resource is found, the server asks the client for a copy, which starts the message exchange shown in Figure 7. Once the server and the target client are synchronized, they switch to the file socket to transmit the resource until the transfer finishes. Once the server has cached the resource, most requests for this file are answered by the server itself. Only when the requesting user is located close to the vendor and the server considers Bluetooth the best choice is the file request directed to the original file owner.

Network failures or logical failures still exist in this scenario, and the file system server handles them in a similar way by first retrying multiple times. If the attempts still fail, the server marks this client as a fragile client and caches all of its resources when the client's network quality becomes good.

[Fig. 7. Server Cache Request. Messages between Client and Server: C|…, Open|FTP, Ready, FTP|…, Data…]

[Fig. 8. Server Heartbeat Detect. Messages between Client and Server: HB, Dir|…]

6) Server Heartbeat Detect: Clients may leave the file system involuntarily for various reasons, such as power-off, lack of network access and network congestion. These make the client invisible in the file system without informing the server. To handle this, the server checks whether each client is alive by broadcasting heartbeat messages to all clients and waiting for responses. A client that receives the heartbeat responds with its directory modifications piggybacked on the reply. The workflow is shown in Figure 8.

A heartbeat message that gets no response from a certain client does not mean that the client has left the file system, because the heartbeat message is sent over the UDP socket. A very busy network can defeat this scheme.
For this situation, the system uses an additional liveness check: every time the server receives a message from a user, it checks whether that user is marked alive. If the user had been marked as lost, the server program updates its database table to bring the user back into the file system. This scheme adds computation to the server, but it does not affect the response time to the client. As the file system grows larger, multiple servers can be deployed to handle contention at the server.

D. User Interface

On the mobile device, users are provided with a user interface for interacting with the distributed file system. Typically, a shell is displayed to users. Once a user joins the file system, the following operations are available:

1) Browsing the directory: users are allowed to browse both local directories and remote directories mounted in the system. Since each directory owned by a user is assigned privilege levels and access lists for other users, only permitted directories are visible to the user who browses.

2) Copy and move directory: users can copy a directory over the network from another user's device to their local directory if the privileges allow it. However, move and write operations are restricted to the directory or file owner. This is similar to current file systems, which require administrative authority to move, write, delete or modify.

3) Set directory attributes: a directory on a physical device can be either in-system or out-system. The owner defines whether a directory or file should be in the system and what privileges should be assigned.

4) Read directory properties: each directory or file should provide relevant information to any user to whom it is visible, i.e. metadata such as file size, read/write privileges and ownership.

5) Search files: a user may know the name of a file without knowing its full path. This feature helps the user find the exact full path, both locally and in the system.

E. User Privileges

Logically, the distributed file system can only examine the directories that are public or accessible on each device. However, a physical device does not contribute all of its directories to the distributed system. Therefore, users need the authority to set, change and modify the privileges of all directories on their mobile device. The user resource database on the server side records the privileges for each item. Some items are public and therefore open to all users, while others are accessible only to a group of users. Directories that are not set by users are not present in the distributed file system. In this way, we distinguish three categories: in-system directories, out-system directories, and directories accessible to different users.

IV. IMPLEMENTATION

A. Platform

At present, the client for the distributed file system has been implemented on Android smart phones and laptops. The minimal requirements that we specify for client devices are the Google Android 2.3+ operating system for phones and Java support for laptops. Android smart phones were chosen for four reasons. First, Android is a popular smart phone operating system that is prevalent around the world, and more and more people prefer this kind of smart operating system. Second, Android is based on the Linux kernel plus the Dalvik Java runtime, which is familiar to most developers and easy to modify. Third, it provides the easiest way to cooperate with other Google services that will be used in future development, such as Google Navigation, Google Maps and Google Accounts. Fourth, Android integrates SQLite, a lightweight, embedded-oriented database, which is very convenient for local data storage in the distributed system.

The server, on the other hand, is implemented in Python 2.7 with a MySQL database. We chose Python with MySQL for the following reasons. First, Python saves a lot of programming-intensive work.
Second, Python has stronger dynamic loading capabilities than Java or other languages. This property keeps development smooth and avoids disturbing users in service while the distributed system service is being upgraded. Third, Python with MySQL integrates well with web services on Apache, which could simplify future development such as search engine support and a web-based user management service.

B. User Shell

Currently, we provide a client shell that allows users to interact with the distributed file system. The shell commands are the following:

1) ls: list all the subdirectories that are visible to the requesting user. Usage: ls
2) pwd: show the current directory, both local and in the distributed file system. Usage: pwd
3) cd: browse into a child directory, either local or on another remote device if accessible. Usage: cd <dir>
4) stat: print the metadata of a directory, either local or on a remote device in the distributed file system. Usage: stat <src>
5) cp: copy one file from the source directory to the destination directory. Local and remote sources are handled in the same way. If the requested target is on another device in the same distributed system, the system handles the file request in the background. Usage: cp <src> <dest>
6) search: search for the full path of a specific directory, both local and remote. Usage: search <src>
7) setpublic: set a directory as public and put it into the distributed system. The directory then becomes an in-system directory. Usage: setpublic <src>
8) setprivate: set a directory as private, visible only to a group of people, and put it into the distributed file system. The directory then becomes an in-system directory. Usage: setprivate <src> <access list [,...]>

C. Limitations

Although the system is now working correctly and serving users, there are two major limitations: 1) only Wi-Fi is working; 2) there is no GUI for users.
In the previous sections we discussed the advantages of multiple communication interfaces, but we only implemented the Wi-Fi interface. The Bluetooth protocol is complex and we did not find a workable solution for this project, while the 3G network uses NAT, which changes the IP address and port number so that the server cannot reach clients in a 3G network.

Another shortcoming is the user interface on the client device. We only provide a simple command-driven shell; however, a graphical user interface is badly needed on smart phones, since typing commands is far more difficult than tapping on the screen.

A file recovery scheme is not implemented in the current prototype. If the file owner's hardware is damaged and the server has not yet cached the file, there is no way to recover the file. Although we may assume that a user's hardware failure is unlikely, a file recovery scheme should still be taken into consideration. One quick solution is to replicate each file on the server side from the beginning. This simple solution, however, is complicated by the many versions each file may have. To address this, the file system has to provide a version control service to make sure that the replicas of each version of a file are well kept.

One more limitation is that we assume everyone owns only one mobile device, so devices are separate from each other in the current file system. In reality, a user may have multiple devices, such as a smart phone, a laptop and a tablet. The present solution treats them as three directories in the file system, even though they belong to the same user. This could be solved by changing the basic unit of accounting from the device level to the user level and by providing a personal data synchronization service in the file system.

These four limitations are being addressed in ongoing development.
Solutions have been proposed to overcome these limitations, and they will provide better service than the current prototype.

V. EVALUATIONS

A. Experience

After the prototype was built and released, we gained some experience with it. The following paragraphs recount the operations of the experiment.

When the client software starts, the server receives the join message from the client and requests the device information. The client then sends the server all the information it needs, such as user name, network interfaces, GPS and battery information. This response causes the server to update the device information in its database. Next, the server asks for the directory update information. Once the server receives it, the initial synchronization is finished and the device is alive in the distributed file system.

Then, when the user wants to browse the /mobiHome directory, the client enters the directory and sends a directory request to the server, and the server replies with the corresponding directory information. If the user is interested in a certain file, he can first request the status of the file with the stat command, and the server returns the status information. The user may then want to copy the remote file to a local directory; in this situation, the user simply applies the cp command. The file copy is done by a background process of the system.

Every ten minutes, the server goes through its database and sends out heartbeat messages, and each client responds by reporting the directory modifications of the last 10 minutes, which are stored in its local database. Conversely, if another user wants to copy a file from this user's directory, the file is transmitted quietly in the background.
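The heartbeat reporting step in this walkthrough relies on the client's local modification log. A minimal sketch of such a log, using Python's built-in sqlite3 module as a stand-in for the SQLite database on the phone, is shown below. The class name, table layout and action labels are hypothetical, and "compressing" the updates is interpreted here as keeping only the latest action per path.

```python
import sqlite3
from typing import Dict


class ChangeLog:
    """Hypothetical client-side log of directory and file modifications."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE changes (path TEXT, action TEXT, ts REAL)")
        self.last_sync = 0.0  # time stamp of the last successful synchronization

    def record(self, path: str, action: str, ts: float) -> None:
        """Store one time-stamped modification record."""
        self.db.execute("INSERT INTO changes VALUES (?, ?, ?)", (path, action, ts))

    def pending(self) -> Dict[str, str]:
        """Changes since the last sync, compressed to the latest action per path."""
        rows = self.db.execute(
            "SELECT path, action FROM changes WHERE ts > ? ORDER BY ts, rowid",
            (self.last_sync,),
        )
        latest: Dict[str, str] = {}
        for path, action in rows:
            latest[path] = action  # later records overwrite earlier ones
        return latest

    def mark_synced(self, ts: float) -> None:
        """After the server acknowledges the report, drop reported rows and advance the stamp."""
        self.db.execute("DELETE FROM changes WHERE ts <= ?", (ts,))
        self.last_sync = ts
```

On each heartbeat the client would send `pending()` as the piggybacked directory update and then call `mark_synced` with the current time, which mirrors the record, report and delete steps described for deleted files.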
If the owner of a file or directory changes its privileges, the modification is both buffered in the local database, where it waits for heartbeat synchronization, and reported to the server immediately. Once the server receives this report, it adjusts the corresponding item in its database. Finally, when the user closes the client program, the server receives the leave message and sets the device status to dead.

B. Demo

The experience described above will be presented as a demo. The demo is set up in the following way:

1) One Google Nexus S smart phone with Android 2.3 runs the client program, and one laptop runs the laptop client program.
2) A projector shows the server monitor screen so that it can be used to verify the functionality of the system.
3) The smart phone and the laptop start working simultaneously and independently.

Figure 9 shows a screen snapshot of the running distributed file system.

VI. DISCUSSION AND FUTURE WORK

A. Personal Storage Synchronization

The current system does not consider the situation in which a user holds multiple devices and each device requires synchronization with the others. One popular piece of software for this purpose is Dropbox [5]. MobiDFS differs from this concept, in which the server caches the files and initiates synchronization among all of the user's devices. This is possibly the easiest evolution of the current mobile distributed file system.

B. Uniform File Sharing Platform

Following on from the topic above, platforms like Dropbox concentrate only on personal data synchronization rather than on peer-to-peer data sharing. Our current design satisfies the idea of providing a uniform file sharing platform.

C. File Version Control

Version control is not supported by the current system, but it remains a possible development direction for this project. Combined with personal data storage synchronization and a uniform file sharing platform, file version control would make this distributed file system very useful.
It would support many applications, such as group document editing, mobile synchronized editing and version-related applications.

[Fig. 9. Screen Snapshot]

D. Social File System

A novel idea is that the file system can be socialized. All of the designs and methods discussed so far are directory oriented, but we are much more interested in non-directory-based designs, such as tags associated with a person in a social network. Therefore, we put forward a new idea: the Social File System. This design will change the infrastructure of the distributed file system, because data are stored, synchronized and shared in a social mode by using tags or similar techniques, without a real directory to direct the browsing. People who follow you on the social network can change their relationship with you and your device in the distributed file system.

E. Challenges

1) File distribution and collection algorithms should work well under dynamically changing wireless network conditions. Algorithms, including responder selection and routing, should be highly adaptive in order to maintain acceptable performance.
2) Multiple network interfaces may exist simultaneously, and battery life differs among smart phones. An accurate indicator model that combines all these factors should be developed to provide correct device condition information to the algorithms in Challenge 1.
3) Privacy and security are needed to ensure a reasonably reliable mobile distributed file system.

VII. CONCLUSION

In this project, we design and implement a lightweight mobile distributed file system with device-transparent service. The implementation provides a real distributed file system prototype that is now in service. This design is a foundation that is worth extending in further work that combines personal data storage synchronization, device-transparent file sharing and social networks.

REFERENCES

[1] E. E. Marinelli, “Hyrax: Cloud computing on mobile devices using MapReduce,” CMU, Tech. Rep., 2009.
[2] N. Michalakis and D. N. Kalofonos, “Designing an NFS-based mobile distributed file system for ephemeral sharing in proximity networks,” in Proceedings of the 4th Workshop on Applications and Services in Wireless Networks, 2004.
[3] B. Atkin and K. P. Birman, “MFS: an adaptive distributed file system for mobile hosts,” Department of Computer Science, Cornell University, Ithaca, Tech. Rep., 2003.
[4] J. A. Strauss and M. F. Kaashoek.
[5] Dropbox, “Dropbox,” http://www.dropbox.com/l, 2011.