8803 AIAD Project Report
Chetna Kaur (GT ID: 902428550)

Contents:
1. Abstract
2. Introduction
   1. Objectives
   2. Motivation
   3. Our approach
3. Design Principles and Challenges
4. Architecture
5. System Interface & Functionality
6. Detailed System Design
7. Evaluation
8. Strengths & Limitations
9. Main Contributions
10. Future Work

Acknowledgements

I would like to thank Dr. Ling Liu for her guidance and for several useful suggestions that have helped shape this work. I also take this opportunity to thank Mahesh S.B., my former System Architect at Huawei Technologies, for helping me find answers to the various challenges that arose during the course of this project.

Introduction

1.1. Objective

With the popularity of mobile internet devices growing exponentially every year, mobile computing is emerging as a new platform in the evolution of internet computing. A key area that needs to evolve in this paradigm shift is the development of efficient storage services that meet the demands of modern mobile internet device users. While many corporations have focused on providing explicit storage space to traditional end-user PC applications, most of them do not provide a means to extend this storage to the new generation of mobile internet applications.

What is required, therefore, is a storage service on the internet that can deliver content to the end user's mobile internet device while taking into account the limitations inherent to that platform. Such a service could extend the storage capability of internet-enabled devices and pave the way for new generations of applications that provide rich content to all internet users, regardless of their memory limitations and geographical location.

1.2. Approach

In this work we propose an approach that lets the user access files in the remote store without having to download each file locally in its entirety.
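The partial-access idea can be made concrete with a little arithmetic. If a file is stored as fixed-size blocks of B bytes, a client that needs bytes [offset, offset + length) only has to fetch blocks floor(offset / B) through floor((offset + length - 1) / B). The Python sketch below is illustrative only: it assumes a block size of 4096 bytes (the report does not specify one), and the function names are hypothetical.

```python
# Sketch: a file viewed as fixed-size blocks, and the block indices a client
# must fetch to read a byte range. BLOCK_SIZE is an assumed value; the report
# does not specify one. Function names are hypothetical.
BLOCK_SIZE = 4096


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split file contents into consecutive blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the indices of the blocks that overlap bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def read_range(blocks: list[bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Reassemble a byte range using only the blocks that cover it."""
    idx = blocks_for_range(offset, length, block_size)
    # Only the covering blocks are touched; blocks past the end of the file are skipped.
    chunk = b"".join(blocks[i] for i in idx if i < len(blocks))
    start = offset - idx.start * block_size
    return chunk[start:start + length]
```

For example, reading 100 bytes at offset 10,000 of a 1 MB file touches only block 2 (bytes 8192 to 12287), so the client transfers 4 KB instead of the whole file.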
We propose the use of a block level filing system to meet these requirements. A block level filing system maintains a logical view of each of the user’s files as a sequence of smaller blocks. This allows individual blocks of a file to be downloaded or uploaded only as the user needs them, consuming less bandwidth and less client memory. Further, we propose an architecture for a distributed storage platform, a detailed design of the block level file system, and the APIs that such a platform needs to provide.

Requirements and Challenges

Efficiency: Currently, the cost of internet access from a mobile device depends on the amount of data transferred. Although this is expected to change in the future, efficient techniques for accessing the storage must be designed. One example is an API that lets the user access only part of a file in the remote store without downloading the whole file locally.

Client Memory and Bandwidth Limitations: The main requirement was to serve clients with constrained resources. The storage service APIs had to be designed with this limitation in mind, while providing seamless access to storage that enhances the user experience in a mobile environment.

Scalability: The storage service must scale to meet the needs of an ever-increasing number of mobile internet users. The systems that provide the actual storage may be servers in different physical locations, so the storage service must be capable of running on this distributed set of servers.

Simple API Interface: The APIs should follow a simple model of JSON over HTTP.

System Architecture

Fig: System Architecture (components: Meta File System Server, Access Server, Distributed Storage Server, WWW)

Detailed System Design

1.1. Block Level File System

The Meta File System and the Access Server together form the block level filing system. A block level file system serves two purposes.
First, it presents data to end users and applications. This data is typically organized in directories or folders in some hierarchical fashion. Second, it organizes where data is placed in storage. The filing system has to scatter data across the storage servers so that all data can be accessed with reasonable performance, and it does this by choosing the storage block addresses at which the data will be placed. These are all logical block addresses, since disk drives keep their own internal block translation tables. The filing system therefore sends commands to "slave" storage to write data to certain blocks and to retrieve it from certain blocks. This is what is commonly called block-level storage: its storage functions are based on a master/slave relationship, not a client/server one.

It is also possible for systems to request data through the user-level data representation interfaces (file-level storage). Here the client identifies the data by its filename, directory location, URL, or similar, which is a client/server model of communication. The Access Server receives the file-level request, looks up the locations where the data is stored, and retrieves it using block-level storage functions.

1.2. Meta File System

The Meta File System abstracts the user-created directories and files. It is the central data store that maintains information such as which users are currently connected, which files are in use and which blocks are being written, and the Storage Servers on which files are created. The Meta File System runs as a separate component on a dedicated server. The following diagram shows the basic design of the Meta File System.

1.3. Access Server

The Access Server acts as the user's gateway to the internet file system. It is responsible for controlling the flow of all user actions and of file system functions such as file open, close, read, write, and create file/directory.
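The read path described above, in which a file-level request arrives at the Access Server and is served through block-level reads located via the Meta File System, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the project's implementation: the class names, method names, and JSON fields ("op", "user", "path", "block") are hypothetical, the Storage Servers are plain dictionaries, caching is omitted, and block contents are base64-encoded because JSON cannot carry raw bytes.

```python
import base64
import json
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for the report's components. All class,
# method, and JSON field names here are assumptions made for this sketch.


@dataclass
class StorageServer:
    """Flat store mapping block id -> raw bytes (no directory hierarchy)."""
    blocks: dict[str, bytes] = field(default_factory=dict)

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class MetaFileSystem:
    """Maps (user, path) to the file's ordered (server name, block id) pairs."""
    files: dict[tuple[str, str], list[tuple[str, str]]] = field(default_factory=dict)

    def locate(self, user: str, path: str, index: int) -> tuple[str, str]:
        if index < 0:
            raise IndexError("negative block index")
        return self.files[(user, path)][index]

    def block_count(self, user: str, path: str) -> int:
        return len(self.files[(user, path)])


class AccessServer:
    """Serves a file-level JSON request by issuing block-level reads."""

    def __init__(self, mfs: MetaFileSystem, servers: dict[str, StorageServer]):
        self.mfs = mfs
        self.servers = servers

    def handle(self, request_json: str) -> str:
        req = json.loads(request_json)
        if req.get("op") != "read_block":
            return json.dumps({"status": "error", "reason": "unsupported op"})
        try:
            user, path, index = req["user"], req["path"], int(req["block"])
            # 1. Ask the Meta File System which Storage Server holds the block.
            server_name, block_id = self.mfs.locate(user, path, index)
        except (KeyError, IndexError, TypeError, ValueError):
            return json.dumps({"status": "error", "reason": "bad request or no such block"})
        # 2. Fetch the block from that Storage Server (block-level access).
        data = self.servers[server_name].read_block(block_id)
        # 3. Return the block plus a flag telling the client whether a next block exists.
        return json.dumps({
            "status": "ok",
            "block": index,
            "has_next": index + 1 < self.mfs.block_count(user, path),
            "data": base64.b64encode(data).decode("ascii"),
        })


if __name__ == "__main__":
    mfs = MetaFileSystem({("chetna", "/notes.txt"): [("s1", "b0"), ("s2", "b1")]})
    servers = {"s1": StorageServer({"b0": b"Hello, "}), "s2": StorageServer({"b1": b"world"})}
    access = AccessServer(mfs, servers)
    print(access.handle('{"op": "read_block", "user": "chetna", "path": "/notes.txt", "block": 0}'))
```

Keeping the block-to-server mapping only in the Meta File System means the Storage Servers stay simple flat stores, and a client can page through a large file one block at a time using the "has_next" flag.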
The Access Server uses the Meta File System, which mimics the file system. When a user file needs to be created, the Access Server queries the Meta File System for the Storage Server on which it should be created, and then requests that Storage Server to create the file. The Access Server also contains a file cache and caches the blocks that a client requests or is likely to request in the near future.

Fig: Access Server Design

1.4. Storage Server

The Storage Server is the data store where all files are kept. It has an essentially flat hierarchy, and all the metadata for each file is maintained by the Meta File System. A single user file may be split up and stored in several separate physical files on the Storage Server.

1.5. System Interface

The system interface consists of a set of APIs that provide access to the functions of the storage server. Some of the interfaces provided by this platform are listed below.
1. A simple API for user login (password-based authentication).
2. APIs to create and delete files and directories.
3. APIs to read and write files at block-level granularity.
4. An API to read a directory and display its contents.

1.6. Web Client

The Web Client is a reference implementation built on this API set for a web platform. It was developed primarily to demonstrate the capabilities of this storage solution.

Screenshots from the Web Client:
Fig: User chetna’s root directory.
Fig: File editor used to edit files. The Previous and Next links fetch the previous and next blocks of data when more data is available on the server.

Important Design Decisions

This section discusses the important design decisions made during the course of this project to meet the design challenges outlined in Section x.x.

1. Scalability: This was one of the most interesting requirements to design for. I made several decisions in an attempt to make the system more scalable.
One of these was to separate the Meta File System from the Access Server, and to allow several Access Servers to access the Meta File System. While I have not tested this, the design aims to make this capability easy to add.
2. What metadata do I capture? The data structures were chosen with the following considerations in mind. The Meta File System should support:
   a. Fast indexing.
   b. User-based views (of the directory structure).
   c. Ease of building a file sharing application (or other simple applications) with minimal effort.
3. Efficiency: To keep file upload and download times reasonable, I
   a. introduced a file caching mechanism at the Access Server, and
   b. designed efficient algorithms for creating new files and allocating new data blocks.
4. System Interface: Designing the interfaces required for block-level access by a client, as well as for regular file operations such as create and delete.
5. Technology: Which technology to use was an important question that took considerable time to settle. After considering several alternatives, I chose JSON over HTTP for communication between the various subsystems, because JSON is one of the most lightweight data encoding formats.

Evaluation

1.7. Evaluation and Testing Method

1. Functional test: This covers the basic functionality of the product as a whole. The tester was able to provision storage space on the server, and to create, store, and retrieve data files using the web client application.
2. Performance test: Response times have been measured for file read and write access. These have been compared against the file access times of another standard internet file storage application, Gmail Drive, to show the speed and memory gains achieved by the block level approach over the traditional file level approach.

Main Contributions

1. Proposed a scalable, highly available architecture for a distributed internet file system.
2. Designed and implemented an efficient meta-file structure for a block level file system.
3. Implemented a simple file caching and pre-fetching mechanism to improve the efficiency of block level file access.
4. Designed and implemented APIs for accessing files block by block, for clients with resource constraints.
5. Implemented a proof-of-concept system, and built a basic web-based file storage application on this platform to demonstrate its capabilities.

Future Work

1. Persistence of the Meta File System structure.
2. An improved file caching mechanism in the Access Server.
3. An improved block allocation algorithm in the MFS.
4. A move towards a Google Docs-style implementation, in which multiple users can simultaneously write different blocks of data in the same file.

Bibliography

1. Introduction to the Open XDrive API. http://dev.aol.com/article/2007/xdrive_api
2. The Storage Delivery Network. http://nirvanix.com/platform.aspx
3. Stoyan Damov. IFS: an Internet File System implementation based on Web services and peer-to-peer technology. http://www.codeproject.com/KB/webservices/ifs.aspx
4. OpenDHT. http://www.opendht.org/