Academic Year Heisei 19 (2007)
University of Tsukuba, Third Cluster of Colleges, College of Information Sciences
Graduation Thesis

An Internet File System for Random Accessing Protected Data

Major: Computer Systems
Author: AHMAD SYAHIR BIN CHE ABDULLAH
Advisors: 板野肯三, 新城 靖, 佐藤聡, 中井央

Abstract

As the Internet has become ubiquitous, video collaboration over the Internet has become very common. For example, in sports coaching, video data is shared among scattered organizations. Most of this video data is confidential and needs to be protected. Digital Rights Management (DRM) is typically used to distribute protected video data. Representative DRM products include Windows Media DRM, Helix DRM, FairPlay DRM, and DReaM. Existing DRM mechanisms for video have a problem with random access, which is essential for video seeking. This problem is inherent in existing DRM mechanisms because they are optimized for the majority of users, who watch video sequentially. In this paper, we propose a new data protection mechanism for accessing protected data through the Internet that allows random access.

Table of Contents

Chapter 1 Introduction
Chapter 2 Related Works
  2.1 Digital Rights Management
    2.1.1 Windows Media DRM
    2.1.2 Apple’s FairPlay
  2.3 DirectShow API
  2.4 Shell Namespace Extension
Chapter 3 Overview of our protected data distribution
Chapter 4 VFS Module
  4.1 Dokan Library
    4.1.1 Dokan’s component
  4.2 User-level Module for Dokan
    4.2.1 Global variables
    4.2.1 Items class
    4.2.2 CreateFile() function
    4.2.3 OpenDirectory() function
    4.2.4 GetFileInformation() function
    4.2.5 ReadFile() function
  4.3 Secure Channels
    4.3.1 Named Pipe
    4.3.2 Filename Extension
  4.4 Performance Issues
    4.4.1 Pooled Connections
    4.4.2 Cache and Prefetch
Chapter 5 Server-side Programs
  5.1 Persistent Connections
  5.2 Byte serving
Chapter 6 Encryption and Decryption
  6.1 Advanced Encryption Standard
  6.2 Counter mode
    6.1.1 Padding
  6.3 Token
  6.4 Encryption software
    6.3.1 Multithreads
  6.5 Decryption
Chapter 7 Experiments
  7.1 Experiment environment
  7.2 Encryption Rate
  7.3 Transfer rate
Chapter 9 Integration with Movie Database System of Japan Institute of Sports Sciences
  9.1 Current data distribution mechanism in JISS movie database system
  9.2 New data distribution mechanism proposed to JISS movie database system
  9.3 The advantage compared to current mechanism
Chapter 10 Conclusions
Acknowledgements
References

Table of Figures

Figure 1: Data flow and token distribution
Figure 2: Dokan library working in Windows kernel
Figure 3: Global variables
Figure 4: Items class source code
Figure 5: CreateFile() function code
Figure 6: GetFileInformation() code
Figure 7: ReadFile() function code
Figure 8: Partial of GetNamedPipe() code
Figure 9: CreateFile() function code when using filename extension
Figure 10: Counter mode encryption
Figure 11: Portion of Encrypt() function
Figure 12: Graph showing encryption rate using different thread count
Figure 13: Graph of transfer rate using different methods
Figure 14: Graph of CPU usage of different method
Figure 15: Current data distribution mechanism in movie database system of JISS
Figure 16: New data distribution mechanism proposed to movie database system of JISS

Table of Tables

Table 1: Multithreads calculation

Chapter 1 Introduction

The Internet is ubiquitous these days and has become essential for people who collaborate. Sharing data over the Internet for easy access is therefore very common. For example, in sports coaching, video data is shared with annotations among scattered organizations [1]. Most of this video data is confidential and needs to be protected. Digital Rights Management (DRM) is typically used to distribute protected video data. Representative DRM products include Windows Media DRM, Helix DRM, FairPlay DRM, and DReaM. Existing DRM mechanisms for video have a problem with random access, which is essential for video seeking. This problem is inherent in existing DRM mechanisms because they are optimized for the majority of users, who watch video sequentially.
Concretely, these existing DRM mechanisms use a large buffer and often download the entire video in advance. This feature helps to mitigate jitter, high latency, and low throughput. However, it is not suitable for applications that need random access.

In this paper, we propose a new data protection mechanism for accessing protected data through the Internet. In this mechanism, instead of creating a whole new player or video format, we reuse existing media players for the Windows operating system (OS) without large modifications to the applications and with no modification to the OS. Instead, we extend the OS at the Virtual File System (VFS) layer. To simplify the implementation of the VFS module, we use a framework for user-level file systems called Dokan [2]. Upon execution, the VFS module mounts a web server as a new logical drive. The logical drive acts as a pipe to access the protected files on the web server. Some media players do not support opening a file on a remote server. Even when they do, they provide no random access capability for the remote file. From the side of an application that accesses files on the logical drive, the files appear to be local files, and this enables almost any application to open them with random access capability.

However, providing random access capability is not the only goal of this research. We also want to provide access control for cases in which a file is protected or classified. In other words, we do not want any malicious application to access the files on the mounted logical drive. This can be done by hiding the logical drive, or by reporting that the drive contains no files to every application except the legitimate one. To make the security comparable with other DRM mechanisms, the files placed on the web server are encrypted. This protects the files even if a malicious user knows a file's URL: even if the user downloads the file, he or she cannot open it without decrypting it first.
All of the necessary work (accessing the file on the web server, decrypting it, and passing it to the legitimate application) is done seamlessly by the VFS module that we introduce in this paper. To provide encryption while maintaining the random access property, this research uses the Advanced Encryption Standard (AES) in a special mode called counter mode. It is not only fast to encrypt with but also provides strong protection. The VFS module obtains the legitimate key from the legitimate application through a secure channel and decrypts the file block by block.

Our protected data distribution mechanism is also flexible. If the protected data is a media file, it is no longer limited to a proprietary file format that depends on a proprietary media player. For example, with Microsoft DRM, the user must use files in the WMV or WMA format, which can only be played in Windows Media Player. Other media players are supported only if their developers obtain proper authorization from Microsoft. The same goes for Apple FairPlay, which requires the media player to use QuickTime. In other words, most of these DRM mechanisms are closed source, limited, and not customizable.

The rest of the thesis is organized as follows. Chapter 2 discusses related work, including Digital Rights Management, the Microsoft DirectShow API, Shell Namespace Extensions, and File system in User Space (FUSE). Chapter 3 gives a more detailed overview of our protected data distribution mechanism. Chapter 4 elaborates on the VFS module of our mechanism and its performance issues. Chapter 5 describes the server-side software and its requirements. Chapter 6 elaborates on the encryption and decryption used in this mechanism. Chapter 7 examines the system functionality and performance. In Chapter 8, we propose the integration of this mechanism into the SmartSystem of the Japan Institute of Sports Sciences.
Chapter 9 concludes this thesis and discusses future work.

Chapter 2 Related Works

2.1 Digital Rights Management

Digital Rights Management (DRM) refers to access control technologies used by publishers and copyright holders to limit usage of digital media or devices. It may also refer to restrictions associated with specific instances of digital works or devices. To some extent, DRM overlaps with copy protection, but DRM usually applies to creative media (music, films, etc.) whereas copy protection typically refers to software.

In DRM, data files (usually audio and video) are encrypted or wrapped in encrypted containers. An authentic player receives the key to decrypt the data. However, there are several problems with this kind of DRM. Most of these DRM mechanisms require a proprietary file format and special software or a special API. For example, Microsoft DRM, also known as Windows Media DRM, requires the user to use Microsoft's proprietary codecs such as WMV or WMA, and the protected media can only be played in Microsoft's proprietary media player. Other media players may support the DRM if their developers obtain authorization from Microsoft. The same is true of Apple’s FairPlay, which limits the file format to MP4 with the AAC codec and only allows playback using QuickTime. These mechanisms are limited to audio and video files and do not cover other formats such as document files.

There are also problems with how the protected data is delivered. Most of these DRM mechanisms target downloaded data. Some of them also support direct streaming from the server, but they usually use a large buffer to ensure smooth playback. Each time the user seeks in the video, the media player buffers data before it starts playing. In addition, random access is supported only for a limited set of file formats. This is a disadvantage because the user needs to convert files to a specific format. Some DRM mechanisms also have only one key or a limited set of keys for all files, which is insecure.
A typical DRM mechanism is also built to restrict how the user plays the media, not to protect the media itself. For example, it prohibits the user from playing the media on an unregistered machine, or limits the play count. The most well-known DRM mechanisms are Microsoft’s Windows Media DRM and Apple’s FairPlay.

2.1.1 Windows Media DRM

Windows Media DRM is a Digital Rights Management service for the Windows Media platform. It is designed to provide secure delivery of audio and/or video content over an IP network to a PC or other playback device in such a way that the distributor can control how that content is used. It uses a combination of elliptic curve cryptography key exchange, the DES block cipher, a custom block cipher dubbed MultiSwap, the RC4 stream cipher, and the SHA-1 hashing function.

Windows Media DRM is designed to be renewable, that is, it is designed on the assumption that it will be cracked and must be constantly updated by Microsoft. The result is that while the scheme has been cracked several times, it has usually not remained cracked for long.

2.1.2 Apple’s FairPlay

FairPlay is a DRM technology created by Apple Inc., based on technology created by the company Veridisc. FairPlay is built into the QuickTime multimedia software and used by the iPhone, iPod, iTunes, and iTunes Store. Any protected song purchased from the iTunes Store with iTunes is encoded with FairPlay. FairPlay digitally encrypts AAC audio files and prevents users from playing these files on unauthorized computers.

FairPlay protected files are regular MP4 container files with an encrypted AAC audio stream. The audio stream is encrypted using the AES algorithm in combination with MD5 hashes. The master key required to decrypt the encrypted audio stream is also stored in encrypted form in the MP4 container file. The key required to decrypt the master key is called the "user key." Each time a customer uses iTunes to buy a track, a new random user key is generated and used to encrypt the master key.
The random user key is stored, together with the account information, on Apple’s servers, and also sent to iTunes. iTunes stores these keys in its own encrypted key repository. Using this key repository, iTunes is able to retrieve the user key required to decrypt the master key. Using the master key, iTunes is able to decrypt the AAC audio stream and play it.

When a user authorizes a new computer, iTunes sends a unique machine identifier to Apple’s servers. In return it receives all the user keys that are stored with the account information. This ensures that Apple is able to limit the number of computers that are authorized and makes sure that each authorized computer has all the user keys that are needed to play the tracks that were bought.

2.3 DirectShow API

DirectShow is a multimedia framework and API provided by Microsoft for software developers to perform various operations with media files [3]. Based on the Microsoft Windows Component Object Model (COM) framework, DirectShow provides a common interface for media across many of Microsoft's programming languages, and is an extensible filter-based framework that can render media files on demand by applications.

DirectShow divides the processing of multimedia tasks such as video playback into a set of steps known as filters. Each filter represents a stage in the processing of the data. Filters have a number of input and output pins which connect them together. The generic design of the connection mechanism means that filters can be connected in many different ways for different tasks to build a filter graph, and developers can add custom effects or other filters at any stage in the graph and then render the file, URL, or camera.

Most video-related applications on Windows, including not only Microsoft's Windows Media Player but also most third-party applications, use DirectShow to manage multimedia data. However, DirectShow has a problem with random access.
Applications that use the DirectShow API can perform random access on local files but not on files opened over HTTP. In other words, video seeking does not work when applications open files over HTTP.

2.4 Shell Namespace Extension

In the Windows environment, there are several ways to implement a new file system. The easiest way is through user-land Shell namespace extensions [4]. With a namespace extension, a software developer can take any data and have Windows Explorer present it to the user as a virtual folder. When a user browses into this folder, the data is presented as a tree-structured hierarchy of folders and files, much like the rest of the Shell namespace. Users and applications are able to interact with the contents of this virtual folder in much the same way as with any other namespace object.

Behind the scenes, every folder that Windows Explorer displays is represented by a Component Object Model (COM) object called a folder object. Each time the user interacts with a folder or its contents, the Shell communicates with the associated folder object through one of a number of standard interfaces. The folder object then does whatever is necessary to respond to the user's action, and the Shell updates the Windows Explorer display. The majority of the files and folders that users interact with are part of the file system or a system virtual folder such as the Recycle Bin.

To implement a namespace extension, the information must be organized as a tree-structured namespace. The namespace root is presented as a virtual folder in the Shell namespace. The root folder, and all its subfolders and data items, become part of the Shell namespace, and Windows Explorer becomes the user interface. Developers can thus present their information to the user in a familiar and readily accessible way with much less UI programming than would be required for a custom application.
The availability of shell namespace extension file system toolkits [5] eases the implementation of a file system using a namespace extension. The most notable file system using a namespace extension is GMail Drive [6], which creates a virtual file system around a Google Mail account, allowing the user to use Gmail as a storage medium. However, despite this ease, a file system implemented using this method does not support the lowest-level file system access API in Windows, including the DirectShow API. The file system cannot be mapped to a drive letter. It is also inaccessible through command-line tools, although it can be accessed using Windows Explorer. Therefore, not all applications are able to access file systems that are implemented as namespace extensions.

2.5 File system in User Space

File system in User space (FUSE) [] is a free Unix kernel module, released under the GPL and the LGPL, that allows non-privileged users to create their own file systems without editing the kernel code. This is achieved by running the file system code in user space, while the FUSE module only provides a "bridge" to the actual kernel interfaces. FUSE was officially merged into the mainstream Linux kernel tree in kernel version 2.6.14.

FUSE is particularly useful for writing virtual file systems. Unlike traditional file systems, which essentially save data to and retrieve data from disk, virtual file systems do not actually store data themselves. They act as a view or translation of an existing file system or storage device. In principle, any resource available to a FUSE implementation can be exported as a file system. FUSE is available for Linux, FreeBSD, NetBSD (as PUFFS), OpenSolaris and Mac OS X.

The Dokan library used in this mechanism is similar to FUSE, except that it runs in the Windows environment.
Chapter 3 Overview of our protected data distribution

[Figure 1: Data flow and token distribution. Labels in the figure: Authentication server, Web server, Encrypted files, Admin PC, Encryption software, Original file, Encrypted file, User PC, Application, User-level VFS module, Windows kernel, Kernel level VFS module, Filename, Token, Secure channel, Internet.]

In this mechanism, we make use of an existing application or media player with small modifications and with no modification to the OS. Instead, we extend the OS at the virtual file system (VFS) layer. We try to overcome most of the problems in existing DRM mechanisms by providing protection at the file system level. We also work around the random access problem that the DirectShow API has with remote files: instead of using DirectShow's URL access API, we make the remote file appear as a local file. All the processing needed to access the remote file is done by the VFS module. The VFS module introduced in this paper also does not use the namespace extension approach. In its place, we use Installable File System (IFS), a file system API in Microsoft Windows that enables the OS to recognize and load drivers for file systems.

Our mechanism consists of several components: encryption software, an authentication server, a web server, and a virtual file system module, as shown in Figure 1. When the administrator wants to upload a file to the web server, he or she needs to use the encryption software. The encryption software generates a random key and nonce and encrypts the given file. After the upload finishes, the software registers the filename, original file, file’s key, and file’s nonce with the authentication server. All this information is packed as a token and stored in the authentication server. We elaborate on the details of the token in Chapter 6, Encryption and Decryption.

When a user runs the legitimate application, the application logs into the authentication server to receive the filename/token pair.
The application also triggers the execution of the VFS module, which mounts a virtual logical drive. Upon opening a file, the application passes the token to the VFS module through a secure channel. The VFS module accesses the data on the web server, decrypts it using the token received from the application, and sends it back to the application through the file system API.

This paper focuses on the following programs: the VFS module, the encryption software, and applications. We do not discuss the authentication server, as we assume that we can use an existing token distribution mechanism such as Smart System [1].

Chapter 4 VFS Module

The virtual file system (VFS) layer is an abstraction layer on top of more concrete file systems. The purpose of VFS is to allow applications to access different types of concrete file systems in a uniform way. VFS specifies an interface (or a "contract") between the kernel and a concrete file system. Therefore, new file system types can be added to the kernel by fulfilling the contract.

For this mechanism, instead of using a limited shell namespace extension, we use a more reliable approach, Installable File System (IFS). IFS is a file system API in Microsoft Windows that enables the OS to recognize and load kernel modules for file systems. Implementing an IFS in Windows is hard work because it involves kernel programming. To simplify this, we use the Dokan library. The IFS used in this research differs from a normal IFS in that it is separated into two components: a kernel-level module and a user-level module.

4.1 Dokan Library

In this research, we use the Dokan library [2] to simplify kernel-level programming. The Dokan library contains a user-mode dynamic-link library (DLL), dokan.dll, and a kernel-mode file system driver, dokan.sys. Once the Dokan file system driver is installed, users can create file systems that are seen as normal file systems in Windows.
In this paper, we refer to an application that creates a file system using the Dokan library as a user-level module for Dokan. File operation requests from user programs (e.g., CreateFile, ReadFile, WriteFile, …) are sent to the Windows I/O subsystem (which runs in kernel mode), which subsequently forwards the requests to the Dokan file system driver (dokan.sys). By using functions provided by the Dokan user-mode library (dokan.dll), file system applications can register callback functions with the file system driver. The file system driver invokes these callback routines in order to respond to the requests it receives. The results of the callback routines are sent back to the user program. For example, when a Windows application requests to open a directory, the OpenDirectory request is sent to the Dokan file system driver, and the driver invokes the OpenDirectory callback provided by the user-level module. The results of this routine are sent back to the Windows application as the response to the OpenDirectory request. Therefore, the Dokan file system driver acts as a proxy between user programs and file system applications. Dokan is written in C, and the user-level module can be written in C, or in Ruby or C# using the provided language bindings.

[Figure 2: Dokan library working in Windows kernel. Labels in the figure: Application, User-level module, Windows kernel, Dokan file system driver.]

4.1.1 Dokan’s component

Dokan itself consists of several main components:

  dokan.dll      Dokan user-mode library. It provides functions to the user-level module.
  dokan.sys      Dokan file system driver. It stays at kernel level and invokes the callback functions provided by the user-level module.
  mounter.exe    Dokan mounter service. It runs as a service and mounts a virtual drive when the mount function is invoked.
  dokanctl.exe   Dokan control program. The user may use this program to dismount the mounted drive if the user-level module ends unexpectedly.
  dokan.lib      Dokan import library.
  dokan.h        Dokan library header.
  DokanNet.dll   Library for the .NET binding. This is required to write a user-level module in C#.

4.1.2 Callback function in Dokan library

The Dokan library provides the callback functions necessary to create a full-featured file system; however, in this mechanism we use only several of them.

  Function name        Parameters
  CreateFile           string filename, FileAccess access, FileShare share,
                       FileMode mode, FileOptions options, DokanFileInfo info
  OpenDirectory        string filename, DokanFileInfo info
  Cleanup              string filename, DokanFileInfo info
  CloseFile            string filename, DokanFileInfo info
  ReadFile             string filename, byte[] buffer, ref uint readBytes,
                       long offset, DokanFileInfo info
  WriteFile            string filename, byte[] buffer, ref uint writtenBytes,
                       long offset, DokanFileInfo info
  FlushFileBuffers     string filename, DokanFileInfo info
  GetFileInformation   string filename, FileInformation fileinfo, DokanFileInfo info
  FindFiles            string filename, ArrayList files, DokanFileInfo info
  Unmount              DokanFileInfo info

4.2 User-level Module for Dokan

We implement the user-level module for Dokan in the C# language. As this file system focuses on read-only capability, we use only the callback functions that are needed. Functions related to creating folders and writing files return -1 (error). Typically, when an application opens a file from the file system, CreateFile(), OpenDirectory(), GetFileInformation(), ReadFile(), Cleanup(), and CloseFile() are invoked in sequence, so we focus only on these callback functions and some other specific functions.
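The read-only policy of Section 4.2 can be illustrated with a small stand-alone sketch. It is not the thesis implementation and does not use the Dokan .NET binding: the class ReadOnlyModuleSketch, its simplified signatures, the Trace list, and the WriteFile and CreateDirectory stubs are hypothetical names introduced only for illustration. The sketch assumes that callbacks on the read path return 0 and that callbacks that would modify the file system return -1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a user-level module; it does not reference DokanNet.
class ReadOnlyModuleSketch
{
    public const int Ok = 0;
    public const int Error = -1;

    // Records the order in which the read-path callbacks are invoked.
    public readonly List<string> Trace = new List<string>();

    // Callbacks on the read path record the call and report success.
    public int CreateFile(string filename) { Trace.Add("CreateFile"); return Ok; }
    public int OpenDirectory(string filename) { Trace.Add("OpenDirectory"); return Ok; }
    public int GetFileInformation(string filename) { Trace.Add("GetFileInformation"); return Ok; }
    public int ReadFile(string filename, byte[] buffer, long offset) { Trace.Add("ReadFile"); return Ok; }
    public int Cleanup(string filename) { Trace.Add("Cleanup"); return Ok; }
    public int CloseFile(string filename) { Trace.Add("CloseFile"); return Ok; }

    // Callbacks that would modify the file system are rejected.
    public int WriteFile(string filename, byte[] buffer, long offset) { return Error; }
    public int CreateDirectory(string filename) { return Error; }
}

static class ReadOnlyDemo
{
    // Simulates the callback sequence that the text describes for opening one file.
    public static ReadOnlyModuleSketch OpenAndRead(string filename)
    {
        var module = new ReadOnlyModuleSketch();
        module.CreateFile(filename);
        module.OpenDirectory("\\");
        module.GetFileInformation(filename);
        module.ReadFile(filename, new byte[16], 0);
        module.Cleanup(filename);
        module.CloseFile(filename);
        return module;
    }
}
```

In a real module the driver, not the module itself, decides the order of the calls; the fixed sequence in OpenAndRead only mirrors the typical order listed in the text.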
4.2.1 Global variables

    private int count_;
    private string host_ = "http://server";
    public static Hashtable filetable = new Hashtable();
    public static TcpClient c = null;
    public static Stream s = null;
    public static SimpleEncoding se = null;
    public static StreamReader r = null;
    public static StreamWriter w = null;

Figure 3: Global variables

As shown in Figure 3, the variable count_ is the counter for the Dokan file handle. It is incremented each time CreateFile() is invoked. The variable host_ holds the web server hostname. The variable filetable is a hashtable that stores file meta information, using the filename as the key and an Items object as the value. The remaining static variables are used to create a pooled connection. This is elaborated on in the description of the ReadFile() function.

4.2.1 Items class

The Items class stores a file’s meta information: the filename, the file attribute (whether it is a file or a directory), the file size, the file’s key, the file’s nonce, and the process ID (PID) of the application that opened the file. Details about the key and nonce are given in Chapter 6.
    class Items
    {
        public string name;
        public FileAttributes type;   // file or directory attribute, read by GetFileInformation()
        public int size;
        public byte[] key;
        public byte[] nonce;
        public int PID;
    }

Figure 4: Items class source code

4.2.2 CreateFile() function

    public int CreateFile(String filename, FileAccess access, FileShare share,
                          FileMode mode, FileOptions options, DokanFileInfo info)
    {
        // Convert the Windows path into a URL path: backslashes become slashes.
        string path = HttpUtility.UrlPathEncode(filename.Replace("\\", "/"));
        info.Context = count_++;   // store the handle counter, then increment it

        // Opening the root directory always succeeds.
        if (path.Equals("/"))
        {
            info.IsDirectory = true;
            return 0;
        }

        // Unknown file: ask the opening application for its token over the named pipe.
        if (!filetable.ContainsKey(path))
        {
            ulong PID = GetNamedPipe(path);
            if (PID == 0 || PID != info.ProcessID)
                return -DokanNet.ERROR_FILE_NOT_FOUND;
        }

        try
        {
            // Check that the file exists on the web server.
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host_ + path);
            request.Accept = "text/plain";   // must be set before the request is sent
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.OK)
                    return 0;
                else
                    return -DokanNet.ERROR_FILE_NOT_FOUND;
            }
        }
        catch
        {
            return -DokanNet.ERROR_FILE_NOT_FOUND;
        }
    }

Figure 5: CreateFile() function code

When the application creates a file handle, this function is invoked. The first thing the function does is convert the backslashes in the filename to slashes. This is necessary because URLs use the slash as a delimiter instead of the backslash used by Windows. The function then stores the counter in the file's Context (info.Context) and increments the counter. After that, it checks whether the application is opening a file or the root directory. If it is opening the root directory, the function just sets IsDirectory in the Dokan file information to true and returns 0. Otherwise, it checks whether the filename key exists in filetable. If the key does not exist, it invokes the GetNamedPipe() method. GetNamedPipe() sends a request to the pipe server, which is the application that opened the file. If the application returns the proper token, GetNamedPipe() fills filetable with the received data and returns the application's PID.
Otherwise it returns 0. GetNamedPipe() is described in more detail in Section 4.3.1 (Named Pipe). If the returned PID is 0, or differs from the PID of the application that opened the file, CreateFile() returns a file-not-found error. Finally, CreateFile() requests the file from the web server to verify that it exists there. If the file exists it returns 0, otherwise it returns a file-not-found error.

4.2.3 OpenDirectory() function

The file system appears empty, because it does not allow access by any application other than the legitimate application, and the legitimate application already knows which files exist. In other words, applications are not allowed to list directories. However, to avoid problems when opening a file, OpenDirectory() always returns 0 (success).

4.2.4 GetFileInformation() function

    public int GetFileInformation(
        String filename,
        FileInformation fileinfo,
        DokanFileInfo info)
    {
        string path = HttpUtility.UrlPathEncode(filename.Replace("\\", "/"));
        if (!filetable.ContainsKey(path))
            return -1;
        Items file = (Items)filetable[path];
        fileinfo.Attributes = file.type;   // file attribute described in 4.2.1 (field not shown in Figure 4)
        fileinfo.CreationTime = DateTime.Now;
        fileinfo.LastAccessTime = DateTime.Now;
        fileinfo.LastWriteTime = DateTime.Now;
        fileinfo.Length = file.size;
        return 0;
    }

Figure 6: GetFileInformation() code

This function is invoked when the application that opened the file requests the file information. GetFileInformation() takes the file information from the filetable hashtable, fills in the fileinfo parameter and returns 0. Because the file times are not important, they are simply set to the current time. If the requested filename is not in the hashtable, the function returns -1 (error).

4.2.5 ReadFile() function

    public int ReadFile(String filename, Byte[] buffer, ref uint readBytes,
        long offset, DokanFileInfo info)
    {
        string path = HttpUtility.UrlPathEncode(filename.Replace("\\", "/"));
        if (!filetable.ContainsKey(path))
            return -1;
        Items file = (Items)filetable[path];
        int filesize = file.size;
        byte[] key = file.key;
        byte[] nonce = file.nonce;
        if ((int)offset >= filesize) {
            readBytes = 0;
            return 0;
        }
        ... calculate block counts ...
        ... get position of block in file ...
        ... find new offset ...
        byte[] encryptedData = AccessData(path, offset, bytesize);
        byte[] decryptedData = Decrypt(encryptedData, firstBlock, key, nonce,
            filesize, offsetInFirstBlock);
        Array.Copy(decryptedData, buffer, decryptedData.Length);
        readBytes = (uint)decryptedData.Length;
        return 0;
    }

Figure 7: ReadFile() function code

This is the most important function, since almost all of the ideas of this research go into it. Like the other callback functions, it converts the backslashes in the filename into slashes. It then looks up the meta information in the hashtable. If the lookup fails, it returns -1 (error). The meta information used in this function is the file size, the file key and the file nonce. Next it verifies that the offset is still within the file size. If it is not, the function returns 0 (success) with readBytes set to 0. This check is vital to avoid errors when accessing data on the web server. After these checks, it calculates the block number for the given offset and how much data must be fetched in order to decrypt it properly. It then uses the results of this calculation to fetch the data portion from the web server. The AccessData() function is described in Section 4.4. AccessData() returns an array of bytes, which is stored in the encryptedData variable.
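
The block calculation that Figure 7 elides can be illustrated as follows. This is only a minimal sketch under stated assumptions, not the thesis implementation: the class name BlockRange, the method Align() and the parameter paddedFileSize are hypothetical names introduced here, and the 16-byte block size is taken from Chapter 6. It shows how a requested offset and length can be widened to whole cipher blocks.

```csharp
using System;

// Hypothetical helper (not part of the thesis code): widens a read request
// to whole 16-byte cipher blocks so that every fetched block can be decrypted.
static class BlockRange
{
    const int BlockSize = 16; // AES block size in bytes

    public static void Align(long offset, int length, long paddedFileSize,
        out long firstBlock, out int offsetInFirstBlock, out int byteCount)
    {
        firstBlock = offset / BlockSize;                  // block containing the first requested byte
        offsetInFirstBlock = (int)(offset % BlockSize);   // position of that byte inside the block
        long lastBlock = (offset + length - 1) / BlockSize;
        // Never ask for more than the encrypted (padded) file contains.
        long alignedEnd = Math.Min((lastBlock + 1) * BlockSize, paddedFileSize);
        byteCount = (int)(alignedEnd - firstBlock * BlockSize);
    }

    static void Main()
    {
        long first; int inner, count;
        // 30 bytes at offset 1000 of a file whose encrypted size is 1040 bytes
        Align(1000, 30, 1040, out first, out inner, out count);
        Console.WriteLine("{0} {1} {2}", first, inner, count); // prints: 62 8 48
    }
}
```

In this example the request covers blocks 62 to 64, so 48 bytes starting at byte 992 would be downloaded, and the caller would skip the first 8 decrypted bytes.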
Decrypt() is then called to decrypt the downloaded portion. Its parameters are encryptedData, the index of the first block, the file key, the file nonce, the file size and the position of the requested offset inside the first block. Decrypt() is described in Section 6.5. The returned array of decrypted bytes is copied into buffer using the length of decryptedData, which in some cases is shorter than the buffer length. The length of decryptedData is also returned as readBytes.

4.3 Secure Channels

It is vital to prevent malicious applications from opening the protected files. To achieve this, a file needs to be encrypted and the legitimate application must pass to the VFS module the token that is needed to decrypt the requested file. To pass the token, a secure channel between the VFS module and the application needs to be established. We are considering two ways to implement this: a named pipe and a filename extension.

4.3.1 Named Pipe

A named pipe is a named, one-way or duplex pipe (shared memory) for communication between a pipe server and one or more pipe clients. All instances of a named pipe share the same pipe name, but each instance has its own buffers and handles, and provides a separate conduit for client/server communication. In our case, the application acts as the pipe server and creates a named pipe that can be accessed by the VFS module.

    public ulong GetNamedPipe(String filename)
    {
        IInterProcessConnection clientConnection = null;
        try
        {
            clientConnection = new ClientPipeConnection("PipeName", ".");
            clientConnection.Connect();
            clientConnection.Write(filename);
            string base64data = clientConnection.Read();
            clientConnection.Close();
            ... omitted (decoding base64data, split it, decrypt and put it into hashtable) ...
            return PID;
        }
        catch
        {
            if (clientConnection != null)   // the constructor itself may have failed
                clientConnection.Dispose();
            return 0;
        }
    }

Figure 8: Part of the GetNamedPipe() code

The .NET library we use does not include named pipe support, so we make use of a freely available named pipe library for C# [].
When it starts, the legitimate application creates a named pipe and listens on it. As shown in Figure 8, whenever the application opens a file on the file system, the user-level VFS module connects to the pipe named "PipeName" and sends a request to the application with the filename as the parameter. The application replies with a token consisting of the filename, file size, file key, file nonce and its own PID. All of these elements are encrypted, converted to base64 strings and concatenated with delimiters, which makes them easy to transfer across the named pipe. On receiving the response, the function reverses the procedure performed by the application: it splits the string, converts the parts back to byte arrays and decrypts them. It then puts the resulting data into the filetable hashtable. If the data is inserted successfully, it returns the PID. If any error occurs, it returns 0. Possible errors are that the string cannot be split, that data is missing (for example, the string contains no nonce or no key), or that the received key is not long enough.

4.3.2 Filename Extension

     1  public int CreateFile(String filename, FileAccess access, FileShare share,
     2      FileMode mode, FileOptions options, DokanFileInfo info) {
     3    string path = HttpUtility.UrlPathEncode(filename.Replace("\\", "/"));
     4    info.Context = count_++;
     5    if (path.Equals("/"))
     6    {
     7      info.IsDirectory = true;
     8      return 0;
     9    }
    10    int semiColon = path.LastIndexOf(";");
    11    if (semiColon > -1) path = path.Substring(0, semiColon);
    12    if (!filetable.ContainsKey(path)) {
    13      string token = filename.Substring(filename.LastIndexOf(";") + 1);  // index taken on the unencoded name
    14      ... convert token to bytes arrays, decrypts it and put it into the hashtable ...
    15      ... omitted ...
     .
     .
    24    }
    25    try {
    26      ... omitted ..

Figure 9: CreateFile() function code when using filename extension

In this method, a token is passed together with the filename when a file is opened.
For example, to open a file named "sports.wmv", the legitimate application must open "sports.wmv;MithJmRCTHVCNzIzcyhEZ3N0S0orWmNAdw==" instead, where "MithJmRCTHVCNzIzcyhEZ3N0S0orWmNAdw==" is the base64-encoded token and the semicolon is the delimiter. With this method the CreateFile() function is slightly different, as shown in Figure 9. Instead of calling GetNamedPipe(), it splits the requested filename into the real filename and the token, as shown in lines 10 and 11 of Figure 9. The other functions also need these lines if the filename extension method is used. After that, it follows the same procedure as GetNamedPipe(): it splits the token at the delimiter, converts the base64 strings to byte arrays, decrypts them and puts the result into the hashtable. However, this method has a limitation. In Windows, a filename is limited to 255 characters on Windows XP and 260 characters on Windows Vista. Because the token itself can exceed 40 characters, a long filename combined with its token may exceed this limit, which prevents the application from opening the file.

4.4 Performance Issues

In file systems, performance always matters. To avoid adding high latency to the whole system, we perform several optimizations.

     1  if (c == null || !c.Connected)
     2  {
     3    c = new TcpClient(host_, port_);   // host_ must be a bare hostname here
     4    s = (Stream)c.GetStream();
     5    se = new SimpleEncoding();
     6    r = new StreamReader(s, se);
     7    w = new StreamWriter(s, se);
     8  }
     9  w.WriteLine("GET " + HttpUtility.UrlPathEncode(path) + " HTTP/1.1");
    10  w.WriteLine("HOST: " + host_);
    11  w.WriteLine("Accept: */*");
    12  w.WriteLine("Keep-Alive: 1000");
    13  w.WriteLine("Connection: keep-alive");
    14  w.WriteLine("range: bytes=" + offset + "-" + (offset + bytesize - 1));  // end position is inclusive
    15  w.WriteLine("");
    16  w.Flush();
    17  WebHeaderCollection h = new WebHeaderCollection();
    18  while (true)
    19  {
    20    try
    21    {
    22      string str = r.ReadLine();
    23      if (str == null || str.Length == 0) break;
    24      if (str.IndexOf(":") != -1) h.Add(str);
    25    }
    26    catch (Exception)
    27    {
     .      ... omitted ... (reconnected if the connection closed)
    40    }
    41  }
    42  int len = int.Parse(h["Content-Length"]);
    43  char[] tmp = new char[len];
    44  int offset2 = 0;
    45  while (true)
    46  {
    47    int ret = r.Read(tmp, offset2, tmp.Length - offset2);
    48    if (ret <= 0) break;
    49    offset2 += ret;
    50    if (offset2 >= len) break;
    51  }
    52  byte[] data = se.GetBytes(tmp);
    53  return data;

Figure 10: Portion of AccessData() function code

4.4.1 Pooled Connections

An application usually accesses data in small chunks from random positions in the file. This leads to many ReadFile() requests in a very short time, and if each ReadFile() call created a new connection to fetch its data portion from the Internet, every call would pay the cost of connection setup. To avoid the delay of establishing a new connection, we use pooled connections to the server. In other words, the VFS module reuses the same connection to access several portions of data from the same file. Since we found no ready-made method in .NET for creating the pooled connection we need, we created one ourselves using the TcpClient class. We use a custom SimpleEncoding class to detect the stream encoding and return the proper supported encoding. The stream obtained from GetStream() is wrapped in a StreamWriter to send the request and a StreamReader to read the response. Lines 9 to 16 in Figure 10 send the request header to the web server. After that, the AccessData() function reads the response header from the web server, as shown in lines 17 to 41. We use try and catch to work around errors caused by a lost connection. If the connection is lost because of a timeout, or because the keep-alive connection limit has been reached, we re-establish the connection. Afterwards the function reads the response stream and returns it.

4.4.2 Cache and Prefetch

Since we implement a decryption capability, data needs to be downloaded in units of the block size instead of the requested buffer size. The VFS module therefore downloads slightly more data than the requested buffer size.
Data chunks are cached first and then decrypted before the module sends them to the application. In addition, when audio and video files are played, either from the beginning or after seeking within the file, data is usually accessed sequentially. We therefore improve performance by prefetching. The algorithm guesses which data will be needed next, fetches it and stores it in memory. Each time the user skips backward or forward, a new prefetching session is started. To support prefetching properly, we run the prefetch function in a dedicated thread.

Chapter 5 Server-side Programs

Any web server application that supports HTTP version 1.1 can be used as the web server. Examples include Apache, Microsoft IIS and lighttpd. HTTP 1.1 is vital because it contains what we need to make random access possible: since version 1.1, HTTP supports persistent connections (keep-alive) and range requests (byte serving) []. Range requests are needed for random access, and persistent connections are essential for pooled connections.

5.1 Persistent Connections

In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair. HTTP/1.1 introduced a keep-alive mechanism, in which a connection can be reused for more than one request. Such persistent connections reduce lag considerably, because the client does not need to re-negotiate the TCP connection after the first request has been sent. The advantages of persistent connections are:

- less CPU and memory usage (because fewer connections are open simultaneously)
- HTTP pipelining of requests and responses becomes possible
- reduced network congestion (fewer TCP connections)
- reduced latency in subsequent requests (no handshaking)
- errors can be reported without the penalty of closing the TCP connection

5.2 Byte serving

Byte serving is the process of sending only a portion of an HTTP/1.1 message from a server to a client.
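
In HTTP/1.1, a byte-serving request is expressed with the Range header, whose start and end positions are both inclusive; a server that honours it answers with status 206 (Partial Content). The following is a minimal sketch of building such a header for a given offset and byte count. The class name RangeSketch and the method RangeHeader() are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical helper: builds an HTTP/1.1 Range header for "byteCount" bytes
// starting at "offset". Both positions in a byte range are inclusive.
static class RangeSketch
{
    public static string RangeHeader(long offset, int byteCount)
    {
        if (offset < 0 || byteCount <= 0)
            throw new ArgumentOutOfRangeException();
        long lastByte = offset + byteCount - 1;   // inclusive end position
        return "Range: bytes=" + offset + "-" + lastByte;
    }

    static void Main()
    {
        // 48 bytes (three 16-byte blocks) starting at byte 992
        Console.WriteLine(RangeHeader(992, 48)); // prints: Range: bytes=992-1039
    }
}
```

Because the end position is inclusive, a header built as offset + byteCount without subtracting one would request one byte more than intended.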
Clients that request byte serving might do so when a large file has been only partially delivered, or when only a limited portion of the file, in a particular range, is needed. Byte serving is therefore a method of bandwidth optimization. In the HTTP/1.0 standard, clients were only able to request an entire document. With byte serving, clients may request any portion of the resource. One advantage of this capability is that when a large, properly formatted media file is being requested, the client may be able to request just the portions of the file known to be of interest.

Chapter 6 Encryption and Decryption

The encryption we use in our mechanism is the Advanced Encryption Standard (AES) in Counter (CTR) mode. We use CTR mode because it is suitable for random access.

6.1 Advanced Encryption Standard

The Advanced Encryption Standard (AES), also known as Rijndael, is a block cipher adopted as an encryption standard by the U.S. government. It has been analyzed extensively and is now used widely worldwide, as was the case with its predecessor, the Data Encryption Standard (DES). AES was announced by the National Institute of Standards and Technology (NIST) as U.S. FIPS PUB 197 (FIPS 197) on November 26, 2001 after a 5-year standardization process. It became effective as a standard on May 26, 2002. As of 2006, AES is one of the most popular algorithms used in symmetric key cryptography. The cipher was developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen, and submitted to the AES selection process under the name "Rijndael", a portmanteau of the names of the inventors.
AES is not precisely Rijndael (although in practice the names are used interchangeably), because Rijndael supports a larger range of block and key sizes. AES has a fixed block size of 128 bits and a key size of 128, 192 or 256 bits, whereas Rijndael can be specified with key and block sizes in any multiple of 32 bits, with a minimum of 128 bits and a maximum of 256 bits. Because of the fixed block size of 128 bits, AES operates on a 4×4 array of bytes, termed the state (versions of Rijndael with a larger block size have additional columns in the state).

6.2 Counter mode

Counter mode (CTR mode) is a block cipher mode of operation that turns a block cipher into a stream cipher. It generates the next keystream block by encrypting successive values of a "counter". The counter can be any simple function that produces a sequence guaranteed not to repeat for a long time, although an actual counter is the simplest and most popular choice. CTR mode has similar characteristics to Output Feedback (OFB) mode, but it also allows random access during decryption, and it is believed to be as secure as the block cipher being used. The initialization vector (IV) in this mode is the combination of the nonce and the counter. The nonce and the counter can be concatenated, added or XORed together to produce the unique counter block that we use as the IV for encryption. CTR mode is also well suited to multi-processor machines, because blocks can be encrypted in parallel.
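
The random access property can be illustrated with a small sketch of textbook CTR mode. It is an illustration under stated assumptions rather than the encryption software of this thesis: it builds the keystream with the .NET Aes class in ECB mode (a substitute chosen here to expose the raw block cipher), and the names CtrSketch and Transform() are hypothetical. The counter block is an 8-byte nonce followed by an 8-byte counter, as described above.

```csharp
using System;
using System.Security.Cryptography;

// Textbook CTR mode built on the raw AES block cipher (illustrative only).
static class CtrSketch
{
    // Encrypts or decrypts "data" (the same operation in CTR mode).
    // "firstBlock" is the counter value of the first 16-byte block in "data".
    public static byte[] Transform(byte[] data, byte[] key, byte[] nonce, long firstBlock)
    {
        byte[] output = new byte[data.Length];
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;        // plain block cipher; CTR is layered on top
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (ICryptoTransform enc = aes.CreateEncryptor())
            {
                byte[] counterBlock = new byte[16];
                byte[] keystream = new byte[16];
                for (int pos = 0; pos < data.Length; pos += 16)
                {
                    long counter = firstBlock + pos / 16;
                    Array.Copy(nonce, 0, counterBlock, 0, 8);
                    Array.Copy(BitConverter.GetBytes(counter), 0, counterBlock, 8, 8);
                    enc.TransformBlock(counterBlock, 0, 16, keystream, 0);
                    int n = Math.Min(16, data.Length - pos);
                    for (int j = 0; j < n; j++)
                        output[pos + j] = (byte)(data[pos + j] ^ keystream[j]);
                }
            }
        }
        return output;
    }

    static void Main()
    {
        byte[] key = new byte[16];     // all-zero demo key; a real key must be random
        byte[] nonce = new byte[8];    // all-zero demo nonce
        byte[] plain = new byte[40];
        for (int i = 0; i < plain.Length; i++) plain[i] = (byte)i;

        byte[] cipher = Transform(plain, key, nonce, 0);

        // Random access: decrypt only the bytes from block 1 onward.
        byte[] tail = new byte[24];
        Array.Copy(cipher, 16, tail, 0, 24);
        byte[] back = Transform(tail, key, nonce, 1);

        bool same = true;
        for (int i = 0; i < 24; i++) same &= (back[i] == plain[16 + i]);
        Console.WriteLine(same); // prints: True
    }
}
```

Decrypting from block 1 never touches block 0, which is what allows a player to seek without downloading the preceding data. Because the keystream is XORed with the data, this variant also needs no padding of the final block.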

[Figure 11 diagram: three blocks shown side by side. In each, the nonce (c2b3f342...) and a counter (00000000, 00000001, 00000002) are fed with the key into the block cipher encryption, and the original bytes are turned into the ciphered bytes.]

Figure 11: Counter mode encryption

6.1.1 Padding

Because AES works on units of a fixed size (16 bytes) but the original data comes in a variety of lengths, our mechanism requires the final block to be padded before encryption. Several padding schemes exist. The simplest is to add null bytes to the original data to bring its length up to a multiple of the block size, but care must be taken that the original length of the data can be recovered. This is the case, for example, if the original data is a C-style string, which contains no null bytes except at the end.

6.3 Token

In our mechanism, the token is a combination of the nonce, the key and the original file size. The token may also contain additional useful data. The original file size is included because the VFS module needs it when decrypting the file. Both the nonce and the key are generated randomly by the encryption software for each file, which provides stronger protection against malicious access. The size of the token varies with the original file size and the key size used. The supported key sizes are 128 bits, 192 bits and 256 bits. The nonce size is 64 bits, because it is later concatenated with the counter bytes.
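
As an illustration of how such a token could be carried as a base64 string, the following sketch packs the three fields into one byte array and parses it back. The byte layout (8-byte nonce, 8-byte file size, then the key), the class name TokenSketch and its methods are assumptions made for this example only; the layout actually used by our software is not shown in this thesis, and the additional encryption of the token described in Section 4.3.1 is omitted here.

```csharp
using System;

// Hypothetical token layout: nonce (8 bytes) | file size (8 bytes) | key (16, 24 or 32 bytes).
static class TokenSketch
{
    public static string Pack(byte[] nonce, byte[] key, long fileSize)
    {
        byte[] raw = new byte[8 + 8 + key.Length];
        Array.Copy(nonce, 0, raw, 0, 8);
        Array.Copy(BitConverter.GetBytes(fileSize), 0, raw, 8, 8);
        Array.Copy(key, 0, raw, 16, key.Length);
        return Convert.ToBase64String(raw);
    }

    public static bool TryUnpack(string token, out byte[] nonce, out byte[] key, out long fileSize)
    {
        nonce = null; key = null; fileSize = 0;
        byte[] raw;
        try { raw = Convert.FromBase64String(token); }
        catch (FormatException) { return false; }      // not valid base64

        int keyLen = raw.Length - 16;
        if (keyLen != 16 && keyLen != 24 && keyLen != 32)
            return false;                              // missing data or unsupported key size

        nonce = new byte[8];
        key = new byte[keyLen];
        Array.Copy(raw, 0, nonce, 0, 8);
        fileSize = BitConverter.ToInt64(raw, 8);
        Array.Copy(raw, 16, key, 0, keyLen);
        return true;
    }

    static void Main()
    {
        string token = Pack(new byte[8], new byte[16], 1030);
        Console.WriteLine(token.Length); // prints: 44

        byte[] nonce, key; long size;
        Console.WriteLine(TryUnpack(token, out nonce, out key, out size) && size == 1030); // prints: True
    }
}
```

With a 128-bit key this layout already yields a 44-character string, which shows how quickly a token grows once it is base64 encoded.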

6.4 Encryption software

     1  string filename = args[0];
     2  string filename2 = "enc_" + filename;
     3  FileInfo info = new FileInfo(filename);
     4  long size = info.Length;
     5  int blockCount = (int)((size + 15) / 16);   // round up; Math.Round would give 64 for 1030 bytes
     6  try
     7  {
     8    FileStream fs = File.OpenRead(filename);
     9    FileStream fs2 = File.OpenWrite(filename2);
    10    int offset = 0;
    11    byte[] buffer = new byte[16];
    12    byte[] encrypted = new byte[16];
    13    byte[] nonce = new byte[8];
    14    byte[] iv = new byte[16];
    15    RijndaelManaged transform = new RijndaelManaged();
    16    transform.Padding = PaddingMode.Zeros;
    17    transform.GenerateIV();
    18    transform.GenerateKey();
    19    byte[] key = transform.Key;
    20    Array.Copy(transform.IV, nonce, 8);
    21    for (int i = 0; i < blockCount - 1; i++)
    22    {
    23      byte[] counter = BitConverter.GetBytes((long)i);
    24      Array.Copy(nonce, 0, iv, 0, 8);
    25      Array.Copy(counter, 0, iv, 8, 8);
    26      fs.Seek(offset, SeekOrigin.Begin);
    27      offset = fs.Read(buffer, 0, buffer.Length) + offset;
    28      transform.IV = iv;
    29      ICryptoTransform encrypt = transform.CreateEncryptor();
    30      encrypt.TransformBlock(buffer, 0, 16, encrypted, 0);
    31      fs2.Write(encrypted, 0, 16);
    32    }
    33    byte[] counter2 = BitConverter.GetBytes((long)blockCount - 1);
    34    Array.Copy(nonce, 0, iv, 0, 8);
    35    Array.Copy(counter2, 0, iv, 8, 8);
    36    fs.Seek(offset, SeekOrigin.Begin);
    37    offset = fs.Read(buffer, 0, buffer.Length);   // bytes in the final block
    38    transform.IV = iv;
    39    ICryptoTransform encrypt2 = transform.CreateEncryptor();
    40    byte[] lastBLock = encrypt2.TransformFinalBlock(buffer, 0, offset);
    41    fs2.Write(lastBLock, 0, 16);
    42    fs.Close();
    43    fs2.Close();
    44  }

Figure 12: Portion of Encrypt() function

Our encryption software is simple. We use the RijndaelManaged class in the .NET library to simplify the encryption process. When the software encrypts a file, it generates a random nonce and key using the GenerateIV() and GenerateKey() methods respectively. Since the RijndaelManaged class has no notion of a nonce, we use the truncated IV.
The IV is truncated to 8 bytes to produce the nonce. Since the .NET library does not have a counter mode implementation, we must implement the mode ourselves. The software divides the given file into blocks of 128 bits (16 bytes) and calculates the block count. The division does not always yield an integer; when it does not, the count is rounded up to the next whole block. For example, a file of 1030 bytes divided by 16 gives 64.375, and the block count is 65 rather than 64. Note that Math.Round() with MidpointRounding.AwayFromZero does not do this (it would return 64 for 64.375), so a ceiling such as (size + 15) / 16 is required. As shown in lines 21 to 32 of Figure 12, the function starts a loop that reads the original file, encrypts it and writes it to the destination file. Each iteration starts by generating a counter value: it casts i to long, which makes it 8 bytes long, and converts it into a byte array. The counter byte array is concatenated with the nonce byte array to produce a temporary IV byte array, which is assigned to the transform.IV property. In each iteration, 16 bytes of data are read from the original file and offset is increased by the number of bytes read. We use the ICryptoTransform interface to perform the block transformation with the TransformBlock() method. As soon as a block has been encrypted, it is written to the destination file. This continues until the second-to-last block is reached. When the loop finishes, the function transforms the last block. In this way each block is tied to an incrementing counter value, and the software encrypts each block using the key, the nonce and the counter. Since the AES algorithm can only encrypt 16-byte blocks, the last block needs to be padded if it is shorter than 16 bytes. In our mechanism we use zero padding, which adds zero bytes to fill the block. For example, a file of 1030 bytes is split into 64 blocks of 16 bytes and a final block of 6 bytes. The final block is padded with 10 zero bytes (0x0) to make it 16 bytes.
The resulting file size is 1040 bytes. After finishing the encryption, the software uploads the encrypted file to the web server and registers the filename, file size, file key and nonce with the authentication server.

6.3.1 Multithreads

We take advantage of current multi-core hardware and of the suitability of CTR mode for multithreading. The whole file is divided into blocks, and the set of blocks is divided again into as many subsets as the desired thread count. For example, if we want to use 4 threads, a 1030-byte file is divided into a set of 65 blocks, and this set is divided again into 4 subsets. The first, second and third subsets contain 16 blocks each, while the fourth subset contains 17 blocks. The counter value of the first block of each subset is the first counter value of the previous subset plus the number of blocks in that subset. In this case:

    Blocks subset   First counter value   Blocks contained   Calculation
    First           1                     16                 0 + 1
    Second          17                    16                 16 + 1
    Third           33                    16                 16 + 17
    Fourth          49                    17                 16 + 33

Table 1: Multithreads calculation

All four threads execute simultaneously, resulting in a faster encryption process.
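
The calculation in Table 1 can be sketched in a few lines. This is a minimal illustration, not the code of our encryption software: the class name PartitionSketch, the method Split() and the baseCounter parameter are hypothetical. The base counter is passed in explicitly because Table 1 numbers blocks from 1, while the loop in Figure 12 counts from 0.

```csharp
using System;

// Hypothetical helper reproducing the subset calculation of Table 1.
static class PartitionSketch
{
    // Returns one {firstCounter, blockCount} pair per thread.
    // The last subset receives the blocks left over by the integer division.
    public static long[][] Split(long totalBlocks, int threads, long baseCounter)
    {
        long perThread = totalBlocks / threads;
        long[][] parts = new long[threads][];
        long next = baseCounter;
        for (int t = 0; t < threads; t++)
        {
            long count = (t == threads - 1)
                ? totalBlocks - perThread * (threads - 1)
                : perThread;
            parts[t] = new long[] { next, count };
            next += count;
        }
        return parts;
    }

    static void Main()
    {
        // 65 blocks (a 1030-byte file), 4 threads, counters numbered from 1
        foreach (long[] p in Split(65, 4, 1))
            Console.WriteLine("first counter {0}, blocks {1}", p[0], p[1]);
        // prints:
        // first counter 1, blocks 16
        // first counter 17, blocks 16
        // first counter 33, blocks 16
        // first counter 49, blocks 17
    }
}
```

Each thread can then encrypt its own subset independently, because the counter of every block is known in advance.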

6.5 Decryption

    private byte[] Decrypt(byte[] sourceFile, int startBlock, byte[] key,
        byte[] nonce, int filesize, int offsBl)
    {
        int blockCount = sourceFile.Length / 16;
        int lastBlock = 0;
        int totalBlock = (filesize + 15) / 16;        // blocks in the padded file
        int padding = 0;
        byte[] lastBlk = null;
        // Does the downloaded portion include the padded final block?
        if ((startBlock * 16 + sourceFile.Length) > filesize)
        {
            lastBlock = 1;
            padding = totalBlock * 16 - filesize;     // zero bytes added by the padding
        }
        try
        {
            int offset = 0;
            byte[] iv = new byte[16];
            byte[] decrypted = new byte[sourceFile.Length - (lastBlock * 16)];
            int i = startBlock;
            RijndaelManaged transform = new RijndaelManaged();
            transform.Padding = PaddingMode.Zeros;
            transform.Key = key;
            for (int k = 0; k < blockCount - lastBlock; k++)
            {
                byte[] counter = BitConverter.GetBytes((long)i);
                Array.Copy(nonce, 0, iv, 0, 8);
                Array.Copy(counter, 0, iv, 8, 8);
                transform.IV = iv;
                ICryptoTransform decrypt = transform.CreateDecryptor();
                decrypt.TransformBlock(sourceFile, offset, 16, decrypted, offset);
                offset = offset + 16;
                i++;
            }
            int dataLen = sourceFile.Length - offsBl - padding;
            byte[] data = new byte[dataLen];
            Array.Copy(decrypted, offsBl, data, 0, decrypted.Length - offsBl);
            if (lastBlock == 1)
            {
                byte[] counter = BitConverter.GetBytes((long)i);
                Array.Copy(nonce, 0, iv, 0, 8);
                Array.Copy(counter, 0, iv, 8, 8);
                transform.IV = iv;
                ICryptoTransform decryptLast = transform.CreateDecryptor();
                lastBlk = decryptLast.TransformFinalBlock(sourceFile, offset, 16);
                // copy only the real data, not the zero padding
                Array.Copy(lastBlk, 0, data, decrypted.Length - offsBl, 16 - padding);
            }
            return data;
        }
        ... omit ...

Figure 13: Decrypt() method code

Decryption is performed in the user-level part of the VFS module, on behalf of the ReadFile() function. Instead of placing the code in ReadFile() itself, we wrote an independent method, Decrypt(), which is called from ReadFile().
The parameters of Decrypt() are the byte array sourceFile, which holds the encrypted data portion; key and nonce; the integer startBlock, which is the counter of the first block; filesize, which is the original file size; and offsBl, which is the offset inside the first block. On initialization, the function counts the blocks in the given sourceFile, checks whether sourceFile contains the last block and, if it does, calculates the padding. It then begins the block transformation by initializing the RijndaelManaged class. Using a for statement, it loops over the blocks. In each iteration, it generates the IV from the given nonce and an incrementing counter that starts at startBlock. After the loop, it copies the decrypted data into the destination byte array, data. If the last block is present, the loop count is decreased by 1 and the last block is transformed after the loop. The decrypted last block is copied into data, and data is returned.

Chapter 7 Experiments

We carried out several experiments to demonstrate the capability of this mechanism. For the experiments, one PC acts as both the administrator PC (used to upload files to the web server and register them with the authentication server) and the user PC (used to access the files on the web server).

7.1 Experiment environment

- User PC/Admin PC:
  o CPU: Intel Core2 Quad 2.4 GHz
  o RAM: 4 GB
  o Ethernet: 100 Mbps connection
  o OS: MS Windows XP SP2
- Web Server:
  o CPU: Intel Pentium 4 3.0 GHz with Hyper-Threading
  o RAM: 1 GB
  o Ethernet: 100 Mbps connection
  o OS: Ubuntu Linux 7.10
  o Software: Apache 2.2.4 as web server software and ProFTPd as FTP server

The user PC/admin PC and the web server are located in the same intranet. We used a quad-core PC as the admin PC to test the full capability of the encryption software.

7.2 Encryption Rate

To analyse the encryption rate, we experimented with files of several sizes: 5, 10, 50 and 100 MB. Each file was encrypted using several thread counts: 1, 2 and 4 threads. We used the user PC to run the experiment.
As the results in Figure 14 show, the encryption rate nearly doubles when the thread count is doubled. The content of the file (whether it is a media file, a text file or a compressed file) does not affect the encryption rate.

[Figure 14 plot: encryption duration in seconds (vertical axis) against file size in MB (horizontal axis: 5, 10, 50, 100), with one series each for 1 thread, 2 threads and 4 threads.]

Figure 14: Graph showing encryption rate using different thread count

7.3 Transfer rate

We benchmarked the transfer rate by comparing several methods used to transfer data online: direct HTTP/HTTPS access using a web client, the Server Message Block (SMB) protocol, the WebDAV protocol using the WebDAV Mini Redirector built into Windows XP (a shell namespace extension), and our VFS module with and without decryption. For HTTP and HTTPS, we measured the transfer rate with the WGET [] software. For SMB, our VFS module and WebDAV, we mounted a logical drive in My Computer and used the FastCopy [] program to copy a file from the remote server to a local drive. As the graph in Figure 15 shows, the HTTP transfer rate is the fastest, as expected. Notably, the transfer rate of our VFS module is higher than that of the SMB/CIFS protocol. Prefetching in our VFS module does not improve the transfer rate, because its purpose is to reduce the number of requests. As expected, decryption in the VFS module decreases the transfer rate, to 25% of the original transfer rate. However, even if we take this number as the maximum speed of our mechanism, it can still support intermediate levels of video streaming. For example, the mechanism can stream a video file at up to 20 Mbit/s.

[Figure 15 plot: transfer rate in MB/s (vertical axis) for Samba, VFS Module, VFS Module (prefetch), VFS Module (decryption), HTTP, HTTPS and WebDAV.]

Figure 15: Graph of transfer rate using different methods

[Figure 16 plot: CPU usage in percent (vertical axis) for Samba, VFS Module, VFS Module (prefetch), VFS Module (decryption), HTTP, HTTPS and WebDAV.]

Figure 16: Graph of CPU usage of different method

Random access

Chapter 8 Integration with Movie Database System of Japan Institute of Sports Sciences

This mechanism is scheduled to replace the current data distribution mechanism in the movie database system of the Japan Institute of Sports Sciences (JISS).

8.1 Current data distribution mechanism in JISS movie database system

As shown in Figure 17, the process starts with the client logging in to the user administration server by sending a username and password, after which the client receives a list of movie IDs. The client can access a movie on the list by sending its movie ID back to the user admin server in order to receive a URL. The user admin server logs the request, creates a random key and stores it in a temporary table. Besides the key, the table also contains the client IP address, the movie ID and a timestamp. The user admin server then responds to the client PC with the URL of the proxy, which includes the key. The client connects to the proxy using the key as the filename. The proxy verifies the request by contacting the user admin server and sending it the requested key and the client IP address. The user admin server then compares the key, IP address and timestamp with the temporary table. If they are valid and the table entry has not expired, the user admin server responds to the proxy server with the real URL. The proxy server then accesses the real file on the streaming server and relays it to the client's PC.

8.2 New data distribution mechanism proposed to JISS movie database system

The current mechanism, which was developed two years ago, suffers from high latency because data is relayed through a proxy.
In our mechanism we simplify the system by removing the proxy server. Compared with the current mechanism, which has 4 components, our mechanism has only 3. As shown in Figure 18, the process starts with the client logging in to the authentication server using the username and password and receiving the list of movie IDs. When the client wants to access a movie on the list, it requests the movie filename by sending the movie ID to the authentication server, and the authentication server replies with the movie's filename and token. The client application accesses the file directly from the local virtual drive. The VFS module on the client PC opens a connection to the web server, accesses the requested file, decrypts it using the key and nonce contained in the token, and passes it to the application.

[Figure 17 diagram: components are the Client PC, User admin server, Proxy, Streaming server and the Movie files publisher (uploads movie files, registers movie ID). Labelled steps: 1. username, password, movie ID; 2. http://proxy/key; 3. Access http://proxy/key; 4. key, client IP address; 5. http://server/moviefile; 6. Access http://server/moviefile; 7. Stream the file; 8. Relay the streaming.]

Figure 17: Current data distribution mechanism in movie database system of JISS

8.3 The advantages compared to current mechanism

As the explanation above shows, the new mechanism has several advantages over the current mechanism.

Fewer components
Because the number of working components is reduced, the latency of the whole system is reduced. It also lowers the cost of the whole system, because less hardware is needed.

No proxy
Because the proxy is no longer involved, the movie file is downloaded directly from the web server instead of being relayed through the proxy. In this mechanism the VFS module acts as the proxy, but it performs much better because it runs on the client PC instead of on a separate, remote server.

Encryption
There is no encryption involved in the current data distribution mechanism of the JISS movie database system.
Our new mechanism introduces encryption, which makes the system much more secure. For example, in the current mechanism, if the real URL of a movie file is leaked, a malicious user can gain access to the file and the file loses its confidentiality. In our mechanism, on the other hand, even if the file can be accessed, a malicious user cannot open it without the proper key.

[Figure 18: New data distribution mechanism proposed to movie database system of JISS. Components: Client PC, Authentication server (user admin server), Web server (Streaming server), Movie files publisher (encrypts and uploads movie file, registers movie ID). Labeled steps: 1. Username, password, movie ID; 2. Movie filename, token; 3. Access encrypted file on web server; 4. Response the encrypted file.]

Chapter 9 Conclusions

In this paper, we have proposed a new data protection mechanism and described how it works. By using this mechanism, users are able to access protected data over HTTP with random access capability while maintaining tight security. In the future, we will evaluate the use of multiple pooled connections for each file to relieve the load on a single connection. We are also considering adding support for FTP servers.

Acknowledgements

First and foremost, I would like to thank my supervisors, Professor Kozo Itano, Professor Yasushi Shinjo, Professor Akira Sato and Professor Hisashi Nakai of the Graduate School of Systems and Information Engineering at the University of Tsukuba, for being exceptional advisors. They never cease to amaze me with their unlimited patience, their attention to detail, and their help in improving the thesis's presentation style. I can't imagine how I could have perfected this thesis without their support. I am also grateful to all my lab partners in the Software Laboratory at the University of Tsukuba, especially Mr. Daiyuu Nobori, for offering their support and encouragement since the day I joined the research lab.
I owe my deepest debt of gratitude to my parents, Che Abdullah and Nik Rahmah, for providing me with the resources to succeed in life. They have constantly advised and supported me in everything that I have done. I am especially grateful to my mother for teaching me that time management, organization, and diligence are keys to success. Finally, I would like to acknowledge the financial support that made this research possible. All my years in Japan were supported by the Public Service Department of Malaysia, and I am grateful for being given the opportunity to further my studies in a foreign country to widen my knowledge.