A FAULT-TOLERANT SECURE DISTRIBUTED FILE SYSTEM
(FTSDFS)
By
Arun A Kumar
Fei Lu
SUBMITTED FOR THE COURSE CS 600.442, CRYPTOGRAPHY AND NETWORK SECURITY
AS A FINAL PROJECT
DECEMBER 2001
BALTIMORE
ABSTRACT
This project attempts to create a secure, fault-tolerant environment for users to store files.
The files to be stored are first split into n shares, appended with their hash values,
encrypted, and then stored on n servers. The system can retrieve a file as long as at least t
(t < n) servers are functional. A malicious user cannot retrieve the file, as it is encrypted,
and it would take at least n - t + 1 compromised servers to destroy the file. The system is
resistant to both active and passive malicious attacks on the stored data.
It currently supports most variants of Unix, including, but not limited to, Solaris® and Linux.
The application assumes that the number of failed servers will not exceed n - t out of n
(both t and n are chosen dynamically by the user). The files are stored using the IDA [1]
scheme. Since this scheme is not totally secure, and some information about the original file
can be extracted from partial shares, each share is encrypted using AES after dispersal. The
system accepts key sizes of 128, 192 or 256 bits as the secret key, and AES is used in CBC
mode. The AES encryption is what makes the system secure; hence, the security of the system
depends on safe storage of the secret key, which may be kept on a smart card or a similar
device. In addition, each share is appended with its hash value (MD5) and an update counter
(to resolve update inconsistencies). The interface to the file system is through the GNU
Midnight Commander. This design is analogous to threshold cryptography schemes such as the
one proposed by Shamir in 'How to Share a Secret' (1979), but it is highly space efficient:
each share requires only m/t bytes of data (where m is the size of the original file) plus a
constant overhead for the initialization vector and the hash value.
[1] M. Rabin. Efficient Dispersal of Information for Security, Load Balancing, and Fault
Tolerance. Journal of the ACM, 36(2):335-348, 1989.
SYSTEM DESCRIPTION
The application is client-server based. The client provides a major part of the functionality.
This is done intentionally so that the failure or compromise of a server does not compromise
security. The client is responsible for splitting, combining, encrypting, decrypting files and
computing the hash values.
THE CLIENT
The client interface is that of the GNU Midnight Commander, which was the first file manager
used for the GNOME project. The initial code was obtained from the Free Software Foundation
and has been modified to incorporate code that lets it access the distributed file system.
Initially, when a connection is requested between the client and the distributed file system,
the client authenticates the user with the server; hence, the user must have an account on
each of the servers that form the distributed server network. Once authenticated, the
distributed file server gives access to all files that belong to the logged-in user. The files
in the current directory are shown in a pane of the user interface.
Whenever the user copies a file from the local file system to the distributed file system, the
client obtains a 128/192/256-bit key from the user for encrypting the file. The file is passed
to the part of the code implementing IDA, which splits it into n shares, each of length m/t,
where m is the length of the file; every share is identical in size. These shares are written
to the local file system with a random string appended to each of the filenames, as they are
all stored in the same location. After dispersal, the client calculates the hash value (MD5)
of each share and appends it to the share, along with an update counter that is incremented
every time the file is updated. It then uses AES in CBC mode to encrypt each share with the
same key. The IV (initialization vector) for AES is stored in the encrypted file as a 16-byte
header; it is generated using a pseudo-random number generator. The first byte of the next
block stores the length of the last block. This is done because the file size need not be a
multiple of 16, so the size of the last block may vary between 1 and 16 bytes. Since only four
bits are required to represent this number, the four higher-order bits are filled with random
bits and the actual block length is stored in the lower four bits. The subsequent blocks each
hold 16 bytes of data. The encryption is then carried out, each share is updated on the local
file system, and the shares are then copied to the servers under their original filenames. A
sketch of the resulting share layout is given below.
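As an illustration, the sketch below shows how the plaintext of a single share could be
assembled before AES-CBC encryption, following the layout just described. This is a minimal
sketch rather than the project's actual code: the function name build_share_plaintext is ours,
and OpenSSL's MD5 and RAND_bytes routines merely stand in for whichever hash and pseudo-random
number generator the implementation really uses.

    // Minimal sketch (not the project's actual code) of assembling one share's
    // plaintext before AES-CBC encryption, following the layout described above:
    //   [16-byte IV, stored in the clear] || AES-CBC( length byte | share data
    //                                       | MD5 of the share | 4-byte counter )
    #include <openssl/md5.h>
    #include <openssl/rand.h>
    #include <stdint.h>
    #include <vector>

    std::vector<unsigned char> build_share_plaintext(
            const std::vector<unsigned char>& share, uint32_t update_counter) {
        std::vector<unsigned char> out;

        // First byte: length of the final 16-byte block of the share data
        // (random bits in the high nibble, actual length in the low nibble;
        // a full final block is recorded as 0 in this sketch).
        unsigned char len_byte = 0;
        RAND_bytes(&len_byte, 1);
        len_byte = (unsigned char)((len_byte & 0xF0) | (share.size() % 16));
        out.push_back(len_byte);

        // The share data produced by IDA.
        out.insert(out.end(), share.begin(), share.end());

        // MD5 digest of the share, appended for integrity checking.
        unsigned char digest[MD5_DIGEST_LENGTH];
        MD5(share.data(), share.size(), digest);
        out.insert(out.end(), digest, digest + MD5_DIGEST_LENGTH);

        // 4-byte update counter, used to resolve update inconsistencies.
        for (int i = 0; i < 4; ++i)
            out.push_back((unsigned char)(update_counter >> (8 * i)));

        return out;  // this buffer is then encrypted; the fresh 16-byte IV is prepended
    }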
Now consider the case where a request is made to copy a file from the distributed file system
to the local file system. The client first obtains the decryption key from the user, which it
will use to recover the original shares. It then contacts t servers, chosen at random, for
shares of the file. If a server is down or its copy of the share is corrupt (determined by
decrypting the share, checking it against its hash value, and comparing the update counters),
the client switches to the next server. It proceeds in this manner until it has obtained t
valid shares, at which point it reconstructs the original file and stores it locally.
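A rough sketch of this retrieval loop is shown below. The helper fetch_and_verify_share() is
hypothetical: it stands in for the client's actual RPC fetch, AES-CBC decryption and
MD5/update-counter verification, and is only declared here.

    #include <cstddef>
    #include <string>
    #include <vector>

    typedef std::vector<unsigned char> ShareData;

    // Hypothetical helper: returns true and fills 'out' only if the server is
    // reachable and its share decrypts to a consistent hash and update counter.
    bool fetch_and_verify_share(const std::string& server, const std::string& key,
                                ShareData& out);

    // Contact the configured servers one after another until t valid shares
    // have been collected (the real client picks servers in random order).
    bool collect_shares(const std::vector<std::string>& servers, int t,
                        const std::string& key, std::vector<ShareData>& shares) {
        shares.clear();
        for (std::size_t i = 0; i < servers.size() && (int)shares.size() < t; ++i) {
            ShareData share;
            if (fetch_and_verify_share(servers[i], key, share))  // skip servers that
                shares.push_back(share);                         // are down or corrupt
        }
        return (int)shares.size() == t;  // enough shares to run IDA reconstruction
    }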
The Midnight Commander interface requires that a file on the distributed file system can also
be edited/viewed from within the interface. To provide this functionality, the file is first
copied to the local host and edited there. When the user finishes modifying/viewing the file,
the system compares it with the original file to detect any changes made by the user; if a
change is detected, the original file is erased from all servers and replaced with the new
copy (a sketch of this check is given at the end of this section). This concludes the
description of the client. The next section briefly describes the server's capabilities and
the mode of communication between the client and the server.
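For illustration, the change-detection step could look like the sketch below; files_differ is
a name of our own choosing, and the real client may well detect changes differently (for
example via modification times or hashes).

    #include <fstream>
    #include <iterator>
    #include <string>
    #include <vector>

    // Compare the retrieved copy of a file with the copy the user just edited.
    // Any difference triggers re-dispersal and re-upload of the file.
    bool files_differ(const std::string& original, const std::string& edited) {
        std::ifstream a(original.c_str(), std::ios::binary);
        std::ifstream b(edited.c_str(), std::ios::binary);
        if (!a || !b)
            return true;  // treat an unreadable file as changed, to be safe
        std::vector<char> da((std::istreambuf_iterator<char>(a)),
                             std::istreambuf_iterator<char>());
        std::vector<char> db((std::istreambuf_iterator<char>(b)),
                             std::istreambuf_iterator<char>());
        return da != db;
    }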
THE SERVER
The server provides very minimal functionality in order to keep the system secure. The
distributed file server is a daemon process running on the server. The number of such servers
participating in the distributed environment can be arbitrarily large; the only requirement is
that the client configuration must be updated to reflect the presence of additional servers.
The most important feature of this system is that additional servers can be added without
affecting existing users in any way. Hence, the clients would work just fine even if they were
not updated to reflect the presence of new servers.
Initially, when the server starts up, it establishes a socket to listen on. This is done by
requesting a port from the port mapper, which assigns the server daemon a port number below
1024. Note that superuser privileges are required to run a daemon that uses a port number
below 1024. Subsequent requests for communication with the server are preceded by the client
contacting the port mapper to find the port associated with the server daemon. This method is
preferable, as requests for connections can be traced, tracked and logged. Once the client
establishes contact with the host, it authenticates itself with the server. The server
performs authentication using the PAM (Pluggable Authentication Modules) library. Once the
user is verified, a TCP connection is established over the network between the client and the
server. The client sends requests for accessing, retrieving, storing and erasing files using
RPC (Remote Procedure Calls).
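The sketch below shows how such a password check might look on the server side, assuming the
Linux-PAM application API. The service name "ftsdfs" and the function names are illustrative
only and are not taken from the project's source; the real server also has to transport the
user name and password over its RPC interface.

    #include <security/pam_appl.h>
    #include <cstdlib>
    #include <cstring>

    // Conversation callback: hands the client-supplied password to PAM
    // whenever a non-echoing (password) prompt is received.
    static int conv_fn(int num_msg, const struct pam_message **msg,
                       struct pam_response **resp, void *appdata) {
        struct pam_response *replies =
            (struct pam_response *) calloc(num_msg, sizeof(struct pam_response));
        if (!replies)
            return PAM_CONV_ERR;
        for (int i = 0; i < num_msg; ++i) {
            if (msg[i]->msg_style == PAM_PROMPT_ECHO_OFF)
                replies[i].resp = strdup((const char *) appdata);  // the password
        }
        *resp = replies;
        return PAM_SUCCESS;
    }

    // Returns true if 'user'/'password' is a valid account on this server.
    bool authenticate(const char *user, const char *password) {
        struct pam_conv conv = { conv_fn, (void *) password };
        pam_handle_t *pamh = 0;
        if (pam_start("ftsdfs", user, &conv, &pamh) != PAM_SUCCESS)
            return false;
        int rc = pam_authenticate(pamh, 0);
        pam_end(pamh, rc);
        return rc == PAM_SUCCESS;
    }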
DESIGN GOALS AND NOTES
Design Goals:
- Confidentiality
- Fault Tolerance
- Minimal overhead in terms of the file size of the shares
- Integrity
- Ease of Use
- Portability
- Scalability
- Generic
- Anonymity
Confidentiality - The most significant factor affecting the design of this system was that of
confidentiality. It was desired to have a system that would prevent even n colluding servers
from retrieving the contents of the original file.
Fault Tolerance – It was desired to have a high level of fault tolerance. The level of fault
tolerance has been kept dynamic so that every user can customize it to suit his/her
requirements; it is set by the values of t and n, which are defined in the client's
configuration file.
Minimal Overhead – A minimal overhead is highly desirable in any system. A small file size for
each share translates into several benefits: reduced usage of server space, reduced network
traffic, and balancing of traffic across the network, since the file is distributed over
several different servers. (A worked example of the per-share overhead is given at the end of
this section.)
Integrity – Integrity is very important in the context of malicious users who have
compromised a server. Hence, a mechanism is needed that can detect and correct a share that
has been modified by a malicious user. This is achieved by calculating the hash (MD5) of each
share and appending it to the share before the whole share is encrypted. The system therefore
detects any kind of malicious attack on a share; if a share is found to be corrupt, the client
simply switches to the next server.
Ease of Use – The user interface has been made very user-friendly, and all operations take
place transparently. A user would hardly notice a difference in operation between the
distributed file system and a local one (except, of course, while keying in the secret key).
Portability – The system does not use any machine-dependent code. In addition, the whole code
has been written in C++, with parts of it in C. Since C and C++ compilers are available for
almost all platforms, it should be no problem to port the code to systems other than those on
which it was tested.
Scalability – The existing system is not affected by increasing the number of servers. It is
entirely optional for a user to make use of new servers as they get added to the set of
distributed file servers.
Generic – The system can very easily be adapted to use a variety of servers. For example, it
can even use an FTP server if it is not possible to install a distributed file server on the target
server.
Anonymity – Although anonymity wasn’t a primary design goal, this system provides a high
level of anonymity. This is because the bulk of the code resides on the client and the server
has absolutely no knowledge of the data stored on the system. The user can be made totally
anonymous by providing for anonymous login accounts on the server.
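To make the overhead claim under 'Minimal Overhead' concrete, here is a small worked example.
It assumes a 16-byte IV, a 16-byte MD5 digest and a 4-byte update counter per share; the
project's exact layout (length byte, CBC padding) may add a few more bytes, so the figures are
approximate.

    #include <cstddef>
    #include <cstdio>

    // Approximate size of one share: m/t bytes of data (rounded up) plus a
    // constant overhead for the IV, the MD5 digest and the update counter.
    std::size_t share_size(std::size_t m, std::size_t t) {
        const std::size_t overhead = 16 /* IV */ + 16 /* MD5 */ + 4 /* counter */;
        return (m + t - 1) / t + overhead;
    }

    int main() {
        // A 1 MB file dispersed with t = 3: each of the n shares is ~341 KB,
        // so storing it on n = 5 servers costs about 1.7 MB in total.
        std::printf("%zu bytes per share\n", share_size(1 << 20, 3));
        return 0;
    }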
CRITICAL APPRAISAL
Here, we attempt to bring out the flaws as well as the salient features of the system as
candidly as possible.
FEATURES:
1) The system withstands attacks on the servers: even if all n servers are compromised, the
only possible attacks are deletion of the file or denial of service. There is no way for
anyone else to recover the file from the servers (provided that the secret key is not
compromised and assuming that AES at the chosen key size is computationally secure).
2) Moreover, the file remains recoverable provided no more than n - t servers have been
subjected to malicious attacks, so that at least t intact shares survive.
3) The system gracefully handles file updates, thanks to the hash and update counter appended
to each share.
Let us consider an example to illustrate this.
Assume there is a file update and one of the servers is down, so the client will not be able
to update the share on that particular server. When the client detects that the server is
down, it ignores it but still increments the update counter stored with every other share.
Later, suppose the server that was down during the update is operational when the file has to
be retrieved. While retrieving the file, the client notices a discrepancy at that server, as
its old share carries a different update count. The client immediately erases the incorrect
share and replaces it with a fresh one.
The only case in which an update discrepancy can go undetected is when the file is updated
2^32 times, the server is down for all 2^32 updates (causing the counter to wrap around to 0),
and the server that was offline comes back online at the 2^32-th update. The assumption made
here is that the likelihood of such an event is extremely low and can be ignored. If the user
is really paranoid, the code could be modified to use more bytes for the update counter value.
We used four bytes for the counter, as we consider it adequate for all practical purposes.
4) There is no trusted third party; no one other than the user needs to be trusted.
FLAWS

- The interface is limited to Midnight Commander.
- The security of the entire system is dependent on the secrecy of the key stored by the user.
- The key has to be entered manually by the user for every file. This should be modified so
that all keys are stored in a safe repository, which could simply be a file encrypted using a
public-key cryptosystem. The user would then have to remember only his/her secret key and
could store all the others in the repository.
- If the user loses the key, it is impossible to recover any file encrypted using that key
(this is impossible to avoid if confidentiality is of utmost importance). If confidentiality
is less important, the following scheme could be implemented: the key could be split into n
shares using a threshold scheme such as Shamir's secret sharing, with a hash of the key
attached in the same manner to ensure its integrity. The user could then obtain t shares from
the servers and reconstruct the secret key. Note, however, that if t servers collude, they can
recover the key as well as the file.
- The system is relatively slower than a distributed file system in which no encryption is
used.
- An attacker listening to the conversation between the client and the server can obtain the
user's password and later destroy the user's data, but confidentiality is still maintained, as
the attacker cannot read the encrypted files.
- The code might be somewhat buggy, as this is the initial release. We have not incorporated
many checks for invalid states that could arise from bad or invalid input data.
FURTHER WORK
We intend to enhance this system into a full-fledged file system. This would involve modifying
the kernel in the case of Unix-based systems and writing device drivers for Windows-based
systems.
Alternatively, an HTML interface could be provided to the file system so that it is truly
machine independent.
The system could be modified to use a UDP connection instead of a TCP connection, as TCP is
slower than UDP.
The communication channel could use SSL so that it is resistant to eavesdroppers; though even
if the communication channel is not encrypted, confidentiality is maintained.
Someday, maybe, we'll implement a smart-card-based authentication system for the client.
We intend to give the project a name (we haven't yet found a suitable one!)
Thank You