FPR Report

Final Report: Distributed Consistent Secure
USB Storage
Sean Busch, Matt Dube, Eddie Lai, and Zhou Zheng
Abstract – Backing up data is a necessary task for anyone who
hopes to have access to their digital data at another time or location.
Personal computers are typically equipped with hard disk drives
that allow users to back up their data internally, while floppy
disk drives and, more recently, USB flash drives have been used to
store and transport data. While these are trusted solutions, each has
the potential of failure due to wearing out from use, physical
damage, or being lost. Thus, in order to guarantee the availability of
a backup, one needs to create several backups. Here, we propose a
simple, reliable hub that instantly creates several copies of a single
backup on several different USB sticks with the push of a
button. This device will be capable of syncing with other hubs across
a network, allowing backups to be created or updated regardless of
where they are located.
I. INTRODUCTION
ONE of the greatest benefits of using a computer is the
ease with which one can save and copy data, and backing
up data has become a daily occurrence for billions of
people. While many people have their own personal or home
computer, users commonly need to transfer data from one
location to another, such as from their work computer to their
home computer, or to save work done on a public computer to a
secure location. This backing up and transferring of data has
been accomplished using floppy disks and, more recently, using
USB flash drives or other memory cards due to their greater
capacity and smaller form factor. While this is a convenient
solution, these small, portable memory devices have their own
unique vulnerabilities.
Data backed up on a portable storage device, such as a USB
flash drive, is prone to physical damage, such as the wear and
tear of being stored in a pocket or on a key ring. The diminutive
size that makes these devices so portable also makes them easy
to lose or misplace. To counteract these concerns, it is natural to
make backups in several places and/or on several different
devices. Keeping each of these backups up to date and consistent,
however, is a chore: the most recent files must be copied from one
place to another, and the process must be repeated for each
device.
With the advent of cloud computing, there are several
products, such as Dropbox, that allow users to save their data to
the cloud in order to access it from anywhere at any time.
However, some users' data may be too sensitive to release to
external storage. Users should be able to have total control over
where their data is stored and how it is accessed. Without owning
and possessing the physical media on which the data is stored, it
is hard to provide this kind of security.
Our hub aims to solve this problem by taking several USB
drives and synchronizing them with the touch of a button. This
relieves the user of the tedious task of determining which backup
is the most recent and then manually copying files from one
place to another. With the added capability of networking, our
hub would be able to synchronize drives in different locations.
This gives users functionality beyond simply creating and
maintaining consistent backups. For example, a user on one side of
the country could share photos or files with another hub owner
on the other side of the country by simply loading the files to a
USB flash drive and pressing a button.
II. DESIGN
A. System Overview
The design for the system consists of multiple USB storage
devices plugged into several hubs, which are connected to a
network. This arrangement is shown in the high-level
diagram in Figure 1.
The system's operation begins with the user initializing the
hub network via a web browser on their PC. Once the IP
address of each hub has been identified and stored on each
hub, the user then writes a file to a USB drive mounted on one
of the hubs from their PC. Once the user initiates
synchronization, the hub determines what changes have been
made to the files on the drive and distributes the updates to the
other USB drives in the group, both on the same hub and on
other hubs distributed across the network. Each of the USB
drives in the synchronized group is recognized by the drive’s
unique identifier, ensuring that only the correct drives are
updated.
An additional feature of the hub is its security application.
One particular issue that can arise when creating several
copies of the data is that the data becomes more vulnerable, as
more physical copies of it exist. To ensure the
security of each copy, the hub implements a secret sharing
scheme that renders each copy useless unless combined with
other copies, or “shares” as they will be referred to in this
report. Upon synchronization, the data is broken up into
several of these shares and distributed to the individual USB
drives. In order for a user to access the original data, a minimum
number of the shares must be present in the hub network. Once this
number of shares is present, the data can be rebuilt and
recovered.
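As an illustration of the share-and-rebuild idea described above, the following is a minimal Python sketch of a threshold scheme in the style of Shamir's secret sharing. It is not the prototype's code: the function names, the choice of prime, and the treatment of the secret as a single integer are all assumptions made for the example.

```python
import secrets

# Assumption: a fixed prime field; 2**127 - 1 is a Mersenne prime,
# large enough to hold a secret of up to 15 bytes as one integer.
PRIME = 2**127 - 1


def split_secret(secret, k, n, prime=PRIME):
    """Split the integer `secret` into n shares; any k of them recover it."""
    if not (1 <= k <= n < prime):
        raise ValueError("need 1 <= k <= n < prime")
    if not (0 <= secret < prime):
        raise ValueError("secret must lie in [0, prime)")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Lagrange-interpolate the polynomial at x = 0 from at least k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Under these assumptions, a file would be processed in small chunks, with share number x of every chunk written to the same USB drive; fewer than k shares reveal nothing about the chunk.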
B. Detailed System Specification
In order to achieve the functionality described in the system
overview, each hub is built upon an embedded system running a Linux
operating system. In addition to processing power and an operating
system, each hub must have the features detailed in the block diagram
shown in Figure 2. These features include internal storage, network
connectivity, a USB interface, several USB ports, LEDs, external
buttons, and file consistency software.
1) Consistency Software
The consistency software running on each hub is responsible
for maintaining consistent data for each group of USB storage
devices connected to any of the other hubs. The software is also
partially responsible for presenting the hub to the host computer as a
single USB storage device containing a single copy of the
consistent data across the system. As part of these
responsibilities, the custom software must detect when a change
is made to data on the USB storage device and set up secure
connections to other hubs in order to distribute the changes.
The consistency software recognizes when changes have
been made using conventional UNIX tools. Timestamps and
checksums are recorded, then “diff”ed with the previous record
to determine what updates need to be made.
2) Hub Networking
In order to maintain consistency across distributed hubs, the
hubs must be able to communicate with one another across a
network. This is achieved using the TCP client/server model,
with the hub distributing the updates acting as the client and the
hub receiving the updates acting as the server. When a sync is
initiated, the client hub opens a secure socket with each of the
server hubs using SSL. Once the client has connected to a
server and distributed its updates, it disconnects.
3) Host PC Interface
The hub interfaces with the user's PC via the BeagleBoard's
USB On-The-Go port. This connection allows the hub to
connect directly to the user's PC without any configuration or
software installation. Once connected to the user's PC, the hub
is mounted as a USB Mass Storage Device. By being mounted
this way, the user can directly access the USB drives mounted
on the hub, creating a simple, transparent connection on both
ends.
4) User Interface
There are two primary operations that the user must have
control over: when a synchronization is initiated and when the
USB drives can be unmounted. Additionally, the hub must
give the user feedback on when a synchronization is occurring,
when the USB drives are mounted, and when an error has occurred.
The most efficient way to address this interface is with two
buttons and two LEDs. The buttons are connected to the
BeagleBoard's GPIO ports: one allows the user to
initiate a synchronization request, while the other unmounts the
USB drives. The LEDs are also connected to the BeagleBoard's
GPIO ports: one indicates when a synchronization is
occurring, while the other indicates when the USB drives are
mounted.
5) Secret Sharing
This application is based on Shamir’s Secret Sharing
Algorithm and implements what is called a (k, n) threshold
scheme. Following this algorithm, a piece of data is broken into
n shares on a synchronization request, with each individual
share containing no discernible information about the original
data. In order for the original data to be recovered, at least k
of the original n shares must be present on the USB
network.
6) Hub Initialization
In order for the hubs to be able to network with one another,
they must first know the IP addresses of the other hubs in the
network. Since it is not known in advance what IP address will be
assigned to each hub, finding and recording this information cannot
be automated, and the user must input it in
order to configure the hub. To record these values in the hub,
the user accesses a web server on the hub via a web client of
their choice. After entering the IP addresses of the other hubs
in the network in the GUI, the user is able to
network the hubs.
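A minimal Python sketch of how the addresses entered by the user might be validated before being stored on the hub. The one-address-per-line input format and the function name are assumptions for illustration; the report does not describe the web form's actual format.

```python
import ipaddress


def parse_peer_list(text):
    """Return the valid, de-duplicated hub addresses found in `text`.

    `text` is assumed to hold one IP address per line, as a user
    might type them into the hub's configuration page.
    """
    peers = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        # ip_address() raises ValueError for a malformed address.
        addr = str(ipaddress.ip_address(entry))
        if addr not in peers:
            peers.append(addr)
    return peers
```

Rejecting a malformed address at entry time means a typing mistake is reported to the user immediately rather than surfacing later as a failed synchronization.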
7) Design Alternatives
Several design alternatives have been discussed throughout
the design and implementation of this project. One such design
alternative that was discussed was using Windows XP as the
operating system running on the hubs. It was determined that a
lightweight Linux operating system is more suited to our needs.
One major decision that needed to be made was how the hub
would interface with a computer. One option we considered
was for the hub to be set up and accessed as network
attached storage. This alternative, however, was ruled out because
it would require too much initial
configuration on the part of the user, in direct
conflict with our goal of making a device that is as simple and
intuitive as possible.
The greatest design decision we had to make was which
embedded system to develop our hub on. We originally began
developing on an Advantech development board featuring an
Intel Atom processor. Despite this system’s relatively high
processing power and large memory, it was determined that it
was not properly equipped to interface with another PC. Since
one of our design specifications is that the hub must appear to a
user’s PC as a USB mass storage device, the embedded PC
must have the proper hardware to achieve this. Since the
Advantech board only features USB A ports, it can only be
used as a USB host, and therefore cannot be seen as a mass
storage device. Because this functionality is
paramount to the success of our project, the development
platform was changed to the BeagleBoard-xM due to its USB
On-The-Go port. Despite having reduced processing power, the
BeagleBoard-xM has the correct hardware to be mounted on a
host PC as a mass storage device.
FDR PROTOTYPE IMPLEMENTATION
A. Overview
Each USB hub is built on a BeagleBoard-xM development board.
The BeagleBoard includes an ARM Cortex-A8 CPU clocked at
1 GHz, 512 MB of RAM, an onboard Ethernet jack, a 4-port USB hub,
a USB On-The-Go port, and flash memory provided by
MicroSD. The hub is booted
The prototype developed for the Midway Design Review
(MDR) consists of two Intel Atom prototype boards (hubs)
connected to each other via a router. There will be two USB
thumb drives plugged into each hub, each of which will be
configured for a different “hub network”. An update on USB1 on
HUB1 will be mirrored only to the corresponding USB drive on HUB2
that is on the same network.
Upon the user’s request, a “synchronize” function can be
executed. This synchronize function will mirror the changes from
the USB drives connected to one hub to the corresponding USB
drives on the other hub. Each hub will act as both a server and a
client.
Files and folders copied from one hub to another will maintain
their original last-modified timestamps.
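A short Python sketch of how a last-modified timestamp can be preserved when a file is copied between two locally mounted drives. The function name is hypothetical; for a transfer between hubs, the timestamp would presumably be sent with the file and applied in the same way on the receiving side.

```python
import os
import shutil


def copy_preserving_mtime(src, dst):
    """Copy src to dst and give dst the same last-modified time as src."""
    shutil.copyfile(src, dst)  # copies file contents only
    st = os.stat(src)
    # Apply the source's access and modification times to the copy.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Without the `os.utime` step the copy would carry the time of the copy itself, and the next listing comparison could mistake the synchronized file for a new local change.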
B. Algorithm
The algorithm used to detect changes will be described in the
following pages.
Each hub will act as a server. The server will operate in the following way (flowchart steps, in order):
1. Listen on the incoming port.
2. Incoming connection? (N: keep listening; Y: continue.)
3. Interpret the message.
4. Global sync message? Run synchronize (client).
5. Folder added/removed?
6. File deleted?
7. Generate trusted USBs on the hub and distribute the update to those.
8. Timestamp changed?
9. File being added?
10. Generate one trusted USB on the update’s USB network and distribute to that one USB.
11. Distribute the update locally to the other USBs on the same USB network.
12. Update the USBs’ listings so they do not detect these updates as their own modifications.
13. Any more incoming data? (Y: continue receiving; N: close all connected socket connections.)
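The server loop above can be sketched in Python as follows. This is an illustration only: the prototype's server is a C socket program, and the line-based message names used here (SYNC, MKDIR, and so on) are hypothetical, not the prototype's wire format.

```python
import socket


def interpret(line):
    """Map one received line to an (action, argument) pair.

    The message names are hypothetical stand-ins for the update
    types in the flowchart.
    """
    kind, _, arg = line.partition(" ")
    actions = {
        "SYNC": "run_synchronize",
        "MKDIR": "add_folder",
        "RMDIR": "remove_folder",
        "DELETE": "delete_file",
        "TOUCH": "update_timestamp",
        "FILE": "receive_file",
    }
    if kind not in actions:
        raise ValueError("unknown message: " + line)
    return actions[kind], arg


def serve_connection(conn):
    """Interpret messages until the peer stops sending, then close."""
    handled = []
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:  # loop ends when the peer closes its side
            line = line.rstrip("\n")
            if line:
                handled.append(interpret(line))
    return handled


def listen(port):
    """Accept hub connections one at a time on `port` (blocks forever)."""
    with socket.create_server(("", port)) as server:
        while True:
            conn, _addr = server.accept()
            serve_connection(conn)
```

In this sketch `serve_connection` only records which action each message selects; a real hub would apply the update to its USB drives at that point and refresh their listings.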
Each hub will need to run a bash script for each of the USB drives connected. The bash script will perform the following operations:
1. Take a listing of all files’ checksums and timestamps, and of all folders.
2. Diff the current listing against the previous listing.
3. Classify each difference:
   Timestamp changed but CRC same? The file was only touched.
   File removed from listings? The file was removed.
   Folder added or removed from listing? The folder was added/removed.
   Timestamp and CRC change for file? The file was added or edited; write the file update to the “file modification” file.
4. For a touched file, a removed file, or an added/removed folder, write the update to the “non file modification” file.
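The classification step can be sketched in Python as below. This is not the prototype's bash script: the in-memory representation of a listing (a dictionary from file path to checksum and timestamp, plus a set of folder paths) and the function name are assumptions made for the example.

```python
def classify_changes(prev_files, curr_files, prev_dirs, curr_dirs):
    """Compare two listings and sort the differences into two lists.

    prev_files / curr_files map a file path to (checksum, mtime);
    prev_dirs / curr_dirs are sets of folder paths.
    Returns (file_mods, non_file_mods).
    """
    file_mods = []      # file contents must be sent
    non_file_mods = []  # only a small command must be sent

    for path, (crc, mtime) in sorted(curr_files.items()):
        old = prev_files.get(path)
        if old is None:
            file_mods.append(("added", path))
        elif old[0] != crc:
            # Assumption: any checksum change counts as an edit,
            # whether or not the timestamp also changed.
            file_mods.append(("edited", path))
        elif old[1] != mtime:
            non_file_mods.append(("touched", path))
    for path in sorted(prev_files.keys() - curr_files.keys()):
        non_file_mods.append(("file_removed", path))
    for path in sorted(curr_dirs - prev_dirs):
        non_file_mods.append(("folder_added", path))
    for path in sorted(prev_dirs - curr_dirs):
        non_file_mods.append(("folder_removed", path))
    return file_mods, non_file_mods
```

Keeping the two lists separate mirrors the two files in the flowchart: only entries in the first list require file contents to be transmitted.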
The client will operate in the following way:
1. Any USBs connected? (Y: continue.)
2. Run Consistency_Script on a trusted USB.
3. Does a “non file modification” file exist? If so, send updates to the other hubs for:
   1) Adding/removing folders
   2) Removing files
   3) Timestamp changes
   Repeat while there are any more updates.
4. Does a “file modification” file exist? If so, handle each update according to the mode, Secret Sharing or Normal:
   Secret Sharing: take the modified file, split the file, and distribute the shares to self and other hubs.
   Normal: send the file update to self and other hubs.
   Repeat while there are any more updates.
5. Any more USBs? (Y: return to step 2; N: close all outgoing socket connections and exit the program.)
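The sending side of the client can be sketched in Python with the standard `ssl` module, following the report's description of a client hub that opens a secure socket to each server hub, sends its updates, and disconnects. The function names, the newline-delimited message format, and the `ca_file` parameter are assumptions for illustration; the prototype itself is a C socket program.

```python
import socket
import ssl


def make_client_context(ca_file=None):
    """TLS context that verifies the server hub's certificate.

    ca_file is a hypothetical path to the certificate authority that
    signed the hubs' certificates; None uses the system defaults.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)


def send_updates(host, port, messages, context):
    """Connect to one server hub, send every update, then disconnect."""
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            for message in messages:
                tls.sendall(message.encode("utf-8") + b"\n")


def distribute(peers, port, messages, context):
    """Send the same updates to each hub; return the hubs that failed."""
    failed = []
    for host in peers:
        try:
            send_updates(host, port, messages, context)
        except OSError:  # covers ssl.SSLError and connection errors
            failed.append(host)
    return failed
```

Because the hubs are addressed by IP address, certificate verification in this sketch would only succeed if each hub's certificate lists its IP address; how the prototype provisions certificates is not described in the report.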
PROJECT MANAGEMENT
C. Team Roles
• Linux Bash Script, C Socket Program: Eddie
• Linux Bash Script, C Socket Program, (k, n) threshold: Matt
• USB Interface: Sean
• GPIO: Zhou
Summary and Conclusion
Overall, the main focus of the project is creating a distributed,
consistent, and secure storage environment. Users are able to sync
with other hubs across the network, allowing backups or updates
regardless of their location. Distribution is based
on the TCP/IP protocol. For consistency, we record the
checksum and timestamp of each file to see if any changes were made
since the last check. Security is based on a (k, n) threshold scheme.
With these features, our device will create a consistent, secure,
and distributed network environment.
APPENDIX
D. Application of Mathematics, Science and Engineering
In developing this prototype, we have used material from at
least three science or engineering courses:
1. ECE 353: Learned how to build embedded systems and
program microcontrollers.
2. ECE 354: Designed a network between two FPGA
development boards to send pixels of a large image.
3. ECE 374: Learned about IP and server/client connections;
created a simple client/server socket application in Java.
Detailed Example
For our prototype, we had to set up a TCP/IP socket
connection to allow two hubs to communicate with
each other. Each hub had to act as a server and a client to the
other hub. Initially, the client and server were set up on one hub
through a local socket connection communicating through a
single port. We successfully maintained consistency between two
directories using our software and the local socket connection.
After successfully maintaining consistency through a local
socket connection, we had to expand this function to two
different hubs. In our networking course, we had to develop a
simple client/server application. Based on our experience from
that socket application and the image-sharing network, we were
able to debug the issues that occurred.
E. Design and Performance of Experiments, Data Analysis and
Interpretation
Not yet performed.
F. Design of System, Component, or Process to Meet Desired
Needs within Realistic Constraints
The system requirements consist of having a small, compact
hub so that users can carry it with them without it being an issue.
Currently, our hub is an Intel Atom prototype board and is very
large to have to carry around. This will be addressed in future
prototypes and the final design, in which a custom-built PCB will
be used.
The other system requirement concerns data
transfer during consistency maintenance. With extremely
large files, the transfer will take a long time. Also, data
transfer rates depend on the connected network, which is
out of our control.
G.
Eddie Lai, CSE: Worked on socket and bash programming to
maintain consistency with Matt.
Matt Dube, CSE: Set up hubs to be able to network with each
other, researched potential applications, socket and bash
programming
Sean Busch, EE: Worked on the hub to host PC interface.
Looked into building a custom PCB with Zhou
Zhou Zheng, EE: Webmaster. Worked on hardware research.
H. Identification, Formulation, and Solution of Engineering
Problems
One of the main obstacles was how to detect whether a file was
modified based on a timestamp update. We solved this issue by
recording the checksum of the file, but there is also the case
where the timestamp may update without a modification.
Because of these issues, we record both the checksum and the
timestamp. Detecting a timestamp update without a
modification lets us avoid re-transmitting the file.
I. Understanding of Professional and Ethical Responsibility
An issue of professional and ethical responsibility is that if we
were to be a General Dynamics group, there would be
expectations about delivering a satisfactory prototype that they
find useful. Since none of the team members have had any
involvement with military and secret applications, we had to
contact an employee at General Dynamics to give us an idea of
some relevant projects.
During our brainstorming of a project, we received
several emails from the General Dynamics representative, which
gave us an idea of something they would find beneficial.
During the career fair, we also brought up the idea to the
representative for General Dynamics, and he loved the idea.
J. Team Communication
Weekly team meetings with our advisor are scheduled for each
Friday at 11am. Team meetings without the advisor are
also scheduled on a weekly basis. During the advisor meeting,
individual and group progress is discussed along with
challenging problems that our group has encountered. Our
weekly team meetings are for assigning roles to team members,
ensuring the previous roles were completed, and brainstorming
about potential upcoming issues.
Unscheduled meetings occur many times a week in
Duda for many hours at a time for brainstorming and discussion.
Written and stored communication has been done through
email and individual journals shared among the team.
Eddie Lai has served as the primary faculty contact in charge
of responding to emails and setting up any meetings.
K. Understanding of the impact of engineering solutions in a
global, economic, environmental and societal context
Our project will have positive ____ impacts by allowing users to
collaborate over a distributed network. Our project allows
distributed collaboration on a set of data and allows a group of
people to control when others can have access
to the data. Data cannot be accessed without the minimum subset of
users, which prevents any one user from having all the power.
L. Application of Material Acquired Outside of Coursework
Three examples of sources used outside of coursework are:
1. Md5deep manual page. This provided us with the
solution on how to record all checksums and their file
paths.
2. Du manual page. This provided us with the solution on
how to list all folder paths
3. Bash scripting and C socket programming tutorials posted
on the internet. These provided us with specific
solutions to issues of how to detect and send changes
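For illustration, the kind of listing that md5deep and du provide (a checksum for every file path, plus every folder path) can be approximated in Python with the standard library. This is an analogue written for this example, not the prototype's bash script, and the function name and return format are assumptions.

```python
import hashlib
import os


def take_listing(root):
    """Return (files, folders) for the tree under `root`.

    files maps each relative file path to (MD5 hex digest, mtime in ns);
    folders is the set of relative folder paths.
    """
    files, folders = {}, set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            folders.add(os.path.relpath(os.path.join(dirpath, name), root))
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as fh:
                # Read in chunks so large files do not fill memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root)
            files[rel] = (digest.hexdigest(), os.stat(full).st_mtime_ns)
    return files, folders
```

Saving the result of one run and comparing it with the next gives the previous and current listings that change detection relies on.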
M. Knowledge of Contemporary Issues
People always want to make a backup of their data for
safekeeping; often, there are multiple backups.
After a change is made on one of the backups, it is extremely
tedious to mirror the updates to all of the other backups,
especially with many backups or many updates. Our project
addresses this issue and gives an extremely easy and
transparent method of keeping backups consistent in many
different locations.
Our project also allows people to collaborate on a common set
of data all over the world. Distributed file sharing is an
important part of communication.