A Novel Deduplication-Based Covert Channel in
Cloud Storage Service
Hermine Hovhannisyan∗, Kejie Lu†,‡, Rongwei Yang§,∗, Wen Qi∗, Jianping Wang∗, Mi Wen†
∗ Department of Computer Science, City University of Hong Kong
† School of Computer Engineering, Shanghai University of Electric Power, Shanghai, China
‡ Department of Electrical and Computer Engineering, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
§ School of Computer Science and Technology, University of Science and Technology of China (USTC), Hefei, Anhui, China
Abstract—To efficiently provide cloud storage services, most
providers implement data deduplication schemes so as to reduce storage and network bandwidth consumption. Due to its
broad application, many security issues about data deduplication
have been investigated, such as data security, user privacy,
etc. Nevertheless, we note that the threat of establishing a covert
channel over cloud storage has not been fully investigated. In
particular, existing studies only demonstrate the potential of a
single-bit channel, in which a sender can upload one of two
predefined files for a receiver to infer the information of “0” or
“1”. In this paper, we design a more powerful deduplication-based
covert channel that can be used to transmit a complete
message. Specifically, the key features of our design include: (1)
a synchronization scheme that can establish a covert channel
between a sender and a receiver, and (2) a novel coding scheme
that allows each file to represent multiple bits in the message. To
evaluate the proposed design, we implement the covert channel
and conduct extensive experiments in different cloud storage
systems. Our work highlights a more severe security threat in
cloud storage services.
Keywords—cloud storage service, deduplication, covert channel
I. INTRODUCTION
In recent years, cloud storage service has been widely
adopted as one of the most popular cloud services. Today, more
than a billion users store their data in cloud storage [1]. With
the increasing number of users and their files, cloud storage
providers need to not only expand their storage capacity but
also design efficient storage schemes that avoid storing
duplicated files or file chunks in the cloud.
To achieve this goal, most cloud storage providers have
used data deduplication, a method that keeps only a single
copy of each unique file or file chunk [2]. With deduplication,
every time a user wants to store a file, the file or its chunks
will be compared to those stored in the cloud. If there is a
match, the file or chunk will not be stored; instead, a link to
the existing file or chunk will be created.
In practice, different types of deduplication schemes can be
applied [3]. First, deduplication can be applied to a single user
or to multiple users. Single-user deduplication is performed
only for data of the same user, which means that the same
data will be stored as separate copies in the cloud for different
users. Alternatively, cross-user deduplication stores only one
copy of each piece of data, regardless of the users. Second,
deduplication checking can be done at the client side or server
side. Most cloud providers choose to do so at the client
side, also known as source-based deduplication, due to its
additional benefit of reducing Internet bandwidth consumption.
Depending on deduplication settings, up to 90% of space and
bandwidth can be saved [4].
Along with its benefits, cross-user data deduplication can
be exploited if deduplication authorization (or proof-of-ownership)
is not properly designed [5]–[7]. For instance,
in 2011, a malicious tool “DropShip” exploited Dropbox’s
deduplication mechanism, which allowed access to a file
given only the hash value of the file. In addition, an attacker
can exploit the deduplication routine to infer whether
a victim has a certain file, which is a type of side-channel
attack that violates the privacy of a user. To deal with these
issues, some providers (e.g., Dropbox) decided to stop cross-user
deduplication. Nevertheless, many other big providers still
use cross-user source-based data deduplication.
Besides the above issues, another potential threat associated
with data deduplication is the covert channel, a hidden
communication channel that exchanges information while bypassing
security policies [8]. Cross-user data deduplication
opens a back door for information to be leaked from one
user to another through covert channels. Recently, several
researchers have identified a single-bit covert channel that can
be established by making use of cross-user deduplication
[9]–[13].
In a single-bit covert channel, there is a sender program
residing at the victim computer and a receiver program at
the attacking computer. The sender holds two different files,
predefined to symbolize “0” and “1”, respectively. If the sender
wants to send “1” to the receiver, it will upload the “1” file to
the cloud. If the sender wants to send “0” to the receiver, it will
upload the “0” file. The receiver then retrieves the message by
uploading both files to the cloud. If the receiver finds that
uploading the “1” file takes much less time than
uploading the “0” file, it can infer that deduplication
on the “1” file has been conducted and the sender must have
tried to send “1” to the receiver. Similarly, the receiver can
infer whether the sender tried to send “0”.
If we directly apply a single-bit covert channel to transmit
a complicated message with multiple bits, it will face
two major problems: sending many files and out-of-order
delivery. For each bit to transmit through a one-bit covert
channel, two files are needed. This increases the risk of being
detected. Also, in order to receive accurate messages, it is very
important to upload files in the right order, that is, the sender
uploads before the receiver. However, in the existing channels,
there is no mechanism that allows the sender and the receiver
to communicate. In this case, the receiver may upload the files
before the sender and the message will be lost.
978-1-4799-5952-5/15/$31.00 ©2015 IEEE
For the above reasons, the risk of covert channels has been
underestimated. In this paper, we demonstrate a new method
that transmits multi-bit messages instead of a single
bit. First, we design a novel synchronization scheme
that can establish a covert channel between a sender and a
receiver such that the order of delivery does not invalidate
message recovery. Second, we design
a novel coding scheme that allows each file to represent
multiple bits in the message. Third, we test our channel on
two big cloud providers, SugarSync and BaiduYun, and
verify the efficiency of the design.
Our contributions are summarized as follows:
• We demonstrate a new multi-bit covert channel in cloud storage services that is a serious threat for cloud users.
• To achieve error-free decoding in the multi-bit covert channel, we propose a new synchronization technique.
• We let each file represent multiple bits in the message, and we eliminate the need to upload “0”s, which reduces the number of uploaded files and makes the covert channel hard to detect.
• We implement the proposed framework and evaluate its efficiency in terms of the number of uploaded files and achievable data rates on two different public clouds.
The rest of the paper is organized as follows. In Section II,
we introduce data deduplication and single-bit covert channel.
In Section III, we present implementation details of the proposed channel and its transmission model. In Section IV, we
demonstrate our experimental results and evaluate the channel
performance on different storage providers. Finally, in
Section V, we conclude our study.
II. BACKGROUND
A. Data deduplication
Data deduplication [14]–[16] is a mechanism for reducing
storage cost by eliminating redundant data. Each data chunk
or file is identified by a hash value, which is compared against
the hash values of stored data to detect duplicates on the server. The
server stores only one copy of the data instead of storing multiple
copies of the same data. Data deduplication can be classified
along two dimensions.
Fig. 1. An example of cross-user source-based deduplication
1) Single-user or cross-user: In the former case, deduplication occurs only when the same user uploads redundant data.
In the latter case, if a different user uploads a data unit that
already exists in the cloud, the data will not occupy new
storage space; instead, the service provider will create a link to
the original data for that user.
2) Source-based or target-based: Source-based deduplication computes each block’s hash value at the client machine
and determines whether to upload the block or not. Target-based
deduplication occurs at the server side after a user uploads
the data. Consequently, cloud storage services that adopt
cross-user and source-based data deduplication can save both
storage and bandwidth.
For example, in Figure 1, the first user Alice copies files
A, B, C, D, E and F with overall size of 60MB to her local
cloud folder. All the files are uploaded to the server. Later,
another user Bob copies files D, A, C, K, L and X to his own
cloud folder, which are also 60MB all together. From Bob’s
side, only the files K, L and X (30MB) are uploaded to the
server, because the rest are duplicated files. Thus, the cloud
server saves 30MB of storage and the corresponding bandwidth. In our
work, we focus on cross-user source-based deduplication.
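The Alice and Bob example can be reproduced with a small simulation. The sketch below is illustrative only: it assumes SHA-256 as the fingerprint and an in-memory dictionary as the server index, and the names `DedupServer` and `make_file` are hypothetical. Each file is modeled as 10 bytes standing in for 10 MB.

```python
import hashlib


class DedupServer:
    """Toy cross-user, source-based deduplication index (illustrative only)."""

    def __init__(self):
        self.store = {}  # fingerprint -> file content, shared by all users
        self.links = {}  # user -> set of fingerprints the user references

    def upload(self, user, content):
        """Return the number of bytes actually transferred for this file."""
        digest = hashlib.sha256(content).hexdigest()  # computed on the client
        self.links.setdefault(user, set()).add(digest)
        if digest in self.store:
            return 0  # duplicate: only a link is created, nothing is sent
        self.store[digest] = content
        return len(content)


def make_file(name):
    # Hypothetical stand-in for a 10 MB file: 10 bytes unique to each name.
    return (name * 10).encode()[:10]


server = DedupServer()
alice_sent = sum(server.upload("alice", make_file(n)) for n in "ABCDEF")
bob_sent = sum(server.upload("bob", make_file(n)) for n in "DACKLX")
stored = sum(len(c) for c in server.store.values())
print(alice_sent, bob_sent, stored)  # 60 30 90
```

Bob transfers only the three files that the server has not seen, and the server holds 90 units in total rather than 120.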
B. Deduplication detection
Since the client machine does not need to upload files if
data deduplication is detected, a user can observe that
duplicated data is uploaded faster than non-duplicated data. In
the literature, there are two main methods to detect duplicated
data in the cloud: checking the transmission time and
checking the bandwidth consumption of the uploaded file. These
measurements can be made manually or with the help of network
monitoring tools.
1) Transmission time detection: This method
measures the time each file takes to upload to determine
whether the file is a duplicate. If the file exists on the server,
the upload will take noticeably less time than that of an original file.
2) Bandwidth consumption detection: This is similar to
transmission time detection. The client machine only needs
to upload the hash value to the server when a file is a duplicate, which hardly consumes any bandwidth. Therefore, the
attacker can identify duplication by measuring the bandwidth
consumption.
Fig. 2. An example of a single-bit covert channel
C. A Single-Bit Covert Channel
Figure 2 illustrates the idea of the single-bit covert channel.
An attacker (Alice) installs malicious software on a victim’s
(Bob’s) machine. The malicious software is called an insider,
which can stealthily upload files to the cloud storage service
through Bob’s account. Then, a covert channel can be established
for Alice to use the insider to send out covert data from Bob’s
machine.
Two different files, X0 and X1, are initially stored both
in the malicious software and on Alice’s machine. Suppose they are
special enough that no copy of either of them has previously been
stored on the cloud storage servers. If a message “0” has to
be transferred, the file X0 will be uploaded by the malicious
software; otherwise, the file X1 will be uploaded. After a
“long” delay, Alice uploads files X0 and X1 to the same
cloud storage service as Bob, observes which file
has previously been uploaded, and thereby learns the message that the
malicious software has sent.
However, we cannot directly apply this technique to transmit complicated messages. In order to receive multi-bit messages, the receiver has to upload the files after the sender. In
this channel, though, there is no synchronization mechanism
to allow the sender and the receiver to communicate, which
will lead to out-of-order delivery. Also, two files are needed
to transmit each bit, which increases the chances of channel
detection.
III. A NOVEL DEDUPLICATION-BASED COVERT CHANNEL
In this section, we present the main components of our
multi-bit covert channel design. When constructing the new
channel, we face the following challenges: out-of-order arrival
and sending many files. We solve the first problem by
introducing a new synchronization algorithm that uses different
file types to initialize communication. We also present a novel
coding scheme, which allows the number of files transmitted to
be much smaller than the number of bits in the message, helping
the sender to stay unnoticed. Finally, we demonstrate
a new method that we use to detect deduplication, which
allows error-free decoding, in contrast to the traditional
deduplication detection schemes.
A. Threat model
In our covert channel scenario, we assume that the attacker
has a malicious program (insider) running on the victim’s
machine. Figure 3 illustrates the steps each side has to take
to initialize the covert channel. As soon as the victim starts
uploading files to his cloud folder, the insider waits for some
seconds and begins transmitting the covert message. When the
victim’s upload is finished, the insider also stops transmitting.
In this way, the insider minimizes the risk of channel exposure.
Fig. 3. Covert channel initialization diagram
Later, the attacker (receiver) uploads the same files that the
insider uploaded during the particular time frame, which will
be explained shortly, and checks for duplicates. The
uploaded files are generated from timestamps, and both sides
have the same file generation program. Our message decoding
algorithm is designed to be flexible, such that messages may
be recovered at arbitrary future intervals by the receiver.
For example, a receiver may attempt message decoding
on a daily or weekly basis and still correctly recover the
message. This is because each execution of the detection
algorithm runs from the last known recovery timestamp until
the current timestamp. More details about file generation and
synchronization are given later in this section.
B. File generation and synchronization
In order to ensure error-free transmission, the receiver has
to know the time period in which the message was sent. This is
why we design a new synchronization method, in which messages
are composed of three types of files. MSGstart indicates the
beginning time of the message, MSGend contains the time
when message transmission ends, and MSGinfo carries the
main message. Here, MSGinfo is a stream of encoded bits
and aims to transmit the whole message in a bit stream, unless
the transmission is interrupted by the victim’s actions.
Each of these files consists of three components: random content, a type field and a timestamp. Random content
is the main part of the file and ensures the file’s uniqueness.
The type field identifies the type of the file: start, info or end.
The timestamp is set according to the file generation time and
is used to synchronize the files between the insider and the
attacker. It can also be regarded as the serial number of the
file. It is important to mention that the timestamp is not based on
real time; it is only a code to assist synchronization.
Fig. 4. Transmission model design
Fig. 5. Transmission of a 6 bit message
Fig. 6. Decoding of a 6 bit message
Figure 4 illustrates the file synchronization between an
attacker and an insider. The insider and the attacker have
the same file generation program that creates files based on
timestamps. In this way, the attacker knows the exact set of
files to upload once it learns the time period in which they
were uploaded.
In this example, MSGstart and MSGend have timestamps Ts and Te. Ts is the initial parameter for MSGinfo
generation, and Te − Ts specifies the set of MSGinfo files to
be checked by the receiver. MSGinfo consists of many files
with different timestamps, and they are sorted by timestamp
value, where the i-th file is denoted MSGinfo[i]. In the
following subsections, we will further explain how a sequence
of bits can be carried by a sequence of MSGinfo[i].
C. Message encoding and transmission
Most cloud storage systems have a local folder where a
user can copy files, and these files are automatically uploaded
to the server. The insider can use this folder to upload covert
messages each time the victim starts uploading. Specifically,
to send a message, the insider can generate and save a set of
files in the folder.
To avoid being detected by the victim, the insider can
delete all generated files after the message transmission. In
our experiments, we have discovered that, even if the insider
deletes the files from the local folder, the files still remain on
the cloud, which is a common practice in many cloud systems.
This makes it possible for the receiver to “read” the
message at a later time. Our experiments also show that
cloud storage providers do not delete files from their servers
for a long time (around one month). If the transmission is
interrupted by the victim, the insider uploads an MSGend file to
indicate the end of the transmission.
To encode a message, a straightforward way is to generate
one file to represent each bit. More specifically, a particular
file will be used to represent “0”, and a different file will be
used to represent “1”. In this case, if a message has N bits,
the insider needs to generate N files.
Since each MSGinfo[i] has its timestamp, we can improve the efficiency by sending a file only when the corresponding bit is “1”. Figure 5 demonstrates a simple example of
a 6-bit message transmission using such a method. In this example, to send a message “100101”, only files MSGinfo[1],
MSGinfo[4] and MSGinfo[6] are uploaded.
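The following Python sketch illustrates one way the shared file generation program and the send-only-“1” encoding could fit together. It is a minimal sketch under stated assumptions, not our implementation: the shared secret, the SHA-256 based content expansion, the `kind:timestamp:` header layout and the function names `generate_file` and `encode_message` are all hypothetical choices made for illustration.

```python
import hashlib

SHARED_SECRET = b"example-secret"  # assumption: known to insider and attacker


def generate_file(kind, timestamp, size=1024):
    """Deterministically build a file from its type field and timestamp."""
    header = f"{kind}:{timestamp}:".encode()
    body = b""
    counter = 0
    while len(body) < size:
        # Pseudo-random content that both sides can regenerate identically.
        block = hashlib.sha256(SHARED_SECRET + header + str(counter).encode())
        body += block.digest()
        counter += 1
    return header + body[:size]


def encode_message(bits, ts):
    """Return {timestamp: file} holding an info file only for each '1' bit."""
    files = {}
    for offset, bit in enumerate(bits):
        if bit == "1":
            stamp = ts + offset
            files[stamp] = generate_file("info", stamp)
    return files


uploads = encode_message("100101", ts=1)
print(sorted(uploads))  # [1, 4, 6]
```

Because the content depends only on the type field, the timestamp and the shared secret, the receiver can regenerate exactly the same candidate files without any direct communication with the sender.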
To further improve the transmission rate and to reduce the
risk of detection, we also propose a new multiple-bits-per-file
method, in which a single file can represent multiple bits. For
example, to transmit two bits per file, instead of uploading
up to two files to convey the two bits, we can define three files,
each of which represents a two-bit pattern: “01”, “10” and
“11”. Here, the sender transmits only one of the three files
to send a pattern and remains silent to indicate a “00” pattern.
The receiver uploads the three files; if no duplication
is detected, the message is “00”, otherwise the message is one
of the three codewords, depending on which file is a duplicate. Clearly, we can
further generalize this technique to transmit even more bits
using one file.
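The two-bits-per-file idea can be sketched as follows. This is an illustrative model only: the cloud is represented as a set of `(slot, pattern)` pairs standing in for real deduplication checks, the function names are hypothetical, and padding odd-length messages with a trailing “0” is an assumption of the sketch.

```python
def encode_pairs(bits):
    """Map a bit string to {slot: pattern}, staying silent for '00'."""
    if len(bits) % 2:
        bits += "0"  # assumption: pad odd-length messages with a trailing 0
    sent = {}
    for slot in range(len(bits) // 2):
        pattern = bits[2 * slot: 2 * slot + 2]
        if pattern != "00":
            sent[slot] = pattern  # upload the file named by (slot, pattern)
    return sent


def decode_pairs(cloud, n_slots):
    """Probe the three candidate files per slot; no duplicate means '00'."""
    out = []
    for slot in range(n_slots):
        hit = "00"
        for pattern in ("01", "10", "11"):
            if (slot, pattern) in cloud:  # stands in for a dedup check
                hit = pattern
        out.append(hit)
    return "".join(out)


message = "10000111"
sent = encode_pairs(message)
cloud = set(sent.items())
print(len(sent), decode_pairs(cloud, len(message) // 2))  # 3 10000111
```

Here an 8-bit message is carried by three uploaded files, while the receiver probes three candidates per slot.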
D. Message decoding
To decode the message, the attacker periodically checks whether
any files have been sent by uploading all the files for the time
periods since the last known detection timestamp. To learn the
beginning of the message, it first uploads the set of possible
MSGstart files. When a duplication occurs, the attacker
records the timestamp Ts of the detected MSGstart file and
then uploads the set of possible MSGend files to make sure
the insider has uploaded the MSGinfo files completely. Lastly, the
attacker uploads the set of all possible MSGinfo files that
have timestamp Ts or later. Within a given time period the
attacker may detect several MSGstart and MSGend files,
which indicates that the insider has transmitted several times.
The detailed decoding process for a one-bit-per-file channel is
illustrated in Algorithm 1.
Figure 6 shows the decoding process of the one-bit-per-file
example demonstrated in Section III-C. After uploading the MSGstart and MSGend files, the attacker
uploads the MSGinfo files generated in that time period
(MSGinfo[1] to MSGinfo[6] in our example). Because duplicated files are not uploaded, the attacker detects which
files already exist in the cloud. In this example, it learns
that MSGinfo[1], MSGinfo[4] and MSGinfo[6] exist on
the cloud, which means they are “1” and the remaining
MSGinfo[2], MSGinfo[3] and MSGinfo[5] are “0”.
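The decoding steps can be sketched in Python as shown below. This is a simplified model under stated assumptions: the cloud is a set of `(kind, timestamp)` pairs that stands in for a real deduplication check, the start and end markers are assumed to carry the first and last info timestamps (Ts and Te, inclusive), and the function name `decode` is hypothetical.

```python
def decode(cloud, last_seen, now):
    """Recover bit strings from files found in `cloud` between two stamps.

    `cloud` is a set of (kind, timestamp) pairs standing in for a real
    deduplication check such as upload timing (an assumption of this sketch).
    """
    starts = [t for t in range(last_seen, now + 1) if ("start", t) in cloud]
    messages = []
    for ts in starts:
        # The first end marker at or after the start closes the message.
        ends = [t for t in range(ts, now + 1) if ("end", t) in cloud]
        if not ends:
            continue  # transmission not finished yet; retry on a later run
        te = ends[0]
        bits = "".join(
            "1" if ("info", t) in cloud else "0" for t in range(ts, te + 1)
        )
        messages.append(bits)
    return messages


cloud = {("start", 1), ("info", 1), ("info", 4), ("info", 6), ("end", 6)}
print(decode(cloud, last_seen=0, now=10))  # ['100101']
```

Running the function again later with `last_seen` set to the previous `now` mirrors the incremental recovery described in Section III-A.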
E. Elimination of Redundant Files
In our design, we drastically reduce the number of files to
be sent compared to the previous one-bit channel. To achieve
that, we use the following techniques: 1) the sender uploads
only part of the message (selective message upload), and 2)
we use a coding scheme to transmit multiple bits in one file.
Suppose that an insider transmits an n-bit message and
the number of “1”s in the message is m, where m ≤ n. In the
existing one-bit covert channel model, the sender uploads n
files and the receiver uploads 2n files to recover a message.
In technique 1, where we do not transmit “0”s, the insider
uploads only m files and the attacker uploads around n + k
files, where k is the number of MSGstart and MSGend
files. In technique 2, we use coding to transmit multiple bits in
one file, resulting in a further reduction in files sent compared
to technique 1. File reduction at the sender’s side is very
important because if the sender consumes too much bandwidth,
the channel can be exposed easily.
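The sender-side counts can be checked with a short calculation. The sketch below counts info files only (start and end markers are ignored) and uses the two-bits-per-file variant of technique 2; the function name `files_sent` is hypothetical.

```python
def files_sent(bits):
    """Sender-side info-file counts: baseline, technique 1, technique 2."""
    n = len(bits)
    baseline = n                                   # one file per bit
    skip_zero = bits.count("1")                    # m: only '1' bits are sent
    pairs = [bits[i:i + 2] for i in range(0, n, 2)]
    two_bit = sum(1 for p in pairs if "1" in p)    # silent for all-zero pairs
    return baseline, skip_zero, two_bit


print(files_sent("10000111"))  # (8, 4, 3)
```

For a message of n uniformly random bits, the expected counts are n, n/2 and 3n/8 respectively, since each of the n/2 pairs is non-zero with probability 3/4.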
F. Deduplication Detection Techniques
As we mentioned in Section II, the receiver can verify
whether a file is a duplicate by measuring the upload time
or the bandwidth consumed. Our studies show that most cloud storage
client software allows users to examine the file transfer status, which can
be used to determine the time cost. Another option is to use
network monitoring tools like Wireshark to analyze the traffic
between the client and the cloud storage server.
To estimate uploading time more accurately, the receiver
can initially limit the upload traffic. Then, it uploads files and
observes the time cost of each file. For example, the receiver
can set the file size to 1 KB and the upload rate to 1
Kbps. If a file is duplicated, it only needs around 4 seconds to
upload; otherwise, it takes more than 10 seconds. We should
note that the larger the file, the bigger the difference in
uploading time.
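The numbers in this example are consistent with a simple estimate: a 1 KB payload at 1 Kbps needs about 8 seconds for the data alone, so a non-duplicate upload exceeds 10 seconds once protocol overhead is added, while a duplicate only pays the overhead. The sketch below assumes decimal units and uses the midpoint of the two reported timings as a decision threshold; both are assumptions made for illustration.

```python
def transfer_seconds(size_bytes, rate_bps):
    """Ideal time to push the payload alone, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bps


def is_duplicate(observed_seconds, threshold_seconds):
    """Classify an upload as deduplicated if it finished below the threshold."""
    return observed_seconds < threshold_seconds


payload = transfer_seconds(1000, 1000)  # 1 KB at 1 Kbps, decimal units
threshold = (4 + 10) / 2                # midpoint of the two reported timings
print(payload, is_duplicate(4.0, threshold), is_duplicate(10.5, threshold))
# 8.0 True False
```

Larger files widen the gap between the two cases, which makes the threshold easier to place.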
During our experiments, however, we found that it is also
possible to detect deduplication by examining the log files of
the cloud storage software. For example, when deduplication
occurs we can find the keywords “hash match” in the SugarSync
synchronization log.
Algorithm 1 Message decoding for the one-bit per file channel
for all MSGstart[i], i ∈ [0, n] do
    if MSGstart[i] exists in the cloud then
        Record Ts[i]
    end if
end for
for all MSGend[j], j ∈ [i + 1, n] do
    if MSGend[j] exists in the cloud then
        Record Te[j]
    end if
end for
for all MSGinfo[k], k ∈ [Ts[i], Te[j]] do
    if MSGinfo[k] had to be uploaded (no duplicate found) then
        bindata[k] := 0
    else
        bindata[k] := 1
    end if
end for
Fig. 7. Comparison of the number of the uploading files in Channel 1, Channel 2 and Channel 3 (x-axis: message size in bytes; y-axis: number of files to send)
IV. PERFORMANCE EVALUATION AND DISCUSSION
In this section, we analyze the results of our experiments.
In Section IV-A, we discuss the implementation details of the
proposed covert channels. In Section IV-B, we evaluate the
efficiency of our channels by comparing their performance, in
terms of the achievable data rate, on two cloud providers.
Finally, in Section IV-C, we discuss the main factors that can
affect our channel performance.
A. System Implementation
We have tested our covert channels by carefully conducting
a set of experiments on two cloud storage services - SugarSync
and BaiduYun. For each of them, we first create two accounts.
We then develop a realistic testbed using lab computers as
victim’s and attacker’s machines, where we use Python as the
programming language to develop the covert channels.
To discover the highest achievable data rate, we test the
channels using files of different sizes. In our study, we note that
very small files may already exist on the cloud server, which
will result in incorrect detection. On the other hand, we do
not want to upload large files, in order to prevent channel exposure and
high bandwidth consumption. Therefore, for our experiments,
we use files ranging in size from 100 bytes to 1 MB.
B. Experimental results
In Figure 7, due to limited space, we only compare
the performance of three different channels in terms of file
redundancy: 1) Channel 1: one bit per file, 2) Channel 2:
one bit per file, not transmitting “0”, and 3) Channel 3: two
bits per file, not transmitting “00”. In Channel 3, we apply
the scheme discussed in Section III-C. The figure shows the
number of files that each channel needs to send for messages
of different sizes. In each evaluation group, we apply the
same message to the three channels. We use different
messages for different evaluation groups. As we can observe,
compared to Channel 1, Channel 2 can reduce the number
of files by about 50%, which demonstrates the stealth of
our proposed channel. Channel 3 achieves even better
performance than Channel 2 because each file can represent
two bits.
In Figures 8 and 9, we demonstrate the transmission
rates of the three channels. In each experiment, we transmit
1000 files of the same size. The results show that while the
maximum achievable data rate for Channel 1 is only 1.69
bps on SugarSync and 3.64 bps on BaiduYun, Channel 3 can
transmit at up to 4.35 bps on SugarSync and 13.15 bps on
BaiduYun. We can also observe that there may exist an optimal
file size that leads to the maximal data rate.
Fig. 8. Comparison of the data rates on SugarSync (x-axis: file size in KB; y-axis: data rate in bps)
Fig. 9. Comparison of the data rates on BaiduYun (x-axis: file size in KB; y-axis: data rate in bps)
C. Discussions
There are several factors that can affect our channel
performance. Here are some of our observations during the
experiments:
• User behavior: The insider only operates when the victim is uploading. The longer the victim uploads, the longer the insider transmits.
• Cloud software: Cloud service software has specific rules about uploading files, and we cannot leverage the whole network bandwidth. For example, the BaiduYun software can upload 3-5 files concurrently, but the SugarSync software uploads files one by one.
• Cloud location: The channel rate varies with the data center location. For example, BaiduYun data centers are closer to our location than those of SugarSync, which explains the difference in the results.
• The size of the files: The generated files cannot be very small because there is a chance that they already exist in the cloud. However, if we choose very large files, the channel performance will decrease.
V. CONCLUSION
In this paper, we have investigated the threats of the
deduplication-based covert channel that leverages the cross-user
data deduplication technique, which is adopted in most cloud
storage services. In particular, we have proposed a novel
deduplication-based covert channel, for which we have designed and implemented a new synchronization mechanism
that solves the message reordering issue and allows sending realistic multi-bit messages. We also developed new methods that
allow the sender to transmit fewer files, so as to improve the
transmission data rate and reduce the risk of channel exposure.
To evaluate the proposed channel, we have conducted extensive
experiments on existing cloud storage services. Our study
demonstrates that the potential threats of the deduplication-based covert channel can be more severe than expected.
VI. ACKNOWLEDGEMENTS
The work is supported in part by a General Research
Fund from Hong Kong Research Grant Council under project
122913 and project 61272462 from NSFC China, and by the
Shanghai Oriental Scholar Program.
REFERENCES
[1] Juniper Research. (2014) Cloud services to be adopted by 3.6bn consumers
globally by 2018. [Online]. Available: www.juniperresearch.com/
press-release/cloud-computing-pr1
[2] W. Leesakul, P. Townend, and J. Xu, “Dynamic data deduplication in
cloud storage,” in Proc. IEEE Service Oriented System Engineering
(SOSE), 2014, pp. 320–325.
[3] J. Paulo and J. Pereira, “A survey and classification of storage deduplication systems,” ACM Computing Surveys (CSUR), vol. 47, no. 1,
p. 11, 2014.
[4] M. Dutch and L. Freeman, “Understanding data de-duplication ratios,”
2009.
[5] O. Heen, C. Neumann, L. Montalvo, and S. Defrance, “Improving
the resistance to side-channel attacks on cloud storage services,” in
Proc. 5th International Conference on New Technologies, Mobility and
Security (NTMS), 2012, pp. 1–5.
[6] S. Lee and D. Choi, “Privacy-preserving cross-user source-based data
deduplication in cloud storage,” in Proc. ICT Convergence (ICTC),
2012, pp. 329–330.
[7] M. Dahshan and S. Elkassass, “Data security in cloud storage services,”
in The Fifth International Conference on Cloud Computing, GRIDs, and
Virtualization, 2014, pp. 1–5.
[8] S. Ju and X. Song, “On the formal characterization of covert channel,”
in Content Computing, ser. Lecture Notes in Computer Science, 2004,
vol. 3309, pp. 155–160.
[9] D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Side channels in cloud
services: Deduplication in cloud storage,” IEEE Security & Privacy, vol. 8,
no. 6, pp. 40–47, 2010.
[10] M. Mulazzani, S. Schrittwieser, M. Leithner, M. Huber, and E. Weippl,
“Dark clouds on the horizon: Using cloud storage as attack vector and
online slack space,” in USENIX Security Symposium, 2011.
[11] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Proofs of
ownership in remote storage systems,” in Proc. 18th ACM Conference
on Computer and Communications Security CCS ’11, 2011, pp. 491–
500.
[12] Q. Zheng and S. Xu, “Secure and efficient proof of storage with deduplication,” in Proc. Second ACM Conference on Data and Application
Security and Privacy CODASPY ’12, 2012, pp. 1–12.
[13] R. Di Pietro and A. Sorniotti, “Boosting efficiency and security in
proof of ownership for deduplication,” in Proc. 7th ACM Symposium
on Information, Computer and Communications Security ASIACCS ’12,
2012, pp. 81–82.
[14] T. Pulls, “(more) side channels in cloud storage,” in Privacy and Identity
Management for Life, 2012, vol. 375, pp. 102–115.
[15] D. Russell, “Data deduplication will be even bigger in 2010,” Gartner,
February, 2010.
[16] P. Neelaveni and M. Vijayalakshmi, “A survey on deduplication in cloud
storage,” Asian Journal of Information Technology, vol. 13, no. 6, pp.
320–330, 2014.