A Novel Deduplication-Based Covert Channel in Cloud Storage Service

Hermine Hovhannisyan∗, Kejie Lu†,‡, Rongwei Yang§,∗, Wen Qi∗, Jianping Wang∗, Mi Wen†

∗ Department of Computer Science, City University of Hong Kong
† School of Computer Engineering, Shanghai University of Electric Power, Shanghai, China
‡ Department of Electrical and Computer Engineering, University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico
§ School of Computer Science and Technology, University of Science and Technology of China (USTC), Hefei, Anhui, China

Abstract: To provide cloud storage services efficiently, most providers implement data deduplication schemes to reduce storage and network bandwidth consumption. Because deduplication is so widely applied, many of its security issues have been investigated, such as data security and user privacy. Nevertheless, we note that the threat of establishing a covert channel over cloud storage has not been fully investigated. In particular, existing studies only demonstrate the potential of a single-bit channel, in which a sender uploads one of two predefined files so that a receiver can infer a “0” or a “1”. In this paper, we design a more powerful deduplication-based covert channel that can be used to transmit a complete message. Specifically, the key features of our design include: (1) a synchronization scheme that can establish a covert channel between a sender and a receiver, and (2) a novel coding scheme that allows each file to represent multiple bits in the message. To evaluate the proposed design, we implement the covert channel and conduct extensive experiments on different cloud storage systems. Our work highlights a more severe security threat in cloud storage services.

Keywords: cloud storage service, deduplication, covert channel

I. INTRODUCTION

In recent years, cloud storage service has been widely adopted as one of the most popular cloud services.
Today, more than a billion users store their data in cloud storage [1]. With the increasing number of users and files, cloud storage providers need not only to expand their storage capacity but also to design efficient storage schemes that avoid storing duplicated files or file chunks in the cloud. To achieve this goal, most cloud storage providers use data deduplication, a method that keeps only a single copy of each unique file or file chunk [2]. With deduplication, every time a user wants to store a file, the file or its chunks are compared to those already stored in the cloud. If there is a match, the file or chunk is not stored again; instead, a link to the existing file or chunk is created.

In practice, different types of deduplication schemes can be applied [3]. First, deduplication can be applied to a single user or to multiple users. Single-user deduplication is performed only on the data of the same user, which means that the same data is stored as separate copies in the cloud for different users. Alternatively, cross-user deduplication stores only one copy of each piece of data, regardless of the user. Second, deduplication checking can be done at the client side or at the server side. Most cloud providers choose to do so at the client side, also known as source-based deduplication, because it has the additional benefit of reducing Internet bandwidth consumption. Depending on the deduplication settings, up to 90% of space and bandwidth can be saved [4].

Along with its benefits, cross-user data deduplication can be exploited if deduplication authorization (or proof-of-ownership) is not properly designed [5]–[7]. For instance, in 2011, a malicious tool called “DropShip” exploited Dropbox’s deduplication mechanism, which allowed access to a file given only the hash value of the file.
On the other hand, an attacker can also exploit the deduplication routine to infer whether a victim has a certain file, which is a type of side channel attack that violates the privacy of a user. To deal with these issues, some providers (e.g., Dropbox) decided to stop cross-user deduplication. Nevertheless, many other big providers still use cross-user source-based data deduplication.

Besides the above issues, another potential threat associated with data deduplication is the covert channel, a hidden communication model that aims to exchange information while bypassing security policies [8]. Cross-user data deduplication opens a back door for information to be leaked from one user to another through covert channels. Recently, several researchers have identified a single-bit covert channel that can be established by making use of cross-user deduplication [9]–[13]. In a single-bit covert channel, there is a sender program residing on the victim's computer and a receiver program on the attacker's computer. The sender holds two different files, predefined to symbolize “0” and “1”, respectively. If the sender wants to send “1” to the receiver, it uploads the “1” file to the cloud. If the sender wants to send “0” to the receiver, it uploads the “0” file. The receiver then retrieves the message by uploading both files to the cloud. If the receiver finds that the upload time of the “1” file is much shorter than the time needed to transmit the “0” file, it can infer that the “1” file has been deduplicated and that the sender must have sent “1”. Similarly, the receiver can infer whether the sender sent “0”.

978-1-4799-5952-5/15/$31.00 ©2015 IEEE

If we directly apply a single-bit covert channel to transmit a longer message with multiple bits, we face two major problems: sending many files and out-of-order delivery. For each bit transmitted through a one-bit covert channel, two files are needed.
This increases the risk of being detected. Also, in order to receive accurate messages, it is very important to upload files in the right order; that is, the sender must upload before the receiver. However, in the existing channels, there is no mechanism that allows the sender and the receiver to coordinate. In this case, the receiver may upload the files before the sender, and the message will be lost.

For the above reasons, the risk of covert channels has been underestimated. In this paper, we demonstrate a new method that transmits multi-bit messages instead of a single bit. First, we design a novel synchronization scheme that establishes a covert channel between a sender and a receiver such that the order of delivery does not invalidate message recovery. Second, we design a novel coding scheme that allows each file to represent multiple bits in the message. Third, we test our channel on two big cloud providers, SugarSync and BaiduYun, and verify the efficiency of the design.

Our contributions are summarized as follows:
• We demonstrate a new multi-bit covert channel in cloud storage services that poses a serious threat to cloud users.
• To achieve error-free decoding in the multi-bit covert channel, we propose a new synchronization technique.
• In the proposed multi-bit covert channel design, each file can represent multiple bits in the message, and we eliminate the need to upload “0”s. This reduces the number of files to be uploaded and makes the covert channel hard to detect.
• We implement the proposed framework and evaluate its efficiency in terms of the number of uploaded files and the achievable data rates on two different public clouds.

The rest of the paper is organized as follows. In Section II, we introduce data deduplication and the single-bit covert channel.
In Section III, we present implementation details of the proposed channel and its transmission model. In Section IV, we present our results and evaluate the channel performance on different storage providers. Finally, in Section V, we conclude our study.

II. BACKGROUND

A. Data deduplication

Data deduplication [14]–[16] is a mechanism for reducing storage cost by eliminating redundant data. Each data chunk or file is identified by a unique hash value, which is compared against the hashes of stored data to detect duplicates on the server. The server stores only the original data instead of multiple copies of the same data. Data deduplication can be classified along two dimensions.

Fig. 1. An example of cross-user source based deduplication

1) Single-user or cross-user: In the former case, deduplication occurs only when the same user uploads redundant data. In the latter case, if a different user uploads a data unit that already exists in the cloud, the data does not occupy new storage space, and the service provider creates a link to the original data for the other user.

2) Source-based or target-based: Source-based deduplication computes each block’s hash value on the client machine and determines whether to upload the block or not. Target-based deduplication occurs at the server side after a user uploads the data.

Clearly, cloud storage services that adopt cross-user and source-based data deduplication can save both storage and bandwidth. For example, in Figure 1, the first user Alice copies files A, B, C, D, E and F, with an overall size of 60MB, to her local cloud folder. All the files are uploaded to the server. Later, another user Bob copies files D, A, C, K, L and X, which are also 60MB all together, to his own cloud folder. From Bob’s side, only the files K, L and X (30MB) are uploaded to the server, because the rest are duplicated files.
Thus, the cloud server saves 30MB of storage and the corresponding bandwidth. In our work, we focus on cross-user source-based deduplication.

B. Deduplication detection

Since the client machine does not need to upload a file when a duplicate is detected, a user can observe that duplicated data is uploaded faster than non-duplicated data. In the literature, there are two main methods to detect duplicated data in the cloud: checking the transmission time and checking the bandwidth consumption of the uploaded file. These measurements can be taken manually or with the help of network monitoring tools.

1) Transmission time detection: This method measures the time each file takes to upload in order to determine whether the file is a duplicate. If the file already exists on the server, the upload takes noticeably less time than that of a new file.

2) Bandwidth consumption detection: This method is similar to transmission time detection. The client machine only needs to upload the hash value to the server when a file is a duplicate, which consumes hardly any bandwidth. Therefore, the attacker can identify duplication by measuring the bandwidth consumption.

Fig. 2. An example of a single-bit covert channel

C. A Single-Bit Covert Channel

Figure 2 illustrates the idea of the single-bit covert channel. An attacker (Alice) installs malicious software on the machine of a victim (Bob). The malicious software, called an insider, can stealthily upload files to the cloud storage service through Bob’s account. A covert channel can then be established for Alice to use the insider to send covert data out of Bob’s machine.

Two different files, X0 and X1, are initially stored both in the malicious software and on Alice’s machine. Suppose they are distinctive enough that no copy of either of them has previously been stored on the cloud storage servers. If the message “0” has to be transferred, the malicious software uploads file X0; otherwise, it uploads file X1. After a “long” delay, Alice uploads files X0 and X1 to the same cloud storage service as Bob, observes which file has previously been uploaded, and thereby learns the message that the malicious software has sent.

However, we cannot directly apply this technique to transmit longer messages. In order to receive multi-bit messages, the receiver has to upload the files after the sender. In this channel, though, there is no synchronization mechanism that lets the sender and the receiver coordinate, which leads to out-of-order delivery. Also, two files are needed to transmit each bit, which increases the chances of channel detection.

III. A NOVEL DEDUPLICATION-BASED COVERT CHANNEL

In this section, we present the main components of our multi-bit covert channel design. When constructing the new channel, we face two challenges: out-of-order arrival and sending many files. We solve the first problem by introducing a new synchronization algorithm that uses different file types to initialize communication. We also present a novel coding scheme, which allows the number of files transmitted to be much smaller than the number of bits in the message, helping the sender to stay unnoticed. Finally, we demonstrate a new method for detecting deduplication, which allows error-free decoding, in contrast to the traditional deduplication detection schemes.

A. Threat model

In our covert channel scenario, we assume that the attacker has a malicious program (insider) running on the victim’s machine. Figure 3 illustrates the steps each side has to take to initialize the covert channel. As soon as the victim starts uploading files to his cloud folder, the insider waits for a few seconds and begins transmitting the covert message. When the victim’s upload is finished, the insider also stops transmitting. In this way, the insider minimizes the risk of channel exposure. Later, the attacker (receiver) uploads the same files that the insider uploaded during the particular time frame, which will be explained shortly, and checks for duplicates. The uploaded message files are generated from timestamps, and both sides have the same file generation program.

Fig. 3. Covert channel initialization diagram

Our message decoding algorithm is designed to be flexible, such that messages may be recovered at arbitrary future intervals by the receiver. For example, a receiver may attempt message decoding on a daily or weekly basis and still correctly recover the message. This is because each execution of the detection algorithm runs from the last known recovery timestamp until the current timestamp. More details about file generation and synchronization are given later in this section.

B. File generation and synchronization

In order to ensure error-free transmission, the receiver has to know the time period in which the message was sent. This is why we design a new synchronization method, in which messages are composed of three types of files. MSGstart indicates the beginning time of the message, MSGend contains the time when message transmission ends, and MSGinfo carries the main message. Here, MSGinfo is a stream of encoded bits and aims to transmit the whole message in one bit stream, unless the transmission is interrupted by the victim’s actions.

The files described above consist of three components: random content, type field and timestamp. Random content is the main part of the file that ensures the file’s uniqueness.
Type field identifies the type of the file: start, info or end. Timestamp is set according to the file generation time and is used to synchronize the files between the insider and the attacker. It can also be regarded as the serial number of the file. It is important to mention that the timestamp is not based on real time; it is only a code to assist synchronization.

Fig. 4. Transmission model design

Figure 4 illustrates the file synchronization between an attacker and an insider. The insider and the attacker have the same file generation program that creates files based on timestamp. In this way, the attacker knows the exact set of files to upload once it learns the time period in which they were uploaded. In this example, MSGstart and MSGend have timestamps Ts and Te. Ts is the initial parameter for MSGinfo generation, and Te − Ts specifies the set of MSGinfo files to be checked by the receiver. MSGinfo consists of many files with different timestamps, sorted by timestamp value, where the i-th file is denoted MSGinfo[i]. In the following subsections, we further explain how a sequence of bits can be carried by a sequence of MSGinfo[i].

C. Message encoding and transmission

Most cloud storage systems have a local folder into which a user can copy files, and these files are automatically uploaded to the server. The insider can use this folder to upload covert messages each time the victim starts uploading. Specifically, to send a message, the insider can generate and save a set of files in the folder. To avoid being detected by the victim, the insider can delete all generated files after the message transmission. In our experiments, we have discovered that, even if the insider deletes the files from the local folder, the files still remain on the cloud, which is a common practice in many cloud systems. This fact makes it possible for the receiver to “read” the message at a later time. Our experiments also show that cloud storage providers do not delete files from their servers for a long time (around one month). If the transmission is interrupted by the victim, the insider uploads the MSGend file to indicate the end of the transmission.

To encode a message, a straightforward way is to generate one file to represent each bit. More specifically, a particular file is used to represent “0”, and a different file is used to represent “1”. In this case, if a message has N bits, the insider needs to generate N files.

Since each MSGinfo[i] has its own timestamp, we can improve the efficiency by sending a file only when the corresponding bit is “1”. Figure 5 demonstrates a simple example of a 6-bit message transmission using this method. In this example, to send the message “100101”, only files MSGinfo[1], MSGinfo[4] and MSGinfo[6] are uploaded.

Fig. 5. Transmission of a 6 bit message

To further improve the transmission rate and to reduce the risk of detection, we also propose a new multiple-bits-per-file method, in which a single file can represent multiple bits. For example, to transmit two bits per file, instead of uploading up to two files to convey the two bits, we can define three files, each of which represents a two-bit pattern: “01”, “10” and “11”. Here, the sender transmits only one of the three files to send a pattern and remains silent to indicate a “00” pattern. The receiver uploads the three different files; if no duplication is detected, the message is “00”, otherwise the message is one of the three codewords, depending on which file is detected as a duplicate. Clearly, we can further generalize this technique to transmit even more bits with one file.

D. Message decoding

To decode the message, the attacker periodically checks whether any files have been sent by uploading all the candidate files for the time periods since the last known detection timestamp. To learn the beginning of the message, it first uploads the set of possible MSGstart files. When a duplication occurs, the attacker records the timestamp Ts of the detected MSGstart file and then uploads the set of possible MSGend files to make sure the insider has uploaded the MSGinfo files completely. Lastly, the attacker uploads the set of all possible MSGinfo files that have timestamp Ts or later. Within a given time period, the attacker may detect several MSGstart and MSGend files, which indicates that the insider has transmitted several times. The detailed decoding process for a one-bit-per-file channel is given in Algorithm 1.

Fig. 6. Decoding of a 6 bit message

Figure 6 shows the decoding process for the one-bit-per-file example of Section III-C. After uploading the MSGstart and MSGend files, the attacker uploads the MSGinfo files generated in that time period (MSGinfo[1] to MSGinfo[6] in our example). Because duplicated files are not actually uploaded, the attacker detects which files already exist in the cloud. In this example, it learns that MSGinfo[1], MSGinfo[4] and MSGinfo[6] exist in the cloud, which means they are “1”, and that the remaining MSGinfo[2], MSGinfo[3] and MSGinfo[5] are “0”.

E. Elimination of Redundant Files

In our design, we drastically reduce the number of files to be sent compared to the previous one-bit channel. To achieve that, we use the following techniques: 1) the sender uploads only part of the message (selective message upload), and 2) we use a coding scheme to transmit multiple bits in one file.

Suppose that an insider transmits an n-bit message and the number of “1”s in the message is m, where m ≤ n. In the existing one-bit covert channel model, the sender uploads n files and the receiver uploads 2n files to recover a message.
In technique 1, where we do not transmit “0”s, the insider uploads only m files and the attacker uploads around n + k files, where k is the number of MSGstart and MSGend files. In technique 2, we use coding to transmit multiple bits in one file, which further reduces the number of files sent compared to technique 1. File reduction at the sender’s side is very important because if the sender consumes too much bandwidth, the channel can be exposed easily.

F. Deduplication Detection Techniques

As we mentioned in Section II, the receiver can verify whether a file is a duplicate by measuring the upload time or the bandwidth consumption. Our studies show that most cloud storage client software allows users to examine the file transfer status, which can be used to determine the time cost. Another option is to use network monitoring tools like Wireshark to analyze the traffic between the client and the cloud storage server.

To estimate the upload time more accurately, the receiver can first limit the upload traffic. Then, it uploads files and observes the time cost of each file. For example, the receiver can set the file size to 1 KB and the upload rate to 1 Kbps. If a file is duplicated, it needs only around 4 seconds to upload; otherwise, it takes more than 10 seconds. We should note that the larger the file, the bigger the difference in upload time.

However, during our experiments we found that it is also possible to detect deduplication by examining the log files of the cloud storage software. For example, when deduplication occurs, we can find the keywords “hash match” in the SugarSync synchronization log.
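The synchronization, encoding and decoding steps of Sections III-B to III-E can be illustrated with a small simulation. The following Python code is a minimal sketch under stated assumptions, not the implementation used in our experiments: the names `DedupStore`, `make_file`, `send` and `receive` are hypothetical, the cloud is replaced by an in-memory set of content hashes, and duplicate detection (by upload time, bandwidth or log inspection) is abstracted into the boolean returned by `upload`. It also assumes that the i-th information file carries timestamp Ts + i and that MSGend carries Ts plus the number of bit groups.

```python
import hashlib
from typing import Iterable, Optional


def make_file(kind: str, timestamp: int, pattern: str = "") -> bytes:
    """Derive file content from type, timestamp and bit pattern (hypothetical format).

    Both sides run the same function, so the receiver can regenerate every
    candidate file without communicating with the sender.
    """
    seed = f"{kind}|{timestamp}|{pattern}".encode()
    return hashlib.sha256(seed).digest()


class DedupStore:
    """Toy stand-in for a cross-user deduplicating cloud store."""

    def __init__(self) -> None:
        self._hashes: set = set()

    def upload(self, content: bytes) -> bool:
        """Store content; return True if it was already present (a duplicate)."""
        digest = hashlib.sha256(content).hexdigest()
        duplicate = digest in self._hashes
        self._hashes.add(digest)
        return duplicate


def send(store: DedupStore, bits: str, ts: int, k: int = 1) -> int:
    """Insider side: upload MSGstart, one info file per non-zero k-bit group, MSGend.

    Returns the number of info files uploaded.
    """
    if len(bits) % k != 0:
        raise ValueError("message length must be a multiple of k")
    groups = [bits[i:i + k] for i in range(0, len(bits), k)]
    store.upload(make_file("start", ts))
    sent = 0
    for i, group in enumerate(groups, start=1):
        if "1" in group:  # an all-zero group is signalled by staying silent
            store.upload(make_file("info", ts + i, group))
            sent += 1
    store.upload(make_file("end", ts + len(groups)))
    return sent


def receive(store: DedupStore, candidates: Iterable[int],
            max_groups: int, k: int = 1) -> Optional[str]:
    """Attacker side: probe candidate timestamps and decode the first message found."""
    patterns = [format(v, f"0{k}b") for v in range(1, 2 ** k)]
    for ts in candidates:
        if not store.upload(make_file("start", ts)):
            continue  # no MSGstart with this timestamp
        for n in range(1, max_groups + 1):
            if store.upload(make_file("end", ts + n)):
                break
        else:
            continue  # no MSGend found, transmission not complete
        decoded = []
        for i in range(1, n + 1):
            group = "0" * k  # default when no probe is a duplicate
            for pattern in patterns:
                if store.upload(make_file("info", ts + i, pattern)):
                    group = pattern
            decoded.append(group)
        return "".join(decoded)
    return None


if __name__ == "__main__":
    cloud = DedupStore()
    print(send(cloud, "100101", ts=1000))        # info files uploaded by the insider
    print(receive(cloud, range(990, 1010), 16))  # message recovered by the attacker
```

For the 6-bit message “100101” of Figure 5, the sketch uploads three information files rather than six. Note that every probe by the receiver is itself an upload, so in this toy model a probed file stays in the store afterwards, just as an uploaded file would remain in a real cloud folder.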
Algorithm 1 Message decoding for the one-bit per file channel
  for all MSGstart[i], i = 0, ..., n do
    if MSGstart[i] exists in the cloud then
      Record Ts[i]
    end if
  end for
  for all MSGend[j], j = i + 1, ..., n do
    if MSGend[j] exists in the cloud then
      Record Te[j]
    end if
  end for
  for all MSGinfo[k] in the period from Ts[i] to Te[j] do
    if MSGinfo[k] is uploaded (no duplicate is found) then
      bindata[k] := 0
    else
      bindata[k] := 1
    end if
  end for

IV. PERFORMANCE EVALUATION AND DISCUSSION

In this section, we analyze the results of our experiments. In Section IV-A, we discuss the implementation details of the proposed covert channels. In Section IV-B, we evaluate the efficiency of our channels by comparing their performance, in terms of the achievable data rate, on two cloud providers. Finally, in Section IV-C, we discuss the main factors that can affect our channel performance.

A. System Implementation

We have tested our covert channels by conducting a set of experiments on two cloud storage services, SugarSync and BaiduYun. For each of them, we first create two accounts. We then build a realistic testbed using lab computers as the victim’s and the attacker’s machines, and we use Python as the programming language to develop the covert channels.

To discover the highest achievable data rate, we test the channels using files of different sizes. In our study, we note that very small files are likely to exist on the cloud server already, which results in incorrect detection. On the other hand, we do not want to upload large files, in order to prevent channel exposure and high bandwidth consumption. Therefore, for our experiments, we use file sizes ranging from 100 bytes to 1MB.

B. Experimental results

Fig. 7. Comparison of the number of the uploading files in Channel 1, Channel 2 and Channel 3 (x-axis: message size in bytes; y-axis: number of files to send)

In Figure 7, due to limited space, we only compare the performance of three different channels in terms of file redundancy: 1) Channel 1: one bit per file, 2) Channel 2: one bit per file, not transmitting “0”, and 3) Channel 3: two bits per file, not transmitting “00”. In Channel 3, we apply the scheme discussed in Section III-C. The figure shows the number of files that each channel needs to send for messages of different sizes. In each evaluation group, we apply the same message to the three channels. We use different messages for different evaluation groups. As we can observe, compared to Channel 1, Channel 2 reduces the number of files by about 50%, which demonstrates the stealth of our proposed channel. Channel 3 achieves even better performance than Channel 2 because each file can represent two bits.

Fig. 8. Comparison of the data rates on SugarSync (x-axis: file size in KB; y-axis: data rate in bps)

Fig. 9. Comparison of the data rates on BaiduYun (x-axis: file size in KB; y-axis: data rate in bps)

In Figures 8 and 9, we show the transmission rates of the three channels. In each experiment, we transmit 1000 files of the same size. The results show that while the maximum achievable data rate for Channel 1 is only 1.69 bps on SugarSync and 3.64 bps on BaiduYun, Channel 3 can transmit at up to 4.35 bps on SugarSync and 13.15 bps on BaiduYun. We can also observe that there may exist an optimal file size that leads to the maximal data rate.

C. Discussions

There are several factors that can affect our channel performance. Here are some of our observations from the experiments:
• User behavior: The insider operates only while the victim is uploading. The longer the victim uploads, the longer the insider can transmit.
• Cloud software: Cloud service software has specific rules about uploading files, so we cannot leverage the whole network bandwidth. For example, the BaiduYun software can upload 3-5 files concurrently, whereas the SugarSync software uploads files one by one.
• Cloud location: The channel rate varies with the data center location. For example, BaiduYun data centers are closer to our location than those of SugarSync, which explains the difference in the results.
• The size of the files: The generated files cannot be very small, because there is a chance that they already exist in the cloud. However, if we choose very large files, the channel performance decreases.

V. CONCLUSION

In this paper, we have investigated the threat of a deduplication-based covert channel that leverages the cross-user data deduplication technique adopted in most cloud storage services. In particular, we have proposed a novel deduplication-based covert channel, for which we have designed and implemented a new synchronization mechanism that solves the message reordering issue and allows sending realistic multi-bit messages. We have also developed new methods that allow the sender to transmit fewer files, so as to improve the transmission data rate and reduce the risk of channel exposure. To evaluate the proposed channel, we have conducted extensive experiments on existing cloud storage services. Our study demonstrates that the potential threats of the deduplication-based covert channel can be more severe than expected.

VI. ACKNOWLEDGEMENTS

The work is supported in part by a General Research Fund from Hong Kong Research Grant Council under project 122913 and project 61272462 from NSFC China, and by the Shanghai Oriental Scholar Program.

REFERENCES

[1] Juniper Research. (2014) Cloud services to be adopted by 3.6bn consumers globally by 2018. [Online]. Available: www.juniperresearch.com/press-release/cloud-computing-pr1
[2] W. Leesakul, P. Townend, and J. Xu, “Dynamic data deduplication in cloud storage,” in Proc.
IEEE Service Oriented System Engineering (SOSE), 2014, pp. 320–325.
[3] J. Paulo and J. Pereira, “A survey and classification of storage deduplication systems,” ACM Computing Surveys (CSUR), vol. 47, no. 1, p. 11, 2014.
[4] M. Dutch and L. Freeman, “Understanding data de-duplication ratios,” 2009.
[5] O. Heen, C. Neumann, L. Montalvo, and S. Defrance, “Improving the resistance to side-channel attacks on cloud storage services,” in Proc. 5th International Conference on New Technologies, Mobility and Security (NTMS), 2012, pp. 1–5.
[6] S. Lee and D. Choi, “Privacy-preserving cross-user source-based data deduplication in cloud storage,” in Proc. ICT Convergence (ICTC), 2012, pp. 329–330.
[7] M. Dahshan and S. Elkassass, “Data security in cloud storage services,” in The Fifth International Conference on Cloud Computing, GRIDs, and Virtualization, 2014, pp. 1–5.
[8] S. Ju and X. Song, “On the formal characterization of covert channel,” in Content Computing, ser. Lecture Notes in Computer Science, 2004, vol. 3309, pp. 155–160.
[9] D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Side channels in cloud services: Deduplication in cloud storage,” IEEE Security Privacy, vol. 8, no. 6, pp. 40–47, 2010.
[10] M. Mulazzani, S. Schrittwieser, M. Leithner, M. Huber, and E. Weippl, “Dark clouds on the horizon: Using cloud storage as attack vector and online slack space,” in USENIX Security Symposium, 2011.
[11] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Proofs of ownership in remote storage systems,” in Proc. 18th ACM Conference on Computer and Communications Security CCS ’11, 2011, pp. 491–500.
[12] Q. Zheng and S. Xu, “Secure and efficient proof of storage with deduplication,” in Proc. Second ACM Conference on Data and Application Security and Privacy CODASPY ’12, 2012, pp. 1–12.
[13] R. Di Pietro and A. Sorniotti, “Boosting efficiency and security in proof of ownership for deduplication,” in Proc. 7th ACM Symposium on Information, Computer and Communications Security ASIACCS ’12, 2012, pp. 81–82.
[14] T. Pulls, “(more) side channels in cloud storage,” in Privacy and Identity Management for Life, 2012, vol. 375, pp. 102–115.
[15] D. Russell, “Data deduplication will be even bigger in 2010,” Gartner, February, 2010.
[16] P. Neelaveni and M. Vijayalakshmi, “A survey on deduplication in cloud storage,” Asian Journal of Information Technology, vol. 13, no. 6, pp. 320–330, 2014.