2014 IEEE 22nd International Conference on Network Protocols Cross-VM Covert Channel Risk Assessment for Cloud Computing: An Automated Capacity Profiler Rui Zhang, Wen Qi, Jianping Wang Department of Computer Science, City University of Hong Kong, Hong Kong SAR Email: zhangrui.ray@gmail.com, qi.wen@my.cityu.edu.hk, jianwang@cityu.edu.hk take some counter-measures against potential cross-VM covert channels on their platforms. In the literature, the risk of a cross-VM covert channel is usually measured by the transmission rate with certain error rate1 . For example, Xu et al [4] reported one of their experiment results with the L2 cache covert channel as follows. When the transmission rates are 215.11 bps and 85.86 bps, the error rates are 5% and 1%, respectively. Such quantitative measures only provide the average risk of a covert channel. To a cloud service provider, the estimation on the maximum transmission capacity of a covert channel is often more important in assessing the risk of information leakage. Classic Shannon equation [6] has been widely used to estimate the transmission capacity of communication channels, which, however, is hard to be applied to estimate the transmission capacity of cross-VM covert channels. The main reason is that there is no precise timer in most cross-VM covert channels, thus, it is hard to calculate signal-to-noise ratio (SNR) required in classic Shannon equation. Shannon entropy formulation [7] was proposed to estimate transmission capacity of covert channels. It does not require calculation of SNR and is suitable for estimating the transmission capacity of cross-VM covert channels. With Shannon entropy formulation, however, to assess the transmission capacity of cross-VM covert channels, there are still several challenges. In Section II, we will elaborate more on Shannon entropy formulation and the challenges of applying it to assess the capacity of cross-VM covert channels. Here, we conceptually present two challenges. Firstly, in order to apply the Shannon entropy formulation to estimate the capacity of cross-VM covert channels, we need to obtain the corresponding conditional probabilities which reflect their symbol classification accuracy. The conditional probabilities are statistically estimated with general assumptions made for all cloud platforms, which may lead to loose estimation of cross-VM covert channel capacity on a specific cloud platform. Thus, in order to assess the capacity of cross-VM covert channels established on any particular cloud platform, we need a method to automate the process of obtaining the conditional probabilities specific to that cloud platform. Secondly, in order to estimate the capacity of cross-VM covert channels, we have to construct a covert channel and op- Abstract—Cross-VM covert channels leverage physical resources shared between co-resident virtual machines, like CPU cache, memory bus, and disk bus, to leak information. The capacity of cross-VM covert channels varies on different cloud platforms. Thus, it is hard for cloud service providers to estimate the risk of information leakage caused by cross-VM covert channels on their own platforms. In this paper, we develop an Auto Profiling Framework of Covert Channel Capacity (AP F C 3 ) to automatically profile the maximum capacities of various crossVM covert channels on different cloud platforms. The framework consists of automated parameter tuning for various cross-VM covert channels to achieve high data rate and automated capacity estimation of those cross-VM covert channels. We evaluate the proposed framework by constructing fine-tuned cross-VM covert channels on different virtualization platforms and comparing the optimized achievable data rate with the estimated maximum capacity computed using the proposed framework. The experiments show that in most cases, the capacity estimated using AP F C 3 is very close to the achieved data rate of constructed covert channels with fine-tuned parameters. Index Terms—Cross-VM covert channel; Capacity estimation; Shannon entropy I. I NTRODUCTION Virtualization is one of the key technologies used in cloud computing. Most virtualization technologies, such as KVM, XEN, VMWare, and Hyper-V, can provide logical isolation between co-resident virtual machines (VM) on the same physical machine. However, virtualization only provides isolation on software level. Co-resident VMs still share common pools of physical resources, e.g., disk bus, memory bus, and CPU cache. These shared hardware components open doors for cross-VM information leakage. Among various cross-VM information leakage methods like side channels [1] [2] and covert channels [3] [4] [5], cross-VM covert channel is extremely hard to detect due to unnoticed information transmission. In recent years, various cross-VM covert channels have been reported [2]–[5] in public cloud platforms. Despite such efforts, it is hard for a cloud service provider to estimate the threat of information leakage on its own cloud platform as the capacity of a covert channel is jointly determined by hardware platforms, hypervisors, and workloads. Therefore, it is important to provide a quantitative measure for evaluating the risk of cross-VM covert channels on specific cloud platforms. With such a quantitative measure, cloud service providers can decide whether it is worth to 978-1-4799-6204-4/14 $31.00 © 2014 IEEE DOI 10.1109/ICNP.2014.24 1 The error rate is usually measured by repeatedly sending data strings through the covert channel and counting the number of incorrectly received ones. 25 A. Cross-VM covert channels timize its transmission quality through parameter tuning which could be complex and cumbersome. Our preliminary studies have shown that the capacity of cross-VM covert channels varies dramatically with their parameter settings and hardware environments, which indicates that it may not be practically feasible to construct a covert channel with the maximum data rate through brute-force methods. To solve these issues, we propose a generic framework, named AP F C 3 , which can automatically profile the maximum capacity of cross-VM covert channels. The main contributions of this paper are summarized as follows. • We propose a compact framework to estimate the transmission capacity of cross-VM covert channels. The proposed framework consists of two essential components, namely, parameter tuning for optimizing the covert channel and machine learning for evaluating its maximum capacity. The entire process can be orchestrated using a centralized controller. • We propose an automatable procedure to obtain the prerequisite for applying the Shannon Entropy formulation [7] on estimating the capacity of cross-VM covert channels established on any given cloud platform. • We statistically model the noise of a cross-VM covert channel under a specific cloud platform to eliminate the covert channel implementations which perform poorly and, hence, narrow down the parameter space. • Using the fined-tuned covert channel implementations, we collect a number of sample signals with their corresponding ground truth labels. Lightweight machine learning tools are utilized to cross-validate the samples and, hence, estimate the capacity of the covert channel. • We evaluate the proposed framework on three cloud platforms and four types of cross-VM covert channels. The result shows that AP F C 3 produces capacity estimation close to the data rates achieved by constructed covert channels using fine-tuned parameters. The rest of the paper is organized as follows. In Section II, we introduce the background and discuss our preliminary efforts on analyzing cross-VM covert channels. Then, we give an overview of AP F C 3 in Section III. In Section IV, we elaborate the technical details of the proposed framework. In Section V, we conduct extensive experiments to evaluate the proposed framework. Existing works on cross-VM covert channels and capacity assessment methods are reviewed in Section VI. Finally, we conclude the paper in Section VII. Cross-VM covert channels can be established between two VMs that co-reside on the same physical machine. If both VMs share a hardware component of the machine, the usage by one VM can unexpectedly affect the usage of the other VM. Existing works [2]–[4] show that covert channels can be established by intentional contention and contention detection on the shared hardware components between two co-resident VMs. Covert channels can be categorized in timing-based covert channels and storage-based covert channels. In this paper, we focus on the timing-based covert channels since most cross-VM covert channels discovered in the literature are timing-based channels [2]–[4]. The procedure of constructing a timing channel can be generalized as follows. • • The sender encodes data by varying the time required for performing an operation whose execution time is sensitive to the contention from other co-resident VMs. The receiver monitors the system status by frequently performing the operation and measuring the execution time. A possible implementation of the timing channel can be as follows. The receiver divides its time into equal sized sampling periods. For each sampling period, the receiver repeatedly executes the contention sensitive operation and counts the number of times it is executed. The sender divides its time into much longer equal sized intervals. In each interval, the sender attempts to transmit one data bit. To transmit a bit “1”, the sender poses contention to the operation by repeatedly executing it. To transmit a bit “0”, the sender stays idle. There are mainly four types of cross-VM covert channels discovered in the literature. We now briefly introduce these four types of cross-VM covert channels which will be evaluated in our framework. • • • • II. BACKGROUND AND P RELIMINARIES Before introducing the framework for estimating cross-VM covert channel capacity, in this section, we first introduce the basic principles in constructing various types of covert channels. Then, we present our preliminary results to show that the covert channel capacity is determined by hardware platforms and parameter settings. We introduce the Shannon entropy equation and discuss the challenges of applying it to estimate the capacity of cross-VM covert channels. L2 cache covert channel, which uses the access to CPU L2 cache as the contention sensitive operation for establishing the covert channel. CPU load covert channel, which can be established between VMs sharing CPU cores. CPU hungry operations are used as the contention sensitive operation for establishing the covert channel. Memory bus covert channel, which utilizes memory bus lock as the contention sensitive operation. Disk bus covert channel, which utilizes reading and writing to the disk as the contention sensitive operation. B. Same covert channel implementation on different cloud platforms To demonstrate that the capacity of a covert channel varies on different hardware platforms, we conduct the following experiment. We implement the memory bus covert channel introduced by Wu et al [3] in three different test environments, shown as follows: 26 (a) Xen (b) VMWare Fig. 1. (c) KVM Square wave covert channel signals captured on different virtualization platforms 1) Xen: Intel Xeon CPU E5-2670 2.60GHz with Hypervisor Xen x64, 32GB memory 2) VMWare: Intel Core 2 Duo CPU E8500 3.16GHz with Hypervisor VMWare ESXi 5.5, 6GB memory 3) KVM: Intel Xeon CPU X5650 2.67GHz with Hypervisor KVM x64, 48GB memory In each platform there are two VM instances and all of the VMs run Linux Ubuntu 12.04 LTS with 1 GB memory and 1 vcpu. The same settings will be used in the remaining experiments of this paper. We let one VM alternate between sending bit “1” and “0” through the covert channel. Figure 1 shows sample periodic 01 square wave signals captured at a co-resident VM on these three test environments. In all three cases, we use exactly the same implementation and parameter settings. We can observe that the signal sample recorded on KVM exhibits a significant level of noise while the signal samples show clear square waves on the other two platforms. This indicates that even if the implementations of the covert channels are identical, the resulting capacities on different cloud environments may be different. Therefore, a generic profiling tool is required to assess the capacity of covert channels on different cloud environments. Fig. 2. Covert channel error rate against different classification thresholds Here, C represents the channel capacity, B stands for the bandwidth and SN R is the signal-to-noise ratio of the communication channel. To avoid the calculation of SN R, we select an alternating approach which is developed from the Shannon Entropy equation [8] [9]. We now briefly introduce Shannon Entropy equation. Suppose that data is encoded into a sequence of symbols for transmission. There are N distinct symbols in total (we refer to the set of distinct symbols as the alphabet). Let i be one of the N distinct symbols received by the receiver when the sender was casting out symbol j. Therefore, when i = j, the receiver correctly receives a symbol, otherwise, it gets an error symbol. Let B be the average number of symbols which can be cast out from the sender in one second. In addition, there are two statistical features for defining a communication channel, qj and R(i|j). Here, qj , j = 1 . . . N , is the probability of occurrence for symbol j in the sequence of symbols cast out by the sender. R(i|j) denotes the conditional probability that the receiver received symbol i given the fact that the sender transmitted symbol j. Let C be the covert channel capacity. According to [7], C can be calculated by the following equation: C. Different covert channel implementations on the same test environment In this experiment, we show that the covert channel capacity varies with different parameter settings on the same test environment. To interpret the square wave signals, the receiver can use a simple threshold approach, i.e., signal value greater than the threshold t0 is interpreted as a bit “1”, otherwise, “0”. We show that different t0 leads to different covert channel capacity. We implement the L2 cache covert channel on the Xen platform. We conducted 12 groups of estimation with the same settings except the threshold t0 . As anticipated, we find that the error rate varies a lot with different classification threshold. As shown in Figure 2, the error rates range from 1.60% to 50.26%. Only when the threshold is equal to 18.5 (which indicates the number of executions of the contention sensitive operation in a sampling period), the classification performs the best and reaches an error rate 1.60%. This indicates that even for the same test environment, the resulting capacities with different parameters may be different. C=B max qj ,j=1...N qj R(i|j) log R(i|j) k,k=1...N qk R(i|k) (2) The decisive variables of (2) are R(i|j), as the value of qj which maximizes C can be determined base on R(i|j) using the iterative algorithm proposed by Arimoto [10]. We now briefly discuss the tasks to be accomplished in order to apply (2) to calculate the capacity of cross-VM covert channels. Firstly, most timing channels communicate using only two distinct symbols which are commonly denoted as “0” and “1”. Due to the lack of a precise timer, symbol losses and insertions often occur in cross-VM covert channels. The symbol losses and insertions cannot be accounted by (2), as it only takes symbol misclassification into account. To resolve this problem, D. Maximum capacity estimation using Shannon Entropy The classic Shannon equation (1) is often used to estimate the capacity of communication channels. C = B log2 (1 + SN R) (1) 27 covert channels, we adopt a lightweight approach which will be discussed in section IV. Based on the noise model, we can generate sample signals under different parameter settings of the cross-VM covert channel. Here, we introduce three configurable parameters for the timing based cross-VM covert channels. • Threshold t0 is used for distinguishing between the high and low signal values. • The interval for sending one data bit from the sender is denoted as ds . • The interval of one sampling period at the receiver is denoted as dr . Given a specific parameter setting for a covert channel, we can then generate mock samples by adding noise based on the model to the ground truth signals. Subsequently, we can statistically analyze the sample signals with respect to their ground truth and assess the covert channel setting. We iterate this parameter tuning procedure for different covert channel implementations. We collect a number of parameter settings which lead to the highest covert channel capacity. As generating these mock signal samples takes much less time in comparison to collecting real samples, we can perform a large number of trials and gain an overview of the parameter space. For discrete parameters, such as the encoding method used by the covert channel, we are able to enumerate several possibilities. The detailed covert channel implementation will be introduced in Section IV. Fig. 3. Flowchart demonstrating the procedure of AP F C 3 a solution is to group multiple consecutive binary symbols as one symbol, which makes occasional symbol losses and insertions insignificant. Secondly, we need to design an automated procedure to calculate R(i|j) cross different cloud platforms. Normally, R(i|j) is obtained by statically analyzing the covert channel implementation based on a set of assumptions. Making precise assumptions for different cloud platforms, however, is practically infeasible. In order to automate the capacity assessment process for any given cross-VM covert channel in an arbitrary cloud platform, we need to collect sample signals for each symbol in the alphabet and classify them with a best-practice classifier. Then, R(i|j) is computed from the classification result. To fulfill this requirement, we proposed a cross-VM covert channel capacity estimation framework, AP F C 3 , which will be introduced in the following section. III. OVERVIEW OF B. Generating real samples from ground truth After obtaining the reasonable covert channel implementation, we establish a covert channel on two co-resident VMs under the test environment with the corresponding parameters. According to the formulation shown in (2) and without changing the covert channel implementation, we treat the signals exchanged between the sender and the receiver through the cross-VM covert channel as symbols. For each symbol in the alphabet, we need to collect a number of signal samples from the receiver side in order to apply statistical analysis. The main challenge of this step is that the boundary between signal samples cannot be reliably received by the receiver. On the other hand, we do not intend to force such synchronization as the ability of separating consecutive samples also affects the capacity of the covert channel. The separation of signal samples is achieved by a preprocessing process which will be introduced in Section IV. After performing sample collection, we should have a set of signal samples with their corresponding ground truth labels. We can then apply statistical analysis to these samples. AP F C 3 In this section, the framework for our AP F C 3 is briefly introduced. The procedures of our proposed cross-VM covert channel capacity estimation framework consists of three main steps, namely, parameter tuning, sample generation, and sample analysis, as demonstrated in Fig. 3. We now briefly introduce each main step. The technical details of each main step will be introduced in Section IV. C. Sample analysis According to (2), we next compute the conditional probability, R(i|j), for estimating the covert channel capacity. We first apply a customized hierarchical clustering algorithm to pre-process the samples. After that, we use Neural Network classifier to cross-validate the real samples and compute R(i|j) for the classification result. A. Parameter tuning for covert channel implementation In order to narrow down the parameter space, we start with modeling the noise of cross-VM covert channels established in any particular cloud platform. As getting a precise model of the noise is infeasible due to the exotic behavior of cross-VM 28 !" Figure 4 depicts the signal-to-noise ratios (SNR) of the gettimeof day function against a range of sleep time intervals on different test environments, as introduce in section II. We have two important observations. Firstly, the timers on different platforms show significantly different levels of precision. We can see that the Xen platform has the highest timer precision. Secondly, all of the SN R curves exhibit some degree of fluctuation especially for the KVM environment. This indicates that the timer function on all platforms produces some inconsistent readings. Here, we model the effect of system background noise on cross-VM covert channel signals to perform an initial parameter space reduction. The noise causes two types of deformation on cross-VM covert channel signals: (1) the variation on the time interval for transmitting one bit and (2) the variation of signal values for each sampling point. According to the two types of deformation that noise may cause on cross-VM covert channel signals, we propose the following noise model. Suppose that a ground truth signal has many data bits where each data bit consists of a sequence of sampling points. We denote the number of sampling points for the ith data bit of a ground truth signal as ni and the signal value of the jth sampling point for the ith data bit as vij . For the corresponding mock signal, we denote the number of sampling points for the ith data bit as ni and denote the signal value of the jth sampling point for the ith data bit as uji . The relationship between the mock signal and the ground truth signal can be represented using the following equations. For data bit i, # $% %& Fig. 4. SNR of gettimeofday function in different platforms Lastly, we apply the iterative method proposed by Arimoto [10] to compute the maximum covert channel capacity. This procedure is repeated for each covert channel implementation collected from the parameter tuning step. We also report the confidence interval for the capacity of the covert channel settings. IV. T ECHNICAL D ETAILS In this section, we discuss the technical details of the aforementioned three main steps in our AP F C 3 . ni = ni + δi A. Parameter Tuning As introduced in the previous section, we speed up the parameter tuning procedure by modeling the noise of crossVM covert channels. Here, we separate the noise into two parts, namely, the system background noise and the front-end noise caused by user applications. The front-end noise is unstable as the users may run different applications at different time instances. The background noise is stable and can be modeled statistically. These two types of noise combined together form the noise of cross-VM covert channels. By modeling the system background noise, we can perform an initial parameter space reduction for the covert channel implementation. In this section, we first conduct an experiment to show the existence of system background noise. Then, we demonstrate our noise model using normal distributed random variables. As introduced in section II, timing channels use system timer functions to receive the covert data bits, the precision and consistency of the timer readings is key to their transmission accuracy. We show the existence of system background noise to cross-VM covert channels by demonstrating the inconsistent readings of timer functions on measuring the execution time of system operations. Here, we utilize the Linux C function gettimeof day to measure the execution time of the sleep function with an input interval ranged from 0 to 10000 microseconds. We minimize the front-end noise by silencing all user induced workloads on the machine. (3) Here, δi is a random variable which models the variation in the number of sampling points for transmitting one data bit due to noise. For the jth sampling point of the ith data bit, uji = vij + λij (4) Here, λij is a random variable which models the variation of the signal value due to noise. We model δi and λi as independently, identically and normally distributed random variables, i.e., δi ∼ N (μ1 , θ1 ), λi ∼ N (μ2 , θ2 ). Here μ1 , θ1 , μ2 , and θ2 are sampled from the test environment. To verify this statistical model, we conducted the following experiment on our test environment. The quantile-quantile plot is often used to compare two distributions. If the pairwise quantiles of both distributions are all close to the x = y diagonal on the plot, the two distribution can be deemed similar, otherwise, different. We sample a number of signal values on our test environment and plot them against the corresponding normal distribution, the result is shown in Figure 5. We can observe that the samples follows closely to their corresponding distribution. We have similar observation for the number of sampling points collected on one data bit. Using this noise model, we are able to generate mock signal samples under any covert channel implementation and obtain a quick feedback on its transmission quality. 29 Fig. 6. Signal delimiter diagram can obtain a range of capacity estimations and, hence, produce a confidence interval on the maximum capacity estimation. ! ( C. Sample analysis ' ' After collecting signal samples with their corresponding ground truth from the fine-tuned covert channel implementation, we conduct statistical techniques to analyze the samples and estimate the maximum capacity of the target covert channel. The sample analysis will be accomplished in three steps. Each sample collected consists of a sequence of signal strength readings. The dimension of a sample refers to the number of readings it consists of. We first pre-process the samples collected to make sure they have the same dimension. Then, we apply a best-practice machine learning algorithm on the preprocessed samples to produce the classification result. Lastly, capacity estimation is computed using an iterative algorithm proposed by Arimoto [10]. 1) Data pre-processing: During the transmission, bit loss/insertion and bit-flip can directly affect the transmission accuracy. As introduced previously, we define the duration for transmitting one data bit as ds and the duration for one sampling period as dr , with ds > dr . The receiver gets multiple readings for a single bit which empirically improved the transmission accuracy. We introduce an adjustable parameter n so that ds = n × dr . Ideally, we get n sampling points for a single data bit. If n is small, especially n = 1, we anticipate that the receiver will experience a high data loss or error rate. On the other hand, a large n limits the maximum value of the sample. In an extreme case, the samples are either one or zero which also reduce the transmission accuracy. Before passing the samples to our machine learning tools, we need to make sure that the samples are of the same dimension. Here, we propose a customized Hierarchical clustering algorithm, as defined in Algorithm 1. The input parameters to Algorithm 1 are a set of signal samples S and the number of target clusters, k. Here, k also stands for the dimension of the samples after being processed by Algorithm 1. In the example shown in Figure 7, the sender transmits data 0101 and the receiver receives with n = 4. Two sampling points in the first and third sampling groups are lost. We set k = 4. After being processed by the customized Hierarchical clustering algorithm, sampling points are clustered into four groups. As shown in Figure 7, sample points surrounded by the same dummy circle are grouped together. 2) Machine learning: After the preprocessing procedure, we assess the signal samples using machine learning tools. The goal here is to find the best-practice classifier for the signal samples with respect to their corresponding ground truth labels. ( Fig. 5. Quantile-quantile plot for the distribution of sample signal values against normal distribution B. Sample collection For sample collection, we first generate the symbol alphabet. As introduced in Section II, we group a sequence of binaries as one compact symbol in order to make the symbol losses and insertions insignificant to the capacity estimation. For convenience, we denote the binaries as “0”s and “1”s. Let the frame size, m, be a configurable integer which is greater than 1. Each symbol in the covert channel alphabet is composed of m binaries. We should have 2m distinct symbols in the alphabet. For example, when m = 2, the alphabet is {“00”, “10”, “01”, “11”}. Our empirical study shows that the design of alphabet does not cause significant impact on the capacity estimation. As introduced previously, during the transmission of data symbols, the channel noise causes the variation of the duration for transmitting a data bit. This variation accumulates as more points are sampled. Eventually, it leads to a bit insertion or loss error. Here, we use delimiters to constrain the propagation of shifted boundaries. The delimiters are produced by the sender frequently alternating between contention and idle to the contention sensitive operation, as shown in Figure 6. The receiver detects the delimiter by monitoring the difference between consecutive sampling values. If the accumulated difference exceeds a predefined threshold, the receiver infers the current signal to be a delimiter. To find the end of the delimiter, we perform the same procedure except that, if the accumulated difference falls below a predefined threshold, the receiver infers the current signal as the end of the delimiter. Using different delimiter size leads to a performance-accuracy trade-off for the covert channel. Increasing the delimiter size improves the transmission accuracy while reducing the transmission speed. Decreasing delimiter size increases the transmission speed while reducing the transmission accuracy. We include a configurable parameter to the covert channel implementation for determining the delimiter size used. For each symbol in the alphabet, we transmit N copies of the symbol through the covert channel, consecutively, with delimiters inserted in between. We generate a variety of sample sets with different transmission frequency, B. By this way, we 30 ten sets. At each iteration, we call the target set the test set. The samples other than those in the test set form the training set. We train the classifier using the training set and evaluate its performance on the test set. Initially, we set the number of neurons in the hidden layer the same as the number of neurons in the input layer. We increase the number of neurons in the hidden layer until the cross-validation accuracy seizes to improve. The result of the cross-validation procedure is recorded as a matrix, P . The number of the rows and the number of columns of the matrix is equal to the number of distinct symbols in the alphabet. The element on the ith row and jth column of the matrix indicates the number of samples which are intended to be symbol j at the sender side and classified as symbol i at the receiver side. Assuming that there are three distinct symbols, s1 , s2 and s3 , in the alphabet and a hundred samples are collected for each symbol, the matrix shown in (5) depicts that all of the samples for s1 , s2 and s3 are correctly classified except five samples of s2 which are classified as s1 . ⎤ ⎡ 100 5 0 0 ⎦ P = ⎣ 0 95 (5) 0 0 100 !" ) * " + Fig. 7. " Clustering of data points Then, we use the classifier to classify the samples and produce the conditional probability, R(i|j), introduced in (2). With the purpose of finding a solution for all possible scenarios, Neural Network [11] is selected as the supervised machine learning tool for the proposed framework, since it is good at generating flexible decision boundaries. Here, we apply a three-layer feed-forward Neural Network to assess the signal samples. The three layers are referred to as the input, hidden and output layers. We note that Neural Network is not the only viable machine learning tool for this task. We have also experimented other classifiers like LDA and QDA. However, the performance was indifferent. It turns out that the quality of the samples collected is much more important. Cross-validation technique is commonly used in machine learning to avoid over-fitting, a behavior such that the classifier performs well on trained samples but poorly on others, when the sample size is limited. For finding the best-practice classifier, we apply the ten-fold cross-validation procedure on the labeled signal samples. Ten-fold cross-validation procedure can be described as follows. The samples are randomly partitioned into ten sets. We iterate the following process for each of the Then, we transform P into the required conditional probability, R(i|j), by dividing each column element by the corresponding column sum. The example shown in (5) will be transformed into (6), where R(i|j) equals the element on the ith row and jth column of matrix R. ⎤ ⎡ 1 0.05 0 0.95 0⎦ R=⎣ (6) 0 0 1 3) Iterative capacity computation: After obtaining the conditional probability, R(i|j), we use the iterative algorithm proposed by Arimoto [10] to obtain the optimal prior q = {qj }j=1...N distribution for the sender symbols which is required by (2). To distinguish between the prior distribution vector q computed on different iterations, we denote the prior distribution obtained on the tth iteration as q t = {qjt }j=1...N . The algorithm is as follows. Algorithm 1 Customized Hierarchical Clustering Algorithm 1: C = build cluster list(S) 2: while C.size > k and no cluster contains single point do 3: D = init distance list() 4: for all cluster c ∈ {x ∈ C, x = C.last} do 5: D.add(distance(c, c.next)) 6: end for 7: for all cluster c ∈ {x ∈ C, x = C.last} do 8: if distance(c, c.next) = D.min then 9: merge(c, c.next) 10: break f or loop 11: end if 12: end for 13: end while 14: if C.size < k then 15: split(C, k) // split clusters proportionally 16: end if • • Set the initial prior, q 0 , to be qj0 = N1 for all j = 1 . . . N . Then iterate the following two steps until the vector difference between q t and q t+1 is smaller than a threshold, which is set to 0.0001 in our framework. Let φt be defined as follows. R(i|j)qjt φt (j|i) = N t k=1 R(i|k)qk • (7) Update q using the following equation and normalize q by dividing each element with the sum of all elements N t+1 t qj = exp R(i|j) log(φj|i ) (8) i=1 Arimoto [10] has proven the optimality and convergence of the algorithm. However, there is one issue in the algorithm. All 31 elements in R must be strictly positive due to the logarithm term in step 3. We tackle this issue by adding a small number, 1e6, to each entry of R. For the example conditional probability R(i|j) described in (6), q converge as follows: q0 q 1 q2 = = {0.3333333, 0.3333333, 0.3333333} {0.3407914, 0.2933110, 0.3658976} = {0.3440178, 0.2901843, 0.3657979} Algorithm 2 Profiling protocol for the sender 1: tcp connect(receiver) 2: tcp transmit(start signal) 3: for all alphabet ∈ alphabetsets do 4: for i = 1 to N do 5: for all binary value ∈ alphabet do 6: if binary value = 0 then 7: idle(ds ) 8: else 9: contention(ds ) 10: end if 11: end for 12: Transmit the delimiter. 13: end for 14: end for 15: tcp transmit(ending signal) Lastly, we apply (2) to compute the upper bound of covert channel capacity. D. Implementation details The implementation of AP F C 3 involves three entities, namely, the sender, the receiver and the controller. The sender and receiver are a pair of VMs co-residing on the same physical machine provided by the cloud service provider. The controller can be installed on a separate machine or on the same machine. We require that the controller can directly communicate with both the sender and the receiver, e.g., through SSH. The entire process will be decomposed into three steps as introduced in section III. Initially, we transmit the executable files from the controller to the sender and the receiver. For the parameter tuning step, the sender and receiver cooperate on collecting samples for measuring the noise model parameters, μ, θ, and γ. The samples are statistically analyzed at the receiver side. Then, the parameterized noise model is sent back to the controller. The controller enumerates a large amount of covert channel parameter settings and filters out those which would perform poorly on the given noise model. The selected parameter settings are sent to the receiver and the sender to reconfigure the covert channel implementation. The receiver and the sender collect samples of the alphabet symbols, as described in the sample collection step. The samples are transmitted back to the controller to compute the final capacity estimation. The pseudo code for the sender, the receiver, and the controller is shown in Algorithm 2, Algorithm 3, and Algorithm 4, respectively. Algorithm 3 Profiling protocol for the receiver 1: listen(tcp port) 2: repeat 3: idle() 4: until Receive the start signal from the sender. 5: repeat 6: sampling data = contention(period dr ) 7: write file(sampling data) 8: until receive the ending signal signal encoding protocol, the symbol size, and the delimiter size. 1) Signal encoding protocol: To demonstrate the impact of different encoding protocols on the covert channel, we conduct simulations to find out which signal encoding method, among Non-Return-to-Zero (NRZ) encoding, Manchester Encoding and Differential Manchester (Diff-Manchester) Encoding, is more efficient under a chosen scenario. V. E VALUATION Algorithm 4 Profiling protocol for the controller 1: transmit(receiver binaries, receiver V M ) // via SCP. 2: transmit(sender binaries, sender V M ) //via SCP. 3: launch(receiver) // via SSH. 4: launch(sender) // via SSH. 5: repeat 6: idle() 7: until the receiver finishes profiling. 8: retrieve(sampling data, receiver V M ) // via SCP. 9: delimiters = detect delimiters(sampling data) 10: segments = split(sampling data, delimiters) 11: for all segment ∈ segments do 12: customized hieratical clustering(segment) // see Algorithm 1 13: end for 14: Process the sampling data with neural network classifier. In this section, we carry out the experiments on cross-VM covert channels and evaluate the performance of AP F C 3 . We first demonstrate the impact of configurable parameters on cross-VM covert channel capacity. Then, we evaluate the effectiveness of the parameter tuning method proposed in section IV. At last, we compare the estimated capacity with the achieved data rate for the covert channels introduced in section II on our test platforms. A. Impact of configurable covert channel parameters on the capacity of cross-VM covert channels Before evaluating the performance of AP F C 3 on estimating the capacity of cross-VM covert channels, we conduct the following experiments to demonstrate the impact of the configurable parameters to the covert channel. Here, we conduct three experiments to exam three configurable parameters, namely, the 32 ,-./ 0,1/ As we introduced before, we reduce the parameter space by performing a parameter tuning step introduced in section III. In order to evaluate the parameter tuning process, we compare the data rate between the covert channels constructed with and without fine-tuned parameters. As there is a trade-off between the data rate and the error rate for constructed covert channels, we use the following method to compute the error rate. The sender sends 16 packets which are 400 bits in length to the receiver. We use a longest common subsequence algorithm to compute the correctly received bits and compute the average error rate. We only show the achieved data rate with an error rate below 20%. Figure 11 shows the box plot for the achieved data rates for both the constructed covert channels with and without fine-tuned parameters. We can observe that, in most of the cases, the constructed covert channels with fine-tuned parameters achieve higher data rates than those without fine-tuned parameters, which indicates the effectiveness of our parameter tuning method. B. The impact of Parameter tuning on covert channel data rate transmission rates. We observe that the capacity estimations using 3-bit, 4-bit and 5-bit symbols exhibit similar trends. Neither of them outperforms the rest significantly. In our remaining experiment, we choose to use 4-bit symbols. 3) Different delimiter size: As introduced in section IV, there is a trade-off between data transmission rate and accuracy on the delimiters inserted between consecutive data signals. A long delimiter increases the transmission accuracy while decreasing the data rate. A short delimiter increases the data transmission rate while decreasing the accuracy. Here, we conduct an experiment to evaluate the impact of different delimiter sizes to the capacity estimation result. In this experiment, we estimate the capacity of memory bus covert channels on the Xen test environment for the delimiter sizes which are three, four, five and six times of the sender interval, ds . Figure 10 shows memory bus covert channel capacity estimation with different delimiter sizes. We can observe that when the delimiter size is four times of the sender interval, ds , the estimated capacity is maximized. We found similar results for the other two test environments. We use this setting for the rest of our experiments. Fig. 10. Estimated memory bus covert channel capacity with different delimiter sizes '1 '1 '1 ,-./ In this experiment, we fix the covert channel parameters except the data transmission rate and estimate the capacity for different signal encoding protocols against a range of data rates. The corresponding covert channel capacity estimated using AP F C 3 is reported for the memory bus covert channel on the Xen platform. As we can see from Figure 8, the Non-Return-to-Zero encoding outperforms the other encoding methods significantly. The advantage of Manchester and differential Manchester encodings is that the receiver only needs to distinguish four different binary signal patterns, “0”, “00”, “1”, and “11”. If Manchester and differential Manchester encodings are used, in order to achieve the same data rate with NRZ, the sender interval, ds , must be halved. This means that Manchester and differential Manchester encodings are more vulnerable to noise caused by timer imprecision. The experiment result also indicates that the timer precision is the major bottleneck for the capacity of crossVM covert channels. 2) Different symbol size: To demonstrate the insignificance of different symbol sizes on covert channel capacity estimation, we conduct the following experiment. Using AP F C 3 , we estimate the memory bus covert channel introduced by Wu et al. [3] on the VMWare test environment. We estimate the capacity using 3-bit, 4-bit and 5-bit symbols. Figure 9 shows the estimation result against a range of data Fig. 8. Estimated memory bus covert channel capacity with different encoding methods 5 " 5 " 5 " 5 " ,1/ 234 5' 0,1/ ,-./ Fig. 9. Estimated memory bus covert channel capacity with different symbol sizes 33 ) ) * ;. * ; . ) ) * ;. * ; . ) )) * ;. * ; . ) ) ) 09 8* ) ) *678 *678 09 8* )) 5:9 5:9 *678 09 8* 5:9 (a) VMWare (b) Xen (c) KVM Fig. 11. Data rate comparison between the covert channels constructed with and without fine-tuned parameters that of the achieved data rate. The distribution of estimated capacity always has smaller variance and tends towards the upper end of the achieved data rate. This indicates that AP F C 3 is capable of producing precise estimation on the upper bound of covert channel capacity. For each type of covert channels, the estimated capacity and the achieved data rate are the least on the KVM platform. This result complies with our intuition on the quality of the signals captured on the platforms, shown in Figure 1. We notice that there is a significant gap between the estimated capacity and achieved data rate for CPU load and Memory bus covert channels on the Xen platform. We think this gap is due to the complexity introduced by the credit scheduler deployed by Xen hypervisor which was discussed in [3]. )) %& # $% * 0 ) ) Fig. 12. ) *67 01 8* 5:1 At last, we conduct experiments to compare the capacity estimation produced by AP F C 3 with the data rate achieved by constructed covert channels with fine-tuned parameters. Along with the achieved data rate, we also report the transmission error rate. In this set of experiments, we plot the estimated capacity with the achieved data rate of constructed covert channels over different time instances. As the system noise changes over time, we expect fluctuation on both the estimated capacity and the achieved data rate. The estimated capacity, however, should be greater than the achieved data rate in most cases as AP F C 3 estimates the upper bound of the capacity. Estimated capacities on three different platforms C. Comparison between estimated capacity and achieved data rate To evaluate the performance of AP F C 3 , we use it to assess the capacity for four types of cross-VM covert channels on our test platforms. Then, we construct the covert channels with fine-tuned parameters and compare the estimated capacity with the achieved data rate. We first carry out the capacity estimation using AP F C 3 for four types of covert channels, namely CPU load, Memory bus, CPU L2 Cache, and Disk bus covert channels, on three test platforms, VMWare, Xen, and KVM, as introduced in section I. The results are shown in Figure 12. We can observe that CPU load based and memory bus based covert channels are estimated to have the highest capacity for the VMWare and Xen platforms.Disk bus covert channel, on the other hand, is estimated to have the lowest capacity in all platforms. This result is consistent with the achieved data rates reported in [3] [4] [2]. We also observe that the estimated capacity for the KVM platform is much lower than other platforms. The is consistent with our previous experiment which shows that covert channel signal on the KVM platform is subject to the largest amount of noise. In Figure 13, we compare the distribution of data rate achieved by covert channels constructed with fine-tuned parameters and their estimated capacity. We can observe that, the estimated capacity has a maximum value slightly above The experiment results on the VMWare platform are depicted in Figure 14. As the timer function on the VMWare platform has high precision and consistency, we are able to construct covert channels with low transmission error rate. We can observe that, for CPU load, memory bus, and L2 cache covert channels, the estimated capacity is close to the achieved data rate with transmission error rate less than 20%. For the disk bus covert channel, we could not construct a covert channel with transmission error rate below 20%. However, the achieved data rate is quite close to the estimated capacity when the transmission error rate is above 20% and below 40%. The experiment results on the Xen platform are depicted in Figure 15. We can observe that the capacity estimation is close to the achieved bit rate for the L2 cache covert channel and disk bus covert channel. For the CPU load covert channel and the memory bus covert channel, there is a gap between the estimated capacity and achieved data rate when we limit the 34 )) ;. * ;. ) ) ;. * ;. ) )) ) ) ) ) *678 09 ) ) ;. * ;. 8* ) ) 5:9 ) *678 09 ) ) ) 8* 5:9 *678 09 8* 5:9 (a) VMWare (b) Xen (c) KVM Fig. 13. Comparison between the distribution of achieved data rates and the distribution of estimated capacity for covert channels constructed with fine-tuned parameters *67 * < < < < < < < < < 8 * < < )< < < < < < < )< Fig. 14. < < )< 5:1 * * < )< 01 * * < < < *67 < < )< < < < << < < )< < )< )< < < < < 8 < )< < )< < < < )< < )< < < < < < < < < < < 5:1 * < )< < < * < 01 < < )< < < )< < < < )< )< < error rate to be less than 20%. The gap becomes significantly smaller when we loose the error rate constraint to be less than 50%. This result may indicate that there are better ways for constructing the covert channels than those introduced in existing work on the Xen platform. We leave this as an extension to our work. The experiment result on the KVM platform are depicted in Figure 16. We can observe that the estimated capacity is close to the achieved data rate for the CPU load covert channel and the L2 cache covert channel. The estimated capacity for memory bus covert channel and disk bus covert channel is below 10 bps. Also, we could not construct a memory bus covert channel or a disk bus covert channel with a data rate above 10 bps while having an error rate smaller than 20%. Therefore, we omitted the figures on these two types of covert channels for the KVM platform. < < * < Fig. 15. Estimated capacity vs. achieved data rate on VMWare )< < < < )< )< < < < * < < < )< < < Estimated capacity vs. achieved data rate on Xen *67 * 8 * < < < < < < Fig. 16. < < < < < )< < < < < < < < < Estimated capacity vs. achieved data rate on KVM [15]. In cloud computing, Ristenpart et al [2] were the first to demonstrate the possibility of conducting cross-VM attacks in the public cloud. Wong et al. [16] described a theoretically CPU function unit based cross-VM covert channel against simultaneous multi-threaded processors. Xu et al [4] experimentally quantified the bit rate of the CPU L2 cache based cross-VM covert channel on Amazon EC2 [17]. A memory bus based cross-VM covert channel was demonstrated by Wu et al [3]. These works focus on demonstrating the existence of various cross-VM covert channel attacks and proving the threat of VI. R ELATED WORK Since covert channel was defined by Lampson [12] in 1973, it has long been studied in computer systems. Network based covert channel is a well-established research topic [13] [14] 35 information leakage through such covert channels. Cross-VM covert channels post new challenges on covert channel capacity estimation since they often have higher bit rates and the media utilized to establish covert information flow is more sophisticated. The hardware and software environments vary dramatically from case to case. Thus, it calls for more research effort to provide deeper understanding of cross-VM covert channel risks in public clouds. In [18], Zander et al applied the classic Shannon equation which requires calculating signal-to-noise ratio. As we mentioned in Section I, crossVM covert channels do not rely on a precise clock, which makes it practically infeasible to obtain signal-to-noise ratio. An alternative approach based on the Shannon entropy formulation was introduced in [7]. If the estimated capacity of cross-VM covert channels is concerning on a cloud platform, the provider may consider applying anti-measures to mitigate the threat. Existing antimeasures includes: disabling or limiting access to fine grained timer functions [19] [20] [21] [22], redesigning access policy to shared physical components [23] [24] [25], monitoring the accesses to resources shared between VMs [26], and jamming [27]. [4] Y. Xu, M. Bailey, F. Jahanian, K. Joshi, M. Hiltunen, and R. Schlichting, “An exploration of l2 cache covert channels in virtualized environments,” in Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, ser. CCSW ’11. ACM, 2011, pp. 29–40. [5] K. Okamura and Y. Oyama, “Load-based covert channels between xen virtual machines,” in Proceedings of the 2010 ACM Symposium on Applied Computing, ser. SAC ’10. ACM, 2010, pp. 173–180. [6] C. E. Shannon, “A mathematical theory of communication,” SIGMOBILE Mob. Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, 2001. [7] I. Gray, J.W., “On introducing noise into the bus-contention channel,” in , 1993 IEEE Computer Society Symposium on Research in Security and Privacy, 1993. Proceedings, 1993, pp. 90–98. [8] J. Millen, “Finite-state noiseless covert channels,” in Computer Security Foundations Workshop II, 1989., Proceedings of the, 1989, pp. 81–86. [9] J. W. Gray and III, “Countermeasures and tradeoffs for a class of covert timing channels,” Tech. Rep., 1994. [10] S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 14–20, 1972. [11] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall PTR, 1998. [12] B. W. Lampson, “A note on the confinement problem,” Commun. ACM, vol. 16, no. 10, pp. 613–615, 1973. [13] R. Smith and G. Scott Knight, “Predictable design of network-based covert communication systems,” in IEEE Symposium on Security and Privacy, 2008. SP 2008, 2008, pp. 311–321. [14] V. Crespi, G. Cybenko, and A. Giani, “Engineering statistical behaviors for attacking and defending covert channels,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 1, pp. 124–136, 2013. [15] X. Zi, L. Yao, X. Jiang, L. Pan, and J. Li, “Evaluating the transmission rate of covert timing channels in a network,” Computer Networks, vol. 55, no. 12, pp. 2760–2771, 2011. [16] Z. Wang and R. Lee, “Covert and side channels due to processor architecture,” in Computer Security Applications Conference, 2006. ACSAC ’06. 22nd Annual, 2006, pp. 473–482. [17] I. Amazon Web Services, “Amazon elastic compute cloud (ec2),” http: //aws.amazon.com/ec2/, 2014, [Online; accessed 23-Jan-2014]. [18] S. Zander, P. Branch, and G. Armitage, “Capacity of temperature-based covert channels,” IEEE Communications Letters, vol. 15, no. 1, pp. 82– 84, 2011. [19] B. C. Vattikonda, S. Das, and H. Shacham, “Eliminating fine grained timers in xen,” in Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, ser. CCSW ’11, 2011, pp. 41–46. [20] J. Wu, L. Ding, Y. Lin, N. Min-Allah, and Y. Wang, “XenPump: a new method to mitigate timing channel in cloud computing,” in 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 678–685. [21] R. Martin, J. Demme, and S. Sethumadhavan, “TimeWarp: rethinking timekeeping and performance monitoring mechanisms to mitigate sidechannel attacks,” in 2012 39th Annual International Symposium on Computer Architecture (ISCA), 2012, pp. 118–129. [22] P. Li, D. Gao, and M. K. Reiter, “Mitigating access-driven timing channels in clouds using stopwatch,” in Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on. IEEE, 2013, pp. 1–12. [23] J. Shi, X. Song, H. Chen, and B. Zang, “Limiting cache-based sidechannel in multi-tenant cloud using dynamic page coloring,” in 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W), 2011, pp. 194–199. [24] H. Raj, R. Nathuji, A. Singh, and P. England, “Resource management for isolation enhanced cloud services,” in Proceedings of the 2009 ACM Workshop on Cloud Computing Security, ser. CCSW ’09. ACM, 2009, pp. 77–84. [25] T. Kim, M. Peinado, and G. Mainar-Ruiz, “STEALTHMEM: systemlevel protection against cache-based side channel attacks in the cloud,” in Proceedings of the 21st USENIX Conference on Security Symposium, ser. Security’12. USENIX Association, 2012, pp. 11–11. [26] B. Saltaformaggio, D. Xu, and X. Zhang, “Busmonitor: A hypervisorbased solution for memory bus covert channels,” Proceedings of EuroSec, 2013. [27] R. Zhang, X. Su, J. Wang, C. Wang, W. Liu, and R. Lau, “On mitigating the risk of cross-vm covert channels in a public cloud,” IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, pp. 1–1, 2014. VII. C ONCLUSION Assessing cross-VM covert channel capacity facilitates risk management for cloud users and providers. In this paper, we presented an automated cross-VM covert channel capacity assessment framework, named AP F C 3 , based on the Shannon entropy equation. We proposed a simple noise model which helps on tuning the parameters of various cross-VM covert channels. We proposed a sample collection method and a machine learning strategy for computing the conditional probability required by the Shannon entropy equation. We evaluated the proposed framework by comparing the estimated capacity with the achieved data rate of covert channels constructed with fine-tuned parameters. The evaluation result shows that the capacity estimated using AP F C 3 is close to the achieved data rate. ACKNOWLEDGMENT This work was supported in part by and Hong Kong ITF research project (No. UIM/250) and Hong Kong General Research Funding under project 122913. R EFERENCES [1] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Cross-vm side channels and their use to extract private keys,” in Proceedings of the 2012 ACM conference on Computer and communications security, ser. CCS ’12, 2012, pp. 305–316. [2] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds,” in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS ’09. ACM, 2009, pp. 199–212. [3] Z. Wu, Z. Xu, and H. Wang, “Whispers in the hyper-space: Highspeed covert channel attacks in the cloud,” in Proceedings of the 21st USENIX Conference on Security Symposium, ser. Security’12. USENIX Association, 2012, pp. 9–9. 36