Private Information Retrieval Using Trusted Hardware

Shuhong Wang¹, Xuhua Ding¹, Robert H. Deng¹, and Feng Bao²
¹ School of Information Systems, SMU
{shuhongw,xhding,robertdeng}@smu.edu.sg
² Institute for Infocomm Research, Singapore
baofeng@i2r.a-star.edu.sg
Abstract. Many theoretical PIR (Private Information Retrieval) constructions have been proposed in past years. Though information-theoretically secure, most of them are impractical to deploy due to their prohibitively high communication and computation complexity. The recent trend of outsourcing databases fuels the research on practical PIR schemes. In this paper, we propose a new PIR system that makes use of trusted hardware. Our system is proven to be information-theoretically secure. Furthermore, we derive the computation complexity lower bound for hardware-based PIR schemes and show that our construction meets the lower bounds for both the communication and computation costs.
1 Introduction
Retrieval of sensitive data from databases or web services, such as patent databases, medical databases, and stock quotes, raises concerns about user privacy exposure. A database server or web server may be interested in garnering information about user profiles by examining users' database access activities. For example, a company's query on a patent from a patent database may imply that it is pursuing a related idea; an investor's query on a stock quote may indicate that he is planning to buy or sell the stock. In such cases, the server's ability to perform such inference is unfavorable to the users. Ideally, users' database retrieval patterns should not be leaked to any other parties, including the servers.
A PIR (Private Information Retrieval) scheme allows a user to retrieve a data item from a database without revealing any information about which item is retrieved. The earliest references to "query privacy" date back to Rivest et al. [19] and Feigenbaum [9]. The first formal notion of PIR was defined by Chor et al. [6]. In their formalization, a database is modelled as an n-bit string x = x1 x2 · · · xn, and a user is interested in retrieving one bit of x. With this formalization, many results have been produced in recent years. Depending on whether trusted hardware is employed or not, we classify PIR schemes into two categories: traditional PIR, which does not utilize any trusted hardware, and hardware-based PIR, which
employs trusted hardware in order to reduce communication and computation
complexities.
The major body of PIR work focuses on traditional PIR. Interested readers are referred to the survey [10] for a thorough review. A big challenge in PIR design is to minimize the communication complexity, which measures the number of bits transmitted between the user and the server(s) per query. A trivial PIR solution is for the server to return the entire database; therefore, the upper bound on communication complexity is O(n), while the lower bound is O(log n), since by all means the user has to provide an index. For single-server PIR with information-theoretic privacy, it is proven in [6] that the communication complexity is Ω(n), confirming that the trivial O(n) solution is asymptotically optimal in that setting. Two approaches are used to reduce the communication cost. One is to duplicate the database on different servers, with the assumption that the servers do not communicate with each other. Without assuming any limit on the servers' computation capability, PIR schemes with multiple database copies are able to offer information-theoretic security with lower communication cost. The best known result is [3] due to Beimel et al., with communication complexity O(n^{log log ω/(ω log ω)}), where ω is the number of database copies. The other approach still uses the single-server model but assumes that the server's computation capability is bounded. Schemes following this approach offer computational security with relatively low communication complexity. The best result to date is due to Lipmaa [16], where the user-side and server-side communication complexities are O(κ log² n) and O(κ log n) respectively, with κ being the security parameter of the underlying computationally hard problem.
Another key performance metric of PIR schemes is their computation complexity. All existing traditional PIR schemes require a high computation cost at the server(s) end. Beimel et al. [4] proved that the expected computation of the server(s) is Ω(n), where f(n) = Ω(g(n)) denotes an asymptotic lower bound, i.e., there exist a positive constant c and a positive integer n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0. This implies that any study of traditional PIR schemes can only improve the computation cost by a constant factor.
To the best of our knowledge, two hardware-based PIR constructions exist in the literature. The earlier scheme [20], due to Smith and Safford, is proposed solely to reduce the communication cost. On each query, a trusted hardware reads all the data items from an external database and returns the requested one to the user. The other hardware-based scheme [14, 15] is due to Iliev and Smith. The scheme facilitates an efficient online query process by offloading the heavy computation load offline. For each query, its online process costs O(1). Nonetheless, depending on the hardware's internal storage size, for every k (k ≪ n) queries the external database needs to be reshuffled with a computation cost of O(n log n). For convenience, we refer to the first scheme as SS01 and the latter as IS04. Both schemes have O(log n) communication complexity.
Our Contributions. The contributions of this paper are three-fold: (1) We present a new PIR scheme using the same trusted hardware model as in [14] and prove that it is secure in the information-theoretic sense; (2) Among all existing PIR constructions, our scheme achieves the best performance in all aspects: O(log n) communication complexity, O(1) online computation cost and O(n) offline computation cost; (3) We prove that our average computation complexity per query, O(n/k), is the lower bound for hardware-based PIR schemes using the same model, where k (k ≪ n) is the maximum number of data items stored by the trusted hardware.
2 Models and Definitions
Database Model and Its Permutation. We use π to denote a permutation of the n integers (1, 2, · · · , n). For 1 ≤ i ≤ n, the image of i under π is denoted by π(i). A database D is modelled as an array D = [d1, d2, · · · , dn], where di is the i-th data item in its original form, for 1 ≤ i ≤ n. A permuted D under π is denoted by Dπ and its i-th data record is denoted by Dπ[i], for 1 ≤ i ≤ n. The database D is permuted into Dπ by using π in such a way that the i-th element of Dπ is the π(i)-th element of D, i.e.

Dπ[i] = dπ(i).    (1)

In other words, Dπ = π^{-1}(D). To protect the secrecy of π, the permutation is always coupled with encryption operations. In the rest of the paper, we use Dπ[i] ≃ dπ(i) to denote that Dπ[i] is the ciphertext of dπ(i). To illustrate the idea, a trivial example is as follows. Let π = (1324), which means π(1) = 3, π(2) = 4, π(3) = 2, π(4) = 1. Then for D = [d1, d2, d3, d4], we have Dπ ≃ [d3, d4, d2, d1]. To distinguish the entries of the original plaintext database from those of the permuted and encrypted database, we use the convention throughout the paper that data items refer to entries of D and data records refer to entries of Dπ.
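To make the indexing convention concrete, the short Python sketch below is an illustration only (0-based indices, encryption omitted); it builds Dπ from D and π and checks that item di sits at record position π^{-1}(i):

    def permute_database(D, pi):
        # D_pi[i] = D[pi[i]]  (0-based rendering of Eq. (1); encryption omitted)
        return [D[pi[i]] for i in range(len(D))]

    def invert(pi):
        # pi_inv[pi[i]] = i, so item i is stored at record position pi_inv[i]
        pi_inv = [0] * len(pi)
        for i, image in enumerate(pi):
            pi_inv[image] = i
        return pi_inv

    # The example from the text, shifted to 0-based indices:
    D = ["d1", "d2", "d3", "d4"]
    pi = [2, 3, 1, 0]                 # pi(1)=3, pi(2)=4, pi(3)=2, pi(4)=1 in 1-based terms
    D_pi = permute_database(D, pi)    # ["d3", "d4", "d2", "d1"], matching the example
    pi_inv = invert(pi)
    assert D_pi[pi_inv[0]] == "d1"    # item d1 is found at record position pi^{-1}(1)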
Architecture. As shown in Figure 1 below, our hardware-based PIR scheme comprises three types of entities: a group of users, a server, and a trusted hardware denoted by TH. The server hosts a permuted and encrypted version of a database D = [d1, · · · , dn], which consists of n items of equal length². TH is a secure and tamper-resistant device residing on the server. With limited computation power and storage cache, TH shuffles the original database D into a database Dπ based on a permutation π; it remembers the permutation π and answers users' queries.
Each user interacts with TH via a secure channel, e.g. an SSL connection. When a user wants to retrieve the i-th data item of D, she sends a query q to TH through the channel. On receiving q, TH accesses the permuted database Dπ and retrieves the intended item di. Throughout the paper, by writing "q = i" we mean that the query q requests the i-th item of D. By "the index of a (data) item" we refer to its index in the original database D; by "the index of a (data) record" we refer to its index in the shuffled database in which the record is located.
² If necessary, data items of different lengths are padded to equal length. Unlike the bit model in [6], we adopt a block model.
Fig. 1. Hardware-based PIR Model
Trusted Hardware. TH is trusted in the sense that it honestly executes the PIR protocol. Neither outside adversaries nor the server is able to tamper with its execution or access its private space. Its limited private cache is able to store up to k (k ≪ n) data items along with their indices. The indices of the cached items are managed by TH using a list denoted by Γ; in other words, Γ stores the original indices. We assume TH is capable of performing CPA-secure (i.e., secure under chosen-plaintext attacks) symmetric-key encryption/decryption and of generating random numbers and secret keys.
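As a purely illustrative instantiation of these capabilities (an assumption for concreteness, not part of the scheme's specification), TH could use an authenticated symmetric cipher and its own randomness source. The sketch below relies on the third-party Python package cryptography and uses AES-GCM with a fresh random nonce per record; any CPA-secure symmetric cipher would serve equally well:

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    class TrustedHardwareCrypto:
        # Illustrative only: CPA-secure (indeed authenticated) symmetric encryption
        # plus random key generation, as assumed of TH in the text.

        def __init__(self):
            self.key = AESGCM.generate_key(bit_length=128)    # session key sk_s

        def rekey(self):
            # A fresh key is drawn for every session (see Section 3).
            self.key = AESGCM.generate_key(bit_length=128)

        def encrypt_record(self, item: bytes) -> bytes:
            nonce = os.urandom(12)                            # random nonce per record
            return nonce + AESGCM(self.key).encrypt(nonce, item, None)

        def decrypt_record(self, record: bytes) -> bytes:
            nonce, ciphertext = record[:12], record[12:]
            return AESGCM(self.key).decrypt(nonce, ciphertext, None)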
Access Pattern. As in [11], the access pattern for a time period is defined as A = (a1, · · · , aN), where ai is the data record read in the i-th database access, for i ∈ [1, N]. We observe that a record in the access pattern is essentially a probabilistic result of both the current query and the query history; in fact, the latter determines the current state of TH and the database.
Adversary. We consider adversaries who attempt to derive non-trivial information from users' queries. Possible adversaries include both outside attackers and the server; note that we do not place any trust in the server. The adversary is able to monitor all the input and output of TH. Moreover, the adversary is allowed to query the database at will and receives the replies from TH.
Stained Query and Clean Query. A query is stained if its content, e.g. the index of the requested data, is known to the adversary without observing the access pattern. This may occur in several scenarios: for instance, a query is compromised or revealed accidentally, or the query originates from the adversary herself. On the other hand, a query is clean if the adversary does not know its content before observing the access pattern.
Security Model. Our security model follows the security notion of ORAM [11]. We measure the information leakage from PIR query executions. A secure PIR scheme ensures that the adversary gains no additional information for determining the distribution of queries. Formally, we define it as follows.
Definition 1. A hardware-based PIR scheme is secure if, given a clean query q and an access pattern A, the conditional probability of the event that the query is on index j (i.e., q = j) is the same as its a-priori probability, i.e.

Pr(q = j | A) = Pr(q = j), for all j ∈ [1, n].

The equation implies that the access pattern A reveals to the adversary no information about the target query q's content.
Table 1 below highlights the notations used throughout this paper.
Table 1. Notations

Notation       Description
k              The maximum number of data items cached in TH.
D              The original database in the form of (d1, d2, · · · , dn).
π0, π1, · · ·   A sequence of secret random permutations of the n elements {1, 2, · · · , n}.
Dπs            A permuted database of D using permutation πs such that Dπs[j] ≃ dπs(j), for 1 ≤ j ≤ n, where Dπs[j] denotes the j-th record in Dπs.
ai             The data record retrieved by TH during its i-th access to a shuffled database.
A              The access pattern comprising all the retrieved records (a1, · · · , aN) during a fixed time period.
As             The access pattern comprising all the retrieved records during the s-th session, as defined in Section 3.
Γ              The list of (original) indices of all data items stored in TH's cache.
3 The PIR Scheme
System setup
We consider applications where a trusted third party (TTP) is available to initialize the system. This TTP is involved only in the initialization phase and stays offline afterwards. For scenarios where a TTP is not available, an alternative solution is provided in Section 6.
The TTP secretly selects a random permutation π0 and a secret key sk0. It permutes the original database D into Dπ0, encrypted under sk0, such that Dπ0[j] ≃ dπ0(j) for j ∈ [1, n]. Dπ0 is then delivered to the server. The TTP secretly hands π0 and sk0 to TH, which completes the system initialization.
The outline of our PIR scheme is as follows. Every k consecutive query executions are called a session. For the s-th session, s ≥ 0, let πs, Dπs and sks denote the permutation, the database, and the encryption key, respectively. On receiving a query from the user, TH retrieves a data record from Dπs, decrypts it with sks to get the data item, and stores the item in its private cache. Then TH replies to the user with the desired data item. The detailed operations for data retrieval are shown in Algorithm 1. After k queries are executed, TH generates a new random permutation πs+1 and an encryption key sks+1. It reshuffles Dπs into Dπs+1 by employing πs+1 and sks+1. Note that in the newly shuffled database Dπs+1, all data items are encrypted under the secret key sks+1. The details of the database reshuffle are given in Algorithm 2.
The original database D is not involved in any database retrieval operations. Since TH always performs a decryption for every read operation and an encryption for every write operation, we omit them in the algorithm descriptions in order to keep the presentation compact and concise.
Retrieval Query Process
The basic idea of our retrieval algorithm is the following. TH always reads a different record on every query, and every record is accessed at most once. Thus, if the database is well permuted (in the sense of an oblivious permutation), all database accesses within a session appear random to the adversary.
Without loss of generality, suppose that the user intends to retrieve di from D during the s-th session (s ≥ 0). On receiving the query for index i, TH performs the following: if the requested di is not in TH's cache, it locates the corresponding record in the permuted database Dπs by computing the record index as πs^{-1}(i); if the requested item already resides in TH, it reads from Dπs a random record which has not been accessed before³. The algorithm is elaborated in Figure 2 below.
Algorithm 1: Retrieving record di using Dπs
1. TH decrypts the query and gets the requested index i.
2. If i ∉ Γ
3.    TH reads the πs^{-1}(i)-th record of Dπs and stores the item di in the cache;
4.    Γ = Γ ∪ {i};
5. Else
6.    TH selects a random index j, j ∈R {1, · · · , n} \ Γ;
7.    TH reads the πs^{-1}(j)-th record of Dπs and stores the item dj in the cache;
8.    Γ = Γ ∪ {j};
9. TH returns di to the user.

Fig. 2. Retrieval Query Processing Algorithm
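The following Python sketch mirrors Algorithm 1 under simplifying assumptions: indices are 0-based, pi_inv stands for πs^{-1}, Γ is kept as a set, and the per-record decryption is folded into the hypothetical helper read_record. It is a sketch, not the scheme's reference implementation:

    import secrets

    def retrieve(i, cache, Gamma, pi_inv, read_record, n):
        # Sketch of Algorithm 1 (0-based): answer a query for item d_i in session s.
        #   cache  : dict, original index -> plaintext item held inside TH
        #   Gamma  : set of original indices currently cached (the list Γ)
        #   pi_inv : callable realizing πs^{-1}
        #   read_record(r) : read and decrypt record r of D_πs (hypothetical helper)
        if i not in Gamma:
            # d_i is not cached: fetch the record that holds it.
            cache[i] = read_record(pi_inv(i))
            Gamma.add(i)
        else:
            # d_i is already cached: read a random, not-yet-accessed record instead,
            # so the server still observes exactly one fresh access.  (Both branches
            # should be made to take the same time; see the footnote below.)
            j = secrets.choice([x for x in range(n) if x not in Gamma])
            cache[j] = read_record(pi_inv(j))
            Gamma.add(j)
        return cache[i]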
Access Pattern The access pattern As produced by Algorithm 1 is a
sequence of data records which are retrieved from Dπs during the s-th session. It
is clear from Figure 2 that on each query, TH reads exactly one data record from
Dπs . Therefore, when the s-th session terminates, As has exactly k records.
³ Both the "if" and "else" branches should be implemented so that they take the same time, in order to resist timing side-channel attacks. The same requirement applies to the analogous situation in the reshuffle algorithm below.
Reshuffle Process
After k retrievals, TH's private cache reaches its limit, which demands a reshuffle of the database with a new permutation. Note that simply using cache substitution introduces a risk of privacy exposure: when a discarded item is requested again, the adversary learns that a data record is retrieved more than once by TH from the same location. Therefore, a reshuffle procedure must be executed at the end of each session.
TH first secretly chooses a new random permutation πs+1. The expected database Dπs+1 satisfies Dπs+1[j] ≃ dπs+1(j), j ∈ [1, n]. The correlation between Dπs and Dπs+1 is

Dπs+1[j] ≃ Dπs[πs^{-1} ◦ πs+1(j)],    (2)

for 1 ≤ j ≤ n, where πs^{-1} ◦ πs+1(j) means πs^{-1}(πs+1(j)).
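Equation (2) can be checked mechanically. The short sketch below (0-based indices, encryption ignored) verifies the correlation for randomly drawn permutations:

    import random

    n = 8
    D = [f"d{i}" for i in range(n)]
    pi_s  = random.sample(range(n), n)             # πs
    pi_s1 = random.sample(range(n), n)             # πs+1

    D_s  = [D[pi_s[j]]  for j in range(n)]         # D_πs[j]   holds d_πs(j)
    D_s1 = [D[pi_s1[j]] for j in range(n)]         # D_πs+1[j] holds d_πs+1(j)

    inv_s = {pi_s[j]: j for j in range(n)}         # πs^{-1}
    for j in range(n):
        # Eq. (2): D_πs+1[j] corresponds to D_πs[πs^{-1}(πs+1(j))]
        assert D_s1[j] == D_s[inv_s[pi_s1[j]]]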
The basic idea of our reshuffle algorithm is as follows. We sort the items in TH's cache in ascending order of their new positions in Dπs+1. We observe that the un-cached items of Dπs are also logically sorted in ascending order of their new indices, because the database supports index-based record retrieval. The reshuffle process is then similar to a merge-sort of the sorted cached items and the un-cached items. TH plays two roles: (1) participating in the merge-sort to initialize Dπs+1; (2) obfuscating the read/write pattern to protect the secrecy of πs+1.
TH first sorts the indices in Γ in ascending order of their images under πs+1^{-1}. It assigns the database Dπs+1 sequentially, starting from Dπs+1[1]. For the first n − k assignments, TH always performs one read operation and one write operation; for the remaining k assignments, TH performs one write operation only. The initialization of Dπs+1[j], j ∈ [1, n], falls into one of the following two cases, depending on whether its corresponding item is in the cache or not.
Case (i). The corresponding item is not cached (i.e., πs+1(j) ∉ Γ): TH reads it (i.e., the record Dπs[πs^{-1} ◦ πs+1(j)]) from Dπs and writes it to Dπs+1 as Dπs+1[j].
Case (ii). The corresponding item is in the cache (i.e., πs+1(j) ∈ Γ): Before retrieving the item (the plaintext of Dπs[πs^{-1} ◦ πs+1(j)]) from the cache and writing it into Dπs+1 as Dπs+1[j], TH also performs a read operation, for two purposes: (a) to exhibit the same reading pattern as in Case (i), so that the secrecy of πs+1 is protected; (b) to save the cost of future reads. Moreover, instead of randomly reading a record from Dπs, TH looks for the smallest new index which has not been initialized and falls into Case (i), and retrieves the corresponding data record from Dπs. Since Γ is sorted, this searching process costs k comparisons in total for the entire reshuffle. The benefit of this approach, coupled with a sorted Γ, is that both the cost of testing whether πs+1(j) ∈ Γ and the cost of retrieving the item from the cache are O(1); otherwise, both cost O(k) per item and O(nk) in total.
The details of the reshuffle algorithm are shown in Figure 3, where min denotes the head of the sorted Γ. We use sortdel(i)/sortins(i) to denote the sorted deletion/insertion of index i from/to Γ and the subsequent adjustments.
Fig. 3. Database Reshuffle Algorithm

Algorithm 2: Reshuffle Dπs into Dπs+1, executed by TH
1. secretly select a new random permutation πs+1;
2. sort the indices in Γ based on the order of their images under πs+1^{-1}; set j = 1; j' = 1.
3. while 1 ≤ j ≤ n − k do
4.    while πs+1(j') ∈ Γ do j' = j' + 1 end;
5.    set r = πs^{-1} ◦ πs+1(j'); read Dπs[r] from Dπs;
6.    if j = j'                    /* Case (i): πs+1(j) ∉ Γ */
7.       write Dπs+1 by setting Dπs+1[j] ≃ Dπs[r];
8.    else                         /* Case (ii): πs+1(j) ∈ Γ */
9.       write Dπs+1 by setting Dπs+1[j] ≃ dmin;   (a)
10.      sortdel(j); insert Dπs[r] into the cache and sortins(j');
11.   j = j + 1; j' = j' + 1;
12. end{while};
13. while n − k + 1 ≤ j ≤ n do
14.    set Dπs+1[j] ≃ dmin; sortdel(j);
15.    j = j + 1;
16. end

(a) dmin is exactly Dπs[πs^{-1} ◦ πs+1(j)], since Dπs+1 is filled in increasing order of j.
The reshuffle algorithm is secure and efficient. An intuitive explanation of its security is as follows. After a reshuffle, the new database is reset to its initial status. If an item has been accessed in the previous session, it is placed at a random position by the reshuffle. The other items are indeed relatively linkable between sessions; however, the linkage gives the adversary no advantage since those items have not been accessed at all. Note that the addition in the inner loop is executed at most n − 1 times in total, since j' never decreases. Because Γ is a sorted list and the inserted and deleted indices arrive in ascending order, each insertion and deletion has constant cost; at most n comparisons are needed in total for the whole execution. Therefore, the overall computation complexity of Algorithm 2 is O(n).
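For concreteness, the Python sketch below mirrors the merge step of Algorithm 2 under the same simplifying assumptions as before: 0-based indices, Γ kept as a set of original indices, and decryption under sks together with re-encryption under sks+1 folded into the hypothetical helpers read_record and write_record. It is illustrative only:

    def reshuffle(cache, Gamma, pi_s_inv, pi_s1, read_record, write_record, n, k):
        # Sketch of Algorithm 2 (0-based): build D_πs+1 from D_πs and the k cached
        # items.  pi_s_inv and pi_s1 are callables for πs^{-1} and πs+1.
        jp = 0                                  # j': next position of D_πs+1 whose record is read
        for j in range(n - k):                  # first n-k writes: one read + one write each
            while pi_s1(jp) in Gamma:           # skip positions whose item is already cached
                jp += 1
            item = read_record(pi_s_inv(pi_s1(jp)))   # record of D_πs holding d_πs+1(j')
            if jp == j:                         # Case (i): πs+1(j) not cached
                write_record(j, item)
            else:                               # Case (ii): πs+1(j) is cached
                write_record(j, cache.pop(pi_s1(j)))  # write d_min from the cache
                cache[pi_s1(jp)] = item         # keep the freshly read item for later
                Gamma.discard(pi_s1(j))
                Gamma.add(pi_s1(jp))
            jp += 1
        for j in range(n - k, n):               # last k writes come straight from the cache
            write_record(j, cache.pop(pi_s1(j)))
        Gamma.clear()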
Reshuffle Pattern. The access pattern produced by Algorithm 2 is denoted by Rs. Since TH reads only n − k data records, Rs has exactly n − k elements. Note that the writing pattern is omitted because it follows a fixed order, i.e., sequential writes from position 1 to position n.
4 Security
We now proceed to analyze the security of our scheme based on the notion defined in Section 2. Our proof essentially goes as follows. Lemma 1 proves that the reshuffle procedure is oblivious, in the same sense as in Oblivious RAM [11]. Thus, after each reshuffle, the database is reset to its initial state, so that the accesses in different sessions are not correlated. Then, in Theorem 1, we show that each individual query session does not leak information about the query, which leads to the conclusion on user privacy across all sessions.
Lemma 1. The reshuffle algorithm in Figure 3 is oblivious. For any non-negative integer s and any integer j ∈ [1, n],

Pr(Dπs[j] = dl | A0, R0, · · · , As−1, Rs−1) = 1/n,    (3)

for all l ∈ [1, n], where Ai and Ri, i ∈ [0, s − 1], are the access pattern and the reshuffle pattern of the i-th session, respectively.
Proof. We prove Lemma 1 for a fixed j by induction on the session index s; the proof applies to all j ∈ [1, n].
I. s = 0. Since Dπ0, the initial shuffled database, is constructed in advance under a secret random permutation π0, the probability Pr(Dπ0[j] = dl | ∅) = 1/n holds for all 1 ≤ l ≤ n.
II. Suppose the lemma holds for s = i, that is, Pr(Dπi[j] = dl | A0, R0, · · · , Ai−1, Ri−1) = 1/n. We proceed to prove that it holds for s = i + 1, i.e.

Pr(Dπi+1[j] = dl | A0, R0, · · · , Ai, Ri) = 1/n,    (4)

for all l ∈ [1, n].
By conditional probability,

Pr(Dπi+1[j] = dl | A0, R0, · · · , Ai, Ri)
  = Σ_{x=1}^{n} Pr(Dπi+1[j] = Dπi[x] | A0, R0, · · · , Ai, Ri) · Pr(Dπi[x] = dl | A0, R0, · · · , Ai, Ri).    (5)

For clarity, we define px and qx by

px = Pr(Dπi+1[j] = Dπi[x] | A0, R0, · · · , Ai, Ri)  and  qx = Pr(Dπi[x] = dl | A0, R0, · · · , Ai, Ri).

The objective now is to prove

Σ_{x=1}^{n} px qx = 1/n.    (6)
Define X = {x | x ∈ [1, n], πi(x) ∈ Γ} and Y = {x | x ∈ [1, n], πi(x) ∉ Γ}. Note that |X| = |Γ| = k, |Y| = n − k and X ∪ Y = [1, n]. Then

Σ_{x=1}^{n} px qx = Σ_{x∈X} px qx + Σ_{x∈Y} px qx.    (7)
We observe that the adversary has different projections of the new indices of the records in Dπi. For the items in TH's cache, i.e. those whose indices in Dπi are in X, the adversary obtains no information about their positions in Dπi+1. On the other hand, for the other items, i.e. those whose indices in Dπi are in Y, the adversary is certain that they cannot be placed at positions which were initialized before their retrieval from Dπi. Moreover, the item retrieved in the first reshuffle read will appear in one of the first k + 1 positions of Dπi+1. Therefore, for x ∈ X,

px = Pr(πi+1(j) = πi(x)) = 1/n.    (8)

This does not hold for px with x ∈ Y. Consequently,

Σ_{x∈Y} px = 1 − Σ_{x∈X} px = (n − k)/n.    (9)
The computation of qx is related to the stained queries. Recall that our adversary is allowed to probe the protocol by sending queries and receiving replies; therefore, queries handled by TH could be stained. Let C denote the set of stained queries in the i-th session. We have two cases for 1 ≤ l ≤ n:

Case (1) l ∈ C: Σ_{x∈X} qx = 1, because from the adversary's perspective there exists one and only one item in the cache which corresponds to a query on dl. For x ∈ Y, qx = 0 because none matches. Thus

Σ_{x=1}^{n} px qx = 1/n + 0 = 1/n, for l ∈ C.    (10)

Case (2) l ∉ C: Suppose Σ_{x∈X} qx = δ for some 0 ≤ δ ≤ 1; then Σ_{x∈Y} qx = 1 − δ. Note that the execution of queries does not affect the adversary's observation of the records that are not cached, since they are not accessed. Therefore, by the induction assumption Pr(Dπi[x] = dl | A0, R0, · · · , Ai−1, Ri−1) = 1/n for all l ∈ [1, n], we have qx = (1 − δ)/(n − k) for x ∈ Y, due to equiprobability⁴. Hence,

Σ_{x=1}^{n} px qx = δ/n + (1 − δ)/n = 1/n.    (11)
Combining Case (1) and Case (2), we have

Σ_{x=1}^{n} px qx = 1/n, for all 1 ≤ l ≤ n,

which concludes the proof.    □
Lemma 1 implies that the reshuffle procedure resets the observed distribution of the data items. Therefore, the events occurring in separate sessions are independent of each other. Theorem 1 below addresses the security of the proposed PIR scheme.
⁴ Hint: otherwise, for x0, x1 ∈ Y, qx0 ≠ qx1 would imply Pr(Dπi[x0] = dl) ≠ Pr(Dπi[x1] = dl) for some l ∉ C. The claim is also provable from the adversary's inability to distinguish two different permutations that are identical on the indices in C.
Theorem 1. Given a time period, the observation of the access pattern A = (a1, a2, · · · , aN), N > 0, provides the adversary with no additional knowledge for determining any clean query q, i.e. for all j ∈ [1, n],

Pr(q = j | A) = Pr(q = j),    (12)

where Pr(q = j) is the a-priori probability of query q being on index j.
Proof. For 1 ≤ t ≤ N, let Pr(at | a1, · · · , at−1) denote the probability of the event that data record at is accessed immediately after the access of the first t − 1 records (for t = 1 the conditioning is empty). Let Pr(at | a1, · · · , at−1; q = j) denote the probability of the same event given the additional knowledge that the requested index of query q is j. Note that we do not assume any temporal order between the query q and the t-th access. We proceed to show that

Pr(at | a1, · · · , at−1) = Pr(at | a1, · · · , at−1; q = j).    (13)
Without loss of generality, suppose at is read from Dπs during the s-th session. Consider the following two cases:
1. at ∈ Rs, i.e. at is accessed during a reshuffle process: obviously Pr(at | a1, · · · , at−1) = Pr(at | a1, · · · , at−1; q = j), since the access to at is completely determined by the permutations πs and πs+1.
2. at ∈ As, i.e. at is accessed during a query process: let this query be the l-th query of the session, l ∈ [1, k]. Therefore, l − 1 data items are cached by TH before at is read. We consider two scenarios based on Algorithm 1:
(a) The requested data item is cached in TH: at is randomly chosen from among the records whose items are not cached in TH. Therefore, Pr(at | a1, · · · , at−1) = 1/(n − (l − 1)).
(b) The requested data item is not cached in TH: at is retrieved from Dπs based on the permutation πs. According to Lemma 1, the probability that at is selected is 1/(n − (l − 1)).
Note that the compromise of a query, i.e. knowing q = j, possibly helps an adversary to determine whether at falls under case (2a) or (2b). Nonetheless, this information does not change Pr(at | a1, · · · , at−1), since its value is 1/(n − (l − 1)) in both cases. Thus, Pr(at | a1, · · · , at−1) = Pr(at | a1, · · · , at−1; q = j) when at ∈ As.
In total, we conclude that for any t and any q = j,

Pr(at | a1, · · · , at−1) = Pr(at | a1, · · · , at−1; q = j).

As a result,

Pr(A | q = j) = Pr(a1, · · · , aN | q = j)
             = Pr(aN | a1, · · · , aN−1; q = j) · Pr(a1, · · · , aN−1 | q = j)
             = Π_{t=1}^{N} Pr(at | a1, · · · , at−1; q = j)
             = Π_{t=1}^{N} Pr(at | a1, · · · , at−1)
             = Pr(A).    (14)

Then,

Pr(q = j | A) = Pr(q = j, A)/Pr(A) = Pr(A | q = j) · Pr(q = j)/Pr(A) = Pr(q = j).

The result shows that, given the access pattern, the a-posteriori probability of a query equals its a-priori probability, which concludes the security proof of our PIR scheme.    □
5 Performance
We proceed to analyze the communication and computation complexity of our PIR scheme, evaluated with respect to the database size n. Both complexities of our scheme reach the respective lower bounds for hardware-based PIR schemes.
Communication: We consider the user/system communication cost per query. In our scheme, the user only sends the index of the desired data item, and TH returns exactly one data item. Therefore, the communication complexity per query is O(log n). Note that O(log n) is the lower bound on the communication cost of any PIR construction.
Computation: For simplicity, each read, write, encryption, or decryption of a data item is treated as one operation. The computation cost is measured by the average number of operations per session and per query. We also measure the online cost, which excludes the expense of the offline reshuffle operations.
As is evident from Figures 2 and 3, it costs the trusted hardware O(1) operations to process a query and O(n) operations to reshuffle the database. Table 2 compares the computation cost of our scheme against [20] and [14, 15].
Table 2. Comparison of the Computation Cost of Three Hardware-based PIR Schemes

Scheme          Total cost per session (k queries)   Online cost per query   Average cost per query
Our scheme      O(n)                                 O(1)                    O(n/k)
IS04 [14, 15]   O(n log n)                           O(1)                    O((n/k) log n)
SS01 [20]       O(kn)                                O(n)                    O(n)

Our scheme outperforms the other two hardware-based PIR schemes in all three metrics. The advantage originates from our reshuffle algorithm, which utilizes the hardware's cache in a more efficient manner. Moreover, we prove that the average cost per retrieval of our scheme reaches the lower bound for all information-theoretically secure PIR schemes with the same trusted hardware system model. Our result is summarized in the following theorem.
Theorem 2. For any information-theoretically secure PIR scheme with a trusted hardware storing at most k data items, the average computation cost per retrieval is Ω(n/k).
Proof. Our proof is constructed in a manner similar to the proof in [4], which shows the computational lower bound for traditional PIR schemes.
Fix a PIR scheme and let Bi denote the set of all indices that the hardware reads in order to return di. Bi is essentially a random variable probabilistically determined by both the user query on di and the items that the hardware already stores inside. Consequently, E(|Bi|) denotes the expected number of data items read by the hardware to process a query on index i. We evaluate E(|Bi|) as the computation cost of the PIR scheme.
For 1 ≤ l ≤ n, let Pr(l ∈ Bi) be the probability that the hardware reads dl in order to answer the query on index i. We define P(l) as the maximum of these probabilities over all 1 ≤ i ≤ n, i.e.

P(l) = max_{1≤i≤n} {Pr(l ∈ Bi)}.

Note that for an information-theoretically secure PIR scheme, user privacy implies that B1, B2, · · · , Bn have identical distributions. Therefore, for convenience, let P(l) = Pr(l ∈ B1). Due to Lemma 5 of [4] (this can also be proved directly by defining random variables Y1, · · · , Yn with Yl = 1 if l ∈ Bi and Yl = 0 otherwise),

E(|Bi|) = Σ_{l=1}^{n} P(l).    (15)
Our target now is to show E(|Bi|) = Ω(n/k). We prove it by contradiction. Suppose that among P(1), · · · , P(n) there exist at least k + 1 values less than 1/(k + 1); without loss of generality, let them be P(1), P(2), · · · , P(k + 1). Now consider the probability Pr(1 ∉ B1 ∩ 2 ∉ B2 ∩ · · · ∩ (k + 1) ∉ Bk+1). We have
Pr(1 ∉ B1 ∩ 2 ∉ B2 ∩ · · · ∩ (k + 1) ∉ Bk+1)
  = 1 − Pr(1 ∈ B1 ∪ 2 ∈ B2 ∪ · · · ∪ (k + 1) ∈ Bk+1)
  ≥ 1 − Σ_{l=1}^{k+1} Pr(l ∈ Bl) = 1 − Σ_{l=1}^{k+1} P(l)
  > 1 − (k + 1) · 1/(k + 1) = 0.
On the other hand, note that TH caches at most k data items. As a consequence, there always exists one data item which must be read from the database during the k + 1 queries on 1, 2, · · · , k + 1. Thus, the event 1 ∉ B1 ∩ 2 ∉ B2 ∩ · · · ∩ (k + 1) ∉ Bk+1 never occurs, i.e. Pr(1 ∉ B1 ∩ 2 ∉ B2 ∩ · · · ∩ (k + 1) ∉ Bk+1) = 0, which contradicts the probability computation above.
Thus, there are at most k elements of {P(1), · · · , P(n)} whose values are less than 1/(k + 1). As a result,

Σ_{l=1}^{n} P(l) ≥ (n − k) · 1/(k + 1),    (16)

which shows that the lower bound on the average computation cost is Ω(n/k).    □
6 Discussion
Database Initialization Using TH
For applications where no trusted third party exists, TH can be used to initialize the database. TH first chooses a random permutation π0. For 1 ≤ i ≤ n, TH tags the i-th item di with its new index π0^{-1}(i). Using a merge-sort algorithm [8], d1, d2, · · · , dn are sorted by TH based on their new indices. Given the limited cache size of TH, Batcher's odd-even merge sorter [1] is an appropriate choice, requiring (log² n − log n + 4)n/4 − 1 comparisons. One may argue that the Beneš network [22] and Goldstein et al.'s switching networks [12] incur fewer comparisons. Unfortunately, neither is feasible in our system, since the former requires at least n log n bits (≫ k) of memory in TH, while the latter has a prohibitively high setup cost. Note that encryption is applied during tagging and merging so that the process is oblivious to the server.
A simple example is presented in Fig. 4. The database in the example has 4 items d1, d2, d3, d4. The permutation is π0 = (1324), i.e. π0(1) = 3, π0(2) = 4, π0(3) = 2 and π0(4) = 1. The circles denote TH and the squares denote encrypted data items. After initialization, the original four items are permuted as shown on the right. All the encrypted items are stored on the host; in every operation, only two items are read into TH's cache and then written back to the server.
Fig. 4. Initial oblivious shuffle example using odd-even merges.
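To make the initialization concrete, the sketch below generates the compare-exchange schedule of Batcher's odd-even merge sorter [1] for n a power of two and lets TH apply each comparator to a pair of encrypted, tagged records, two at a time, as depicted in Fig. 4. The helper names (read_pair, write_pair, tag_of) are assumptions introduced only for illustration:

    def batcher_comparators(n):
        # Yield the compare-exchange pairs (i, j), i < j, of Batcher's odd-even
        # merge sorting network for n a power of two [1].
        def sort(lo, length):
            if length > 1:
                m = length // 2
                yield from sort(lo, m)
                yield from sort(lo + m, m)
                yield from merge(lo, length, 1)
        def merge(lo, length, r):
            step = 2 * r
            if step < length:
                yield from merge(lo, length, step)
                yield from merge(lo + r, length, step)
                for i in range(lo + r, lo + length - r, step):
                    yield (i, i + r)
            else:
                yield (lo, lo + r)
        yield from sort(0, n)

    def oblivious_initialize(n, tag_of, read_pair, write_pair):
        # TH sorts the encrypted records by their secret tags π0^{-1}(i), touching
        # only two records at a time; the host sees only the fixed schedule.
        for i, j in batcher_comparators(n):
            a, b = read_pair(i, j)          # two records enter TH's cache (decrypted)
            if tag_of(a) > tag_of(b):       # compare by the secret new index
                a, b = b, a
            write_pair(i, j, a, b)          # re-encrypted and written back to the host

The number of compare-exchange operations generated for n a power of two matches the (log² n − log n + 4)n/4 − 1 bound quoted above.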
Instantiation of Encryption and Permutation Algorithms
An implicit assumption of our security proof in Section 4 is the semantic security of the encryption of the database; otherwise, the encryption reveals information about the data and consequently exposes user privacy. Our adversary model in Section 2 allows the adversary to submit queries and observe the subsequent access patterns and replies. Thereby, the adversary is able to obtain at most k plaintext/ciphertext pairs for each encryption key, since different random keys are selected in different sessions. Thus, an encryption algorithm that is semantically secure under the CPA (Chosen Plaintext Attack) model is required. In practice, CPA-secure symmetric ciphers, such as AES in an appropriate mode, are preferred over public key encryption, since the latter has a higher computation cost and a larger storage demand.
For the permutation algorithm, we argue that it is impractical for a hardware-based PIR to employ a truly random permutation, since it requires O(n log n) bits of storage, comparable to the size of the whole database. As a result, we opt for a pseudo-random permutation with a light computation load.
Since a CPA-secure cipher can be turned into an invertible pseudorandom permutation, we choose a CPA-secure block cipher, e.g. AES, to implement the needed pseudorandom permutation. With a block cipher, a message is encrypted in blocks. When n ≠ 2^⌈log n⌉, the ciphertext may be greater than n; in that case, the encryption is repeated until the output falls into the appropriate range. Since 2^⌈log n⌉ ≤ 2n, the expected number of encryptions is less than 2. Black and Rogaway's result in [5] provides more information on ciphers with arbitrary finite domains.
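One way to realize such a pseudorandom permutation on a small domain is a Feistel construction in the spirit of Luby-Rackoff [17] and Patarin [18], combined with the cycle-walking technique discussed by Black and Rogaway [5]. The Python sketch below is an assumption-laden illustration of the idea; the round count, round function, and key handling are placeholders, not a vetted instantiation:

    import hashlib
    import hmac

    def make_prp(key: bytes, n: int):
        # Return (pi, pi_inv): a keyed pseudorandom permutation of {0, ..., n-1} and
        # its inverse, built from a balanced Feistel network plus cycle walking.
        half = (max(2, (n - 1).bit_length()) + 1) // 2    # bits per Feistel half
        mask = (1 << half) - 1
        rounds = 7                                        # assumed; cf. [18]

        def F(r, x):                                      # keyed round function (assumed)
            msg = bytes([r]) + x.to_bytes((half + 7) // 8, "big")
            digest = hmac.new(key, msg, hashlib.sha256).digest()
            return int.from_bytes(digest, "big") & mask

        def feistel(x, inverse):
            L, R = x >> half, x & mask
            for r in (range(rounds - 1, -1, -1) if inverse else range(rounds)):
                if inverse:
                    L, R = R ^ F(r, L), L
                else:
                    L, R = R, L ^ F(r, R)
            return (L << half) | R

        def walk(x, inverse):
            y = feistel(x, inverse)
            while y >= n:                   # cycle-walk until the output falls in [0, n);
                y = feistel(y, inverse)     # the Feistel domain is below 4n, so few steps
            return y

        return (lambda i: walk(i, False)), (lambda i: walk(i, True))

With pi, pi_inv = make_prp(sk, n), TH would look up the record holding item i at position pi_inv(i) (0-based), corresponding to the use of πs^{-1} in Algorithm 1.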
Service Continuity
The database service is disrupted during the reshuffle process, and the duration of a reshuffle is non-negligible since it is an O(n) process. A trivial approach to maintaining continuity of service is to deploy two pieces of trusted hardware: while one is re-permuting the database, the other handles user queries. In case installing an extra hardware is infeasible, an alternative is to split the cache of the hardware into two halves, each with the capacity to store k/2 items. Consequently, the average computation cost is doubled.
Update of data items
A byproduct of the reshuffle process is support for database update operations. To update di, the trusted hardware reads di obliviously in the same way as it handles a read request. Then di is updated inside the hardware's cache and written into the newly permuted database during the upcoming reshuffle. Though the new value of di is not written immediately into the database, data consistency is ensured since the hardware returns the updated value directly from its cache upon user requests.
7 Conclusion
In summary, we present in this paper a novel PIR scheme with the support of trusted hardware. The new PIR construction is provably secure: the observation of the access pattern offers adaptive adversaries no additional information for determining the data items retrieved by a user.
Similar to other hardware-based PIR schemes, the communication complexity of our scheme reaches its lower bound, O(log n). In terms of computation complexity, our design is more efficient than all other existing constructions. Its online cost per query is O(1) and the average cost per query is only O(n/k), which outperforms the best known result by a factor of O(log n) (and, though we use big-O notation, the hidden constant factor is around 1 here). Furthermore, we prove that O(n/k) is the lower bound of computation cost for PIR schemes with the same trusted-hardware-based architecture.
The Trusted Computing Group (TCG) [21] defines a set of Trusted Computing Platform (TCP) specifications aiming to provide a hardware-based root of trust and a set of primitive functions to propagate trust to application software as well as across platforms. The root of trust in TCP is a hardware component on the motherboard called the Trusted Platform Module (TPM). The TPM protects data by never releasing root keys outside of the TPM. In addition, the TPM provides some primitive cryptographic functions, such as random number generation, RSA key pair generation, RSA algorithms and a hash function. Most importantly, the TPM provides mechanisms for integrity measurement, storage, and reporting of a platform, from which trust and attestation capabilities can be built. In addition to the TCG-compliant TPM, Intel's LaGrande Technology (LT) [13] includes an extended CPU enabling software domain separation and protection. Beyond the hardware layer, there is a domain manager supporting protected execution environments through domain separation, including separation of processes, memory pages, and device drivers. The emergence of these industry-standard trusted computing technologies promises to establish an adequate foundation for building a practical trusted platform for our proposed PIR scheme. How to extend and improve our proposed PIR scheme based on trusted computing technologies will be one of our future research directions.
Acknowledgement
This research is partly supported by the Office of Research, Singapore Management University. We would like to thank the anonymous reviewers for their valuable suggestions and criticisms on an earlier draft of this paper.
References
1. Kenneth E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, pages 307-314, 1968.
2. M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway. Relations among notions of security for public-key encryption schemes. In Proceedings of Crypto '98, LNCS 1462, pages 26-45, Berlin, 1998.
3. Amos Beimel, Yuval Ishai, Eyal Kushilevitz, and Jean-François Raymond. Breaking the O(n^{1/(2k−1)}) barrier for information-theoretic private information retrieval. In FOCS, pages 261-270. IEEE Computer Society, 2002.
4. Amos Beimel, Yuval Ishai, and Tal Malkin. Reducing the servers' computation in private information retrieval: PIR with preprocessing. In CRYPTO, pages 55-73, 2000.
5. John Black and Phillip Rogaway. Ciphers with arbitrary finite domains. In Bart Preneel, editor, CT-RSA, volume 2271 of Lecture Notes in Computer Science, pages 114-130. Springer, 2002.
6. Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private information retrieval. In FOCS, pages 41-50, 1995.
7. Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private information retrieval. J. ACM, 45(6):965-981, 1998.
8. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. ISBN 0-262-03293-7.
9. Joan Feigenbaum. Encrypting problem instances: Or ..., can you take advantage of someone without having to trust him? In Hugh C. Williams, editor, CRYPTO, volume 218 of Lecture Notes in Computer Science, pages 477-488. Springer, 1985.
10. William Gasarch. A survey on private information retrieval. The Bulletin of the European Association for Theoretical Computer Science, Computational Complexity Column (82), 2004.
11. Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431-473, 1996.
12. J. L. Goldstein and S. W. Leibholz. On the synthesis of signal switching networks with transient blocking. IEEE Transactions on Electronic Computers, vol. 16, no. 5, pages 637-641, 1967.
13. LaGrande technology architecture. Intel Developer Forum, 2003.
14. Alexander Iliev and Sean Smith. Private information storage with logarithm-space secure hardware. In International Information Security Workshops, pages 199-214, 2004.
15. Alexander Iliev and Sean W. Smith. Protecting client privacy with trusted computing at the server. IEEE Security & Privacy, 3(2):20-28, 2005.
16. Helger Lipmaa. An oblivious transfer protocol with log-squared communication. In Jianying Zhou, Javier Lopez, Robert H. Deng, and Feng Bao, editors, ISC, volume 3650 of Lecture Notes in Computer Science, pages 314-328. Springer, 2005.
17. Michael Luby and Charles Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Comput., 17(2):373-386, 1988.
18. Jacques Patarin. Luby-Rackoff: 7 rounds are enough for 2^{n(1−ε)} security. In Dan Boneh, editor, CRYPTO, volume 2729 of Lecture Notes in Computer Science, pages 513-529. Springer, 2003.
19. Ronald L. Rivest, Leonard Adleman, and Michael L. Dertouzos. On data banks and privacy homomorphisms.
20. Sean W. Smith and David Safford. Practical server privacy with secure coprocessors. IBM Systems Journal, 40(3):683-695, 2001.
21. TCG Specification Architecture Overview. Available from http://www.trustedcomputinggroup.org.
22. Abraham Waksman. A permutation network. J. ACM, 15(1):159-163, 1968.