Technion - Computer Science Department - Ph.D. Thesis PHD-2001-02 - 2001

Topics in Private Information Retrieval

Niv Gilboa

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Submitted to the Senate of the Technion - Israel Institute of Technology
Tevet 5761, Haifa, January 2001

This research was done under the supervision of Prof. Benny Chor, Prof. Shimon Even and Prof. Moni Naor in the Department of Computer Science.

It is a great pleasure to thank some of the people to whom I am indebted for their help, support and friendship during my graduate studies.

First and foremost I would like to thank my advisors Benny, Shimon and Moni. Benny, who was my sole advisor throughout most of my studies, had a profound influence on my thinking and contributed a great deal to the research I conducted. Benny was the first to arouse my interest in the field of cryptography during a course he taught in 1995. Since then I have admired his skills in teaching and research. Not the least of the many things he taught me is to always question my thought processes and to look for errors that are forever lurking beneath the surface. I thank Shimon for accepting the post of my advisor at a very advanced stage of my studies, and Moni for several enlightening conversations that had significant effect on my research.

I would also like to express my gratitude to Hugo Krawczyk who taught me a lot of cryptography during a semester in which I acted as his teaching assistant, and has given me wise counsel ever since.

A special note of thanks is due to Yuval Ishai for his friendship and for our productive professional collaboration (although it is true that usually our joint efforts were neither professional nor productive).
Yuval and Shlomit have always provided a home away from home during my Technion years, and for that they will always have a warm place in my heart.

The time I spent at the Technion would not have been the same without the past, present and honorary members of room 429. Ranging from the incurable optimist (Ronit) through the hard-working realists (Gidi and Eran) to those who were delightfully pessimistic (Yuval, Dorit, Dror, Nadav, and basically everybody else), this colorful cast of characters could not be improved upon. We shared in such escapades as multiple trips to the hospital with injured graduate students (who shall remain unnamed), meals at Muach (home of the refined palate) and numerous wagers. Most important, however, was the assurance that a witty conversation was never far away, whether it involved FreeBSD, religious toleration, personal issues or vast conspiracy theories.

Finally, I thank those who are closest to me: Efrat, who has been my spouse throughout this long period of time and has accepted my various quirks with laughter; my parents and sister, who supported me in every possible way; and my grandmother, who was always proud of her eldest grandson.

The generous financial help of the Gutwirth family is gratefully acknowledged.

Contents

Abstract
Notation
1 Introduction
  1.1 Models of privacy
    1.1.1 Information theoretic privacy
    1.1.2 Computational privacy
    1.1.3 Single server CPIR
    1.1.4 Symmetrically private information retrieval
  1.2 PIR generalizations and similar questions
    1.2.1 Block retrieval
    1.2.2 Private retrieval of information by keywords
    1.2.3 t-privacy
    1.2.4 Private information storage
  1.3 Modes of operation
    1.3.1 Random servers
    1.3.2 Commodity servers
  1.4 PIR as a primitive
  1.5 PIR techniques in other areas
    1.5.1 Oblivious polynomial evaluation
    1.5.2 Joint generation of RSA keys
  1.6 Lower bounds
  1.7 Related work
    1.7.1 Multi-party private computation
    1.7.2 Instance hiding
    1.7.3 Communication complexity problems
2 Model and Definitions
  2.1 The PIR Model
  2.2 Pseudo-random generators
  2.3 The CPIR Model
  2.4 Final Notation
  2.5 Symmetrically private information retrieval
  2.6 General multi-party computation definitions
    2.6.1 Basic definitions
    2.6.2 Composition of protocols
  2.7 Private information retrieval by keywords
  2.8 Joint generation of RSA keys
3 Computationally private information retrieval
  3.1 Correlated Pseudo Randomness
    3.1.1 Definitions
    3.1.2 Statement of Main Results
    3.1.3 Construction
    3.1.4 Proof of correlation
  3.2 Lengths of input and output
  3.3 Proof of index indistinguishability
    3.3.1 Three useful statements
    3.3.2 Notation used in the proof
    3.3.3 Polynomial indistinguishability
    3.3.4 A quantitative version
  3.4 Integrating the proofs
  3.5 PIR schemes
    3.5.1 A direct scheme
    3.5.2 A Generic Transformation
  3.6 Concluding Remarks and open problems
4 Private information retrieval by keywords
  4.1 Definitions and Notation
    4.1.1 The PIR model
    4.1.2 The CPIR model
    4.1.3 Search data structure
  4.2 Private Retrieval of Blocks
    4.2.1 SPIR($\ell$, n, k)
    4.2.2 PIR($\ell$, n, k)
  4.3 General Solutions to PERKY($\ell$, n, k)
  4.4 Specific implementations
    4.4.1 Binary Search Tree
    4.4.2 Trie
    4.4.3 Perfect Hashing
  4.5 Symmetric PERKY
  4.6 Other PERKY Topics
    4.6.1 Reducing Communication Complexity
    4.6.2 The address of w
  4.7 Open problems
5 Joint Generation of RSA Keys by Two Parties
  5.1 Preliminaries
    5.1.1 Smallest primes
    5.1.2 Useful techniques
  5.2 Overview
  5.3 Computing N
    5.3.1 Oblivious transfers
    5.3.2 Oblivious polynomial evaluation
    5.3.3 Benaloh's encryption
  5.4 Amortization and initial primality test
    5.4.1 OT and oblivious polynomial evaluation
    5.4.2 Homomorphic encryption
  5.5 Computing d
  5.6 Improvements
  5.7 Performance
Bibliography

Abstract

In the problem of private information retrieval (PIR) a user queries a database in order to retrieve one out of n data items, while hiding the identity of the retrieved item from the database administrator.
PIR can be solved trivially by having the database send all the data to the user. However, in this case the communication complexity can be prohibitively high.

In the last few years extensive research has been conducted on PIR and related problems. The research efforts have mainly focused on constructing PIR schemes that are as efficient as possible in terms of communication. Various PIR scenarios and variations on the original problem have been considered. In the first PIR works information-theoretic privacy was required. In other words, the database could learn no information about the identity of the retrieved item regardless of its computational power. In order to allow efficient schemes it was assumed that the data is replicated at several servers which could communicate with the user, but not among themselves. Subsequent works introduced other models, such as relaxing the user's privacy requirement to computational privacy, or adding the requirement of database privacy, which states that the user may learn a single data item but nothing else.

This dissertation consists of three works that are all connected to PIR and related problems. The first work introduces the concept of computationally private information retrieval (CPIR), that is, information retrieval in which privacy is maintained as long as the servers (two or more) in which the database is held are computationally bounded. The PIR schemes we construct rely on the mildest standard cryptographic assumption: the existence of one way functions, or equivalently the existence of pseudo-random generators. The quality of our construction depends on the exact assumption we make on pseudo-random generators. The standard assumption states that there exists a pseudo-random generator GEN which expands seeds by a polynomial factor to n bits.
The distribution that results from choosing a seed at random and expanding it by GEN cannot be distinguished from the uniform distribution on $\{0,1\}^n$ by any probabilistic polynomial algorithm. Given this assumption, for any constant c we construct a CPIR scheme with communication complexity $O(n^{1/c})$. The results are slightly different if we view the generator as expanding seeds of length $\kappa(n)$ to strings of length n. The expansion from $\kappa(n)$ to n may be by more than a polynomial factor. Given this assumption our scheme has communication complexity about $\kappa(n)\sqrt{2\log(n/\kappa(n))}\,2^{\sqrt{\log(n/\kappa(n))}}$. In either case our schemes have substantially lower communication complexity than the best (currently known) two server information-theoretically private scheme: $O(n^{1/3})$.

The main technical tool used in the construction of the CPIR schemes is a "correlated" pseudo random generator. On input r (a short random string) and an index $\ell$ ($1 \le \ell \le n$), the final output of this generator is a pair of n bit binary strings $G(u_\ell, 1^n)$ and $G(w_\ell, 1^n)$. These two strings are pseudo random and highly correlated: they differ at the $\ell$-th bit, and are identical elsewhere. As an intermediate step, the generator is required to produce a pair of short "succinct representations" $u_\ell, w_\ell$, so that $G(u_\ell, 1^n)$ is efficiently computable from $u_\ell$ (and $G(w_\ell, 1^n)$ from $w_\ell$, respectively). The 2n representations $\{u_\ell\}_{\ell=1}^{n}, \{w_\ell\}_{\ell=1}^{n}$ are pseudo random.

The second topic in the dissertation presents a private retrieval problem that differs from PIR in the structure of the database. In PIR the assumption is that the data items are homogeneous and that each is identified by its position in a list of items. The user retrieves the item in the i-th position while keeping i secret. However, usually a user does not know the position of a desired data item in the internal structure of the database.
Instead, the user typically holds some information about the desired data, such as a keyword that is linked to it. The user sends this keyword to the database and receives in return a list of all the items which correspond to the keyword. The PrivatE information Retrieval by KeYwords (PERKY) problem introduced in this work is a step towards modeling this more realistic scenario. We assume that the database consists of n keywords, each of length $\ell$. The user holds a single keyword w and wishes to find whether it is one of the keywords of the database, without leaking information about w.

As is also the case in PIR, different variants of PERKY can be considered. The required user privacy may be either information-theoretic or computational, database privacy may be required (or not) and the database may be held by a single server or at multiple sites that do not communicate among themselves.

In our work we show several schemes that solve PERKY by reducing it to the problem of PIR. The main idea is to have the servers that hold the database organize it in a data structure which facilitates search operations. Such structures include binary trees, hash tables, etc. The user conducts an oblivious walk on the data structure by using several executions of a PIR scheme. The properties of the resultant PERKY scheme are a function of both the data structures and the PIR schemes we employ. In one of the PERKY schemes we show, which utilizes a data structure based on perfect hashing, the communication complexity is greater than that of a PIR scheme by only a constant factor. Another PERKY scheme, which utilizes a trie data structure, allows protection of database privacy in the sense that the user only learns whether the keyword w is one of the keywords in the database, but cannot obtain any other information about the database.

The third work in the dissertation addresses a different type of problem altogether.
It presents a protocol for two parties to privately generate an RSA key in a distributed manner. At the end of the protocol the public key, which consists of a modulus N = PQ and an encryption exponent e, is known to both parties. Individually, neither party obtains information about the decryption key d or about the prime factors of N, P and Q. However, d is shared among the parties so that threshold decryption is possible. An alternative way to present this concept is that the keys are distributed in such a way that both parties can jointly sign a message with the secret key, but neither can sign alone. PIR can be useful in solutions to the joint generation of RSA keys problem. In particular, the protocol we show uses PIR techniques in order to privately compute some values, which are essential for the private computation of the decryption key.

Notation

PIR - Private Information Retrieval
CPIR - Computationally-Private Information Retrieval
SPIR - Symmetrically-Private Information Retrieval
PERKY - Private Information Retrieval by Keywords
$DB_j$ - the j-th database in a PIR scheme
U - the user in a PIR scheme
k - number of databases or players
n - length of data string in a PIR scheme
$\ell$ - length of data block in PIR scheme or of word in PERKY
x - data string in a PIR scheme
i - user's retrieval index in a PIR scheme
w - user's retrieval word in a PERKY scheme
$\kappa$ - a security parameter
[n] - the set $\{1, 2, \ldots, n\}$
$e_\ell(n)$ - the $\ell$-th unit vector of length n
$\oplus$ - exclusive-or of binary strings
$\|$ - concatenation of binary strings
GEN - a pseudo-random generator
$SREP_1, SREP_2$ - succinct representation generators
G - expansion generator
$(SREP_1, SREP_2, G)$ - a pseudo-random generator scheme
P, Q - large primes
$P_a, P_b, Q_a, Q_b$ - additive shares of P and Q respectively
N - RSA modulus, product of P and Q
e - RSA public exponent
d - RSA private exponent

Chapter 1

Introduction

The emergence of the Internet is making publicly accessible databases more common and more important than ever. Anyone, from corporate executive to private citizen, can access such a database and retrieve up-to-date information for a (hopefully moderate) fee. However, issues of security and privacy immediately arise. The database may wish to limit the user's access to certain data, while the user may wish to keep his query private. That is, the user retrieves a certain item of data without allowing the database to gain information about the identity of the retrieved item.

The task of private information retrieval (PIR) schemes is to overcome various privacy and security problems that appear in information retrieval from publicly accessible databases. The emphasis of most PIR schemes (beginning with the first PIR work [25]) is on protecting the user's privacy, although some works [38, 63] deal also with the privacy of the database.

Currently, PIR schemes are still fairly theoretical in nature. The models used are simplified and idealized versions of actual commercial databases. Formally speaking, the database is modeled as an n bit binary string x, and the user's query is a single address i, $1 \le i \le n$. The goal of the PIR scheme is to allow the user to learn the i-th bit of the data string, $x_i$, without revealing any information to the database about i. PIR research is devoted to extending this basic model by taking into account various practical considerations, and to constructing efficient protocols in various settings.

While the main motivation for PIR is practical in nature, it is also similar to several theoretical questions which arose from complexity theory or communication complexity. We discuss some of these questions in Section 1.7.

PIR is also part of an interesting trend in multi-party private computation.
The goal is to allow several players to jointly compute a function of their inputs without revealing "too much" information on these inputs. During the 1980's the main research effort in this field was to establish what functions can be privately computed and under what constraints. A more contemporary trend is to discover how efficiently specific functions can be computed. PIR research is a part of this trend, as are works in threshold cryptography [31], joint generation of El-Gamal cryptosystem keys [67], RSA keys [15, 26, 27, 35, 68, 40], or oblivious polynomial evaluation [63].

Much of the remainder of the introduction is a survey of PIR literature and its background. In section 1.7 we present several problems and directions of research which preceded PIR and are important in understanding its evolution. In sections 1.1, 1.2, 1.3, 1.4, 1.5, 1.6 we present and discuss in brief the PIR works that have been published so far.

The survey also includes an introduction to each of the technical chapters in this work. In subsection 1.1.2 we present the notion of computationally private information retrieval, as an introduction to chapter 3. In subsection 1.2.2 we introduce private information retrieval by keywords, which is fully presented in chapter 4. Finally, in subsection 1.5.2 we outline joint generation of RSA keys, fully presented in chapter 5.

1.1 Models of privacy

1.1.1 Information theoretic privacy

In the first work to present the PIR problem [25] Chor, Goldreich, Kushilevitz and Sudan considered the following model. A string $x \in \{0,1\}^n$ is replicated at k servers $DB_1, \ldots, DB_k$, which do not communicate with one another. A user, U, is interested in retrieving the i-th bit $x_i$. The user's privacy is maintained in the information-theoretic sense.
In other words, each server (individually) is not allowed to gain information on the identity of the bit being retrieved, regardless of the server's computational power.

Several schemes that solve this basic PIR problem were put forward in [25]. Each scheme is parameterized by k (the number of non-communicating servers required to implement it) and by its communication complexity. The best schemes in [25] had the following parameters:

- A 2-server scheme with communication complexity $O(n^{1/3})$.
- For any k, a k-server scheme with communication complexity $O(k^2 n^{1/k} \log k)$. A particular instance of the above is a $(\log n)/3$ server scheme with communication complexity $O(\log^2 n \log\log n)$.

In [3] Ambainis generalized the 2 server protocol of [25] into a k server protocol. The communication complexity of his k server scheme is $2^{O(k^2)} n^{1/(2k-1)}$. This is better than the previous k server scheme for any constant number k (and indeed for any k less than approximately $\log^{1/3} n$).

In [54] Ishai and Kushilevitz present a linear algebraic framework in which to view certain types of information-theoretic PIR schemes. They show that both the schemes of [25] and of [3] are specific cases of this framework. Furthermore, they construct an improved k-server scheme achieving communication complexity $O(k^3 n^{1/(2k-1)})$.

1.1.2 Computational privacy

It seems that the techniques currently used for information theoretically private information retrieval cannot yield k-server schemes whose communication complexity is asymptotically lower than $O(n^{1/(2k-1)})$. However, this obstacle led to the evolution of other interesting directions of PIR research. One idea, which constitutes a major contribution of this dissertation, is to weaken the privacy requirement to computational privacy.
A server whose computational resources are limited (say to polynomial time computations) cannot infer any information on the identity of the retrieved bit i, given reasonable cryptographic assumptions. This relaxation of the privacy requirements is very natural; any "real world" server has limited computational resources.

In [22] Chor and Gilboa present the first computationally private information retrieval (CPIR) scheme. A complete version of that work appears in chapter 3. The PIR model in this work is identical to the information-theoretic PIR model except for relaxing the privacy requirement to computational privacy. We show how to construct efficient 2 server CPIR schemes (see footnote 1) using a novel tool which we call a correlated pseudorandom generator.

Footnote 1: The scheme can be generalized to more than 2 non-communicating servers but not to a single one.

Performing cryptographic tasks in a distributed manner often requires some form of correlated randomness. That is, the parties involved in the task have access to correlated random sources. In chapter 3 (an initial version appears in [22]) we develop a tool for achieving a specific type of correlated randomness at a low cost in communication. We are interested in the following scenario. A dealer chooses two strings $a, b \in \{0,1\}^n$ uniformly at random subject to the condition that they are identical except for a single bit at a predetermined position $\ell$. The dealer communicates with two parties in such a manner that the first one learns a, the second learns b, but neither learns any information about $\ell$. It follows from [25] that the communication in this setting between the dealer (a PIR user) and each of the parties (PIR data servers) is at least $n - 1$ bits. Suppose we are ready to settle for such a result in a computational setting, in order to reduce the communication costs. We would like to develop a scheme which has the following stages: the dealer sends a short string u to the first party and a short string w to the second party, and each party efficiently expands its string, yielding two n bit strings $G_1(u, 1^n)$ and $G_2(w, 1^n)$ respectively. These expanded strings should be identical except for a single bit at the predetermined position $\ell$.

The task we just described can certainly be achieved using any pseudo random generator GEN. Let s be a seed for GEN. Then $u = s$ and $w = s \| \ell$ can be expanded to $G_1(u, 1^n) = GEN(s, 1^n)$ and $G_2(w, 1^n) = GEN(s, 1^n) \oplus e_\ell(n)$, which differ only at the $\ell$-th bit ($e_\ell(n)$ denotes the $\ell$-th unit vector of length n, and $\|$ denotes concatenation). On the other hand, it is clear that the party who gets the string $s \| \ell$ learns $\ell$. With this example in mind, our goal is to distribute two succinct representations u, w such that $G(u, 1^n) \oplus G(w, 1^n) = e_\ell(n)$ (see footnote 2) and neither u nor w yields any efficiently computable information about $\ell$.

We show how to obtain the desired correlation at a low cost, under the mildest acceptable cryptographic assumption: the existence of pseudo random generators [14, 74]. (This is equivalent to the existence of one way functions [51, 50].) Given this assumption, using any pseudo-random generator GEN we develop a "correlated" pseudo-random generator, which on input $(1^n, \ell, r)$ (where r is chosen randomly from an appropriate space) produces two pseudo-random n bit strings that differ only in the $\ell$-th bit. The generator consists of a trio of algorithms $(SREP_1, SREP_2, G)$, and its operation can be divided into two stages. In the first stage, each SREP algorithm produces a short string which we call a succinct representation. $SREP_1$ produces u and $SREP_2$ produces w. If the input r is random then the output of each SREP algorithm is pseudo-random (individually, not jointly). In the second stage the expansion algorithm, G, is applied (separately) to u and w. The result is the required pair of n bit strings that differ only in the $\ell$-th bit.
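As a rough illustration of the naive construction described above ($u = s$, $w = s \| \ell$), and not of the SREP-based generator of chapter 3, the following Python sketch expands both representations and checks that the results differ only at position $\ell$. SHAKE-256 standing in for GEN, all function names, the 0-based index, and the inner-product answers computed by the two parties are assumptions made for this example.

```python
import hashlib
import secrets

def gen(seed, n):
    """Stand-in for GEN(s, 1^n): stretch a seed to n bits (SHAKE-256 is an assumption)."""
    stream = hashlib.shake_256(seed).digest((n + 7) // 8)
    return [(stream[j // 8] >> (j % 8)) & 1 for j in range(n)]

def naive_representations(seed, ell):
    """u = s and w = s || ell. Note that w carries ell in the clear."""
    return seed, (seed, ell)

def g1(u, n):
    """G1(u, 1^n) = GEN(s, 1^n)."""
    return gen(u, n)

def g2(w, n):
    """G2(w, 1^n) = GEN(s, 1^n) XOR e_ell(n)."""
    seed, ell = w
    bits = gen(seed, n)
    bits[ell] ^= 1
    return bits

def inner_product_mod2(x, mask):
    """Hypothetical server answer: XOR of the data bits selected by the mask."""
    return sum(xi & mi for xi, mi in zip(x, mask)) % 2

n, ell = 64, 37                      # ell is a 0-based position here
seed = secrets.token_bytes(16)
u, w = naive_representations(seed, ell)
a, b = g1(u, n), g2(w, n)

# The two expanded strings XOR to the unit vector e_ell(n).
diff = [ai ^ bi for ai, bi in zip(a, b)]

# Two parties holding the same data x answer on a and b; the XOR of the answers is x[ell].
x = [secrets.randbelow(2) for _ in range(n)]
recovered = inner_product_mod2(x, a) ^ inner_product_mod2(x, b)
```

The recovery step relies on the identity $\langle x, a \rangle \oplus \langle x, b \rangle = \langle x, a \oplus b \rangle = x_\ell$. The value `w[1]` shows why this naive version is unacceptable: the second party reads $\ell$ directly from its representation.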
With such a generator, the dealer can distribute a, b among the two parties by sending the first party the output of $SREP_1$ and the other the output of $SREP_2$. The parties can themselves apply G and expand the strings they receive.

We measure the quality of the correlated pseudo-random generator by the length of the succinct representations u and w. The shorter they are, the better. The motivation for using this specific measure is that in the application we quoted above, the communication from the dealer to each of the two parties is exactly one such succinct representation.

In our construction, the length of the succinct representations depends on the exact assumption we make on pseudo-random generators. The standard assumption states that there exists a pseudo-random generator GEN which expands seeds by a polynomial factor from $\kappa(n) = n^{1/d}$ to n bits. The distribution that results from choosing a seed at random and expanding it by GEN cannot be distinguished from the uniform distribution on $\{0,1\}^n$ by any probabilistic polynomial algorithm. Given this assumption, for any constant c we construct a correlated pseudo-random generator with succinct representations of length $O(n^{1/c})$. The results are slightly different if we view the pseudo-random generator as expanding yet shorter seeds of length $\kappa(n)$ to strings of length n, where $\kappa(n)$ may be sub-polynomial in n. Given this assumption we present a correlated pseudo-random generator with succinct representations of length $\kappa(n)\sqrt{2\log(n/\kappa(n))}\,2^{\sqrt{\log(n/\kappa(n))}}$.

Footnote 2: We require that the same algorithm be used to expand both strings.

In the second part of Chapter 3 we apply the correlated pseudo-random generators to the problem of computationally private information retrieval (CPIR). We show two slightly different CPIR schemes. The first is a direct application of the correlated pseudo-random generator and is presented in subsection 3.5.1.
The second transforms any one round, two database computationally private scheme to a one round, two database computationally private scheme with (typically) lower communication complexity. This second scheme is constructed with techniques similar to those employed in the correlated pseudo-random generator and is presented in subsection 3.5.2. Both schemes use the same intractability assumption: the existence of pseudo random generators. Given such a generator GEN that expands $\kappa(n)$ bits to n bits, the communication complexity of the first is about $2\kappa(n)\sqrt{\log(n/\kappa(n))}\,2^{\sqrt{\log(n/\kappa(n))}}$. Somewhat better results can be achieved by the second scheme through repeated recursive executions of the transformation it provides (from one PIR scheme to another). Both schemes employ only one round of communication, do not require any coding in storing the database contents, and are memoryless: neither users nor databases have to remember any of the communication's history.

1.1.3 Single server CPIR

Single server PIR schemes maintaining information-theoretic privacy have communication complexity at least n, as shown in [25]. However, that lower bound does not hold for computationally private schemes. The aim of many CPIR works following [22] was to break the single server barrier and to construct efficient (i.e., sub-linear communication complexity) CPIR schemes in which the participants are a user and a single server. The price that has to be paid for single server schemes is stronger cryptographic assumptions. Currently, all the known single server CPIR schemes rely on specific number-theoretic cryptographic assumptions.

The first single server scheme was suggested by Kushilevitz and Ostrovsky in [58]. They used the well known quadratic residuosity assumption [49] in order to construct a scheme with communication complexity roughly $\kappa(n)\,2^{\sqrt{2\log n \log \kappa(n)}}$, where $\kappa(n)$ is the security parameter.
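To make the role of quadratic residuosity more concrete, here is a toy Python sketch of a basic, non-recursive square-root variant in the spirit of [58], as it is commonly presented. It is an illustration under stated assumptions rather than the protocol as specified in [58]. The database is arranged as a bit matrix, the user sends one number per column (a non-residue with Jacobi symbol +1 for the wanted column, quadratic residues elsewhere), and the server returns one number per row. The tiny primes, the matrix layout and all function names are hypothetical, and the parameters are far too small to be secure.

```python
import math
import random

def is_residue_mod_prime(a, p):
    """Euler's criterion: True when a is a nonzero square modulo the odd prime p."""
    return pow(a % p, (p - 1) // 2, p) == 1

def random_unit(N, rng):
    """Random element of Z_N^* by rejection sampling."""
    while True:
        r = rng.randrange(2, N)
        if math.gcd(r, N) == 1:
            return r

def random_residue(N, rng):
    """A random quadratic residue modulo N."""
    r = random_unit(N, rng)
    return (r * r) % N

def random_pseudo_square(p, q, rng):
    """A unit that is a non-residue modulo both p and q (Jacobi symbol +1 modulo p*q)."""
    while True:
        y = random_unit(p * q, rng)
        if not is_residue_mod_prime(y, p) and not is_residue_mod_prime(y, q):
            return y

def user_query(p, q, num_cols, col, rng):
    """One number per column; only the wanted column receives a non-residue."""
    N = p * q
    return [random_pseudo_square(p, q, rng) if j == col else random_residue(N, rng)
            for j in range(num_cols)]

def server_answer(matrix, query, N):
    """Per row: multiply q_j where the bit is 1 and q_j squared where it is 0."""
    answers = []
    for row in matrix:
        z = 1
        for bit, qj in zip(row, query):
            z = (z * (qj if bit else qj * qj)) % N
        answers.append(z)
    return answers

def user_decode(answer, p, q):
    """The row answer is a residue modulo N exactly when the wanted bit is 0."""
    residue = is_residue_mod_prime(answer, p) and is_residue_mod_prime(answer, q)
    return 0 if residue else 1

rng = random.Random(2001)            # fixed seed for reproducibility only, not secure
p, q = 1019, 1031                    # toy primes known only to the user
matrix = [[1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 0, 0],
          [0, 1, 0, 1]]
row, col = 2, 1
query = user_query(p, q, 4, col, rng)
answers = server_answer(matrix, query, p * q)
bit = user_decode(answers[row], p, q)
```

In this sketch both directions of communication consist of about $\sqrt{n}$ numbers of $\kappa(n)$ bits each. The server sees only elements with Jacobi symbol +1, so locating the wanted column would require distinguishing residues from non-residues without the factorization of N.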
In [72] Stern uses homomorphic encryption systems such as the Benaloh system [12], or the Naccache-Stern system [62] (which are natural extensions of the Goldwasser-Micali cryptosystem [49]) to obtain an improved version of the Kushilevitz-Ostrovsky protocol. His scheme has communication complexity approximately $\kappa(n)\,2^{\sqrt{\log n}}$.

Both the Kushilevitz-Ostrovsky and the Stern protocols depend on certain homomorphic properties of the cryptographic primitives they use. In [60] Mann shows how to obtain similar schemes from a wide class of trap-door one-way functions that exhibit certain homomorphic properties. His scheme can be based, for instance, on the decisional Diffie-Hellman assumption [64] or on lattice problems (see [2, 45, 46]). The Mann scheme has the same communication complexity as the Kushilevitz-Ostrovsky scheme (apart from the fact that the security parameter $\kappa(n)$ is not necessarily the same for different cryptographic assumptions).

In [17] Cachin, Micali and Stadler use yet another assumption and present a different single server CPIR scheme. They put forward the assumption that given a large product of two primes N = PQ whose factorization is unknown, it is infeasible to determine for some prime p > 2 whether $p \mid \varphi(N)$, where $\varphi$ is Euler's totient function (see footnote 3). This scheme differs from other PIR schemes in that its communication complexity depends only on the security parameter $\kappa(n)$ and not directly on n. Indeed the communication complexity of the protocol, $O(\kappa(n)^4)$, is polynomial in the security parameter.

1.1.4 Symmetrically private information retrieval

Typically, in general multi-party private computations the privacy of all players must be maintained. That is, no information about a player's input should be leaked by the protocol, beyond what follows as a direct consequence of the output. Furthermore, a great deal of research has been devoted to the case of malicious players.
Those are players that attempt to learn information about other participants' input by diverging from the protocol in some way.

The symmetrically private information retrieval (SPIR) problem differs from PIR in that it adds the requirement of the database's privacy. The user is allowed to obtain a single bit of the database, $x_i$, but no other information about x. The motivation for this extra restriction is quite natural. A commercial database that sells data items for a certain rate would certainly prefer to use a retrieval protocol which allows the user to obtain only the data he paid for and nothing else.

The SPIR problem is in fact very similar to two problems presented during the 1980s. The first is 1 out of n oblivious transfer, which is a natural generalization of 1 out of 2 oblivious transfer [34]. The second problem is all or nothing disclosure of secrets (ANDOS), presented in [16]. In both of these problems there are two players: a server holding n data items $x_1, \ldots, x_n$ and a user holding an index i, $1 \le i \le n$ (the titles of the players are different in oblivious transfer and ANDOS works). The goal is for the user to retrieve $x_i$ without learning any information about other items and without letting the server know which item was obtained. The focus of oblivious transfer and ANDOS works is to show that these tasks are at all possible, and to find the mildest cryptographic assumptions which can be used to devise a protocol that solves the problem. The emphasis of SPIR is on efficiency rather than minimal assumptions. Furthermore, SPIR research is also interested in the PIR scenario of the data being replicated at several non-communicating sites, which is not the case for the other problems.

Footnote 3: $\varphi(n) = |\{m : 1 \le m < n,\ \gcd(m, n) = 1\}|$.

In [38] Gertner, Ishai, Kushilevitz and Malkin introduced the notion of SPIR. Their main results were in the multi-server, information-theoretic privacy setting.
In this model they show that PIR schemes can be used to construct SPIR schemes, as long as the servers have a source of shared randomness (without which the privacy of the data cannot be protected in an information-theoretic sense). In the case of an honest user the following results are attained:

- A transformation of any k-server information-theoretic PIR scheme into a (k + 1)-server information-theoretic SPIR scheme with the same communication complexity (up to a multiplicative constant).
- A k-server information-theoretic SPIR scheme, which for any constant k has communication complexity $O(n^{1/(2k-1)})$. Using the same techniques in conjunction with the PIR schemes of [54] yields, for any $k \ge 2$, SPIR schemes with communication complexity $O(k^3 n^{1/(2k-1)})$.
- An O(log n)-server information-theoretic SPIR scheme which has communication complexity $O(\log^2 n \log\log n)$.
- A 2-server computational SPIR scheme which has communication complexity $\kappa(n)\,2^{\sqrt{2\log n/\kappa(n)}}$, where $\kappa(n)$ is the security parameter, and the cryptographic assumption is that one-way functions exist.

The work also dealt with the case of a dishonest user. All the above mentioned schemes can be adapted to this case at the cost of a multiplicative O(log n) factor in the communication complexity. A disadvantage of [38] is that it deals with single server schemes by using general (and inefficient) zero-knowledge protocols.

A new approach was offered in [63] by Naor and Pinkas. They showed how to transform any PIR scheme into a SPIR scheme at the cost of log n invocations of 1 out of 2 oblivious transfer (and some extra computation by the server). Their idea is very useful in putting together single-server schemes, and in efficiently handling dishonest users. However, it does have the drawback that even if the original PIR scheme is information-theoretically private, the corresponding SPIR scheme is computationally private, and relies on the security of oblivious transfers.
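The role of shared randomness in the honest-user setting can be illustrated on the simplest linear-communication 2-server scheme, in which each server answers with a single bit and the user XORs the two answers. This is a toy sketch under stated assumptions and not one of the constructions of [38]: the function names are hypothetical, and the shared random bit is simply generated in one place and handed to both servers. Without the mask, each answer reveals the parity of a random subset of the data; with it, each answer on its own is a uniformly random bit and only their XOR carries information.

```python
import secrets

def xor_bits(x, subset):
    """Parity of the database bits whose positions lie in `subset`."""
    acc = 0
    for j in subset:
        acc ^= x[j]
    return acc

def user_queries(n, i):
    """Two subsets of positions that differ exactly in position i."""
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}  # symmetric difference flips membership of i
    return s1, s2

def masked_answer(x, subset, shared_bit):
    """Server side: the usual parity answer, hidden under the shared random bit."""
    return xor_bits(x, subset) ^ shared_bit

def spir_retrieve(x, i):
    """Honest user retrieving x[i] from two servers that share one random bit."""
    shared_bit = secrets.randbits(1)  # assumption: known to both servers, not to the user
    s1, s2 = user_queries(len(x), i)
    a1 = masked_answer(x, s1, shared_bit)
    a2 = masked_answer(x, s2, shared_bit)
    return a1 ^ a2  # the mask cancels, leaving exactly x[i]
```

The sketch does nothing against a dishonest user, who could send two subsets that differ in more than one position.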
An interesting and different approach was used by Stern [72] to construct a single server SPIR scheme. Unlike the schemes of [38, 63], the Stern idea is not a general reduction of SPIR to PIR but a specific SPIR protocol. It uses the same cryptographic assumptions as the PIR protocol introduced in the same work, and achieves the same communication complexity (up to a constant), $\kappa(n)\,2^{\sqrt{\log n}}$.

1.2 PIR generalizations and similar questions

PIR in its original form is a very "clean" and restricted problem. Any attempt to use PIR schemes in "real world" applications is bound to raise a host of questions that remain unanswered in the basic model. In this section we outline several problems that arise along these lines. They either generalize the PIR problem, or are tangential to it and were inspired by PIR research. They all address problems that PIR solutions leave unsolved.

1.2.1 Block retrieval

The problem of privately retrieving blocks of information was introduced in the first PIR work [25]. In this model k servers hold n blocks of $\ell$ bits each, which comprise the data, and the user wishes to retrieve the i-th block while keeping i private. This problem is a strict generalization of PIR, since PIR can be regarded as the private block retrieval problem with $\ell = 1$. Since a block of $\ell$ bits can clearly be retrieved via $\ell$ invocations of a regular PIR protocol, the goal is to achieve more efficient solutions than that.

The technique of balancing the communication between the servers and user was presented in [25] and used to great advantage in block retrieval. The idea is to transform a PIR scheme P in which the user's communication complexity is $\alpha$ and a server's communication complexity is $\beta$ into a PIR scheme P' in which the user sends $\alpha'$ bits and each server sends $\beta'$ bits. Applying this method, Chor et al. showed that if the number of non-communicating servers is a constant k, any block of size $\ell \ge n^{1/(k-1)}$ can be retrieved with communication complexity $O(\ell)$.
In the paper the bound was explicitly stated only for k = 2, but the generalization is immediate using two theorems that appear in that work.

A different balancing technique was used in [24] to prove another claim for the multi-server information-theoretically private model. It was shown that for any block size $\ell$ and for any constant number of servers $k \ge 2$, it is possible to privately retrieve the i-th block with communication complexity $O(n^{1/(2k-1)} \ell^{k/(2k-1)})$.

Using other PIR schemes as building blocks for private block retrieval schemes is certainly possible (although nothing further has been published to date). Especially appealing are the information-theoretically private schemes of [54], the computationally private schemes of [22] and the single-server CPIR schemes of [72].

1.2.2 Private retrieval of information by keywords

One of the ways the PIR model diverges from actual databases is in the type of queries it permits. In PIR the user is supposed to know enough about the internal structure of the database in order to pinpoint the location of the desired data. Thus, the user applies a query of the type: "what is the data item at the i-th position?". A more realistic scenario is that the user has some keyword linked to the required data. The user queries the database about the keyword and receives in return a list of data items that correspond to the keyword.

The PrivatE information Retrieval by KeYwords (PERKY) problem attempts to model this more realistic setting. In PERKY the data servers hold a list of n keywords $s_1, \ldots, s_n$, while the user holds a single keyword w. The user wishes to discover whether $w \in \{s_1, \ldots, s_n\}$ or not, without leaking any information about w. Solving this problem allows private retrieval of any data associated with w (such as its address in the database). The PERKY problem was introduced in [23] but was not published until [24].
Independently it appeared in [58] and was presented in brief as an extension of the first single-server scheme.

In [24] PERKY was more fully explored. An extended version of this work appears in chapter 4. We describe a simple, modular way to privately access data by keywords. Our scheme combines PIR solutions together with data structures that support search operations, in order to retrieve information privately in the keyword model. We present a general transformation from PIR schemes to retrieval by keywords schemes, using a large class of data structures. We also describe specific instantiations of this transformation.

The main idea in our constructions is the following: the servers insert all the keywords they hold, $s_1, \ldots, s_n$, into a data structure which supports search operations on strings. The user holds a specific keyword w, and conducts an "oblivious walk" on the data structure until either the word w is found, or the user is assured of the fact that w is not one of $s_1, \ldots, s_n$. A typical search in the data structure involves a sequence of operations, where each operation consists of fetching the contents of a word from memory, performing a "local" computation, which depends on the keyword and the fetched contents, and either determining a new address based on the computation or terminating the search (successfully or unsuccessfully). This sequence of operations can be viewed as a walk on the data structure. By employing repeated invocations of a PIR scheme (or more precisely a private block retrieval scheme), we transform this walk into an oblivious walk on the data structure, where each server gets no information on the walk, except its length. This implies, in particular, that nothing is revealed about the desired keyword itself beyond its length.
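The oblivious walk described above can be sketched for one simple data structure, a sorted array searched by binary search. This is a minimal illustration under stated assumptions rather than a construction from chapter 4: `pir_fetch` is a hypothetical stand-in for a private block retrieval call (here it reads the array directly), and padding the walk with dummy fetches to a fixed number of steps is a design choice of this sketch, made so that the walk length depends only on n.

```python
def pir_fetch(blocks, index):
    """Hypothetical stand-in for private block retrieval.

    A real scheme would return blocks[index] without revealing `index`
    to any single server; this stub only marks where that call goes.
    """
    return blocks[index]

def oblivious_search(sorted_keywords, w):
    """Binary search for w using a fixed number of private fetches.

    Returns (found, number_of_fetches). Comparisons are the user's local
    computation; the servers only ever see the (hidden) fetch requests.
    """
    n = len(sorted_keywords)
    steps = n.bit_length()  # worst-case number of binary search probes
    lo, hi = 0, n - 1
    found = False
    fetches = 0
    for _ in range(steps):
        active = lo <= hi
        # Once the real search has ended, keep issuing dummy fetches.
        mid = (lo + hi) // 2 if active else 0
        s = pir_fetch(sorted_keywords, mid)
        fetches += 1
        if not active:
            continue
        if s == w:
            found = True
            lo, hi = 1, 0  # empty range: the real walk is over
        elif s < w:
            lo = mid + 1
        else:
            hi = mid - 1
    return found, fetches
```

For example, `oblivious_search(["alice", "bob", "carol"], "bob")` reports a hit after two fetches, the same number it would use for a keyword that is absent. In a real instantiation the keywords would also be padded to a common block length so that the fetched blocks all have the same size.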
Apart from the reductions of PERKY to PIR (which form the heart of chapter 4) we also show the inverse reduction from PIR to PERKY, and present the notion of symmetrically private information retrieval by keywords, which is analogous to SPIR. In this problem the user obtains a single bit of information (whether $w \in \{s_1, \ldots, s_n\}$) and nothing else. Among the results we show are the following:

- Given a k-server $\ell$-bit block retrieval scheme with communication complexity C, there exists a k-server PERKY scheme so that for keywords of length $\ell$ the communication complexity is O(C).
- Given a k-server SPIR scheme with communication complexity C, there exists a k-server symmetric PERKY scheme so that for keywords of length $\ell$ the communication complexity is no more than $2\ell \log n \cdot C$.
- Given a k-server PERKY scheme in which the keyword length is log n, there exists a PIR scheme with the same communication complexity.

1.2.3 t-privacy

One of the most important parameters in a multi-party private computation is the adversary structure. One of the most prevalent adversary structures is the t-threshold structure, which contains all the subsets with at most t parties, for a given parameter t. A multi-party protocol which ensures that no subset of at most t players receives any "illegitimate" information is called t-private.

The PIR problem requires 1-privacy among the data servers. Each server by itself cannot infer any information (in the information-theoretic or the computational sense) about i, the identity of the retrieved item. However, all the previously mentioned multi-server PIR schemes suffer from the drawback that if several servers cooperate then the privacy of the user is seriously compromised. This problem does not arise, of course, in single-server schemes.

In the t-private information retrieval problem the data string x is replicated at k servers, and the user retrieves a bit $x_i$ while keeping the identity of i hidden from any coalition of at most t servers.
The practical motivation for the problem is that the requirement in the usual PIR setting that no two servers communicate is artificial and in many cases impossible to implement. However, it might be more likely that only a small number of servers cooperate in the "illegal" action of trying to discover what i is. Therefore, a scheme that withstands an attack on the privacy of the user by a small coalition of servers is very desirable.

In order to present the technical results we adopt the format of [54]. Instead of regarding the communication complexity as a function of k, the number of servers, and t, the privacy threshold, we regard k as a function of t and d, where the communication complexity is $O(n^{1/d})$.

The t-privacy problem was first considered in the information-theoretic model in [25]. The authors showed that for any two natural numbers t and d there exists a t-private $(dt - t + 1)$-server information-theoretic PIR scheme with communication complexity $O(n^{1/d})$. The polynomial interpolation technique used in [25] is an adaptation of locally random reductions and instance hiding techniques [1, 7, 9]. Those methods are themselves a particular application of the Ben-Or, Goldwasser and Wigderson multi-party private computations protocol [11], which in turn uses Shamir's polynomial interpolation secret sharing scheme [71].

Improved results were obtained by Ishai and Kushilevitz in [54], who used as the basis of their work the replication based secret sharing scheme presented by Ito, Saito and Nishizeki in [55]. Ishai and Kushilevitz show t-private k-server information-theoretic PIR schemes with communication complexity $O(n^{1/d})$ for the following values of k:

- For any natural number t and for any odd integer $d \ge 3$,
  $k = \min\left(\left\lfloor dt - \frac{d+t-3}{2} \right\rfloor,\ dt - t\right)$.
- For any natural numbers t, d,
  $k = \min\left(\left\lfloor dt - \frac{d+t-3}{2} \right\rfloor,\ dt - t + 1\right)$.

A final observation about t-private schemes was offered by Chor et al. in [25]. They pointed out that any t-private k-server PIR scheme can be transformed into a $\lfloor k/t \rfloor$-server PIR scheme (i.e., with 1-privacy). Hence, t-private schemes cannot become too efficient without affecting regular PIR schemes (although there is some room for improvement in t-private schemes, as can be deduced by comparing the 1-private and t-private schemes of [54]).

1.2.4 Private information storage

Private information storage is an interesting extension to the PIR model which was proposed by Ostrovsky and Shoup in [66]. In this problem the user can both privately retrieve a bit $x_i$ from a database and privately write a bit onto the database. The privacy of the storage operation pertains both to the position of the stored item and to its value. In particular, a server holding the database has no information as to the actual data (otherwise, a storage operation could not be private).

Initially, in [66], the motivation for private storage was very similar to that of PIR. However, it seems very unlikely that public databases would allow each and every one of the multitude of users who access them to change their contents. That is especially true considering the fact that a database administrator cannot monitor the database in this scenario, since its contents are hidden in some manner. Another point against the publicly accessible database model is that the reliability of the database could be jeopardized by a malicious user bent on storing false information. A final point to consider in this context is that if a database administrator is truly curious about a stored value, he can pose as a user, query the data and retrieve the desired item.

Despite all of the above, private storage and retrieval may still be very useful but in another niche altogether.
Consider a setting in which a single user (or a group of users that have some joint interests) wishes to store data at a remote site and retrieve it or change it at his leisure. Such a scenario may ensue due to the user's wish to save local storage space or to have a safe back-up for his data (certain regrettable events involving our system administration brought forth this train of thought). The user would clearly wish to retrieve and store the data privately, and hence would like to use a private retrieval and storage scheme. The problems of the storage model discussed previously are an integral part of a multi-user storage scheme, but are irrelevant here as there is only one unique user who can access the data (some type of access control mechanism, such as passwords, has to be added here).

The private storage schemes proposed in [66] are an adaptation of oblivious RAM techniques [42, 65, 48]. In the oblivious RAM problem there are two parties: a CPU and a main memory. The CPU and memory interact in order to execute a computer program. The CPU has a limited amount of local memory in registers, but most of the data used during the program's execution is stored at the main memory. There is a privacy requirement which states that the main memory is not allowed to have any information on the access pattern of the CPU (the memory addresses which the CPU accesses).

There are two major differences between the oblivious RAM and private storage and retrieval problems. The first is that the user in private storage does not have any local memory, unlike the CPU in oblivious RAM. The second is that private storage allows distribution of the data servers, unlike the main memory in oblivious RAM.

The main result in [66] is a reduction of the private storage and retrieval problem to PIR. Given a k-server PIR scheme with communication complexity C, they construct a (k + 1)-server private storage and retrieval scheme with communication complexity $C \log^3 n$.
The reduction ensures information-theoretic privacy. That is, if the PIR scheme is information-theoretically private, then so is the private storage and retrieval scheme.

Ostrovsky and Shoup also include results in the computational privacy model. Private storage and retrieval schemes are constructed for a 4-server setting and for a 2-server setting. In the first case assuming the existence of one-way functions suffices. In the second one has to make the stronger assumption that one-way trapdoor permutations exist. In both cases the asymptotic communication complexity is very low: polynomial in the security parameter $\kappa(n)$ and log n. However, both schemes require intensive use of general multi-party computation protocols [75, 47] and seem inefficient for reasonable choices of the parameters n and $\kappa(n)$.

A disadvantage of the storage solutions presented in [66] is that they call for O(log n) rounds of communication for each storage or retrieval operation. Different storage protocols with a constant number of communication rounds are proposed in [41]. The main thrust of the work is in the computational privacy model via generalizations of techniques used in [22]. It also deals with t-private storage and retrieval schemes.

1.3 Modes of operation

One way of tackling inherent problems of the PIR model is to change it. The two works we outline in this section add new entities to the PIR setting as a means of bypassing certain obstacles.

1.3.1 Random servers

In [37] Gertner, Goldwasser and Malkin discuss the data replication problem. The conventional view of multi-server PIR schemes is that the data servers are the legal owners of the data, or at least do not abuse the trust of the database that provided them with the data (e.g., by selling it for private profit).
In [37] there is an explicit separation between the legal owner of the data, the database, and the k servers whose only function is to assist in PIR schemes. An extra privacy requirement is added: no server individually can gain any information about the data string x. This last requirement can be strengthened to protecting the privacy of the data against any coalition of up to t servers (where $1 \le t \le k$). Another problem addressed by this work is that of the heavy on-line workload of the database. That is solved by shifting almost all the load to the servers, while the database's on-line computation and communication remain minimal.

The approach of [37] is to have the database use a secret sharing scheme to distribute the string x among the servers in such a way that any coalition of servers in the adversary structure receives no information about x. The work provides solutions for t-privacy of the data (in which case a k-server scheme is transformed into a k(t + 1)-server scheme of the new type), and for full privacy. In this second case the data is kept private from all the k servers together. However, the database and servers have to engage in a periodic and expensive re-initialization phase.

The computation and communication of the auxiliary servers in all the schemes of this work is comparable to that of the data servers in the underlying PIR schemes. The computation and communication of the database is minimal (O(1), or in some cases no work at all).

1.3.2 Commodity servers

In [32] Di-Crescenzo, Ishai and Ostrovsky introduce a model which also includes a user, data servers and auxiliary servers. Their model, goals and results greatly differ from those of [37]. They follow in the footsteps of Beaver [6] and introduce "commodity based PIR". In this setting the data servers have the same function as in regular PIR.
The auxiliary servers (called commodity servers) are active only at an off-line stage, and are used to reduce the total communication complexity at the on-line stage. (Compare this goal with [37], in which the auxiliary servers are used to perform the tasks of the data servers in regular PIR.) The commodity servers can remain unaware of all the parameters of the PIR scheme. They do not know the database x or the user's query i, and indeed do not know if there are other commodity servers in the system and what their number is. The only function of the servers is to send a single message (the commodity) per query to each data server and to the user.

The basic result of [32] is that given a single commodity server, any PIR scheme P which is executed in one round of communication can be transformed into a commodity PIR scheme that has the following properties:

- The number of data servers remains the same as in P.
- The off-line communication complexity is (roughly) identical to the number of bits sent by the user in P.
- The on-line communication is the number of bits sent by a data server in P plus log n bits.

It follows that the PIR schemes most suited for commodity protocols are those in which the data servers' communication complexity is very low (such schemes appear in [25, 54] and in chapter 3).

A lot of work extending this basic result is to be found in [32]. The issue that most troubled the authors is a collusion between the commodity server and some of the databases. In the basic scheme, if the commodity server colludes with even one of the databases the privacy of the user is completely jeopardized. More advanced schemes proposed in [32] ensure that the privacy of the user is maintained as long as at most m − 1 out of m commodity servers collude with a database. Following is a short list of some of the schemes presented in the work:

- A single database m-commodity server scheme, based on [58].
- A $2^m$ database m-commodity server scheme based on [22] that can withstand a collusion of m − 1 commodity servers with one of the databases (but the number of databases has become very large).
- An mtd + 1 database, m-commodity server scheme that offers information-theoretic privacy, and is based on schemes of [25]. It can withstand a collusion of up to t databases and m − 1 commodity servers.

In all of these schemes the communication of each party remains much the same as it was in the basic scheme (except for an additional poly($\kappa(n)$) bits sent by the user in the single database scheme). Hence, the user's communication complexity remains low: logarithmic in n and at most polynomial in the security parameter $\kappa(n)$.

1.4 PIR as a primitive

An interesting and recent development has been to investigate the "power" of PIR in comparison with other cryptographic primitives. The information-theoretic PIR works [25, 3, 54] showed that if the number of non-communicating servers is at least two, PIR schemes with sub-linear communication are possible, without any cryptographic assumptions. Furthermore, [22] showed that assuming the existence of one-way functions is sufficient to ensure very efficient 2-server schemes (in terms of communication). However, single server schemes [58, 60, 72, 17] all required specific trap-door one-way functions.

This gap between multi-server and single-server schemes turns out to be no coincidence. In [10] Beimel et al. show that a single server PIR scheme with communication complexity at most n − 1 can be used to construct a one-way function. This can be achieved either directly, or by constructing other cryptographic primitives, such as commitment schemes, which are known to be equivalent to one-way functions [51].
A second work by Di-Crescenzo, Malkin and Ostrovsky [33] developed this direction further and showed that single-server PIR with sub-linear communication complexity implies oblivious transfer, which in turn implies the existence of trap-door one-way functions. There is also a simple construction of secret key exchange based on a single-server sub-linear communication PIR scheme [53].

Therefore, while no cryptographic assumptions at all are needed for multi-server PIR schemes, and one-way functions suffice for very efficient multi-server schemes, the existence of trap-door functions is essential for single-server PIR schemes. Furthermore, results such as [52] hint that finding a single-server CPIR scheme based on weaker assumptions, such as one-way functions, may prove to be a major breakthrough in theoretical computer science.

1.5 PIR techniques in other areas

Research in PIR has proved beneficial to other private computation problems. This is especially true of SPIR works, since SPIR considers the more usual model of protecting the privacy of all parties in the protocol. We provide two examples of problems seemingly unrelated to PIR, which use SPIR techniques as part of their solution.

1.5.1 Oblivious polynomial evaluation

The oblivious polynomial evaluation problem (OPE) was first introduced and solved by Naor and Pinkas in [63] (in the same work as their novel SPIR technique). The OPE model is as follows. Alice holds a field element $\alpha \in F$ and Bob holds a polynomial B(x) over F. At the end of the protocol Alice obtains only $B(\alpha)$ while Bob learns nothing at all. The intractability assumption used in [63] is new and is presented briefly in chapter 5.

In broad terms the idea of the protocol is that Alice sends d lists of m field elements each to Bob. Bob computes a value for each element and now holds d lists of m values.
Alice and Bob engage in d invocations of a single-server SPIR scheme (where Alice is the user and Bob is the server) through which Alice learns one value of each of the d lists. As proven in [63], the d values suffice to gain $B(\alpha)$ and nothing else.

1.5.2 Joint generation of RSA keys

Another problem in which SPIR techniques prove useful is that of joint generation of RSA keys [15, 26, 27, 35, 68, 40]. In chapter 5 (which was first published as [40]) we show how two parties can jointly generate RSA public and private keys. Following the execution of our protocol each party learns the public key, N = PQ and e, but does not know the factorization of N or the decryption exponent d. The exponent d is shared among the two players in such a way that joint decryption of ciphertexts is possible.

Generation of RSA keys in a private, distributed manner figures prominently in several cryptographic protocols. An example is threshold cryptography; see [30] for a survey. In a threshold RSA signature scheme there are k parties who share the RSA keys in such a way that any t of them can sign a message, but no subset of at most t − 1 players can generate a signature. A solution to this problem is presented in [29]. An important requirement in that work is that both the public modulus N and the private key are generated by a dealer and subsequently distributed to the parties. The weakness of this model is that there is a single point of failure: the dealer himself. Any adversary who compromises the dealer can learn all the necessary information and in particular forge signatures. Boneh and Franklin show in [15] how to generate the keys without a dealer's help. Therefore, an adversary has to subvert a large enough coalition of the participants in order to forge signatures.
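The statement that a shared exponent d still permits joint decryption can be illustrated with a small sketch. It assumes an additive sharing of d over the integers, which is one natural way to share an exponent and is not claimed here to be the exact sharing used in chapter 5. The tiny textbook primes and the function names are hypothetical, and the sketch lets a single party compute d before splitting it, which is precisely the dealer role that the protocols discussed here avoid. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

# Toy parameters for illustration only; real moduli are far larger.
P, Q = 61, 53
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 17
D = pow(E, -1, PHI)  # decryption exponent, known here only for the demo

def share_exponent(d):
    """Split d into two integer shares with d1 + d2 == d."""
    d1 = secrets.randbelow(d + 1)
    return d1, d - d1

def partial_decrypt(c, share, n):
    """Each party raises the ciphertext to its own share of d."""
    return pow(c, share, n)

def joint_decrypt(c, d1, d2, n):
    """Multiplying the partial results gives c^(d1 + d2) = c^d mod n."""
    return partial_decrypt(c, d1, n) * partial_decrypt(c, d2, n) % n
```

Neither partial result alone is the plaintext; only their product modulo N is.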
Several specific phases of the Boneh-Franklin protocol utilize reduced and optimized versions of information-theoretically private multi-party computations [11]. Those phases require at least three participants: Alice and Bob, who share the secret key, and Henry, a helper party, who knows at the end of the protocol only the public RSA modulus N.

Subsequent works [26, 27, 68] and [35] consider other variants of the problem of jointly generating RSA keys. In [26] Cocks proposes a method for two parties to jointly generate a key. He extends his technique to an arbitrary number of parties k in [27]. The proposed protocol suffers from several drawbacks. The first is that the security is unproven and, as Coppersmith pointed out (see [27]), the privacy of the players may be compromised in certain situations. The second is that the protocol is far less efficient than the Boneh-Franklin protocol.

In [68] Poupard and Stern show a different technique for two parties to jointly generate a key. Their method has proven security given standard cryptographic assumptions. Some of the techniques employed in the current work are similar to the ideas of [68] but the emphasis is different. Poupard and Stern focus on maintaining robustness of the protocol, while we emphasize efficiency.

In [35] Frankel, Mackenzie and Yung investigate a model of malicious adversaries as opposed to the passive adversaries considered in [15, 26, 27] and in our work. They show how to jointly generate the keys in the presence of any minority of misbehaving parties.

The current work focuses on joint generation of RSA keys by just two parties. We use the Boneh-Franklin protocol and replace each three party sub-protocol with a two party sub-protocol. We construct three protocols. The first is based on 1 out of 2 oblivious transfer of strings. Thus, its security guarantee is similar to that of general circuit evaluation techniques [75, 47].
The protocol is more efficient than the general techniques and is approximately on par with Cocks' method and slightly faster than the Poupard-Stern method. The second utilizes a new intractability assumption akin to noisy polynomial reconstruction that was proposed in [63]. The third protocol is based on a certain type of homomorphic encryption function (a concrete example is given by Benaloh in [12, 13]). This protocol is significantly more efficient than the others both in computation and communication. Its running time is (by a rough estimate) about 10 times the running time of the Boneh-Franklin protocol.

There are several reasons for using 3 different protocols and assumptions. The first assumption is the mildest one may hope to use. The second protocol has the appealing property that unlike the other two it is not affected by the size of the modulus. In other words, the larger the RSA modulus being used, the more efficient this protocol becomes in comparison with the others. Another interesting property of the first two protocols is that a good solution to an open problem we state at the end of the paper may make them more efficient in terms of computation than the homomorphic encryption protocol.

We assume that an adversary is passive and static. In other words, the two parties follow the protocol to the letter. An adversary who compromises a party may only try to learn extra information about the other party through its view of the communication. Furthermore, an adversary who takes over Alice at some point in the execution of the protocol cannot switch over to Bob later on, and vice versa.

1.6 Lower bounds

The greatest part of the PIR research effort has been expended in discovering upper bounds for the communication complexity in various settings. Finding matching lower bounds has been mostly neglected.
A trivial bound is $\log n + 1$ bits, since communication complexity arguments prove this bound even without the privacy requirement.

The multi-server model used in information-theoretic PIR was engendered by a lower bound proven in [25]. The lower bound states that any single-server PIR scheme maintaining information-theoretic privacy has communication complexity of at least $n$. Thus, any meaningful information-theoretic PIR scheme must assume the multi-server model.

Currently no general lower bounds for the multi-server case exist. However, there have been some advances in lower bounds for multi-server PIR schemes that have specific properties. The first such bound was proven in [25]. The lower bound applies to any 2-server PIR scheme that is comprised of the following phases: the user sends a query $q_1$ to the first server and $q_2$ to the second server, each server sends a single bit as reply, and finally the user's answer is the exclusive-or of these bits. As mentioned previously, the authors prove that for this type of scheme the communication sent to each server must have at least $n - 1$ bits.

A more general (though much weaker) lower bound in the information-theoretic setting was provided by Mann in [60]. Mann proves a bound for multi-server, one-round information-theoretic PIR schemes. The communication complexity for a $k$-server scheme of this type is greater than $(k^2/(k-1) - \varepsilon) \log n$, for any constant $\varepsilon > 0$. In particular this result shows that any two-server, one-round, information-theoretically private scheme has communication complexity that is at least (roughly) $4 \log n$. This result is the most general lower bound proven so far. Of special interest is the fact that all the known information-theoretic PIR schemes use one round of communication [25, 3, 54]. On the one hand this bound serves to demonstrate that information-theoretic PIR is probably harder than non-private retrieval.
On the other hand, the wide gap between it and the upper bounds for the information-theoretic setting serves to underline the paucity of our knowledge concerning PIR lower bounds.

1.7 Related work

1.7.1 Multi-party private computation

We begin by describing a simple model for multi-party private computation. Let $f$ be a function mapping $k$ inputs to $k$ outputs. The computation involves $k$ parties $p_1, \ldots, p_k$ who hold one input each; $p_j$ holds $a_j$. The participants wish to compute the value of the $k$-parameter function $f$ at a certain point $(a_1, \ldots, a_k)$. The $j$-th player $p_j$ holds only the $j$-th input $a_j$. The goal is to construct a private protocol that computes $f(a_1, \ldots, a_k) = (o_1, \ldots, o_k)$ so that $p_j$ obtains only the $j$-th output $o_j$. By a private protocol we mean (in very loose terms) that a player (or a specified coalition of players) does not learn from the execution of the protocol any information which does not follow from its input and output.

PIR is a particular instance of multi-party private computation. In PIR there are $k + 1$ parties (user and $k$ servers), the user's input is $i$, the input of all $k$ servers is $x$, and the function $f$ is defined as $f(i, x, \ldots, x) = (x_i, \perp, \ldots, \perp)$. That is, the user obtains $x_i$ while the servers have no output.

The model of communication assumes that the protocol proceeds in synchronous communication rounds. In each round any player can send messages to any other player. The communication is carried out over secure channels, so that only the two parties exchanging a specific message may learn what it is (or indeed obtain any information about it).

Defining exactly what privacy means in general is a Herculean task, undertaken in various papers and theses [5, 61, 18, 43], and is still undergoing changes [19]. In this work we are interested only in specific instances of private computation, and specific models of privacy, which we define more rigorously in following chapters.
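As a concrete reading of the PIR function $f(i, x, \ldots, x) = (x_i, \perp, \ldots, \perp)$ described above, the following minimal Python sketch evaluates $f$ directly, as a trusted party would. It is an illustration only and not a private protocol; the function name `pir_functionality` and the use of `None` for $\perp$ are assumptions made for this sketch.

```python
from typing import List, Optional, Tuple


def pir_functionality(i: int, xs: List[str]) -> Tuple[Optional[str], ...]:
    """Evaluate f(i, x, ..., x) = (x_i, None, ..., None).

    `i` is the user's 1-based retrieval index. `xs` lists the inputs of the
    k servers, which must all be the same database string x. The first
    output belongs to the user; every server output is None (no output).
    """
    if len(set(xs)) != 1:
        raise ValueError("all servers must hold the same database x")
    x = xs[0]
    if not 1 <= i <= len(x):
        raise ValueError("retrieval index out of range")
    return (x[i - 1],) + (None,) * len(xs)


# Two servers holding x = 01101; the user asks for the third bit.
outputs = pir_functionality(3, ["01101", "01101"])
```

A PIR scheme must compute the same outputs without any party that sees both $i$ and $x$.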
In this section we only give a flavor of the intricacies that are involved in private computation.

It is convenient to visualize an evil adversary, lurking off-stage, just waiting to pounce on the unwary participant in the protocol and glean some information. Privacy has to be maintained given a predetermined adversary structure $C$. That is a collection of subsets of players, i.e. $C \subseteq 2^{[k]}$, which may be susceptible to the adversary. He may corrupt and take over any such subset during the execution of the protocol. In the original PIR problem the adversary structure consists of all the subsets of servers containing exactly one server. In the $t$-privacy extension [25, 54] $C$ includes all the subsets of servers containing at most $t$ servers. In the symmetrically private information retrieval problem [38, 63] $C$ is comprised of all the singletons (single server or just the user). This last is the more typical scenario of private computation works, in that there are privacy constraints placed on all the parties. The PIR case, where the user is allowed to learn more information about $x$ than just $x_i$, is unusual.

The power of the adversary is as important a parameter as the adversary structure. The adversary may be passive (also called honest-but-curious), i.e. the parties it controls do not deviate from the protocol but do attempt to gather as much information as possible. Or it can be active and have the parties it holds under sway deviate maliciously from the protocol. It can be computationally bounded (for instance, limited to polynomial time) or computationally unbounded. It can also be either static or adaptive. By static we mean that the adversary decides which of the parties to corrupt ahead of time. An adaptive adversary can decide which parties to attack during the execution of the protocol.
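The adversary structures mentioned above are small enough to enumerate explicitly. The sketch below is illustrative only: the party labels and function names are hypothetical, and the empty set is left out of each structure since corrupting nobody is uninteresting.

```python
from itertools import combinations


def t_private_structure(servers, t):
    """All non-empty subsets of `servers` containing at most t servers."""
    return [frozenset(c)
            for size in range(1, t + 1)
            for c in combinations(servers, size)]


def spir_structure(servers, user="U"):
    """All singletons: each single server, or just the user."""
    return [frozenset([party]) for party in list(servers) + [user]]


servers = ["DB1", "DB2", "DB3"]
original_pir = t_private_structure(servers, 1)  # exactly one server
two_private = t_private_structure(servers, 2)   # at most two servers
spir = spir_structure(servers)                  # servers and the user
```

With three servers, the original PIR structure has three sets, the 2-private structure has six, and the SPIR structure has four because it also contains the user.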
Currently PIR schemes consider a passive, static adversary in either of the computationally unbounded [25, 3, 54] or bounded [22, 58, 60, 72, 17] models. The main reason for assuming a limited adversary is simply that the focus of PIR works so far has been on efficiency and not on strengthening the adversary. It is quite likely that subsequent works will rectify the situation. However, there is a great difference between adding the property of being active to that of being adaptive. Works in general multi-party private computation show that an active adversary significantly reduces the size of malicious subsets of players which can be tolerated by a private protocol [11, 21, 47]. The same is also true of PIR. For instance, given an active adversary that may control either one of two servers, no PIR scheme even exists. In contrast, basically the same results that can be proved for static adversaries hold for adaptive adversaries as well [20]. Furthermore, all the PIR protocols constructed so far are executed in one round of communication, which makes the question of adaptivity irrelevant (see footnote 4).

(Footnote 4: This last is not true for private information retrieval by keywords, as in Chapter 4, or of private storage [66].)

In this work we adhere to the conventions of other PIR papers and assume that our adversary is honest-but-curious and static. When assuming that the adversary is computationally unbounded we say that a private protocol maintains information-theoretic privacy. In the case of a computationally bounded adversary we speak of computational privacy.

After describing the adversary we are ready to discuss what the privacy of a protocol means exactly. As we mentioned earlier the notion we wish to capture is that the adversary will under no circumstances learn any inputs and outputs of the protocol that do not "belong" to parties he has corrupted.
In the honest-but-curious, static and computationally unbounded setting that notion is captured by the following requirement. Let $T \in C$ be some subset of players which may be taken over by the adversary. For any two input $k$-tuples $a, a'$ such that $a_j = a'_j$ and $f(a)_j = f(a')_j$ for all $j \in T$, the joint view of all parties in $T$ of the protocol when the input is $a$ is identically distributed to the view when the input is $a'$.

In the passive and computationally bounded setting the definition changes only slightly. Instead of requiring the two distributions to be identical as above, we require only that they be computationally indistinguishable (see Chapter 2 for more on the subject). Since the adversary in PIR is assumed to be static and passive, we adopt this definition of privacy.

If the adversary might be more powerful, i.e. active and/or adaptive, the definition has to change. In this case two models of the protocol's execution are considered. The real-life model involves the actual execution of the protocol. In the ideal model an external trusted party is used. All the inputs may be sent to that party, and it can compute $f(a_1, \ldots, a_k) = (o_1, \ldots, o_k)$ and distribute each output to the appropriate player. The protocol is considered private if what the adversary sees in the real-life execution, while corrupting $T$, can be efficiently simulated if in the ideal protocol the adversary corrupts that same subset of players $T$. Thus, the adversary gains no more knowledge than what could be obtained in an ideal solution.

1.7.2 Instance hiding

The problem of instance hiding was introduced and studied in [1, 7, 8]. A computationally bounded user holds an input (instance) $i \in \{0,1\}^m$ and wishes to compute the value of a known Boolean function $f : \{0,1\}^m \to \{0,1\}$ on $i$. Since $f$ may be hard to compute, the user is allowed to query $k$ computationally unbounded oracles in order to learn $f(i)$. The only restriction is that $i$ is kept secret from each oracle.
A $k$-oracle instance hiding problem can be formulated in a way similar to a $k$-server information-theoretic PIR problem. The servers act as the oracles, and $f : \{0,1\}^{\log n} \to \{0,1\}$ is given by the data string $x$, where $f(i) = x_i$. By executing an instance hiding scheme the user can learn $x_i$ while hiding $i$, as is the requirement in PIR.

The problems differ in two respects. The first is that in instance hiding, even though $f$ is possibly difficult to compute, it is publicly known, and thus its structure may enable the user to form more efficient queries than is the case in PIR, where a priori $x$ may be any string in $\{0,1\}^n$. However, currently no such instance hiding scheme which exploits the specific structure of $f$ is known. The second difference between the two problems is that in PIR $n$ is considered a feasible quantity, while in instance hiding $2^m$, which is analogous to $n$, is infeasible. Therefore, instance hiding schemes limit the user to computation and communication that are polynomial in $|i|$ (or poly-logarithmic in $n$ if viewed in a PIR context). On the other hand, PIR allows the user $\mathrm{poly}(n)$ computation and communication.

Adaptations and improvements of instance hiding schemes were important themes in the first PIR work of [25]. The best schemes for $k \ge 3$ servers in that work are a variation on distributed polynomial interpolation ideas proposed in [7, 8]. Subsequent PIR works [3, 54] improved the schemes of [25] for "small" values of $k$. The $O(\log n)$-server PIR schemes which were presented in [25] and are variations of instance hiding schemes are still the most efficient we know of for that number of servers (their communication complexity is $O(\log^2 n \log\log n)$ and the computation of the user is poly-logarithmic in $n$).
1.7.3 Communication complexity problems

Since in instance hiding schemes the focus was on poly-logarithmic communication and computation by the user, instance hiding works did not provide the best solutions for a small number of servers. An indication that two-server information-theoretic PIR with sub-linear communication is possible was provided, prior to the first PIR publication, in [69]. A problem considered in that work is the following: Two servers hold the same string $x \in \{0,1\}^n$, and one index each. The first holds $j$, and the second $\ell$, where $1 \le j, \ell \le n$. The goal is for each server to send a single message to a user such that the user learns $x_{(j+\ell) \bmod n}$. A solution to this problem can be transformed into a PIR scheme by having the user randomly select $j, \ell$ so that $j + \ell \equiv i \pmod n$, send $j$ to the first server and $\ell$ to the second, and finally execute the above-mentioned solution. The $O(n \log\log n / \log n)$ solution in [69] to this problem can be viewed as the first sub-linear communication solution for PIR.

A similar problem which can also be used as a PIR protocol was introduced in [4]. Here a user holds $k$ indices $1 \le i_1, \ldots, i_k \le n$, while $k$ servers hold the same string $x \in \{0,1\}^n$. Each server also holds $k - 1$ of the indices that the user has. The $j$-th server holds all the indices except for $i_j$. The indices are regarded as $\log n$-bit binary strings. The goal is for each server to send a single message to the user, so that the user can obtain $x_{i_1 \oplus \cdots \oplus i_k}$, where $\oplus$ denotes bit by bit exclusive or. The protocol presented in [4] provides a solution for the problem with communication complexity $O(k n^{H_2(1/(k+1))})$, where $H_2$ is the binary entropy function. For example, for $k = 2$ their protocol's communication complexity is $O(n^{H_2(1/3)}) \approx O(n^{0.92})$. Any solution to this problem can be transformed into a $k$-server information-theoretic PIR scheme with an additive communication factor of $k(k-1) \log n$.
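The reduction from the two-index problem of [69] to PIR rests on splitting the retrieval index $i$ into two shares $j, \ell$ with $j + \ell \equiv i \pmod n$. The sketch below illustrates only this splitting step, under the assumption of 0-based indices in $\{0, \ldots, n-1\}$ rather than the 1-based indices of the text; the function names are hypothetical and the servers' part of the solution in [69] is not reproduced.

```python
import secrets


def second_share(i: int, j: int, n: int) -> int:
    """The unique l in range(n) with (j + l) % n == i."""
    return (i - j) % n


def split_index(i: int, n: int):
    """Pick j uniformly at random and derive l so that j + l = i (mod n)."""
    j = secrets.randbelow(n)
    return j, second_share(i, j, n)


def share_supports(i: int, n: int):
    """Enumerate every possible j and the sorted list of induced l values."""
    js = list(range(n))
    ls = sorted(second_share(i, j, n) for j in js)
    return js, ls
```

For a fixed $i$ the map $j \mapsto \ell$ is a bijection on $\{0, \ldots, n-1\}$, so each share on its own is uniformly distributed whatever the value of $i$. This is why a single server learns nothing about $i$ from the index it receives.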
The results of [4] are a significant improvement over [69], but still yield PIR schemes that are inferior to any quoted in PIR works (beginning with [25]).

Chapter 2

Model and Definitions

2.1 The PIR Model

Let $DB_1, \ldots, DB_k$ be $k$ servers, each holding a copy of an $n$-bit binary string $x = x_1 \ldots x_n$ (the database). A user, denoted by $U$, wishes to retrieve one of the bits $x_i$ ($1 \le i \le n$). The servers can communicate with the user, but not with each other.

The following is a definition of private information retrieval in a somewhat restricted setting, which suffices for our purposes. We define a scheme that achieves information-theoretic privacy, is executed in one round of communication and assures the user of correct retrieval with probability 1. All information-theoretically private schemes that appear in the literature are of this form.

Definition 1: A one round, $k$-server PIR scheme $P$ maintaining information-theoretic privacy is a trio of algorithms:

$Q(1^n, i, r)$: The query algorithm receives as input $1^n$, the length of the database, $i$, the retrieval index, and $r$, a random input. Its output is a $k$-tuple of queries $(q_1, \ldots, q_k)$.

$A(j, q_j, x)$: The answer algorithm receives as input a server number $j$ ($j \in [k]$), a query $q_j$ and the database $x$. Its output is an answer $a_j$.

$R(1^n, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm receives as input the database length $1^n$, the retrieval index $i$, the random input $r$ and the $k$ answers. Its output is a single bit.

The user is defined by the two algorithms $Q$ and $R$. The $j$-th server is defined by the algorithm $A_j$ which is $A(j, \cdot, \cdot)$ (the algorithm $A$ where the first argument is restricted to $j$).

$P$ involves three steps: $U$ uses $Q$ to generate $k$ queries and send one query to each server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$).
Finally $U$ uses $R$ to reconstruct $x_i$ from the answers. $P$ must have the following two properties:

Correctness: For all $n \in \mathbb{N}$, $x \in \{0,1\}^n$, $i \in [n]$ and $r$, if $Q(1^n, i, r)$ outputs $(q_1, \ldots, q_k)$ then
$$R(1^n, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i.$$

Privacy: Let $Dist_j(n, i)$ denote the distribution of the query algorithm's output restricted to its $j$-th entry as induced by the random choices of $r$ (in other words, the distribution of $q_j$). Then, for every $i_1, i_2$ and $j$, where $1 \le i_1, i_2 \le n$ and $j \in [k]$, we require that
$$Dist_j(n, i_1) = Dist_j(n, i_2).$$

Next, we define private retrieval of blocks. This notion is very similar to PIR. The only difference is that $x$ is not necessarily a binary string. Instead, the database $x$ is made up of $n$ blocks of $\ell$ bits each. The retrieved item is now a block of bits $x_i \in \{0,1\}^\ell$. The definition is extended in a natural manner.

Definition 2: $P$ is a one round, $k$-server PIR of blocks scheme maintaining information-theoretic privacy if it is a trio of algorithms:

$Q(1^n, 0^\ell, i, r)$: The query algorithm receives as input $1^n$, the number of data blocks, $0^\ell$, the length of each block, $i$, the retrieval index, and $r$, the random input. Its output is a $k$-tuple of queries $(q_1, \ldots, q_k)$.

$A(j, q_j, x)$: The answer algorithm receives as input a server number $j$ ($j \in [k]$), a query $q_j$ and the database $x$. Its output is an answer $a_j$.

$R(1^n, 0^\ell, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm receives as input the database length, the block length, a retrieval index, random input and the $k$ answers. Its output is a single block of $\ell$ bits.

The user is defined by the two algorithms $Q$ and $R$. The $j$-th server is defined by the algorithm $A_j$ which is $A(j, \cdot, \cdot)$.

$P$ involves three steps: $U$ uses $Q$ to generate $k$ queries and send one query to each server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$). Finally $U$ uses $R$ to reconstruct $x_i$ from the answers. $P$ must have the following two properties:

Correctness: For all $n, \ell \in \mathbb{N}$, $x \in \{0,1\}^{n\ell}$, $i \in [n]$ and $r$, if the output of the query algorithm $Q(1^n, 0^\ell, i, r)$ is $(q_1, \ldots, q_k)$ then
$$R(1^n, 0^\ell, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i.$$

Privacy: Let $Dist_j(n, \ell, i)$ denote the distribution of the query algorithm's output restricted to its $j$-th entry as induced by the random choices of $r$ (in other words, the distribution of $q_j$). Then, for every $i_1, i_2$ and $j$, where $1 \le i_1, i_2 \le n$ and $j \in [k]$, we require that
$$Dist_j(n, \ell, i_1) = Dist_j(n, \ell, i_2).$$

2.2 Pseudo-random generators

We start by defining computational indistinguishability and pseudo randomness. We use both the standard version of the definitions [44, p. 85], and a quantitative version. We include this second case in order to enable an exact analysis of the correlated generator we construct in Chapter 3 in terms of the pseudo random generator whose existence we assume. We begin by defining computational indistinguishability of two probability ensembles.

Definition 3: A probability ensemble $Y$ is a sequence of probability distributions $Y = \{Y_n\}_{n \in \mathbb{N}}$, where each distribution $Y_n$ ranges over some domain $Dom_n$.

Definition 4: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \le n$. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. Let $Y = \{Y_n\}_{n \in \mathbb{N}}$ and $Z = \{Z_n\}_{n \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_n$ and $Z_n$ is $\{0,1\}^{length(n)}$. We say that $Y$ and $Z$ are $T_D(n), \varepsilon(n)$-computationally indistinguishable if the following holds. For every distinguisher $D$, a probabilistic algorithm whose running time is bounded by $T_D(\cdot)$, there is an $n_0$ such that for all $n \ge n_0$
$$|\Pr[D(Y_n, 1^n) = 1] - \Pr[D(Z_n, 1^n) = 1]| < \varepsilon(n)$$
where the probability is taken over the distributions $Y_n$, $Z_n$, and over the coin tosses of $D$.
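To illustrate the quantity bounded in Definition 4, the following sketch computes the exact distinguishing advantage for a pair of toy distributions that are easy to tell apart. Everything here is an assumption made for illustration: $Y_n$ is taken to be uniform over the $n$-bit strings whose last bit equals the parity of the first $n-1$ bits, $Z_n$ is uniform over all $n$-bit strings, the distinguisher is deterministic, and the auxiliary input $1^n$ is omitted.

```python
from fractions import Fraction
from itertools import product


def parity_padded(prefix):
    """Append the parity of `prefix` (a tuple of bits) as one extra bit."""
    return prefix + (sum(prefix) % 2,)


def parity_distinguisher(y):
    """Output 1 iff the last bit equals the parity of the preceding bits."""
    return 1 if y[-1] == sum(y[:-1]) % 2 else 0


def acceptance_probability(distinguisher, samples):
    """Exact Pr[D(y) = 1] for y drawn uniformly from the list `samples`."""
    samples = list(samples)
    return Fraction(sum(distinguisher(y) for y in samples), len(samples))


def advantage(n):
    """|Pr[D(Y_n) = 1] - Pr[D(Z_n) = 1]| for the toy pair described above."""
    y_support = [parity_padded(p) for p in product((0, 1), repeat=n - 1)]
    z_support = list(product((0, 1), repeat=n))
    return abs(acceptance_probability(parity_distinguisher, y_support)
               - acceptance_probability(parity_distinguisher, z_support))
```

The distinguisher accepts every sample of $Y_n$ and exactly half of the strings in the support of $Z_n$, so the advantage is $1/2$ for every $n$. The two ensembles are therefore not $T_D(n), \varepsilon(n)$-computationally indistinguishable for any $\varepsilon(n) \le 1/2$ once $T_D$ permits a linear-time distinguisher.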
We say that $Y$ and $Z$ are non-uniformly $T_D(n), \varepsilon(n)$-computationally indistinguishable if for every (possibly non-uniform) family of circuits $\{Circ_n\}_{n \in \mathbb{N}}$, where the size of the $n$-th circuit $Circ_n$ is bounded by $T_D(n)$, there is an $n_0$ such that for all $n \ge n_0$
$$|\Pr[Circ_n(Y_n, 1^n) = 1] - \Pr[Circ_n(Z_n, 1^n) = 1]| < \varepsilon(n)$$
where the probability is as above.

Notice that by giving the distinguisher $D$ the input $1^n$ in addition to the "real input" (a $length(n)$-bit string) we make the requirement more stringent. This is because $length(n)$ could be substantially smaller than $n$, yet we let $D$ run $T_D(n)$ steps, not just $T_D(length(n))$.

Definition 5: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \le n$. Let $T$ be a family of functions of the form $T_D : \mathbb{N} \to \mathbb{N}$, and let $E$ be a family of functions of the form $\varepsilon : \mathbb{N} \to [0,1]$. Let $Y = \{Y_n\}_{n \in \mathbb{N}}$ and $Z = \{Z_n\}_{n \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_n$ and $Z_n$ is $\{0,1\}^{length(n)}$. We say that $Y$ and $Z$ are $T, E$-computationally indistinguishable if for every $T_D \in T$ and $\varepsilon \in E$, $Y$ and $Z$ are $T_D(n), \varepsilon(n)$-computationally indistinguishable.

We say that $Y$ and $Z$ are computationally indistinguishable if they are $T, E$-computationally indistinguishable, where
$$E = \{n^{-c} \mid c \in \mathbb{N}\} \quad \text{and} \quad T = \{n^{c} \mid c \in \mathbb{N}\}.$$

Now we turn to the special case in which one of the ensembles is the sequence of uniform distributions over $\{0,1\}^{length(n)}$.

Definition 6: Let $length : \mathbb{N} \to \mathbb{N}$ and let $Uni = \{Uni_n\}_{n \in \mathbb{N}}$ be a probability ensemble. We say that $Uni$ is the uniform ensemble corresponding to $length$ if for every $n \in \mathbb{N}$ the distribution $Uni_n$ is the uniform distribution on the set $\{0,1\}^{length(n)}$.

Definition 7: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \le n$. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$.
Let $Y = \{Y_n\}_{n \in \mathbb{N}}$ be a probability ensemble such that the domain of $Y_n$ is $\{0,1\}^{length(n)}$ and let $Uni = \{Uni_n\}_{n \in \mathbb{N}}$ be the uniform ensemble corresponding to $length$. We say that $Y$ is $T_D(n), \varepsilon(n)$-pseudo random if $Y$ and $Uni$ are $T_D(n), \varepsilon(n)$-computationally indistinguishable. We say that $Y$ is non-uniformly $T_D(n), \varepsilon(n)$-pseudo random if $Y$ and $Uni$ are non-uniformly $T_D(n), \varepsilon(n)$-computationally indistinguishable.

Let $T$ be a family of functions of the form $T_D : \mathbb{N} \to \mathbb{N}$, and let $E$ be a family of functions of the form $\varepsilon : \mathbb{N} \to [0,1]$. The ensemble $Y$ is called $T, E$-pseudo random if $Y$ and $Uni$ are $T, E$-computationally indistinguishable. The ensemble $Y$ is simply called pseudo random if $Y$ and $Uni$ are computationally indistinguishable.

We are now ready to define pseudo-random generators.

Definition 8: Let $\kappa : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function, satisfying $\kappa(n) < n$. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. A deterministic algorithm $GEN$ is called a $T_D(n), \varepsilon(n)$-pseudo random generator if it has the following properties:

Stretching: For every $n \in \mathbb{N}$, $GEN$ receives as input $s \in \{0,1\}^{\kappa(n)}$, the seed, and $1^n$, the output length, and produces as output a string $GEN(s, 1^n) \in \{0,1\}^n$.

Computational indistinguishability: The ensemble $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$ (induced by choosing $s \in \{0,1\}^{\kappa(n)}$ uniformly at random) is $T_D(n), \varepsilon(n)$-pseudo random ($length(n) = n$).

A $T_D(n), \varepsilon(n)$-pseudo random generator $GEN$ is called a non-uniformly $T_D(n), \varepsilon(n)$-pseudo random generator if it has the property of:

Non-uniform computational indistinguishability: The ensemble of distributions $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$ (induced by choosing $s \in \{0,1\}^{\kappa(n)}$ uniformly at random) is non-uniformly $T_D(n), \varepsilon(n)$-pseudo random ($length(n) = n$).

Let $T$ be a family of functions of the form $T_D : \mathbb{N} \to \mathbb{N}$, and let $E$ be a family of functions of the form $\varepsilon : \mathbb{N} \to [0,1]$.
We say that $GEN$ is a $T, E$-pseudo random generator if it is a $T_D(n), \varepsilon(n)$-pseudo random generator for every $T_D \in T$ and $\varepsilon \in E$.

Notation 1: The running time of $GEN(s, 1^n)$ is denoted by $T_{GEN}(n)$. We say that the stretch of $GEN$ is $\kappa(n)$ to $n$. The parameter $n$ is called the security parameter. (Notice that this differs from the standard definition in which $\kappa(n)$ is called the security parameter [44, p. 76].)

Definition 9: We say that a $T, E$-pseudo random generator $GEN$ is simply a pseudo random generator if it has the following properties:

There exists a constant $c_{time}$ such that $T_{GEN}(n) \le n^{c_{time}}$.

There exists a constant $c_{len} > 1$ such that the seed length is $\kappa(n) = n^{1/c_{len}}$.

The ensemble $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$, induced by choosing $s \in \{0,1\}^{\kappa(n)}$ uniformly at random, is pseudo random ($length(n) = n$).

Definition 9 coincides with the usual definition for pseudo random generators. Informally it states that the generator is a polynomial time algorithm such that no polynomial time bounded probabilistic algorithm can gain an inverse polynomial (in $n$) advantage in distinguishing between the uniform distribution and the distribution of the generator's outputs. It is known [50, 51, 59] that the existence of such a generator is equivalent to the existence of one way functions.

2.3 The CPIR Model

After defining the notion of computational indistinguishability we are ready to proceed and define one round, computationally private information retrieval schemes. Intuitively, the difference between information-theoretic privacy and computational privacy is that under computational privacy the distribution of queries received by a server when the user wishes to retrieve $x_{i_1}$ is only required to be computationally indistinguishable from that of the queries when the user is retrieving $x_{i_2}$ (as opposed to being identical).

Definition 10: A function $i : \mathbb{N} \to \mathbb{N}$ is called an index function if for every $n \in \mathbb{N}$ we have $1 \le i(n) \le n$.

We use the same notation as in Definition 1.
In particular, we denote by $Dist_j(n, i)$ the distribution of the $j$-th query ($q_j$) as induced by the random choices of $r$.

Definition 11: Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. $P$ is a one round, $k$-server PIR scheme maintaining $T_D(n), \varepsilon(n)$-computational privacy if it is a PIR scheme according to Definition 1, except for the privacy condition which is changed to:

Privacy: Let $n \in \mathbb{N}$ and let $qlen : \mathbb{N} \to \mathbb{N}$ be the query length function. For every $i \in [n]$ and for every $j \in [k]$ the $j$-th output of the query generator $Q(1^n, i, r)$ (denoted $q_j$) is of the same length $qlen(n)$. For every $j$ ($j \in [k]$) and for every two index functions $i_1, i_2 : \mathbb{N} \to \mathbb{N}$ the two probability ensembles $Dist_{j,i_1} = \{Dist_j(n, i_1(n))\}_{n \in \mathbb{N}}$ and $Dist_{j,i_2} = \{Dist_j(n, i_2(n))\}_{n \in \mathbb{N}}$ are $T_D(n), \varepsilon(n)$-computationally indistinguishable (in this case $length(n) = qlen(n)$).

We say that $P$ is a one round, $k$-server PIR scheme maintaining non-uniform $T_D(n), \varepsilon(n)$-computational privacy if for every $j$, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles $Dist_{j,i_1}$, $Dist_{j,i_2}$ are non-uniformly $T_D(n), \varepsilon(n)$-computationally indistinguishable.

We say that $P$ is a one round, $k$-server PIR scheme maintaining computational privacy (in short, a computationally private information retrieval scheme) if for every $j$, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles $Dist_{j,i_1}$, $Dist_{j,i_2}$ are computationally indistinguishable.

Discussion: In a way, Definition 11 seems unsatisfactory. Given a computationally bounded distinguisher $D$ and two sequences of query distributions, it ensures that for a large enough $n_0$ the algorithm $D$ cannot distinguish between the sequences. In other words, it cannot distinguish between a query retrieving $i_1(n)$ and a query retrieving $i_2(n)$. However, the definition does not state that there is a single $n_0$ such that for every pair of index functions $i_1, i_2$ the query distributions for $i_1(n)$ and $i_2(n)$, where $n > n_0$, are indistinguishable by $D$.
Our next definition addresses this problem and may thus be more attractive.

Definition 12: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function, satisfying $length(n) \le n$. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. Let $Y = \{Y_{n,i}\}_{n \in \mathbb{N}, i \in [n]}$ be a probability ensemble such that each $Y_{n,i}$ ranges over $\{0,1\}^{length(n)}$. We say that $Y$ is $T_D(n), \varepsilon(n)$-index indistinguishable if for every distinguisher $D$, a probabilistic algorithm whose running time is bounded by $T_D(\cdot)$, there is an $n_0$ such that for all $n \ge n_0$ and for all $i_1, i_2 \in [n]$
$$|\Pr[D(Y_{n,i_1}, 1^n) = 1] - \Pr[D(Y_{n,i_2}, 1^n) = 1]| < \varepsilon(n)$$
where the probability is taken over the ensembles $Y_{n,i_1}$, $Y_{n,i_2}$, and over the coin tosses of $D$.

$P$ is a one round, $k$-server PIR scheme maintaining $T_D(n), \varepsilon(n)$-computational privacy if it is a PIR scheme according to Definition 1, except for the privacy condition which is changed to:

Privacy: For every $j \in [k]$, the ensemble $Dist_j = \{Dist_j(n, i)\}_{n \in \mathbb{N}, i \in [n]}$ is $T_D(n), \varepsilon(n)$-index indistinguishable.

The following lemma shows that both definitions can be used.

Lemma 1: Definition 11 and Definition 12 are equivalent.

Proof: We show that any protocol that is a one round, $k$-server CPIR scheme according to one definition is also a one round, $k$-server CPIR scheme according to the second definition. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. Let $P$ be a protocol between $DB_1, \ldots, DB_k$ and $U$ that is $T_D(n), \varepsilon(n)$-computationally private according to Definition 12. Let $D$ be a distinguisher running in time $T_D(\cdot)$. Let $i_1, i_2$ be two index functions. By assumption there exists an $n_0$ such that for any $n \ge n_0$ and for $i_1(n), i_2(n) \in [n]$ (similarly for any two other elements in $[n]$):
$$|\Pr[D(Y_{n,i_1(n)}, 1^n) = 1] - \Pr[D(Y_{n,i_2(n)}, 1^n) = 1]| < \varepsilon(n).$$
Therefore $P$ is $T_D(n), \varepsilon(n)$-computationally private according to Definition 11.

Now let $P$ be $T_D(n), \varepsilon(n)$-computationally private according to Definition 11. Assume, towards a contradiction, that $P$ is not $T_D(n), \varepsilon(n)$-computationally private according to Definition 12.
In other words, there is a distinguisher $D$ running in time $T_D(\cdot)$, there is an infinite sequence of natural numbers $n_1, n_2, \ldots$ and for every $n_j$ in the sequence there is a pair of indices $i_1(n_j), i_2(n_j) \in [n_j]$ such that
$$|\Pr[D(Y_{n_j,i_1(n_j)}, 1^{n_j}) = 1] - \Pr[D(Y_{n_j,i_2(n_j)}, 1^{n_j}) = 1]| \ge \varepsilon(n_j).$$
However, the two infinite sequences $(i_1(n_1), i_1(n_2), \ldots)$ and $(i_2(n_1), i_2(n_2), \ldots)$ can both be easily extended into two index functions $i_1, i_2 : \mathbb{N} \to \mathbb{N}$. These two index functions contradict the assumption that $P$ is $T_D(n), \varepsilon(n)$-computationally private according to Definition 11.

In this work we restrict ourselves to PIR schemes that have certain properties which we discuss below. First and foremost we only discuss two server schemes, that is $k = 2$. All other properties are shared by both the computationally private schemes that we construct, and the information-theoretically private schemes that we use.

1. Symmetry of the servers, which means that the following conditions hold:

(a) For every $n \in \mathbb{N}$ and for every $1 \le i \le n$, the two distributions $Dist_1(n, i)$ and $Dist_2(n, i)$ are identical. In other words, the queries sent to $DB_1$ and $DB_2$ are drawn from the same distribution.

(b) Let $A_1(qu, x)$ and $A_2(qu, x)$ denote the responses of $DB_1$ and $DB_2$ to the query $qu$. Then for every $qu$, $A_1(qu, x) = A_2(qu, x)$.

It is easy to transform any one round PIR scheme $P$ into a scheme $Q$ that has this property. The user in $Q$ flips a coin and sends the result to both servers. If a server receives 0 it maintains its identity, and if 1 it "switches its identity" ($DB_1$ is considered to be $DB_2$ and vice versa). Now the parties execute the scheme $P$. $Q$ maintains symmetry of the servers at the cost of two extra bits of communication.

2. Each server replies to any query of the right length $qlen(n)$, even if the query does not have the correct form. Again, transforming a scheme which does not have this property to one that does is easy.
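To make Definition 1 and the symmetry properties above concrete, the following sketch implements a well-known toy two-server scheme in which each server replies with a single bit and the user takes the exclusive-or of the replies. It is offered purely as an illustration under stated assumptions: it is not one of the schemes constructed in this work, its queries are $n$ bits long (so its communication is linear rather than sub-linear), indices are 0-based, and all function names are hypothetical.

```python
import secrets
from itertools import product


def query(n, i, r):
    """Q(1^n, i, r): r is an n-bit tuple, read as the indicator of a subset S.

    The first query is S itself; the second is S with membership of i flipped.
    """
    q1 = tuple(r)
    q2 = tuple(bit ^ (1 if pos == i else 0) for pos, bit in enumerate(r))
    return q1, q2


def answer(j, q, x):
    """A(j, q_j, x): XOR of the database bits selected by q.

    The server number j is ignored, so A_1 = A_2 as in property 1(b).
    """
    result = 0
    for q_bit, x_bit in zip(q, x):
        result ^= q_bit & x_bit
    return result


def reconstruct(n, i, r, a1, a2):
    """R(1^n, i, r, a_1, a_2): the two selections differ only at position i."""
    return a1 ^ a2


def retrieve(x, i):
    """Run the three steps of the scheme on database x for index i."""
    n = len(x)
    r = tuple(secrets.randbelow(2) for _ in range(n))
    q1, q2 = query(n, i, r)
    return reconstruct(n, i, r, answer(1, q1, x), answer(2, q2, x))


def query_support(n, i, j):
    """All values q_j takes as r ranges over every n-bit string."""
    return {query(n, i, r)[j - 1] for r in product((0, 1), repeat=n)}
```

For each fixed $i$ the map from $r$ to $q_2$ is a bijection on $\{0,1\}^n$, so both $q_1$ and $q_2$ are uniformly distributed regardless of $i$. This gives the privacy condition of Definition 1 and property 1(a), and since `answer` accepts any $n$-bit query the scheme also satisfies property 2.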
2.4 Final Notation

We denote the problem of allowing the user to privately retrieve any of the $n$ bits that $DB_1$ and $DB_2$ hold by $PIR(n)$. We denote by $B_2$ the best scheme (in terms of communication complexity) that solves $PIR(n)$, maintains information-theoretic privacy, and is executed in one round of communication. The most efficient scheme of this type known to date [25] has communication complexity $O(n^{1/3})$.

If $x, y$ denote two binary strings of the same length, then $x \oplus y$ denotes their bitwise exclusive or. For any two strings $x, y$ we denote by $x \| y$ the concatenation of $y$ to $x$. Given a set $S$ and an element $a$ we denote:
$$S \oplus a = \begin{cases} S \cup \{a\} & \text{if } a \notin S \\ S \setminus \{a\} & \text{if } a \in S \end{cases}$$
We denote by $e_\ell(n)$ ($1 \le \ell \le n$) the $\ell$-th unit vector of length $n$: an $n$-bit binary string with 0 entries everywhere except 1 in the $\ell$-th position.

2.5 Symmetrically private information retrieval

In this variant of PIR, introduced in [38], the data privacy is also protected. The user is allowed to retrieve a single data item but no other information (such as partial information on subsets of data items). The original definition appears in [38] and is analogous to PIR definitions. We slightly generalize it by defining symmetrically private information retrieval (SPIR) of blocks instead of single bits. We define both information-theoretically private and computationally private SPIR schemes.

An important difference between PIR and its symmetric counterpart SPIR is that multi-server SPIR schemes ($k > 1$) require that the servers have a source of shared randomness. We model this by adding (the same) random input $r_s$ to all the servers, while denoting the user's random string by $r_u$.

Definition 13: A one round, $k$-server SPIR of blocks scheme $P$ maintaining information-theoretic privacy is a trio of algorithms:

$Q(n, \ell, i, r_u)$: The query algorithm outputs a $k$-tuple of queries $(q_1, \ldots, q_k)$.
$A(j, q_j, x \| r_s)$: The answer algorithm outputs an answer $a_j$.

$R(n, \ell, i, r_u, a_1, \ldots, a_k)$: The reconstruction algorithm outputs a single block of $\ell$ bits.

We say that $P$ maintains information-theoretic privacy if it follows the same steps as the one round PIR scheme presented in Definition 1, satisfies the correctness and privacy requirements of that definition, and furthermore satisfies a third requirement:

Data privacy: Let $Q^*$ be a deterministic query algorithm (modeling a possibly dishonest user $U^*$ that does not run $Q$). Let $A^*(x)$ denote the distribution of the answers of all servers, where $Q^*$ generates the queries $(q_1, \ldots, q_k)$. Then, there exists an index $i \in [n]$ such that for any $x, x' \in \{0,1\}^{\ell n}$, if $x_i = x'_i$ (the $i$-th block is equal) then $A^*(x)$ and $A^*(x')$ are identical (in other words, even a dishonest user is able to learn only that one element, $x_i$, and nothing else).

Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0, 1]$. We say that $P$ maintains $(T_D, \varepsilon)$-computational security if it follows the same steps as the one round CPIR scheme presented in Definition 11, satisfies the correctness and privacy requirements of that definition, and furthermore satisfies a third requirement:

Data privacy: Let $Q^*$ be a deterministic query algorithm. Let $A^*(x)$ denote the distribution of the answers of all servers, where $Q^*$ generates the queries $q_1, \ldots, q_k$. Then, for every distinguisher $D$ running in time $T_D(\cdot)$ there is a $c_0$ such that for every $n, \ell \in \mathbb{N}$ with $n\ell \ge c_0$ there exists an index $i \in [n]$ such that for any $x, x' \in \{0,1\}^{\ell n}$, if $x_i = x'_i$ (the $i$-th block is equal) then

$$\left| \Pr[D(A^*(x), 1^{n\ell}) = 1] - \Pr[D(A^*(x'), 1^{n\ell}) = 1] \right| < \varepsilon(n\ell),$$

where the probability is taken over the distributions $A^*(x)$, $A^*(x')$, and over the coin tosses of $D$.

2.6 General multi-party computation definitions

In this section we define private multi-party computation in a much broader sense than we did in the PIR definitions.
We used a limited framework for PIR because all the known results fit nicely within this framework and furthermore the proofs are simpler. We do emphasize that the PIR definitions we presented are a specific case of the multi-party computation definitions we now show. Any PIR scheme which is private according to the PIR definitions is also private according to the general definitions.

We adopt the framework presented by Canetti in [19], which itself drew on a variety of sources [5, 18, 43, 61]. The essential ingredient in defining the security of a protocol is comparing an adversary's view of the "real" protocol to his view during the execution of an "ideal" protocol. The adversary we define can be either passive or active (footnote 1), and either computationally unbounded (for the case of information-theoretic privacy) or computationally bounded. Our definitions refer explicitly to an active adversary, since a passive adversary is a particular case of an active one who decides to follow the protocol. We proceed to formally define the various aspects of a private multi-party computation in our context.

Footnote 1: We sometimes refer to protocols that withstand an active adversary as "secure", while keeping the label "private" for those that deal only with a passive adversary.

2.6.1 Basic definitions

Function: The goal of the protocol is to compute a $k$-input and $k$-output probabilistic function (footnote 2) $f$. The notation we use is $f : Dom^k \times R_f \to E^k$, where $Dom$ is the domain and $E$ the range for each party's input and output respectively. $R_f$ denotes the random input of the function.

The participants: are $k$ parties which are modeled as $k$ interactive and probabilistic Turing machines $P_1, \ldots, P_k$. $P_j$ holds an input $a_j \in Dom$ and some random input $r_j$.

Communication: All the messages are assumed to be sent over secure channels.
Real-life model: The real-life adversary is characterized by an interactive Turing machine $\mathcal{A}$, which describes the behavior of corrupted parties, and by the adversary structure $\mathcal{C} \subseteq 2^{[k]}$. The adversary can corrupt any subset $T \in \mathcal{C}$. It starts off the protocol with the inputs of the corrupted parties and an auxiliary input $z$, which models information learned during the execution of previous protocols and is especially important for composing protocols. The computation proceeds in rounds of communication. At each such round the adversary outputs the messages sent by the corrupted parties (possibly after receiving the messages sent by uncorrupted parties in the same round). At the end of the computation all the parties locally generate their output. The adversary also outputs its entire view of the protocol's execution. That view includes the inputs it started with and the messages sent and received by the corrupted parties.

We use the following notation for the various random variables involved in the protocol. $ADV(\mathcal{A}, a, z, r)$ denotes the output of adversary $\mathcal{A}$ and $EXEC(\mathcal{A}, a, z, r)_i$ denotes the output of $P_i$. Here $a = a_1, \ldots, a_k$ denotes the input vector, $r = r_1, \ldots, r_k$ denotes the random input vector and $z$ denotes the adversary's auxiliary input. Let

$$EXEC(\mathcal{A}, a, z, r) = (ADV(\mathcal{A}, a, z, r), EXEC(\mathcal{A}, a, z, r)_1, \ldots, EXEC(\mathcal{A}, a, z, r)_k).$$

Finally, let $EXEC(\mathcal{A}, a, z)$ denote the random variable describing $EXEC(\mathcal{A}, a, z, r)$, where $r$ is chosen uniformly at random.

Ideal model: The adversary in the ideal model is again characterized by an interactive Turing machine $\mathcal{S}$ (which is also randomized) and by the adversary structure $\mathcal{C}$. The model also includes an external trusted party $T$. Each of the parties $P_i$ ($1 \le i \le k$) begins the computation with an input $a_i$ (they do not need a random input here). The adversary $\mathcal{S}$ begins the computation with the inputs of the corrupted parties, a random input and the auxiliary input.
Each party hands its input to $T$ (the inputs of corrupted parties may be arbitrarily changed by the adversary if it is active). Let $b = b_1, \ldots, b_k$ be the input vector received by $T$. The trusted party proceeds to compute $f(b)$ and sends $f(b)_i$ to $P_i$. Each $P_i$ outputs the value received (corrupted parties may output some arbitrary string). $\mathcal{S}$ outputs all the information gathered during the computation: its inputs, the inputs of the corrupted parties and the function values received by the corrupted parties.

Footnote 2: In PIR the computed function is not probabilistic, but in joint generation of RSA keys it is.

We use the following notation: $ADV(\mathcal{S}, a, z, r)$ denotes the output of adversary $\mathcal{S}$ and $IDEAL(\mathcal{S}, a, z, r)_i$ denotes the output of $P_i$. Here $a = a_1, \ldots, a_k$ denotes the input vector, $z$ denotes the adversary's auxiliary input and $r$ denotes the random input of the adversary. Let

$$IDEAL(\mathcal{S}, a, z, r) = (ADV(\mathcal{S}, a, z, r), IDEAL(\mathcal{S}, a, z, r)_1, \ldots, IDEAL(\mathcal{S}, a, z, r)_k).$$

Finally, let $IDEAL(\mathcal{S}, a, z)$ denote the random variable describing $IDEAL(\mathcal{S}, a, z, r)$, where $r$ is chosen uniformly at random.

Comparing computations: We require that for every adversary $\mathcal{A}$ in the real-life model there exists an adversary $\mathcal{S}$ in the ideal model that emulates $\mathcal{A}$. That is, the output distribution of the real-life protocol, $EXEC(\mathcal{A}, a, z)$, is similar to the output distribution of the ideal protocol, $IDEAL(\mathcal{S}, a, z)$. This similarity is interpreted as identity of distributions (in the information-theoretic setting) or indistinguishability of probability ensembles (in the computational setting). Furthermore, we require that the computational complexity of $\mathcal{S}$ be comparable to (usually that means polynomial in) the computational complexity of $\mathcal{A}$ (footnote 3). The formal definition in the information-theoretic setting follows.

Definition 14: Let $f : Dom^k \times R_f \to E^k$ and let $\pi$ be a protocol for $k$ parties.
We say that $\pi$ $\mathcal{C}$-securely computes $f$ in the information-theoretic setting if for any real-life adversary $\mathcal{A}$ with adversary structure $\mathcal{C}$ there exists an ideal model adversary $\mathcal{S}$ whose running time is polynomial in the running time of $\mathcal{A}$, such that for every input vector $a$ and every auxiliary input $z$

$$EXEC(\mathcal{A}, a, z) = IDEAL(\mathcal{S}, a, z).$$

In order to provide the definition for the computational setting, we have to define appropriate probability ensembles. Let $Dom^k_\mu \subseteq Dom^k$ denote all the input vectors that are encoded by $\mu$ bits. We view $EXEC(\mathcal{A}, a, z)$ as an ensemble of probability distributions. For every $\mu \in \mathbb{N}$ we denote by $EXEC_\mu(\mathcal{A}, a, z)$ the restriction of $EXEC(\mathcal{A}, a, z)$ to all the inputs $a \in Dom^k_\mu$. We abuse the notation somewhat here and use $EXEC(\mathcal{A}, a, z) = \{EXEC_\mu(\mathcal{A}, a, z)\}_{\mu \in \mathbb{N}}$ to denote an ensemble of distributions ($EXEC(\mathcal{A}, a, z)$ was originally used as a distribution and $EXEC_\mu(\mathcal{A}, a, z)$ as a conditional distribution). Similarly we define $IDEAL(\mathcal{S}, a, z)$ as an ensemble of probability distributions. The formal definition in the computational setting follows.

Footnote 3: On this and other subtle points in the definition see more in [19].

Definition 15: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0, 1]$. Let $f : Dom^k \times R_f \to E^k$ and let $\pi$ be a protocol for $k$ parties. We say that $\pi$ $\mathcal{C}$-securely computes $f$ in the computational setting if for any real-life adversary $\mathcal{A}$ with adversary structure $\mathcal{C}$ there exists an ideal model adversary $\mathcal{S}$ whose running time is polynomial in the running time of $\mathcal{A}$, such that for every input vector $a$ and every auxiliary input $z$, the ensembles $EXEC(\mathcal{A}, a, z)$ and $IDEAL(\mathcal{S}, a, z)$ are $(T_D(\cdot), \varepsilon(\cdot))$-computationally indistinguishable.

2.6.2 Composition of protocols

An important requirement we want private protocols to fulfill is that of modular composition. Given a simple task for which we designed a private protocol, we can solve a more difficult problem by using the first protocol as a sub-routine.
In order to formalize this notion we define a semi-ideal model in which a function $g$ is computed by using protocols for $m$ functions $f_1, \ldots, f_m$ as sub-routines. A trusted party can compute each of $f_1, \ldots, f_m$ but nothing else. The semi-ideal model is a stepping stone between the real-life model, in which $k$ parties compute $g$ by using as sub-routines protocols that compute $f_1, \ldots, f_m$, and the ideal model in which $g$ is computed by a trusted party.

The semi-ideal model: We start by considering the real-life model of subsection 2.6.1 for the computation of a function $g$. The model is augmented by a trusted party $T$ that computes the functions $f_1, \ldots, f_m$. At special communication rounds a function $f_j$ ($1 \le j \le m$) is specified and an ideal model computation of it takes place. That is, the parties hand the inputs (for $f_j$) to $T$, who computes the value of $f_j$ at the given point and returns the appropriate output to each party. Let $f = (f_1, \ldots, f_m)$. We refer to the semi-ideal model we specified as the $f$-ideal model. Let $EXEC(\mathcal{A}, a, z, f)$ denote the random variable describing the output of the $f$-ideal model, with adversary $\mathcal{A}$, input $a$, auxiliary input $z$, and access to a trusted party that can compute any of $f_1, \ldots, f_m$. A protocol in this model is not a real-life protocol, as it uses a trusted party. However, its security is defined in comparison with the ideal model computation of $g$ as in subsection 2.6.1.

Definition 16: Let $g : Dom^k \times R_f \to E^k$ and let $\pi$ be a protocol for $k$ parties in the $f$-ideal model.
We say that $\pi$ $\mathcal{C}$-securely computes $g$ in the information-theoretic (computational) setting in the $f$-ideal model if for any adversary $\mathcal{A}$ in the $f$-ideal model with adversary structure $\mathcal{C}$ there exists an ideal model adversary $\mathcal{S}$ whose running time is polynomial in the running time of $\mathcal{A}$, such that for every input vector $a$ and every auxiliary input $z$, the distributions $EXEC(\mathcal{A}, a, z, f)$ and $IDEAL(\mathcal{S}, a, z)$ are identical (the ensembles $EXEC(\mathcal{A}, a, z, f)$ and $IDEAL(\mathcal{S}, a, z)$ are computationally indistinguishable (footnote 4)).

Real-life model: We replace the trusted party in the $f$-ideal model with $m$ protocols $\rho_1, \ldots, \rho_m$. The protocol $\rho_j$ securely computes the function $f_j$ in the appropriate setting, that is, either computationally or information-theoretically as required, and given the same adversary structure $\mathcal{C}$. Each call to $T$ is replaced by an invocation of one of the protocols $\rho_1, \ldots, \rho_m$. While $\rho_j$ is running, the protocol $\pi$ remains inactive. The output of $\rho_j$ is treated by each party as the value returned by $T$. Let $\rho = (\rho_1, \ldots, \rho_m)$. We denote by $\pi^{\rho}$ the protocol in which each evaluation of $f_j$ by $T$ is replaced by an invocation of $\rho_j$.

The point of the modular composition concept is the following theorem, proved in [19]. Informally, it states that a secure protocol in the semi-ideal model is also secure in the real-life model if the trusted party (that computes only $f_1, \ldots, f_m$) is replaced by secure protocols. The only limit is that at most one sub-routine protocol is invoked at each communication round.

Theorem 1: (Canetti [19]) Let $f_1, \ldots, f_m, g$ be $k$-input and $k$-output functions. Let $\pi$ be a $k$-party protocol that $\mathcal{C}$-securely computes $g$ in the information-theoretic (computational) setting in the $f$-ideal model. Assume that no more than one ideal evaluation call is made at each communication round.
Let $\rho_1, \ldots, \rho_m$ be $k$-party protocols that $\mathcal{C}$-securely compute $f_1, \ldots, f_m$ respectively in the information-theoretic (computational) setting. Then, the protocol $\pi^{\rho}$ $\mathcal{C}$-securely computes $g$ in the information-theoretic (computational) setting.

2.7 Private information retrieval by keywords

The PrivatE information Retrieval by KeYwords (PERKY) problem is defined informally as follows. Let $DB_1, \ldots, DB_k$ be $k$ non-communicating servers, each holding a copy of $n$ binary strings $s_1, \ldots, s_n \in \{0,1\}^\ell$. A user, $U$, holds a binary string $w \in \{0,1\}^\ell$ and wishes to find out whether $w = s_i$ for some $i \in [n]$. No server may learn any information about $w$. Given such a protocol it is possible (with a slight overhead) to retrieve any data associated with $w$ (such as its address in the database).

Superficially PERKY seems very similar to PIR and the formal definitions should therefore also be similar. However, there are some important differences in the solutions to the two problems. The first difference is that unlike PIR schemes, the PERKY schemes we construct employ multiple rounds of communication. The second is that the solutions we present are reductions to PIR schemes and are dependent on the security of these schemes. We therefore require modular composition theorems. In order to formally define the problem we adopt the framework of multi-party protocols as presented in section 2.6.

Footnote 4: We abuse the notation here, denoting distributions and ensembles identically, in the same way we did in subsection 2.6.1.

Notation 2: For any $\ell \in \mathbb{N}$ the function $f_\ell : \{0,1\}^\ell \times 2^{\{0,1\}^\ell} \to \{0,1\}$ is defined by:

$$f_\ell(w, \{s_1, \ldots, s_n\}) = \begin{cases} 1 & \text{if } w \in \{s_1, \ldots, s_n\} \\ 0 & \text{if } w \notin \{s_1, \ldots, s_n\} \end{cases}$$

Definition 17: The private information retrieval by keywords problem is parameterized by 3 natural numbers $\ell, n, k$ and is comprised of the following elements:

Parties: A user $U$ and $k$ servers $DB_1, \ldots, DB_k$.
Function: The $(k+1)$-party function computed receives as input $w \in \{0,1\}^\ell$ and $k$ copies of $s_1, \ldots, s_n$, which are $n$ binary strings, each of length $\ell$. The output of the function is $(f_\ell(w, \{s_1, \ldots, s_n\}), \bot, \ldots, \bot)$, that is, only the user receives the result.

Adversary structure: is the set $\mathcal{C} = \{\{DB_1\}, \ldots, \{DB_k\}\}$.

Adversary: The adversary is considered to be passive (honest but curious), and is either computationally bounded or not.

The symmetrically private information retrieval by keywords problem (SPERKY) has the same relation to PERKY that SPIR has to PIR. That is, the user learns whether $w \in \{s_1, \ldots, s_n\}$ but is not allowed to gain any other information about the database. As in SPIR, in any multi-server scheme the servers share a source of randomness, which we model as a string $r_s$ that is concatenated to the database.

Definition 18: The symmetrically private information retrieval by keywords problem is parameterized by 3 natural numbers $\ell, n, k$ and is comprised of the following elements:

Parties: A user $U$ and $k$ servers $DB_1, \ldots, DB_k$.

Function: The same function as in Definition 17.

Adversary structure: is the set $\mathcal{C} = \{\{U\}, \{DB_1\}, \ldots, \{DB_k\}\}$.

Adversary: The adversary is considered to be passive if it controls one of the servers, but can be either passive or active if it controls the user. The adversary is either computationally bounded or not.

This definition does not fall completely within the framework described in section 2.6. Hitherto we assumed that the adversary is either active or passive regardless of the corrupted parties. However, here we consider protocols in which the adversary may be active if it corrupts a certain party (the user), but is not allowed to be active if any other party is corrupted. This is not a problem because of the adversarial setting we allow.
Since the adversary is non-adaptive and may control only a single party, it has to decide prior to the execution of the protocol which party is to be corrupted. Thus we can view our protocols as withstanding two types of adversaries and adversary structures. The first is a passive adversary with adversary structure $\mathcal{C} = \{\{DB_1\}, \ldots, \{DB_k\}\}$. The second is an active adversary with adversary structure $\mathcal{C} = \{\{U\}\}$.

Notation 3: In SPIR and SPERKY schemes we refer by user privacy to the property of maintaining privacy versus an adversary who corrupts one of the servers. We refer by data privacy to the property of maintaining privacy against an adversary who controls the user.

Notation 4:
PIR: Let the problem of privately retrieving a block of $\ell$ bits from a database of $n$ blocks held by $k$ non-communicating servers be denoted by $PIR(\ell, n, k)$. The problem $PIR(\ell, n, k)$ where $\ell = 1$ is denoted by $PIR(n, k)$. $SPIR(\ell, n, k)$ ($SPIR(n, k)$) denotes the problem of symmetrically private information retrieval for the same parameters as in $PIR(\ell, n, k)$ ($PIR(n, k)$).
PERKY: $PERKY(\ell, n, k)$ ($PERKY(n, k)$) denotes the problem of private information retrieval by keywords where the database is held by $k$ servers and is comprised of $n$ keywords, each of length $\ell$. $SPERKY(\ell, n, k)$ ($SPERKY(n, k)$) denotes the problem of symmetrically private information retrieval by keywords for the same parameters as in $PERKY(\ell, n, k)$ ($PERKY(n, k)$).

Notation 5: Let $P$ be a communication protocol that solves one of the problems $PIR(\ell, n, k)$, $SPIR(\ell, n, k)$, $PERKY(\ell, n, k)$ or $SPERKY(\ell, n, k)$. We denote its communication complexity by $C_P(\ell, n, k)$. If $\ell = 1$ we denote the communication complexity of $P$ by $C_P(n, k)$. On occasion we distinguish between the user's communication complexity, denoted by $\alpha_P(\ell, n, k)$ ($\alpha_P(n, k)$ if $\ell = 1$), and a server's communication complexity, denoted by $\beta_P(\ell, n, k)$ ($\beta_P(n, k)$ if $\ell = 1$).

Notation 6: Let $P$ be a one round PIR or SPIR scheme.
We denote by $Q_P$, $A_P$ and $R_P$ the query, answer and reconstruction algorithms of $P$.

2.8 Joint generation of RSA keys

Recall the definitions of the RSA cryptosystem [70]. The system includes a public key, which is a pair $N, e$, and a private key $d$. $N$ is called the RSA modulus and is a product of two primes, $N = PQ$. $e$ is called the public exponent (or encryption exponent) and is chosen so that $\gcd(\varphi(N), e) = 1$, where $\varphi$ denotes Euler's totient function ($\varphi(n) = |\{m : 1 \le m < n, \gcd(m, n) = 1\}|$). $d$ is called the private exponent (or decryption exponent) and is chosen so that $de \equiv 1 \bmod \varphi(N)$.

The idea in joint generation of RSA keys is that $k$ parties execute a multi-party protocol and jointly produce the public key $N, e$ (which everyone receives) and share the private key. That is, the $j$-th party holds $d_j$ such that $\sum_{j=1}^{k} d_j = d$. No $t$-coalition of the parties can get any information on $d$ through their participation in the protocol (footnote 5). We focus on the case of 2 parties with privacy threshold $t = 1$. The formal definition follows.

Definition 19: Joint generation of RSA keys by two parties is comprised of the following elements:

Parties: There are 2 parties, Alice and Bob.

Function: The only input is a security parameter $\sigma$. Alice's output is $N, e, d_a$ and Bob's output is $N, e, d_b$. The public modulus $N$ is chosen uniformly at random (footnote 6) out of all the products of two primes $N = PQ$ such that the binary representation of $P$ and $Q$ is of length $\sigma/2$. $e$ is any number such that $\gcd(\varphi(N), e) = 1$. $d_a$ and $d_b$ are distributed uniformly at random among all pairs such that $d_a + d_b = d$ (footnote 7).

Adversary structure: is the set $\mathcal{C} = \{\{Alice\}, \{Bob\}\}$.

Adversary: The adversary is assumed to be passive and computationally bounded.

Footnote 5: Privacy is maintained in the computational sense. Given $e$, a computationally unbounded party can compute the unique corresponding $d$.
Footnote 6: The distribution achieved in all the known protocols is not exactly uniform but is "close enough", see chapter 5.

Footnote 7: Once again, our protocol does not achieve exactly the desired distribution, but one that is "close enough".

Chapter 3

Computationally private information retrieval

In this chapter we investigate the notion of a correlated pseudo-random generator and present a construction of such a generator. We then proceed to use the correlated generator to construct computationally private information retrieval schemes. We show two such schemes: one which is a direct consequence of our correlated generator, and another, slightly more efficient scheme, which uses similar ideas. Both schemes are very efficient in terms of communication complexity.

3.1 Correlated Pseudo Randomness

In this section we formally define the notion of a "correlated pseudo random generator", and then describe a construction of such a generator. We are motivated by the problem of sending two $n$-bit strings, one to each of two parties, such that the following requirements are met:

- The two $n$-bit strings differ at exactly one bit, whose location is part of the input of the problem.
- The communication complexity is as low as possible.
- The communication received by a single party (which describes one $n$-bit string) reveals no information, in the computational sense, about the index in which the two strings differ.

A correlated pseudo-random generator consists of two succinct representation generators $SREP_1$ and $SREP_2$, and an expansion algorithm $G$. The generators $SREP_1, SREP_2$ receive as input an index $\ell \in [n]$ and produce two correlated short strings $u$ and $w$, which are of length $\lambda(n) < n$. Applying $G$ to these two "succinct representations", one gets longer strings $G(u, 1^n)$ and $G(w, 1^n)$, both of length $n$.
These two strings differ at exactly one bit, whose location is $\ell$. The strings $u$ and $w$ are referred to as succinct representations because they fully describe the $n$-bit strings $G(u, 1^n)$ and $G(w, 1^n)$, but are much shorter. An important property of $SREP_1$ ($SREP_2$) is, informally speaking, that its output $u$ ($w$) is distributed pseudo-randomly for any input (footnote 1) $\ell$. The construction of the correlated pseudo random generator we describe is based on any "conventional" pseudo random generator $GEN$.

3.1.1 Definitions

Definition 20: Let $\lambda : \mathbb{N} \to \mathbb{N}$ so that $\lambda(n) < n$ for every $n$, let $T_D : \mathbb{N} \to \mathbb{N}$, and let $\varepsilon : \mathbb{N} \to [0, 1]$. A $(T_D(n), \varepsilon(n))$ correlated pseudo random generator consists of three algorithms: two succinct representation generators $SREP_1, SREP_2$ and an expansion algorithm $G$. $SREP_1, SREP_2$ get inputs $1^n$, $\ell$ ($1 \le \ell \le n$), and $r$ (a random input of appropriate length) and produce as output $u = SREP_1(1^n, \ell, r)$, $w = SREP_2(1^n, \ell, r)$, both of length $\lambda(n)$. $G$ gets as input $1^n$ and a string $z$ of length $\lambda(n)$, and outputs a string $G(z, 1^n)$ of length $n$. The succinct representation generators and the expansion algorithm should satisfy the following properties:

Correlation: For any $n \in \mathbb{N}$, $\ell \in [n]$ and $r$, the $n$-bit strings $G(SREP_1(1^n, \ell, r), 1^n)$ and $G(SREP_2(1^n, \ell, r), 1^n)$ differ at exactly the $\ell$-th bit.

$(T_D(n), \varepsilon(n))$ index indistinguishability: For any index function $\ell(\cdot)$ the ensemble of distributions $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ ($r$ is chosen at random) is $(T_D(n), \varepsilon(n))$-pseudo random, and the same holds for $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (in this case $\ell ength(n) = \lambda(n)$).

Efficiency: $SREP_1(1^n, \ell, r)$ and $SREP_2(1^n, \ell, r)$ can be computed in $poly(n)$ many steps.

Footnote 1: It would suffice if the output distribution of $SREP_1$ given as input $\ell_1$ and its output distribution given as input $\ell_2$ were indistinguishable (for large enough values of $n$). Our construction satisfies the more stringent requirement that each of these distributions cannot be distinguished from the uniform distribution.
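The correlation property can be restated compactly using the exclusive-or and unit-vector notation of Section 2.4. The display below is only a reformulation of that property, with $u$ and $w$ standing for the two succinct representations:

```latex
\[
  G(u, 1^n) \oplus G(w, 1^n) = e_\ell(n),
  \qquad
  u = SREP_1(1^n, \ell, r), \quad w = SREP_2(1^n, \ell, r).
\]
```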
The correlated generator is called pseudo random if the $(T_D(n), \varepsilon(n))$ index indistinguishability property is replaced by

Standard index indistinguishability: For any index function $\ell(\cdot)$ the two ensembles of distributions $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are pseudo random.

The generator is non-uniformly $(T_D(n), \varepsilon(n))$-pseudo random if the $(T_D(n), \varepsilon(n))$ index indistinguishability property is replaced by

Non-uniform $(T_D(n), \varepsilon(n))$ index indistinguishability: For any index function $\ell$ the ensembles $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are non-uniformly $(T_D(n), \varepsilon(n))$-pseudo random.

We now spell out some of the attributes of the above definition. The length of the succinct representations, $\lambda(n)$, is required to be smaller than $n$. The reason is that $\lambda(n)$ is the communication complexity of our CPIR scheme and is the main parameter we wish to minimize. On the other hand, the length of the random input, $r$, is not restricted. While in general we would like this quantity to be substantially shorter than $n$, we do not require this explicitly as a part of the definition. We remark that the index indistinguishability property, which asserts that the ensembles $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are pseudo random, does imply in particular that for different $\ell_1(\cdot)$ and $\ell_2(\cdot)$ the corresponding ensembles are indistinguishable.

Finally, it seems natural to make the stronger requirement that the correlated generator achieve the following:

Pseudo random output: For any index function $\ell : \mathbb{N} \to \mathbb{N}$ the probability ensemble $\{G(SREP_1(1^n, \ell(n), r), 1^n)\}_{n \in \mathbb{N}}$, where $r$ is chosen uniformly at random, is $(T_D(n), \varepsilon(n))$-pseudo random, and likewise for $\{G(SREP_2(1^n, \ell(n), r), 1^n)\}_{n \in \mathbb{N}}$.

Let $GEN(s, 1^n)$ be any $(T_D(n), \varepsilon(n))$-pseudo random generator with stretch $\kappa(n)$ to $n$.
Given the correlated generator $(SREP_1, SREP_2, G)$ it is easy to construct a different correlated generator $(SREP'_1, SREP'_2, G')$ that has this third property. The succinct representation generators $SREP'_1, SREP'_2$ receive as input $(1^n, \ell, r \| s)$, where $n, \ell, r$ are as described earlier and $s \in \{0,1\}^{\kappa(n)}$ is chosen uniformly and independently of all the other elements. The output of the succinct representation generators is $SREP'_1(1^n, \ell, r \| s) = SREP_1(1^n, \ell, r) \| s$ and $SREP'_2(1^n, \ell, r \| s) = SREP_2(1^n, \ell, r) \| s$. The output of the expansion algorithm on input $z \| s$ is defined to be $G'(z \| s, 1^n) = G(z, 1^n) \oplus GEN(s, 1^n)$. Thus, $(SREP'_1, SREP'_2, G')$ satisfies all three properties at a small overhead compared to $(SREP_1, SREP_2, G)$.

3.1.2 Statement of Main Results

We present three versions of our main theorem. All the versions are based on the same construction, which appears in Section 3.1.3. This construction makes use of a pseudo-random generator $GEN$, which is used as a building block of a correlated generator $(SREP_1, SREP_2, G)$. The three versions differ in the assumptions we make about $GEN$. In the first version we assume that $GEN$ is a standard pseudo-random generator (Definition 9). In other words, it stretches a seed by a polynomial factor and is resistant to any polynomial time distinguisher. In the second version (which we call the quantitative version) we assume that $GEN$ is a general $(T_D(n), \varepsilon(n))$-pseudo random generator, as in Definition 8. Finally, in the third version we make a stronger assumption about $GEN$, namely that even a (possibly non-uniform) family of circuits cannot distinguish between its output and the uniform ensemble with non-negligible probability.

Theorem 2: [Standard version] Let $c_{\ell en} > 1$ be a constant and let $GEN$ be a pseudo random generator with stretch $n^{1/(2c_{\ell en})}$ to $n$.
For every constant c`en > 1 there exists a correlated pseudo random generator that has the following properties: The length of the strings produced by each succinct representation generator on input (1n ; `; r) is (n) = O(n1=c`en ). The length of the random input, r, required by each succinct representation generator on input (1n ; `; r), is O(n1=(2c`en) ). Theorem 3: [Quantitative version:] Let TD : IN ?! IN , and let " : IN ?! [0; 1]. Let GEN be a TD (n); "(n){pseudo random generator with stretch (n) to n. Let t(n) = 8(1="(n))2 ln 16(n="(n)). Then there exists a correlated TD(n); "(n){pseudo random generator (SREP1 ; SREP2 ; G) such that for q s 2 n n 4 1 + 2 ln 2 log (n)+1 ? 1 (n) = 2 log ln 2 (n) TD (n) = T2t((nn)) ? (2(n) + 23 )TGEN (n) ? 2(n) and "(n) = 2n (n)"(n): The length of the strings produced by each succinct representation generator on input (1n ; `; r) is (n) (n) (n) 2(n) . The length of the random input, r, required by each succinct representation generator on input (1n ; `; r) is: (n) (n)(n) + 2(n) 2(n) . D 48 Technion - Computer Science Department - Ph.D. Thesis PHD-2001-02 - 2001 Theorem 4: [Non-uniform version:] Let TD : IN ?! IN , and let " : IN ?! [0; 1]. Let GEN be a non-uniform TD (n); "(n){pseudo random generator with stretch (n) to n. Then there exists a correlated non-uniform TD (n); "(n){pseudo random generator (SREP1 ; SREP2; G) that has the same properties as stated in Theorem 3 except for the following: TD (n) = TD (n) ? 2(n)TGEN (n) ? 2(n) ? log n and "(n) = (n)"(n): An interesting observation is that the security assurance provided by Theorem 4 is signicantly better than the one provided by Theorem 3. Or in other words, a higher level of security can be proved if GEN is resistant to (possibly non-uniform) circuits. If GEN is a non-uniform TD (n); "(n){pseudo random generator then the constructed correlated pseudo-random generator can withstand distinguishers with a longer running time than would otherwise be the case (i.e. 
if $GEN$ is a $(T_D(n), \varepsilon(n))$-pseudo random generator). The difference in running time is roughly a multiplicative factor of $2t(n)$, where $t(n)$ is as defined in Theorem 3. Furthermore, a distinguishing algorithm has a lower probability of distinguishing the output of a correlated pseudo-random generator constructed with a non-uniform $(T_D(n), \varepsilon(n))$-pseudo random generator than one constructed with a $(T_D(n), \varepsilon(n))$-pseudo random generator. The difference is a multiplicative factor of $2n$. Notice that the succinct representation length and the quality of the indistinguishability depend primarily on the ratio of $\kappa(n)$ to $n$. We begin by describing the construction of the generator, and then prove Theorems 2, 3 and 4 by a series of lemmata.

3.1.3 Construction

We construct $(SREP_1, SREP_2, G)$ using a series of "intermediate generators", which we denote by $(SREP_1^0, SREP_2^0, G^0), (SREP_1^1, SREP_2^1, G^1), \ldots$. Recall that a correlated pseudo-random generator as defined in Subsection 3.1.1 receives three parameters as input: $(1^n, \ell, r)$. Here $1^n$ is the security parameter and the output length of the expansion algorithm $G$, $\ell$ is the index at which the two output strings differ, and $r$ is the random input. In contrast, the intermediate generator $(SREP_1^q, SREP_2^q, G^q)$ receives four parameters as input: $(1^n, n_q, \ell_q, r_q)$. The security parameter remains $1^n$, but the final output length is $n_q$. This value indicates the length of the strings that are produced by $G^q$, and is in general smaller than $n$. Specifically, let $u^q_{\ell_q} = SREP_1^q(1^n, n_q, \ell_q, r_q)$ and $w^q_{\ell_q} = SREP_2^q(1^n, n_q, \ell_q, r_q)$ denote the outputs of the two intermediate generators. Then both $G^q(u^q_{\ell_q}, n_q, 1^n)$ and $G^q(w^q_{\ell_q}, n_q, 1^n)$ are of length $n_q$, and they are identical except at the $\ell_q$-th bit.

The correlated generator we construct, $(SREP_1, SREP_2, G)$, on input $(1^n, \ell, r)$, performs a single operation: invoking an appropriate intermediate generator with
parameters $(1^n, n, \ell, r)$. If the generator $GEN$ is assumed to be a standard pseudorandom generator, an intermediate generator $(SREP1^c, SREP2^c, G^c)$ is invoked for some constant $c$. If $GEN$ is assumed to be a (possibly non-uniform) $T_D(n), \varepsilon(n)$-pseudo random generator with stretch factor $\kappa(n)$ to $n$, the intermediate generator $(SREP1^{\theta(n)}, SREP2^{\theta(n)}, G^{\theta(n)})$ is invoked (where $\theta(n) \approx \sqrt{2\log\frac{n}{\kappa(n)}}$). We now define the intermediate generators by induction on $q$.

Notation 7: We denote by $\rho_q(n_q)$ the output length of $SREP1^q(1^n, n_q, \ell_q, r_q)$ (and of $SREP2^q$ on the same input).

Induction basis: On input $(1^n, n_0, \ell_0, r_0)$, the generator $(SREP1^0, SREP2^0, G^0)$ does the following. $SREP1^0(1^n, n_0, \ell_0, r_0) = r_0$ while $SREP2^0(1^n, n_0, \ell_0, r_0) = r_0 \oplus e_{\ell_0}(n_0)$. $G^0$ is simply the identity function ($G^0(z, n_0, 1^n) = z$).

Inductive step: Let $q \geq 1$, and assume that the constructions of the algorithms $SREP1^{q-1}$, $SREP2^{q-1}$ and $G^{q-1}$ are given. We construct $SREP1^q, SREP2^q, G^q$. Let $1 \leq m_q \leq n_q$ be a parameter (whose precise value depends on $n$ and $q$, and will be determined later). View the elements of the set $[n_q]$ as the coordinates of a two dimensional matrix in which there are $m_q$ rows and $n_{q-1} \triangleq \lceil n_q/m_q \rceil$ columns. Let the $\ell_q$-th element be the $(i, \ell_{q-1})$-th entry in the $m_q$-by-$n_{q-1}$ matrix ($\ell_q \in [n_q]$). The succinct representation generators $SREP1^q, SREP2^q$ use a random input $r_q$ which consists of three parts:

1. A random subset $S^q \subseteq [m_q]$, chosen uniformly ($m_q$ bits).
2. $m_q + 1$ random independent seeds for the generator $GEN$, with security parameter $n$, chosen uniformly: $s_1, \ldots, s_{i-1}, s_i^1, s_i^2, s_{i+1}, \ldots, s_{m_q}$. The domain of these seeds is $\{0,1\}^{\kappa(n)}$, so overall $(m_q + 1)\kappa(n)$ bits are required to represent this part.
3. An appropriate random input $r_{q-1}$ for $SREP1^{q-1}, SREP2^{q-1}$.

We invoke $SREP1^{q-1}(1^n, n_{q-1}, \ell_{q-1}, r_{q-1})$ and $SREP2^{q-1}(1^n, n_{q-1}, \ell_{q-1}, r_{q-1})$ to produce the two correlated succinct representations $u^{q-1}_{\ell_{q-1}}, w^{q-1}_{\ell_{q-1}}$ (of length $\rho_{q-1}(n_{q-1})$) respectively.
Recall that by assumption $G^{q-1}(u^{q-1}_{\ell_{q-1}}, n_{q-1}, 1^n) \oplus G^{q-1}(w^{q-1}_{\ell_{q-1}}, n_{q-1}, 1^n) = e_{\ell_{q-1}}(n_{q-1})$ is a unit vector of length $n_{q-1}$. By truncating extra bits, we can view $GEN$ as expanding strings of length $\kappa(n)$ to strings of length $\rho_{q-1}(n_{q-1})$. We define the correction words $cw_1, cw_2$:

$$cw_1 = \begin{cases} u^{q-1}_{\ell_{q-1}} \oplus GEN(s_i^1, 1^n) & \text{if } i \in S^q \\ w^{q-1}_{\ell_{q-1}} \oplus GEN(s_i^2, 1^n) & \text{if } i \notin S^q \end{cases}$$

$$cw_2 = \begin{cases} w^{q-1}_{\ell_{q-1}} \oplus GEN(s_i^2, 1^n) & \text{if } i \in S^q \\ u^{q-1}_{\ell_{q-1}} \oplus GEN(s_i^1, 1^n) & \text{if } i \notin S^q \end{cases}$$

The output $u^q_{\ell_q}, w^q_{\ell_q}$ of the two succinct representation generators $SREP1^q, SREP2^q$ is defined as follows:

$$u^q_{\ell_q} \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_{m_q} \| cw_1 \| cw_2 \| S^q$$

$$w^q_{\ell_q} \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^2 \| s_{i+1} \| \ldots \| s_{m_q} \| cw_1 \| cw_2 \| S^q \oplus \{i\}$$

The length of each succinct representation is

$$\rho_q(n_q) \triangleq m_q(\kappa(n) + 1) + 2\rho_{q-1}(n_{q-1}). \quad (3.1)$$

We now define the expansion algorithm $G^q$. A succinct representation $z$ of the appropriate length $\rho_q(n_q)$ is parsed as

$$z = s_1 \| \ldots \| s_{m_q} \| cw_1 \| cw_2 \| T.$$

The first part consists of $m_q$ seeds $s_1, \ldots, s_{m_q}$, all in $\{0,1\}^{\kappa(n)}$, that can be expanded by $GEN$ to strings of length $\rho_{q-1}(n_{q-1})$ each (footnote 2). The last part in the succinct representation is a string of length $m_q$ which we interpret as a subset $T \subseteq [m_q]$ of indices. The two correction words are simply strings of length $\rho_{q-1}(n_{q-1})$. We now define, for $h = 1, \ldots, m_q$,

$$v_h = \begin{cases} cw_1 \oplus GEN(s_h, 1^n) & \text{if } h \in T \\ cw_2 \oplus GEN(s_h, 1^n) & \text{if } h \notin T \end{cases}$$

On input $z, n_q, 1^n$, the output of $G^q$ is the string

$$G^q(z, n_q, 1^n) = G^{q-1}(v_1, n_{q-1}, 1^n) \| \ldots \| G^{q-1}(v_{m_q}, n_{q-1}, 1^n).$$

3.1.4 Proof of correlation

In this subsection we show that the output of $G$ is indeed a pair of $n$ bit strings that differ only in the $\ell$-th bit. For the sake of brevity we slightly abuse the notation in the next lemma, and use $G^q(z)$ to denote $G^q(z, n_q, 1^n)$ (and similarly $G^{q-1}(z)$ to denote $G^{q-1}(z, n_{q-1}, 1^n)$).
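The recursive construction of Subsection 3.1.3 can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not part of the thesis: $GEN$ is replaced by a SHAKE-256 based stand-in, the seed length `KAPPA` is a toy value, indices are 0-based, and the row counts `ms = [m_1, ..., m_q]` are chosen so that $n_q = m_q \cdot n_{q-1}$ exactly (no rounding or truncation). All function names (`gen`, `srep`, `expand`, `rep_len`, `out_len`) are hypothetical.

```python
import hashlib
import secrets

KAPPA = 16  # toy seed length in bits; an assumption, far too short for real security


def gen(seed, length):
    """Toy stand-in for GEN: stretch a list of seed bits to `length` bits via SHAKE-256."""
    stream = hashlib.shake_256(bytes(seed)).digest((length + 7) // 8)
    return [(stream[j // 8] >> (j % 8)) & 1 for j in range(length)]


def rand_bits(k):
    return [secrets.randbits(1) for _ in range(k)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def rep_len(n0, ms):
    """rho_0 = n0 and rho_q = m_q * (KAPPA + 1) + 2 * rho_{q-1}, as in Equation 3.1."""
    length = n0
    for m in ms:
        length = m * (KAPPA + 1) + 2 * length
    return length


def out_len(n0, ms):
    """n_q = n0 * m_1 * ... * m_q under the exact-divisibility assumption."""
    n = n0
    for m in ms:
        n *= m
    return n


def srep(n0, ms, ell):
    """Return the pair (u, w) at level q = len(ms); ell is a 0-based index."""
    if not ms:  # induction basis: u = r and w = r xor e_ell
        u = rand_bits(n0)
        w = u.copy()
        w[ell] ^= 1
        return u, w
    m, lower = ms[-1], ms[:-1]
    i, ell_prev = divmod(ell, out_len(n0, lower))  # ell is entry (i, ell_prev)
    u_prev, w_prev = srep(n0, lower, ell_prev)
    seeds = [rand_bits(KAPPA) for _ in range(m)]
    s1, s2 = rand_bits(KAPPA), rand_bits(KAPPA)  # the two seeds used for row i
    S = rand_bits(m)  # characteristic vector of the random subset
    a = xor(u_prev, gen(s1, len(u_prev)))
    b = xor(w_prev, gen(s2, len(w_prev)))
    cw1, cw2 = (a, b) if S[i] else (b, a)
    S_flipped = S.copy()
    S_flipped[i] ^= 1  # the subset S xor {i}

    def pack(row_seed, subset):
        bits = []
        for h in range(m):
            bits += row_seed if h == i else seeds[h]
        return bits + cw1 + cw2 + subset

    return pack(s1, S), pack(s2, S_flipped)


def expand(z, n0, ms):
    """Expansion algorithm: turn a representation into out_len(n0, ms) bits."""
    if not ms:
        return list(z)
    m, lower = ms[-1], ms[:-1]
    r_prev = rep_len(n0, lower)
    off = m * KAPPA
    cw1 = z[off:off + r_prev]
    cw2 = z[off + r_prev:off + 2 * r_prev]
    T = z[off + 2 * r_prev:]
    out = []
    for h in range(m):
        seed = z[h * KAPPA:(h + 1) * KAPPA]
        cw = cw1 if T[h] else cw2
        out += expand(xor(cw, gen(seed, r_prev)), n0, lower)
    return out
```

For example, with `n0, ms = 4, [3, 2]` the expanded strings have 24 bits; calling `u, w = srep(n0, ms, ell)` and XOR-ing `expand(u, n0, ms)` with `expand(w, n0, ms)` should leave a single 1 at position `ell`, which is the property established in this subsection.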
Lemma 2: For every $q$ ($q = 0, 1, \ldots$), $n$, $n_q$, $\ell_q$ and $r_q$, if $u^q_{\ell_q} = SREP1^q(1^n, n_q, \ell_q, r_q)$ and $w^q_{\ell_q} = SREP2^q(1^n, n_q, \ell_q, r_q)$, then

$$G^q(u^q_{\ell_q}) \oplus G^q(w^q_{\ell_q}) = e_{\ell_q}(n_q).$$

Footnote 2: They are actually expanded by $GEN$ to length $n$, but we only need the first $\rho_{q-1}(n_{q-1})$ bits.

Proof: We prove the claim by induction on $q$. For $q = 0$, by construction

$$G^0(u^0_{\ell_0}) \oplus G^0(w^0_{\ell_0}) = u^0_{\ell_0} \oplus w^0_{\ell_0} = r_0 \oplus (r_0 \oplus e_{\ell_0}(n_0)) = e_{\ell_0}(n_0).$$

We now assume that for every $j \in [n_{q-1}]$ we have $G^{q-1}(u^{q-1}_j) \oplus G^{q-1}(w^{q-1}_j) = e_j(n_{q-1})$, and prove the hypothesis for $G^q$. Let $\ell_q \in [n_q]$, and assume that it is represented in the $m_q \times n_{q-1}$ matrix as $(i, \ell_{q-1})$. We denote the output of $G^q$ in the following way: $G^q(u^q_{\ell_q}) = U_1 \| \ldots \| U_{m_q}$ and $G^q(w^q_{\ell_q}) = W_1 \| \ldots \| W_{m_q}$, where $U_1, \ldots, U_{m_q}, W_1, \ldots, W_{m_q}$ are all in $\{0,1\}^{n_{q-1}}$. Let $h$ be a value in the range $1, \ldots, m_q$. By the definition of $G^q$ there are two strings $y_h, z_h \in \{0,1\}^{\rho_{q-1}(n_{q-1})}$ such that $U_h = G^{q-1}(y_h)$ and $W_h = G^{q-1}(z_h)$ (we referred to both $y_h$ and $z_h$ as $v_h$ in Section 3.1.3).

We begin by dealing with the case $h \neq i$ and show that $y_h = z_h$. The last element of $u^q_{\ell_q}$ is the set $S^q \subseteq [m_q]$ and the last element of $w^q_{\ell_q}$ is the set $S^q \oplus \{i\} \subseteq [m_q]$. Since $h \neq i$, $h \in S^q$ if and only if $h \in S^q \oplus \{i\}$. Therefore $y_h = cw_1 \oplus GEN(s_h, 1^n) \iff z_h = cw_1 \oplus GEN(s_h, 1^n)$ and $y_h = cw_2 \oplus GEN(s_h, 1^n) \iff z_h = cw_2 \oplus GEN(s_h, 1^n)$. Hence, $y_h = z_h$ and as a consequence $U_h = W_h$.

In the second case $h = i$, and $i$ is an element of exactly one of the sets $S^q$ or $S^q \oplus \{i\}$. If $i \in S^q$ then $y_i = cw_1 \oplus GEN(s_i^1, 1^n) = u^{q-1}_{\ell_{q-1}}$ and $z_i = cw_2 \oplus GEN(s_i^2, 1^n) = w^{q-1}_{\ell_{q-1}}$. If $i \notin S^q$ the roles of the two correction words are exchanged, so that $y_i = cw_2 \oplus GEN(s_i^1, 1^n) = u^{q-1}_{\ell_{q-1}}$ and $z_i = cw_1 \oplus GEN(s_i^2, 1^n) = w^{q-1}_{\ell_{q-1}}$. For both possibilities $y_i = u^{q-1}_{\ell_{q-1}}$ and $z_i = w^{q-1}_{\ell_{q-1}}$. By assumption $G^{q-1}(u^{q-1}_{\ell_{q-1}}) \oplus G^{q-1}(w^{q-1}_{\ell_{q-1}}) = e_{\ell_{q-1}}(n_{q-1})$.
If we denote by $o$ the all zero vector of length $n_{q-1}$ we have:

$$G^q(u^q_{\ell_q}) \oplus G^q(w^q_{\ell_q}) = (U_1 \| \ldots \| U_{m_q}) \oplus (W_1 \| \ldots \| W_{m_q}) = G^{q-1}(y_1) \oplus G^{q-1}(z_1) \| \ldots \| G^{q-1}(y_{m_q}) \oplus G^{q-1}(z_{m_q}) = o \| \ldots \| o \| e_{\ell_{q-1}}(n_{q-1}) \| o \| \ldots \| o = e_{\ell_q}(n_q).$$

3.2 Lengths of input and output

In this section we analyze the stretching properties of the succinct representation generators, $SREP1$ and $SREP2$. For every $q$, the length of the outputs of the succinct representation generators $SREP1^q, SREP2^q$ depends on the value of the parameter $m_q$. Our aim is to choose a value for each $m_q$ so that the overall succinct representation length $\rho(n)$ is minimized. This is important in our application because $\rho(n)$ is the communication cost of our protocol, and that is the parameter we wish to minimize. After setting $m_q$ we discuss the number of random bits that the succinct representation generators require as input. This resource is not as important in our context as the length of the output. We analyze it in order to obtain a clearer picture of the properties of our construction.

Lemma 3: For every $n$, $\kappa(n)$ and every $q$ ($q = 0, 1, 2, \ldots$) there is a choice of $m_q$ such that the succinct representations produced by $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$ are of length

$$\rho_q(n_q) = 2^{q/2}(q+1)(\kappa(n)+1)^{\frac{q}{q+1}}\, n_q^{\frac{1}{q+1}}.$$

Proof: We begin by proving that for every $q$ ($q = 0, 1, 2, \ldots$) the minimal succinct representation length is of the form

$$\rho_q(n_q) = c_q\,(\kappa(n)+1)^{\frac{q}{q+1}}\, n_q^{\frac{1}{q+1}} \quad (3.2)$$

where $c_q$ is a constant whose value depends only on $q$ and not on $\kappa(n)$ or $n_q$. The proof is by induction on $q$. For $q = 0$, by construction, $\rho_0(n_0) = n_0$, so by setting $c_0 = 1$ we have the right form $\rho_0(n_0) = 1 \cdot n_0^{1}\,(\kappa(n)+1)^{0}$.
By Equation 3.1 the relation between $\rho_q(n_q)$, $\rho_{q-1}(n_{q-1})$ and $m_q$ is given by:

$$\rho_q(n_q) = m_q(\kappa(n)+1) + 2\rho_{q-1}(n_{q-1}).$$

Since $\rho_{q-1}(n_{q-1})$ is positive valued and appears in the recursive equation with a plus sign, to minimize $\rho_q(n_q)$ we need to minimize $\rho_{q-1}(n_{q-1})$ as well. Assume that the claim is correct for $q-1$. We look for a value of $m_q$ which minimizes $\rho_q(n_q)$. By the induction hypothesis, we have

$$\rho_q(n_q) = m_q(\kappa(n)+1) + 2c_{q-1}(\kappa(n)+1)^{\frac{q-1}{q}}\,(n_{q-1})^{\frac{1}{q}}.$$

We can view $\rho_q(n_q)$ as a function of $m_q$ since $n_{q-1} = n_q/m_q$. By computing its derivative with respect to $m_q$, we obtain that $\rho_q(n_q)$ is minimized for

$$m_q = \left(\frac{2c_{q-1}}{q}\right)^{\frac{q}{q+1}} \left(\frac{n_q}{\kappa(n)+1}\right)^{\frac{1}{q+1}}. \quad (3.3)$$

Substituting, we get

$$\rho_q(n_q) = (2c_{q-1})^{\frac{q}{q+1}}\left(q^{-\frac{q}{q+1}} + q^{\frac{1}{q+1}}\right)(\kappa(n)+1)^{\frac{q}{q+1}}\, n_q^{\frac{1}{q+1}}.$$

We obtain from this expression that indeed Equation 3.2 holds for

$$c_q = (2c_{q-1})^{\frac{q}{q+1}}\left(q^{-\frac{q}{q+1}} + q^{\frac{1}{q+1}}\right).$$

We complete the proof by showing inductively that $c_q = 2^{q/2}(q+1)$. For $q = 0$, we indeed have that $c_q = 1$. Assume now that the hypothesis of the induction is true for $q-1$. Then

$$c_q = (2c_{q-1})^{\frac{q}{q+1}}\left(q^{-\frac{q}{q+1}} + q^{\frac{1}{q+1}}\right) = \left(2^{\frac{q+1}{2}}\, q\right)^{\frac{q}{q+1}}\left(q^{-\frac{q}{q+1}} + q^{\frac{1}{q+1}}\right) = 2^{q/2}(1+q).$$

Corollary 5: For any constant $q$, $\rho_q(n_q) = O(n_q^{1/(q+1)}\kappa(n))$. Therefore, if the correlated generator $SREP1, SREP2, G$ invokes the $q$-th intermediate generator, for some constant $q$, then the output length of the succinct representation generators is $\rho(n) = O(n^{1/(q+1)}\kappa(n))$.

Suppose we set a large value for $n$ and range over $q$ ($q = 0, 1, 2, \ldots$). The sequence of values $\rho_q(n_q) = 2^{q/2}(q+1)(\kappa(n)+1)^{\frac{q}{q+1}}\, n^{\frac{1}{q+1}}$ is monotonically decreasing for small values of $q$. However, at some point the term $2^{q/2}$ takes over, causing $\rho_q(n_q)$ to increase. We wish to set $\theta(n)$ to be the optimal value in the sequence $q = 0, 1, 2, \ldots$, in other words the value of $q$ for which $\rho_q(n_q)$ is minimal.
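The behaviour described above can be checked numerically. The following is a rough sketch, not part of the original analysis: it evaluates the closed form of Lemma 3, re-derives it from the recursion of Equation 3.1 using the real-valued row count of Equation 3.3 (rounding of $m_q$ is ignored), and scans $q$ for the shortest representation. The function names and the sample values of $n$ and $\kappa$ are assumptions chosen for illustration.

```python
import math


def rho(q, n, kappa):
    """Closed form of Lemma 3 for the representation length at level q."""
    return 2 ** (q / 2) * (q + 1) * (kappa + 1) ** (q / (q + 1)) * n ** (1 / (q + 1))


def m_opt(q, n, kappa):
    """Row count of Equation 3.3 after substituting c_{q-1} = 2^{(q-1)/2} * q."""
    return 2 ** (q / 2) * (n / (kappa + 1)) ** (1 / (q + 1))


def rho_by_recursion(q, n, kappa):
    """Equation 3.1 with the real-valued m_opt at every level (rounding ignored)."""
    if q == 0:
        return float(n)
    m = m_opt(q, n, kappa)
    return m * (kappa + 1) + 2 * rho_by_recursion(q - 1, n / m, kappa)


def best_level(n, kappa, q_max=64):
    """Level q in 0..q_max whose closed-form representation length is smallest."""
    return min(range(q_max + 1), key=lambda q: rho(q, n, kappa))


if __name__ == "__main__":
    n, kappa = 2 ** 30, 128  # sample values, chosen only for illustration
    for q in range(9):
        print(q, round(rho(q, n, kappa)), round(rho_by_recursion(q, n, kappa)))
    print("shortest representation at q =", best_level(n, kappa))
```

Printing the table for a few values of $n$ and $\kappa$ shows the sequence first dropping quickly and then growing once the factor $2^{q/2}$ dominates.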
By computing the derivative of $\rho_q(n_q)$ as a function of $q$ we obtain Corollary 6. We omit the arithmetic.

Corollary 6: Let $(SREP1^{\theta(n)}, SREP2^{\theta(n)}, G^{\theta(n)})$ be the generator in the series of intermediate generators, $(SREP1^0, SREP2^0, G^0)$, $(SREP1^1, SREP2^1, G^1)$, $\ldots$ for which $\rho_q(n_q)$ is minimal. Then,

$$\theta(n) = \frac{\sqrt{1 + 2\ln^2 2\,\log\frac{n}{\kappa(n)+1}} - 1}{\ln 2} \approx \sqrt{2\log\frac{n}{\kappa(n)}},$$

and the length of a string produced by the succinct representation generators on input $(1^n, \ell, r)$ equals

$$\rho(n) \approx 2^{\theta(n)}\,\theta(n)\,\kappa(n).$$

In the next lemma we investigate the size of the random input $r_q$ required by $SREP1^q$ and $SREP2^q$. After setting the values of $m_q$ ($q = 0, 1, 2, \ldots$) the stretching factor (footnote 3) of $SREP1^q$ and $SREP2^q$ is determined. By Equation 3.3 we have that

$$m_q = 2^{q/2}\left(\frac{n_q}{\kappa(n)+1}\right)^{\frac{1}{q+1}} \quad (3.4)$$

for every $q$.

Lemma 4: Let $m_q = 2^{q/2}(n_q/(\kappa(n)+1))^{1/(q+1)}$ and let the size of the random input, $r_q$, required by $SREP1^q$ and $SREP2^q$ (together) be denoted by $\mu_q(n_q)$. Then,

$$\mu_q(n_q) = q\kappa(n) + n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-q/2}\,(2^{q+1} - 1).$$

Footnote 3: That is the ratio between the length of the output of the succinct representation generators and the length of the random input.

Proof: We prove the claim by induction on $q$. For $q = 0$ the random input, $r_0$, is an $n_0$ bit long binary string as the first generator returns $r_0$, while the second returns $r_0 \oplus e_{\ell_0}(n_0)$. Now we assume the claim for $q-1$ and prove it for $q$. The random input $r_q$ includes a random subset of $[m_q]$ ($m_q$ bits), $m_q + 1$ seeds for $GEN$ ($(m_q+1)\kappa(n)$ bits) and a random input for $SREP1^{q-1}$ and $SREP2^{q-1}$ ($\mu_{q-1}(n_{q-1})$ bits). Therefore,

$$\mu_q(n_q) = m_q(\kappa(n)+1) + \kappa(n) + \mu_{q-1}(n_{q-1}).$$

Substituting for $m_q$ the value $2^{q/2}(n_q/(\kappa(n)+1))^{1/(q+1)}$ we have

$$\mu_q(n_q) = 2^{q/2}\, n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}} + \kappa(n) + \mu_{q-1}\!\left(\frac{n_q^{\frac{q}{q+1}}(\kappa(n)+1)^{\frac{1}{q+1}}}{2^{q/2}}\right)$$

$$= 2^{q/2}\, n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}} + q\kappa(n) + 2^{-1/2}\, n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-\frac{q-1}{2}}(2^q - 1)$$

$= q\kappa(n) + n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-q/2}$ $(2^{q+1} -$
$1)$.

Corollary 7: For any constant $q$ the random input $r$ that $SREP1^q, SREP2^q, G^q$ receives must be of size $\mu_q(n_q) = O(n_q^{1/(q+1)}\kappa(n))$. Therefore, if the correlated generator $SREP1, SREP2, G$ invokes the $q$-th intermediate generator, for some constant $q$, then the random input required by the succinct representation generators is $\mu(n) = O(n^{1/(q+1)}\kappa(n))$.

If we again set $SREP1, SREP2, G$ to invoke that intermediate generator for which the succinct representations are of minimal length we have that:

Corollary 8: The random input $r$ required by the correlated pseudo-random generator $SREP1, SREP2, G$ is of equal length to that required by $SREP1^{\theta(n)}, SREP2^{\theta(n)}, G^{\theta(n)}$:

$$\mu(n) = \theta(n)\kappa(n) + n^{\frac{1}{\theta(n)+1}}(\kappa(n)+1)^{\frac{\theta(n)}{\theta(n)+1}}\, 2^{-\theta(n)/2}\,(2^{\theta(n)+1} - 1) \approx \theta(n)\kappa(n) + 2\kappa(n)2^{\theta(n)}.$$

Corollary 9: $SREP1$ and $SREP2$ expand the random input, which is of length $\mu(n)$, to the succinct representation, which is of length $\rho(n)$. The ratio between the output length and the random input length is

$$\approx \frac{\theta(n)+1}{2} \approx \sqrt{\frac{1}{2}\log\frac{n}{\kappa(n)}}.$$

3.3 Proof of index indistinguishability

In this section we analyze the index indistinguishability property (Definition 20) of the construction described in Subsection 3.1.3. The current section includes several technical propositions, which we divide into a few sub-sections in order to enhance readability. The organization of the section is as follows. In Subsection 3.3.1 we prove several useful lemmata, while Subsection 3.3.2 is comprised of some helpful notation. In Subsection 3.3.3 we prove index indistinguishability under the assumption that $GEN$ is a standard pseudo-random generator (Definition 9). In Subsection 3.3.4 we base the proof on $GEN$ being a general $T_D(n), \varepsilon(n)$-pseudo random generator (Definition 8).

3.3.1 Three useful statements

The first two lemmata we present have appeared previously in the literature, and are presented here for the sake of completeness.

Lemma 5: Let $\ell ength : \mathbb{N} \to$
$\mathbb{N}$ be a monotonically non-decreasing function, satisfying $\ell ength(n) \leq n$. Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. Let $A = \{A_n\}_{n \in \mathbb{N}}$, $B = \{B_n\}_{n \in \mathbb{N}}$ be two probability ensembles such that both $A_n$ and $B_n$ range over $\{0,1\}^{\ell ength(n)}$. Let $M$ be an algorithm mapping strings to strings, so that if $a$ and $b$ are strings of the same length, then $M(a)$ and $M(b)$ are strings of the same length. Let $A' = M(A)$, $B' = M(B)$ be the ensembles induced by applying $M$ to $A, B$, respectively. Let $T_M(n)$ be the running time of $M$ on $\ell ength(n)$ bit strings.

If $A$ and $B$ are $T_D(n), \varepsilon(n)$-indistinguishable then the ensembles $A', B'$ are $T_D(n) - T_M(n), \varepsilon(n)$-indistinguishable.

If $A$ and $B$ are computationally indistinguishable and $T_M(n)$ is polynomial in $n$ then $A', B'$ are computationally indistinguishable.

Proof: For every algorithm $D'$ that distinguishes between the ensembles $A'$ and $B'$ we define a corresponding algorithm $D$ that distinguishes between $A$ and $B$. On input strings $z$ and $1^n$ (see Definition 4), $D$ computes $M(z)$, invokes $D'$ on input $(M(z), 1^n)$ and returns the output of $D'$.

For every $D'$ whose running time is $T_D(n) - T_M(n)$, the running time of the corresponding algorithm $D$ is $T_D(n)$. By assumption $A$ and $B$ are $T_D(n), \varepsilon(n)$-indistinguishable and thus there exists an $n_0$ such that for every $n \geq n_0$, the probability that $D$ distinguishes between $A_n$ and $B_n$ is smaller than $\varepsilon(n)$. Therefore, for every $n \geq n_0$, the probability that $D'$ distinguishes between $A'_n$ and $B'_n$ is smaller than $\varepsilon(n)$.

For every $D'$ whose running time on strings of length $n$ is polynomial in $n$, if $M$ is a polynomial time mapping, then the corresponding algorithm $D$ runs in polynomial time. If $A$ and $B$ are computationally indistinguishable then for any constant $c$ there exists an $n_0$ such that for every $n \geq n_0$, $D$ distinguishes between $A_n$ and $B_n$ with probability less than $1/n^c$.
Therefore, for every $n \geq n_0$, $D'$ distinguishes between $A'_n$ and $B'_n$ with probability less than $1/n^c$. This in turn implies that $A', B'$ are computationally indistinguishable.

Lemma 6: Let $\ell ength : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function, satisfying $\ell ength(n) \leq n$. Let $T_D, T'_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon, \varepsilon' : \mathbb{N} \to [0,1]$. Let $A = \{A_n\}_{n \in \mathbb{N}}$, $B = \{B_n\}_{n \in \mathbb{N}}$, $C = \{C_n\}_{n \in \mathbb{N}}$ be three probability ensembles such that $A_n$, $B_n$ and $C_n$ range over $\{0,1\}^{\ell ength(n)}$.

Assume that $A, B$ are $T_D(n), \varepsilon(n)$-indistinguishable while $B, C$ are $T'_D(n), \varepsilon'(n)$-indistinguishable. Then $A, C$ are $\min\{T_D(n), T'_D(n)\}, \varepsilon(n) + \varepsilon'(n)$-indistinguishable.

Assume $A, B$ are computationally indistinguishable, and $B, C$ are computationally indistinguishable. Then $A, C$ are computationally indistinguishable.

Proof: Let $D$ be a distinguisher between the ensembles $A$ and $C$, whose running time is $\min\{T_D(n), T'_D(n)\}$. That same algorithm $D$ can be used to distinguish between any two ensembles, in particular between $A$ and $B$ and between $B$ and $C$. We use the notation $p_Y(n) = \Pr[D(Y_n, 1^n) = 1]$ for each of the three ensembles $Y = A, B, C$.

If $A, B$ are $T_D(n), \varepsilon(n)$-indistinguishable, while $B, C$ are $T'_D(n), \varepsilon'(n)$-indistinguishable, there exist $n_1$ and $n_2$ such that for any $n \geq n_1$ we have $|p_A(n) - p_B(n)| < \varepsilon(n)$, and for any $n \geq n_2$ we have $|p_B(n) - p_C(n)| < \varepsilon'(n)$. Let $n_0 = \max\{n_1, n_2\}$. For any $n > n_0$ we have

$$|p_A(n) - p_C(n)| \leq |p_A(n) - p_B(n)| + |p_B(n) - p_C(n)| < \varepsilon(n) + \varepsilon'(n).$$

Assume that $A, B$ are computationally indistinguishable, $B, C$ are computationally indistinguishable and the running time of $D$ is polynomial in $n$. Therefore, for every constant $c$ there exist $n_1$ and $n_2$ such that for any $n \geq n_1$ we have $|p_A(n) - p_B(n)| < 1/2n^c$ and for any $n \geq n_2$ we have $|p_B(n) - p_C(n)| < 1/2n^c$. Let $n_0 = \max\{n_1, n_2\}$. For any $n > n_0$ we have $|p_A(n) - p_C(n)| < 1/n^c$, and thus $A, C$ are computationally indistinguishable.
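The triangle-inequality step in the proof of Lemma 6 can be illustrated on a toy example. The sketch below is only an illustration under stated assumptions: the three distributions `A`, `B`, `C` over 2-bit strings are made up, and a distinguisher is modelled as the set of strings on which it outputs 1, so that all 16 deterministic tests can be enumerated exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

STRINGS = ["".join(bits) for bits in product("01", repeat=2)]


def accept_prob(accept_set, dist):
    """Probability that a test accepting exactly `accept_set` outputs 1 on `dist`."""
    return sum(dist[x] for x in accept_set)


def advantage(accept_set, dist_a, dist_b):
    return abs(accept_prob(accept_set, dist_a) - accept_prob(accept_set, dist_b))


def all_distinguishers():
    """Every deterministic test on 2-bit strings, given as its set of accepted strings."""
    for mask in product([False, True], repeat=len(STRINGS)):
        yield {x for x, keep in zip(STRINGS, mask) if keep}


def max_advantage(dist_a, dist_b):
    return max(advantage(d, dist_a, dist_b) for d in all_distinguishers())


F = Fraction
# Hypothetical distributions, chosen only for illustration.
A = {"00": F(1, 4), "01": F(1, 4), "10": F(1, 4), "11": F(1, 4)}
B = {"00": F(3, 8), "01": F(1, 8), "10": F(1, 4), "11": F(1, 4)}
C = {"00": F(1, 2), "01": F(1, 8), "10": F(1, 8), "11": F(1, 4)}

if __name__ == "__main__":
    print(max_advantage(A, B), max_advantage(B, C), max_advantage(A, C))
```

For every single test the advantage between `A` and `C` is bounded by the sum of its advantages on the pairs (`A`, `B`) and (`B`, `C`), which is exactly the inequality used above with $p_A$, $p_B$ and $p_C$.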
We need one more lemma in order to precisely analyze the running time bounds on algorithms that try to distinguish the output of $SREP1^q$ (or $SREP2^q$) from the uniform distribution.

Lemma 7: The computational complexity of producing the pair of succinct representations $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$ is no more than $2\rho_q(n_q) + 2qT_{GEN}(n)$ steps.

Proof: Let $f(q, n, n_q)$ denote the time needed to generate the pair of succinct representations $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$. Note that $f$ does not depend on the values of $\ell_q$ and $r_q$. Generating the two succinct representations can be divided into several stages:

Sampling random elements: $s_1, \ldots, s_{m_q}$ and $S^q$ ($\rho_q(n_q) - 2\rho_{q-1}(n_{q-1}) + \kappa(n)$ steps).

Generating the strings $u^{q-1}_{\ell_{q-1}}, w^{q-1}_{\ell_{q-1}}$ ($f(q-1, n, n_{q-1})$ steps).

Applying $GEN$ to the two seeds $s_i^1, s_i^2$ ($2T_{GEN}(n)$ steps).

Finally computing $cw_1$, $cw_2$ and $S^q \oplus \{i\}$ (no more than $\rho_q(n_q) - \kappa(n)$ steps).

Therefore, the following recursive inequality holds:

$$f(q, n, n_q) \leq 2(\rho_q(n_q) - \rho_{q-1}(n_{q-1}) + T_{GEN}(n)) + f(q-1, n, n_{q-1}).$$

Solving for $f$ as a function of $q$ verifies the statement of the lemma.

3.3.2 Notation used in the proof

Notation 8: Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function and let $\ell ength : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function, satisfying $\ell ength(n) \leq n$. Let $GEN$ be a pseudo-random generator stretching seeds of length $\kappa(n)$ to strings of length $n$, let $SREP1, SREP2, G$ be the correlated generator and $SREP1^q, SREP2^q, G^q$ ($q = 0, 1, 2, \ldots$) be the intermediate generators as constructed in Subsection 3.1.3. We denote by $UNI_{\ell ength(n), \ell(n)}$ the distribution induced by binary strings of the form $r \| \ell(n)$ where $r \in \{0,1\}^{\ell ength(n)}$ is chosen uniformly at random. We ... 3.1.3.
If the input of the correlated generator is $(1^n, \ell(n), r)$ we denote the input of the $q$-th intermediate generator (which is completely determined given the input of the correlated generator) by $(1^n, n_q, \ell_q(n), r_q)$. For every $n \in \mathbb{N}$ we denote by $A^q_{n,\ell(n)}$ the distribution induced by $u^q_{\ell_q(n)} = SREP1^q(1^n, n_q, \ell_q(n), r_q)$. We denote by $B^q_{n,\ell(n)}$ the distribution $B^q_{n,\ell(n)} \triangleq A^q_{n,\ell(n)} \| \ell(n)$. The corresponding ensembles given $q$ and $\ell$ are denoted by $A^q_\ell = \{A^q_{n,\ell(n)}\}_{n \in \mathbb{N}}$ and by $B^q_\ell = \{B^q_{n,\ell(n)}\}_{n \in \mathbb{N}}$. We analogously define the ensembles $\bar{A}^q_\ell, \bar{B}^q_\ell$ induced by the outputs of $SREP2^q$.

Notation 9: Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. Denote by $GEN_{\ell ength(n), \ell(n)}$ the distribution induced by binary strings of the form $GEN(s, 1^n) \| \ell(n)$. We construct $GEN(s, 1^n)$ as follows. First, a seed $s$ is chosen uniformly at random from $\{0,1\}^{\kappa(n)}$. Then, $GEN$ is applied to the input $(s, 1^n)$ and used to compute an $n$ bit output. Finally, the first $\ell ength(n)$ bits of the output are taken to be $GEN(s, 1^n)$. The corresponding ensembles are denoted by

$$UNI_{\ell ength, \ell} \triangleq \{UNI_{\ell ength(n), \ell(n)}\}_{n \in \mathbb{N}}, \qquad GEN_{\ell ength, \ell} \triangleq \{GEN_{\ell ength(n), \ell(n)}\}_{n \in \mathbb{N}}.$$

Notation 10: Let $\theta : \mathbb{N} \to \mathbb{N}$ be any function and let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. We define the five ensembles $A^\theta_\ell$, $\bar{A}^\theta_\ell$, $B^\theta_\ell$, $\bar{B}^\theta_\ell$, $UNI_{\rho_\theta, \ell}$ by

$$A^\theta_\ell \triangleq \{A^{\theta(n)}_{n,\ell(n)}\}_{n \in \mathbb{N}}, \quad B^\theta_\ell \triangleq \{B^{\theta(n)}_{n,\ell(n)}\}_{n \in \mathbb{N}}, \quad UNI_{\rho_\theta, \ell} \triangleq \{UNI_{\rho_{\theta(n)}(n), \ell(n)}\}_{n \in \mathbb{N}},$$

$$\bar{A}^\theta_\ell \triangleq \{\bar{A}^{\theta(n)}_{n,\ell(n)}\}_{n \in \mathbb{N}}, \quad \bar{B}^\theta_\ell \triangleq \{\bar{B}^{\theta(n)}_{n,\ell(n)}\}_{n \in \mathbb{N}},$$

where the distributions are as described in Notation 8 and Notation 9. We denote by $T_\theta(n)$ the time required to compute $\theta(n)$.

3.3.3 Polynomial indistinguishability

In this subsection we assume that $GEN$ is a standard pseudo-random generator with stretch $n^{1/(2c_{\ell en})}$ to $n$, as in Definition 9. Furthermore, we assume that $GEN$ takes as input seeds of length $\kappa(n) = n^{1/(2c_{\ell en})}$ and a security parameter $1^n$ and outputs a string of length $n$.
We use $GEN$ to construct the correlated pseudo-random generator $SREP1, SREP2, G$ as in Subsection 3.1.3. Our final goal is to prove that if $GEN$ is a pseudo-random generator (that is, its output is indistinguishable from a random string) then the output of the succinct representation generators we defined is also indistinguishable from a random string. In the following lemma we take the first step towards this goal. For a given index function $\ell$ we examine strings that have $\ell(n)$ concatenated to them. We argue that if the output of $GEN$, concatenated to $\ell(n)$, is indistinguishable from a random string, concatenated to $\ell(n)$, then the output of a succinct representation generator with a concatenated $\ell(n)$ is also indistinguishable from a random string with a concatenated $\ell(n)$. Note that proving the above claim is not sufficient to prove that the output of the succinct representation generators is pseudo-random. The remaining gap is dealt with in Lemma 9, which shows that the output of $GEN$ concatenated to $\ell(n)$ is indistinguishable from a random string concatenated to $\ell(n)$.

Lemma 8: Let $\ell ength(n) = n$ for every $n \in \mathbb{N}$ and let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. If the ensemble $GEN_{\ell ength, \ell}$ is computationally indistinguishable from the ensemble $UNI_{\ell ength, \ell}$ then for every constant $q$ ($q = 0, 1, 2, \ldots$) the ensembles $B^q_\ell, \bar{B}^q_\ell$ are computationally indistinguishable from the ensemble $UNI_{\rho_q, \ell}$.

Proof: We prove the claim explicitly for $B^q_\ell$ ($q = 0, 1, 2, \ldots$) by induction on $q$. The proof is symmetric for $\bar{B}^q_\ell$. The claim for the basis of the induction, $q = 0$, follows immediately from the construction. Recall that for any $n$ and any $\ell(n)$ the output of $SREP1^0(1^n, \ell(n), r)$ is uniformly distributed over $\{0,1\}^{\rho_0(n_0)}$. Thus, for every $n$ and $\ell$ the two distributions $B^0_{n,\ell(n)}$ and $UNI_{\rho_0(n_0), \ell(n)}$ are identical and therefore the ensembles $B^0_\ell$ and $UNI_{\rho_0, \ell}$ are identical (without any assumptions about $GEN_{\ell ength, \ell}$).
Assume that the inductive claim is correct for $q - 1$. Let $\ell(\cdot)$ be an index function. We prove by a hybrid argument that $B^q_\ell$ is computationally indistinguishable from the ensemble $UNI_{\rho_q, \ell}$.

For any $n \in \mathbb{N}$ the distribution $B^q_{n,\ell(n)}$ is over strings of length $\rho_q(n_q) + \log n$ and can be parsed as:

$$u^q_{\ell_q(n)} \| \ell(n) \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_{m_q} \| cw_1 \| cw_2 \| S^q \| \ell(n).$$

We use the notation $\ell(n)$ and $\ell_q(n)$ in analogous fashion to $\ell$ and $\ell_q$ in Subsection 3.1.3 ($\ell_q(n)$ can be easily computed from $\ell(n)$). We assume w.l.o.g. that $i \in S^q$ and thus $cw_1 = u^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s_i^1)$ and $cw_2 = w^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s_i^2)$.

Intuition: We wish to compare $B^q_{n,\ell(n)}$ to another distribution over strings of length $\rho_q(n_q) + \log n$, which is simply the uniform distribution over $\rho_q(n_q)$ bits with $\ell(n)$ concatenated at the end. These two distributions are very similar. The only difference is that $cw_1$ and $cw_2$ are not random strings. Replacing $cw_2$ by a random string cannot be detected for the following reason: $s_i^2$ is a seed which is independent of $u^q_{\ell_q(n)}$, therefore its expansion by $GEN$ is indistinguishable from a random string, and hence $cw_2$ cannot be distinguished from a random string. Replacing $cw_1$ by a random string cannot be detected because of a different argument: $u^{q-1}_{\ell_{q-1}(n)} \| \ell(n)$ is indistinguishable from a random string with a concatenated $\ell(n)$ by the induction hypothesis. Therefore, $u^{q-1}_{\ell_{q-1}(n)}$ is indistinguishable from a random string, and the same holds for $cw_1$.

A formalization of the above intuition now follows. Consider the distribution $H^q_{n,\ell(n)}$ over strings of the form

$$h^q_{\ell(n)} \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_{m_q} \| cw_1 \| w^{q-1}_{\ell_{q-1}(n)} \oplus r_2 \| S^q \| \ell(n),$$

where $r_2$ is a $\rho_{q-1}(n_{q-1})$ bit binary string, chosen uniformly at random. The sole difference between $B^q_{n,\ell(n)}$ and $H^q_{n,\ell(n)}$ is that $GEN(s_i^2)$ is replaced by $r_2$. Let $H^q_\ell \triangleq \{H^q_{n,\ell(n)}\}_{n \in \mathbb{N}}$ be the corresponding ensemble of distributions.
We construct a probabilistic polynomial time algorithm $M_1$ that for every $n$ transforms a string $z \| \ell(n)$ drawn from either $GEN_{\rho_{q-1}(n_{q-1}), \ell(n)}$ or $UNI_{\rho_{q-1}(n_{q-1}), \ell(n)}$ to a string drawn from either $B^q_{n,\ell(n)}$ or $H^q_{n,\ell(n)}$ respectively. Given $z \| \ell(n)$ as input, $M_1$ does the following:

1. Chooses uniformly at random and independently a set $S^q \subseteq [m_q]$ and $m_q$ seeds $s_1, \ldots, s_{m_q} \in \{0,1\}^{n^{1/(2c_{\ell en})}}$.
2. Computes $\ell_{q-1}(n)$ and $i$ from $\ell(n)$.
3. Constructs $u^{q-1}_{\ell_{q-1}(n)}$ and $w^{q-1}_{\ell_{q-1}(n)}$ which are the outputs of the succinct representation generators $SREP1^{q-1}$ and $SREP2^{q-1}$.
4. Outputs $s_1 \| \ldots \| s_{m_q} \| u^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s_i, 1^n) \| w^{q-1}_{\ell_{q-1}(n)} \oplus z \| S^q \| \ell(n)$.

If $z \| \ell(n)$ is drawn from $GEN_{\rho_{q-1}(n_{q-1}), \ell(n)}$ then the output is drawn from $B^q_{n,\ell(n)}$ and if $z \| \ell(n)$ is drawn from $UNI_{\rho_{q-1}(n_{q-1}), \ell(n)}$ then the output is drawn from $H^q_{n,\ell(n)}$. $M_1$ runs in polynomial time (in $n$) due to the following facts. The number of random bits it needs is less than $\rho_q(n_q) \leq n$. The computation of $\ell_{q-1}(n)$ and $i$ is polynomial (footnote 4) in $\log n$. The construction of the pair $u^{q-1}_{\ell_{q-1}(n)}$ and $w^{q-1}_{\ell_{q-1}(n)}$ takes polynomial time, see Lemma 7. Finally, since $GEN$ is a pseudo-random generator the computation of $GEN(s_i, 1^n)$ takes polynomial time (in $n$).

Since $r_2$ is random and independent of all other entries, $h^q_{\ell(n)}$ can also be viewed as:

$$h^q_{\ell(n)} \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_{m_q} \| cw_1 \| r_2 \| S^q \| \ell(n),$$

where $r_2$ is a random binary string of length $\rho_{q-1}(n_{q-1})$.

Consider the distribution $F^q_{n,\ell(n)}$ over strings of the form

$$f^q_{\ell(n)} \triangleq s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_{m_q} \| r_1 \oplus GEN(s_i^1, 1^n) \| r_2 \| S^q \| \ell(n),$$

where $r_1$ is a $\rho_{q-1}(n_{q-1})$ bit binary string, chosen uniformly at random. The sole difference between $H^q_{n,\ell(n)}$ and $F^q_{n,\ell(n)}$ is that $u^{q-1}_{\ell_{q-1}(n)}$ is replaced by $r_1$. Since $r_1 \oplus GEN(s_i^1, 1^n)$ is random the string $f^q_{\ell(n)}$ can be viewed as $r \| \ell(n)$ where $r \in \{0,1\}^{\rho_q(n_q)}$ is chosen uniformly at random. Thus, $f^q_{\ell(n)}$ is in fact drawn from the distribution $UNI_{\rho_q(n_q), \ell(n)}$.
Footnote 4: The computation can be carried out by recursively calculating the value $m_q$ as in Subsection 3 and then by calculating the position of $\ell_q(n)$ in the matrix: $(i, \ell_{q-1}(n))$ such that $\ell_q(n) = (i - 1)n_{q-1} + \ell_{q-1}(n)$.

We construct a probabilistic polynomial time algorithm $M_2$ that for every $n$ transforms a string $z \| \ell(n)$ drawn from either $B^{q-1}_{n,\ell(n)}$ or $UNI_{\rho_{q-1}(n_{q-1}), \ell(n)}$ to a string drawn from either $H^q_{n,\ell(n)}$ or $UNI_{\rho_q(n_q), \ell(n)}$ respectively. Given $z \| \ell(n)$ as input, $M_2$ does the following:

1. Chooses uniformly at random and independently $s_1, \ldots, s_{m_q} \in \{0,1\}^{n^{1/(2c_{\ell en})}}$, $r_2$ and $S^q \subseteq [m_q]$.
2. Computes $\ell_{q-1}(n)$ and $i$ from $\ell(n)$.
3. Outputs $s_1 \| \ldots \| s_{m_q} \| z \oplus GEN(s_i, 1^n) \| r_2 \| S^q \| \ell(n)$.

Similar arguments to those we used to show that $M_1$ runs in polynomial time prove that $M_2$ runs in polynomial time (indeed $M_2$ requires less running time than $M_1$).

We complete the proof of the Lemma by the following chain of arguments. $M_1$ is a polynomial time algorithm that maps $GEN_{\rho_{q-1}, \ell}$ to $B^q_\ell$ and $UNI_{\rho_{q-1}, \ell}$ to $H^q_\ell$. We have by Lemma 5 that if $GEN_{\rho_{q-1}, \ell}$ and $UNI_{\rho_{q-1}, \ell}$ are computationally indistinguishable then $B^q_\ell$ and $H^q_\ell$ are also computationally indistinguishable. By the induction hypothesis if $GEN_{\ell ength, \ell}$ and $UNI_{\ell ength, \ell}$ are computationally indistinguishable then so are $B^{q-1}_\ell$ and $UNI_{\rho_{q-1}, \ell}$. We combine this with the polynomial time mapping $M_2$ that transforms $B^{q-1}_\ell$ to $H^q_\ell$ and $UNI_{\rho_{q-1}, \ell}$ to $UNI_{\rho_q, \ell}$ and obtain that if $GEN_{\rho_{q-1}, \ell}$ and $UNI_{\rho_{q-1}, \ell}$ are computationally indistinguishable then so are $H^q_\ell$ and $UNI_{\rho_q, \ell}$. Therefore, if $GEN_{\rho_{q-1}, \ell}$ and $UNI_{\rho_{q-1}, \ell}$ are computationally indistinguishable then $B^q_\ell$ is computationally indistinguishable from $H^q_\ell$ and $H^q_\ell$ is computationally indistinguishable from $UNI_{\rho_q, \ell}$. By Lemma 6 we have that if $GEN_{\rho_{q-1}, \ell}$ is computationally indistinguishable from $UNI_{\rho_{q-1}, \ell}$ then $B^q_\ell$ and $UNI_{\rho_q, \ell}$ are computationally indistinguishable.
In the next lemma we show that if $GEN$ is a pseudo-random generator then the ensembles $GEN_{\ell ength, \ell}$ (the output of $GEN$ with $\ell(n)$ attached at the end) and $UNI_{\ell ength, \ell}$ (the uniform ensemble with $\ell(n)$ attached at the end) are computationally indistinguishable. Note that this does not follow immediately from the fact that the output of $GEN$ (without $\ell(n)$) is indistinguishable from a random string.

Lemma 9: Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function and let $\ell ength : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function, satisfying $\ell ength(n) \leq n$, which is computable in uniform polynomial time. Let $GEN$ be a pseudo-random generator. Then, the ensemble $GEN_{\ell ength, \ell}$ is computationally indistinguishable from $UNI_{\ell ength, \ell}$.

Proof: Our first impulse might be to try proving this lemma by the same argument employed in Lemma 5. That is, by using a mapping $M$ that takes as input a string $z \in \{0,1\}^{\ell ength(n)}$ (which is either pseudo-random or truly random), attaches $\ell(n)$ to $z$ and thus produces a string which is either drawn from $GEN_{\ell ength(n), \ell(n)}$ or from $UNI_{\ell ength(n), \ell(n)}$. However, $\ell(n)$ may not be computable in uniform polynomial time. Hence, we have to rely on a slightly different argument. We assume towards a contradiction that there exists an infinite sequence $\ell(n_1), \ell(n_2), \ldots$ such that the distribution $UNI_{\ell ength(n), \ell(n)}$ can be efficiently distinguished from $GEN_{\ell ength(n), \ell(n)}$. We show that a sequence with such properties can be discovered by sampling, and used to distinguish between the uniform and pseudo-random ensembles. We now provide the necessary details.

Assume towards a contradiction that there exists a polynomial time algorithm $D$ such that for some constant $c$ and for an infinite sequence of natural numbers $\Pr[D(1^n, GEN_{\ell ength(n), \ell(n)}) = 1] \; -$
$\Pr[D(1^n, UNI_{\ell ength(n), \ell(n)}) = 1] \geq 1/n^c$. For every $i \in [n]$ we use the notation $p_G(n, i) \triangleq \Pr[D(1^n, GEN_{\ell ength(n), i}) = 1]$ and also $p_U(n, i) \triangleq \Pr[D(1^n, UNI_{\ell ength(n), i}) = 1]$. We can assume w.l.o.g. that for any $n$ in an infinite sequence of natural numbers $n_1, n_2, \ldots$ we have $p_G(n, \ell(n)) - p_U(n, \ell(n)) \geq 1/n^c$ (note the absence of the absolute value) and design the algorithm $D'$ accordingly (footnote 5).

Footnote 5: If this assumption is incorrect then $p_U(n, \ell(n)) - p_G(n, \ell(n)) \geq 1/n^c$ for an infinite sequence $n_1, n_2, \ldots$. In that case the algorithm $D''$ which can be obtained from $D'$ by exchanging $p_G(n, i)$ with $p_U(n, i)$ (for every $n$ and $i$) achieves the same purpose as $D'$.

We construct an algorithm $D'$ that distinguishes with non-negligible probability between the output of the pseudo-random generator, $GEN$, and the uniform distribution for the same infinite sequence of natural numbers, $n_1, n_2, \ldots$. Informally, $D'$ attempts to find an index $i \in [n]$ such that $p_G(n, i) - p_U(n, i) \geq 1/2n^c$. If $D'$ succeeds it takes the first $\ell ength(n)$ bits of its input string $z$, which we denote by $z_{\ell ength(n)}$, executes $D$ on input $z_{\ell ength(n)} \| i$ and outputs the result. If it does not succeed it outputs 1.

Assume that for some $n$ and $\ell(n)$ the algorithm $D$ distinguishes between the distributions $GEN_{\ell ength(n), \ell(n)}$ and $UNI_{\ell ength(n), \ell(n)}$ with probability at least $1/n^c$. Then, we show that $D'$ distinguishes between the pseudo-random distribution on $n$ bits, $GEN_n$, and the uniform distribution on $n$ bits, $UNI_n$, with probability at least $1/(2n^{c+1})$.

The algorithm $D'$:

1. On input $(1^n, z)$, where $z \in \{0,1\}^n$, compute $\ell ength(n)$.
2. Choose randomly $i \in [n]$ and estimate the probability $p_G(n, i)$ by the following steps:
(a) Choose $t(n) \triangleq 8n^{2c} \ln 16n^{c+1}$ seeds $s_1, \ldots, s_{t(n)} \in \{0,1\}^{\kappa(n)}$ uniformly at random and independently.
(b) For every $j = 1, \ldots, t(n)$ let $GEN(s_j, 1^n)$ be the $\ell ength(n)$ bit binary string obtained by having $GEN$ expand $s_j$ to $n$ bits and then taking the first $\ell ength(n)$ bits.

(c) For every $j = 1, \ldots, t(n)$ execute $D$ on input $(1^n, GEN(s_j, 1^n) \| i)$. Define a random variable $X_j$ to be 1 if $D$ returns 1 on its input and 0 otherwise.

(d) Estimate $p_G(n,i)$ by $\tilde{p}_G(n,i) \triangleq \frac{1}{t(n)} \sum_{j=1}^{t(n)} X_j$.

3. For every $i \in [n]$ estimate the probability $p_U(n,i)$ in similar fashion:

(a) Choose $t(n)$ strings $r_1, \ldots, r_{t(n)} \in \{0,1\}^{\ell ength(n)}$ uniformly at random and independently.

(b) For every $j = 1, \ldots, t(n)$ execute $D$ on input $(1^n, r_j \| i)$. Define $Y_j$ to be 1 if $D$ returns 1 on its input and 0 otherwise.

(c) Estimate $p_U(n,i)$ by $\tilde{p}_U(n,i) \triangleq \frac{1}{t(n)} \sum_{j=1}^{t(n)} Y_j$.

4. If $\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge 1/2n^c$ then execute $D$ on input $z_{\ell ength(n)} \| i$ and return the result as the output of $D'$. Otherwise return 1 as output.

Analysis: In order to show that the estimates $\tilde{p}_G(n,i)$ and $\tilde{p}_U(n,i)$ are close to $p_G(n,i)$ and $p_U(n,i)$ respectively, we use the Chernoff bound (see for instance [44, p. 109]). It states that if $p \le 1/2$ and $Z_1, \ldots, Z_{t'}$ are independent 0-1 random variables so that $\forall i,\ \Pr[Z_i = 1] = p$, then for any $\delta$, $0 < \delta \le p(1-p)$, we have

$$\Pr\left[\left|\frac{\sum_{j=1}^{t'} Z_j}{t'} - p\right| > \delta\right] < 2e^{-\frac{\delta^2}{2p(1-p)} t'}.$$

Assume that for $n$ and $\ell(n)$ we have $p_G(n,\ell(n)) - p_U(n,\ell(n)) \ge 1/n^c$. Then

$$\begin{aligned}
&\left|\Pr[D'(1^n, GEN_n) = 1] - \Pr[D'(1^n, UNI_n) = 1]\right| \\
&\quad \ge \Pr[D'(1^n, GEN_n) = 1] - \Pr[D'(1^n, UNI_n) = 1] \\
&\quad = \frac{1}{n} \sum_{i=1}^{n} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \left(p_G(n,i) - p_U(n,i)\right).
\end{aligned}$$

We divide the indices $1, \ldots, n$ into 3 separate categories which we denote by $C_1, C_2, C_3$ ($C_1 \cup C_2 \cup C_3 = [n]$). $C_1$ includes all the indices $i$ for which $p_G(n,i) - p_U(n,i) \ge 1/n^c$. In particular $\ell(n) \in C_1$. For the sake of brevity in the following calculations we use the notation $j = \ell(n)$. For the $C_1$ category we have:

$$\begin{aligned}
&\frac{1}{n} \sum_{i \in C_1} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \left(p_G(n,i) - p_U(n,i)\right) \\
&\quad \ge \frac{1}{n^{c+1}} \Pr\left[\tilde{p}_G(n,j) - \tilde{p}_U(n,j) \ge \frac{1}{2n^c}\right] \\
&\quad \ge \frac{1}{n^{c+1}} \Pr\left[\left|\tilde{p}_G(n,j) - p_G(n,j)\right| < \frac{1}{4n^c} \wedge \left|\tilde{p}_U(n,j) - p_U(n,j)\right| < \frac{1}{4n^c}\right] \\
&\quad = \frac{1}{n^{c+1}} \left(1 - \Pr\left[\left|\tilde{p}_G(n,j) - p_G(n,j)\right| \ge \frac{1}{4n^c} \vee \left|\tilde{p}_U(n,j) - p_U(n,j)\right| \ge \frac{1}{4n^c}\right]\right) \\
&\quad \ge \frac{1}{n^{c+1}} \left(1 - \Pr\left[\left|\tilde{p}_G(n,j) - p_G(n,j)\right| \ge \frac{1}{4n^c}\right] - \Pr\left[\left|\tilde{p}_U(n,j) - p_U(n,j)\right| \ge \frac{1}{4n^c}\right]\right).
\end{aligned}$$

Here we put the Chernoff bound to good use. $\tilde{p}_G(n,j)$ (or $\tilde{p}_U(n,j)$) is an estimate of $p_G(n,j)$ (or $p_U(n,j)$) by a sum of 0-1 variables exactly as in the statement of the bound (see footnote 6). When estimating $p_G(n,j)$ we used the $X_j$ variables, which are independent 0-1 variables with $\Pr[X_j = 1] = p_G(n,j)$, and we employed the $Y_j$s in similar manner to estimate $p_U(n,j)$. Thus, continuing the above computation:

$$\begin{aligned}
&> \frac{1}{n^{c+1}} \left(1 - 2e^{-\frac{(1/4n^c)^2}{2p_G(n,j)(1-p_G(n,j))} t(n)} - 2e^{-\frac{(1/4n^c)^2}{2p_U(n,j)(1-p_U(n,j))} t(n)}\right) \\
&\ge \frac{1}{n^{c+1}} \left(1 - 2 \cdot 2e^{-\ln 16n^{c+1}}\right) \ge \frac{3}{4n^{c+1}}.
\end{aligned}$$

Footnote 6: If $p_G(n,j) > 1/2$ we consider the complement $1 - p_G(n,j)$, which is the probability that $D$ returns 0 on a pseudo-random string.

The category $C_2$ includes all the indices $i$ for which $0 \le p_G(n,i) - p_U(n,i) < 1/n^c$. Obviously,

$$\frac{1}{n} \sum_{i \in C_2} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \left(p_G(n,i) - p_U(n,i)\right) \ge 0.$$

The third category $C_3$ includes all the indices $i$ for which $p_G(n,i) - p_U(n,i) < 0$.

$$\begin{aligned}
&\frac{1}{n} \sum_{i \in C_3} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \left(p_G(n,i) - p_U(n,i)\right) \\
&\quad \ge -\frac{1}{n} \sum_{i \in C_3} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \\
&\quad \ge -\frac{1}{n} \sum_{i \in C_3} \Pr\left[\tilde{p}_G(n,i) - p_G(n,i) \ge \frac{1}{4n^c} \vee p_U(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{4n^c}\right] \\
&\quad \ge -\frac{1}{n} \sum_{i \in C_3} \left(\Pr\left[\left|\tilde{p}_G(n,i) - p_G(n,i)\right| \ge \frac{1}{4n^c}\right] + \Pr\left[\left|\tilde{p}_U(n,i) - p_U(n,i)\right| \ge \frac{1}{4n^c}\right]\right).
\end{aligned}$$

Again, by the Chernoff bound,

$$\begin{aligned}
&\ge -\frac{1}{n} \sum_{i \in C_3} \left(2e^{-\frac{(1/4n^c)^2}{2p_G(n,i)(1-p_G(n,i))} 8n^{2c} \ln 16n^{c+1}} + 2e^{-\frac{(1/4n^c)^2}{2p_U(n,i)(1-p_U(n,i))} 8n^{2c} \ln 16n^{c+1}}\right) \\
&\ge -4e^{-\ln 16n^{c+1}} = -\frac{1}{4n^{c+1}}.
\end{aligned}$$

Combining all the previous calculations we obtain:

$$\begin{aligned}
&\left|\Pr[D'(1^n, GEN_n) = 1] - \Pr[D'(1^n, UNI_n) = 1]\right| \\
&\quad \ge \frac{1}{n} \sum_{j=1}^{3} \sum_{i \in C_j} \Pr\left[\tilde{p}_G(n,i) - \tilde{p}_U(n,i) \ge \frac{1}{2n^c}\right] \left(p_G(n,i) - p_U(n,i)\right) \\
&\quad \ge \frac{3}{4n^{c+1}} + 0 - \frac{1}{4n^{c+1}} = \frac{1}{2n^{c+1}}.
\end{aligned}$$

Corollary 10: Let $q$ be some constant natural number, let $GEN$ be a pseudo-random generator and let $SREP_1^q, SREP_2^q, G^q$ be a correlated generator based on $GEN$. For any index function $\ell : \mathbb{N} \to \mathbb{N}$ the output ensembles $A^q_\ell, \bar{A}^q_\ell$, of $SREP_1^q, SREP_2^q$ respectively, are pseudo-random.

Proof: Assume that there exists a pseudo-random generator $GEN$ and it is used to construct the ensembles $B^q_\ell$ and $\bar{B}^q_\ell$. By Lemma 8 and Lemma 9 we deduce that $B^q_\ell$ and $\bar{B}^q_\ell$ are computationally indistinguishable from the ensemble $UNI_{\sigma_q,\ell}$. Recall that for any $n$, $B^q_{n,\ell(n)} = A^q_{n,\ell(n)} \| \ell(n)$ and $UNI_{\sigma_q(n_q),\ell(n)} = UNI_{\sigma_q(n_q)} \| \ell(n)$, where $UNI_{\sigma_q(n_q)}$ denotes the uniform distribution on $\sigma_q(n_q)$ bit strings. Thus, if an algorithm $D$ distinguishes between $A^q_{n,\ell(n)}$ and $UNI_{\sigma_q(n_q)}$ with probability at least $1/n^c$ then that same algorithm distinguishes between $B^q_{n,\ell(n)}$ and $UNI_{\sigma_q(n_q),\ell(n)}$ with the same probability by simply ignoring the attached $\ell(n)$. A symmetric argument proves the claim for $\bar{A}^q_\ell$.

3.3.4 A quantitative version

In this subsection we present a quantitative analysis of the correlated generator's index indistinguishability property. We assume that the pseudo-random generator $GEN$ expands seeds of length $\kappa(n)$ for some function $\kappa(n)$ (compared to seeds of length $n^{1/2c_{\ell en}}$ for some constant $c_{\ell en} > 1$ as in Subsection 3.3.3) to strings of length $n$. Any probabilistic algorithm running in time $T_D(n)$ cannot distinguish between the pseudo-random and the uniform distributions with advantage $\varepsilon(n)$ (compared to probabilistic polynomial time algorithms not being able to distinguish between the two distributions with advantage $1/n^c$ for any constant $c$ as in Subsection 3.3.3) for large enough values of $n$. The benefits inherent in the quantitative version are twofold.
The first advantage is a more exact analysis of the index indistinguishability property of the correlated pseudo-random generator. The second and more important advantage is that the full potential of the generator's stretching factor is explored. When we use the standard notion of polynomial indistinguishability (as in Subsection 3.3.3), $\kappa(n) = n^{1/2c_{\ell en}}$. By Corollary 5, for any constant $q$, we have $\sigma_q(n_q) = O(n^{1/(q+1)} \kappa(n))$. By Definition 9 the seed length $\kappa(n)$ can be chosen as $n^{1/2c_{\ell en}}$ for any constant $c_{\ell en}$. Thus, for any desired constant $c_{out}$, the output length of a succinct representation generator may be determined to be $n^{1/c_{out}}$, by choosing appropriate values for $\kappa(n)$ and $q$. On the other hand, if the desired output length of a succinct representation generator is asymptotically smaller than $n^{1/c_{out}}$ for any constant $c_{out}$ then security can't be proven according to Definition 9. However, by using the quantitative version of indistinguishability we might reach a better solution. Indeed, we showed earlier that the optimal succinct representation length is

$$\sigma(n) \approx \theta(n)\, \kappa(n)\, 2^{\sqrt{2 \log(n/\kappa(n))}}$$

and is achieved after $\theta(n) \approx \sqrt{2 \log(n/\kappa(n))}$ recursive invocations. It is interesting, therefore, to determine the security of the shortest possible succinct representations in the case where $n$ is super-polynomial in $\kappa(n)$.

Most of the analysis in this subsection follows in the footsteps of the lemmata we proved in Subsection 3.3.3. We do note, however, one important difference. As mentioned previously, the correlated generator $SREP_1, SREP_2, G$ in Subsection 3.3.3 is identical to an intermediate generator $SREP_1^q, SREP_2^q, G^q$ (for some constant $q$). In other words, for every final output length $n$, the number of recursive invocations is the same (i.e., $q$). In this subsection we take into account a state of affairs in which the number of recursive invocations is a function of $n$.
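To get a feel for the quantities involved, the approximations quoted above can be evaluated numerically. The sketch below assumes the reading $\theta(n) \approx \sqrt{2\log(n/\kappa(n))}$ for the recursion depth and $\theta(n)\,\kappa(n)\,2^{\theta(n)}$ for the representation length, with logarithms taken to base 2 and all lower-order terms dropped. The function names and the sample parameters are illustrative and do not come from the thesis.

```python
import math


def optimal_depth(n: int, kappa: int) -> float:
    """Approximate number of recursive invocations, sqrt(2 * log2(n / kappa)).

    Assumption: log is base 2 and rounding to an integer is ignored.
    """
    return math.sqrt(2 * math.log2(n / kappa))


def optimal_length(n: int, kappa: int) -> float:
    """Approximate succinct representation length, theta * kappa * 2**theta."""
    theta = optimal_depth(n, kappa)
    return theta * kappa * 2 ** theta


if __name__ == "__main__":
    kappa = 128  # hypothetical seed length in bits
    for exponent in (20, 40, 60):
        n = 2 ** exponent
        print(exponent, round(optimal_depth(n, kappa), 2), round(optimal_length(n, kappa)))
```

Under these assumptions the depth grows slowly but without bound as $n$ grows, so no single constant $q$ covers all output lengths.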
Therefore, proving that the ensembles $A^q_\ell, \bar{A}^q_\ell$ are pseudo-random does not suffice for the proof of the index indistinguishability property.

Intuition: In Lemma 10 it is assumed that $GEN_{\ell ength,\ell}$ and $UNI_{\ell ength,\ell}$ are computationally indistinguishable. We wish to prove that the output of a succinct representation generator concatenated to $\ell(n)$ (e.g. $B_\ell$) is indistinguishable from a random string concatenated to $\ell(n)$. We use a hybrid argument (although a different one from the line of reasoning in Lemma 8). $u^q_{\ell_q}(n)$, the output of $SREP_1$, is constructed from a series of random and pseudo-random strings, which are joined together by concatenation and exclusive-or operations. If we exchange all the pseudo-random strings in the construction by random strings, then the result is also a random string. We define a sequence of distributions, using the same construction each time. The first distribution is $B_\ell$, the second uses the same construction, but with one random string instead of a pseudo-random string, the third uses two random strings and so on. The crux of the proof is showing that each successive pair of distributions is indistinguishable (for a large enough $n$), and since the number of distributions in the sequence is not too large, $B_\ell$ and $UNI_{\sigma,\ell}$ are indistinguishable.

Lemma 10: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$ be two functions. Let $GEN$ be a $T_D(n), \varepsilon(n)$-pseudo random generator. Let $t(n) = 8(1/\varepsilon(n))^2 \ln 16(n/\varepsilon(n))$ and let $\ell ength(n) = n$ for every $n \in \mathbb{N}$. Let $\theta(n)$ be the number of recursive invocations, as defined in Theorem 3. For every index function $\ell : \mathbb{N} \to \mathbb{N}$, if the ensembles $GEN_{\ell ength,\ell}$ and $UNI_{\ell ength,\ell}$ are $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n),\ 2n\varepsilon(n)$-computationally indistinguishable then the ensembles $B_\ell, \bar{B}_\ell$ are $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-computationally indistinguishable from the ensemble $UNI_{\sigma,\ell}$, where

$$\tilde{T}_D(n) = \frac{T_D(n)}{2t(n)} - \left(2\theta(n) + \frac{3}{2}\right) T_{GEN}(n) - 2\sigma(n), \qquad \tilde{\varepsilon}(n) = 2n\,\theta(n)\,\varepsilon(n).$$

Proof: We prove the claim explicitly only for $B_\ell$ (since the proof for $\bar{B}_\ell$ is symmetric). We begin by regarding the distributions $B^0_{n,\ell(n)}, B^1_{n,\ell(n)}, \ldots, B^{\theta(n)}_{n,\ell(n)}$ anew, and change the notation slightly so that a $q$ superscript (or subscript) is added to every element of $B^q_{n,\ell(n)}$ ($1 \le q \le \theta(n)$). Thus, $B^q_{n,\ell(n)}$ is defined on strings of the form

$$u^q_{\ell_q}(n) \| \ell(n) = s^q_1 \| \ldots \| s^q_{i_q-1} \| s^{q,1}_{i_q} \| s^q_{i_q+1} \| \ldots \| s^q_{m_q} \| cw^q_1 \| cw^q_2 \| S^q \| \ell(n).$$

Recall that if $i_q \in S^q$ then $cw^q_1 = u^{q-1}_{\ell_{q-1}}(n) \oplus GEN(s^{q,1}_{i_q}, 1^n)$ and $cw^q_2 = w^{q-1}_{\ell_{q-1}}(n) \oplus GEN(s^{q,2}_{i_q}, 1^n)$, where $s^{q,2}_{i_q}$ is a random seed independent of all the other entries in $u^q_{\ell_q}(n)$. If $i_q \notin S^q$ then the values of $cw^q_1$ and $cw^q_2$ are exchanged.

For every $n$ we inductively define a sequence of distributions $\Psi^q_{n,\ell_q(n)}$ ($q = 0, 1, \ldots$) on strings of length $\sigma_q(n_q) + \log n$. The distribution $\Psi^0_{n,\ell_0(n)}$ is simply the distribution $UNI_{\sigma_0(n_0),\ell(n)}$ (that is, the uniform distribution on strings of length $\sigma_0(n_0)$ followed by the binary representation of $\ell(n)$). The distribution $\Psi^q_{n,\ell_q(n)}$ is defined over the following strings:

$$\psi^q_{\ell_q}(n) \| \ell(n) = s^q_1 \| \ldots \| s^q_{i_q-1} \| s^{q,1}_{i_q} \| s^q_{i_q+1} \| \ldots \| s^q_{m_q} \| ncw^q_1 \| ncw^q_2 \| S^q \| \ell(n).$$

All the elements are sampled as in the output of $SREP_1^q(1^n, n_q, \ell_q, r_q)$, $u^q_{\ell_q}(n)$, except for $ncw^q_1$ and $ncw^q_2$. If $i_q \in S^q$ then $ncw^q_1 = \psi^{q-1}_{\ell_{q-1}}(n) \oplus GEN(s^{q,1}_{i_q}, 1^n)$, but $ncw^q_2$ is chosen uniformly at random from $\{0,1\}^{\sigma_{q-1}(n_{q-1})}$. If $i_q \notin S^q$ then $ncw^q_1$ and $ncw^q_2$ exchange their values.

Observation 1: $\Psi^q_{n,\ell_q(n)} = UNI_{\sigma_q(n_q),\ell(n)}$ (the uniform distribution over $\{0,1\}^{\sigma_q(n_q)}$ concatenated to $\ell(n)$). This observation is easily proved by induction. If $q = 0$ it follows from the definition. Assume that $\Psi^{q-1}_{n,\ell_{q-1}(n)} = UNI_{\sigma_{q-1}(n_{q-1}),\ell(n)}$, and therefore $ncw^q_1$ is sampled uniformly at random. All the other elements of $\psi^q_{\ell_q}(n)$ are also sampled uniformly at random and independently, and thus $\psi^q_{\ell_q}(n)$ is a random string.
Observation 2: There is a sequence of $q$ pseudo-random strings that are used in the construction of $u^q_{\ell_q}(n)$, which were produced from seeds that are independent of $u^q_{\ell_q}(n)$. Thus $GEN(s^{q,2}_{i_q}, 1^n)$ is used in the construction of either $cw^q_1$ or $cw^q_2$, while $u^{q-1}_{\ell_{q-1}}(n)$ is used in the construction of the other correction word. In the construction of $u^{q-1}_{\ell_{q-1}}(n)$ there is also such a pseudo-random string, $GEN(s^{q-1,2}_{i_{q-1}}, 1^n)$. The sequence extends until $GEN(s^{1,2}_{i_1}, 1^n)$. The only difference between $\psi^q_{\ell_q}(n)$ and $u^q_{\ell_q}(n)$ is that the sequence $GEN(s^{1,2}_{i_1}, 1^n), GEN(s^{2,2}_{i_2}, 1^n), \ldots, GEN(s^{q,2}_{i_q}, 1^n)$ used in the construction of $u^q_{\ell_q}(n)$ is replaced by a sequence of random strings in the construction of $\psi^q_{\ell_q}(n)$.

We now define a sequence of hybrid distributions, $\mathcal{H}^0_{n,\ell(n)}, \mathcal{H}^1_{n,\ell(n)}, \ldots, \mathcal{H}^{\theta(n)}_{n,\ell(n)}$. Each of them is on binary strings of length $\sigma_{\theta(n)}(n) + \log n$ (recall that by definition $\sigma_{\theta(n)}(n) = \sigma(n)$). For every $q = 0, \ldots, \theta(n)$, a string $h^q_{n,\ell(n)}$ drawn from $\mathcal{H}^q_{n,\ell(n)}$ is constructed in similar fashion to $u^{\theta(n)}_{n,\ell(n)} \| \ell(n)$ except for one detail. The sequence of pseudo-random strings $GEN(s^{1,2}_{i_1}, 1^n), GEN(s^{2,2}_{i_2}, 1^n), \ldots, GEN(s^{\theta(n),2}_{i_{\theta(n)}}, 1^n)$ used in the construction of $u^{\theta(n)}_{n,\ell(n)}$ is replaced by the sequence $r_1, \ldots, r_q, GEN(s^{q+1,2}_{i_{q+1}}, 1^n), \ldots, GEN(s^{\theta(n),2}_{i_{\theta(n)}}, 1^n)$. $r_1, \ldots, r_q$ are chosen uniformly at random and independently. For every $j$ ($1 \le j \le q$), $r_j$ is of the same length as the string $GEN(s^{j,2}_{i_j}, 1^n)$. By this definition we have that

$$\mathcal{H}^{\theta(n)}_{n,\ell(n)} = \Psi^{\theta(n)}_{n,\ell(n)} = UNI_{\sigma_{\theta(n)}(n),\ell(n)}, \qquad \mathcal{H}^0_{n,\ell(n)} = B^{\theta(n)}_{n,\ell(n)}.$$

Assume towards a contradiction that there exists an algorithm $M$ running in time $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n) - 2\sigma_{\theta(n)}(n) - 2\theta(n)T_{GEN}(n)$ that for every $n$ in an infinite sequence of natural numbers $(n_1, n_2, \ldots)$ distinguishes between the distributions $B^{\theta(n)}_{n,\ell(n)}$ and $UNI_{\sigma_{\theta(n)}(n),\ell(n)}$ with probability greater than $2n\,\theta(n)\,\varepsilon(n)$. We construct an algorithm $M'$ that runs in time $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n)$ and distinguishes between $GEN_{\ell ength(n),\ell(n)}$ and $UNI_{\ell ength(n),\ell(n)}$ with probability greater than $2n\varepsilon(n)$.

The algorithm $M'$:

1. Given an input $(1^n, z)$, choose uniformly at random an integer $q \in [\theta(n)]$.

2. Construct a sequence of strings $\rho_1, \ldots, \rho_{\theta(n)}$ such that for every $j \in [\theta(n)]$, $\rho_j \in \{0,1\}^{\sigma_{j-1}(n_{j-1})}$. If $j < q$ then $\rho_j$ is chosen uniformly at random. Otherwise, if $j > q$, $\rho_j$ is constructed by choosing uniformly at random a seed $s^{j,2}_{i_j} \in \{0,1\}^{\kappa(n)}$, expanding it to $n$ bits via an application of $GEN$, and taking the first $\sigma_{j-1}(n_{j-1})$ bits as the desired string. Finally, $\rho_q$ is set to the first $\sigma_{q-1}(n_{q-1})$ bits of $z$.

3. Construct a binary string $x$ of length $\sigma_{\theta(n)}(n) + \log n$ using a similar construction to that of $u^{\theta(n)}_{\ell_{\theta(n)}}(n) \| \ell(n)$. The only difference is that the sequence of pseudo-random strings $GEN(s^{1,2}_{i_1}, 1^n), \ldots, GEN(s^{\theta(n),2}_{i_{\theta(n)}}, 1^n)$ used in the construction of $u^{\theta(n)}_{n,\ell(n)}$ is replaced by the sequence $\rho_1, \ldots, \rho_{\theta(n)}$.

4. Execute $M$ on input $x$ and return its answer as the output of $M'$.

Analysis: The running time of $M'$ is comprised of the construction of $x$ and the execution of $M$ on $x$ (the contribution of the other elements of $M'$ is negligible). The construction of $x$ takes no more time than the construction of a pair of succinct representations, $u^{\theta(n)}_{\ell_{\theta(n)}}(n) = SREP_1(1^n, \ell, r)$ and $w^{\theta(n)}_{\ell_{\theta(n)}}(n) = SREP_2(1^n, \ell, r)$. By Lemma 7 that step takes no more than $2\sigma_{\theta(n)}(n) + 2\theta(n)T_{GEN}(n)$. Thus the running time of $M'$ is $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n)$. The rest of the analysis is a standard hybrid argument. When $z$ is pseudo-random, the strings $\rho_1, \ldots, \rho_{q-1}$ are random and $\rho_q, \ldots, \rho_{\theta(n)}$ are pseudo-random, so $x$ is distributed according to $\mathcal{H}^{q-1}_{n,\ell(n)}$; when $z$ is uniform, $x$ is distributed according to $\mathcal{H}^{q}_{n,\ell(n)}$. From the description of the algorithm we therefore obtain that:

$$\Pr[M'(GEN_{\ell ength(n),\ell(n)}, 1^n) = 1] = \frac{1}{\theta(n)} \sum_{q=0}^{\theta(n)-1} \Pr[M(\mathcal{H}^q_{n,\ell(n)}, 1^n) = 1]$$

and

$$\Pr[M'(UNI_{\ell ength(n),\ell(n)}, 1^n) = 1] = \frac{1}{\theta(n)} \sum_{q=1}^{\theta(n)} \Pr[M(\mathcal{H}^q_{n,\ell(n)}, 1^n) = 1].$$

Thus,

$$\begin{aligned}
&\left|\Pr[M'(GEN_{\ell ength(n),\ell(n)}, 1^n) = 1] - \Pr[M'(UNI_{\ell ength(n),\ell(n)}, 1^n) = 1]\right| \\
&\quad = \frac{1}{\theta(n)} \left|\Pr[M(\mathcal{H}^{\theta(n)}_{n,\ell(n)}, 1^n) = 1] - \Pr[M(\mathcal{H}^0_{n,\ell(n)}, 1^n) = 1]\right| \\
&\quad = \frac{1}{\theta(n)} \left|\Pr[M(B^{\theta(n)}_{n,\ell(n)}, 1^n) = 1] - \Pr[M(UNI_{\sigma_{\theta(n)}(n),\ell(n)}, 1^n) = 1]\right| > 2n\varepsilon(n).
\end{aligned}$$

Lemma 11: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$ be two functions. Let $GEN$ be a $T_D(n), \varepsilon(n)$-pseudo random generator. Let $t(n) = 8(1/\varepsilon(n))^2 \ln 16(n/\varepsilon(n))$, let $\ell ength(n) = n$ for every $n \in \mathbb{N}$ and let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. Then, the ensemble $GEN_{\ell ength,\ell}$ is $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n),\ 2n\varepsilon(n)$-computationally indistinguishable from $UNI_{\ell ength,\ell}$.

Proof: This lemma is an analog of Lemma 9 and to prove it we use the same arguments as in the proof of Lemma 9. That is, we assume towards a contradiction that there exists an algorithm $D$ that runs in time $(T_D(n) - 3t(n)T_{GEN}(n))/2t(n)$ and for every $n$ in an infinite sequence of natural numbers $n_1, n_2, \ldots$ distinguishes between the distributions $GEN_{\ell ength(n),\ell(n)}$ and $UNI_{\ell ength(n),\ell(n)}$ with probability $2n\varepsilon(n)$. Given this assumption the same algorithm $D'$ which we construct in the proof of Lemma 9 can be used to distinguish between the output of $GEN$ and the uniform distribution. The analysis of the algorithm is similar to what we showed in the proof of Lemma 9 except for exchanging $1/n^c$ by $\varepsilon(n)$.

In Lemma 9 it sufficed that $D'$ runs in polynomial time. Here we analyze its running time more thoroughly. The most time consuming operations undertaken by $D'$ are the expansion of $t(n) = 8(1/\varepsilon(n))^2 \ln 16(n/\varepsilon(n))$ seeds into strings of length $n$ ($t(n) \cdot T_{GEN}(n)$ time) and executing $D$ on $2t(n)$ inputs ($T_D(n) - 3t(n)T_{GEN}(n)$ time). Sampling $t(n)$ seeds and $t(n)$ random strings takes less than $2t(n) \cdot T_{GEN}(n)$ time. Thus (see footnote 7), the total running time of $D'$ is $T_D(n)$. If we analyze $D'$ using the Chernoff bound (as in Lemma 9) we see that $D'$ distinguishes between the uniform and pseudo-random distributions with $1/(2n)$ of the probability that $D$ distinguishes between $GEN_{\ell ength(n),\ell(n)}$ and $UNI_{\ell ength(n),\ell(n)}$, namely $\varepsilon(n)$.
Therefore we have arrived at a contradiction and the lemma is correct.

Corollary 11: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$ be two functions. Let $GEN$ be a $T_D(n), \varepsilon(n)$-pseudo random generator. For any index function $\ell : \mathbb{N} \to \mathbb{N}$ the ensembles $A^q_\ell, \bar{A}^q_\ell$ are $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-pseudo random, where

$$\tilde{T}_D(n) = \frac{T_D(n)}{2t(n)} - \left(2\theta(n) + \frac{3}{2}\right) T_{GEN}(n) - 2\sigma(n), \qquad \tilde{\varepsilon}(n) = 2n\,\theta(n)\,\varepsilon(n).$$

Corollary 12: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$ be two functions. Let $GEN$ be a non-uniform $T_D(n), \varepsilon(n)$-pseudo random generator. For any index function $\ell : \mathbb{N} \to \mathbb{N}$ the ensembles $A^q_\ell, \bar{A}^q_\ell$ are non-uniformly $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-pseudo random, where

$$\tilde{T}_D(n) = T_D(n) - 2\theta(n)T_{GEN}(n) - 2\sigma(n) - \log n, \qquad \tilde{\varepsilon}(n) = \theta(n)\,\varepsilon(n).$$

Footnote 7: We disregard the other components of $D'$ whose total contribution to its running time is negligible.

Proof: In our context, the important difference between a uniform and non-uniform approach is that in the second case $\ell(n)$ can be "hard wired" as part of any family of circuits $\{D_n\}_{n \in \mathbb{N}}$, which can't necessarily be done in a uniform algorithm $D$. Thus $\ell(n)$ need not be sampled (as is done in the proofs of Lemma 9 and Lemma 11). If $GEN$ is a non-uniform $T_D(n), \varepsilon(n)$-pseudo random generator then the probability ensembles $UNI_{\ell ength,\ell}$ and $GEN_{\ell ength,\ell}$ are $T_D(n) - \log n,\ \varepsilon(n)$-computationally indistinguishable. The reason is that otherwise there exists a family $\{\tilde{D}_n\}_{n \in \mathbb{N}}$, such that for every $n$ in an infinite sequence $\tilde{D}_n$ is of size $T_D(n) - \log n$ and it distinguishes between $UNI_{\ell ength,\ell}$ and $GEN_{\ell ength,\ell}$ with probability at least $\varepsilon(n)$. We construct a family of circuits $\{D_n\}_{n \in \mathbb{N}}$ by attaching $\ell(n)$ to $\tilde{D}_n$ and having $D_n$ on input $z \in \{0,1\}^n$ invoke $\tilde{D}_n$ on $z_{\ell ength(n)} \| \ell(n)$. $D_n$ is of size $T_D(n)$ and distinguishes between $GEN_n$ and $UNI_n$ with probability at least $\varepsilon(n)$, which is a contradiction.
Using the same arguments as in the proof of Lemma 10 (only with a tighter bound on the indistinguishability of $UNI_{\ell ength,\ell}$ and $GEN_{\ell ength,\ell}$) verifies the statement of the corollary.

3.4 Integrating the proofs

In this section we show how the various lemmata prove Theorems 2, 3 and 4.

Proof [Theorem 2]: We define the correlated generator $SREP_1, SREP_2, G$ to be $SREP_1^q, SREP_2^q, G^q$, where $q$'s value is determined by the desired size of the succinct representations. Therefore $n_q$ is equal to the final output length $n$, $n_{q-1} = n_q/m_q$ and so on. $m_q$ is defined by Equation 3.4. The correlation property of $SREP_1, SREP_2, G$ is proved by Lemma 2. The standard index indistinguishability property of $SREP_1, SREP_2, G$ is proved by Corollary 10. The length of the succinct representations is determined by $q$. Corollary 5 states that for any constant $q$ the length of the outputs of $SREP_1^q(1^n, n_q, \ell_q, r_q)$ and $SREP_2^q(1^n, n_q, \ell_q, r_q)$ is $O(\kappa(n) n^{1/(q+1)})$. By Definition 9, $GEN$ expands seeds of length $\kappa(n) = n^{1/(2c_{\ell en})}$ to strings of length $n$. Setting $q = 2c_{\ell en} - 1$ we have that the length of the succinct representations is $O(n^{1/c_{\ell en}})$. Finally, the length of the random input required by $SREP_1$ and $SREP_2$ is given by Corollary 7.

Proof [Theorem 3]: We define the correlated generator $SREP_1, SREP_2, G$ to be $SREP_1^{\theta(n)}, SREP_2^{\theta(n)}, G^{\theta(n)}$, where $\theta(n)$ is as defined in Theorem 3. The correlation property of $SREP_1, SREP_2, G$ is proved by Lemma 2. The $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-index indistinguishability property of $SREP_1, SREP_2, G$ is proved by Corollary 11. The length of the succinct representations is given by Corollary 6. The size of the random input that the succinct representations require is given by Corollary 8.

Proof [Theorem 4]: We define the correlated generator $SREP_1, SREP_2, G$ to be $SREP_1^{\theta(n)}, SREP_2^{\theta(n)}, G^{\theta(n)}$, where $\theta(n)$ is as defined in Theorem 3. The correlation property of $SREP_1, SREP_2, G$ is proved by Lemma 2.
The non-uniform $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-index indistinguishability property of $SREP_1, SREP_2, G$ is proved by Corollary 12. The length of the succinct representations is given by Corollary 6. The size of the random input that the succinct representations require is given by Corollary 8.

3.5 PIR schemes

In this section we use the ideas we developed earlier in order to construct two-server computationally private information retrieval schemes. We begin the section with a helpful lemma and then show two CPIR schemes in the following subsections. One is a direct application of the correlated pseudo-random generator we constructed, while the other is a slightly different variation, which has some advantages over the first.

Lemma 12: Let $T_D : \mathbb{N} \to \mathbb{N}$, and $\varepsilon : \mathbb{N} \to [0,1]$. Let $P$ be a one round, 2-server PIR scheme such that

• All queries are of the same length $q(n)$ (as in Definition 11).

• For every $j$ ($j \in \{1,2\}$) and for every index function $i : \mathbb{N} \to \mathbb{N}$ the probability ensemble $D_{j,i} = \{D_j(n, i(n))\}_{n \in \mathbb{N}}$ is $T_D(n), \varepsilon(n)$-pseudo random.

Then $P$ maintains $T_D(n), 2\varepsilon(n)$-computational privacy.

Proof: Let $i_1, i_2 : \mathbb{N} \to \mathbb{N}$ be two index functions and let $j \in \{1,2\}$. Consider the three probability ensembles: $D_{j,i_1} = \{D_j(n, i_1(n))\}_{n \in \mathbb{N}}$, $D_{j,i_2} = \{D_j(n, i_2(n))\}_{n \in \mathbb{N}}$ and $Uni = \{Uni_{q(n)}\}_{n \in \mathbb{N}}$, where $Uni_{q(n)}$ is distributed uniformly over $\{0,1\}^{q(n)}$. By assumption $D_{j,i_1}$ and $Uni$ are $T_D(n), \varepsilon(n)$-computationally indistinguishable and so are $D_{j,i_2}$ and $Uni$. By Lemma 6 the two distributions $D_{j,i_1}$ and $D_{j,i_2}$ are $T_D(n), 2\varepsilon(n)$-computationally indistinguishable. The lemma follows from Definition 11.

3.5.1 A direct scheme

In this subsection we use the correlated generator $(SREP_1, SREP_2, G)$ in order to obtain an efficient PIR scheme. The scheme is similar to the simple linear scheme of [25]. The only difference is that the long fully random queries are replaced by short succinct representations. The succinct representations are expanded at the server end by the generator described in Section 3.1. The notation $\sigma(n), \theta(n)$ we use in the following theorem is defined in Section 3.1.

Theorem 13: If there exists a pseudo-random generator $GEN$, then for every constant $c > 1$ there exists a two server, one round, computationally private information retrieval scheme $P$ in which the user sends each server $O(n^{1/c})$ bits of communication and receives one bit in return.

Proof: The Scheme: Suppose the user wishes to retrieve $x_\ell$, the value of the $\ell$-th bit of the database.

Algorithm Q: The user constructs two queries, $u \triangleq SREP_1(1^n, \ell, r)$ and $w \triangleq SREP_2(1^n, \ell, r)$, where the correlated generator $SREP_1, SREP_2, G$ is as constructed in Theorem 2 ($u = SREP_1^q(1^n, n_q, \ell_q, r_q)$, $w = SREP_2^q(1^n, n_q, \ell_q, r_q)$ and $q = 2c - 1$). $u$ is sent to $DB_1$ and $w$ is sent to $DB_2$.

Algorithm A: $DB_1$ computes the $n$ bit string $G(u, 1^n) \triangleq U_1, \ldots, U_n$ and $DB_2$ computes the $n$ bit string $G(w, 1^n) \triangleq W_1, \ldots, W_n$. $DB_1$ computes a bit $b_1$, and $DB_2$ computes a bit $b_2$:

$$b_1 \triangleq \bigoplus_{j :\, x_j = 1} U_j, \qquad b_2 \triangleq \bigoplus_{j :\, x_j = 1} W_j.$$

Finally, $b_1, b_2$ are sent to $U$.

Algorithm R: $U$ calculates $x_\ell = b_1 \oplus b_2$.

We can deduce all the elements of the proof from the statement of Theorem 2. By that theorem we have:

• By the choice of $u$ and $w$: $G(u, 1^n) \oplus G(w, 1^n) = e_{\ell(n)}$ and therefore, $b_1 \oplus b_2 = x_\ell$.

• The computational privacy of the scheme is implied by Theorem 2 together with Lemma 12.

• The number of bits that $U$ sends to each server is $O(n^{1/c})$.

Corollary 14: Let $T_D : \mathbb{N} \to \mathbb{N}$, and let $\varepsilon : \mathbb{N} \to [0,1]$. Let $GEN$ be a $T_D(n), \varepsilon(n)$-pseudo random generator with stretch $\kappa(n)$ to $n$. Let $\theta(n)$, $\sigma(n)$ and $t(n)$ be defined as in Theorem 3. Then, there exists a one round, $\tilde{T}_D(n), \tilde{\varepsilon}(n)$-computationally private information retrieval scheme $P$ such that

$$\tilde{T}_D(n) = \frac{T_D(n)}{2t(n)} - \left(2\theta(n) + \frac{3}{2}\right) T_{GEN} - 2\sigma(n) \quad \text{and} \quad \tilde{\varepsilon}(n) = 4n\,\theta(n)\,\varepsilon(n).$$

The number of bits that $U$ sends to each server is $\sigma(n) \approx \theta(n)\,\kappa(n)\,2^{\theta(n)}$.
If $GEN$ is a non-uniform $T_D(n), \varepsilon(n)$-pseudo random generator then $\tilde{T}_D(n) = T_D - 2\theta(n)T_{GEN} - 2\sigma(n) - \log n$ and $\tilde{\varepsilon} = 2\theta(n)\varepsilon(n)$.

Proof: Replace the results of the standard version (Theorem 2) by the results of the quantitative versions (Theorem 3 or Theorem 4).

3.5.2 A Generic Transformation

In this subsection we describe a construction of a wide class of CPIR schemes. We begin with a scheme that has the properties required in Subsection 2.3 (see footnote 8) and furthermore the query ensembles, $D_{1,\ell}$ and $D_{2,\ell}$, are pseudo-random for any index function $\ell(n)$. We construct a different $PIR(n)$ scheme that has these same properties. We denote the original scheme by $P$ and the new scheme we construct by $\Pi$. $P$ and $\Pi$ differ in three major aspects: communication complexity, computational complexity and degree of privacy.

Footnote 8: That is, the scheme has to be a 2-server, one round $T_D(n), \varepsilon(n)$-computational PIR scheme in which symmetry of the servers is maintained and the servers answer any query of the right length.

The construction of $\Pi$ from $P$ is very similar to the construction of the intermediate generator $(SREP_1^q, SREP_2^q, G^q)$ from the generator $(SREP_1^{q-1}, SREP_2^{q-1}, G^{q-1})$ in Section 3.1. The query a user sends to each server in $\Pi$ resembles a succinct representation for $G^q$. The main difference between the generator and the scheme is the way this succinct representation/query is interpreted. $G^q$ interprets the succinct representation as $m_q$ succinct representations for the generator $G^{q-1}$ (denoted in Section 3.1 by $v_1, \ldots, v_{m_q}$). In $\Pi$ a server interprets the query it receives as $m$ queries in the original scheme $P$.

Notation 11: We denote by $\alpha_P(n), \alpha_\Pi(n)$ the communication complexity of the user in $P$ and $\Pi$ respectively. We denote by $\beta_P(n), \beta_\Pi(n)$ the communication complexity of a server in $P$ and $\Pi$ respectively. We use $Q_P$, $A_P$ and $R_P$ to denote the $Q$, $A$ and $R$ algorithms of the one round scheme $P$ (see Definition 1). Similarly we use $Q_\Pi$, $A_\Pi$ and $R_\Pi$ to denote the $Q$, $A$ and $R$ algorithms of the one round scheme $\Pi$. We denote by $T_P(n)$ the computational complexity of $Q_P(1^n, i, r)$.

The following theorem describes the properties of $\Pi$ in terms of the properties of a pseudo-random generator $GEN$ and of $P$ itself. For expository purposes we only concentrate on the case in which $GEN$ is a non-uniform pseudo random generator and $P$ is a non-uniform CPIR scheme. The construction we present works well enough in the other cases (for instance $GEN$ being a standard pseudo-random generator and $P$ being a standard CPIR scheme). However, in such a case the privacy property of $\Pi$ has to be changed from what is stated in Theorem 15. The revised statement can be easily deduced by utilizing the tools we developed in Section 3.3. For instance, in the example we quoted, $\Pi$ will be a standard CPIR scheme.

Theorem 15: Let $T_{D_1}, T_{D_2} : \mathbb{N} \to \mathbb{N}$, and let $\delta, \varepsilon : \mathbb{N} \to [0,1]$. Let $GEN$ be a non-uniform $T_{D_1}(n), \delta(n)$-pseudo random generator with stretch $\kappa(n)$ to $n$. Let $P$ be a one round, 2-server PIR scheme such that

• All queries are of the same length $\alpha_P(n)$, symmetry of the databases is maintained and any query of the right length is answered.

• For every $j$ ($j \in \{1,2\}$) and for every index function $i : \mathbb{N} \to \mathbb{N}$ the probability ensemble $D_{j,i} = \{D_j(n, i(n))\}_{n \in \mathbb{N}}$ is non-uniformly $T_{D_2}(n), \varepsilon(n)$-pseudo random.

Define $T_D : \mathbb{N} \to \mathbb{N}$ by $T_D(n) = \min\{T_{D_1}(n), T_{D_2}(n)\}$ for every $n \in \mathbb{N}$. Then, for every $m \in [n]$ there exists a one round, two server PIR scheme $\Pi_m$ such that

• All queries are of the same length $\alpha_\Pi(n)$, symmetry of the databases is maintained and any query of the right length is answered.

• For every $j$ ($j \in \{1,2\}$) and for every index function $i : \mathbb{N} \to \mathbb{N}$ the probability ensemble $D_{j,i} = \{D_j(n, i(n))\}_{n \in \mathbb{N}}$ is non-uniformly $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \alpha_\Pi(n) + \log n),\ \delta + \varepsilon$-pseudo random.

• The communication complexity is as follows:

$$\alpha_\Pi(n) = m(\kappa(n) + 1) + 2\alpha_P\left(\frac{n}{m}\right), \qquad \beta_\Pi(n) = 2\beta_P\left(\frac{n}{m}\right).$$

We assume from here on that the value of $m$ is set and denote $\Pi_m$ simply as $\Pi$.
Application: Construction of computationally private schemes with more flexibility than the direct scheme of Subsection 3.5.1 in terms of the relation between privacy and communication complexity. For instance, suppose that the user requires a high level of security. He is willing to use a scheme that is $T(n), 2\varepsilon(n)$-computationally private (for some reasonable $T$), but not a scheme that is $T'(n), 4\varepsilon(n)$-computationally private. Given a $T_D(n), \varepsilon(n)$-pseudo random generator $GEN$ and using a similar construction to that of Subsection 3.5.1 we can obtain a scheme with communication complexity $O((\kappa(n) n)^{1/2})$ that is $T_D - 2T_{GEN} - 2\sigma(n) - \log n,\ 2\varepsilon(n)$-computationally private. The way we achieve this is by changing the correlated generator invoked in the scheme from $SREP_1, SREP_2, G$ to $SREP_1^1, SREP_2^1, G^1$. However, using the generic transformation we introduce in Theorem 15 with the scheme $B_2$ (see Subsection 2.4) as $P$ we can obtain a $T_D(n) - O(n^{1/4}),\ 2\varepsilon(n)$-computationally private scheme with improved communication complexity: $O(\kappa(n)^{1/2} n^{1/4})$, by choosing $m = n^{1/2}$.

Proof: We begin with a high level description of $\Pi$. The data $x \in \{0,1\}^n$ is viewed as an $m \times \lceil n/m \rceil$ binary matrix. Suppose $U$ wishes to retrieve the $\ell$-th item, which is in the $(i,j)$-th position of the two dimensional matrix. Each of the $m$ rows of the matrix is regarded as a separate database of $\lceil n/m \rceil$ bits. We denote the rows by $x(1), \ldots, x(m)$. The user and servers execute the scheme $P$ $m$ times in parallel. In each execution the database is viewed as one of the rows of the matrix. $U$ sends a query $u$ to $DB_1$ and a query $w$ to $DB_2$. $DB_1$ interprets $u$ as $m$ queries for $P$, $v_1, \ldots, v_{i-1}, v_i^1, v_{i+1}, \ldots, v_m$, and $DB_2$ interprets $w$ as $m$ queries $v_1, \ldots, v_{i-1}, v_i^2, v_{i+1}, \ldots, v_m$. For every $h = 1, \ldots, m$ each server executes its part of the scheme $P$ where the database is $x(h)$ and the query is the $h$-th of the interpreted queries.
$DB_1$ computes $m$ answers $o_1, \ldots, o_{i-1}, o_i^1, o_{i+1}, \ldots, o_m$, and $DB_2$ computes $m$ answers $o_1, \ldots, o_{i-1}, o_i^2, o_{i+1}, \ldots, o_m$. The $m$ answers computed by $DB_1$ are identical to the answers computed by $DB_2$ for every row of the matrix except the $i$-th row, because the queries they received were identical and $P$ maintains symmetry of the servers. The answers $o_i^1$ and $o_i^2$ together allow $U$ to retrieve the desired bit. In the last part of the scheme the servers send $U$ two short strings that allow the user to learn both $o_i^1$ and $o_i^2$.

The Scheme $\Pi$:

Algorithm $Q_\Pi$: $U$ chooses uniformly a subset $S \subseteq [m]$. The user also chooses uniformly at random and independently $m+1$ seeds for the generator $GEN$,

$$s_1, \ldots, s_{i-1}, s_i^1, s_i^2, s_{i+1}, \ldots, s_m \in \{0,1\}^{\kappa(n)}.$$

$U$ executes the algorithm $Q_P(1^{n/m}, j, r)$. That is, $U$ invokes the query algorithm of $P$ where the length of the database is $n/m$ and the desired item is the $j$-th bit. The output of $Q_P$ is two legal queries of the scheme $P$, $u_j$ and $w_j$. The user computes two correction words as follows:

$$cw_1 = \begin{cases} u_j \oplus GEN(s_i^1, 1^n) & \text{if } i \in S \\ w_j \oplus GEN(s_i^2, 1^n) & \text{if } i \notin S \end{cases}
\qquad
cw_2 = \begin{cases} w_j \oplus GEN(s_i^2, 1^n) & \text{if } i \in S \\ u_j \oplus GEN(s_i^1, 1^n) & \text{if } i \notin S \end{cases}$$

$U$ constructs the two queries of $\Pi$ as follows:

$$u = s_1 \| \ldots \| s_{i-1} \| s_i^1 \| s_{i+1} \| \ldots \| s_m \| cw_1 \| cw_2 \| S$$

$$w = s_1 \| \ldots \| s_{i-1} \| s_i^2 \| s_{i+1} \| \ldots \| s_m \| cw_1 \| cw_2 \| S \oplus \{i\}$$

$u$ is sent to $DB_1$ and $w$ to $DB_2$.

Algorithm $A_\Pi$: Each server receives a string of the form $s_1 \| \ldots \| s_m \| cw_1 \| cw_2 \| T$. Each server computes $v_h$, $h = 1, \ldots, m$ as follows:

$$v_h = \begin{cases} cw_1 \oplus GEN(s_h, 1^n) & \text{if } h \in T \\ cw_2 \oplus GEN(s_h, 1^n) & \text{if } h \notin T \end{cases}$$

Each server executes $A_P$ $m$ times in parallel. In execution number $h$, the database is the row $x(h)$, and the user's query is $v_h$. Every $v_h$ is a legitimate query because of the assumption on $P$. Thus, each server produces an answer for each execution of $A_P$. $DB_1$ produces $o_1, \ldots, o_{i-1}, o_i^1, o_{i+1}, \ldots, o_m$ and $DB_2$ produces $o_1, \ldots, o_{i-1}, o_i^2, o_{i+1}, \ldots, o_m$.
The answers are identical for any invocation but the $i$-th because $P$ satisfies the symmetry of servers requirement. $DB_1$ computes
\[ A_1 \triangleq \sum_{h \in S} o_h, \qquad B_1 \triangleq \sum_{h \notin S} o_h \]
and $DB_2$ computes
\[ A_2 \triangleq \sum_{h \in S \oplus \{i\}} o_h, \qquad B_2 \triangleq \sum_{h \notin S \oplus \{i\}} o_h, \]
where each server sums its own answers and all sums are bitwise, modulo 2. All 4 sums $A_1, B_1, A_2, B_2$ are sent to $U$.

Algorithm R: $U$ can now compute the desired replies from the scheme $P$, namely $o_i^1$ and $o_i^2$. If $i \in S$, then $o_i^1 = A_1 \oplus A_2$ and $o_i^2 = B_1 \oplus B_2$. If $i \notin S$, then $o_i^1 = B_1 \oplus B_2$ while $o_i^2 = A_1 \oplus A_2$. Given these two replies, $U$ can utilize $R_P$ to retrieve the $j$-th bit of the $i$-th row, i.e. $x_\ell$. $U$ executes $R_P(1^{n/m}, j, r, o_i^1, o_i^2)$ and the result is $x_\ell$.

In order to prove the theorem we show that the retrieved bit is indeed $x_\ell$, that the privacy requirement is satisfied and that the communication complexity is as stated.

Correct retrieval: The proof that the retrieved bit indeed is $x_\ell$ is based on the fact that $P$ is a one round $PIR(n)$ scheme. Suppose $DB_1$ receives $u_j$, $DB_2$ receives $w_j$ and the database is $x(i)$. Then, by the correctness property of $P$ we have
\[ R_P(1^{n/m}, j, r, A_P(1, u_j, x(i)), A_P(2, w_j, x(i))) = x(i)_j = x_\ell. \]
The queries each server uses in the $m$ executions of $P$ are $v_1, \ldots, v_m$, which are computed in algorithm A. Consider some row $h$ that is different from $i$. Since $h \neq i$, either $h \in T$ for both servers, or $h \notin T$ for both servers. Thus, both servers compute the same "interpreted query" $v_h$, which is:
\[ v_h = \begin{cases} u_j \oplus GEN(s_i^1, 1^n) \oplus GEN(s_h, 1^n) & h \in S,\ i \in S \\ w_j \oplus GEN(s_i^2, 1^n) \oplus GEN(s_h, 1^n) & h \notin S,\ i \in S \\ w_j \oplus GEN(s_i^2, 1^n) \oplus GEN(s_h, 1^n) & h \in S,\ i \notin S \\ u_j \oplus GEN(s_i^1, 1^n) \oplus GEN(s_h, 1^n) & h \notin S,\ i \notin S \end{cases} \]
For the $i$-th row ($h = i$) the query that $DB_1$ produces is:
\[ v_i = \begin{cases} cw_1 \oplus GEN(s_i^1, 1^n) = u_j \oplus GEN(s_i^1, 1^n) \oplus GEN(s_i^1, 1^n) = u_j, & i \in S \\ cw_2 \oplus GEN(s_i^1, 1^n) = u_j \oplus GEN(s_i^1, 1^n) \oplus GEN(s_i^1, 1^n) = u_j, & i \notin S \end{cases} \]
A similar argument shows that the $i$-th interpreted query of $DB_2$ is $w_j$ regardless of whether $i \in S$ or not.
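The correction-word mechanism and the four XOR sums can be checked mechanically. The following is a minimal Python sketch under stated assumptions: `gen` is a SHAKE-256 stand-in for the generator $GEN$ (not the generator of the thesis), rows are indexed from 0, answers of $P$ are modeled as integers, and the function names `make_queries`, `interpret`, `server_sums` and `recover` are hypothetical.

```python
import hashlib
import secrets

def gen(seed: bytes, nbytes: int) -> bytes:
    # Stand-in for GEN (assumption): expand a seed with SHAKE-256.
    return hashlib.shake_256(seed).digest(nbytes)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def make_queries(m, i, u_j, w_j, seed_len=16):
    # Algorithm Q: random subset S, m + 1 seeds, two correction words.
    qlen = len(u_j)
    S = {h for h in range(m) if secrets.randbits(1)}
    seeds = [secrets.token_bytes(seed_len) for _ in range(m)]
    s1_i, s2_i = secrets.token_bytes(seed_len), secrets.token_bytes(seed_len)
    if i in S:
        cw1, cw2 = xor(u_j, gen(s1_i, qlen)), xor(w_j, gen(s2_i, qlen))
    else:
        cw1, cw2 = xor(w_j, gen(s2_i, qlen)), xor(u_j, gen(s1_i, qlen))
    seeds1 = seeds[:i] + [s1_i] + seeds[i + 1:]
    seeds2 = seeds[:i] + [s2_i] + seeds[i + 1:]
    # DB1 receives S, DB2 receives the symmetric difference of S and {i}.
    return (seeds1, cw1, cw2, S), (seeds2, cw1, cw2, S ^ {i})

def interpret(query):
    # Algorithm A, first part: derive the m interpreted queries v_h.
    seeds, cw1, cw2, T = query
    return [xor(cw1 if h in T else cw2, gen(seeds[h], len(cw1)))
            for h in range(len(seeds))]

def server_sums(answers, T):
    # Algorithm A, last part: XOR of answers inside T and outside T.
    A = B = 0
    for h, o in enumerate(answers):
        if h in T:
            A ^= o
        else:
            B ^= o
    return A, B

def recover(A1, B1, A2, B2, i_in_S):
    # Algorithm R: returns (o_i^1, o_i^2).
    return (A1 ^ A2, B1 ^ B2) if i_in_S else (B1 ^ B2, A1 ^ A2)
```

For any choice of $S$, `interpret` yields $u_j$ at position $i$ for the first query, $w_j$ at position $i$ for the second, and identical values at every other position, which is the property the correctness argument relies on.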
In the next phase of the scheme each server uses the interpreted query and executes $A_P$. For each row $h$, $h \neq i$, both servers have the same query $v_h$. By assumption that query is valid in the scheme $P$ and the answers of both servers on an identical query are the same. Therefore, for any row $h \neq i$ both servers produce the same answer $o_h$. The answers of $DB_1, DB_2$ when executing $P$ on the row $x(i)$ are $o_i^1, o_i^2$ respectively, which together allow the user to retrieve $x_\ell$. The messages that the servers send the user are:
\[ A_1 = \begin{cases} o_i^1 \oplus \sum_{h \in S \setminus \{i\}} o_h & i \in S \\ \sum_{h \in S} o_h & i \notin S \end{cases} \qquad B_1 = \begin{cases} \sum_{h \notin S} o_h & i \in S \\ o_i^1 \oplus \sum_{h \notin S,\ h \neq i} o_h & i \notin S \end{cases} \]
\[ A_2 = \begin{cases} \sum_{h \in S \setminus \{i\}} o_h & i \in S \\ o_i^2 \oplus \sum_{h \in S} o_h & i \notin S \end{cases} \qquad B_2 = \begin{cases} o_i^2 \oplus \sum_{h \notin S} o_h & i \in S \\ \sum_{h \notin S,\ h \neq i} o_h & i \notin S \end{cases} \]
It follows from the above that $U$ obtains the desired answers $o_i^1, o_i^2$ by the two exclusive or operations $A_1 \oplus A_2$ and $B_1 \oplus B_2$. Thus, the user can retrieve $x_\ell$ by invoking $R_P$.

Privacy: We prove the privacy of the scheme using arguments similar to those we employed in Section 3.3. Therefore, we give only an outline of the proof. We prove explicitly the privacy with respect to $DB_1$, since the privacy with respect to $DB_2$ can be argued in a symmetric manner. The only message $DB_1$ receives is the string $u_{n,\ell(n)} = s_1 \| \ldots \| s_m \| cw_1 \| cw_2 \| S$. All the elements except $cw_1$ and $cw_2$ are chosen uniformly at random and independently. Let $\ell : \mathbb{N} \rightarrow \mathbb{N}$ be an index function. We check how difficult it is to distinguish $u_{n,\ell(n)}$ from $h_{n,\ell(n)} = s_1 \| \ldots \| s_m \| cw_1 \| r_2 \| S$, and to distinguish $h_{n,\ell(n)}$ from $f_{n,\ell(n)} = s_1 \| \ldots \| s_m \| r_1 \| r_2 \| S$, where $r_1$ and $r_2$ are chosen uniformly at random and independently. We define 3 distributions: $D_1(n, \ell(n))$ is induced by the strings $u_{n,\ell(n)}$, $H_{n,\ell(n)}$ is induced by the strings $h_{n,\ell(n)}$ and $Uni(n)$ is the uniform distribution on strings of the same length as $u_{n,\ell(n)}$, which is induced by $f_{n,\ell(n)}$. We define 3 ensembles: $D_{1,\ell} = \{D_1(n, \ell(n))\}_{n \in \mathbb{N}}$, $H_\ell = \{H_{n,\ell(n)}\}_{n \in \mathbb{N}}$ and $Uni$, the uniform ensemble. We assume that $i \in S$ (the case $i \notin S$ is symmetric).
Starting with a string that is drawn from either the pseudo random distribution (i.e. $GEN(s_i^2, 1^n)$) or the uniform distribution (i.e. $r_2$) we can construct a circuit which transforms it to a string that is drawn from either $D_1(n, \ell(n))$ or $H_{n,\ell(n)}$ respectively. The size of the circuit needed for this transformation is bounded by $T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n$ (as in Section 3.3 the $\log n$ summand is derived from the value $\ell(n)$ which is wired into the circuit). Therefore, by the reasoning of Lemma 5, $D_{1,\ell}$ and $H_\ell$ are non-uniformly $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n),\ \delta(n)$-indistinguishable.

Starting with a string that is drawn from either the query distribution of $P$ (i.e. $u_j$) or the uniform distribution (i.e. $r_1$) we can construct a circuit which transforms it to a string that is drawn from either $H_{n,\ell(n)}$ or $Uni(n)$ respectively. The time needed for this transformation is bounded by $T_{GEN}(n) + \kappa(n)$. Therefore, by Lemma 5, $H_\ell$ and $Uni$ are non-uniformly $T_D(n) - (T_{GEN}(n) + \kappa(n) + \log n),\ \varepsilon(n)$-indistinguishable.

Hence, by the same reasoning as Lemma 6 we have that $D_{1,\ell}$ is $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n)),\ \delta(n) + \varepsilon(n)$-pseudo random. By Lemma 12 we have that $\Pi$ is $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n)),\ 2(\delta(n) + \varepsilon(n))$-computationally private.

Communication complexity: Each server receives $m$ seeds of length $\kappa(n)$, one set of length $m$, and two strings that have the same length as a query in the scheme $P$ where the database is of length $n/m$. The sum is $m(\kappa(n) + 1) + 2\alpha_P(n/m)$. Each server answers with two strings, each of the same length as an answer in the scheme $P$ where the database is of length $n/m$, for a total of $2\beta_P(n/m)$.

3.6 Concluding Remarks and open problems

Following the introduction of computational PIR in the first version of our work [22], Kushilevitz and Ostrovsky [58] realized that the $\Omega(n)$ lower bound on single server PIR schemes, proved in [25] for the information theoretic setting, does not hold for the computational setting. They constructed an elegant single server computational PIR scheme, based on the intractability assumption of quadratic residuosity. More efficient single server CPIR schemes, based on different intractability assumptions, were later presented by Mann [60], Stern [72] and by Cachin, Micali and Stadler [17]. All these schemes are computationally private and rely on specific number theoretic intractability assumptions that in particular imply the existence of trap door one way functions. Indeed it has been recently shown in [10, 33] that single server CPIR schemes with sub-linear communication imply the existence of trap door one way functions. On the other hand, our two server construction relies on the weaker assumption that one way functions exist.

Given a generator that expands $\kappa(n)$ bits to $n$ bits, the computational complexity of our correlated pseudo random generator (and the resulting two server CPIR schemes) is (approximately) $2^{\sqrt{\log(n/k)}}$. It is an interesting open question to find a construction which brings this complexity down to a polynomial in $k + \log n$.

Chapter 4

Private information retrieval by keywords

In this chapter we introduce the problem of private information retrieval by keywords and present several solutions. Our solutions are in the form of reductions from PrivatE information Retrieval by KeYwords (PERKY) schemes to PIR schemes. We also show reductions between the symmetric problems (SPIR to SPERKY).

We begin with a number of definitions. Among them are definitions of PIR and CPIR, which are a generalized version of the previous definitions (in Section 2). In this chapter we view the database $x = x_1, \ldots, x_n$ as $n$ blocks of $\ell$ bits each. That is, for every $i \in [n]$ we have $x_i \in \{0,1\}^\ell$. Previously the block size was 1 in all our discussions.
4.1 Definitions and Notation

4.1.1 The PIR model

Let $DB_1, \ldots, DB_k$ be $k$ servers, each holding a copy of a database $x = x_1 \ldots x_n$ where for every $i \in [n]$, $x_i$ is a block of $\ell$ bits. A user, denoted by $U$, wishes to retrieve one of the blocks $x_i$ ($1 \leq i \leq n$). The servers can communicate with the user, but not with each other.

The following is a definition of private information retrieval in a somewhat limited setting, which suffices for our purposes. We define a scheme that achieves information-theoretic privacy, is executed in one round of communication and assures the user of correct retrieval with probability 1.

Definition 21: $P$ is a one round, $k$-server PIR scheme maintaining information-theoretic privacy if it is a trio of algorithms:

$Q(n, \ell, i, r)$: The query algorithm, receives as input $n$, the number of data blocks, $\ell$, the length of each block, $i$, the retrieval index and $r$, a random input. Its output is a $k$-tuple of queries $(q_1, \ldots, q_k)$.

$A(j, q_j, x)$: The answer algorithm, receives as input a server number $j$ ($j \in [k]$), a query $q_j$ and the database $x$. Its output is an answer $a_j$.

$R(n, \ell, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm, receives as input the number of blocks, the block length, the retrieval index, the random input and the $k$ answers. Its output is a single block of $\ell$ bits.

The user is defined by the two algorithms $Q$ and $R$. The $j$-th server is defined by the algorithm $A_j$ which is $A(j, \cdot, \cdot)$ (the algorithm $A$ where the first argument is restricted to $j$). $P$ involves three steps: $U$ uses $Q$ to generate $k$ queries and sends one query to each server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$). Finally $U$ uses $R$ to reconstruct $x_i$ from the answers.
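To make the trio $(Q, A, R)$ concrete, the following Python sketch instantiates it with the standard two-server scheme in which each server returns the XOR of a subset of blocks. It is an illustration only, not a scheme taken from this chapter: indices are 0-based, each block is modeled as an integer holding $\ell$ bits, and the function names `query`, `answer` and `reconstruct` are hypothetical.

```python
import secrets

def query(n, i):
    # Q: a uniformly random subset of block indices for DB1,
    # and the same subset with index i toggled for DB2.
    s1 = {h for h in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}
    return s1, s2

def answer(q, x):
    # A: XOR of all blocks selected by the query.
    acc = 0
    for h in q:
        acc ^= x[h]
    return acc

def reconstruct(a1, a2):
    # R: the two answers differ exactly by block i.
    return a1 ^ a2

# Usage: four blocks of 4 bits each, retrieve block 2.
x = [0b0101, 0b1001, 0b1100, 0b0011]
q1, q2 = query(len(x), 2)
block = reconstruct(answer(q1, x), answer(q2, x))
```

Each query on its own is a uniformly distributed subset of $[n]$ regardless of $i$, so a single server learns nothing about the retrieval index; the price is a query of $n$ bits per server.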
$P$ must have the following two properties:

Correctness: For all $n, \ell \in \mathbb{N}$, $x \in (\{0,1\}^\ell)^n$, $i \in [n]$ and $r$, if $Q(n, \ell, i, r)$ outputs $(q_1, \ldots, q_k)$ then:
\[ R(n, \ell, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i. \]

Privacy: Let $D_j(n, \ell, i)$ denote the distribution of the query algorithm's output restricted to its $j$-th entry as induced by the random choices of $r$ (in other words, the distribution of the $q_j$'s). Then, for every $i_1, i_2$ and $j$, where $1 \leq i_1 \leq i_2 \leq n$ and $j \in [k]$, we require that
\[ D_j(n, \ell, i_1) = D_j(n, \ell, i_2). \]

4.1.2 The CPIR model

We begin by defining computational indistinguishability of two probability ensembles.

Definition 22: A probability ensemble $Y$ is an enumerated sequence of probability distributions $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$, where each distribution $Y_{n,\ell}$ ranges over some domain $D_{n,\ell}$.

Definition 23: Let $\ell ength : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}$ be a monotonically non-decreasing function satisfying $\ell ength(n, \ell) \leq n\ell$. Let $T_D : \mathbb{N} \rightarrow \mathbb{N}$, and $\varepsilon : \mathbb{N} \rightarrow [0,1]$. Let $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ and $Z = \{Z_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_{n,\ell}$ and $Z_{n,\ell}$ is $\{0,1\}^{\ell ength(n,\ell)}$. We say that $Y$ and $Z$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable if the following holds: For every distinguisher $D$, a probabilistic algorithm whose running time is bounded by $T_D(\cdot)$, there is a $c_0$ such that for all $n, \ell$ where $n\ell \geq c_0$
\[ \left| \Pr[D(Y_{n,\ell}, 1^{n\ell}) = 1] - \Pr[D(Z_{n,\ell}, 1^{n\ell}) = 1] \right| < \varepsilon(n\ell) \]
where the probability is taken over the distributions $Y_{n,\ell}$, $Z_{n,\ell}$, and over the coin tosses of $D$.

Definition 24: Let $\ell ength : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}$ be a monotonically non-decreasing function satisfying $\ell ength(n, \ell) \leq n\ell$. Let $\mathcal{T}$ be a family of functions of the form $T_D : \mathbb{N} \rightarrow \mathbb{N}$, and let $\mathcal{E}$ be a family of functions of the form $\varepsilon : \mathbb{N} \rightarrow [0,1]$. Let $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ and $Z = \{Z_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_{n,\ell}$ and $Z_{n,\ell}$ is $\{0,1\}^{\ell ength(n,\ell)}$.
We say that $Y$ and $Z$ are $\mathcal{T}, \mathcal{E}$-computationally indistinguishable if for every $T_D \in \mathcal{T}$ and $\varepsilon \in \mathcal{E}$, $Y$ and $Z$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable. We say that $Y$ and $Z$ are computationally indistinguishable if they are $\mathcal{T}, \mathcal{E}$-computationally indistinguishable, where
\[ \mathcal{E} = \{(n\ell)^{-c} \mid c \in \mathbb{N}\} \quad \text{and} \quad \mathcal{T} = \{(n\ell)^{c} \mid c \in \mathbb{N}\}. \]

We use the same notation as in Definition 21. In particular, we denote by $D_j(n, \ell, i)$ the distribution of the $j$-th query ($q_j$) as induced by the random choices of $r$.

Definition 25: Let $T_D : \mathbb{N} \rightarrow \mathbb{N}$, and $\varepsilon : \mathbb{N} \rightarrow [0,1]$. $P$ is a one round, $k$-server PIR scheme maintaining $T_D(n\ell), \varepsilon(n\ell)$-computational privacy if it is a PIR scheme according to Definition 21, except for the privacy condition which is changed to:

Privacy: Let $n, \ell \in \mathbb{N}$ and let $qlen : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}$ be the query length function. For every $i \in [n]$ and for every $j \in [k]$ the $j$-th output of the query generator $Q(n, \ell, i, r)$ (denoted $q_j$) is of the same length $qlen(n, \ell)$. For every $j$ ($j \in [k]$) and for every two index functions $i_1, i_2 : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}$ the two probability ensembles $D_{j,i_1} = \{D_j(n, \ell, i_1(n, \ell))\}_{n,\ell \in \mathbb{N}}$ and $D_{j,i_2} = \{D_j(n, \ell, i_2(n, \ell))\}_{n,\ell \in \mathbb{N}}$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable (in this case $\ell ength(n, \ell) = qlen(n, \ell)$).

We say that $P$ is a one round, $k$-server PIR scheme maintaining computational privacy (in short a computationally private information retrieval scheme) if for every $j$, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles $D_{j,i_1}$, $D_{j,i_2}$ are computationally indistinguishable.

4.1.3 Search data structure

In this subsection we define in a formal way the intuitive notion of a data structure that supports search operations on strings. Such a data structure holds $n$ strings $\{s_1, \ldots, s_n\}$ and (efficiently) answers queries of the type: is $w \in \{s_1, \ldots, s_n\}$? Informally we model the data structure as $t$ words of length $m$ bits each, which are stored in the memory.
The search algorithm begins at a known address, the root, and at each step performs a computation which determines another memory address containing one of the $t$ words. After at most $d$ such steps the algorithm returns an answer, which is 1 iff $w \in \{s_1, \ldots, s_n\}$. Our definition is general enough that just about any search data structure we know of can be represented in those terms.

Definition 26: Let $t, m, d : \mathbb{N} \times \mathbb{N} \rightarrow \mathbb{N}$ be three functions. We say that $DS$ is a $(t, m, d)$-search data structure if it is a trio $(M(S), root, B)$ which for every $n, \ell \in \mathbb{N}$ and every $S \triangleq \{s_1, \ldots, s_n\}$, $s_1, \ldots, s_n \in \{0,1\}^\ell$, satisfies the following requirements:

Structure: $M(S)$ is a set of $t(n, \ell)$ binary words, $u_1, \ldots, u_{t(n,\ell)} \in \{0,1\}^{m(n,\ell)}$.

Search algorithm: $B(w, u, aux)$ is an algorithm that receives as input $w \in \{0,1\}^\ell$, the query string, $u$, the current word, which is in the set $\{u_1, \ldots, u_{t(n,\ell)}\}$, and some auxiliary information $aux$ modeling the history of the search. Its output is either a pair $(add', aux')$, an address of a new current word (i.e. $add' \in [t(n, \ell)]$) and auxiliary information (which are used in the next invocation of $B$), or a bit $ans(w, S) \in \{0,1\}$.

Search length: Let $B^q(w, u, aux)$ denote $q$ consecutive executions of $B$ beginning with input $(w, u, aux)$. Two consecutive executions mean that if $(add', aux')$ is the output of the first execution, then $(w, u_{add'}, aux')$ is the input of the second. $q$ consecutive executions are the natural generalization of this notion. Then, $B^q(w, u_{root}, init)$ (where $init$ is some appropriate initial information) returns $ans(w, S)$ for some $q$, $q \leq d(n, \ell)$.

Correctness: $ans(w, S) = 1$ if and only if $w \in S$.

Notation 12: We denote by $T_q(n, \ell)$ the set of all addresses in the range $1, \ldots, t(n, \ell)$ that can appear as the first element in the output of the $q$-th invocation of $B$, for some structure $M$ and query string $w$.
Formally,
\[ T_q(n, \ell) \triangleq \{add \in [t(n, \ell)] : \exists w, \exists S, \exists aux \ \text{s.t.}\ B^q(w, u_{root}, init) = (add, aux)\}. \]
We denote the cardinality of $T_q(n, \ell)$ by $t_q(n, \ell) \triangleq |T_q(n, \ell)|$. We denote by $MAP$ an algorithm that maps an address in $[t(n, \ell)]$ to an address in $[t_q(n, \ell)]$. $MAP$ takes as input $(n, \ell, q, add)$, where $add \in [t(n, \ell)]$, and outputs $add_q$. If $add \in T_q(n, \ell)$ then $add_q \in [t_q(n, \ell)]$. Otherwise $add_q = 0$.

In some search data structures, two different words $u_i$ and $u_j$ may have two different lengths. While it is always possible to choose $m(n, \ell)$ as the maximum of these lengths, we may find the following notation to be convenient.

Notation 13: Let $m_q(n, \ell)$ denote the maximum length of all the words whose addresses are in $T_q(n, \ell)$. Let $m_0(n, \ell)$ denote the length of the root word.

4.2 Private Retrieval of Blocks

In [25] both PIR$(n, k)$ and PIR$(\ell, n, k)$ were introduced. PIR$(\ell, n, k)$ schemes were presented as generic transformations from PIR$(n, k)$ schemes. In [38] SPIR$(n, k)$ was formulated, but was not generalized to SPIR$(\ell, n, k)$. Most of the research to date focused on PIR$(n, k)$ schemes, while PIR$(\ell, n, k)$ protocols received little attention. Since in the PERKY context the more general PIR$(\ell, n, k)$ and SPIR$(\ell, n, k)$ problems are important, we devote a section to the two problems.

This section contains two parts. In the first we show how a construction of a PIR$(\ell, n, k)$ scheme from a PIR$(n, k)$ scheme that was presented in [25] can be generalized to construct a SPIR$(\ell, n, k)$ scheme from a SPIR$(n, k)$ scheme. In the second part we give a brief overview of the known PIR$(\ell, n, k)$ schemes.

4.2.1 SPIR$(\ell, n, k)$

We are interested in a scheme in which the user retrieves a block of $\ell$ bits ($\ell \geq 1$) out of $n$ such blocks that the servers hold. We require that in this type of scheme the user obtain a single block of $\ell$ bits and nothing else. In particular, the user cannot obtain bits from different blocks.
The "naive" solution of executing a SPIR$(n, k)$ scheme $\ell$ times does not work, because the user can retrieve bits from different blocks.

Lemma 13: Let $P$ be a one round, $k$ server SPIR$(n, k)$ scheme (for dishonest users), maintaining information-theoretic privacy, with user communication complexity $\alpha_P(n, k)$ and server communication complexity $\beta_P(n, k)$. Then, there exists a one round SPIR$(\ell, n, k)$ scheme (for dishonest users), $P'$, which maintains information-theoretic privacy and has user communication complexity $\alpha_{P'}(\ell, n, k) = \alpha_P(n, k)$ and server communication complexity $\beta_{P'}(\ell, n, k) = \ell \beta_P(n, k)$.

Proof: The protocol is essentially the same as the general solution to PIR$(\ell, n, k)$ presented in [25].

The scheme $P'$:

Algorithm $Q_{P'}$: Assume that $U$ wishes to retrieve the $i$-th block of bits. $U$ executes the query algorithm $Q_P(n, \ell, i, r_u)$ and sends the $j$-th entry of the output to $DB_j$.

Algorithm $A_{P'}$: The servers regard the database as an $\ell \times n$ matrix, where each column is one of the blocks of bits. Denote the $q$-th row of the matrix by $x(q)$. The scheme $P$ is invoked by the servers $\ell$ times so that each invocation is independent of the others. To achieve this, their source of randomness must generate $\ell$ times as many random bits as are required in $P$. We assume that the servers have a shared random string $r_s = r_s^1, \ldots, r_s^\ell$ such that for every $q = 1, \ldots, \ell$ the substring $r_s^q$ can be used as the shared random string in $P$. Furthermore, the $\ell$ substrings are chosen independently. $DB_j$ executes
\[ A_P(j, q_j, x(1) \| r_s^1), \ldots, A_P(j, q_j, x(\ell) \| r_s^\ell). \]
Let the $\ell$ outputs be denoted by $a_j^1, \ldots, a_j^\ell$. $DB_j$ sends $a_j^1, \ldots, a_j^\ell$ to $U$.

Algorithm $R_{P'}$: $U$ executes $R_P(n, \ell, i, r_u, a_1^q, \ldots, a_k^q)$ for $q = 1, \ldots, \ell$, thereby retrieving the $i$-th block of bits.
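The structure of $P'$, one query per server reused across all $\ell$ rows, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the underlying bit scheme is the simple two-server XOR scheme, which provides user privacy but not data privacy, so the shared random strings $r_s^q$ of the SPIR setting are omitted; indices are 0-based and the function names `bit_query`, `bit_answer` and `retrieve_block` are hypothetical.

```python
import secrets

def bit_query(n, i):
    # Query pair of a simple 2-server bit scheme (stand-in for Q_P).
    s1 = {h for h in range(n) if secrets.randbits(1)}
    return s1, s1 ^ {i}

def bit_answer(q, row):
    # Answer for one row of the matrix (stand-in for A_P): XOR of selected bits.
    a = 0
    for h in q:
        a ^= row[h]
    return a

def retrieve_block(blocks, i):
    # blocks: list of n blocks, each a list of l bits. Returns block i.
    n, l = len(blocks), len(blocks[0])
    # View the database as an l x n matrix whose columns are the blocks.
    rows = [[blocks[c][q] for c in range(n)] for q in range(l)]
    q1, q2 = bit_query(n, i)                      # one query pair for all rows
    a1 = [bit_answer(q1, row) for row in rows]    # DB1: l answers
    a2 = [bit_answer(q2, row) for row in rows]    # DB2: l answers
    # Reconstruction is applied row by row (stand-in for R_P).
    return [b1 ^ b2 for b1, b2 in zip(a1, a2)]
```

The user sends exactly what it would send in a single execution of the bit scheme, while each server's answer grows by a factor of $\ell$, matching the complexities stated in the lemma.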
Correctness and communication complexity: It follows from the correctness property of $P$ that $R_P(n, \ell, i, r_u, a_1^q, \ldots, a_k^q)$ returns the $i$-th bit of the row $x(q)$, for any $q = 1, \ldots, \ell$. Therefore, $U$ gets the whole $i$-th column (block). It is also evident from the construction that the communication complexity of the user is $\alpha_P(n, k)$, as its queries are identical to those of $P$, and the communication complexity of each server is $\ell \beta_P(n, k)$.

Privacy: The privacy of the user in $P'$ is identical to its privacy in $P$ since it depends only on the output distribution of the algorithm $Q_P$. Data privacy follows from the data privacy property of $P$. Let $U$ be some deterministic user (which can be modeled by $q_1, \ldots, q_k$, the $k$-tuple of queries it generates). By the data privacy of $P$, for some $i \in [n]$ and for every $y \in \{0,1\}^n$ the answer distribution $A_P(y)$ is independent of $y$, given $y_i$. Let $x$ be the database in $P'$, i.e. an $\ell \times n$ binary matrix. Let $x'$ be an $\ell \times n$ binary matrix such that the $i$-th column of $x$ is identical to that of $x'$. The answer distribution of $P'$ on database $x$ is $A_{P'}(x) = A_P(x(1)), \ldots, A_P(x(\ell))$, and on $x'$ it is $A_{P'}(x') = A_P(x'(1)), \ldots, A_P(x'(\ell))$. For every $q = 1, \ldots, \ell$ we have $A_P(x(q)) = A_P(x'(q))$. Furthermore the $\ell$ sub-distributions $A_P(x(1)), \ldots, A_P(x(\ell))$ are independent, and so are the $\ell$ sub-distributions $A_P(x'(1)), \ldots, A_P(x'(\ell))$. Therefore, $A_{P'}(x) = A_{P'}(x')$.

4.2.2 PIR$(\ell, n, k)$

There are two known constructions of PIR$(\ell, n, k)$ from PIR$(n, k)$.

Lemma 14: Let $P$ be a one round, $k$ server PIR$(n, k)$ scheme (where $k \geq 1$), maintaining information-theoretic privacy ($T_D(n), \varepsilon(n)$-computational privacy), with user communication complexity $\alpha_P(n, k)$ and server communication complexity $\beta_P(n, k)$.
Then, there exists a one round PIR$(\ell, n, k)$ scheme, $P'$, which maintains information-theoretic privacy ($T_D(n), \varepsilon(n)$-computational privacy), and has user communication complexity $\alpha_{P'}(\ell, n, k) = \alpha_P(n, k)$ and server communication complexity $\beta_{P'}(\ell, n, k) = \ell \beta_P(n, k)$.

Proof: The scheme is identical to the one presented in Lemma 13. In this case the computational privacy carries over to $P'$ from $P$ without change.

The second construction is of a more limited nature. It is applicable only in a multi-server setting ($k \geq 2$). We state the result for a constant number of servers in the information-theoretic setting (its generalization to the computational setting, or to a non-constant number of servers, is possible but does not yield efficient schemes).

Lemma 15: For any constant $k$, $k \geq 2$, and for any $\ell$, there exists a one round PIR$(\ell, n, k)$ scheme, maintaining information theoretic privacy, with communication complexity $O(n^{\frac{1}{2k-1}} \ell^{\frac{k}{2k-1}})$.

The lemma is proved by combining two techniques: the efficient PIR$(n, k)$ schemes of [3], and a generalization of the balancing technique of [22]. The full proof appears in [39].

4.3 General Solutions to PERKY$(\ell, n, k)$

A trivial solution to the PERKY$(\ell, n, k)$ problem is to have the server send all the strings it holds to $U$. The communication complexity in this case is $O(n\ell)$. Our focus is on providing more efficient solutions to the problem. As a first step we show simple reductions from PIR to PERKY and in the opposite direction too.

Theorem 16: Let $P$ be a one round PIR$(n, k)$ (SPIR$(n, k)$) scheme with communication complexity $C_P(n, k)$. Then, there exists a scheme $\Pi$ that solves the problem PERKY$(\ell, n, k)$ (SPERKY$(\ell, n, k)$) with communication complexity $C_\Pi(\ell, n, k) = C_P(2^\ell, k)$. If $P$ maintains information-theoretic privacy (information-theoretic data privacy), then so does $\Pi$. If $P$ maintains $T_D(n), \varepsilon(n)$-computational privacy then $\Pi$ maintains $T_D(2^\ell), \varepsilon(2^\ell)$-computational privacy.

Proof: The scheme $\Pi$: The $n$ strings $s_1, \ldots, s_n$ are replaced with their incidence vector: a $2^\ell$ bit string in which the $j$-th bit is 1 iff the $j$-th $\ell$ bit string, in the lexicographic order, is one of $s_1, \ldots, s_n$. Suppose that the word $w$ that $U$ holds is the $i$-th word in the lexicographic order. $U$ and $DB_1, \ldots, DB_k$ execute $P$ on a database of length $2^\ell$, where the user retrieves the $i$-th bit.

Correctness and communication complexity: The correctness property of $\Pi$ and the claim that its communication complexity is $C_P(2^\ell, k)$ follow immediately from the analogous properties of $P$.

User privacy: Next, we show that if a passive adversary controls one of the servers $DB_1, \ldots, DB_k$ privacy is maintained. If $P$ is information-theoretically ($T_D(n), \varepsilon(n)$-computationally) private then for any $j \in [k]$ and any $i_1, i_2 \in [2^\ell]$ the two query distributions $D_j(2^\ell, i_1)$ and $D_j(2^\ell, i_2)$ are equal ($T_D(2^\ell), \varepsilon(2^\ell)$-computationally indistinguishable). We prove the privacy of $\Pi$ by constructing a simulator $S$ for every real life adversary $A$. To simulate the view of an adversary $A$ who corrupts a server $DB_j$ the simulator $S$ has to do the following:

1. Arbitrarily pick some index $i' \in [2^\ell]$ and generate the $j$-th query, $q_j$, of a user retrieving $i'$.

2. Output that query together with the rest of its view, namely the database $x$.

By the privacy property of $P$ we have that the output distribution of $S$ is identical ($T_D(2^\ell), \varepsilon(2^\ell)$-computationally indistinguishable) to the view of $A$, thus proving the user privacy of $\Pi$.

Data Privacy: Let $P$ be an information theoretically private SPIR scheme. We show that $\Pi$ is an information theoretically private SPERKY scheme. Let $U$ be the (possibly dishonest) user in $\Pi$. We construct two simulators $S_0, S_1$. On input $w$, the user's input, and $b$, the output (which is 1 if and only if $w \in \{s_1, \ldots, s_n\}$), $S_0, S_1$ do the following:

1. $S_0, S_1$ run the query algorithm of $U$ and obtain a $k$-tuple of queries $(q_1, \ldots, q_k)$.

2. $S_0$ sets the database to $x' = 0^{2^\ell}$, $S_1$ sets the database to $x' = 1^{2^\ell}$. $S_0, S_1$ flip coins to get the shared random string $r_s$.

3. $S_0, S_1$ run the answer algorithms, $A_j(q_j, x' \| r_s)$ for $j = 1, \ldots, k$, and obtain a $k$-tuple of answers $(a_1, \ldots, a_k)$.

4. Finally, $S_0, S_1$ output $w, b, (q_1, \ldots, q_k), (a_1, \ldots, a_k)$.

Let $x$ be the $2^\ell$ bit incidence vector of $\{s_1, \ldots, s_n\}$. By the data privacy property of $P$ we have that for some $i \in [2^\ell]$ and for every $x'$ such that $x'_i = x_i$, the answer distribution (the distribution on $(a_1, \ldots, a_k)$) satisfies $A(x) = A(x')$. For one of the simulators $S_0$ or $S_1$ the database $x'$ satisfies $x'_i = x_i$. For that simulator, the distribution it outputs is identical to the view of $U$, thereby proving the security of $\Pi$ according to the definition of SPERKY schemes.

Theorem 17: Let $\Pi$ be a PERKY$(\ell, n, k)$ (SPERKY$(\ell, n, k)$) scheme with communication complexity $C_\Pi(\ell, n, k)$. Then, there exists a scheme $P$ that solves the problem PIR$(n, k)$ (SPIR$(n, k)$) with communication complexity $C_P(n, k) = C_\Pi(\log n + 1, n, k)$.

Proof: The scheme $P$: The servers replace the $n$ bit string they hold, $x = x_1, \ldots, x_n$, with $n$ strings of length $\log n + 1$ bits each, $s_1, \ldots, s_n$. Let $j \in [n]$ and let its binary expansion be $j_1, \ldots, j_{\log n}$. Define $s_j$ to be $s_j \triangleq j_1, \ldots, j_{\log n}, x_j$. Suppose that the user is interested in the $i$-th bit. That bit is 1 iff $s_i = i_1, \ldots, i_{\log n}, 1$. $U$ and $DB_1, \ldots, DB_k$ execute the scheme $\Pi$ on input $(\log n + 1, n, k)$ where the word that $U$ holds is $i_1, \ldots, i_{\log n}, 1$.

The correctness and communication complexity of $P$ are evident. However, we cannot prove its privacy according to Definitions 21, 25 and 13 because $P$ is not necessarily a one round scheme. Nevertheless, if we define PIR (SPIR) according to the more general definitions of Subsection 2.6.1 where the computed function is the PIR function, we get the same security assurance for $P$ that $\Pi$ has.
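The data transformations behind the two reductions above (Theorems 16 and 17) are simple enough to write out. The Python sketch below shows only the encodings, not the PIR or PERKY executions themselves. It assumes 0-based indices, keywords given as strings over the characters 0 and 1, and hypothetical function names.

```python
def incidence_vector(strings, l):
    # Theorem 16: replace the strings by a 2^l-bit incidence vector.
    # The lexicographic rank of an l-bit string equals its value as a binary number.
    x = [0] * (2 ** l)
    for s in strings:
        x[int(s, 2)] = 1
    return x

def keyword_index(w):
    # The PIR retrieval index that corresponds to keyword w.
    return int(w, 2)

def bits_to_strings(x):
    # Theorem 17: bit x_j becomes the string (binary expansion of j) || x_j.
    n = len(x)
    width = max(1, (n - 1).bit_length())
    return [format(j, f"0{width}b") + str(x[j]) for j in range(n)]

def keyword_for_bit(i, n):
    # The keyword the user searches for to learn whether x_i = 1.
    width = max(1, (n - 1).bit_length())
    return format(i, f"0{width}b") + "1"
```

The first encoding makes the cost of Theorem 16 visible: the PIR database has $2^\ell$ bits regardless of $n$, which is why the subsequent constructions look for something more economical.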
Discussion

The above reduction indicates that we cannot hope to find PERKY schemes which are significantly more efficient than PIR schemes. Our aim is now to use PIR schemes in order to construct PERKY schemes that are more efficient than what can be obtained by Theorem 16.

The main idea in all of our subsequent PERKY constructions is the following: the servers insert $s_1, \ldots, s_n$ into a search data structure. The user conducts an oblivious walk on the data structure until either the word $w$ is found, or $U$ is assured of the fact that $w$ is not one of $s_1, \ldots, s_n$.

A typical search in the data structure involves a sequence of operations, where each operation consists of fetching the contents of a word from memory, performing a "local" computation, which depends on the keyword and the fetched contents, and either determining a new memory address based on the computation, or terminating the search (successfully or unsuccessfully). This sequence of operations can be viewed as a walk on the data structure. We now describe a general outline of transforming this walk into an oblivious walk on the data structure, namely a walk where each server gets no information on the walk (and, therefore, on the desired keyword itself).

In the oblivious walk, the operations of the original data structure are divided between the user and the server(s). Each server maintains the structure $M$, without any modification. The server supports the user in fetching words with known addresses from the memory, by performing an agreed upon PIR scheme. The local computations, as prescribed by the algorithm $B$, are performed exclusively by the user. The result of each local computation determines the address of the next word to be fetched. If the data structure requires (in the worst case) $d$ memory accesses, the user will invoke the PIR scheme $d$ times successively.
If for some given keyword fewer than $d$ invocations are required, the user will still execute the PIR scheme $d$ times, with arbitrary (dummy) addresses in the last operations. Otherwise the server learns the search length for this specific keyword. This discussion leads to the following theorem:

Theorem 18: Let $DS$ be a $(t, m, d)$-search data structure (Definition 26) and let $P$ be a one round PIR$(\ell, n, k)$ scheme with communication complexity $C_P(\ell, n, k)$. Then, there exists a scheme $\Pi$ that solves the problem PERKY$(\ell, n, k)$ and has the following properties:

Privacy: If $P$ maintains information-theoretic (or computational) privacy then $\Pi$ maintains information theoretic (or computational$^1$) privacy.

Communication complexity:
\[ C_\Pi(\ell, n, k) = m_0(n, \ell) + \sum_{q=1}^{d(n,\ell)} C_P(m_q(n, \ell), t_q(n, \ell), k). \]

Round complexity: The number of communication rounds in $\Pi$ is $d(n, \ell) + 1$.

$^1$ Computational privacy is assured if and only if constructing $DS$, conducting a search operation in $DS$ and executing $P$ can all be performed in polynomial time. However, since all search data structures and PIR schemes in the literature are of this type, we do not state these requirements explicitly in the theorem.

Proof: We slightly abuse the notation and once $n$ and $\ell$ are set, do not explicitly write them down. Thus, $m_q$ denotes $m_q(n, \ell)$, $t_q$ denotes $t_q(n, \ell)$ etc.

The scheme $\Pi$:

1. The servers $DB_1, \ldots, DB_k$ insert the $n$ strings $s_1, \ldots, s_n$ into the agreed upon data structure, $DS$. $DB_1$ sends $u_{root}$, the first word in any search sequence, to the user.

2. $U$ sets $ans = \perp$$^2$, $u = u_{root}$ and $aux = init$ (where $init$ is the initial information).

$^2$ $ans$ denotes the answer of the protocol, whether $w \in \{s_1, \ldots, s_n\}$. At the beginning of the scheme its value is unknown.

3. For every $q = 1, \ldots, d$ the following is performed by $DB_1, \ldots, DB_k$ and $U$:

(a) The user, $U$, executes $B(w, u, aux)$ and gets as output either $(add', aux')$ or $ans(w, S)$.
If the output is $(add', aux')$ then the user computes a new address $add = MAP(n, \ell, q, add')$. Otherwise the user does the following: it sets $add = 1$$^3$, if $ans$ is still $\perp$ it sets it to $ans(w, S)$ and finally $aux$ is set to $aux'$.

(b) The parties execute the PIR scheme $P$ with parameters $(t_q, m_q, k)$. The database is the $t_q$ words in $T_q$ (each of length $m$). The user's retrieval index is $add$, and the user retrieves the full word of $m$ bits at the desired position.

(c) The user sets $u$ to be the retrieved word.

4. The user's answer is $ans$.

Correctness: We claim that at the end of the scheme $ans$ is the correct bit (1 iff $w \in \{s_1, \ldots, s_n\}$) because of the correctness property of $DS$. The protocol carries out a distributed evaluation of $B^q(w, u_{root}, init)$ (Definition 26) for every $q = 1, \ldots, d$. Since for some $q$, $q \leq d$, the output of $B^q(w, u_{root}, init)$ is $ans(w, S)$, which is the correct bit, and since $ans = ans(w, S)$, the claim follows.

Privacy: We utilize Theorem 1 to prove the information-theoretic (or computational) privacy of the scheme. We show that the protocol $\Pi$ maintains information-theoretic (computational) privacy in the semi-ideal model, in which a trusted party computes the output of each PIR protocol. Theorem 1 ensures that if the PIR protocol $P$ maintains information-theoretic (computational) privacy then $\Pi$ maintains information-theoretic (computational) privacy.

Let $A$ be an adversary in the semi-ideal model. It can control a server $DB_j$ ($1 \leq j \leq k$) of its choice. A simulator $S$ in the semi-ideal model mimics $A$ trivially. On input $s_1, \ldots, s_n$ its output is $d$ copies of $s_1, \ldots, s_n$. The only messages a real-life adversary receives are part of $d$ invocations of $P$. In the semi-ideal model, for each invocation of $P$ the adversary sends $s_1, \ldots, s_n$ to the trusted party and gets no answers. Thus, the simulator's output is identical to the view of $A$. (In the case of an adversary that corrupts $DB_1$ the simulator has to add $u_{root}$ to the output.)
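The control flow of the oblivious walk, in particular the padding with dummy retrievals so that exactly $d$ PIR invocations always take place, can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the scheme above: the search structure is a sorted array searched by binary search, the `fetch` callback stands in for one PIR invocation and provides no privacy by itself, every fetch addresses the whole structure rather than the restricted set $T_q$ (so $MAP$ is omitted), and the user stops calling $B$ once the answer is known. The array is assumed non-empty and all names are hypothetical.

```python
def bsearch_step(w, u, aux):
    # The local computation B for a sorted array.
    # aux = (lo, hi) is the current range; u is the word at (lo + hi) // 2.
    lo, hi = aux
    mid = (lo + hi) // 2
    if u == w:
        return ("ans", 1)
    if w < u:
        hi = mid - 1
    else:
        lo = mid + 1
    if lo > hi:
        return ("ans", 0)
    return ("next", (lo + hi) // 2, (lo, hi))

def oblivious_search(words, w, fetch):
    d = len(words).bit_length()      # worst-case number of steps of B
    aux = (0, len(words) - 1)
    u = words[(len(words) - 1) // 2] # root word, sent in the clear
    ans = None                       # plays the role of "ans is unknown"
    for _ in range(d):
        add = 0                      # dummy address once the answer is known
        if ans is None:
            out = bsearch_step(w, u, aux)
            if out[0] == "ans":
                ans = out[1]
            else:
                _, add, aux = out
        u = fetch(words, add)        # always performed, real or dummy
    return ans
```

Because the number of `fetch` calls depends only on the size of the structure, a server that sees only the (private) retrievals cannot tell a short search from a long one.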
Complexity: In the first round of communication $u_{root}$, which is an $m_0$-bit word, is sent to the user. The rest of the communication consists of $d$ executions of the PIR scheme $P$ with parameters $(m_1, t_1, k), \ldots, (m_d, t_d, k)$. Thus the total communication complexity is $m_0 + \sum_{q=1}^{d} C_P(m_q, t_q, k)$ and the number of rounds is $d + 1$.

Footnote 3: From this point onwards the user carries out bogus queries.

We remark that schemes of the type described above support the simultaneous implementation of private as well as "regular" (non-private) retrieval, which is certainly a desirable property from a practical point of view.

4.4 Specific implementations

In this section we give three examples of PERKY schemes based on the general construction of Theorem 18, using a different data structure each time. The most efficient scheme in terms of communication and round complexity is the one described in Subsection 4.4.3, which is based on perfect hashing [73, 36]. The expected computational complexity of this scheme is linear, although in the worst case it might become exponential (in $\ell$). The scheme in Subsection 4.4.1, based on binary search trees, has linear deterministic computational complexity but is inferior in communication and number of rounds. The last scheme, in Subsection 4.4.2, is based on the trie data structure [56]. It is inferior to the other two options in all complexity measures but can be converted into a symmetrically private scheme (Subsection 4.5).

4.4.1 Binary Search Tree

Corollary 19: Let $P$ be a one round PIR$(\ell, n, k)$ scheme with communication complexity $C_P(\ell, n, k)$. Then, there exists a scheme $\Pi$ that solves PERKY$(\ell, n, k)$ and has the following properties:

Privacy: If $P$ maintains information-theoretic (or computational) privacy then $\Pi$ maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C_\Pi(\ell, n, k) = \ell + \sum_{q=1}^{\log n} C_P(\ell, 2^{q-1}, k).$$

Round complexity: The number of communication rounds in $\Pi$ is $\log n + 1$.

Proof: We show a specific search data structure $DS$ such that the corollary follows from Theorem 18. Consider the complete binary search tree consisting of $2^{\lceil \log(n+1) \rceil} - 1$ elements (that is, the smallest complete binary tree with at least $n$ nodes). Every node holds an $\ell$-bit string such that all the strings in its left sub-tree are smaller than it in the lexicographic order, and all the strings in its right sub-tree are greater than it. Searching for a word $w$ proceeds by beginning at the root of the tree and, for each node, moving to its left son if the string at the node is greater than $w$ and to its right son if the string at the node is smaller than $w$. The search terminates successfully upon reaching a node that holds $w$, or unsuccessfully upon reaching a leaf node that does not hold $w$.

We now describe the various parameters of the search data structure in the terms of Definition 26. The number of words in the structure is $t(n, \ell) = 2^{\lceil \log(n+1) \rceil} - 1$. The first $n$ words are $s_1, \ldots, s_n$ and the rest are all the same dummy word $1^\ell$. Thus, $m(n, \ell) = \ell$. The maximum search length is equal to the depth of the tree, $d(n, \ell) = \lceil \log(n+1) \rceil - 1$. The root of the structure is the tree's root, and its index is $2^{\lceil \log(n+1) \rceil - 1}$. For every $q = 1, \ldots, d(n, \ell)$, the set of indices $T_q$ includes the following:
$$T_q = \left\{ \frac{1}{2^{q+1}} 2^{\lceil \log(n+1) \rceil},\ \frac{3}{2^{q+1}} 2^{\lceil \log(n+1) \rceil},\ \ldots,\ \frac{2^{q+1}-1}{2^{q+1}} 2^{\lceil \log(n+1) \rceil} \right\}.$$
The number of elements in each such set is $t_q = |T_q| = 2^q$.

Finally, the search algorithm $B$ receives as input three parameters $(w, u, aux)$. $w$ is the given query, $u$ is the value of the string at the current node and $aux$ contains $n$, $q$ and the address of $u$, which is $\frac{1}{2^{q+1}} \cdot p \cdot 2^{\lceil \log(n+1) \rceil}$ for some $p$. If $u = w$ the output of $B$ is $ans(w, S) = 1$.
Otherwise, if $q = \lceil \log(n+1) \rceil$ its output is $ans(w, S) = 0$. If $q < \lceil \log(n+1) \rceil$ then $B$'s output is $(add', aux')$, which is determined as follows. If $u > w$ in the lexicographic order then $add' = \frac{1}{2^{q+2}} (2p + 1) 2^{\lceil \log(n+1) \rceil}$. Otherwise, $add' = \frac{1}{2^{q+2}} (2p - 1) 2^{\lceil \log(n+1) \rceil}$. In both cases, $aux'$ is $(n, q + 1, add')$.

4.4.2 Trie

In this subsection we present a PERKY$(\ell, n, k)$ scheme, which is based on the trie data structure [56, page 481].

Definition 27: Let $s$ be a binary string of $q$ bits, $0 \le q \le \ell$. We say that $s$ is a legal prefix of $s_1, \ldots, s_n$ if it is a prefix of some $s_j$ ($1 \le j \le n$).

Corollary 20: Let $P$ be a one round PIR$(\ell, n, k)$ scheme with communication complexity $C_P(\ell, n, k)$. Then, there exists a scheme $\Pi$ that solves PERKY$(\ell, n, k)$ and has the following properties:

Privacy: If $P$ maintains information-theoretic (or computational) privacy then $\Pi$ maintains information-theoretic (or computational) privacy.

Communication complexity:
$$C_\Pi(\ell, n, k) = (\ell - 1)\, C_P(\log n, 2n, k).$$

Round complexity: The number of communication rounds in $\Pi$ is $\ell$.

Proof: In its basic form a trie holding $n$ strings of length $\ell$ is a binary tree of depth $\ell$ that has $n$ leaf nodes. Each edge represents a bit and each node represents a legal prefix. Since a node $v$ in a tree corresponds to the unique path from the root to it, we take the prefix represented by $v$ to be the concatenation of all bits on the path edges from the root to $v$. Therefore, each of the leaf nodes represents one of the strings held by the trie.

Let $w = w_1 \ldots w_\ell$ be the query string ($w_1, \ldots, w_\ell \in \{0, 1\}$). A search operation proceeds down a path from the root to a leaf node. At depth $q - 1$ ($q = 1, \ldots, \ell$) the current node corresponds to the prefix $w_1, \ldots, w_{q-1}$. The search continues down the edge marked by $w_q$ if it exists and terminates (unsuccessfully) otherwise. The search terminates successfully if a leaf node has been reached.
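The basic (unmodified) trie search just described can be sketched as follows. This is a minimal illustration assuming the strings are given as Python strings over the characters `'0'` and `'1'`, all of the same length $\ell$; the nested-dictionary representation and the function names are choices of this sketch, not of the text.

```python
def build_trie(strings):
    """Build a basic binary trie as nested dicts: each key is an edge label
    ('0' or '1') and each dict is a node representing a legal prefix."""
    root = {}
    for s in strings:
        node = root
        for bit in s:
            node = node.setdefault(bit, {})
    return root


def trie_search(root, w):
    """Follow the edges marked w_1, w_2, ... from the root.
    Returns (found, depth), where depth is the length of the longest
    legal prefix of w that the walk reached."""
    node = root
    for depth, bit in enumerate(w):
        if bit not in node:
            return False, depth      # edge w_{depth+1} does not exist
        node = node[bit]
    return True, len(w)              # a leaf at depth len(w) was reached
```

Note that an unsuccessful search stops at a depth that depends on the stored strings, so the number of steps taken is itself data-dependent in this basic form.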
We slightly modify the basic trie structure in the scheme $\Pi$. The basic structure has the disadvantage that the number of nodes in each level of the tree (except the first and last) depends not only on $n$ and $\ell$, but also on the actual strings $s_1, \ldots, s_n$. In the structure we use, each level is assumed to have $n$ nodes (see footnote 4), which is an upper bound on the number of necessary nodes. The servers pad each level with "dummy" nodes, in order to ensure that the total number of nodes in that level is $n$. Each dummy node is simply a string of zeroes.

Another modification is that the nodes at depth $\ell - 1$ are the leaves instead of those at depth $\ell$. Each such node represents an $(\ell - 1)$-bit legal prefix, $y_1 \ldots y_{\ell-1}$, and holds 2 bits. The first is 1 iff $y_1 \ldots y_{\ell-1} 0 \in \{s_1, \ldots, s_n\}$, and the second is 1 iff $y_1 \ldots y_{\ell-1} 1 \in \{s_1, \ldots, s_n\}$. In any other level each node is composed of two $\log n$-bit strings. The first is the pointer to the left son (this edge represents the 0 bit), and the second is the pointer to the right son (the 1 bit).

The internal structure of each level has to be fully determined by $s_1, \ldots, s_n$. Thus in a multi-server scheme all the servers construct the same data structure given the same set of $n$ keywords. One possible solution is to have the actual nodes of the level appear first, ordered by the lexicographic ordering of the prefixes they represent, and the dummy nodes appear after them.

We now describe the various parameters of the search data structure in terms of Definition 26. Each node in a trie contains two pointers, to its left and right sons. Each word in $M$ corresponds to one such pointer. If the pointer is null (i.e. the appropriate edge does not exist), the corresponding word is the all-0 word. The number of words in the structure is $t(n, \ell) = 2n(\ell - 1) + 2$. The size of each word is $m(n, \ell) = \log n$. The maximum search length is equal to the depth of the tree, $d(n, \ell) = \ell - 1$. The root of the structure is empty (see footnote 5).
For every $q = 1, \ldots, d(n, \ell)$, the set of indices $T_q$ includes the $q$-th block of $2n$ words (corresponding to the $2n$ pointers at the $q$-th level of the trie). Thus $t_q = 2n$.

Finally, the search algorithm $B$ receives as input three parameters $(w, u, aux)$. $w$ is the given query, $u$ is the current pointer and $aux$ contains $q$. If $q = 0$, that is $aux = init$, then $B$ outputs $(add', aux')$, where $add'$ is the address in $T_1$ of the single bit prefix $w_1$. If $0 < q < d(n, \ell)$ and $u = 0^{\log n}$ then $B$ outputs $ans(w, S) = 0$. If $0 < q < d(n, \ell)$ and $u \ne 0^{\log n}$, then $u$ is the address of a node $v$ in level number $q + 1$. $B$ outputs $(add', aux')$, such that if $w_q = 0$ ($w_q = 1$) $add'$ is the address in $T_{q+1}$ of the left (right) pointer of $v$, and $aux' = q + 1$. If $q = d(n, \ell)$ then $u$ is the answer of the protocol and $B$ outputs $ans(w, S) = u$.

Footnote 4: Another, more efficient approach is to have the root contain the number of nodes in every level.

Footnote 5: In an actual implementation in computer memory, the root would contain pointers to the single bit prefixes 0 and 1. Here, however, we are only interested in the address of each such prefix in $T_1$.

4.4.3 Perfect Hashing

Suppose the servers can find a hash function $h : \{0, 1\}^\ell \to \{1, \ldots, t\}$, where $t \ge n$, such that $h$ is perfect for $\{s_1, \ldots, s_n\}$. A hash function is perfect for $\{s_1, \ldots, s_n\}$ if its restriction to $\{s_1, \ldots, s_n\} \subseteq \{0, 1\}^\ell$ is one-to-one. In order to find such a hash function we use techniques introduced in [73] and [36]. To make this presentation self-contained, we give a brief description of these techniques.

We define a family of functions, $H$, such that $\forall h \in H$, $h : \{0, 1\}^\ell \to \{1, \ldots, t\}$ and $|H| = 2^{2\ell}$. For any $(a, b) \in \{0, 1\}^\ell \times \{0, 1\}^\ell$ we define a function $h_{a,b}$ by the rule $h_{a,b}(x) = ax + b$. We view $a$ and $b$ as elements of the finite field $GF(2^\ell)$, and addition and multiplication are with respect to this field. $H$ is a universal family of hash functions.
In other words, for any $x, y \in \{0, 1\}^\ell$, $x \ne y$, if $h_{a,b}$ is chosen uniformly at random from $H$, then $\Pr[h_{a,b}(x) = h_{a,b}(y)] = \frac{1}{2^\ell}$. We are interested in functions that map $\{0, 1\}^\ell$ to $\{1, \ldots, t\}$, and therefore, for every $h_{a,b} \in H$, we take $h_{a,b}(x)$ to be only the first $\log t$ bits of $ax + b$ (for our purposes it is sufficient to assume that $t$ is a power of 2). The family $H$ remains universal with $\Pr[h_{a,b}(x) = h_{a,b}(y)] = \frac{1}{t}$.

We choose uniformly at random a function $h \in H$, and use it to map the $n$ field elements $s_1, \ldots, s_n$. Let $F$ be a random variable whose value is defined as the number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$, and is determined by the random choice of $h$. The expected value of $F$ over all choices of $h$ is:
$$E[F] = \sum_{s_i \ne s_j} \Pr[h(s_i) = h(s_j)] = \binom{n}{2} \frac{1}{t}. \qquad (4.1)$$
If we choose $t = n^2$ then $E[F] \le \frac{1}{2}$, and since $F$ is an integer, at least half the functions in $H$ are 1-to-1 over $s_1, \ldots, s_n$ (a function is 1-to-1 iff $F = 0$).

Footnote 5 (continued): That address can be agreed on beforehand, e.g. address number 1 (2) corresponds to the prefix 0 (1).

We would like to map the strings $s_1, \ldots, s_n$ to an array of size $O(n)$ instead of $O(n^2)$. The mapping is carried out in two stages. In the first stage the functions in $H$ are regarded as mapping strings of length $\ell$ to $\{1, \ldots, n\}$. We choose a function $h \in H$ that has the following property: $F$, the number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$, is at most $n$. By Equation 4.1, since $t = n$ we have $E[F] \le \frac{n}{2}$, and hence $|F| \le n$ for at least half the functions in $H$.

In the second stage we choose $n$ functions: $h_1, \ldots, h_n \in H$. We denote the number of strings that $h$ maps to $i$ by $n_i$, and assume that these strings are $s_{i_1}, \ldots, s_{i_{n_i}}$. In order to choose $h_i$ ($1 \le i \le n$) we regard the functions in $H$ as mapping strings of length $\ell$ to $\{1, \ldots, n_i^2\}$. The function $h_i \in H$ is required to be a 1-to-1 mapping of $\{s_{i_1}, \ldots, s_{i_{n_i}}\}$ to $\{1, \ldots, n_i^2\}$.
At least half the functions in $H$ satisfy this requirement. Finding any one of the $n + 1$ functions $h, h_1, h_2, \ldots, h_n$ can be achieved efficiently by choosing a function uniformly at random from $H$. The expected number of trials needed until all of these functions are found is $2(n + 1)$.

Two arrays are now constructed. In the first there are $n$ entries, and in each entry two data elements are stored: a description of the function $h_i$ and the sum $\sum_{q=1}^{i-1} n_q^2$. In the second array there are $\sum_{q=1}^{n} n_q^2$ entries. The strings $\{s_1, \ldots, s_n\}$ are stored in this array as follows: if $h(s_j) = i$, then $s_j$ is stored in cell number $\sum_{q=1}^{i-1} n_q^2 + h_i(s_j)$ in the array. By construction no two strings are stored in the same cell.

We now show that the size of the second array, $\sum_{q=1}^{n} n_q^2$, is at most $3n$. The number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$ is $|F| = \sum_{q=1}^{n} \binom{n_q}{2}$. On the other hand, by the choice of $h$ we have $|F| \le n$. Therefore:
$$\sum_{q=1}^{n} n_q^2 = \sum_{q=1}^{n} n_q (n_q - 1) + \sum_{q=1}^{n} n_q = 2|F| + n \le 3n.$$

Discussion

Our aim is to transform the above construction into a PERKY scheme. The servers construct the two arrays, and the user retrieves the string at the correct cell. In order to achieve this goal the servers have to agree on $n + 1$ hash functions, $h, h_1, \ldots, h_n$. Having one server choose these functions, and then distribute them through $U$, is impractical. Since the number of these functions is $n$, and each one is encoded by $2\ell$ bits, the communication complexity is even larger than just having one of the servers send $s_1, \ldots, s_n$ to $U$. Below we present several solutions to this problem.

1. The simplest solution is to use a scheme that solves PIR$(\ell, n, 1)$. If there is just a single server [58, 60, 72, 17], the problem of server coordination does not even arise.

2. The second type of solution is to slightly extend the scope of the PIR, or PERKY model.
For instance, the servers can be allowed to have a source of shared randomness. This is already assumed if the model of communication and privacy being used is that of SPIR [38] (see also Sections 2.5, 2.7 and 4.5).

3. Another solution is to have all the servers go over the functions in $H$ in deterministic fashion, for example in lexicographic order. The first function that satisfies the conditions required of $h_i$ is chosen as $h_i$. This search will always end successfully, because $H$ contains functions of the desired type. However, in the worst case the computational complexity may become exponential in $\ell$, as opposed to an expected time of $O(\ell n)$. This may pose no problem when the only measure of efficiency we use is communication complexity. It does cause difficulties if the servers are assumed to be computationally bounded.

PERKY scheme

Given one of these possible solutions for the problem of distributing the hash functions to the servers we have:

Corollary 21: Let $P$ be a one round PIR$(\ell, n, k)$ scheme with communication complexity $C_P(\ell, n, k)$. Then, there exists a scheme $\Pi$ that solves PERKY$(\ell, n, k)$ and has the following properties:

Privacy: If $P$ maintains information-theoretic (or computational) privacy then $\Pi$ maintains information-theoretic (or computational) privacy.

Communication complexity:
$$C_\Pi(\ell, n, k) = 2\ell + C_P(2\ell + \log 3n, n, k) + C_P(\ell + 1, 3n, k).$$

Round complexity: The number of communication rounds in $\Pi$ is 3.

Proof: We now describe the various parameters of the search data structure in the terms of Definition 26. The number of words in the structure is $t(n, \ell) = 4n + 1$. The first word in the structure is the root, which holds the index of the first hash function $h$. The length of the first word is $2\ell$. The next $n$ words correspond to the entries in the first array. Each such entry is a description of a function $h_i \in H$ ($2\ell$ bits) and the sum $\sum_{q=1}^{i-1} n_q^2$ ($\log 3n$ bits). The next $3n$ words correspond to entries in the second array. Each word
Each word 98 Technion - Computer Science Department - Ph.D. Thesis PHD-2001-02 - 2001 is either sj k1 if sj is mapped to that entry, or 0`+1 if none of the strings s1; : : : ; sn is mapped to the corresponding entry. The maximum search length is d(n; `) = 2. The length of the words changes between the rst word (2` bits), the elements in the next n words (2` + log 3n bits) and the nal 3n words (` + 1 bits). Finally, we describe the search algorithm B . The algorithm has three stages. In the rst stage it begins with initial information init = 0. On input (w; h; 0) the algorithm B outputs (add0; aux0) wherePadd0 = h(w) and aux0 = 1. On input P (w; ( iq?=11 n2q ; hi); 1) the algorithm outputs ( iq?=11 nq + hi(w); 2). On input (w; s; 2) it outputs 1 i wk1 = s. 4.5 Symmetric PERKY In this section we deal with the problem of protecting both the privacy of the user and the privacy of the database in a retrieval by keywords scheme. After the execution of such a scheme the servers get no information on the user's query, while the user learns whether w 2 fs1; : : :; sng but does not gain any other information on the database. We already showed one solution to this problem in Theorem 16. The disadvantage of the construction in that theorem is its high computational and communication complexity. In Theorem 18 and Section 4.4 we showed PERKY schemes that are more ecient than the reduction of Theorem 16. However, the generalized scheme of Theorem 18 cannot transform a SPIR scheme into a SPERKY scheme in the same way that it transforms PIR into PERKY. The user may get additional information, beyond a single bit that determines if w 2 fs1; : : :; sng, even if the protocol is executed in the prescribed manner. In the binary tree instantiation of Theorem 18 (see Corollary 19) the user may learn no less than log n database keywords in a single invocation of the PERKY protocol. 
In the perfect hashing instantiation of Theorem 18 (see Corollary 21) the user can gain information from the hash functions $h, h_i$ which he retrieves. Since they have to be 1-to-1, only certain combinations of the $n$ keywords $s_1, \ldots, s_n$ are possible, and that is more information than the user should obtain.

Another way to solve the SPERKY problem with communication complexity polynomial in $n$ and $\ell$ is by using general private multi-party computation protocols, e.g. [11, 21, 75, 47]. We present a different solution which is more efficient and is in the spirit of an oblivious walk over a data structure as in Theorem 18. The basis of our scheme is the trie instantiation of Theorem 18, as presented in Corollary 20. It would be convenient if we could replace the PIR scheme $P$ in Corollary 20 with a SPIR scheme $P$, and argue that the constructed scheme $\Pi$ is a SPERKY scheme. That is not possible, however, as $\Pi$ leaks some illegal information to the user. For instance, a user that holds $w \notin \{s_1, \ldots, s_n\}$ learns which prefixes of $w$ are legal and which are not. In the scheme we construct we eliminate all such points of information leakage.

Theorem 22: Let $P$ be a one round SPIR$(\ell, n, k)$ scheme with communication complexity $C_P(\ell, n, k)$. Then, there exists a scheme $\Pi$ that solves SPERKY$(\ell, n, k)$ and has the following properties:

Privacy: If $P$ maintains information-theoretic (or computational) privacy then $\Pi$ maintains information-theoretic (or computational) privacy.

Communication complexity:
$$C_\Pi(\ell, n, k) = \log(n + 1) + (\ell - 1)\, C_P(\log(n + 1), 2(n + 1), k).$$

Round complexity: The number of communication rounds in $\Pi$ is $\ell$.

Proof: The scheme $\Pi$ uses a similar machinery to the schemes in Section 4.4. We describe a specific search data structure that is akin to the trie structure of Subsection 4.4.2, and use it in conjunction with the oblivious walk method of Theorem 18.
The structure is random in the sense that the strings $s_1, \ldots, s_n$ it holds do not determine the structure, but do induce a probability distribution on all possible structures. All the random choices that are required by the structure are carried out in a multi-server scheme by utilizing the shared randomness source.

The Data Structure:

Levels: Like the trie structure, this data structure has $\ell$ levels. In the first (number 0) there is only one node, the root, and in all others there are $n + 1$ nodes (see footnote 6). A search path begins at the root and proceeds down a path to one of the leaves. Therefore, $d(n, \ell) = \ell - 1$. Unlike the trie structure, the position (address) of a node in a level is not deterministic. Rather, given $n + 1$ nodes in a certain level, a random permutation is chosen to determine the internal order of nodes. A new random permutation has to be chosen for each level prior to each invocation of the PERKY scheme (otherwise information may leak out).

Footnote 6: A slightly more efficient approach is to have $2^q + 1$ nodes for the $q$-th level, $q = 1, \ldots, \log n$, as $2^q$ is an upper bound on the number of possible nodes at the $q$-th level.

Node types: In the $q$-th level, $T_q$, there may be nodes of three types:

1. Legal nodes represent legal prefixes. There is one node in the $q$-th level for every distinct legal prefix of length $q$.

2. The illegal node is a unique node at each level (see footnote 7) that represents all the prefixes of length $q$ which are not legal.

3. Dummy nodes are used to fill a level up. If the $q$-th level has no more than $j$ legal and illegal nodes, then the other $n + 1 - j$ nodes in the level are dummy nodes. Unlike the trie structure, here each dummy node is an element chosen uniformly at random from the domain $[n + 1]$. A dummy string is random because if it were a string of zeroes as in Corollary 20, a dishonest user who retrieved it might learn illegitimate information about the number of legal prefixes at that level.
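The layout of a single level, with its three node types and its fresh random permutation, can be sketched as follows. This is an illustrative sketch only: the function name, the tuple representation of a node and the use of Python's `random.Random` are assumptions made here for readability. A real deployment would draw the permutation and the dummy values from the servers' shared randomness source, using cryptographically strong randomness.

```python
import random


def level_layout(strings, q, rng):
    """Lay out level q (1 <= q <= len(s) - 1) of the randomized structure.

    Returns a list of n + 1 nodes in randomly permuted order; each node is a
    (kind, payload) pair with kind in {"legal", "illegal", "dummy"}."""
    n = len(strings)
    # One legal node per distinct legal prefix of length q.
    legal_prefixes = sorted({s[:q] for s in strings})
    nodes = [("legal", prefix) for prefix in legal_prefixes]
    # Exactly one illegal node stands for every non-legal prefix of length q.
    nodes.append(("illegal", None))
    # Fill the level up to n + 1 nodes with dummies holding a uniformly
    # random element of [n + 1].
    while len(nodes) < n + 1:
        nodes.append(("dummy", rng.randint(1, n + 1)))
    # A fresh random permutation fixes the addresses within the level.
    rng.shuffle(nodes)
    return nodes
```

For instance, with the strings `000, 011, 110` level 1 holds two legal nodes (prefixes `0` and `1`), the illegal node and one dummy node, while level 2 holds three legal nodes and the illegal node; in both cases the level has $n + 1 = 4$ nodes.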
Node composition: Every node in levels $q = 0, 1, \ldots, \ell - 2$ is comprised of two pointers $p_0, p_1$, each of length $\log(n + 1)$. Each pointer is a word in the data structure (using the terms of Definition 26). Thus, $m(n, \ell) = \log(n + 1)$ and
$$t(n, \ell) = \sum_{q=0}^{\ell-1} t_q = 2 + (n + 1)(\ell - 1).$$
If a node at the $q$-th level represents the prefix $y_1 \ldots y_q$ then $p_0$ is the address in level number $q + 1$ of the node representing $y_1 \ldots y_q 0$ and $p_1$ is the address in level number $q + 1$ of the node representing $y_1 \ldots y_q 1$. The ordering of the pointers within the nodes is deterministic ($p_0$ comes first). If $y_1 \ldots y_q 0$ ($y_1 \ldots y_q 1$) is an illegal prefix then $p_0$ ($p_1$) points to the address of the illegal node at level $q + 1$. Hence, both pointers of the illegal node at level $q$ point to the illegal node at level $q + 1$. As remarked previously, the dummy nodes contain two randomly chosen pointers. Each node in level $\ell - 1$ contains two bits denoting whether the two possible $\ell$-bit extensions are in $\{s_1, \ldots, s_n\}$.

Search algorithm: $B$ receives as input $(w, u, aux)$. If $aux = init$, then $u$ is the root, which is a node that contains two pointers. Otherwise $u$ is a single pointer. $aux$ contains the level number $q$. If $0 < q < d(n, \ell)$, then $u$ is the address of a node $v$ in level number $q + 1$. $B$ outputs $(add', aux')$, such that if $w_q = 0$ ($w_q = 1$) $add'$ is the address in $T_{q+1}$ of the $p_0$ ($p_1$) pointer of $v$, and $aux' = q + 1$. If $q = d(n, \ell)$ then $u$ is the answer of the protocol and $B$ outputs $ans(w, S) = u$.

Footnote 7: That is the reason $n + 1$ nodes are required per level.

Correctness: The correctness of the PERKY scheme induced by this structure is based on its correctness as a search data structure. For every $w \in \{s_1, \ldots, s_n\}$, a search path beginning at the root will terminate at level $\ell - 1$ with the bit 1. For every $w \notin \{s_1, \ldots, s_n\}$, for some $q$, $1 \le q \le \ell$, the prefix $w_1 \ldots w_q$ is illegal. If $q < \ell$ the search path will reach the illegal node of level $q$ and thence the illegal nodes of
levels $q + 1, q + 2, \ldots, \ell - 1$, where both answer bits are 0. Otherwise, the search path reaches a legal node at level $\ell - 1$. The retrieved answer bit is 0 (even though the other bit at this node is 1).

Complexity: In the first round of communication $DB_1$ sends the root (two pointers of $\log(n + 1)$ bits each) to the user. In each of the next $\ell - 1$ rounds, the scheme $P$ is executed, where one block of at most $\log(n + 1)$ bits is retrieved from $2(n + 1)$ such blocks.

User privacy: User privacy is argued as in Theorem 18. In order to complete the proof and show that this is a SPERKY scheme, we have to prove data privacy.

Data privacy: We prove data privacy by considering the semi-ideal model in which the protocol is augmented with a trusted party that computes only the SPIR$(\ell, n, k)$ protocol $P$ and returns the answer to the user. We show that in this semi-ideal model $\Pi$ maintains information-theoretic security against an adversary (passive or active) that corrupts the user. By Theorem 1 the real-world protocol maintains information-theoretic (computational) data privacy if $P$ maintains information-theoretic (computational) data privacy.

Let $U$ be a (possibly dishonest) user in the semi-ideal model. $U$ is characterized by a set of algorithms $C_1, C_2, \ldots, C_{\ell-1}$. These (possibly probabilistic) algorithms determine the retrieval index at levels $1, 2, \ldots, \ell - 1$. For every $q$, algorithm $C_q$ receives as input all of $U$'s view up to that point. The view includes the input $n, \ell, w$, an auxiliary input, $z_q$, which models all the messages received by $U$, and a random input $r_q$. $C_q$ outputs a retrieval index $i_q$ which is sent to the trusted party. If $U$ is passive, i.e. follows the protocol to the letter, the algorithms $C_1, \ldots, C_{\ell-1}$ are identical to the $\ell - 1$ invocations of $B$ at levels $1, \ldots, \ell - 1$.

A simulator $S$ receives as input $n, \ell, w$ and $ans(w, S)$ (the input and output in the ideal model). Its operation is quite simple.
It mimics the algorithms run by $U$ at each stage, and simulates the answers of the servers by choosing completely random elements in the range $1, \ldots, n + 1$ (in other words, random pointers). We first present the simulator in detail and then show that its output distribution is identical to the view of $U$.

1. $S$ outputs the input of $U$: $n, \ell, w$.

2. $S$ chooses at random two different elements in $[n + 1]$, $R_1^0, R_2^0$, and sets $z_1 \triangleq R_1^0 \| R_2^0$, which simulates the root.

3. For every $q = 1, 2, \ldots, \ell - 2$, $S$ samples $r_q$, executes $C_q(n, \ell, w, z_q, r_q)$ and produces $i_q$. $S$ sets $z_{q+1} \triangleq z_q \| i_q \| R_q$, where $R_q$ is a uniformly random element in $[n + 1]$ that simulates the output of the trusted server on index $i_q$ and the given database.

4. $S$ outputs the input of $U$, $n, \ell, w$, the messages of the protocol in the semi-ideal model, $z_{\ell-2}$, and $ans(w, S)$.

We prove by induction on $q$ that for every $q = 1, \ldots, \ell - 2$ the auxiliary input of $S$, $z_q$, is distributed identically to the messages received by $U$ before the $q$-th invocation of the trusted party. Prior to the first invocation of $P$ the user, $U$, receives one message, which is the root. It consists of two pointers, which are the addresses in $T_1$ of the single bit prefix 0 and the single bit prefix 1. By construction, the $n + 1$ addresses in $T_1$ are permuted randomly. Therefore, each pointer is a random element in $[n + 1]$. Furthermore, the two pointers have two different values (see footnote 8). Thus, the simulated $z_1 = R_1^0 \| R_2^0$ is distributed identically to the $z_1$ received by $U$.

Assume that the induction hypothesis is true for $z_q$, $q \ge 1$. Since $S$ can sample $r_q$ in the same way that $U$ does, the input of $C_q$, namely $n, \ell, w, z_q, r_q$, as simulated by $S$ is distributed identically to the input of $C_q$ invoked by $U$. Therefore, $i_q$ as simulated by $S$ is distributed identically to the $i_q$ that $U$ uses. At this point $U$ sends $i_q$ to the trusted party and receives in return an address in $T_q$.
The simulator chooses a random $R_q$ to simulate the trusted party's answer. $R_q$ has the same distribution as the "real" answer because of the random permutation of all addresses in $T_q$. Thus, $z_{q+1}$ is distributed identically to the view of $U$ prior to invocation number $q + 1$ of the trusted party.

As a conclusion we have that $S$ receives the same input as $U$, produces the same distribution on the messages received during the protocol and receives the same output of the protocol. $S$, therefore, perfectly simulates in the ideal model the operation of an adversary $U$ in the semi-ideal model. Hence, $\Pi$ is information-theoretically (computationally) secure if $P$ is information-theoretically (computationally) secure.

Footnote 8: This is true unless both single bit prefixes are illegal, in which case the database is empty.

4.6 Other PERKY Topics

4.6.1 Reducing Communication Complexity

In all of the PERKY protocols we presented, the communication complexity is a function of the form $\ell^{\alpha} f(n)$, where $\alpha > 0$ is some constant and $f$ is some sub-linear function. In certain contexts the length of the keywords, $\ell$, may be very large, which could cause the communication complexity to become infeasible.

Following the execution of each of the previous PERKY protocols, the user knows with certainty whether the word it holds, $w$, is one of the $n$ words held by the servers. If we allow errors, a certain tradeoff is possible between the communication complexity of a protocol and its probability of error, that is, the probability that at the end of the protocol $U$ obtains a wrong answer as to whether $w$ is one of $s_1, \ldots, s_n$. This tradeoff is possible through the application of the string equivalence technique, described in [57, pp. 30-31].

Lemma 16: Let $\Pi$ be a PERKY$(\ell, n, k)$ scheme with communication complexity $C_\Pi(\ell, n, k)$. Let $p$ be a prime number.
Then, there exists a scheme $\Pi'$ that solves the problem PERKY$(\ell, n, k)$ with error probability at most $n\ell/p$ and has the following properties:

Privacy: If $\Pi$ maintains information-theoretic (or computational) privacy then $\Pi'$ maintains information-theoretic (or computational) privacy.

Communication complexity:
$$C_{\Pi'}(\ell, n, k) = C_\Pi(\log p, n, k).$$

Round complexity: Identical to the round complexity of $\Pi$.

Proof: The scheme $\Pi'$:

1. $U$ chooses $p$ and $r$. $p$ is a prime number, and $r$ is chosen uniformly at random in the range $0, \ldots, p - 1$. $U$ sends $p$ and $r$ to the servers.

2. All the participants in the scheme consider the binary strings $s_1, \ldots, s_n$ as polynomials over $GF(p)$. That is, a string $s = b_0 \ldots b_{\ell-1}$ ($b_0, \ldots, b_{\ell-1} \in \{0, 1\}$) represents the polynomial $s(y) = \sum_{j=0}^{\ell-1} b_j y^j$. The servers construct a new set of $n$ strings. The $i$-th string is the binary representation of the field element $s_i(r)$.

3. The user and servers execute the scheme $\Pi$, in which the $n$ strings each server holds are $s_1(r), \ldots, s_n(r)$, and the string that $U$ holds is $w(r)$.

The size of all the strings is reduced to $\log p$ bits each, and therefore the communication complexity is $C_\Pi(\log p, n, k)$. For any fixed $i$, $1 \le i \le n$, with $s_i \ne w$, the probability that $s_i(r) = w(r)$ is at most $(\ell - 1)/p$, since $s_i(y) - w(y)$ is a non-zero polynomial of degree at most $\ell - 1$ and therefore has at most $\ell - 1$ roots in $GF(p)$. Summing over the $n$ strings, the probability that $w \notin \{s_1, \ldots, s_n\}$ but $w(r) \in \{s_1(r), \ldots, s_n(r)\}$ is less than $n\ell/p$. Obviously, if $w \in \{s_1, \ldots, s_n\}$, then $w(r) \in \{s_1(r), \ldots, s_n(r)\}$.

As to privacy: in order for $\Pi$ to be private, for every adversary $A$ who can corrupt one of the servers there is some simulator $S$ whose output is similar (in the information-theoretic or computational sense) to the view of $A$. A corresponding simulator $S'$ for the scheme $\Pi'$ chooses $p$ according to the required error limit (which can be part of the definition of the scheme in the same way that $n$ and $\ell$ are). $S'$ proceeds to choose $r$ ($0 \le r \le p - 1$) uniformly at random, computes the new database $s_1(r), \ldots, s_n(r)$ and then runs $S$. Distinguishing between the output of $S'$ and the view of $A$ in the scheme $\Pi'$ is exactly as hard as distinguishing between the output of $S$ and the view of $A$ in the scheme $\Pi$.

4.6.2 The address of w

In this work a solution to the PERKY problem is defined as a protocol by which the user learns whether $w \in \{s_1, \ldots, s_n\}$ or not. In many situations a user would actually like to know the address $i$ for which $w = s_i$. After finding out the address, further queries about the keyword $w$ or the data it represents can be accomplished by using PIR schemes, which are more efficient than their PERKY counterparts.

If $w \in \{s_1, \ldots, s_n\}$, finding out $i$ is simple given any PERKY scheme $\Pi$. The servers construct $\log n$ sets of strings, which contain at most $n$ items each. Suppose the binary representation of $i$ is $i_1 \ldots i_{\log n}$, and the $i$-th keyword is $s_i$. The keyword $s_i$ appears in the $q$-th set ($1 \le q \le \log n$) if and only if $i_q = 1$. The scheme $\Pi$ is executed by the user and servers $\log n$ times. In the $q$-th execution the user finds out if $w$ appears in the $q$-th set constructed by the servers.

The communication complexity of the above protocol is $\log n$ times the communication complexity of $\Pi$. However, in all the schemes shown in this work the user can find what the address $i$ is in a more efficient manner. In the binary tree, trie and perfect hashing schemes the user and servers execute several PIR schemes. In the last execution the servers construct an array such that by retrieving the data in the $j$-th cell (for some index $j$) the user finds out if $w \in \{s_1, \ldots, s_n\}$. Adding to the $j$-th cell the address $i$, which is the position of $w$ in the original database, allows the user to discover what $i$ is immediately. In each of the cases (tree, trie and hashing) the communication complexity increases by a constant factor at most.
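The generic address-finding reduction of this subsection (one PERKY execution per address bit) can be sketched as follows. The sketch makes several assumptions that are not in the text: addresses are 0-based so that $\lceil \log n \rceil$ bits suffice, the keywords are distinct, $w$ is known to be one of them, and `perky_member` is a hypothetical stand-in for a full execution of $\Pi$ that simply tests set membership and provides no privacy.

```python
def build_bit_sets(keywords):
    """Server side: the q-th set holds every keyword whose (0-based) address
    has bit q set. Bit 0 is the least significant bit."""
    n = len(keywords)
    nbits = max(1, (n - 1).bit_length())
    return [
        {s for i, s in enumerate(keywords) if (i >> q) & 1}
        for q in range(nbits)
    ]


def perky_member(w, keyword_set):
    """Hypothetical stand-in for one execution of the PERKY scheme.
    A real scheme would answer the same question without revealing w."""
    return w in keyword_set


def find_address(w, bit_sets):
    """User side: one membership query per address bit recovers the address
    of w, assuming w is one of the stored keywords."""
    address = 0
    for q, keyword_set in enumerate(bit_sets):
        if perky_member(w, keyword_set):
            address |= 1 << q
    return address
```

With five keywords, three sets are built and three membership queries recover any address, which matches the $\log n$ multiplicative overhead stated above.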
4.7 Open problems

In this work we introduce the notion of PERKY as a step towards closing the gap between theoretically motivated PIR works and applicable information retrieval systems that maintain privacy. A lot of further work is still required. Among the problems that remain unsolved are more complex queries and approximate queries. By complex queries we mean that each data item in the database has several keywords that correspond to it. The user holds several keywords and may wish to retrieve all the items that correspond to some functions of its keywords. By approximate queries we mean that the database returns an affirmative answer to the query not only if for some $i$, $s_i = w$, but also if for some $i$ the strings $s_i$ and $w$ are close according to some metric (for instance, edit distance).

Chapter 5

Joint Generation of RSA Keys by Two Parties

In this chapter we show how two parties can jointly generate RSA public and private keys. Following the execution of our protocol each party learns the public key: $N = PQ$ and $e$, but does not know the factorization of $N$ or the decryption exponent $d$. The exponent $d$ is shared among the two players in such a way that joint decryption of cipher-texts is possible.

5.1 Preliminaries

Notation 14: The size of the RSA modulus $N$ is $\sigma$ bits (e.g. $\sigma = 1024$).

5.1.1 Smallest primes

At several points in our work we are interested in the $j$ smallest distinct primes $p_1, \ldots, p_j$ such that $\prod_{i=1}^{j} p_i > 2^{\sigma}$. The following table provides several useful parameters for a few typical values of $\sigma$.

    $\sigma$    $j$     $p_j$    $\sum_{i=1}^{j} \lceil \log p_i \rceil$
    512       76      383      557
    1024      133     751      1108
    1536      185     1103     1634
    2048      235     1483     2189

5.1.2 Useful techniques

In this subsection we review several problems and techniques that were researched extensively in previous work, and which we use here.
Symmetrically private information retrieval: In the problem of private information retrieval presented by Chor et al. [25], $k$ databases ($k \ge 1$) hold copies of the same $n$ bit binary string $x$ and a user wishes to retrieve the $i$-th bit $x_i$. A PIR scheme is a protocol which allows the user to learn $x_i$ without revealing any information about $i$ to any individual database. Symmetrically private information retrieval, introduced in [38], is identical to PIR except for the additional requirement that the user learn no information about $x$ apart from $x_i$. This problem is also called 1 out of m oblivious transfer and all or nothing disclosure of secrets (ANDOS).

The techniques presented in [38] are especially suited for the multi-database ($k \ge 2$) setting. In a recent work Naor and Pinkas [63] solve this problem by constructing a SPIR scheme out of any PIR scheme, in particular single database PIR schemes.

The trivial single database PIR scheme is to simply have the database send the whole data string to the user. Clever PIR schemes involving a single database have been proposed in several works: [58, 17, 72]. They rely on a variety of cryptographic assumptions and share the property that for "small" values of $n$ their communication complexity is worse than that of the trivial PIR scheme.

We now give a brief description of the SPIR scheme we use, which is the Naor-Pinkas method in conjunction with the trivial PIR scheme. In our scenario the data string $x$ is made up of $n$ substrings of length $\ell$. The user retrieves the $i$-th substring without learning any other information and without leaking information about $i$.

The database randomly chooses $\log n$ pairs of seeds for a pseudo-random generator $G$: $(s_1^0, s_1^1), \ldots, (s_{\log n}^0, s_{\log n}^1)$. Every seed $s_j^b$ ($1 \le j \le \log n$, $b \in \{0, 1\}$) is expanded into $n\ell$ bits $G(s_j^b)$, which can be viewed as $n$ substrings of length $\ell$. It then prepares a new data string $y$ of $n$ substrings. Suppose the binary representation of $i$ is $i_{\log n} \ldots i_1$.
The $i$-th substring of $y$ is the exclusive-or of the $i$-th substring of $x$ and the $i$-th substring of each of $G(s_1^{i_1}), G(s_2^{i_2}), \ldots, G(s_{\log n}^{i_{\log n}})$. The user and database combine in $\log n$ executions of $\binom{2}{1}$-OT of strings to provide the user with a single seed from every pair. Finally, the database sends $y$ to the user, who is now able to learn a single substring of $x$.

The parameters of the data strings we use are such that the running time is dominated by the $\log n$ $\binom{2}{1}$-OTs and the communication complexity is dominated by the $n\ell$ bits of the data string, which are sent to the user.

Dense probabilistic and homomorphic encryption: We are interested in an encryption method that provides two basic properties: (1) Semantic security, as defined in [49]. (2) Additive homomorphism: we can efficiently compute a function $f$ such that $f(\mathrm{ENC}(a), \mathrm{ENC}(b)) = \mathrm{ENC}(a + b)$. Furthermore, the sum is modulo some number $t$, where $t$ can be defined flexibly as part of the system.

As a concrete example we use Benaloh's encryption [12, 13]. The system works as follows. Select two primes $p, q$ such that $m \triangleq pq \approx 2^{\sigma}$, $t \mid p - 1$, $\gcd(t, (p-1)/t) = 1$ and $\gcd(t, q-1) = 1$ (see footnote 1). The density of such primes along appropriate arithmetic sequences is large enough to ensure efficient generation of $p, q$ (see [12] for details). Select $y \in Z_m^*$ such that $y^{\phi(m)/t} \not\equiv 1 \bmod m$. The public key is $m, y$, and encryption of $M \in Z_t$ is performed by choosing a random $u \in Z_m^*$ and sending $y^M u^t \bmod m$.

In order to decrypt, the holder of the secret key computes at a preprocessing stage $T_M \triangleq y^{M\phi(m)/t} \bmod m$ for every $M \in Z_t$. Hence, $t$ must be small enough that $t$ exponentiations can be performed. Decryption of $z$ is by computing $z^{\phi(m)/t} \bmod m$ and finding the unique $T_M$ to which it is equal. The scheme is semantically secure based on the assumption that deciding higher residuosity is intractable [13].
Most of our requirements are met by the weaker assumption that deciding prime residuosity is intractable [12].

Oblivious polynomial evaluation: In this problem, presented by Naor and Pinkas in [63], Alice holds a field element $\alpha \in F$ and Bob holds a polynomial $B(x)$ over $F$. At the end of the protocol Alice learns only $B(\alpha)$ and Bob learns nothing at all.

The intractability assumption used in [63] is new and states the following. Let $S(\cdot)$ be a degree $k$ polynomial over $F$ and let $m, d_{Q,x}$ be two security parameters ($d_{Q,x} > k$). Given $2d_{Q,x} + 1$ sets of $m$ field elements such that in each set there is one value of $S$ at a unique point different than 0 and $m - 1$ random field elements, the value $S(0)$ is pseudo-random.

We now give a brief description of the protocol presented in [63] as used in our application where the polynomial $B$ is of degree 1. Bob chooses a random bivariate polynomial $Q(x, y)$ such that the degree of $y$ is 1, the degree of $x$ is $d_{Q,x}$ and $Q(0, \cdot) = B(\cdot)$. Alice chooses a random polynomial $S$ of degree $d_{Q,x}$ such that $S(0) = \alpha$. Define $R(x)$ as the degree $2d_{Q,x}$ polynomial $R(x) = Q(x, S(x))$.

Alice chooses $2d_{Q,x} + 1$ different non-zero points $x_j$ for $j = 1, \ldots, 2d_{Q,x} + 1$. For each such $j$ Alice randomly selects $m - 1$ field elements $y_{j,1}, \ldots, y_{j,m-1}$ and sends to Bob $x_j$ and a random permutation of the $m$ elements $S(x_j), y_{j,1}, \ldots, y_{j,m-1}$ (denoted by $z_{j,1}, \ldots, z_{j,m}$). Bob computes $Q(x_j, z_{j,i})$ for $i = 1, \ldots, m$. Alice and Bob execute a SPIR scheme in which Alice retrieves $Q(x_j, S(x_j))$. Given $2d_{Q,x} + 1$ such pairs of $x_j, R(x_j)$ Alice can interpolate and compute $R(0) = B(\alpha)$.

The complexity of the protocol is $2d_{Q,x} + 1$ executions of the SPIR scheme for data strings of $m$ elements.

Footnote 1 (to the condition $\gcd(t, q-1) = 1$ in the description of Benaloh's encryption): Therefore $t$ is odd.

5.2 Overview

In this section we give an overview of our protocol. The stages in which we use the Boneh-Franklin protocol exactly are the selection of candidates and the full primality
test (the other stages require a third party in [15]). The protocol is executed in the following steps.

1. Choosing candidates: Alice chooses independently at random two $(\sigma/2 - 1)$-bit integers $P_a, Q_a \equiv 3 \bmod 4$, and Bob chooses similarly $P_b, Q_b \equiv 0 \bmod 4$. The two parties keep their choices secret and set as candidates $P = P_a + P_b$ and $Q = Q_a + Q_b$.

2. Computing N: Alice and Bob compute $N = (P_a + P_b)(Q_a + Q_b)$. We show how to perform the computation using three different protocols and three different intractability assumptions.

3. Initial primality test: For each of the smallest $k$ primes $p_1, \ldots, p_k$ the participants check if $p_i \mid N$ ($i = 1, \ldots, k$). This stage is executed in conjunction with the computation of $N$. If $N$ fails the initial primality test, computing a new candidate $N$ is easier than computing it from scratch (as is the case following a failure of the full primality test).

4. Full primality test: The test of [15] is essentially as follows: Alice and Bob agree on $g \in Z_N^*$. If the Jacobi symbol $\left(\frac{g}{N}\right)$ is not equal to 1, choose a new $g$. Otherwise Alice computes $v_a = g^{(N - P_a - Q_a + 1)/4} \bmod N$, and Bob computes $v_b = g^{(P_b + Q_b)/4} \bmod N$. If $v_a = v_b$ or $v_a = -v_b \bmod N$ the test passes.

5. Computing and sharing d: In this step we compute the decryption exponent $d$ assuming that $e$ is known to both parties and that $\gcd(e, \phi(N)) = 1$. Alice receives $d_a$ and Bob receives $d_b$ so that $d = d_a + d_b \bmod \phi(N)$ and $de \equiv 1 \bmod \phi(N)$.

Boneh and Franklin describe two protocols for the computation of $d$. The first is very efficient and can be performed by two parties, but leaks $\phi(N) \bmod e$. Therefore, this method is suitable for small public exponents and not for the general case. The second protocol computes $d$ for any $e$ but requires the help of a third party.

5.3 Computing N

Alice holds $P_a, Q_a$ and Bob holds $P_b, Q_b$. They wish to compute

$$N = (P_a + P_b)(Q_a + Q_b) = P_aQ_a + P_aQ_b + P_bQ_a + P_bQ_b.$$

We show how to carry out the computation privately using three different protocols.
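As a small illustration of the full primality test described in step 4 of the overview, the following sketch evaluates one round of the test in the clear. It is not the distributed protocol: here a single function sees all four shares, whereas in the protocol each party computes only its own value and the two values are compared. The function names and the toy inputs are assumptions made for this example.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for an odd positive integer n."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def full_primality_round(Pa, Qa, Pb, Qb, g):
    """One round of the test for a given g.
    Returns None if g must be re-chosen, otherwise True (pass) or False (fail)."""
    N = (Pa + Pb) * (Qa + Qb)
    if jacobi(g, N) != 1:
        return None
    va = pow(g, (N - Pa - Qa + 1) // 4, N)  # Alice's value
    vb = pow(g, (Pb + Qb) // 4, N)          # Bob's value
    return va == vb or va == (N - vb) % N

print(full_primality_round(3, 3, 4, 8, 4))   # P = 7, Q = 11, both prime
print(full_primality_round(3, 3, 12, 4, 2))  # P = 15 is composite
```

The two exponents are integers because the shares satisfy the congruences of step 1, and their difference is $\phi(N)/4$ when $P$ and $Q$ are prime.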
5.3.1 Oblivious transfers

Let $R$ be a publicly known ring and let $a, b \in R$. Denote $\rho = \log |R|$ (each element in $R$ can be encoded using $\rho$ bits). Assume Alice holds $a$ and Bob holds $b$. They wish to perform a computation by which Alice obtains $x$ and Bob obtains $y$ such that $x + y = ab$ where all operations are in $R$. Furthermore, the protocol ensures the privacy of each player given the existence of oblivious transfers. In other words the protocol does not help Alice and Bob to obtain information about $b$ and $a$ respectively.

The protocol:

1. Bob selects uniformly at random and independently $\rho$ ring elements denoted by $s_0, \ldots, s_{\rho-1} \in R$. Bob proceeds by preparing $\rho$ pairs of elements in $R$: $(t_0^0, t_0^1), \ldots, (t_{\rho-1}^0, t_{\rho-1}^1)$. For every $i$ ($0 \le i \le \rho - 1$) Bob defines $t_i^0 \triangleq s_i$ and $t_i^1 \triangleq 2^i b + s_i$.

2. Let the binary representation of $a$ be $a_{\rho-1} \ldots a_0$. Alice and Bob execute $\rho$ $\binom{2}{1}$-OTs. In the $i$-th invocation Alice chooses $t_i^{a_i}$ from the pair $(t_i^0, t_i^1)$.

3. Alice sets $x \triangleq \sum_{i=0}^{\rho-1} t_i^{a_i}$ and Bob sets $y \triangleq -\sum_{i=0}^{\rho-1} s_i$.

Lemma 17: $x + y = ab$ over the ring $R$.

Proof: Since $a_{\rho-1}, \ldots, a_0$ is the binary representation of $a$ we can write $a = \sum_{i=0}^{\rho-1} a_i 2^i$. Therefore

$$x + y \equiv \sum_{i=0}^{\rho-1} t_i^{a_i} - \sum_{i=0}^{\rho-1} s_i \equiv \sum_{i=0}^{\rho-1} (a_i 2^i b + s_i) - \sum_{i=0}^{\rho-1} s_i \equiv b \sum_{i=0}^{\rho-1} a_i 2^i \equiv ab.$$

In the following protocol for computing $N$ the ring $R$ is $Z_{2^{\sigma}}$, the integers modulo $2^{\sigma}$.

1. Alice and Bob use the previous protocol twice to additively share $P_aQ_b = x_1 + y_1 \bmod 2^{\sigma}$ and $P_bQ_a = x_2 + y_2 \bmod 2^{\sigma}$. Alice holds $x_1, x_2$ and Bob holds $y_1, y_2$.

2. Bob sends $y \triangleq y_1 + y_2 + P_bQ_b \bmod 2^{\sigma}$ to Alice.

3. Alice computes $P_aQ_a + x_1 + x_2 + y \bmod 2^{\sigma}$.

Alice now holds $N \bmod 2^{\sigma}$, which is simply $N$ due to the choice of $\sigma$.

Lemma 18: The transcript of the view of the execution of the protocol can be simulated for both Alice and Bob and therefore the protocol is secure.
Proof: Here $R = Z_{2^{\sigma}}$, so each sharing uses $\sigma$ oblivious transfers. We denote the messages that Alice receives during the sharing of $P_aQ_b$ by $t_0^{a_0}, \ldots, t_{\sigma-1}^{a_{\sigma-1}}$ and the messages received while sharing $P_bQ_a$ by $t_{\sigma}^{a_{\sigma}}, \ldots, t_{2\sigma-1}^{a_{2\sigma-1}}$. In the same manner we denote Bob's random choices for the sharing of $P_aQ_b$ and $P_bQ_a$ by $s_0, \ldots, s_{\sigma-1}$ and $s_{\sigma}, \ldots, s_{2\sigma-1}$ respectively.

Bob's view can be simulated because the only messages Alice sent him were her part of $2\sigma$ independent oblivious transfers. Alice receives $2\sigma + 1$ elements in $Z_{2^{\sigma}}$: $t_0^{a_0}, \ldots, t_{2\sigma-1}^{a_{2\sigma-1}}, y$. The uniformly random and independent choices by which $s_0, \ldots, s_{2\sigma-1}$ are selected ensure that the messages Alice receives are distributed uniformly subject to the condition that

$$\sum_{i=0}^{2\sigma-1} t_i^{a_i} + y \equiv N - P_aQ_a \bmod 2^{\sigma}.$$

Since Alice can compute $N - P_aQ_a$, a simulator $S_a$ can produce the same distribution as that of the messages Alice receives, given $N, P_a, Q_a$.

Lemma 19: The computation time and the communication complexity of the protocol are dominated by $2\sigma$ oblivious transfers. The transferred strings are of length $\sigma$.

5.3.2 Oblivious polynomial evaluation

Alice and Bob agree on a prime $p > 2^{\sigma}$ and set $F$ to be GF($p$). They employ the following protocol to compute $N$:

1. Bob chooses a random element $r \in F$. He prepares two polynomials over $F$: $B_1(x) = P_b x + r$ and $B_2(x) = Q_b x - r + P_bQ_b$.

2. Alice uses the oblivious polynomial evaluation protocol of [63] to attain $B_1(Q_a)$ and $B_2(P_a)$. Alice computes $N = P_aQ_a + B_1(Q_a) + B_2(P_a)$.

The security of the protocol depends on the security of the cryptographic assumption outlined in subsection 5.1.2 and on an argument similar to the proof of Lemma 18.

Lemma 20: The computational complexity of the protocol is dominated by the execution of $2 \log m (2d_{Q,x} + 1)$ oblivious transfers, where $m$ and $d_{Q,x}$ are the security parameters. The communication complexity is less than $3m(2d_{Q,x} + 1)$.
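The oblivious-transfer based protocol of subsection 5.3.1 can be sketched in a few lines. This is a minimal sketch under loud assumptions: `ot_2_1` is a hypothetical ideal stand-in that simply returns the chosen element (a real 1-out-of-2 OT would hide the choice from Bob and the other element from Alice), both parties run inside one process, and the names `share_product`, `compute_n`, `rho` and `sigma` are chosen for this example. Alice's last step adds her own shares to Bob's message, which is what makes the total equal $N$.

```python
import secrets

def ot_2_1(pair, choice_bit):
    """Ideal stand-in for 1-out-of-2 oblivious transfer (no privacy here)."""
    return pair[choice_bit]

def share_product(a, b, rho):
    """Alice inputs a, Bob inputs b; returns (x, y) with x + y = a*b mod 2**rho."""
    mod = 1 << rho
    s = [secrets.randbelow(mod) for _ in range(rho)]            # Bob's masks s_i
    pairs = [(s[i], ((b << i) + s[i]) % mod) for i in range(rho)]
    # Alice picks the element indexed by the i-th bit of a in every pair.
    x = sum(ot_2_1(pairs[i], (a >> i) & 1) for i in range(rho)) % mod
    y = -sum(s) % mod                                           # Bob's share
    return x, y

def compute_n(Pa, Qa, Pb, Qb, sigma):
    """N = (Pa + Pb)(Qa + Qb), assuming N < 2**sigma."""
    mod = 1 << sigma
    x1, y1 = share_product(Pa, Qb, sigma)   # shares of Pa*Qb
    x2, y2 = share_product(Qa, Pb, sigma)   # shares of Pb*Qa
    y = (y1 + y2 + Pb * Qb) % mod           # Bob's single message to Alice
    return (Pa * Qa + x1 + x2 + y) % mod    # Alice's local computation

print(compute_n(11, 7, 12, 12, 10))  # toy shares: P = 23, Q = 19
```

Because Alice's choices depend only on the bits of her own input, the same choice bits can be reused when Bob changes his inputs, which is the property exploited later for amortization.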
5.3.3 Benaloh's encryption

We now compute $N$ by using the homomorphic encryption described in subsection 5.1.2. Let $p_1, \ldots, p_j$ be the smallest primes such that $\prod_{i=1}^{j} p_i > 2^{\sigma}$. The following protocol is used to compute $N \bmod p_i$:

1. Let $t \triangleq p_i$. Alice constructs the encryption system: an appropriate $p, q, y$, and sends the public key $y, m = pq$ to Bob. Alice also sends the encryption of her shares, i.e. $z_1 \triangleq y^{P_a} u_1^t \bmod m$ and $z_2 \triangleq y^{Q_a} u_2^t \bmod m$, where $u_1, u_2 \in Z_m^*$ are selected uniformly at random and independently.

2. Bob computes the encryption of $P_bQ_b \bmod t$, which is denoted by $z_3$, calculates $z \triangleq z_1^{Q_b} z_2^{P_b} z_3 \bmod m$ and sends $z$ to Alice.

3. Alice decrypts $z$, adds to the result $P_aQ_a$ modulo $t$ and obtains $N \bmod t$.

The two players repeat this protocol for each $p_i$, $i = 1, \ldots, j$. Alice is able to reconstruct $N$ from $N \bmod p_i$, $i = 1, \ldots, j$ by using the Chinese remainder theorem.

Lemma 21: Assuming the intractability of the prime residuosity problem, the transcript of the views of both parties in the protocol can be simulated.

Proof: The distribution of Bob's view can be simulated by encrypting two arbitrary messages assuming the intractability of prime residuosity. Therefore, Alice's privacy is assured.

The distribution of Alice's view can be simulated as follows. Given $N$, $N \bmod p_i$ can be computed for every $i$. The only message that Alice receives is $z \triangleq z_1^{Q_b} z_2^{P_b} z_3 \bmod m$. By the definition of $z_3$ and the encryption system, $z_3 = y^{P_bQ_b} u^t \bmod m$ where $u$ is a random element in $Z_m^*$. Thus $z$ is a random element in the appropriate coset (all the elements whose decryption is $N - P_aQ_a \bmod t$).

Lemma 22: The running time of the protocol is dominated by the single decryption Alice executes, the communication complexity is $3\sigma$ and the protocol requires one round of communication.
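The protocol of subsection 5.3.3 can be tried out with toy parameters. The sketch below is an illustration only and rests on several assumptions that are not part of the text: key generation is a brute-force search over tiny primes (so it offers no security), only odd prime moduli are used because Benaloh's system needs an odd $t$, the toy $N$ is assumed to be smaller than the product of the moduli, and all function names are invented for this example.

```python
import secrets
from math import gcd

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def keygen(t):
    """Toy Benaloh key for an odd prime t (brute-force search, insecure)."""
    p = next(c for c in range(t + 1, 10**6, t)          # c = 1 mod t
             if is_prime(c) and gcd(t, (c - 1) // t) == 1)
    q = next(c for c in range(t + 1, 10**6)
             if c != p and is_prime(c) and gcd(t, c - 1) == 1)
    m, phi = p * q, (p - 1) * (q - 1)
    y = next(c for c in range(2, m)
             if gcd(c, m) == 1 and pow(c, phi // t, m) != 1)
    table = {pow(y, M * (phi // t), m): M for M in range(t)}  # T_M -> M
    return (m, y, t), (phi // t, table)

def encrypt(pub, M):
    m, y, t = pub
    u = 0
    while gcd(u, m) != 1:                 # random unit modulo m
        u = secrets.randbelow(m)
    return pow(y, M, m) * pow(u, t, m) % m

def decrypt(pub, priv, z):
    exponent, table = priv
    return table[pow(z, exponent, pub[0])]

def n_mod_t(Pa, Qa, Pb, Qb, t):
    pub, priv = keygen(t)                                  # Alice
    m = pub[0]
    z1, z2 = encrypt(pub, Pa % t), encrypt(pub, Qa % t)    # Alice -> Bob
    z3 = encrypt(pub, Pb * Qb % t)                         # Bob
    z = pow(z1, Qb, m) * pow(z2, Pb, m) * z3 % m           # Bob -> Alice
    return (decrypt(pub, priv, z) + Pa * Qa) % t           # Alice

def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    x, M = 0, 1
    for r, p in zip(residues, moduli):
        x += M * ((r - x) * pow(M, -1, p) % p)
        M *= p
    return x

moduli = (3, 5, 7, 11)
print(crt([n_mod_t(11, 7, 12, 12, t) for t in moduli], moduli))
```

The decryption table has one entry per residue modulo $t$, which is why the text requires $t$ to be small.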
5.4 Amortization and initial primality test

The initial primality test consists of checking whether a candidate $N$ is divisible by one of the first $k$ primes $p_1, \ldots, p_k$. If it is, then either $P = P_a + P_b$ or $Q = Q_a + Q_b$ is not a prime. This test can be carried out by Alice following the computation of $N$. If a candidate $N$ passes the initial primality test, Alice publishes its value and it becomes a candidate for the full primality test of [15]. However, if it fails the test a new $N$ has to be computed.

In this section we show how to efficiently find a new candidate following a failure of the initial test by the previous candidate. The total cost of computing a series of candidates is lower than using the protocols of section 5.3 each time anew. We show two different approaches: one for the oblivious transfer and oblivious polynomial evaluation protocols, and the other for the homomorphic encryption protocol.

5.4.1 OT and oblivious polynomial evaluation

Suppose that after Alice and Bob discard a certain candidate $N$ they compute a new one by having Alice retain the previous $P_a, Q_a$ and having Bob choose new $P_b, Q_b$. In that case, as we show below, computing the new $N$ can be much more efficient than if both parties choose new shares. The drawback is that given both values of $N$ Bob can gain information about $P_a, Q_a$. Therefore, in this stage (unlike the full primality test) Alice does not send the value of $N$ to Bob.

Assume Bob holds two sequences of strings, $(a_1^0, \ldots, a_n^0)$ and $(a_1^1, \ldots, a_n^1)$, and Alice wishes to retrieve one sequence without revealing which one to Bob and without gaining information about the second sequence. Instead of invoking a $\binom{2}{1}$-OT protocol $n$ times the players agree on a pseudo-random generator $G$ and do the following:

1. Bob chooses two random seeds $s_0, s_1$.

2. Alice uses a single invocation of $\binom{2}{1}$-OT to gain $s_b$, where $b \in \{0, 1\}$ denotes the desired string sequence.

3. Bob sends to Alice the first sequence masked (i.e. bit by bit exclusive-or) by $G(s_0)$ and the second sequence masked by $G(s_1)$. Alice can unmask the required sequence while the other sequence remains pseudorandom.

In the protocol of subsection 5.3.1 $N$ is computed using only oblivious transfers in which Alice retrieves a set of $2\sigma$ strings from Bob. Alice's choices of which strings to retrieve depend only on her input $P_a, Q_a$. Therefore if Alice retains $P_a$ and $Q_a$ while Bob selects a sequence of inputs $(P_b^1, Q_b^1), \ldots, (P_b^n, Q_b^n)$, the two players can compute a sequence of candidates $N^1, \ldots, N^n$ with as many oblivious transfers as are needed to compute a single $N$.

The same idea can be used in the oblivious polynomial evaluation protocol, as noted in [63]. The evaluation of many polynomials at the same point requires as many oblivious transfers as the evaluation of a single polynomial at that point. Thus, computing a sequence of candidates $N$ requires only $2 \log m (2d_{Q,x} + 1)$ computations of $\binom{2}{1}$-OT.

5.4.2 Homomorphic encryption

Alice and Bob combine the two stages of computing $N$ and trial divisions by using the protocol of subsection 5.3.3 flexibly. Let $p_1, \ldots, p_{j'}$ be the $j'$ smallest distinct primes such that $\prod_{i=1}^{j'} p_i > 2^{(\sigma-1)/2}$. Alice and Bob pick their elements at random in the range $0, \ldots, \prod_{i=1}^{j'} p_i - 1$ by choosing random elements in each $Z_{p_i}$ for $i = 1, \ldots, j'$.

Alice and Bob compute $N \bmod p_i$ as described in subsection 5.3.3. If $N \equiv 0 \bmod p_i$ then at least one of the elements $P = P_a + P_b$ or $Q = Q_a + Q_b$ is divisible by $p_i$. In that case, Alice and Bob choose new, random elements $P_a, P_b, Q_a, Q_b \bmod p_i$, and recompute $N \bmod p_i$. The probability of this happening is less than $2/p_i$. Thus the expected number of re-computations is less than $\sum_{i=1}^{j'} 2/p_i$. This quantity is about 3.1 for $\sigma = 1024$ (2 does not cause a problem because $P_a \equiv Q_a \equiv 3 \bmod 4$ and $P_b \equiv Q_b \equiv 0 \bmod 4$).
Setting $P_a \bmod p_i$ for $i = 1, \ldots, j'$ determines $P_a$, and by the same reasoning the other 3 shares that Alice and Bob hold are also set. The two players complete the computation of $N$ by determining the value of $N \bmod p_i$ (using the protocol of subsection 5.3.3) for $i = j' + 1, \ldots, j$, where $\prod_{i=1}^{j} p_i > 2^{\sigma}$. If for one of these primes $N \equiv 0 \bmod p_i$, Alice and Bob discard their shares and pick new candidates (see footnote 2).

Footnote 2: An interesting optimization is not to discard the whole share $(P_a, P_b, Q_a, Q_b)$, but for each specific share, say $P_a$, only to select a new $P_a \bmod p_i$ for $i = j' - c, \ldots, j'$, where $c$ is a small constant. The probability is very high that the new $N$ thus defined is not a multiple of $p_i$.

5.5 Computing d

Alice and Bob share $\phi(N)$ in an additive manner. Alice holds $\phi_a \triangleq N - P_a - Q_a + 1$, Bob holds $\phi_b \triangleq -Q_b - P_b$ and $\phi_a + \phi_b = \phi(N)$. The two parties agree on a public exponent $e$. Denote $\eta \triangleq \lceil \log e \rceil$.

We follow in the footsteps of the Boneh-Franklin protocol and employ their algorithm to invert $e$ modulo $\phi(N)$ without making reductions modulo $\phi(N)$:

1. Compute $\zeta = -\phi(N)^{-1} \bmod e$.

2. Compute $d = (\zeta\phi(N) + 1)/e$.

Now $de \equiv 1 \bmod \phi(N)$ and therefore $d$ is the inverse of $e$ modulo $\phi(N)$.

As a first step Alice and Bob change the additive sharing of $\phi(N)$ into a multiplicative sharing modulo $e$, without leaking information about $\phi(N)$ to either party. At the end of the sub-protocol Alice holds $r\phi(N) \bmod e$ and Bob holds $r^{-1} \bmod e$, where $r$ is a random element in $Z_e^*$.

1. Bob chooses uniformly at random $r \in Z_e^*$. Alice and Bob invoke the protocol of subsection 5.3.1, setting $R \triangleq Z_e$, $a \triangleq \phi_a$ and $b \triangleq r$. At the end of the protocol Alice holds $x$ and Bob holds $y$ such that $x + y \equiv \phi_a r \bmod e$.

2. Bob sends $y + \phi_b r \bmod e$ to Alice.

3. Alice computes $x + y + \phi_b r \equiv r\phi(N) \bmod e$, and Bob computes $r^{-1} \bmod e$.

Lemma 23: The computation time and the communication complexity of the protocol are dominated by $\eta$ oblivious transfers.
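The inversion algorithm and the change from additive to multiplicative sharing described in this section can be checked numerically. The sketch below is an assumption-laden illustration: the inversion is computed in the clear on the full value of $\phi(N)$, the invocation of the subsection 5.3.1 protocol is replaced by a hypothetical ideal split of the product into two random shares, and the function names and toy values are invented for this example.

```python
import secrets
from math import gcd

def invert_without_phi_reduction(e, phi):
    """d = (zeta*phi + 1)/e with zeta = -phi^{-1} mod e; needs gcd(e, phi) = 1."""
    zeta = -pow(phi, -1, e) % e
    d, remainder = divmod(zeta * phi + 1, e)
    assert remainder == 0          # e divides zeta*phi + 1 by the choice of zeta
    return d

def additive_to_multiplicative(phi_a, phi_b, e):
    """Returns (Alice's value r*phi mod e, Bob's value r^{-1} mod e)."""
    r = 0
    while gcd(r, e) != 1:          # Bob picks a random unit modulo e
        r = secrets.randbelow(e)
    # Ideal stand-in for the protocol of subsection 5.3.1 over Z_e:
    # x (Alice) and y (Bob) are random subject to x + y = phi_a * r mod e.
    x = secrets.randbelow(e)
    y = (phi_a * r - x) % e
    message = (y + phi_b * r) % e  # Bob -> Alice
    return (x + message) % e, pow(r, -1, e)

# Toy values: P = 23, Q = 19, N = 437, phi(N) = 396, e = 17.
phi_a, phi_b, e = 437 - 11 - 7 + 1, -12 - 12, 17
d = invert_without_phi_reduction(e, phi_a + phi_b)
alice, bob = additive_to_multiplicative(phi_a, phi_b, e)
print(d, (d * e) % (phi_a + phi_b), (alice * bob) % e)
```

The only modular reductions are modulo $e$, which is public, so neither step requires knowing $\phi(N)$ as a modulus.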
After completing the sub-protocol we described above, Alice and Bob finish the inversion algorithm by performing the following steps:

1. The two parties hold multiplicative shares of $\phi(N) \bmod e$. They compute the inverse of their shares modulo $e$ and thus have $\zeta_a, \zeta_b$ respectively such that $\zeta_a\zeta_b \equiv -\phi(N)^{-1} \bmod e$.

2. Alice and Bob re-convert their current shares into additive shares modulo $e$, i.e. $\psi_a, \psi_b$ such that $\psi_a + \psi_b \equiv \zeta \bmod e$. Bob chooses randomly $\psi_b \in Z_e$ and the two parties combine to enable Alice to gain $\psi_a \triangleq -\psi_b + \zeta_a\zeta_b \bmod e$. This is done by employing essentially the same protocol we used for transforming $\phi_a, \phi_b$ into a multiplicative sharing. If we replace $\zeta_a$ by $\phi_a$, $\zeta_b$ by $r$ and $-\psi_b$ by $r\phi_b$ we get the same protocol.

3. The two parties would like to compute $\zeta\phi(N)$. What they actually compute is $\alpha \triangleq (\psi_a + \psi_b)(\phi_a + \phi_b)$. The result is either exactly $\zeta\phi(N)$ or $(\zeta + e)\phi(N)$. The computation is carried out similarly to the computation of $N$ in subsection 5.3.1. The ring used is $Z_k$ where $k > 4e \cdot 2^{\sigma}$. We modify the protocol in two ways. The first modification is that $\alpha$ is not revealed to Alice but remains split additively over $Z_k$ among the two players. In other words they perform step 1 of the protocol in subsection 5.3.1 and additively share $\psi_a\phi_b + \psi_b\phi_a$. Alice adds $\psi_a\phi_a$ to her share and Bob adds $\psi_b\phi_b$ to his. The sum of the two shares over the integers is either $\alpha$ or $\alpha + k$. The second option is unacceptable (unlike the two possibilities for the value of $\alpha$). In order to forestall this problem we introduce a second modification. The sharing of $\alpha$ results in Alice holding $\alpha + y_0 \bmod k$ and Bob holding $-y_0 \bmod k$. Furthermore Bob selects $y_0$ by his random choices. We require Bob to make those choices subject to the condition that $y_0 < k/2$. Since $y_0$ is not completely random, Alice might gain a slight advantage. However, that advantage is at most a single bit of knowledge about $\phi(N)$ (which can be guessed in any attempt to discover $\phi(N)$).

4. Alice sets $d_a \triangleq \lceil (\alpha + y_0)/e \rceil$ and Bob sets $d_b \triangleq \lfloor (-y_0 + 1)/e \rfloor$ (these calculations are over the integers). Hence, either $d_a + d_b = (\zeta\phi(N) + 1)/e$ or $d_a + d_b = (\zeta\phi(N) + 1)/e + \phi(N)$.

5.6 Improvements

In this section we suggest some efficiency improvements.

Off-line preprocessing: The performance of the protocol based on Benaloh's encryption can be significantly improved by some off-line preparation. Obviously, for any $t$ used as a modulus in the protocol a suitable encryption system has to be constructed (i.e. a suitable $m = pq$ has to be found, a table of all values $T_M = y^{M\phi(m)/t} \bmod m$ has to be computed etc.).

Further improvement of the online communication and computational complexity can be attained at the cost of some space and off-line computation. Instead of constructing a separate encryption system for $t_1$ and $t_2$, Alice constructs a single system for $t = t_1t_2$. The lookup table needed for decryption is formed as follows. Alice computes $T_M = y^{M\phi(m)/t} \bmod m$ for $M = 0, \ldots, t_1 - 1$ and $T'_{M'} = y^{M't_1\phi(m)/t} \bmod m$ for $M' = 0, \ldots, t_2 - 1$. The entries of the table are obtained by calculating $T_M T'_{M'} \bmod m$ for every pair $M, M'$.

Constructing this table takes more time than constructing the two separate tables for $t_1, t_2$. The additional time is bounded by the time required to compute $t_2(t_1 + \log t_1 \log t_2)$ modular multiplications over $Z_m$ (computing $T'_{M'}$ involves $\log t_1 \log t_2$ multiplications in comparison with $\log t_2$ multiplications in the original table).

The size of the table is $t \log m$ (slightly more than $t\sigma$). This figure, which might be prohibitive for large $t$, can be significantly reduced. After computing every entry in the table it is possible by using perfect hashing [36] to efficiently generate a 1-to-1 function $h$ from the entries of the table to $0, \ldots, 3t - 1$. A new table is now constructed in which instead of the original entry $T_M$ an entry $(h(T_M), M)$ is stored.
Decryption of $z$ is performed by finding the entry holding $h(z^{\phi(m)/t} \bmod m)$ and reading the corresponding $M$. The size of the stored table is $2t \log t$. As an example of the reduction in space complexity consider the case $t = 3 \cdot 751 = 2253$. The original table requires more than $2^{21}$ bits while the hashing table requires less than $2^{16}$ bits.

It is straightforward to use $t = t_1t_2$ instead of $t_1$ and $t_2$ separately in subsections 5.3.3 and 5.4.2. The protocols in both subsections remain almost without change apart from omitting the sub-protocols for $t_1$ and $t_2$ and adding a sub-protocol for $t$. In subsection 5.4.2 it is not enough to check whether $N \equiv 0 \bmod t$. It is necessary to retain the two tests of $N \bmod t_1$ and $N \bmod t_2$. Note that here the stronger higher residuosity intractability assumption replaces the prime residuosity assumption.

Alternative computation of d: The last part of generating RSA keys is constructing the private key. Using Benaloh's encryption we can sometimes improve the computation and communication complexity of the construction in comparison with the results of section 5.5. The improvement is possible if the parameter $t$ of a Benaloh encryption can be set to $e$ (that is, the homomorphism is modulo $e$) so that efficient decryption is possible. Therefore, $e$ has to be a product of "small" primes, see [13].

The protocol for generating and sharing $d$ is a combination of the protocols of subsection 5.3.3 and section 5.5. We leave the details to the full version of the paper.

5.7 Performance

The most resource consuming part of our protocol, in terms of computation and communication, is the computation of $N$ together with trial divisions. We use trial divisions because of the following result by DeBruijn [28].
If a random $\sigma/2$ bit integer passes the trial divisions for all primes less than $B$ then asymptotically:

$$\Pr[p \text{ prime} \mid p \not\equiv 0 \bmod p_i, \; \forall p_i \le B] = 5.14 \, \frac{\ln B}{\sigma} \left(1 + o\!\left(\frac{2}{\sigma}\right)\right).$$

We focus on the performance of the more efficient version of our protocol, using homomorphic encryption. We also assume that the off-line preprocessing suggested in section 5.6 is used.

Let $j', j$ be defined as in subsections 5.4.2 and 5.3.3. We pair off the first $j'$ primes and prepare encryption systems for products of such pairs (as in section 5.6). The number of exponentiations (decryptions) needed to obtain one $N$ is on average about $j + 3 - j'/2$. The probability that this $N$ is a product of two primes is approximately $(5.14 \ln p_{j'} / \sigma)^2$.

Another obvious optimization is to divide the decryptions between the two parties evenly. In other words for half the primes the first party plays Alice and for the other half they switch.

If $\sigma = 1024$ and we pair off the first $j' = 76$ primes, the running time of our protocol is (by a rough estimate) less than 10 times the running time of the Boneh-Franklin protocol. The communication complexity is (again very roughly) 42MB. If the participants are willing to improve the online communication complexity in return for space and pair off the other $j - j'$ primes needed to compute $N$, the communication complexity is reduced to about 29MB.

Open problem: Boneh and Franklin show in [15] how to test whether $N$ is a product of two primes, where both parties hold $N$. It would be interesting to devise a distributed test to check whether $N$ is a product of two primes if Alice holds $N, P_a, Q_a$ and Bob only has his private shares $P_b, Q_b$. The motivation is that in the oblivious transfer and oblivious polynomial evaluation protocols we presented, $P_a, Q_a$ would then have to be selected only once. Thus the number of oblivious transfers in the whole protocol would be reduced to the number required for computing a single candidate $N$.
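The estimates quoted in section 5.7 can be evaluated with a few lines of arithmetic. The sketch below only plugs the parameters stated in the text (a 1024-bit modulus, $j = 133$, $j' = 76$ and the 76th prime, 383) into the leading term of the quoted formulas; the lower-order term is ignored, and the variable names are chosen for this example.

```python
from math import log

def prime_probability(modulus_bits, bound):
    """Leading term of the estimate for a (modulus_bits/2)-bit integer
    that survived trial division by all primes up to `bound`."""
    return 5.14 * log(bound) / modulus_bits

modulus_bits, j, j_prime, p_j_prime = 1024, 133, 76, 383

# Probability that both P and Q are prime, hence that N is usable.
prob_n_good = prime_probability(modulus_bits, p_j_prime) ** 2
# Average number of decryptions per candidate N when the first j' primes are paired.
decryptions_per_candidate = j + 3 - j_prime / 2

print(prob_n_good)
print(1 / prob_n_good)              # rough expected number of candidates
print(decryptions_per_candidate)
```

Under these assumptions roughly one candidate in a thousand survives, so the per-candidate cost is what dominates the overall running time.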
Bibliography

[1] M. Abadi, J. Feigenbaum, and J. Kilian. On hiding information from an oracle (extended abstract). In Proc. of the 19th Annu. ACM Symp. on the Theory of Computing, pages 195-203, 1987. Journal version in JCSS vol 39 pp. 21-50, 1989.
[2] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 284-293, 1997.
[3] A. Ambainis. An upper bound for private information retrieval. In Proc. of 24th ICALP, pages 401-407, 1997.
[4] L. Babai, P. Kimmel, and S. Lokam. Simultaneous messages vs. communication. In Proc. of 12th STACS, LNCS 900, Springer Verlag, pages 361-372, 1995.
[5] D. Beaver. Foundations of secure interactive computing. In Advances in Cryptology - CRYPTO '91, 1991.
[6] D. Beaver. Commodity-based cryptography. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 446-455, 1997.
[7] D. Beaver and J. Feigenbaum. Hiding instances in multioracle queries. In STACS, pages 37-48, 1990.
[8] D. Beaver, J. Feigenbaum, J. Kilian, and P. Rogaway. Security with low communication overhead. In Advances in Cryptology - CRYPTO '90, pages 62-76, 1990.
[9] D. Beaver, J. Feigenbaum, J. Kilian, and P. Rogaway. Locally random reductions: Improvements and applications. J. of Cryptology, 10(1):17-36, 1997. Early version: Security with small communication overhead, CRYPTO '90, LNCS 537, pages 62-76.
[10] A. Beimel, Y. Ishai, E. Kushilevitz, and T. Malkin. One-way functions are essential for single database private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 89-98, 1999.
[11] M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for noncryptographic fault-tolerant distributed computation. In Proc. of the 20th Annu. ACM Symp.
on the Theory of Computing, pages 1-10, 1988.
[12] J. Benaloh. Verifiable Secret-Ballot Elections. PhD thesis, Yale University, 1987.
[13] J. Benaloh. Dense probabilistic encryption. In Proc. of the Workshop on Selected Areas of Cryptography, pages 120-128, May 1994.
[14] M. Blum and S. Micali. How to generate cryptographically strong sequences of pseudo-random bits. SIAM Journal on Computing, 13:850-864, 1984.
[15] D. Boneh and M. Franklin. Efficient generation of shared RSA keys. In Advances in Cryptology - CRYPTO '97, pages 425-439. Springer-Verlag, 1997. The full version appears on the web at theory.stanford.edu/~dabo/pubs.html.
[16] G. Brassard, C. Crepeau, and J. M. Robert. All-or-nothing disclosure of secrets. In Advances in Cryptology - CRYPTO '86, pages 234-238, 1987.
[17] C. Cachin, S. Micali, and M. Stadler. Computationally private information retrieval with polylogarithmic communication. In Advances in Cryptology - EUROCRYPT '99, pages 402-414, 1999.
[18] R. Canetti. Studies in Secure Multiparty Computation and Applications. PhD thesis, Weizmann Institute, 1995.
[19] R. Canetti. Security and composition of multi-party cryptographic protocols. Theory of Cryptography Library, Record 98-18. Available online from philby.ucsd.edu/cryptolib.html, 1998.
[20] R. Canetti, U. Feige, O. Goldreich, and M. Naor. Adaptively secure multiparty computation. In Proc. of the 28th Annu. ACM Symp. on the Theory of Computing, pages 639-648, 1996.
[21] D. Chaum, C. Crepeau, and I. Damgard. Multiparty unconditionally secure protocols (extended abstract). In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 11-19, 1988.
[22] B. Chor and N. Gilboa. Computationally private information retrieval. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 304-313, 1997.
[23] B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords, December 1996.
Private communication.
[24] B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords. Technical Report TR CS0917, Technion, 1997. Full version submitted for publication in Designs, Codes and Cryptography.
[25] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In Proc. of the 36th Annu. IEEE Symp. on Foundations of Computer Science, pages 41-50, 1995.
[26] C. Cocks. Split knowledge generation of RSA parameters. In M. Darnell, editor, Cryptography and Coding, 6th IMA International Conference, pages 89-95. Springer-Verlag, December 1997.
[27] C. Cocks. Split generation of RSA parameters with multiple participants, 1998. On-line version at www.cesg.gov.uk/downlds/math/rsa2.pdf.
[28] N. DeBruijn. On the number of uncancelled elements in the sieve of Eratosthenes. Proc. Neder. Akad., 53:803-812, 1950. Reviewed in LeVeque, Reviews in Number Theory, Vol. 4, Section N-28, p. 221.
[29] A. DeSantis, Y. Desmedt, Y. Frankel, and M. Yung. How to share a function securely. In Proc. of the 26th Annu. ACM Symp. on the Theory of Computing, pages 522-533, 1994.
[30] Y. Desmedt. Threshold cryptography. European Transactions on Telecommunications and Related Technologies, 5(4):35-43, July-August 1994.
[31] Y. Desmedt and Y. Frankel. Threshold cryptosystems. In Advances in Cryptology - CRYPTO '89, volume 435 of Lecture Notes in Computer Science, pages 307-315, 1990.
[32] G. Di Crescenzo, Y. Ishai, and R. Ostrovsky. Universal service-providers for database private information retrieval. In Proc. of the 17th Annu. ACM Symp. on Principles of Distributed Computing, pages 91-100, 1998.
[33] G. Di Crescenzo, T. Malkin, and R. Ostrovsky. Single database private information retrieval implies oblivious transfer. Manuscript, 1999.
[34] S. Even, O. Goldreich, and A. Lempel. A randomized protocol for signing contracts. Comm. of the ACM, 28:637-647, 1985.
[35] Y. Frankel, P. D.
MacKenzie, and M. Yung. Robust efficient distributed RSA-key generation. In Proc. of the 30th Annu. ACM Symp. on the Theory of Computing, pages 663-672, 1998.
[36] M. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31:538-544, 1984.
[37] Y. Gertner, S. Goldwasser, and T. Malkin. A random server model for private information retrieval (or how to achieve information theoretic PIR avoiding data replication). In Proc. of 2nd RANDOM, 1998.
[38] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting data privacy in private information retrieval schemes. In Proc. of the 30th Annu. ACM Symp. on the Theory of Computing, pages 151-160, 1998.
[39] N. Gilboa. Private retrieval of blocks. Manuscript.
[40] N. Gilboa. Joint generation of RSA keys. In Advances in Cryptology - CRYPTO '99, pages 116-129, 1999.
[41] N. Gilboa and Y. Ishai. Private information storage in constant communication rounds. Manuscript, 1999.
[42] O. Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 182-194, 1990.
[43] O. Goldreich. Secure multi-party computation (working draft). Manuscript, 1998.
[44] O. Goldreich. Modern Cryptography, Probabilistic Methods and Pseudo Randomness. Springer-Verlag, 1999.
[45] O. Goldreich, S. Goldwasser, and S. Halevi. Eliminating decryption errors in the Ajtai-Dwork cryptosystem. In Advances in Cryptology - CRYPTO '97, pages 105-111, 1997.
[46] O. Goldreich, S. Goldwasser, and S. Halevi. Public-key cryptosystems from lattice reduction problems. In Advances in Cryptology - CRYPTO '97, pages 112-130, 1997.
[47] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game (extended abstract). In Proc. of the 19th Annu. ACM Symp. on the Theory of Computing, pages 218-229, 1987.
[48] O. Goldreich and R. Ostrovsky.
Software protection and simulation on oblivious RAMs. Journal of the ACM, 43:431-476, 1996.
[49] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28:270-299, 1984.
[50] J. Håstad. Pseudo-random generators with uniform assumptions. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 395-404, 1990.
[51] R. Impagliazzo, L. Levin, and M. Luby. Pseudo-random number generation from one way functions. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 25-32, 1989.
[52] R. Impagliazzo and S. Rudich. Limits on the provable consequences of one-way permutations. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 44-61, 1989.
[53] Y. Ishai. Single-server, sub-linear communication PIR implies secret key exchange, 1999. Private communication.
[54] Y. Ishai and E. Kushilevitz. Improved upper bound on information theoretic private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, 1999.
[55] M. Ito, A. Saito, and T. Nishizeki. Secret sharing schemes realizing general access structures. In Proc. IEEE Global Telecommunication Conf., Globecom 87, pages 99-102, 1987.
[56] D. Knuth. The Art of Computer Programming, volume 3. Addison Wesley, 1973.
[57] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[58] E. Kushilevitz and R. Ostrovsky. Single-database computationally private information retrieval. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 364-373, 1997.
[59] M. Luby. Pseudorandomness and Cryptographic Applications. Princeton University Press, 1996.
[60] E. Mann. Private access to distributed information. Master's thesis, Technion - Israel Institute of Technology, Haifa, 1998.
[61] S. Micali and P. Rogaway. Secure computation. In Advances in Cryptology - CRYPTO '91, pages 392-404, 1991.
[62] D.
Naccache and J. Stern. A new public-key cryptosystem. In Advances in Cryptology - EUROCRYPT '97, 1997.
[63] M. Naor and B. Pinkas. Oblivious transfer and polynomial evaluation. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 245-254, 1999.
[64] M. Naor and O. Reingold. Number-theoretic constructions of efficient pseudorandom functions. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 458-467, 1997. Full on-line version at www.wisdom.weizmann.ac.il/ reingold/PAPERS/qdh.ps.gz.
[65] R. Ostrovsky. Software Protection and Simulation on Oblivious RAMs. PhD thesis, M.I.T., 1992.
[66] R. Ostrovsky and V. Shoup. Private information storage. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 294-303, 1997.
[67] T. Pedersen. A threshold cryptosystem without a trusted party. In Advances in Cryptology - EUROCRYPT '91, pages 522-526, 1991.
[68] G. Poupard and J. Stern. Generation of shared RSA keys by two parties. In ASIACRYPT '98, pages 11-24. Springer-Verlag LNCS 1514, 1998.
[69] P. Pudlák and V. Rödl. Modified ranks of tensors and the size of circuits. In Proc. of the 25th Annu. ACM Symp. on the Theory of Computing, pages 523-531, 1993.
[70] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Comm. of the ACM, 21, 1978.
[71] A. Shamir. How to share a secret. Communications of the ACM, 22:612-613, 1979.
[72] J. P. Stern. A new and efficient all-or-nothing disclosure of secrets protocol. In ASIACRYPT '98, pages 357-371. Springer-Verlag, 1998.
[73] M. Wegman and J. Carter. New hash functions and their use in authentication and set equality. Journal of Computer and System Sciences, 22(3):265-279, 1981.
[74] A. Yao. Theory and applications of trapdoor functions. In Proc. of the 23rd Annu. IEEE Symp. on Foundations of Computer Science, pages 80-91, 1982.
[75] A. Yao.
How to generate and exchange secrets. In Proc. of the 27th Annu. IEEE Symp. on Foundations of Computer Science, pages 162-167. IEEE Press, 1986.