Topics in Private Information Retrieval Niv Gilboa

Technion - Computer Science Department - Ph.D. Thesis PHD-2001-02 - 2001
Topics in Private Information Retrieval
Research Thesis
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
Niv Gilboa
Submitted to the Senate of
The Technion - Israel Institute of Technology
Tevet 5761
Haifa
January 2001
This research was done under the supervision of Prof. Benny Chor, Prof. Shimon
Even and Prof. Moni Naor in the Department of Computer Science.
It is a great pleasure to thank some of the people to whom I am indebted for their
help, support and friendship during my graduate studies.
First and foremost I would like to thank my advisors Benny, Shimon and Moni.
Benny, who was my sole advisor throughout most of my studies, had a profound
influence on my thinking and contributed a great deal to the research I conducted.
Benny was the first to arouse my interest in the field of cryptography during a course
he taught in 1995. Since then I have admired his skills in teaching and research. Not
the least of the many things he taught me is to always question my thought processes
and to look for errors that are forever lurking beneath the surface.
I thank Shimon for accepting the post of my advisor at a very advanced stage of
my studies, and Moni for several enlightening conversations that had a significant effect
on my research. I would also like to express my gratitude to Hugo Krawczyk who
taught me a lot of cryptography during a semester in which I acted as his teaching
assistant, and has given me wise counsel ever since.
A special note of thanks is due to Yuval Ishai for his friendship and for our
productive professional collaboration (although it is true that usually our joint efforts
were neither professional nor productive). Yuval and Shlomit have always provided a
home away from home during my Technion years, and for that they will always have
a warm place in my heart.
The time I spent at the Technion would not have been the same without the past,
present and honorary members of room 429. Ranging from the incurable optimist
(Ronit) through the hard-working realists (Gidi and Eran) to those who were delightfully pessimistic (Yuval, Dorit, Dror, Nadav, and basically everybody else) this
colorful cast of characters could not be improved upon. We shared in such escapades
as multiple trips to the hospital with injured graduate students (who shall remain unnamed), meals at Muach (home of the refined palate) and numerous wagers. Most
important, however, was the assurance that a witty conversation was never far away,
whether it involved freeBSD, religious toleration, personal issues or vast conspiracy
theories.
Finally, I thank those who are closest to me. Efrat who has been my spouse
throughout this long period of time and has accepted my various quirks with laughter.
My parents and sister who supported me in every possible way, and my grandmother
who was always proud of her eldest grandson.
The generous financial help of the Gutwirth family is gratefully acknowledged.
Contents
Abstract
Notation
1 Introduction
1.1 Models of privacy
1.1.1 Information theoretic privacy
1.1.2 Computational privacy
1.1.3 Single server CPIR
1.1.4 Symmetrically private information retrieval
1.2 PIR generalizations and similar questions
1.2.1 Block retrieval
1.2.2 Private retrieval of information by keywords
1.2.3 t-privacy
1.2.4 Private information storage
1.3 Modes of operation
1.3.1 Random servers
1.3.2 Commodity servers
1.4 PIR as a primitive
1.5 PIR techniques in other areas
1.5.1 Oblivious polynomial evaluation
1.5.2 Joint generation of RSA keys
1.6 Lower bounds
1.7 Related work
1.7.1 Multi-party private computation
1.7.2 Instance hiding
1.7.3 Communication complexity problems
2 Model and Definitions
2.1 The PIR Model
2.2 Pseudo-random generators
2.3 The CPIR Model
2.4 Final Notation
2.5 Symmetrically private information retrieval
2.6 General multi-party computation definitions
2.6.1 Basic definitions
2.6.2 Composition of protocols
2.7 Private information retrieval by keywords
2.8 Joint generation of RSA keys
3 Computationally private information retrieval
3.1 Correlated Pseudo Randomness
3.1.1 Definitions
3.1.2 Statement of Main Results
3.1.3 Construction
3.1.4 Proof of correlation
3.2 Lengths of input and output
3.3 Proof of index indistinguishability
3.3.1 Three useful statements
3.3.2 Notation used in the proof
3.3.3 Polynomial indistinguishability
3.3.4 A quantitative version
3.4 Integrating the proofs
3.5 PIR schemes
3.5.1 A direct scheme
3.5.2 A Generic Transformation
3.6 Concluding Remarks and open problems
4 Private information retrieval by keywords
4.1 Definitions and Notation
4.1.1 The PIR model
4.1.2 The CPIR model
4.1.3 Search data structure
4.2 Private Retrieval of Blocks
4.2.1 SPIR(ℓ, n, k)
4.2.2 PIR(ℓ, n, k)
4.3 General Solutions to PERKY(ℓ, n, k)
4.4 Specific implementations
4.4.1 Binary Search Tree
4.4.2 Trie
4.4.3 Perfect Hashing
4.5 Symmetric PERKY
4.6 Other PERKY Topics
4.6.1 Reducing Communication Complexity
4.6.2 The address of w
4.7 Open problems
5 Joint Generation of RSA Keys by Two Parties
5.1 Preliminaries
5.1.1 Smallest primes
5.1.2 Useful techniques
5.2 Overview
5.3 Computing N
5.3.1 Oblivious transfers
5.3.2 Oblivious polynomial evaluation
5.3.3 Benaloh's encryption
5.4 Amortization and initial primality test
5.4.1 OT and oblivious polynomial evaluation
5.4.2 Homomorphic encryption
5.5 Computing d
5.6 Improvements
5.7 Performance
Bibliography
Abstract
In the problem of private information retrieval (PIR) a user queries a database in
order to retrieve one out of n data items, while hiding the identity of the retrieved
item from the database administrator. PIR can be solved trivially by having the
database send all the data to the user. However, in this case the communication
complexity can be prohibitively high.
In the last few years extensive research has been conducted on PIR and related
problems. The research efforts have mainly focused on constructing PIR schemes
that are as efficient as possible in terms of communication. Various PIR scenarios
and variations on the original problem have been considered. In the first PIR works
information-theoretic privacy was required. In other words, the database could learn
no information about the identity of the retrieved item, regardless of its computational power. In order to allow efficient schemes it was assumed that the data is
replicated at several servers which could communicate with the user, but not among
themselves. Subsequent works introduced other models, such as relaxing the user's
privacy requirement to computational privacy, or adding the requirement of database
privacy, which states that the user may learn a single data item but nothing else.
This dissertation consists of three works that are all connected to PIR and related problems. The first work introduces the concept of computationally private
information retrieval (CPIR), that is, information retrieval in which privacy
is maintained as long as the servers (two or more) in which the database is held are
computationally bounded. The PIR schemes we construct rely on the mildest standard cryptographic assumption: the existence of one-way functions, or equivalently
the existence of pseudo-random generators. The quality of our construction depends
on the exact assumption we make on pseudo-random generators. The standard assumption states that there exists a pseudo-random generator GEN which expands
seeds by a polynomial factor to n bits. The distribution that results from choosing a
seed at random and expanding it by GEN cannot be distinguished from the uniform
distribution on $\{0,1\}^n$ by any probabilistic polynomial-time algorithm. Given this assumption, for any constant c we construct a CPIR scheme with communication complexity
$O(n^{1/c})$. The results are slightly different if we view the generator as expanding seeds
of length $\lambda(n)$ to strings of length $n$, where the expansion from $\lambda(n)$ to $n$ may be by more
than a polynomial factor. Given this assumption our scheme has communication
complexity about $\lambda(n)\sqrt{2\log(n/\lambda(n))}\,2^{\sqrt{\log(n/\lambda(n))}}$. In either case our schemes have
substantially lower communication complexity than the best (currently known) two-server
information-theoretically private scheme, whose communication complexity is $O(n^{1/3})$.
The main technical tool used in the construction of the CPIR schemes is a "correlated" pseudo-random generator. On input r (a short random string) and an index
$\ell$ ($1 \le \ell \le n$), the final output of this generator is a pair of n-bit binary strings
$G(u_\ell, 1^n)$ and $G(w_\ell, 1^n)$. These two strings are pseudo-random and highly correlated:
they differ at the $\ell$-th bit and are identical elsewhere. As an intermediate step, the
generator is required to produce a pair of short "succinct representations" $u_\ell, w_\ell$, so
that $G(u_\ell, 1^n)$ is efficiently computable from $u_\ell$ (and $G(w_\ell, 1^n)$ from $w_\ell$, respectively).
The $2n$ representations $\{u_\ell\}_{\ell=1}^{n}, \{w_\ell\}_{\ell=1}^{n}$ are pseudo-random.
The second topic in the dissertation presents a private retrieval problem that
differs from PIR in the structure of the database. In PIR the assumption is that
the data items are homogeneous and that each is identified by its position in a list
of items. The user retrieves the item in the i-th position while keeping i secret.
However, usually a user does not know the position of a desired data item in the
internal structure of the database. Instead, the user typically holds some information
about the desired data, such as a keyword that is linked to it. The user sends this
keyword to the database and receives in return a list of all the items which correspond
to the keyword.
The PrivatE information Retrieval by KeYwords (PERKY) problem introduced in
this work is a step towards modeling this more realistic scenario. We assume that the
database consists of n keywords, each of length $\ell$. The user holds a single keyword
w and wishes to find whether it is one of the keywords of the database, without
leaking information about w. As is also the case in PIR, different variants of PERKY
can be considered. The required user privacy may be either information-theoretic or
computational, database privacy may be required (or not) and the database may be
held by a single server or at multiple sites that do not communicate among themselves.
In our work we show several schemes that solve PERKY by reducing it to the
problem of PIR. The main idea is to have the servers that hold the database organize it
in a data structure which facilitates search operations. Such structures include binary
trees, hash tables, etc. The user conducts an oblivious walk on the data structure by
using several executions of a PIR scheme. The properties of the resultant PERKY
scheme are a function of both the data structures and PIR schemes we employ. In
one of the PERKY schemes we show, which utilizes a data structure based on perfect
hashing, the communication complexity is greater than that of a PIR scheme by only
a constant factor. Another PERKY scheme, which utilizes a trie data structure,
allows protection of database privacy in the sense that the user only learns whether
the keyword w is one of the keywords in the database, but cannot obtain any other
information about the database.
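The oblivious walk can be illustrated for the binary-search-tree case. The following is a minimal Python sketch under stated assumptions, not the construction of chapter 4: the keywords are distinct, they are laid out as a balanced binary search tree stored one array per level, and the hypothetical function `trivial_pir` stands in for a real PIR scheme by having the server send the whole level (the trivial solution mentioned at the start of this abstract). The point is the access pattern: the user performs exactly one retrieval per level whatever w is, so the sequence of retrievals depends only on the size of the database.

```python
from typing import List, Optional

def build_levels(words: List[str]) -> List[List[Optional[str]]]:
    """Lay out the sorted keywords as a balanced binary search tree, one array per level.

    The children of node i on level d are nodes 2*i and 2*i + 1 on level d + 1.
    """
    words = sorted(words)
    depth = max(1, len(words).bit_length())  # height of the midpoint-split tree
    levels: List[List[Optional[str]]] = [[None] * (2 ** d) for d in range(depth)]

    def place(lo: int, hi: int, d: int, i: int) -> None:
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        levels[d][i] = words[mid]
        place(lo, mid, d + 1, 2 * i)
        place(mid + 1, hi, d + 1, 2 * i + 1)

    place(0, len(words), 0, 0)
    return levels

def trivial_pir(level: List[Optional[str]], index: int) -> Optional[str]:
    """Hypothetical stand-in for a PIR scheme: the server sends the whole level
    and the user selects one entry locally, so the server learns nothing about index."""
    received = list(level)
    return received[index]

def perky_lookup(levels: List[List[Optional[str]]], w: str) -> bool:
    """Search for w with exactly one PIR retrieval per level, whatever w is."""
    index, active, found = 0, True, False
    for level in levels:
        node = trivial_pir(level, index)
        if active and node == w:
            found, active = True, False
        elif active and node is not None:
            index = 2 * index + (1 if w > node else 0)  # descend right or left
            continue
        else:
            active = False
        index = 0  # dummy retrievals hide the depth at which the search ended
    return found

levels = build_levels(["apple", "grape", "kiwi", "lemon", "mango", "peach", "plum"])
print(perky_lookup(levels, "lemon"), perky_lookup(levels, "melon"))  # True False
```

Replacing `trivial_pir` with a communication-efficient PIR scheme keeps the same walk while lowering the cost of each of the roughly log n retrievals.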
The third work in the dissertation addresses a different type of problem altogether.
It presents a protocol for two parties to privately generate an RSA key in a distributed
manner. At the end of the protocol the public key, which consists of a modulus $N = PQ$
and an encryption exponent $e$, is known to both parties. Individually, neither party
obtains information about the decryption key $d$ or about the prime factors $P$ and $Q$ of $N$.
However, $d$ is shared among the parties so that threshold decryption is possible. An
alternative way to present this concept is that the keys are distributed in such a way
that both parties can jointly sign a message with the secret key, but neither can sign
alone. PIR can be useful in solutions to the joint generation of RSA keys problem. In
particular, the protocol we show uses PIR techniques in order to privately compute
some values which are essential for the private computation of the decryption key.
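The algebra behind this setting can be checked with a few lines of Python. This is a minimal sketch with toy primes and no privacy at all: it assumes the additive sharing of P and Q listed in the Notation ($P = P_a + P_b$, $Q = Q_a + Q_b$) and, as an additional assumption of the sketch, an additive integer sharing $d = d_a + d_b$ for the threshold step (the text states only that $d$ is shared). The cross terms $P_aQ_b + P_bQ_a$ are the part that mixes both parties' secrets; chapter 5 computes such values privately, while here they are simply multiplied out.

```python
import random

rng = random.Random(2001)

# Toy parameters for illustration only; real keys use much larger primes.
P, Q, e = 1009, 1013, 5
phi = (P - 1) * (Q - 1)
d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)

# Additive shares: party a holds (P_a, Q_a), party b holds (P_b, Q_b).
P_a = rng.randrange(1, P)
P_b = P - P_a
Q_a = rng.randrange(1, Q)
Q_b = Q - Q_a

# Each party can compute its own product locally; the cross terms need both
# parties' secrets and are what a real protocol must compute privately.
N = P_a * Q_a + (P_a * Q_b + P_b * Q_a) + P_b * Q_b

# Threshold decryption with an additive integer sharing of d (assumption).
d_a = rng.randrange(0, d)
d_b = d - d_a

m = 4242
c = pow(m, e, N)
partial_a = pow(c, d_a, N)  # computed by party a alone
partial_b = pow(c, d_b, N)  # computed by party b alone
print(N == P * Q, partial_a * partial_b % N == m)  # True True
```

The product of the two partial decryptions equals $c^{d} \bmod N$, so neither party ever needs the whole of $d$.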
Notation
PIR - Private Information Retrieval
CPIR - Computationally-Private Information Retrieval
SPIR - Symmetrically-Private Information Retrieval
PERKY - Private Information Retrieval by Keywords
DB_j - the j-th database in a PIR scheme
U - the user in a PIR scheme
k - number of databases or players
n - length of data string in a PIR scheme
ℓ - length of data block in a PIR scheme or of a word in PERKY
x - data string in a PIR scheme
i - user's retrieval index in a PIR scheme
w - user's retrieval word in a PERKY scheme
λ - a security parameter
[n] - the set {1, 2, ..., n}
e_ℓ(n) - the ℓ-th unit vector of length n
⊕ - exclusive-or of binary strings
|| - concatenation of binary strings
GEN - a pseudo-random generator
SREP1, SREP2 - succinct representation generators
G - expansion generator
(SREP1, SREP2, G) - a correlated pseudo-random generator
P, Q - large primes
P_a, P_b, Q_a, Q_b - additive shares of P and Q respectively
N - RSA modulus, product of P and Q
e - RSA public exponent
d - RSA private exponent
Chapter 1
Introduction
The emergence of the Internet is making publicly accessible databases more common
and more important than ever. Anyone, from corporate executive to private citizen, can
access such a database and retrieve up-to-date information for a (hopefully moderate)
fee. However, issues of security and privacy immediately arise. The database may
wish to limit the user's accessibility to certain data, while the user may wish to keep
his query private. That is, the user retrieves a certain item of data without allowing
the database to gain information about the identity of the retrieved item.
The task of private information retrieval (PIR) schemes is to overcome various
privacy and security problems that appear in information retrieval from publicly
accessible databases. The emphasis of most PIR schemes (beginning with the first
PIR work [25]) is on protecting the user's privacy, although some works [38, 63] also
deal with the privacy of the database.
Currently, PIR schemes are still fairly theoretical in nature. The models used are
simplified and idealized versions of actual commercial databases. Formally speaking,
the database is modeled as an n-bit binary string x, and the user's query is a single
address i, $1 \le i \le n$. The goal of the PIR scheme is to allow the user to learn the i-th
bit of the data string, $x_i$, without revealing any information to the database about i.
PIR research is devoted to extending this basic model by taking into account various
practical considerations, and to constructing efficient protocols in various settings.
While the main motivation for PIR is practical in nature, it is also similar to
several theoretical questions which arose from complexity theory or communication
complexity. We discuss some of these questions in Section 1.7. PIR is also part
of an interesting trend in multi-party private computation. The goal is to allow
several players to jointly compute a function of their inputs without revealing "too
much" information on these inputs. During the 1980's the main research effort in
this field was to establish which functions can be privately computed and under
what constraints. A more contemporary trend is to discover how efficiently specific
functions can be computed. PIR research is a part of this trend, as are works in
threshold cryptography [31], joint generation of El-Gamal cryptosystem keys [67] and RSA
keys [15, 26, 27, 35, 68, 40], and oblivious polynomial evaluation [63].
Much of the remainder of the introduction is a survey of PIR literature and its
background. In section 1.7 we present several problems and directions of research
which preceded PIR and are important in understanding its evolution. In sections
1.1, 1.2, 1.3, 1.4, 1.5, 1.6 we present and discuss in brief the PIR works that have been
published so far. The survey also includes an introduction to each of the technical
chapters in this work. In subsection 1.1.2 we present the notion of computationally
private information retrieval, as an introduction to chapter 3. In subsection 1.2.2
we introduce private information retrieval by keywords, which is fully presented in
chapter 4. Finally, in subsection 1.5.2 we outline joint generation of RSA keys, fully
presented in chapter 5.
1.1 Models of privacy
1.1.1 Information theoretic privacy
In the first work to present the PIR problem [25], Chor, Goldreich, Kushilevitz and
Sudan considered the following model. A string $x \in \{0,1\}^n$ is replicated at k servers
$DB_1, \ldots, DB_k$, which do not communicate with one another. A user U is interested in
retrieving the i-th bit $x_i$. The user's privacy is maintained in the information-theoretic
sense. In other words, each server (individually) is not allowed to gain information on
the identity of the bit being retrieved, regardless of the server's computational power.
Several schemes that solve this basic PIR problem were put forward in [25]. Each
scheme is parameterized by k (the number of non-communicating servers required to
implement it) and by its communication complexity. The best schemes in [25] had
the following parameters:
- A 2-server scheme with communication complexity $O(n^{1/3})$.
- For any k, a k-server scheme with communication complexity $O(k^2 n^{1/k} \log k)$.
- A particular instance of the above is a $(\log n)/3$-server scheme with communication complexity $O(\log^2 n \log\log n)$.
In [3] Ambainis generalized the 2-server protocol of [25] into a k-server protocol.
The communication complexity of his k-server scheme is $2^{O(k^2)} n^{1/(2k-1)}$. This is better
than the previous k-server scheme for any constant number k (and indeed for any k
less than approximately $\log^{1/3} n$).
In [54] Ishai and Kushilevitz present a linear algebraic framework in which to view
certain types of information-theoretic PIR schemes. They show that both the schemes
of [25] and of [3] are specific cases of this framework. Furthermore, they constructed
an improved k-server scheme achieving communication complexity $O(k^3 n^{1/(2k-1)})$.
1.1.2 Computational privacy
It seems that the techniques currently used for information-theoretically private information retrieval cannot yield k-server schemes whose communication complexity is
asymptotically lower than $O(n^{1/(2k-1)})$. However, this obstacle led to the evolution of
other interesting directions of PIR research. One idea, which constitutes a major contribution of this dissertation, is to weaken the privacy requirement to computational
privacy. A server whose computational resources are limited (say to polynomial time
computations) cannot infer any information on the identity of the retrieved bit i, given
reasonable cryptographic assumptions. This relaxation of the privacy requirements
is very natural; any "real world" server has limited computational resources.
In [22] Chor and Gilboa present the first computationally private information
retrieval (CPIR) scheme. A complete version of that work appears in chapter 3. The
PIR model in this work is identical to the information-theoretic PIR model except for
relaxing the privacy requirement to computational privacy. We show how to construct
efficient two-server CPIR schemes¹ using a novel tool which we call a correlated pseudo-random generator.
Performing cryptographic tasks in a distributed manner often requires some form
of correlated randomness. That is, the parties involved in the task have access to correlated random sources. In chapter 3 (an initial version appears in [22]) we develop
a tool for achieving a specific type of correlated randomness at a low cost in communication. We are interested in the following scenario. A dealer chooses two strings
$a, b \in \{0,1\}^n$ uniformly at random subject to the condition that they are identical
except for a single bit at a predetermined position $\ell$. The dealer communicates with
two parties in such a manner that the first one learns a, the second learns b, but
neither learns any information about $\ell$. It follows from [25] that the communication
in this setting between the dealer (a PIR user) and each of the parties (PIR data
servers) is at least $n - 1$ bits.
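The pair (a, b) is exactly the pair of queries in the basic two-server scheme of [25]: each server returns the exclusive-or of the data bits its query selects, and the two answers differ by $x_\ell$. A minimal Python sketch follows; the function names are hypothetical and positions are 0-based rather than 1-based. Each party receives n bits, close to the $n - 1$ lower bound, which is the cost the computational approach below aims to reduce.

```python
import secrets

def dealer(n: int, ell: int) -> tuple:
    """Return (a, b): uniform n-bit lists that differ exactly at position ell (0-based)."""
    a = [secrets.randbits(1) for _ in range(n)]
    b = list(a)
    b[ell] ^= 1
    return a, b

def server_answer(x: list, query: list) -> int:
    """XOR of the bits x_j with query_j == 1 (an inner product over GF(2))."""
    answer = 0
    for xj, qj in zip(x, query):
        answer ^= xj & qj
    return answer

def retrieve(x: list, ell: int) -> int:
    a, b = dealer(len(x), ell)
    # DB1 sees only a and DB2 sees only b; each alone is a uniform n-bit string.
    return server_answer(x, a) ^ server_answer(x, b)

x = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(x, i) for i in range(len(x))])  # [1, 0, 1, 1, 0, 0, 1, 0]
```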
Suppose we are ready to settle for such a result in a computational setting, in
order to reduce the communication costs. We would like to develop a scheme which
has the following stages: the dealer sends a short string u to the first party and a
short string w to the second party, and the parties efficiently expand their strings into two
n-bit strings, $G_1(u, 1^n)$ and $G_2(w, 1^n)$ respectively. These expanded strings should be
identical except for a single bit at the predetermined position $\ell$. A task like the one just
described can certainly be achieved using any pseudo-random generator GEN. Let s be a
seed for GEN. Then $u = s$ and $w = s \| \ell$ can be expanded to $G_1(u, 1^n) = \mathrm{GEN}(s, 1^n)$
and $G_2(w, 1^n) = \mathrm{GEN}(s, 1^n) \oplus e_\ell(n)$, which differ only at the $\ell$-th bit ($e_\ell(n)$ denotes
the $\ell$-th unit vector of length n, and $\|$ denotes concatenation). On the other hand, it
is clear that the party who gets the string $s \| \ell$ learns $\ell$.

¹ The scheme can be generalized to more than 2 non-communicating servers but not to a single one.
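The naive construction can be checked mechanically. In this sketch SHAKE-256 stands in for GEN (an assumption for illustration only; the text allows any pseudo-random generator), positions are 0-based, and $w = s \| \ell$ is modelled as the pair (s, ℓ). The first check confirms $G_1(u, 1^n) \oplus G_2(w, 1^n) = e_\ell(n)$; the second shows the flaw that motivates the correlated generator: the second party reads $\ell$ straight out of its representation.

```python
import hashlib
import secrets

def gen(seed: bytes, n: int) -> list:
    """Stand-in pseudo-random generator: expand seed to n bits with SHAKE-256."""
    data = hashlib.shake_256(seed).digest((n + 7) // 8)
    return [(data[j // 8] >> (j % 8)) & 1 for j in range(n)]

def unit_vector(ell: int, n: int) -> list:
    """e_ell(n): all zeros except a 1 at position ell (0-based)."""
    return [1 if j == ell else 0 for j in range(n)]

def naive_reps(ell: int, seed_len: int = 16) -> tuple:
    """u = s for the first party, w = (s, ell) for the second: w exposes ell."""
    s = secrets.token_bytes(seed_len)
    return s, (s, ell)

def g1(u: bytes, n: int) -> list:
    return gen(u, n)

def g2(w: tuple, n: int) -> list:
    s, ell = w
    return [bit ^ e for bit, e in zip(gen(s, n), unit_vector(ell, n))]

n, ell = 32, 5
u, w = naive_reps(ell)
diff = [p ^ q for p, q in zip(g1(u, n), g2(w, n))]
print(diff == unit_vector(ell, n))  # True: the expansions differ only at ell
print(w[1])                         # 5: the second party reads ell directly
```

Note that this naive version also uses two different expansion algorithms, whereas the correlated generator applies a single algorithm G to both representations.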
With this example in mind, our goal is to distribute two succinct representations
u, w such that $G(u, 1^n) \oplus G(w, 1^n) = e_\ell(n)$,² and neither u nor w yields any efficiently
computable information about $\ell$. We show how to obtain the desired correlation at
a low cost under the mildest acceptable cryptographic assumption: the existence of
pseudo-random generators [14, 74]. (This is equivalent to the existence of one-way
functions [51, 50].)
Given this assumption, using any pseudo-random generator GEN we develop
a "correlated" pseudo-random generator, which on input $(1^n, \ell, r)$ (where r is chosen randomly from an appropriate space) produces two pseudo-random n-bit strings
that differ only in the $\ell$-th bit. The generator consists of a trio of algorithms,
(SREP1, SREP2, G), and its operation can be divided into two stages. In the first
stage, each SREP algorithm produces a short string which we call a succinct representation: SREP1 produces u and SREP2 produces w. If the input r is random, then
the output of each SREP algorithm is pseudo-random (individually, not jointly). In
the second stage the expansion algorithm G is applied (separately) to u and w. The
result is the required pair of n-bit strings that differ only in the $\ell$-th bit. Therefore,
the dealer can distribute a, b among the two parties by sending the first party the
output of SREP1 and the other the output of SREP2. The parties can then
apply G themselves and expand the strings they receive.
We measure the quality of the correlated pseudo-random generator by the length
of the succinct representations u and w: the shorter they are, the better. The motivation for using this specific measure is that in the application quoted above, the
communication from the dealer to each of the two parties is exactly one such succinct representation. In our construction, the length of the succinct representations
depends on the exact assumption we make on pseudo-random generators. The
standard assumption states that there exists a pseudo-random generator GEN which
expands seeds by a polynomial factor, from $\lambda(n) = n^{1/d}$ to $n$ bits. The distribution
that results from choosing a seed at random and expanding it by GEN cannot be
distinguished from the uniform distribution on $\{0,1\}^n$ by any probabilistic polynomial-time algorithm. Given this assumption, for any constant c we construct a correlated
pseudo-random generator with succinct representations of length $O(n^{1/c})$. The results are slightly different if we view the pseudo-random generator as expanding yet
shorter seeds of length $\lambda(n)$ to strings of length $n$, where $\lambda(n)$ may be sub-polynomial
in $n$. Given this assumption we present a correlated pseudo-random generator with
succinct representations of length $\lambda(n)\sqrt{2\log(n/\lambda(n))}\,2^{\sqrt{\log(n/\lambda(n))}}$.

² We require that the same algorithm be used to expand both strings.
In the second part of Chapter 3 we apply the correlated pseudo-random generators to the problem of computationally private information retrieval (CPIR). We
show two slightly different CPIR schemes. The first is a direct application of the
correlated pseudo-random generator and is presented in subsection 3.5.1. The second transforms any one-round, two-database computationally private scheme into a
one-round, two-database computationally private scheme with (typically) lower communication complexity. This second scheme is constructed with techniques similar to those
employed in the correlated pseudo-random generator and is presented in subsection
3.5.2.
Both schemes use the same intractability assumption: the existence of pseudo-random
generators. Given such a generator GEN that expands $\lambda(n)$ bits to $n$ bits, the
communication complexity of the first is about $2\lambda(n)\sqrt{\log(n/\lambda(n))}\,2^{\sqrt{\log(n/\lambda(n))}}$.
Somewhat better results can be achieved by the second scheme through repeated
recursive executions of the transformation it provides (from one PIR scheme to another). Both schemes employ only one round of communication, do not require any
coding in storing the database contents, and are memoryless: neither users nor
databases have to remember any of the communication's history.
1.1.3 Single server CPIR
Single server PIR schemes maintaining information-theoretic privacy have communication complexity at least n, as shown in [25]. However, that lower bound does not
hold for computationally private schemes.
The aim of many CPIR works following [22] was to break the single server barrier
and to construct efficient (i.e., sub-linear communication complexity) CPIR schemes
in which the participants are a user and a single server. The price that has to be paid
for single server schemes is stronger cryptographic assumptions. Currently, all the
known single server CPIR schemes rely on specic number-theoretic cryptographic
assumptions.
The first single-server scheme was suggested by Kushilevitz and Ostrovsky in [58].
They used the well-known quadratic residuosity assumption [49] in order to construct
a scheme with communication complexity roughly $\lambda(n)\,2^{\sqrt{2\log n \log \lambda(n)}}$, where $\lambda(n)$ is
the security parameter. In [72] Stern uses homomorphic encryption systems such
as the Benaloh system [12] or the Naccache-Stern system [62] (which are natural
extensions of the Goldwasser-Micali cryptosystem [49]) to obtain an improved version
of the Kushilevitz-Ostrovsky protocol. His scheme has communication complexity
approximately $\lambda(n)\,2^{\sqrt{\log n}}$.
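The quadratic residuosity idea behind [58] can be sketched in its basic, non-recursive form. This is a minimal Python illustration, not the exact protocol of [58]: the database is viewed as an s × t bit matrix, the user sends t elements of $Z_N^*$ (squares everywhere except column b, which gets a non-residue modulo both primes), the server returns one product per row, and the user, who knows the factorization, reads bit (a, b) from the residuosity of row a's product. With s = t = √n this costs O(√n) group elements; [58] obtains the bound above by applying the idea recursively. The primes are toy values and all names are hypothetical.

```python
import math
import random

# Toy modulus and non-cryptographic randomness, for illustration only.
P, Q = 1009, 1013
N = P * Q

def is_qr_mod(y: int, p: int) -> bool:
    """Euler's criterion for a prime p and y not divisible by p."""
    return pow(y, (p - 1) // 2, p) == 1

def random_unit(rng: random.Random) -> int:
    """Random element of Z_N^* (coprime to N)."""
    while True:
        y = rng.randrange(2, N)
        if math.gcd(y, N) == 1:
            return y

def make_query(t: int, b: int, rng: random.Random) -> list:
    """Column b gets a non-residue mod both P and Q (Jacobi symbol +1, so it
    looks like a residue without the factorization); other columns get squares."""
    ys = []
    for j in range(t):
        if j == b:
            y = random_unit(rng)
            while is_qr_mod(y, P) or is_qr_mod(y, Q):
                y = random_unit(rng)
        else:
            y = pow(random_unit(rng), 2, N)
        ys.append(y)
    return ys

def server_answer(matrix: list, ys: list) -> list:
    """For each row, multiply y_j where the bit is 1 and y_j**2 where it is 0."""
    zs = []
    for row in matrix:
        z = 1
        for bit, y in zip(row, ys):
            z = z * (y if bit else y * y) % N
        zs.append(z)
    return zs

def retrieve(matrix: list, a: int, b: int, seed: int = 0) -> int:
    """User: z_a is a quadratic residue exactly when matrix[a][b] == 0."""
    rng = random.Random(seed)
    zs = server_answer(matrix, make_query(len(matrix[0]), b, rng))
    return 0 if is_qr_mod(zs[a], P) else 1

M = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print([[retrieve(M, a, b, seed=7) for b in range(4)] for a in range(3)])
# [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
```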
Both the Kushilevitz-Ostrovsky and the Stern protocols depend on certain homomorphic properties of the cryptographic primitives they use. In [60] Mann shows how
to obtain similar schemes from a wide class of trap-door one-way functions that exhibit certain homomorphic properties. His scheme can be based, for instance, on the
decisional Diffie-Hellman assumption [64] or on lattice problems (see [2, 45, 46]). The
Mann scheme has the same communication complexity as the Kushilevitz-Ostrovsky
scheme (apart from the fact that the security parameter $\lambda(n)$ is not necessarily the
same for different cryptographic assumptions).
In [17] Cachin, Micali and Stadler use yet another assumption and present a
different single server CPIR scheme. They put forward the assumption that, given
a large product of two primes $N = PQ$ whose factorization is unknown, it is infeasible
to determine for a given prime $p > 2$ whether $p \mid \varphi(N)$, where $\varphi$ is Euler's
totient function³. This scheme differs from other PIR schemes in that its communication
complexity depends only on the security parameter $\lambda(n)$ and not directly on $n$. Indeed,
the communication complexity of the protocol, $O(\lambda(n)^4)$, is polynomial in the security
parameter.
1.1.4 Symmetrically private information retrieval
Typically, in general multi-party private computations the privacy of all players must
be maintained. That is, no information about a player's input should be leaked by the
protocol, beyond what follows as a direct consequence of the output. Furthermore, a
great deal of research has been devoted to the case of malicious players. Those are
players that attempt to learn information about other participants' inputs by diverging
from the protocol in some way.
The symmetrically private information retrieval (SPIR) problem differs from PIR
in that it adds the requirement of the database's privacy. The user is allowed to obtain
a single bit $x_i$ of the database, but no other information about $x$. The motivation for
this extra restriction is quite natural. A commercial database that sells data items
for a fee would certainly prefer to use a retrieval protocol which allows the
user to obtain only the data he paid for and nothing else.
The SPIR problem is in fact very similar to two problems presented during the
1980s. The first is 1-out-of-$n$ oblivious transfer, which is a natural generalization of
1-out-of-2 oblivious transfer [34]. The second problem is all or nothing disclosure of
secrets (ANDOS), presented in [16]. In both of these problems there are two players:
a server holding $n$ data items $x_1, \ldots, x_n$ and a user holding an index $i$, $1 \le i \le n$
(the titles of the players are different in oblivious transfer and ANDOS works). The
goal is for the user to retrieve $x_i$ without learning any information about other items
and without letting the server know which item was obtained. The focus of oblivious
transfer and ANDOS works is to show that these tasks are possible at all, and to
find the mildest cryptographic assumptions which can be used to devise a protocol
that solves the problem. The emphasis of SPIR is on efficiency rather than minimal
assumptions. Furthermore, SPIR research is also interested in the PIR scenario in which
the data is replicated at several non-communicating sites, which is not the case
for the other problems.
³ $\varphi(n) = |\{m : 1 \le m < n,\ \gcd(m, n) = 1\}|$
In [38] Gertner, Ishai, Kushilevitz and Malkin introduced the notion of SPIR.
Their main results were in the multi-server, information-theoretic privacy setting. In
this model they show that PIR schemes can be used to construct SPIR schemes, as
long as the servers have a source of shared randomness (without which the privacy of
the data cannot be protected in an information-theoretic sense). In the case of an honest
user the following results are attained:
- A transformation of any $k$-server information-theoretic PIR scheme to a $(k+1)$-server information-theoretic SPIR scheme with the same communication complexity (up to a multiplicative constant).
- A $k$-server information-theoretic SPIR scheme, which for any constant $k$ has communication complexity $O(n^{1/(2k-1)})$. Using the same techniques in conjunction with the PIR schemes of [54] yields, for any $k \ge 2$, SPIR schemes with communication complexity $O(k^3 n^{1/(2k-1)})$.
- An $O(\log n)$-server information-theoretic SPIR scheme which has communication complexity $O(\log^2 n \log\log n)$.
- A 2-server computational SPIR scheme which has communication complexity $\lambda(n)\,2^{\sqrt{2\log n/\lambda(n)}}$, where $\lambda(n)$ is the security parameter, and the cryptographic assumption is that one-way functions exist.
The work also dealt with the case of a dishonest user. All the above mentioned
schemes can be adapted to this case at the cost of a multiplicative O(log n) factor in
the communication complexity.
A disadvantage of [38] is that it deals with single server schemes by using general
(and inefficient) zero-knowledge protocols. A new approach was offered in [63] by
Naor and Pinkas. They showed how to transform any PIR scheme to a SPIR scheme
at the cost of $\log n$ invocations of 1-out-of-2 oblivious transfer (and some extra computation by the
server). Their idea is very useful in putting together single-server schemes, and in
efficiently handling dishonest users. However, it has the drawback that even if
the original PIR scheme is information-theoretically private, the corresponding SPIR
scheme is only computationally private, and relies on the security of the oblivious transfers.
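The masking idea behind this transformation can be sketched as follows, as a hedged reconstruction rather than the exact protocol of [63]. HMAC-SHA256 truncated to one bit stands in for a pseudorandom function. Both the oblivious transfers and the PIR query are replaced by direct look-ups: `oblivious_transfer` and the read of `masked[i]` are hypothetical placeholders that provide no privacy here. The server masks every bit $x_j$ with pseudorandom pads keyed by the bits of $j$. Through $\log n$ 1-out-of-2 transfers the user obtains exactly the keys that match the bits of $i$, so only $x_i$ can be unmasked.

```python
import hashlib
import hmac
import secrets


def prf_bit(key: bytes, index: int) -> int:
    # HMAC-SHA256 truncated to one bit, standing in for a pseudorandom function
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[0] & 1


def server_setup(x, num_bits):
    # Two random keys (one per possible bit value) for every bit position of the index
    keys = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(num_bits)]
    masked = []
    for j, bit in enumerate(x):
        pad = 0
        for level in range(num_bits):
            pad ^= prf_bit(keys[level][(j >> level) & 1], j)
        masked.append(bit ^ pad)
    return keys, masked


def oblivious_transfer(key_pair, choice):
    # Hypothetical placeholder for a real 1-out-of-2 OT; it offers no privacy here
    return key_pair[choice]


def user_retrieve(i, keys, masked, num_bits):
    my_keys = [oblivious_transfer(keys[level], (i >> level) & 1)
               for level in range(num_bits)]
    y_i = masked[i]  # placeholder for running the PIR scheme on the masked database
    pad = 0
    for level in range(num_bits):
        pad ^= prf_bit(my_keys[level], i)
    return y_i ^ pad
```

For any $j \ne i$, at least one pad of $j$ is keyed by a key the user never received, which is why the other bits stay hidden.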
An interesting and different approach was used by Stern [72] to construct a single
server SPIR scheme. Unlike the schemes of [38, 63], Stern's idea is not a general
reduction of SPIR to PIR but a specific SPIR protocol. It uses the same cryptographic
assumptions as the PIR protocol introduced in the same work and achieves the same
communication complexity (up to a constant), $\lambda(n)\,2^{\sqrt{\log n}}$.
1.2 PIR generalizations and similar questions
PIR in its original form is a very "clean" and restricted problem. Any attempt to use
PIR schemes in "real world" applications is bound to raise a host of questions that
remain unanswered in the basic model. In this section we outline several problems
that arise along these lines. They either generalize the PIR problem, or are tangential
to it and were inspired by PIR research. All of them address issues that basic PIR
solutions leave open.
1.2.1 Block retrieval
The problem of privately retrieving blocks of information was introduced in the first
PIR work [25]. In this model $k$ servers hold $n$ blocks of $\ell$ bits each, which comprise
the data, and the user wishes to retrieve the $i$-th block while keeping $i$ private. This
problem is a strict generalization of PIR, since PIR can be regarded as the private
block retrieval problem with $\ell = 1$. Since a block of $\ell$ bits can clearly be retrieved via
$\ell$ invocations of a regular PIR protocol, the goal is to achieve more efficient solutions
than that.
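The basic 2-server XOR scheme of [25], applied to blocks, illustrates why $\ell$ separate invocations are wasteful. One pair of $n$-bit queries serves all $\ell$ bit positions at once, for a total of about $2n + 2\ell$ bits. The sketch below simulates both servers locally. The function names are hypothetical, and the sketch does not include any balancing of the communication.

```python
import secrets


def xor_blocks(blocks, subset, block_len):
    # A server's answer: XOR of the blocks whose indices are in the query set
    acc = bytes(block_len)
    for j in subset:
        acc = bytes(a ^ b for a, b in zip(acc, blocks[j]))
    return acc


def user_queries(i, n):
    # A uniformly random set for server 1; the same set with i toggled for server 2
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}
    return s1, s2


def retrieve_block(blocks, i):
    block_len = len(blocks[0])
    s1, s2 = user_queries(i, len(blocks))
    a1 = xor_blocks(blocks, s1, block_len)  # answer of server 1
    a2 = xor_blocks(blocks, s2, block_len)  # answer of server 2
    # The two sets differ only in i, so all other blocks cancel out
    return bytes(a ^ b for a, b in zip(a1, a2))
```

Each server on its own sees a uniformly random subset, independent of $i$.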
The technique of balancing the communication between the servers and the user was
presented in [25] and used to great advantage in block retrieval. The idea is to
transform a PIR scheme $P$ in which the user's communication complexity is $\alpha$ and
a server's communication complexity is $\beta$ into a PIR scheme $P'$ in which the user
sends $\alpha'$ bits and each server sends $\beta'$ bits. Applying this method, Chor et al. showed
that if the number of non-communicating servers is a constant $k$, any block of size
$\ell \ge n^{1/(k-1)}$ can be retrieved with communication complexity $O(\ell)$. In the paper the
bound was explicitly stated only for $k = 2$, but the generalization is immediate using
two theorems that appear in that work.
A different balancing technique was used in [24] to prove another claim for the
multi-server information-theoretically private model. It was shown that for any block
size $\ell$ and for any constant number of servers $k \ge 2$, it is possible to privately
retrieve the $i$-th block with communication complexity $O(n^{1/(2k-1)}\,\ell^{k/(2k-1)})$.
Using other PIR schemes as building blocks for private block retrieval schemes is
certainly possible (although nothing further has been published to date). Especially
appealing are the information-theoretically private schemes of [54], the computationally private schemes of [22] and the single-server CPIR schemes of [72].
1.2.2 Private retrieval of information by keywords
One of the ways the PIR model diverges from actual databases is in the type of
queries it permits. In PIR the user is supposed to know enough about the internal
structure of the database in order to pinpoint the location of the desired data. Thus,
the user applies a query of the type: "what is the data item at the i-th position?". A
more realistic scenario is that the user has some keyword linked to the required data.
The user queries the database about the keyword and receives in return a list of data
items that correspond to the keyword.
The PrivatE information Retrieval by KeYwords (PERKY) problem attempts
to model this more realistic setting. In PERKY the data servers hold a list of $n$
keywords $s_1, \ldots, s_n$, while the user holds a single keyword $w$. The user wishes to
discover whether $w \in \{s_1, \ldots, s_n\}$ or not, without leaking any information about $w$.
Solving this problem allows private retrieval of any data associated with $w$ (such as
its address in the database).
The PERKY problem was introduced in [23] but was not published until [24].
Independently, it appeared in [58], where it was presented in brief as an extension of the
first single-server scheme. In [24] PERKY was more fully explored. An extended
version of this work appears in chapter 4.
We describe a simple, modular way to privately access data by keywords. Our
scheme combines PIR solutions together with data structures that support search
operations, in order to retrieve information privately in the keyword model. We
present a general transformation from PIR schemes to retrieval by keywords schemes,
using a large class of data structures. We also describe specific instantiations of this
transformation.
The main idea in our constructions is the following: the servers insert all the keywords they hold, $s_1, \ldots, s_n$, into a data structure which supports search operations
on strings. The user holds a specific keyword $w$, and conducts an "oblivious walk"
on the data structure until either the word $w$ is found, or the user is assured
that $w$ is not one of $s_1, \ldots, s_n$. A typical search in the data structure involves
a sequence of operations, where each operation consists of fetching the contents of
a word from memory, performing a "local" computation which depends on the keyword and the fetched contents, and either determining a new address based on the
computation or terminating the search (successfully or unsuccessfully). This sequence
of operations can be viewed as a walk on the data structure. By employing repeated
invocations of a PIR scheme (or more precisely a private block retrieval scheme), we
transform this walk into an oblivious walk on the data structure, where each server
gets no information on the walk except its length. This implies, in particular, that
nothing is revealed about the desired keyword itself beyond its length.
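A minimal sketch of such an oblivious walk rests on two assumptions: the simplest data structure (a sorted array of equal-length keywords searched by binary search) and a 2-server XOR block retrieval scheme simulated locally. Every step fetches one block privately. The walk is padded with dummy fetches to a fixed length of $\lfloor \log_2 n \rfloor + 1$ steps, so the servers learn nothing beyond that length. The function names are illustrative and are not taken from chapter 4.

```python
import secrets


def pir_fetch(blocks, i):
    # 2-server XOR-based block retrieval; both servers are simulated locally
    n, width = len(blocks), len(blocks[0])
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {i}

    def answer(subset):
        acc = bytes(width)
        for j in subset:
            acc = bytes(a ^ b for a, b in zip(acc, blocks[j]))
        return acc

    return bytes(a ^ b for a, b in zip(answer(s1), answer(s2)))


def oblivious_search(sorted_keywords, w):
    # Assumes a non-empty list of equal-length byte strings in sorted order
    n = len(sorted_keywords)
    lo, hi = 0, n - 1
    found = False
    steps = n.bit_length()  # worst-case binary search length, independent of w
    for _ in range(steps):
        # Once the search is over, keep issuing dummy fetches to hide that fact
        mid = (lo + hi) // 2 if lo <= hi else 0
        word = pir_fetch(sorted_keywords, mid)
        if lo <= hi and not found:
            if word == w:
                found = True
            elif word < w:
                lo = mid + 1
            else:
                hi = mid - 1
    return found
```

Every search performs exactly the same number of private fetches, whether or not $w$ is present.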
Apart from the reductions of PERKY to PIR (which form the heart of chapter 4),
we also show the inverse reduction from PIR to PERKY, and present the notion of
symmetrically private information retrieval by keywords, which is analogous to SPIR.
In this problem the user obtains a single bit of information (whether $w \in \{s_1, \ldots, s_n\}$)
and nothing else. Among the results we show are the following:
- Given a $k$-server $\ell$-bit block retrieval scheme with communication complexity $C$, there exists a $k$-server PERKY scheme such that for keywords of length $\ell$ the communication complexity is $O(C)$.
- Given a $k$-server SPIR scheme with communication complexity $C$, there exists a $k$-server symmetric PERKY scheme such that for keywords of length $\ell$ the communication complexity is at most $2\ell \log n \cdot C$.
- Given a $k$-server PERKY scheme in which the keyword length is $\log n$, there exists a PIR scheme with the same communication complexity.
1.2.3 t-privacy
One of the most important parameters in a multi-party private computation is the
adversary structure. One of the most prevalent adversary structures is the $t$-threshold
structure, which contains all the subsets of at most $t$ parties, for a given parameter
$t$. A multi-party protocol which ensures that no subset of at most $t$ players receives
any "illegitimate" information is called $t$-private.
The PIR problem requires 1-privacy among the data servers. Each server by itself
cannot infer any information (in the information-theoretic or the computational sense)
about $i$, the identity of the retrieved item. However, all the previously mentioned
multi-server PIR schemes suffer from the drawback that if several servers cooperate,
then the privacy of the user is seriously compromised. This problem does not arise,
of course, in single-server schemes.
In the $t$-private information retrieval problem the data string $x$ is replicated at
$k$ servers, and the user retrieves a bit $x_i$ while keeping $i$ hidden from any
coalition of at most $t$ servers. The practical motivation for the problem is that the
requirement in the usual PIR setting that no two servers communicate is artificial
and in many cases impossible to enforce. However, it might be more likely that
only a small number of servers cooperate in the "illegal" action of trying to discover
what $i$ is. Therefore, a scheme that withstands an attack on the privacy of the user
by a small coalition of servers is very desirable.
In order to present the technical results we adopt the format of [54]. Instead of
regarding the communication complexity as a function of $k$, the number of servers, and
$t$, the privacy threshold, we regard $k$ as a function of $t$ and $d$, where the communication
complexity is $O(n^{1/d})$.
The $t$-privacy problem was first considered in the information-theoretic model
in [25]. The authors showed that for any two natural numbers $t$ and $d$ there exists a $t$-private $(dt - t + 1)$-server information-theoretic PIR scheme with communication complexity $O(n^{1/d})$. The polynomial interpolation technique used in [25] is
an adaptation of locally random reductions and instance hiding techniques [1, 7, 9].
Those methods are themselves a particular application of the Ben-Or, Goldwasser
and Wigderson multi-party private computation protocol [11], which in turn uses
Shamir's polynomial interpolation secret sharing scheme [71].
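The simplest instance of the polynomial interpolation idea can be sketched as follows. It is degree one in the database, so answers are linear and queries have size $O(n)$ rather than achieving the $O(n^{1/d})$ bound. The user Shamir-shares the unit vector $e_i$ with degree-$t$ polynomials among $t + 1$ servers, and each server answers with the inner product of $x$ and its share. Any $t$ servers see only uniformly random field elements. The answers are points on a degree-$t$ polynomial whose value at 0 is $x_i$. The prime field and the function names are assumptions made for illustration.

```python
import random

PRIME = 2**31 - 1  # field size (assumption); must exceed the number of servers


def user_queries(i, n, t, rng):
    k = t + 1
    # For each position j: a random polynomial of degree <= t with constant term [j == i]
    polys = [[1 if j == i else 0] + [rng.randrange(PRIME) for _ in range(t)]
             for j in range(n)]

    def evaluate(coeffs, z):
        return sum(c * pow(z, e, PRIME) for e, c in enumerate(coeffs)) % PRIME

    # Server h (h = 1..k) receives the vector of all polynomials evaluated at z = h
    return [[evaluate(f, h) for f in polys] for h in range(1, k + 1)]


def server_answer(x, query):
    # Linear in the query: sum_j x_j * f_j(h), a point on F(z) = sum_j x_j f_j(z)
    return sum(b * q for b, q in zip(x, query)) % PRIME


def reconstruct(answers):
    # Lagrange interpolation of F at z = 0 from the points z = 1..k
    k = len(answers)
    total = 0
    for h, a in enumerate(answers, start=1):
        num, den = 1, 1
        for m in range(1, k + 1):
            if m != h:
                num = num * (-m) % PRIME
                den = den * (h - m) % PRIME
        total = (total + a * num * pow(den, -1, PRIME)) % PRIME
    return total
```

Since $F(0) = \sum_j x_j [j = i] = x_i$, interpolating the $t + 1$ answers recovers the desired bit.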
Improved results were obtained by Ishai and Kushilevitz in [54], who used as the
basis of their work the replication-based secret sharing scheme presented by Ito, Saito
and Nishizeki in [55]. Ishai and Kushilevitz show $t$-private $k$-server information-theoretic PIR schemes with communication complexity $O(n^{1/d})$ for the following values of $k$:
- For any natural number $t$ and for any odd integer $d \ge 3$,
$$k = \min\left(\left\lfloor dt - \frac{d+t-3}{2} \right\rfloor,\ dt - t\right).$$
- For any natural numbers $t, d$,
$$k = \min\left(\left\lfloor dt - \frac{d+t-3}{2} \right\rfloor,\ dt - t + 1\right).$$
A final observation about $t$-private schemes was offered by Chor et al. in [25]. They
pointed out that any $t$-private $k$-server PIR scheme can be transformed into a $\lfloor k/t \rfloor$-server PIR scheme (i.e., with 1-privacy). Hence, $t$-private schemes cannot become
too efficient without affecting regular PIR schemes (although there is some room for
improvement in $t$-private schemes, as can be deduced by comparing the 1-private and
$t$-private schemes of [54]).
1.2.4 Private information storage
Private information storage is an interesting extension to the PIR model which was
proposed by Ostrovsky and Shoup in [66]. In this problem the user can both privately
retrieve a bit $x_i$ from a database and privately write a bit onto the database. The
privacy of the storage operation pertains both to the position of the stored item and
to its value. In particular, a server holding the database has no information as to the
actual data (otherwise, a storage operation could not be private).
Initially, in [66], the motivation for private storage was very similar to that of PIR.
However, it seems very unlikely that public databases would allow each and every one
of the multitude of users who access them to change their contents. That is especially
true considering the fact that a database administrator cannot monitor the database
in this scenario, since its contents are hidden in some manner. Another point against
the publicly accessible database model is that the reliability of the database could be
jeopardized by a malicious user bent on storing false information. A final point to
consider in this context is that if a database administrator is truly curious about a
stored value, he can pose as a user, query the data and retrieve the desired item.
Despite all of the above, private storage and retrieval may still be very useful but
in another niche altogether. Consider a setting in which a single user (or a group of
users that have some joint interests) wishes to store data at a remote site and retrieve
it or change it at his leisure. Such a scenario may ensue due to the user's wish to save
local storage space or to have a safe back-up for his data (certain regrettable events
involving our system administration brought forth this train of thought). The user
would clearly wish to retrieve and store the data privately, and hence would like to use
a private retrieval and storage scheme. The problems of the storage model discussed
previously are an integral part of a multi-user storage scheme, but are irrelevant here
as there is only one unique user who can access the data (some type of an access
control mechanism such as passwords has to be added here).
The private storage schemes proposed in [66] are an adaptation of oblivious RAM
techniques [42, 65, 48]. In the oblivious RAM problem there are two parties: a CPU
and a main memory. The CPU and memory interact in order to execute a computer
program. The CPU has a limited amount of local memory in registers, but most of
the data used during the program's execution is stored at the main memory. There is
a privacy requirement which states that the main memory is not allowed to have any
information on the access pattern of the CPU (the memory addresses which the CPU
accesses). There are two major differences between the oblivious RAM and private
storage and retrieval problems. The first is that the user in private storage does not
have any local memory, unlike the CPU in oblivious RAM. The second is that private
storage allows distribution of the data servers, unlike the main memory in oblivious
RAM.
The main result in [66] is a reduction of the private storage and retrieval problem to
PIR. Given a $k$-server PIR scheme with communication complexity $C$, they construct
a $(k+1)$-server private storage and retrieval scheme with communication complexity $C \log^3 n$.
The reduction preserves information-theoretic privacy. That is, if the PIR scheme is
information-theoretically private, then so is the private storage and retrieval scheme.
Ostrovsky and Shoup also include results in the computational privacy model.
Private storage and retrieval schemes are constructed for a 4-server setting and for
a 2-server setting. In the first case, assuming the existence of one-way functions
suffices. In the second, one has to make the stronger assumption that one-way trapdoor permutations exist. In both cases the asymptotic communication complexity
is very low: polynomial in the security parameter $\lambda(n)$ and $\log n$. However, both
schemes require intensive use of general multi-party computation protocols [75, 47]
and seem inefficient for reasonable choices of the parameters $n$ and $\lambda(n)$.
A disadvantage of the storage solutions presented in [66] is that they call for
$O(\log n)$ rounds of communication for each storage or retrieval operation. Different
storage protocols with a constant number of communication rounds are proposed
in [41]. The main thrust of the work is in the computational privacy model via
generalizations of techniques used in [22]. It also deals with t-private storage and
retrieval schemes.
1.3 Modes of operation
One way of tackling inherent problems of the PIR model is to change it. The two
works we outline in this section add new entities to the PIR setting as a means of
bypassing certain obstacles.
1.3.1 Random servers
In [37] Gertner, Goldwasser and Malkin discuss the data replication problem. The
conventional view of multi-server PIR schemes is that the data servers are the legal
owners of the data, or at least do not abuse the trust of the database that provided
them with the data (e.g., by selling it for private profit). In [37] there is an explicit
separation between the legal owner of the data, the database, and the $k$ servers whose
only function is to assist in PIR schemes. An extra privacy requirement is added:
no server individually can gain any information about the data string $x$. This last
requirement can be strengthened to protecting the privacy of the data against any
coalition of up to $t$ servers (where $1 \le t \le k$). Another problem addressed by
this work is the heavy on-line workload of the database. It is solved by
shifting almost all the load to the servers, while the database's on-line computation
and communication remain minimal.
The approach of [37] is to have the database use a secret sharing scheme to distribute the string $x$ among the servers in such a way that any coalition of servers in the
adversary structure receives no information about $x$. The work provides solutions for
$t$-privacy of the data (in which case a $k$-server scheme is transformed into a $k(t+1)$-server scheme of the new type), and for full privacy. In this second case the data is
kept private from all the $k$ servers together. However, the database and servers have
to engage in a periodic and expensive re-initialization phase. The computation and
communication of the auxiliary servers in all the schemes of this work is comparable
to that of the data servers in the underlying PIR schemes. The computation
and communication of the database is minimal ($O(1)$, or in some cases no work at
all).
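One natural way to realize the $k(t+1)$-server transformation is sketched below. The mechanism is not spelled out above, so it is offered only as an assumption. The database splits $x$ into $t + 1$ XOR shares and gives each share to its own group of $k$ servers, which runs the underlying PIR scheme on it. The user XORs the $t + 1$ retrieved bits. A coalition of at most $t$ servers sees at most $t$ shares, and these are jointly uniform. The sketch uses the basic 2-server XOR scheme ($k = 2$) and hypothetical function names.

```python
import secrets


def share_database(x, t):
    # Split bit list x into t + 1 XOR shares; any t of them are uniformly random
    shares = [[secrets.randbits(1) for _ in x] for _ in range(t)]
    last = list(x)
    for s in shares:
        last = [a ^ b for a, b in zip(last, s)]
    return shares + [last]


def two_server_pir(db, i):
    # Basic 2-server XOR scheme; both servers of a group hold the same share
    s1 = {j for j in range(len(db)) if secrets.randbits(1)}
    s2 = s1 ^ {i}
    a1 = sum(db[j] for j in s1) % 2
    a2 = sum(db[j] for j in s2) % 2
    return a1 ^ a2


def retrieve(share_groups, i):
    # Retrieve bit i of every share (fresh queries per group) and XOR the results
    bit = 0
    for share in share_groups:
        bit ^= two_server_pir(share, i)
    return bit
```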
1.3.2 Commodity servers
In [32] Di Crescenzo, Ishai and Ostrovsky introduce a model which also includes
a user, data servers and auxiliary servers. Their model, goals and results differ greatly
from those of [37]. They follow in the footsteps of Beaver [6] and introduce
"commodity based PIR". In this setting the data servers have the same function as
in regular PIR. The auxiliary servers (called commodity servers) are active only at
an off-line stage, and are used to reduce the total communication complexity at the
on-line stage. (Compare this goal with [37], in which the auxiliary servers are used to
perform the tasks of the data servers in regular PIR.)
The commodity servers can remain unaware of all the parameters of the PIR
scheme. They do not know the database x, the user's query i and indeed do not
know if there are other commodity servers in the system and what their number is.
The only function of the servers is to send a single message (the commodity) per
query to each data server and to the user.
The basic result of [32] is that, given a single commodity server, any PIR scheme
$P$ which is executed in one round of communication can be transformed into a commodity PIR scheme that has the following properties:
- The number of data servers remains the same as in $P$.
- The off-line communication complexity is (roughly) identical to the number of bits sent by the user in $P$.
- The on-line communication is the number of bits sent by a data server in $P$ plus $\log n$ bits.
It follows that the PIR schemes most suited for commodity protocols are those in
which the data servers' communication complexity is very low (such schemes appear
in [25, 54] and in chapter 3).
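The following sketch shows one way the off-line/on-line split can work. It assumes a random-shift construction on top of the basic 2-server XOR scheme; the names and this particular construction are illustrative assumptions. The commodity server picks a random offset $r$ and sends matching random queries for $r$. On-line, the user sends only the shift $\Delta = i - r \bmod n$, which is about $\log n$ bits. Each data server sees a uniformly random set and a uniformly random shift, both independent of $i$.

```python
import secrets


def commodity_offline(n):
    # Commodity server: random offset r and random set S; it never sees x or i
    r = secrets.randbelow(n)
    s1 = {j for j in range(n) if secrets.randbits(1)}
    s2 = s1 ^ {r}
    return r, s1, s2  # r goes to the user, s1 to server 1, s2 to server 2


def server_online(x, subset, delta):
    n = len(x)
    # Answer the precomputed query against x cyclically shifted by delta
    return sum(x[(j + delta) % n] for j in subset) % 2


def commodity_pir(x, i):
    n = len(x)
    r, s1, s2 = commodity_offline(n)
    delta = (i - r) % n  # the only on-line message from the user (about log n bits)
    # The sets differ only in r, so the XOR of the answers is x[(r + delta) % n] = x[i]
    return server_online(x, s1, delta) ^ server_online(x, s2, delta)
```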
A lot of work extending this basic result is to be found in [32]. The issue that
most troubled the authors is a collusion between the commodity server and some of
the databases. In the basic scheme, if the commodity server colludes with even one
of the databases, the privacy of the user is completely jeopardized. More advanced
schemes proposed in [32] ensure that the privacy of the user is maintained as long as
at most $m-1$ out of $m$ commodity servers collude with a database. Following is a
short list of some of the schemes presented in the work:
- A single-database $m$-commodity-server scheme, based on [58].
- A $2^m$-database $m$-commodity-server scheme based on [22] that can withstand a collusion of $m-1$ commodity servers with one of the databases (but the number of databases has become very large).
- An $(mtd+1)$-database $m$-commodity-server scheme that offers information-theoretic privacy and is based on schemes of [25]. It can withstand a collusion of up to $t$ databases and $m-1$ commodity servers.
In all of these schemes the communication of each party remains much the same as
it was in the basic scheme (except for an additional $\mathrm{poly}(\lambda(n))$ bits sent by the user
in the single-database scheme). Hence, the user's communication complexity remains
low: logarithmic in $n$ and at most polynomial in the security parameter $\lambda(n)$.
1.4 PIR as a primitive
An interesting and recent development has been to investigate the "power" of PIR
in comparison with other cryptographic primitives. The information-theoretic PIR
works [25, 3, 54] showed that if the number of non-communicating servers is at least
two, PIR schemes with sub-linear communication are possible, without any cryptographic assumptions. Furthermore, [22] showed that assuming the existence of
one-way functions is sufficient to ensure very efficient 2-server schemes (in terms of
communication). However, single server schemes [58, 60, 72, 17] all required specific
trap-door one-way functions.
This gap between multi-server and single-server schemes turns out to be no coincidence. In [10] Beimel et al. show that a single server PIR scheme with communication
complexity at most $n-1$ can be used to construct a one-way function. This can
be achieved either directly or by constructing other cryptographic primitives, such as
commitment schemes, which are known to be equivalent to one-way functions [51].
A second work by Di Crescenzo, Malkin and Ostrovsky [33] developed this direction
further and showed that single-server PIR with sub-linear communication complexity
implies oblivious transfer, which in turn implies the existence of trap-door one-way
functions. There is also a simple construction of secret key exchange based on a
single-server PIR scheme with sub-linear communication [53].
Therefore, while no cryptographic assumptions at all are needed for multi-server
PIR schemes, and one-way functions suffice for very efficient multi-server schemes,
the existence of trap-door functions is essential for single-server PIR schemes. Furthermore, results such as [52] hint that finding a single-server CPIR scheme based
on weaker assumptions, such as one-way functions, may prove to be a major breakthrough in theoretical computer science.
1.5 PIR techniques in other areas
Research in PIR has proved beneficial to other private computation problems. This is
especially true of SPIR works, since SPIR considers the more usual model of protecting the privacy of all parties in the protocol. We provide two examples of problems
seemingly unrelated to PIR, which use SPIR techniques as part of their solution.
1.5.1 Oblivious polynomial evaluation
The oblivious polynomial evaluation (OPE) problem was first introduced and solved
by Naor and Pinkas in [63] (in the same work as their novel SPIR technique). The
OPE model is as follows. Alice holds a field element $\alpha \in F$ and Bob holds a polynomial $B(x)$ over $F$. At the end of the protocol Alice obtains only $B(\alpha)$, while Bob
learns nothing at all. The intractability assumption used in [63] is new and is presented briefly in chapter 5. In broad terms, the idea of the protocol is that Alice sends
$d$ lists of $m$ field elements each to Bob. Bob computes a value for each element and
now holds $d$ lists of $m$ values. Alice and Bob engage in $d$ invocations of a single-server
SPIR scheme (where Alice is the user and Bob is the server) through which Alice
learns one value from each of the $d$ lists. As proven in [63], the $d$ values suffice to obtain
$B(\alpha)$ and nothing else.
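A toy sketch of the list-based idea follows. It is written from our understanding of the Naor-Pinkas construction and should be read with that caveat. Alice hides $\alpha$ as $S(0)$ for a random polynomial $S$ of degree $k$. Bob masks $B$ as $Q(x, y) = P_x(x) + B(y)$, using a random $P_x$ with $P_x(0) = 0$. Alice then learns $R(x) = Q(x, S(x))$ at $d = \deg(B)\cdot k + 1$ points and interpolates $R(0) = B(\alpha)$. Each list fixes a point $x_\ell$ and hides the true value $S(x_\ell)$ among $m - 1$ random field elements. The SPIR step is replaced by a direct index look-up, so the sketch demonstrates correctness only, not privacy. The field size and all names are assumptions.

```python
import random

P = 2**61 - 1  # prime field size (assumption)


def poly_eval(coeffs, x):
    # Horner's rule; coeffs[0] is the constant term
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def interpolate_at_zero(points):
    # Lagrange interpolation of the polynomial through `points`, evaluated at 0
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total


def toy_ope(bob_poly, alpha, k=2, m=4, rng=random):
    deg_b = len(bob_poly) - 1
    s = [alpha] + [rng.randrange(P) for _ in range(k)]        # Alice: S(0) = alpha
    px = [0] + [rng.randrange(P) for _ in range(deg_b * k)]   # Bob: P_x(0) = 0

    def q(x, y):  # Bob's masking polynomial Q(x, y) = P_x(x) + B(y)
        return (poly_eval(px, x) + poly_eval(bob_poly, y)) % P

    d = deg_b * k + 1  # R(x) = Q(x, S(x)) has degree at most deg(B) * k
    learned = []
    for x in rng.sample(range(1, 10**6), d):
        true_pos = rng.randrange(m)
        ys = [rng.randrange(P) for _ in range(m)]
        ys[true_pos] = poly_eval(s, x)          # the real point hidden in the list
        answers = [q(x, y) for y in ys]         # Bob evaluates the whole list
        learned.append((x, answers[true_pos]))  # stand-in for 1-out-of-m SPIR
    return interpolate_at_zero(learned)
```

For example, with $B(y) = 3 + 2y + 5y^2$ and $\alpha = 7$, `toy_ope([3, 2, 5], 7)` returns $262 = B(7)$.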
1.5.2 Joint generation of RSA keys
Another problem in which SPIR techniques prove useful is that of joint generation of
RSA keys [15, 26, 27, 35, 68, 40]. In chapter 5 (which was first published as [40]) we
show how two parties can jointly generate RSA public and private keys. Following
the execution of our protocol each party learns the public key: N = PQ and e, but
does not know the factorization of N or the decryption exponent d. The exponent d
is shared among the two players in such a way that joint decryption of cipher-texts
is possible.
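The following minimal sketch illustrates what it means for $d$ to be shared so that joint decryption is possible. It uses additive sharing $d = d_A + d_B$ over the integers: each party raises the ciphertext to its own share, and the two results are multiplied. Key generation itself, the subject of chapter 5, is not shown. The toy key below is computed centrally from known primes purely for illustration, and the function names are hypothetical.

```python
import secrets


def split_exponent(d, modulus):
    # Additive sharing over the integers: d = d_a + d_b (d_b is usually negative)
    d_a = secrets.randbelow(modulus ** 2)
    return d_a, d - d_a


def partial_decrypt(c, share, modulus):
    # A negative exponent needs c invertible mod N (supported by pow in Python 3.8+)
    return pow(c, share, modulus)


def joint_decrypt(c, d_a, d_b, modulus):
    # c^d_a * c^d_b = c^(d_a + d_b) = c^d mod N
    return partial_decrypt(c, d_a, modulus) * partial_decrypt(c, d_b, modulus) % modulus
```

With the textbook toy key $p = 61$, $q = 53$, $e = 17$, encrypting the message 65 and decrypting jointly returns 65. Neither share alone reveals $d$.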
Generation of RSA keys in a private, distributed manner figures prominently in
several cryptographic protocols. An example is threshold cryptography; see [30] for a
survey. In a threshold RSA signature scheme there are $k$ parties who share the RSA
keys in such a way that any $t$ of them can sign a message, but no subset of at most
$t-1$ players can generate a signature. A solution to this problem is presented in [29].
An important requirement in that work is that both the public modulus $N$ and the
private key are generated by a dealer and subsequently distributed to the parties. The
weakness of this model is that there is a single point of failure: the dealer himself.
Any adversary who compromises the dealer can learn all the necessary information
and, in particular, forge signatures.
Boneh and Franklin show in [15] how to generate the keys without a dealer's help.
Therefore, an adversary has to subvert a large enough coalition of the participants
in order to forge signatures. Several specific phases of the Boneh-Franklin protocol
utilize reduced and optimized versions of information-theoretically private multi-party
computations [11]. Those phases require at least three participants: Alice and Bob,
who share the secret key, and Henry, a helper party, who at the end of the
protocol knows only the public RSA modulus $N$.
Subsequent works [26, 27, 68] and [35] consider other variants of the problem of
jointly generating RSA keys. In [26] Cocks proposes a method for two parties to
jointly generate a key. He extends his technique to an arbitrary number of parties
$k$ in [27]. The proposed protocol suffers from several drawbacks. The first is that the
security is unproven and, as Coppersmith pointed out (see [27]), the privacy of the
players may be compromised in certain situations. The second is that the protocol is
far less efficient than the Boneh-Franklin protocol. In [68] Poupard and Stern show
a different technique for two parties to jointly generate a key. Their method has
proven security given standard cryptographic assumptions. Some of the techniques
employed in the current work are similar to the ideas of [68], but the emphasis is
different: Poupard and Stern focus on maintaining robustness of the protocol, while
we emphasize efficiency. In [35] Frankel, MacKenzie and Yung investigate a model of
malicious adversaries, as opposed to the passive adversaries considered in [15, 26, 27]
and in our work. They show how to jointly generate the keys in the presence of any
minority of misbehaving parties.
The current work focuses on joint generation of RSA keys by just two parties.
We use the Boneh-Franklin protocol and replace each three-party sub-protocol with
a two-party sub-protocol. We construct three protocols. The first is based on 1-out-of-2
oblivious transfer of strings. Thus, its security guarantee is similar to that of general
circuit evaluation techniques [75, 47]. The protocol is more efficient than the general
techniques, is approximately on par with Cocks' method, and is slightly faster than
the Poupard-Stern method. The second utilizes a new intractability assumption akin
to the noisy polynomial reconstruction assumption proposed in [63]. The third protocol is
based on a certain type of homomorphic encryption function (a concrete example is
given by Benaloh in [12, 13]). This protocol is significantly more efficient than the
others both in computation and communication. Its running time is (by a rough
estimate) about 10 times that of the Boneh-Franklin protocol.
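The sketch below shows one common way to handle such a cross term with additively homomorphic encryption. It is an illustration under stated assumptions, not the protocol of this work. It substitutes textbook Paillier encryption, with insecure toy primes, for the Benaloh-type scheme cited above. The names `encrypt`, `decrypt` and `share_product` are hypothetical. Alice sends an encryption of p_A. Bob homomorphically computes an encryption of p_A q_B - r_B for a random r_B and returns it. Alice's decryption r_A and Bob's r_B are then additive shares of p_A q_B modulo the Paillier modulus.

```python
import math
import secrets

# Toy Paillier parameters (insecure; illustration only).
P, Q = 7919, 104729
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function of N
MU = pow(LAM, -1, N)           # inverse of LAM mod N (valid because g = N + 1)


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1 and fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) / N."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N


def share_product(p_a: int, q_b: int) -> tuple[int, int]:
    """Turn p_a * q_b into additive shares r_a + r_b (mod N)."""
    c_a = encrypt(p_a)                              # Alice -> Bob
    r_b = secrets.randbelow(N)                      # Bob's random share
    c = (pow(c_a, q_b, N2) * encrypt(-r_b)) % N2    # Enc(p_a * q_b - r_b)
    r_a = decrypt(c)                                # Bob -> Alice; Alice decrypts
    return r_a, r_b
```

Running `share_product` twice, once for p_A q_B and once for p_B q_A, gives the parties additive shares of the cross terms of N. In a real protocol the encryption modulus must be large enough that these products do not wrap around.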
There are several reasons for using three different protocols and assumptions. The
first assumption is the mildest one may hope to use. The second protocol has the
appealing property that, unlike the other two, it is not affected by the size of the
modulus. In other words, the larger the RSA modulus being used, the more efficient
this protocol becomes in comparison with the others. Another interesting property
of the first two protocols is that a good solution to an open problem we state at the
end of the paper may make them more efficient in terms of computation than the
homomorphic encryption protocol.
We assume that an adversary is passive and static. In other words, the two
parties follow the protocol to the letter. An adversary who compromises a party may
only try to learn extra information about the other party through its view of the
communication. Furthermore, an adversary who takes over Alice at some point in
the execution of the protocol cannot switch over to Bob later on, and vice versa.
1.6 Lower bounds
The greatest part of the PIR research effort has been expended in discovering upper
bounds for the communication complexity in various settings. Finding matching
lower bounds has been mostly neglected. A trivial bound is $\log n + 1$ bits, since
communication complexity arguments prove this bound even without the privacy
requirement.
The multi-server model used in information-theoretic PIR was engendered by a
lower bound proven in [25]. The lower bound states that any single-server PIR scheme
maintaining information-theoretic privacy has communication complexity of at least
n. Thus, any meaningful information-theoretic PIR scheme must assume the multi-server model.
Currently no general lower bounds for the multi-server case exist. However, there
have been some advances in lower bounds for multi-server PIR schemes that have
specific properties. The first such bound was proven in [25]. The lower bound applies
to any 2-server PIR scheme that consists of the following phases: the user sends
a query $q_1$ to the first server and $q_2$ to the second server, each server sends a single bit
as a reply, and finally the user's answer is the exclusive-or of these bits. As mentioned
previously, the authors prove that for this type of scheme the communication sent to
each server must have at least $n - 1$ bits.
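The sketch below is a scheme of exactly this form: one round, one-bit answers, and XOR reconstruction. The user sends a uniformly random subset of positions to one server and the same subset with position i toggled to the other. Each server therefore receives n bits, which essentially matches the n - 1 bound. This is a minimal Python sketch with hypothetical names and 0-based indices, not code from [25].

```python
import secrets


def query(n: int, i: int) -> tuple[list[int], list[int]]:
    """User: q1 is a random subset (0/1 vector); q2 is q1 with position i toggled."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2


def answer(q: list[int], x: list[int]) -> int:
    """Server: a single bit, the XOR of the database bits selected by q."""
    a = 0
    for selected, bit in zip(q, x):
        a ^= selected & bit
    return a


def reconstruct(a1: int, a2: int) -> int:
    """User: every position except i cancels, leaving x[i]."""
    return a1 ^ a2
```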
A more general (though much weaker) lower bound in the information-theoretic
setting was provided by Mann in [60]. Mann proves a bound for multi-server, one-round
information-theoretic PIR schemes. The communication complexity of a k-server
scheme of this type is greater than $(k^2/(k-1) - \varepsilon)\log n$, for any constant
$\varepsilon > 0$. In particular, this result shows that any two-server, one-round, information-theoretically private scheme has communication complexity of at least (roughly)
$4\log n$. This result is the most general lower bound proven so far. Of special interest
is the fact that all the known information-theoretic PIR schemes use one round of
communication [25, 3, 54]. On the one hand, this bound serves to demonstrate that
information-theoretic PIR is probably harder than non-private retrieval. On the other
hand, the wide gap between it and the upper bounds for the information-theoretic
setting serves to underline the paucity of our knowledge concerning PIR lower bounds.
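The constants in these bounds are easy to tabulate. The sketch below uses hypothetical helper names and takes logarithms to base 2. It evaluates the trivial bound log n + 1 and the leading term of Mann's bound with the ε term dropped.

```python
import math
from fractions import Fraction


def trivial_bound_bits(n: int) -> float:
    """log n + 1 bits, needed even without any privacy requirement."""
    return math.log2(n) + 1


def mann_coefficient(k: int) -> Fraction:
    """Leading constant k^2 / (k - 1) of Mann's bound for k-server, one-round schemes."""
    if k < 2:
        raise ValueError("the bound concerns k >= 2 servers")
    return Fraction(k * k, k - 1)


def mann_bound_bits(k: int, n: int, eps: float = 0.0) -> float:
    """(k^2/(k-1) - eps) * log n."""
    return (float(mann_coefficient(k)) - eps) * math.log2(n)


if __name__ == "__main__":
    for k in (2, 3, 4, 10):
        print(k, mann_coefficient(k))
```

For k = 2 the coefficient is 4, which gives the 4 log n figure quoted above. Since k^2/(k-1) = k + 1 + 1/(k-1), the coefficient grows roughly linearly in k.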
1.7 Related work
1.7.1 Multi-party private computation
We begin by describing a simple model for multi-party private computation. Let f
be a function mapping k inputs to k outputs. The computation involves k parties
$p_1, \ldots, p_k$, each holding one input: the j-th player $p_j$ holds only the j-th input $a_j$.
The participants wish to evaluate f at the point $(a_1, \ldots, a_k)$. The goal is to construct a private protocol
that computes $f(a_1, \ldots, a_k) = (o_1, \ldots, o_k)$ so that $p_j$ obtains only the j-th output
$o_j$. By a private protocol we mean (in very loose terms) that a player (or a specified
coalition of players) does not learn from the execution of the protocol any information
that does not follow from its input and output.
PIR is a particular instance of multi-party private computation. In PIR there are
k + 1 parties (the user and k servers), the user's input is i, the input of each of the k servers is
x, and the function f is defined as $f(i, x, \ldots, x) = (x_i, \bot, \ldots, \bot)$. That is, the user
obtains $x_i$ while the servers have no output.
The model of communication assumes that the protocol proceeds in synchronous
communication rounds. In each round any player can send messages to any other
player. The communication is carried out over secure channels, so that only the two
parties exchanging a specific message may learn what it is (or indeed obtain any
information about it).
Defining exactly what privacy means in general is a Herculean task, undertaken
in various papers and theses [5, 61, 18, 43], and is still undergoing changes [19]. In
this work we are interested only in specific instances of private computation, and
specific models of privacy, which we define more rigorously in the following chapters.
In this section we only give a flavor of the intricacies that are involved in private
computation.
It is convenient to visualize an evil adversary, lurking off-stage, just waiting to
pounce on the unwary participant in the protocol and glean some information. Privacy
has to be maintained given a predetermined adversary structure C, that is, a collection
of subsets of players, $C \subseteq 2^{[k]}$, which may be susceptible to the adversary. The adversary may
corrupt and take over any such subset during the execution of the protocol.
In the original PIR problem the adversary structure consists of all the subsets of
servers containing exactly one server. In the t-privacy extension [25, 54] C includes
all the subsets of servers containing at most t servers. In the symmetrically private
information retrieval problem [38, 63] C is comprised of all the singletons (single server
or just the user). This last is the more typical scenario of private computation works,
in that there are privacy constraints placed on all the parties. The PIR case, where
the user is allowed to learn more information about x than just xi, is unusual.
The power of the adversary is as important a parameter as the adversary structure. The adversary may be passive (also called honest-but-curious), i.e., the parties it
controls do not deviate from the protocol but do attempt to gather as much information as possible. Alternatively, it can be active and make the parties under its control deviate
maliciously from the protocol. It can be computationally bounded (for instance, limited to polynomial time) or computationally unbounded. It can also be either static
or adaptive. By static we mean that the adversary decides which of the parties to
corrupt ahead of time. An adaptive adversary can decide which parties to attack
during the execution of the protocol.
Currently PIR schemes consider a passive, static adversary in either the computationally unbounded [25, 3, 54] or the computationally bounded [22, 58, 60, 72, 17] model. The main
reason for assuming a limited adversary is simply that the focus of PIR works so far
has been on efficiency and not on strengthening the adversary. It is quite likely that
subsequent works will rectify the situation. However, there is a great difference between adding the property of being active and adding that of being adaptive. Works in general
multi-party private computation show that an active adversary significantly reduces
the size of malicious subsets of players that can be tolerated by a private protocol
[11, 21, 47]. The same is also true of PIR. For instance, given an active adversary
that may control either one of two servers, no PIR scheme even exists. In contrast,
basically the same results that can be proved for static adversaries hold for adaptive
adversaries as well [20]. Furthermore, all the PIR protocols constructed so far are
executed in one round of communication, which makes the question of adaptivity
irrelevant (this is not true for private information retrieval by keywords, as in Chapter 4, or for private storage [66]).
In this work we adhere to the conventions of other PIR papers and assume that
our adversary is honest-but-curious and static. When assuming that the adversary
is computationally unbounded we say that a private protocol maintains information-theoretic privacy. In the case of a computationally bounded adversary we speak of
computational privacy.
After describing the adversary we are ready to discuss what the privacy of a
protocol means exactly. As we mentioned earlier, the notion we wish to capture
is that the adversary will under no circumstances learn any inputs and outputs of
the protocol that do not "belong" to parties it has corrupted. In the honest-but-curious, static and computationally unbounded setting that notion is captured by
the following requirement. Let $T \in C$ be some subset of players that may be
taken over by the adversary. For any two input k-tuples $a, a'$ such that $a_j = a'_j$ and
$f(a)_j = f(a')_j$ for all $j \in T$, the joint view of the parties in T of the protocol when the
input is a is identically distributed to their joint view when the input is a'. In the passive
and computationally bounded setting the definition changes only slightly: instead of
requiring the two distributions to be identical as above, we require only that they
be computationally indistinguishable (see Chapter 2 for more on the subject). Since
the adversary in PIR is assumed to be static and passive, we adopt this definition of
privacy.
If the adversary may be more powerful, i.e., active and/or adaptive, the definition
has to change. In this case two models of the protocol's execution are considered.
The real-life model involves the actual execution of the protocol. In the ideal model
an external trusted party is used: all the inputs are sent to that party, which
computes $f(a_1, \ldots, a_k) = (o_1, \ldots, o_k)$ and distributes each output to the appropriate
player. The protocol is considered private if what the adversary sees in the real-life
execution, while corrupting T, can be efficiently simulated by an adversary that corrupts the same subset of players T in the ideal model. Thus, the adversary gains no more
knowledge than what could be obtained in an ideal solution.
1.7.2 Instance hiding
The problem of instance hiding was introduced and studied in [1, 7, 8]. A computationally bounded user holds an input (instance) $i \in \{0,1\}^m$ and wishes to compute
the value of a known Boolean function $f: \{0,1\}^m \to \{0,1\}$ on i. Since f may be
hard to compute, the user is allowed to query k computationally unbounded oracles
in order to learn f(i). The only restriction is that i is kept secret from each oracle.
A k-oracle instance hiding problem can be formulated in a way similar to a k-server
information-theoretic PIR problem. The servers act as the oracles, and
$f: \{0,1\}^{\log n} \to \{0,1\}$ is given by the data string x, where $f(i) = x_i$. By executing an instance hiding
scheme the user can learn $x_i$ while hiding i, as is the requirement in PIR.
The problems differ in two respects. The first is that in instance hiding, even
though f is possibly difficult to compute, it is publicly known, and thus its structure
may enable the user to form more efficient queries than is the case in PIR, where
a priori x may be any string in $\{0,1\}^n$. However, currently no instance hiding
scheme that exploits the specific structure of f is known.
The second difference between the two problems is that in PIR n is considered a
feasible quantity, while in instance hiding $2^m$, which is analogous to n, is infeasible.
Therefore, instance hiding schemes limit the user to computation and communication
that are polynomial in $|i|$ (or poly-logarithmic in n if viewed in a PIR context). On
the other hand, PIR allows the user poly(n) computation and communication.
Adaptations and improvements of instance hiding schemes were important themes
in the first PIR work [25]. The best schemes for $k \ge 3$ servers in that work are a
variation on distributed polynomial interpolation ideas proposed in [7, 8]. Subsequent
PIR works [3, 54] improved the schemes of [25] for "small" values of k. The $O(\log n)$-server PIR schemes that were presented in [25] as variations of instance hiding
schemes are still the most efficient we know of for that number of servers (their
communication complexity is $O(\log^2 n \log\log n)$ and the computation of the user is
poly-logarithmic in n).
1.7.3 Communication complexity problems
Since instance hiding schemes focused on poly-logarithmic communication
and computation by the user, instance hiding works did not provide the best solutions
for a small number of servers. An indication that two-server information-theoretic
PIR with sub-linear communication is possible was provided, prior to the first PIR
publication, in [69]. A problem considered in that work is the following: two servers
hold the same string $x \in \{0,1\}^n$ and one index each. The first holds j and the second
$\ell$, where $1 \le j, \ell \le n$. The goal is for each server to send a single message to a user
such that the user learns $x_{(j+\ell) \bmod n}$. A solution to this problem can be transformed
into a PIR scheme by having the user randomly select $j, \ell$ so that $j + \ell \equiv i \pmod n$,
send j to the first server and $\ell$ to the second, and finally execute the above-mentioned
solution. The $O(n \log\log n / \log n)$ solution in [69] to this problem can be viewed as
the first sub-linear communication solution for PIR.
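The randomization step of this reduction is short enough to write out. The sketch below uses 0-based indices and hypothetical function names. Here j is uniform and $\ell$ is determined by j and i. As j ranges over $\mathbb{Z}_n$, $\ell$ takes every value exactly once, so each server's index is uniform on its own and reveals nothing about i.

```python
import secrets


def split_index(n: int, i: int) -> tuple[int, int]:
    """User: pick j uniformly from {0, ..., n-1} and set ell so that (j + ell) mod n == i."""
    j = secrets.randbelow(n)
    ell = (i - j) % n
    return j, ell


def ell_values(n: int, i: int) -> list[int]:
    """All values ell takes as j ranges over Z_n; each value appears exactly once."""
    return sorted((i - j) % n for j in range(n))
```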
A similar problem that can also be used as a PIR protocol was introduced in
[4]. Here a user holds k indices $1 \le i_1, \ldots, i_k \le n$, while k servers hold the same
string $x \in \{0,1\}^n$. Each server also holds $k - 1$ of the indices that the user has:
the j-th server holds all the indices except for $i_j$. The indices are regarded as $\log n$-bit
binary strings. The goal is for each server to send a single message to the user, so that
the user can obtain $x_{i_1 \oplus \cdots \oplus i_k}$, where $\oplus$ denotes bitwise exclusive-or. The protocol
presented in [4] provides a solution for the problem with communication complexity
$O(k n^{H_2(1/(k+1))})$, where $H_2$ is the binary entropy function. For example, for k = 2 their
protocol's communication complexity is $O(n^{H_2(1/3)}) \approx O(n^{0.92})$. Any solution to this
problem can be transformed into a k-server information-theoretic PIR scheme with
an additive communication overhead of $k(k-1)\log n$. The results of [4] are a significant
improvement over [69], but still yield PIR schemes that are inferior to any quoted in
PIR works (beginning with [25]).
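The reduction to PIR follows the same pattern, with exclusive-or in place of addition. The sketch below assumes 0-based indices written as `bits`-bit integers (so n = 2^bits) and uses hypothetical helper names. The user picks k - 1 random indices and sets the last so that the XOR of all k equals i. Any k - 1 of them are uniform and independent of i, so each server's view hides i. Sending each server its k - 1 indices costs $k(k-1)\log n$ bits. The sketch also evaluates the binary entropy exponent quoted for k = 2.

```python
import math
import secrets
from functools import reduce
from operator import xor


def split_index_xor(i: int, k: int, bits: int) -> list[int]:
    """User: k - 1 random bits-bit indices plus a last one so that their XOR is i."""
    parts = [secrets.randbelow(1 << bits) for _ in range(k - 1)]
    parts.append(reduce(xor, parts, i))    # i ^ parts[0] ^ ... ^ parts[k-2]
    return parts


def server_view(parts: list[int], j: int) -> list[int]:
    """Server j (0-based) receives every index except parts[j]."""
    return parts[:j] + parts[j + 1:]


def extra_bits(k: int, bits: int) -> int:
    """Each of the k servers receives k - 1 indices of `bits` bits."""
    return k * (k - 1) * bits


def binary_entropy(p: float) -> float:
    """H_2(p) = -p log2 p - (1 - p) log2 (1 - p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

For example, `binary_entropy(1 / 3)` is about 0.918, which gives the $n^{0.92}$ figure above.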
Chapter 2
Model and Definitions
2.1 The PIR Model
Let $DB_1, \ldots, DB_k$ be k servers, each holding a copy of an n-bit binary string
$x = x_1 \cdots x_n$ (the database). A user, denoted by U, wishes to retrieve one of the bits $x_i$
($1 \le i \le n$). The servers can communicate with the user, but not with each other.
The following is a definition of private information retrieval in a somewhat restricted setting, which suffices for our purposes. We define a scheme that achieves
information-theoretic privacy, is executed in one round of communication and assures
the user of correct retrieval with probability 1. All information-theoretically private
schemes that appear in the literature are of this form.
Definition 1: A one round, k-server PIR scheme maintaining information-theoretic
privacy P is a trio of algorithms:
$Q(1^n, i, r)$: The query algorithm receives as input $1^n$, the length of the database, i,
the retrieval index, and r, a random input. Its output is a k-tuple of queries
$(q_1, \ldots, q_k)$.
$A(j, q_j, x)$: The answer algorithm receives as input a server number j ($j \in [k]$), a
query $q_j$ and the database x. Its output is an answer $a_j$.
$R(1^n, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm receives as input the database
length $1^n$, the retrieval index i, the random input r and the k answers. Its
output is a single bit.
The user is defined by the two algorithms Q and R. The j-th server is defined by the
algorithm $A_j$, which is $A(j, \cdot, \cdot)$ (the algorithm A where the first argument is restricted
to j).
P involves three steps: U uses Q to generate k queries and sends one query to each
server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$).
Finally U uses R to reconstruct $x_i$ from the answers.
P must have the following two properties:
Correctness: For all $n \in \mathbb{N}$, $x \in \{0,1\}^n$, $i \in [n]$ and r, if $Q(1^n, i, r)$ outputs
$(q_1, \ldots, q_k)$ then:
$$R(1^n, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i.$$
Privacy: Let $\mathrm{Dist}_j(n, i)$ denote the distribution of the query algorithm's output restricted to its j-th entry as induced by the random choices of r (in other words,
the distribution of $q_j$). Then, for every $i_1, i_2$ and j, where $1 \le i_1, i_2 \le n$ and
$j \in [k]$, we require that
$$\mathrm{Dist}_j(n, i_1) = \mathrm{Dist}_j(n, i_2).$$
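For small n, the privacy condition of Definition 1 can be checked exhaustively. The sketch below uses hypothetical names and 0-based numbering for both servers and indices. It enumerates every random input r of a simple two-server XOR scheme: query r goes to the first server, and r with bit i flipped goes to the second. It then compares the exact distributions $\mathrm{Dist}_j(n, i)$ for all pairs of indices. Each query turns out to be uniform over $\{0,1\}^n$ regardless of i.

```python
from collections import Counter
from itertools import product


def queries(n: int, i: int, r: tuple[int, ...]) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Query algorithm Q(1^n, i, r) of the XOR scheme; r is a tuple of n random bits."""
    q1 = r
    q2 = tuple(b ^ int(idx == i) for idx, b in enumerate(r))
    return q1, q2


def dist(n: int, i: int, j: int) -> Counter:
    """Exact distribution Dist_j(n, i): count each j-th query over all 2^n inputs r."""
    return Counter(queries(n, i, r)[j] for r in product((0, 1), repeat=n))


def is_private(n: int) -> bool:
    """Check Dist_j(n, i1) == Dist_j(n, i2) for both servers and all index pairs."""
    return all(
        dist(n, i1, j) == dist(n, i2, j)
        for j in (0, 1)
        for i1 in range(n)
        for i2 in range(n)
    )
```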
Next, we define private retrieval of blocks. This notion is very similar to PIR.
The only difference is that x is not necessarily a binary string. Instead, the database
x is made up of n blocks of $\ell$ bits each. The retrieved item is now a block of bits
$x_i \in \{0,1\}^\ell$. The definition is extended in a natural manner.
Definition 2: P is a one round, k-server PIR of blocks scheme maintaining
information-theoretic privacy if it is a trio of algorithms:
$Q(1^n, 0^\ell, i, r)$: The query algorithm receives as input $1^n$, the number of data blocks,
$0^\ell$, the length of each block, i, the retrieval index, and r, the random input. Its
output is a k-tuple of queries $(q_1, \ldots, q_k)$.
$A(j, q_j, x)$: The answer algorithm receives as input a server number j ($j \in [k]$), a
query $q_j$ and the database x. Its output is an answer $a_j$.
$R(1^n, 0^\ell, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm receives as input the number of data blocks,
the block length, the retrieval index, the random input and the k answers. Its output is a single
block of $\ell$ bits.
The user is defined by the two algorithms Q and R. The j-th server is defined by the
algorithm $A_j$, which is $A(j, \cdot, \cdot)$.
P involves three steps: U uses Q to generate k queries and sends one query to each
server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$).
Finally U uses R to reconstruct $x_i$ from the answers.
P must have the following two properties:
Correctness: For all $n, \ell \in \mathbb{N}$, $x \in \{0,1\}^{n\ell}$, $i \in [n]$ and r, if the output of the query
algorithm $Q(1^n, 0^\ell, i, r)$ is $(q_1, \ldots, q_k)$ then:
$$R(1^n, 0^\ell, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i.$$
Privacy: Let $\mathrm{Dist}_j(n, \ell, i)$ denote the distribution of the query algorithm's output restricted to its j-th entry as induced by the random choices of r (in other words,
the distribution of $q_j$). Then, for every $i_1, i_2$ and j, where $1 \le i_1, i_2 \le n$
and $j \in [k]$, we require that
$$\mathrm{Dist}_j(n, \ell, i_1) = \mathrm{Dist}_j(n, \ell, i_2).$$
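Block retrieval fits the same XOR pattern as bit retrieval. The queries are unchanged, and each server XORs the selected $\ell$-bit blocks, so the answers grow to $\ell$ bits. The sketch below is minimal. It assumes blocks are stored as Python integers below $2^\ell$ and uses hypothetical function names and 0-based indices.

```python
import secrets


def query(n: int, i: int) -> tuple[list[int], list[int]]:
    """Same queries as for single bits: a random subset and that subset with i toggled."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2


def answer(q: list[int], blocks: list[int]) -> int:
    """Server: XOR of the selected ell-bit blocks, so the answer is ell bits long."""
    acc = 0
    for selected, block in zip(q, blocks):
        if selected:
            acc ^= block
    return acc


def reconstruct(a1: int, a2: int) -> int:
    """User: all blocks except block i cancel out."""
    return a1 ^ a2
```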
2.2 Pseudo-random generators
We start by defining computational indistinguishability and pseudo randomness. We
use both the standard version of the definitions [44, p. 85] and a quantitative version.
We include the quantitative version in order to enable an exact analysis of the correlated
generator we construct in Chapter 3 in terms of the pseudo random generator whose
existence we assume. We begin by defining computational indistinguishability of two
probability ensembles.
Definition 3: A probability ensemble Y is a sequence of probability distributions
$Y = \{Y_n\}_{n \in \mathbb{N}}$, where each distribution $Y_n$ ranges over some domain $\mathrm{Dom}_n$.
Definition 4: Let $\ell ength: \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function
satisfying $\ell ength(n) \le n$. Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. Let $Y = \{Y_n\}_{n \in \mathbb{N}}$
and $Z = \{Z_n\}_{n \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_n$
and $Z_n$ is $\{0,1\}^{\ell ength(n)}$. We say that Y and Z are $(T_D(n), \varepsilon(n))$-computationally
indistinguishable if the following holds. For every distinguisher D, a probabilistic
algorithm whose running time is bounded by $T_D(\cdot)$, there is an $n_0$ such that for all
$n \ge n_0$
$$\left|\Pr[D(Y_n, 1^n) = 1] - \Pr[D(Z_n, 1^n) = 1]\right| < \varepsilon(n),$$
where the probability is taken over the distributions $Y_n$, $Z_n$ and over the coin tosses
of D.
We say that Y and Z are non-uniformly $(T_D(n), \varepsilon(n))$-computationally indistinguishable if for every (possibly non-uniform) family of circuits $\{Circ_n\}_{n \in \mathbb{N}}$, where
the size of the n-th circuit $Circ_n$ is bounded by $T_D(n)$, there is an $n_0$ such that for all
$n \ge n_0$
$$\left|\Pr[Circ_n(Y_n) = 1] - \Pr[Circ_n(Z_n) = 1]\right| < \varepsilon(n),$$
where the probability is taken over the distributions $Y_n$ and $Z_n$.
Notice that by giving the distinguisher D the input $1^n$ in addition to the "real
input" (an $\ell ength(n)$-bit string) we make the requirement more stringent. This is
because $\ell ength(n)$ could be substantially smaller than n, yet we let D run $T_D(n)$
steps, not just $T_D(\ell ength(n))$.
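Definition 4 quantifies over every distinguisher within the time bound. For any single fixed D, the advantage it bounds can at least be estimated empirically. The sketch below is only a Monte Carlo estimate, with hypothetical sampler and distinguisher names. The integer n passed to D plays the role of the extra input $1^n$.

```python
import random
from typing import Callable

Sampler = Callable[[random.Random], str]
Distinguisher = Callable[[str, int], int]


def estimate_advantage(sample_y: Sampler, sample_z: Sampler, dist: Distinguisher,
                       n: int, trials: int = 10_000, seed: int = 0) -> float:
    """Empirical estimate of |Pr[D(Y_n, 1^n) = 1] - Pr[D(Z_n, 1^n) = 1]| for one fixed D."""
    rng = random.Random(seed)
    ones_y = sum(dist(sample_y(rng), n) for _ in range(trials))
    ones_z = sum(dist(sample_z(rng), n) for _ in range(trials))
    return abs(ones_y - ones_z) / trials


N_BITS = 16  # here length(n) = n = 16


def all_zeros(rng: random.Random) -> str:
    """Y_n: always the all-zero string."""
    return "0" * N_BITS


def uniform(rng: random.Random) -> str:
    """Z_n: a uniformly random N_BITS-bit string."""
    return "".join(rng.choice("01") for _ in range(N_BITS))


def first_bit_is_zero(sample: str, n: int) -> int:
    """A fixed distinguisher: output 1 iff the first bit is 0."""
    return 1 if sample[0] == "0" else 0
```

In this example, Y_n is the all-zero string and Z_n is uniform. A distinguisher that tests whether the first bit is 0 has an advantage of about 1/2, so these two ensembles are far from indistinguishable.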
Definition 5: Let $\ell ength: \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function
satisfying $\ell ength(n) \le n$. Let $\mathcal{T}$ be a family of functions of the form $T_D: \mathbb{N} \to \mathbb{N}$,
and let $\mathcal{E}$ be a family of functions of the form $\varepsilon: \mathbb{N} \to [0,1]$. Let $Y = \{Y_n\}_{n \in \mathbb{N}}$ and
$Z = \{Z_n\}_{n \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_n$ and $Z_n$
is $\{0,1\}^{\ell ength(n)}$. We say that Y and Z are $(\mathcal{T}, \mathcal{E})$-computationally indistinguishable if
for every $T_D \in \mathcal{T}$ and $\varepsilon \in \mathcal{E}$, Y and Z are $(T_D(n), \varepsilon(n))$-computationally indistinguishable. We say that Y and Z are computationally indistinguishable if they are $(\mathcal{T}, \mathcal{E})$-computationally
indistinguishable, where $\mathcal{E} = \{n^{-c} \mid c \in \mathbb{N}\}$ and $\mathcal{T} = \{n^c \mid c \in \mathbb{N}\}$.
Now we turn to the special case in which one of the ensembles is the sequence of
uniform distributions over $\{0,1\}^{\ell ength(n)}$.
Definition 6: Let $\ell ength: \mathbb{N} \to \mathbb{N}$ and let $Uni = \{Uni_n\}_{n \in \mathbb{N}}$ be a probability
ensemble. We say that Uni is the uniform ensemble corresponding to $\ell ength$ if for
every $n \in \mathbb{N}$ the distribution $Uni_n$ is the uniform distribution on the set $\{0,1\}^{\ell ength(n)}$.
Definition 7: Let $\ell ength: \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function
satisfying $\ell ength(n) \le n$. Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. Let
$Y = \{Y_n\}_{n \in \mathbb{N}}$ be a probability ensemble such that the domain of $Y_n$ is $\{0,1\}^{\ell ength(n)}$ and
let $Uni = \{Uni_n\}_{n \in \mathbb{N}}$ be the uniform ensemble corresponding to $\ell ength$. We say
that Y is $(T_D(n), \varepsilon(n))$-pseudo random if Y and Uni are $(T_D(n), \varepsilon(n))$-computationally
indistinguishable. We say that Y is non-uniformly $(T_D(n), \varepsilon(n))$-pseudo random if Y
and Uni are non-uniformly $(T_D(n), \varepsilon(n))$-computationally indistinguishable. Let $\mathcal{T}$ be
a family of functions of the form $T_D: \mathbb{N} \to \mathbb{N}$, and let $\mathcal{E}$ be a family of functions
of the form $\varepsilon: \mathbb{N} \to [0,1]$. The ensemble Y is called $(\mathcal{T}, \mathcal{E})$-pseudo random if Y and
Uni are $(\mathcal{T}, \mathcal{E})$-computationally indistinguishable. The ensemble Y is simply called
pseudo random if Y and Uni are computationally indistinguishable.
We are now ready to define pseudo-random generators.
Definition 8: Let $\lambda: \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $\lambda(n) < n$. Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. A deterministic algorithm
GEN is called a $(T_D(n), \varepsilon(n))$-pseudo random generator if it has the following properties:
Stretching: For every $n \in \mathbb{N}$, GEN receives as input $s \in \{0,1\}^{\lambda(n)}$, the seed, and
$1^n$, the output length, and produces as output a string $GEN(s, 1^n) \in \{0,1\}^n$.
Computational indistinguishability: The ensemble $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$ (induced
by choosing $s \in \{0,1\}^{\lambda(n)}$ uniformly at random) is $(T_D(n), \varepsilon(n))$-pseudo random
($\ell ength(n) = n$).
A $(T_D(n), \varepsilon(n))$-pseudo random generator GEN is called a non-uniformly $(T_D(n), \varepsilon(n))$-pseudo random generator if it has the property of:
Non-uniform computational indistinguishability: The ensemble of distributions $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$ (induced by choosing $s \in \{0,1\}^{\lambda(n)}$ uniformly at random) is non-uniformly $(T_D(n), \varepsilon(n))$-pseudo random ($\ell ength(n) = n$).
Let $\mathcal{T}$ be a family of functions of the form $T_D: \mathbb{N} \to \mathbb{N}$, and let $\mathcal{E}$ be a family of
functions of the form $\varepsilon: \mathbb{N} \to [0,1]$. We say that GEN is a $(\mathcal{T}, \mathcal{E})$-pseudo random
generator if it is a $(T_D(n), \varepsilon(n))$-pseudo random generator for every $T_D \in \mathcal{T}$ and $\varepsilon \in \mathcal{E}$.
Notation 1:
The running time of $GEN(s, 1^n)$ is denoted by $T_{GEN}(n)$.
We say that the stretch of GEN is $\lambda(n)$ to n.
The parameter n is called the security parameter. (Notice that this differs from
the standard definition, in which $\lambda(n)$ is called the security parameter [44, p.
76].)
Definition 9: We say that a $(\mathcal{T}, \mathcal{E})$-pseudo random generator GEN is simply a
pseudo random generator if it has the following properties:
There exists a constant $c_{time}$ such that $T_{GEN}(n) \le n^{c_{time}}$.
There exists a constant $c_{\ell en} > 1$ such that the seed length is $\lambda(n) = n^{1/c_{\ell en}}$.
The ensemble $\{GEN(s, 1^n)\}_{n \in \mathbb{N}}$, induced by choosing $s \in \{0,1\}^{\lambda(n)}$ uniformly
at random, is pseudo random ($\ell ength(n) = n$).
Definition 9 coincides with the usual definition of pseudo random generators. Informally it states that the generator is a polynomial time algorithm such that no
polynomial time bounded probabilistic algorithm can gain an inverse polynomial
(in n) advantage in distinguishing between the uniform distribution and the distribution of the generator's outputs. It is known [50, 51, 59] that the existence of such a
generator is equivalent to the existence of one-way functions.
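The interface of Definitions 8 and 9 can be illustrated as follows. This sketch is heuristic. SHAKE-256 from Python's `hashlib` serves as a stand-in expander, and it is not a proven pseudo random generator. The constant `C_LEN = 2` (so that $\lambda(n) \approx \sqrt{n}$) and all function names are hypothetical. The sketch shows only the shape of GEN: a deterministic map from a $\lambda(n)$-bit seed and the output length n to an n-bit string.

```python
import hashlib
import math

C_LEN = 2  # hypothetical choice of c_len > 1; the seed length is then about sqrt(n)


def seed_length(n: int) -> int:
    """lambda(n) = n^(1/c_len), rounded up to an integer number of bits."""
    return math.ceil(n ** (1 / C_LEN))


def gen(seed: str, n: int) -> str:
    """Heuristic stand-in for GEN(s, 1^n); NOT a proven pseudo random generator."""
    if len(seed) != seed_length(n) or not set(seed) <= {"0", "1"}:
        raise ValueError("seed must be a bit string of length seed_length(n)")
    # Hash the seed together with the requested output length, then read n bits.
    xof = hashlib.shake_256(seed.encode("ascii") + n.to_bytes(8, "big"))
    digest = xof.digest((n + 7) // 8)
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:n]
```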
2.3 The CPIR Model
After defining the notion of computational indistinguishability we are ready to proceed and define one round, computationally private information retrieval schemes.
Intuitively, the difference between information-theoretic privacy and computational
privacy is that the distribution of queries received by a server when the user wishes
to retrieve $x_{i_1}$ is indistinguishable from (as opposed to identical to) that of the queries when the user is retrieving
$x_{i_2}$.
Definition 10: A function $i: \mathbb{N} \to \mathbb{N}$ is called an index function if for every
$n \in \mathbb{N}$ we have $1 \le i(n) \le n$.
We use the same notation as in Definition 1. In particular, we denote by $\mathrm{Dist}_j(n, i)$
the distribution of the j-th query ($q_j$) as induced by the random choices of r.
Definition 11: Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. P is a one round,
k-server PIR scheme maintaining $(T_D(n), \varepsilon(n))$-computational privacy if it is a PIR
scheme according to Definition 1, except for the privacy condition, which is changed
to:
Privacy: Let $n \in \mathbb{N}$ and let $qlen: \mathbb{N} \to \mathbb{N}$ be the query length function. For every
$i \in [n]$ and for every $j \in [k]$ the j-th output of the query generator $Q(1^n, i, r)$
(denoted $q_j$) has the same length $qlen(n)$.
For every j ($j \in [k]$) and for every two index functions $i_1, i_2: \mathbb{N} \to \mathbb{N}$
the two probability ensembles $\mathrm{Dist}_{j,i_1} = \{\mathrm{Dist}_j(n, i_1(n))\}_{n \in \mathbb{N}}$ and
$\mathrm{Dist}_{j,i_2} = \{\mathrm{Dist}_j(n, i_2(n))\}_{n \in \mathbb{N}}$ are $(T_D(n), \varepsilon(n))$-computationally indistinguishable (in this
case $\ell ength(n) = qlen(n)$).
We say that P is a one round, k-server PIR scheme maintaining non-uniform $(T_D(n), \varepsilon(n))$-computational privacy if for every j, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles
$\mathrm{Dist}_{j,i_1}$, $\mathrm{Dist}_{j,i_2}$ are non-uniformly $(T_D(n), \varepsilon(n))$-computationally indistinguishable.
We say that P is a one round, k-server PIR scheme maintaining computational
privacy (in short, a computationally private information retrieval scheme) if for every j, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles $\mathrm{Dist}_{j,i_1}$, $\mathrm{Dist}_{j,i_2}$ are computationally
indistinguishable.
Discussion: In a way, Definition 11 seems unsatisfactory. Given a computationally
bounded distinguisher D and two sequences of query distributions, it ensures that for
a large enough $n_0$ the algorithm D cannot distinguish between the sequences. In other
words, it cannot distinguish between a query retrieving $i_1(n)$ and a query retrieving
$i_2(n)$. However, the definition does not state that there is a single $n_0$ such that for
every pair of index functions $i_1, i_2$ the query distributions for $i_1(n)$ and $i_2(n)$, where
$n > n_0$, are indistinguishable by D. Our next definition addresses this problem and
may thus be more attractive.
Definition 12: Let $\ell ength: \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function
satisfying $\ell ength(n) \le n$. Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. Let
$Y = \{Y_{n,i}\}_{n \in \mathbb{N}, i \in [n]}$ be a probability ensemble such that each $Y_{n,i}$ ranges over $\{0,1\}^{\ell ength(n)}$.
We say that Y is $(T_D(n), \varepsilon(n))$-index indistinguishable if for every distinguisher D, a
probabilistic algorithm whose running time is bounded by $T_D(\cdot)$, there is an $n_0$ such
that for all $n \ge n_0$ and for all $i_1, i_2 \in [n]$
$$\left|\Pr[D(Y_{n,i_1}, 1^n) = 1] - \Pr[D(Y_{n,i_2}, 1^n) = 1]\right| < \varepsilon(n),$$
where the probability is taken over the distributions $Y_{n,i_1}$, $Y_{n,i_2}$ and over the coin tosses
of D. P is a one round, k-server PIR scheme maintaining $(T_D(n), \varepsilon(n))$-computational
privacy if it is a PIR scheme according to Definition 1, except for the privacy condition,
which is changed to:
Privacy: For every $j \in [k]$, the ensemble $\mathrm{Dist}_j = \{\mathrm{Dist}_j(n, i)\}_{n \in \mathbb{N}, i \in [n]}$ is $(T_D(n), \varepsilon(n))$-index
indistinguishable.
The following lemma shows that both definitions can be used.
Lemma 1: Definition 11 and Definition 12 are equivalent.
Proof: We show that any protocol that is a one round, k-server CPIR scheme
according to one definition is also a one round, k-server CPIR scheme according to
the other definition.
Let $T_D: \mathbb{N} \to \mathbb{N}$ and $\varepsilon: \mathbb{N} \to [0,1]$. Let P be a protocol between
$DB_1, \ldots, DB_k$ and U that is $(T_D(n), \varepsilon(n))$-computationally private according to Definition 12. Fix a server index $j \in [k]$ and write $Y = \mathrm{Dist}_j$. Let D be a distinguisher running in time $T_D(\cdot)$, and let $i_1, i_2$ be two index
functions. By assumption there exists an $n_0$ such that for any $n \ge n_0$ and for
$i_1(n), i_2(n) \in [n]$ (indeed, for any two elements of $[n]$):
$$\left|\Pr[D(Y_{n,i_1(n)}, 1^n) = 1] - \Pr[D(Y_{n,i_2(n)}, 1^n) = 1]\right| < \varepsilon(n).$$
Therefore P is $(T_D(n), \varepsilon(n))$-computationally private according to Definition 11.
Now let P be $(T_D(n), \varepsilon(n))$-computationally private according to Definition 11.
Assume, towards a contradiction, that P is not $(T_D(n), \varepsilon(n))$-computationally private
according to Definition 12. In other words, for the query ensemble Y of some server there is a distinguisher D running in time
$T_D(\cdot)$, an infinite sequence of natural numbers $n_1, n_2, \ldots$ and, for every $n_j$ in
the sequence, a pair of indices $i_1(n_j), i_2(n_j) \in [n_j]$ such that
$$\left|\Pr[D(Y_{n_j,i_1(n_j)}, 1^{n_j}) = 1] - \Pr[D(Y_{n_j,i_2(n_j)}, 1^{n_j}) = 1]\right| \ge \varepsilon(n_j).$$
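However, the two infinite sequences $(i_1(n_1), i_1(n_2), \ldots)$ and $(i_2(n_1), i_2(n_2), \ldots)$ can
both be easily extended into two index functions $i_1, i_2: \mathbb{N} \to \mathbb{N}$ (for instance, by setting $i_1(n) = i_2(n) = 1$ for every n outside the sequence). Since D distinguishes these two index
functions for infinitely many n, no suitable $n_0$ exists, contradicting the assumption that P is $(T_D(n), \varepsilon(n))$-computationally private
according to Definition 11.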
In this work we restrict ourselves to PIR schemes that have certain properties
which we discuss below. First and foremost we only discuss two server schemes, that
is k = 2. All other properties are shared by both the computationally private schemes
that we construct, and the information theoretically private schemes that we use.
1. Symmetry of the servers, which means that the following conditions hold:
(a) For every $n \in \mathbb{N}$ and for every $1 \le i \le n$, the two distributions $D_1(n, i)$ and $D_2(n, i)$ are identical. In other words, the queries sent to $DB_1$ and $DB_2$ are drawn from the same distribution.
(b) Let $A_1(qu, x)$ and $A_2(qu, x)$ denote the responses of $DB_1$ and $DB_2$ to the query $qu$. Then for every $qu$, $A_1(qu, x) = A_2(qu, x)$.
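It is easy to transform any one-round PIR scheme $P$ into a scheme $Q$ that has this property. The user in $Q$ flips a coin $b$, sends $b$ to $DB_1$ and $1 - b$ to $DB_2$, and attaches this bit to the corresponding query of $P$. A server that receives 0 plays the role of $DB_1$ in $P$, and a server that receives 1 plays the role of $DB_2$; thus when $b = 1$ the two servers "switch identities" ($DB_1$ is considered to be $DB_2$ and vice versa). Now the parties execute the scheme $P$. Since the role bit is part of the query, both servers receive queries from the same distribution and answer them with the same function. $Q$ maintains symmetry of the servers at the cost of two extra bits of communication (one bit sent to each server).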
2. Each server replies to any query of the right length q(n), even if the query does
not have the correct form. Again, transforming a scheme which does not have
this property to one that does is easy.
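The coin-flip symmetrization of property 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: the scheme $P$ below is a hypothetical toy (a parity-based 2-server scheme in which $DB_2$ deliberately complements its answer, so that $P$ itself is not symmetric), database bits are 0/1 lists, indices are 0-based, and all function names are ours. In $Q$, the role bit a server receives tells it which of $P$'s roles to play (0 for $DB_1$, 1 for $DB_2$); $DB_1$ gets the coin $b$ and $DB_2$ gets $1 - b$, so both servers see the same query distribution and run the same answer function.

```python
import secrets


def parity(x, mask):
    """Parity of the database bits x[j] selected by the 0/1 list mask."""
    acc = 0
    for xj, mj in zip(x, mask):
        acc ^= xj & mj
    return acc


# Hypothetical, deliberately asymmetric one-round 2-server scheme P.
def p_query(n, i):
    s = [secrets.randbits(1) for _ in range(n)]  # uniformly random subset of [n]
    t = s.copy()
    t[i] ^= 1                                    # t = s XOR e_i
    return s, t                                  # queries for roles 1 and 2


def p_answer(role, q, x):
    a = parity(x, q)
    return a if role == 1 else a ^ 1            # role 2 complements its answer


def p_reconstruct(a1, a2):
    return a1 ^ a2 ^ 1                           # undo the complement of role 2


# Symmetrized scheme Q: each query carries a role bit; one answer function.
def q_query(n, i):
    q1, q2 = p_query(n, i)
    b = secrets.randbits(1)                      # user's private coin
    if b == 0:
        return b, (0, q1), (1, q2)               # DB1 plays role 1, DB2 role 2
    return b, (1, q2), (0, q1)                   # roles swapped


def q_answer(query, x):
    role_bit, q = query
    return p_answer(role_bit + 1, q, x)          # identical at both servers


def q_reconstruct(b, a_db1, a_db2):
    # If b == 1, DB1 played role 2 and DB2 played role 1.
    return p_reconstruct(a_db1, a_db2) if b == 0 else p_reconstruct(a_db2, a_db1)
```

Each server sees either $(0, s)$ or $(1, s \oplus e_i)$ with probability 1/2, whichever server it is, which is exactly condition (a); condition (b) holds because both servers call the same `q_answer`.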
2.4 Final Notation
We denote the problem of allowing the user to privately retrieve any of the $n$ bits that $DB_1$ and $DB_2$ hold by $PIR(n)$.
We denote by $B_2$ the best scheme (in terms of communication complexity) that solves $PIR(n)$, maintains information-theoretic privacy, and is executed in one round of communication. The most efficient scheme of this type known to date [25] has communication complexity $O(n^{1/3})$.
If $x, y$ denote two binary strings of the same length, then $x \oplus y$ denotes their bitwise exclusive-or. For any two strings
$x, y$ we denote by $x \| y$ the concatenation of $y$ to $x$. Given a set $S$ and an element $a$ we denote:
$$S \oplus a = \begin{cases} S \cup \{a\} & \text{if } a \notin S \\ S \setminus \{a\} & \text{if } a \in S \end{cases}$$
We denote by $e_\ell(n)$ ($1 \le \ell \le n$) the $\ell$-th unit vector of length $n$: an $n$-bit binary string with 0 entries everywhere except for a 1 in the $\ell$-th position.
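The notation above can be made concrete with a few small helpers. This is an illustrative sketch only: binary strings are modeled as Python `str` values over `"0"`/`"1"`, sets as `frozenset`, and the helper names are ours, not the thesis's.

```python
def xor_strings(x: str, y: str) -> str:
    """x XOR y: bitwise exclusive-or of two equal-length binary strings."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


def concat(x: str, y: str) -> str:
    """x || y: the concatenation of y to x."""
    return x + y


def set_xor(s: frozenset, a) -> frozenset:
    """S XOR a: add a if it is missing from S, remove it if it is present."""
    return s - {a} if a in s else s | {a}


def unit_vector(ell: int, n: int) -> str:
    """e_ell(n), with 1 <= ell <= n as in the text (1-indexed)."""
    if not 1 <= ell <= n:
        raise ValueError("need 1 <= ell <= n")
    return "0" * (ell - 1) + "1" + "0" * (n - ell)
```

For example, `xor_strings(x, unit_vector(ell, n))` flips exactly the $\ell$-th bit of $x$, and applying `set_xor(S, a)` twice returns the original set $S$.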
2.5 Symmetrically private information retrieval
In this variant of PIR, introduced in [38], the privacy of the data is also protected. The user is allowed to retrieve a single data item but no other information (such as partial information on subsets of data items). The original definition appears in [38] and is analogous to the PIR definitions. We slightly generalize it by defining symmetrically private information retrieval (SPIR) of blocks instead of single bits. We define both information-theoretically private and computationally private SPIR schemes.
An important difference between PIR and its symmetric counterpart SPIR is that multi-server SPIR schemes ($k > 1$) require that the servers have a source of shared randomness. We model this by giving the same random input $r_s$ to all the servers, while denoting the user's random string by $r_u$.
Definition 13: A one-round, $k$-server SPIR-of-blocks scheme $P$ maintaining information-theoretic privacy is a trio of algorithms:
$Q(n, \ell, i, r_u)$: The query algorithm outputs a $k$-tuple of queries $(q_1, \ldots, q_k)$.
$A(j, q_j, x \| r_s)$: The answer algorithm outputs an answer $a_j$.
$R(n, \ell, i, r_u, a_1, \ldots, a_k)$: The reconstruction algorithm outputs a single block of $\ell$ bits.
We say that $P$ maintains information-theoretic privacy if it follows the same steps as the one-round PIR scheme presented in Definition 1, satisfies the correctness and privacy requirements of that definition and, furthermore, satisfies a third requirement:
Data privacy: Let $Q^*$ be a deterministic query algorithm (modeling a possibly dishonest user $U^*$ that does not run $Q$). Let $A^*(x)$ denote the distribution of the answers of all servers, where $Q^*$ generates the queries $(q_1, \ldots, q_k)$. Then there exists an index $i \in [n]$ such that for any $x, x' \in \{0,1\}^{\ell n}$, if $x_i = x'_i$ (the $i$-th block is equal) then $A^*(x)$ and $A^*(x')$ are identical. In other words, even a dishonest user is able to learn only the single element $x_i$ and nothing else.
Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. We say that $P$ maintains $(T_D, \varepsilon)$-computational security if it follows the same steps as the one-round CPIR scheme presented in Definition 11, satisfies the correctness and privacy requirements of that definition and, furthermore, satisfies a third requirement:
Data privacy: Let $Q^*$ be a deterministic query algorithm. Let $A^*(x)$ denote the distribution of the answers of all servers, where $Q^*$ generates the queries $q_1, \ldots, q_k$. Then, for every distinguisher $D$ running in time $T_D(\cdot)$ there is a $c_0$ such that for every $n, \ell \in \mathbb{N}$ with $n\ell \ge c_0$ there exists an index $i \in [n]$ such that for any $x, x' \in \{0,1\}^{\ell n}$, if $x_i = x'_i$ (the $i$-th block is equal) then
$$\left|\Pr[D(A^*(x), 1^{n\ell}) = 1] - \Pr[D(A^*(x'), 1^{n\ell}) = 1]\right| < \varepsilon(n\ell),$$
where the probability is taken over the distributions $A^*(x)$, $A^*(x')$, and over the coin tosses of $D$.
2.6 General multi-party computation definitions
In this section we define private multi-party computation in a much broader sense than we did in the PIR definitions. We used a limited framework for PIR because all the known results fit nicely within this framework and, furthermore, the proofs are simpler. We emphasize that the PIR definitions we presented are a specific case of the multi-party computation definitions we now present: any PIR scheme which is private according to the PIR definitions is also private according to the general definitions.
We adopt the framework presented by Canetti in [19], which itself drew on a variety of sources [5, 18, 43, 61]. The essential ingredient in defining the security of a protocol is comparing an adversary's view of the "real" protocol to its view during the execution of an "ideal" protocol. The adversary we define can be either passive or active¹, and either computationally unbounded (for the case of information-theoretic privacy) or computationally bounded. Our definitions refer explicitly to an active adversary, since a passive adversary is a particular case of an active one that decides to follow the protocol. We proceed to formally define the various aspects of a private multi-party computation in our context.
2.6.1 Basic definitions
Function: The goal of the protocol is to compute a $k$-input and $k$-output probabilistic function² $f$. The notation we use is $f : Dom^k \times R_f \to E^k$, where $Dom$ is the domain and $E$ the range for each party's input and output, respectively. $R_f$ denotes the random input of the function.

¹ We sometimes refer to protocols that withstand an active adversary as "secure", while keeping the label "private" for those that deal only with a passive adversary.
The participants: There are $k$ parties, modeled as $k$ interactive, probabilistic Turing machines $P_1, \ldots, P_k$. $P_j$ holds an input $a_j \in Dom$ and some random input $r_j$.
Communication: All the messages are assumed to be sent over secure channels.
Real-life model: The real-life adversary is characterized by an interactive Turing machine $A$, which describes the behavior of the corrupted parties, and by the adversary structure $C \subseteq 2^{[k]}$. The adversary can corrupt any subset $T \in C$. It starts off the protocol with the inputs of the corrupted parties and an auxiliary input $z$, which models information learned during the execution of previous protocols and is especially important for composing protocols. The computation proceeds in rounds of communication. In each such round the adversary outputs the messages sent by the corrupted parties (possibly after receiving the messages sent by uncorrupted parties in the same round). At the end of the computation all the parties locally generate their outputs. The adversary also outputs its entire view of the protocol's execution. That view includes the inputs it started with and the messages sent and received by the corrupted parties.
We use the following notation for the various random variables involved in the protocol. $ADV(A, a, z, r)$ denotes the output of adversary $A$ and $EXEC(A, a, z, r)_i$ denotes the output of $P_i$. $a = a_1, \ldots, a_k$ denotes the input vector, $r = r_1, \ldots, r_k$ denotes the random input vector and $z$ denotes the adversary's auxiliary input. Let
$$EXEC(A, a, z, r) = (ADV(A, a, z, r), EXEC(A, a, z, r)_1, \ldots, EXEC(A, a, z, r)_k).$$
Finally, let $EXEC(A, a, z)$ denote the random variable describing $EXEC(A, a, z, r)$, where $r$ is chosen uniformly at random.
Ideal model: The adversary in the ideal model is characterized by an interactive Turing machine $S$ (which is also randomized) and by the adversary structure $C$. The model also includes an external trusted party $T$.
Each of the parties $P_i$ ($1 \le i \le k$) begins the computation with an input $a_i$ (they do not need a random input here). The adversary $S$ begins the computation with the inputs of the corrupted parties, a random input and the auxiliary input. Each party hands its input to $T$ (the inputs of corrupted parties may be arbitrarily changed by the adversary if it is active). Let $b = b_1, \ldots, b_k$ be the input vector received by $T$. The trusted party proceeds to compute $f(b)$ and sends $f(b)_i$ to $P_i$. Each $P_i$ outputs the value received (corrupted parties may output some arbitrary string). $S$ outputs all the information gathered during the computation: its inputs, the inputs of the corrupted parties and the function values received by the corrupted parties.

² In PIR the computed function is not probabilistic, but in joint generation of RSA keys it is.
We use the following notation: $ADV(S, a, z, r)$ denotes the output of adversary $S$ and $IDEAL(S, a, z, r)_i$ denotes the output of $P_i$. $a = a_1, \ldots, a_k$ denotes the input vector, $z$ denotes the adversary's auxiliary input and $r$ denotes the random input of the adversary. Let
$$IDEAL(S, a, z, r) = (ADV(S, a, z, r), IDEAL(S, a, z, r)_1, \ldots, IDEAL(S, a, z, r)_k).$$
Finally, let $IDEAL(S, a, z)$ denote the random variable describing $IDEAL(S, a, z, r)$, where $r$ is chosen uniformly at random.
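A single run of the ideal model can be sketched as follows. This is a simplified illustration under explicit assumptions: $f$ is deterministic (as in PIR), parties are indexed from 0, the auxiliary input $z$ and the adversary's random input are not modeled, corrupted parties simply output what they receive, and an active adversary's input substitution is represented by a hypothetical callback `substitute(i, a_i)`.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple


def ideal_execution(
    f: Callable[[List[int]], List[int]],
    a: Sequence[int],
    corrupted: Set[int],
    substitute: Optional[Callable[[int, int], int]] = None,
) -> Tuple[Dict[str, Dict[int, int]], List[int]]:
    """One run of the ideal model for a deterministic k-input, k-output f.

    Each party i hands a[i] to the trusted party T; an active adversary may
    replace the inputs of corrupted parties via substitute(i, a[i]).
    Returns the adversary's output (inputs and outputs of the corrupted
    parties) and the list of values f(b)_i handed to the parties.
    """
    b = [
        substitute(i, ai) if substitute is not None and i in corrupted else ai
        for i, ai in enumerate(a)
    ]
    out = f(b)  # T computes f(b) and sends f(b)_i to P_i
    adversary_output = {
        "inputs": {i: a[i] for i in sorted(corrupted)},
        "outputs": {i: out[i] for i in sorted(corrupted)},
    }
    return adversary_output, list(out)
```

For instance, with a two-party function that gives both parties the XOR of their input bits, a passive adversary corrupting party 0 learns only party 0's input and output, while an active one may flip party 0's input before it reaches $T$ and thereby change both outputs.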
Comparing computations: We require that for every adversary $A$ in the real-life model there exists an adversary $S$ in the ideal model that emulates $A$. That is, the output distribution of the real-life protocol, $EXEC(A, a, z)$, is similar to the output distribution of the ideal protocol, $IDEAL(S, a, z)$. This similarity is interpreted as identity of distributions (in the information-theoretic setting) or indistinguishability of probability ensembles (in the computational setting). Furthermore, we require that the computational complexity of $S$ be comparable to (usually that means polynomial in) the computational complexity of $A$³. The formal definition in the information-theoretic setting follows.
Definition 14: Let $f : Dom^k \times R_f \to E^k$ and let $\pi$ be a protocol for $k$ parties. We say that $\pi$ $C$-securely computes $f$ in the information-theoretic setting if for any real-life adversary $A$ with adversary structure $C$ there exists an ideal-model adversary $S$ whose running time is polynomial in the running time of $A$, such that for every input vector $a$ and every auxiliary input $z$
$$EXEC(A, a, z) = IDEAL(S, a, z).$$
In order to provide the definition for the computational setting, we have to define appropriate probability ensembles. Let $Dom^k_\kappa \subseteq Dom^k$ denote all the input vectors that are encoded by $\kappa$ bits. We view $EXEC(A, a, z)$ as an ensemble of probability distributions. For every $\kappa \in \mathbb{N}$ we denote by $EXEC(A, a, z)_\kappa$ the restriction of $EXEC(A, a, z)$ to all the inputs $a \in Dom^k_\kappa$. We abuse the notation somewhat here and use
$$EXEC(A, a, z) = \{EXEC(A, a, z)_\kappa\}_{\kappa \in \mathbb{N}}$$
to denote an ensemble of distributions ($EXEC(A, a, z)$ was originally used as a distribution and $EXEC(A, a, z)_\kappa$ as a conditional distribution). Similarly, we define $IDEAL(S, a, z)$ as an ensemble of probability distributions. The formal definition in the computational setting follows.
³ On this and other subtle points in the definition see [19].
Definition 15: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. Let $f : Dom^k \times R_f \to E^k$ and let $\pi$ be a protocol for $k$ parties. We say that $\pi$ $C$-securely computes $f$ in the computational setting if for any real-life adversary $A$ with adversary structure $C$ there exists an ideal-model adversary $S$ whose running time is polynomial in the running time of $A$, such that for every input vector $a$ and every auxiliary input $z$, the ensembles $EXEC(A, a, z)$ and $IDEAL(S, a, z)$ are $(T_D(\cdot), \varepsilon(\cdot))$-computationally indistinguishable.
2.6.2 Composition of protocols
An important requirement we want private protocols to fulfill is that of modular composition. Given a simple task for which we designed a private protocol, we can solve a more difficult problem by using the first protocol as a sub-routine. In order to formalize this notion we define a semi-ideal model in which a function $g$ is computed with the help of a trusted party that can compute each of $m$ functions $f_1, \ldots, f_m$, but nothing else. The semi-ideal model is a stepping stone between the real-life model, in which $k$ parties compute $g$ by using as sub-routines protocols that compute $f_1, \ldots, f_m$, and the ideal model, in which $g$ is computed by a trusted party.
The semi-ideal model: We start by considering the real-life model of Subsection 2.6.1 for the computation of a function $g$. The model is augmented by a trusted party $T$ that computes the functions $f_1, \ldots, f_m$. At special communication rounds a function $f_j$ ($1 \le j \le m$) is specified and an ideal-model computation of it takes place. That is, the parties hand their inputs (for $f_j$) to $T$, who computes the value of $f_j$ at the given point and returns the appropriate output to each party.
Let $f = (f_1, \ldots, f_m)$. We refer to the semi-ideal model we specified as the $f$-ideal model. Let $EXEC(A, a, z, f)$ denote the random variable describing the output of the $f$-ideal model, with adversary $A$, input $a$, auxiliary input $z$, and access to a trusted party that can compute any of $f_1, \ldots, f_m$. A protocol in this model is not a real-life protocol, as it uses a trusted party. However, its security is defined in comparison with the ideal-model computation of $g$, as in Subsection 2.6.1.
Definition 16: Let $g : Dom^k \times R_g \to E^k$ and let $\pi$ be a protocol for $k$ parties in the $f$-ideal model. We say that $\pi$ $C$-securely computes $g$ in the information-theoretic (computational) setting in the $f$-ideal model if for any adversary $A$ in the $f$-ideal model with adversary structure $C$ there exists an ideal-model adversary $S$ whose running time is polynomial in the running time of $A$, such that for every input vector $a$ and every auxiliary input $z$, the distributions $EXEC(A, a, z, f)$ and $IDEAL(S, a, z)$ are identical (the ensembles $EXEC(A, a, z, f)$ and $IDEAL(S, a, z)$ are computationally indistinguishable⁴).
Real-life model: We replace the trusted party in the $f$-ideal model with $m$ protocols $\rho_1, \ldots, \rho_m$. The protocol $\rho_j$ securely computes the function $f_j$ in the appropriate setting (computationally or information-theoretically, as required) and with the same adversary structure $C$. Each call to $T$ is replaced by an invocation of one of the protocols $\rho_1, \ldots, \rho_m$. While $\rho_j$ is running, the protocol $\pi$ remains inactive. The output of $\rho_j$ is treated by each party as the value returned by $T$. Let $\rho = (\rho_1, \ldots, \rho_m)$. We denote by $\pi^\rho$ the protocol in which each evaluation of $f_j$ by $T$ is replaced by an invocation of $\rho_j$.
The point of the modular composition concept is the following theorem, proved in [19]. Informally, it states that a protocol that is secure in the semi-ideal model is also secure in the real-life model if the trusted party (which computes only $f_1, \ldots, f_m$) is replaced by secure protocols. The only restriction is that at most one sub-routine protocol is invoked in each communication round.
Theorem 1: (Canetti [19]) Let $f_1, \ldots, f_m, g$ be $k$-input and $k$-output functions. Let $\pi$ be a $k$-party protocol that $C$-securely computes $g$ in the information-theoretic (computational) setting in the $f$-ideal model. Assume that no more than one ideal evaluation call is made in each communication round. Let $\rho_1, \ldots, \rho_m$ be $k$-party protocols that $C$-securely compute $f_1, \ldots, f_m$, respectively, in the information-theoretic (computational) setting. Then the protocol $\pi^\rho$ $C$-securely computes $g$ in the information-theoretic (computational) setting.
2.7 Private information retrieval by keywords
The PrivatE information Retrieval by KeYwords (PERKY) problem is defined informally as follows. Let $DB_1, \ldots, DB_k$ be $k$ non-communicating servers, each holding a copy of $n$ binary strings $s_1, \ldots, s_n \in \{0,1\}^\ell$. A user $U$ holds a binary string $w \in \{0,1\}^\ell$ and wishes to find out whether $w = s_i$ for some $i \in [n]$. No server may learn any information about $w$. Given such a protocol it is possible (with a slight overhead) to retrieve any data associated with $w$ (such as its address in the database).
Superficially, PERKY seems very similar to PIR, and the formal definitions should therefore also be similar. However, there are some important differences in the solutions to the two problems. The first difference is that, unlike PIR schemes, the PERKY schemes we construct employ multiple rounds of communication. The second is that the solutions we present are reductions to PIR schemes and depend on the security of these schemes. We therefore require modular composition theorems. In order to formally define the problem we adopt the framework of multi-party protocols as presented in Section 2.6.

⁴ We abuse the notation here, denoting distributions and ensembles identically, in the same way we did in Subsection 2.6.1.
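Notation 2: For any $\ell \in \mathbb{N}$ the function $f_\ell : \{0,1\}^\ell \times 2^{\{0,1\}^\ell} \to \{0,1\}$ is defined by:
$$f_\ell(w, \{s_1, \ldots, s_n\}) = \begin{cases} 1 & \text{if } w \in \{s_1, \ldots, s_n\} \\ 0 & \text{if } w \notin \{s_1, \ldots, s_n\} \end{cases}$$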
Definition 17: The private information retrieval by keywords problem is parameterized by 3 natural numbers $\ell, n, k$ and is comprised of the following elements:
Parties: A user $U$ and $k$ servers $DB_1, \ldots, DB_k$.
Function: The $(k+1)$-party function computed receives as input $w \in \{0,1\}^\ell$ and $k$ copies of $s_1, \ldots, s_n$, $n$ binary strings, each of length $\ell$. The output of the function is $(f_\ell(w, \{s_1, \ldots, s_n\}), \bot, \ldots, \bot)$, that is, only the user receives the result.
Adversary structure: The set $C = \{\{DB_1\}, \ldots, \{DB_k\}\}$.
Adversary: The adversary is considered to be passive (honest but curious), and is either computationally bounded or not.
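The function of Definition 17 (the ideal functionality, not a private protocol for computing it) can be written down directly. In this sketch, which is our own illustration, keywords are Python strings over `"0"`/`"1"`, each server's copy is a `frozenset`, and `None` stands for the empty output $\bot$ received by every server.

```python
from typing import FrozenSet, List, Optional, Tuple


def f_ell(w: str, keywords: FrozenSet[str]) -> int:
    """f_ell(w, {s_1, ..., s_n}): 1 if w is one of the keywords, else 0."""
    return 1 if w in keywords else 0


def perky_functionality(
    w: str, server_copies: List[FrozenSet[str]], ell: int
) -> Tuple[Optional[int], ...]:
    """(k+1)-party PERKY function: the user gets f_ell, each server gets None (bottom)."""
    keywords = server_copies[0]
    if any(copy != keywords for copy in server_copies):
        raise ValueError("all k servers must hold the same copy of the database")
    if len(w) != ell or any(len(s) != ell for s in keywords):
        raise ValueError("w and every keyword must be ell-bit strings")
    return (f_ell(w, keywords),) + (None,) * len(server_copies)
```

A PERKY protocol must produce this output while no single server learns anything about $w$; the sketch only fixes what "correct output" means.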
The symmetrically private information retrieval by keywords (SPERKY) problem has the same relation to PERKY that SPIR has to PIR. That is, the user learns whether $w \in \{s_1, \ldots, s_n\}$ but is not allowed to gain any other information about the database. As in SPIR, in any multi-server scheme the servers share a source of randomness, which we model as a string $r_s$ that is concatenated to the database.
Definition 18: The symmetrically private information retrieval by keywords problem is parameterized by 3 natural numbers $\ell, n, k$ and is comprised of the following elements:
Parties: A user $U$ and $k$ servers $DB_1, \ldots, DB_k$.
Function: The same function as in Definition 17.
Adversary structure: The set $C = \{\{U\}, \{DB_1\}, \ldots, \{DB_k\}\}$.
Adversary: The adversary is considered to be passive if it controls one of the servers, but can be either passive or active if it controls the user. The adversary is either computationally bounded or not.
This definition does not fall completely within the framework described in Section 2.6. Hitherto we assumed that the adversary is either active or passive regardless of the corrupted parties. Here, however, we consider protocols in which the adversary may be active if it corrupts a certain party (the user), but is not allowed to be active if it corrupts any other party.
This is not a problem because of the adversarial setting we allow. Since the adversary is non-adaptive and may control only a single party, it has to decide prior to the execution of the protocol which party is to be corrupted. Thus we can view our protocols as withstanding two types of adversaries and adversary structures. The first is a passive adversary with adversary structure $C = \{\{DB_1\}, \ldots, \{DB_k\}\}$. The second is an active adversary with adversary structure $C = \{\{U\}\}$.
Notation 3: In SPIR and SPERKY schemes, user privacy refers to the property of maintaining privacy against an adversary who corrupts one of the servers, and data privacy refers to the property of maintaining privacy against an adversary who controls the user.
Notation 4:
PIR: The problem of privately retrieving a block of $\ell$ bits from a database of $n$ blocks held by $k$ non-communicating servers is denoted by $PIR(\ell, n, k)$. The problem $PIR(\ell, n, k)$ with $\ell = 1$ is denoted by $PIR(n, k)$. $SPIR(\ell, n, k)$ ($SPIR(n, k)$) denotes the problem of symmetrically private information retrieval for the same parameters as in $PIR(\ell, n, k)$ ($PIR(n, k)$).
PERKY: $PERKY(\ell, n, k)$ ($PERKY(n, k)$) denotes the problem of private information retrieval by keywords where the database is held by $k$ servers and is comprised of $n$ keywords, each of length $\ell$. $SPERKY(\ell, n, k)$ ($SPERKY(n, k)$) denotes the problem of symmetrically private information retrieval by keywords for the same parameters as in $PERKY(\ell, n, k)$ ($PERKY(n, k)$).
Notation 5: Let $P$ be a communication protocol that solves one of the problems $PIR(\ell, n, k)$, $SPIR(\ell, n, k)$, $PERKY(\ell, n, k)$ or $SPERKY(\ell, n, k)$. We denote its communication complexity by $C_P(\ell, n, k)$. If $\ell = 1$ we denote the communication complexity of $P$ by $C_P(n, k)$. On occasion we distinguish between the user's communication
complexity denoted by P (`; n; k) (P (n; k) if ` = 1), and a server's communication
complexity denoted by P (`; n; k) (P (n; k) if ` = 1).
Notation 6: Let $P$ be a one-round PIR or SPIR scheme. We denote by $Q_P$, $A_P$ and $R_P$ the query, answer and reconstruction algorithms of $P$.
2.8 Joint generation of RSA keys
Recall the definitions of the RSA cryptosystem [70]. The system includes a public key, which is a pair $N, e$, and a private key $d$. $N$ is called the RSA modulus and is a product of two primes, $N = PQ$. $e$ is called the public exponent (or encryption exponent) and is chosen so that $\gcd(\varphi(N), e) = 1$, where $\varphi$ denotes Euler's totient function ($\varphi(n) = |\{m : 1 \le m < n, \gcd(m, n) = 1\}|$). $d$ is called the private exponent (or decryption exponent) and is chosen so that $de \equiv 1 \pmod{\varphi(N)}$. The idea in joint generation of RSA keys is that $k$ parties execute a multi-party protocol and jointly produce the public key $N, e$ (which everyone receives) and share the private key. That is, the $j$-th party holds $d_j$ such that $\sum_{j=1}^{k} d_j = d$. No $t$-coalition of the parties can get any information on $d$ through their participation in the protocol⁵.
We focus on the case of 2 parties with privacy threshold $t = 1$. The formal definition follows.
Definition 19: Joint generation of RSA keys by two parties is comprised of the following elements:
Parties: There are 2 parties, Alice and Bob.
Function: The only input is a security parameter. Alice's output is $N, e, d_a$ and Bob's output is $N, e, d_b$. The public modulus $N$ is chosen uniformly at random⁶ out of all the products of two primes $N = PQ$ such that the binary representation of each of $P$ and $Q$ is half as long as the security parameter. $e$ is any number such that $\gcd(\varphi(N), e) = 1$. $d_a$ and $d_b$ are distributed uniformly at random among all pairs such that $d_a + d_b = d$⁷.
Adversary structure: The set $C = \{\{Alice\}, \{Bob\}\}$.
Adversary: The adversary is assumed to be passive and computationally bounded.
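The target output of Definition 19 (a public key $N, e$ together with an additive sharing $d_a + d_b = d$) can be illustrated with a toy, dealer-based computation. This sketch is emphatically not a joint generation protocol: a single party sees $P$, $Q$ and $d$, which is exactly what joint generation avoids. The tiny textbook primes, the choice of $d_a$ uniformly from $[0, 2^{64})$, and all function names are our assumptions and do not reproduce the distribution required by the definition.

```python
import math
import secrets


def totient(n: int) -> int:
    """Euler's totient as defined in the text: |{m : 1 <= m < n, gcd(m, n) = 1}|."""
    return sum(1 for m in range(1, n) if math.gcd(m, n) == 1)


def toy_shared_rsa_key(p: int, q: int, e: int, share_bound: int = 2**64):
    """Dealer-based toy (NOT joint generation): returns Alice's and Bob's outputs
    (N, e, d_a) and (N, e, d_b) for given distinct primes p, q."""
    n_mod = p * q
    phi = (p - 1) * (q - 1)          # equals totient(p * q) for distinct primes
    if math.gcd(phi, e) != 1:
        raise ValueError("e must satisfy gcd(phi(N), e) = 1")
    d = pow(e, -1, phi)              # d * e = 1 (mod phi(N)); needs Python 3.8+
    d_a = secrets.randbelow(share_bound)
    d_b = d - d_a                    # additive sharing over the integers
    return (n_mod, e, d_a), (n_mod, e, d_b)
```

Because $d_a + d_b = d$, each party can compute a partial decryption $c^{d_j} \bmod N$ locally, and the product of the two partial decryptions equals $c^d \bmod N$. For example, with $P = 61$, $Q = 53$ and $e = 17$ one gets $N = 3233$, $\varphi(N) = 3120$ and $d = 2753$.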
⁵ Privacy is maintained in the computational sense: given $e$ (and $N$), a computationally unbounded party can compute the unique corresponding $d$.
⁶ The distribution achieved in all the known protocols is not exactly uniform but is "close enough"; see Chapter 5.
⁷ Once again, our protocol does not achieve exactly the desired distribution, but one that is "close enough".
Chapter 3
Computationally private
information retrieval
In this chapter we investigate the notion of a correlated pseudo-random generator and present a construction of such a generator. We then use the correlated generator to construct computationally private information retrieval schemes. We show two such schemes: one which is a direct consequence of our correlated generator, and another, slightly more efficient scheme, which uses similar ideas. Both schemes are very efficient in terms of communication complexity.
3.1 Correlated Pseudo Randomness
In this section we formally define the notion of a "correlated pseudo-random generator", and then describe a construction of such a generator. We are motivated by the problem of sending two $n$-bit strings, one to each of two parties, such that the following requirements are met:
The two $n$-bit strings differ at exactly one bit, whose location is part of the input of the problem.
The communication complexity is as low as possible.
The communication received by a single party (which describes one $n$-bit string) reveals no information, in the computational sense, about the index in which the two strings differ.
A correlated pseudo-random generator consists of two succinct representation generators $SREP_1$ and $SREP_2$, and an expansion algorithm $G$. The generators $SREP_1, SREP_2$ receive as input an index $\ell \in [n]$ and produce two correlated short strings $u$ and $w$, which are of length (n) < $n$. Applying $G$ to these two "succinct representations", one gets longer strings $G(u, 1^n)$ and $G(w, 1^n)$, both of length $n$. These two strings differ at exactly one bit, whose location is $\ell$.
The strings $u$ and $w$ are referred to as succinct representations because they fully describe the $n$-bit strings $G(u, 1^n)$ and $G(w, 1^n)$, but are much shorter. An important property of $SREP_1$ ($SREP_2$) is, informally speaking, that its output $u$ ($w$) is distributed pseudo-randomly for any input $\ell$¹.
The construction of the correlated pseudo-random generator we describe is based on any "conventional" pseudo-random generator $GEN$.
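To make the interface concrete, here is a deliberately trivial instance, sketched in Python under our own naming: $SREP_1$ returns its $n$-bit random input $r$, $SREP_2$ returns $r \oplus e_\ell(n)$, and $G$ is the identity. When $r$ is uniform, each output on its own is uniform regardless of $\ell$, and the expansions differ exactly at position $\ell$. However, the representations have length $n$ rather than being shorter than $n$, so this toy fails the one requirement that makes the notion nontrivial; it is not the thesis's construction.

```python
import secrets


def srep1(n: int, ell: int, r: str) -> str:
    """Toy SREP_1(1^n, ell, r): returns the n-bit random input r unchanged."""
    return r


def srep2(n: int, ell: int, r: str) -> str:
    """Toy SREP_2(1^n, ell, r): r XOR e_ell(n), i.e. bit ell (1-indexed) flipped."""
    flipped = "1" if r[ell - 1] == "0" else "0"
    return r[: ell - 1] + flipped + r[ell:]


def expand(z: str, n: int) -> str:
    """Toy expansion G(z, 1^n): the identity on n-bit strings."""
    return z


def differing_positions(a: str, b: str) -> list:
    """1-indexed positions at which two equal-length strings differ."""
    return [j + 1 for j, (x, y) in enumerate(zip(a, b)) if x != y]


n = 16
r = "".join(secrets.choice("01") for _ in range(n))
for ell in range(1, n + 1):
    u, w = srep1(n, ell, r), srep2(n, ell, r)
    assert differing_positions(expand(u, n), expand(w, n)) == [ell]
```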
3.1.1 Definitions
Definition 20: Let  : $\mathbb{N} \to \mathbb{N}$ be such that (n) < $n$ for every $n$, let $T_D : \mathbb{N} \to \mathbb{N}$, and let $\varepsilon : \mathbb{N} \to [0,1]$. A $(T_D(n), \varepsilon(n))$-correlated pseudo-random generator consists of three algorithms: two succinct representation generators $SREP_1, SREP_2$ and an expansion algorithm $G$.
$SREP_1, SREP_2$ get as input $1^n$, $\ell$ ($1 \le \ell \le n$), and $r$ (a random input of appropriate length) and produce as output $u = SREP_1(1^n, \ell, r)$ and $w = SREP_2(1^n, \ell, r)$, both of length (n).
$G$ gets as input $1^n$ and a string $z$ of length (n), and outputs a string $G(z, 1^n)$ of length $n$.
The succinct representation generators and the expansion algorithm should satisfy the following properties:
Correlation: For any $n \in \mathbb{N}$, $\ell \in [n]$ and $r$, the $n$-bit strings $G(SREP_1(1^n, \ell, r), 1^n)$ and $G(SREP_2(1^n, \ell, r), 1^n)$ differ at exactly the $\ell$-th bit.
$(T_D(n), \varepsilon(n))$ index indistinguishability: For any index function $\ell(\cdot)$ the ensemble of distributions $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ ($r$ is chosen at random) is $(T_D(n), \varepsilon(n))$-pseudo-random, and the same holds for $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (in this case the length of the strings at $n$ is (n)).
Efficiency: $SREP_1(1^n, \ell, r)$ and $SREP_2(1^n, \ell, r)$ can be computed in $poly(n)$ many steps.
¹ It would suffice if the output distribution of $SREP_1$ given as input $\ell_1$ and its output distribution given as input $\ell_2$ were indistinguishable (for large enough values of $n$). Our construction satisfies the more stringent requirement that each of these distributions cannot be distinguished from the uniform distribution.
The correlated generator is called pseudo-random if the $(T_D(n), \varepsilon(n))$ index indistinguishability property is replaced by
Standard index indistinguishability: For any index function $\ell(\cdot)$ the two ensembles of distributions $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are pseudo-random.
The generator is non-uniformly $(T_D(n), \varepsilon(n))$-pseudo-random if the $(T_D(n), \varepsilon(n))$ index indistinguishability property is replaced by
Non-uniform $(T_D(n), \varepsilon(n))$ index indistinguishability: For any index function $\ell$ the ensembles $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are non-uniformly $(T_D(n), \varepsilon(n))$-pseudo-random.
We now spell out some of the attributes of the above definition. The length of the succinct representations, (n), is required to be smaller than $n$. The reason is that (n) is the communication complexity of our CPIR scheme and is the main parameter we wish to minimize. On the other hand, the length of the random input $r$ is not restricted. While in general we would like this quantity to be substantially shorter than $n$, we do not require this explicitly as part of the definition.
We remark that the index indistinguishability property, which asserts that the ensembles $\{SREP_1(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ and $\{SREP_2(1^n, \ell(n), r)\}_{n \in \mathbb{N}}$ (where $r$ is chosen at random) are pseudo-random, does imply in particular that for different $\ell_1(\cdot)$ and $\ell_2(\cdot)$ the corresponding ensembles are indistinguishable. Finally, it seems natural to make the stronger requirement that the correlated generator achieve the following:
Pseudo-random output: For any index function $\ell : \mathbb{N} \to \mathbb{N}$ the probability ensemble $\{G(SREP_1(1^n, \ell(n), r), 1^n)\}_{n \in \mathbb{N}}$, where $r$ is chosen uniformly at random, is $(T_D(n), \varepsilon(n))$-pseudo-random, and likewise for $\{G(SREP_2(1^n, \ell(n), r), 1^n)\}_{n \in \mathbb{N}}$.
Let $GEN(s, 1^n)$ be any $(T_D(n), \varepsilon(n))$-pseudo-random generator with stretch (n) to $n$. Given the correlated generator $(SREP_1, SREP_2, G)$ it is easy to construct a different correlated generator $(SREP'_1, SREP'_2, G')$ that has this third property. The succinct representation generators $SREP'_1, SREP'_2$ receive as input $(1^n, \ell, r \| s)$, where $n, \ell, r$ are as described earlier and $s$, a seed of length (n) for $GEN$, is chosen uniformly and independently of all the other elements. The outputs of the succinct representation generators are $SREP'_1(1^n, \ell, r \| s) = SREP_1(1^n, \ell, r) \| s$ and $SREP'_2(1^n, \ell, r \| s) = SREP_2(1^n, \ell, r) \| s$. The output of the expansion algorithm on input $z = u \| s$ is defined to be $G'(z, 1^n) = G(u, 1^n) \oplus GEN(s, 1^n)$. Since both expanded strings are masked by the same string $GEN(s, 1^n)$, they still differ exactly at position $\ell$. Thus, $(SREP'_1, SREP'_2, G')$ satisfies all three properties at a small overhead compared to $(SREP_1, SREP_2, G)$.
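The masking transformation $(SREP_1, SREP_2, G) \mapsto (SREP'_1, SREP'_2, G')$ can be sketched as follows, under stated assumptions. The base generator is the trivial non-succinct toy ($SREP_1$ returns $r$, $SREP_2$ returns $r \oplus e_\ell(n)$, $G$ is the identity); SHAKE-256 from Python's `hashlib` stands in for $GEN$ purely as a heuristic placeholder, not as the $(T_D, \varepsilon)$-pseudo-random generator the text assumes; and the concatenation $u \| s$ is modeled as a tuple `(u, s)`. All function names are ours.

```python
import hashlib


def gen(seed: bytes, n: int) -> str:
    """Stand-in for GEN(s, 1^n): first n bits of SHAKE-256(seed) (heuristic only)."""
    stream = hashlib.shake_256(seed).digest((n + 7) // 8)
    return "".join(f"{byte:08b}" for byte in stream)[:n]


def xor_strings(x: str, y: str) -> str:
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


# Toy base correlated generator (not succinct): G is the identity.
def srep1(n, ell, r):
    return r


def srep2(n, ell, r):
    return r[: ell - 1] + ("1" if r[ell - 1] == "0" else "0") + r[ell:]


def g(u, n):
    return u


# Transformed generator: append the seed s and mask the expansion with GEN(s, 1^n).
def srep1_prime(n, ell, r, s):
    return (srep1(n, ell, r), s)             # models SREP_1(1^n, ell, r) || s


def srep2_prime(n, ell, r, s):
    return (srep2(n, ell, r), s)             # models SREP_2(1^n, ell, r) || s


def g_prime(z, n):
    u, s = z
    return xor_strings(g(u, n), gen(s, n))   # G(u, 1^n) XOR GEN(s, 1^n)
```

The correlation property survives because $(a \oplus m) \oplus (b \oplus m) = a \oplus b$ for the common mask $m = GEN(s, 1^n)$, while each masked expansion on its own now looks like the output of $GEN$.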
3.1.2 Statement of Main Results
We present three versions of our main theorem. All the versions are based on the same construction, which appears in Section 3.1.3. This construction makes use of a pseudo-random generator $GEN$, which is used as a building block of a correlated generator $(SREP_1, SREP_2, G)$. The three versions differ in the assumptions we make about $GEN$. In the first version we assume that $GEN$ is a standard pseudo-random generator (Definition 9). In other words, it stretches a seed by a polynomial factor and is resistant to any polynomial-time distinguisher. In the second version (which we call the quantitative version) we assume that $GEN$ is a general $(T_D(n), \varepsilon(n))$-pseudo-random generator, as in Definition 8. Finally, in the third version we make a stronger assumption about $GEN$, namely, that even a (possibly non-uniform) family of circuits cannot distinguish between its output and the uniform ensemble with non-negligible probability.
Theorem 2: [Standard version] Let $c_{len} > 1$ be a constant and let $GEN$ be a pseudo-random generator with stretch $n^{1/(2c_{len})}$ to $n$. Then there exists a correlated pseudo-random generator that has the following properties:
The length of the strings produced by each succinct representation generator on input $(1^n, \ell, r)$ is $O(n^{1/c_{len}})$.
The length of the random input $r$ required by each succinct representation generator on input $(1^n, \ell, r)$ is $O(n^{1/(2c_{len})})$.
Theorem 3: [Quantitative version] Let $T_D : \mathbb{N} \to \mathbb{N}$, and let $\varepsilon : \mathbb{N} \to [0,1]$. Let $GEN$ be a $(T_D(n), \varepsilon(n))$-pseudo-random generator with stretch (n) to $n$. Let $t(n) = 8(1/\varepsilon(n))^2 \ln(16n/\varepsilon(n))$. Then there exists a correlated $(T'_D(n), \varepsilon'(n))$-pseudo-random generator $(SREP_1, SREP_2, G)$ such that for
q
s
2
n
n
4 1 + 2 ln 2 log (n)+1 ? 1
(n) =
2
log
ln 2
(n)
TD (n) = T2t((nn)) ? (2(n) + 23 )TGEN (n) ? 2(n) and "(n) = 2n (n)"(n):
The length of the strings produced by each succinct representation generator on
input (1n ; `; r) is (n) (n) (n) 2(n) .
The length of the random input, r, required by each succinct representation
generator on input (1n ; `; r) is: (n) (n)(n) + 2(n) 2(n) .
D
Theorem 4 [Non-uniform version]: Let $T_D : \mathbb{N} \to \mathbb{N}$ and let $\varepsilon : \mathbb{N} \to [0,1]$. Let $GEN$ be a non-uniform $(T_D(n), \varepsilon(n))$-pseudo-random generator with stretch $\kappa(n)$ to $n$. Then there exists a correlated non-uniform $(T'_D(n), \varepsilon'(n))$-pseudo-random generator $(SREP1, SREP2, G)$ that has the same properties as stated in Theorem 3, except that
$$T'_D(n) = T_D(n) - 2\lambda(n)T_{GEN}(n) - 2\sigma(n) - \log n \quad\text{and}\quad \varepsilon'(n) = \lambda(n)\varepsilon(n).$$
An interesting observation is that the security guarantee provided by Theorem 4 is significantly better than the one provided by Theorem 3. In other words, a higher level of security can be proved if $GEN$ is resistant to (possibly non-uniform) circuits. If $GEN$ is a non-uniform $(T_D(n), \varepsilon(n))$-pseudo-random generator, then the constructed correlated pseudo-random generator can withstand distinguishers with a longer running time than would otherwise be the case (i.e., if $GEN$ were only a $(T_D(n), \varepsilon(n))$-pseudo-random generator). The difference in running time is roughly a multiplicative factor of $2t(n)$, where $t(n)$ is as defined in Theorem 3. Furthermore, a distinguishing algorithm has a lower probability of distinguishing the output of a correlated pseudo-random generator constructed from a non-uniform $(T_D(n), \varepsilon(n))$-pseudo-random generator than the output of one constructed from a $(T_D(n), \varepsilon(n))$-pseudo-random generator. The difference is a multiplicative factor of $2n$.

Notice that the succinct representation length and the quality of the indistinguishability depend primarily on the ratio of $\kappa(n)$ to $n$. We begin by describing the construction of the generator, and then prove Theorems 2, 3 and 4 by a series of lemmata.
3.1.3 Construction
We construct $(SREP1, SREP2, G)$ using a series of "intermediate generators", which we denote by $(SREP1^0, SREP2^0, G^0), (SREP1^1, SREP2^1, G^1), \ldots$. Recall that a correlated pseudo-random generator as defined in Subsection 3.1.1 receives three parameters as input: $(1^n, \ell, r)$. Here $1^n$ is the security parameter and the output length of the expansion algorithm $G$, $\ell$ is the index at which the two output strings differ, and $r$ is the random input. In contrast, the intermediate generator $(SREP1^q, SREP2^q, G^q)$ receives four parameters as input: $(1^n, n_q, \ell_q, r_q)$. The security parameter remains $1^n$, but the final output length is $n_q$. This value indicates the length of the strings produced by $G^q$, and is in general smaller than $n$. Specifically, let $u^q_{\ell_q} = SREP1^q(1^n, n_q, \ell_q, r_q)$ and $w^q_{\ell_q} = SREP2^q(1^n, n_q, \ell_q, r_q)$ denote the outputs of the two intermediate succinct representation generators. Then both $G^q(u^q_{\ell_q}, n_q, 1^n)$ and $G^q(w^q_{\ell_q}, n_q, 1^n)$ are of length $n_q$, and they are identical except at the $\ell_q$-th bit.
The correlated generator we construct, $(SREP1, SREP2, G)$, on input $(1^n, \ell, r)$, performs a single operation: it invokes an appropriate intermediate generator with parameters $(1^n, n, \ell, r)$. If $GEN$ is assumed to be a standard pseudo-random generator, an intermediate generator $(SREP1^c, SREP2^c, G^c)$ is invoked for some constant $c$. If $GEN$ is assumed to be a (possibly non-uniform) $(T_D(n), \varepsilon(n))$-pseudo-random generator with stretch $\kappa(n)$ to $n$, the intermediate generator $(SREP1^{\lambda(n)}, SREP2^{\lambda(n)}, G^{\lambda(n)})$ is invoked (where $\lambda(n) \approx \sqrt{2\log(n/\kappa(n))}$). We now define the intermediate generators by induction on $q$.
Notation 7: We denote by $\sigma_q(n_q)$ the output length of $SREP1^q(1^n, n_q, \ell_q, r_q)$ (and of $SREP2^q$ on the same input).
Induction basis: On input $(1^n, n_0, \ell_0, r_0)$, the generator $(SREP1^0, SREP2^0, G^0)$ does the following: $SREP1^0(1^n, n_0, \ell_0, r_0) = r_0$, while $SREP2^0(1^n, n_0, \ell_0, r_0) = r_0 \oplus e_{\ell_0}(n_0)$. $G^0$ is simply the identity function ($G^0(z, n_0, 1^n) = z$).
Inductive step: Let $q \geq 1$, and assume that the algorithms $SREP1^{q-1}$, $SREP2^{q-1}$ and $G^{q-1}$ are given. We construct $SREP1^q$, $SREP2^q$ and $G^q$.
Let $1 \leq m_q \leq n_q$ be a parameter (whose precise value depends on $n$ and $q$, and will be determined later). View the elements of the set $[n_q]$ as the coordinates of a two-dimensional matrix with $m_q$ rows and $n_{q-1} \triangleq \lceil n_q/m_q \rceil$ columns. Let the $\ell_q$-th element ($\ell_q \in [n_q]$) be the $(i, \ell_{q-1})$-th entry of the $m_q$-by-$n_{q-1}$ matrix.
The succinct representation generators $SREP1^q$ and $SREP2^q$ use a random input $r_q$ which consists of three parts:
1. A random subset $S^q \subseteq [m_q]$, chosen uniformly ($m_q$ bits).
2. $m_q + 1$ random independent seeds for the generator $GEN$ with security parameter $n$, chosen uniformly:
$$s_1, \ldots, s_{i-1}, s^1_i, s^2_i, s_{i+1}, \ldots, s_{m_q}.$$
The domain of these seeds is $\{0,1\}^{\kappa(n)}$, so overall $(m_q + 1)\kappa(n)$ bits are required to represent this part.
3. An appropriate random input $r_{q-1}$ for $SREP1^{q-1}$ and $SREP2^{q-1}$.
We invoke $SREP1^{q-1}(1^n, n_{q-1}, \ell_{q-1}, r_{q-1})$ and $SREP2^{q-1}(1^n, n_{q-1}, \ell_{q-1}, r_{q-1})$ to produce the two correlated succinct representations $u^{q-1}_{\ell_{q-1}}$ and $w^{q-1}_{\ell_{q-1}}$, respectively (each of length $\sigma_{q-1}(n_{q-1})$). Recall that by assumption $G^{q-1}(u^{q-1}_{\ell_{q-1}}, n_{q-1}, 1^n) \oplus G^{q-1}(w^{q-1}_{\ell_{q-1}}, n_{q-1}, 1^n) = e_{\ell_{q-1}}(n_{q-1})$ is a unit vector of length $n_{q-1}$. By truncating extra bits, we can view $GEN$ as expanding strings of length $\kappa(n)$ to strings of length $\sigma_{q-1}(n_{q-1})$. We define the correction words $cw_1, cw_2$:
$$cw_1 = \begin{cases} u^{q-1}_{\ell_{q-1}} \oplus GEN(s^1_i, 1^n) & \text{if } i \in S^q \\ w^{q-1}_{\ell_{q-1}} \oplus GEN(s^2_i, 1^n) & \text{if } i \notin S^q \end{cases}$$
$$cw_2 = \begin{cases} w^{q-1}_{\ell_{q-1}} \oplus GEN(s^2_i, 1^n) & \text{if } i \in S^q \\ u^{q-1}_{\ell_{q-1}} \oplus GEN(s^1_i, 1^n) & \text{if } i \notin S^q \end{cases}$$
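The outputs $u^q_{\ell_q}, w^q_{\ell_q}$ of the two succinct representation generators $SREP1^q, SREP2^q$ are defined as follows:
$$u^q_{\ell_q} \triangleq s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_{m_q} \| cw_1 \| cw_2 \| S^q$$
$$w^q_{\ell_q} \triangleq s_1 \| \cdots \| s_{i-1} \| s^2_i \| s_{i+1} \| \cdots \| s_{m_q} \| cw_1 \| cw_2 \| (S^q \triangle \{i\})$$
The length of each succinct representation is
$$\sigma_q(n_q) \triangleq m_q(\kappa(n) + 1) + 2\sigma_{q-1}(n_{q-1}). \qquad (3.1)$$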
We now define the expansion algorithm $G^q$. A succinct representation $s$ of the appropriate length $\sigma_q(n_q)$ is parsed as
$$s = s_1 \| \cdots \| s_{m_q} \| cw_1 \| cw_2 \| T.$$
The first part consists of $m_q$ seeds $s_1, \ldots, s_{m_q}$, all in $\{0,1\}^{\kappa(n)}$, which can be expanded by $GEN$ to strings of length $\sigma_{q-1}(n_{q-1})$ each.² The last part of the succinct representation is a string of length $m_q$, which we interpret as a subset $T \subseteq [m_q]$ of indices. The two correction words are simply strings of length $\sigma_{q-1}(n_{q-1})$. We now define, for $h = 1, \ldots, m_q$,
$$v_h = \begin{cases} cw_1 \oplus GEN(s_h, 1^n) & \text{if } h \in T \\ cw_2 \oplus GEN(s_h, 1^n) & \text{if } h \notin T \end{cases}$$
On input $(z, n_q, 1^n)$, the output of $G^q$ is the string
$$G^q(z, n_q, 1^n) = G^{q-1}(v_1, n_{q-1}, 1^n) \| \cdots \| G^{q-1}(v_{m_q}, n_{q-1}, 1^n).$$
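The recursion above can be traced with a short sketch. The following Python code is a minimal illustration only, assuming a hypothetical stand-in `toy_gen` for $GEN$ (SHA-256 in counter mode, with no security claim), an arbitrary seed length `KAPPA` of 16 bits, and parameters $m_q$ that divide $n_q$ exactly. Bit strings are lists of 0/1 integers and $S^q$ is stored as an indicator vector. It builds both succinct representations jointly and checks that $G^q$ expands them to strings that differ only at position $\ell$.

```python
import hashlib
import random

KAPPA = 16  # hypothetical seed length kappa(n), in bits

def toy_gen(seed_bits, out_len):
    """Hypothetical stand-in for GEN: SHA-256 in counter mode, truncated to out_len bits
    (the truncation mirrors the text). No security claim is made for this toy."""
    seed = int("".join(map(str, seed_bits)), 2).to_bytes((len(seed_bits) + 7) // 8, "big")
    out, ctr = [], 0
    while len(out) < out_len:
        block = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.extend((byte >> b) & 1 for byte in block for b in range(7, -1, -1))
        ctr += 1
    return out[:out_len]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def rand_bits(rng, k):
    return [rng.getrandbits(1) for _ in range(k)]

def params(n, ms):
    """ms = [m_1, ..., m_q]; returns ns = [n_0, ..., n_q] and sig = [sigma_0, ..., sigma_q] per (3.1)."""
    ns = [n]
    for m in reversed(ms):
        ns.insert(0, -(-ns[0] // m))           # n_{j-1} = ceil(n_j / m_j)
    sig = [ns[0]]                               # sigma_0(n_0) = n_0
    for m in ms:
        sig.append(m * (KAPPA + 1) + 2 * sig[-1])
    return ns, sig

def srep_pair(q, ell, ns, ms, rng):
    """Outputs (u, w) of SREP1^q and SREP2^q for a 1-indexed position ell in [n_q]."""
    if q == 0:
        r0 = rand_bits(rng, ns[0])
        w0 = r0.copy()
        w0[ell - 1] ^= 1                        # r_0 xor e_ell(n_0)
        return r0, w0
    m, n_prev = ms[q - 1], ns[q - 1]
    i, ell_prev = (ell - 1) // n_prev + 1, (ell - 1) % n_prev + 1   # matrix entry (i, ell_{q-1})
    u_prev, w_prev = srep_pair(q - 1, ell_prev, ns, ms, rng)
    L = len(u_prev)                             # sigma_{q-1}(n_{q-1})
    S = rand_bits(rng, m)                       # indicator vector of S^q
    seeds = [rand_bits(rng, KAPPA) for _ in range(m)]   # seeds[i-1] plays s_i^1
    s2 = rand_bits(rng, KAPPA)                  # s_i^2
    g1, g2 = toy_gen(seeds[i - 1], L), toy_gen(s2, L)
    if S[i - 1]:
        cw1, cw2 = xor(u_prev, g1), xor(w_prev, g2)
    else:
        cw1, cw2 = xor(w_prev, g2), xor(u_prev, g1)
    S2 = S.copy()
    S2[i - 1] ^= 1                              # S^q symmetric difference {i}
    seeds2 = seeds[:i - 1] + [s2] + seeds[i:]
    u = [b for s in seeds for b in s] + cw1 + cw2 + S
    w = [b for s in seeds2 for b in s] + cw1 + cw2 + S2
    return u, w

def expand(q, z, ms, sig):
    """G^q: parse z as s_1 || ... || s_m || cw1 || cw2 || T and recurse on every v_h."""
    if q == 0:
        return z                                # G^0 is the identity
    m, L = ms[q - 1], sig[q - 1]
    off = m * KAPPA
    cw1, cw2, T = z[off:off + L], z[off + L:off + 2 * L], z[off + 2 * L:]
    out = []
    for h in range(m):
        seed = z[h * KAPPA:(h + 1) * KAPPA]
        out += expand(q - 1, xor(cw1 if T[h] else cw2, toy_gen(seed, L)), ms, sig)
    return out

rng = random.Random(2001)
n, ms = 4096, [8, 16]                           # m_1 = 8, m_2 = 16, so n_1 = 256 and n_0 = 32
ns, sig = params(n, ms)
u, w = srep_pair(len(ms), 37, ns, ms, rng)
diff = xor(expand(len(ms), u, ms, sig), expand(len(ms), w, ms, sig))
print(len(u), [k + 1 for k, bit in enumerate(diff) if bit])   # 672 [37]
```

With $m_1 = 8$ and $m_2 = 16$ each representation has $\sigma_2 = 16 \cdot 17 + 2 \cdot 200 = 672$ bits, in agreement with Equation (3.1), while the expanded strings have 4096 bits.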
3.1.4 Proof of correlation
In this subsection we show that applying $G$ to the two succinct representations indeed yields a pair of $n$-bit strings that differ only in the $\ell$-th bit. For the sake of brevity we slightly abuse notation in the next lemma and use $G^q(z)$ to denote $G^q(z, n_q, 1^n)$ (and similarly $G^{q-1}(z)$ to denote $G^{q-1}(z, n_{q-1}, 1^n)$).
Lemma 2: For every $q$ ($q = 0, 1, \ldots$), $n$, $n_q$, $\ell_q$ and $r_q$, if $u^q_{\ell_q} = SREP1^q(1^n, n_q, \ell_q, r_q)$ and $w^q_{\ell_q} = SREP2^q(1^n, n_q, \ell_q, r_q)$, then $G^q(u^q_{\ell_q}) \oplus G^q(w^q_{\ell_q}) = e_{\ell_q}(n_q)$.

² They are actually expanded by $GEN$ to length $n$, but we only need the first $\sigma_{q-1}(n_{q-1})$ bits.
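Proof: We prove the claim by induction on $q$. For $q = 0$, by construction,
$$G^0(u^0_{\ell_0}) \oplus G^0(w^0_{\ell_0}) = u^0_{\ell_0} \oplus w^0_{\ell_0} = r_0 \oplus (r_0 \oplus e_{\ell_0}(n_0)) = e_{\ell_0}(n_0).$$
We now assume that for every $j \in [n_{q-1}]$ we have $G^{q-1}(u^{q-1}_j) \oplus G^{q-1}(w^{q-1}_j) = e_j(n_{q-1})$, and prove the claim for $G^q$. Let $\ell_q \in [n_q]$, and assume that it is represented in the $m_q \times n_{q-1}$ matrix as $(i, \ell_{q-1})$. We write the outputs of $G^q$ as $G^q(u^q_{\ell_q}) = U_1 \| \cdots \| U_{m_q}$ and $G^q(w^q_{\ell_q}) = W_1 \| \cdots \| W_{m_q}$, where $U_1, \ldots, U_{m_q}, W_1, \ldots, W_{m_q}$ are all in $\{0,1\}^{n_{q-1}}$. Let $h$ be a value in the range $1, \ldots, m_q$. By the definition of $G^q$ there are two strings $y_h, z_h \in \{0,1\}^{\sigma_{q-1}(n_{q-1})}$ such that $U_h = G^{q-1}(y_h)$ and $W_h = G^{q-1}(z_h)$ (we referred to both $y_h$ and $z_h$ as $v_h$ in Section 3.1.3).

We begin with the case $h \neq i$ and show that $y_h = z_h$. The last element of $u^q_{\ell_q}$ is the set $S^q \subseteq [m_q]$ and the last element of $w^q_{\ell_q}$ is the set $S^q \triangle \{i\} \subseteq [m_q]$. Since $h \neq i$, $h \in S^q$ if and only if $h \in S^q \triangle \{i\}$; moreover, both representations contain the same seed $s_h$ in position $h$. Therefore $y_h = cw_1 \oplus GEN(s_h, 1^n) \iff z_h = cw_1 \oplus GEN(s_h, 1^n)$, and $y_h = cw_2 \oplus GEN(s_h, 1^n) \iff z_h = cw_2 \oplus GEN(s_h, 1^n)$. Hence $y_h = z_h$ and, as a consequence, $U_h = W_h$.

In the second case $h = i$, and $i$ is an element of exactly one of the sets $S^q$ and $S^q \triangle \{i\}$. If $i \in S^q$, then $y_i = cw_1 \oplus GEN(s^1_i, 1^n) = u^{q-1}_{\ell_{q-1}}$ and $z_i = cw_2 \oplus GEN(s^2_i, 1^n) = w^{q-1}_{\ell_{q-1}}$. If $i \notin S^q$, the roles of $cw_1$ and $cw_2$ are exchanged, and again $y_i = cw_2 \oplus GEN(s^1_i, 1^n) = u^{q-1}_{\ell_{q-1}}$ and $z_i = cw_1 \oplus GEN(s^2_i, 1^n) = w^{q-1}_{\ell_{q-1}}$. By the induction hypothesis $G^{q-1}(u^{q-1}_{\ell_{q-1}}) \oplus G^{q-1}(w^{q-1}_{\ell_{q-1}}) = e_{\ell_{q-1}}(n_{q-1})$. Denoting by $o$ the all-zero vector of length $n_{q-1}$, we have
$$\begin{aligned}
G^q(u^q_{\ell_q}) \oplus G^q(w^q_{\ell_q}) &= (U_1 \| \cdots \| U_{m_q}) \oplus (W_1 \| \cdots \| W_{m_q}) \\
&= G^{q-1}(y_1) \oplus G^{q-1}(z_1) \| \cdots \| G^{q-1}(y_{m_q}) \oplus G^{q-1}(z_{m_q}) \\
&= o \| \cdots \| o \| e_{\ell_{q-1}}(n_{q-1}) \| o \| \cdots \| o \\
&= e_{\ell_q}(n_q).
\end{aligned}$$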
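A small worked instance of the induction step may help. Assume, for illustration only, $n_1 = 6$ and $m_1 = 2$, so that $n_0 = 3$, and take $\ell_1 = 5$, which sits at matrix entry $(i, \ell_0) = (2, 2)$ because $5 = (2-1)\cdot 3 + 2$. Row $h = 1 \neq i$ contributes the zero block, and row $h = i = 2$ contributes $u^0_2 \oplus w^0_2 = e_2(3)$, since $G^0$ is the identity:

```latex
\begin{aligned}
G^1(u^1_5) \oplus G^1(w^1_5)
  &= \bigl(G^0(y_1) \oplus G^0(z_1)\bigr) \,\|\, \bigl(G^0(y_2) \oplus G^0(z_2)\bigr) \\
  &= o \,\|\, \bigl(u^0_2 \oplus w^0_2\bigr)
   = 000 \,\|\, 010 = 000010 = e_5(6).
\end{aligned}
```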
3.2 Lengths of input and output
In this section we analyze the stretching properties of the succinct representation generators $SREP1$ and $SREP2$. For every $q$, the length of the outputs of the succinct representation generators $SREP1^q$ and $SREP2^q$ depends on the value of the parameter $m_q$. Our aim is to choose a value for each $m_q$ so that the overall succinct representation length $\sigma(n)$ is minimized. This is important in our application because $\sigma(n)$ is the communication cost of our protocol, which is the parameter we wish to minimize.

After setting $m_q$ we discuss the number of random bits that the succinct representation generators require as input. This resource is not as important in our context as the length of the output; we analyze it in order to obtain a clearer picture of the properties of our construction.
Lemma 3: For every $n$, $\kappa(n)$ and every $q$ ($q = 0, 1, 2, \ldots$) there is a choice of $m_q$ such that the succinct representations produced by $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$ are of length
$$\sigma_q(n_q) = 2^{\frac{q}{2}}(q+1)(\kappa(n)+1)^{\frac{q}{q+1}} n_q^{\frac{1}{q+1}}.$$
Proof: We begin by proving that for every $q$ ($q = 0, 1, 2, \ldots$) the minimal succinct representation length is of the form
$$\sigma_q(n_q) = c_q(\kappa(n)+1)^{\frac{q}{q+1}} n_q^{\frac{1}{q+1}} \qquad (3.2)$$
where $c_q$ is a constant whose value depends only on $q$ and not on $\kappa(n)$ or $n_q$. The proof is by induction on $q$. For $q = 0$, by construction, $\sigma_0(n_0) = n_0$, so setting $c_0 = 1$ gives the right form $\sigma_0(n_0) = 1 \cdot n_0^1(\kappa(n)+1)^0$. By Equation (3.1), the relation between $\sigma_q(n_q)$, $\sigma_{q-1}(n_{q-1})$ and $m_q$ is given by
$$\sigma_q(n_q) = m_q(\kappa(n)+1) + 2\sigma_{q-1}(n_{q-1}).$$
Since $\sigma_{q-1}(n_{q-1})$ is positive and appears in the recursive equation with a plus sign, to minimize $\sigma_q(n_q)$ we need to minimize $\sigma_{q-1}(n_{q-1})$ as well. Assume that the claim is correct for $q-1$. We look for a value of $m_q$ which minimizes $\sigma_q(n_q)$. By the induction hypothesis,
$$\sigma_q(n_q) = m_q(\kappa(n)+1) + 2c_{q-1}(\kappa(n)+1)^{\frac{q-1}{q}}(n_{q-1})^{\frac{1}{q}}.$$
We can view $\sigma_q(n_q)$ as a function of $m_q$, since $n_{q-1} = n_q/m_q$. By setting its derivative with respect to $m_q$ to zero, we obtain that $\sigma_q(n_q)$ is minimized for
$$m_q = \left(\frac{2c_{q-1}}{q}\right)^{\frac{q}{q+1}}\left(\frac{n_q}{\kappa(n)+1}\right)^{\frac{1}{q+1}}. \qquad (3.3)$$
Substituting, we get
$$\sigma_q(n_q) = (2c_{q-1})^{\frac{q}{q+1}}\left(q^{\frac{1}{q+1}} + q^{-\frac{q}{q+1}}\right)(\kappa(n)+1)^{\frac{q}{q+1}} n_q^{\frac{1}{q+1}}.$$
We obtain from this expression that Equation (3.2) indeed holds for
$$c_q = (2c_{q-1})^{\frac{q}{q+1}}\left(q^{\frac{1}{q+1}} + q^{-\frac{q}{q+1}}\right).$$
We complete the proof by showing inductively that $c_q = 2^{q/2}(q+1)$. For $q = 0$ we indeed have $c_0 = 1$. Assume now that the induction hypothesis holds for $q-1$, i.e., $c_{q-1} = 2^{(q-1)/2}q$. Then
$$\begin{aligned}
c_q &= (2c_{q-1})^{\frac{q}{q+1}}\left(q^{\frac{1}{q+1}} + q^{-\frac{q}{q+1}}\right) \\
&= \left(2^{\frac{q+1}{2}}q\right)^{\frac{q}{q+1}}\left(q^{\frac{1}{q+1}} + q^{-\frac{q}{q+1}}\right) \\
&= 2^{\frac{q}{2}}(1+q).
\end{aligned}$$
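As a numerical sanity check of the proof, the sketch below iterates the recurrence for $c_q$ and compares it with $2^{q/2}(q+1)$; it then evaluates one step of Equation (3.1) at the $m_q$ of Equation (3.4) and confirms that this reproduces the closed form, while nearby values of $m_q$ do worse. It is a minimal, real-valued illustration (the rounding of $m_q$ and of $n_{q-1} = \lceil n_q/m_q \rceil$ is ignored), and the values of $n$ and $\kappa(n)$ are arbitrary choices.

```python
from math import isclose

def c_closed(q):
    """Closed form c_q = 2^{q/2} (q + 1) claimed in Lemma 3."""
    return 2 ** (q / 2) * (q + 1)

def c_recursive(q):
    """c_0 = 1 and c_j = (2 c_{j-1})^{j/(j+1)} (j^{1/(j+1)} + j^{-j/(j+1)})."""
    c = 1.0
    for j in range(1, q + 1):
        c = (2 * c) ** (j / (j + 1)) * (j ** (1 / (j + 1)) + j ** (-j / (j + 1)))
    return c

def sigma_closed(q, n, kappa):
    """Equation (3.2) with the closed-form c_q: c_q (kappa + 1)^{q/(q+1)} n^{1/(q+1)}."""
    return c_closed(q) * (kappa + 1) ** (q / (q + 1)) * n ** (1 / (q + 1))

def sigma_one_step(q, n, kappa, m):
    """Equation (3.1) with sigma_{q-1} from the closed form: m (kappa + 1) + 2 sigma_{q-1}(n / m)."""
    return m * (kappa + 1) + 2 * sigma_closed(q - 1, n / m, kappa)

def m_opt(q, n, kappa):
    """Equation (3.4): m_q = 2^{q/2} (n / (kappa + 1))^{1/(q+1)}."""
    return 2 ** (q / 2) * (n / (kappa + 1)) ** (1 / (q + 1))

n, kappa = 2 ** 30, 127
for q in range(1, 6):
    m = m_opt(q, n, kappa)
    print(q, round(m, 2), round(sigma_one_step(q, n, kappa, m)), round(sigma_closed(q, n, kappa)))
```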
Corollary 5: For any constant $q$, $\sigma_q(n_q) = O\left(n_q^{\frac{1}{q+1}}\kappa(n)^{\frac{q}{q+1}}\right)$. Therefore, if the correlated generator $(SREP1, SREP2, G)$ invokes the $q$-th intermediate generator, for some constant $q$, then the output length of the succinct representation generators is $\sigma(n) = O\left(n^{\frac{1}{q+1}}\kappa(n)^{\frac{q}{q+1}}\right)$.
Suppose we fix a large value of $n$ and range over $q$ ($q = 0, 1, 2, \ldots$). The sequence of values $\sigma_q(n) = 2^{\frac{q}{2}}(q+1)(\kappa(n)+1)^{\frac{q}{q+1}} n^{\frac{1}{q+1}}$ is monotonically decreasing for small values of $q$. However, at some point the term $2^{\frac{q}{2}}$ takes over, causing $\sigma_q(n)$ to increase. We wish to set $\sigma(n)$ to be the optimal value in the sequence $q = 0, 1, 2, \ldots$, in other words, the value of $\sigma_q(n)$ at the $q$ for which it is minimal. By computing the derivative of $\sigma_q(n)$ as a function of $q$ we obtain Corollary 6. We omit the arithmetic.
Corollary 6: Let $(SREP1^{\lambda(n)}, SREP2^{\lambda(n)}, G^{\lambda(n)})$ be the generator in the series of intermediate generators $(SREP1^0, SREP2^0, G^0), (SREP1^1, SREP2^1, G^1), \ldots$ for which $\sigma_q(n)$ is minimal. Then
$$\lambda(n) = \frac{\sqrt{1 + 2\ln^2 2 \cdot \log\frac{n}{\kappa(n)+1}} - 1}{\ln 2} \approx \sqrt{2\log\frac{n}{\kappa(n)}},$$
and the length of a string produced by the succinct representation generators on input $(1^n, \ell, r)$ equals
$$\sigma(n) \approx 2^{\lambda(n)}\lambda(n)\kappa(n).$$
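The best level can also be found by direct search. The sketch below is a minimal illustration that works with $\log_2$ of the closed form of Lemma 3 (to avoid huge numbers) and scans integer values of $q$; $n = 2^{40}$ and $\kappa(n) = 1023$ are arbitrary illustrative values. Because it searches over integers, it complements rather than reproduces the closed-form expression for $\lambda(n)$.

```python
from math import log2

def log2_sigma(q, log2_n, kappa):
    """log2 of Lemma 3's closed form: sigma_q(n) = 2^{q/2} (q+1) (kappa+1)^{q/(q+1)} n^{1/(q+1)}."""
    return q / 2 + log2(q + 1) + (q / (q + 1)) * log2(kappa + 1) + log2_n / (q + 1)

def best_level(log2_n, kappa, q_max=100):
    """Brute-force search over integer q for the smallest sigma_q(n); returns (q, log2 sigma_q)."""
    values = [log2_sigma(q, log2_n, kappa) for q in range(q_max + 1)]
    q_best = min(range(q_max + 1), key=values.__getitem__)
    return q_best, values[q_best]

q_best, bits = best_level(40, 1023)   # n = 2^40, kappa(n) = 1023
print(q_best, round(bits, 3))         # q_best is 5 for these parameters
```

For these parameters the minimum is at $q = 5$, with $q = 6$ a close second; the values first decrease and then increase, as described before Corollary 6.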
In the next lemma we investigate the size of the random input $r_q$ required by $SREP1^q$ and $SREP2^q$. After setting the values of $m_q$ ($q = 0, 1, 2, \ldots$), the stretching factor³ of $SREP1^q$ and $SREP2^q$ is determined. By Equation (3.3) and $c_{q-1} = 2^{(q-1)/2}q$, we have
$$m_q = 2^{\frac{q}{2}}\left(\frac{n_q}{\kappa(n)+1}\right)^{\frac{1}{q+1}} \qquad (3.4)$$
for every $q$.
Lemma 4: Let $m_q = 2^{q/2}\left(n_q/(\kappa(n)+1)\right)^{1/(q+1)}$ and let the size of the random input $r_q$ required by $SREP1^q$ and $SREP2^q$ (together) be denoted by $\rho_q(n_q)$. Then
$$\rho_q(n_q) = q\kappa(n) + n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-\frac{q}{2}}\left(2^{q+1} - 1\right).$$
³ That is, the ratio between the length of the output of the succinct representation generators and the length of the random input.
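Proof: We prove the claim by induction on $q$. For $q = 0$ the random input $r_0$ is an $n_0$-bit binary string, since the first generator returns $r_0$ while the second returns $r_0 \oplus e_{\ell_0}(n_0)$; indeed $\rho_0(n_0) = n_0$.

Now we assume the claim for $q-1$ and prove it for $q$. The random input $r_q$ includes a random subset of $[m_q]$ ($m_q$ bits), $m_q + 1$ seeds for $GEN$ ($(m_q+1)\kappa(n)$ bits) and a random input for $SREP1^{q-1}$ and $SREP2^{q-1}$ ($\rho_{q-1}(n_{q-1})$ bits). Therefore,
$$\rho_q(n_q) = m_q(\kappa(n)+1) + \kappa(n) + \rho_{q-1}(n_{q-1}).$$
Substituting for $m_q$ the value $2^{q/2}(n_q/(\kappa(n)+1))^{1/(q+1)}$, so that $n_{q-1} = n_q/m_q = 2^{-\frac{q}{2}} n_q^{\frac{q}{q+1}}(\kappa(n)+1)^{\frac{1}{q+1}}$, we have
$$\begin{aligned}
\rho_q(n_q) &= 2^{\frac{q}{2}} n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}} + \kappa(n) + \rho_{q-1}\left(2^{-\frac{q}{2}} n_q^{\frac{q}{q+1}}(\kappa(n)+1)^{\frac{1}{q+1}}\right) \\
&= 2^{\frac{q}{2}} n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}} + q\kappa(n) + 2^{-\frac{1}{2}} n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-\frac{q-1}{2}}\left(2^q - 1\right) \\
&= q\kappa(n) + n_q^{\frac{1}{q+1}}(\kappa(n)+1)^{\frac{q}{q+1}}\, 2^{-\frac{q}{2}}\left(2^{q+1} - 1\right).
\end{aligned}$$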
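The induction can also be checked numerically. The sketch below is a minimal, real-valued illustration (the ceiling in $n_{q-1} = \lceil n_q/m_q \rceil$ is ignored) with arbitrary values of $n$ and $\kappa(n)$: it unrolls the recurrence $\rho_q(n_q) = m_q(\kappa(n)+1) + \kappa(n) + \rho_{q-1}(n_{q-1})$ with $m_q$ taken from Equation (3.4), and compares the result with the closed form of Lemma 4.

```python
from math import isclose

def rho_closed(q, n, kappa):
    """Lemma 4: rho_q(n) = q kappa + n^{1/(q+1)} (kappa+1)^{q/(q+1)} 2^{-q/2} (2^{q+1} - 1)."""
    K = kappa + 1
    return q * kappa + n ** (1 / (q + 1)) * K ** (q / (q + 1)) * 2 ** (-q / 2) * (2 ** (q + 1) - 1)

def rho_recursive(q, n, kappa):
    """rho_0(n_0) = n_0 and rho_q(n_q) = m_q (kappa+1) + kappa + rho_{q-1}(n_q / m_q), m_q from (3.4).
    Real-valued: the ceiling in n_{q-1} = ceil(n_q / m_q) is ignored."""
    if q == 0:
        return n
    m = 2 ** (q / 2) * (n / (kappa + 1)) ** (1 / (q + 1))
    return m * (kappa + 1) + kappa + rho_recursive(q - 1, n / m, kappa)

n, kappa = 2 ** 40, 1023
for q in range(6):
    print(q, round(rho_recursive(q, n, kappa)), round(rho_closed(q, n, kappa)))
```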
Corollary 7: For any constant $q$, the random input $r$ that $(SREP1^q, SREP2^q, G^q)$ receives is of size $\rho_q(n_q) = O\left(n_q^{\frac{1}{q+1}}\kappa(n)^{\frac{q}{q+1}}\right)$. Therefore, if the correlated generator $(SREP1, SREP2, G)$ invokes the $q$-th intermediate generator, for some constant $q$, then the random input required by the succinct representation generators is $\rho(n) = O\left(n^{\frac{1}{q+1}}\kappa(n)^{\frac{q}{q+1}}\right)$.
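If we again set $(SREP1, SREP2, G)$ to invoke the intermediate generator for which the succinct representations are of minimal length, we have:

Corollary 8: The random input $r$ required by the correlated pseudo-random generator $(SREP1, SREP2, G)$ is of the same length as that required by $(SREP1^{\lambda(n)}, SREP2^{\lambda(n)}, G^{\lambda(n)})$:
$$\begin{aligned}
\rho(n) &= \lambda(n)\kappa(n) + n^{\frac{1}{\lambda(n)+1}}(\kappa(n)+1)^{\frac{\lambda(n)}{\lambda(n)+1}}\, 2^{-\frac{\lambda(n)}{2}}\left(2^{\lambda(n)+1} - 1\right) \\
&\approx \lambda(n)\kappa(n) + 2^{\lambda(n)} \cdot 2\kappa(n).
\end{aligned}$$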
Corollary 9: $SREP1$ and $SREP2$ expand the random input, which is of length $\rho(n)$, to the succinct representation, which is of length $\sigma(n)$. The ratio between the output length and the random input length is
$$\frac{\sigma(n)}{\rho(n)} \approx \frac{\lambda(n)+1}{2} \approx \sqrt{\frac{1}{2}\log\frac{n}{\kappa(n)}}.$$
3.3 Proof of index indistinguishability
In this section we analyze the index indistinguishability property (Definition 20) of the construction described in Subsection 3.1.3. The section includes several technical propositions, which we divide into a few subsections to enhance readability. The organization of the section is as follows. In Subsection 3.3.1 we prove several useful lemmata, while Subsection 3.3.2 introduces some helpful notation. In Subsection 3.3.3 we prove index indistinguishability under the assumption that $GEN$ is a standard pseudo-random generator (Definition 9). In Subsection 3.3.4 we base the proof on $GEN$ being a general $(T_D(n), \varepsilon(n))$-pseudo-random generator (Definition 8).
3.3.1 Three useful statements
The first two lemmata we present have appeared previously in the literature, and are presented here for the sake of completeness.
Lemma 5: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \leq n$. Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. Let $A = \{A_n\}_{n\in\mathbb{N}}$ and $B = \{B_n\}_{n\in\mathbb{N}}$ be two probability ensembles such that both $A_n$ and $B_n$ range over $\{0,1\}^{length(n)}$. Let $M$ be an algorithm mapping strings to strings such that if $a$ and $b$ are strings of the same length, then $M(a)$ and $M(b)$ are strings of the same length. Let $A' = M(A)$ and $B' = M(B)$ be the ensembles induced by applying $M$ to $A$ and $B$, respectively. Let $T_M(n)$ be the running time of $M$ on $length(n)$-bit strings.
- If $A$ and $B$ are $(T_D(n), \varepsilon(n))$-indistinguishable, then the ensembles $A'$ and $B'$ are $(T_D(n) - T_M(n), \varepsilon(n))$-indistinguishable.
- If $A$ and $B$ are computationally indistinguishable and $T_M(n)$ is polynomial in $n$, then $A'$ and $B'$ are computationally indistinguishable.
Proof: For every algorithm $D'$ that distinguishes between the ensembles $A'$ and $B'$ we define a corresponding algorithm $D$ that distinguishes between $A$ and $B$. On input strings $z$ and $1^n$ (see Definition 4), $D$ computes $M(z)$, invokes $D'$ on input $(M(z), 1^n)$ and returns the output of $D'$.

For every $D'$ whose running time is $T_D(n) - T_M(n)$, the running time of the corresponding algorithm $D$ is $T_D(n)$. By assumption $A$ and $B$ are $(T_D(n), \varepsilon(n))$-indistinguishable, and thus there exists an $n_0$ such that for every $n \geq n_0$, the probability that $D$ distinguishes between $A_n$ and $B_n$ is smaller than $\varepsilon(n)$. Since $D$ outputs 1 on $z$ exactly when $D'$ outputs 1 on $M(z)$, for every $n \geq n_0$ the probability that $D'$ distinguishes between $A'_n$ and $B'_n$ is also smaller than $\varepsilon(n)$.

For every $D'$ whose running time on strings of length $n$ is polynomial in $n$, if $M$ is a polynomial-time mapping, then the corresponding algorithm $D$ runs in polynomial time. If $A$ and $B$ are computationally indistinguishable, then for any constant $c$ there exists an $n_0$ such that for every $n \geq n_0$, $D$ distinguishes between $A_n$ and $B_n$ with probability less than $1/n^c$. Therefore, for every $n \geq n_0$, $D'$ distinguishes between $A'_n$ and $B'_n$ with probability less than $1/n^c$. This in turn implies that $A'$ and $B'$ are computationally indistinguishable.
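The reduction in the proof only composes $D'$ with $M$, so the distinguishing advantage is preserved exactly. The sketch below illustrates this on hypothetical toy distributions over 3-bit strings, computing advantages exactly with `fractions.Fraction`; the mapping `M` and the distinguisher `D_prime` are illustrative choices, not objects from the thesis.

```python
from fractions import Fraction
from itertools import product

def advantage(dist_a, dist_b, distinguisher):
    """Exact |Pr[D(x)=1 : x~A] - Pr[D(x)=1 : x~B]| for finite distributions {string: prob}."""
    p_a = sum((p for x, p in dist_a.items() if distinguisher(x)), Fraction(0))
    p_b = sum((p for x, p in dist_b.items() if distinguisher(x)), Fraction(0))
    return abs(p_a - p_b)

def push_forward(dist, mapping):
    """Distribution of M(X) for X ~ dist, i.e. the ensembles A' = M(A), B' = M(B) of Lemma 5."""
    out = {}
    for x, p in dist.items():
        y = mapping(x)
        out[y] = out.get(y, Fraction(0)) + p
    return out

def M(x):
    # Toy mapping (same-length inputs give same-length outputs): reverse, then append parity.
    return x[::-1] + str(x.count("1") % 2)

def D_prime(y):
    # Toy distinguisher for A', B': checks whether positions 1 and 2 of y agree.
    return y[1] == y[2]

def D(x):
    # The reduction from the proof: D(z) = D'(M(z)).
    return D_prime(M(x))

strings = ["".join(bits) for bits in product("01", repeat=3)]
A = {s: Fraction(1, 8) for s in strings}                                     # uniform
B = {s: (Fraction(1, 4) if s[0] == s[1] else Fraction(0)) for s in strings}  # first two bits equal

adv_mapped = advantage(push_forward(A, M), push_forward(B, M), D_prime)
adv_direct = advantage(A, B, D)
print(adv_mapped, adv_direct)   # 1/2 1/2
```

Both advantages equal 1/2 here. Only the running time changes, by the cost $T_M(n)$ of computing $M$, which is exactly the loss stated in the lemma.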
Lemma 6: Let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \leq n$. Let $T_D, T'_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon, \varepsilon' : \mathbb{N} \to [0,1]$. Let $A = \{A_n\}_{n\in\mathbb{N}}$, $B = \{B_n\}_{n\in\mathbb{N}}$ and $C = \{C_n\}_{n\in\mathbb{N}}$ be three probability ensembles such that $A_n$, $B_n$ and $C_n$ range over $\{0,1\}^{length(n)}$.
- Assume that $A, B$ are $(T_D(n), \varepsilon(n))$-indistinguishable while $B, C$ are $(T'_D(n), \varepsilon'(n))$-indistinguishable. Then $A, C$ are $(\min\{T_D(n), T'_D(n)\}, \varepsilon(n) + \varepsilon'(n))$-indistinguishable.
- Assume that $A, B$ are computationally indistinguishable, and $B, C$ are computationally indistinguishable. Then $A, C$ are computationally indistinguishable.
Proof: Let $D$ be a distinguisher between the ensembles $A$ and $C$ whose running time is $\min\{T_D(n), T'_D(n)\}$. The same algorithm $D$ can be used to distinguish between any two ensembles, in particular between $A$ and $B$ and between $B$ and $C$. We use the notation $p_Y(n) = \Pr[D(Y_n, 1^n) = 1]$ for each of the three ensembles $Y = A, B, C$.

If $A, B$ are $(T_D(n), \varepsilon(n))$-indistinguishable while $B, C$ are $(T'_D(n), \varepsilon'(n))$-indistinguishable, there exist $n_1$ and $n_2$ such that for any $n \geq n_1$ we have $|p_A(n) - p_B(n)| < \varepsilon(n)$, and for any $n \geq n_2$ we have $|p_B(n) - p_C(n)| < \varepsilon'(n)$. Let $n_0 = \max\{n_1, n_2\}$. For any $n \geq n_0$ we have
$$|p_A(n) - p_C(n)| \leq |p_A(n) - p_B(n)| + |p_B(n) - p_C(n)| < \varepsilon(n) + \varepsilon'(n).$$
Assume that $A, B$ are computationally indistinguishable, $B, C$ are computationally indistinguishable, and the running time of $D$ is polynomial in $n$. Then for every constant $c$ there exist $n_1$ and $n_2$ such that for any $n \geq n_1$ we have $|p_A(n) - p_B(n)| < 1/(2n^c)$, and for any $n \geq n_2$ we have $|p_B(n) - p_C(n)| < 1/(2n^c)$. Let $n_0 = \max\{n_1, n_2\}$. For any $n \geq n_0$ we have $|p_A(n) - p_C(n)| < 1/n^c$, and thus $A, C$ are computationally indistinguishable.
We need one more lemma in order to precisely analyze the running-time bounds on algorithms that try to distinguish the output of $SREP1^q$ (or $SREP2^q$) from the uniform distribution.
Lemma 7: The computational complexity of producing the pair of succinct representations $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$ is no more than $2\sigma_q(n_q) + 2qT_{GEN}(n)$ steps.

Proof: Let $f(q, n, n_q)$ denote the time needed to generate the pair of succinct representations $SREP1^q(1^n, n_q, \ell_q, r_q)$ and $SREP2^q(1^n, n_q, \ell_q, r_q)$. Note that $f$ does not depend on the values of $\ell_q$ and $r_q$. Generating the two succinct representations can be divided into several stages:
- Sampling the random elements: the seeds $s_1, \ldots, s_{m_q}$, the extra seed $s^2_i$, and $S^q$ ($\sigma_q(n_q) - 2\sigma_{q-1}(n_{q-1}) + \kappa(n)$ steps).
- Generating the strings $u^{q-1}_{\ell_{q-1}}, w^{q-1}_{\ell_{q-1}}$ ($f(q-1, n, n_{q-1})$ steps).
- Applying $GEN$ to the two seeds $s^1_i, s^2_i$ ($2T_{GEN}(n)$ steps).
- Finally, computing $cw_1$, $cw_2$ and $S^q \triangle \{i\}$ (no more than $\sigma_q(n_q) - \kappa(n)$ steps).
Therefore, the following recursive inequality holds:
$$f(q, n, n_q) \leq 2\left(\sigma_q(n_q) - \sigma_{q-1}(n_{q-1}) + T_{GEN}(n)\right) + f(q-1, n, n_{q-1}).$$
Solving for $f$ as a function of $q$ verifies the statement of the lemma.
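Unrolling the recurrence makes the last step explicit. The derivation below assumes, as a base case not spelled out in the proof, that $f(0, n, n_0) \leq 2\sigma_0(n_0)$ (sampling $r_0$ and computing $r_0 \oplus e_{\ell_0}(n_0)$); the sum over $j$ then telescopes:

```latex
\begin{aligned}
f(q,n,n_q) &\le \sum_{j=1}^{q} 2\bigl(\sigma_j(n_j) - \sigma_{j-1}(n_{j-1}) + T_{GEN}(n)\bigr) + f(0,n,n_0) \\
           &= 2\sigma_q(n_q) - 2\sigma_0(n_0) + 2q\,T_{GEN}(n) + f(0,n,n_0) \\
           &\le 2\sigma_q(n_q) + 2q\,T_{GEN}(n).
\end{aligned}
```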
3.3.2 Notation used in the proof
Notation 8: Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function and let $length : \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $length(n) \leq n$. Let $GEN$ be a pseudo-random generator stretching seeds of length $\kappa(n)$ to strings of length $n$, let $(SREP1, SREP2, G)$ be the correlated generator and let $(SREP1^q, SREP2^q, G^q)$ ($q = 0, 1, 2, \ldots$) be the intermediate generators as constructed in Subsection 3.1.3. We denote by $UNI_{length(n),\ell(n)}$ the distribution induced by binary strings of the form $r \| \ell(n)$, where $r \in \{0,1\}^{length(n)}$ is chosen uniformly at random. If the input of the correlated generator is $(1^n, \ell(n), r)$, we denote the input of the $q$-th intermediate generator (which is completely determined given the input of the correlated generator) by $(1^n, n_q, \ell_q(n), r_q)$. For every $n \in \mathbb{N}$ we denote by $A^q_{n,\ell(n)}$ the distribution induced by $u^q_{\ell_q(n)} = SREP1^q(1^n, n_q, \ell_q(n), r_q)$. We denote by $B^q_{n,\ell(n)}$ the distribution $B^q_{n,\ell(n)} \triangleq A^q_{n,\ell(n)} \| \ell(n)$. The corresponding ensembles, given $q$ and $\ell$, are denoted by $A^q_\ell = \{A^q_{n,\ell(n)}\}_{n\in\mathbb{N}}$ and $B^q_\ell = \{B^q_{n,\ell(n)}\}_{n\in\mathbb{N}}$. We analogously define the ensembles $\bar{A}^q_\ell, \bar{B}^q_\ell$ induced by the outputs of $SREP2^q$.
Notation 9: Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. Denote by $GEN_{length(n),\ell(n)}$ the distribution induced by binary strings of the form $GEN(s, 1^n) \| \ell(n)$, where $GEN(s, 1^n)$ is constructed as follows. First, a seed $s$ is chosen uniformly at random from $\{0,1\}^{\kappa(n)}$. Then $GEN$ is applied to the input $(s, 1^n)$ to compute an $n$-bit output. Finally, the first $length(n)$ bits of the output are taken to be $GEN(s, 1^n)$. The corresponding ensembles are denoted by
$$UNI_{length,\ell} \triangleq \{UNI_{length(n),\ell(n)}\}_{n\in\mathbb{N}}, \qquad GEN_{length,\ell} \triangleq \{GEN_{length(n),\ell(n)}\}_{n\in\mathbb{N}}.$$
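Notation 10: Let $\tau : \mathbb{N} \to \mathbb{N}$ be any function and let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. We define the five ensembles $A^\tau_\ell$, $B^\tau_\ell$, $UNI_{\sigma_\tau,\ell}$, $\bar{A}^\tau_\ell$ and $\bar{B}^\tau_\ell$ by
$$A^\tau_\ell \triangleq \{A^{\tau(n)}_{n,\ell(n)}\}_{n\in\mathbb{N}}, \qquad B^\tau_\ell \triangleq \{B^{\tau(n)}_{n,\ell(n)}\}_{n\in\mathbb{N}}, \qquad UNI_{\sigma_\tau,\ell} \triangleq \{UNI_{\sigma_{\tau(n)}(n),\ell(n)}\}_{n\in\mathbb{N}},$$
$$\bar{A}^\tau_\ell \triangleq \{\bar{A}^{\tau(n)}_{n,\ell(n)}\}_{n\in\mathbb{N}}, \qquad \bar{B}^\tau_\ell \triangleq \{\bar{B}^{\tau(n)}_{n,\ell(n)}\}_{n\in\mathbb{N}},$$
where the distributions are as described in Notation 8 and Notation 9. We denote by $T_\tau(n)$ the time required to compute $\tau(n)$.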
3.3.3 Polynomial indistinguishability
In this subsection we assume that $GEN$ is a standard pseudo-random generator with stretch $n^{1/(2c_{len})}$ to $n$, as in Definition 9. Furthermore, we assume that $GEN$ takes as input seeds of length $\kappa(n) = n^{1/(2c_{len})}$ and a security parameter $1^n$, and outputs a string of length $n$. We use $GEN$ to construct the correlated pseudo-random generator $(SREP1, SREP2, G)$ as in Subsection 3.1.3.
Our final goal is to prove that if $GEN$ is a pseudo-random generator (that is, its output is indistinguishable from a random string), then the output of the succinct representation generators we defined is also indistinguishable from a random string. In the following lemma we take the first step towards this goal. For a given index function $\ell$ we examine strings that have $\ell(n)$ concatenated to them. We argue that if the output of $GEN$ concatenated with $\ell(n)$ is indistinguishable from a random string concatenated with $\ell(n)$, then the output of a succinct representation generator concatenated with $\ell(n)$ is also indistinguishable from a random string concatenated with $\ell(n)$.

Note that proving the above claim is not sufficient to prove that the output of the succinct representation generators is pseudo-random. The remaining gap is dealt with in Lemma 9, which shows that the output of $GEN$ concatenated with $\ell(n)$ is indistinguishable from a random string concatenated with $\ell(n)$.
Lemma 8: Let $length(n) = n$ for every $n \in \mathbb{N}$ and let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. If the ensemble $GEN_{length,\ell}$ is computationally indistinguishable from the ensemble $UNI_{length,\ell}$, then for every constant $q$ ($q = 0, 1, 2, \ldots$) the ensembles $B^q_\ell$ and $\bar{B}^q_\ell$ are computationally indistinguishable from the ensemble $UNI_{\sigma_q,\ell}$.
Proof: We prove the claim explicitly for $B^q_\ell$ ($q = 0, 1, 2, \ldots$) by induction on $q$; the proof for $\bar{B}^q_\ell$ is symmetric. The claim for the basis of the induction, $q = 0$, follows immediately from the construction. Recall that for any $n$ and any $\ell(n)$ the output of $SREP1^0(1^n, n_0, \ell(n), r_0)$ is uniformly distributed over $\{0,1\}^{\sigma_0(n_0)}$. Thus, for every $n$ and $\ell$ the two distributions $B^0_{n,\ell(n)}$ and $UNI_{\sigma_0(n_0),\ell(n)}$ are identical, and therefore the ensembles $B^0_\ell$ and $UNI_{\sigma_0,\ell}$ are identical (without any assumptions about $GEN_{length,\ell}$).
Assume that the inductive claim is correct for $q-1$. Let $\ell(\cdot)$ be an index function. We prove by a hybrid argument that $B^q_\ell$ is computationally indistinguishable from the ensemble $UNI_{\sigma_q,\ell}$.

For any $n \in \mathbb{N}$ the distribution $B^q_{n,\ell(n)}$ is over strings of length $\sigma_q(n_q) + \log n$ and can be parsed as
$$u^q_{\ell_q(n)} \| \ell(n) \triangleq s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_{m_q} \| cw_1 \| cw_2 \| S^q \| \ell(n).$$
We use the notation $\ell(n)$ and $\ell_q(n)$ in a fashion analogous to $\ell$ and $\ell_q$ in Subsection 3.1.3 ($\ell_q(n)$ can be easily computed from $\ell(n)$). We assume w.l.o.g. that $i \in S^q$, and thus $cw_1 = u^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s^1_i)$ and $cw_2 = w^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s^2_i)$.
Intuition: We wish to compare $B^q_{n,\ell(n)}$ with another distribution over strings of length $\sigma_q(n_q) + \log n$, namely the uniform distribution over $\sigma_q(n_q)$ bits with $\ell(n)$ concatenated at the end. These two distributions are very similar; the only difference is that $cw_1$ and $cw_2$ are not random strings. Replacing $cw_2$ by a random string cannot be detected, for the following reason: $s^2_i$ is a seed which is independent of $u^q_{\ell_q(n)}$, so its expansion by $GEN$ is indistinguishable from a random string, and hence $cw_2$ cannot be distinguished from a random string. Replacing $cw_1$ by a random string cannot be detected because of a different argument: by the induction hypothesis, $u^{q-1}_{\ell_{q-1}(n)} \| \ell(n)$ is indistinguishable from a random string concatenated with $\ell(n)$. Therefore, $u^{q-1}_{\ell_{q-1}(n)}$ is indistinguishable from a random string, and the same holds for $cw_1$.
A formalization of the above intuition follows. Consider the distribution $H^q_{n,\ell(n)}$ over strings of the form
$$h^q_{\ell(n)} \triangleq s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_{m_q} \| cw_1 \| w^{q-1}_{\ell_{q-1}(n)} \oplus r_2 \| S^q \| \ell(n),$$
where $r_2$ is a $\sigma_{q-1}(n_{q-1})$-bit binary string chosen uniformly at random. The sole difference between $B^q_{n,\ell(n)}$ and $H^q_{n,\ell(n)}$ is that $GEN(s^2_i)$ is replaced by $r_2$. Let $H^q_\ell \triangleq \{H^q_{n,\ell(n)}\}_{n\in\mathbb{N}}$ be the corresponding ensemble of distributions.
We construct a probabilistic polynomial-time algorithm $M_1$ that, for every $n$, transforms a string $z \| \ell(n)$ drawn from either $GEN_{\sigma_{q-1}(n_{q-1}),\ell(n)}$ or $UNI_{\sigma_{q-1}(n_{q-1}),\ell(n)}$ into a string drawn from $B^q_{n,\ell(n)}$ or $H^q_{n,\ell(n)}$, respectively. Given $z \| \ell(n)$ as input, $M_1$ does the following:
1. Chooses uniformly at random and independently a set $S^q \subseteq [m_q]$ and $m_q$ seeds $s_1, \ldots, s_{m_q} \in \{0,1\}^{n^{1/(2c_{len})}}$.
2. Computes $\ell_{q-1}(n)$ and $i$ from $\ell(n)$.
3. Constructs $u^{q-1}_{\ell_{q-1}(n)}$ and $w^{q-1}_{\ell_{q-1}(n)}$, which are the outputs of the succinct representation generators $SREP1^{q-1}$ and $SREP2^{q-1}$.
4. Outputs
$$s_1 \| \cdots \| s_{m_q} \| u^{q-1}_{\ell_{q-1}(n)} \oplus GEN(s_i, 1^n) \| w^{q-1}_{\ell_{q-1}(n)} \oplus z \| S^q \| \ell(n).$$
If $z \| \ell(n)$ is drawn from $GEN_{\sigma_{q-1}(n_{q-1}),\ell(n)}$, then the output is drawn from $B^q_{n,\ell(n)}$, and if $z \| \ell(n)$ is drawn from $UNI_{\sigma_{q-1}(n_{q-1}),\ell(n)}$, then the output is drawn from $H^q_{n,\ell(n)}$. $M_1$ runs in polynomial time (in $n$) due to the following facts. The number of random bits it needs is less than $\sigma_q(n_q) \leq n$. The computation of $\ell_{q-1}(n)$ and $i$ is polynomial⁴ in $\log n$. The construction of the pair $u^{q-1}_{\ell_{q-1}(n)}$ and $w^{q-1}_{\ell_{q-1}(n)}$ takes polynomial time (see Lemma 7). Finally, since $GEN$ is a pseudo-random generator, the computation of $GEN(s_i, 1^n)$ takes polynomial time (in $n$).
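For $q = 1$ the level below is trivial ($SREP1^0$ returns $r_0$ and $SREP2^0$ returns $r_0 \oplus e_{\ell_0}(n_0)$), so $M_1$ can be traced end to end. The sketch below is a minimal illustration under stated assumptions: `toy_gen` is a hypothetical SHA-256-based stand-in for $GEN$ with no security claim, `KAPPA` is an arbitrary seed length, the random choices of step 1 are passed in explicitly so that outputs can be compared, $S^q$ is a Python set, and $i \in S^q$ as in the proof. It checks that when $z = GEN(s^2_i, 1^n)$, $M_1$ reproduces exactly the fields of the output of $SREP1^1$ (followed by $\ell$), which is the claim that $M_1$ maps $GEN_{\sigma_0(n_0),\ell(n)}$ to $B^1_{n,\ell(n)}$.

```python
import hashlib
import random

KAPPA = 16  # hypothetical seed length, in bits

def toy_gen(seed_bits, out_len):
    """Hypothetical stand-in for GEN (SHA-256 in counter mode, truncated); no security claim."""
    seed = int("".join(map(str, seed_bits)), 2).to_bytes((len(seed_bits) + 7) // 8, "big")
    out, ctr = [], 0
    while len(out) < out_len:
        block = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.extend((byte >> b) & 1 for byte in block for b in range(7, -1, -1))
        ctr += 1
    return out[:out_len]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def locate(ell, n_prev):
    """Matrix position of ell: ell = (i - 1) n_{q-1} + ell_{q-1}."""
    return (ell - 1) // n_prev + 1, (ell - 1) % n_prev + 1

def level0(r0, ell0):
    """SREP1^0 and SREP2^0: r_0 and r_0 xor e_{ell_0}(n_0)."""
    w0 = r0.copy()
    w0[ell0 - 1] ^= 1
    return r0, w0

def srep1_level1(ell, n0, seeds, s2_i, S, r0):
    """Fields of SREP1^1's output followed by ell; seeds[i-1] plays the role of s_i^1."""
    i, ell0 = locate(ell, n0)
    u0, w0 = level0(r0, ell0)
    g1, g2 = toy_gen(seeds[i - 1], n0), toy_gen(s2_i, n0)
    if i in S:
        cw1, cw2 = xor(u0, g1), xor(w0, g2)
    else:
        cw1, cw2 = xor(w0, g2), xor(u0, g1)
    return seeds, cw1, cw2, S, ell

def M1(z, ell, n0, seeds, S, r0):
    """Steps 2-4 of M_1 for q = 1: z stands in for GEN(s_i^2) (or is uniform); i in S assumed."""
    i, ell0 = locate(ell, n0)
    u0, w0 = level0(r0, ell0)
    return seeds, xor(u0, toy_gen(seeds[i - 1], n0)), xor(w0, z), S, ell

rng = random.Random(7)
n1, m1 = 32, 4
n0 = -(-n1 // m1)                      # n_0 = ceil(n_1 / m_1) = 8
ell = 21                               # lies in row i = 3, column ell_0 = 5
seeds = [[rng.getrandbits(1) for _ in range(KAPPA)] for _ in range(m1)]
s2_i = [rng.getrandbits(1) for _ in range(KAPPA)]
S = {1, 3}                             # a subset of [m_1] that contains i = 3
r0 = [rng.getrandbits(1) for _ in range(n0)]
real = srep1_level1(ell, n0, seeds, s2_i, S, r0)
simulated = M1(toy_gen(s2_i, n0), ell, n0, seeds, S, r0)
print(real == simulated)               # True
```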
Since $r_2$ is random and independent of all other entries, $h^q_{\ell(n)}$ can also be viewed as
$$h^q_{\ell(n)} \triangleq s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_{m_q} \| cw_1 \| r_2 \| S^q \| \ell(n),$$
where $r_2$ is a random binary string of length $\sigma_{q-1}(n_{q-1})$.

Consider the distribution $F^q_{n,\ell(n)}$ over strings of the form
$$f^q_{\ell(n)} \triangleq s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_{m_q} \| r_1 \oplus GEN(s^1_i, 1^n) \| r_2 \| S^q \| \ell(n),$$
where $r_1$ is a $\sigma_{q-1}(n_{q-1})$-bit binary string chosen uniformly at random. The sole difference between $H^q_{n,\ell(n)}$ and $F^q_{n,\ell(n)}$ is that $u^{q-1}_{\ell_{q-1}(n)}$ is replaced by $r_1$. Since $r_1 \oplus GEN(s^1_i, 1^n)$ is uniformly random and independent of the other entries, the string $f^q_{\ell(n)}$ can be viewed as $r \| \ell(n)$, where $r \in \{0,1\}^{\sigma_q(n_q)}$ is chosen uniformly at random. Thus, $f^q_{\ell(n)}$ is in fact drawn from the distribution $UNI_{\sigma_q(n_q),\ell(n)}$.
⁴ The computation can be carried out by recursively calculating the value $m_q$ as in Subsection 3 and then calculating the position of $\ell_q(n)$ in the matrix, $(i, \ell_{q-1}(n))$, such that $\ell_q(n) = (i-1)n_{q-1} + \ell_{q-1}(n)$.
We construct a probabilistic polynomial-time algorithm $M_2$ that, for every $n$, transforms a string $z \| \ell(n)$ drawn from either $B^{q-1}_{n,\ell(n)}$ or $UNI_{\sigma_{q-1}(n_{q-1}),\ell(n)}$ into a string drawn from $H^q_{n,\ell(n)}$ or $UNI_{\sigma_q(n_q),\ell(n)}$, respectively. Given $z \| \ell(n)$ as input, $M_2$ does the following:
1. Chooses uniformly at random and independently $s_1, \ldots, s_{m_q} \in \{0,1\}^{n^{1/(2c_{len})}}$, $r_2$ and $S^q \subseteq [m_q]$.
2. Computes $\ell_{q-1}(n)$ and $i$ from $\ell(n)$.
3. Outputs
$$s_1 \| \cdots \| s_{m_q} \| z \oplus GEN(s_i, 1^n) \| r_2 \| S^q \| \ell(n).$$
Arguments similar to those we used to show that $M_1$ runs in polynomial time prove that $M_2$ runs in polynomial time (indeed, $M_2$ requires less running time than $M_1$).
We complete the proof of the lemma by the following chain of arguments. $M_1$ is a polynomial-time algorithm that maps $GEN_{\sigma_{q-1},\ell}$ to $B^q_\ell$ and $UNI_{\sigma_{q-1},\ell}$ to $H^q_\ell$. By Lemma 5, if $GEN_{\sigma_{q-1},\ell}$ and $UNI_{\sigma_{q-1},\ell}$ are computationally indistinguishable, then $B^q_\ell$ and $H^q_\ell$ are also computationally indistinguishable.

By the induction hypothesis, if $GEN_{length,\ell}$ and $UNI_{length,\ell}$ are computationally indistinguishable, then so are $B^{q-1}_\ell$ and $UNI_{\sigma_{q-1},\ell}$. We combine this with the polynomial-time mapping $M_2$, which transforms $B^{q-1}_\ell$ to $H^q_\ell$ and $UNI_{\sigma_{q-1},\ell}$ to $UNI_{\sigma_q,\ell}$, and obtain (again by Lemma 5) that if $GEN_{\sigma_{q-1},\ell}$ and $UNI_{\sigma_{q-1},\ell}$ are computationally indistinguishable, then so are $H^q_\ell$ and $UNI_{\sigma_q,\ell}$.

Therefore, if $GEN_{\sigma_{q-1},\ell}$ and $UNI_{\sigma_{q-1},\ell}$ are computationally indistinguishable, then $B^q_\ell$ is computationally indistinguishable from $H^q_\ell$, and $H^q_\ell$ is computationally indistinguishable from $UNI_{\sigma_q,\ell}$. By Lemma 6, if $GEN_{\sigma_{q-1},\ell}$ is computationally indistinguishable from $UNI_{\sigma_{q-1},\ell}$, then $B^q_\ell$ and $UNI_{\sigma_q,\ell}$ are computationally indistinguishable.
In the next lemma we show that if GEN is a pseudo-random generator then the ensembles $GEN_{length,\ell}$ (the output of GEN with $\ell(n)$ attached at the end) and $UNI_{length,\ell}$ (the uniform ensemble with $\ell(n)$ attached at the end) are computationally indistinguishable. Note that this does not follow immediately from the fact that the output of GEN (without $\ell(n)$) is indistinguishable from a random string.
Lemma 9: Let $\ell:\mathbb{N}\to\mathbb{N}$ be an index function and let $length:\mathbb{N}\to\mathbb{N}$ be a monotonically non-decreasing function, satisfying $length(n)\le n$, which is computable in uniform polynomial time. Let GEN be a pseudo-random generator. Then the ensemble $GEN_{length,\ell}$ is computationally indistinguishable from $UNI_{length,\ell}$.
Proof: Our first impulse might be to prove this lemma by the same argument employed in Lemma 5, that is, by using a mapping $M$ that takes as input a string $z\in\{0,1\}^{length(n)}$ (which is either pseudo-random or truly random), attaches $\ell(n)$ to $z$, and thus produces a string drawn either from $GEN_{length(n),\ell(n)}$ or from $UNI_{length(n),\ell(n)}$. However, $\ell(n)$ may not be computable in uniform polynomial time. Hence, we have to rely on a slightly different argument.
We assume towards a contradiction that there exists an infinite sequence $\ell(n_1),\ell(n_2),\ldots$ such that the distribution $UNI_{length(n),\ell(n)}$ can be efficiently distinguished from $GEN_{length(n),\ell(n)}$. We show that an index with such properties can be discovered by sampling and then used to distinguish between the uniform and pseudo-random ensembles. We now provide the necessary details.
Assume towards a contradiction that there exists a polynomial-time algorithm $D$ such that, for some constant $c$ and for an infinite sequence of natural numbers $n$,
$$\left|\Pr[D(1^n,GEN_{length(n),\ell(n)})=1]-\Pr[D(1^n,UNI_{length(n),\ell(n)})=1]\right|\ge\frac{1}{n^c}.$$
For every $i\in[n]$ we use the notation $p_G(n,i)\triangleq\Pr[D(1^n,GEN_{length(n),i})=1]$ and $p_U(n,i)\triangleq\Pr[D(1^n,UNI_{length(n),i})=1]$.
We can assume w.l.o.g. that for every $n$ in an infinite sequence of natural numbers $n_1,n_2,\ldots$ we have $p_G(n,\ell(n))-p_U(n,\ell(n))\ge 1/n^c$ (note the absence of the absolute value) and design the algorithm $D'$ accordingly⁵.
We construct an algorithm $D'$ that distinguishes with non-negligible probability between the output of the pseudo-random generator GEN and the uniform distribution for the same infinite sequence of natural numbers $n_1,n_2,\ldots$. Informally, $D'$ attempts to find an index $i\in[n]$ such that $p_G(n,i)-p_U(n,i)\ge 1/(2n^c)$. If $D'$ succeeds, it takes the first $length(n)$ bits of its input string $z$, which we denote by $z_{length(n)}$, executes $D$ on input $z_{length(n)}\|i$ and outputs the result. If it does not succeed, it outputs 1.
Assume that for some $n$ and $\ell(n)$ the algorithm $D$ distinguishes between the distributions $GEN_{length(n),\ell(n)}$ and $UNI_{length(n),\ell(n)}$ with probability at least $1/n^c$. Then we show that $D'$ distinguishes between the pseudo-random distribution on $n$ bits, $GEN_n$, and the uniform distribution on $n$ bits, $UNI_n$, with probability at least $1/(2n^{c+1})$.
The algorithm $D'$:
1. On input $(1^n,z)$, where $z\in\{0,1\}^n$, compute $length(n)$.
2. Choose $i\in[n]$ uniformly at random and estimate the probability $p_G(n,i)$ by the following steps:
⁵ If this assumption is incorrect, then $p_U(n,\ell(n))-p_G(n,\ell(n))\ge 1/n^c$ for an infinite sequence $n_1,n_2,\ldots$. In that case the algorithm $D''$, obtained from $D'$ by exchanging $p_G(n,i)$ with $p_U(n,i)$ (for every $n$ and $i$), achieves the same purpose as $D'$.
(a) Choose $t(n)\triangleq 8n^{2c}\ln(16n^{c+1})$ seeds $s_1,\ldots,s_{t(n)}$ for GEN uniformly at random and independently.
(b) For every $j=1,\ldots,t(n)$ let $GEN(s_j,1^n)$ be the $length(n)$-bit binary string obtained by having GEN expand $s_j$ to $n$ bits and then taking the first $length(n)$ bits.
(c) For every $j=1,\ldots,t(n)$ execute $D$ on input $(1^n,GEN(s_j,1^n)\|i)$. Define a random variable $X_j$ to be 1 if $D$ returns 1 on its input and 0 otherwise.
(d) Estimate $p_G(n,i)$ by
$$\hat p_G(n,i)\triangleq\frac{1}{t(n)}\sum_{j=1}^{t(n)}X_j.$$
3. For the same $i$, estimate the probability $p_U(n,i)$ in similar fashion:
(a) Choose $t(n)$ strings $r_1,\ldots,r_{t(n)}\in\{0,1\}^{length(n)}$ uniformly at random and independently.
(b) For every $j=1,\ldots,t(n)$ execute $D$ on input $(1^n,r_j\|i)$. Define $Y_j$ to be 1 if $D$ returns 1 on its input and 0 otherwise.
(c) Estimate $p_U(n,i)$ by
$$\hat p_U(n,i)\triangleq\frac{1}{t(n)}\sum_{j=1}^{t(n)}Y_j.$$
4. If $\hat p_G(n,i)-\hat p_U(n,i)\ge 1/(2n^c)$ then execute $D$ on input $(1^n,z_{length(n)}\|i)$ and return the result as the output of $D'$. Otherwise return 1 as output.
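The steps of $D'$ can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's own code: bit strings are tuples of 0/1 values, the concatenation $x\|i$ is modelled by passing $i$ to the distinguisher as a separate argument, $t(n)$ is rounded up to an integer, and the names `sample_count`, `d_prime`, `toy_gen` and `toy_D` are hypothetical. The toy generator (it always outputs all ones) is deliberately insecure and only exercises the control flow.

```python
import math
import random
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def sample_count(n: int, c: int) -> int:
    """t(n) = 8 n^(2c) ln(16 n^(c+1)), rounded up (the rounding is an assumption)."""
    return math.ceil(8 * n ** (2 * c) * math.log(16 * n ** (c + 1)))


def d_prime(z: Bits, n: int, c: int, length: int,
            D: Callable[[Bits, int], int],
            gen_n: Callable[[random.Random], Bits],
            rng: random.Random) -> int:
    """One run of D' on the n-bit input z, with a hypothetical D and n-bit generator gen_n."""
    t = sample_count(n, c)
    i = rng.randrange(1, n + 1)  # step 2: i uniform in [n]
    # Steps 2(a)-(d): average of D over t length(n)-bit prefixes of generator outputs.
    p_g = sum(D(gen_n(rng)[:length], i) for _ in range(t)) / t
    # Step 3: the same estimate over t uniformly random length(n)-bit strings.
    p_u = sum(D(tuple(rng.randrange(2) for _ in range(length)), i)
              for _ in range(t)) / t
    # Step 4: run D on the real input only if the estimated gap reaches 1/(2 n^c).
    if p_g - p_u >= 1 / (2 * n ** c):
        return D(z[:length], i)
    return 1


def toy_gen(rng: random.Random) -> Bits:
    """Hypothetical, insecure 'GEN' with n = 4 output bits."""
    return (1, 1, 1, 1)


def toy_D(x: Bits, i: int) -> int:
    """Hypothetical distinguisher: only index 2 carries a bias."""
    return x[0] if i == 2 else 0
```

With these toys $p_G(4,2)=1$ and $p_U(4,2)=1/2$, so $D'$ outputs 1 on every pseudo-random input, while on a uniform input it outputs 0 with probability about $1/8$ (when it picks $i=2$ and the first bit is 0).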
Analysis: To show that the estimates $\hat p_G(n,i)$ and $\hat p_U(n,i)$ are close to $p_G(n,i)$ and $p_U(n,i)$ respectively, we use the Chernoff bound (see for instance [44, p. 109]). It states that if $p\le 1/2$ and $Z_1,\ldots,Z_{t'}$ are independent 0-1 random variables such that $\Pr[Z_i=1]=p$ for every $i$, then for any $\delta$ with $0<\delta\le p(1-p)$ we have
$$\Pr\left[\left|\frac{\sum_{j=1}^{t'}Z_j}{t'}-p\right|>\delta\right]<2e^{-\frac{\delta^2}{2p(1-p)}t'}.$$
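As a numerical sanity check of how this bound is applied below, the following Python sketch (hypothetical function names) evaluates the right-hand side of the Chernoff bound at accuracy $\delta=1/(4n^c)$ with $t(n)=8n^{2c}\ln 16n^{c+1}$ samples. It only evaluates the expression and does not check the side condition $\delta\le p(1-p)$.

```python
import math


def chernoff_rhs(delta: float, p: float, t: float) -> float:
    """Right-hand side 2 exp(-delta^2 t / (2 p (1 - p))) of the quoted bound."""
    return 2 * math.exp(-(delta ** 2) * t / (2 * p * (1 - p)))


def tail_at_threshold(n: int, c: int, p: float) -> float:
    """The bound with delta = 1/(4 n^c) and t = 8 n^(2c) ln(16 n^(c+1)) samples."""
    delta = 1 / (4 * n ** c)
    t = 8 * n ** (2 * c) * math.log(16 * n ** (c + 1))
    return chernoff_rhs(delta, p, t)


# Since p(1 - p) <= 1/4, the exponent is at least ln(16 n^(c+1)), so each
# deviation probability is at most 2 / (16 n^(c+1)) = 1 / (8 n^(c+1)).
print(tail_at_threshold(4, 1, 0.5), 1 / (8 * 4 ** 2))
```

With $p=1/2$ the two printed values agree (up to floating-point rounding), which is why the analysis below obtains $2\cdot 2e^{-\ln 16n^{c+1}}=1/(4n^{c+1})$ for the two estimates together.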
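Assume that for $n$ and $\ell(n)$ we have $p_G(n,\ell(n))-p_U(n,\ell(n))\ge 1/n^c$. The index $i$ is uniform in $[n]$, the estimates are computed independently of $z$, the string $z_{length(n)}\|i$ is distributed as $GEN_{length(n),i}$ (respectively $UNI_{length(n),i}$) when $z$ is drawn from $GEN_n$ (respectively $UNI_n$), and $D'$ outputs 1 on both distributions whenever its test fails. Hence
$$\begin{aligned}\left|\Pr[D'(1^n,GEN_n)=1]-\Pr[D'(1^n,UNI_n)=1]\right|&\ge\Pr[D'(1^n,GEN_n)=1]-\Pr[D'(1^n,UNI_n)=1]\\&=\frac1n\sum_{i=1}^{n}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\frac{1}{2n^c}\right]\left(p_G(n,i)-p_U(n,i)\right).\end{aligned}$$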
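We divide the indices $1,\ldots,n$ into three disjoint categories, which we denote by $C_1,C_2,C_3$ ($C_1\cup C_2\cup C_3=[n]$). $C_1$ includes all the indices $i$ for which $p_G(n,i)-p_U(n,i)\ge 1/n^c$; in particular $\ell(n)\in C_1$. For the sake of brevity, in the following calculations we use the notation $j=\ell(n)$. For the $C_1$ category we have:
$$\begin{aligned}\frac1n\sum_{i\in C_1}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\tfrac{1}{2n^c}\right]\left(p_G(n,i)-p_U(n,i)\right)&\ge\frac{1}{n^{c+1}}\Pr\left[\hat p_G(n,j)-\hat p_U(n,j)\ge\tfrac{1}{2n^c}\right]\\&\ge\frac{1}{n^{c+1}}\Pr\left[|\hat p_G(n,j)-p_G(n,j)|<\tfrac{1}{4n^c}\wedge|\hat p_U(n,j)-p_U(n,j)|<\tfrac{1}{4n^c}\right]\\&=\frac{1}{n^{c+1}}\left(1-\Pr\left[|\hat p_G(n,j)-p_G(n,j)|\ge\tfrac{1}{4n^c}\vee|\hat p_U(n,j)-p_U(n,j)|\ge\tfrac{1}{4n^c}\right]\right)\\&\ge\frac{1}{n^{c+1}}\left(1-\Pr\left[|\hat p_G(n,j)-p_G(n,j)|\ge\tfrac{1}{4n^c}\right]-\Pr\left[|\hat p_U(n,j)-p_U(n,j)|\ge\tfrac{1}{4n^c}\right]\right)\end{aligned}$$
The first inequality holds because every term of the sum is non-negative and the term for $i=j$ has $p_G(n,j)-p_U(n,j)\ge 1/n^c$. The second holds because if both estimates are within $1/(4n^c)$ of their means, then $\hat p_G(n,j)-\hat p_U(n,j)\ge 1/n^c-1/(2n^c)=1/(2n^c)$.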
Here we put the Chernoff bound to good use. $\hat p_G(n,j)$ (or $\hat p_U(n,j)$) is an estimate of $p_G(n,j)$ (or $p_U(n,j)$) by an average of 0-1 variables, exactly as in the statement of the bound⁶. When estimating $p_G(n,j)$ we used the variables $X_1,\ldots,X_{t(n)}$, which are independent 0-1 variables with $\Pr[X_k=1]=p_G(n,j)$, and we employed the $Y_k$'s in a similar manner to estimate $p_U(n,j)$. Thus, continuing the above computation:
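$$\begin{aligned}&>\frac{1}{n^{c+1}}\left(1-2e^{-\frac{(1/4n^c)^2}{2p_G(n,j)(1-p_G(n,j))}t(n)}-2e^{-\frac{(1/4n^c)^2}{2p_U(n,j)(1-p_U(n,j))}t(n)}\right)\\&\ge\frac{1}{n^{c+1}}\left(1-2\cdot 2e^{-\ln 16n^{c+1}}\right)\\&\ge\frac{3}{4n^{c+1}},\end{aligned}$$
where the second inequality uses $p(1-p)\le 1/4$ together with $t(n)=8n^{2c}\ln 16n^{c+1}$.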
The category $C_2$ includes all the indices $i$ for which $0\le p_G(n,i)-p_U(n,i)<1/n^c$. Obviously,
$$\frac1n\sum_{i\in C_2}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\tfrac{1}{2n^c}\right]\left(p_G(n,i)-p_U(n,i)\right)\ge 0.$$
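The third category $C_3$ includes all the indices $i$ for which $p_G(n,i)-p_U(n,i)<0$. Since $p_G(n,i)-p_U(n,i)\ge -1$, and since for $i\in C_3$ the event $\hat p_G(n,i)-\hat p_U(n,i)\ge 1/(2n^c)$ forces at least one of the two estimates to deviate from its mean by at least $1/(4n^c)$, we have
$$\begin{aligned}\frac1n\sum_{i\in C_3}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\tfrac{1}{2n^c}\right]\left(p_G(n,i)-p_U(n,i)\right)&\ge-\frac1n\sum_{i\in C_3}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\tfrac{1}{2n^c}\right]\\&\ge-\frac1n\sum_{i\in C_3}\Pr\left[|\hat p_G(n,i)-p_G(n,i)|\ge\tfrac{1}{4n^c}\vee|\hat p_U(n,i)-p_U(n,i)|\ge\tfrac{1}{4n^c}\right]\\&\ge-\frac1n\sum_{i\in C_3}\left(\Pr\left[|\hat p_G(n,i)-p_G(n,i)|\ge\tfrac{1}{4n^c}\right]+\Pr\left[|\hat p_U(n,i)-p_U(n,i)|\ge\tfrac{1}{4n^c}\right]\right)\end{aligned}$$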
⁶ If $p_G(n,j)>1/2$ we consider the complement $1-p_G(n,j)$, which is the probability that $D$ returns 0 on a pseudo-random string.
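Again, by the Chernoff bound, and since $|C_3|\le n$,
$$\begin{aligned}&\ge-\frac1n\sum_{i\in C_3}\left(2e^{-\frac{(1/4n^c)^2}{2p_G(n,i)(1-p_G(n,i))}8n^{2c}\ln 16n^{c+1}}+2e^{-\frac{(1/4n^c)^2}{2p_U(n,i)(1-p_U(n,i))}8n^{2c}\ln 16n^{c+1}}\right)\\&\ge-4e^{-\ln 16n^{c+1}}=-\frac{1}{4n^{c+1}}.\end{aligned}$$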
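Combining all the previous calculations we obtain:
$$\begin{aligned}\left|\Pr[D'(1^n,GEN_n)=1]-\Pr[D'(1^n,UNI_n)=1]\right|&\ge\frac1n\sum_{k=1}^{3}\sum_{i\in C_k}\Pr\left[\hat p_G(n,i)-\hat p_U(n,i)\ge\tfrac{1}{2n^c}\right]\left(p_G(n,i)-p_U(n,i)\right)\\&\ge\frac{3}{4n^{c+1}}+0-\frac{1}{4n^{c+1}}=\frac{1}{2n^{c+1}}.\end{aligned}$$
Since $t(n)$ is polynomial in $n$, $D'$ runs in polynomial time, so this contradicts the assumption that GEN is a pseudo-random generator.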
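The final combination can be double-checked with exact rational arithmetic. The sketch below (hypothetical helper name) uses the unsimplified category bounds, $\frac{1}{n^{c+1}}\left(1-2\cdot 2\cdot\frac{1}{16n^{c+1}}\right)$ for $C_1$ and $-4\cdot\frac{1}{16n^{c+1}}$ for $C_3$, substituting $e^{-\ln 16n^{c+1}}=1/(16n^{c+1})$.

```python
from fractions import Fraction
from typing import Tuple


def category_bounds(n: int, c: int) -> Tuple[Fraction, Fraction]:
    """Exact lower bounds on the C1 and C3 contributions (C2 contributes >= 0)."""
    m = Fraction(n) ** (c + 1)          # n^(c+1)
    tail = Fraction(1, 16) / m          # e^{-ln 16 n^(c+1)}
    c1 = (1 / m) * (1 - 2 * 2 * tail)   # C1: (1/n^(c+1)) (1 - 2*2e^{...})
    c3 = -4 * tail                      # C3: -4 e^{...}
    return c1, c3


c1, c3 = category_bounds(2, 1)
print(c1, c3, c1 + c3)
```

For $n=1$ both inequalities $c_1\ge 3/(4n^{c+1})$ and $c_1+c_3\ge 1/(2n^{c+1})$ hold with equality; for larger $n$ the slack of the second is $(n^{c+1}-1)/(4n^{2(c+1)})$.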
Corollary 10: Let $q$ be some constant natural number, let GEN be a pseudo-random generator and let $(SREP1^q,SREP2^q,G^q)$ be a correlated generator based on GEN. For any index function $\ell:\mathbb{N}\to\mathbb{N}$, the output ensembles of $SREP1^q$ and $SREP2^q$ (the first of which is $A^q_\ell$) are pseudo-random.
Proof: Assume that there exists a pseudo-random generator GEN and that it is used to construct the ensemble $B^q_\ell$ and its counterpart for $SREP2^q$. By Lemma 8 and Lemma 9 we deduce that both ensembles are computationally indistinguishable from the uniform ensemble of the same length with $\ell(n)$ attached. Recall that for any $n$, $B^q_{n,\ell(n)}=A^q_{n,\ell(n)}\|\ell(n)$, and that the uniform ensemble with $\ell(n)$ attached is the uniform distribution on strings of the length of $A^q_{n,\ell(n)}$, followed by $\ell(n)$. Thus, if an algorithm $D$ distinguishes between $A^q_{n,\ell(n)}$ and the uniform distribution on strings of the same length with probability at least $1/n^c$, then the same algorithm distinguishes between $B^q_{n,\ell(n)}$ and the uniform distribution with $\ell(n)$ attached with the same probability, by simply ignoring the attached $\ell(n)$. A symmetric argument proves the claim for the output ensemble of $SREP2^q$.
3.3.4 A quantitative version
In this subsection we present a quantitative analysis of the correlated generator's index indistinguishability property. We assume that the pseudo-random generator GEN expands seeds whose length is some function of $n$ (instead of $n^{1/2c_{len}}$ for some constant $c_{len}>1$, as in Subsection 3.3.3) to strings of length $n$. We also assume that, for large enough values of $n$, no probabilistic algorithm running in time $T_D(n)$ can distinguish between the pseudo-random and the uniform distributions with advantage $\varepsilon(n)$ (instead of assuming that no probabilistic polynomial-time algorithm can distinguish between the two distributions with advantage $1/n^c$ for any constant $c$, as in Subsection 3.3.3).
The benefits of the quantitative version are twofold. The first is a more exact analysis of the index indistinguishability property of the correlated pseudo-random generator. The second, and more important, is that it explores the full potential of the generator's stretching factor.
When we use the standard notion of polynomial indistinguishability (as in Subsection 3.3.3), the seed length is $n^{1/2c_{len}}$. By Corollary 5, for any constant $q$, the length of a level-$q$ succinct representation is $O(n^{1/(q+1)})$ times the seed length. By Definition 9 the seed length can be chosen as $n^{1/2c_{len}}$ for any constant $c_{len}$. Thus, for any desired constant $c_{out}$, the output length of a succinct representation generator can be set to $n^{1/c_{out}}$ by choosing appropriate values for the seed length and $q$. On the other hand, if the desired output length of a succinct representation generator is asymptotically smaller than $n^{1/c_{out}}$ for every constant $c_{out}$, then security cannot be proven according to Definition 9. However, by using the quantitative version of indistinguishability we might reach a better solution. Indeed, we showed earlier that the succinct representation length is minimized after roughly $\sqrt{2\log(n/s(n))}$ recursive invocations, where $s(n)$ here denotes the seed length, and that this optimal length involves a factor of $2^{\sqrt{2\log(n/s(n))}}$. It is interesting, therefore, to determine the security of the shortest possible succinct representations in the case where $n$ is super-polynomial in the seed length.
Most of the analysis in this subsection follows in the footsteps of the lemmata we proved in Subsection 3.3.3. We do note, however, one important difference. As mentioned previously, the correlated generator $(SREP1,SREP2,G)$ in Subsection 3.3.3 is identical to an intermediate generator $(SREP1^q,SREP2^q,G^q)$ for some constant $q$. In other words, for every final output length $n$, the number of recursive invocations is the same (i.e., $q$). In this subsection we take into account a state of affairs in which the number of recursive invocations is a function of $n$. Therefore, proving that the output ensembles of $SREP1^q$ and $SREP2^q$ are pseudo-random does not suffice for the proof of the index indistinguishability property.
Intuition: In Lemma 10 it is assumed that $GEN_{length,\ell}$ and $UNI_{length,\ell}$ are computationally indistinguishable. We wish to prove that the output of a succinct representation generator concatenated to $\ell(n)$ (e.g., $B_\ell$) is indistinguishable from a random string concatenated to $\ell(n)$. We use a hybrid argument (although a different one from the line of reasoning in Lemma 8).
$u_\ell(n)$, the output of SREP1, is constructed from a series of random and pseudo-random strings, which are joined together by concatenation and exclusive-or operations. If we replace all the pseudo-random strings in the construction by random strings, then the result is also a random string. We define a sequence of distributions, using the same construction each time. The first distribution is $B_\ell$; the second uses the same construction, but with one random string instead of a pseudo-random string; the third uses two random strings, and so on. The crux of the proof is showing that each successive pair of distributions is indistinguishable (for large enough $n$); since the number of distributions in the sequence is not too large, $B_\ell$ and the corresponding uniform ensemble (with $\ell(n)$ attached) are indistinguishable.
Lemma 10: Let $T_D:\mathbb{N}\to\mathbb{N}$ and $\varepsilon:\mathbb{N}\to[0,1]$ be two functions. Let GEN be a $(T_D(n),\varepsilon(n))$-pseudo-random generator. Let $t(n)=8(1/\varepsilon(n))^2\ln(16n/\varepsilon(n))$ and let $length(n)=n$ for every $n\in\mathbb{N}$. Let
q
s
2
n
n :
4 1 + 2 ln 2 log (n)+1 ? 1
(n) =
2
log
ln 2
(n)
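For every index function $\ell:\mathbb{N}\to\mathbb{N}$, if the ensembles $GEN_{length,\ell}$ and $UNI_{length,\ell}$ are $\left((T_D(n)-3t(n)T_{GEN}(n))/2t(n),\,2n\varepsilon(n)\right)$-computationally indistinguishable, then the output ensembles $B_\ell$ of both succinct representation generators are $(T_D(n),\varepsilon(n))$-computationally indistinguishable from the uniform ensemble of the same length (with $\ell(n)$ attached), where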
q
TD (n) = 2Tt(n) ? (2(n) + 23 )TGEN ? 2(n)
"(n) = 2n (n)"(n):
D
Proof: We prove the claim explicitly only for $B_\ell$, since the proof for the second ensemble is symmetric. We begin by regarding the distributions $B^0_{n,\ell(n)},B^1_{n,\ell(n)},\ldots$ (one for each level of the recursion) anew, and change the notation slightly so that a $q$ superscript (or subscript) is added to every element of $B^q_{n,\ell(n)}$ ($q\ge 1$). Thus, $B^q_{n,\ell(n)}$ is defined on strings of the form
$$u^q_\ell(n)\|\ell(n)=s^q_1\|\cdots\|s^q_{i_q-1}\|s^q_{i_q,1}\|s^q_{i_q+1}\|\cdots\|s^q_{m_q}\|cw1^q\|cw2^q\|S^q\|\ell(n).$$
Recall that if $i_q\in S^q$ then $cw1^q=u^{q-1}_{\ell_{q-1}}(n)\oplus GEN(s^q_{i_q,1},1^n)$ and $cw2^q=w^{q-1}_{\ell_{q-1}}(n)\oplus GEN(s^q_{i_q,2},1^n)$, where $s^q_{i_q,2}$ is a random seed independent of all the other entries in $u^q_\ell(n)$. If $i_q\notin S^q$ then the values of $cw1^q$ and $cw2^q$ are exchanged.
For every $n$ we inductively define a sequence of distributions, one for each level $q=0,1,\ldots$, on strings whose length is the level-$q$ succinct representation length plus $\log n$. The level-0 distribution is simply the uniform distribution on strings of level-0 length, followed by the binary representation of $\ell(n)$. The level-$q$ distribution is defined over the following strings:
q;1 q
q q
q
`q (n) k`(n) = s1 k : : : ksi?1 ksi ksi+1 k : : : ksmq kncw1 kncw2 kS k`(n):
All the elements are sampled as in the output $u^q_\ell(n)$ of $SREP1^q(1^n,n_q,\ell_q,r^q)$, except for $ncw1^q$ and $ncw2^q$. If $i\in S^q$, then $ncw1^q$ is the exclusive-or of a string drawn from the level-$(q-1)$ distribution with $GEN(s^{q,1}_i,1^n)$, but $ncw2^q$ is chosen uniformly at random from the strings of level-$(q-1)$ length. If $i\notin S^q$, then $ncw1^q$ and $ncw2^q$ exchange their values.
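Observation 1: The level-$q$ distribution defined above is the uniform distribution over strings of level-$q$ length, concatenated to $\ell(n)$. This observation is easily proved by induction. For $q=0$ it follows from the definition. Assume that the level-$(q-1)$ distribution is uniform (with $\ell(n)$ attached). Then $ncw1^q$, the exclusive-or of a uniformly random string with an independent string, is itself uniformly distributed, and $ncw2^q$ is uniform by construction. All the other elements of the level-$q$ string are also sampled uniformly at random and independently, and thus the whole string is random.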
Observation 2: There is a sequence of $q$ pseudo-random strings used in the construction of $u^q_\ell(n)$ that were produced from seeds independent of the rest of $u^q_\ell(n)$. Namely, $GEN(s^q_{i_q,2},1^n)$ is used in the construction of either $cw1^q$ or $cw2^q$, while $u^{q-1}_{\ell_{q-1}}(n)$ is used in the construction of the other correction word. In the construction of $u^{q-1}_{\ell_{q-1}}(n)$ there is also such a pseudo-random string, $GEN(s^{q-1}_{i_{q-1},2},1^n)$, and the sequence extends down to $GEN(s^1_{i_1,2},1^n)$. The only difference between the level-$q$ distribution defined above and $u^q_\ell(n)$ is that the sequence $GEN(s^1_{i_1,2},1^n),GEN(s^2_{i_2,2},1^n),\ldots,GEN(s^q_{i_q,2},1^n)$ used in the construction of $u^q_\ell(n)$ is replaced by a sequence of random strings.
We now define a sequence of hybrid distributions, one for each $q=0,1,\ldots$ up to the number of recursive levels. Each of them is on binary strings whose length is the final succinct representation length plus $\log n$. For every $q$, a string drawn from the $q$-th hybrid is constructed in similar fashion to the final-level string $u_{n,\ell(n)}\|\ell(n)$, except for one detail: the sequence of pseudo-random strings $GEN(s^1_{i_1,2},1^n),GEN(s^2_{i_2,2},1^n),\ldots$ (one per level) used in the construction is replaced by the sequence $r_1,\ldots,r_q,GEN(s^{q+1}_{i_{q+1},2},1^n),\ldots$. The strings $r_1,\ldots,r_q$ are chosen uniformly at random and independently, and for every $j$ ($1\le j\le q$) $r_j$ is of the same length as $GEN(s^j_{i_j,2},1^n)$. By this definition the last hybrid coincides with the last distribution of the sequence defined above, which by Observation 1 is the uniform distribution with $\ell(n)$ attached, while the 0-th hybrid is $B_{n,\ell(n)}$.
Assume towards a contradiction that there exists an algorithm $M$, whose running time is $(T_D(n)-3t(n)T_{GEN}(n))/2t(n)$ minus the Lemma 7 bound on the time needed to construct a pair of succinct representations, that for every $n$ in an infinite sequence of natural numbers $(n_1,n_2,\ldots)$ distinguishes between $B_{n,\ell(n)}$ and the corresponding uniform distribution (with $\ell(n)$ attached) with probability greater than $2n\varepsilon(n)$ times the number of recursive levels. We construct an algorithm $M'$ that runs in time $(T_D(n)-3t(n)T_{GEN}(n))/2t(n)$ and distinguishes between $GEN_{length(n),\ell(n)}$ and $UNI_{length(n),\ell(n)}$ with probability greater than $2n\varepsilon(n)$.
The algorithm $M'$:
1. Given an input $(1^n,z)$, choose uniformly at random an integer $q$ between 1 and the number of recursive levels.
2. Construct a sequence of strings, one for each level $j$, where the $j$-th string has the same length as $GEN(s^j_{i_j,2},1^n)$. If $j<q$, the $j$-th string is chosen uniformly at random. If $j>q$, it is constructed by choosing a seed $s^j_{i_j,2}$ uniformly at random, expanding it to $n$ bits via an application of GEN, and taking the appropriate prefix. Finally, the $q$-th string is set to the corresponding prefix of $z$.
3. Construct a binary string $x$, of length equal to the final succinct representation length plus $\log n$, using the same construction as that of the final-level string $u_\ell(n)\|\ell(n)$. The only difference is that the sequence of pseudo-random strings $GEN(s^1_{i_1,2},1^n),\ldots$ used in that construction is replaced by the sequence constructed in step 2.
4. Execute $M$ on input $x$ and return its answer as the output of $M'$.
Analysis: The running time of $M'$ is comprised of the construction of $x$ and the execution of $M$ on $x$ (the contribution of the other steps of $M'$ is negligible). The construction of $x$ takes no more time than the construction of a pair of succinct representations, $u=SREP1(1^n,\ell,r)$ and $w=SREP2(1^n,\ell,r)$. By Lemma 7, that step takes no more time than the two terms subtracted from the running time of $M$ above. Thus the running time of $M'$ is at most $(T_D(n)-3t(n)T_{GEN}(n))/2t(n)$.
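The rest of the analysis is a standard hybrid argument. Write $L$ for the number of recursive levels and $H^0,\ldots,H^L$ for the hybrid distributions defined above, so that $H^0=B_{n,\ell(n)}$ and $H^L$ is the uniform distribution with $\ell(n)$ attached. If $z$ is drawn from $GEN_{length(n),\ell(n)}$, then the strings of the sequence built in step 2 with index smaller than $q$ are random and the others are pseudo-random, so $x$ is distributed as $H^{q-1}$; if $z$ is drawn from $UNI_{length(n),\ell(n)}$, then $x$ is distributed as $H^q$. From the description of the algorithm we therefore obtain
$$\Pr[M'(1^n,GEN_{length(n),\ell(n)})=1]=\frac1L\sum_{q=0}^{L-1}\Pr[M(1^n,H^q)=1]$$
and
$$\Pr[M'(1^n,UNI_{length(n),\ell(n)})=1]=\frac1L\sum_{q=1}^{L}\Pr[M(1^n,H^q)=1].$$
Thus the sums telescope, and
$$\begin{aligned}\left|\Pr[M'(1^n,GEN_{length(n),\ell(n)})=1]-\Pr[M'(1^n,UNI_{length(n),\ell(n)})=1]\right|&=\frac1L\left|\Pr[M(1^n,H^0)=1]-\Pr[M(1^n,H^L)=1]\right|\\&=\frac1L\left|\Pr[M(1^n,B_{n,\ell(n)})=1]-\Pr[M(1^n,H^L)=1]\right|\\&>2n\varepsilon(n),\end{aligned}$$
since $M$ distinguishes $B_{n,\ell(n)}$ from the uniform distribution with probability greater than $2nL\varepsilon(n)$.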
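The telescoping step can be illustrated numerically. In the sketch below (hypothetical names, exact arithmetic), `h[q]` stands for $\Pr[M(1^n,H^q)=1]$; drawing $q$ uniformly from $1,\ldots,L$ and giving $M$ the hybrid $H^{q-1}$ (pseudo-random input) or $H^q$ (uniform input) yields an advantage of exactly $|h_0-h_L|/L$, whatever the intermediate values are.

```python
from fractions import Fraction
from typing import Sequence


def hybrid_advantage(h: Sequence[Fraction]) -> Fraction:
    """Advantage of M' given h[q] = Pr[M(H^q) = 1] for q = 0..L."""
    L = len(h) - 1
    on_pseudo = sum(h[0:L], Fraction(0)) / L       # z pseudo-random: M sees H^{q-1}
    on_uniform = sum(h[1:L + 1], Fraction(0)) / L  # z uniform: M sees H^q
    return abs(on_pseudo - on_uniform)


h = [Fraction(9, 10), Fraction(7, 10), Fraction(1, 2), Fraction(1, 2), Fraction(2, 5)]
print(hybrid_advantage(h))  # 1/8, i.e. |9/10 - 2/5| / 4
```

This is why an advantage greater than $2nL\varepsilon(n)$ for $M$ becomes an advantage greater than $2n\varepsilon(n)$ for $M'$.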
Lemma 11: Let $T_D:\mathbb{N}\to\mathbb{N}$ and $\varepsilon:\mathbb{N}\to[0,1]$ be two functions. Let GEN be a $(T_D(n),\varepsilon(n))$-pseudo-random generator. Let $t(n)=8(1/\varepsilon(n))^2\ln(16n/\varepsilon(n))$, let $length(n)=n$ for every $n\in\mathbb{N}$, and let $\ell:\mathbb{N}\to\mathbb{N}$ be an index function. Then the ensemble $GEN_{length,\ell}$ is $\left((T_D(n)-3t(n)T_{GEN}(n))/2t(n),\,2n\varepsilon(n)\right)$-computationally indistinguishable from $UNI_{length,\ell}$.
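The parameters of Lemma 11 can be sanity-checked numerically. The sketch below (hypothetical helper names; $t(n)$ rounded up; $T_D$ and $T_{GEN}$ treated as plain numbers) computes $t(n)=8(1/\varepsilon(n))^2\ln(16n/\varepsilon(n))$ and adds up the three costs of $D'$ accounted for in the proof: $t(n)T_{GEN}(n)$ for expanding seeds, $2t(n)$ runs of $D$, and at most $2t(n)T_{GEN}(n)$ for sampling.

```python
import math


def sample_count_eps(n: int, eps: float) -> int:
    """t(n) = 8 (1/eps)^2 ln(16 n / eps), rounded up (the rounding is an assumption)."""
    return math.ceil(8 * (1 / eps) ** 2 * math.log(16 * n / eps))


def d_prime_time(t_d: float, t_gen: float, t: int) -> float:
    """Cost of D' as counted in the proof of Lemma 11 (other steps are ignored)."""
    expand = t * t_gen                                   # t seeds expanded by GEN
    run_d = 2 * t * ((t_d - 3 * t * t_gen) / (2 * t))    # 2t runs of D
    sample = 2 * t * t_gen                               # bound for sampling seeds and strings
    return expand + run_d + sample


# With eps = 1/n^c (here n = 4, c = 1) this matches t(n) = 8 n^(2c) ln(16 n^(c+1)) of Lemma 9.
print(sample_count_eps(4, 0.25))  # 710
```

The three terms of `d_prime_time` always sum to `t_d`, which is the running-time bound $T_D(n)$ claimed for $D'$.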
Proof: This lemma is an analog of Lemma 9, and to prove it we use the same arguments as in the proof of Lemma 9. That is, we assume towards a contradiction that there exists an algorithm $D$ that runs in time $(T_D(n)-3t(n)T_{GEN}(n))/2t(n)$ and, for every $n$ in an infinite sequence of natural numbers $n_1,n_2,\ldots$, distinguishes between the distributions $GEN_{length(n),\ell(n)}$ and $UNI_{length(n),\ell(n)}$ with probability at least $2n\varepsilon(n)$. Given this assumption, the same algorithm $D'$ that we construct in the proof of Lemma 9 can be used to distinguish between the output of GEN and the uniform distribution. The analysis of the algorithm is similar to the one in the proof of Lemma 9, except that $1/n^c$ is replaced by $\varepsilon(n)$.
In Lemma 9 it sufficed that $D'$ runs in polynomial time. Here we analyze its running time more thoroughly. The most time-consuming operations undertaken by $D'$ are the expansion of $t(n)=8(1/\varepsilon(n))^2\ln(16n/\varepsilon(n))$ seeds into strings of length $n$ ($t(n)T_{GEN}(n)$ time) and executing $D$ on $2t(n)$ inputs ($T_D(n)-3t(n)T_{GEN}(n)$ time). Sampling $t(n)$ seeds and $t(n)$ random strings takes less than $2t(n)T_{GEN}(n)$ time. Thus,⁷ the total running time of $D'$ is at most $T_D(n)$.
If we analyze $D'$ using the Chernoff bound (as in Lemma 9), we see that $D'$ distinguishes between the uniform and pseudo-random distributions with at least $1/(2n)$ times the probability with which $D$ distinguishes between $GEN_{length(n),\ell(n)}$ and $UNI_{length(n),\ell(n)}$, namely with probability $\varepsilon(n)$. Therefore we have arrived at a contradiction, and the lemma is correct.
Corollary 11: Let $T_D:\mathbb{N}\to\mathbb{N}$ and $\varepsilon:\mathbb{N}\to[0,1]$ be two functions. Let GEN be a $(T_D(n),\varepsilon(n))$-pseudo-random generator. For any index function $\ell:\mathbb{N}\to\mathbb{N}$ the output ensembles of SREP1 and SREP2 are $(T_D(n),\varepsilon(n))$-pseudo-random, where
TD(n) = TD2(tn) ? (2(n) + 23 )TGEN (n) ? 2(n)
"(n) = 2n (n)"(n):
Corollary 12: Let $T_D:\mathbb{N}\to\mathbb{N}$ and $\varepsilon:\mathbb{N}\to[0,1]$ be two functions. Let GEN be a non-uniform $(T_D(n),\varepsilon(n))$-pseudo-random generator. For any index function $\ell:\mathbb{N}\to\mathbb{N}$ the output ensembles of SREP1 and SREP2 are non-uniformly $(T_D(n),\varepsilon(n))$-pseudo-random, where
TD (n) = TD (n) ? 2(n)TGEN (n) ? 2(n) ? log n
"(n) = (n)"(n):
⁷ We disregard the other components of $D'$, whose total contribution to its running time is negligible.
Proof: In our context, the important difference between a uniform and a non-uniform approach is that in the second case $\ell(n)$ can be "hard-wired" into any family of circuits $\{D_n\}_{n\in\mathbb{N}}$, which cannot necessarily be done in a uniform algorithm $D$. Thus $\ell(n)$ need not be sampled (as is done in the proofs of Lemma 9 and Lemma 11).
If GEN is a non-uniform $(T_D(n),\varepsilon(n))$-pseudo-random generator, then the probability ensembles $UNI_{length,\ell}$ and $GEN_{length,\ell}$ are $(T_D(n)-\log n,\varepsilon(n))$-computationally indistinguishable. The reason is that otherwise there exists a family $\{\tilde D_n\}_{n\in\mathbb{N}}$ such that, for every $n$ in an infinite sequence, $\tilde D_n$ is of size $T_D(n)-\log n$ and distinguishes between $UNI_{length,\ell}$ and $GEN_{length,\ell}$ with probability at least $\varepsilon(n)$. We construct a family of circuits $\{D_n\}_{n\in\mathbb{N}}$ by attaching $\ell(n)$ to $\tilde D_n$ and having $D_n$, on input $z\in\{0,1\}^n$, invoke $\tilde D_n$ on $z_{length(n)}\|\ell(n)$. $D_n$ is of size $T_D(n)$ and distinguishes between $GEN_n$ and $UNI_n$ with probability at least $\varepsilon(n)$, which is a contradiction.
The same arguments as in the proof of Lemma 10 (only with a tighter bound on the indistinguishability of $UNI_{length,\ell}$ and $GEN_{length,\ell}$) verify the statement of the corollary.
3.4 Integrating the proofs
In this section we show how the various lemmata prove Theorems 2, 3 and 4.
Proof [Theorem 2]: We define the correlated generator $(SREP1,SREP2,G)$ to be $(SREP1^q,SREP2^q,G^q)$, where the value of $q$ is determined by the desired size of the succinct representations. Therefore $n_q$ is equal to the final output length $n$, $n_{q-1}=n_q/m_q$, and so on, where $m_q$ is defined by Equation 3.4.
The correlation property of $(SREP1,SREP2,G)$ is proved by Lemma 2. The standard index indistinguishability property of $(SREP1,SREP2,G)$ is proved by Corollary 10. The length of the succinct representations is determined by $q$: Corollary 5 states that for any constant $q$ the length of the outputs of $SREP1^q(1^n,n_q,\ell_q,r^q)$ and $SREP2^q(1^n,n_q,\ell_q,r^q)$ is $O(n^{1/(q+1)})$ times the seed length of GEN. By Definition 9, GEN expands seeds of length $n^{1/(2c_{len})}$ to strings of length $n$. Setting $q=2c_{len}-1$, the length of the succinct representations is $O(n^{1/(2c_{len})}\cdot n^{1/(2c_{len})})=O(n^{1/c_{len}})$. Finally, the length of the random input required by SREP1 and SREP2 is given by Corollary 7.
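The choice $q=2c_{len}-1$ can be checked with exact exponent arithmetic: the length is $O(n^{1/(q+1)})$ times a seed of length $n^{1/(2c_{len})}$, so the exponents add. A minimal sketch (hypothetical function name):

```python
from fractions import Fraction


def succinct_length_exponent(q: int, c_len: int) -> Fraction:
    """Exponent e such that the succinct representation length is O(n^e)."""
    return Fraction(1, q + 1) + Fraction(1, 2 * c_len)


print(succinct_length_exponent(2 * 3 - 1, 3))  # 1/3
```

The same computation with $c$ in place of $c_{len}$ explains the choice $q=2c-1$ in the scheme of Theorem 13.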
Proof [Theorem 3]: We define the correlated generator $(SREP1,SREP2,G)$ to be the intermediate generator $(SREP1^q,SREP2^q,G^q)$ whose number of levels $q$ is the function of $n$ defined in Theorem 3.
The correlation property of $(SREP1,SREP2,G)$ is proved by Lemma 2. The $(T_D(n),\varepsilon(n))$-index indistinguishability property of $(SREP1,SREP2,G)$ is proved by Corollary 11. The length of the succinct representations is given by Corollary 6. The size of the random input that the succinct representations require is given by Corollary 8.
Proof [Theorem 4]: We define the correlated generator $(SREP1,SREP2,G)$ to be the intermediate generator $(SREP1^q,SREP2^q,G^q)$ whose number of levels $q$ is the function of $n$ defined in Theorem 3.
The correlation property of $(SREP1,SREP2,G)$ is proved by Lemma 2. The non-uniform $(T_D(n),\varepsilon(n))$-index indistinguishability property of $(SREP1,SREP2,G)$ is proved by Corollary 12. The length of the succinct representations is given by Corollary 6. The size of the random input that the succinct representations require is given by Corollary 8.
3.5 PIR schemes
In this section we use the ideas developed earlier to construct two-server computationally private information retrieval schemes. We begin the section with a helpful lemma and then present two CPIR schemes in the following subsections. One is a direct application of the correlated pseudo-random generator we constructed, while the other is a slightly different variation, which has some advantages over the first.
Lemma 12: Let $T_D:\mathbb{N}\to\mathbb{N}$ and $\varepsilon:\mathbb{N}\to[0,1]$. Let $P$ be a one-round, 2-server PIR scheme such that
• All queries are of the same length $q(n)$ (as in Definition 11).
• For every $j\in\{1,2\}$ and every index function $i:\mathbb{N}\to\mathbb{N}$, the probability ensemble $D_{j,i}=\{D_j(n,i(n))\}_{n\in\mathbb{N}}$ is $(T_D(n),\varepsilon(n))$-pseudo-random.
Then $P$ maintains $(T_D(n),2\varepsilon(n))$-computational privacy.
Proof: Let $i_1,i_2:\mathbb{N}\to\mathbb{N}$ be two index functions and let $j\in\{1,2\}$. Consider the three probability ensembles $D_{j,i_1}=\{D_j(n,i_1(n))\}_{n\in\mathbb{N}}$, $D_{j,i_2}=\{D_j(n,i_2(n))\}_{n\in\mathbb{N}}$ and $Uni=\{Uni_{q(n)}\}_{n\in\mathbb{N}}$, where $Uni_{q(n)}$ is distributed uniformly over $\{0,1\}^{q(n)}$. By assumption $D_{j,i_1}$ and $Uni$ are $(T_D(n),\varepsilon(n))$-computationally indistinguishable, and so are $D_{j,i_2}$ and $Uni$. By Lemma 6 the two ensembles $D_{j,i_1}$ and $D_{j,i_2}$ are $(T_D(n),2\varepsilon(n))$-computationally indistinguishable. The lemma follows from Definition 11.
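The use of Lemma 6 here is a triangle inequality on distinguishing advantages. The sketch below illustrates it for fixed 0/1 tests on small explicit distributions; the distributions over 2-bit queries and the helper names are hypothetical, and a real distinguisher is a probabilistic algorithm rather than a fixed predicate.

```python
from fractions import Fraction
from typing import Callable, Dict

Dist = Dict[str, Fraction]


def accept_prob(dist: Dist, test: Callable[[str], int]) -> Fraction:
    """Pr[test(x) = 1] for x drawn from dist."""
    return sum((p for x, p in dist.items() if test(x) == 1), Fraction(0))


def advantage(d1: Dist, d2: Dist, test: Callable[[str], int]) -> Fraction:
    """Distinguishing advantage |Pr_d1[test = 1] - Pr_d2[test = 1]|."""
    return abs(accept_prob(d1, test) - accept_prob(d2, test))


points = ["00", "01", "10", "11"]
uni = {x: Fraction(1, 4) for x in points}
d_i1 = {"00": Fraction(1, 2), "01": Fraction(1, 4), "10": Fraction(1, 8), "11": Fraction(1, 8)}
d_i2 = {"00": Fraction(1, 8), "01": Fraction(1, 8), "10": Fraction(1, 4), "11": Fraction(1, 2)}


def first_bit(x: str) -> int:
    return int(x[0] == "1")


# adv(D_i1, D_i2) = 1/2, and adv(D_i1, U) + adv(D_i2, U) = 1/4 + 1/4
print(advantage(d_i1, d_i2, first_bit), advantage(d_i1, uni, first_bit) + advantage(d_i2, uni, first_bit))
```

Since each query ensemble is within $\varepsilon(n)$ of uniform, any two of them are within $2\varepsilon(n)$ of each other, which is the privacy bound of the lemma.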
3.5.1 A direct scheme
In this subsection we use the correlated generator $(SREP1,SREP2,G)$ in order to obtain an efficient PIR scheme. The scheme is similar to the simple linear scheme of [25]. The only difference is that the long, fully random queries are replaced by short succinct representations, which are expanded at the server end by the generator described in Section 3.1. The notation we use in the following theorem is defined in Section 3.1.
Theorem 13: If there exists a pseudo-random generator GEN, then for every constant $c>1$ there exists a two-server, one-round, computationally private information retrieval scheme $P$ in which the user sends each server $O(n^{1/c})$ bits of communication and receives one bit in return.
Proof:
The Scheme: Suppose the user wishes to retrieve $x_\ell$, the value of the $\ell$-th bit of the database.
Algorithm Q: The user constructs two queries, $u\triangleq SREP1(1^n,\ell,r)$ and $w\triangleq SREP2(1^n,\ell,r)$, where the correlated generator $(SREP1,SREP2,G)$ is as constructed in Theorem 2 ($u=SREP1^q(1^n,n_q,\ell_q,r^q)$, $w=SREP2^q(1^n,n_q,\ell_q,r^q)$ and $q=2c-1$). $u$ is sent to $DB_1$ and $w$ is sent to $DB_2$.
Algorithm A: $DB_1$ computes the $n$-bit string $G(u,1^n)\triangleq U_1,\ldots,U_n$ and $DB_2$ computes the $n$-bit string $G(w,1^n)\triangleq W_1,\ldots,W_n$. $DB_1$ computes a bit $b_1$, and $DB_2$ computes a bit $b_2$:
$$b_1\triangleq\bigoplus_{j:\,x_j=1}U_j,\qquad b_2\triangleq\bigoplus_{j:\,x_j=1}W_j.$$
Finally, $b_1$ and $b_2$ are sent to $U$.
Algorithm R: $U$ calculates $x_\ell=b_1\oplus b_2$.
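All the elements of the proof follow from the statement of Theorem 2. By that theorem we have:
• By the choice of $u$ and $w$, $G(u,1^n)\oplus G(w,1^n)=e_{\ell(n)}$, the $n$-bit string that is 1 only in position $\ell(n)$. Hence $U_j\oplus W_j=1$ if and only if $j=\ell$, and therefore $b_1\oplus b_2=\bigoplus_{j:\,x_j=1}(U_j\oplus W_j)=x_\ell$.
• The computational privacy of the scheme is implied by Theorem 2 together with Lemma 12.
• The number of bits that $U$ sends to each server is $O(n^{1/c})$.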
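Algorithms A and R can be sketched in Python. To keep the sketch self-contained it does not implement the correlated generator: the two expanded strings $G(u,1^n)$ and $G(w,1^n)$ are produced directly as a uniformly random string and the same string with position $\ell$ flipped, which is exactly the property $G(u,1^n)\oplus G(w,1^n)=e_{\ell(n)}$ used above (and corresponds to the long, fully random queries of the scheme of [25]). Indices are 0-based, and the function names are hypothetical.

```python
import secrets
from typing import List, Tuple


def expanded_queries(n: int, ell: int) -> Tuple[List[int], List[int]]:
    """Stand-ins for G(u, 1^n) and G(w, 1^n): a random U and W = U xor e_ell."""
    u = [secrets.randbelow(2) for _ in range(n)]
    w = u.copy()
    w[ell] ^= 1
    return u, w


def server_answer(x: List[int], expanded: List[int]) -> int:
    """Algorithm A: XOR of the expanded query bits at the positions j with x_j = 1."""
    b = 0
    for xj, bit in zip(x, expanded):
        if xj == 1:
            b ^= bit
    return b


x = [1, 0, 1, 1, 0, 0, 1, 0]
U, W = expanded_queries(len(x), 3)
print(server_answer(x, U) ^ server_answer(x, W))  # algorithm R recovers x[3] = 1
```

In this sketch each server alone sees a uniformly random $n$-bit string; in the actual scheme each server sees only a short succinct representation, and privacy rests on Theorem 2 and Lemma 12.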
Corollary 14: Let $T_D:\mathbb{N}\to\mathbb{N}$ and let $\varepsilon:\mathbb{N}\to[0,1]$. Let GEN be a $(T_D(n),\varepsilon(n))$-pseudo-random generator that stretches its seeds to $n$ bits. Let $t(n)$ and the other parameters be defined as in Theorem 3. Then there exists a one-round, $(T_D(n),\varepsilon(n))$-computationally private information retrieval scheme $P$ such that
TD (n) = 2Tt(n) ? (2(n) + 23 )TGEN ? 2(n) and "(n) = 4n (n)"(n):
The number of bits that U sends to each server is (n) (n) (n) 2(n).
D
If GEN is a non-uniform $(T_D(n),\varepsilon(n))$-pseudo-random generator then
TD(n) = TD ? 2(n)TGEN ? 2(n) ? log n and " = 2(n)"(n):
Proof: Replace the results of the standard version (Theorem 2) by the results of
the quantitative versions (Theorem 3 or Theorem 4).
3.5.2 A Generic Transformation
In this subsection we describe a construction of a wide class of CPIR schemes. We begin with a scheme that has the properties required in Subsection 2.3⁸ and, furthermore, whose query ensembles $D_{1,\ell}$ and $D_{2,\ell}$ are pseudo-random for any index function $\ell(n)$. We construct a different $PIR(n)$ scheme that has these same properties. We denote the original scheme by $P$ and refer to the scheme we construct as the new scheme. $P$ and the new scheme differ in three major aspects: communication complexity, computational complexity and degree of privacy.
The construction of the new scheme from $P$ is very similar to the construction of the intermediate generator $(SREP1^q,SREP2^q,G^q)$ from the generator $(SREP1^{q-1},SREP2^{q-1},G^{q-1})$ in Section 3.1. The query a user sends to each server in the new scheme resembles a succinct representation for $G^q$. The main difference between the generator and the scheme is the way this succinct representation/query is interpreted. $G^q$ interprets the succinct representation as $m_q$ succinct representations for the generator $G^{q-1}$ (denoted in Section 3.1 by $v_1,\ldots,v_{m_q}$). In the new scheme, a server interprets the query it receives as $m$ queries in the original scheme $P$.
Notation 11: We denote by P (n); (n) the communication complexity of the user
in P and respectively. We denote by P (n); (n) the communication complexity
of a server in P and respectively. We use QP ; AP and RP to denote the Q; A and
R algorithms of the one-round scheme $P$ (see Definition 1). Similarly, we use $Q$, $A$ and $R$, subscripted by the new scheme, to denote the $Q$, $A$ and $R$ algorithms of the new one-round scheme. We denote by $T_P(n)$ the computational complexity of $Q_P(1^n,i,r)$.
The following theorem describes the properties of the new scheme in terms of the properties of a pseudo-random generator GEN and of $P$ itself. For expository purposes we only
⁸ That is, the scheme has to be a 2-server, one-round $(T_D(n),\varepsilon(n))$-computational PIR scheme in which symmetry of the servers is maintained and the servers answer any query of the right length.
concentrate on the case in which GEN is a non-uniform pseudo-random generator and $P$ is a non-uniform CPIR scheme. The construction we present works well enough in the other cases (for instance, GEN being a standard pseudo-random generator and $P$ being a standard CPIR scheme). However, in such cases the privacy property of the new scheme has to be changed from what is stated in Theorem 15. The revised statement can easily be deduced using the tools we developed in Section 3.3. For instance, in the example we quoted, the new scheme will be a standard CPIR scheme.
Theorem 15: Let TD1 ; TD2 : IN ?! IN , and let ; " : IN ?! [0; 1]. Let GEN be a
non-uniform TD1 (n); (n){pseudo random generator with stretch (n) to n. Let P be
a one round, 2-server PIR scheme such that
All queries are of the same length P (n), symmetry of the databases is maintained and any query of the right length is answered.
For every $j\in\{1,2\}$ and every index function $i:\mathbb{N}\to\mathbb{N}$, the probability ensemble $D_{j,i}=\{D_j(n,i(n))\}_{n\in\mathbb{N}}$ is non-uniformly $(T_{D_2}(n),\varepsilon(n))$-pseudo-random.
Define $T_D:\mathbb{N}\to\mathbb{N}$ by $T_D(n)=\min\{T_{D_1}(n),T_{D_2}(n)\}$ for every $n\in\mathbb{N}$. Then, for every $m\in[n]$ there exists a one-round, two-server PIR scheme, indexed by $m$, such that
All queries are of the same length (n), symmetry of the databases is maintained and any query of the right length is answered.
For every j (j 2 f1; 2g) and for every index function i : IN ?! IN the probability ensemble Dj;i = fDj (n; i(n))gn2IN is non-uniformly TD (n) ? (TP (n=m) +
TGEN (n) + (n) + log n); + "{pseudo random.
and the communication complexity is as follows:
n)
(n) = m((n) + 1) + 2P ( m
n)
(n) = 2P ( m
We assume from here on that the value of m is set and denote m simply as .
Application: Construction of computationally private schemes with more flexibility than the direct scheme of Subsection 3.5.1 in terms of the relation between privacy and communication complexity. For instance, suppose that the user requires a high level of security. He is willing to use a scheme that is $T(n), 2\varepsilon(n)$-computationally private (for some reasonable $T$), but not a scheme that is $T'(n), 4\varepsilon(n)$-computationally private. Given a $T_D(n), \varepsilon(n)$-pseudo-random generator $GEN$ and using a construction similar to that of Subsection 3.5.1, we can obtain a scheme with communication complexity $O((\kappa(n) n)^{1/2})$ that is $T_D - 2T_{GEN} - 2\kappa(n) - \log n, 2\varepsilon(n)$-computationally private. We achieve this by changing the correlated generator invoked in the scheme from $S_{REP1}, S_{REP2}, G$ to $S^1_{REP1}, S^1_{REP2}, G^1$. However, using the generic transformation of Theorem 15 with the scheme $B_2$ (see Subsection 2.4) as $P$, we can obtain a $T_D(n) - O(n^{1/4}), 2\varepsilon(n)$-computationally private scheme with improved communication complexity $O(\kappa(n)^{1/2} n^{1/4})$, by choosing $m = n^{1/2}$.
Proof: We begin with a high level description of $\Gamma$. The data $x \in \{0,1\}^n$ is viewed as an $m \times \lceil n/m \rceil$ binary matrix. Suppose $U$ wishes to retrieve the $\ell$-th item, which is in the $(i,j)$-th position of the two dimensional matrix. Each of the $m$ rows of the matrix is regarded as a separate database of $\lceil n/m \rceil$ bits. We denote the rows by $x^{(1)}, \ldots, x^{(m)}$.
The user and servers execute the scheme $P$ $m$ times in parallel. In each execution the database is one of the rows of the matrix. $U$ sends a query $u$ to $DB_1$ and a query $w$ to $DB_2$. $DB_1$ interprets $u$ as $m$ queries for $P$, $v_1, \ldots, v_{i-1}, v^1_i, v_{i+1}, \ldots, v_m$, and $DB_2$ interprets $w$ as $m$ queries $v_1, \ldots, v_{i-1}, v^2_i, v_{i+1}, \ldots, v_m$.
For every $h = 1, \ldots, m$ each server executes its part of the scheme $P$, where the database is $x^{(h)}$ and the query is the $h$-th of the interpreted queries. $DB_1$ computes $m$ answers $o_1, \ldots, o_{i-1}, o^1_i, o_{i+1}, \ldots, o_m$, and $DB_2$ computes $m$ answers $o_1, \ldots, o_{i-1}, o^2_i, o_{i+1}, \ldots, o_m$.
The $m$ answers computed by $DB_1$ are identical to those computed by $DB_2$ for every row of the matrix except the $i$-th, because the queries they received are identical and $P$ maintains symmetry of the servers. The answers $o^1_i$ and $o^2_i$ together allow $U$ to retrieve the desired bit. In the last part of the scheme the servers send $U$ two short strings that allow the user to learn both $o^1_i$ and $o^2_i$.
The Scheme $\Gamma$:
Algorithm $Q_\Gamma$: $U$ chooses a subset $S \subseteq [m]$ uniformly at random. The user also chooses, uniformly at random and independently, $m + 1$ seeds for the generator $GEN$,
$$s_1, \ldots, s_{i-1}, s^1_i, s^2_i, s_{i+1}, \ldots, s_m \in \{0,1\}^{\kappa(n)}.$$
$U$ executes the algorithm $Q_P(1^{n/m}, j, r)$. That is, $U$ invokes the query algorithm of $P$ where the length of the database is $n/m$ and the desired item is the $j$-th bit. The output of $Q_P$ is two legal queries of the scheme $P$, $u_j$ and $w_j$. The user computes two correction words as follows:
$$cw_1 = \begin{cases} u_j \oplus GEN(s^1_i, 1^n) & \text{if } i \in S \\ w_j \oplus GEN(s^2_i, 1^n) & \text{if } i \notin S \end{cases} \qquad cw_2 = \begin{cases} w_j \oplus GEN(s^2_i, 1^n) & \text{if } i \in S \\ u_j \oplus GEN(s^1_i, 1^n) & \text{if } i \notin S \end{cases}$$
$U$ constructs the two queries of $\Gamma$ as follows:
$$u = s_1 \| \cdots \| s_{i-1} \| s^1_i \| s_{i+1} \| \cdots \| s_m \| cw_1 \| cw_2 \| S$$
$$w = s_1 \| \cdots \| s_{i-1} \| s^2_i \| s_{i+1} \| \cdots \| s_m \| cw_1 \| cw_2 \| S \oplus \{i\}$$
$u$ is sent to $DB_1$ and $w$ to $DB_2$.
Algorithm $A_\Gamma$: Each server receives a string of the form $s_1 \| \cdots \| s_m \| cw_1 \| cw_2 \| T$. Each server computes $v_h$, $h = 1, \ldots, m$, as follows:
$$v_h = \begin{cases} cw_1 \oplus GEN(s_h, 1^n) & \text{if } h \in T \\ cw_2 \oplus GEN(s_h, 1^n) & \text{if } h \notin T \end{cases}$$
Each server executes $A_P$ $m$ times in parallel. In execution number $h$, the database is the row $x^{(h)}$ and the user's query is $v_h$. Every $v_h$ is a legitimate query because of the assumption on $P$. Thus, each server produces an answer for each execution of $A_P$. $DB_1$ produces $o_1, \ldots, o_{i-1}, o^1_i, o_{i+1}, \ldots, o_m$ and $DB_2$ produces $o_1, \ldots, o_{i-1}, o^2_i, o_{i+1}, \ldots, o_m$. The answers are identical for every invocation except the $i$-th, because $P$ satisfies the symmetry of servers requirement.
$DB_1$ computes
$$A_1 \triangleq \sum_{h \in S} o_h, \qquad B_1 \triangleq \sum_{h \in \bar{S}} o_h$$
and $DB_2$ computes
$$A_2 \triangleq \sum_{h \in S \oplus \{i\}} o_h, \qquad B_2 \triangleq \sum_{h \in \overline{S \oplus \{i\}}} o_h,$$
where all sums are bitwise exclusive-or sums. All four sums $A_1, B_1, A_2, B_2$ are sent to $U$.
Algorithm $R_\Gamma$: $U$ can now compute the desired replies from the scheme $P$, namely $o^1_i$ and $o^2_i$. If $i \in S$, then $o^1_i = A_1 \oplus A_2$ and $o^2_i = B_1 \oplus B_2$. If $i \notin S$, then $o^1_i = B_1 \oplus B_2$ while $o^2_i = A_1 \oplus A_2$. Given these two replies, $U$ can use $R_P$ to retrieve the $j$-th bit of the $i$-th row, i.e. $x_\ell$: $U$ executes $R_P(1^{n/m}, j, r, o^1_i, o^2_i)$ and the result is $x_\ell$.
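The following Python sketch runs the whole scheme end to end on a toy instance, to make the bookkeeping with $S$, $S \oplus \{i\}$ and the two correction words concrete. It is an illustration under stated assumptions, not the thesis's implementation: the base scheme $P$ is taken to be the classic 2-server XOR scheme (a query is a random subset of the row, the answer is the XOR of the selected bits), which is symmetric and answers every query of the right length, and $GEN$ is replaced by a hypothetical stand-in built from SHAKE-128 that is not claimed to meet the pseudo-randomness requirement of Theorem 15. Indices are 0-based, and the names `gen`, `SEED_BYTES`, `q_gamma`, `a_gamma` and `r_gamma` are illustrative.

```python
import hashlib
import secrets

SEED_BYTES = 16  # hypothetical seed length, playing the role of kappa(n)


def gen(seed, nbits):
    """Hypothetical stand-in for GEN: stretch a seed to nbits bits with SHAKE-128."""
    stream = hashlib.shake_128(seed).digest((nbits + 7) // 8)
    return [(stream[b // 8] >> (b % 8)) & 1 for b in range(nbits)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


# Base scheme P (assumption): the 2-server XOR scheme on an N-bit row.
# Equal queries give equal answers, and every N-bit vector is a legal query.
def q_p(N, j):
    u = [secrets.randbelow(2) for _ in range(N)]
    w = u.copy()
    w[j] ^= 1  # u and w differ exactly in position j
    return u, w


def a_p(query, row):
    acc = 0
    for q, bit in zip(query, row):
        acc ^= q & bit
    return acc


def r_p(o1, o2):
    return o1 ^ o2


def q_gamma(m, N, i, j):
    """User side: the two queries, plus the subset S that the user keeps."""
    S = {h for h in range(m) if secrets.randbelow(2)}
    seeds = [secrets.token_bytes(SEED_BYTES) for _ in range(m)]
    s1, s2 = seeds[i], secrets.token_bytes(SEED_BYTES)  # m + 1 seeds in total
    uj, wj = q_p(N, j)
    if i in S:
        cw1, cw2 = xor(uj, gen(s1, N)), xor(wj, gen(s2, N))
    else:
        cw1, cw2 = xor(wj, gen(s2, N)), xor(uj, gen(s1, N))
    seeds2 = seeds.copy()
    seeds2[i] = s2
    return (seeds, cw1, cw2, S), (seeds2, cw1, cw2, S ^ {i}), S


def a_gamma(query, rows):
    """Server side: interpret m queries, answer every row, return two XOR sums."""
    seeds, cw1, cw2, T = query
    N = len(cw1)
    A = B = 0
    for h, row in enumerate(rows):
        v_h = xor(cw1 if h in T else cw2, gen(seeds[h], N))
        if h in T:
            A ^= a_p(v_h, row)
        else:
            B ^= a_p(v_h, row)
    return A, B


def r_gamma(i, S, ans1, ans2):
    (A1, B1), (A2, B2) = ans1, ans2
    o1, o2 = (A1 ^ A2, B1 ^ B2) if i in S else (B1 ^ B2, A1 ^ A2)
    return r_p(o1, o2)
```

Each server returns only the two bits $A$ and $B$ whatever the value of $m$, while the user sends $m$ seeds, two correction words and the set $S$ (represented here as a Python set rather than an $m$-bit string).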
In order to prove the theorem we show that the retrieved bit is indeed $x_\ell$, that the privacy requirement is satisfied and that the communication complexity is as stated.
Correct retrieval: The proof that the retrieved bit is indeed $x_\ell$ is based on the fact that $P$ is a one round $PIR(n)$ scheme. Suppose $DB_1$ receives $u_j$, $DB_2$ receives $w_j$ and the database is $x^{(i)}$. Then, by the correctness property of $P$ we have
$$R_P(1^{n/m}, j, r, A_P(1, u_j, x^{(i)}), A_P(2, w_j, x^{(i)})) = x^{(i)}_j = x_\ell.$$
The queries each server uses in the $m$ executions of $P$ are $v_1, \ldots, v_m$, which are computed in algorithm $A_\Gamma$. Consider some row $h$ that is different from $i$. Since $h \neq i$, either $h \in T$ for both servers or $h \notin T$ for both servers. Thus, both servers compute the same "interpreted query" $v_h$, which is:
$$v_h = \begin{cases} u_j \oplus GEN(s^1_i, 1^n) \oplus GEN(s_h, 1^n) & h \in S,\ i \in S \\ w_j \oplus GEN(s^2_i, 1^n) \oplus GEN(s_h, 1^n) & h \notin S,\ i \in S \\ w_j \oplus GEN(s^2_i, 1^n) \oplus GEN(s_h, 1^n) & h \in S,\ i \notin S \\ u_j \oplus GEN(s^1_i, 1^n) \oplus GEN(s_h, 1^n) & h \notin S,\ i \notin S \end{cases}$$
For the $i$-th row ($h = i$) the query that $DB_1$ produces is:
$$v_i = \begin{cases} cw_1 \oplus GEN(s^1_i, 1^n) = u_j \oplus GEN(s^1_i, 1^n) \oplus GEN(s^1_i, 1^n) = u_j, & i \in S \\ cw_2 \oplus GEN(s^1_i, 1^n) = u_j \oplus GEN(s^1_i, 1^n) \oplus GEN(s^1_i, 1^n) = u_j, & i \notin S \end{cases}$$
A similar argument shows that the $i$-th interpreted query of $DB_2$ is $w_j$ regardless of whether $i \in S$.
In the next phase of the scheme each server uses the interpreted queries and executes $A_P$. For each row $h \neq i$, both servers have the same query $v_h$. By assumption, that query is valid in the scheme $P$, and the answers of both servers on an identical query are the same. Therefore, for any row $h \neq i$ both servers produce the same answer $o_h$. The answers of $DB_1$ and $DB_2$ when executing $P$ on $x^{(i)}$ are $o^1_i$ and $o^2_i$ respectively, which together allow the user to retrieve $x_\ell$.
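The messages that the servers send the user are:
$$A_1 = \begin{cases} o^1_i \oplus \sum_{h \in S \setminus \{i\}} o_h & i \in S \\ \sum_{h \in S} o_h & i \notin S \end{cases} \qquad B_1 = \begin{cases} \sum_{h \in \bar{S}} o_h & i \in S \\ o^1_i \oplus \sum_{h \in \bar{S} \setminus \{i\}} o_h & i \notin S \end{cases}$$
$$A_2 = \begin{cases} \sum_{h \in S \setminus \{i\}} o_h & i \in S \\ o^2_i \oplus \sum_{h \in S} o_h & i \notin S \end{cases} \qquad B_2 = \begin{cases} o^2_i \oplus \sum_{h \in \bar{S}} o_h & i \in S \\ \sum_{h \in \bar{S} \setminus \{i\}} o_h & i \notin S \end{cases}$$
It follows that $U$ obtains the desired answers $o^1_i, o^2_i$ by the two exclusive-or operations $A_1 \oplus A_2$ and $B_1 \oplus B_2$. Thus, the user can retrieve $x_\ell$ by invoking $R_P$.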
Privacy: We prove the privacy of the scheme using arguments similar to those we employed in Section 3.3, and therefore give only an outline of the proof. We prove the privacy with respect to $DB_1$ explicitly, since the privacy with respect to $DB_2$ can be argued in a symmetric manner.
The only message $DB_1$ receives is the string $u_{n,\ell(n)} = s_1 \| \cdots \| s_m \| cw_1 \| cw_2 \| S$. All the elements except $cw_1$ and $cw_2$ are chosen uniformly at random and independently. Let $\ell : \mathbb{N} \to \mathbb{N}$ be an index function. We check how difficult it is to distinguish $u_{n,\ell(n)}$ from $h_{n,\ell(n)} = s_1 \| \cdots \| s_m \| cw_1 \| r_2 \| S$, and to distinguish $h_{n,\ell(n)}$ from $f_{n,\ell(n)} = s_1 \| \cdots \| s_m \| r_1 \| r_2 \| S$, where $r_1$ and $r_2$ are chosen uniformly at random and independently. We define three distributions: $D_1(n, \ell(n))$ is induced by the strings $u_{n,\ell(n)}$, $H_{n,\ell(n)}$ is induced by the strings $h_{n,\ell(n)}$, and $Uni(n)$ is the uniform distribution on $\{0,1\}^{\alpha_\Gamma(n)}$, which is induced by $f_{n,\ell(n)}$. We define three ensembles: $D_{1,\ell} = \{D_1(n, \ell(n))\}_{n \in \mathbb{N}}$, $H_\ell = \{H_{n,\ell(n)}\}_{n \in \mathbb{N}}$ and $Uni$, the uniform ensemble.
We assume that $i \in S$ (the case $i \notin S$ is symmetric). Starting with a string that is drawn from either the pseudo-random distribution (i.e. $GEN(s^2_i, 1^n)$) or the uniform distribution (i.e. $r_2$), we can construct a circuit which transforms it into a string drawn from $D_1(n, \ell(n))$ or $H_{n,\ell(n)}$, respectively. The size of the circuit needed for this transformation is bounded by $T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n$ (as in Section 3.3, the $\log n$ summand is derived from the value $\ell(n)$, which is wired into the circuit). Therefore, by the reasoning of Lemma 5, $D_{1,\ell}$ and $H_\ell$ are non-uniformly $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n), \delta(n)$-indistinguishable.
Starting with a string that is drawn from either the query distribution of $P$ (i.e. $u_j$) or the uniform distribution (i.e. $r_1$), we can construct a circuit which transforms it into a string drawn from $H_{n,\ell(n)}$ or $Uni(n)$, respectively. The time needed for this transformation is bounded by $T_{GEN}(n) + \kappa(n)$. Therefore, by Lemma 5, $H_\ell$ and $Uni$ are non-uniformly $T_D(n) - (T_{GEN}(n) + \kappa(n) + \log n), \varepsilon(n)$-indistinguishable.
Hence, by the same reasoning as in Lemma 6, $D_{1,\ell}$ is $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n), \delta(n) + \varepsilon(n)$-pseudo-random. By Lemma 12, $\Gamma$ is $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n), 2(\delta(n) + \varepsilon(n))$-computationally private.
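The combination step attributed to Lemma 6 is, in essence, a triangle inequality through the hybrid $H_{n,\ell(n)}$. A sketch of the calculation, assuming $C$ is any circuit of size at most $T_D(n) - (T_P(n/m) + T_{GEN}(n) + \kappa(n) + \log n)$ and $n$ is sufficiently large:

$$\begin{aligned} \bigl|\Pr[C(D_1(n,\ell(n))) = 1] - \Pr[C(Uni(n)) = 1]\bigr| &\le \bigl|\Pr[C(D_1(n,\ell(n))) = 1] - \Pr[C(H_{n,\ell(n)}) = 1]\bigr| \\ &\quad + \bigl|\Pr[C(H_{n,\ell(n)}) = 1] - \Pr[C(Uni(n)) = 1]\bigr| \\ &< \delta(n) + \varepsilon(n). \end{aligned}$$

The first term is bounded by the pseudo-randomness of $GEN$ and the second by the privacy of $P$; the circuit-size bound is the smaller of the two bounds used in those steps.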
Communication complexity: Each server receives $m$ seeds of length $\kappa(n)$, one set of length $m$, and two strings that have the same length as a query in the scheme $P$ where the database is of length $n/m$. The sum is $m(\kappa(n) + 1) + 2\alpha_P(n/m)$. Each server answers with two strings, each of the same length as an answer in the scheme $P$ where the database is of length $n/m$, i.e. $2\beta_P(n/m)$ bits in total.
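To see how the choice of $m$ trades the $m(\kappa(n)+1)$ term against $2\alpha_P(n/m)$, the short Python sketch below evaluates the stated per-server costs and searches for the best $m$. The instantiation is an assumption for illustration: $P$ is taken to be the 2-server XOR scheme, with $\alpha_P(N) = N$ and $\beta_P(N) = 1$, and row lengths are rounded up to $\lceil n/m \rceil$. With these choices the optimum lies near $m \approx \sqrt{2n/(\kappa(n)+1)}$, which gives $O((\kappa(n) n)^{1/2})$ communication. The function names are hypothetical.

```python
def gamma_costs(n, m, kappa, alpha_p, beta_p):
    """Per-server communication for a given m, as in Theorem 15:
    (user -> server, server -> user) = (m(kappa+1) + 2 alpha_P(n/m), 2 beta_P(n/m))."""
    row_len = -(-n // m)  # ceil(n / m) with integer arithmetic
    return m * (kappa + 1) + 2 * alpha_p(row_len), 2 * beta_p(row_len)


def best_m(n, kappa, alpha_p, beta_p):
    """Brute-force the m in [1, n] minimising total communication per server."""
    return min(range(1, n + 1),
               key=lambda m: sum(gamma_costs(n, m, kappa, alpha_p, beta_p)))


# Hypothetical base scheme: 2-server XOR scheme (N-bit query, 1-bit answer).
def xor_alpha(N):
    return N


def xor_beta(N):
    return 1
```

For $n = 2^{16}$ and $\kappa = 128$ the search returns $m = 32$, with 8224 bits sent to each server and 2 bits returned, compared with 65536 query bits per server for the XOR scheme on its own.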
3.6 Concluding Remarks and Open Problems
Following the introduction of computational PIR in the first version of our work [22], Kushilevitz and Ostrovsky [58] realized that the $\Omega(n)$ lower bound on single server PIR schemes, proved in [25] for the information theoretic setting, does not hold in the computational setting. They constructed an elegant single server computational PIR scheme, based on the intractability assumption of quadratic residuosity. More efficient single server CPIR schemes, based on different intractability assumptions, were later presented by Mann [60], Stern [72] and Cachin, Micali and Stadler [17]. All these schemes are computationally private and rely on specific number theoretic intractability assumptions that in particular imply the existence of trapdoor one-way functions. Indeed, it has recently been shown in [10, 33] that single server CPIR schemes with sub-linear communication imply the existence of trapdoor one-way functions. Our two server construction, on the other hand, relies on the weaker assumption that one-way functions exist.
Given a generator that expands $k = \kappa(n)$ bits to $n$ bits, the computational complexity of our correlated pseudo-random generator (and of the resulting two server CPIR schemes) is approximately $2^{\sqrt{\log(n/k)}}$. It is an interesting open question to find a construction which brings this complexity down to a polynomial in $k + \log n$.
Chapter 4
Private information retrieval by keywords
In this chapter we introduce the problem of private information retrieval by keywords
and present several solutions. Our solutions are in the form of reductions from PrivatE
information Retrieval by KeYwords (PERKY) schemes to PIR schemes. We also show
reductions between the symmetric problems (SPIR to SPERKY).
We begin with a number of definitions. Among them are definitions of PIR and CPIR, which are generalized versions of the previous definitions (in Section 2). In this chapter we view the database $x = x_1, \ldots, x_n$ as $n$ blocks of $\ell$ bits each. That is, for every $i \in [n]$ we have $x_i \in \{0,1\}^\ell$. Previously the block size was 1 in all our discussions.
4.1 Definitions and Notation
4.1.1 The PIR model
Let $DB_1, \ldots, DB_k$ be $k$ servers, each holding a copy of a database $x = x_1 \ldots x_n$ where for every $i \in [n]$, $x_i$ is a block of $\ell$ bits. A user, denoted by $U$, wishes to retrieve one of the blocks $x_i$ ($1 \le i \le n$). The servers can communicate with the user, but not with each other.
The following is a definition of private information retrieval in a somewhat limited setting, which suffices for our purposes. We define a scheme that achieves information-theoretic privacy, is executed in one round of communication and assures the user of correct retrieval with probability 1.
Definition 21: $P$ is a one round, $k$-server PIR scheme maintaining information-theoretic privacy if it is a trio of algorithms:
$Q(n, \ell, i, r)$: The query algorithm. It receives as input $n$, the number of data blocks; $\ell$, the length of each block; $i$, the retrieval index; and $r$, a random input. Its output is a $k$-tuple of queries $(q_1, \ldots, q_k)$.
$A(j, q_j, x)$: The answer algorithm. It receives as input a server number $j$ ($j \in [k]$), a query $q_j$ and the database $x$. Its output is an answer $a_j$.
$R(n, \ell, i, r, a_1, \ldots, a_k)$: The reconstruction algorithm. It receives as input the database parameters $n$ and $\ell$, the retrieval index, the random input and the $k$ answers. Its output is a single block of $\ell$ bits.
The user is defined by the two algorithms $Q$ and $R$. The $j$-th server is defined by the algorithm $A_j$, which is $A(j, \cdot, \cdot)$ (the algorithm $A$ with its first argument fixed to $j$).
$P$ involves three steps: $U$ uses $Q$ to generate $k$ queries and sends one query to each server ($q_j$ to $DB_j$). Each server replies with an answer ($DB_j$ uses $A_j$ to generate $a_j$). Finally, $U$ uses $R$ to reconstruct $x_i$ from the answers.
$P$ must have the following two properties:
Correctness: For all $n, \ell \in \mathbb{N}$, $x \in (\{0,1\}^\ell)^n$, $i \in [n]$ and $r$, if $Q(n, \ell, i, r)$ outputs $(q_1, \ldots, q_k)$ then:
$$R(n, \ell, i, r, A_1(q_1, x), \ldots, A_k(q_k, x)) = x_i.$$
Privacy: Let $D_j(n, \ell, i)$ denote the distribution of the query algorithm's output restricted to its $j$-th entry, as induced by the random choices of $r$ (in other words, the distribution of $q_j$). Then, for every $i_1, i_2$ and $j$, where $1 \le i_1, i_2 \le n$ and $j \in [k]$, we require that
$$D_j(n, \ell, i_1) = D_j(n, \ell, i_2).$$
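As a concrete instance of Definition 21, the Python sketch below implements the trio $Q$, $A$, $R$ for the classic 2-server scheme in which $q_1$ is a uniformly random subset of $[n]$, $q_2$ is $q_1$ with the membership of $i$ flipped, and each server returns the XOR of the selected $\ell$-bit blocks. This scheme is our choice of example, not one defined in this chapter. Blocks are encoded as Python integers, indices are 0-based, and `query_distribution` is a hypothetical helper that enumerates all random inputs $r$ to check the privacy condition exactly for small $n$.

```python
from collections import Counter
from itertools import product


def Q(n, ell, i, r):
    """Query algorithm; r is the random input, a tuple of n bits (a random subset of [n])."""
    q1 = tuple(r)
    q2 = tuple(b ^ (h == i) for h, b in enumerate(r))  # flip position i only
    return q1, q2


def A(j, q, x):
    """Answer algorithm (identical for both servers): XOR of the selected blocks."""
    acc = 0
    for bit, block in zip(q, x):
        if bit:
            acc ^= block
    return acc


def R(n, ell, i, r, a1, a2):
    """Reconstruction: the two answers differ exactly by the block x_i."""
    return a1 ^ a2


def query_distribution(n, ell, i, j):
    """Exact distribution D_j(n, ell, i) of q_j over all 2^n random inputs r."""
    return Counter(Q(n, ell, i, r)[j - 1] for r in product((0, 1), repeat=n))
```

Each server receives $n$ bits and returns $\ell$ bits, so this instance has communication $2n + 2\ell$.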
4.1.2 The CPIR model
We begin by defining computational indistinguishability of two probability ensembles.
Definition 22: A probability ensemble $Y$ is an enumerated sequence of probability distributions $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$, where each distribution $Y_{n,\ell}$ ranges over some domain $D_{n,\ell}$.
Definition 23: Let $\ell ength : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $\ell ength(n, \ell) \le n\ell$. Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. Let $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ and $Z = \{Z_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_{n,\ell}$ and $Z_{n,\ell}$ is $\{0,1\}^{\ell ength(n,\ell)}$. We say that $Y$ and $Z$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable if the following holds: for every distinguisher $D$, a probabilistic algorithm whose running time is bounded by $T_D(\cdot)$, there is a $c_0$ such that for all $n, \ell$ where $n\ell \ge c_0$
$$\left|\Pr[D(Y_{n,\ell}, 1^{n\ell}) = 1] - \Pr[D(Z_{n,\ell}, 1^{n\ell}) = 1]\right| < \varepsilon(n\ell)$$
where the probability is taken over the distributions $Y_{n,\ell}$, $Z_{n,\ell}$ and over the coin tosses of $D$.
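The quantity bounded by $\varepsilon(n\ell)$ can be computed exactly when the distributions are finite and explicit. The Python sketch below does this for a deterministic distinguisher at a single fixed $(n, \ell)$; it deliberately ignores running time, the input $1^{n\ell}$ and the coin tosses of $D$, so it only illustrates the advantage term, not the full asymptotic definition. The distributions `Y`, `Z` and the helper `advantage` are hypothetical examples.

```python
from fractions import Fraction


def advantage(distinguisher, Y, Z):
    """|Pr[D(Y)=1] - Pr[D(Z)=1]| for finite distributions given as {string: probability}."""
    p_y = sum(p for s, p in Y.items() if distinguisher(s) == 1)
    p_z = sum(p for s, p in Z.items() if distinguisher(s) == 1)
    return abs(p_y - p_z)


# Toy distributions at one fixed (n, ell): Y uniform on {0,1}^2, Z uniform on {00, 01, 10}.
Y = {s: Fraction(1, 4) for s in ("00", "01", "10", "11")}
Z = {s: Fraction(1, 3) for s in ("00", "01", "10")}
```

The distinguisher that outputs 1 exactly on `"11"` has advantage $1/4$, while the one that tests whether the first bit is 1 has advantage $1/6$.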
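Definition 24: Let $\ell ength : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a monotonically non-decreasing function satisfying $\ell ength(n, \ell) \le n\ell$. Let $\mathcal{T}$ be a family of functions of the form $T_D : \mathbb{N} \to \mathbb{N}$, and let $\mathcal{E}$ be a family of functions of the form $\varepsilon : \mathbb{N} \to [0,1]$. Let $Y = \{Y_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ and $Z = \{Z_{n,\ell}\}_{n,\ell \in \mathbb{N}}$ be two probability ensembles such that the domain of both $Y_{n,\ell}$ and $Z_{n,\ell}$ is $\{0,1\}^{\ell ength(n,\ell)}$. We say that $Y$ and $Z$ are $\mathcal{T}, \mathcal{E}$-computationally indistinguishable if for every $T_D \in \mathcal{T}$ and $\varepsilon \in \mathcal{E}$, $Y$ and $Z$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable. We say that $Y$ and $Z$ are computationally indistinguishable if they are $\mathcal{T}, \mathcal{E}$-computationally indistinguishable, where
$$\mathcal{E} = \{(n\ell)^{-c} \mid c \in \mathbb{N}\} \quad \text{and} \quad \mathcal{T} = \{(n\ell)^{c} \mid c \in \mathbb{N}\}.$$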
We use the same notation as in Definition 21. In particular, we denote by $D_j(n, \ell, i)$ the distribution of the $j$-th query ($q_j$) as induced by the random choices of $r$.
Definition 25: Let $T_D : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. $P$ is a one round, $k$-server PIR scheme maintaining $T_D(n\ell), \varepsilon(n\ell)$-computational privacy if it is a PIR scheme according to Definition 21, except for the privacy condition, which is changed to:
Privacy: Let $n, \ell \in \mathbb{N}$ and let $qlen : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be the query length function.
For every $i \in [n]$ and for every $j \in [k]$ the $j$-th output of the query generator $Q(n, \ell, i, r)$ (denoted $q_j$) is of the same length $qlen(n, \ell)$.
For every $j \in [k]$ and for every two index functions $i_1, i_2 : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ the two probability ensembles $D_{j,i_1} = \{D_j(n, \ell, i_1(n, \ell))\}_{n,\ell \in \mathbb{N}}$ and $D_{j,i_2} = \{D_j(n, \ell, i_2(n, \ell))\}_{n,\ell \in \mathbb{N}}$ are $T_D(n\ell), \varepsilon(n\ell)$-computationally indistinguishable (in this case $\ell ength(n, \ell) = qlen(n, \ell)$).
We say that $P$ is a one round, $k$-server PIR scheme maintaining computational privacy (in short, a computationally private information retrieval scheme) if for every $j$, $i_1(\cdot)$ and $i_2(\cdot)$ as above, the ensembles $D_{j,i_1}$, $D_{j,i_2}$ are computationally indistinguishable.
4.1.3 Search data structure
In this subsection we formally define the intuitive notion of a data structure that supports search operations on strings. Such a data structure holds $n$ strings $\{s_1, \ldots, s_n\}$ and (efficiently) answers queries of the type: is $w \in \{s_1, \ldots, s_n\}$?
Informally, we model the data structure as $t$ words of $m$ bits each, which are stored in memory. The search algorithm $B$ begins at a known address, the root, and at each step performs a computation which determines another memory address containing one of the $t$ words. After at most $d$ such steps the algorithm returns an answer, which is 1 iff $w \in \{s_1, \ldots, s_n\}$. Our definition is general enough that just about any search data structure we know of can be represented in these terms.
Definition 26: Let $t, m, d : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be three functions. We say that $DS$ is a $(t, m, d)$-search data structure if it is a trio $(M(S), root, B)$ which, for every $n, \ell \in \mathbb{N}$ and every $S \triangleq \{s_1, \ldots, s_n\}$, $s_1, \ldots, s_n \in \{0,1\}^\ell$, satisfies the following requirements:
Structure: $M(S)$ is a set of $t(n, \ell)$ binary words, $u_1, \ldots, u_{t(n,\ell)} \in \{0,1\}^{m(n,\ell)}$.
Search algorithm: $B(w, u, aux)$ is an algorithm that receives as input $w \in \{0,1\}^\ell$, the query string; $u$, the current word, which is in the set $\{u_1, \ldots, u_{t(n,\ell)}\}$; and some auxiliary information $aux$ modeling the history of the search. Its output is either a pair $(add', aux')$, consisting of the address of a new current word (i.e. $add' \in [t(n, \ell)]$) and auxiliary information (which are used in the next invocation of $B$), or a bit $ans(w, S) \in \{0,1\}$.
Search length: Let $B^q(w, u, aux)$ denote $q$ consecutive executions of $B$ beginning with input $(w, u, aux)$. Two consecutive executions mean that if $(add', aux')$ is the output of the first execution, then $(w, u_{add'}, aux')$ is the input of the second; $q$ consecutive executions are the natural generalization of this notion. Then, $B^q(w, u_{root}, init)$ (where $init$ is some appropriate initial information) returns $ans(w, S)$ for some $q \le d(n, \ell)$.
Correctness: $ans(w, S) = 1$ if and only if $w \in S$.
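The following Python sketch casts a balanced binary search tree in the terms of Definition 26, as one example of a $(t, m, d)$-search data structure with $t = n$ and $d = \lceil \log_2(n+1) \rceil$. Assumptions: words are Python tuples `(key, left_address, right_address)` rather than fixed-length bit strings, addresses are 0-based list indices, and `aux` merely counts steps; `build`, `B` and `search` are illustrative names.

```python
def build(S):
    """Store a balanced BST over S as memory words (key, left_addr, right_addr).
    Returns (memory, root_address); addresses are list indices."""
    keys = sorted(S)
    memory = []

    def place(lo, hi):
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        addr = len(memory)
        memory.append(None)  # reserve the slot, fill it after the children
        left = place(lo, mid)
        right = place(mid + 1, hi)
        memory[addr] = (keys[mid], left, right)
        return addr

    return memory, place(0, len(keys))


def B(w, u, aux):
    """One step of the search: either ("ans", bit) or ("next", address, aux')."""
    key, left, right = u
    if w == key:
        return ("ans", 1)
    nxt = left if w < key else right
    if nxt is None:
        return ("ans", 0)
    return ("next", nxt, aux + 1)  # aux here just counts the steps taken


def search(w, memory, root):
    """Run B repeatedly from the root; return (ans(w, S), number of invocations)."""
    u, aux, steps = memory[root], 0, 0
    while True:
        steps += 1
        out = B(w, u, aux)
        if out[0] == "ans":
            return out[1], steps
        _, addr, aux = out
        u = memory[addr]
```

For the seven keys in the check below, every search, successful or not, ends within $d = 3$ invocations of `B`.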
Notation 12: We denote by $T_q(n, \ell)$ the set of all addresses in the range $1, \ldots, t(n, \ell)$ that can appear as the first element in the output of the $q$-th invocation of $B$, for some structure $M$ and query string $w$. Formally,
$$T_q(n, \ell) \triangleq \{add \in [t(n, \ell)] : \exists w, \exists S, \exists aux \text{ s.t. } B^q(w, u_{root}, init) = (add, aux)\}.$$
We denote the cardinality of $T_q(n, \ell)$ by $t_q(n, \ell) \triangleq |T_q(n, \ell)|$. We denote by $MAP$ an algorithm that maps an address in $[t(n, \ell)]$ to an address in $[t_q(n, \ell)]$. $MAP$ takes as input $(n, \ell, q, add)$, where $add \in [t(n, \ell)]$, and outputs $add_q$. If $add \in T_q(n, \ell)$ then $add_q \in [t_q(n, \ell)]$. Otherwise $add_q = 0$.
In some search data structures, two different words $u_i$ and $u_j$ may have different lengths. While it is always possible to choose $m(n, \ell)$ as the maximum of these lengths, we may find the following notation convenient.
Notation 13: Let $m_q(n, \ell)$ denote the maximum length of all the words in $T_q(n, \ell)$. Let $m_0(n, \ell)$ denote the length of the root word.
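For a complete binary tree stored in implicit heap order, $T_q(n, \ell)$ is simply the set of addresses at depth $q$, so $t_q(n, \ell) = 2^q$ and $MAP$ reduces to an offset. The Python sketch below assumes this layout (1-based addresses, children of $a$ at $2a$ and $2a+1$, $t$ words in total) and drops the arguments $n, \ell$ from $MAP$ for brevity; it is an illustration for one layout, not the general definition, and the function names are hypothetical.

```python
def T(q, t):
    """Addresses the q-th invocation of B can output, assuming a complete binary
    tree of t words in 1-based heap order (children of a are 2a and 2a + 1)."""
    return [a for a in range(2 ** q, 2 ** (q + 1)) if a <= t]


def MAP(q, add, t):
    """Map add in [t] to its 1-based position in T_q, or to 0 if add is not in T_q."""
    Tq = T(q, t)
    return Tq.index(add) + 1 if add in Tq else 0
```

In this layout $t_q$ is much smaller than $t$ near the root: the word fetched after the $q$-th step ranges over only $2^q$ addresses rather than all $t$.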
4.2 Private Retrieval of Blocks
In [25] both $PIR(n, k)$ and $PIR(\ell, n, k)$ were introduced. $PIR(\ell, n, k)$ schemes were presented as generic transformations from $PIR(n, k)$ schemes. In [38] $SPIR(n, k)$ was formulated, but was not generalized to $SPIR(\ell, n, k)$. Most of the research to date has focused on $PIR(n, k)$ schemes, while $PIR(\ell, n, k)$ protocols have received little attention. Since the more general $PIR(\ell, n, k)$ and $SPIR(\ell, n, k)$ problems are important in the PERKY context, we devote a section to them.
This section contains two parts. In the first we show how a construction of a $PIR(\ell, n, k)$ scheme from a $PIR(n, k)$ scheme, presented in [25], can be generalized to construct an $SPIR(\ell, n, k)$ scheme from an $SPIR(n, k)$ scheme. In the second part we give a brief overview of the known $PIR(\ell, n, k)$ schemes.
4.2.1 $SPIR(\ell, n, k)$
We are interested in a scheme in which the user retrieves a block of $\ell$ bits ($\ell \ge 1$) out of $n$ such blocks that the servers hold. We require that in this type of scheme the user obtains a single block of $\ell$ bits and nothing else. In particular, the user cannot obtain bits from different blocks. The "naive" solution of executing an $SPIR(n, k)$ scheme $\ell$ times does not work, because the user can then retrieve bits from different blocks.
Lemma 13: Let $P$ be a one round, $k$-server $SPIR(n, k)$ scheme (for dishonest users), maintaining information-theoretic privacy, with user communication complexity $\alpha_P(n, k)$ and server communication complexity $\beta_P(n, k)$. Then, there exists a one round $SPIR(\ell, n, k)$ scheme (for dishonest users), $P'$, which maintains information-theoretic privacy and has user communication complexity $\alpha_{P'}(\ell, n, k) = \alpha_P(n, k)$ and server communication complexity $\beta_{P'}(\ell, n, k) = \ell\beta_P(n, k)$.
Proof: The protocol is essentially the same as the general solution to $PIR(\ell, n, k)$ presented in [25].
The scheme $P'$:
Algorithm $Q_{P'}$: Assume that $U$ wishes to retrieve the $i$-th block of bits. $U$ executes the query algorithm $Q_P(n, \ell, i, r_u)$ and sends the $j$-th entry of the output to $DB_j$.
Algorithm $A_{P'}$: The servers regard the database as an $\ell \times n$ matrix, where each column is one of the blocks of bits. Denote the $q$-th row of the matrix by $x^{(q)}$. The servers invoke the scheme $P$ $\ell$ times, so that each invocation is independent of the others. To achieve this, their source of randomness must generate $\ell$ times as many random bits as are required in $P$. We assume that the servers have a shared random string $r_s = r_{s_1}, \ldots, r_{s_\ell}$ such that for every $q = 1, \ldots, \ell$ the substring $r_{s_q}$ can be used as the shared random string in $P$. Furthermore, the $\ell$ substrings are chosen independently. $DB_j$ executes
$$A_P(j, q_j, x^{(1)} \| r_{s_1}), \ldots, A_P(j, q_j, x^{(\ell)} \| r_{s_\ell}).$$
Let the $\ell$ outputs be denoted by $a^1_j, \ldots, a^\ell_j$. $DB_j$ sends $a^1_j, \ldots, a^\ell_j$ to $U$.
Algorithm $R_{P'}$: $U$ executes $R_P(n, \ell, i, r_u, a^q_1, \ldots, a^q_k)$ for $q = 1, \ldots, \ell$, thereby retrieving the $i$-th block of bits.
Correctness and communication complexity: It follows from the correctness property of $P$ that $R_P(n, \ell, i, r_u, a^q_1, \ldots, a^q_k)$ returns the $i$-th bit of the row $x^{(q)}$, for any $q = 1, \ldots, \ell$. Therefore, $U$ gets the whole $i$-th column (block). It is also evident from the construction that the communication complexity of the user is $\alpha_P(n, k)$, as its queries are identical to those of $P$, and the communication complexity of each server is $\ell\beta_P(n, k)$.
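The Python sketch below mirrors the structure of $P'$: the user's queries are reused unchanged, and each server runs the base answer algorithm once per row $x^{(q)}$, each time with its own shared string $r_{s_q}$. The base scheme is a stated assumption: a 2-server XOR scheme whose two servers XOR the same shared bit into their answers. That toy only shows where $r_{s_q}$ enters and is not an $SPIR$ scheme against dishonest users; `answer_blocks`, `reconstruct_block` and the `toy_*` functions are hypothetical names.

```python
import secrets


def answer_blocks(j, q_j, matrix, a_p, shared):
    """DB_j in P': one run of A_P per row x^(q), each with its own shared string r_{s_q}."""
    return [a_p(j, q_j, row, rs) for row, rs in zip(matrix, shared)]


def reconstruct_block(answers, r_p):
    """U in P': one run of R_P per row, on the q-th answer of every server."""
    return [r_p(*per_row) for per_row in zip(*answers)]


# Toy base scheme (assumption): 2-server XOR PIR on one n-bit row. Both servers
# XOR the same shared bit into their answers, so it cancels in toy_r_p.
def toy_a_p(j, q, row, rs):
    acc = rs
    for bit, x in zip(q, row):
        acc ^= bit & x
    return acc


def toy_r_p(a1, a2):
    return a1 ^ a2


def toy_query(n, i):
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2
```

Each server returns $\ell$ answers of the base scheme, matching the $\ell\beta_P(n, k)$ server cost, while the queries are exactly those of $P$.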
Privacy: The privacy of the user in $P'$ is identical to its privacy in $P$, since it depends only on the output distribution of the algorithm $Q_P$.
Data privacy follows from the data privacy property of $P$. Let $U$ be some deterministic user (which can be modeled by $q_1, \ldots, q_k$, the $k$-tuple of queries it generates). By the data privacy of $P$, there is some $i \in [n]$ such that for every $y \in \{0,1\}^n$ the answer distribution $A_P(y)$ is independent of $y$, given $y_i$. Let $x$ be the database in $P'$, i.e. an $\ell \times n$ binary matrix. Let $x'$ be an $\ell \times n$ binary matrix such that the $i$-th column of $x$ is identical to that of $x'$. The answer distribution of $P'$ on database $x$ is $A_{P'}(x) = A_P(x^{(1)}), \ldots, A_P(x^{(\ell)})$, and on $x'$ it is $A_{P'}(x') = A_P(x'^{(1)}), \ldots, A_P(x'^{(\ell)})$. For every $q = 1, \ldots, \ell$ we have $A_P(x^{(q)}) = A_P(x'^{(q)})$. Furthermore, the $\ell$ sub-distributions $A_P(x^{(1)}), \ldots, A_P(x^{(\ell)})$ are independent, and so are the $\ell$ sub-distributions $A_P(x'^{(1)}), \ldots, A_P(x'^{(\ell)})$. Therefore, $A_{P'}(x) = A_{P'}(x')$.
4.2.2 $PIR(\ell, n, k)$
There are two known constructions of PIR(`; n; k) from PIR(n; k).
Lemma 14: Let $P$ be a one round, $k$-server $PIR(n, k)$ scheme (where $k \ge 1$), maintaining information-theoretic privacy ($T_D(n), \varepsilon(n)$-computational privacy), with user communication complexity $\alpha_P(n, k)$ and server communication complexity $\beta_P(n, k)$. Then, there exists a one round $PIR(\ell, n, k)$ scheme, $P'$, which maintains information-theoretic privacy ($T_D(n), \varepsilon(n)$-computational privacy), and has user communication complexity $\alpha_{P'}(\ell, n, k) = \alpha_P(n, k)$ and server communication complexity $\beta_{P'}(\ell, n, k) = \ell\beta_P(n, k)$.
Proof: The scheme is identical to the one presented in Lemma 13. In this case the computational privacy carries over from $P$ to $P'$ without change.
The second construction is of a more limited nature. It is applicable only in a multi-server setting ($k \ge 2$). We state the result for a constant number of servers in the information-theoretic setting (its generalization to the computational setting, or to a non-constant number of servers, is possible, but does not yield efficient schemes).
Lemma 15: For any constant $k \ge 2$ and for any $\ell$, there exists a one round $PIR(\ell, n, k)$ scheme, maintaining information-theoretic privacy, with communication complexity $O(n^{1/(2k-1)} \ell^{k/(2k-1)})$.
The lemma is proved by combining two techniques: the efficient $PIR(n, k)$ schemes of [3], and a generalization of the balancing technique of [22]. The full proof appears in [39].
4.3 General Solutions to $PERKY(\ell, n, k)$
A trivial solution to the $PERKY(\ell, n, k)$ problem is to have the server send all the strings it holds to $U$. The communication complexity in this case is $O(n\ell)$. Our focus is on providing more efficient solutions to the problem. As a first step we show simple reductions from PIR to PERKY and in the opposite direction.
Theorem 16: Let $P$ be a one round $PIR(n, k)$ ($SPIR(n, k)$) scheme with communication complexity $C_P(n, k)$. Then, there exists a scheme $\Pi$ that solves the problem $PERKY(\ell, n, k)$ ($SPERKY(\ell, n, k)$) with communication complexity $C_\Pi(\ell, n, k) = C_P(2^\ell, k)$. If $P$ maintains information-theoretic privacy (information-theoretic data privacy), then so does $\Pi$. If $P$ maintains $T_D(n), \varepsilon(n)$-computational privacy then $\Pi$ maintains $T_D(2^\ell), \varepsilon(2^\ell)$-computational privacy.
Proof: The scheme $\Pi$: The $n$ strings $s_1, \ldots, s_n$ are replaced with their incidence vector: a $2^\ell$-bit string in which the $j$-th bit is 1 iff the $j$-th $\ell$-bit string, in lexicographic order, is one of $s_1, \ldots, s_n$. Suppose that the word $w$ that $U$ holds is the $i$-th word in lexicographic order. $U$ and $DB_1, \ldots, DB_k$ execute $P$ on a database of length $2^\ell$, where the user retrieves the $i$-th bit.
Correctness and communication complexity: The correctness property of $\Pi$ and the claim that its communication complexity is $C_P(2^\ell, k)$ follow immediately from the analogous properties of $P$.
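A minimal Python sketch of this reduction, instantiating $P$ (as an assumption, for illustration) with the 2-server XOR scheme. For fixed-length binary strings, lexicographic order coincides with numeric order, so the position of $w$ is `int(w, 2)`; indices are 0-based, and `incidence_vector` and `perky_via_pir` are illustrative names.

```python
import secrets
from functools import reduce
from operator import xor


def incidence_vector(strings, ell):
    """2^ell-bit vector: bit j is 1 iff the j-th ell-bit string (lexicographic
    order, i.e. numeric order for fixed-length binary strings) is in the set."""
    vec = [0] * (2 ** ell)
    for s in strings:
        vec[int(s, 2)] = 1
    return vec


def perky_via_pir(strings, ell, w):
    """Theorem 16 reduction, instantiated (as an assumption) with 2-server XOR PIR."""
    x = incidence_vector(strings, ell)  # computed by the servers
    i = int(w, 2)                       # position of w in lexicographic order
    q1 = [secrets.randbelow(2) for _ in x]
    q2 = q1.copy()
    q2[i] ^= 1

    def answer(q):
        return reduce(xor, (b & c for b, c in zip(q, x)), 0)

    return answer(q1) ^ answer(q2)
```

The query length in this instance is $2^\ell$ bits per server, which is why the reduction is attractive only for short keywords.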
User privacy: Next, we show that privacy is maintained if a passive adversary controls one of the servers $DB_1, \ldots, DB_k$. If $P$ is information-theoretically private ($T_D(n), \varepsilon(n)$-computationally private), then for any $j \in [k]$ and any $i_1, i_2 \in [2^\ell]$ the two query distributions $D_j(2^\ell, i_1)$ and $D_j(2^\ell, i_2)$ are equal ($T_D(2^\ell), \varepsilon(2^\ell)$-computationally indistinguishable).
We prove the privacy of $\Pi$ by constructing a simulator $S$ for every real life adversary $A$. To simulate the view of an adversary $A$ who corrupts a server $DB_j$, the simulator $S$ does the following:
1. Arbitrarily pick some index $i_0 \in [2^\ell]$ and generate the $j$-th query $q_j$ of a user retrieving $i_0$.
2. Output that query together with the rest of the view, namely the database $x$.
By the privacy property of $P$, the output distribution of $S$ is identical ($T_D(2^\ell), \varepsilon(2^\ell)$-computationally indistinguishable) to the view of $A$, thus proving the user privacy of $\Pi$.
Data Privacy: Let $P$ be an information theoretically private SPIR scheme. We show that $\Pi$ is an information theoretically private SPERKY scheme. Let $U$ be the (possibly dishonest) user in $\Pi$. We construct two simulators $S_0, S_1$. On input $w$, the user's input, and $b$, the output (which is 1 if and only if $w \in \{s_1, \ldots, s_n\}$), $S_0, S_1$ do the following:
1. $S_0, S_1$ run the query algorithm of $U$ and obtain a $k$-tuple of queries $(q_1, \ldots, q_k)$.
2. $S_0$ sets the database to $x' = 0^{2^\ell}$, and $S_1$ sets the database to $x' = 1^{2^\ell}$. $S_0, S_1$ flip coins to get the shared random string $r_s$.
3. $S_0, S_1$ run the answer algorithms $A_j(q_j, x' \| r_s)$ for $j = 1, \ldots, k$ and obtain a $k$-tuple of answers $(a_1, \ldots, a_k)$.
4. Finally, $S_0, S_1$ output $w, b, (q_1, \ldots, q_k), (a_1, \ldots, a_k)$.
Let $x$ be the $2^\ell$-bit incidence vector of $\{s_1, \ldots, s_n\}$. By the data privacy property of $P$, for some $i \in [2^\ell]$ and for every $x'$ such that $x'_i = x_i$, the answer distribution (the distribution on $(a_1, \ldots, a_k)$) satisfies $A(x) = A(x')$. For one of the simulators $S_0$ or $S_1$, the database $x'$ satisfies $x'_i = x_i$. For that simulator, the output distribution is identical to the view of $U$, thereby proving the security of $\Pi$ according to the definition of SPERKY schemes.
Theorem 17: Let $\Pi$ be a $PERKY(\ell, n, k)$ ($SPERKY(\ell, n, k)$) scheme with communication complexity $C_\Pi(\ell, n, k)$. Then, there exists a scheme $P$ that solves the problem $PIR(n, k)$ ($SPIR(n, k)$) with communication complexity $C_P(n, k) = C_\Pi(\log n + 1, n, k)$.
Proof: The scheme $P$: The servers replace the $n$-bit string they hold, $x = x_1, \ldots, x_n$, with $n$ strings of length $\log n + 1$ bits each, $s_1, \ldots, s_n$. Let $j \in [n]$ and let its binary expansion be $j_1, \ldots, j_{\log n}$. Define $s_j \triangleq j_1, \ldots, j_{\log n}, x_j$. Suppose that the user is interested in the $i$-th bit. That bit is 1 iff $s_i = i_1, \ldots, i_{\log n}, 1$. $U$ and $DB_1, \ldots, DB_k$ execute the scheme $\Pi$ on input $(\log n + 1, n, k)$, where the word that $U$ holds is $i_1, \ldots, i_{\log n}, 1$.
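The transformation in the proof can be sketched in Python as follows. Two assumptions are made for illustration: indices are 0-based with $\lceil \log_2 n \rceil$ index bits, and `naive_perky`, which simply tests membership in the clear, stands in for an actual PERKY scheme so that only the encoding is exercised. The function names are hypothetical.

```python
def to_keywords(x):
    """Theorem 17 encoding: s_j = binary(j) || x_j, with 0-based j and
    ceil(log2 n) index bits (an assumption of this sketch)."""
    width = max(1, (len(x) - 1).bit_length())
    return {format(j, f"0{width}b") + str(b) for j, b in enumerate(x)}, width


def pir_via_perky(x, i, perky):
    """Retrieve x_i by asking whether binary(i) || 1 is one of the keywords."""
    keywords, width = to_keywords(x)
    return perky(keywords, format(i, f"0{width}b") + "1")


def naive_perky(keywords, w):
    """Non-private stand-in for a PERKY scheme, used only to check the encoding."""
    return int(w in keywords)
```

Since the index prefixes are distinct, `binary(i) || 1` is a keyword exactly when $x_i = 1$.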
The correctness and communication complexity of $P$ are evident. However, we cannot prove its privacy according to Definitions 21, 25 and 13, because $P$ is not necessarily a one round scheme. Nevertheless, if we define PIR (SPIR) according to the more general definitions of Subsection 2.6.1, where the computed function is the PIR function, we obtain for $P$ the same security guarantee that $\Pi$ has.
Discussion
The above reduction indicates that we cannot hope to find PERKY schemes which are significantly more efficient than PIR schemes. Our aim is now to use PIR schemes to construct PERKY schemes that are more efficient than what can be obtained by Theorem 16. The main idea in all of our subsequent PERKY constructions is the following: the servers insert $s_1, \ldots, s_n$ into a search data structure. The user conducts an oblivious walk on the data structure until either the word $w$ is found or $U$ is assured that $w$ is not one of $s_1, \ldots, s_n$.
A typical search in the data structure involves a sequence of operations, where
each operation consists of fetching the contents of a word from memory, performing
a “local” computation, which depends on the keyword and the fetched contents, and
either determining a new memory address based on the computation, or terminating
the search (successfully or unsuccessfully). This sequence of operations can be viewed
as a walk on the data structure. We now describe a general outline of transforming
this walk into an oblivious walk on the data structure, namely a walk where each
server gets no information on the walk (and, therefore, on the desired keyword itself).
In the oblivious walk, the operations of the original data structure are divided between the user and the server(s). Each server maintains the structure $M$ without any modification. The server supports the user in fetching words with known addresses from the memory, by performing an agreed upon PIR scheme. The local computations, as prescribed by the algorithm $B$, are performed exclusively by the user. The result of each local computation determines the address of the next word to be fetched. If the data structure requires (in the worst case) $d$ memory accesses, the user invokes the PIR scheme $d$ times successively. If fewer than $d$ invocations are required for some given keyword, the user still executes the PIR scheme $d$ times, with arbitrary (dummy) addresses in the last operations; otherwise the servers would learn the search length for this specific keyword.
This discussion leads to the following theorem:
Theorem 18: Let $DS$ be a $(t,m,d)$-search data structure (Definition 26) and let $P$ be a one-round PIR$(\ell,n,k)$ scheme with communication complexity $C_P(\ell,n,k)$. Then there exists a scheme that solves the problem PERKY$(\ell,n,k)$ and has the following properties:
Privacy: If $P$ maintains information-theoretic (or computational) privacy, then the scheme maintains information-theoretic (or computational¹) privacy.
Communication complexity:
$$C(\ell,n,k) = m_0(n,\ell) + \sum_{q=1}^{d(n,\ell)} C_P\big(m_q(n,\ell), t_q(n,\ell), k\big).$$
Round complexity: The number of communication rounds in the scheme is $d(n,\ell) + 1$.
Proof: We slightly abuse the notation and, once $n$ and $\ell$ are set, do not write them down explicitly. Thus, $m_q$ denotes $m_q(n,\ell)$, $t_q$ denotes $t_q(n,\ell)$, etc.
The scheme:
1. The servers $DB_1, \ldots, DB_k$ insert the $n$ strings $s_1, \ldots, s_n$ into the agreed upon data structure $DS$. $DB_1$ sends $u_{root}$, the first word in any search sequence, to the user.
2. $U$ sets $ans = \perp$², $u = u_{root}$ and $aux = init$ (where $init$ is the initial information).
¹ Computational privacy is assured if and only if constructing $DS$, conducting a search operation in $DS$ and executing $P$ can all be performed in polynomial time. However, since all search data structures and PIR schemes in the literature are of this type, we do not state these requirements explicitly in the theorem.
² $ans$ denotes the answer of the protocol, i.e., whether $w \in \{s_1, \ldots, s_n\}$. At the beginning of the scheme its value is unknown.
3. For every $q = 1, \ldots, d$ the following is performed by $DB_1, \ldots, DB_k$ and $U$:
(a) The user $U$ executes $B(w, u, aux)$ and gets as output either $(add', aux')$ or $ans(w,S)$. If the output is $(add', aux')$, the user computes a new address $add = Map(n, \ell, q, add')$ and sets $aux$ to $aux'$. Otherwise, the user sets $add = 1$³ and, if $ans$ is still $\perp$, sets it to $ans(w,S)$.
(b) The parties execute the PIR scheme $P$ with parameters $(m_q, t_q, k)$. The database is the $t_q$ words in $T_q$ (each of length $m_q$). The user's retrieval index is $add$, and the user retrieves the full word of $m_q$ bits at the desired position.
(c) The user sets $u$ to be the retrieved word.
4. The user's answer is $ans$.
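The oblivious walk of Theorem 18 can be sketched as a driver that always performs exactly $d$ fetches. In this sketch `pir_fetch` is a hypothetical placeholder for one PIR invocation (it is not private and merely records the size of the level read), the search algorithm `B` returns tagged tuples, the dummy address is 0, and `make_chain` is a toy structure chosen only for brevity. Evaluating $B$ once more, locally, on the last fetched word is an assumption about how the final step is handled.

```python
from typing import Any, Callable, List, Optional, Tuple

# B(w, u, aux) returns ("next", address_in_next_level, aux') or ("done", answer).
SearchStep = Callable[[Any, Any, Any], Tuple]

def pir_fetch(level: List[Any], index: int, log: List[int]) -> Any:
    """Stand-in for one PIR invocation on the words of one level. It is NOT
    private; it logs only the level size, i.e. what the servers may observe."""
    log.append(len(level))
    return level[index]

def oblivious_walk(w: Any, root: Any, levels: List[List[Any]], B: SearchStep,
                   init: Any) -> Tuple[Optional[bool], List[int]]:
    """Always performs d = len(levels) fetches, padding with dummy fetches."""
    ans: Optional[bool] = None             # None plays the role of "unknown"
    u, aux, log = root, init, []
    for level in levels:                   # q = 1, ..., d
        add = 0                            # dummy address (0-based, an assumption)
        if ans is None:
            out = B(w, u, aux)
            if out[0] == "next":
                _, add, aux = out
            else:
                ans = out[1]
        u = pir_fetch(level, add, log)
    if ans is None:                        # evaluate B once more on the last word
        out = B(w, u, aux)
        ans = out[1] if out[0] == "done" else None
    return ans, log

def make_chain(keywords: List[str]):
    """Hypothetical toy structure: level q holds only s_q (t_q = 1, d = n)."""
    d = len(keywords)
    def B(w, u, q):
        if q == 0:                         # aux = init: nothing fetched yet
            return ("next", 0, 1)
        if u == w:
            return ("done", True)
        if q == d:
            return ("done", False)
        return ("next", 0, q + 1)
    return None, [[s] for s in keywords], B
```

Whatever the keyword, the servers observe the same sequence of level sizes, which is exactly what the padding with dummy fetches is meant to guarantee.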
Correctness: We claim that at the end of the scheme $ans$ is the correct bit (1 iff $w \in \{s_1, \ldots, s_n\}$) because of the correctness property of $DS$. The protocol carries out a distributed evaluation of $B^q(w, u_{root}, init)$ (Definition 26) for every $q = 1, \ldots, d$. Since for some $q \le d$ the output of $B^q(w, u_{root}, init)$ is $ans(w,S)$, which is the correct bit, and since $ans = ans(w,S)$, the claim follows.
Privacy: We utilize Theorem 1 to prove the information-theoretic (or computational) privacy of the scheme. We show that the protocol maintains information-theoretic (computational) privacy in the semi-ideal model, in which a trusted party computes the output of each PIR protocol. Theorem 1 ensures that if the PIR protocol $P$ maintains information-theoretic (computational) privacy, then so does the scheme.
Let $A$ be an adversary in the semi-ideal model. It can control a server $DB_j$ ($1 \le j \le k$) of its choice. A simulator $S$ in the semi-ideal model mimics $A$ trivially: on input $s_1, \ldots, s_n$, its output is $d$ copies of $s_1, \ldots, s_n$. The only messages a real-life adversary receives are part of the $d$ invocations of $P$. In the semi-ideal model, for each invocation of $P$ the adversary sends $s_1, \ldots, s_n$ to the trusted party and gets no answer. Thus, the simulator's output is identical to the view of $A$. (If the adversary corrupts $DB_1$, the simulator also has to add $u_{root}$ to its output.)
Complexity: In the first round of communication $u_{root}$, which is an $m_0$-bit word, is sent to the user. The rest of the communication involves $d$ executions of the PIR scheme $P$ with parameters $(m_1, t_1, k), \ldots, (m_d, t_d, k)$. Thus the total communication complexity is $m_0 + \sum_{q=1}^{d} C_P(m_q, t_q, k)$ and the number of rounds is $d + 1$.
³ From this point onwards the user carries out bogus queries.
We remark that schemes of the type we described above support the simultaneous implementation of private as well as “regular” (non-private) retrieval, which is certainly a desirable property from a practical point of view.
4.4 Specific implementations
In this section we give three examples of PERKY schemes based on the general construction of Theorem 18, using a different data structure each time. The most efficient scheme in terms of communication and round complexity is the one described in Subsection 4.4.3, which is based on perfect hashing [73, 36]. The expected computational complexity of this scheme is linear, although in the worst case it might become exponential (in $\ell$). The scheme in Subsection 4.4.1, based on binary search trees, has linear deterministic computational complexity but is inferior in communication and number of rounds. The last scheme, in Subsection 4.4.2, is based on the trie data structure [56]. It is inferior to the other two options in all complexity measures but can be converted into a symmetrically private scheme (Section 4.5).
4.4.1 Binary Search Tree
Corollary 19: Let $P$ be a one-round PIR$(\ell,n,k)$ scheme with communication complexity $C_P(\ell,n,k)$. Then there exists a scheme that solves PERKY$(\ell,n,k)$ and has the following properties:
Privacy: If $P$ maintains information-theoretic (or computational) privacy, then the scheme maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C(\ell,n,k) = \ell + \sum_{q=1}^{\log n} C_P(\ell, 2^{q-1}, k).$$
Round complexity: The number of communication rounds in the scheme is $\log n + 1$.
Proof: We show a specific search data structure $DS$ such that the corollary follows from Theorem 18.
Consider the complete binary search tree consisting of $2^{\lceil \log(n+1) \rceil} - 1$ elements (that is, the smallest complete binary tree with at least $n$ nodes). Every node holds an $\ell$-bit string such that all the strings in its left sub-tree are smaller than it in the lexicographic order, and all the strings in its right sub-tree are greater than it. Searching for a word $w$ proceeds by beginning at the root of the tree and, at each node, moving to its left son if the string at the node is greater than $w$ and to its right son if the string at the node is smaller than $w$. The search terminates successfully upon reaching a node that holds $w$, or unsuccessfully upon reaching a leaf node that does not hold $w$.
We now describe the various parameters of the search data structure in the terms of Definition 26. The number of words in the structure is $t(n,\ell) = 2^{\lceil \log(n+1) \rceil} - 1$. The first $n$ words are $s_1, \ldots, s_n$ and the rest are all the same dummy word $1^\ell$. Thus, $m(n,\ell) = \ell$. The maximum search length is equal to the depth of the tree, $d(n,\ell) = \lceil \log(n+1) \rceil - 1$. The root of the structure is the tree's root, and its index is $2^{\lceil \log(n+1) \rceil - 1}$. For every $q = 1, \ldots, d(n,\ell)$, the set of indices $T_q$ is
$$T_q = \left\{ \frac{1}{2^{q+1}} 2^{\lceil \log(n+1) \rceil}, \frac{3}{2^{q+1}} 2^{\lceil \log(n+1) \rceil}, \ldots, \frac{2^{q+1}-1}{2^{q+1}} 2^{\lceil \log(n+1) \rceil} \right\}.$$
The number of elements in each such set is $t_q = |T_q| = 2^q$.
Finally, the search algorithm $B$ receives as input three parameters $(w, u, aux)$: $w$ is the given query, $u$ is the value of the string at the current node, and $aux$ contains $n$, $q$ and the address of $u$, which is $\frac{p}{2^{q+1}} 2^{\lceil \log(n+1) \rceil}$ for some odd $p$. If $u = w$, the output of $B$ is $ans(w,S) = 1$. Otherwise, if $q = \lceil \log(n+1) \rceil$, its output is $ans(w,S) = 0$. If $q < \lceil \log(n+1) \rceil$, then $B$'s output is $(add', aux')$, which is determined as follows. If $u > w$ in the lexicographic order (so the search moves to the left son), then $add' = \frac{2p-1}{2^{q+2}} 2^{\lceil \log(n+1) \rceil}$. Otherwise, $add' = \frac{2p+1}{2^{q+2}} 2^{\lceil \log(n+1) \rceil}$. In both cases, $aux'$ is $(n, q+1, add')$.
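The binary search tree of Corollary 19 can be sketched with the in-order numbering used above: node indices run from 1 to $2^D - 1$ with $D = \lceil \log(n+1) \rceil$, the level set $T_q$ consists of the odd multiples of $2^{D-q-1}$, and the sons of the $r$-th node of $T_q$ are the $(2r-1)$-th and $(2r)$-th nodes of $T_{q+1}$. Plain list indexing stands in for the PIR fetches, and the function names are illustrative.

```python
from typing import List, Tuple

def build_levels(keywords: List[str], ell: int) -> List[List[str]]:
    """Complete BST on in-order indices 1, ..., 2^D - 1, D = ceil(log2(n + 1)),
    padded with the dummy word 1^ell and split into T_0 (root), ..., T_{D-1}."""
    n = len(keywords)
    D = n.bit_length()                     # equals ceil(log2(n + 1)) for n >= 1
    inorder = sorted(keywords) + ["1" * ell] * (2 ** D - 1 - n)
    # T_q = {(2r - 1) * 2^(D-q-1) : r = 1, ..., 2^q}; the Python list is 0-based.
    return [[inorder[(2 * r - 1) * 2 ** (D - q - 1) - 1] for r in range(1, 2 ** q + 1)]
            for q in range(D)]

def bst_walk(w: str, levels: List[List[str]]) -> Tuple[bool, List[int]]:
    """One fetch per level; `fetched` records only the sizes of the levels read."""
    r, found, fetched = 1, False, []       # r: 1-based position inside T_q
    for level in levels:
        u = level[r - 1]                   # stand-in for a PIR fetch from T_q
        fetched.append(len(level))
        found = found or u == w
        # Sons of position r are positions 2r - 1 (left) and 2r (right) in T_{q+1};
        # after a hit the walk continues along an arbitrary (left) dummy path.
        r = 2 * r - 1 if (found or u > w) else 2 * r
    return found, fetched
```

As in the text, padding uses the word $1^\ell$, so a query equal to $1^\ell$ that is not a keyword would be reported as found; the test therefore skips that word.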
4.4.2 Trie
In this subsection we present a PERKY$(\ell,n,k)$ scheme, which is based on the trie data structure [56, page 481].
Definition 27: Let $s$ be a binary string of $q$ bits, $0 \le q \le \ell$. We say that $s$ is a legal prefix of $s_1, \ldots, s_n$ if it is a prefix of some $s_j$ ($1 \le j \le n$).
Corollary 20: Let $P$ be a one-round PIR$(\ell,n,k)$ scheme with communication complexity $C_P(\ell,n,k)$. Then there exists a scheme that solves PERKY$(\ell,n,k)$ and has the following properties:
Privacy: If $P$ maintains information-theoretic (or computational) privacy, then the scheme maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C(\ell,n,k) = (\ell - 1)\, C_P(\log n, 2n, k).$$
Round complexity: The number of communication rounds in the scheme is $\ell$.
Proof:
In its basic form, a trie holding $n$ strings of length $\ell$ is a binary tree of depth $\ell$ that has $n$ leaf nodes. Each edge represents a bit and each node represents a legal prefix. Since a node $v$ in a tree corresponds to the unique path from the root to it, we take the prefix represented by $v$ to be the concatenation of all bits on the path edges from the root to $v$. Therefore, each of the leaf nodes represents one of the strings held by the trie. Let $w = w_1 \ldots w_\ell$ be the query string ($w_1, \ldots, w_\ell \in \{0,1\}$). A search operation proceeds down a path from the root to a leaf node. At depth $q-1$ ($q = 1, \ldots, \ell$) the current node corresponds to the prefix $w_1 \ldots w_{q-1}$. The search continues down the edge marked by $w_q$ if it exists, and terminates (unsuccessfully) otherwise. The search terminates successfully if a leaf node has been reached.
We slightly modify the basic trie structure in the scheme. The basic structure has the disadvantage that the number of nodes in each level of the tree (except the first and last) depends not only on $n$ and $\ell$, but also on the actual strings $s_1, \ldots, s_n$. In the structure we use, each level is assumed to have $n$ nodes⁴ (which is an upper bound on the number of necessary nodes). The servers pad each level with “dummy” nodes in order to ensure that the total number of nodes in that level is $n$. Each dummy node is simply a string of zeroes. Another modification is that the nodes at depth $\ell-1$ are the leaves instead of those at depth $\ell$. Each such node represents an $(\ell-1)$-bit legal prefix $y_1 \ldots y_{\ell-1}$ and holds 2 bits. The first is 1 iff $y_1 \ldots y_{\ell-1}0 \in \{s_1, \ldots, s_n\}$, and the second is 1 iff $y_1 \ldots y_{\ell-1}1 \in \{s_1, \ldots, s_n\}$. In any other level each node is composed of two $\log n$-bit strings. The first is the pointer to the left son (this edge represents the 0 bit), and the second is the pointer to the right son (the 1 bit).
The internal structure of each level has to be fully determined by $s_1, \ldots, s_n$, so that in a multi-server scheme all the servers construct the same data structure given the same set of $n$ keywords. One possible solution is to have the actual nodes of the level appear first, ordered by the lexicographic ordering of the prefixes they represent, and the dummy nodes appear after them.
We now describe the various parameters of the search data structure in terms of Definition 26. Each node in a trie contains two pointers, to its left and right sons. Each word in $M$ corresponds to one such pointer. If the pointer is null (i.e., the appropriate edge does not exist), the corresponding word is the all-0 word. The number of words in the structure is $t(n,\ell) = 2n(\ell-1) + 2$. The size of each word is $m(n,\ell) = \log n$. The maximum search length is equal to the depth of the tree, $d(n,\ell) = \ell - 1$. The root of the structure is empty⁵. For every $q = 1, \ldots, d(n,\ell)$, the set of indices $T_q$ includes the $q$-th block of $2n$ words (corresponding to the $2n$ pointers at the $q$-th level of the trie). Thus $t_q = 2n$.
⁴ Another, more efficient approach is to have the root contain the number of nodes in every level.
⁵ In an actual implementation in computer memory, the root would contain pointers to the single-bit prefixes 0 and 1. Here, however, we are only interested in the address of each such prefix in $T_1$.
Finally, the search algorithm $B$ receives as input three parameters $(w, u, aux)$: $w$ is the given query, $u$ is the current pointer, and $aux$ contains $q$. If $q = 0$, that is, $aux = init$, then $B$ outputs $(add', aux')$, where $add'$ is the address in $T_1$ of the single-bit prefix $w_1$. If $0 < q < d(n,\ell)$ and $u = 0^{\log n}$, then $B$ outputs $ans(w,S) = 0$. If $0 < q < d(n,\ell)$ and $u \ne 0^{\log n}$, then $u$ is the address of a node $v$ in level number $q+1$. $B$ outputs $(add', aux')$ such that if $w_q = 0$ ($w_q = 1$), $add'$ is the address in $T_{q+1}$ of the left (right) pointer of $v$, and $aux' = q+1$. If $q = d(n,\ell)$, then $u$ is the answer of the protocol and $B$ outputs $ans(w,S) = u$.
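The padded trie of Corollary 20 can be sketched as follows. Assumptions beyond the text: addresses are 1-based with 0 as the null pointer, the root is represented by the level-1 addresses of the prefixes 0 and 1 (as footnote 5 suggests for an in-memory implementation), and plain indexing stands in for PIR. Node $a$ of level $q$ occupies words $2(a-1)$ (left) and $2(a-1)+1$ (right) of $T_q$.

```python
from typing import Dict, List, Tuple

def build_trie(keywords: List[str], ell: int) -> Tuple[Tuple[int, int], List[List[int]]]:
    """Padded trie with levels 1, ..., ell-1 of exactly n nodes each.
    Addresses are 1-based, 0 encodes a null pointer, and dummy nodes are all-zero.
    Level q is flattened to T_q: 2n words, two per node (left, right)."""
    n, kw = len(keywords), set(keywords)
    addr: List[Dict[str, int]] = [{} for _ in range(ell)]
    for q in range(1, ell):                # legal nodes first, in lexicographic order
        addr[q] = {p: i + 1 for i, p in enumerate(sorted({s[:q] for s in kw}))}
    # Assumption: the root is sent in the clear as the level-1 addresses of 0 and 1.
    root = (addr[1].get("0", 0), addr[1].get("1", 0))
    levels = []
    for q in range(1, ell):
        words = [0] * (2 * n)
        for p, a in addr[q].items():
            for bit in (0, 1):
                child = p + str(bit)
                words[2 * (a - 1) + bit] = (addr[q + 1].get(child, 0) if q < ell - 1
                                            else int(child in kw))  # leaf: answer bit
        levels.append(words)
    return root, levels

def trie_walk(w: str, root: Tuple[int, int], levels: List[List[int]]) -> Tuple[bool, int]:
    """Exactly ell - 1 fetches; after a null pointer the walk fetches word 0."""
    a, fetches = root[int(w[0])], 0
    for q, words in enumerate(levels, start=1):
        u = words[2 * (a - 1) + int(w[q]) if a else 0]  # stand-in for a PIR fetch
        fetches += 1
        a = u if a else 0
    return bool(a), fetches
```

Every walk performs $\ell - 1$ fetches, one from each $T_q$, matching the round count of Corollary 20.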
4.4.3 Perfect Hashing
Suppose the servers can find a hash function $h : \{0,1\}^\ell \to \{1, \ldots, t\}$, where $t \ge n$, such that $h$ is perfect for $\{s_1, \ldots, s_n\}$. A hash function is perfect for $\{s_1, \ldots, s_n\}$ if its restriction to $\{s_1, \ldots, s_n\} \subseteq \{0,1\}^\ell$ is one-to-one. In order to find such a hash function we use techniques introduced in [73] and [36]. To make this presentation self-contained, we give a brief description of these techniques.
We define a family of functions $\mathcal{H}$ such that for every $h \in \mathcal{H}$, $h : \{0,1\}^\ell \to \{1, \ldots, t\}$, and $|\mathcal{H}| = 2^{2\ell}$. For any $(a,b) \in \{0,1\}^\ell \times \{0,1\}^\ell$ we define a function $h_{a,b}$ by the rule $h_{a,b}(x) = ax + b$. We view $a$ and $b$ as elements of the finite field $GF(2^\ell)$, and addition and multiplication are with respect to this field. $\mathcal{H}$ is a universal family of hash functions. In other words, for any $x, y \in \{0,1\}^\ell$ with $x \ne y$, if $h_{a,b}$ is chosen uniformly at random from $\mathcal{H}$, then $\Pr[h_{a,b}(x) = h_{a,b}(y)] = \frac{1}{2^\ell}$.
We are interested in functions that map $\{0,1\}^\ell$ to $\{1, \ldots, t\}$, and therefore, for every $h_{a,b} \in \mathcal{H}$, we take $h_{a,b}(x)$ to be only the first $\log t$ bits of $ax + b$ (for our purposes it is sufficient to assume that $t$ is a power of 2). The family $\mathcal{H}$ remains universal, with $\Pr[h_{a,b}(x) = h_{a,b}(y)] = \frac{1}{t}$.
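The family $h_{a,b}$ can be sketched for $\ell = 8$. The sketch assumes the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used by AES) to represent $GF(2^8)$, reads the first $\log t$ bits as the most significant ones, and returns values in $\{0, \ldots, t-1\}$ rather than $\{1, \ldots, t\}$.

```python
def gf_mul(a: int, b: int, ell: int, poly: int) -> int:
    """Multiply a and b in GF(2^ell); `poly` is an irreducible polynomial of
    degree ell, written as an integer that includes the x^ell term."""
    result = 0
    while b:
        if b & 1:
            result ^= a                    # addition in GF(2^ell) is XOR
        b >>= 1
        a <<= 1
        if a >> ell:                       # degree reached ell: reduce mod poly
            a ^= poly
    return result

def h_ab(a: int, b: int, x: int, ell: int, log_t: int, poly: int) -> int:
    """h_{a,b}(x): the log_t most significant bits of a*x + b, in {0, ..., t-1}."""
    return (gf_mul(a, x, ell, poly) ^ b) >> (ell - log_t)
```

For every pair $x \ne y$ and every $b$, exactly $2^\ell / t$ values of $a$ cause a collision: $h_{a,b}(x) = h_{a,b}(y)$ exactly when the top $\log t$ bits of $a(x+y)$ vanish, and $a \mapsto a(x+y)$ is a bijection of the field.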
We choose uniformly at random a function $h \in \mathcal{H}$ and use it to map $n$ field elements $s_1, \ldots, s_n$. Let $F$ be a random variable whose value is defined as the number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$; its value is determined by the random choice of $h$. The expected value of $F$ over all choices of $h$ is:
$$E[F] = \sum_{1 \le i < j \le n} \Pr[h(s_i) = h(s_j)] = \binom{n}{2} \frac{1}{t}. \qquad (4.1)$$
If we choose $t = n^2$, then $E[F] \le \frac{1}{2}$, and since $F$ is a non-negative integer, Markov's inequality gives $\Pr[F \ge 1] \le \frac{1}{2}$. Hence at least half the functions in $\mathcal{H}$ are 1-to-1 over $s_1, \ldots, s_n$ (a function is 1-to-1 iff $F = 0$).
⁵ (continued) That address can be agreed on beforehand, e.g., address number 1 (2) corresponds to the prefix 0 (1).
We would like to map the strings $s_1, \ldots, s_n$ to an array of size $O(n)$ instead of $O(n^2)$. The mapping is carried out in two stages. In the first stage the functions in $\mathcal{H}$ are regarded as mapping strings of length $\ell$ to $\{1, \ldots, n\}$. We choose a function $h \in \mathcal{H}$ that has the following property: $F$, the number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$, is at most $n$. By Equation 4.1, since $t = n$ we have $E[F] \le \frac{n}{2}$, and hence, by Markov's inequality, $F \le n$ for at least half the functions in $\mathcal{H}$.
In the second stage we choose $n$ functions $h_1, \ldots, h_n \in \mathcal{H}$. We denote the number of strings that $h$ maps to $i$ by $n_i$, and assume that these strings are $s_{i_1}, \ldots, s_{i_{n_i}}$. In order to choose $h_i$ ($1 \le i \le n$) we regard the functions in $\mathcal{H}$ as mapping strings of length $\ell$ to $\{1, \ldots, n_i^2\}$. The function $h_i \in \mathcal{H}$ is required to be a 1-to-1 mapping of $\{s_{i_1}, \ldots, s_{i_{n_i}}\}$ to $\{1, \ldots, n_i^2\}$. At least half the functions in $\mathcal{H}$ satisfy this requirement.
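Finding any one of the $n+1$ functions $h, h_1, h_2, \ldots, h_n$ can be achieved efficiently by choosing functions uniformly at random from $\mathcal{H}$ until one satisfies the requirement. Since each trial succeeds with probability at least $\frac{1}{2}$, the expected number of trials needed until all of these functions are found is at most $2(n+1)$.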
Two arrays are now constructed. In the first there are $n$ entries, and in each entry two data elements are stored: a description of the function $h_i$ and the sum $\sum_{q=1}^{i-1} n_q^2$. In the second array there are $\sum_{q=1}^{n} n_q^2$ entries. The strings $s_1, \ldots, s_n$ are stored in this array as follows: if $h(s_j) = i$, then $s_j$ is stored in cell number $\sum_{q=1}^{i-1} n_q^2 + h_i(s_j)$ of the array. By construction, no two strings are stored in the same cell.
We now show that the size of the second array is at most $3n$. The number of pairs $s_i \ne s_j$ such that $h(s_i) = h(s_j)$ is $F = \sum_{q=1}^{n} \binom{n_q}{2}$. On the other hand, by the choice of $h$ we have $F \le n$. Therefore:
$$\sum_{q=1}^{n} n_q^2 = \sum_{q=1}^{n} \big( n_q(n_q - 1) + n_q \big) = 2F + n \le 3n.$$
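The two-stage construction can be sketched as follows. To allow table sizes $n$ and $n_i^2$ that are not powers of 2, this sketch swaps in the Carter-Wegman family $x \mapsto ((ax+b) \bmod p) \bmod t$ with $p = 2^{31}-1$ in place of the $GF(2^\ell)$ family; the retry loops rely only on success being likely, not on an exact collision probability. Keys are assumed to be distinct nonnegative integers below $p$, and an empty cell is `None` rather than $0^{\ell+1}$.

```python
import random
from typing import List, Optional, Tuple

P = 2_147_483_647                          # the prime 2^31 - 1; keys must be below it
Hash = Tuple[int, int]

def draw(rng: random.Random) -> Hash:
    return rng.randrange(1, P), rng.randrange(P)

def hv(f: Hash, x: int, t: int) -> int:
    """((a*x + b) mod P) mod t, a Carter-Wegman family used in place of GF(2^ell)."""
    a, b = f
    return ((a * x + b) % P) % t

def build_fks(keys: List[int], seed: int = 0):
    rng, n = random.Random(seed), len(keys)
    while True:                            # stage 1: at most n colliding pairs
        h = draw(rng)
        buckets: List[List[int]] = [[] for _ in range(n)]
        for k in keys:
            buckets[hv(h, k, n)].append(k)
        if sum(len(bk) * (len(bk) - 1) // 2 for bk in buckets) <= n:
            break
    first: List[Tuple[Hash, int, int]] = []    # (h_i, sum of earlier n_q^2, n_i^2)
    second: List[Optional[int]] = []
    for bk in buckets:                     # stage 2: h_i one-to-one on bucket i
        size = len(bk) ** 2
        while True:
            hi = draw(rng)
            if len({hv(hi, k, size) for k in bk}) == len(bk):
                break
        cells: List[Optional[int]] = [None] * size
        for k in bk:
            cells[hv(hi, k, size)] = k
        first.append((hi, len(second), size))
        second.extend(cells)
    return h, first, second

def lookup(w: int, h: Hash, first, second) -> bool:
    hi, offset, size = first[hv(h, w, len(first))]
    return size > 0 and second[offset + hv(hi, w, size)] == w
```

The bound on the second array is enforced by the first loop: once $F \le n$, its length is $2F + n \le 3n$. In a PERKY setting the user would presumably still perform both retrievals when the bucket is empty, so that the number of fetches does not depend on $w$.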
Discussion
Our aim is to transform the above construction into a PERKY scheme. The servers construct the two arrays, and the user retrieves the string at the correct cell. In order to achieve this goal, the servers have to agree on $n+1$ hash functions, $h, h_1, \ldots, h_n$. Having one server choose these functions and then distribute them through $U$ is impractical. Since the number of these functions is $n+1$, and each one is encoded by $2\ell$ bits, the communication complexity would be even larger than that of having one of the servers send $s_1, \ldots, s_n$ to $U$. Below we present several solutions to this problem.
1. The simplest solution is to use a scheme that solves PIR$(\ell,n,1)$. If there is just a single server [58, 60, 72, 17], the problem of server coordination does not even arise.
2. The second type of solution is to slightly extend the scope of the PIR, or PERKY, model. For instance, the servers can be allowed to have a source of shared randomness. This is already assumed if the model of communication and privacy being used is that of SPIR [38] (see also Sections 2.5, 2.7 and 4.5).
3. Another solution is to have all the servers go over the functions in $\mathcal{H}$ in a deterministic fashion, for example in lexicographic order. The first function that satisfies the conditions required of $h_i$ is chosen as $h_i$. This search always ends successfully, because $\mathcal{H}$ contains functions of the desired type. However, in the worst case the computational complexity may become exponential in $\ell$, as opposed to an expected time of $O(\ell n)$. This poses no problem when the only measure of efficiency we use is communication complexity, but it does cause difficulties if the servers are assumed to be computationally bounded.
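The deterministic alternative in item 3 can be sketched as a lexicographic scan. To keep the scan fast, the sketch uses a small modular family $((ax+b) \bmod p) \bmod t$ with a small prime $p$ in place of $GF(2^\ell)$; the keys, $t$ and $p$ in the usage example are arbitrary.

```python
from typing import List, Tuple

def first_perfect(keys: List[int], t: int, p: int) -> Tuple[int, int]:
    """Scan (a, b) in lexicographic order and return the first pair for which
    x -> ((a*x + b) mod p) mod t is one-to-one on `keys` (keys below prime p).
    Every server running this on the same keys obtains the same pair."""
    for a in range(1, p):
        for b in range(p):
            if len({((a * x + b) % p) % t for x in keys}) == len(keys):
                return a, b
    raise ValueError("the family contains no perfect function for these keys")
```

Since the scan depends only on the keys, all servers reach the same function without communicating; in the worst case, however, it may visit almost all of the roughly $p^2$ pairs, which corresponds to the exponential worst case noted above.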
PERKY scheme
Given one of these possible solutions to the problem of distributing the hash functions to the servers, we have:
Corollary 21: Let $P$ be a one-round PIR$(\ell,n,k)$ scheme with communication complexity $C_P(\ell,n,k)$. Then there exists a scheme that solves PERKY$(\ell,n,k)$ and has the following properties:
Privacy: If $P$ maintains information-theoretic (or computational) privacy, then the scheme maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C(\ell,n,k) = 2\ell + C_P(2\ell + \log 3n, n, k) + C_P(\ell + 1, 3n, k).$$
Round complexity: The number of communication rounds in the scheme is 3.
Proof: We now describe the various parameters of the search data structure in the terms of Definition 26.
The number of words in the structure is $t(n,\ell) = 4n + 1$. The first word in the structure is the root, which holds the index of the first hash function $h$. The length of the first word is $2\ell$. The next $n$ words correspond to the entries in the first array. Each such entry is a description of a function $h_i \in \mathcal{H}$ ($2\ell$ bits) and the sum $\sum_{q=1}^{i-1} n_q^2$ ($\log 3n$ bits). The next $3n$ words correspond to entries in the second array. Each word is either $s_j \| 1$ if $s_j$ is mapped to that entry, or $0^{\ell+1}$ if none of the strings $s_1, \ldots, s_n$ is mapped to the corresponding entry.
The maximum search length is $d(n,\ell) = 2$. The length of the words changes between the first word ($2\ell$ bits), the next $n$ words ($2\ell + \log 3n$ bits) and the final $3n$ words ($\ell + 1$ bits).
Finally, we describe the search algorithm $B$. The algorithm has three stages. In the first stage it begins with initial information $init = 0$. On input $(w, h, 0)$ the algorithm $B$ outputs $(add', aux')$, where $add' = h(w)$ and $aux' = 1$. On input $(w, (\sum_{q=1}^{i-1} n_q^2, h_i), 1)$ the algorithm outputs $(\sum_{q=1}^{i-1} n_q^2 + h_i(w), 2)$. On input $(w, s, 2)$ it outputs 1 iff $w \| 1 = s$.
4.5 Symmetric PERKY
In this section we deal with the problem of protecting both the privacy of the user and the privacy of the database in a retrieval-by-keywords scheme. After the execution of such a scheme the servers get no information on the user's query, while the user learns whether $w \in \{s_1, \ldots, s_n\}$ but does not gain any other information on the database.
We already showed one solution to this problem in Theorem 16. The disadvantage of the construction in that theorem is its high computational and communication complexity. In Theorem 18 and Section 4.4 we showed PERKY schemes that are more efficient than the reduction of Theorem 16. However, the generalized scheme of Theorem 18 cannot transform a SPIR scheme into a SPERKY scheme in the same way that it transforms PIR into PERKY: the user may get additional information, beyond the single bit that determines whether $w \in \{s_1, \ldots, s_n\}$, even if the protocol is executed in the prescribed manner. In the binary tree instantiation of Theorem 18 (see Corollary 19), the user may learn no less than $\log n$ database keywords in a single invocation of the PERKY protocol. In the perfect hashing instantiation of Theorem 18 (see Corollary 21), the user can gain information from the hash functions $h, h_i$ which he retrieves. Since they have to be 1-to-1, only certain combinations of the $n$ keywords $s_1, \ldots, s_n$ are possible, and that is more information than the user should obtain.
Another way to solve the SPERKY problem with communication complexity polynomial in $n$ and $\ell$ is by using general private multi-party computation protocols, e.g., [11, 21, 75, 47].
We present a different solution which is more efficient and is in the spirit of an oblivious walk over a data structure as in Theorem 18. The basis of our scheme is the trie instantiation of Theorem 18, as presented in Corollary 20. It would be convenient if we could replace the PIR scheme $P$ in Corollary 20 with a SPIR scheme and argue that the resulting scheme is a SPERKY scheme. That is not possible, however, as the resulting scheme leaks some illegal information to the user. For instance, a user that holds $w \notin \{s_1, \ldots, s_n\}$ learns which prefixes of $w$ are legal and which are not. In the scheme we construct, we eliminate all such points of information leakage.
Theorem 22: Let $P$ be a one-round SPIR$(\ell,n,k)$ scheme with communication complexity $C_P(\ell,n,k)$. Then there exists a scheme that solves SPERKY$(\ell,n,k)$ and has the following properties:
Privacy: If $P$ maintains information-theoretic (or computational) privacy, then the scheme maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C(\ell,n,k) = \log(n+1) + (\ell - 1)\, C_P\big(\log(n+1), 2(n+1), k\big).$$
Round complexity: The number of communication rounds in the scheme is $\ell$.
Proof: The scheme uses machinery similar to that of the schemes in Section 4.4. We describe a specific search data structure that is akin to the trie structure of Subsection 4.4.2, and use it in conjunction with the oblivious walk method of Theorem 18. The structure is random in the sense that the strings $s_1, \ldots, s_n$ it holds do not determine the structure, but do induce a probability distribution on all possible structures. In a multi-server scheme, all the random choices that are required by the structure are carried out by utilizing the shared randomness source.
The Data Structure:
Levels: Like the trie structure, this data structure has $\ell$ levels. In the first (number 0) there is only one node, the root, and in all others there are $n+1$ nodes⁶. A search path begins at the root and proceeds down a path to one of the leaves. Therefore, $d(n,\ell) = \ell - 1$. Unlike the trie structure, the position (address) of a node in a level is not deterministic. Rather, given $n+1$ nodes in a certain level, a random permutation is chosen to determine the internal order of nodes. A new random permutation has to be chosen for each level prior to each invocation of the PERKY scheme (otherwise information may leak).
Node types: In the $q$-th level, $T_q$, there may be nodes of three types:
1. Legal nodes represent legal prefixes. There is one node in the $q$-th level for every distinct legal prefix of length $q$.
2. The illegal node is a unique node at each level⁷ that represents all the prefixes of length $q$ which are not legal.
3. Dummy nodes are used to fill a level up. If the $q$-th level has $j$ legal and illegal nodes, then the other $n + 1 - j$ nodes in the level are dummy nodes. Unlike the trie structure, here each dummy node is an element chosen uniformly at random from the domain $[n+1]$. A dummy string is random because, if it were a string of zeroes as in Corollary 20, a dishonest user who retrieved it might learn illegitimate information about the number of legal prefixes at that level.
⁶ A slightly more efficient approach is to have $2^q + 1$ nodes for the $q$-th level, $q = 1, \ldots, \log n$, as $2^q$ is an upper bound on the number of possible nodes at the $q$-th level.
Node composition: Every node in levels $q = 0, 1, \ldots, \ell-2$ is comprised of two pointers $p_0, p_1$, each of length $\log(n+1)$. Each pointer is a word in the data structure (using the terms of Definition 26). Thus, $m(n,\ell) = \log(n+1)$ and
$$t(n,\ell) = \sum_{q=0}^{\ell-1} t_q = 2 + 2(n+1)(\ell-1).$$
If a node at the $q$-th level represents the prefix $y_1 \ldots y_q$, then $p_0$ is the address in level number $q+1$ of the node representing $y_1 \ldots y_q 0$, and $p_1$ is the address in level number $q+1$ of the node representing $y_1 \ldots y_q 1$. The ordering of the pointers within the nodes is deterministic ($p_0$ comes first). If $y_1 \ldots y_q 0$ ($y_1 \ldots y_q 1$) is an illegal prefix, then $p_0$ ($p_1$) points to the address of the illegal node at level $q+1$. Hence, both pointers of the illegal node at level $q$ point to the illegal node at level $q+1$. As remarked previously, the dummy nodes contain two randomly chosen pointers. Each node in level $\ell-1$ contains two bits denoting whether the two possible $\ell$-bit extensions are in $\{s_1, \ldots, s_n\}$.
Search algorithm: $B$ receives as input $(w, u, aux)$. If $aux = init$, then $u$ is the root, which is a node that contains two pointers. Otherwise $u$ is a single pointer, and $aux$ contains the level number $q$. If $0 < q < d(n,\ell)$, then $u$ is the address of a node $v$ in level number $q+1$. $B$ outputs $(add', aux')$ such that if $w_q = 0$ ($w_q = 1$), $add'$ is the address in $T_{q+1}$ of the $p_0$ ($p_1$) pointer of $v$, and $aux' = q+1$. If $q = d(n,\ell)$, then $u$ is the answer of the protocol and $B$ outputs $ans(w,S) = u$.
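The randomized trie can be sketched as follows. Here `rng` stands in for the servers' shared randomness, the label `ILLEGAL` and the 1-based addresses are illustrative choices, and dummy nodes at the leaf level hold random bits, which is an assumption since the text specifies random pointers only for the inner levels.

```python
import random
from typing import Dict, List, Tuple

def build_random_trie(keywords: List[str], ell: int, rng: random.Random):
    """Levels 1..ell-1 hold n + 1 nodes: legal prefixes, one illegal node and
    dummies, placed by a fresh random permutation of the addresses 1..n+1.
    Each level is flattened to 2(n + 1) words (p0, p1 per node, or two bits)."""
    n, kw = len(keywords), set(keywords)
    addr: List[Dict[str, int]] = [{} for _ in range(ell)]
    for q in range(1, ell):
        labels = sorted({s[:q] for s in kw}) + ["ILLEGAL"]
        slots = rng.sample(range(1, n + 2), n + 1)     # random permutation of [n+1]
        addr[q] = dict(zip(labels, slots))             # unused slots hold dummies

    def ptr(q: int, prefix: str) -> int:               # illegal prefixes -> illegal node
        return addr[q].get(prefix, addr[q]["ILLEGAL"])

    root = (ptr(1, "0"), ptr(1, "1"))
    levels = []
    for q in range(1, ell):
        leaf = q == ell - 1
        # Dummy nodes: random pointers (random bits at the leaf level, an assumption).
        words = [rng.randrange(2) if leaf else rng.randrange(1, n + 2)
                 for _ in range(2 * (n + 1))]
        for lab, a in addr[q].items():
            for bit in (0, 1):
                child = lab + str(bit)
                if leaf:
                    val = int(lab != "ILLEGAL" and child in kw)
                elif lab == "ILLEGAL":
                    val = addr[q + 1]["ILLEGAL"]
                else:
                    val = ptr(q + 1, child)
                words[2 * (a - 1) + bit] = val
        levels.append(words)
    return root, levels

def random_trie_walk(w: str, root: Tuple[int, int], levels: List[List[int]]) -> bool:
    """Every path stays on legal or illegal nodes, so no padding logic is needed."""
    a = root[int(w[0])]
    for q, words in enumerate(levels, start=1):
        a = words[2 * (a - 1) + int(w[q])]             # one SPIR fetch from T_q
    return bool(a)
```

No null pointers exist in this structure, so every walk, successful or not, visits one real node per level and needs no dummy fetches.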
Correctness: The correctness of the PERKY scheme induced by this structure is based on its correctness as a search data structure. For every $w \in \{s_1, \ldots, s_n\}$, a search path beginning at the root terminates at level $\ell-1$ with the bit 1. For every $w \notin \{s_1, \ldots, s_n\}$, the prefix $w_1 \ldots w_q$ is illegal for some $q$, $1 \le q \le \ell$. If $q < \ell$, the search path reaches the illegal node of level $q$ and thence the illegal nodes of levels $q+1, q+2, \ldots, \ell-1$, where both answer bits are 0. Otherwise, the search path reaches a legal node at level $\ell-1$, and the retrieved answer bit is 0 (even though the other bit at this node is 1).
⁷ That is the reason $n+1$ nodes are required per level.
Complexity: In the first round of communication $DB_1$ sends the root ($2\log(n+1)$ bits) to the user. In each of the next $\ell-1$ rounds, the scheme $P$ is executed, where one block of at most $\log(n+1)$ bits is retrieved from $2(n+1)$ such blocks.
User privacy: User privacy is argued as in Theorem 18. In order to complete the
proof and show that this is a SPERKY scheme, we have to prove data privacy.
Data privacy: We prove data privacy by considering the semi-ideal model, in which the protocol is augmented with a trusted party that computes only the SPIR$(\ell,n,k)$ protocol $P$ and returns the answer to the user. We show that in this semi-ideal model the scheme maintains information-theoretic security against an adversary (passive or active) that corrupts the user. By Theorem 1, the real-world protocol maintains information-theoretic (computational) data privacy if $P$ maintains information-theoretic (computational) data privacy.
Let $U$ be a (possibly dishonest) user in the semi-ideal model. $U$ is characterized by a set of (possibly probabilistic) algorithms $C_1, C_2, \ldots, C_{\ell-1}$, which determine the retrieval index at levels $1, 2, \ldots, \ell-1$. For every $q$, algorithm $C_q$ receives as input all of $U$'s view up to that point. The view includes the input $n, \ell, w$, an auxiliary input $z_q$, which models all the messages received by $U$, and a random input $r_q$. $C_q$ outputs a retrieval index $i_q$, which is sent to the trusted party. If $U$ is passive, i.e., follows the protocol to the letter, the algorithms $C_1, \ldots, C_{\ell-1}$ are identical to the $\ell-1$ invocations of $B$ at levels $1, \ldots, \ell-1$.
A simulator $S$ receives as input $n, \ell, w$ and $ans(w,S)$ (the input and output in the ideal model). Its operation is quite simple. It mimics the algorithms run by $U$ at each stage, and simulates the answers of the servers by choosing completely random elements in the range $1, \ldots, n+1$ (in other words, random pointers). We first present the simulator in detail and then show that its output distribution is identical to the view of $U$.
1. $S$ outputs the input of $U$: $n, \ell, w$.
2. $S$ chooses at random two different elements of $[n+1]$, $R_0^1$ and $R_0^2$, and sets $z_1 := R_0^1 \| R_0^2$, which simulates the root.
3. For every $q = 1, 2, \ldots, \ell-2$, $S$ samples $r_q$, executes $C_q(n, \ell, w, z_q, r_q)$ and produces $i_q$. $S$ sets $z_{q+1} := z_q \| i_q \| R_q$, where $R_q$ is a uniformly random element of $[n+1]$ that simulates the output of the trusted server on index $i_q$ and the given database.
4. $S$ outputs the input of $U$ ($n, \ell, w$), the messages of the protocol in the semi-ideal model, $z_{\ell-2}$, and $ans(w,S)$.
We prove by induction on $q$ that for every $q = 1, \ldots, \ell-2$ the auxiliary input of $S$, $z_q$, is distributed identically to the messages received by $U$ before the $q$-th invocation of the trusted party.
Prior to the first invocation of $P$, the user $U$ receives one message, which is the root. It consists of two pointers, which are the addresses in $T_1$ of the single-bit prefix 0 and the single-bit prefix 1. By construction, the $n+1$ addresses in $T_1$ are permuted randomly. Therefore, each pointer is a random element of $[n+1]$. Furthermore, the two pointers have two different values⁸. Thus, the simulated $z_1 = R_0^1 \| R_0^2$ is distributed identically to the real $z_1$.
Assume that the induction hypothesis is true for $z_q$, $q \ge 1$. Since $S$ can sample $r_q$ in the same way that $U$ does, the input of $C_q$, namely $n, \ell, w, z_q, r_q$, as simulated by $S$ is distributed identically to the input of $C_q$ invoked by $U$. Therefore, $i_q$ as simulated by $S$ is distributed identically to the $i_q$ that $U$ uses. At this point $U$ sends $i_q$ to the trusted party and receives in return an address in $T_q$. The simulator chooses a random $R_q$ to simulate the trusted party's answer. $R_q$ has the same distribution as the “real” answer because of the random permutation of all addresses in $T_q$. Thus, $z_{q+1}$ is distributed identically to the view of $U$ prior to invocation number $q+1$ of the trusted party.
In conclusion, $S$ receives the same input as $U$, produces the same
distribution on the messages received during the protocol, and receives the same output
of the protocol. $S$, therefore, perfectly simulates in the ideal model the operation
of an adversary $U$ in the semi-ideal model. Hence, the protocol is information-theoretically
(computationally) secure if $P$ is information-theoretically (computationally) secure.
4.6 Other PERKY Topics
4.6.1 Reducing Communication Complexity
In all of the PERKY protocols we presented, the communication complexity is a
function of the form $\ell^{\alpha} f(n)$, where $\alpha > 0$ is some constant and $f$ is some sub-linear
function. In certain contexts the length of the keywords, $\ell$, may be very large, which
could make the communication complexity infeasible.
Following the execution of each of the previous PERKY protocols, the user knows
with certainty whether the word it holds, $w$, is one of the $n$ words held by the
servers. If we allow errors, a certain tradeoff is possible between the communication
complexity of a protocol and its probability of error, that is, the probability that at
the end of the protocol $U$ obtains a wrong answer as to whether $w$ is one of $s_1, \ldots, s_n$.
This tradeoff is possible through the application of the string equivalence technique
described in [57, pp. 30-31].

Footnote 8: This is true unless both single-bit prefixes are illegal, in which case the database is empty.
Lemma 16: Let $\Pi$ be a PERKY$(\ell, n, k)$ scheme with communication complexity
$C(\ell, n, k)$. Let $p$ be a prime number. Then there exists a scheme $\Pi'$ that solves
the problem PERKY$(\ell, n, k)$ with error probability at most $n\ell/p$ and has the following
properties:
Privacy: If $\Pi$ maintains information-theoretic (or computational) privacy then $\Pi'$
maintains information-theoretic (or computational) privacy.
Communication complexity:
$$C'(\ell, n, k) = C(\log p, n, k).$$
Round complexity: Identical to the round complexity of $\Pi$.
Proof: The scheme $\Pi'$:
1. $U$ chooses $p$ and $r$: $p$ is a prime number, and $r$ is chosen uniformly at random
in the range $0, \ldots, p-1$. $U$ sends $p$ and $r$ to the servers.
2. All the participants in the scheme consider the binary strings $s_1, \ldots, s_n$ as
polynomials over $GF(p)$. That is, a string $s = b_0 \ldots b_{\ell-1}$ ($b_0, \ldots, b_{\ell-1} \in \{0, 1\}$)
represents the polynomial $s(y) = \sum_{j=0}^{\ell-1} b_j y^j$. The servers construct a new set of
$n$ strings: the $i$-th string is the binary representation of the field element $s_i(r)$.
3. The user and servers execute the scheme $\Pi$, in which the $n$ strings each server
holds are $s_1(r), \ldots, s_n(r)$, and the string that $U$ holds is $w(r)$.
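The size of all the strings is reduced to $\log p$ bits each, and therefore the communication complexity is $C(\log p, n, k)$. For a fixed $i$ with $s_i \neq w$, the polynomial $s_i(y) - w(y)$ is nonzero (its coefficients lie in $\{-1, 0, 1\}$ and not all of them vanish) and has degree at most $\ell - 1$, so it has at most $\ell - 1$ roots in $GF(p)$; hence the probability that $s_i(r) = w(r)$ is at most $(\ell - 1)/p$. By the union bound over $i = 1, \ldots, n$, the probability that
$w \notin \{s_1, \ldots, s_n\}$ but $w(r) \in \{s_1(r), \ldots, s_n(r)\}$ is less than $n\ell/p$. Obviously, if
$w \in \{s_1, \ldots, s_n\}$, then $w(r) \in \{s_1(r), \ldots, s_n(r)\}$.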
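The fingerprinting step of Lemma 16 can be sketched in Python as follows. This is a minimal illustration, assuming keywords are given as lists of bits with `bits[j]` $= b_j$; the helper names are hypothetical, and the private execution of $\Pi$ on the shortened strings is replaced by an ordinary, non-private set lookup.

```python
import random

def fingerprint(bits, r, p):
    """Evaluate s(y) = sum_j bits[j] * y**j over GF(p) at y = r (Horner's rule)."""
    acc = 0
    for b in reversed(bits):  # b_{l-1} first, b_0 last
        acc = (acc * r + b) % p
    return acc

def fingerprinted_member(database, w, p, rng=random):
    """True iff w(r) is in {s_1(r), ..., s_n(r)} for one random r in {0, ..., p-1}.

    In the scheme this membership test is the PERKY execution on the
    shortened strings; here it is an ordinary (non-private) set lookup.
    """
    r = rng.randrange(p)
    reduced = {fingerprint(s, r, p) for s in database}
    return fingerprint(w, r, p) in reduced
```

With a prime near $2^{61}$ and a handful of short keywords, a false positive occurs with probability at most $n\ell/p$, which is negligible here, while a keyword that is in the database is always reported as present.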
As to privacy: for $\Pi$ to be private, for every adversary $A$ who can corrupt one of the servers there must be some simulator $S$ whose output is indistinguishable (in the
information-theoretic or computational sense) from the view of $A$. A corresponding
simulator $S'$ for the scheme $\Pi'$ chooses $p$ according to the required error bound (which
can be part of the definition of the scheme in the same way that $n$ and $\ell$ are). $S'$ proceeds to choose $r$ ($0 \le r \le p-1$) uniformly at random, computes the new database
$s_1(r), \ldots, s_n(r)$ and then runs $S$. Distinguishing between the output of $S'$ and the
view of $A$ in the scheme $\Pi'$ is exactly as hard as distinguishing between the output
of $S$ and the view of $A$ in the scheme $\Pi$.
4.6.2 The address of w
In this work a solution to the PERKY problem is defined as a protocol by which
the user learns whether $w \in \{s_1, \ldots, s_n\}$ or not. In many situations a user would
actually like to know the address $i$ for which $w = s_i$. After finding out the address,
further queries about the keyword $w$ or the data it represents can be accomplished
by using PIR schemes, which are more efficient than their PERKY counterparts. If
$w \in \{s_1, \ldots, s_n\}$, finding out $i$ is simple given any PERKY scheme $\Pi$.
The servers construct $\log n$ sets of strings, each containing at most $n$ items.
Suppose the binary representation of $i$ is $i_1 \ldots i_{\log n}$, and the $i$-th keyword is $s_i$. The
keyword $s_i$ appears in the $q$-th set ($1 \le q \le \log n$) if and only if $i_q = 1$. The scheme $\Pi$
is executed by the user and servers $\log n$ times. In the $q$-th execution the user finds
out whether $w$ appears in the $q$-th set constructed by the servers.
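The bit-decomposition idea can be illustrated with a short Python sketch. It assumes distinct keywords with 1-based addresses (so it uses $\lceil \log_2(n+1) \rceil$ sets rather than exactly $\log n$) and numbers the sets from the least significant bit; `build_bit_sets`, `recover_address` and the `member` callback, which stands in for one execution of the PERKY scheme, are hypothetical names.

```python
def build_bit_sets(keywords):
    """Server side: set q holds every keyword whose 1-based address has bit q set
    (bit 0 is the least significant)."""
    nbits = max(1, len(keywords).bit_length())
    sets = [set() for _ in range(nbits)]
    for address, keyword in enumerate(keywords, start=1):
        for q in range(nbits):
            if (address >> q) & 1:
                sets[q].add(keyword)
    return sets

def recover_address(w, bit_sets, member=lambda x, s: x in s):
    """User side: one membership query per set; `member` stands in for a run of
    the PERKY scheme. Returns 0 if w is not in the database."""
    return sum(1 << q for q, s in enumerate(bit_sets) if member(w, s))
```

For the keywords `["alpha", "beta", "gamma", "delta", "eps"]`, three sets are built and the queries for `"gamma"` answer yes, yes, no, giving address 3.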
The communication complexity of the above protocol is $\log n$ times the communication complexity of $\Pi$. However, in all the schemes shown in this work the user
can find the address $i$ more efficiently. In the binary tree, trie and
perfect hashing schemes the user and servers execute several PIR schemes. In the
last execution the servers construct an array such that by retrieving the data in the
$j$-th cell (for some $1 \le j \le n$) the user finds out whether $w \in \{s_1, \ldots, s_n\}$. Adding to the
$j$-th cell the address $i$, which is the position of $w$ in the original database, allows the
user to discover $i$ immediately. In each of the cases (tree, trie and hashing)
the communication complexity increases by at most a constant factor.
4.7 Open problems
In this work we introduce the notion of PERKY as a step towards closing the gap between theoretically motivated PIR works and applicable information retrieval systems
that maintain privacy. Much further work is still required. Among the problems
that remain unsolved are more complex queries and approximate queries. By complex queries we mean that each data item in the database has several keywords that
correspond to it; the user holds several keywords and may wish to retrieve all the
items that correspond to some function of its keywords. By approximate queries we
mean that the database returns an affirmative answer to the query not only if
$s_i = w$ for some $i$, but also if for some $i$ the strings $s_i$ and $w$ are close according to some
metric (for instance, edit distance).
Chapter 5
Joint Generation of RSA Keys by Two Parties
In this chapter we show how two parties can jointly generate RSA public and private
keys. Following the execution of our protocol each party learns the public key, $N =
PQ$ and $e$, but does not know the factorization of $N$ or the decryption exponent $d$.
The exponent $d$ is shared between the two players in such a way that joint decryption
of ciphertexts is possible.
5.1 Preliminaries
Notation 14: The size of the RSA modulus $N$ is $\sigma$ bits (e.g. $\sigma = 1024$).
5.1.1 Smallest primes
At several points in our work we are interested in the $j$ smallest distinct primes
$p_1, \ldots, p_j$ such that $\prod_{i=1}^{j} p_i > 2^{\sigma}$. The following table provides several useful parameters for a few typical values
of $\sigma$.

$\sigma$ | $j$ | $p_j$ | $\sum_{i=1}^{j} \lceil \log p_i \rceil$
512 | 76 | 383 | 557
1024 | 133 | 751 | 1108
1536 | 185 | 1103 | 1634
2048 | 235 | 1483 | 2189
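The entries of the table can be reproduced with a few lines of Python. The sketch below assumes $\log$ means $\log_2$ and uses plain trial division, which is adequate for primes of this size; `smallest_primes_params` is a hypothetical helper name.

```python
import math

def is_prime(n):
    """Trial division; fine for the small primes needed here."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def smallest_primes_params(sigma):
    """Return (j, p_j, sum of ceil(log2 p_i)) for the j smallest primes whose product exceeds 2**sigma."""
    target = 1 << sigma
    product, j, bit_total, candidate = 1, 0, 0, 1
    while product <= target:
        candidate += 1
        if is_prime(candidate):
            product *= candidate
            j += 1
            bit_total += (candidate - 1).bit_length()  # equals ceil(log2 p) for p >= 2
    return j, candidate, bit_total
```

For instance, `smallest_primes_params(512)` gives `(76, 383, 557)`, matching the first row; the other rows can be checked the same way.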
5.1.2 Useful techniques
In this subsection we review several problems and techniques that were researched
extensively in previous work, and which we use here.
Symmetrically private information retrieval: In the problem of private information retrieval presented by Chor et al. [25], $k$ databases ($k \ge 1$) hold copies of
the same $n$-bit binary string $x$, and a user wishes to retrieve the $i$-th bit $x_i$. A PIR
scheme is a protocol which allows the user to learn $x_i$ without revealing any information about $i$ to any individual database. Symmetrically private information retrieval,
introduced in [38], is identical to PIR except for the additional requirement that the
user learn no information about $x$ apart from $x_i$. This problem is also called 1-out-of-$m$
oblivious transfer and all-or-nothing disclosure of secrets (ANDOS). The techniques
presented in [38] are especially suited for the multi-database ($k \ge 2$) setting. In a
recent work Naor and Pinkas [63] solve this problem by constructing a SPIR scheme
out of any PIR scheme, in particular single-database PIR schemes. The trivial single-database PIR scheme is to simply have the database send the whole data string to the
user. Clever PIR schemes involving a single database have been proposed in several
works [58, 17, 72]. They rely on a variety of cryptographic assumptions and share
the property that for "small" values of $n$ their communication complexity is worse
than that of the trivial PIR scheme.
We now give a brief description of the SPIR scheme we use, which is the Naor-Pinkas method in conjunction with the trivial PIR scheme. In our scenario the data
string $x$ is made up of $n$ substrings of length $\ell$. The user retrieves the $i$-th substring
without learning any other information and without leaking information about $i$. The
database randomly chooses $\log n$ pairs of seeds for a pseudo-random generator $G$:
$(s_1^0, s_1^1), \ldots, (s_{\log n}^0, s_{\log n}^1)$. Every seed $s_j^b$ ($1 \le j \le \log n$, $b \in \{0, 1\}$) is expanded into
$n\ell$ bits $G(s_j^b)$, which can be viewed as $n$ substrings of length $\ell$. The database then prepares a new
data string $y$ of $n$ substrings. For every index $i$ with binary representation $i_{\log n} \ldots i_1$,
the $i$-th substring of $y$ is the exclusive-or of the $i$-th substring of $x$ and the $i$-th
substring of each of $G(s_1^{i_1}), G(s_2^{i_2}), \ldots, G(s_{\log n}^{i_{\log n}})$. The user and database engage in
$\log n$ executions of $\binom{2}{1}$-OT of strings, which provide the user with a single seed from every pair (namely $s_j^{i_j}$ for the index $i$ the user wants). Finally,
the database sends $y$ to the user, who is now able to learn a single substring of $x$. The
parameters of the data strings we use are such that the running time is dominated
by the $\log n$ $\binom{2}{1}$-OTs and the communication complexity is dominated by the $n\ell$ bits
of the data string $y$, which are sent to the user.
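A minimal Python sketch of this seed-based SPIR follows. It is illustrative only: SHAKE-256 from `hashlib` stands in for the pseudo-random generator $G$ (an assumption, not a choice made in the text), indices are 0-based, and each $\binom{2}{1}$-OT is simulated by letting the user read one seed of the pair, so the sketch shows correctness but provides no privacy.

```python
import hashlib
import secrets

def prg(seed, nbytes):
    """Stand-in for the pseudo-random generator G (SHAKE-256, an assumption)."""
    return hashlib.shake_256(seed).digest(nbytes)

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def spir_database(x_blocks, seed_len=16):
    """Database: pick log n seed pairs and mask block i with block i of G(s_j^{i_j}) for every j."""
    n, size = len(x_blocks), len(x_blocks[0])
    logn = max(1, (n - 1).bit_length())
    seeds = [(secrets.token_bytes(seed_len), secrets.token_bytes(seed_len)) for _ in range(logn)]
    streams = [(prg(s0, n * size), prg(s1, n * size)) for s0, s1 in seeds]
    y = []
    for i, block in enumerate(x_blocks):
        for j in range(logn):
            bit = (i >> j) & 1  # i_j, the j-th bit of i
            block = xor(block, streams[j][bit][i * size:(i + 1) * size])
        y.append(block)
    return seeds, y

def spir_user(index, seeds, y):
    """User: obtain s_j^{index_j} for each j (simulated OTs), then unmask block `index` of y."""
    n, size = len(y), len(y[0])
    block = y[index]
    for j, pair in enumerate(seeds):
        seed = pair[(index >> j) & 1]  # placeholder for a 1-out-of-2 OT
        block = xor(block, prg(seed, n * size)[index * size:(index + 1) * size])
    return block
```

Only the $\log n$ seeds travel through oblivious transfers; the $n\ell$-bit string $y$ is sent in the clear, which is why the communication is dominated by $y$ and the running time by the OTs.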
Dense probabilistic and homomorphic encryption: We are interested in an
encryption method that provides two basic properties: (1) semantic security, as
defined in [49]; (2) additive homomorphism: we can efficiently compute a function $f$
such that $f(ENC(a), ENC(b)) = ENC(a + b)$. Furthermore, the sum is modulo some
number $t$, where $t$ can be defined flexibly as part of the system.
As a concrete example we use Benaloh's encryption [12, 13]. The system works as
follows. Select two primes $p, q$ such that $m \triangleq pq$, $t \mid p - 1$, $\gcd(t, (p-1)/t) = 1$
and $\gcd(t, q - 1) = 1$ (see footnote 1). The density of such primes along appropriate arithmetic
sequences is large enough to ensure efficient generation of $p, q$ (see [12] for details).
Select $y \in \mathbb{Z}_m^*$ such that $y^{\varphi(m)/t} \not\equiv 1 \pmod m$. The public key is $m, y$, and encryption
of $M \in \mathbb{Z}_t$ is performed by choosing a random $u \in \mathbb{Z}_m^*$ and sending $y^M u^t \bmod m$.
In order to decrypt, the holder of the secret key computes at a preprocessing
stage $T_M \triangleq y^{M\varphi(m)/t} \bmod m$ for every $M \in \mathbb{Z}_t$. Hence, $t$ must be small enough that $t$
exponentiations can be performed. Decryption of $z$ is by computing $z^{\varphi(m)/t} \bmod m$
and finding the unique $T_M$ to which it is equal. The scheme is semantically secure
based on the assumption that deciding higher residuosity is intractable [13]. Most of
our requirements are met by the weaker assumption that deciding prime residuosity
is intractable [12].
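The mechanics of Benaloh's scheme can be checked with a toy Python sketch using tiny primes ($p = 11$, $q = 7$, $t = 5$, chosen for illustration only and far too small for security). Instead of testing $y^{\varphi(m)/t} \not\equiv 1 \pmod m$ directly, it searches for a $y$ whose decryption table has $t$ distinct entries; for prime $t$ the two conditions coincide. Function names are hypothetical.

```python
import math
import random

def benaloh_keygen(p, q, t):
    """Toy Benaloh key generation following the conditions listed above."""
    assert (p - 1) % t == 0 and math.gcd(t, (p - 1) // t) == 1
    assert math.gcd(t, q - 1) == 1
    m, phi = p * q, (p - 1) * (q - 1)
    for y in range(2, m):
        if math.gcd(y, m) != 1:
            continue
        # Decryption table T_M = y^(M*phi/t) mod m; require t distinct entries
        table = {pow(y, M * (phi // t), m): M for M in range(t)}
        if len(table) == t:
            return (m, y), (phi, table)
    raise ValueError("no suitable y")

def encrypt(public, t, message):
    """E(M) = y^M * u^t mod m with u random in Z_m^*."""
    m, y = public
    while True:
        u = random.randrange(1, m)
        if math.gcd(u, m) == 1:
            return pow(y, message % t, m) * pow(u, t, m) % m

def decrypt(public, secret, t, c):
    """Compute c^(phi/t) mod m and look the result up in the table."""
    m, _ = public
    phi, table = secret
    return table[pow(c, phi // t, m)]
```

Multiplying two ciphertexts modulo $m$ adds the plaintexts modulo $t$: for example, with `benaloh_keygen(11, 7, 5)` the product of encryptions of 3 and 4 decrypts to $7 \bmod 5 = 2$.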
Oblivious polynomial evaluation: In this problem, presented by Naor and
Pinkas in [63], Alice holds a field element $\alpha \in F$ and Bob holds a polynomial $B(x)$
over $F$. At the end of the protocol Alice learns only $B(\alpha)$ and Bob learns nothing
at all. The intractability assumption used in [63] is new and states the following.
Let $S(\cdot)$ be a degree $k$ polynomial over $F$ and let $m, d_{Q,x}$ be two security parameters
($d_{Q,x} > k$). Given $2d_{Q,x} + 1$ sets of $m$ field elements such that each set contains one
value of $S$ at a unique point different from 0, together with $m - 1$ random field elements, the
value $S(0)$ is pseudo-random.
We now give a brief description of the protocol presented in [63] as used in our application, where the polynomial $B$ is of degree 1. Bob chooses a random bivariate polynomial $Q(x, y)$ such that the degree in $y$ is 1, the degree in $x$ is $d_{Q,x}$, and $Q(0, \cdot) = B(\cdot)$.
Alice chooses a random polynomial $S$ of degree $d_{Q,x}$ such that $S(0) = \alpha$. Define $R(x)$
as the degree $2d_{Q,x}$ polynomial $R(x) = Q(x, S(x))$. Alice chooses $2d_{Q,x} + 1$ different
non-zero points $x_j$ for $j = 1, \ldots, 2d_{Q,x} + 1$. For each such $j$ Alice randomly selects
$m - 1$ field elements $y_{j,1}, \ldots, y_{j,m-1}$ and sends to Bob $x_j$ and a random permutation
of the $m$ elements $S(x_j), y_{j,1}, \ldots, y_{j,m-1}$ (denoted by $z_{j,1}, \ldots, z_{j,m}$). Bob computes
$Q(x_j, z_{j,i})$ for $i = 1, \ldots, m$. Alice and Bob execute a SPIR scheme in which Alice
retrieves $Q(x_j, S(x_j))$. Given $2d_{Q,x} + 1$ such pairs $(x_j, R(x_j))$, Alice can interpolate
and compute $R(0) = B(\alpha)$.
The complexity of the protocol is $2d_{Q,x} + 1$ executions of the SPIR scheme for
data strings of $m$ elements.
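The degree-1 case of the Naor-Pinkas protocol can be traced in a Python sketch. It is a correctness illustration under stated assumptions: the field is $GF(2^{31} - 1)$, $B(x) = b_1x + b_0$, $Q(x, y)$ is represented as $q_0(x) + y\,q_1(x)$ so that $Q(0, \cdot) = B(\cdot)$, and each SPIR execution is simulated by directly indexing Bob's answer list, so nothing is hidden from either party here. All names are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; field GF(P) chosen for illustration

def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) mod P by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def random_poly(degree, constant):
    """Random polynomial of the given degree with a prescribed constant term."""
    return [constant % P] + [random.randrange(P) for _ in range(degree)]

def interpolate_at_zero(points):
    """Lagrange interpolation through `points`, evaluated at x = 0."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * (-xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total

def ope_linear(alpha, b1, b0, d=3, m=4):
    """Alice learns B(alpha) = b1*alpha + b0; each SPIR call is simulated by indexing."""
    q0, q1 = random_poly(d, b0), random_poly(d, b1)    # Bob: Q(x, y) = q0(x) + y*q1(x)
    s = random_poly(d, alpha)                           # Alice: S(0) = alpha
    points = []
    for xj in random.sample(range(1, P), 2 * d + 1):    # distinct non-zero points
        pos = random.randrange(m)
        z = [random.randrange(P) for _ in range(m - 1)]
        z.insert(pos, poly_eval(s, xj))                 # S(x_j) hidden among m-1 decoys
        answers = [(poly_eval(q0, xj) + zi * poly_eval(q1, xj)) % P for zi in z]  # Bob
        points.append((xj, answers[pos]))               # SPIR: Alice retrieves entry pos only
    return interpolate_at_zero(points)                  # R(0) = Q(0, S(0)) = B(alpha)
```

Since $R(x) = q_0(x) + S(x)q_1(x)$ has degree at most $2d$, the $2d + 1$ retrieved values determine it, and its constant term is $b_0 + \alpha b_1$.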
5.2 Overview
In this section we give an overview of our protocol. The stages in which we use the
Boneh-Franklin protocol exactly are the selection of candidates and the full primality
test (the other stages require a third party in [15]). The protocol is executed in the
following steps.

Footnote 1: Since $q - 1$ is even, the condition $\gcd(t, q - 1) = 1$ implies that $t$ is odd.
1. Choosing candidates: Alice chooses independently at random two $(\sigma/2 - 1)$-bit
integers $P_a, Q_a \equiv 3 \pmod 4$, and Bob similarly chooses $P_b, Q_b \equiv 0 \pmod 4$. The
two parties keep their choices secret and set as candidates $P = P_a + P_b$ and
$Q = Q_a + Q_b$.
2. Computing N: Alice and Bob compute $N = (P_a + P_b)(Q_a + Q_b)$. We show how
to perform the computation using three different protocols and three different
intractability assumptions.
3. Initial primality test: For each of the smallest $k$ primes $p_1, \ldots, p_k$ the participants check whether $p_i \mid N$ ($i = 1, \ldots, k$). This stage is executed in conjunction with
the computation of $N$. If $N$ fails the initial primality test, computing a new
candidate $N$ is easier than computing it from scratch (as is necessary following
a failure of the full primality test).
4. Full primality test: The test of [15] is essentially as follows. Alice and Bob
agree on $g \in \mathbb{Z}_N^*$. If the Jacobi symbol $\left(\frac{g}{N}\right)$ is not equal to 1, they choose a new
$g$. Otherwise Alice computes $v_a = g^{(N - P_a - Q_a + 1)/4} \bmod N$, and Bob computes
$v_b = g^{(P_b + Q_b)/4} \bmod N$. If $v_a = v_b$ or $v_a = -v_b \bmod N$, the test passes.
5. Computing and sharing d: In this step we compute the decryption exponent
$d$, assuming that $e$ is known to both parties and that $\gcd(e, \varphi(N)) = 1$. Alice
receives $d_a$ and Bob receives $d_b$ so that $d = d_a + d_b \bmod \varphi(N)$ and $de \equiv 1 \pmod{\varphi(N)}$.
Boneh and Franklin describe two protocols for the computation of $d$. The first
is very efficient and can be performed by two parties, but leaks $\varphi(N) \bmod e$.
Therefore, this method is suitable for small public exponents and not for the
general case. The second protocol computes $d$ for any $e$ but requires the help
of a third party.
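A toy Python run of the full primality test may help; it is a sketch with both parties' computations placed in one function and tiny shares chosen for illustration ($P = 3 + 4 = 7$, $Q = 7 + 4 = 11$). The Jacobi symbol routine is the standard binary algorithm; `biprimality_test` and its `trials` parameter are hypothetical, and the loop keeps drawing $g$ until `trials` values with Jacobi symbol 1 have been tested.

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def biprimality_test(pa, qa, pb, qb, trials=20):
    """Toy run of the test of [15]; both parties' computations happen in one function."""
    n = (pa + pb) * (qa + qb)
    tested = 0
    while tested < trials:
        g = random.randrange(2, n - 1)            # the agreed-upon g
        if jacobi(g, n) != 1:
            continue                              # choose a new g
        tested += 1
        va = pow(g, (n - pa - qa + 1) // 4, n)    # Alice: uses only N, Pa, Qa
        vb = pow(g, (pb + qb) // 4, n)            # Bob: uses only Pb, Qb
        if va != vb and va != (-vb) % n:
            return False
    return True
```

For $N = 7 \cdot 11$ every tested $g$ passes, while for $N = 15 \cdot 7 = 105$ (shares $P_a = 11$, $P_b = 4$, $Q_a = 3$, $Q_b = 4$) a random $g$ with Jacobi symbol 1 is rejected most of the time.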
5.3 Computing N
Alice holds $P_a, Q_a$ and Bob holds $P_b, Q_b$. They wish to compute
$$N = (P_a + P_b)(Q_a + Q_b) = P_aQ_a + P_aQ_b + P_bQ_a + P_bQ_b.$$
We show how to carry out the computation privately using three different protocols.
5.3.1 Oblivious transfers
Let $R$ be a publicly known ring and let $a, b \in R$. Denote $\lambda \triangleq \lceil \log |R| \rceil$ (each element
of $R$ can be encoded using $\lambda$ bits). Assume Alice holds $a$ and Bob holds $b$. They
wish to perform a computation by which Alice obtains $x$ and Bob obtains $y$ such
that $x + y = ab$, where all operations are in $R$. Furthermore, the protocol ensures the
privacy of each player given the existence of oblivious transfers. In other words, the
protocol does not help Alice and Bob obtain information about $b$ and $a$, respectively.
The protocol:
1. Bob selects uniformly at random and independently $\lambda$ ring elements, denoted
by $s_0, \ldots, s_{\lambda-1} \in R$. Bob proceeds by preparing $\lambda$ pairs of elements in $R$:
$(t_0^0, t_0^1), \ldots, (t_{\lambda-1}^0, t_{\lambda-1}^1)$. For every $i$ ($0 \le i \le \lambda - 1$) Bob defines $t_i^0 \triangleq s_i$ and
$t_i^1 \triangleq 2^i b + s_i$.
2. Let the binary representation of $a$ be $a_{\lambda-1} \ldots a_0$. Alice and Bob execute $\lambda$ $\binom{2}{1}$-OTs. In the $i$-th invocation Alice chooses $t_i^{a_i}$ from the pair $(t_i^0, t_i^1)$.
3. Alice sets $x \triangleq \sum_{i=0}^{\lambda-1} t_i^{a_i}$ and Bob sets $y \triangleq -\sum_{i=0}^{\lambda-1} s_i$.
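Lemma 17: $x + y = ab$ over the ring $R$.
Proof: Since $a_{\lambda-1}, \ldots, a_0$ is the binary representation of $a$, we can write $a = \sum_{i=0}^{\lambda-1} a_i 2^i$. Then
$$x + y = \sum_{i=0}^{\lambda-1} t_i^{a_i} - \sum_{i=0}^{\lambda-1} s_i = \sum_{i=0}^{\lambda-1} (a_i 2^i b + s_i) - \sum_{i=0}^{\lambda-1} s_i = b \sum_{i=0}^{\lambda-1} a_i 2^i = ab.$$
In the following protocol for computing $N$ the ring $R$ is $\mathbb{Z}_{2^\sigma}$, the integers modulo $2^\sigma$.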
1. Alice and Bob use the previous protocol twice to additively share $P_aQ_b = x_1 +
y_1 \bmod 2^\sigma$ and $P_bQ_a = x_2 + y_2 \bmod 2^\sigma$. Alice holds $x_1, x_2$ and Bob holds $y_1, y_2$.
2. Bob sends $y \triangleq y_1 + y_2 + P_bQ_b \bmod 2^\sigma$ to Alice.
3. Alice computes $x_1 + x_2 + P_aQ_a + y \bmod 2^\sigma$. Alice now holds $N \bmod 2^\sigma$, which is simply
$N$ due to the choice of $\sigma$.
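The OT-based sub-protocol and the computation of $N$ can be sketched in Python as follows. The sketch assumes a placeholder `ot_1_of_2` in place of a real oblivious transfer (so it demonstrates only that the shares combine correctly, not privacy), and all function names are hypothetical.

```python
import secrets

def ot_1_of_2(pair, choice):
    """Placeholder for a 1-out-of-2 oblivious transfer: the receiver learns pair[choice] only."""
    return pair[choice]

def share_product(a, b, modulus):
    """Sub-protocol of 5.3.1 over R = Z_modulus: Alice (input a) gets x, Bob (input b) gets y, x + y = ab."""
    lam = (modulus - 1).bit_length()                          # ceil(log2 |R|)
    s = [secrets.randbelow(modulus) for _ in range(lam)]      # Bob's random s_i
    pairs = [(s[i], ((1 << i) * b + s[i]) % modulus) for i in range(lam)]  # (t_i^0, t_i^1)
    a %= modulus
    x = sum(ot_1_of_2(pairs[i], (a >> i) & 1) for i in range(lam)) % modulus
    y = -sum(s) % modulus
    return x, y

def compute_n(pa, qa, pb, qb, sigma):
    """Protocol of 5.3.1 for N = (Pa + Pb)(Qa + Qb) over Z_{2^sigma}; returns Alice's result."""
    mod = 1 << sigma
    x1, y1 = share_product(pa, qb, mod)      # shares of Pa*Qb
    x2, y2 = share_product(qa, pb, mod)      # shares of Qa*Pb
    y = (y1 + y2 + pb * qb) % mod            # Bob -> Alice
    return (x1 + x2 + pa * qa + y) % mod     # equals N when N < 2**sigma
```

With $\sigma = 16$ and the shares $P_a = 3$, $P_b = 4$, $Q_a = 7$, $Q_b = 4$, `compute_n` returns $77 = 7 \cdot 11$.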
Lemma 18: The transcript of the view of the execution of the protocol can be
simulated for both Alice and Bob, and therefore the protocol is secure.
Proof: We denote the messages that Alice receives during the sharing of $P_aQ_b$ by
$t_0^{a_0}, \ldots, t_{\sigma-1}^{a_{\sigma-1}}$ and the messages received while sharing $P_bQ_a$ by $t_\sigma^{a_\sigma}, \ldots, t_{2\sigma-1}^{a_{2\sigma-1}}$ (here $a_0, \ldots, a_{\sigma-1}$ are the bits of $P_a$ and $a_\sigma, \ldots, a_{2\sigma-1}$ those of $Q_a$). In the
same manner we denote Bob's random choices for the sharing of $P_aQ_b$ and $P_bQ_a$ by
$s_0, \ldots, s_{\sigma-1}$ and $s_\sigma, \ldots, s_{2\sigma-1}$, respectively.
Bob's view can be simulated because the only messages Alice sent him were her
part of $2\sigma$ independent oblivious transfers.
Alice receives $2\sigma + 1$ elements of $\mathbb{Z}_{2^\sigma}$:
$$t_0^{a_0}, \ldots, t_{2\sigma-1}^{a_{2\sigma-1}}, y.$$
The uniformly random and independent choices of $s_0, \ldots, s_{2\sigma-1}$
ensure that the messages Alice receives are distributed uniformly subject to the condition that
$$\sum_{i=0}^{2\sigma-1} t_i^{a_i} + y \equiv N - P_aQ_a \pmod{2^\sigma}.$$
Since Alice can compute $N - P_aQ_a$, a simulator $S_a$ can produce the same distribution
as that of the messages Alice receives, given $N, P_a, Q_a$.
Lemma 19: The computation time and the communication complexity of the protocol are dominated by $2\sigma$ oblivious transfers. The transferred strings are of length
$\sigma$.
5.3.2 Oblivious polynomial evaluation
Alice and Bob agree on a prime $p > 2^\sigma$ and set $F$ to be $GF(p)$. They employ the
following protocol to compute $N$:
1. Bob chooses a random element $r \in F$. He prepares two polynomials over $F$:
$B_1(x) = P_b x + r$ and $B_2(x) = Q_b x - r + P_bQ_b$.
2. Alice uses the oblivious polynomial evaluation protocol of [63] to obtain $B_1(Q_a)$
and $B_2(P_a)$. Alice computes $N = P_aQ_a + B_1(Q_a) + B_2(P_a)$ (the sum is computed in $F$, and since $N < 2^\sigma < p$ it equals $N$).
The security of the protocol depends on the security of the cryptographic assumption outlined in subsection 5.1.2 and follows from an argument similar to the proof of Lemma
18.
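The cancellation of Bob's mask $r$ in this protocol is easy to check numerically. A sketch in Python, with both oblivious polynomial evaluations replaced by direct evaluation and the Mersenne prime $2^{61} - 1$ as an illustrative choice of $p$; `compute_n_ope` is a hypothetical name.

```python
import secrets

def compute_n_ope(pa, qa, pb, qb, p):
    """Arithmetic of 5.3.2 over GF(p); the two OPE runs are replaced by direct evaluation."""
    r = secrets.randbelow(p)                 # Bob's random mask

    def b1(x):                               # B1(x) = Pb*x + r
        return (pb * x + r) % p

    def b2(x):                               # B2(x) = Qb*x - r + Pb*Qb
        return (qb * x - r + pb * qb) % p

    # Alice would obtain b1(Qa) and b2(Pa) through oblivious polynomial evaluation.
    return (pa * qa + b1(qa) + b2(pa)) % p
```

Each value Alice receives is masked by $\pm r$, so individually it reveals nothing about $P_b$ or $Q_b$; only their sum with $P_aQ_a$ is meaningful.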
Lemma 20: The computational complexity of the protocol is dominated by the
execution of $2\log m\,(2d_{Q,x} + 1)$ oblivious transfers, where $m$ and $d_{Q,x}$ are the security
parameters. The communication complexity is less than $3m(2d_{Q,x} + 1)$.
5.3.3 Benaloh's encryption
We now compute $N$ by using the homomorphic encryption described in subsection
5.1.2. Let $p_1, \ldots, p_j$ be the smallest primes such that $\prod_{i=1}^{j} p_i > 2^\sigma$. The following
protocol is used to compute $N \bmod p_i$:
1. Let $t \triangleq p_i$. Alice constructs the encryption system (an appropriate $p, q, y$) and
sends the public key $y, m = pq$ to Bob. Alice also sends the encryptions of her
shares, i.e. $z_1 \triangleq y^{P_a} u_1^t \bmod m$ and $z_2 \triangleq y^{Q_a} u_2^t \bmod m$, where $u_1, u_2 \in \mathbb{Z}_m^*$ are
selected uniformly at random and independently.
2. Bob computes the encryption of $P_bQ_b \bmod t$, which is denoted by $z_3$, calculates
$$z \triangleq z_1^{Q_b} z_2^{P_b} z_3 \bmod m$$
and sends $z$ to Alice.
3. Alice decrypts $z$, adds $P_aQ_a$ to the result modulo $t$ and obtains $N \bmod t$.
The two players repeat this protocol for each $p_i$, $i = 1, \ldots, j$. Alice is able to reconstruct $N$ from $N \bmod p_i$, $i = 1, \ldots, j$, by using the Chinese remainder theorem.
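A toy end-to-end run of this protocol, followed by Chinese remaindering, is sketched below. It is illustrative only: the Benaloh parameters are tiny and found by a naive search (`toy_benaloh`, a hypothetical helper), and only odd primes are used because Benaloh's modulus $t$ must be odd. With shares $P_a = 31$, $P_b = 28$, $Q_a = 43$, $Q_b = 60$ the value $N = 59 \cdot 103 = 6077$ is below $3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 = 15015$, so these five primes suffice here. `math.prod` and `pow` with exponent $-1$ need Python 3.8 or later.

```python
import math
import random

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def toy_benaloh(t):
    """Alice's key setup for modulus t: small p, q meeting the conditions of 5.1.2 (odd t only)."""
    p = next(c for c in range(t + 1, 10**6, t)
             if is_prime(c) and math.gcd(t, (c - 1) // t) == 1)
    q = next(c for c in range(3, 10**6)
             if is_prime(c) and c != p and math.gcd(t, c - 1) == 1)
    m, phi = p * q, (p - 1) * (q - 1)
    for y in range(2, m):
        if math.gcd(y, m) != 1:
            continue
        table = {pow(y, k * (phi // t), m): k for k in range(t)}
        if len(table) == t:
            return m, y, phi, table
    raise ValueError("no suitable y")

def encrypt(m, y, t, message):
    while True:
        u = random.randrange(1, m)
        if math.gcd(u, m) == 1:
            return pow(y, message, m) * pow(u, t, m) % m

def n_mod_t(pa, qa, pb, qb, t):
    """One run of the protocol of 5.3.3 for t = p_i."""
    m, y, phi, table = toy_benaloh(t)                     # Alice
    z1, z2 = encrypt(m, y, t, pa), encrypt(m, y, t, qa)   # Alice -> Bob
    z3 = encrypt(m, y, t, pb * qb)                        # Bob
    z = pow(z1, qb, m) * pow(z2, pb, m) * z3 % m          # Bob -> Alice
    return (table[pow(z, phi // t, m)] + pa * qa) % t     # Alice decrypts, adds PaQa

def crt(residues, moduli):
    prod = math.prod(moduli)
    total = 0
    for r, mod in zip(residues, moduli):
        partial = prod // mod
        total += r * partial * pow(partial, -1, mod)
    return total % prod
```

Running `n_mod_t(31, 43, 28, 60, t)` for each $t \in \{3, 5, 7, 11, 13\}$ and combining the residues with `crt` recovers $N = 6077$.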
Lemma 21: Assuming the intractability of the prime residuosity problem, the transcripts of the views of both parties in the protocol can be simulated.
Proof: The distribution of Bob's view can be simulated by encrypting two arbitrary messages, assuming the intractability of prime residuosity. Therefore, Alice's
privacy is assured.
The distribution of Alice's view can be simulated as follows. Given $N$, the value $N \bmod p_i$
can be computed for every $i$. The only message that Alice receives is $z \triangleq z_1^{Q_b} z_2^{P_b} z_3 \bmod m$. By the definition of $z_3$ and the encryption system, $z_3 = y^{P_bQ_b} u^t \bmod m$,
where $u$ is a random element of $\mathbb{Z}_m^*$. Thus $z$ is a random element in the appropriate coset (all
the elements whose decryption is $N - P_aQ_a \bmod t$).
Lemma 22: The running time of the protocol is dominated by the single decryption
Alice executes, the communication complexity is three elements of $\mathbb{Z}_m$ ($z_1$, $z_2$ and $z$), and the protocol requires one
round of communication.
5.4 Amortization and initial primality test
The initial primality test consists of checking whether a candidate $N$ is divisible by
one of the first $k$ primes $p_1, \ldots, p_k$. If it is, then either $P = P_a + P_b$ or $Q = Q_a + Q_b$ is
not a prime. This test can be carried out by Alice following the computation of $N$.
If a candidate $N$ passes the initial primality test, Alice publishes its value and it
becomes a candidate for the full primality test of [15]. However, if it fails the test,
a new $N$ has to be computed. In this section we show how to efficiently find a new
candidate following a failure of the initial test by the previous candidate. The total
cost of computing a series of candidates is lower than that of using the protocols of section
5.3 each time anew. We show two different approaches: one for the oblivious transfer
and oblivious polynomial evaluation protocols, and the other for the homomorphic
encryption protocol.
5.4.1 OT and oblivious polynomial evaluation
Suppose that after Alice and Bob discard a certain candidate $N$, they compute a new
one by having Alice retain the previous $P_a, Q_a$ and having Bob choose new $P_b, Q_b$. In
that case, as we show below, computing the new $N$ can be much more efficient than
if both parties choose new shares. The drawback is that, given both values of $N$, Bob
can gain information about $P_a, Q_a$. Therefore, in this stage (unlike the full primality
test) Alice does not send the value of $N$ to Bob.
Assume Bob holds two sequences of strings, $(a_1^0, \ldots, a_n^0)$ and $(a_1^1, \ldots, a_n^1)$, and Alice
wishes to retrieve one sequence without revealing which one to Bob and without
gaining information about the other sequence. Instead of invoking a $\binom{2}{1}$-OT protocol
$n$ times, the players agree on a pseudo-random generator $G$ and do the following:
1. Bob chooses two random seeds $s_0, s_1$.
2. Alice uses a single invocation of $\binom{2}{1}$-OT to obtain $s_b$, where $b \in \{0, 1\}$ denotes the
desired sequence.
3. Bob sends to Alice the sequence $(a_1^0, \ldots, a_n^0)$ masked (i.e. bit-by-bit exclusive-or) by
$G(s_0)$ and the sequence $(a_1^1, \ldots, a_n^1)$ masked by $G(s_1)$.
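The seed trick can be sketched in Python. This is a minimal illustration assuming SHAKE-256 as a stand-in for $G$, equal-length strings, and a placeholder in place of the single $\binom{2}{1}$-OT; `batch_transfer` is a hypothetical name.

```python
import hashlib
import secrets

def prg(seed, nbytes):
    """Stand-in for the pseudo-random generator G (SHAKE-256, an assumption)."""
    return hashlib.shake_256(seed).digest(nbytes)

def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))

def batch_transfer(seq0, seq1, b):
    """Alice receives sequence b (0 or 1) using a single (simulated) 1-out-of-2 OT on seeds."""
    blob0, blob1 = b"".join(seq0), b"".join(seq1)
    seeds = (secrets.token_bytes(16), secrets.token_bytes(16))    # Bob: s_0, s_1
    masked = (xor(blob0, prg(seeds[0], len(blob0))),              # Bob -> Alice
              xor(blob1, prg(seeds[1], len(blob1))))
    seed = seeds[b]                                               # placeholder for the single OT
    blob = xor(masked[b], prg(seed, len(masked[b])))              # Alice unmasks
    size = len(seq0[0])                                           # equal-length strings assumed
    return [blob[i:i + size] for i in range(0, len(blob), size)]
```

However long the sequences are, only one short seed passes through an oblivious transfer; the rest of the cost is pseudo-random expansion and plain communication.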
Alice can unmask the required sequence while the other sequence remains pseudo-random. In the protocol of subsection 5.3.1, $N$ is computed using only oblivious
transfers in which Alice retrieves a set of $2\sigma$ strings from Bob. Alice's choices of
which strings to retrieve depend only on her input $P_a, Q_a$. Therefore, if Alice retains
$P_a$ and $Q_a$ while Bob selects a sequence of inputs $(P_b^1, Q_b^1), \ldots, (P_b^n, Q_b^n)$, the two
players can compute a sequence of candidates $N^1, \ldots, N^n$ with as many oblivious
transfers as are needed to compute a single $N$.
The same idea can be used in the oblivious polynomial evaluation protocol, as
noted in [63]. The evaluation of many polynomials at the same point requires as
many oblivious transfers as the evaluation of a single polynomial at that point. Thus,
computing a sequence of candidates $N$ requires only $2\log m\,(2d_{Q,x} + 1)$ computations
of $\binom{2}{1}$-OT.
5.4.2 Homomorphic encryption
Alice and Bob combine the two stages of computing $N$ and trial division by using
the protocol of subsection 5.3.3 flexibly. Let $p_1, \ldots, p_{j'}$ be the $j'$ smallest distinct
primes such that $\prod_{i=1}^{j'} p_i > 2^{(\sigma-1)/2}$. Alice and Bob pick their elements at random in
the range $0, \ldots, \prod_{i=1}^{j'} p_i - 1$ by choosing random elements in each $\mathbb{Z}_{p_i}$ for $i = 1, \ldots, j'$.
Alice and Bob compute $N \bmod p_i$ as described in subsection 5.3.3. If $N \equiv 0 \pmod{p_i}$,
then at least one of the elements $P = P_a + P_b$ or $Q = Q_a + Q_b$ is divisible by $p_i$.
In that case, Alice and Bob choose new random elements $P_a, P_b, Q_a, Q_b \bmod p_i$ and
recompute $N \bmod p_i$. The probability of this happening is less than $2/p_i$. Thus the
expected number of re-computations is less than $\sum_{i=2}^{j'} 2/p_i$. This quantity is about
3.1 for $\sigma = 1024$ (the prime $p_1 = 2$ does not cause a problem because $P_a \equiv Q_a \equiv 3 \pmod 4$ and
$P_b \equiv Q_b \equiv 0 \pmod 4$).
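The figure of about 3.1 can be reproduced numerically. The sketch below, a hypothetical helper rather than part of the protocol, finds the $j'$ smallest primes with $\prod p_i > 2^{(\sigma-1)/2}$ (comparing the squared product with $2^{\sigma-1}$ to stay in integer arithmetic) and sums $2/p_i$ over the odd ones.

```python
import math

def recomputation_bound(sigma):
    """Sum of 2/p over the odd primes among the j' smallest primes with product > 2**((sigma-1)/2)."""
    bound = 0.0
    product, candidate = 1, 1
    while product * product <= (1 << (sigma - 1)):   # i.e. product <= 2**((sigma-1)/2)
        candidate += 1
        if all(candidate % d for d in range(2, math.isqrt(candidate) + 1)):
            product *= candidate
            if candidate > 2:
                bound += 2 / candidate
    return bound
```

For $\sigma = 1024$ the loop stops at $p_{j'} = 383$ and the sum is roughly 3.1.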
Setting $P_a \bmod p_i$ for $i = 1, \ldots, j'$ determines $P_a$, and by the same reasoning
the other three shares that Alice and Bob hold are also set. The two players complete
the computation of $N$ by determining the value of $N \bmod p_i$ (using the protocol of
subsection 5.3.3) for $i = j' + 1, \ldots, j$, where $\prod_{i=1}^{j} p_i > 2^\sigma$. If for one of these primes
$N \equiv 0 \pmod{p_i}$, Alice and Bob discard their shares and pick new candidates (see footnote 2).
5.5 Computing d
Alice and Bob share $\varphi(N)$ in an additive manner. Alice holds $\varphi_a \triangleq N - P_a - Q_a + 1$, Bob
holds $\varphi_b \triangleq -Q_b - P_b$, and $\varphi_a + \varphi_b = \varphi(N)$. The two parties agree on a public exponent
$e$. Denote $\lambda \triangleq \lceil \log e \rceil$. We follow in the footsteps of the Boneh-Franklin protocol and
employ their algorithm to invert $e$ modulo $\varphi(N)$ without making reductions modulo
$\varphi(N)$:
1. Compute $\zeta = -\varphi(N)^{-1} \bmod e$.
2. Compute $d = (\zeta\varphi(N) + 1)/e$.
Now $de \equiv 1 \pmod{\varphi(N)}$, and therefore $d$ is the inverse of $e$ modulo $\varphi(N)$. (Since $\zeta\varphi(N) \equiv -1 \pmod e$, the division in step 2 is exact and $de = \zeta\varphi(N) + 1$.)

Footnote 2: An interesting optimization is not to discard the whole share $(P_a, P_b, Q_a, Q_b)$, but for each
specific share, say $P_a$, only to select a new $P_a \bmod p_i$ for $i = j' - c, \ldots, j'$, where $c$ is a small
constant. The probability is very high that the new $N$ thus defined is not a multiple of $p_i$.
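The Boneh-Franklin inversion is short enough to check directly. The Python sketch below is a plain (non-distributed) version, assuming $\gcd(e, \varphi(N)) = 1$; it uses only $\varphi(N) \bmod e$ to obtain $\zeta$ and never reduces anything modulo $\varphi(N)$. The name `invert_e_without_reducing` is hypothetical, and `pow` with exponent $-1$ requires Python 3.8 or later.

```python
def invert_e_without_reducing(phi, e):
    """Boneh-Franklin style inversion: needs only phi mod e, never a reduction modulo phi."""
    zeta = (-pow(phi % e, -1, e)) % e        # zeta = -phi^{-1} mod e
    d, remainder = divmod(zeta * phi + 1, e)
    assert remainder == 0                    # zeta*phi = -1 (mod e), so the division is exact
    return d
```

For example, with $\varphi(N) = 220$ and $e = 3$: $\zeta = 2$ and $d = (2 \cdot 220 + 1)/3 = 147$, and indeed $147 \cdot 3 = 441 \equiv 1 \pmod{220}$.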
As a first step, Alice and Bob change the additive sharing of $\varphi(N)$ into a multiplicative sharing modulo $e$, without leaking information about $\varphi(N)$ to either party.
At the end of the sub-protocol Alice holds $r\varphi(N) \bmod e$ and Bob holds $r^{-1} \bmod e$,
where $r$ is a random element of $\mathbb{Z}_e^*$.
1. Bob chooses uniformly at random $r \in \mathbb{Z}_e^*$. Alice and Bob invoke the protocol of
subsection 5.3.1, setting $R \triangleq \mathbb{Z}_e$, $a \triangleq \varphi_a$ and $b \triangleq r$. At the end of the protocol
Alice holds $x$ and Bob holds $y$ such that $x + y \equiv \varphi_a r \pmod e$.
2. Bob sends $y + \varphi_b r \bmod e$ to Alice.
3. Alice computes $x + y + \varphi_b r \equiv r\varphi(N) \pmod e$, and Bob computes $r^{-1} \bmod e$.
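The conversion from additive to multiplicative shares can be sketched in Python, reusing the structure of the sub-protocol of subsection 5.3.1 with a placeholder OT. The sketch assumes $\gcd(e, \varphi(N)) = 1$; all names are hypothetical, and it only demonstrates that Alice's and Bob's outputs multiply to $\varphi(N) \bmod e$.

```python
import math
import secrets

def ot_1_of_2(pair, choice):
    """Placeholder for a 1-out-of-2 oblivious transfer."""
    return pair[choice]

def share_product(a, b, e):
    """Sub-protocol of 5.3.1 over Z_e: returns (x, y) with x + y = a*b mod e."""
    lam = (e - 1).bit_length()
    s = [secrets.randbelow(e) for _ in range(lam)]
    pairs = [(s[i], ((1 << i) * b + s[i]) % e) for i in range(lam)]
    x = sum(ot_1_of_2(pairs[i], (a >> i) & 1) for i in range(lam)) % e
    return x, -sum(s) % e

def additive_to_multiplicative(phi_a, phi_b, e):
    """Alice ends with r*phi(N) mod e and Bob with r^{-1} mod e."""
    while True:                                   # Bob: random r in Z_e^*
        r = secrets.randbelow(e)
        if math.gcd(r, e) == 1:
            break
    x, y = share_product(phi_a % e, r, e)         # x + y = phi_a * r (mod e)
    message = (y + phi_b * r) % e                 # Bob -> Alice
    alice_share = (x + message) % e               # = r * phi(N) mod e
    bob_share = pow(r, -1, e)
    return alice_share, bob_share
```

With $N = 77$, $P_a = 3$, $Q_a = 7$, $P_b = Q_b = 4$ we get $\varphi_a = 68$, $\varphi_b = -8$ and $\varphi(N) = 60$; for $e = 7$ the product of the two outputs is $60 \bmod 7 = 4$.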
Lemma 23: The computation time and the communication complexity of the protocol are dominated by $\lceil \log e \rceil$ oblivious transfers.
After completing the sub-protocol described above, Alice and Bob finish the
inversion algorithm by performing the following steps:
1. The two parties hold multiplicative shares of $\varphi(N) \bmod e$. They compute the
inverses of their shares modulo $e$ (one of them also negating) and thus hold $\zeta_a, \zeta_b$, respectively, such that
$\zeta_a \zeta_b \equiv -\varphi(N)^{-1} \equiv \zeta \pmod e$.
2. Alice and Bob re-convert their current shares into additive shares modulo $e$, i.e.
$\psi_a, \psi_b$ such that $\psi_a + \psi_b \equiv \zeta \pmod e$. Bob chooses $\psi_b \in \mathbb{Z}_e$ at random, and the two
parties combine to enable Alice to obtain $\psi_a \triangleq -\psi_b + \zeta_a\zeta_b \bmod e$. This is done
by employing essentially the same protocol we used for transforming $\varphi_a, \varphi_b$ into
a multiplicative sharing: replacing $\varphi_a$ by $\zeta_a$, $r$ by $\zeta_b$ and $\varphi_b r$ by $-\psi_b$ yields
the required protocol.
3. The two parties would like to compute $\zeta\varphi(N)$. What they actually compute
is $\eta \triangleq (\psi_a + \psi_b)(\varphi_a + \varphi_b)$. The result is either exactly $\zeta\varphi(N)$ or $(\zeta + e)\varphi(N)$.
The computation is carried out similarly to the computation of $N$ in subsection
5.3.1. The ring used is $\mathbb{Z}_k$, where $k > 4e \cdot 2^\sigma$. We modify the protocol in two
ways. The first modification is that $\eta$ is not revealed to Alice but remains split
additively over $\mathbb{Z}_k$ between the two players. In other words, they perform step 1
of the protocol in subsection 5.3.1 and additively share $\psi_a\varphi_b + \psi_b\varphi_a$. Alice adds
$\psi_a\varphi_a$ to her share and Bob adds $\psi_b\varphi_b$ to his.
The sum of the two shares over the integers is either $\eta$ or $\eta + k$. The second
option is unacceptable (unlike the two possibilities for the value of $\eta$). In order
to forestall this problem we introduce a second modification. The sharing of $\eta$ results in Alice holding $\eta + y' \bmod k$ and Bob holding $-y' \bmod k$. Furthermore,
Bob selects $y'$ by his random choices. We require Bob to make those choices
subject to the condition that $y' < k/2$. Since $y'$ is not completely random, Alice
might gain a slight advantage. However, that advantage is at most a single bit
of knowledge about $\varphi(N)$ (which can be guessed in any attempt to discover
$\varphi(N)$).
4. Alice sets $d_a \triangleq \lceil (\eta + y')/e \rceil$ and Bob sets $d_b \triangleq \lfloor (-y' + 1)/e \rfloor$ (these calculations
are over the integers). Hence, either $d_a + d_b = (\zeta\varphi(N) + 1)/e$ or $d_a + d_b =
(\zeta\varphi(N) + 1)/e + \varphi(N)$.
5.6 Improvements
In this section we suggest some efficiency improvements.
Off-line preprocessing: The performance of the protocol based on Benaloh's
encryption can be significantly improved by some off-line preparation. Obviously,
for any $t$ used as a modulus in the protocol a suitable encryption system has to
be constructed (i.e. a suitable $m = pq$ has to be found, a table of all values $T_M =
y^{M\varphi(m)/t} \bmod m$ has to be computed, etc.).
Further improvement of the online communication and computational complexity
can be attained at the cost of some space and off-line computation. Instead of constructing a separate encryption system for $t_1$ and $t_2$, Alice constructs a single system
for $t = t_1t_2$. The lookup table needed for decryption is formed as follows. Alice computes $T_M = y^{M\varphi(m)/t} \bmod m$ for $M = 0, \ldots, t_1 - 1$ and $T'_{M'} = y^{M't_1\varphi(m)/t} \bmod m$ for
$M' = 0, \ldots, t_2 - 1$. The entries of the table are obtained by calculating $T_M T'_{M'} \bmod m$
for every pair $M, M'$; the entry $T_M T'_{M'}$ corresponds to the plaintext $M + t_1M'$. Constructing this table takes more time than constructing
the two separate tables for $t_1, t_2$. The additional time is bounded by the time required to compute $t_2(t_1 + \log t_1 \log t_2)$ modular multiplications over $\mathbb{Z}_m$ (computing
$T'_{M'}$ involves $\log t_1 \log t_2$ multiplications in comparison with $\log t_2$ multiplications in
the original table).
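The combined table construction can be verified on toy parameters. The Python sketch below (illustrative sizes $p = 31$, $q = 23$, $t_1 = 3$, $t_2 = 5$, which satisfy the conditions of subsection 5.1.2 for $t = 15$; `combined_table` is a hypothetical helper) builds the table from the products $T_M T'_{M'}$ and compares it with the directly computed table for $t = 15$.

```python
import math

def combined_table(p, q, t1, t2):
    """Decryption table for t = t1*t2 built from T_M and T'_{M'}; also returns the direct table."""
    t = t1 * t2
    m, phi = p * q, (p - 1) * (q - 1)
    step = phi // t                                            # phi(m)/t
    y = next(c for c in range(2, m)
             if math.gcd(c, m) == 1 and len({pow(c, k * step, m) for k in range(t)}) == t)
    T = [pow(y, M * step, m) for M in range(t1)]                   # T_M
    T_prime = [pow(y, Mp * t1 * step, m) for Mp in range(t2)]      # T'_{M'}
    combined = {T[M] * T_prime[Mp] % m: M + t1 * Mp
                for M in range(t1) for Mp in range(t2)}
    direct = {pow(y, k * step, m): k for k in range(t)}
    return combined, direct
```

Because $T_M T'_{M'} = y^{(M + t_1M')\varphi(m)/t}$ and $M + t_1M'$ ranges exactly over $0, \ldots, t - 1$, the two dictionaries coincide.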
The size of the table is $t \log m$ bits. This figure, which might
be prohibitive for large $t$, can be significantly reduced. After computing every entry
in the table, it is possible, by using perfect hashing [36], to efficiently generate a one-to-one
function $h$ from the entries of the table to $0, \ldots, 3t - 1$. A new table is now constructed
in which instead of the original entry $T_M$ an entry $(h(T_M), M)$ is stored. Decryption
of $z$ is performed by finding the entry holding $h(z^{\varphi(m)/t} \bmod m)$ and reading the
corresponding $M$. The size of the stored table is $2t \log t$ bits. As an example of the
reduction in space complexity consider the case $t = 3 \cdot 751 = 2253$. The original table
requires more than $2^{21}$ bits, while the hashing table requires less than $2^{16}$ bits.
It is straightforward to use $t = t_1 t_2$ instead of $t_1$ and $t_2$ separately in subsections 5.3.3 and 5.4.2. The protocols in both subsections remain almost unchanged, apart from omitting the sub-protocols for $t_1$ and $t_2$ and adding a sub-protocol for $t$. In subsection 5.4.2 it is not enough to check whether $N \equiv 0 \pmod t$, since $N$ may be divisible by $t_1$ or by $t_2$ without being divisible by $t$. It is necessary to retain the two tests of $N \bmod t_1$ and $N \bmod t_2$.
Note that here the stronger higher residuosity intractability assumption is needed; it replaces the prime residuosity assumption.
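The reason the single test $N \equiv 0 \pmod t$ is insufficient, while the two separate tests can still be read off a value modulo $t$, is plain arithmetic. Since $t_1 \mid t$ and $t_2 \mid t$, we have $(N \bmod t) \bmod t_i = N \bmod t_i$. The Python sketch below shows only this plaintext arithmetic, with $t_1 = 3$ and $t_2 = 5$ as assumed toy values. The name `separate_tests` is hypothetical, and the sketch does not model how subsection 5.4.2 keeps these values hidden.

```python
def separate_tests(n_mod_t, t1, t2):
    """Given N mod t for t = t1*t2, recover the trial-division tests
    'N = 0 mod t1' and 'N = 0 mod t2' (valid because t1 and t2 divide t)."""
    return n_mod_t % t1 == 0, n_mod_t % t2 == 0


t1, t2 = 3, 5
t = t1 * t2
for N in (21, 35, 45, 22):
    r = N % t
    print(N, r == 0, separate_tests(r, t1, t2))
# 21 False (True, False)   divisible by t1 although N mod t != 0
# 35 False (False, True)   divisible by t2 although N mod t != 0
# 45 True (True, True)
# 22 False (False, False)
```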
Alternative computation of d: The last part of generating RSA keys is constructing the private key. Using Benaloh's encryption, we can sometimes improve the computation and communication complexity of this construction compared with the results of section 5.5. The improvement is possible if the parameter $t$ of a Benaloh encryption can be set to $e$ (that is, the homomorphism is modulo $e$) while decryption remains efficient. Therefore $e$ has to be a product of "small" primes (see [13]). The protocol for generating and sharing $d$ is a combination of the protocols of subsection 5.3.3 and section 5.5. We leave the details to the full version of the paper.
5.7 Performance
The most resource-consuming part of our protocol, in terms of both computation and communication, is the computation of $N$ together with the trial divisions. We use trial division because of the following result of DeBruijn [28]. If a random $\sigma/2$-bit integer $p$ passes trial division by all primes $p_i \le B$, then asymptotically

$$\Pr\left[p \text{ prime} \mid p \not\equiv 0 \pmod{p_i}\ \forall p_i \le B\right] = \frac{5.14 \ln B}{\sigma}\left(1 + o\!\left(\frac{2}{\sigma}\right)\right).$$
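The following Python sketch gives a quick numerical check of DeBruijn's estimate by evaluating the leading term $5.14 \ln B/\sigma$. Heuristically (this reading is ours, not the thesis's), the constant is $2e^{\gamma}/\ln 2 \approx 5.139$. By Mertens' third theorem, a fraction of about $e^{-\gamma}/\ln B$ of integers has no prime factor up to $B$. A random $\sigma/2$-bit integer is prime with probability about $2/(\sigma \ln 2)$, and dividing the second quantity by the first gives the estimate. The function names are illustrative. We choose $B = 383$ because it is the 76th prime, matching $j' = 76$ in the performance estimate below.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def debruijn_estimate(sigma, B):
    """Leading term 5.14 * ln(B) / sigma of Pr[p prime | no prime factor <= B]
    for a random sigma/2-bit integer p (the o(2/sigma) correction is dropped)."""
    return 5.14 * math.log(B) / sigma


def heuristic_constant():
    """2 e^gamma / ln 2: Mertens' third theorem (about e^-gamma / ln B of integers
    survive sieving up to B) combined with prime density 1/((sigma/2) ln 2)."""
    return 2 * math.exp(EULER_GAMMA) / math.log(2)


print(round(heuristic_constant(), 3))          # 5.139
print(round(debruijn_estimate(1024, 383), 4))  # 0.0299
```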
We focus on the performance of the more efficient version of our protocol, the one using homomorphic encryption. We also assume that the off-line preprocessing suggested in section 5.6 is used. Let $j'$ and $j$ be defined as in section 5.3.3. We pair off the first $j'$ primes and prepare encryption systems for the products of such pairs (as in section 5.6). The number of exponentiations (decryptions) needed to obtain one $N$ is then on average about $j + 3 - j'/2$. The probability that this $N$ is a product of two primes is approximately $(5.14 \ln p_{j'}/\sigma)^2$.
Another obvious optimization is to divide the decryptions evenly between the two parties: for half of the primes the first party plays Alice, and for the other half the roles are switched.
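The pairing of primes and the role swap can be sketched as follows. The sketch makes three assumptions that the thesis does not fix: the first $j'$ primes are paired in consecutive order, the roles alternate system by system rather than splitting the list into halves, and $j = 100$ is a hypothetical value (the actual $j$ is defined in section 5.3.3). The plan contains $j - j'/2$ encryption systems with one decryption each. The additive 3 in the estimate $j + 3 - j'/2$ is not modelled here. The last function evaluates the quoted success probability $(5.14 \ln p_{j'}/\sigma)^2$. All names are hypothetical.

```python
import math


def first_primes(k):
    """First k primes by trial division (adequate for the small k used here)."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes


def decryption_plan(j, j_pair):
    """Moduli of the encryption systems used in the trial divisions.
    The first j_pair primes (j_pair assumed even) are paired off, one system per
    product; the remaining j - j_pair primes keep one system each. Roles alternate
    so that each party decrypts for about half of the systems."""
    primes = first_primes(j)
    moduli = [primes[i] * primes[i + 1] for i in range(0, j_pair, 2)]
    moduli += primes[j_pair:]
    return [(t, "party1" if i % 2 == 0 else "party2") for i, t in enumerate(moduli)]


def candidate_success_probability(sigma, j_pair):
    """(5.14 ln p_{j'} / sigma)^2, the estimate quoted for one candidate N."""
    p_jprime = first_primes(j_pair)[-1]
    return (5.14 * math.log(p_jprime) / sigma) ** 2


plan = decryption_plan(100, 76)  # j = 100 is a hypothetical value
print(len(plan))  # 62 = j - j'/2 decryptions, split 31/31 between the parties
```

With $\sigma = 1024$ and $j' = 76$ (so $p_{j'} = 383$), the success estimate is roughly $1/1100$ per candidate. The largest paired modulus is $379 \cdot 383 = 145157$, which indicates the size of the largest lookup table.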
If $\sigma = 1024$ and we pair off the first $j' = 76$ primes, the running time of our protocol is (by a rough estimate) less than 10 times the running time of the Boneh-Franklin protocol. The communication complexity is (again very roughly) 42MB. If the participants are willing to trade space for lower online communication complexity and also pair off the remaining $j - j'$ primes needed to compute $N$, the communication complexity is reduced to about 29MB.
Open problem: Boneh and Franklin show in [15] how to test whether $N$ is a product of two primes when both parties hold $N$. It would be interesting to devise a distributed test that checks whether $N$ is a product of two primes when Alice holds $N, P_a, Q_a$ and Bob only has his private shares $P_b, Q_b$. The motivation is that in the oblivious transfer and oblivious polynomial evaluation protocols we presented, $P_a, Q_a$ would then have to be selected only once. Thus the number of oblivious transfers in the whole protocol would be reduced to the number required for computing a single candidate $N$.
Bibliography
[1] M. Abadi, J. Feigenbaum, and J. Kilian. On hiding information from an oracle (extended abstract). In Proc. of the 19th Annu. ACM Symp. on the Theory of Computing, pages 195-203, 1987. Journal version in JCSS vol. 39, pp. 21-50, 1989.
[2] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 284-293, 1997.
[3] A. Ambainis. An upper bound for private information retrieval. In Proc. of 24th ICALP, pages 401-407, 1997.
[4] L. Babai, P. Kimmel, and S. Lokam. Simultaneous messages vs. communication. In Proc. of 12th STACS, LNCS volume 900, pages 361-372. Springer-Verlag, 1995.
[5] D. Beaver. Foundations of secure interactive computing. In Advances in Cryptology - CRYPTO '91, 1991.
[6] D. Beaver. Commodity-based cryptography. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 446-455, 1997.
[7] D. Beaver and J. Feigenbaum. Hiding instances in multioracle queries. In STACS, pages 37-48, 1990.
[8] D. Beaver, J. Feigenbaum, J. Kilian, and P. Rogaway. Security with low communication overhead. In Advances in Cryptology - CRYPTO '90, pages 62-76, 1990.
[9] D. Beaver, J. Feigenbaum, J. Kilian, and P. Rogaway. Locally random reductions: Improvements and applications. J. of Cryptology, 10(1):17-36, 1997. Early version: Security with small communication overhead, CRYPTO '90, LNCS 537, pages 62-76.
[10] A. Beimel, Y. Ishai, E. Kushilevitz, and T. Malkin. One-way functions are essential for single database private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 89-98, 1999.
[11] M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for non-cryptographic fault-tolerant distributed computation. In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 1-10, 1988.
[12] J. Benaloh. Verifiable Secret-Ballot Elections. PhD thesis, Yale University, 1987.
[13] J. Benaloh. Dense probabilistic encryption. In Proc. of the Workshop on Selected Areas of Cryptography, pages 120-128, May 1994.
[14] M. Blum and S. Micali. How to generate cryptographically strong sequences of pseudo-random bits. SIAM Journal on Computing, 13:850-864, 1984.
[15] D. Boneh and M. Franklin. Efficient generation of shared RSA keys. In Advances in Cryptology - CRYPTO '97, pages 425-439. Springer-Verlag, 1997. The full version appears on the web at theory.stanford.edu/~dabo/pubs.html.
[16] G. Brassard, C. Crépeau, and J. M. Robert. All-or-nothing disclosure of secrets. In Advances in Cryptology - CRYPTO '86, pages 234-238, 1987.
[17] C. Cachin, S. Micali, and M. Stadler. Computationally private information retrieval with polylogarithmic communication. In Advances in Cryptology - EUROCRYPT '99, pages 402-414, 1999.
[18] R. Canetti. Studies in Secure Multiparty Computation and Applications. PhD thesis, Weizmann Institute, 1995.
[19] R. Canetti. Security and composition of multi-party cryptographic protocols. Theory of Cryptography Library, Record 98-18. Available online from philby.ucsd.edu/cryptolib.html, 1998.
[20] R. Canetti, U. Feige, O. Goldreich, and M. Naor. Adaptively secure multiparty computation. In Proc. of the 28th Annu. ACM Symp. on the Theory of Computing, pages 639-648, 1996.
[21] D. Chaum, C. Crépeau, and I. Damgård. Multiparty unconditionally secure protocols (extended abstract). In Proc. of the 20th Annu. ACM Symp. on the Theory of Computing, pages 11-19, 1988.
[22] B. Chor and N. Gilboa. Computationally private information retrieval. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 304-313, 1997.
[23] B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords, December 1996. Private communication.
[24] B. Chor, N. Gilboa, and M. Naor. Private information retrieval by keywords. Technical Report TR CS0917, Technion, 1997. Full version submitted for publication in Designs, Codes and Cryptography.
[25] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan. Private information retrieval. In Proc. of the 36th Annu. IEEE Symp. on Foundations of Computer Science, pages 41-50, 1995.
[26] C. Cocks. Split knowledge generation of RSA parameters. In M. Darnell, editor, Cryptography and Coding, 6th IMA International Conference, pages 89-95. Springer-Verlag, December 1997.
[27] C. Cocks. Split generation of RSA parameters with multiple participants, 1998. On-line version at www.cesg.gov.uk/downlds/math/rsa2.pdf.
[28] N. DeBruijn. On the number of uncancelled elements in the sieve of Eratosthenes. Proc. Neder. Akad., 53:803-812, 1950. Reviewed in LeVeque, Reviews in Number Theory, Vol. 4, Section N-28, p. 221.
[29] A. DeSantis, Y. Desmedt, Y. Frankel, and M. Yung. How to share a function securely. In Proc. of the 26th Annu. ACM Symp. on the Theory of Computing, pages 522-533, 1994.
[30] Y. Desmedt. Threshold cryptography. European Transactions on Telecommunications and Related Technologies, 5(4):35-43, July-August 1994.
[31] Y. Desmedt and Y. Frankel. Threshold cryptosystems. In Advances in Cryptology - CRYPTO '89, volume 435 of Lecture Notes in Computer Science, pages 307-315, 1990.
[32] G. Di Crescenzo, Y. Ishai, and R. Ostrovsky. Universal service-providers for database private information retrieval. In Proc. of the 17th Annu. ACM Symp. on Principles of Distributed Computing, pages 91-100, 1998.
[33] G. Di Crescenzo, T. Malkin, and R. Ostrovsky. Single database private information retrieval implies oblivious transfer. Manuscript, 1999.
[34] S. Even, O. Goldreich, and A. Lempel. A randomized protocol for signing contracts. Comm. of the ACM, 28:637-647, 1985.
[35] Y. Frankel, P. D. MacKenzie, and M. Yung. Robust efficient distributed RSA-key generation. In Proc. of the 30th Annu. ACM Symp. on the Theory of Computing, pages 663-672, 1998.
[36] M. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31:538-544, 1984.
[37] Y. Gertner, S. Goldwasser, and T. Malkin. A random server model for private information retrieval (or how to achieve information theoretic PIR avoiding data replication). In Proc. of 2nd RANDOM, 1998.
[38] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin. Protecting data privacy in private information retrieval schemes. In Proc. of the 30th Annu. ACM Symp. on the Theory of Computing, pages 151-160, 1998.
[39] N. Gilboa. Private retrieval of blocks. Manuscript.
[40] N. Gilboa. Joint generation of RSA keys. In Advances in Cryptology - CRYPTO '99, pages 116-129, 1999.
[41] N. Gilboa and Y. Ishai. Private information storage in constant communication rounds. Manuscript, 1999.
[42] O. Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 182-194, 1990.
[43] O. Goldreich. Secure multi-party computation (working draft). Manuscript, 1998.
[44] O. Goldreich. Modern Cryptography, Probabilistic Proofs and Pseudorandomness. Springer-Verlag, 1999.
[45] O. Goldreich, S. Goldwasser, and S. Halevi. Eliminating decryption errors in the Ajtai-Dwork cryptosystem. In Advances in Cryptology - CRYPTO '97, pages 105-111, 1997.
[46] O. Goldreich, S. Goldwasser, and S. Halevi. Public-key cryptosystems from lattice reduction problems. In Advances in Cryptology - CRYPTO '97, pages 112-130, 1997.
[47] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game (extended abstract). In Proc. of the 19th Annu. ACM Symp. on the Theory of Computing, pages 218-229, 1987.
[48] O. Goldreich and R. Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM, 43:431-476, 1996.
[49] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28:270-299, 1984.
[50] J. Håstad. Pseudo-random generators with uniform assumptions. In Proc. of the 22nd Annu. ACM Symp. on the Theory of Computing, pages 395-404, 1990.
[51] R. Impagliazzo, L. Levin, and M. Luby. Pseudo-random number generation from one-way functions. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 25-32, 1989.
[52] R. Impagliazzo and S. Rudich. Limits on the provable consequences of one-way permutations. In Proc. of the 21st Annu. ACM Symp. on the Theory of Computing, pages 44-61, 1989.
[53] Y. Ishai. Single-server, sub-linear communication PIR implies secret key exchange, 1999. Private communication.
[54] Y. Ishai and E. Kushilevitz. Improved upper bound on information theoretic private information retrieval. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, 1999.
[55] M. Ito, A. Saito, and T. Nishizeki. Secret sharing schemes realizing general access structures. In Proc. IEEE Global Telecommunication Conf., Globecom 87, pages 99-102, 1987.
[56] D. Knuth. The Art of Computer Programming, volume 3. Addison-Wesley, 1973.
[57] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[58] E. Kushilevitz and R. Ostrovsky. Single-database computationally private information retrieval. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 364-373, 1997.
[59] M. Luby. Pseudorandomness and Cryptographic Applications. Princeton University Press, 1996.
[60] E. Mann. Private access to distributed information. Master's thesis, Technion - Israel Institute of Technology, Haifa, 1998.
[61] S. Micali and P. Rogaway. Secure computation. In Advances in Cryptology - CRYPTO '91, pages 392-404, 1991.
[62] D. Naccache and J. Stern. A new public-key cryptosystem. In Advances in Cryptology - EUROCRYPT '97, 1997.
[63] M. Naor and B. Pinkas. Oblivious transfer and polynomial evaluation. In Proc. of the 31st Annu. ACM Symp. on the Theory of Computing, pages 245-254, 1999.
[64] M. Naor and O. Reingold. Number-theoretic constructions of efficient pseudo-random functions. In Proc. of the 38th Annu. IEEE Symp. on Foundations of Computer Science, pages 458-467, 1997. Full on-line version at www.wisdom.weizmann.ac.il/~reingold/PAPERS/qdh.ps.gz.
[65] R. Ostrovsky. Software Protection and Simulation on Oblivious RAMs. PhD thesis, M.I.T., 1992.
[66] R. Ostrovsky and V. Shoup. Private information storage. In Proc. of the 29th Annu. ACM Symp. on the Theory of Computing, pages 294-303, 1997.
[67] T. Pedersen. A threshold cryptosystem without a trusted party. In Advances in Cryptology - EUROCRYPT '91, pages 522-526, 1991.
[68] G. Poupard and J. Stern. Generation of shared RSA keys by two parties. In ASIACRYPT '98, pages 11-24. Springer-Verlag LNCS 1514, 1998.
[69] P. Pudlák and V. Rödl. Modified ranks of tensors and the size of circuits. In Proc. of the 25th Annu. ACM Symp. on the Theory of Computing, pages 523-531, 1993.
[70] R. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Comm. of the ACM, 21, 1978.
[71] A. Shamir. How to share a secret. Communications of the ACM, 22:612-613, 1979.
[72] J. P. Stern. A new and efficient all-or-nothing disclosure of secrets protocol. In ASIACRYPT '98, pages 357-371. Springer-Verlag, 1998.
[73] M. Wegman and J. Carter. New hash functions and their use in authentication and set equality. Journal of Computer and System Sciences, 22(3):265-27, 1981.
[74] A. Yao. Theory and applications of trapdoor functions. In Proc. of the 23rd Annu. IEEE Symp. on Foundations of Computer Science, pages 80-91, 1982.
[75] A. Yao. How to generate and exchange secrets. In Proc. of the 27th Annu. IEEE Symp. on Foundations of Computer Science, pages 162-167. IEEE Press, 1986.