Sharing with Limited Trust:
An Attack Tolerance Service in Durham e-Demand Project
Erica Y. Yang
Jie Xu
Keith H. Bennett
Department of Computer Science
University of Durham
South Road, Durham DH1 3LE, U.K.
{Erica.Yang, Jie.Xu, Keith.Bennett}@durham.ac.uk
Abstract
The unique characteristics of the Grid pose significant new security challenges that demand new
solutions. This paper argues that only limited trust should be placed in the grid environment. The trust
relationship among grid nodes may be valid only within the lifetime of a submitted job. We focus on two
key security challenges centred on the trust issue: protecting the intention (privacy) of users against
untrusted nodes and detecting job tampering against malicious attacks.
We propose to use an attack-Tolerant private Information Retrieval (TIR) scheme to address the above
problems. A generic implementation framework is presented for implementing a TIR service in a
distributed database environment. Experimental results show that incorrect results reconstructed from
corrupted data can be detected with a probability arbitrarily close to one, thus masked from the user.
Our current implementation exhibits good performance: for example the service takes less than 32%
extra processing time to reconstruct a correct result in the presence of two malicious servers from a
total of five servers, in comparison with normal situations. The total processing time takes much less
than one second even in the presence of malicious attacks.
Keywords: Security, Privacy, Trust, Intention Protection, Attack Tolerance, Fault Tolerance, Error
Detection and Result Verification.
1. Introduction
Grid computing is concerned with large scale and
dynamic information sharing and coordinated resource
sharing over distributed “Virtual Organisations” (VOs).
In the web service based grid infrastructure (e.g.,
OGSA), the collaboration relationship among
participating parties can be formed on the fly. The
emergence of such new infrastructure support introduces many security challenges. Examples include the authentication and authorisation mechanisms in the highly dynamic Grid environment. These problems are the focus of current OGSA security efforts.
The GT3 security model for OGSA [WSFB03] is
concerned with standardising ways to do 3E: Express
(security policies), Exchange (security tokens), and
Establish (trust relationships) among mutually untrusted
participants who may associate with multiple VOs. The
rationale behind the standardisation process is to enable dynamic discovery and automated interoperability among heterogeneous security services with minimal human intervention.
Like conventional security models, the OGSA security model is mainly focused on a prevention-based strategy, i.e., keeping the bad guys out and making their job hard. This strategy works fine in traditional distributed systems with tight security controls and a restricted scale. However, for the Grid, this is not an easy job. We now look at the problems of this strategy from both a security and a reliability perspective.
The Grid is hardly a trustworthy environment
[WSFB03]. First of all, the scale of the Grid determines
its heterogeneous nature, which makes it very difficult
to effectively keep out outsiders and manage insiders.
Second, the dynamic characteristic of the Grid means that it is sometimes infeasible to make a clear distinction between collaborators and adversaries, because the trust relationship among participating parties can change all the time. Indeed, this characterisation is a true reflection of the nature of “Virtual Organisations” (VOs). The dynamic nature of the Grid also means that a service requestor may have limited knowledge about remote nodes before their interactions begin. This has very challenging security implications: once a job is submitted, it is entirely up to the service provider to determine how to process it, and little protection can be imposed on the job (and hence on the requestor). For security-critical applications, this is a real concern.
Given the untrustworthiness of the Grid, we argue that:
only limited trust can be placed on the Grid
environment.
We now discuss the trust problem from a reliability
point of view. Grid applications usually involve long processing times on remote nodes. It is often not convenient to resubmit a job, for practical reasons such as charges or timeliness. Preventing the occurrence of errors in the highly dynamic Grid environment is a challenging task. Therefore, it is imperative to have error detection and fault tolerance techniques in place to enable a reliable grid.
In the context of this paper [1], protecting the privacy of
a service requestor sr is concerned with a method of
hiding the identity of the information sr is interested in
(i.e., the intention) against any attacks occurring during
communications and on the service provider side. This
problem in the domain of information retrieval is called
the Private Information Retrieval (PIR) problem, which
was introduced by Chor et al. in 1995 [CGKS95]. Due to the practical significance of the problem and the novelty of the problem itself, PIR has attracted considerable interest from the security community, e.g., [A97, CGN97, KO97, CMS99, BI01, BS02, KO00].
Here is an example illustrating how serious this
privacy problem can be in a Grid environment. Suppose
a group of people conduct a security-sensitive
collaboration over the Grid. Querying credential
services (e.g., certificate authority) is a standard
approach to enable private and authenticated
communications and establish trust relationships in the
Grid [WSFB03]. However, the downside of this approach is that the credential services can learn a great deal of information (e.g., the people involved, the time, and how frequently they collaborate) about the collaboration simply by observing the transactions made. While reliable communications can defend only against outsiders, this approach can do little to prevent privacy attacks in the presence of insiders (e.g., the credential services' administrators).
Privacy hiding techniques can help to reduce the risk of using the Grid for collaborative activities and improve people's trust in the Grid. For example, after a collaboration ends, the collaborating parties will not be able to “sell” each other's information. In the example given above, the information obtained by the credential services can be largely restricted.
We, therefore, focus on two key problems centred on
the trust issue: protecting the intention (privacy) of users
against untrusted nodes and detecting corrupted results
against malicious attacks. The application domain
considered is information retrieval, which is an essential
part of information sharing on the Grid. The two problems are closely related; a balanced solution to both can help to improve people's confidence in the Grid without sacrificing either one. However, these problems are poorly addressed in the current OGSA security specification [WSFB03]. The privacy problem is of crucial importance to security-sensitive grid services, while a satisfactory solution to the detection problem can largely improve users' confidence in using the grid for serious real-world applications. We propose
a tolerance strategy to deal with these problems by
looking at technical approaches to “disable” or “limit”
the capability of untrusted nodes in the Grid
environment.
The contributions of this paper are threefold. 1. We identify a new security problem for the Grid. 2. We propose to use an attack-Tolerant private Information Retrieval (TIR) scheme to address both problems for information retrieval applications on the Grid. To the best of our knowledge, it is the first of its kind in the Grid community. Unlike other existing Grid security solutions, this scheme does not rely on trusted third parties. Subject to the assumptions set out in Section 3, the scheme is also interesting because a client can detect corrupted results without relying on conventional cryptographic techniques (e.g., digital signatures) and can reconstruct correct results with a probability arbitrarily close to one. 3. Implementing a practical TIR service can be difficult; we present the design and implementation of a TIR service in a distributed database environment, with experiments conducted in a malicious fault-injected environment.
[1] If the term privacy protection has a different meaning, it will be mentioned explicitly.
2. Related Work
To the best of our knowledge, there is little published
work on the exact problems we are considering in the
Grid community. Therefore, this section will describe
related work in a broad sense. Our work relates to two
closely related fields: privacy protection and fault
tolerance.
According to the assumptions made, existing privacy
protection techniques can be classified into three
categories: the existence of a trusted proxy, the
availability of (privacy) negotiation mechanisms, and the availability of a large population of redundant resources.
The representative approaches from the first category
are www.anonymizer.com and www.rewebber.com.
These web sites act as a “proxy server” to request
senders. Using these techniques, the identity of a
requestor can be hidden from web site owners. One distinct feature of Rewebber.com is that it masks the location of requested documents from request senders. This approach can protect the privacy of the document publisher. Both techniques rely on the trustworthiness of
proxy servers.
Examples from the second category are P3P [P3P01]
and APPEL [Appel02]. P3P is the de facto standard on the Internet and in ubiquitous computing environments at the time of writing. Existing work in this area aims to automate the policy checking and negotiation processes. The success of
these techniques depends on the “good faith” of service
providers.
Crowds [RR98] is a typical example from the last category. By blending a request and its originator into a large number of requests and locations, it becomes hard to tell the original origin of a request. The larger the population, the harder it becomes to pinpoint the exact location of the originator.
The above privacy protection techniques aim to hide
the identity of a sender and/or location of information
(e.g., sender’s names, the origin of a message or a
document). We now proceed to describe the work in the
PIR community. The majority of PIR results are theoretical. Before 2001, little progress had been made along the line of Practical implementations of PIR (PPIR). To the best of our knowledge, only researchers at IBM Watson and Dartmouth College have attempted PPIR [SS01, IS03]. Both implementations are preliminary, and only very limited experimental results are provided. Furthermore, each PIR server is assumed to have a tamper-resistant secure co-processor which can provide the environment for secure computations. However, this approach is hard to scale because of inherent limitations (e.g., limited memory and computation power) of co-processors.
In the fault tolerance computing area, there is a large
body of work on cryptographic signing schemes (e.g.,
digital signatures and Message Authentication Code) to
detect message corruption. Examples include Castro’s
PBFT [C01], Zhou’s COCA [ZBR00], and the EU
MAFTIA project [CP99]. However, the use of cryptographic techniques makes assumptions about the reliability of key management and key distribution protocols, which are actually not easy to achieve in a real Grid environment.
3. System Model
Consider a distributed system with a set of processing
nodes connected by a communication network. It is
assumed that any message passing among nodes is
bounded within a given period of time. The system is
comprised of three types of participants: users, clients,
and replicas. A set of k replicated database servers
(called replicas) {S1, S2, …, Sk} and a set of clients run on separate processing nodes. The replicas provide information services to the clients, and the clients take inputs from the users of the system. Each user has his own trusted client (e.g., a computer owned by the user).
A replica can simultaneously deal with multiple
clients’ queries, while a client may need to send queries
to multiple replicas to utilise a service. There are two
types of message exchanges among the participants. The
messages exchanged between the users and the clients
are called inputs and results. The messages exchanged
between clients and replicas are called queries and
answers. The communication channels between clients
and replicas are one-to-one.
The information stored at the replicas is modelled as
a character string x = x1 x2 … xn, where n is the length of
the information. For example, if the information is a
database, each character represents a record in the
database and n is the total number of records in the
database. In the normal situation, the information is
identical on each of the replicas. Each character xj,
where j ∈ {1, 2, …, n}, is viewed as an integer taken
from a given integer set [X-1] = {0, 1, …, X-1}. For
example, if we encode characters using the ASCII code,
an eight-bit byte character can be viewed as an integer
taken from the set {0, 1, …, 255}. The range of this set
may be adjusted dynamically according to the semantics
of the actual information. In order to perform certain
desirable operations on x ∈ [X-1]n, we associate the set
[X-1] with a finite field GF (q), where q is a prime
number and q ≥ X. Let [q-1] = {0, 1, 2, …, q-1} be the
set of q elements of GF (q). We have
x ∈ [X-1]n ⊆ [q-1]n.
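To make this encoding concrete, the following sketch (with the illustrative choices X = 256 for eight-bit characters and the prime q = 257) maps a record to a vector of field elements:

```python
def is_prime(q):
    """Trial-division primality check, adequate for small moduli."""
    return q > 1 and all(q % d for d in range(2, int(q ** 0.5) + 1))

def encode(record, X=256, q=257):
    """View each eight-bit character of a record as an element of GF(q),
    where q is a prime with q >= X, so that [X-1] embeds into [q-1]."""
    assert is_prime(q) and q >= X
    values = [ord(c) for c in record]
    assert all(0 <= v < X for v in values)
    return values

x = encode("Az")   # [65, 122], both in {0, ..., 255} and hence in GF(257)
```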
The replicas can be malicious and behave arbitrarily
(e.g., the information stored can be deliberately
modified in an arbitrary way). The communication
channels (e.g., queries and answers) between clients and
replicas are not secure. But the communication between
the users and the clients is secure. Messages may be
dropped, modified and duplicated. Our system does not rely on authenticated channels to guarantee privacy, although conventional authentication techniques can be used to defend against malicious users and replicas to a certain extent. Therefore, corruptions can occur either on the replicas or during the communications. The exact location of a corruption makes no difference to the clients of the system; we therefore consider replica faults only, for clarity of presentation [2].
Each replica can be in one of two states: correct and
corrupted. A corrupted replica can exhibit either one of
the following faults: fail-stop or malicious. However,
we limit the number of corrupted replicas by a reliability
parameter f, which is the maximum number of replicas
corrupted. A group of malicious replicas can collude
together to violate users’ privacy. However, the number
of replicas colluding with each other is limited by a
privacy parameter t, which is the maximum number of
replicas in collusion.
k, f and t satisfy the following condition:
k ≥ t + 1 + f.
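As a small sanity check of this condition (the value t = 2 here is an illustrative assumption; the experiments in the abstract use five servers with two malicious ones):

```python
def parameters_ok(k, t, f):
    """Replication condition of Section 3: k >= t + 1 + f, i.e. even with
    f corrupted replicas there remain t + 1 answers for reconstruction."""
    return k >= t + 1 + f

assert parameters_ok(k=5, t=2, f=2)       # five servers, two malicious
assert not parameters_ok(k=4, t=2, f=2)   # one replica short of the bound
```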
Protection of Privacy: suppose that a user is
interested in the character xi stored in a system. Within
the system, a client takes i as an input from the user,
where i ∈ {1, 2, …, n}. The input represents the
intention of the user. In order to keep this intention
private from any replica, the client constructs k query
functions Q1, …, Qk, based on i and some random
inputs, and generates a set of random and thereby
independent queries. These queries will then be sent to
the replicas respectively.
There are k answer functions A1, …, Ak that are defined for the replicas respectively and perform read-only operations on x. Based on the query submitted by the client, each replica executes its answer function to
generate an answer and sends the answer back to the
client. The client will then reconstruct a result xi locally
by executing a reconstruction function over the answers
returned. xi is subject to a verification function, which
has two outcomes: failed or successful. The system will
continue to perform reconstruction and verification
functions until it is successful. Then xi will be passed to
the user as the final result.
[2] For example, the case that a replica is malicious covers three possible scenarios: the replica is malicious; the communication channel that connects the replica and the client is malicious; or both the replica and the communication channel are malicious.
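The TIR construction itself (including its verification function) is formally specified in [YXB02a, YXB02b] and is not reproduced here. Purely as an illustration of the query/answer/reconstruction pattern just described, the following Python sketch implements a classic polynomial-based t-private information retrieval scheme over GF(q): the client secret-shares the unit vector for index i among k replicas, each replica returns an inner product with x, and any t + 1 answers suffice for reconstruction. The verification step and all parameter choices are illustrative; this is not the paper's actual scheme.

```python
import random

Q = 257  # a prime q >= X, as in Section 3 (illustrative choice)

def make_queries(i, n, k, t, rng=random):
    """Client: share the unit vector e_i with k replicas so that any
    t or fewer colluding replicas learn nothing about i."""
    # One random degree-t polynomial per record; the constant terms form e_i.
    polys = [[1 if m == i else 0] + [rng.randrange(Q) for _ in range(t)]
             for m in range(n)]
    # Replica j receives the evaluations of all n polynomials at z = j.
    return [[sum(c * pow(j, d, Q) for d, c in enumerate(p)) % Q for p in polys]
            for j in range(1, k + 1)]

def answer(x, query):
    """Replica: a read-only inner product of its query vector with x."""
    return sum(q * c for q, c in zip(query, x)) % Q

def reconstruct(points):
    """Client: Lagrange-interpolate t + 1 (j, answer_j) points at z = 0.
    The interpolated value is g(0) = x_i, where g(z) = sum_m f_m(z) x_m."""
    total = 0
    for j, yj in points:
        num = den = 1
        for m, _ in points:
            if m != j:
                num = num * (-m) % Q
                den = den * (j - m) % Q
        total = (total + yj * num * pow(den, Q - 2, Q)) % Q
    return total

# Five replicas, collusion threshold t = 2: any three answers reconstruct x_i.
x = [65, 66, 67, 68, 69, 70, 71, 72]          # the shared string, over GF(257)
queries = make_queries(i=3, n=len(x), k=5, t=2)
answers = [answer(x, q) for q in queries]
result = reconstruct([(j, answers[j - 1]) for j in (1, 3, 5)])
assert result == x[3]                          # == 68
```

In the full TIR scheme the client would feed each reconstructed candidate to the verification function and retry with a different subset of answers on failure, which is how corrupted answers from up to f malicious replicas are masked.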
4. Overview of a TIR scheme
In order to present the paper in a self-contained way, we
include an overview of a TIR scheme (previously
known as FT-PIR scheme) in this section. The
presentation will be entirely informal to avoid repetition.
For its formal presentation (including its construction
and relevant proofs of its properties), interested readers
are referred to [YXB02a] and [YXB02b]. (Unless otherwise specified, the symbols appearing in this section follow the definitions given in the previous section.)
Informally speaking, a TIR scheme describes how to
generate the messages passed between a client and
replicas. The scheme consists of four types of functions:
query function, answer function, reconstruction function
and verification function. The answer function is
performed on each of the replicas while the remaining
ones run on the client side.
We now describe each of the functions in turn. Each
query function is a mapping between the input from a
user and a query. Each answer function is a mapping
between a query and an answer. Each reconstruction
function is a mapping from a set of answers to a result. The reconstructed result may be correct or corrupted, and is subject to the verification function.
The scheme has four properties: correctness, privacy,
safety, and liveness. Subject to the condition specified in
the previous section, the scheme guarantees the
correctness and safety properties in a probabilistic way (the probability of correctness can be set arbitrarily close to one; correspondingly, the probability of violating safety can be set arbitrarily close to zero), but ensures the remaining two properties with probability one.
The correctness property states that there exists at
least one set of replicas (i.e. availability) whose answers
can be used to reconstruct the intended result with a
high probability. The privacy property guarantees that
provided the number of replicas in collusion does not
exceed t, none of the replicas can obtain any information
about the intention. The safety property ensures the
client will only get an incorrect result with a small
probability. Finally, provided the condition specified in
the previous section is satisfied, the liveness property is
guaranteed.
5. Implementation Model and Discussions
This section presents an implementation model for a
TIR service in a distributed environment. We first
discuss why practical PIR implementations are difficult,
and then proceed to describe our solutions.
Overview of Our Previous Implementations
Our prototype implementations of PIR and TIR
(known as FT-PIR previously) were reported in
[YXB02a, YXB02b] in 2002. To the best of our knowledge, they are the first implementations of such schemes in a replication-based database environment. These implementations demonstrate the feasibility of implementing PIR and FT-PIR in a real distributed database environment. Previous experimental results were also obtained through experimenting with these implementations.
Why Are Practical PIR Implementations Difficult?
Index knowledge assumption: There are several
inherent limitations in these implementations. As noted by an early paper [CGN97], they require that the user know exactly the physical index of the intended record in a database. This assumption has the following limitations: i) the physical index of records can change all the time, so it is hard to get hold of this information easily; ii) in a modern information system, the underlying data manipulation mechanisms handle such information in the background, so it is transparent to user-level applications. We call this the index knowledge assumption. However, this assumption is widely used as a part of the PIR model in all PIR constructions (implemented or not), which makes them hard to implement in a straightforward way.
Although the assumption is standard, it is a strong
assumption. In a real system, a user usually knows what
kind of information they want, e.g., Bob’s certificate.
For example, the user Alice wants to retrieve Bob’s
certificate from a certificate authority server (CA) but
she doesn’t want to reveal her intention to the CA. Our
model assumes every piece of information has a unique
identifier, for example, a unique certificate id for any
certificates stored in the CA. The identifier information is public knowledge to every user. In reality, we can relax this requirement by including an identifier resolution protocol in the service. If the user supplies the name of the information, the TIR service will query a directory service with the name and return two pieces of information to the user: the identifier corresponding to the name and the total number of identifiers in the directory. This directory service acts much like the DNS. This adaptation removes the index knowledge assumption and paves the way for integrating the TIR service with real applications.
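A minimal sketch of such a resolution step (the in-memory `directory` dictionary is a hypothetical stand-in for the DNS-like directory service, not part of the paper's implementation):

```python
def resolve(directory, name):
    """Identifier resolution: map a human-readable name to the pair
    (identifier i, total number n of identifiers in the directory)."""
    return directory[name], len(directory)

# hypothetical directory contents
directory = {"Alice": 0, "Bob": 1, "Carol": 2}
i, n = resolve(directory, "Bob")   # i = 1, n = 3
```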
Full processing vs. view processing: The server side computation of PIR schemes can be classified into pre-processing and online processing. Both computations process the entire database on each individual server; we call this full (pre- or online) processing. Only one of them will be chosen for any specific PIR scheme. Pre-processing aims to reduce the cost of online processing by converting the data into a specific format. This approach is used by [IS03], which reduces the cost of online processing to a constant. In terms of online processing this is optimal, but it comes at the cost of full pre-processing. Without pre-processing, online processing needs to compute over an entire database. Our previous TIR implementation does this, i.e., it has no pre-processing but requires full online processing.
The overall goal of full processing is to guarantee the perfect privacy property of PIR schemes, i.e., that each record has an equal probability of being the one a user wants. For example, if the full processing is over a 100-record database, each record has a 1/100 chance of being the one intended. Obviously, full processing is not flexible. Generally speaking, the privacy level (the probability of guessing the intended record) is inversely proportional to the amount of computation, including both client side and server side computation. This is the trade-off between computation and privacy. A better strategy is to let the user choose the amount of time he is willing to wait; a PIR service then determines the appropriate level of computation, and thus the level of privacy protection. For example, if it takes 100 milliseconds to do a full online processing, the privacy level is 1/100. If the user only wishes to wait 10 milliseconds, the privacy level becomes 1/10, i.e., there is a one-in-ten chance of knowing which record the user wants.
The current implementation lets the user specify the number of records he wishes to compute over, called a view. Thereby, only a view of a database will be processed rather than the entire database. We call this view processing. (We shall describe how to determine this view in the paragraphs that follow.) We need to point out that the size of a view is independent of the size of the database. We are in the process of extending the implementation to the fully adjustable PIR service described above.
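Under these assumptions, the selection of records behind a view (the meta function MF of the implementation model) can be sketched as follows: the index set contains the intended identifier i plus b - 1 random decoys, so a server observing only the view can guess the intention with probability at most 1/b.

```python
import random

def meta_function(i, n, b, rng=random):
    """Generate an ordered random index set RIS of size b containing the
    intended identifier i and b - 1 distinct decoys from {0, ..., n-1}."""
    decoys = rng.sample([j for j in range(n) if j != i], b - 1)
    return sorted(decoys + [i])

ris = meta_function(i=42, n=3000, b=10)
assert 42 in ris and len(ris) == 10
# From a server's viewpoint every identifier in RIS is equally likely to
# be the intended one: the privacy level is 1 / b = 1 / 10.
```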
[Figure 1 (implementation model for TIR) shows a user, a client, and k replicated servers Server1, …, Serverk, each connected to a backend database DB1, …, DBk; each server derives a b-by-a view from its database. The numbered message flow is: 1. the user sends <INPUT, keyword, s, b> to the client; 2. if needed, the client starts the identifier resolution protocol to obtain the identifier i; 3. the client generates a random index set RIS = MF(b, i, r), where |RIS| = b; 4. the client generates queries Q1 = QF(i, R), …, Qk = QF(i, R); 5. the client sends <QUERY, RIS, s, Qj> to each serverj; 6. each serverj generates a view V(RIS, a); 7. each serverj computes an answer Aj = AF(V, Qj); 8. each serverj sends <ANSWER, Aj> back to the client; 9. the client reconstructs Result = RF(Aj1, …, Ajt, Ajt+1); 10. the client delivers <RESULT, Result> to the user. Legend: keyword: the keyword of the intended record; s: schema, where |s| = a; b: the number of records in a view; r, R: sets of random numbers; MF: meta function; QF: query function; AF: answer function; RF: reconstruction function.]
Figure 1: Implementation model for TIR
The Implementation Model
We now explain the implementation model and how it works [3]. The model is illustrated in Figure 1.
This model has two parts: client side and server side.
A user interacts with a client, which can be the user's own machine or a trusted computing base chosen by the user. There are two types of messages exchanged between the user and the client: INPUT and RESULT messages. The server side comprises k replicas, each of which connects to a backend database. There are two types of messages exchanged between the client and each individual replica: QUERY and ANSWER messages. A message is a tuple which contains a message (msg) identifier and a number of msg items, as follows:
<msg identifier, msg item, …, msg item>.
[3] Note that this implementation model is not limited to the specific domain of PKI applications; it can be used in any information retrieval application.
Step 1: take inputs. The client takes a message <INPUT, keyword, s, b> from the user, where the keyword is the intention of the user (e.g., the name Bob), s is the schema the user wants to obtain about Bob (e.g., Bob's certificate, his salary, and his address), where the length of s is a, and b is a security parameter chosen by the user. The higher b is, the better the privacy protection becomes. In our current implementation, b is the number of records used in the server side computation.
Step 2: identifier resolution. The service will run the identifier resolution protocol to resolve the identifier i of the keyword, if the keyword is not itself an identifier, and obtain the total number of identifiers in the directory, denoted by n.
Step 3: generate a random index set. Based on b, i and a set of random numbers r, a meta function MF is executed to generate an ordered index set RIS of size b, which contains i and b - 1 randomly selected identifiers. Note that b is in the range [1, n].
Step 4: generate queries. Based on i and another set of random numbers R, the query function QF is used to generate queries Q1, …, Qk, one for each replica.
Step 5: send queries. For j = 1, …, k, a query message is
sent to the replicas, respectively. The query message is
of the form <QUERY, RIS, s, Qj>.
Step 6: generate an on-line view. Based on RIS and s,
the program of serverj selects records from DBj and
forms an on-line view V. An on-line view in this model
is a two-dimensional table generated on the fly where b
is the number of records selected and a is the length of
the schema s.
Step 7: compute an answer. Based on the view V and Qj, replicaj uses the answer function AF to compute an answer Aj and thus obtains the answer message <ANSWER, Aj>.
Step 8: send the answer back. The answer message is sent back to the client.
Step 9: reconstruct a result. Based on t + 1 correct answers (Aj1, …, Ajt+1), the reconstruction function RF is able to reconstruct the intended Result. Each reconstructed result is subject to validation by the verification function, which is described in Section xxx.
Step 10: deliver a result. The result message <RESULT, Result> is then delivered to the user.
6. Experiments, Results and Discussions
As discussed in the last section, our previous
experimental studies were conducted based on a simple
implementation model where computations on the
server-side were performed over entire databases. Both normal and fail-stop cases were examined in those studies.
However, this does not fully exhibit the potential of
the TIR scheme as one of the distinct features of TIR is
its capability to tolerate malicious attacks. Therefore, this section is mainly devoted to examining the fault tolerance characteristics of the newly implemented TIR service. We start by describing the experimental environment and certain categories of malicious attacks which can be exploited by attackers. We end the section by describing how we simulate malicious attacks and by analysing the experimental results obtained.
Experimental Environment
Our system is implemented in Java J2SDK 1.4.0.
Both replicas and clients are multithreaded. The point-to-point communication channels between replicas and clients are implemented using non-encrypted TCP/IP sockets.
QUERY messages are generated randomly and
independently while ANSWER messages are generated
from the QUERY messages. The randomness of the ANSWER messages depends on that of the QUERY messages.
Therefore, encryption is not needed to ensure the
confidentiality of the channels.
The client machine is a time-sharing Sun Sparc E450
with four 250Mhz processors running SunOS 5.8. Up to
five server machines are used, all with the same specification: a 400 MHz Pentium II (Celeron) processor running RedHat Linux (6.0 or 7.2), a 3Com EtherLink XL 10Mb Ethernet NIC, 64 Mbytes of RAM, and a 4 Gigabyte hard disk. The client machine resides within the Durham campus LAN, while the server machines are connected by a separate 10 Mb/s Ethernet LAN which connects directly to the campus LAN. The network delay is less than 10 ms.
The software used is: Sun J2SDK 1.4.0, MySQL
3.23, and MySQL JDBC Driver mm.mysql-2.0.4.
The software is installed on every machine to ensure the independence of the servers. Each server hosts two MySQL databases: one contains 3,000 records and the other contains 46,000 records. The size of the 3,000-record database is about 1.04 MB; the size of the 46,000-record database is about 16 MB. The smaller database is used mainly to save testing time, and the larger database is used to demonstrate the feasibility of TIR on databases of practical size. Unless otherwise stated, the standard parameter settings for all experiments are: X = 255, e = 0.03, |s| = 110, b = 10 and n = 3,000, where X is
the valid range of data stored in the database, e is the
probability of undetected errors, s is the schema
specified by the user, b is the view size and n is the
number of the records in the database.
Experimental Methods
The TIR scheme is designed to defend against malicious attacks. The standard PIR model limits the communications among servers to prevent collusion attacks, in which a number of servers exchange the information they have obtained in order to violate the privacy of a user. We call this the limited communication assumption. So long as the pieces of information obtained by the servers do not exceed a threshold, privacy protection is guaranteed. For the same reason, our TIR scheme also uses this assumption.
In conventional fault tolerant systems, cryptographic
and/or (local) voting techniques are used to verify the
integrity of data. Typical examples include Castro’s
PBFT [C01] and Zhou’s COCA [ZBR00]. However,
these techniques usually require a broadcast or multicast
channel in the systems. Therefore, this approach is not
viable for PIR and TIR schemes due to the limited
communication assumption of PIR schemes.
As the client may only have limited knowledge of the
data on the server side, it is difficult to use conventional
techniques on the client side to verify the integrity of
answers. That is why we introduce a result verification function to validate reconstructed results.
To ensure that each server has the same probability of
being faulty, we corrupt the servers' data in turn in
separate runs. For example, in the five-server case, any
three answers out of the five servers are sufficient for
reconstructing a result, and there are ten possible
two-server fault combinations in total. In 1,000 test
runs, each block of 100 runs targets one of the ten
combinations; in each run the data of the two servers in
the combination is intentionally modified.
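The rotation described above can be sketched as follows (hypothetical names; assuming five servers with a reconstruction threshold of three, so there are C(5, 2) = 10 two-server fault combinations, each targeted by 100 of the 1,000 runs):

```python
from itertools import combinations

SERVERS = 5
FAULTY = 2            # servers corrupted per run
RUNS_PER_COMBO = 100  # runs targeting each fault combination

fault_combos = list(combinations(range(SERVERS), FAULTY))
assert len(fault_combos) == 10   # C(5, 2) = 10

schedule = []
for combo in fault_combos:
    schedule.extend([combo] * RUNS_PER_COMBO)
assert len(schedule) == 1000     # 1,000 runs in total

# Each server sits in C(4, 1) = 4 combinations, so every server
# is corrupted in exactly 400 runs -- the same fault probability.
hits = [sum(1 for c in schedule if s in c) for s in range(SERVERS)]
assert hits == [400] * SERVERS
```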
Attack Categories: Our experiments focus on
simulating malicious attacks on the views held by the
servers and on the answers returned from the servers.
As shown in Table 1, we considered five categories of
attacks: the first three are concerned with the views,
while the last two target the answers.
Table 1. Attack Categories.

Category              Attack Description
1. OneRecord          Modify only one record in a view
2. OneRecord-oneChar  Modify one character of a record in a view
3. RecordSet          Modify a set of records in a view
4. Answer             Modify several characters in an answer before sending it back to the client
5. Answer-oneChar     Modify one character in an answer before sending it back to the client
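The five categories could be simulated roughly as below (a sketch with hypothetical helper names, not the paper's fault injector; records and answers are treated as byte strings):

```python
import random

def flip_char(data: bytes, pos: int) -> bytes:
    """Replace the byte at pos with a different value."""
    new = (data[pos] + 1) % 256
    return data[:pos] + bytes([new]) + data[pos + 1:]

def attack_one_record(view):                 # category 1
    i = random.randrange(len(view))
    view[i] = b"corrupted-record"
    return view

def attack_one_record_one_char(view):        # category 2
    i = random.randrange(len(view))
    view[i] = flip_char(view[i], random.randrange(len(view[i])))
    return view

def attack_record_set(view, k=3):            # category 3
    for i in random.sample(range(len(view)), min(k, len(view))):
        view[i] = b"corrupted-record"
    return view

def attack_answer(answer, k=3):              # category 4
    for pos in random.sample(range(len(answer)), min(k, len(answer))):
        answer = flip_char(answer, pos)
    return answer

def attack_answer_one_char(answer):          # category 5
    return flip_char(answer, random.randrange(len(answer)))
```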
Measurements: For each experiment we measure the
following operations: 1) the time taken to prepare
queries (TPreQ), 2) the server-side query processing
time (TProQ), 3) the time taken to perform
reconstruction (TPR), and 4) the total processing time
(TPT). The time taken to inject faults is generally less
than 2% of TPT and is excluded from the timing
measurements shown in this paper, so the details of the
fault injection process are omitted. The timing
measurements satisfy the relationship:
TPT = TPreQ + TProQ + TPR.
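A minimal way to collect these measurements is to wrap each phase with a wall-clock timer; the phase functions below are hypothetical stand-ins for the real query preparation, server-side processing and reconstruction:

```python
import time

def timed(fn, *args):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - start) * 1000.0

# Stand-ins for the three phases of a TIR request.
def prepare_queries():   return ["q1", "q2", "q3"]
def process_queries(qs): return ["a" + q for q in qs]
def reconstruct(ans):    return "".join(ans)

queries, t_preq = timed(prepare_queries)
answers, t_proq = timed(process_queries, queries)
result,  t_pr   = timed(reconstruct, answers)
t_pt = t_preq + t_proq + t_pr   # TPT = TPreQ + TProQ + TPR
```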
In addition, we also measure the impact of changing
the view size b.
Results and Discussions
Table 2 shows the breakdown of costs of the various
operations in normal cases using three replicas, in
which all replicas behave properly and no attack occurs
in the system. We report three key operations: Query
Preparation, Server-Side Computation, and Result
Reconstruction (including result verification). The
figures are means over 500 independent runs.
Table 2. Breakdown of costs for various operations in
normal cases.

Phase (normal)                 Three replicas (msec)   Percent of TPT
Query Preparation              327                     48%
Server-Side Computation        275                     40%
Result Reconstruction          84                      12%
Total Processing Time (TPT)    687                     100%
Table 3 compares the TPT of the system in the
presence of various attacks with that of normal cases. In
the three-replica case, one malicious replica can be
tolerated, and less than 10% extra TPT is needed to
obtain a correct result across all attack categories, in
comparison with normal cases. In the five-replica case,
two malicious replicas can be tolerated and less than
32% extra TPT is needed. The TPT for all categories of
experiments is less than one second. As the table also
shows, the choice of attack has little impact on the TPT:
to the client, the attacks attempted in our experiments
are barely distinguishable.
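The overhead percentages in Table 3 follow directly from the TPT figures; for instance, the worst case in the five-replica configuration:

```python
def overhead(tpt_attack, tpt_normal):
    """Extra processing time relative to the fault-free TPT, in %."""
    return round(100.0 * (tpt_attack - tpt_normal) / tpt_normal)

# Five-replica TPT figures from Table 3 (msec); normal TPT = 753.
assert overhead(991, 753) == 32   # OneRecord, the worst case
assert overhead(952, 753) == 26   # OneRecord-oneChar
assert overhead(980, 753) == 30   # Answer
```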
Figures 2 and 3 show a similar pattern of query
processing time in normal and (maliciously) faulty
cases as the view size b changes: the faulty status of the
system has insignificant impact on the TPT as b
increases.
In these experiments, a view size of 400 appears to be
optimal, as the TPT increases dramatically when b is
greater than 400; we are investigating whether there is a
sensible explanation for this phenomenon. We also
observed that the time taken to prepare queries
increases with b, whereas the time spent on performing
reconstruction remains fairly stable. This is because
reconstruction is performed on the client side and is
independent of the view size b.
7. Conclusions
This paper presents our thoughts on the new trust issues
introduced by the unique characteristics of the Grid. We
propose the TIR scheme to address the following trust
problems: privacy protection, error detection and result
verification. A generic implementation model for TIR
services is presented, and the design and
implementation of a TIR service in a distributed
database environment is described. The effectiveness of
the service is evaluated by injecting several categories
of malicious attacks into the system. Compared with
normal cases, the implemented service performs well
even in the presence of these attacks; experimental
results also demonstrate that the occurrence of these
attacks has an insignificant impact on the total
processing time.
[Figure 2. Time taken to process queries in normal (fault-free) cases (n = 3,000). Series k3, k5, k7 and k9; x-axis: view size (records), 50-500; y-axis: time (ms), 0-4,000.]

[Figure 3. Time taken to process queries in faulty cases (n = 3,000). Series k3, k5, k7 and k9; x-axis: view size (records), 50-500; y-axis: time (ms), 0-4,000.]
Table 3. Performance Results of Different Attack Categories in msec (n = 46,000).
num. of          Normal   OneRecord        OneRecord-oneChar   RecordSet        Answer           Answer-oneChar
replicas         TPT      TPT  incr. %     TPT  incr. %        TPT  incr. %     TPT  incr. %     TPT  incr. %
Three replicas   687      709  3%          741  8%             716  4%          699  2%          676  -1%
Five replicas    753      991  32%         952  26%            947  26%        980  30%         963  28%
References
[A97] A. Ambainis, “Upper bound on the communication
complexity of private information retrieval”, Proc. 24th Int’l
Colloquium on Automata, Languages and Programming
(ICALP’97), LNCS, vol. 1256, Springer-Verlag, Bologna,
Italy, July 1997, pp. 401-407.
[Appel02] A P3P Preference Exchange Language 1.0 (Appel
1.0), working draft, W3C, Apr. 2002.
[BI01] A. Beimel and Y. Ishai, “Information-Theoretic Private
Information Retrieval: A Unified Construction”, Proc. 28th
Int’l Colloquium on Automata, Languages and
Programming (ICALP 2001), Crete, Greece, LNCS, vol.
2076, Springer-Verlag, July 2001, pp. 912-926.
[BS02] A. Beimel and Y. Stahl, “Robust Information-Theoretic
Private Information Retrieval”, Proc. 3rd Conference on
Security in Communication Networks, 2002.
[CMS99] C. Cachin, S. Micali, and M. Stadler,
“Computationally Private Information Retrieval with
Polylogarithmic Communication”, Proc. Advances in
Cryptology (EUROCRYPT '99), Prague, Czech Republic,
LNCS, vol. 1592, Springer-Verlag, May 1999, pp. 402-414.
[CP02] C. Cachin and J. A. Poritz, “Secure intrusion-tolerant
replication on the Internet”, Proc. Intl. Conf. on dependable
systems and networks (DSN-2002), Washington DC, USA,
June 2002.
[C01] M. Castro, Practical Byzantine Fault Tolerance, tech.
report MIT/LCS/TR-817, Laboratory for Computer Science,
MIT, Cambridge, MA, USA, Jan. 2001.
[CGKS95] B. Chor, O. Goldreich, E. Kushilevitz, and M.
Sudan, “Private Information Retrieval”, Proc. 36th Annual
Symposium on Foundations of Computer Science
(FOCS’95), Milwaukee, Wisconsin, USA, 23-25 Oct. 1995,
pp. 41-51. Journal version: J. of the ACM, vol. 45, no. 6,
1998, pp. 965-981.
[CGN97] B. Chor, N. Gilboa, and M. Naor, Private
information retrieval by keywords, tech. report TR CS0917,
Dept. Computer Science, Technion, Israel, 1997.
[FKNT02] I. Foster, C. Kesselman, J. M. Nick and S. Tuecke,
“Grid Services for Distributed System Integration”, IEEE
Computer Magazine, 35(6): 37-46, June 2002.
[IS03] A. Iliev, S. Smith, “Privacy-Enhanced Credential
Services”, Proc. 2nd PKI Research Workshop, USA, 2003,
pp. 109-121.
[KO97] E. Kushilevitz and R. Ostrovsky, “Replication is Not
Needed: Single Database, Computationally-Private
Information Retrieval”, Proc. 38th Ann. IEEE Symposium on
Foundations of Computer Science (FOCS’97), 1997, pp.
364-373.
[KO00] E. Kushilevitz and R. Ostrovsky, “One-way Trapdoor
Permutations are Sufficient for Non-Trivial Single-Server
Private Information Retrieval”, Proc. Advances in
Cryptology (EUROCRYPT 2000), LNCS, vol. 1807,
Springer-Verlag, 2000, pp. 104-121.
[P3P01] The Platform for Privacy Preferences 1.0 (P3P 1.0)
Specification, W3C, Sept. 2001.
[RR98] M. K. Reiter and A. D. Rubin, “Crowds: Anonymity
for Web Transactions”, ACM Transactions on Information
System Security, 1(1), Apr. 1998.
[SS01] S. W. Smith and D. Safford, “Practical Server Privacy
with Secure Coprocessors”, IBM System Journal, 40(3),
Sept. 2001.
[SN02] J. M. Schopf, B. Nitzberg, “Grids: The Top Ten
Questions”, Scientific Programming, Special Issue on Grid
Computing, Aug. 2002.
[WSFB03] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K.
Czajkowski, J. Gawor, C. Kesselman, S. Meder, L.
Pearlman, S. Tuecke, “Security for Grid Services”, Proc.
12th International Symposium on High Performance
Distributed Computing (HPDC-12), IEEE Press, June 2003.
[YXB02a] E. Y. Yang, J. Xu and K. H. Bennett, “Private
Information Retrieval in the Presence of Malicious
Failures”, to be published in Proc. 26th Ann. Int’l Conf.
Computer Software and Applications Conference
(COMPSAC2002), Oxford, England, Aug. 2002.
[YXB02b] E. Y. Yang, J. Xu and K. H. Bennett, “A
Fault-Tolerant Approach to Secure Information Retrieval”, in
Proc. 21st IEEE International Symposium on Reliable
Distributed Systems (SRDS2002), Suita, Osaka, Japan, Oct.
2002.
[ZBR00] L. Zhou, F. B. Schneider, and R. van Renesse,
COCA: A Secure Distributed On-line Certification
Authority, tech. report 2000-1828, Dept. Computer Science,
Cornell University, Ithaca, N.Y., USA, Dec. 2000.