>> Raluca Ada Popa: The mic working? Is the mic working? Maybe I'll
put it -- is it better now? Yes. Okay. So I'll tell you about
CryptDB, which is a system for protecting the data in the database by
computing on encrypted data. This is joint work with Catherine
Redfield, Nickolai Zeldovich and Hari Balakrishnan from MIT.
So we hear many times in the news that confidential data leaks from
databases. To give you a few examples, the Homeland Security newswire
listed that between 2009 and 2011, eight million medical records were
leaked. As another example, last year a group of hackers
infiltrated the Sony PlayStation Network and were able to access
77 million user profiles, many of which contained credit card
information.
So there are many reasons why confidential data leaks, and here in this
talk we're going to consider attacks to the database server. So, for
example, hackers notoriously break into database systems and steal
sensitive information.
Or, in fact, system administrators oftentimes have root access to the
database servers and may be able to read company data such as financial
or medical, when really their only task is to maintain database
servers.
So we group all these attacks together into one threat that we call
passive database server attacks. So basically the database server can
be attacked, and an adversary can have full access to the database
server, but it's passive, meaning that the attacker doesn't actually
change the data or issue queries. It just tries to read confidential
information.
>>: The data administrator they have legitimate reasons to change a
database.
>> Raluca Ada Popa: Right. So here -- okay. So, for example, if you
look at system administrators that only have to balance the load, they
don't necessarily need to look at the data, but oftentimes they
actually can. So that's one threat model.
And just to -- for cryptographers here, by passive we mean honest but
curious adversary.
So the approach of CryptDB is to process queries on encrypted data.
The reason why we take this approach is that in this way the database
server never gets the decryption key. It only gets encrypted data.
Even if an attacker gets full access to the database server, the
attacker still cannot learn anything other than encrypted data.
So before I explain how CryptDB works, let me summarize at the high
level CryptDB's contributions. So CryptDB is the first practical DBMS
to process most SQL queries on encrypted data.
In CryptDB we focused a lot on practicality.
One can use CryptDB, for example, to hide the database from system
administrators while still allowing them to maintain the database
servers. Or it can be used to put the database on the cloud.
One of the main contributions of CryptDB is that it has a modest
overhead. So CryptDB has a 26 percent throughput loss for TPC-C compared
to MySQL with no encryption. By throughput I mean the number of
queries per second the server can execute, and TPC-C is a standard
industry database benchmark. And perhaps surprisingly, CryptDB makes no
changes to existing DBMSs, such as Postgres and MySQL.
And I will explain why. And also it makes no changes to existing
applications. So basically applications can run on top of CryptDB
without knowing that they're actually running on top of CryptDB. And
that's because CryptDB exports a SQL interface.
Okay. So to explain the approach of CryptDB. Let me put it in
relation to existing approaches. There are really two points in the
design, two extremes in the design space in terms of practicality and
security.
So at one extreme there are unencrypted databases. So these are very
efficient. They've been optimized for over 40 years of experience and
they're efficient because they process simple operations such as
equality, how many items are equal to 100, for example, in a column.
And also because they have specialized data structures such as indexes.
So you can think of an index as a search tree that helps the database
server look up items fast.
The other extreme is fully homomorphic encryption, which was first
constructed by Craig Gentry in 2009. FHE for short; it's called FHE.
FHE allows any kind of general computations to be performed on the
data. And it has great security, semantic security, which virtually
leaks nothing about the data. However, it is prohibitively
inefficient.
Yes, there have been a lot of improvements in the practicality of FHE
since then. For example, Gentry, Halevi and Smart in 2012 implemented
AES on top of FHE and they did some clever optimizations.
However, even then the scheme was nine orders of magnitude slower than
unencrypted computation.
And in fact even besides the cryptographic overhead, there's one main
inherent reason why FHE is impractical for databases. And here's the
reason. So in order to compute the query on a database, the client has
to express the query as a circuit over the entire database, which means
that for every single query, the entire database has to be processed.
Whereas in a real database, the database server just uses an index and
locates items fast. So our idea with CryptDB is to try to come up with an
intermediate point. Ideally we'd like it to be almost as fast as
unencrypted databases while at the same time having a high degree of
security.
So the insight into the practicality of CryptDB is to try to have the
computation on encrypted data be the same as unencrypted data. So, for
example, if we do an equality check on unencrypted data, that should
turn into an equality check on encrypted data.
And also indexes will still be useful. And the insight here is that in
fact most SQL operations use a limited set of operators. So if we
can support those with efficient encryption schemes, then we are
mostly done.
And in terms of the security it achieves, well, we will reveal to the
server the relations necessary to compute the query, but really no
relations among the data beyond those needed to process the query.
And I will go back to explain the security in more detail. I'd like to
mention other related work. For example, search on encrypted data,
initially pioneered by Song et al. and then followed by a lot of work
by [inaudible] and others, actually has a different purpose: it has the
purpose of locating keywords in encrypted text, but it could be used to
process certain SQL queries such as equality. However, those are only
specific kinds of queries, and sometimes database indexes cannot be used,
so it's not as efficient as CryptDB.
There are also systems proposals that result in weaker security and
functionality and also efficiency, and many times those require
significant client-side processing of the data.
So beyond the database server, the client also has to do a lot of data
filtering. So now that I explained CryptDB's approach, let's go into
the design. So as the system setup, let me remind you that we have a
database server that is under attack. And an attacker can get full
access to the database server, but it's passive. And the
application is trusted.
There's a second part actually to CryptDB that deals with attacks to
the application as well but I'll just refer you to our paper, because
it's just an extension to the solution I show here.
So first we're going to introduce a lightweight machine called a proxy
on the client side. So it's also trusted. So the proxy only stores
the schema and the master key. So the schema is basically the name of
the tables and of the fields and data types but not actually the
content. So the schema is short.
Whenever the application issues a query to the database server, the
proxy intercepts it, transforms it by doing certain encryptions that we
will talk about soon, and then sends it to the database server.
The server processes the queries on the encrypted data completely, then
sends the encrypted results to the proxy. The proxy decrypts them and
sends them to the application.
Note that the proxy doesn't do any query execution at all. It all
happens in the database server entirely. So let's see how we
process queries on encrypted data at the database server, and I'll start
with a simple example. So consider that we have the table employees at
the database server, which has three columns: rank, name and salary. In
fact, we anonymize the names of the table and columns, so we have table
one, column one, column two and column three; this is what the database
server sees. Now each block indicates a data item that is encrypted,
and initially we encrypt it with randomized encryption. So for
cryptographers this is semantically secure, probabilistic encryption.
This is very strong security, a very strong encryption scheme.
For your information I'm showing the values in the salary column
unencrypted but the server only sees the encrypted ciphertext.
Okay. So let's consider that the application sends a query to the proxy
saying give me all the rows where salary equals 100. Remember that
our goal in CryptDB was to keep the computation on encrypted data the
same as on unencrypted data. So what we would like to do here is to have
the database server just do an equality check. But we can see that that's
not possible because of the randomized encryption. The encryptions of
100 are different.
Okay. So the first simple idea is to in fact use deterministic
encryption. So we can see that in this way the two
encryptions of 100 are mapped to the same value. Now all the proxy
has to do is encrypt the value 100 with the same encryption scheme and
key and then just send that query to the server.
Now, the server can perform the equality check on encrypted data as if
the data was not encrypted to begin with.
Then it sends the encrypted results back to the proxy. The proxy
decrypts them and sends them to the application.
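
As a rough illustration of the equality example above, here is a minimal Python sketch of why randomized encryption blocks a server-side equality check and deterministic encryption allows it. The function names (rand_encrypt, det_encrypt, server_equality_scan) are hypothetical, and the toy ciphers built from HMAC-SHA256 are stand-ins chosen to keep the sketch self-contained; they are not the AES-based constructions mentioned in the talk and should not be used for real data.

```python
import hashlib
import hmac
import secrets


def _prf(key: bytes, data: bytes, nbytes: int) -> int:
    """Keyed pseudo-random function (HMAC-SHA256) truncated to nbytes."""
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest[:nbytes], "big")


def rand_encrypt(key: bytes, value: int) -> tuple:
    """Toy randomized encryption of a 64-bit value: fresh nonce on every call."""
    nonce = secrets.token_bytes(16)
    return nonce, value ^ _prf(key, nonce, 8)


def det_encrypt(key: bytes, value: int) -> int:
    """Toy deterministic encryption: 4-round Feistel network on 64-bit values."""
    left, right = value >> 32, value & 0xFFFFFFFF
    for rnd in range(4):
        round_input = bytes([rnd]) + right.to_bytes(4, "big")
        left, right = right, left ^ _prf(key, round_input, 4)
    return (left << 32) | right


def det_decrypt(key: bytes, cipher: int) -> int:
    """Invert det_encrypt by running the Feistel rounds backwards."""
    left, right = cipher >> 32, cipher & 0xFFFFFFFF
    for rnd in reversed(range(4)):
        round_input = bytes([rnd]) + left.to_bytes(4, "big")
        left, right = right ^ _prf(key, round_input, 4), left
    return (left << 32) | right


def server_equality_scan(column: list, token: int) -> list:
    """What the server does: a plain equality check on ciphertexts."""
    return [row for row, cipher in enumerate(column) if cipher == token]


key = secrets.token_bytes(32)                 # held by the proxy only
salaries = [60, 100, 800, 100]
det_column = [det_encrypt(key, s) for s in salaries]   # stored at the server
# The proxy encrypts the constant 100 with the same scheme and key.
matches = server_equality_scan(det_column, det_encrypt(key, 100))
print(matches)                                # row indexes whose salary is 100
```

With rand_encrypt, two encryptions of 100 come out different, so the same scan would find nothing; that is the problem the deterministic scheme solves, at the cost of revealing which rows repeat.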
So what happens if instead the application requested all values where
salary was at least 100? We can see that the deterministic encryption
scheme doesn't preserve order. So 60 is smaller than 100, but the
encryption of 60 is larger than the encryption of 100. So performing a
greater-than operation won't work. Instead the idea is to use
order-preserving encryption, which are some recent encryption schemes
that preserve order.
So basically if 60 is smaller than 100, then the encryption of 60 is
smaller than the encryption of 100. Okay. So now the task of the proxy
is easy. It encrypts the value 100 with this order-preserving
encryption scheme and sends it to the server, and now the server can
perform the greater-than operation on the encrypted data as if it was
not encrypted to begin with. It sends back the encrypted results, and
the proxy decrypts them and sends them to the application.
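
To make the "at least 100" example concrete, here is a small Python sketch of the order-preserving property only. It is an assumption-laden toy: toy_ope_table is a hypothetical helper that builds a keyed, strictly increasing lookup table over a small plaintext domain (0 to 1000 is assumed here). It is neither the scheme presented later in this talk nor any published order-preserving scheme, and it offers no real security.

```python
import hashlib
import hmac
import secrets
from itertools import accumulate


def toy_ope_table(key: bytes, domain_size: int) -> list:
    """Map plaintexts 0..domain_size-1 to strictly increasing ciphertexts.

    Each plaintext adds a keyed pseudo-random positive gap to a running
    total, so x < y always implies table[x] < table[y].
    """
    gaps = []
    for x in range(domain_size):
        digest = hmac.new(key, x.to_bytes(4, "big"), hashlib.sha256).digest()
        gaps.append(1 + int.from_bytes(digest[:2], "big"))   # gap in 1..65536
    return list(accumulate(gaps))


key = secrets.token_bytes(32)                 # held by the proxy only
table = toy_ope_table(key, 1001)              # assumed plaintext domain 0..1000
salaries = [60, 100, 800, 100]
ope_column = [table[s] for s in salaries]     # what the server stores
threshold = table[100]                        # proxy encrypts the constant 100
# Server side: an ordinary >= comparison, now on ciphertexts.
matches = [row for row, c in enumerate(ope_column) if c >= threshold]
print(matches)
```

The server never sees 60, 100 or 800, yet its comparison gives the same rows it would give on plaintext, which is exactly the property the query needs and also exactly what leaks.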
Okay. So this gives you an insight into our two main techniques. The
first is to use a SQL aware set of encryption schemes, basically have
encryption schemes that can cover most common SQL operations and the
second is to adjust the encryption of the database based on the
queries. So we saw that different queries require the data to be
encrypted with different encryption schemes. So we have to have a way
to adjust between those encryption schemes.
I'm going to present each of these techniques in detail. So let me
start with the first one. So on this slide I'm going to show you all
the encryption schemes we use. We use six of them. And I'm going to
show them in roughly decreasing security but increasing
functionality.
So the first one we call RAND, which stands for randomized encryption,
implemented with AES. It provides semantic security, which
basically leaks nothing about the data, but it supports no computation.
The second one we call HOM, which stands for homomorphic. Note that here
we're using a specific kind of homomorphism [inaudible]. It's
efficient, whereas fully homomorphic encryption, which supports general
computation, is not so efficient.
So this allows us to support SQL operations such as sum. And the
homomorphic encryption is roughly as secure as RAND: it has semantic
security as well, so it has strong security properties.
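
To show how a sum can be computed on ciphertexts, here is a minimal Python sketch of an additively homomorphic scheme. The scheme name is inaudible in this transcript, so the sketch assumes Paillier, a standard additively homomorphic cryptosystem; that choice, the function names (hom_encrypt, hom_decrypt, hom_add) and the tiny key size are all assumptions for illustration.

```python
import math
import secrets

# Two well-known Mersenne primes; far too small for real use (assumption).
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                     # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1), secret
MU = pow(LAM, -1, N)          # modular inverse, needs Python 3.8+


def hom_encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1; randomized by r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def hom_decrypt(c: int) -> int:
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N, with L(x) = (x-1)/N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def hom_add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


salaries = [60, 100, 800, 100]
encrypted = [hom_encrypt(s) for s in salaries]    # what the server stores
encrypted_sum = 1                                 # a valid encryption of 0
for c in encrypted:
    encrypted_sum = hom_add(encrypted_sum, c)     # server-side SUM()
print(hom_decrypt(encrypted_sum))                 # proxy decrypts the total
```

The server only multiplies numbers modulo N squared; it needs no key, and two encryptions of the same salary look unrelated, which matches the claim that this layer is roughly as secure as RAND.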
The third one is SEARCH, which allows us to do word search,
so to locate words in encrypted text. And this is just the scheme of
Song et al. from 2000. It enables us therefore to support ILIKE in SQL,
but a restricted type. Basically we just support full-word
matches.
And SEARCH is roughly as secure as HOM as well. Then we have DET, which
we've seen: deterministic encryption. It allows us to perform equality
types of operations. So in turn we can support a lot of SQL
operations such as equal, not equal, [inaudible], GROUP BY, DISTINCT and
so forth.
And the remaining two are actually some new encryption schemes we
provide. The first one is JOIN, and JOIN is useful for finding equality
matches between two columns.
And OPE, which we've seen, order-preserving encryption, is useful for
order types of operations, which supports a lot of SQL operations such as
greater, smaller, ORDER BY, sort, MAX/MIN, greatest and so forth.
So I'm going to show how to use all these encryption schemes. First,
let me discuss briefly the encryption schemes we provide.
>>: Can I ask a question. CMC?
>> Raluca Ada Popa: CMC is just a mode of using AES. Basically the
output of one encryption becomes the IV for the next
encryption, and you also go backwards: you go in one direction, then go
back. It basically allows us to provide the security property
that we want, which is the random permutation.
Okay. So for JOIN, what we want is to
have equality checks between two columns. Okay. So why can't we
use DET, deterministic encryption, for that? Why do we need a new
encryption scheme? Well, here's the problem. We don't know ahead of
time what columns will be joined.
So there are two possibilities. One is we encrypt all the columns with
the same DET key, in which case we'll be able to do the
joins. But that leaks more than we intend to, because there may be
times when the user doesn't request joins between two columns. And
CryptDB's goal is only to reveal those relations among data items that
are needed for the types of queries issued by the user. That would leak
more.
The other choice is to have every column encrypted with a different
key, but then when a join is requested the server cannot do it.
Instead our scheme allows us to initially encrypt the columns with
different keys for security, but then when the application requests a
join between two columns, the proxy can give a token to the database
server and using that token the server can adjust the encryption of the
two columns to an encryption with a common key. And then the server
can just see equality matches. Okay. So let's see in more detail what
kind of encryption scheme we need. This encryption scheme has
four algorithms. The first is KeyGen,
using which the proxy can obtain the secret key. Then the encryption
algorithm allows the proxy to encrypt a message M for a certain
column I using the secret key.
Then when a join is requested, the proxy can compute a token for two
columns, column I and J. And then it gives this token to the server.
The server now uses the fourth algorithm, the Adjust algorithm, with the
token from the client to transform the encryption of a column to an
encryption with a shared key, as the figure shows.
So now, since the two columns are encrypted with the same key and the
encryption is deterministic, the database server can figure out equality
matches and can process the join.
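
The four algorithms just described (key generation, encrypt, token, adjust) can be sketched in Python as follows. This is a toy under stated assumptions: it works in the multiplicative group modulo the Mersenne prime 2^127 - 1 rather than on an elliptic curve, the function names (key_gen, join_encrypt, join_token, join_adjust) are hypothetical, and the construction shown, a keyed hash in the exponent multiplied by a per-column key, is one simple way to get an adjustable deterministic encoding rather than a statement of the exact scheme in the report.

```python
import hashlib
import hmac
import math
import secrets

P = 2**127 - 1        # Mersenne prime; toy group Z_P^* (assumption)
G = 3                 # fixed base element
ORDER = P - 1         # exponents can be reduced mod P-1 (Fermat)


def key_gen() -> int:
    """Pick a per-column key that is invertible modulo ORDER."""
    while True:
        k = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(k, ORDER) == 1:
            return k


def _prf_exponent(prf_key: bytes, message: str) -> int:
    digest = hmac.new(prf_key, message.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % ORDER


def join_encrypt(prf_key: bytes, column_key: int, message: str) -> int:
    """Deterministic per-column encoding: G^(PRF(message) * column_key)."""
    return pow(G, _prf_exponent(prf_key, message) * column_key % ORDER, P)


def join_token(key_i: int, key_j: int) -> int:
    """Token that re-keys column i ciphertexts to column j's key."""
    return key_j * pow(key_i, -1, ORDER) % ORDER


def join_adjust(token: int, cipher: int) -> int:
    """Run by the server; it never learns either column key."""
    return pow(cipher, token, P)


prf_key = secrets.token_bytes(32)
key_a, key_b = key_gen(), key_gen()               # proxy-side secrets
col_a = [join_encrypt(prf_key, key_a, n) for n in ["alice", "bob"]]
col_b = [join_encrypt(prf_key, key_b, n) for n in ["carol", "alice"]]
token = join_token(key_a, key_b)                  # sent when a join is asked
adjusted_a = [join_adjust(token, c) for c in col_a]
pairs = [(i, j) for i, a in enumerate(adjusted_a)
         for j, b in enumerate(col_b) if a == b]  # server-side equi-join
print(pairs)
```

Before the token is released the two encodings of "alice" differ, so the server cannot correlate the columns; after adjustment they coincide and an ordinary equality join works.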
So for our JOIN scheme, we have a report online with security
definitions and proofs. But intuitively the security says that the
database server cannot learn join relations without knowing the token.
And we implemented the scheme using elliptic curves, which means that
the ciphertexts are rather short, considering it's a public key scheme.
They are 192 bits long, and the time to encrypt and adjust is half a
millisecond, which is reasonable.
So the second encryption scheme we came up with is order-preserving
encryption, which I think is the more interesting one. Remember that
order-preserving encryption aims to reveal the order, but ideally should
reveal nothing beyond order. So this was formalized by [inaudible], Lee
and O'Neill in 2009, and the security notion was called IND-OCPA,
indistinguishability under ordered chosen-plaintext attack.
Basically what this security notion says is that no adversary can
distinguish between encryptions of two sequences of values that have
the same order relation.
Well, it turns out there have been more than ten order-preserving
encryption schemes proposed, and they actually have less than ideal
security: they leak more than order. And part of the reason why this has
been so difficult is that [inaudible] showed in 2009 that such a security
definition is infeasible. More concretely, what they showed is that the
size of the ciphertext has to be exponential in the size of the
plaintext. So if you want to encrypt 32-bit values, the ciphertext size
will have to be 2 to the 32 bits long, which is huge.
In fact, we show an even stronger impossibility: there's simply no
IND-OCPA scheme even if you want to encrypt three values. In that case
2 to the 3 wouldn't be so large as a ciphertext, but it's actually not
possible.
>>: [inaudible] model.
>> Raluca Ada Popa: Impossibility. Yes.
>>: [inaudible].
>> Raluca Ada Popa: I doubt it, because it's the property of the
resulting function. So basically what we show is
that there exists an adversary such that the advantage of the adversary
is 1 over the ciphertext size.
So no matter what scheme you come up with, the advantage of the adversary
will be 1 over the ciphertext size if the scheme is order preserving.
So as a result, in the BCLO paper, they settled on a weaker security
definition that later turned out to actually leak half of the plaintext
bits. So it not only leaked order, but actually, given a ciphertext,
you can tell the high-order half of the plaintext bits.
And order already leaks quite a bit, so we didn't really like this. We
really wanted to have ideal security. And the observation we made to
achieve that is that in fact in a real system such as a database, the
model is less restrictive than the one of the encryption scheme. In
particular, you can update ciphertexts. So if you encrypt a certain value
and place it in a database, then at a later time when you encrypt another
value, you can go back to the database and update that value. In a real
system you can do that. We call those mutable ciphertexts.
So it turns out that if a small
number of ciphertexts are allowed to change, then we can achieve ideal
security.
And in fact we also show that this mutability is necessary:
we show that any IND-OCPA scheme is infeasible without mutable
ciphertexts, even if it is stateful, meaning that the algorithm can look
at all the values encrypted before, and even if the encryption algorithm
can run for a long time, not even polynomial. It's still infeasible
without mutable ciphertexts. If you allow a little bit of mutability, you
can achieve ideal security. And in fact you can achieve an even stronger
security definition that we call same-time security. What we mean by this
is that the order should only leak among items that are
currently in the database. For example, if an item was inserted and then
discarded, and a new item was inserted later, the order between those
items should not leak, whereas with IND-OCPA the order leaks among any
items ever encrypted.
And also in a real database, the database server
only needs to know the order relations among items currently in the
database. It doesn't need to learn more than that. So there's no reason
to tell it more order information than that.
Okay.
So let me tell you briefly the gist of our scheme.
>>: [inaudible] for an adversary watches as you're changing, changing
[inaudible].
>> Raluca Ada Popa: Yes.
>>: So the entire process.
>> Raluca Ada Popa: Yes, yes. And it's actually -- it's simulation
security as well. It's not indistinguishability. Yes. It's adaptive
in that sense.
>> Raluca Ada Popa: Okay. And it will be clear. I want to show you the
gist of the scheme. So the server stores a binary search tree of values.
Each node contains the deterministic encryption of a value. The
deterministic encryption, for cryptographers, is as strong as
a random permutation, so it leaks strictly less than order. So
in this binary search tree the values are sorted
based on their underlying plaintext. So the left child of a node has
a smaller plaintext than the parent and the right child of a node has
a larger plaintext than the parent.
Now the order-preserving encoding in our case is actually given by the
path in the tree, and the reason is that the path naturally indicates
the relative order between the items. So, for example, the path of 12
is 00. The path of 48 is 01. Now we actually have to pad this path,
because the root has an empty path, so how does the empty path compare to
00 or 01? So we actually pad it with a one and as many 0s as needed to
pad the value.
So let's see. Therefore the OPE encoding of the value 12 is 0010, and of
the value 48 is 0110. And the root is just basically the padding, because
the path is empty. And we can see that these values actually preserve
the order. So how does the proxy encrypt a value, say 32? It first
provides a deterministic encryption of 32 to the server. And the
server doesn't know what the values are, so instead it
replies with the root of the tree. The proxy decrypts it, sees that 32 is
to the left of 50, and tells the server to go left.
Now, the server gives the client the deterministic encryption of 23. The
client decrypts it, again sees that 32 is to the right of 23, so it
tells the server, okay, give me the value on the right. And so forth,
until they find a place in the tree that's empty, and in that place the
server inserts the new encryption.
All right. So we can see that all the information the client
provides to the server in this case is just left or right. So it's just
order relation, nothing in addition to order relation. So intuitively
you can see why we achieve the definition. But I won't go formally
into it.
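
The tree walk and path encoding described above can be sketched in Python. This is a simplified model under several assumptions: the Client class, the XOR "encryption" standing in for deterministic encryption, the fixed 4-bit encoding length and the absence of rebalancing are all choices made for this small example, and values are assumed to be distinct. The tree shape follows the values mentioned in the talk (root 50, then 23, 12, 48, and 32 inserted last).

```python
from dataclasses import dataclass
from typing import Optional

CODE_BITS = 4   # fixed encoding length assumed for this small example


@dataclass
class Node:
    ciphertext: int                  # stand-in for a DET ciphertext
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class Client:
    """Holds the secret; the XOR mask is a toy stand-in for DET encryption."""

    def __init__(self, mask: int) -> None:
        self.mask = mask

    def encrypt(self, value: int) -> int:
        return value ^ self.mask

    def decrypt(self, cipher: int) -> int:
        return cipher ^ self.mask


def insert(root: Optional[Node], client: Client, value: int):
    """Interactive insert: the server shows a node, the client says left/right.

    Returns (root, path), where path is the string of 0 (left) / 1 (right)
    answers the client gave.
    """
    new_node = Node(client.encrypt(value))
    if root is None:
        return new_node, ""
    node, path = root, ""
    while True:
        # The server only sees this one bit per step, never the plaintext.
        go_left = value < client.decrypt(node.ciphertext)
        path += "0" if go_left else "1"
        child = node.left if go_left else node.right
        if child is None:
            if go_left:
                node.left = new_node
            else:
                node.right = new_node
            return root, path
        node = child


def ope_encoding(path: str) -> str:
    """Pad the path with a 1 followed by as many 0s as needed."""
    return (path + "1").ljust(CODE_BITS, "0")


client = Client(mask=0x5A5A)
root, encodings = None, {}
for value in [50, 23, 12, 48, 32]:
    root, path = insert(root, client, value)
    encodings[value] = ope_encoding(path)
print(encodings)
```

Comparing the resulting bit strings as binary numbers orders the values the same way as the plaintexts. A path longer than CODE_BITS - 1 would overflow this fixed width, which is the situation the rebalancing discussed next is meant to handle.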
Okay. But what happens if the client keeps asking for values to
encrypt along a certain path? The path grows really large, which means
that the ciphertext size becomes large, and this starts to remind us of
the infeasibility results we talked about. But that's precisely where
mutability comes in. We rebalance the tree. When we rebalance the
tree, certain ciphertexts may move in the tree. Now, if they move, they
have a different path, which means they have a different OPE
encoding, and we have to go in and update the database. That's how
mutability lets us avoid the infeasibility of large ciphertexts.
But it turns out that the number of ciphertexts we update per encryption
is actually small. It's logarithmic. And we implemented the scheme,
and surprisingly it was one to two orders of magnitude faster than the
BCLO scheme, which was the most secure encryption scheme previous to
ours, and it leaked half of the plaintext bits more than ours.
All right.
>>: [inaudible] BCLO scheme has these round trips, right?
>> Raluca Ada Popa: Absolutely. They're included.
>>: So you can include that --
>> Raluca Ada Popa: Yeah. Of course. The question is how much
network do we include, and we have a graph in our paper showing
the dependence on the network and the point at which BCLO
becomes faster. But one thing to say about BCLO: if you encrypt
values beyond 32 bits, the scheme becomes extremely slow. The
performance of the scheme degrades a lot with the number of bits you
encrypt. So for a message size of 128 bits it's not even, you have to
have really large network.
Another thing I want to add about network is that our scheme actually is
parallelizable. So you can encrypt things in
parallel.
And basically network cost is not really a factor at that point.
>>: What happens if you relate this to the real world, let's say I
track you down. I know there's a lot of -- and you're in front of
them. There's some [inaudible]. And then maybe I called an update on
Pomerance and [inaudible] and [inaudible] right?
>> Raluca Ada Popa: Right. So basically order-preserving encryption
leaks order. If you can use order to learn what an item is, for
example, if as you say you preface things, then yes, you can learn.
Order-preserving encryption leaks order; it leaks something. I'll show
you that in practice, interestingly, very sensitive fields
remain encrypted with RAND, which leaks virtually nothing. I'll get to
that and show you real applications and what happens in that case, and
OPE really is used for less sensitive fields.
In fact, if you're concerned, as the owner of the data you can always put
thresholds saying don't use OPE for this data. But
we're going to get there.
>>: The server --
>> Raluca Ada Popa: It should be on the database server as well.
>>: Sort of [inaudible].
>> Raluca Ada Popa: It is stored as a B-tree, yes. Memory and
backed by disk.
>>: [inaudible] what's stopping them from creating an island proxy.
>> Raluca Ada Popa: So if you remember the model, the
proxy is considered to be trusted, on the application side. Here.
>>: [inaudible].
>> Raluca Ada Popa: Nothing is trusted here.
>>: But the server is actually trying to --
>> Raluca Ada Popa: Yes, it's a passive adversary. Passive meaning
that I'm trying to learn as much information as I can but I won't do
things incorrectly. I won't change queries, I won't change database
content.
>>: So I have access to the server. I can create a proxy? Proxy upon
the server.
>> Raluca Ada Popa: You cannot create -- okay. So this proxy is on
the trusted side and it has the master key. Now, if you
create a new proxy on the server side you have to give it some key.
It's incompatible with the other key. You get junk back.
But the proxy on the application side is trusted. As I mentioned we
have a second part in which we deal with attacks on the proxy. And I can
talk to you more offline about that.
>>: [inaudible].
>> Raluca Ada Popa: Yeah.
>>: So encryption, so here encryption of 32 here is 0101.
>> Raluca Ada Popa: Yeah.
>>: But here also actually you've already encrypted 32 with
deterministic encryption and then you --
>> Raluca Ada Popa: Yes, yes.
>>: My question [inaudible] the database administrator to change the
schema?
>> Raluca Ada Popa: Uh-huh.
>>: Is the proxy involved in that or is it just a cache for the schema?
>> Raluca Ada Popa: If you want to change the schema, I guess it
depends how you want to change the schema. If you want to add another
column that's sensitive, you've got to encrypt it. So you've got to
get the key or the proxy to do that. So it really depends on --
>>: Shouldn't the proxy be [inaudible] in that case.
>> Raluca Ada Popa: Okay. So there are really two types of
administrators. One, the system administrator, maintains the server,
manages load, and if a server crashes, boots up another one. That's
separate from the database administrator.
The other kind is database administrators, and it depends on how much
trust you are willing to give them. It depends on how much work you're
willing to give them. If it's really crucial to allow them to perform all
kinds of queries and even to see the database in the clear, then
sure. But if they're only there for certain maintenance reasons, then you
can even protect against those kinds of administrators.
>>: Where is the key stored [inaudible].
>> Raluca Ada Popa: In the proxy only.
>>: And the proxy is running along --
>> Raluca Ada Popa: Yes. Yes. On the trusted side. Okay. So let's
go to the fun part. Now we have all these six encryption schemes, and
the question is how do we use them.
Well, one possibility is we can encrypt all the data with all of them.
That's because the encryption scheme we use depends on the queries that
come in. Certain queries need certain types of encryption schemes. The
problem is we may not know the queries ahead of time. Therefore one naive
solution would be to encrypt the data with each of the encryption schemes
so we can support all the queries that would come in.
But that would be a waste of space, and not so much that as the fact that
each column would be encrypted with OPE, which leaks order.
And in fact an application may never perform an order operation on a
certain column, so in that case, according to CryptDB's goal, we should
not leak the order relations on that column. So instead our idea is to
stack encryption schemes into onions of encryption. So each value
becomes encrypted with three onions. The first onion encrypts the value
with JOIN, then DET, and the resulting encryption with RAND. So we can see
that this onion is used for equality types of operations. Now we can also
see that as you go down in the onion, the functionality strictly
increases. So JOIN can do what DET does, but can also join with a
different column. Now, the second onion is the order onion, with OPE
and RAND. It's useful for order types of operations, and the third onion
depends on the type of the field.
So if it's text, it's for SEARCH, keyword search, or if it's an integer,
for homomorphic addition. For each of these, all the values in a column
are encrypted with the same key, but the key is different across
different layers of the onion, across different onions, across different
fields.
Also notice that initially, when the onions are in this state, the outer
layer of the onion is RAND or SEARCH, with semantic security. So
basically it leaks nothing about the data in this state when we start the
database.
But then as queries come in we need to adjust the encryption scheme to
support those queries and this happens naturally with our onions. We
just peel off layers of the onions. So the proxy gives a key to the
server using a SQL user-defined function. These are functions that
the SQL interface allows the user to define and they can be invoked
from within a query.
Now, the proxy remembers the resulting onion layer for every column,
and we do not put back that onion layer. So, for example, the first
time we do an equality and the layer is taken to deterministic
encryption, then all future equalities on that column don't need any
further decryption; they can be processed directly.
>>: So the user-defined function has access to the key; the
administrator can run the profiler, keep a log of queries, get access to
the keys.
>> Raluca Ada Popa: So getting access to the key
for a layer tells you nothing about all the other keys. So there's a
key per layer that is different for every layer. So once we give that
key to the server, the server removes that layer, and that key is not
used for any other purpose. In fact the server can look at it, use it,
but it's useless.
Okay. So let's see a concrete example. Again, on the employees
table. So each column becomes three columns, one for each onion.
Within each column, the values are wrapped in onions of encryption. CEO,
down in the equality onion, is encrypted with JOIN, DET and RAND.
Now consider that a query comes to the proxy requesting all values
where rank equals CEO. The proxy says okay, it's rank. Let's look at
the first column. It's equality. Let's look at the equality onion. And
the proxy remembers that the outer layer for that column is RAND. With
RAND we cannot do equality. We need to adjust it to deterministic
encryption. For that the proxy issues an update query to the database
server invoking the decryption UDF and giving the key only for the RAND
layer, so for that layer alone.
Now, executing this update query, the database server removes the RAND
layer and the outermost layer now becomes DET. Now the query is
processed as before. The proxy encrypts the value CEO with JOIN and on
top of that with DET. And now the database server can perform the
equality on the encrypted data as before, as on unencrypted data. And
the results are returned to the application, decrypted.
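
The peeling step in this example can be sketched in Python. This is a deliberately reduced model: it uses only two layers (RAND over DET, omitting JOIN), the layer functions (det_layer, rand_wrap, rand_peel) are hypothetical names, and both layers are toy XOR ciphers keyed with HMAC-SHA256 rather than the AES-based schemes from the talk. The point is only the control flow: the proxy tracks the outer layer per column and hands the server one layer key when a query needs it.

```python
import hashlib
import hmac
import secrets


def _pad(key: bytes, nonce: bytes, length: int) -> bytes:
    """Keyed pad of `length` bytes (toy stream cipher from HMAC-SHA256)."""
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def det_layer(key: bytes, data: bytes) -> bytes:
    """Toy deterministic layer: same input, same output. Its own inverse."""
    return _xor(data, _pad(key, b"", len(data)))


def rand_wrap(key: bytes, data: bytes) -> bytes:
    """Toy randomized layer: a fresh 16-byte nonce is prepended."""
    nonce = secrets.token_bytes(16)
    return nonce + _xor(data, _pad(key, nonce, len(data)))


def rand_peel(key: bytes, blob: bytes) -> bytes:
    """What a server-side decryption UDF would do, given the RAND key."""
    nonce, body = blob[:16], blob[16:]
    return _xor(body, _pad(key, nonce, len(body)))


# Proxy-side state: one key per layer, plus the current outer layer.
k_det, k_rand = secrets.token_bytes(32), secrets.token_bytes(32)
outer_layer = {"rank": "RAND"}

ranks = ["CEO", "Worker", "CEO"]
column = [rand_wrap(k_rand, det_layer(k_det, r.encode())) for r in ranks]
before = column[0] == column[2]          # RAND hides the repetition

# Query: rank = 'CEO'. The proxy sees RAND outside and releases k_rand only.
if outer_layer["rank"] == "RAND":
    column = [rand_peel(k_rand, c) for c in column]    # server runs the UDF
    outer_layer["rank"] = "DET"                        # never re-wrapped
token = det_layer(k_det, b"CEO")                       # proxy encrypts 'CEO'
matches = [row for row, c in enumerate(column) if c == token]
print(before, matches)
```

The server ends up holding k_rand, but that key only ever protected the layer that is now gone; k_det stays with the proxy, so the server sees which rows are equal and nothing about what the value is.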
All right. So we saw how CryptDB works. Let's talk about the security
guarantees. So we saw that depending on the queries we use different
encryption schemes, and each encryption scheme may have a different
kind of leakage. CryptDB makes two kinds of guarantees. The first is
that the system design of CryptDB with the onions guarantees that the
encryption scheme exposed for every column is the strongest encryption
scheme from our encryption set that enables the query.
Now, overall, intuitively, CryptDB reveals only the relationship among
data items that a type of query needs. Okay. So the way we formalize
this cryptographically is similar to secure multi-party computation:
you have a real world and an ideal world. In the ideal world there is
an oracle that helps the server process the queries. For example, the
server asks the oracle questions such as: is the second item in the
first column equal to the third item in the second column? And the
oracle examines whether the answer to that question is needed for the
query the server is processing, and if so it tells the server the
answer.
So clearly in this ideal world all that leaks, all that the server
learns, is the relationship it is allowed to learn. Now we've proved
that the real world with CryptDB and the ideal world are
computationally indistinguishable.
>>: A query on the database can be optimized in various ways. You'd
ask different questions for each of them. Do you have any proof that --
>> Raluca Ada Popa: No. The proof is, if you look at our technical
report, we prove that for each specific operator. The way the query is
written out, each operator requires a certain known operation. No, not
the way you write it. For the specific query we get as input.
Let me show you some natural examples. If we perform an equality
predicate on a column, then DET is exposed, which means that
repetitions within that column leak. The server can see whether the
third item is equal to the fifth item, but not their actual values.
Now if no aggregation is performed on a column -- now if an
aggregation is performed on the column and no equality or inequality,
HOM remains as the outer layer; it has semantic security, so it leaks
virtually nothing about the data.
Also if we perform no filter on the column, don't do equality, don't do
inequality, just fetch the data, the outer layer remains RAND, which
leaks virtually nothing about the data. And it actually turns out that
this is very common in practice for sensitive fields, as we will show
in some application examples.
And the bottom line is that we never decrypt the lowest layer of the
onion, so we don't reveal plain text to the server.
>>: What's the one key layer.
>> Raluca Ada Popa: Yes.
>>: Inserts.
>> Raluca Ada Popa: Yes.
>>: Going to be able to build the same layer or we set --
>> Raluca Ada Popa: On the same layer.
>>: So you stay there.
>> Raluca Ada Popa: We stay there. You can envision obviously
security optimizations by refreshing [inaudible] but it's cleaner to
describe it this way now. Anyone can prove it to anyone.
>>: [inaudible] if you let's say process [inaudible].
>> Raluca Ada Popa: And --
>>: [inaudible].
>> Raluca Ada Popa: Basically you're saying, if one onion leaks
something and another onion leaks something else, can someone
correlate them and learn something? It depends on the setting. Maybe
you could. The point is that we leak equality for deterministic and
order for OPE -- but as I said, I'm going to show what happens in
practice. In fact most of the fields remain encrypted with RAND, which
you cannot correlate with anything. Basically these cannot be
correlated with anything.
>>: Can influence the type of query.
>> Raluca Ada Popa: It's a passive adversary, yes. But let me say
that -- in the second part of CryptDB we actually consider active
adversaries, adversaries that attack the proxy. And there we're able
to provide the following guarantee. If a user is not online at the
time of an attack then his data is not compromised, but if he's online
at the time of the attack his data can be compromised. So we limit the
compromise even in the situation when everything is compromised,
actively as well.
>>: I was thinking of an adversary who forces, clearly -- the database
and then all the layers [inaudible].
>> Raluca Ada Popa: Right. So that's why we say this scheme is for a
passive adversary. Now, if you consider an active adversary, basically
our second part of [inaudible] requires more explanation. But it
guarantees, too, that if you are offline during an attack then your
data doesn't get affected, meaning that an adversary could not get
your onions down, because when you're offline your key will not be
available. It will not even be on the proxy at all. When you're
online at the proxy, the adversary can lower the onion level, and
there we don't give a guarantee.
>>: Naively I would expect lots of queries to have some sort of
average. And I would expect homomorphic encryption, being public key,
to be pretty expensive. How do you keep things down to a 26 percent
penalty?
>> Raluca Ada Popa: We'll see the exact breakdown in costs. I will
show you the exact breakdown for every query and you'll see the most
expensive operation. But we do have some optimizations and we'll see
the exact numbers.
>>: [inaudible] the data and the queries are chosen, in your
definition, right, data are chosen by adversary.
>> Raluca Ada Popa: Passive.
>>: It's passive in the sense that -- so the phrasing is that the
guarantees hold for any dataset, any queries, right, when you define --
>> Raluca Ada Popa: Right.
>>: So you're saying for all databases, all queries, right? Assuming
some distribution on the data?
>> Raluca Ada Popa: No. So basically -- no. We're not saying -- yes.
Okay. So our security guarantee doesn't say CryptDB doesn't leak
anything. It doesn't say that. In fact our security guarantee says
CryptDB leaks only what is needed to process the query. Equality if
you need it, order if you need it. If you don't need a certain column
then nothing.
>>: This is for any -- there's no such -- no distribution.
>> Raluca Ada Popa: No distribution over data. Exactly. No
distribution at all.
You had a question?
>>: [inaudible] you chose this layering model. Why didn't you actually,
rather than layering, encrypt the data with each scheme separately and
maintain different copies? So use a deterministic encryption of the
data, a random one and a join one, and keep them separate.
>> Raluca Ada Popa: Right. So the reason we didn't do that is because
then each data item would be encrypted with OPE, which leaks order.
And we don't want to leak order for a column unless order is needed
for that column. And we don't know ahead of time what the queries are;
we don't know ahead of time if the order will be needed. Therefore we
want to start initially with RAND being the outermost layer, leaking
nothing. Then if the user actually needs order, we peel off the layer.
>>: What I'm saying is keep an OPE copy, keep a RAND copy as well.
>> Raluca Ada Popa: Not on top.
>>: Not on top, separate. So when the proxy issues a query and this
query doesn't need the OPE encryption, only needs the RAND one, just go
to the RAND one.
>> Raluca Ada Popa: But what does the server know? If the server has
the OPE encryptions of the data then he knows the order.
>>: I see.
>> Raluca Ada Popa: So if the server has the OPE encryption of a
column then he knows the order. And we don't want him to know that
until we need the order for sure.
>>: But he also -- I understand -- you're right. Let's take this
offline.
>> Raluca Ada Popa: So we implemented CryptDB on top of MySQL, and one
of the cool parts of CryptDB is that we didn't make any change to the
DBMS. And the reason we could avoid making a change is that we use
user-defined functions.
So basically whenever we wanted to change the behavior of the database
server, for example, to decrypt something, we invoke one of the
user-defined functions.
This makes it portable. In fact, the initial implementation was in
[inaudible] and we ported it to MySQL with six lines of code changed
in CryptDB, mostly in the interface that talks to the server, not in
the core of CryptDB.
>>: [inaudible].
>>: Seemed to require [inaudible].
>> Raluca Ada Popa: Actually, no, because -- so everything, all the
tree lookup, can happen in a UDF, a user-defined function. Yes. The
DBMS doesn't change at all. In fact, it doesn't even have to be
restarted because you can load UDF libraries dynamically.
>>: [inaudible].
>> Raluca Ada Popa: No, it's --
>>: [inaudible].
>> Raluca Ada Popa: Yeah. [inaudible].
>>: [inaudible].
>> Raluca Ada Popa: Excuse me?
>>: [inaudible].
>> Raluca Ada Popa: Yes. Okay. So also there's no change needed to
applications, because the CryptDB proxy exports a SQL interface, so
existing applications run on top of CryptDB unchanged. So we evaluated
CryptDB and in doing so we tried to answer three questions.
Does CryptDB support real queries and applications? What's the
resulting confidentiality in terms of the onion levels, for example,
the questions these guys have been asking me? And what is the
performance overhead? So in terms of real queries and operators, we
don't support all queries. For example, we don't support complex
operators such as trigonometry, and sometimes we don't support
combinations of encryption schemes that we do support individually.
For example, A plus B greater than C: to do A plus B we need
homomorphic encryption, and to compare to C you need order-preserving
encryption, but homomorphic encryption does not preserve order. There
are things you can do to support it: split it in two queries, have the
client encrypt it, precompute columns, or use FHE for specific types of
computation. In fact, there's a project at MIT that's a follow-up to
CryptDB that does all these things and is able to support virtually
all the queries, at the cost of some computation on the client side.
In terms of real applications and queries, we looked at seven real
applications, out of which I'm showing you five. So phpBB is open
source forum software in which we would like to secure private
messages, for example, or private posts. HotCRP is a conference
management website, [inaudible], in which we would like to hide
reviews, paper authors if it's anonymous, and so forth. grad-apply is
an MIT graduate admissions database where we'd like to hide student
grades and letters of recommendation. TPC-C is an industry benchmark,
and sql.mit.edu is a large trace of queries we got from a popular SQL
server at MIT that hosts thousands of applications; the trace has over
126 million queries and more than 120,000 columns, because we wanted
to see really what could be supported or not.
>>: [inaudible] columns.
>> Raluca Ada Popa: Right. So in these applications we only encrypted
the columns we deemed sensitive. So we evaluated which were sensitive,
such as secret posts, secret messages, things that [inaudible].
For the traces we said, let's encrypt everything even if it didn't
seem sensitive at all, just to see how CryptDB would perform in that
situation. But the realistic situation is the one in which you only
encrypt the sensitive things. Basically for the large trace and for
TPC-C almost nothing was sensitive, but let's see what happens if
everything is encrypted.
And actually the good news is that for the applications we supported
every single query on the sensitive fields, and for TPC-C we supported
all the queries on all the fields, because we encrypted them all.
Now for the large trace we didn't support one percent, or less than
one percent, of the queries, and those were queries like this, doing
mathematics in the SQL query: select one over log of -- so those we
can't support.
>>: [inaudible] number of [inaudible].
>>: Yes. Number of columns, yes.
>>: What about all these 500 unencrypted columns of the database?
Are you happy with them in plain text?
>> Raluca Ada Popa: So, yes. We examined which ones were sensitive.
In fact, we were even exaggerating; for example, if the posts are
private we were keeping them private, but we kept private everything
related to posts, and this is the number of columns we got. For
example, there are a lot of fields such as auto-increments that the
database obviously knows. There's no point in hiding them. So how
about the resulting confidentiality level? We examined the min level
for every column. The min level is the weakest encryption scheme
exposed for that column. So we can see that in fact most of the fields
remain in RAND, which means that nothing leaks about those fields.
The reason they remain in RAND is that there's no equality or
inequality performed on them. Basically they are just inserted and
retrieved, maybe based on other fields, or maybe summed, and again
that one doesn't leak anything either. So that's the good news about
CryptDB. Now, some fields were OPE, the one that worries us the most
because it leaks order, but very few fields were OPE. And in fact we
examined manually to see what those fields were.
And they looked to be less sensitive. For example, the contents of a
post were all at RAND; the time when the post was made was at OPE.
>>: Based on -- you examined a trace of queries over time to see --
>>: We examined all the possible queries the application can issue,
which is easy because it's a fixed set, because these are Web
applications. And also for TPC-C. For sql.mit.edu we examined 126
million queries in a stream. And we couldn't examine whether these
were sensitive or not, because there were too many.
>>: [inaudible] the SQL queries, suggest [inaudible] TPC-C has not just
SQL queries, it has a bunch of logic around it.
>> Raluca Ada Popa: TPC-C or TPC-H?
>>: TPC-C. TPC-H has --
>> Raluca Ada Popa: We take a benchmark, a SQL benchmark for TPC-C
[inaudible], and we run it on top of CryptDB.
>>: [inaudible].
>> Raluca Ada Popa: Just look at the queries.
>>: [inaudible] like additions, I would expect to see [inaudible] but I
don't see any columns [inaudible].
>> Raluca Ada Popa: Using the --
>>: Using the [inaudible] addition failure --
>> Raluca Ada Popa: Oh, yeah, yeah, because the min level in that case
is HOM, which is as secure as RAND. I'm including them all under the
RAND column. I think I even said that all these items here either
don't require equality or inequality, or you just do addition. I
included them here as the same sensitivity. Maybe I should have said
that more clearly. TPC-C certainly has those.
>>: [inaudible] on sections during [inaudible].
>> Raluca Ada Popa: Yes. Queries per second. Okay. In terms of
performance, let's compare the performance of applications running on
top of CryptDB to applications running on unencrypted MySQL.
We looked at a lot of metrics in our paper. Here I'll present the two
we consider most interesting. One is latency, that is, the time from
when an application sends a query until it gets a response, and the
second is server throughput, the number of queries per second the
server can process.
So in terms of latency CryptDB adds on average 0.62 milliseconds per
query for TPC-C.
>>: [inaudible].
>> Raluca Ada Popa: Because the workload is --
>>: No, it's because of encryption. The delta -- [inaudible] what's
the size of the data.
>> Raluca Ada Popa: The size of the database, it's in memory. It's in
memory. It's encryption. It's encryption cost.
>>: The space is not --
>> Raluca Ada Popa: Right. And I think for --
>>: You can see larger difference between plain text and encryption.
>> Raluca Ada Popa: Yes, probably.
>>: [inaudible] it's fitting.
>> Raluca Ada Popa: It's all fitting in memory. And the encryption
overhead in terms of space expansion I think is three times. Three
times. We have a more precise evaluation in the paper.
Okay. So in terms of throughput, yes.
>>: In terms of operations you have to go through multiple iterations
between the proxy and the encrypted database. What is the assumption
if you go to the previous slide? What is the assumption you made on
the latency between the CryptDB proxy and CryptDB database?
>> Raluca Ada Popa: So the application. So you're saying that it has
to do a bunch more --
>>: So, for example, if the encrypted database is held externally and
the CryptDB proxy is on the same machine as the application, then
there may be a large latency between [inaudible].
>> Raluca Ada Popa: So our setup is the following. Application,
CryptDB proxy and database all on the same machine. One core. We
restrict the machine to one core so we don't see other behavior. Okay.
So in terms of throughput, this graph shows the queries per second
depending on the number of server cores, for TPC-C on MySQL and TPC-C
on CryptDB. And the maximum throughput loss is 26 percent. So let's
understand why. Yes.
>>: [inaudible].
>> Raluca Ada Popa: It's all of TPC-C, which includes updates,
deletes, everything.
Okay. So this graph shows you the throughput for different kinds of
operations of TPC-C: select, delete, insert, for MySQL and CryptDB.
So we can see that for the first part of the queries the throughput
loss is actually less than 26 percent. And the reason is that in this
case the server doesn't do any cryptography in steady state. Here's
why: once you do equality and you lower the level from RAND to DET,
future equalities are processed directly on the data; they don't need
any additional decryption. In that sense the server here does the same
work as the unencrypted database, except tuples are a bit larger
because of encryption. So the cost you're seeing is the cost of
expansion, because the values are a bit larger because of encryption.
>>: But there is an overhead inasmuch as the levels of the onion have
to be undone. So the proxy still has to do multiple encryptions. Not
just a single layer. It has to do every layer underneath.
>>: [inaudible]. That difference --
>> Raluca Ada Popa: This is server throughput, which means that it's
the number of queries per second at the server. It has nothing to do
with the proxy.
>>: [inaudible]. This doesn't represent.
>> Raluca Ada Popa: This does not represent what --
>>: [inaudible].
>> Raluca Ada Popa: What represents the proxy is actually the latency.
The latency difference, because this latency includes the cost of the
proxy and everything. Good question.
>>: You say that was [inaudible].
>> Raluca Ada Popa: That was average latency. If you want to know the
exact breakdown, our paper has the breakdown. So the second part of
the graph: we can see that updates with increments and summation
actually have a larger throughput loss than 26 percent, roughly 50
percent. And that's because the server is doing homomorphic addition
[inaudible]; instead of adding two values it multiplies some larger
cryptographic numbers. But overall for TPC-C the throughput loss is
26 percent, which we think overall is practical.
Yes.
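As a worked illustration of why an encrypted addition turns into a
multiplication of large numbers, here is the additive property of a
Paillier-style scheme. This is an assumption for the sake of the
example: the talk does not name the homomorphic scheme at this point.
Here n is the public modulus, r is fresh randomness, and ciphertexts
live modulo n squared, so they are about twice as long as n.

```latex
\begin{align*}
\mathrm{Enc}(m; r) &= (1+n)^{m}\, r^{n} \bmod n^{2} \\
\mathrm{Enc}(a; r_1)\cdot \mathrm{Enc}(b; r_2)
  &= (1+n)^{a+b}\,(r_1 r_2)^{n} \bmod n^{2}
   = \mathrm{Enc}(a+b;\ r_1 r_2)
\end{align*}
```

So an increment or a SUM over a column costs the server one modular
multiplication of big integers per value instead of one machine-word
addition, which fits the larger throughput loss reported for those
operations.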
>>: Very small. I guess [inaudible] transaction.
>> Raluca Ada Popa: I don't think so.
>>: [inaudible] database transaction larger so updates, even if your
data -- update the cost -- the cost would be a lot more for --
>> Raluca Ada Popa: We didn't disable, we did not disable the log. We
did not disable the log, and the MySQL setup was the same for CryptDB
as for plain MySQL.
>>: So what were you using DB?
>> Raluca Ada Popa: I think we were using [inaudible] for DB actually.
>>: [inaudible].
>> Raluca Ada Popa: The point is that we used the exact same setup for
both of them. So we did not make any changes.
>>: [inaudible] data size. That affected it the most [inaudible] item
and logs have to be persistent, independent of the data.
>>: Database of the system.
>> Raluca Ada Popa: And we didn't disable the log.
>>: [inaudible] encryption. [inaudible] we are doing rebalancing.
Rebalancing that means [inaudible].
>> Raluca Ada Popa: Yes.
>>: And [inaudible].
>> Raluca Ada Popa: Okay. So basically this is CryptDB. These are two
papers mashed into one talk. So the CryptDB paper contains the scheme
of [inaudible]; that's the one it includes. So basically there's no
[inaudible] balancing, but at the same time the encryption cost is
larger. These results are for CryptDB. The scheme I told you about is
actually from a follow-up paper.
It would be interesting to put them all together and see.
>>: What happens if there's a wide [inaudible] proxy?
>> Raluca Ada Popa: There's a wide --
>>: A wide proxy -- probably the most secure setup, if it's to the
same -- same datacenter. [inaudible] have access [inaudible].
>> Raluca Ada Popa: Right. So the proxy and application are supposed
to be not accessible to the database administrators. So --
>>: The link between the proxy and the server is a wide link, and
latency would increase, and if you have multiple round trips the
latency --
>> Raluca Ada Popa: But the latency should not affect server
throughput. We're talking about how many queries per second the
server can support -- so I have a demo for you guys. A short CryptDB
demo. I hope it displays properly, because we had some problems with
the projector.
Seems like it does. So on the -- get my cursor. Okay. So on the left
side I have a CryptDB shell. This is, for example, what an application
uses. It exports a SQL interface, so things should work exactly the
same. On the right side I have a MySQL server shell, so you get to see
exactly what gets stored in the database. So let's create a table.
And I'm also printing out messages to see what's happening in CryptDB.
So I create a table that has two fields, name text and age integer.
Now this gets transformed into create table table0, and you can see
the OPE onion, the search onion, and there's a salt, because we use
AES with a salt, and that's a field in itself. For example, we can
check the database server and indeed that's the table that gets
created. So three onions for each field and a salt.
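To make the shape of that rewritten table concrete, here is a small
sketch of how a proxy might generate the anonymized statement. The
column naming, the column types, the choice of onion names, and the
one-salt-per-field layout are all assumptions made for illustration;
the demo output itself is not reproduced in this transcript.

```python
def rewrite_create_table(table_id: int, n_fields: int,
                         onions=("eq", "ord", "search")) -> str:
    """Build an anonymized CREATE TABLE: one column per onion per field,
    plus a salt column per field. All names here are hypothetical."""
    cols = []
    for f in range(n_fields):
        cols += [f"field{f}_{onion} BLOB" for onion in onions]
        cols.append(f"field{f}_salt BIGINT")
    return f"CREATE TABLE table{table_id} ({', '.join(cols)})"


# The two-field table from the demo (name, age) under these assumptions.
ddl = rewrite_create_table(0, 2)
print(ddl)
```

The point of the sketch is only that the server never sees the names
"name" or "age", and that each plaintext column fans out into several
ciphertext columns.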
So now let me insert into the table Alice age 19, Bob age 21, Chris
age 20. So we can see that in fact what the CryptDB proxy produces is
really a query with encrypted values. And let's make sure that that's
what's getting stored in the database server. So indeed we can see the
database server contains encrypted data.
So now if the user wants to see what's inside the table, we can see
that he still gets access to the decrypted data. And we see what the
proxy actually does behind the scenes: it asks the server, give me all
the equality onions, because those are easy to decrypt, and the salt.
And then it gets back the encrypted results from the server, decrypts
them and gives them back to the user. So let's do an onion adjustment.
We can say select star from T where age equals 19. So we can see that
first, as I said, the layer is RAND initially, so to process equality
you have to go down to DET. So we can see that an update of the
equality onion is issued and the level therefore becomes DET. Then the
actual query is issued. We can see where that field is equal to the
encryption of 19. Then the encrypted results are received from the
database server, and the proxy decrypts them and sends them to the
user. Okay. So now I'm going to show you a more interesting query.
So I'm going to select sum of greatest of age and 20. So basically
what the greatest operator does is take the maximum of age and 20. So
the ages we have are 19, 20, 21, so the greatest operator will return
20, 20, 21, and passes them to the sum operator, which is supposed to
add them up.
But at first sight you say, wait, didn't you tell us CryptDB cannot
combine encryption schemes? It turns out that it's really smart and in
certain cases it's not actually combining encryption schemes. The
greatest operator only has to figure out whether age is greater than
20, but then if that's the case you return the homomorphic encryption
as opposed to returning the order-preserving encryption.
So we can see, first, that it works, and second, we can see how the
query was rewritten. So basically the greatest is transformed into: if
the OPE encryption of age is greater than the OPE encryption of 20,
then give me back the homomorphic field, else give me the homomorphic
encryption of 20. These are passed as inputs to the aggregate
user-defined function, which basically does homomorphic addition. And
the encrypted result is sent back and decrypted to 61.
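The rewrite of SUM(GREATEST(age, 20)) can be sketched end to end. This
is a toy under stated assumptions, not CryptDB's code: the homomorphic
layer is a textbook Paillier construction with deliberately tiny
parameters, the order-preserving map is a made-up running sum of keyed
gaps, and all names are hypothetical. It needs Python 3.8+ for the
modular inverse via pow.

```python
import hashlib
import hmac
import math
import secrets

# Toy Paillier parameters: two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # inverse of LAM mod N


def hom_enc(m: int) -> int:
    """Encrypt m as (1+N)^m * r^N mod N^2 with fresh randomness r."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) * pow(r, N, N2) % N2


def hom_dec(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


def toy_ope(key: bytes, x: int) -> int:
    """Strictly increasing keyed map: running sum of positive keyed gaps."""
    return sum(
        hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()[0] + 1
        for i in range(x + 1)
    )


def server_sum_greatest(rows, ope_const, hom_const):
    """Server side: compare on the OPE column, aggregate on the HOM column.
    The server never decrypts anything."""
    acc = 1  # multiplicative identity, i.e. an encryption of 0
    for ope_val, hom_val in rows:
        chosen = hom_val if ope_val > ope_const else hom_const
        acc = acc * chosen % N2  # homomorphic addition
    return acc


# Proxy side: store both onions for each age, then send encrypted constants.
K_OPE = secrets.token_bytes(32)
ages = [19, 20, 21]
rows = [(toy_ope(K_OPE, a), hom_enc(a)) for a in ages]
ope_20, hom_20 = toy_ope(K_OPE, 20), hom_enc(20)

total = hom_dec(server_sum_greatest(rows, ope_20, hom_20))
print(total)  # 61
```

The two schemes are never combined on one value: the order-preserving
ciphertext only drives the branch, and whichever branch is taken, what
flows into the sum is a homomorphic ciphertext.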
All right. So in conclusion, if I can conclude, yes, I can conclude,
CryptDB provides the first practical DBMS for running most standard
queries on encrypted data. It has modest overhead and makes no changes
to the DBMS. This is the website of CryptDB. It has the papers and the
source code to play with, if you're interested. And thank you.
[applause]
>>: I have a question about usability. So encryption is kind of hard
to understand even for cryptographers, especially when you throw in
deterministic encryption and OPE, which some cryptographers probably
wouldn't even say is encryption. So what kind of administrators -- do
you think that administrators can understand the implications of
using deterministic encryption or, God help them, OPE?
>> Raluca Ada Popa: I personally think it's not that hard, because
there are just three very simple things to know. One, nothing leaks.
One, you can tell the histogram, equality. And one, order. So there's
really just three things they have to get their head around. And one
way to do it, we've been thinking about it, is to have this nice user
interface basically showing each onion in three gradient colors, which
one is OPE, which one is equality and which one is RAND, and they can
understand the security based on that. I think the safer thing to say
is that whenever they have some column they know is sensitive, then in
CryptDB they can set a threshold in the proxy: don't go below RAND,
period. If they're really worried about a certain encryption scheme
for a certain column, they can set this threshold and say, this is
really worrisome, I don't want you to go below a certain onion level.
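The per-column threshold idea can be sketched as a small policy check
in the proxy. This is a hypothetical illustration: the names, the
example columns, and the single linear ordering RAND > DET > OPE are
simplifying assumptions (in the talk, equality and order live in
separate onions).

```python
from enum import IntEnum


class Level(IntEnum):
    """Higher value = less leakage, following the three cases above."""
    OPE = 1   # reveals order
    DET = 2   # reveals which values repeat
    RAND = 3  # reveals nothing


# Level each kind of operation needs the column lowered to.
NEEDED = {"fetch": Level.RAND, "equality": Level.DET, "order": Level.OPE}

# Hypothetical per-column floors set by an administrator.
THRESHOLD = {"ssn": Level.RAND, "department": Level.DET}


def adjust(current: Level, column: str, op: str) -> Level:
    """Return the column's level after the query, or refuse the query.
    Layers are never put back, so the level can only go down."""
    target = min(current, NEEDED[op])
    floor = THRESHOLD.get(column, Level.OPE)
    if target < floor:
        raise PermissionError(f"{op} on {column} would go below {floor.name}")
    return target


print(adjust(Level.RAND, "department", "equality").name)  # DET
try:
    adjust(Level.RAND, "ssn", "equality")
except PermissionError as err:
    print("refused:", err)
```

With a floor of RAND on a column, any query that would need equality
or order on it is rejected by the proxy instead of silently lowering
the onion.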
>>: Might be more -- I guess I'm -- you have HotCRP as an example. I'm
concerned that a steering committee, maybe one that isn't in
cryptography, might think we have two alternatives. We could use
CryptDB, which I've read about, it encrypts things, it's secure, or we
could require the PC chair to have his papers in a separate database.
And it seems like --
>> Raluca Ada Popa: Separate database on the same server.
>>: You talk in the paper about how some HotCRP installations go to
extreme lengths to keep the PC chair from seeing his conflicts, extreme
lengths of having a different database for those. How does the
steering committee make those choices?
>> Raluca Ada Popa: I don't think you have -- so I think besides
CryptDB you have no choice. You either process unencrypted data or you
use CryptDB. So I don't think there's really a choice. There's no
alternative.
>>: All your PC conflicts are managed by the co-chair in a separate
database.
>> Raluca Ada Popa: Right. So that database itself computes on
unencrypted data. That one can be attacked as well.
>>: You don't give the PC chair administrative rights to that.
>> Raluca Ada Popa: Right. But you can still potentially have attacks
on that database. Now, we -- right now if you're thinking of -- you
can always have some sort of attackers in anything.
>>: Particular attackers. The PC chair might accidentally or might
arrange to figure out information about --
>> Raluca Ada Popa: If the whole world is vulnerable to specific
attacks maybe you can have solutions for those specific attacks. But I
think the whole world is more complicated. And then either you seek
solutions on encrypted data, in which case CryptDB is the only
practical DBMS for that, or you don't, in which case you don't have
the security of computing on encrypted data.
>>: One more follow-up. For usability, might it be better not to have
OPE at all, because doesn't it give the illusion of security where
there's no security -- it seems like it would be better just to say,
hey, administrator, you know, put it in plain text, put an exclamation
point saying, administrator, anybody could decrypt this column. Saying
it's encrypted with order-preserving encryption kind of provides a
misleading sense --
>> Raluca Ada Popa: Basically you're saying you could have a CryptDB
that only contains RAND and [inaudible], which leak nothing, and you
can have the deterministic and join classes, and that's it.
>>: It seems like the benefit of OPE is -- it seems like it's like
giving a handgun to a child. It may be more dangerous than not --
>> Raluca Ada Popa: Because you're saying the administrators may not
understand. Okay. Maybe then that's a good idea for the
administrators, just to tell them to think about nothing or DET, or in
fact you can make it even simpler for them, depending on what you're
willing to assume they can do. For example, they can say, if it's
something that's very sensitive, very worrisome, mark it as such. For
those, behind the scenes the CryptDB proxy is going to make sure that
it stays RAND, nothing else; for the others it's going to do OPE, which
is better than nothing, better than not encrypting at all. But maybe
indeed administrators could just point out what's secret and
worrisome and what's not. That will make it very simple.
>>: Have you tried other benchmarks other than TPC-H?
>> Raluca Ada Popa: Uh-huh.
>>: [inaudible].
>> Raluca Ada Popa: Good question.
>>: Any or very few.
>> Raluca Ada Popa: Good question. TPC-H has analytics queries; TPC-H
has lots of complex --
>>: [inaudible].
>> Raluca Ada Popa: Exactly. Whereas CryptDB is more for OLTP-type
benchmarks; there CryptDB is like gold. It wouldn't be a fit for
TPC-H. But actually there was work at MIT following up on CryptDB that
specifically looked at TPC-H. And it had clever database techniques
such as splitting queries and maybe materializing certain queries;
they wrote a smart query planner to figure out how to split
dynamically. They were able to split all of TPC-H, and the overhead
was less than twice in terms of throughput. And again, they had a huge
database; they went to disk. So it was a purely database work.
>>: Forgetting about the data administrator: say you come to me for
some data about me. What promise can you give me about it not leaking
out to the wrong people? And with OPE [inaudible] crazy statistics,
and that allows the whole database to be exposed.
>> Raluca Ada Popa: I come and I tell you the following. I say, you
know, you have two choices. One, you compute on unencrypted data and
you don't have anything, or two, you use CryptDB, and basically for
RAND fields [inaudible] nothing. And for DET you can leak something
and for OPE you can leak something else. Basically the choice is
nothing versus the security CryptDB provides. And I think it's worth
it.
>>: Did you look at the amount of data that actually gets transferred
between the encrypted DB and the proxy, as a comparison of the
encrypted case versus the unencrypted case?
>> Raluca Ada Popa: So we don't have a measurement of that
specifically. But really what gets transferred between the two is
just the query and the query results. So the query contains
encryptions of values, so it's slightly larger, and the results -- we
don't return any additional results; we just return the actual
results. The results are larger because they're encrypted, but no, we
didn't look at the actual expansion factor, and I think that's because
that's covered by other measurements; for example, it's covered by how
large the storage becomes, which gives you a sense of the expansion
for the results.
>>: OP sessions you're doing [inaudible].
>> Raluca Ada Popa: As I said, OPE is a different paper. For that
paper did we look at message sizes? I don't think we did, but we
looked at storage expansion and we looked at ciphertext sizes. So you
could reconstruct those maybe from the micro-benchmarks. But it is an
interesting -- yeah.
>>: [inaudible] because now if your proxy is doing [inaudible] their
database throughput -- by that I don't mean the database server, but
the application looking at the database would see proxy and
database --
>> Raluca Ada Popa: I agree with throughput, but we measured
throughput. I showed the throughput.
>>: That was the database server.
>> Raluca Ada Popa: Yes.
>>: The application would be looking at the database as database
server plus proxy.
>> Raluca Ada Popa: Yes. We have experiments for that as well. We
actually took phpBB and we looked at the throughput of the application
itself, which includes everything. Proxy, database, everything. And
actually there the throughput loss was even less, because of all the
overhead of PHP, I think. So a throughput loss of something like four
percent, five percent. That's because Web applications are so slow.
But there we counted everything, including --
>>: That was the point that was being made, which is if you had
unencrypted data you don't have the proxy [inaudible], but here the
proxy might be doing multiple rounds, so overall throughput might
reduce. Ignoring current Web applications, PHP.
>> Raluca Ada Popa: Right. I agree, but you can always have proxies
in parallel, and I guess the database server, at least the way we look
at it, is the bottleneck, because multiple applications share the same
database server, whereas each application can have its own proxy, so
it's not as important as the throughput of the database server. But I
agree with you.
>> Maybe we can stop here and take questions offline.
>> Raluca Ada Popa: I'll hang out. I'll be around.
[applause]