>> Melissa Chase: Today we are very happy to have Samee Zahur visiting us. Samee is a grad
student at the University of Virginia, finishing up very soon. He is working with David Evans and
he has been working on building one of the state-of-the-art multiparty computation libraries
among other things. And I will let him tell us more about it.
>> Samee Zahur: Thanks. I'm Samee. I previously interned here with Bryan Parno. I was working
on verifiable computation. As Melissa said, we'll be talking about the rest of my work back at
the University of Virginia. Most of it is on multiparty computation. To give you a brief overview
in case you are not familiar, the idea is that let's say, today, if I meet somebody and I want to
see if we know the same people, like if we have common acquaintances, what do we do? On
Facebook we have all of our friend lists and they will say these are the friends in common. We
have a trusted third-party who takes all of our data and then does the comparison. The idea is
that we shouldn't have to do that. So if you and I have our own private information, let's say
our friend lists or genetic information and we want to see how closely related we are, we
should be able to do that computation without having to reveal our data either to a trusted
third-party or to each other. We should be able to perform computation directly on private
data. The premise of secure multiparty computation is that we don't have to reveal
anything other than just the output. Other applications could include secure auctions and
any other data analysis algorithms and neural network algorithms and whatnot. There are
ways of computing on private data directly and computing arbitrary functions on private data.
The way they generally work is by using Boolean circuits. There are protocols that will take any
Boolean logic circuit, AND gates, OR gates and whatnot, glue them together and finally give
you a protocol that will execute that circuit. So some of the input wires will be from one party.
Some of the input wires will have data from the other party and then you execute the entire
circuit and finally the output can be shared with several parties, whichever parties you want.
This is great, but it makes it kind of hard to use for everyday programmers. My research goal
has been so far to make it easy for normal programmers to use. If somebody wants to use
these protocols now, they will either have to be experts in cryptography or experts in circuit
design, which turns out to be nontrivial even though there are a lot of digital design undergrad
courses in pretty much every computer science curriculum. What we want to do is sort of give
programmers the tools necessary so that they can use these sorts of technologies without
being experts in cryptography. Some of that has been in the form of a new language with its
own compiler. It's a language that I developed. It is mostly a C-like language with a few extra
annotations and keywords. For instance, you could have variables declared as obliv, which are
secret variables. Any computation done on them will be done cryptographically. You say that
certain variables come from party one and certain variables come from party two and then
finally reveal the values to both parties. You do that and finally you get the whole program.
You compile it, you get the protocol, and the parties execute it. Someone writing this kind of
code will not need to know anything about what is going on underneath. Now if you look at
what's going on around it in order to make it all work, you have the Obliv-C framework in the
middle. Some of my research has been both on the front end and the back end. For instance,
it turns out that certain algorithms which are fast in normal computation are not necessarily
fast in MPC and vice versa. So we had to come up with data structures and algorithms that are fast in Obliv-C and
make that available as normal functions to programmers. On the other hand, in the very back
end we had to come up with ways of reducing bandwidth usage, reducing CPU usage and
whatnot for the actual cryptographic portions. That is sort of the outline of my research so far.
In this talk we will be mostly talking about this side of the equation. We'll be talking about how
we can do data access fast inside MPC. The central question would be if you have a situation
such as this where you have some memory-accessing program and you want to run this
program in MPC, and whenever you have an array access you are accessing location j, and let's say
somehow this j variable is data dependent, data that you want to keep secret. You can't just reveal
that I am accessing location j because that would expose some results that you want to keep secret.
The idea was that we will only have inputs that are private. The only thing that we will reveal is
the final output. The interim results should not be revealed. Do you have a question?
>>: I do. I am kind of lost. Where is this computation happening? Is it in one of the two
parties?
>> Samee Zahur: It's distributed computation. If you and I are the two parties our machines
will be communicating with each other and run some cryptography operations such that the
computation happens in distributed fashion. We will both do some computation such that the
intermediate values are not seen.
>>: What is being shown here is just a specification of the computations?
>> Samee Zahur: Yes. Feel free to ask questions by the way. It's perfectly fine if we go off into
a tangent that you are more interested in and I don't get to cover. That's perfectly fine. I
would much rather cover something that you guys are more interested in. Something as simple
as array access is actually difficult if you want to hide which location you are accessing, right?
The simplest way to do it would be, even if you are just accessing one element, access all of
them. You scan the entire array just to hide which element you are actually interested in.
That would be the linear scan approach. But you just took an operation that was constant
time and expanded it out to a linear time operation. Most programs would become unbearably
slow, on top of the fact that MPC is already slow, so it's not going to work. If you want this
technology to be used, programs that are easy to write should remain easy to write.
If we have to completely rewrite programs that's a problem. The way to solve it, there are
essentially two different approaches. One would be to transform the program or come up with
algorithms such that your program is accessing memory locations in a very deterministic
fashion, independent of your input data. If you can express your program in that fashion, then
great. Your array location is no longer revealing private information. The other approach is to
randomize it. You shuffle the data around constantly so that even if you reveal which location
you are accessing, that's perfectly fine because that does not instantly correspond to the logical
identity of the element. That might still protect your information. That gives you an outline of
the two halves of the talk here. The first-half would be to come up with algorithms which are
basically circuit structures that will give you a schedule by which you can access data in a
particular deterministic fashion independent of input data. Those are pure circuits but they
only work for special cases. In the general case it is not possible. If you have completely random
accesses I can't help you there. But they are faster. On the other hand, you have the general
random access where you keep shuffling the data. It will hide any kind of random access and it
will work all the time, but the problem is it is definitely slower. You have to do some extra
conversions there. Before I get started on the first-half, questions? Yes?
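The linear-scan baseline he just described, touching every element and selecting the right one with a multiplexer, can be sketched like this. This is a cleartext simulation: in a real circuit the equality test and selection would be gates over secret values.

```python
def mux(cond, if_true, if_false):
    # In a Boolean circuit this is a multiplexer gate; here it is plain selection.
    return if_true if cond else if_false

def oblivious_read(arr, secret_index):
    """Read arr[secret_index] while touching every element (linear scan)."""
    result = arr[0]
    for i in range(1, len(arr)):
        # An equality test plus a mux at every position, so the access pattern
        # is identical no matter what secret_index is.
        result = mux(i == secret_index, arr[i], result)
    return result

print(oblivious_read([10, 20, 30, 40], 2))   # 30, after scanning all four slots
```

A constant-time read becomes linear in the array size, which is exactly the cost blowup the rest of the talk is trying to avoid.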
>>: When you say models I think of this [indiscernible] one of the parties?
>> Samee Zahur: No. It could be any of them. It's fine. Any other questions? Great. Circuit
structures. In this part of the talk we will be covering the very basic data structures, so stacks
and queues and associative maps. They are extremely easy normally. They are easy to
implement in normal programs. The problem is if you have a stack and let's say you are
pushing elements into it under some condition: if condition, then stack.push(x). If your condition is secret
then the length of your stack becomes a secret value. You cannot reveal it. The moment that
happens you have to figure out how to implement the stack without revealing where you
are. The way we represent it inside a circuit would look kind of like this. You have some
conditional push circuit made out of some logic gates. Your inputs will be the condition that is
secret, some intermediate value x which is itself secret. It's getting pushed in. You have the
old stack elements coming in and the new stack elements going out. Now what I'll show you is
how to efficiently implement this push operation. What is a naïve approach? Here is a really
naïve approach. We have the old elements, a0, a1, a2 and a3. We have the new elements, the
primes. You have the conditions and these boxes are basically multiplexers. They will choose
one or the other depending on the condition. If condition is zero they will all choose the right
hand side of their inputs and pass it on to the output. If the condition is one they will take
the left hand side. If the condition is one x gets passed into a0. Everything gets shifted. If the
condition is zero the x gets ignored in the old values just pass right through. This works. This is
a valid circuit for doing conditional push operation. The problem is we are using a linear
number of gates in order to implement a single push. That's the problem we are trying to
avoid. The way we solve it is quite simple. The idea is we break up this buffer into small pieces
and we put empty spaces into those buffers, so that when we are doing a shift operation we
don't shift everything. It's basically as simple as that. We take this one row; in the next
diagram, consider just the top row. That's just the one row from the previous diagram. When
we are doing a push operation we start by making sure we have at least two
empty spaces. We have five slots here, 10 slots, 20 slots in the next level and 40
slots, so powers of two times five. We do two push operations. We know those succeed, and
they access only the first level, level 0, and nothing else.
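The naïve mux-chain push described a moment ago can be written out directly. This is a cleartext sketch of that circuit: one multiplexer per slot, so a single conditional push costs a linear number of gates.

```python
def mux(cond, if_true, if_false):
    return if_true if cond else if_false

def naive_cond_push(cond, x, stack):
    """Conditionally push x onto a stack (top at index 0) via a chain of muxes.

    One mux per slot, i.e. a linear number of gates per push -- the cost the
    leveled construction is designed to avoid."""
    shifted = [x] + stack            # what the stack looks like if the push happens
    unchanged = stack + [None]       # None marks an empty slot
    return [mux(cond, shifted[i], unchanged[i]) for i in range(len(stack) + 1)]

print(naive_cond_push(1, 'x', ['a0', 'a1', 'a2']))   # ['x', 'a0', 'a1', 'a2']
print(naive_cond_push(0, 'x', ['a0', 'a1', 'a2']))   # ['a0', 'a1', 'a2', None]
```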
>>: What do you mean by these levels?
>> Samee Zahur: The top line, top row is one of these buffers. What we have is that we have
taken this buffer and divided it up into pieces. That is level 0 and that is level one buffer.
>>: [indiscernible]
>> Samee Zahur: Yes. We maintain invariants on how many empty spaces we keep at each level.
We started with at least two empty spaces, so we know that after two operations level 0 might
be full, depending on your conditions. So after every two operations we
shift from level 0 to level one. After every four operations we will shift from level one to level
two and so on. If you do that and count up the costs you will notice that -- let's do a counting
right here. For each operation we are accessing level 0. Let's say five units of cost. After every
two operations we are accessing level one. But level one is twice as big, so each time you pay 10
units. But you are accessing it half the time, so 10 times a half, again, five. Similarly, level two
will be four times as big but accessed 1/4 of the time. So at each level you are sort of paying on
average five units of cost per access. You have a logarithmic number of levels because the level
sizes keep doubling, so you get log n levels. What happens in the end is that for
each access on average you are paying five times log n cost essentially, some constant times log n
cost. That's how you do stack push, and the reason we have five is that we start with at least
two empty spaces. If you want to do pop operations then you need at least two full spaces so
that you can serve your elements, and then you need one extra element just in case some of the
conditions were false and you need an odd number of elements.
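The cost accounting above can be checked numerically. This sketch uses the talk's constants (5 slots at level 0, doubling level sizes, a shift from level i every 2^i operations); the amortized cost works out to exactly 5 units per level, i.e. 5 log n overall.

```python
def total_shift_cost(num_ops, num_levels):
    """Count the cost of the leveled stack: level i has 5 * 2**i slots and is
    touched once every 2**i operations, at a cost proportional to its size."""
    cost = 0
    for op in range(1, num_ops + 1):
        for level in range(num_levels):
            if op % (2 ** level) == 0:     # this level is accessed on this op
                cost += 5 * (2 ** level)   # pay its full buffer size
    return cost

levels = 4                     # supports on the order of 2**levels elements
ops = 2 ** levels * 8          # a multiple of every level's period
avg = total_shift_cost(ops, levels) / ops
print(avg)                     # 20.0 == 5 * levels: five units per level per access
```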
>>: So the idea is that if you are not actually pushing something at that first step you would put
a space there?
>> Samee Zahur: Yes.
>>: So nine might be [indiscernible]
>> Samee Zahur: Nine might be interesting.
>>: And then when you smoosh everything over to the right [indiscernible]
>> Samee Zahur: Actually, no. Whatever we do, nine stays empty. The next time we push, seven
will just go here.
>>: And that's because you read the whole thing?
>> Samee Zahur: Yes. It is the linear scanning.
>>: How do you deal with errors? Do you do a conditional push and then you do a pop? And
then at that point the pop might return an error or it might not?
>> Samee Zahur: Yes. Good question. You can do anything you want. You can have it such
that the error would be a secret condition. First of all, in most cases we recommend that you
write the program in such a way that that doesn't happen. You maintain your invariants so
that it doesn't happen. You can definitely have some extra circuitry. Doing a pop will check
whether or not it is empty. If so it will set a Boolean flag. But then again, whether or not you
reveal that Boolean flag is up to you. It might be a Boolean flag that you only reveal at the very
end. If something went wrong I'm not going to reveal the results. That can be done. It depends
on your preference.
>>: Where does that [indiscernible] end? Do you have the upper bound [indiscernible]
>> Samee Zahur: Yes. If you know that you have a maximum of n elements in the stack at any
time, then you can just have log n levels.
>>: But you have to know that statically?
>> Samee Zahur: You have to know that statically. You have to know something statically
always. For instance, if you're doing n push operations you know it will never exceed n.
Statically you at least have to know how many operations you are doing because -- I mean, you
don't have to. So let's put it this way. You always have to reveal how long your program is
running. That is something you are not going to be able to hide. If you can either reveal how
many operations you need or you can do extra operations, that's up to you. Anything else? So
that's stack. Similarly, I'm not going to go into details, but you can do queues in a very similar
way, with just a few extra arrows between the levels. But the result is the same. When you do the
evaluation you just compare with linear scan here because that's pretty much the only thing
it can be compared to. It is the log curve you would expect. There are no surprises here, except for the
fact that the constant is small. You don't have a hidden giant cost. It's doable. It's really efficient. And the
best part is it's completely circuit based. It doesn't matter what protocol you are using. There
are many different protocols which you can instantiate. There is [indiscernible]. There is
[indiscernible]. You can have [indiscernible]. You can have malicious, whatever. The same
algorithm would work unchanged. It's very protocol agnostic. And so yeah. It's good that way.
The thing is that once you have stacks and queues you can do memory accesses that have locality. If
you have an array here and you are accessing this i and this j and you know that they will only
move in small increments or decrements, then you can just break the array
up into stacks and queues and use the stack and queue circuits to access them in
log n time instead of using a general [indiscernible]. That would be much more efficient to do it
this way. Make sense? I see some frowning faces. Okay. Yes?
>>: So there's like [indiscernible] stage junctures based on [indiscernible]? How is this more
efficient?
>> Samee Zahur: Yes. At least in my experience, yes. The reason is that whenever you use ORAMs,
those are not general circuits in the sense that you have to reveal which path you are
reading from and which not. You introduce extra round-trip latencies, and you need to have
extra steps if you want to go to malicious security and whatnot. So those come into play, whereas
this is pure circuit. Anything that is a circuit will just run and there is no round-trip latency,
nothing like that. So yes.
>>: Okay, what is the precise condition of locality that lets you model an array as stacks and
queues?
>> Samee Zahur: The condition is that whenever you access a particular index, the next thing
that you access needs to be within some constant number of steps. If your constant is large
you pay more. Okay? Great. The other thing we have that is completely unrelated is batched
operation. If you do not have any locality, but let's say you do many writes in one go, or many
reads in one go, then you can use [indiscernible] sorting-based approaches to get log squared n
performance. But yeah, these are sort of the pure circuit-based structures. So the conclusion
for the first half is that when your application is such that you do not need perfect, completely
general random access, there are all of these specialized circuit
structures that you can use. For stacks and queues we have like a 10x speedup. For the batch
operation we have like 8x or something. And they are completely protocol agnostic. They are
more versatile. You can use them within existing protocols. That was the first half. Before I go into
the second half, questions? Good. That was for specialized access patterns. Sometimes we
can do that and then sometimes you have completely general random-access. In that case you
kind of have to fall back on oblivious RAM. There has been a lot of work in this. Most of the
implementations today that use oblivious RAM use tree-based oblivious RAM, first
introduced by Elaine Shi and others. If you look at the literature, there are tons and tons of
papers. People have been working on it for a long time. These are what other people have
done. They have been implementing hybrid protocols between Yao and ORAM just to see how
they integrate together and what the performance is, which is great. All of them have been
tree-based implementations. But let's look at some performance numbers.
Without ORAM what's the performance number? Writing a single 32-bit integer, you'll need 32
logic gates. Great. That's if you know the location, you know exactly where things are located. Raw
Yao performance is one million gates per second, which is actually a low number; you can get at
least three or four million on a gigabit-per-second link, but let's go with the order of magnitude. One million
gates per second, great. For the write speed, you do the division and you'll get 31,000 writes per
second if you know the location, if the location is not dependent on private data. If you have to
hide access patterns, let's say you have 2 to the 16 elements, or 65,000 elements -- wait. Did I do
the math wrong? I did the math wrong. You will be doing around half an access per second,
around two seconds per access. That's sort of the order of magnitude if you are using a
complete linear scan, no ORAM whatsoever. Keeping in mind the error in this slide, let's compare
this to previous work on ORAM performance. This is from CCS of last year. This was circuit ORAM,
minimizing the circuit size for each ORAM access. It's kind of the best you can do for MPC. As you
can see, if you have 2 to the 16 elements, per-access time is around one second. At this point it is
almost in the same order of magnitude as a linear scan. That's where the breakeven point is. So if you have
less than that there is no point even using oblivious RAM, because a plain linear scan
accessing every single element will be faster. This is not even taking into account the
fact that you have to initialize oblivious RAM first. If you have an oblivious RAM structure you
at least have to touch each element once just to initialize the structure, so not even taking that
into account. The response to that is: great, ORAMs are asymptotically better, so if you go big
enough ORAM should still win out. Yes, ORAM should win out if you go to, let's say, 2 to the 18 or
2 to the 20. Two to the 20 is like a million elements, which is okay. It will definitely win out, but
think of what that means. It means that at 2 to the 20, a million elements, per access you have to
spend around two seconds. If you have a million elements to initialize, that's just to write
each element once, you'll need two seconds times a million. That's 2 million seconds. That's
weeks, over three weeks, just to initialize. What happens is, if we want to
provide this as a tool to people and say that you can use this to do arbitrary computation, but if
you need random access, by the way, you need to wait weeks just to initialize the data, it's
hard to kind of sell MPC to people. We have this strange situation where yes, ORAM will
provide advantages, but only for applications so slow that even MPC wouldn't be used. For
smaller cases ORAMs are still not usable. What happens is that in many cases people just
wouldn't use oblivious RAMs at all. They would just use plain linear scan, which created this
weird stigma against oblivious RAM. Hey, this is too slow and nobody wants to use it, which we
don't want. The goals here for our case are that we want to design an ORAM which provides
benefits at much smaller sizes and can initialize quickly. We don't need to go through that long
initialization. Those are the two goals here. I will now start describing how this ORAM works
for just four elements. Any questions so far? Let's say we have just four elements. This is the
Waksman shuffling. If you have four elements, these lines are the data wires and we just
want to shuffle them. This is just the Waksman network: each box either swaps or
leaves a pair of elements unchanged, and if I have secret control bits controlling them, zero and one control
bits, I can use them to permute these inputs into any given combination. If you need to
permute four elements, you need five switches. So the cost of shuffling four elements would be
about five units, could be CPU costs, could be whatever, five units for shuffling four elements.
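A four-wire Waksman network can be sketched and brute-force checked. This is one standard wiring (a sketch, not taken from the slides): two input switches, two size-2 subnetworks, and a single output switch; the fifth output switch of a Beneš network is omitted, and five control bits still suffice to realize all 4! = 24 permutations.

```python
from itertools import product

def switch(a, b, ctrl):
    # One Waksman switch: pass through or swap, driven by one secret control bit.
    return (b, a) if ctrl else (a, b)

def waksman4(data, c):
    """Five-switch Waksman network permuting four wires (c = 5 control bits)."""
    u0, l0 = switch(data[0], data[1], c[0])   # input switch on wires 0,1
    u1, l1 = switch(data[2], data[3], c[1])   # input switch on wires 2,3
    t0, t1 = switch(u0, u1, c[2])             # top size-2 subnetwork
    b0, b1 = switch(l0, l1, c[3])             # bottom size-2 subnetwork
    o0, o1 = switch(t0, b0, c[4])             # single output switch
    return [o0, o1, t1, b1]                   # second output switch omitted

# 5 control bits give 32 settings, enough to realize all 24 permutations.
reachable = {tuple(waksman4([0, 1, 2, 3], c)) for c in product([0, 1], repeat=5)}
print(len(reachable))                          # 24
```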
Here's how we can construct an ORAM with just four blocks, right? You come in with four pieces
of data. You shuffle them. Once it's shuffled, let's say there is some map that says element
one went to location three and element two went to location two and whatnot. So there is a
small, like, two-bit map per element. Once it's shuffled, if you're accessing a certain element and you reveal
the fact that you are accessing this position, that's fine because it has been completely shuffled.
You don't know which original element actually went there. So we can actually reveal that I
am accessing that element, and you will be paying a cost of B and that's it. The next time you
access an element, you kind of have to access two elements, the same
element as before and some other element, because you don't want to reveal whether or not
it's a repeated access. Now you are paying a cost of 2B. The next time you pay a cost of 3B
because you access the same elements as the previous two and a new one. You can see where
this is going. What we are going to do is not go to 4B; we just shuffle it again and then
keep going. So we do three accesses and then shuffle again, three accesses and then shuffle
again. Every three accesses we are paying a cost of 5B plus B plus 2B plus 3B, 11B in total, which is
kind of interesting. If you compare it with linear scan, linear scan would pay a cost of 4B per
access, 12B per 3 accesses, whereas we are paying 11B per 3 accesses. So with just four blocks
we are already doing better than linear scan, which is much better than the previous schemes that we saw. And
there is no extra initialization other than this shuffling.
>>: If you did the fourth one, proportionally you would also win, right?
>> Samee Zahur: You would also win, but then your cost would be 15B over 4, so this is better.
>>: Also, how do you account for the cost of finding which one is accessed?
>> Samee Zahur: You just have a small bit vector, which is a two-bit value for each position, and
it's constant regardless of block size.
>>: So it's under the rug [indiscernible]
>> Samee Zahur: It's kind of swept under the rug, yes. The idea is that if you have a large enough
block it wouldn't matter. In practice, if you account for it, we will have the graph. But
[indiscernible] bandwidth. A block size of 36 bytes is good. If you have larger blocks than that
you can always just divide them up, yes.
>>: If you have a [indiscernible] application where the data is just like bits or something like
that and you just add the access bit here and over here, does that [indiscernible]
>> Samee Zahur: That may be too expensive simply because of the metadata, like
you said. What I recommend in that case is to divide it up into several blocks and then do a linear
scan on each one of them.
>>: So that would be better than even if you are accessing one bit at a time?
>> Samee Zahur: Yes. The two alternatives you are proposing are: one would be to scan all bits,
and the other would be to divide into a few blocks and linear scan each of them, right? So yes, the latter
would still win out. So that's basically the scheme. What we did is we generalized this to not
just four blocks, but n blocks. We generalized it and, unfortunately, the asymptotic complexities
are worse than other ORAM schemes. Other existing ORAM schemes would give you B
times log squared n or log cubed n complexities. Ours is much worse:
square root of n times something. But in terms of concrete costs, it still wins out. I should
have the graph. There we go. If you do the comparison, we did the measurements at 2 to
the 11 and 2 to the 16. We did our own implementation of circuit ORAM. This is actually done
by the same author as circuit ORAM, but anyway, it's our [indiscernible] implementation because
the previous one was Java. This one is faster by a factor of two. We see the linear
scan, circuit ORAM and our scheme, so yes. Eventually, circuit ORAM does win out at 2 to the 16,
but ours is still better for the smaller cases, where you don't have to spend a large initialization
cost. And talking about initialization, this is just access cost. In our case this is all the cost
there is. In the circuit ORAM case you still have to do multiple write operations to actually
initialize the data. You don't necessarily want that. In fact, if you look at initialization, we
computed our own initialization cost as the cost of shuffling, because that's all there is. And
there is a hundred-x gap between circuit ORAM initialization and ours. Yes?
>>: It looks like the circuit ORAM is always better than the [indiscernible]?
>> Samee Zahur: Yes, it kind of wins out. In our implementation the breakeven is somewhere
here. Yes.
>>: This is better than previous?
>> Samee Zahur: Yes.
>>: [indiscernible]
>> Samee Zahur: Yes. The language difference, plus the language difference also implies we
can do various low-level things. ORAMs always introduce round trips, right? Since they
introduce round trips we can do things like, at the TCP level, disable Nagle's algorithm.
That is a thing that would aggregate, let's say you are sending two packets of data; TCP
at the kernel level would aggregate these two packets and send them off together to reduce bandwidth.
The problem with that is that once you send one packet it will wait for the next packet, and it
will wait on the order of milliseconds. We can execute many gates in one millisecond, so that
doesn't pay off. It actually helps to disable that part.
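Disabling Nagle's algorithm is a one-line socket option. A minimal sketch (the socket here is never connected; a real MPC runtime would set the option on its live connection):

```python
import socket

# Nagle's algorithm batches small writes, adding millisecond-scale delays that
# dwarf per-gate MPC costs; TCP_NODELAY turns it off for this connection.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)   # True
sock.close()
```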
>>: [indiscernible].
>> Samee Zahur: You did too? Great. We spent more than a week. [laughter]. I feel like there
should be a wiki of these tricks so we don't have to reinvent each time. That's the bandwidth
cost. If you want we can talk a bit more about some of the things that we have to do to actually
make this happen. We already showed this. I am going a little bit out of order. If you look at
previous work, all implementations were tree-based ORAMs, but outside of MPC there are
other kinds of ORAMs. Hierarchical ORAMs were already there, and the main
difference between them is as follows. In hierarchical ORAMs the initialization is pretty much
as cheap as ours. It's just a shuffle. That's all it is. However, each access requires a hash
function computed by the client. In the MPC setting that would mean a hash function being
computed inside a circuit. So that's a problem. This is why all of the MPC implementations of
ORAMs use tree-based ORAMs. They avoided that. What they paid for was higher
initialization cost. What we did in our case, if you think about our approach, is we sort of
merged them together. We don't use the tree-based structure. Ours is kind of a hierarchical
structure, except that it is limited to two levels, but at the same time we don't use a hash
function. We use a kind of nested relocation-table approach: there is a
nested ORAM, a recursive ORAM, that will give us a map of which element goes where. That's
why we get the performance improvement. Yes?
>>: The one you showed with the shuffle, do you see that as hierarchical?
>> Samee Zahur: Yes. It kind of is, actually. What happens is -- it's hard to see here, but what
happens here is that this is sort of the first level of the hierarchy here and once you keep using
elements these are the elements that end up in the stash. They get moved from the first level
to a stash element which is being scanned each time. In some sense this is a two level
hierarchy here, but I sort of drew it in a different way.
>>: I like the original.
>> Samee Zahur: It's a square-root ORAM, yes. The title of the paper was Revisiting Square-Root
ORAM, so yes. Absolutely. There were a few other challenges we had to fix, such as
creating the position map itself. If you think about it, you have a bunch of elements. You do the
shuffle and then we also have to create the position map. The position map is essentially an
inverse permutation, if you think about it. If you are looking for an element, the map needs to
say, element zero is in position 2. The shuffle operation will also need to produce this. For the
first shuffle it's actually fairly easy. You have some shuffle circuit that does lots of swap
operations, so we tag the elements with metadata zero, one, two, three in sequence and run
them in reverse through the same swaps. So this is zero, one, two, and there we know that zero
maps to position two, and that's how we compute this column here. That's fairly easy. The
problem is that the next time around you are doing some operation that gets composed with
the previous permutation. Now if that element goes here, that does not mean element two maps
here. Reversing the permutation doesn't get you there. One way to solve this would be to use
oblivious sorting: we could tag them with zero, one, two, three again and then sort them using
these values. That will give us the reverse permutation. But sorting is n log squared n. That
would add another log n factor to the complexity. We didn't want that. So what we did instead was
come up with a new protocol that just inverts the permutation. This is a secret permutation
that we don't want to reveal and we want to compute its inverse permutation. How do we do
that? We actually use some secret sharing and whatnot. It's not too novel, but the idea is that
this is the permutation that we are going to keep secret, and its inverse, pi, is the output we
want. How do we do it? We have two parties, Alice and Bob. Alice
locally generates a random permutation pi A, feeds it in, and uses it to permute this. The result
is now the composition of pi A and pi inverse. This gets revealed to Bob. Bob sees the original
permutation shuffled in some manner that he doesn't know, so it is safe to reveal. Once
we have that, Bob can locally compute the inverse permutation of that to get this. And finally,
we have another permutation circuit, with pi A here and that inverse there, and since pi A is
known locally we can undo it and get just pi. So with two permutation networks, which is n log n
instead of n log squared n, we can compute the inverse permutation. That's how we did the
ORAM. And the conclusion here is we revisited a well-known scheme. It's an old scheme that
I'm telling you about. We showed that this actually can be implemented with really low
initialization cost and a breakeven point as low as four elements. The hope is this can now be
widely adopted and people can use ORAMs without worrying too much. Yes?
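The masking trick behind the inversion protocol can be verified in cleartext. This is a sketch of the algebra only (the composition convention here may differ from the slides, and the real protocol operates on secret-shared permutations): masking the secret permutation with Alice's random one makes it safe to reveal, Bob inverts the revealed composition for free, and one more permutation network strips off the mask.

```python
import random

def compose(p, q):
    """(p o q)[i] = p[q[i]]: apply q first, then p."""
    return [p[q[i]] for i in range(len(q))]

def invert(p):
    """Functional inverse of a permutation given as a list."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv

random.seed(0)
n = 8
pi = random.sample(range(n), n)        # the secret permutation (shared in MPC)
pi_a = random.sample(range(n), n)      # Alice's local random mask

masked = compose(pi_a, pi)             # computed inside the circuit
# Revealing `masked` to Bob is safe: pi is hidden behind Alice's random pi_a.
bob_inverse = invert(masked)           # Bob inverts it locally, for free
result = compose(bob_inverse, pi_a)    # one more network strips off pi_a

print(result == invert(pi))            # True: we obtained pi's inverse
```

The identity being used is (pi_a o pi)^-1 o pi_a = pi^-1 o pi_a^-1 o pi_a = pi^-1, so two permutation networks of n log n switches replace an n log squared n oblivious sort.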
>>: With the malicious setting, does the circuit have to check that it's the actual [indiscernible]
>> Samee Zahur: It has to check that they are actually inverse of each other and whatnot, yeah.
>>: [indiscernible]
>> Samee Zahur: It shouldn't require sorting, no. I mean the asymptotic complexity will remain
unchanged, but you will have overheads.
>>: So by the breakeven point you mean with the linear?
>> Samee Zahur: Yes, with the linear scan. At small sizes that's pretty much the only thing you need
to worry about. There are no other ORAMs. That's it. I guess, download it, use it and tell me how it
is. If there are complaints, yell. [applause].
>> Melissa Chase: Are there any other questions?
>>: Is your language similar to the ObliVM stuff?
>> Samee Zahur: Yes. They were developed around the same time. Mine is more like
[indiscernible] in the sense that it really is just preprocessing that gets translated into C, so
you can pretty much see what code is getting generated. It's not too different from what you
write. The pros and cons are that ObliVM is a completely fresh start. It's a clean slate. They
have their own language and they can design it however they want. In our case we have a lot
of C baggage to deal with. The good side is all of the C libraries are pretty much available to
you. You have dynamic [indiscernible] size arrays; it's malloc. You have the simple things like
networking. You have threads. Those are things I don't need to invent. You don't have to wait
for me to implement them in the language. They are already there. With ObliVM that's not
the case.
>>: Are there things that you can use in [indiscernible] application? [indiscernible]
>> Samee Zahur: I'm sorry. I didn't get the question.
>>: So you were talking about having all the power of C available. That's in the sort of…
>> Samee Zahur: Both. No, both. Even if you want to do private computation, but you want to
split it up into two threads, that's okay. We will need like five-line wrappers around pthreads,
but anybody can write those. You don't have to change the compiler or anything. You can just have
it, yeah. There is almost nothing you need to do. The synchronization primitives like mutexes
[indiscernible], there is a huge library of all these concurrency primitives that you don't
need to invent. You should be able to just use whatever is already there with some wrappers,
but that's something anyone can write. They don't require a modification. And there are
existing tools, like profiling tools, to see where your program is slow. These profiling tools will
work here. You have debugging tools. Valgrind will work on it. Valgrind is extremely
useful here, so things like that, and you don't have to [indiscernible].
>> Melissa Chase: Are there any more questions? Let's thank the speaker. [applause].