>> Melissa Chase: Today we are very happy to have Samee Zahur visiting us. Samee is a grad student at the University of Virginia, finishing up very soon. He is working with David Evans, and he has been building one of the state-of-the-art multiparty computation libraries, among other things. I will let him tell us more about it. >> Samee Zahur: Thanks. I'm Samee. I previously interned here with Bryan Parno, working on verifiable computation. As Melissa said, today we'll be talking about the rest of my work back at the University of Virginia. Most of it is on multiparty computation. To give you a brief overview in case you are not familiar: the idea is that today, if I meet somebody and we want to see if we have common acquaintances, what do we do? On Facebook we each have our friend lists, and Facebook will tell us which friends we have in common. We have a trusted third party who takes all of our data and then does the comparison. The idea is that we shouldn't have to do that. If you and I have our own private information, say our friend lists or genetic information, and we want to see how closely related we are, we should be able to do that computation without having to reveal our data either to a trusted third party or to each other. We should be able to perform computation directly on private data. That's the premise of secure multiparty computation: we don't have to reveal anything other than just the output. Other applications include secure auctions, data analysis algorithms, neural network algorithms and whatnot. There are ways of computing arbitrary functions directly on private data. The way they generally work is by using Boolean circuits. There are protocols that will take any Boolean logic circuit, AND gates, OR gates and whatnot, glue them together, and finally give you a protocol that will execute that circuit. Some of the input wires will have data from one party, some of the input wires will have data from the other party, then you execute the entire circuit, and finally the output can be shared with whichever parties you want. This is great, but it is kind of hard for everyday programmers to use. My research goal so far has been to make it easy for normal programmers. If somebody wants to use these protocols now, they either have to be experts in cryptography or experts in circuit design, which turns out to be nontrivial even though digital design courses appear in pretty much every undergraduate computer science curriculum. What we want to do is give programmers the tools necessary so that they can use these technologies without being experts in cryptography. Some of that has been in the form of a new language with its own compiler, a language that I developed. It is mostly a C-like language with a few extra annotations and keywords. For instance, you can declare variables as obliv, which marks them as secret variables; any computation done on them will be done cryptographically. You say that certain variables come from party one and certain variables come from party two, and then finally you reveal the values to both parties. You write the whole program that way, compile it, and you get the protocol, and the parties execute it. Writing this kind of code, you do not need to know anything about what is going on underneath.
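To make that concrete, here is roughly what such a program looks like, modeled on the millionaires' example from the published Obliv-C materials; the struct layout and the exact function names (feedOblivInt, revealOblivBool) should be read as a sketch rather than verbatim library API:

    #include <obliv.oh>
    #include <stdbool.h>

    typedef struct { int myWealth; bool result; } ProtocolIO;

    void millionaire(void *args) {
        ProtocolIO *io = args;
        obliv int a, b;
        obliv bool res = false;
        a = feedOblivInt(io->myWealth, 1);    /* party 1's secret input */
        b = feedOblivInt(io->myWealth, 2);    /* party 2's secret input */
        obliv if (a < b) res = true;          /* compiled to gates, not a branch */
        revealOblivBool(&io->result, res, 0); /* 0 = reveal to both parties */
    }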
Now, if you look at what goes on around it to make it all work, you have the Obliv-C framework in the middle, and some of my research has been on both the front end and the back end. For instance, it turns out that certain algorithms which are fast in normal computation are not necessarily fast in MPC, and vice versa. So we had to come up with data structures and algorithms that are fast in MPC and make them available as normal functions to programmers. On the other hand, at the very back end we had to come up with ways of reducing bandwidth usage, CPU usage and whatnot for the actual cryptographic portions. That is the outline of my research so far. In this talk we will mostly be talking about this side of the equation: how we can do data access fast inside MPC. The central question is, if you have a situation such as this where you have a program accessing memory, and you want to run this program in MPC, and whenever you have an array access you are accessing location j, and somehow this j variable depends on data that you want to keep secret, you can't just reveal that you are accessing location j, because that would expose intermediate results that you want to keep secret. The idea was that only the inputs are private and the only thing we reveal is the final output; the intermediate results should not be revealed. Do you have a question? >>: I do. I am kind of lost. Where is this computation happening? Is it in one of the two parties? >> Samee Zahur: It's distributed computation. If you and I are the two parties, our machines will be communicating with each other and running cryptographic operations such that the computation happens in a distributed fashion. We both do some computation such that the intermediate values are not seen. >>: What is being shown here is just a specification of the computation? >> Samee Zahur: Yes. Feel free to ask questions, by the way. It's perfectly fine if we go off on a tangent that you are more interested in and I don't get to cover everything; I would much rather cover something that you are more interested in. Something as simple as array access is actually difficult if you want to hide which location you are accessing, right? The simplest way to do it would be, even if you are just accessing one element, access all of them. You scan the entire array just to hide which element you are actually interested in. That is the linear scan approach. But you have just taken an operation that was constant time and expanded it into a linear time operation. Most programs would become unbearably slow, on top of the fact that MPC is already slow, so that's not going to work. If you want this technology to be used, programs that are easy to write should remain easy to write; if we have to completely rewrite programs, that's a problem. There are essentially two different approaches to solving it. One would be to transform the program, or come up with algorithms, such that your program accesses memory locations in a deterministic fashion, independent of your input data. If you can express your program in that fashion, then great: your array accesses no longer reveal private information. The other approach is to randomize it. You shuffle the data around constantly, so that even if you reveal which physical location you are accessing, that's perfectly fine, because it does not directly correspond to the logical identity of the element. That might still protect your information.
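Before going further, here is the linear scan approach from a moment ago written as ordinary C, just to show the shape of the circuit; in a real MPC run, j would be a garbled or secret-shared value and each select would be a handful of Boolean gates rather than C operators:

    #include <stdint.h>

    /* Constant-time select: picks a when cond is 1, b when cond is 0,
       with no data-dependent branch. In a circuit this is one
       multiplexer per bit. */
    static uint32_t mux(uint32_t cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)0 - cond;  /* all-ones or all-zeros */
        return (a & mask) | (b & ~mask);
    }

    /* Oblivious read by linear scan: touch every element so the secret
       index j is never revealed. One read costs O(n) multiplexers. */
    uint32_t linear_scan_read(const uint32_t *arr, uint32_t n, uint32_t j) {
        uint32_t result = 0;
        for (uint32_t i = 0; i < n; i++)
            result = mux((uint32_t)(i == j), arr[i], result);
        return result;
    }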
That gives you the outline of the two halves of the talk. The first half is about coming up with algorithms, which are basically circuit structures, that give you a schedule by which you access data in a particular deterministic fashion independent of input data. Those are pure circuits, but they only work for special cases; in the general case it is not possible. If you have completely random accesses I can't help you there. But they are faster. On the other hand, you have general random access, where you keep shuffling the data. That will hide any kind of random access and it will work all the time, but the problem is that it is definitely slower; you have to do some extra conversions there. Before I get started on the first half, questions? Yes? >>: When you say models I think of this [indiscernible] one of the parties? >> Samee Zahur: No. It could be any of them. It's fine. Any other questions? Great. Circuit structures. In this part of the talk we will be covering the very basic data structures: stacks, queues and associative maps. They are extremely easy to implement in normal programs. The problem is, if you have a stack and you are pushing elements into it under some condition, if condition then stack.push(x), and your condition is secret, then the length of your stack becomes a secret value. You cannot reveal it. The moment that happens you have to figure out how to implement the stack without revealing where the top is. The way we represent it inside a circuit looks kind of like this. You have some conditional push circuit made out of logic gates. Your inputs are the condition, which is secret, and some intermediate value x, which is also secret, getting pushed in. You have the old stack elements coming in and the new stack elements going out. What I'll show you is how to efficiently implement this push operation. What is a naive approach? Here is a really naive one. We have the old elements, a0, a1, a2 and a3. We have the new elements, the primes. You have the condition, and these boxes are basically multiplexers: they choose one input or the other depending on the condition. If the condition is zero they all choose the right-hand side of their inputs and pass it on to the output. If the condition is one they take the left-hand side. So if the condition is one, x gets passed into a0 and everything gets shifted; if the condition is zero, x gets ignored and the old values just pass right through. This works; it is a valid circuit for a conditional push operation. The problem is we are using a linear number of gates to implement a single push. That's the problem we are trying to avoid. The way we solve it is quite simple. The idea is that we break this buffer up into small pieces and we leave empty spaces in those buffers, so that when we are doing a shift operation we don't shift everything. It's basically as simple as that. In the next diagram, consider just the top row; that's the one buffer from the previous slide. Before we do a push operation we start by making sure we have at least two empty spaces in level 0. We have five elements here, 10 elements in the next level, then 20 elements, then 40, so powers of two times five. We do two push operations. We know those succeed and touch only the first level, level 0, and nothing else.
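The naive circuit just described, in the same plain-C style as the earlier sketch, to make the O(n) cost per push visible; the four-slot size is only for illustration:

    #include <stdint.h>
    #define N 4

    static uint32_t mux(uint32_t cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)0 - cond;
        return (a & mask) | (b & ~mask);
    }

    /* Naive conditional push: one multiplexer per slot, so O(n) gates
       per push. If cond is 1, x enters at slot 0 and everything shifts
       down; if cond is 0, the old contents pass through unchanged. */
    void naive_cond_push(uint32_t stack[N], uint32_t cond, uint32_t x) {
        for (int i = N - 1; i > 0; i--)
            stack[i] = mux(cond, stack[i - 1], stack[i]);
        stack[0] = mux(cond, x, stack[0]);
    }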
>>: What do you mean by these levels? >> Samee Zahur: The top line, the top row, is one of these buffers. What we have done is take this buffer and divide it up into pieces: that is the level 0 buffer and that is the level 1 buffer. >>: [indiscernible] >> Samee Zahur: Yes. We maintain an invariant about how many empty spaces we keep at each level. We started with at least two empty spaces, so we know that after two operations level 0 might be full, depending on your conditions. So after every two operations we shift from level 0 to level 1. After every four operations we shift from level 1 to level 2, and so on. If you do that and count up the costs you will notice that -- let's do the counting right here. For each operation we are accessing level 0: say five units of cost. After every two operations we access level 1. Level 1 is twice as big, so each time you pay 10 units, but you are accessing it half the time, so 10 times a half is, again, five. Similarly, level 2 is four times as big but accessed a quarter of the time. So at each level you are paying an average of five units of cost per access, and you have a logarithmic number of levels because the level sizes are increasing geometrically, [indiscernible] of log n levels. What happens in the end is that for each access you are paying about five times log n cost on average, some constant times log n. That's how you do a stack push, and the reason each level-0 block has five slots is that for pushes you need at least two empty spaces, for pops you need at least two full slots so that you can serve the pops, and you need one extra slot in case some of the conditions were false and you end up with an odd number of elements. >>: So the idea is that when you are not actually pushing something at that first step you would put a space there? >> Samee Zahur: Yes. >>: So nine might be [indiscernible] >> Samee Zahur: Nine might be interesting. >>: And then when you smoosh everything over to the right [indiscernible] >> Samee Zahur: Actually, no. Wherever we do writes, nine stays empty. Next time we push, seven will just go here. >>: And that's because you read the whole thing? >> Samee Zahur: Yes. It is the linear scanning. >>: How do you deal with errors? Say you do a conditional push and then you do a pop, and at that point the pop may return an error or it might not? >> Samee Zahur: Yes. Good question. You can do anything you want. You can have it such that the error is a secret condition. First of all, in most cases we recommend that you write the program in such a way that that doesn't happen; you maintain your invariants so that it doesn't happen. But you can definitely have some extra circuitry. Doing a pop, it will check whether or not the stack is empty, and if so it will set a Boolean flag. Then again, whether or not you reveal that Boolean flag is up to you. It might be a flag that you only reveal at the very end: if something went wrong, I'm not going to reveal the results. That can be done, depending on your preference. >>: Where does that [indiscernible] end? Do you have the upper bound [indiscernible] >> Samee Zahur: Yes. If you know that you have a maximum of n elements in the stack at any time, then you can just have log n levels. >>: But you have to know that statically? >> Samee Zahur: You have to know that statically. You always have to know something statically. For instance, if you're doing n push operations you know the stack will never exceed n.
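A quick back-of-envelope model of that amortization, using the constants from the talk (five slots at level 0, levels doubling in size, level k spilling once every 2^(k+1) operations); the exact constants are illustrative:

    #include <stdio.h>

    int main(void) {
        long pushes = 1L << 20;       /* number of push operations       */
        int  levels = 20;             /* enough capacity for 2^20 pushes */
        double total = 5.0 * pushes;  /* every push touches level 0      */
        for (int k = 0; k < levels; k++) {
            long   spills    = pushes >> (k + 1);  /* spill frequency */
            double spillCost = 5.0 * (1L << k);    /* level k's size  */
            total += spills * spillCost;
        }
        /* prints ~55 slot-ops per push: a constant times log n */
        printf("amortized cost per push: %.1f slot-ops\n", total / pushes);
        return 0;
    }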
Statically you at least have to know how many operations you are doing, because -- I mean, you don't have to. So let's put it this way. You always have to reveal how long your program is running. That is something you are not going to be able to hide. You can either reveal how many operations you need or you can do extra operations; that's up to you. Anything else? So that's the stack. Similarly, I'm not going to go into details, but you can do queues in a very similar way, just with extra buffers in between, and the result is the same. For the evaluation you just compare with linear scan, because that's pretty much the only baseline there is, and you see the log n behavior you would expect. There are no surprises here, except for the fact that the constants are small. You don't have a hidden giant cost. It's doable, it's really efficient, and the best part is it's completely circuit based. It doesn't matter what protocol you are using. There are many different protocols with which you can instantiate it. There is [indiscernible]. There is [indiscernible]. You can have [indiscernible]. You can have malicious, whatever. The same algorithm works unchanged. It's very protocol agnostic, so it's good that way. The thing is that once you have stacks and queues, you can do memory accesses that have locality. If you have an array here and you are accessing this i and this j, and you know that they will only be incremented or decremented in small steps, then you can just break the array up into stacks and queues and use the stack and queue circuits to access them in log n time instead of using a general [indiscernible]. That would be much more efficient, as in the sketch below. Make sense? I see some frowning faces. Okay. Yes? >>: So there's like [indiscernible] stage junctures based on [indiscernible]? How is this more efficient? >> Samee Zahur: Yes, at least in my experience, yes. The reason is that whenever you use ORAMs, those are not pure circuits, in the sense that you have to reveal which path you are reading from. You introduce extra round-trip latencies, and you need extra steps if you want to go to malicious security and whatnot. Those come into play, whereas this is a pure circuit. Anything that is a circuit will just run; there is no round-trip latency, nothing like that. So yes. >>: Okay, what is the precise condition of locality that lets you model an array with stacks and queues? >> Samee Zahur: The condition is that whenever you access a particular index, the next thing that you access needs to be within some constant number of steps. If your constant is large you pay more. Okay? Great. The other thing we have, which is completely unrelated, is batched operations. If you do not have any locality, but you do many writes in one go, or many reads in one go, you can use [indiscernible] sorting-based approaches to get log squared n performance. But yeah, these are the pure circuit based structures. So the conclusion for the first half is that when your application does not need perfect random access, completely general random access, there are all of these specialized circuit structures that you can use. For stacks and queues we get something like a 10x speedup; for the batched operations something like 8x. And they are completely protocol agnostic; they are more versatile; you can use them within existing protocols. That was the first half. Before I go into the second half, questions? Good.
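One way to picture the locality idea mentioned above is a zipper: everything to the left of the current index on one stack, everything to the right on another, so moving the index by one position is a single conditional push plus pop. The talk does not spell out the exact construction, so this cleartext C sketch is only an illustration of the decomposition; with the O(log n) oblivious stacks above, each step would cost O(log n) instead of O(n):

    #include <stdio.h>
    #define CAP 16

    typedef struct { int data[CAP]; int top; } Stack;
    static void push(Stack *s, int v) { s->data[s->top++] = v; }
    static int  pop (Stack *s)        { return s->data[--s->top]; }

    int main(void) {
        int arr[5] = {10, 20, 30, 40, 50};
        Stack left = {{0}, 0}, right = {{0}, 0};
        for (int i = 4; i >= 1; i--) push(&right, arr[i]);
        int cursor = arr[0];                    /* index 0 under cursor */
        for (int step = 0; step < 2; step++) {  /* move right twice     */
            push(&left, cursor);                /* one push...          */
            cursor = pop(&right);               /* ...and one pop each  */
        }
        printf("element under cursor: %d\n", cursor);  /* prints 30 */
        return 0;
    }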
That was for specialized access patterns. Sometimes we can do that, and sometimes you have completely general random access. In that case you kind of have to fall back on oblivious RAM. There has been a lot of work on this. Most of the implementations today that use oblivious RAM use tree-based oblivious RAM, first introduced by Elaine Shi and others. If you look at the literature there are tons and tons of papers; people have been working on it for a long time. This is what other people have done: implementing hybrid protocols between Yao and ORAM just to see how they integrate together and what the performance is, which is great. All of them have been tree-based implementations. But let's look at some performance numbers. Without ORAM, what's the performance number? Writing a single 32-bit integer needs 32 logic gates if you know the location. Raw Yao performance is about a million gates per second; that is actually a low number, you can get at least three or four million on a gigabit-per-second link, but let's go with the order of magnitude. A million gates per second, great. Do the division and you get around 31,000 writes per second if you know the location, that is, if the location does not depend on private data. If you have to hide access patterns, let's say with 2 to the 16 elements, 65,000 elements -- wait, did I do the math wrong? I did the math wrong. You will be doing around half an access per second; around two seconds per access. That's the order of magnitude if you are using a complete linear scan, no ORAM whatsoever. Let's compare this, keeping in mind the error in this slide, to previous work on ORAM performance. This is from CCS of last year: Circuit ORAM, which minimizes the circuit size of each ORAM access; it's kind of the best you can do for MPC. As you can see, with 2 to the 16 elements, per-access time is around one second. At that point it is almost in the same order of magnitude as a linear scan; that's where the breakeven point is. If you have fewer elements than that, there is no point even using oblivious RAM, because a plain linear scan of every single element will be faster. And this is not even taking into account the fact that you have to initialize the oblivious RAM first: if you have an oblivious RAM structure you at least have to touch each element once just to initialize it. The response to that is: fine, ORAMs are asymptotically better, so if you go big enough, ORAM should still win out. Yes, ORAM will win out if you go to, say, 2 to the 18 or 2 to the 20. Two to the 20 is about a million elements, which is okay. It will definitely win out, but think of what that means. At 2 to the 20, a million elements, you spend around two seconds per access. With a million elements, just to initialize, just to write each element once, you need two times a million seconds. That's 2 million seconds, which is over three weeks just to initialize. So if we want to provide this as a tool and say you can use it for arbitrary computation, but if you need random access, by the way, you have to wait weeks just to initialize the data, it's hard to sell MPC to people. We have this strange situation where ORAM only provides advantages for applications so slow that even MPC wouldn't be used for them, while for smaller cases ORAMs are still not usable.
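The back-of-envelope arithmetic behind those numbers, using the round figures from the talk (a million gates per second, 32 gates per 32-bit write):

    #include <stdio.h>

    int main(void) {
        double gatesPerSec   = 1e6;  /* raw Yao throughput, order of magnitude */
        double gatesPerWrite = 32;   /* one 32-bit write at a public location  */
        printf("public-location writes/sec: %.0f\n",   /* 31250 */
               gatesPerSec / gatesPerWrite);

        double n = 1 << 16;                     /* 65,536 elements        */
        double scanGates = n * gatesPerWrite;   /* linear scan hits all   */
        printf("linear-scan time per access: %.1f s\n",  /* ~2.1 s */
               scanGates / gatesPerSec);

        double initSecs = 2.0 * (1 << 20);      /* ~2 s/access, 2^20 elems */
        printf("2^20-element initialization: %.0f s (%.0f days)\n",
               initSecs, initSecs / 86400.0);   /* ~24 days */
        return 0;
    }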
What happens is that in many cases people just wouldn't use oblivious RAM at all; they would just use plain linear scan, which created this weird stigma against oblivious RAM: hey, this is too slow and nobody wants to use it. We don't want that. Our goals here are to design an ORAM which provides benefits at much smaller sizes and which can be initialized quickly, without that long initialization. Those are the two goals. I will start by describing how this ORAM works for just four elements. Any questions so far? Say we have just four elements. This is Waksman shuffling. These lines are the data wires and we just want to shuffle them; this is a Waksman network, where each switch either swaps its two inputs or leaves them unchanged, controlled by a secret bit. With secret control bits, zeros and ones, I can permute these inputs into any given arrangement. To permute four elements you need five switches, so the cost of shuffling four elements is about five units -- CPU cost, whatever -- five units. Here's how we construct an ORAM on just four blocks. You come in with four pieces of data and you shuffle them. Once they're shuffled, there is some map that says element one went to location three, element two went to location two and whatnot; a small map, a couple of bits per element. Once it's shuffled, if you're accessing a certain element and you reveal that you are accessing this position, that's fine, because it has been completely shuffled; nobody knows which original element actually went there. So we can actually reveal that I am accessing that element, and you pay a cost of B, the block size, and that's it. The next time you access an element you kind of have to access two elements, the same element as before and some other element, because you don't want to reveal whether or not it's a repeated access. Now you are paying a cost of 2B. The next time you pay 3B, because you access the same elements as the previous two plus a new one. You can see where this is going. What we do is not go up to 4B; we just shuffle again and keep going. We do three accesses and then shuffle, three accesses and then shuffle. Every three accesses we pay the shuffle cost of 5B plus B plus 2B plus 3B, 11B total, which is kind of interesting. Compare it with linear scan: linear scan pays a cost of 4B per access, 12B per three accesses, whereas we are paying 11B per three. So at just four blocks we are already doing better than linear scan, which is much better than the previous schemes we saw. And there is no extra initialization other than this shuffle. >>: If you did the fourth one proportionally you would also win, right? >> Samee Zahur: You would also win, but then your cost would be 15B by 4, so this is better.
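The amortization works out as follows; this small calculation just reproduces the talk's numbers, with the shuffle costing five block-units and the i-th access after a shuffle costing iB:

    #include <stdio.h>

    int main(void) {
        double B = 1.0;                  /* block size, arbitrary units   */
        for (int T = 1; T <= 4; T++) {   /* T = accesses between shuffles */
            double cost = 5 * B;         /* Waksman shuffle: 5 switches   */
            for (int i = 1; i <= T; i++)
                cost += i * B;           /* i-th access scans i positions */
            printf("shuffle every %d accesses: %.2fB each (linear: 4B)\n",
                   T, cost / T);         /* T=3 gives 3.67B, the minimum  */
        }
        return 0;
    }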
>>: Also, how do you account for the cost of finding which one is accessed? >> Samee Zahur: You just have a small bit vector, a two-bit value for each position; it's a constant independent of the block size. >>: So it's under the rug [indiscernible] >> Samee Zahur: It's kind of swept under the rug, yes. The idea is that if you have a large enough block it doesn't matter. In practice, if you account for it, we have the graph. But [indiscernible] bandwidth. A block size of 36 bytes is good; if you have larger elements you can always just divide them up into blocks, yes. >>: If you have a [indiscernible] application where the data is just bits or something like that and you just add the access bit here and over here, does that [indiscernible] >> Samee Zahur: That may be too expensive simply because of the metadata, like you said. What I recommend in that case is to divide it up into blocks of several bits and then do a linear scan within each one. >>: So that would be better even if you are accessing one bit at a time? >> Samee Zahur: Yes. The two alternatives you are proposing are: one, scan all the bits, and the other, divide into four blocks and linear scan each of them, right? So yes, that would still win out. So that's basically the scheme. What we did is generalize this from four blocks to n blocks. We generalized it and, unfortunately, the asymptotic complexity is worse than other ORAM schemes. Other existing ORAM schemes give you B times log n, or log squared n, I think log squared n and log cubed n complexities. Ours is much worse: square root of n times something. But in terms of concrete costs it still wins out. I should have the graph. There we go. For the comparison, we did measurements at 2 to the 11 and 2 to the 16. We did our own implementation of Circuit ORAM -- this is actually by the same author as Circuit ORAM, but anyway, our [indiscernible] implementation, because the previous one was Java; this one is faster by a factor of two. What we see is linear scan, Circuit ORAM and our scheme. Eventually Circuit ORAM does win out, at 2 to the 16, but ours is still better for the smaller cases, where you don't have to pay a large initialization cost. And talking about initialization: this graph is just accesses, and in our case that is all the cost there is, whereas with Circuit ORAM you still have to do multiple write operations to actually initialize the data. You don't necessarily want that. In fact, if you look at initialization, we counted our own initialization cost as the cost of the shuffle, because that's all there is, and there is a fixed hundred-x gap between Circuit ORAM initialization and ours. Yes? >>: It looks like the circuit ORAM is always better than the [indiscernible]? >> Samee Zahur: It kind of wins out; in our implementation the breakeven is somewhere here. Yes. >>: This is better than previous? >> Samee Zahur: Yes. >>: [indiscernible] >> Samee Zahur: Yes. The language difference, plus the language difference also means we can do various low-level things. ORAM always introduces round trips, right? Since there are round trips, we can do things at the TCP level like disabling Nagle's algorithm. Nagle's algorithm aggregates packets: say you are sending two small packets of data; TCP at the kernel level will hold the first and merge the two to reduce overhead. The problem is that once you send one packet it will wait for the next one, and it will wait on the order of milliseconds. We can execute many gates in one millisecond, so that doesn't pay off; it actually helps to disable that part. >>: [indiscernible]. >> Samee Zahur: You did too? Great. We spent more than a week. [laughter]. I feel like there should be a wiki of these tricks so we don't have to reinvent them each time. That's the bandwidth cost.
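For reference, disabling Nagle is a standard POSIX socket option; the wrapper function name here is just illustrative:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Turn off Nagle's algorithm so small round-trip messages go out
       immediately instead of being held back for coalescing; a
       millisecond of kernel buffering is an eternity at a million
       gates per second. Returns 0 on success, -1 on error. */
    int disable_nagle(int sockfd) {
        int one = 1;
        return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY,
                          &one, sizeof(one));
    }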
If you want, we can talk a bit more about some of the things we had to do to actually make this happen. We already showed this; I am going a little bit out of order. If you look at previous work, all MPC implementations were tree-based ORAMs, but outside of MPC there are other kinds of ORAMs. Hierarchical ORAMs were already there, and the main difference between them is as follows. In hierarchical ORAMs the initialization is pretty much as cheap as ours: it's just a shuffle. However, each access requires a hash function computed by the client, and in an MPC setting that would mean a hash function computed inside a circuit. That's a problem. This is why all the MPC implementations of ORAM used tree-based ORAMs; they avoided that, and what they paid for it was a higher initialization cost. What we did in our case, if you think about our approach, is sort of merge the two. We don't use the tree-based structure; ours is kind of a hierarchical structure, except that it is limited to two levels, but at the same time we don't use a hash function. We borrow from the tree-based ORAMs a kind of nested relocation table: there is a nested ORAM, a recursive ORAM, that gives us the map of which element goes where. That's why we get the performance improvement. Yes? >>: The one you showed with the shuffle, do you see that as hierarchical? >> Samee Zahur: Yes, it kind of is, actually. It's hard to see here, but this is sort of the first level of the hierarchy, and once you keep using elements, these are the elements that end up in the stash. They get moved from the first level to a stash, which is scanned on every access. In some sense this is a two-level hierarchy; I just drew it in a different way. >>: I like the original. >> Samee Zahur: It's square-root ORAM, yes. The title of the paper was Revisiting Square-Root ORAM, so yes, absolutely. There were a few other challenges we had to solve, such as creating the position map itself. If you think about it, you have a bunch of elements, you do the shuffle, and then we also have to create the position map. The position map is essentially an inverse permutation: if you are looking for an element, the map has to say element zero is in position two, so we will have element zero mapping to position two. The shuffle operation needs to produce this as well. For the first shuffle it's actually fairly easy. You have some shuffle circuit that does lots of swap operations, so we tag the slots with metadata zero, one, two, three in sequence and run the tags through the same swaps in reverse. So this is zero, one, two -- there we know that zero maps to position two, and that's how we compute this column here. That's fairly easy. The problem is that the next time around you are doing a shuffle that gets composed with the previous permutation. Now if that element goes here, that does not mean element two maps here; reversing the swaps doesn't get you there anymore. One way to solve this would be oblivious sorting: we could tag the elements with zero, one, two, three again and then sort by those values, which gives us the inverse permutation. But sorting is n log squared n; that adds another log n factor to the complexity, and we didn't want that.
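The first-shuffle trick in cleartext C: model the network as a list of (i, j, control-bit) swaps, push the data through forwards, and push identity tags through the same swaps in reverse order; the resulting tag array is exactly the inverse permutation. The wiring below is an arbitrary five-switch example, not actual Waksman control bits:

    #include <stdio.h>
    #define N 4

    typedef struct { int i, j, bit; } Swap;

    static void apply(int *a, Swap s) {
        if (s.bit) { int t = a[s.i]; a[s.i] = a[s.j]; a[s.j] = t; }
    }

    int main(void) {
        /* five switches, as in a 4-input Waksman network */
        Swap net[5] = {{0,1,1}, {2,3,0}, {0,2,1}, {1,3,1}, {1,2,0}};
        int data[N] = {100, 101, 102, 103};  /* blocks, by original id */
        int tags[N] = {0, 1, 2, 3};          /* identity position map  */

        for (int k = 0; k < 5; k++)  apply(data, net[k]);  /* forward  */
        for (int k = 4; k >= 0; k--) apply(tags, net[k]);  /* reversed */

        /* tags[e] now tells us which physical slot holds element e */
        for (int e = 0; e < N; e++)
            printf("element %d is at position %d (slot holds %d)\n",
                   e, tags[e], data[tags[e]]);
        return 0;
    }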
So what we did instead was come up with a new protocol that just inverts the permutation. We have a secret permutation that we do not want to reveal, and we want to compute its inverse. How do we do that? We use some secret sharing and whatnot. It's not too novel, but the idea is: this is the permutation pi that we are going to keep secret, and its inverse is the output we want. The way we do it is we have two parties, Alice and Bob. Alice locally generates a random permutation pi_A, feeds it in, and uses it to permute this. The result is the composition of pi_A and pi. This gets revealed to Bob. Bob sees the original permutation shuffled in some manner that he doesn't know, so it is safe to reveal. Once we have that, Bob can locally compute the inverse of it. And finally, we have another permutation circuit that applies pi_A again, and since pi_A composed with its inverse cancels, we get just pi inverse. So with two permutation networks, which is n log n instead of the n log squared n of sorting, we can compute the inverse permutation. That's how we did the ORAM.
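Here is the whole dance simulated in the clear on a four-element example; in the real protocol the compositions happen inside permutation networks with secret control bits, and only the masked permutation is ever revealed to Bob:

    #include <stdio.h>
    #define N 4

    /* out = f o g, i.e. apply g first, then f */
    static void compose(const int *f, const int *g, int *out) {
        for (int i = 0; i < N; i++) out[i] = f[g[i]];
    }
    static void invert(const int *f, int *out) {
        for (int i = 0; i < N; i++) out[f[i]] = i;
    }

    int main(void) {
        int pi[N]  = {2, 0, 3, 1};  /* the secret permutation    */
        int piA[N] = {3, 0, 1, 2};  /* Alice's local random mask */
        int masked[N], bobInv[N], result[N], expected[N];

        compose(piA, pi, masked);     /* revealed to Bob: looks random */
        invert(masked, bobInv);       /* Bob, locally: pi^-1 o piA^-1  */
        compose(bobInv, piA, result); /* second network strips piA     */

        invert(pi, expected);
        for (int i = 0; i < N; i++)   /* result matches pi^-1 exactly  */
            printf("pi^-1[%d] = %d (expected %d)\n",
                   i, result[i], expected[i]);
        return 0;
    }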
The conclusion here is that we revisited a well-known scheme; it's an old scheme that I'm telling you about. We showed that it can actually be implemented with really low initialization cost and a breakeven point as low as four elements. The hope is that this can now be widely adopted and people can use ORAMs without worrying too much. Yes? >>: With the malicious setting, does the circuit have to check that it's the actual [indiscernible] >> Samee Zahur: It has to check that they are actually inverses of each other and whatnot, yeah. >>: [indiscernible] >> Samee Zahur: It shouldn't require sorting, no. I mean, the asymptotic complexity will remain unchanged, but you will have overheads. >>: So by the breakeven point you mean with the linear? >> Samee Zahur: Yes, with the linear scan. At small sizes that's pretty much the only thing you need to worry about; there are no other ORAMs there. That's it. I guess: download, use, and tell me how it is. If there are complaints, yell. [applause]. >> Melissa Chase: Are there any other questions? >>: Is your language similar to the ObliVM stuff? >> Samee Zahur: Yes, they were developed around the same time. Ours is more like [indiscernible] in the sense that it really is just preprocessing that gets translated into C, so you can pretty much see what code is getting generated; it's not too different from what you write. The pros and cons: ObliVM is a completely fresh start, a clean slate. They have their own language and they can design it however they want. In our case we have a lot of C baggage to deal with, but the good side is that all of the C libraries are pretty much available to you. You have dynamically sized arrays; there's malloc. You have simple things like networking. You have threads. Those are things I don't need to invent; you don't have to wait for me to implement them in the language. They are already there. With ObliVM that's not the case. >>: Are there things that you can use in [indiscernible] application? [indiscernible] >> Samee Zahur: I'm sorry. I didn't get the question. >>: You were talking about having all the power of C available. That's in the sort of… >> Samee Zahur: Both. No, both. Even if you want to do private computation but you want to split it up into two threads, that's okay. We would need like five-line wrappers around pthreads, but anybody can write them. You don't have to change the compiler or anything. There is almost nothing you need to do. Like the synchronization primitives, mutexes, [indiscernible], there is a huge library of all these concurrency primitives that you don't need to reinvent. You should be able to just use whatever is already there with some wrappers, and that's something you can write; they don't require compiler modification. And there are existing tools, like profiling tools, to see where your program is slow; those profiling tools work here. You have debugging tools; Valgrind will work on it, and Valgrind is extremely useful here. Things like that, and you don't have to [indiscernible]. >> Melissa Chase: Are there any more questions? Let's thank the speaker. [applause].