>> Christian Bird: All right. Welcome, everybody. Thank you for coming out to listen to Yuriy's talk. I'm Christian Bird, and I have the opportunity of hosting Yuriy Brun today. Yuriy is currently an assistant professor at the University of Massachusetts Amherst. He got his Ph.D. at USC and then had a post-doc at the University of Washington, where he worked with David Notkin and Michael Ernst. We've had some collaboration with him in the past. He's here today to talk to us about privacy and reliability in an untrusted cloud. So I'll hand it over to you.

>> Yuriy Brun: Thank you, Chris. Thanks, everyone, for coming. So I'm going to talk to you about privacy and reliability in the cloud. Those of you who know my work will find that this is quite different from that work. Most of you know me for doing stuff with developers, helping developers do particular kinds of actions. What I'm going to be talking about today is how we eliminate whole classes of actions from the set of things that developers have to do. So I'm going to be coming up with a technique that tries to inject privacy into systems, and tries to inject reliability into systems, without the developer worrying too much about how to do that. So let's jump right in. I'll talk about privacy first.

Let me go ahead and talk about what I mean by privacy in the cloud. So the cloud is a well-known term today. There are lots and lots of things out there that call themselves the cloud. And the problem that I foresee with the cloud is the following. When I used to do my taxes, I would get my computer, I would download a program onto my computer, and I would enter stuff into this program. If I wanted to make sure that nobody stole my private data, what I had to do was make sure that nobody broke into my computer and stole the data, and that nobody physically stole my computer. Now, today that's not the case.
If I want to do my taxes, I rarely download a program to my computer. What I really do is go to something like turbotax.com, and I enter my data into this program, and it's not stored on my computer; it's stored somewhere else, out on the cloud. Now I have to worry about two things. First, I have to worry that nobody breaks into Intuit's cloud, they're the ones that make TurboTax, and steals the data. But second, I don't actually know where these computers are. I don't know if Intuit is outsourcing this to some other cloud and these computers actually live on Azure or somewhere else. So what I'd really like to do is make sure that the data I'm entering isn't known even by those computers themselves. I would like to distribute a computation onto a cloud, or onto some large network, without having those individual computers know the private data that I'm entering into the computation. I want someone to do the work for me without them knowing what they're doing. That's the problem I'm going to try to tackle today.

There are lots of different ways I could talk to you about what privacy means. I'm going to focus on a particular thing. Today I'm going to tell you about a technique called sTile. It's a technique for privately solving computational problems. I'm not going to look at taxes. I'm not going to look at Gmail. I'm going to start out with NP complete problems. In particular, I'll talk about 3-Sat, and about how we solve an instance of a 3-Sat problem on the cloud without the computers on the cloud knowing the input, the particular formula that I want to solve. And this is an even harder problem than taxes, because there's a very small input. Not just -- I see some nods. It's not just harder because it's NP hard; it's harder from the point of view of privacy as well. There's a very small input and a lot of computation has to happen on that input.
I'll try to keep that input private. With taxes, by contrast, you add a number and you won't use that number again, so it might be easier to keep it private. The hope is that if you can do it for NP complete problems, you can expand it to a much wider range of problems.

How do we do this? Turns out this is an NP hard problem. In fact, there are lots of people working on the problem. For those of you in the audience familiar with homomorphic encryption, it tries to solve the exact same problem I'm trying to solve. The idea is you have lots of computers, and you want to distribute a computation onto those computers without them knowing what they're doing, and have them give you back an answer you can use, but they don't know what the answer is. So I'm going to circumvent the problems that people have identified. In particular, Childe, in 2005, has proven that for certain kinds of problems it's actually not possible to get help from somebody else without telling them what the problem is, what the input is. He showed that for NP complete problems.

So how am I going to get around it? This is what I'm going to do. I'm going to take a problem and distribute it to lots and lots of computers. Now, it's going to be the case that if you look at all the computers, if you were to compromise all the computers, all together they're going to know my data. But if I look at any one computer, or at any several computers, and I'll show you later that this holds as long as I don't control up to half of the network, it's very hard to reconstruct the whole input. So that's how I'm circumventing these hardness proofs from before: it is the case that the whole cloud is going to know my data, but if you look at reasonably large chunks of the cloud, they won't. So it provides these guarantees. All right.
So in particular, I'll come back to homomorphic encryption later in the talk, but the distinction is that the privacy guarantees here are weaker than with homomorphic encryption, because the entire entity would know the computation. But it's a systems approach. It's a much more efficient approach than homomorphic encryption is today. I think highly of homomorphic encryption and I think a decade from now we'll all be using it, hopefully, but today this is a much more efficient approach.

How do we get there? What I'm going to do is take a computation and divide it into its very basic, very elemental pieces; we'll talk about how you do that. We're going to take these pieces, the subcomputations, and distribute them in a smart way on the network so they all sort of self-assemble, self-compute, and everything comes together and out pops the answer for you. That's the high-level approach.

So in order to do that, I'm going to need to describe to you a theoretical model of how you take a problem and separate it into little pieces. Let's talk about a particular theoretical model. Here is an example of a model. You need an input and a computation, and I'll talk about each one of these pieces in the next few slides. You need an input that you can encode in some way. I'm going to encode things using these tiles, little squares with labels on them; you can think of them as cellular automata. And you need a program that in some way acts on the input. Here's a program that happens to do addition; I'll talk about that in a second. Then, in this view of computation I'm talking about, you take the tiles that encode your input, and you take a program. You mix them together. There are lots and lots of copies of this program. And these little pieces are going to self-assemble and somehow build up a large, what I call a crystal, an assembly of these tiles, that encodes the answer. That's what it means to compute. Let's do this example in detail, so you understand what I'm talking about.
First of all, I want to talk about the program. Here's an example of a program, the program for addition. Here's the hard part. This is the part you have to write, using tiles, to perform the particular calculation you're interested in. This program is for addition. We don't have to understand too much of how this works, but you can sort of see that each one of these tiles encodes information, encodes bits. There are 0s, there are 1s. So I'm going to give you a little bit of a hint as to how this works. Each one of these tiles is a one-bit full adder. So this tile right here is adding 0 and 0 and 0, and it's coming up with the answer 0 and a carry bit of 0. Not very interesting. If we take a look at one of these guys, it's adding 1 and 0 and 1, and in binary 1 plus 1 plus 0 is 0 with a carry bit of 1. We'll see how this comes together in a moment.

Let's take a look at what we're going to do with these tiles. Let's take a look at an input. Suppose I want to add 10 and 11. 10 in binary is 1010 and 11 in binary is 1011, so you build something like this. This is an input to the computation. Here is 1010, 10, encoded on top, and here is 1011 encoded on the bottom. What's going to happen is we'll look through this program, and these tiles are going to attach under certain rules. Here the rule is that whenever a tile matches on three sides with this crystal, with this growing assembly, it will attach. There's one place here where there are three labels available for attachment: the 001. So the 001 tile is going to attach there. Right. We've thrown it in here. Now what you see is that it's added this bit and this bit and this incoming 0 carry bit. So it said 0 plus 1 is 1. It added the two least significant bits, and the carry bit from that is 0. The next thing you'll do is add this 101; the 101 tile will pop in, and you can fill it in all the way. If you read across the middle you'll get 10101, which is 21 in binary. This is a very simple example.
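As a rough illustration of the addition walkthrough above, here is a minimal Python sketch. It is not the speaker's tile system: the `TILES` table, the label layout, and the `assemble_sum` function are hypothetical names chosen for this example, and the two-dimensional self-assembly is collapsed into a right-to-left loop. It only shows that eight tile types, each acting as a one-bit full adder, are enough to add 1010 and 1011.

```python
from itertools import product

# Hypothetical encoding (not from the talk's slides): one tile type per
# (top bit, bottom bit, incoming carry) combination, eight in total.
# Each tile behaves as a one-bit full adder.
TILES = {
    (a, b, c): {"sum": (a + b + c) % 2, "carry_out": (a + b + c) // 2}
    for a, b, c in product((0, 1), repeat=3)
}


def assemble_sum(top: str, bottom: str) -> str:
    """Grow the assembly from the least significant bit, one tile per column.

    At each column exactly one tile type matches the three exposed labels
    (top bit, bottom bit, carry from the previous column), so it attaches.
    """
    width = max(len(top), len(bottom))
    top, bottom = top.zfill(width), bottom.zfill(width)
    carry = 0  # the seed exposes an incoming carry of 0
    sum_bits = []
    for a, b in zip(reversed(top), reversed(bottom)):
        tile = TILES[(int(a), int(b), carry)]
        sum_bits.append(str(tile["sum"]))
        carry = tile["carry_out"]
    sum_bits.append(str(carry))  # the final carry becomes the top bit
    return "".join(reversed(sum_bits))


print(assemble_sum("1010", "1011"))  # the bits read across the middle row
```

Each column depends only on its two input bits and the carry exposed by its neighbor, which is the property that lets individual tiles be handed to separate computers.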
I'm just doing this full adder, but I'm trying to illustrate how you can take a computation, break it up, and out come little chunks. Now, everything I've described so far is in a crazy model. It's in a model, not in a software system. So how do I envision doing this with a software system? Well, I envision that each one of these tiles is going to be deployed on some computer. There are going to be computers out there, and we'll talk about how they're out there, and they're going to deploy these tiles. So you can envision that every single one of these things is deployed on a particular computer.

Now, if we come along and compromise some computers, this guy and this guy and this guy, we get pieces of the input. We get pieces of the output. We get pieces of the intermediate computation. An entity that controls the three computers can get these pieces, but all they see is a few bits. And I'll talk a little more about this later, but they don't actually even see how this one and this one relate to each other. All they know is that those two are not directly connected, and that these guys are connected. They can connect pieces that are directly connected, but as soon as there's a disconnect you can't piece it back together. In order to gain the whole input, and keep in mind you have multiple copies of this addition going on at once, you have to piece together all the pieces. That's where the hardness is going to come from. So this is a very high-level intuition.

Now, so far I've been talking about addition. The same thing you do with addition you can do with more complex problems. Addition was just for explanation purposes. With addition, you take tiles, you take a way to encode the input, and you can build something that will automatically find the answer for you. You can do the same thing with satisfiability. So here's a system that solves 3-Sat. Now, this is way too small to read, so you don't need to read that.
And I'm not actually going to focus today on how the 3-Sat tile system works. I want to focus today on how the software system that distributes it works. But I encourage you to take a look at the Natural Computing paper if you want to see how this works. The idea is you encode the 3-Sat formula using these tiles. And then there's a program. It's very similar to the addition program that I've written. Instead of the eight different tile types that addition had, it has 64 different tile types. And so then these tile types can come in, and they attach, and they grow, and they find the right answer to the problem.

Now, this is an NP complete problem. Something that's important to understand: I'm not trying to solve NP complete problems in polynomial time. All I'm doing is trying to put NP complete problems out on the cloud to solve them faster; I'm not looking for polynomial algorithms. This algorithm is nondeterministic. It's going to guess an assignment. It's not the dumbest algorithm; it's not going to guess two to the N assignments. It does 1.7 something, something, something to the N assignments, because it makes smart choices along the way and prunes things along the way. Tiling is universal; you can implement any algorithm you want using tiles. But that's really the hard part of the approach.

What I'm trying to talk to you about today is this: let's say we go through the trouble of implementing our code in the tile assembly model, and you can actually compile it, you don't have to write the tiles by hand. If we do that, what can we get out of it? And the answer is privacy. But it comes at the cost of efficiency, and at the cost of a couple of other things; that's what we're trying to compare. So the same thing that happens with addition you can do with an NP complete computation: you encode the input.
You have to make lots of copies of the input now, because it's nondeterministic, and each one of these assemblies kind of grows; I'll show you the process of that happening. Eventually there's this black tile in the corner. If the black tile attaches, that indicates that the assembly has found the answer. It's found a particular assignment that satisfies this formula. The assembly can also get stuck somewhere along the way; if it gets stuck, the black tile will never attach to that assembly. That's going to happen to most assemblies. There are only a few special ones that find the answer.

Okay. So how does this process actually work? When we're dealing with computers, the first thing you do is go up to a network and make this network deploy your sTile computation. You have to tell the computers the program they're going to run. So you have to go out to a computer and say: you are going to be this type of tile. You're going to deploy a particular type of tile, which is defined by the side labels that it has. You go on to the next one, and the next one. As I said before, there are 64 different tile types for 3-Sat. So you can take this system, and now, using some gossip protocols, the computers can spread this information out and assign every computer on the network that's going to participate in the computation to a particular tile type.

And then you build a single seed. You build one seed out of computers on the network that are deploying each one of these individual tiles, and they know they're connected. That's all you have to do as a client, and now the system is going to take over on its own. The first thing the system is going to do is go out and create copies of this seed. So it's going to go out, and it's going to find other computers that deploy the same tiles as itself.
So each one of these tiles is going to go out and find another computer that deploys the same tile, and ask it to replicate it, to create a copy of itself. So the seed is going to create copies out there, and then they communicate with one another using these neighbors in order to create an exact connected copy of this seed. This isn't too tricky. You'd probably believe me if I said that if I have an assembly of computers that know about their neighbors, they can go out and create a copy: another set of computers that are going to look just like them. Okay.

Once you have each one of these assemblies built, they're going to start growing. Each one, without waiting for the others to finish, is going to start growing. Under particular conditions, in this corner here, these two tiles are going to say: we need a neighbor. We don't have a neighbor. We have a couple of exposed sides here. We need a neighbor. So they're going to start querying nodes on the network to see whether or not they can attach. When they do this querying, they're going to use secure multi-party computation protocols, Yao's protocol. The idea is that when this node, when this tile, tries to attach, it's going to learn only one thing: either it attaches there or it doesn't attach there. Either it matches on the sides or it doesn't. It's not going to learn any of the bits of data stored within each one of the tiles. So you only learn the interfaces. And that's how you prevent information from spreading.

Okay. So maybe this tile doesn't fit. I'll ask somebody else. Somebody else does fit. There's an intricate algorithm in there to make sure you don't end up asking too many nodes, so you can quickly figure out if anyone fits or not. Once a node fits, the assembly is going to start growing. So there are multiple levels of parallelism in the system. First, there are lots and lots of copies of these assemblies growing in parallel.
These can be completely independent. And second of all, at each one of these little growth places, each time you have a step, you can have a tile trying to attach there as well. Lots of these things can happen in parallel. And then once this crystal grows, once you get to the end, some lucky crystal will get one of these black checkmark tiles, and that tile can contact the client and say, hey, I found the answer. If you think about it, there are lots of these inputs growing in parallel. Most of them are not finding the right answer, and the hard part about this problem is fishing out the one that is finding the right answer. This assembly here is encoding precisely the answer to your problem: what the 3-Sat assignment is that satisfies the formula. Once you get the right one, with the right authentication, the client can go and find out what the assignment was. With some NP complete problems, all you really care about is whether something is satisfiable or not. It's sort of a binary decision, so you don't even have to disassemble this crystal; you just get that bit. Yes?

>>: Does the algorithm backtrack? If you say this tile goes here and later you realize there's no answer, can you factor --

>> Yuriy Brun: That's a good question. The question is, does the algorithm backtrack. In this particular implementation it doesn't, because what's happening here is that you have lots and lots of copies of the seeds and they're exploring all the possible paths. Now, you could implement something smarter where it actually does backtrack: it grows, gets stuck, and things fall off. There are some implementations of systems like this, not in computers but actually in DNA, where it does precisely that. For this purpose I didn't bother. I just created lots and lots of copies, all exploring different paths, and the goal here isn't efficiency.
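The no-backtracking strategy described in this answer can be sketched in a few lines of Python. This is a simplified stand-in, not the 3-Sat tile system: the formula, `grow_one_copy`, and `solve` are made up for illustration, each copy guesses a whole assignment uniformly at random, and none of the pruning that gives the roughly 1.7-to-the-N behavior mentioned earlier is modeled.

```python
import random

# A made-up 3-SAT instance: literal k means variable |k|, negated when k < 0.
FORMULA = [(1, -2, 3), (-1, 2, 3), (1, 2, -3), (-1, -2, -3)]
NUM_VARS = 3


def grow_one_copy(formula, num_vars, rng):
    """One seed copy: guess an assignment and never backtrack.

    Returns the assignment if every clause is satisfied (the point at which
    the black tile would attach), or None if the copy gets stuck.
    """
    assignment = {v: rng.choice((False, True)) for v in range(1, num_vars + 1)}
    for clause in formula:
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return None  # stuck: this copy is simply abandoned
    return assignment


def solve(formula, num_vars, copies, seed=0):
    """Run many independent copies; report the first one that succeeds."""
    rng = random.Random(seed)
    for _ in range(copies):
        result = grow_one_copy(formula, num_vars, rng)
        if result is not None:
            return result
    return None  # no copy succeeded (unsatisfiable, or just unlucky)


print(solve(FORMULA, NUM_VARS, copies=200))
```

In the real system the copies run in parallel on different machines; the sequential loop here only stands in for that.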
We'll see in later slides that the algorithm isn't efficient. It's efficient enough today for some purposes, but the key is that you can get really good privacy, and it's a proof of concept in that sense. This is how you report the answer to the client.

So what I've tried to do so far is give you some intuition about how the system works. I really want to talk about a copy of the system I actually built, and about some empirical evidence, an empirical evaluation of that system. Before we get to the empirical part, I first want to talk about where the privacy comes from. Why is it hard to crack the data in the system?

All right. So let's talk about formal proofs of privacy. Before we get to the formal proofs, here's a pretty graph. For three different sizes of problems, for 20 bit, 38 bit and 56 bit inputs, what I'm showing here is, as the fraction of the network you can compromise increases, and here you've compromised half the network, what's the probability that you can crack the data. There are a few things to notice here. One is that the bigger your problem is, the harder it is to reconstruct all the data. And that's a good thing. You want that, because for small toy problems maybe it's easier to reconstruct all the data, but when you get into the space of larger problems, it becomes harder to reconstruct the data. Another thing to note is that as you come over here towards one-half, if you compromise half of the network, asymptotically you get a 1 minus 1 over e, or 63 percent, chance of being able to get the data out of the system. So that's not very good. I wouldn't be happy if my taxes had a 63 percent chance of being hacked. But as soon as you push over here to .2 or .1, so 10 percent of the network is compromised, the probability of hacking the data is very low. If you have a 56 bit input and 10 percent compromised, you're getting roughly a 10 to the negative 40th probability. So, very low. That's the exponential drop-off you want.
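The talk does not state the formula behind these curves. As a hedged illustration only, the Python sketch below uses a simplified model that is an assumption of this example, not the speaker's proof: each of the n tiles holding one copy of the input is compromised independently with probability c, there are 2^n copies, and the adversary wins if any single copy is captured in full. The function name `crack_probability` is hypothetical.

```python
import math


def crack_probability(fraction_compromised: float, input_bits: int) -> float:
    """Chance that at least one full copy of the input is captured.

    Simplified model (an assumption, not the talk's formula):
      - one copy is captured with probability c ** n,
      - there are 2 ** n independent copies,
      - result = 1 - (1 - c ** n) ** (2 ** n).
    log1p/expm1 keep the result accurate when c ** n is tiny.
    """
    per_copy = fraction_compromised ** input_bits
    if per_copy >= 1.0:
        return 1.0
    copies = 2.0 ** input_bits
    return -math.expm1(copies * math.log1p(-per_copy))


for bits in (20, 38, 56):
    print(bits, crack_probability(0.5, bits), crack_probability(0.1, bits))
```

Under these assumptions the value at one-half of the network approaches 1 minus 1 over e, and the value at one-tenth for a 56 bit input falls within about an order of magnitude of the figure quoted above, which suggests the toy model captures the shape of the curves even though the actual analysis is more involved.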
So let's take a look at where these graphs come from. First of all, what is my threat model? In the threat model, I assume byzantine nodes that are actively colluding with one another in order to figure out the private data. You figure out the probability that they can collect enough of the pieces of the different inputs floating around to put back together your entire input. So there are three arguments here. There's the fraction of the network you've compromised. There's the size of your input: the larger your input, the harder it is to put together. But because this is an NP complete problem, the larger your input, the more seeds you'll need, the more copies of the input you'll need. In fact, the number is exponential in the size of your input. So you get this kind of -- it's a formula, but it's a very ugly interaction. You get this fraction here that's being exponentially pushed down by the size of the input. But then it's being doubly exponentially pushed up in the number of inputs that you can collect different pieces from. And the end result is the graph you saw. It's best shown, I think, with an example. If you have TeraGrid, a 100,000-machine network, and somebody's compromised 12 and a half percent of the machines on that network, and you deploy, what am I using here, a 17-variable formula, a small 3-Sat formula, what you have is a one in 10 billion chance that somebody can crack your input. Remember, as the formula becomes bigger, it becomes harder and harder to reconstruct that input. Okay. That's what I wanted to say about privacy.

Let's talk about whether the system can actually run. I built a version of the system; it's called Mahjong. It's available for download. It's open source. I encourage you to use it if you like. It's actually reasonably small. It's built on top of Prism-MW, a middleware platform that takes care of all the network communication. Really all that Mahjong does is impose the rules for when things can and cannot attach. It's only about 3,000 lines of Java code.
The input to the system is an NP complete problem instance, and it compiles that problem instance down to a distributed system that you can now take and put on computers, and it will run and provide an answer for you to that problem instance. In particular, it's limited right now to NP complete problems, because it translates them, using polynomial time encodings, into either 3-Sat or subset sum, which are the two problems for which I've written tile systems. And the key idea here is that when somebody wants to use the system, they never have to worry about tiles. Tiles are an underlying thing, just like assembly is, that the developer doesn't need to worry about. I'd like to provide them with a system that they can just use. If they want to prove the properties, the privacy properties of the system, sure, they need to understand tiles. But if all they want to do is deploy a system privately, they don't need to worry about tiles; it's an automated compilation procedure that does it for you.

So I ran the system on three different networks. I have an 11-node private cluster. Imagine a graduate student in a room with 11 computers he set up. That was me a few years ago. I also have a 186-node USC high performance computing cluster: 186 computers, all in Los Angeles, but in two different locations in Los Angeles. And they're pretty homogeneous. And then there's also a 100-node PlanetLab subset, 100 of those machines. PlanetLab is a globally distributed system. There are computers all over the place. Different organizations contribute two or three computers to it for the right to be able to use some of these computers as a testbed. So PlanetLab is not an ideal resource for this computation, because PlanetLab is not meant to be computation intensive. There are lots and lots of experiments running on it, and it's very overloaded.
PlanetLab is really used for measuring the reality of the Internet's communication overhead and things like that, but I think for me it really served as the limiting factor. This is pushing sTile to the limit, where you're using computers, some of which are overloaded and doing other things, and some of which are faulty and might be running viruses. So I think it was a good evaluation from that point of view.

All right. So what did I do with this? I wanted to demonstrate two things by using these systems. The first thing I wanted to show is that the system can actually be used. I'll show you something in a second, because fundamentally you should all be thinking this is going to be way too slow. And the second thing I want to show you is about scalability.

Why is it going to be way too slow? It's going to be too slow because, instead of just adding two ints, I'm splitting things up and taking things that are normally gates within your CPU and moving them to the network. Grossly slow: 102,000 times slower, and this is probably an underestimate, since network communication is much slower than things happening inside the CPU. So this is the right intuition, but it actually turns out it's the right intuition for the wrong problem. The reason this kind of slowdown doesn't affect sTile is the difference between throughput and latency. Suppose my system were doing the following: saying, I need to add this bit and this bit, great, let's take care of this tile; sending a message out to the network for some tile to come and attach, which is essentially that addition; and waiting for the message to come back saying, you're attached, let's go on to the next bit. If it were doing that, then in fact this would be the right intuition. But it's not doing that. We're dealing with very large computations with lots and lots of tiles. So my system never sits there and waits for communication. It deals with the tile.
It sends out a message saying, I need an attachment, and goes on to deal with the next tile, and the next tile, and the next tile. So every one of the nodes is in fact spending no time waiting for communication to come back. In fact, the communication waits for the computation to finish on the nodes. In particular, dealing with NP complete problems helps me, because there's so much computation that you never wait for the communication.

So this is a nice story I've told you, but I can verify it empirically. What I did is take an 11-node subset of each of the three networks, the private cluster with very low latency, the HPCC cluster and the PlanetLab cluster, and solve two different sized problems on them, a larger one and a smaller one, and you can see there's never more than six percent deviation from the mean. Well, I can tell you it's six percent. Despite the fact that PlanetLab has way bigger lag, the amount of time it takes to solve the problem doesn't change very much. I also created a simulator to run my system on top of, and in the simulator, which is a discrete event simulator, I can control how long the communication takes. So I used everything from no delay in the communication, to 10, 100 and 500 milliseconds, to a Gaussian random distribution of the communication delay. And then also, for every node, I assigned a random location on earth. So some are in the middle of the ocean, but that's the world we live in today. And the communication delay is going to be proportional to the distance. And again you can see no more than six percent deviation.

>>: Randomly rather than be totally local?

>> Yuriy Brun: Right. So -- so it's not faster. The same way that the converse is not faster. There are lots of factors that affect your communication -- sorry, your computation speed. And the most important one is that we're running nondeterministic algorithms, so sometimes you'll get lucky, other times you won't get lucky.
It's not really faster; out here it's working faster sometimes, but if I ran enough of these, all the error bars would shrink. Basically, the key here is that the latency is not affecting the running time. That's the first thing I wanted to demonstrate empirically.

The second thing I want to demonstrate: one of the big reasons I built this system is that I wanted to get good speedup, good scalability. I wanted it to be the case that if I throw twice as many computers at my system, it computes twice as fast, and there's so much parallelism that we should be able to get pretty close to that. I designed an experiment to test that as well. I solved some problems, again on the private cluster, HPCC, PlanetLab and in simulation, and for each one of them I picked half of the network and then, well, twice that, the whole network, to compare the execution times. What I found was that I get a 1.9 times speedup on the real physical networks, so almost two, not quite. A little bit of overhead. I believe these numbers are pretty accurate for the physical networks. In simulation there's a little more variability; here you can see I'm getting superlinear speedup. I think this comes from the fact that I solved a very large problem. This is a problem where you can't explore all the seeds, you're just exploring a small portion of the seeds, so basically the error bars are too big here, and this is really an artifact. I think the numbers that I got from the physical networks, the 1.9, are much more accurate, and the simulation shows that the algorithms do what you would think they would do. So this is not quite perfect. We're not getting twice the speedup, but I'm pretty happy with the 1.9 number considering it's not really optimized for that overhead.

Okay. So great. I want to talk a little bit about some related work and other ways to try to solve this same problem.
So first and foremost, if quantum computing were a real thing today, you could do private computing the way I've described it by using entanglement. This is great; I look forward to the day when we can do quantum computing. Similarly, homomorphic encryption tries to solve the same problem. Microsoft Research has made some great advancements in homomorphic encryption, and there are certain types of problems that we can do already. When homomorphic encryption first came out, we needed more memory than there are particles in the universe in order to solve just a simple problem. Today we can actually do multiplications using homomorphic encryption, but I'm trying to solve other, larger problems today that are not yet practical with homomorphic encryption. I believe in a decade, maybe sooner if we're lucky, we can use homomorphic encryption. But for now I'm taking a systems approach, one that is usable today, to solve the same problem.

There's a lot of work out there on distributing computation in a nonprivate way. That work is complementary to mine, because you can take these approaches and compile them down into sTile, and you will be trading off efficiency for privacy. You'll be getting a private system out. Some of them are trickier than others, but it's complementary in that sense. There's lots of work on how to make systems fault tolerant, and I'll get to that next; we'll get to the reliability part of the cloud. And then there's also lots and lots of work out there on how to do private storage and private access to data on the cloud. Of course, the big difference here is that you could, for example, encrypt your data, but you can't then compute on it. You have to decrypt it to compute on it. I'm trying to get the cloud to compute on the data and produce something useful for me without telling it what it's doing.

Okay. So what I told you about in the first half of the talk is sTile.
It's this kind of crazy idea, that we're going to get privacy through distribution. By distributing the problem, that's where we're going to get privacy from. I've shown this is actually possible. It's not very efficient but it's actually possible to do. And I can give you a specific number. So in my nonoptimized prototype, it costs about 4,000 times more to solve the problem using Style than on a single machine. So what that means is, compared to owning one machine, I could do it just as fast by owning 4,000 cloud machines. So it's a big difference. It probably doesn't make sense financially to do it right now. But I believe that about half of that or I should say the square root of that, about a factor of 20, is because of inefficiencies in the prototype that I've built and the other square root of that has to do with the fundamental costs of doing privacy by distribution. So basically the prototype can be improved quite a bit. Even with the 4,000 number there's still places where people would want to use it. Microsoft has tons of computers that are internal to the company that are not being used at night. They may want to run something on them but they may not trust those individual computers to not have spyware or something installed. They might have to do it in a private way. There's places. In particular, the way I think about this is as a bound: we can do privacy through distribution by doing it this way, by doing it through Style. We can try to improve on it. We can try to build more efficient systems. This is orders and orders of magnitude more efficient than homomorphic encryption is but it's a very different approach. So I'll refer you to a couple of papers, one from ICDS from last year and one from Transactions on Dependable and Secure Computing, if you want more information. So I want to shift gears right now and talk a little bit about how do we make systems not just private but reliable.
How do we take a system and try to make it more reliable without the developer having to worry about it? So I'm going to start out by asking you a couple of questions. So the first question is: Let's say I want to compute some very simple function. I want to compute what 3 plus 5 is, and I can ask you guys in the audience, but let's say some of you are mean and some of you are byzantine, faulty, and you may not give me the right answer. >>: [inaudible]. >> Yuriy Brun: Thank you. That is a good example. So let's say that there's a 70 percent chance if I ask any one of you you'll give me the right answer. But that 70 percent is not enough for me. So what's a good way for me to bring up that reliability? >>: [inaudible]. >> Yuriy Brun: Right, ask three of you and have you vote. Or ask five of you and have you vote. That's one possibility. Okay. Let me ask you a separate question. Let's say that I would like to send a message to Chris. And in sending this message to Chris, the channel along which I'm going to send it is noisy. Let's say 30 percent of the bits get flipped. How can I send a message to Chris? >>: [inaudible]. >> Yuriy Brun: Send it a bunch of times. Anyone have another idea? We know a lot from information theory about sending messages. We can do things like he can send me an acknowledgment. We could encode our message in a smart way such that what we get back, such that we can correct errors or [inaudible] and it's not sending messages multiple times. We could send messages multiple times but that's not very efficient. The question I want to ask is this: we know, in fact, there's lots of theory out there on how to optimally send a message so you're squeezing as much information as you can out of every nonnoisy bit. So if that's what we do for sending information, why don't we do the same thing for computation? Can we improve our computational channel, if you will, of the cloud so that you're getting as much reliability as you can out of it? So it turns out you can.
Here's the model that I'm going to be talking about. You have a pool of computers. The computers can be byzantine, so the computers can be actively trying to collude and break your computation, but not all of them. But the fraction and the identity of the nodes that are byzantine is unknown to me. In fact it can change: computers can join, they can leave, they can become byzantine and reliable again; all of that is unknown to me. All I know is some fraction of them is faulty. And my goal is to build a technique called smart redundancy. I'm going to try to use redundancy. And that redundancy is going to try to maximize the task reliability given a particular cost. So basically let me say that I'm willing to spend three times the resources for my computation of just doing it once, but I'd like better than 70 percent reliability from adding three plus five. What's the best way to do that? What's the best way to be willing to spend three times the resources to get as much reliability as I possibly can? Let's spend a minute looking at what kinds of systems this would be applicable to. So there's lots and lots of systems that need this kind of reliability. MapReduce systems that have lots of individual subtasks. There's anything -- not anything -- but those things built using the Globus Toolkit can benefit from this kind of work, and there's a series of BOINC systems, Folding@home, SETI@home; there's lots of individual tasks that can happen in almost arbitrary order, not quite arbitrary but almost arbitrary order, and so anything that has tasks like this that can be repeated multiple times to improve reliability and can be reordered will benefit from my technique. And there's a whole other class of techniques that's very different from the ones we normally think about but they're actually even more important. They're crowdsourcing techniques, things like reCAPTCHA or Foldit, which is a protein folding game which has led to some advances in cancer treatments.
There's software verification techniques out there. There's techniques on involving humans in everyday programs. Things like that. So this is an interesting area, because the resources here are even more expensive. The humans are the resources and they're unreliable. They might be byzantine and trying to break my computation. But because their resources are so expensive we definitely want to squeeze out as much reliability as we can out of that unreliable channel. All right. These are the applications, let's get back to how I'm actually going to do the reliability. Let's start with something very simple. This is the voting redundancy. This is what happens when I ask three or five of you to vote on the answer, and now we're going to assume we know the average node reliability. Let's say the node reliability is 70 percent. So if I ask a single node I get 70 percent reliability. And let's say I'm trying to hit a system reliability of 97 percent. That's my target. Okay. If I ask three nodes, we can compute the probability that we'll get a reliable answer. And that probability is 1 minus the probability that all three nodes failed or were byzantine, minus the probability that two of the nodes failed, where there's three different ways in which that can happen. So we get an 84 percent chance, 84 percent confidence in the answer if we ask three nodes and then have them vote. In order to get to 97 percent reliability, we're going to have to ask 19 nodes. So we have to pay a cost of 19, a factor of 19, in order to get to the desired reliability. All right. So that's our baseline. That's what we're trying to beat. Turns out we can beat that by quite a bit. Okay. So let's talk about smart redundancy. Here's a flow chart of how smart redundancy works. It takes a computation and it says let's assume the best case. Let's assume the best possible thing is going to happen. And if that best thing happens, how many jobs do we need to distribute?
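The voting baseline described above can be checked with a few lines of Python. This is a minimal sketch, not code from the talk: the function name `majority_vote_reliability` is hypothetical, and it assumes an odd number of independent nodes that are each right with probability 0.7, with all wrong nodes agreeing on the same wrong answer. Evaluating the spoken expression for three nodes gives about 0.78 rather than the 84 percent heard in the talk, which may be a slip or a transcription error; the 19-node figure comes out at about 0.967, matching the quoted 97 percent.

```python
from math import comb


def majority_vote_reliability(k: int, r: float) -> float:
    """Probability that a strict majority of k nodes returns the right answer.

    Assumptions (not from the talk): k is odd, nodes fail independently,
    each is correct with probability r, and wrong answers all coincide.
    """
    needed = k // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(k, i) * r**i * (1 - r) ** (k - i) for i in range(needed, k + 1))


if __name__ == "__main__":
    for k in (1, 3, 19):
        print(k, round(majority_vote_reliability(k, 0.7), 3))
```

For three nodes the sum is the same as 1 minus the all-fail term minus the three two-fail terms, which is the expression used in the talk.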
I'll explain this with an example on the next slide. How many jobs do I need to distribute in the best possible case? So we compute that number. We go and we actually deploy that many computations, and then we find out how close to the best case are we? How close is reality to the best case? If the reality is the best case, then great, we achieved our desired reliability and we're done. If not, we go back to the start and we know more about reality. Let's readjust our expectations. Let's say how many more jobs do we need to deploy, in the now known best case, in order to achieve the desired reliability. The main idea here is you only deploy jobs if you definitely know you're going to need them. You never deploy a job if it's possible that it will contribute or maybe it won't contribute. So let's take a look at this with an example. I'm going to need to throw up some numbers here on the right for us to use the example. We already know what happens if we ask one node and it returns an answer to us. We are 70 percent certain in that answer. What if we ask two nodes? If we ask two nodes and they both return the same answer to us, so we've got two of one answer and zero of the other, two agreeing answers, then we have an 84 percent chance of getting the right answer. If we ask three nodes we're even higher. If we ask three nodes and they all give us the same answer we're at a 93 percent chance. All right. This is great. But now let's look at some disagreements. What happens if I ask four nodes and three of them give me one answer and another one gives me another answer? Well, that's going to undermine our probability. So you can look, you can calculate that probability and you get an 84 percent chance. So let me dive into this formula just a tiny bit. How do you calculate this probability?
What you look at is you look at what's the probability that the 70 percent thing happened three times and the 30 percent thing happened once, divided by that same probability, that the 70 percent thing happened three times and the 30 percent thing happened once, plus the probability that the 30 percent thing happened three times and the 70 percent thing happened once. So how likely is it that you got three bad nodes and one good node to get this answer. So that's how you can compute this. Let me throw out some numbers here. If you ask four nodes and they all give you the agreeing answer then you're at the magical 97 percent reliability you wanted. This is the best case; we'll come back to this. If you ask five and get a 4-1 split you're back down to 93 percent, and a 5-1 split gives us 97 percent reliability. So we're going to use these numbers to try to figure out how this technique is going to work. So let me throw back up this flow chart that we saw before. So the first thing we do, we start up here and we say in the best case, what do we need to do to get the reliability we're looking for? The best case is we deploy four jobs, they all come back with the same answer, and that gives us 97 percent reliability. That's what we'll do. We'll deploy four jobs, that's our best case. Now, we deploy the four jobs and they go out there, and let's say we get a 3-1 split back, so we weren't in the best possible case. But now we know where we stand. We haven't achieved the desired reliability. We loop back around. We say given the fact that we're seeing three of one answer and one of another answer, how many more jobs do we need to deploy in the best case? Best case here is we'll add two more nodes and both will agree with the answer the three gave. That's a 5-1 split, and that gives us 97 percent. Let's do that. We'll deploy two more nodes. There it is. Deploy two more jobs. If they come back to us, say in this case we get lucky, we get a 5-1 split, we're done.
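The confidence formula and the deploy-check-redeploy loop walked through above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the system from the talk: the names `confidence`, `jobs_needed`, and `smart_redundancy` are hypothetical, all wrong answers are lumped together as in the talk, the node reliability of 0.7 is taken as known, and the target of 0.967 stands in for the spoken "97 percent".

```python
from collections import Counter


def confidence(agree: int, disagree: int, r: float) -> float:
    """Chance the leading answer is right, given the split and node reliability r."""
    good = r**agree * (1 - r) ** disagree  # leaders were the reliable ones
    bad = r**disagree * (1 - r) ** agree   # leaders were the faulty ones
    return good / (good + bad)


def jobs_needed(agree: int, disagree: int, r: float, target: float) -> int:
    """Fewest extra jobs that reach the target if every one agrees with the leader."""
    extra = 0
    while confidence(agree + extra, disagree, r) < target:
        extra += 1
    return extra


def smart_redundancy(ask, r: float, target: float):
    """Deploy only best-case batches; `ask` is a hypothetical callable returning one answer."""
    votes = Counter()
    while True:
        lead = max(votes.values(), default=0)
        other = sum(votes.values()) - lead  # all non-leading answers grouped together
        extra = jobs_needed(lead, other, r, target)
        if extra == 0:
            return votes.most_common(1)[0][0], sum(votes.values())
        for _ in range(extra):
            votes[ask()] += 1


if __name__ == "__main__":
    # Scripted run from the talk: a 3-1 split, then two more agreeing answers.
    scripted = iter([8, 8, 7, 8, 8, 8])
    print(smart_redundancy(scripted.__next__, r=0.7, target=0.967))
```

The scripted run deploys four jobs, sees the 3-1 split, deploys two more, and stops at the 5-1 split after six jobs in total.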
We reached the reliability we wanted, the confidence interval we wanted, and we spent six times the resources rather than 19. Now, of course this is just an example execution. I'll show you the expected value in just a minute. Okay. So that's basically how the technique works. What I'd like to do is take a quick aside to talk about something interesting. So here I've been talking about knowing these probabilities. Knowing that the network overall is 70 percent reliable. But it actually turns out that you don't need that. You can run this kind of computation without having any idea how reliable the nodes are. So the basic idea, as I said earlier, of smart redundancy: you assume the best case, and you ask the minimum number of nodes, and you only ask more nodes after you learn how much reality differs from the best case. So nowhere here does it say you need to know actually how much confidence you have. All you're trying to do is improve the confidence as much as possible given some sort of set of resources, and this is how you do that. We're going to play a fake game, a virtual game. There's going to be two rooms. What I'm going to do at the end of this, after I describe the rules, is I'm going to ask you guys to vote which room would you rather be in. So in each room you will get to make a bet. A virtual bet. And so basically the difference is do you get more information from what you know in room one or more information from what you know in room two to make this bet. Okay. So let me describe the rooms for you. So in room one we're going to take a 30/70 biased coin. It either comes up tails 70 percent or heads 70 percent of the time; we don't know which way the 70/30 goes. I flip the coin four times and I get four heads and 0 tails. And then the bet that you're going to make is whether it's a 70/30 heads/tails or a 70/30 tails/heads coin. What's the more likely thing?
Presumably you would guess that heads is more likely since we got 4 and 0, but you have some amount of certainty that that's the case after seeing four flips. Now, in room number two, we're going to do something very similar. We're going to take another 70/30 coin; again we don't know whether it's heads or tails that's 70 percent. We'll flip it a thousand and four times and we're going to get a 504 heads and 500 tails split. And again you're going to be asked to make a bet, whether it's a 70 percent heads or a 70 percent tails coin. So let's vote. Who would rather be in room number one to make this bet? I'm seeing four hands. Okay. Good. You guys split them down the middle. Who wants to be in room number two? Nobody wants to be in room number two. Anyone have any other answers? >>: So incredibly unlikely room two would only exist in probability one in ten to the hundred to whatever. >> Yuriy Brun: That's right. Ten to the hundred. So room two may not exist, therefore you don't want to be there. That's exactly right. You expect to get roughly a 700, 300 split but you're getting this really crazy split. Let's take a look at the probability that this is a heads coin versus a tails coin, a heads-favoring versus a tails-favoring coin. We can write down the formula: the number of different ways you can pick 504 out of a thousand four, times the probability that it's the .7 thing, that it's the heads that happened 504 times and tails that happened 500 times, divided by that same probability plus the flipped case. So it's the heads that was .3 of the time and yet that came up 504 times, and the tails that comes up 70 percent of the time and yet happened 500 times. So this is a very small number as mentioned. So here's this probability. But we can notice there's a bunch of things here we can cross out. So all these guys are the same. We can cancel them out. And there's a lot of these point threes to the 500 -- >>: Wait. A thousand four over 500. >> Yuriy Brun: This is a binomial coefficient.
A thousand four choose 500. It's symmetric: a thousand four choose 504 is the same as a thousand four choose 500. There's also this .3 to the 500 that we can get rid of. So there's just four of them left down here, and there's this .7 to the 500 that we can get rid of and we'll have a couple of fours over here. I made a total mess here. Let me simplify it out. What we get is this formula, which happens to be exactly the probability in room one. Right? So what happens here? You had a very good intuition, but this intuition is driving us down to the wrong decision. The reason is this room is incredibly unlikely to happen. Right? But given that this room is what happened, when you send out a bunch of jobs and you get some responses back, given the responses you got back, the question is how likely am I to have gotten more of the right answer than the wrong answer? So it's very unlikely you'll be in this room. But given that you are in this room you get exactly as much information out of it as you do from this room. So the key point to take away here -- this comes from Bayes' theorem, it's a pretty direct computation from Bayes' theorem, but it's counterintuitive. The point here is that what matters is the difference in the split. Remember how the 4-0 split gave us the same confidence as the 5-1 split? That's because the difference was four. When you're specifying the reliability technique you can specify a single number. You can say get me a difference of four, get me a difference of 20. It's parallel to the voting technique when you say deploy to 19 nodes and have them vote. That doesn't actually tell you how much confidence you're going to get, how much improvement in confidence you're going to get. All it tells you is how many resources you're going to use. So it's a single parameter that tells you how much improvement in relative terms do you want to get out of it. All right. Let's take a look at how the system works. I'm not going to describe how I built the system.
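The two-room argument above can be checked with exact arithmetic. The sketch below is only an illustration under stated assumptions: the function name `posterior_heads_coin` is hypothetical, and it assumes an even prior over the two possible coins. It uses Python's `fractions.Fraction` so that the cancellation described in the talk shows up as exact equality rather than as a floating-point near miss.

```python
from fractions import Fraction
from math import comb


def posterior_heads_coin(heads: int, tails: int, bias: Fraction = Fraction(7, 10)) -> Fraction:
    """P(the coin favors heads | observed flips), assuming a 50/50 prior over the two coins."""
    ways = comb(heads + tails, heads)  # identical in both hypotheses, so it cancels
    if_heads_coin = ways * bias**heads * (1 - bias) ** tails
    if_tails_coin = ways * (1 - bias) ** heads * bias**tails
    return if_heads_coin / (if_heads_coin + if_tails_coin)


if __name__ == "__main__":
    room_one = posterior_heads_coin(4, 0)
    room_two = posterior_heads_coin(504, 500)
    print(room_one == room_two, room_one)
```

Both rooms reduce to 0.7^4 / (0.7^4 + 0.3^4), so only the difference between the two counts matters, which is why a 5-1 split carries the same confidence as a 4-0 split.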
I built the system on top of BOINC. We took SAT@home. And we ripped out the redundancy technique that it uses, which is voting redundancy, and we put in our own smart redundancy. What I did here is I took the system and varied the reliability of the underlying nodes. This is again deployed on top of Planet Lab. I varied the reliability of the underlying nodes from .95, so 95 percent, down to 75 percent. And then I deployed voting redundancy, BOINC with voting redundancy, on top of it, and you see it asks seven nodes each time. And I also used smart redundancy with a difference of two. So I'm just trying to get a difference of two between one answer and the other answer. What you see is when the nodes are very reliable you're getting down to almost two. Sometimes, though it's unlikely, you still get a disagreement and have to ask again, but you're getting down to a cost of almost two. When the nodes become less reliable, down here, you are shooting up in the cost that you're spending. What's interesting here is the technique automatically adjusts. You never are specifying that the nodes are unreliable and you should do something different; it's that you're getting more disagreement. The answers are coming back disagreeing more often so your cost shoots up. But if you look at the reliability of the system, with voting redundancy the reliability drops quite a bit, just like the reliability of the underlying nodes drops, but with smart redundancy you're staying -- the scales are different, by the way. So watch out for that one. But with smart redundancy the reliability is staying pretty constant. In fact the only reason there's this jiggle here is because of the discreteness of the system. You can't ask the 3.2 nodes that you really want to ask. You have to ask either three or four nodes, so you're getting these little jumps here from time to time.
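The way the cost adjusts on its own can be illustrated with a small closed-form calculation. This is a sketch under assumptions that go beyond the talk: answers are binary, nodes are independent with the same reliability r (not equal to 0.5), and the run stops as soon as one answer leads by d. Under that model the vote margin is a biased random walk, and the expected number of jobs follows from the standard gambler's-ruin result together with Wald's identity. The function name `expected_jobs` is hypothetical, and this is not the analysis from the paper.

```python
def expected_jobs(r: float, d: int) -> float:
    """Expected number of jobs until one answer leads by d.

    Assumes binary answers and independent nodes, each right with probability r.
    """
    ratio = ((1 - r) / r) ** d
    p_right = 1 / (1 + ratio)  # chance the walk stops on the right answer
    # Wald's identity: expected final margin = (2r - 1) * expected number of jobs
    return d * (2 * p_right - 1) / (2 * r - 1)


if __name__ == "__main__":
    for r in (0.95, 0.85, 0.75):
        print(f"node reliability {r:.2f}: about {expected_jobs(r, 2):.2f} jobs per task")
```

With a difference of two, the expected cost in this model stays close to two jobs when nodes are 95 percent reliable and climbs as reliability falls, without the node reliability ever being supplied to the redundancy scheme itself.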
So that's the goal: to try to keep the reliability of the system at a particular level regardless of what the underlying hardware is doing, and then use the resources optimally, so you use the least possible resources in order to achieve that reliability. So you can actually show -- that was one example, that was a particular change. You can show that for any desired system reliability smart redundancy will always outperform voting redundancy. And this graph I'm showing you here is theoretical results, so mathematically what should happen. I did the same thing using discrete event simulation; it's the same graphs, as you can see. I did the same thing empirically using an actual 3-Sat solver on top of BOINC. And you can see it's a little jumpier, and in fact it's only jumpy for redundancy. But you're getting the same results. >>: I think I may have missed something. Smart redundancy makes an assumption about what the actual node reliability is, right? >> Yuriy Brun: It doesn't. It doesn't need to do that. So you just specify a parameter for how much reliability you want in some relative terms. >>: [inaudible] how does it compute that if it doesn't know .7 as opposed to .8? >> Yuriy Brun: That's a good question. So I didn't quite go over that. So you specify -- what do you do with voting redundancy? You say ask 19 nodes. With smart redundancy you say give me a difference of two or give me a difference of four, and it goes out and asks four nodes, gets a 3-1 split, and it says I need more. It doesn't actually know what the reliability of the underlying nodes is; it's just trying to get a difference of four. It's a weird thing to specify, but I would argue that it's just as weird to specify this 19, because you're specifying the cost you're willing to spend rather than the reliability you want out of it. >>: [inaudible] first formula a difference of four is effectively equivalent to implied reliability of some number of -- >> Yuriy Brun: It's not quite that. The four means relative improvement.
If I gave you nodes that are 60 percent reliable you'd go from 60 to 82. If I gave you nodes that are 80 percent reliable you go from 80 to 97 or something like that. So it's an amount of improvement. Just like the K for voting redundancy is an amount of improvement. All right. So these graphs, I don't think they're that interesting. They're just really here to emphasize that the theoretical analysis is consistent with empirical results. The system we built actually provides you with the same reliability as theoretically predicted. But there is -- I've talked about lots of good things about smart redundancy; they're not all good. There is a cost that you pay with smart redundancy. That is, with voting redundancy I get to go deploy 19 jobs at once and they take some time to come back, but roughly speaking I get one time unit until I get my results. With smart redundancy you can't do that. You have to deploy some number of tasks and then you have to wait for them to come back before you decide if you want to deploy more and more. So it grows logarithmically in the number of steps, in the number of stages, but if your job is one where you can't move on to the next task until you finish a task, then smart redundancy may cost you more, not in resources but in time, than voting redundancy would. For a large number of tasks, lots of reCAPTCHA tasks, crowdsourcing tasks, lots of MapReduce tasks, where you can do things out of order, it can be helpful and you can get all the benefits, but it's not always better. So there's some situations where it's not always better. All right. So there's two different kinds of related work. One kind is other types of redundancy techniques, and for the most part what I found is that these types of techniques work really well in models where there's random faults or models where there's particular kinds of faults. What I'm going after is byzantine faults, where somebody's compromising my cloud.
Things like credibility-based fault tolerance, which watches a system for a while and says this node has never given me the wrong answer, they don't work there, because a malicious agent might give you the right answer for a long time just to screw you at the wrong time. So those kinds of techniques fail on the byzantine models where smart redundancy does not. There's a number of techniques that are complementary to our work, things like primary back-up and active replication. These use redundancy, and so you can plug in smart redundancy to tell them how many backups you need in order to achieve a certain level of reliability, things like that. So it can actually improve those kinds of systems. All right. So when we talked about smart redundancy, I basically showed you how you can use this idea of a computational channel to try to boost, optimally in some sense, the reliability of your system. And there's lots of future work. What I've talked about today deals with essentially one-bit channels. I get to ask you for a computation, I get an answer back, and it's either right or it's wrong. And I grouped all the wrong ones together. But you can actually get a lot more if you allow these computational channels to be of larger width. So in particular if I'm allowed to deploy four jobs to Tom, and I have some reason to believe that he's either going to give me the right answer or the wrong answer for all of them, then I can squeeze even more reliability out of that channel. So my proof -- I didn't show you the proof today, it's in the paper. But the proof of optimality assumes that there's a one-bit channel. If you're allowed to send multiple jobs to the same node and assume correlation between the answers, you can be even more optimal and squeeze more reliability out. There's also lots of things that can be done with using the history to improve.
So now, if we're assuming non-byzantine models, you can be even better at using the resources. And also there's lots of applications to crowdsourcing that might have a lot of serious challenges that we may not have thought about before with the computers, because people may work together in different ways and people, the errors may be correlated and things like that. Are there any questions? I made you guys answer questions, it's only fair. >>: Is there a way to combine these two into like the ultimate private and reliable use of resources? >> Yuriy Brun: Right. So the question is can you combine Style and smart redundancy. In fact smart redundancy fell out of Style. When I was working on Style and I was dealing with Planet Lab, some of the nodes were faulty. I was trying to figure out how would I use the Planet Lab nodes in a smarter way, so that when somebody returns a faulty answer to me the whole thing doesn't fall apart. The first answer was voting redundancy. It's very inefficient. This came out of thinking of how would you do reliability through redundancy in a smarter way. I haven't combined them yet but it definitely plugs right in and it's possible to do that. But I think that this has much broader applications than just for Style. >>: [inaudible] but in the privacy version, you're looking at attacks that just rigid in the entire -- [indiscernible] have you looked at all if the attacker wants to achieve -- >> Yuriy Brun: Yes, that's a good question. The question is about if somebody's trying to get half of your data or a fraction of your data, how hard is it? So the numbers I've shown you, the big drop-off and concrete numbers, had to do with trying to reconstruct the whole input. You're exactly right. If somebody's trying to come up with just half the input, what would happen is you would shift down in that scale. You still get an exponential drop-off, so it's still very hard to get any fraction of your input.
What's easy is to reconstruct, say, four bits of your input, or one bit is very easy. But actually getting one bit of the input is a funny thing. You know there's a 0 somewhere in the input. That's actually less than one bit of information. One bit of information is to know there's a 0 in a particular spot. To know there's a 0, it's almost no information. You just know it's not the all-ones input. But if you're trying to get a constant piece, like five bits, that's relatively easier. You don't get the exponential drop-off; you get some privacy but it's possible. If you're looking at fractions of the input, you get the exponential drop-off and it's very hard to reconstruct them. >>: [inaudible]. >> Yuriy Brun: Yeah, so I've thought about different ways to encode the input. I haven't done a lot of work in trying to figure out if there's a more efficient way, so that really when you reconstruct these chunks you can't get much information out. One very simple thing you could do is ensure that the all-zeros and all-ones inputs aren't legal. Then when you find out that there's a 1 or a 0 in there you get nothing. You get literally no information. You already knew the input couldn't be all 0s or all 1s. That's a simple encoding, and there's a lot of room there for thinking about how you encode the input in such a way that it's computable upon, but you get less information out from reconstructing chunks of it. Okay. Thank you very much for your attention. >> Christian Bird: [applause] Thank you.