>>: It is a pleasure to have Payman Mohassel with us this morning. Payman is a student at UC Davis, as
you can see from the slide, finishing up his Ph.D., working with Matt Franklin. Payman has done a lot of
work in the area of secure multi-party computation, an area that I'm especially interested in because I did
work in this area years ago when we thought this was an interesting area, but not a lot of people seemed to
care about it very much. It wasn't used in practice, and part of the reason it wasn't used in practice seems
to have been that we were focusing more on sort of look at this great thing that could be done, but, yeah, it
wasn't very practical.
Payman and some of his colleagues have brought this to practice. The area's now getting a lot of attention and interest, largely because of his work and the work of a few others who have made this area more efficient, introduced real practical efficient protocols, and broadened the set of functions that can be computed efficiently. And today Payman is going to tell us in particular about his work in this area on protocols for secure linear algebra. Thank you, Payman.
>> Payman Mohassel: Thank you. So can everybody hear me? Yeah? Well, thanks for that nice introduction. The work I've done is just part of a much larger body of work that people have done recently on making secure computation more practical. So today I'm going to talk a little bit about some of the directions I've pursued in my research for doing that, and then go into more detail specifically about secure linear algebra protocols.
And this is work jointly done by many co-authors, including my co-advisor, Matt Franklin.
As an overview: first I'll give some introduction to secure computation and the setting and problem we're considering. Then I'll mention the three specific directions I've been focusing on for designing more efficient protocols. And then I'll go into a little more detail about secure linear algebra.
So, the conventional setting for secure distributed computing: you have multiple parties, P1 through Pn, each with their own private inputs, X1 through Xn, and they want to compute a function of their inputs while keeping those inputs private. At a high level, the security requirement is keeping your inputs private while doing something useful with them, distributedly.
Also, the classic example is the millionaires' problem, where you have two parties, each with a private input, how much money they have, and they want to find out who is richer. So the output they want to learn is whether X is greater than Y, and they don't want to reveal, for example, exactly how much they own. So that's the classic example.
There are less conventional settings as well where secure distributed computation gets used. One example is where there's an asymmetry between the participants in the protocol. So, for example, you often have clients who have their own private inputs, but they don't necessarily do the computation on their inputs themselves; instead they send them to a set of servers that they don't necessarily trust, and these servers do the computation for them. So the issue here is basically how we can do this without having to trust the servers who are doing the computation. And you see examples of this in the real world: online auctions and mix nets follow a scenario like this, with clients sending in their inputs. And more recently, cloud computing is a service where clients send their data and the servers do the computation.
Why look at security in this setting? The obvious reason is that data is sensitive. You're often doing computation on patients' health information or private network data. But probably more important is that the data is often shared between many parties, so no single entity has all the data to do the computation on. And in other cases you have systems that are just distributed by nature, such as peer-to-peer networks or wireless networks, so there's no way around doing distributed computing, in some sense.
And there are other scenarios where clients are limited in the resources they have, so they have to outsource the computation to a set of servers, and they don't always trust those servers to do the right thing. Examples, as I mentioned previously, are cloud computing, scientific computing, et cetera. The way it's typically done these days is you just trust the servers with all your data and they do the computation for you. But maybe in the future this is something where you can provide additional privacy to the clients in some way.
So the first model we'll be looking at in this work is one where you have an adversary who can corrupt a fraction of the participants in the protocol. Usually there are two main cases: either less than half of the parties are corrupted, or more than half. These are the two main cases.
And then you also make an assumption about the nature of the adversary. You either have an adversary that is malicious, which means he can do anything he wants; he can deviate from the protocol in an arbitrary way. Or you have a semi-honest adversary, who follows the steps of the protocol but tries to learn more information based on the messages he receives throughout the protocol.

So, for example, for the case of semi-honest adversaries, you can imagine a piece of software that's not necessarily malicious, but someone can look at the logs of the software and try to learn additional information. So there are some settings where semi-honest adversaries make sense, but other settings where you have to consider malicious adversaries.
And the final variation is the assumption you make about the computational power of the adversary: whether it's unbounded, so you don't assume any bounds, or whether, based on some computational assumption, you assume the adversary is limited in some way. These are the variations we consider in our definitions. The definitions we use for our constructions are simulation-based definitions, where you have a real world, in which the real protocol takes place between the parties, and an ideal world, in which parties send their inputs to a trusted party who computes the output and sends it to the corresponding participants.
So the idea is that the ideal world limits the adversary quite a bit: there are very few things the adversary can do, because most of the functionality is performed by the trusted party. And the goal is to design a protocol that makes the real world behave the same way the ideal world does, so that the best an adversary can do in the real world is what he could do in the ideal world. Formally, we say that for any adversary in the real world there is a simulator in the ideal world such that the outputs are indistinguishable in some sense.
That's the high-level idea of the security definitions we consider. They won't show up in the rest of the talk, but it's good to know what security definitions we are working with.
Okay. So there are many important and classic results in secure multi-party computation; you can basically compute any function securely using existing techniques, but they're not being used in practice. You don't see them being implemented and used very often.
And one main reason is inefficiency. With inefficiency, the three main metrics I've been looking at are: round complexity, where a round means one exchange of messages between the parties; communication complexity, which is the number of bits communicated between the parties throughout the protocol; and obviously the local computation. Some of these protocols don't achieve the best efficiency in that sense.
But probably also important is that the main focus in a lot of the work is on the asymptotic efficiency of the protocols; the constant factors are sometimes very large. Or, in other cases, the crypto primitives that are being used are very expensive to implement. But besides the inefficiency factor, there's also the problem that some of these protocols have really complicated descriptions, or there is no easy-to-understand description for developers to work with. Some of them, in fact, are more of an existence result than an attempt to describe the protocols. So that's another issue that I guess limits the usage of these protocols in practice.
So there are some exceptions where protocols have been implemented and used. One good example is the Fairplay secure function evaluation system, where the authors basically compile any function or program into a circuit and then use general techniques for securely computing any circuit based on Yao's garbled circuit protocol, which is a classic result. And this is actually implemented in software. But it is only efficient when the circuit being computed is small. As the circuit gets larger, it tends to become impractical, and even for some simple functionalities this compiler leads to relatively large circuits.
So it's, to some extent, limited. And there's also some recent work in a project done in Denmark, where they have looked at performing sugar beet contract auctions privately. This is actually implemented and used in practice, by designing basic secure multi-party protocols for operations such as addition, multiplication, et cetera. So there are developments in actually implementing these protocols, but they're still narrow in their scope and in how practical they are.
So I've been following three directions. I will first talk about the first one, which is designing protocols for specific functions: looking at specific problems and trying to design more efficient protocols. And towards the end of the talk I will talk a little bit about the other two directions I have been following.
So, now, protocols for a specific problem. When you look at a specific problem, you can take into account its specific nature, which means you can customize existing crypto primitives, or sometimes even design new ones for your problem. And probably even more importantly, you can choose which algorithms to use; many times you end up basically designing new algorithms for the protocol you want. So these are advantages for getting better efficiency that you don't have in a general construction. I've been looking at several specific problems, such as operations on sets and stable matching, which is used to match medical students to residencies. More recently I've been looking at some genomic computation functionalities, and what I'm going to talk a little bit more about today is secure linear algebra.
So let me just jump into the work I've done on secure linear algebra. To get an idea of what problem we're solving here, one example is solving distributed linear systems. You have multiple parties, each with their own set of linear constraints on a common set of variables, X. So X, the set of variables, is common between the parties, but the sets of constraints are different; each party has some set of constraints on those variables. And the goal is to find out whether there's a solution to the combined set of constraints. So basically the parties want to find a solution to the linear system of equations that combines their constraints. But they don't want to reveal their sets of constraints to each other; they just want to find a uniformly random solution to this linear system, if one exists, without revealing additional information.
So that's one example. As for applications where you can consider secure computation: obviously there's scientific computing, and you often see that scientific computing is outsourced to servers because it's expensive. In my university, for example, the computation is often outsourced to national labs, which do the computation for you. But at this point you usually just trust whatever entity does the computation for you; maybe you'll see that change. And, for example, when you study properties of networks, shared networks, ISP networks, many properties of the network can actually be expressed as linear algebra. One example: whether a graph has a perfect matching can easily be related to the determinant of its Tutte matrix. So there's a lot of connection between properties of graphs and linear algebra, and you can use these protocols in those scenarios as well.
Okay. So the problem we have is basically doing secure linear algebra, meaning either solving a linear system, computing the rank of a matrix, the determinant of a matrix, et cetera: basic linear algebra functionalities that come up in practice. And the setting we have is that the matrix or linear system is shared between multiple parties. In the multi-party setting we assume it's shared using a linear secret sharing scheme, but you can imagine going from other types of sharing used in practice to a linear secret sharing scheme. In the two-party setting, we assume the inputs are just additively shared. We also have two variations in the computational power: in the multi-party setting we look at information-theoretic security, and in the two-party setting we achieve computational security. And we assume that the adversary can be malicious, so it can behave arbitrarily and disobey the prescription of the protocol.
So what are some of the existing works here? We have the general construction of Yao's. Using the best algorithms for linear algebra, you get O(n^ω) communication and a constant number of rounds. But when you really want a practical protocol, it's closer to cubic in n. And this is an important point: we really want something you can implement and use. There are also improvements, for example by [inaudible], where they try to bring the communication complexity down to O(n^2), which is almost optimal because that's the input size. They lose a little bit on the round complexity, and again this is related to that omega exponent; the practical round complexity they can get is linear in n. So there's basically a trade-off between round and communication complexity.
And the security they achieve is only against semi-honest adversaries. In the multi-party, information-theoretic setting, things are even less efficient: the best construction, by Cramer et al., basically has O(n^4) communication, which is a lot for large matrices.
So in our work, we basically design constant-round protocols that roughly achieve O(n^2) communication. More exactly, you get O(s) rounds and O(s · n^(2+1/s)) communication for any constant s. By choosing s you get a trade-off between round complexity and communication complexity, and you can get arbitrarily close to O(n^2). But beyond asymptotic efficiency, we were also concerned with how big the constant factors are. I've done a rough calculation; I don't have the exact constants, but I have a bound that the constant factors you'd be looking at are less than 13. And the primitives we use are simple: in the multi-party setting it's a linear secret sharing scheme, and in the two-party setting it's an additively homomorphic encryption scheme, both of which have efficient constructions.
So we avoid expensive primitives. To give you an example of the kind of improvement we get in the multi-party setting, compared to the best previous work: if you have a 100-by-100 matrix over some field (I forget how big the field was when I did this calculation), you basically get a significant improvement in communication complexity. So for large matrices the gain can be quite a bit.
Okay. And we defend against malicious adversaries. So that's the summary of the results, but let me go into a little more detail on the approach. First we reduce the linear algebra problems we're looking at to testing singularity of a matrix. Then we reduce testing singularity of a general matrix to testing singularity of Toeplitz matrices. And from there we reduce designing a protocol for testing singularity to doing matrix multiplication. Finally, we design a secure and efficient protocol for matrix multiplication. So I will start by giving the matrix multiplication protocol and then go into a little more detail about the other steps. But that's the general framework.
So let me first define what protocol we are looking for. Here I'm focusing on the two-party case, but the techniques actually generalize easily to the multi-party case, so I'm just avoiding that for simplicity. Most of the approach is the same for both cases, and for both the computational and the information-theoretic setting.
The setting here is that the parties hold two matrices, M1 and M2, shared additively between them, and they want to multiply these two matrices. So Alice holds A1 and A2, and Bob holds B1 and B2, where A1 plus B1 is M1 and A2 plus B2 is M2. But the parties don't learn the product matrix; what they learn is a sharing of that matrix. What happens is that Alice basically generates a random matrix C, and what Bob learns is the product matrix masked by that random matrix. So the output is also shared between the parties. And the reason for that is that you want to compose this protocol, running it multiple times throughout the bigger linear algebra protocol, and you don't want to reveal intermediate information about the inputs.
So that's what we want. And using homomorphic encryption, it's easy to do this efficiently. So let me quickly introduce homomorphic encryption. We have a public-key encryption scheme here, where the public key is known by everyone and the secret key is held by one of the parties, who can decrypt messages. The property we have is that, given encryptions of two values, it's easy to compute an encryption of their sum without knowing the secret key. And for a publicly known value c and an encryption, you can compute an encryption of the product, non-interactively and without knowing the secret key. This generalizes easily to doing similar operations on vectors and matrices, again without knowing the secret key. So we can also multiply encrypted matrices if we know one of the matrices in the clear.
So these are the neat operations you get from homomorphic encryption. Considering semi-honest adversaries, it's not hard to see how we can do matrix multiplication. Bob basically encrypts his input matrices, using an encryption scheme for which he generated the public and secret keys, and sends the encryptions to Alice. Alice generates a random matrix C. Given the encrypted inputs from Bob, she can now compute an encryption of Bob's final output without having to decrypt: every term in the product is a product of a matrix that Alice knows with a matrix she only has an encryption of, so the homomorphic properties let Alice compute it non-interactively. Alice sends the result to Bob, and Bob decrypts and learns his output. And that's it; that's the matrix multiplication protocol. You can see it's a constant number of rounds, and the messages exchanged are three encrypted matrices, so you have O(n^2) communication between the parties.
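In the real protocol the cross terms are computed under Bob's encryption; as a sanity check of the share algebra alone, here is a sketch over a small prime field, with plain arithmetic standing in for the homomorphic operations (the field size, matrix size, and variable names are mine, not from the talk):

```python
import random

P, n = 101, 3  # toy prime field and matrix size

def rand_mat():
    return [[random.randrange(P) for _ in range(n)] for _ in range(n)]

def add(X, Y):
    return [[(X[i][j] + Y[i][j]) % P for j in range(n)] for i in range(n)]

def sub(X, Y):
    return [[(X[i][j] - Y[i][j]) % P for j in range(n)] for i in range(n)]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % P for j in range(n)]
            for i in range(n)]

# M1 = A1 + B1 and M2 = A2 + B2; Alice holds A1, A2 and Bob holds B1, B2.
A1, A2, B1, B2 = rand_mat(), rand_mat(), rand_mat(), rand_mat()
M1, M2 = add(A1, B1), add(A2, B2)

# Alice picks a random mask C as her output share. In the real protocol she
# computes the sum below on Bob's *encrypted* B1, B2 -- every term has a
# factor Alice knows in the clear -- and sends only a ciphertext to Bob.
C = rand_mat()
to_bob = sub(add(mul(A1, A2), add(mul(A1, B2), mul(B1, A2))), C)

# Bob decrypts and adds the one term he can compute entirely locally.
bob_share = add(to_bob, mul(B1, B2))

# The two output shares reconstruct the product M1 * M2.
assert add(C, bob_share) == mul(M1, M2)
print("shares reconstruct M1*M2")
```

The point of the sketch is just the decomposition M1·M2 = A1·A2 + A1·B2 + B1·A2 + B1·B2: each term is computable by Alice under Bob's encryption or by Bob in the clear, and the random C keeps the output shared.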
>>: Maybe you can clarify the computation that's going on here; that's not just scalar operations, is it?
>> Payman Mohassel: What do you mean? You mean these matrices?
>>: Yes.
>> Payman Mohassel: So, yeah, it's the generalization, I guess -- no, it's -- so look at, say, a matrix A1 that Alice knows, while Bob knows B1, right? And say she wants to compute the product A1 · B1; that's her goal. So looking at the matrices you have --
>>: She's got A1 and --
>> Payman Mohassel: Right. So that's what happens with all the terms in that product: she either knows both factors or at least one of them. Basically, yeah, by breaking that multiplication down into a couple of terms.
So the main issue here is that this is only secure against semi-honest adversaries. For example, if Alice decides to compute something else, there's no way for Bob to detect that; there's no verification process here for the parties behaving honestly.
So now, the idea for turning this into a protocol secure against malicious adversaries is that first each party encodes its input matrices. Let me explain quickly how this encoding works. For each entry a of her input matrix, Alice generates d random values in the field, where d is a parameter to be defined; these become the coefficients of a polynomial whose constant coefficient is a. Then, for the encoding, she evaluates this polynomial at k different inputs. She does so for every element of the input matrix, so the input matrix is encoded as a collection of k matrices. Here k is the security parameter, and we get an error probability that is exponentially small in k. So that's the choice of security parameters.
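The per-entry encoding can be sketched as follows; the field and the evaluation points 1..k are my choices for illustration. Note that multiplying two encodings pointwise yields evaluations of a degree-2d polynomial, so k must be at least 2d + 1 for the product to decode:

```python
import random

P = 2**31 - 1  # a prime field; a stand-in for the field in the talk

def encode(a, d, k):
    """Random degree-d polynomial with constant term a, evaluated at x = 1..k."""
    coeffs = [a] + [random.randrange(P) for _ in range(d)]
    return [sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            for x in range(1, k + 1)]

def decode(shares):
    """Lagrange interpolation at x = 0 through the points (1, y1)..(k, yk)."""
    k = len(shares)
    total = 0
    for j, yj in enumerate(shares, start=1):
        num, den = 1, 1
        for m in range(1, k + 1):
            if m != j:
                num = num * (-m) % P
                den = den * (j - m) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

d, k = 3, 8                                  # need k >= 2d + 1 for products
ea, eb = encode(5, d, k), encode(7, d, k)
prod = [x * y % P for x, y in zip(ea, eb)]   # pointwise product of encodings
print(decode(ea), decode(prod))              # recovers 5 and 35
```

Opening fewer than d of the k positions reveals nothing about a (the other coefficients are uniformly random), which is exactly the property the cut-and-choose step below relies on.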
So both parties encode their input matrices in this way. Now the matrix multiplication is done by running the semi-honest protocol I described earlier on every pair of encodings that correspond to each other between the parties. And the idea is that the final collection of matrices they get is actually an encoding of the product matrix they were looking for.

Now, to verify honesty, what you can do is reveal a random subset of those semi-honest protocol executions, showing your randomness and your transcripts to prove that you ran them honestly. What happens with this opening is, first, because fewer than d of them are opened, it doesn't reveal any information about the actual input matrices. But at the same time it gives you a guarantee that only a small fraction of the semi-honest protocols were run maliciously, which means the encoding of the final output is a valid encoding of the final output. Then you can use the [inaudible] decoding algorithm to recover the output of the matrix multiplication protocol.

So that's the high-level idea behind strengthening the protocol's security from semi-honest to malicious adversaries.
So I'm done with the last step, basically: designing the secure matrix multiplication protocol. Now I will describe the other steps at a high level. I won't go into much detail on them, but please feel free to ask me for more detail about the actual techniques.
But before I go on, since I've been mentioning Toeplitz matrices: these are structured matrices where all the elements on each diagonal are the same, so you really only have O(n) distinct elements in each matrix. It's a structured matrix, and that's why we can do things more efficiently for Toeplitz matrices than for general matrices.

Okay. So the second step was, given a secure matrix multiplication protocol, designing a protocol for testing singularity of a Toeplitz matrix. The techniques we use are: first, a formula for computing the traces of powers of the Toeplitz matrix. Once we have those traces, we use Leverrier's formula to get from the traces to whether the Toeplitz matrix is singular or not, by first computing the characteristic polynomial and then testing singularity.
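The trace-to-characteristic-polynomial step rests on Newton's identities, which are the core of Leverrier's method. As a small stand-alone check over the rationals (the protocol itself works over a finite field, and the Toeplitz structure is what makes the traces cheap to obtain; neither is shown here):

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def elem_sym_from_traces(A):
    """Newton's identities: recover the elementary symmetric polynomials
    e_1..e_n of the eigenvalues (i.e., the characteristic polynomial's
    coefficients up to sign) from the power sums p_k = trace(A^k)."""
    n = len(A)
    Pk, p = A, []
    for k in range(1, n + 1):
        p.append(trace(Pk))
        if k < n:
            Pk = mat_mul(Pk, A)
    e = [Fraction(1)]  # e_0 = 1
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(Fraction(s, k))
    return e  # e[n] equals det(A); the matrix is singular iff e[n] == 0

A = [[Fraction(2), Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(3), Fraction(1)],
     [Fraction(0), Fraction(1), Fraction(2)]]
e = elem_sym_from_traces(A)
print(e[-1])  # det(A) = 8 for this matrix
```

The key point for the protocol is that this computation is a fixed sequence of matrix multiplications and public linear algebra on the traces: the number of multiplications never depends on the entries, which is the obliviousness property mentioned below.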
I'm not going into much detail about it, but what is interesting about both of these steps is that they reduce everything to matrix multiplication, and the number of matrix multiplications you have to do is independent of the actual input. So the computation is oblivious to the input, which is the sort of property we need for privacy. And, in fact, it gives us the efficiency we're looking for; that's why we chose these techniques instead of others.
Once we have protocols for testing singularity of a Toeplitz matrix, we want to extend them to general matrices. And this is actually kind of interesting, because there are these techniques by Wiedemann from 1986, later extended by Kaltofen and Saunders and others, that were designed for operating on sparse matrices. They're randomized algorithms, and it turns out we can actually apply them to general matrices, because they have the nice property we want: the number of matrix multiplications is oblivious to the actual content of the input, and you can reduce everything to matrix multiplication. So it was an interesting use of techniques that were previously only used for sparse matrices. But they're really only worthwhile for general matrices in the context of secure linear algebra, I think, because absent privacy concerns they're not the most efficient existing algorithms for linear algebra.
So, again, that was just the general idea. Reducing all the other problems to protocols for testing singularity of a matrix uses rather standard techniques, and these reductions don't increase the complexity of the protocol. All these important problems can easily be reduced to testing singularity. For example, in the case of the rank of a matrix, it's easy to do a simple binary search on the ranks of submatrices and find the rank. So there are techniques for doing that.
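The rank reduction can be sketched like this: assuming the matrix has been randomly preconditioned so that it has generic rank profile (leading i-by-i minors nonsingular up to the rank), a binary search over a singularity test finds the rank. Plain Gaussian elimination stands in for the secure singularity protocol here; the preconditioning step itself is omitted:

```python
from fractions import Fraction

def is_singular(M):
    """Gaussian elimination over the rationals; stand-in for the secure test."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return True
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return False

def rank_generic_profile(M):
    """Binary search for the largest i whose leading i-by-i submatrix is
    nonsingular; correct only under the generic-rank-profile assumption."""
    lo, hi = 0, len(M)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        leading = [row[:mid] for row in M[:mid]]
        if not is_singular(leading):
            lo = mid
        else:
            hi = mid - 1
    return lo

M = [[1, 0, 2],
     [0, 1, 3],
     [0, 0, 0]]  # rank 2, with generic rank profile
print(rank_generic_profile(M))  # 2
```

Only O(log n) singularity tests are needed, so the reduction adds essentially nothing to the overall complexity.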
So let me quickly summarize. This is the efficiency we get for any constant s, and the whole point was to reduce the protocols we want to a sequence of secure matrix multiplications, where the number of multiplications is independent of the input, so it's oblivious, and then to design secure protocols for matrix multiplication. And while I only talked about the two-party, computational setting, the results and the efficiencies we get hold for both settings. Also, I've been working with two undergraduate students at Davis on implementing the two-party protocol. We've decided on the primitives we're going to use, but we haven't actually started implementing yet. Implementations of Paillier's scheme exist, and there are libraries that do all these operations over finite fields relatively efficiently, so it's not going to be hard, actually, to put those together. Hopefully we can do it before I graduate.
There are a lot of questions remaining, I guess. One of the main ones is: what about sparse matrices? Can you do better? Because in practice you usually have sparse matrices, not general matrices. But a more challenging question, I think, is that the linear algebra protocols are designed to work over finite fields. What about working over the integers or the reals, and what are the issues there?
You can sometimes choose the finite field to be large enough that the computation fits, so you can assume you're working over the integers, but that can be inefficient, or there are stability issues. So there are a lot of questions, and they're not restricted to linear algebra; they come up in almost anything you do with numbers in secure computation, because you're often working over a finite field or a ring, since all these encryption schemes, for example, work that way. So it's an important question, I think, for applying these techniques in practice.
Okay. So now I will also mention the other two directions I've been following: relaxations of the notions of security, and coming up with cheap techniques for strengthening security when possible. So, relaxing the security notions: here the idea is that in practice it's sometimes reasonable to relax the standard simulation-based security notions we have, in order to get more efficient protocols.
And the goal really is to get more efficient protocols. One example is covert adversaries, a relaxation of malicious adversaries introduced by Aumann and Lindell. The idea is really simple: you have an adversary who can cheat, but there is a reasonable probability that he will get caught. Originally you would require of a protocol that the adversary gets caught with very high probability, 1 minus a negligible function; here you make that probability smaller, maybe even a constant, depending on the scenario.
And it turns out that by making this simple relaxation you can actually design much more efficient protocols. So we've been looking at general two-party and multi-party protocols in the covert adversary model and designing more efficient ones. This relaxation seems to be successful. One thing I've been looking at more recently is: what if you allow a limited leakage of information in a protocol?
Can you do things more efficiently? The idea is that in practice many protocols already leak some information. One example: you often test equality of files by comparing their hashes, and obviously the hash of a file leaks some information about its content, but not everything. And maybe in some scenarios this is a good enough protocol for equality testing.
But the whole point is to quantify what the leakage is and exactly what security we are getting. Another example is in the context of electronic voting, using mix nets: there's this work by Jakobsson, Juels, and Rivest where, in the verification process, they reveal some partial information about the ciphertexts that encrypt the votes, but the argument is that in the context of electronic voting this is insignificant and doesn't lead to too much information being leaked. And they get more efficient protocols than the previous ones for verifying and validating the votes.
And a more recent example is deterministic encryption, where you don't use any randomness to encrypt. Inevitably the ciphertext leaks some information about the plaintext it's encrypting, but you can get, for example, more efficient protocols for searching over those encryptions. So there's another trade-off: leaking some information about the plaintext but gaining some efficiency.
So what I've been looking at is, in the context of secure computation, how you can formally quantify the leakage in protocols: coming up with formal definitions for the security we get, what it means, for example, to leak some k bits, and then trying to design protocols that meet that definition. In general, in the context of relaxing notions of security, I think this is, at least in the secure computation field, still in its early stages. So I think there's a lot to be done, for example, considering economic models of what would be a good incentive to prevent cheating, and relaxing the security definition in that sense. So I would definitely be interested in continuing to work in that area, and in looking at specific protocols where certain relaxations make sense.
Finally, I've been looking at cheap techniques for strengthening security from semi-honest adversaries to malicious adversaries. The general techniques that exist usually use zero-knowledge proofs, often generic ones, but those are very expensive and very hard to implement. That's one of the reasons you see a lot of protocols in the semi-honest model but not their counterparts in the malicious model. I've been looking, for example in Yao's garbled circuit protocol, at techniques that strengthen security from semi-honest to malicious adversaries. And more recently I've been looking at protocols that evaluate parties' inputs at multivariate polynomials, trying to get the same strengthening from semi-honest to malicious adversaries without losing the original efficiency you had in the semi-honest case. So I've been looking at strengthening there as well.
So I think that is it. And I would be happy to answer some questions.
[applause]
>>: Questions? We had some earlier, I guess. Thanks again.