>> Jin Li: Okay. Thank you very much for coming. Today we have an exciting talk by Professor Devavrat Shah, currently Associate Professor in the EECS Department at MIT. His research interests are statistical inference and network algorithms. He has received a number of awards, including the IEEE Infocom Best Paper Award in 2004 and the ACM Sigmetrics Best Paper Award in '06. He received the 2005 George B. Dantzig Best Dissertation Award from Informs. He's also the recipient of the first ACM Sigmetrics Rising Star Award in 2008 for his work on network scheduling algorithms. Without further ado, let's hear what he has to say about inferring rankings under constrained sensing. >> Devavrat Shah: Thanks, Jin. Thank you for hosting and [inaudible]. Very happy to be here. So what I'm going to talk about today is inferring popular rankings from partial or constrained information. The talk is going to be about learning distributions, say, over permutations or rankings. But as the talk proceeds through a few examples, I'll mention that the type of things we are discovering here are likely to extend to a more general setting of learning full information from available partial information using some additional structure. This is joint work with my student Srikanth Jagabathula at MIT. Again, feel free to stop me during the talk, because I might be using notation that may not be clear, and questions and answers will be very helpful. Okay. So to set up the context, let me start with the very basic motivation and very simple examples. So here's the first example: an election scenario. Think of the primary, which was not too far in the past, or maybe not too far in the future now. Let's say we have many candidates in the race, and we want to figure out what the popular rankings of these candidates are among voters. So think of candidates Obama, Clinton, Richardson, and so on. These are the candidates likely to be in the race. And every voter has, implicitly or consciously or subconsciously, some form of ranking of these candidates in mind. And we want to figure out what fraction of voters believe in each such ranking. What fraction of voters like Obama first and Richardson second and Clinton third, and so on. If there are N candidates, the total number of such rankings will be N factorial. For N equal to 4 this is already a large number, 24, right? And in general there are lots of candidates. If you have lots of candidates, then there are too many possible rankings. If you went and asked people, could you please tell me, do you prefer this ranking or that ranking or that ranking, it's impossible. It's impossible because, A, I as an individual may not be able to consciously put down those rankings, or I might not have the incentive to tell you. Or if you're standing outside Times Square taking a poll, the person coming out of the subway has just three seconds to answer your question. So in that situation the likely answers that you might obtain from people, say through polls, would be of the following type. It would be: do you like Obama over Hillary or not? Or, who is your favorite candidate? Or -- a generalization of this would be -- do you like Obama as the third-ranked candidate? And so on. So these are the types of partial information that will be available. And then based on this you want to figure out what fraction of people believe in what rankings. Let's take another example where a similar thing happens. So think of ranking teams in sports leagues.
For example, say a football league. There are N teams. Not all N teams are going to play with each other. Some of them will play each other, and over time we'll collect the different results. And based on this we want to come up with a ranking. Now, first I want to argue that there may not be one global ranking, because a team may be very strong, but let's say one of the days its quarterback is ill or, let's say, injured; then maybe the team may not perform as well as its native strength. So there may be some form of a distribution over rankings that is represented in the results of the various games. So what we want to do is, looking at the results of the games -- that is, what fraction of the time team X defeated team Y -- we want to figure out what the distribution over rankings of the teams is. Any questions? I'm just going through simple examples, and hopefully the examples are making sense. Okay. All right. So let's take a third example, and this is different from ranking; this is where my comment was that some of the things we'd like to do are likely to extend to a more general setting rather than just rankings. This talk will only be about rankings. So: the network measurement case. Let's think of a network which has, let's say, N flows active. Flows are source-destination pairs, and they're sending data between sources and destinations. The type of measurement that you're gathering is aggregate information. So for a given link you would know how much demand is passing through it, but you may not know which flows are sending what demands, and this is primarily because there are quite a few flows going through one link; you may not be able to measure them all -- it's too much to measure. So given this kind of aggregate bandwidth information, you want to figure out which flows are active and how much demand they're sending. So all these questions in a nutshell are of the following type: there is some partial information that's available, and it's constrained. Constrained because in the case of polling you're getting answers of the form candidate X is preferred over candidate Y, and so on. And in network measurement, the links only allow you to get aggregate information. It's a constraint in that sense. But what we'd like to do is infer the complete information; that is, the distribution over all rankings, based on such comparison data and so on. Yes. >>: Do something -- it's possible, it's a matter of situation and solution, you find the most, least unstable [inaudible]. >> Devavrat Shah: Maybe there is some connection, but, yes. >>: There's like three, let's say in your ranking thing, there's like A, B and C. A and B, B and C, and C and A something. >> Devavrat Shah: The answer is, that's the classical paradox. There the question is, if that type of situation happens, the following is not possible. So the question is: what if the situation is A beats B, B beats C and C beats A? If you have just that type of information, then there's no single ranking -- a distribution over rankings with support on one ranking is not possible. And that's exactly what I want to argue: those types of situations arise because the hypothesis is that the support is just one ranking.
Here what I want to say is, when you relax that requirement or get out of that mode of thinking, and now look for a more general answer where I'm looking for a distribution over the space of rankings, then there will be a reasonable way to answer that question, and that will actually alleviate some of those classical paradoxes that are in the [inaudible] literature. That's a great question. Any other questions? Okay. So the bottom line is that in all of these questions we want to learn something from partial information. And for this type of thing, of course, if I have a distribution which is uniform over N factorial permutations and you're giving me only comparison information, there's not much I can do about it; I just have too little information. So the right question to ask is: given this type of partial information, when is recovery possible? And if recovery is possible, how can we recover it? All right. Now let's put a little bit of formalism around it. Let's start with very simple examples. Okay. So I'm thinking of, say, the case of an election and two candidates. Now two candidates can be ranked in two ways. Candidate 1 gets ranked to position one, candidate 2 gets ranked to position two -- this is my notation, my graph, and this is how I'll use it throughout the talk. That's one ranking. The other ranking is candidate 1 ranked in position two and candidate 2 ranked in position one. It's a very simple thing. Let's say a P1 fraction of people believe in this ranking, and the remaining one minus P1 -- call it P2 -- believe in this ranking. So then, in the case of comparison information, the type of information we will get thanks to polling will be of the following type: what fraction of people rank 1 over 2? In this simple setup it would be -- assuming that we are choosing people at random -- the probability that 1 is ranked better than 2, which is P1 in this case. And, similarly, here the fraction of people who believe 2 is greater than 1, which is P2. Or, if I write it in matrix form, this is the information that I have available. This is the native information: a P1 fraction believes in the first permutation or ranking, a P2 fraction in the second ranking. This column corresponds to the first one, and this column corresponds to the second one. All right. So this is just the setup. And more generally, if you have N candidates, you have N times N minus 1 possible pairs. Of course, half of them are redundant, because we're thinking of a distribution: if I know the probability that i is ranked above j, then the probability that j is ranked above i is just one minus that, right? So this is the type of information we have available. And the total number of possible rankings -- the space the distribution lives over -- is N factorial. So this is usually very, very long, and this is very, very short. Okay. So the matrix that you're going to see is very thin. Okay. Let's see another example. So here is a little more detail, and this is the example that I'm going to use throughout the talk, this type of example, to explain the results. So, again, going back to the same situation. This is ranking one, this is ranking two. A P1 fraction of people believe this one, a P2 fraction of people believe this one. I'm going to generate this bipartite graph where every edge has a weight. So, for example, this is the fraction of people who believe that candidate 1 should be ranked in position 1. And in this case it's just this edge, because it's only this ranking that contains it. So Y11 is nothing but P1, and so on.
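For concreteness, the relation on the slide for this two-candidate example presumably has the following form (my reconstruction; the actual slide is not reproduced in the transcript):

$$
\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \end{pmatrix},
$$

where the first column of the 0/1 matrix encodes the ranking that puts candidate 1 in position 1 and candidate 2 in position 2, and the second column encodes the swapped ranking.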
And in this situation we have this -- this is the information YIJ, which is given by this matrix times the original data. Again, as before, the columns of this matrix are nothing but the different permutations. Questions? Okay. So here again you have N squared pieces of information available, and the total number of possible positions in this distribution -- its support -- is N factorial. So we have this information available and we are trying to recover this. It's a humongous task. One more example quickly. This is the network measurement situation, where let's say we have two links, and the possible sources and destinations are any of these nodes -- any of the N nodes can be a source and any of the other N minus one nodes can be a destination, so there are this many possible flows. Each link provides you the aggregate flow that's passing through it. So, for example, here through this link only flow one is passing, with amount P1. Through this link it's P1 plus P2 -- that's the amount passing. Those are the observations you're going to make, as Y12 and Y23. And again this is related to the flows through a 0/1 matrix determined by the graph edges and the routes, as follows. Any questions? So, again, we have a situation where this is tall, this is thin and this is tiny, and we want to recover this out of this. Okay. So in a nutshell we have Y equals AX, where X is, let's say, capital N dimensional -- in the case of rankings capital N is N factorial -- and the observations are M dimensional; in the two examples I gave you they are N squared dimensional. A is a 0/1 matrix, and A sub mi is one if the m-th component is consistent with the i-th permutation, in a nutshell. The question is to recover X based on the observation Y and the knowledge of A. It's like inverting. If this were big and this were small, it would possibly be feasible, but here it's a highly underdetermined problem. So clearly we need to constrain the question more, as in general this is hard to answer. So here's a classical approach. It's a philosophy, [inaudible]: you have observations Y; you can find various explanations, that is, X's that are consistent with the observations; the one you want to find is the simplest. One way to think of simplest is sparsest. That is, find the X that has the smallest support -- find an X for which the number of positions with X sub i not equal to 0, called the L0 norm, is smallest. And why would this type of approach be reasonable in many of the examples that I discussed? Well, for example, if you think of the election situation, you as a voter care about a few issues, and those few issues define your ranking. Irrespective of the number of candidates, a few rankings are likely to be dominating. Similarly, in a sports league, you have teams, teams have native strengths, and most of the time the teams are going to be ranked according to their native strengths. But because of some uncertainties, like the quarterback situation, the rankings that you'll observe, at least in the results, will be some perturbations of these native rankings -- again likely to be sparse. In the network situation, it's likely there are only a few flows that are actually active, though there are many possible flows that could be there. Okay. So given this, the problem we are left with is: we want to find a sparsest solution consistent with our observations. And the question is, when is that sparsest solution representing the real situation? Now, here it is in formalism: find a Z that is consistent with your observations -- in the sense that A Z equals Y -- and that has minimum support. Okay. This is the L0 optimization. This is a very classical approach.
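As a concrete illustration of this Y = AX setup, here is a minimal sketch (my own illustration, not the speaker's code) that builds the first-order marginal measurement matrix by brute-force enumeration of permutations, so it is only feasible for very small N:

```python
# Minimal sketch of the Y = A x setup for first-order marginal information.
from itertools import permutations
import numpy as np

N = 3
perms = list(permutations(range(N)))            # all N! rankings

# One row per (candidate, position) pair, one column per ranking:
# A[(candidate, position), sigma] = 1 iff sigma puts that candidate there.
A = np.zeros((N * N, len(perms)))
for col, sigma in enumerate(perms):             # sigma[c] = position of candidate c
    for candidate, position in enumerate(sigma):
        A[candidate * N + position, col] = 1.0

print(A.shape)                                  # (9, 6): N^2 rows, N! columns

# A sparse distribution over rankings (support size 2) and its observations.
x = np.zeros(len(perms))
x[0], x[3] = 0.4, 0.6
y = A @ x                                       # the N^2 observed marginals

# The L0 question: among all z >= 0 with A z = y and sum(z) = 1, is the one
# with the smallest support unique, and is it the original x?
```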
And there are many situations where people have used it. I'll allude to two of them. One of them is the communication setup, for decoding. So think of a code word or a message that you transmit over a noisy channel, and at the end you receive a noisy signal. Let's say you transmitted N bits, and a few of the bits got flipped. So you receive a sequence of 0s and 1s, some of which are flipped, and you want to figure out which ones were flipped. Now, in the case of linear coding, essentially this corresponds to finding a vector -- think of a vector of 0s and 1s where the 1s represent the positions where bit flips happened, 0 otherwise -- and what you want to do is find such an error vector that is consistent with the syndrome of what you received. And this type of L0 decoding corresponds to looking at the received signal, finding the closest code word to it, and seeing what the distance is. That's a quick indication of why these types of problems have been well studied in coding. More recently, this has become central in the topic which is popularly known as compressed sensing, which many of you will know. Here the question is, again, you have observations and you want to figure out an X which has sparse support. And in all of these situations, the primary questions are of the following type: you want to design the matrix A -- in the context of coding you want to design the code -- so that you will be able to recover X, the original thing, based on the observations, as long as X is sparse; that is, the original X had few positions where it was nonzero. And in many of these cases, because of the way you design A, solving the actual L0 optimization becomes easy -- because in general this optimization is a hard problem. So that has been the general philosophy. Here we have the same situation, Y equals AX, but you can't design A; A is given to you. And so these types of approaches are unlikely to work directly. Let me just quickly give a small summary of the type of approach that's popular here and ask whether it can be useful for our purposes. We're not going to design A, but the way you prove that the A's are good is you show that A has a good property, and the good property is called, roughly speaking, the restricted isometry property, or RIP. What it says is: you have a matrix A; if you look at any K-subset of the columns of this A, then they're essentially orthonormal, so they're linearly independent and they preserve norm -- it's like an identity. And if such is the case, then you can recover X using this optimization very quickly. In our case we cannot design A, but maybe we can ask the question: does A have this property? Because if A has this property, then at least our quest is over. Did I lose half of you? >>: I thought compressed sensing uses random projections. So do random matrices naturally have some property similar to RIP, or -- >> Devavrat Shah: Right. So one of the approaches to prove -- okay. So in compressed sensing, or say in linear coding, you do random linear coding or random projections. You come up with a matrix A and then you prove that this matrix is going to be good because it has this property. So while we cannot design our matrix A, maybe our matrix, by its nature, has some such property and, hence, it may lead to good answers. Well, as it happens, the choices of A that we have don't have this RIP property, and to prove that I'll give you a simple counterexample that will explain it. So let's look at this simple counterexample.
It's four candidates, and we're thinking of marginal information; that is, I have four candidates on the left and these are positions. This edge represents candidate 1 ranked in position 1, and this number .5 says that a .5 fraction of people believe that candidate 1 should be ranked in position 1, and this is the whole information that I have. So this is my Y. Now, here's a simple counterexample, which says there are two ways to produce this Y in which the support is 2, which is the minimum possible, and each with equal probability. So this ranking says that candidate 1 is ranked 1, candidate 2 is ranked 2, and so on. Here it says 1 is ranked in position 2, 2 is ranked in position 1, and so on. Or the flipped one, where I take this on this side and this on this side. So now I've got two possible decompositions of this thing with minimal support. L0 doesn't have a unique solution and, hence, it's not going to be able to recover what you want. So this says that A doesn't even have the RIP of order two. So in some sense the best recovery guarantee we could get this way would be K equal to 1, which is rather tautological, or trivial. Okay. So this is one situation, just to drive the point home. This is the network measurement case, a three-link network, and you're going to measure load two on this link and load one on this and load one on this. This can be produced in two ways: you can have one flow going like this across and one flow here, which will give you the same loading, or this is another assignment where it will happen. So, again, it says that in the type of situations that we're going to see, it's impossible to expect some good RIP-style property. So now the question is: well, we have this kind of situation, the RIP property is not going to work -- does that mean we won't be able to recover anything interesting? What can we do? Now, what I'm going to do is I'm going to give you a sufficient condition under which it's possible to recover X using L0 optimization as a unique solution, and I'll give you, under that condition, a very simple algorithm to recover that X. That's one thing. Well, that condition could be totally bizarre and useless. What I'll try to argue in the second part is that, indeed, that condition is not bizarre: actually, if you take a natural random model, then the conditions will be satisfied, and they will be satisfied in an optimal sense -- I will explain that optimality later. What this will suggest is that, the way the RIP approach goes, adversarially recovery may not be possible, but for most sparse situations recovery will be possible. The counterexamples I showed you are actually very special: if you lay down a uniform distribution over the space of sparse supports, then you will not see those counterexamples, with high probability. This is like in coding: there's a dichotomy between adversarial errors versus probabilistic errors, where for adversarial errors, for graph codes, you need expansion [inaudible], but for random errors you don't need expansion [inaudible]. It's roughly related to that. All right. And then, finally, I'll discuss some more things. Here I will talk about specific types of information, like comparisons or favorite candidates. But then, what is the taxonomy of these types of information? Is there a natural way to ask for all sorts of information? And depending on the amount of information you get, how does recoverability increase? What's the right trade-off between information and recoverability? I'll discuss those connections there.
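To make the first counterexample concrete, here is a small numerical check (my own reconstruction of the kind of example on the slide, not the speaker's code): two different support-2 distributions over rankings of four candidates that produce exactly the same first-order marginals, so the sparsest consistent solution cannot be unique.

```python
# Two support-2 distributions over rankings of N = 4 candidates with
# identical first-order marginals Y -- L0 has no unique minimizer here.
import numpy as np

N = 4

def marginals(dist):
    """dist: list of (ranking, weight), where ranking[c] = position of
    candidate c. Returns Y with Y[c, p] = P(candidate c is in position p)."""
    Y = np.zeros((N, N))
    for ranking, weight in dist:
        for candidate, position in enumerate(ranking):
            Y[candidate, position] += weight
    return Y

# Decomposition 1: identity, and the ranking that swaps (1,2) and swaps (3,4).
dist1 = [((0, 1, 2, 3), 0.5), ((1, 0, 3, 2), 0.5)]
# Decomposition 2: swap only candidates 1 and 2, and swap only candidates 3 and 4.
dist2 = [((1, 0, 2, 3), 0.5), ((0, 1, 3, 2), 0.5)]

print(np.allclose(marginals(dist1), marginals(dist2)))   # True: same Y, different supports
```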
>>: Question. >> Devavrat Shah: Yes. >>: So can you go back to the previous -- just the example. So here, right, I mean, the answer would be that you give a .25 probability to each of these rankings, or even 0 probability to those rankings, because that would be the correct answer? >> Devavrat Shah: Yes: .25 to this, .25 to this, and so on, and the rest of the 24 get 0. >>: Point two five to each of the four things, right? >> Devavrat Shah: So one answer would have support 4. Answer one, with support 2, would be .5 probability to this, .5 probability to this, and 0 probability to the rest of the 4 factorial minus 2, that is, 22. Answer two would be similar, with support 2. And answer three could have support 4: .25, .25, .25, .25, or other reasonable combinations. >>: So you're trying to find an answer with support two? >> Devavrat Shah: I'm trying to find an answer with the sparsest support, because that is the L0 minimization. If the L0 minimization has a unique solution, and Y was generated by the thing I'm looking for, then I have proved that L0 minimization recovers what I'm looking for. That is good. Is that helpful? Okay. So now let's move on. So I'm going to tell you sufficient conditions. At first they may look bizarre, but hold on, there's a promise: they're actually satisfied by a natural model. So let me walk you through the sufficient conditions. The sufficient condition has two parts. This is the sufficient condition for proving that the L0 optimization problem that I stated has a unique solution, which happens to be the original X that I want, and there's an algorithm that will recover it very easily. The first condition is called unique signature. So let me just use an example to explain what unique signature means, and then we'll go back and read it again. Okay. So let's do the following. Going back to the marginal setting, with three candidates: this is N squared -- nine pieces of information. This says candidate 1 is ranked in position 1 by a .9 fraction of people. And let's say the original underlying data is this: out of three factorial, which is six possible permutations, these three rankings are popular among people. A .1 fraction of people believe in this one, a .2 fraction believe in this one, and a .7 fraction believe in this one. The rest of the rankings are not popular. Now, the unique signature condition says the following: that the rankings in the support of this X -- these three rankings -- have unique edges, and those unique edges are these pink edges. What do I mean by unique edges? This edge is present, or this type of ranking choice is made, only by this one and not by the others. What that means is that candidate 1 is ranked in position 2 only by this one, and not by the other two nonzero-mass ones. Of course there will be other permutations, other rankings, which rank candidate 1 in position 2, but they have 0 mass, so essentially they're not visible. Similarly, the second one says that candidate 2 is ranked in position 2 by this one -- this one doesn't rank it that way and this one doesn't rank it that way. And this is the third one. That's what it means; this is unique signature. Or in a more general setup, for a system of the Y equals AX form, it says that for every i that's in the support of X, there exists a row m in which A sub mi equals 1 for this i, but for all other j with nonzero support, A sub mj equals 0. In a nutshell, what it means is that when I apply A to this X, Y sub m will be equal to X sub i. So on one hand it says, well, I just told you X sub i, right?
On the other hand, I haven't really told you which X sub i it is, because you don't know which Y sub m it is, and you don't know which the j's are. So all it says is that there's a promise of being able to recover the information, but it's not obvious how to do it. The other condition is linear independence. And linear independence is essentially what we think of as linear independence, but in a restricted sense. That is, you look at all these X sub i values and take any combination of them with coefficients c sub i coming from the natural numbers -- sorry, rather, positive and negative integers -- with values between minus K-naught and plus K-naught; no such nontrivial combination should vanish. As a special case, the implication is the following: look at all the X sub i's that are not equal to 0 -- let's say I order them as the first one, two, three, up to K-naught -- and take any partial sum over a subset of them. All of these partial subset sums are distinct. So, going back to this example, what are the possible subset sums? It's just this element, or this element, or this element -- those are .1, .2 and .7. Another subset is these two guys, .1 plus .2, which is .3. Another subset is .7 plus .2, which is .9. A third subset is .7 plus .1, which is .8. And the full sum is one. All of these are distinct numbers, right? So this is the situation, and when you have these conditions satisfied, then we will have an algorithm, and also these conditions will be satisfied under the random model that I'll describe. Now I will tell you the algorithm. So let's go back to this example. Here's what I will do. First I will look at the data I have. So, remember, we have this data. We don't know how many underlying permutations there are that have generated it, we don't know their values, and we don't know which ones they are. But we want to figure out all three of those things. There are three permutations generating this; these permutations are precisely these; and their probabilities are .1, .2 and .7. To do that, I'll take this data and I will sort it, so .1 is the smallest number. I'll take the smallest number first and look at the edges with the smallest number. Now, based on the unique signature property and the distinct subset sum, or linear independence, property, I will argue that .1 must be the probability of the permutation with the smallest probability. Because, okay, let's think: since we know that Y is of the Y equals AX type, and A is 0/1, each Y value on each edge is a sum of a few of the probabilities -- P i1 plus P i2 plus P i3, up to P ik. Since there's a unique signature, that means there exists an edge which has exactly the value equal to P1, which is the smallest. So that means that if I sort this whole thing, then the smallest numbers will be the ones equal to P1, and those are the edges which belong to the permutation with the smallest probability. So far so good -- we just got started, right? Is this clear? I really hope that I will be able to convey this algorithm, because if I don't, then I think it will be a failure of the talk. So please feel free to ask me questions. >>: Since it's unique, you can [inaudible]. >> Devavrat Shah: Correct. Good. So now what you'll do, as you said, is: now that we know that these two guys here, these two .1-probability edges, must belong to the first permutation, here's what we do. We generate the first permutation -- of course, here it's already written, but let's say with the edges that we have learned -- we put them in, and we assign it the probability. So we've just gotten started. Now let's try to repeat this type of step. We remove those two edges from this.
Now look at the rest of the edges with the rest of the weights, and look at the second smallest. The second smallest here is .2. Now .2 -- there are two possibilities. It could be from P1 -- of course it's not P1, we've removed those edges -- or it could be from P1 plus P2, or from P2. Of course P1 plus P2 would be bigger than P2, so this must be coming from P2. Okay. So by the same argument we create another permutation with probability .2. Let's sort again. The third value is .3. Now .3 is P1 plus P2, and we know the distinct subset sum property: if it equals P1 plus P2, it can't also be equal to any other sum P i1 up to P ik -- in particular, not to a new probability. That means it has to come from P1 plus P2, so we put these edges into those two permutations. In a sense we're almost done with this one. Now we look at the rest of the edges, and we come to this edge. Now, what are the options? Well, P1 plus P2 we already learned, so either it's that or it's P3. Again, in this case it's trivial, but in general what can happen is the following: at any point in the algorithm you have discovered P1, P2, up to Pk. The number you're looking at can be either one of the subset sums of the first k elements, or P k-plus-one. Two options: either you are discovering edges of some combination of permutations you've already seen, or you're discovering a new permutation. Okay. So that's it. In this case .7 cannot be P1 plus P2, so it has to be P3, and you discover it. And the last one, you know, is P2 plus P3, and that's it. That's the end of the algorithm. Any questions? >>: So what about the complexity of keeping these running subset sums and so on, right -- is there some efficient way of thinking about this to do it in polynomial time? >> Devavrat Shah: Subset sum by itself is a very hard problem, but for this you need to do it only approximately, and for approximation there are fully polynomial-time approximation schemes. Good question. >>: [inaudible]. >> Devavrat Shah: Sorry? >>: [inaudible] variables. >> Devavrat Shah: Exactly. That's right. >>: Thank you. >> Devavrat Shah: Okay. So that was the algorithm. Now, if you notice, there's not much about the structure of permutations that I've used here. The fact is that as long as you have these two properties -- the unique signature property and the subset sum, or linear independence, property -- this algorithm is going to work. Okay. And the theorem is that as long as those two conditions hold, the L0 optimization has a unique solution, that unique solution is equal to the original solution, and this algorithm will recover it. Okay. So now the question is how good or how bad this condition is. We know that these types of conditions are not going to hold adversarially -- I just presented a counterexample. Let's look at the next version, that is, randomly. The random model goes like this. Let's say I want to generate a model with sparsity K, that is, K nonzero elements. In the case of permutations, here's what I will do: I have N factorial possible permutations, and I'm going to choose K permutations uniformly at random out of them. And for each one of them I want to assign probabilities P1, P2, up to PK. Here's how I'll generate them: first I'll generate numbers uniformly at random from an interval A to B, then I'll normalize those numbers -- that is, divide each number by the sum of all of them -- and I will obtain probabilities, and that will be my model. The question is: when are these sufficient conditions satisfied? More precisely, for what values of K are the sufficient conditions satisfied with high probability,
so that we can recover the original X using the algorithm. Okay. Now, of course, we know that adversarially this is not going to be satisfied even for K equal to 3. But for the random model, here's what's going to happen. So I told you the marginal information setup: candidate i ranked in position j, for all i and j. In that situation, up to sparsity N log N, the sufficient conditions are satisfied with high probability, and hence the algorithm will be able to recover with high probability. For comparison data, this works up to log N. Now, this is a little bit depressing in some sense, because the amount of information you have in both cases is of order N squared, and here you're able to recover up to N log N while here you're recovering only up to log N. A small note about comparisons: if K equals 1, comparison data gives you all pairwise comparisons, so in some sense it's equal to the classical sorting problem. What we're saying here is that up to log N permutations can be sorted simultaneously using this comparison information. And in a second I'll tell you why I believe these are the right answers. Okay. And here is a bit of an interesting fact: in addition to comparisons, suppose I also give you the following. That is, as a polling person, I'm asking people to compare candidates, and I'll also ask some people: tell me your favorite candidate. Okay. So just with that extra bit of information you will now go from N log N to square root of N. All right. So now there are two questions. One is: are the thresholds that I just gave you really tight, or are they an artifact of my sufficient conditions? That's one. And the second is: what is the generic way of asking for this information, and what are the trade-offs depending on what question you ask? So first let me answer the first question. What I'm trying to show here is that the results we have are not just artifacts of our sufficient conditions; they are actually necessary. Specifically, under the random model, if your sparsity is scaling faster than N log N, then no algorithm -- whether it's polynomial or exponential, it doesn't matter -- will be able to recover the X with high probability. So it's not just L0; nothing will be able to do it. All right. So that's the first part. Sorry, this is a typo. Oh. No. Sorry. It's not a typo; I was reading this one. Okay. So what this says is that for marginal information, N log N is the recoverability threshold in this random sense: below it you can recover, above it you can't, and the achievability is with the algorithm I told you. For comparisons, I believe log N is the threshold. If you're interested, I can tell you offline why I believe this conjecture should be true. Now the question is: what about the various types of information, and what is the trade-off between the information and the recoverability threshold? So let's just think about what other types of questions you can ask in the case of elections. I asked you first about candidate i ranked in position j -- answer yes or no. Maybe now I can ask you more. That was like first-order information; now I can ask you second-order information: candidates i and i-prime ranked in positions j and j-prime -- now tell me what fraction of people believe in that kind of thing. That's more information: second-order marginals. Or maybe the K-th order marginal would be candidates i1 through iK ranked in positions j1, j2, through jK.
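As a sanity check on the peeling algorithm walked through a few slides back, here is a rough sketch (my own reconstruction, not the speaker's code) run on a three-candidate example chosen to be consistent with the .1/.2/.7 walkthrough in the talk; the specific rankings are my assumption, since the slide itself isn't reproduced here, but they satisfy the unique-signature and distinct-subset-sum conditions.

```python
# Peeling sketch: recover a sparse distribution over rankings from its
# first-order marginals, assuming unique signatures and distinct subset sums.
from itertools import combinations
import numpy as np

N = 3
# ranking[c] = position of candidate c (0-indexed); assumed example support.
truth = [((1, 0, 2), 0.1), ((0, 1, 2), 0.2), ((0, 2, 1), 0.7)]

# Observed first-order marginals: Y[c, p] = P(candidate c is in position p).
Y = np.zeros((N, N))
for ranking, prob in truth:
    for cand, pos in enumerate(ranking):
        Y[cand, pos] += prob

def subset_sums(probs):
    """Sums of all non-empty subsets of the discovered probabilities."""
    return {round(sum(c), 9)
            for r in range(1, len(probs) + 1)
            for c in combinations(probs, r)}

# Step 1: peel off probabilities, smallest observed edge value first. A value
# that is not a subset sum of already-discovered probabilities must be the
# probability of a new ranking (its unique-signature edge).
values = sorted({round(v, 9) for v in Y.ravel() if v > 1e-12})
probs = []
for v in values:
    if v not in subset_sums(probs):
        probs.append(v)

# Step 2: by distinct subset sums, each observed edge value identifies a
# unique subset of rankings; every ranking in that subset places this
# candidate in this position.
recovered = [dict() for _ in probs]            # ranking index -> {cand: pos}
for cand in range(N):
    for pos in range(N):
        v = round(Y[cand, pos], 9)
        if v <= 1e-12:
            continue
        for r in range(1, len(probs) + 1):
            for combo in combinations(range(len(probs)), r):
                if round(sum(probs[k] for k in combo), 9) == v:
                    for k in combo:
                        recovered[k][cand] = pos

print(probs)       # [0.1, 0.2, 0.7]
print(recovered)   # three candidate -> position maps matching `truth`
```

Sorting the observed marginal values and peeling off each value that is not already a subset sum of discovered probabilities recovers .1, .2 and .7, and decoding each edge's unique subset reassembles the three rankings, exactly mirroring the hand walkthrough in the talk.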
Now, for these higher-order marginals, the question is how the recoverability threshold changes. Yes, please. >>: So the recoverability threshold you said on the previous slide -- you said it's independent of your recovery condition, whether it's either normal or -- >> Devavrat Shah: The recoverability threshold has two parts, right? One is when you can recover, and one is when you cannot. When can you recover: that answer is up to K equal to N log N; the sufficient conditions will be satisfied, and hence our algorithm will recover it. When can you not recover: that's independent of the algorithm; it's like the converse in the information-theoretic sense. >>: For example, in the counterexample you gave, if you used the L2 norm instead of the L0 norm, the fourth thing would be the right answer, right -- giving them each probability .25? >> Devavrat Shah: No, it may not be the right answer if I started with -- if I generated that information using the X which puts .5 and .5 support on those two cases. >>: Got it. So basically there are two unique -- good. Good. >> Devavrat Shah: Okay. So in this situation, since it's underdetermined, there will be lots of solutions. So the question is whether your solution, the one that generated the data, is the sparsest and unique. And in some sense the recovery part says yes, that is the case, and the non-recovery part says, well, you can't do anything. All right. So here is the situation. Now, the question is how the threshold changes here. So here the threshold changes to N squared log N; here the threshold changes to N to the power K, log N. So there's a pattern. But, again, the question is: are these all the types of questions you can ask? Can't I ask some more complex questions? So here's a class of questions that I'm going to describe through a natural association, and these are essentially all classes of questions in some sense. Now, this type of question comes from looking at the representation theory of permutations. But let me explain what these types of questions are and what type of recoverability thresholds come up. So let's take the number N -- N is the number of candidates. Now you can take different partitions of the number N. By a partition I mean lambda 1, lambda 2, up to lambda r; let's say they're decreasing, and they sum up to N. Okay. So, for example, one partition would be N minus 1 comma 1 -- that's lambda 1 and lambda 2 summing to N. Or, more generally, this type of partition, and so on -- different types of partitions. Now, my claim is that this partition is associated to the first-order partial information, that is, candidate i ranked in position j. This partition is related to the K-th order marginal information, i.e., this type of information. More generally, there are different ways to associate them -- I'm not going to go into detail, but if you're interested I'll be happy to talk to you at the end. And in general, for every lambda there's a class of information that's associated with it. It's like you take a function and write down its transform; then different lambdas represent different bases of that function space. That's roughly the idea. Okay. So I'll just classify the types of partial information you can get: each lambda is associated to a type of partial information. The question is: how does the recoverability threshold depend on this lambda? Okay. So for that, let me define one more thing. Given lambda, a given type of partition of N, the number of ways you can partition N items according to this type of partition is the following:
N factorial divided by lambda 1 factorial times lambda 2 factorial, and so on -- this, if you remember, is from a classical combinatorics class. So, for example, if you have lambda equal to N minus K, 1, 1, up to 1, this formula is N factorial over N minus K factorial, which is essentially like N to the power K. You see the pattern: I told you N to the power K, and for K-th order marginal information the threshold was N to the power K, log N. So now, if you have D lambda, what should the threshold be? Think of K as constant -- what would the threshold be? D lambda log D lambda. So for a large class of lambdas, of partial information, this will be the recoverability threshold. If you have a lambda that distinguishes more partitions, you have more information and can recover a lot more; otherwise you can recover less. >>: What would be the lambda for comparisons? >> Devavrat Shah: Good. Comparison won't fall into this classification, but it will be implied, and hence the conjecture of log N. And because this is -- okay, so this is a theorem, that it's D lambda log D lambda. For comparisons you can take lambda equal to N minus 2 comma 1 comma 1 -- that is, K equals 2 here -- and then you can aggregate that information to obtain the comparisons. But it won't fit naturally into this classification; it's not one of those canonical basis functions. So that essentially brings me to the end of the talk. What I did today is I tried to do the following: I tried to understand when this sparsest-solution-based approach will work for the question of inferring rankings, where rankings are to be inferred based on partial information that's not engineered -- the measurement matrix is not engineered, it's given to you -- and the question is when you can recover. And what we tried to do is study this approach under a natural model, and for a large class of setups the natural model gives us an optimal recovery threshold. So there are simple sufficient conditions which tell you how you can recover, and beyond that threshold there's no way you can recover the support, or the exact support. Going forward, what I would like to do is the following: in the examples I pointed out -- the network measurement case -- clearly the network topology should play a role there, and when are the sufficient conditions satisfied and when are they not? And these simple sufficient conditions -- are they really optimal, or do they just happen to match the recovery threshold here? That would be interesting to know, and I don't know the answers. Okay. That's the end of my talk. Thank you. [applause]. >>: I think your condition is different from the RIP conditions. [inaudible] satisfied for all K columns, [inaudible] essentially for the particular subset. >> Devavrat Shah: Yes. >>: Your particular subset is [inaudible]. >> Devavrat Shah: Correct. >>: And on top of that, the thing that makes it work is [inaudible] is one, coefficients created once. Once. >> Devavrat Shah: Yes. >>: Is that [inaudible] to work? A certain condition, very dependent. I'm not sure where that -- >> Devavrat Shah: Okay. So there's one condition which is the unique signature condition, and there are two pieces to it. One is, as you said, the unique signature part -- a witness part. The second one is A being 0 and 1. The question is: is A being 0/1 necessary? That's the first one. The second question is: where does the subset sum property, or linear independence, come up? Linear independence comes up because it allows you to undeniably argue that, if you're going from smallest to biggest, you're doing the right thing.
So that's where it comes up. Now, as far as A being 0 and 1 is concerned, I don't think it will really matter, because one can modify things in a similar way. So, for example, if A is not 0 and 1 but, let's say, 0, 1 or 2, then you can add some clauses and make it work. But as it is, it may not work. >>: But it may not work for -- it worked for -- it might be generalized to work for 0 [inaudible] but may not be general [inaudible]. >> Devavrat Shah: Yeah, I think so. It may not. You're right. Yes. >>: So the natural model that you have -- you have a property of rankings, right? How does it translate to individual candidates? Does it translate into some heavy [inaudible] property, the first one's probability, or a property like candidate 1 is the highest ranked with probability X and candidate 2 is the highest ranked with probability Y? Is there some [inaudible] property for those? >> Devavrat Shah: Because here the model is uniform, for most of the questions that you asked -- that is, what is the probability candidate 1 is ranked in position 1, or candidate 1 is ranked in position 2 -- they'll be uniform under this. But you can think of other models; say my random model is like this: here is the identity permutation, and then I'm going to do perturbations around it. So then you can define -- let me call it a D-perturbation model -- saying that everybody naturally maps i to i, or maps uniformly to within plus or minus D of i. Now, under this random model, again, you can define appropriate thresholds. So, for example, for the marginal one, the recoverability threshold would be D log D, where D is the perturbation, and actually that will be tight. So for that model things will work out. So in some sense we've been in the situation of asking when this type of simple condition is not going to work, because it's sort of counterintuitive that these types of things work. So there must be some symmetry that we're using here that is giving the exact answers, but we haven't been able to find it. >> Jin Li: Any other questions? Let's thank the speaker for his talk. [applause]