>> Yuval Peres: The next talk will be by Navin. He's going to talk about some algorithms for independent component analysis. >> Navin Goyal: Thanks. This is joint work with Santosh Vempala and Ying Xiao from Georgia Tech. I'll talk about independent component analysis, or ICA. It's a problem that was first formulated in the signal processing and medical imaging community in the 1980s. Since then it has found connections in machine learning and statistics, and there is extensive literature on it. Before I describe the general ICA problem, I'll start with a toy problem that gives the flavor of it. It's called the cocktail party problem. It's a standard way to explain ICA. Suppose you have two people speaking simultaneously and you have two microphones recording their voices. Now you have these two recordings, and from these two recordings you would like to decouple what these people said. We would like to extract the voice of each person. Here is a possible mathematical model for this. It's a bit idealized, but it will serve as a prototype for the ICA problem. We model the voices of these two people by signals s1(t) and s2(t). s1(t) is a random variable which takes values at time steps 1, 2, 3 and so on. It has a distribution associated with it, and at each time step it independently samples from this distribution to get the value s1(t), and similarly for s2(t). These are the voices of the two people speaking. We don't observe these directly; what we observe is linear superpositions of these signals. One microphone records one superposition and the other records another superposition, and these are different because the microphones are located in different places. >>: [indiscernible] >> Navin Goyal: I'll come to that, yes. At each time step you sample s1(t) independently from the same distribution. s1 has one distribution and s2 has another distribution, and at each time step you independently sample. We don't know the aij's. We don't know where the speakers are located, so we don't know the aij's. We don't know the distribution of s1 or s2, and the problem is to recover the aij's and s given the samples of x1 and x2. This seems like an impossible problem; there is not sufficient data to solve it. That's true, so we have to either get more data or make more assumptions. The crucial assumption to make is that s1 and s2 are independent of each other, so not only is s1(1) independent of s1(2) and so on, but s1 and s2 are also independent of each other, which is reasonable in many applications: when two people are speaking simultaneously, their voices generally don't have much to do with each other. That's the simple version of the problem. We can describe what's happening here pictorially. Here I'm making a further assumption that s1 and s2 are uniform on the interval [-1, 1]. If you plot them, the situation looks like this: the uniform distribution on a square, and what we are actually observing is a linear transform of this square. It's a parallelogram, and a uniform distribution on it. From these points we want to recover what linear transform was used. At least in this situation you can see that if you have sufficiently many points, you can sort of learn the shape of this parallelogram and you can recover the linear transform. That's the general ICA problem. The difference now is that the number of microphones is n and the number of speakers is also n. We have this model x = As. x is what we observe.
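Before going on, here is a minimal simulation sketch of the two-speaker toy setup just described, assuming (as in the picture) that both sources are uniform on [-1, 1]. The mixing matrix and sample count below are arbitrary illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10_000                               # number of time steps (samples)
S = rng.uniform(-1, 1, size=(2, T))      # s1(t), s2(t): independent hidden sources
A = np.array([[0.9, 0.4],                # unknown mixing matrix (illustrative values)
              [0.3, 1.1]])
X = A @ S                                # x(t) = A s(t): what the two microphones record

# Plotted as points in the plane, the columns of S fill a square and the
# columns of X fill a parallelogram, which is the picture described above.
```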
s is the hidden signal and its components are independent of each other. A is a constant matrix which is non-singular, and the problem is to recover A, and possibly the distribution of s, given the distribution of x. Even now the problem is not actually well posed, and we have to make some further restrictions to make it well posed. One issue is this: suppose each component of s is standard Gaussian and the components are independent of each other. If A is a rotation matrix, any rotation matrix, then what you get back is, again, independent Gaussian components, so any rotation matrix is consistent with the data. The problem is not solvable in that case. We will just assume that none of the components of s is Gaussian. Another thing is that if I scale the first column of A by a factor of 2, say, and I scale s1 by a factor of one half, I get a different A and a different s, and these will still satisfy the equation x = As. I cannot learn the scalings of these columns or the scaling of s, so we will allow for that indeterminacy. Similarly, if I permute the columns of A by some permutation and correspondingly permute the components of s, I again get x = As for these new A and s. Again, I cannot learn this permutation, so we will allow for these two indeterminacies. But that's all. There is a theorem from the 1950s, the Darmois-Skitovich theorem, that implies that given this assumption, and if you allow these indeterminacies, then the problem is well posed: you can solve it given the distribution of x. Let me just remark that the dictionary learning problem is very similar to this. It has the same model x = As, but the modeling assumption there is that the components of s are not necessarily independent but they are sparse, so it's useful for a different set of applications. I'm just restating the problem here. We want to recover A and s given a polynomial number of independent samples of x. We don't get the distribution of x itself. That's the ICA problem. Before I describe our contributions, I'll show you some techniques that are known for this problem. One can try to use PCA, principal component analysis, for this. Consider this special case: suppose your data points are in R^n, or R^2 in this case, and suppose they are uniform in this rectangle. Then you can try to use principal component analysis to find the direction of maximum variance, or maximum second moment, and that gives you this direction. This is the direction of maximum variance, and it is also the direction of an independent component. Similarly, you can find this other direction. In this particular case you can just use PCA to solve the ICA problem. But it's not all that useful: if your rectangle was a square, then the second moment looks the same in all directions, so it doesn't give you any information about the independent components. The independent components are these two. But this does suggest trying higher moments, and interestingly, that works. Take the fourth moment in direction v, for a unit vector v; the fourth moment is just the expectation of (v.x)^4. This picture is showing the fourth moment: in this direction, this is the fourth moment. The local minima of f(v) precisely correspond to the facets of the cube, and they can be efficiently found using a gradient descent type algorithm. And we can estimate the fourth moment quite well just using the empirical moment from the samples that we have.
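As a rough illustration of this fourth-moment idea (not the speaker's exact procedure), here is a hedged sketch: it minimizes the empirical directional fourth moment f(v) = E[(v.x)^4] over unit vectors by projected gradient descent, assuming X is an (n, T) array of samples of x.

```python
import numpy as np

def fourth_moment_direction(X, steps=500, lr=0.1, seed=0):
    """Find a local minimizer of the empirical f(v) = mean((v . x)^4) on the unit sphere."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        proj = v @ X                          # (T,) values of v . x_t
        grad = 4 * (proj ** 3) @ X.T / T      # gradient of the empirical fourth moment
        v -= lr * grad                        # gradient descent step
        v /= np.linalg.norm(v)                # project back onto the unit sphere
    return v

# Restarting from several random seeds gives candidate independent directions;
# the guarantee in the talk is only for suitable non-Gaussian sources, and in
# practice one would typically whiten X first.
```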
In this case the problem is reasonably well solved. It doesn't handle all possible distributions for s, but it handles a large class of them. What remains to be done? There is a more general version of ICA which is called underdetermined ICA. The difference here is that the dimensions of the x vector and the s vector are not necessarily the same. The dimension of x can be smaller than the dimension of s, and otherwise it's the same setup. Now the problem is, again, to recover A. We cannot recover s now, because there is not sufficient information. We are given, in some sense, even less data compared to what we were given before, because x has smaller dimension. So what I just showed you doesn't seem to apply here. Here is a picture of a special case. Suppose m is equal to 3 and s1, s2, s3 are uniform on intervals; then the joint distribution looks like a cube, the uniform distribution on a cube. And suppose n is 2; then the data that we get is some distribution on this geometric shape, the projection of the cube. From this shape you want to determine the orientation of the cube. Here is the main result about underdetermined ICA. There was some previous work, but it applied only to some special cases. Our result is pretty general, but I'll only state it in a special case for simplicity. n, the dimension of x, is at least 2, and m, the dimension of s, is more than n and at most n squared over 10. Suppose the columns of A have unit norm, which is without loss of generality because we can't learn the scaling of the columns. Any two columns of A are linearly independent, and the si are far from being Gaussian; there is a way to quantify that, but I will not write it here. The first eight moments of each si are bounded. If these things hold, then our algorithm can estimate the columns of A within L2 additive error epsilon using a number of samples which is polynomial in n, 1/epsilon, and 1 over sigma_min of this power of A, which I'll shortly describe. It's a polynomial time algorithm, except for an [indiscernible], and the running time is also a similar [indiscernible]. Let me describe what this power of A is. It is called the Khatri-Rao square of A. If your matrix A looks like this, these are the columns, then the Khatri-Rao square is obtained by taking the Khatri-Rao square of each column individually, and that gives you a new matrix. What is the Khatri-Rao square of an individual column? It is basically just the tensor square of the column, written out as a vector, so it's an n squared dimensional vector: we are taking all possible pairs of entries of the column here. If this was an n by n matrix, this is an n squared by n matrix. I've just restated the result here; you have sigma_min of the Khatri-Rao square of A here. It's still probably not clear what this means. It is a hard-to-interpret quantity because it involves a funny operation on the columns of A. We have one result which I think gives some meaning to what this means. You can think of the matrix A as being generated by a smoothed analysis type of distribution. Suppose you start with any matrix A of these dimensions, n by at most n squared over 10, and then you perturb it by adding independent Gaussian noise N(0, sigma squared) to each entry. Let's assume that this perturbed matrix is actually the input [indiscernible]. Then we can show that sigma_min of the Khatri-Rao square of this perturbed matrix is small only with small probability; in other words, with high probability it's not too small. That gives us success probability at least 1 minus 2 over n, where the probability is over the choice of the noise as well as the random samples.
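For concreteness, here is a small sketch of the Khatri-Rao square just described and of the sigma_min quantity appearing in the bound. The dimensions chosen below are arbitrary, and this only illustrates the definition, not the algorithm.

```python
import numpy as np

def khatri_rao_square(A):
    """Column-wise tensor square: column i of the result is A[:, i] (x) A[:, i], flattened."""
    n, m = A.shape
    return np.column_stack([np.kron(A[:, i], A[:, i]) for i in range(m)])

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))                      # illustrative n = 6, m = 3
A /= np.linalg.norm(A, axis=0)                       # unit-norm columns, as in the theorem
K = khatri_rao_square(A)                             # shape (36, 3)
sigma_min = np.linalg.svd(K, compute_uv=False)[-1]
# For a randomly perturbed (smoothed) A, sigma_min is typically not too small,
# which is what makes the sample-complexity bound meaningful.
```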
Our algorithm finds the columns of A plus N within small additive error, with one over sigma squared appearing here now. I think that makes more sense. In the remaining time I will talk about the algorithm for this, for underdetermined ICA. As I said before, the local optimization approach doesn't seem to generalize to the m > n case. What we'll do is build upon another method, due to Yeredor, which was given for m equal to n, and we'll generalize it to m > n. I'll first describe Yeredor's algorithm for m equal to n and then I'll describe the generalization. These are standard notions in probability theory; x, s and A are just as before. For a random variable x we can define its moment generating function as the expectation of e^(u^T x). If you take the logarithm of that, that gives us the CGF, the cumulant generating function. Similarly, we can define these functions for the random variable s; it's the same definitions. The crucial point here is that the moment generating function of s, because the components of s are independent, factorizes into the moment generating functions of the individual components. Similarly, the CGF decomposes into a sum over the individual components. This fact will be crucial in what we do next. Assume that the variables t and u are related by the linear relation t = A^T u. Then e^(u^T x) is equal to e^(u^T A s), just by our ICA model, and this is equal to e^(t^T s) by our assumption. That gives us that the CGF of x evaluated at u is equal to the CGF of s evaluated at t. Now, just by basic calculus, we can understand how the derivatives of the CGF change under a linear transform, so we can write this relation: the Hessian matrix of c_x with respect to u is equal to the matrix A times the Hessian matrix of c_s with respect to t times A transpose. You just check how derivatives change under a linear change of variables, and that gives you this relation. Now we have a polynomial number of samples of x, and using those we can estimate all the entries of this Hessian accurately. But we don't know anything on the right-hand side. We do have one piece of information about it, which is that this Hessian matrix is a diagonal matrix, and here we are using the fact that c_s decomposes as a sum of the CGFs of the individual components of s; that directly gives us that only the diagonal entries survive. We would like to exploit this information to compute A. Here is a very simple idea. You sample u and u' in R^n uniformly from the unit sphere, evaluate your Hessians at u and u', and take the ratio. We already saw this relation. Now, since A is an invertible matrix, A transpose cancels with A transpose inverse here, and what we are left with is this quantity. The quantity here is a diagonal matrix, so what we get is that this thing we can compute is equal to A times a diagonal matrix times A inverse. And this is just an eigenvalue-eigenvector equation, so we can compute A: its columns are the eigenvectors of this matrix. That solves the problem. The eigendecomposition is unique if the diagonal entries here are pairwise distinct, and that's why we use the randomness of u and u'. We need to show that pairwise these entries are distinct, and actually far away from each other, and that allows us to find the eigendecomposition.
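Here is a hedged sketch of that m = n scheme, assuming X is an (n, T) array of samples of x: estimate the Hessian of the empirical cumulant generating function at two random points, form the ratio H(u) H(u')^{-1} = A D A^{-1}, and read off candidate columns of A (up to order and scale) as eigenvectors. This is only an illustration of the idea just described, not the paper's exact algorithm or its error analysis.

```python
import numpy as np

def cgf_hessian(X, u):
    """Hessian of c(u) = log mean_t exp(u . x_t), estimated from the samples in X."""
    logits = u @ X                               # (T,)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                 # exponentially tilted weights
    mean = X @ w                                 # weighted mean, shape (n,)
    second = (X * w) @ X.T                       # weighted second moment, shape (n, n)
    return second - np.outer(mean, mean)

def recover_mixing_matrix(X, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    u = rng.standard_normal(n); u /= np.linalg.norm(u)
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    M = cgf_hessian(X, u) @ np.linalg.inv(cgf_hessian(X, v))
    # M = A D A^{-1} with D diagonal, so the eigenvectors of M are the columns of A
    # up to permutation and scaling (the allowed indeterminacies).
    _, eigvecs = np.linalg.eig(M)
    return eigvecs
```

On data like the two-speaker simulation above, this heuristically returns noisy scalar multiples of the columns of A.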
That was for m equal to n, and now I would like to generalize it to underdetermined ICA. We'll try to proceed in a similar way, but things will break down, and we will see where. We still have t = A^T u, and we can still write, as before, the Hessian matrix of c with respect to u, evaluate it at u and u', and then try to take this ratio. It is fine up to this point, but this is incorrect because A is not a square matrix. It is not an invertible matrix, so A transpose and A transpose inverse do not cancel, so this doesn't work. We can try to take a pseudo-inverse here, but that also won't work. Here is the main new idea. We'll use higher derivatives of c. We take the fourth derivative, so this is a tensor now: an order-4 tensor, n by n by n by n, but we write it as an n squared by n squared matrix; we just flatten the tensor into a matrix, just like you can flatten an n by n matrix into an n squared dimensional vector. This is the Khatri-Rao square of A, times a diagonal matrix of fourth derivatives, times the Khatri-Rao square of A transposed. It is easy to obtain this relation. Now I again try to proceed in a similar way. We can take the inverse, similar to what we were doing before, but again we have the problem of this not being a square matrix. But this time, if we take a pseudo-inverse for the second factor, then things go through. This is indeed an identity matrix if we take the pseudo-inverse here, and so what we get is very similar to what we got before, and now we can obtain the columns of the Khatri-Rao square of A by an eigenvector computation. Once you have the columns of this matrix, you can also get the columns of A easily. That's basically the whole algorithm. There are several technical challenges in making this work. One of them is that we have to show that these diagonal entries are far from each other in order for the eigenvector computation to be effective, and all these diagonalizations only happen in approximate settings, so you have to take care of all of those errors. That's it. I will finish with an open problem. Our algorithm runs in polynomial time if m is at most n squared, or in fact any polynomial in n. But suppose n is very small, n is a constant, and m is a growing number. Then our algorithm can take super-constant time. We don't have a lower bound proving that that's essential. That's an open problem here. Thank you. [applause] >>: I want to ask a question. It gets a little more interesting when you consider that the matrix is not a transfer function, because then there is a fresh equation between my signal and the microphone, and then the correlation within the signal affects the result; but in the same way that you have a vein of certainty, you could have widening of the spectrum. I wonder if you have looked more into [indiscernible] to see how these [indiscernible] >> Navin Goyal: No, I don't know much about these. There are these models, but I don't know. >>: There is a little bit of work here in the speech group, but I think they are not looking much into that. That might be interesting to visit. It's still an interesting problem that we would like to solve. [applause] >> Yuval Peres: The last talk will be from Anup, who is coming from across the lake, I guess. He will talk about lower bounds on multiparty communication. >> Anup Rao: Thanks.
I recognize that it's the last talk of the day and you are tired and I'm tired, so I'm just going to show you a bunch of animations that I made in the last few days and then we'll call it good. Before I start talking about the complicated title, I'm going to tell you a little bit about why I care about these questions. I think it's kind of important, because maybe a lot of people don't know why those questions are important. Here is, for me, the biggest thorn in my side as a complexity theorist, which is that I have no idea how to answer this kind of question: if someone tells me, here's a problem that I'm solving, I have an algorithm for it, is my algorithm the best you can do? I have no idea how to answer that question. You can make this concrete any way you want. Pick any nontrivial algorithm that you know about. Is the best algorithm that we know for finding matchings in graphs optimal? We don't know. Is the best algorithm for matrix multiplication that we have optimal? We don't know. The same goes for almost all algorithms that we have, and of course the famous P versus NP question is also of this format: it's just asking whether the best algorithm that we can come up with for SAT is optimal, and again, we don't really know. Here are some things that we do know about proving when algorithms are optimal. We know that for many tasks, if you have a linear time algorithm, then that's about as good as you can do, because you need to read the input and that takes linear time, so you can't do better than linear time. That's one thing we know. And then there's the technique of diagonalization, which is really clever and allows you to say that for certain kinds of tasks which succumb to this technique, the obvious algorithm is the optimal one. For example, if you're given a program and an input and you want to know, does this program stop in T steps, then essentially what you should do is run the program for T steps and see if it stops; that's about all you can do, and you can't do better than that. That's about all the techniques we actually have for proving lower bounds on running time. People have worked on this question, and most of the work has gone into studying all kinds of restrictions of the concept of what an algorithm is. You can restrict it like this, restrict it like that, and you get all kinds of questions, and then you answer those questions, and there's a long list of such results. First, I want to start by convincing you that this question about algorithms is essentially a question about communication. That's because of the following picture, which took me a long time to make. You can see that the internet helped a lot; I still had to do things. What it says is that if you have an algorithm that computes some function f that maps n bits to 1 bit, then you actually obtain a communication protocol. Suppose the algorithm runs in time T and computes the function f; then you can show that there is a protocol with about T players that participate in the protocol. Each player knows one of the bits of the input, each player during the protocol sends 1 bit to some set of players, and every player here receives at most 2 bits. Let me just show you an example. This guy, based on his bit, might send something to these two people. This person might send something to these two people. Now these players know some information.
They use it to compute some bit and send that to some other people, and in this way, in every step, somebody speaks and sends some bit to somebody. Eventually, after all of the communication is done, someone in the protocol magically knows the value of the function that we are trying to compute. That's something we can show: if there is an algorithm for f, then you have a protocol that looks like this. Question? >>: [indiscernible] >> Anup Rao: It's a circuit. This is basically a circuit. This is essentially equivalent to the concept of what an algorithm is, but it's really hard to prove lower bounds against this thing, and one of the reasons why it's really hard is that the channels involved here are private channels. Somehow, the reason that this protocol is weak is that some of the parties know some pieces of information and other people know other things, and you need to exploit the fact that not everybody knows everything to prove that such a protocol can't work quickly. That's something we basically have no idea how to do; I don't know any way to exploit that weakness. On the other hand, if I allowed all of the players to communicate by broadcast, just shout out a bit that everybody can hear, then you can compute anything with n players, because each person just announces one bit and then someone knows all the bits; we're done. Then it becomes a meaningless model. I need to somehow understand what that weakness is in the private channels to prove lower bounds against them. But there are settings where communicating by broadcast can be something that's meaningful to exploit. For example, here is a result that was proved by Valiant in the '70s, made more explicit by Rubich [phonetic] and myself in a paper that we did recently. It says that suppose f can be computed in parallel time log n and total work n, so you have some number of processors that are computing the function, and the total number of instructions that are executed is order n but the total time it takes is only order log n; this is a log depth circuit of linear size. Then you actually get from that a protocol with little o of n players, where each player knows only a very small fraction of the bits, say a .1 fraction of the bits of the input, and the players communicate now by broadcast. The first player looks at some small fraction of the bits, says something. The second looks at some other fraction of the bits, says something. All of the communication is by broadcast, and eventually one of these parties knows the value that the function outputs. >>: Who decides which player [indiscernible] >> Anup Rao: Ahead of time, based on the function, the players decide: these are the bits that I know. >>: [indiscernible] >> Anup Rao: Yeah. And then they start communicating. Everybody decides ahead of time what bits they know. >>: Someone must be fixed, right? One person decides [indiscernible] >> Anup Rao: I'm not sure what you mean. >>: Someone knows f of x, but that person is fixed? >> Anup Rao: That person is fixed, yes. The last person in the sequence: these people all speak in this order, and the last person knows f of x. Roughly speaking, these players each correspond to a line of the program that's being run in parallel, but not all the lines, because there are n lines and there are only n over log log n players; there is a subset of the lines that are kind of the important ones, and those are the ones that are simulated by these players.
This is something that I can hope to prove lower bounds against, because we have techniques that work against communication that happens by broadcast. Let me give you some idea of what we know how to prove about communication. Here are some easy results. Many of the proofs I mention here I won't actually show you, but they are all really short. One thing we can prove, for example, is this: suppose you have just two parties and each of them has a subset of 1 through n, so it's an n-bit string, and they want to know, are the sets the same? That's something you can do obviously by having one party send the other party his set. That takes n bits, and this is the best you can do; there is nothing better. That's easy to prove; it's an exercise. Another thing that you can show requires n bits of communication is to compute the generalized inner product of these vectors mod 2, which is the same as asking: is the intersection of x and y of even size? That also requires n bits of communication. It also requires n bits of communication to tell whether x and y are disjoint or not. All these things are easy to prove, but those kinds of results seem quite different from where we want to get eventually, because we want to understand this situation. The problem is that in this situation the players have a lot of overlap in the input: they see a lot of bits that are the same, and that's quite different from the two-party situation where they have independent inputs. Let me show you why that's different. Suppose we now have three parties, each party has a set, and they want to tell if all the sets are equal. Then again, it's really easy to show that this requires omega of n bits of communication. But what if each party knows two of the sets? How many bits of communication does it take to tell whether the sets are the same or not? Two bits. And why is that? >>: [indiscernible] >> Anup Rao: Yeah. You just need one person to verify that x is equal to y and another to verify that y is equal to z, and then you know that all three must be the same. Suddenly, the communication that you need drops dramatically for this problem. It's hard to prove lower bounds when the parties start to have information that they see in common. Nevertheless, we do have methods that work against this kind of model. The really beautiful paper of Babai, Nisan and Szegedy in the early '90s showed that if you want to compute the generalized inner product of x, y and z, then that still requires n bits of communication in this model. The subject of this talk is disjointness: for a long time it seemed that that technique did not work when you're trying to tell whether the sets are disjoint or not, and that's exactly what we worked on. There are several reasons why this particular function is interesting. In theory, understanding it has applications to proof complexity and other applications in complexity that I won't get into. The main motivation was that I want to find a technique that works eventually for the model that's related to algorithms, and this seemed like something we have to understand first; this seems like an easier problem. Here's roughly what the story is for this problem, and our result is at the bottom here. Basically there has been a long line of work that used increasingly complicated techniques, until our work, which uses basically the same idea as BNS, Babai-Nisan-Szegedy.
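To make the two-bit observation above concrete, here is a toy sketch of the three-party number-on-forehead equality protocol (an illustration only, not anything from the paper): the player who sees x and y announces one bit, the player who sees y and z announces another, and the AND of the two bits tells everyone whether all three sets are equal.

```python
def three_party_equality(x, y, z):
    """Two-bit NOF protocol: x, y, z are Python sets; each bit is computed by a player
    who does not see the remaining set."""
    bit1 = int(x == y)      # announced by the player whose forehead holds z
    bit2 = int(y == z)      # announced by the player whose forehead holds x
    return bit1 & bit2      # 1 exactly when x = y = z

assert three_party_equality({1, 4}, {1, 4}, {1, 4}) == 1
assert three_party_equality({1, 4}, {1, 4}, {2, 4}) == 0
```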
Eventually what we managed to show is that if you have k parties and k sets, each party knows k-1 of the sets, and they want to know whether the sets are disjoint or not, then they need to communicate at least n over 4 to the k bits, where the universe is of size n. And that has more or less the right dependence on n and k, because there's an upper bound: a really clever upper bound of Grolmusz, who showed that you can compute this in about k squared times n over 2 to the k bits of communication. >>: Are the lower bounds subject to randomized [indiscernible] >> Anup Rao: Yes. All of these lower bounds also apply to randomized communication, and our proof of the deterministic lower bound is really simple. The proof of the randomized lower bound is not so simple, but there we simplify the best proof known before, which was the work of Sherstov, giving a lower bound of roughly square root of n. We reduce the complexity of that proof, but we don't improve the bound; we get exactly the same bound. >>: But this is deterministic? >> Anup Rao: The upper bound is deterministic, yeah. That square root of n lower bound is really interesting, because there is a quantum upper bound that matches it, and the lower bound proofs also work for quantum, so for quantum this lower bound is tight. Somehow, if you're going to prove a better lower bound, you need to distinguish randomized communication from quantum communication. When did I start? I forget. >>: You can go for 5 or 6 minutes. >> Anup Rao: Okay. Now I will quickly outline how the proof goes. I won't really show you many details; I will just give you some starting points. We are in the setting where we have k parties, there are k sets, each party knows k-1 of the sets, and they want to know if the sets are disjoint or not. Roughly, the way these kinds of proofs usually work is that you find some hard distribution on the sets and argue that on this distribution the players cannot succeed with high probability. I now define a distribution on the sets. Let's partition the universe into m parts, where each part is going to be of size roughly 4 to the k. Within each part I'll tell you how to sample all of the k sets. You sample the first k-1 of them uniformly at random, conditioned on the event that those k-1 sets intersect in exactly 1 point, and the last set is just a completely random set. That's how you sample the sets in the first part of the universe, and you do exactly the same in every part of the universe. >>: What is the universe? >> Anup Rao: The universe is n elements, and I am describing how to sample the sets x1, x2 through xk as subsets of these n elements. You take the n elements and break them up into m parts. In the first part, you sample the first k-1 sets so they intersect in exactly one of those elements, and then the last set is random. The chance of having an intersection of all k sets in the first part is exactly 50 percent: it's the chance that the random set contains the one point where the rest intersect. And you do this independently in all of these m parts of the universe. That's the distribution. It's a really strange distribution, because with extremely high probability there will be an intersection of these k sets: the chance of an intersection in each part is a half, so with good probability there is going to be an intersection somewhere.
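Here is a hedged sketch of a sampler for this hard distribution, under the reading just given: the universe is split into m blocks of size B (roughly 4^k in the talk); within each block the first k-1 sets are uniform conditioned on meeting in exactly one point (sampled directly by choosing the common point and then giving every other point a membership pattern that avoids "in all k-1 sets"), and the k-th set is a uniformly random subset of the block.

```python
import random

def sample_block(block, k, rng):
    """Sample k subsets of `block`: the first k-1 meet in exactly one point, the last is uniform."""
    sets = [set() for _ in range(k)]
    common = rng.choice(block)
    for s in sets[:k - 1]:
        s.add(common)
    full = (1 << (k - 1)) - 1                       # pattern "in all of the first k-1 sets"
    for point in block:
        if point == common:
            continue
        pattern = full
        while pattern == full:                      # forbid a second common point
            pattern = rng.randrange(1 << (k - 1))
        for i in range(k - 1):
            if (pattern >> i) & 1:
                sets[i].add(point)
    sets[k - 1] = {p for p in block if rng.random() < 0.5}   # last set: uniformly random
    return sets

def sample_instance(m, B, k, seed=0):
    rng = random.Random(seed)
    sets = [set() for _ in range(k)]
    for b in range(m):
        block = list(range(b * B, (b + 1) * B))
        for s, part in zip(sets, sample_block(block, k, rng)):
            s |= part
    return sets    # in each block, all k sets meet with probability exactly 1/2
```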
Nevertheless, it's useful for the analysis, which is, I think, what makes the whole proof really counterintuitive, and maybe it's why people missed this proof before. >>: The issue is you are not allowing any probability of error. >> Anup Rao: Yes. Although later we will also use the same kind of distribution for the randomized lower bound, and there we will allow error, but then we will have to do something more complicated. For now we are not allowing error. So let D_i be the random variable that indicates whether or not there's an intersection in the i-th part of the universe. That's a random bit. Here's a theorem that occurs somewhere in one of the older proofs; it's due to Sherstov. He showed that if pi is a function computed by a protocol with communication c, then this expression holds: the expected value of pi on these sets times, basically, the parity of these D_i's, the absolute value of that expectation, is small; it's bounded by 2 to the c minus 2m. In some sense this says the protocol pi is bad at computing the parity of the D_i's. But if you are trying to prove a lower bound, that's exactly what pi is trying to compute anyway: it's trying to compute disjointness, which is just like computing whether all of the D_i's are zero. What we can prove is that it's bad at computing the parity of the D_i's, but in the deterministic case, basically, what we observed is that that's enough. If pi is a deterministic protocol and it makes no errors, then when the sets are disjoint, pi is equal to one, and all of these factors are equal to one, so the whole term is just pi. And when pi is equal to zero, everything is zero in this product. That means that this left-hand side, if pi computes disjointness, is exactly equal to 2 to the minus m: it's just the probability that the sets are all disjoint. The left-hand side is 2 to the minus m, the right-hand side is this, you rearrange it, and you get that the communication must be at least m. It's kind of confusing that something like this can happen, but it's actually really simple; this is pretty much all there is. >>: [indiscernible] >> Anup Rao: This thing is basically the same idea as BNS, which I guess I don't have time to get into, but it's a two-page proof: you repeatedly apply Cauchy-Schwarz to this expression until things become nice, is all I can say. I don't have time to explain it now, but it's just Cauchy-Schwarz. >>: [indiscernible] >> Anup Rao: Let me give you an outline of how the randomized lower bound goes. We can't use this kind of argument anymore, because here I really used the fact that pi does not make any errors; if pi makes an error, then there could be places where pi is equal to one and this term is not fixed. But nevertheless, here's how the proof goes. You define this function f(j), which takes an input j and is simply the probability that the protocol outputs 1 when the number of blocks that are disjoint is j. That's the function f. Then you can use the same kind of bound that we saw in the last slide to get an algebraic expression for f. It's not too hard to get there, but it says that if f(j) is defined as above, then it must be the case that for every r bigger than c, this expression, f of the sum of these D_1 through D_r times m over r, times this parity, is small in expectation.
This is a kind of variant analogous to what we were seeing in the previous slide. This is an expression that corresponds to pi, but I'm not going to tell you how we got there. And at this point we just completely forget the fact that we are working with protocols. It turns out that you can use this fact to show that f can be approximated by a degree c polynomial. That's something that takes a couple of pages to prove, and it's again a little bit tricky; it's not that straightforward. And once you know that fact: if f came from a protocol that computes disjointness, then f is supposed to look something like this, right? The probability that the protocol outputs one is supposed to be close to zero when the number of disjoint blocks is less than m, and only when all the blocks are disjoint is it supposed to accept. So f should look something like this: close to zero at all these points and then suddenly spiking up to something close to one. But it was proved by Nisan and Szegedy that any such polynomial must have degree about square root of m. We know that f can be approximated by a degree c polynomial, and any such polynomial must have degree about square root of m, so that shows that c is roughly at least square root of m. I guess I dropped the k terms [indiscernible]. That's roughly what happens in the proof. I'll stop there, and I will just spend a couple of minutes telling you what I think we should do next. I think the big open problem here is this: all the lower bounds we know how to prove for k parties in this setup are of the type n over 2 to the k, and the big open problem is to prove a lower bound that is really like n, or n over k, or n over a polynomial in k. And here's a candidate problem that I think should be hard for this model. Imagine that the input is k matchings, so there is one matching between these two sets of vertices, another matching between these two, another matching between these two, and so on. What we want to know is: if you start at this designated vertex and keep walking clockwise for n over a hundred steps, where n is the number of vertices in each part here, do we end up at an odd vertex or an even vertex? The trivial way to do it is, you know, figure out where this edge is going, then figure out where the next edge is going, and so on. And there should be essentially no better way to do it than that with communication. That's a candidate, but I will warn you that there is a really clever protocol that can do something non-trivial. You can try and figure it out for yourself, but it's pretty hard. It turns out that if you want to do just one round, if you want to do just three steps and there are three players, there's a protocol that can figure out where the third step goes in little o of n communication. There is something non-trivial that you can do there. But I don't believe that that protocol can be extended to get something that figures out where a really long path goes. That's also basically related to what I think is a hard problem for this question: for parallel algorithms, given a directed graph where every vertex has out-degree 1, it should be hard to tell whether vertex 1 is connected to vertex 2 simultaneously in log depth and linear work. I'll stop there. [applause] >> Yuval Peres: Questions? >>: I want to ask a question. Can you go back to [indiscernible].
Everybody has everything except one, like player i has everything except x_i? >> Anup Rao: Yeah, sure. Every player sees basically all but one matching; the first player sees the matchings from this to this end, he just doesn't know what the matching is between these two layers. >> Yuval Peres: Okay. Let's thank Anup again. [applause]