>> David Wilson: So I'm pleased to introduce our next speaker, Ioana Dumitriu, from the University of Washington, where she's an associate professor in the math department. She got her Ph.D. at MIT in 2003 with Alan Edelman, who we heard about in the preceding talk. After that she was a Miller Fellow at U.C. Berkeley until 2006, and then came to UW. She's worked in various fields: probability perhaps most recently, but also numerical analysis, combinatorics, and linear algebra. And she's going to tell us about a regular stochastic block model. >> Ioana Dumitriu: Thank you for the very kind introduction. I'm sorry, but I'm going to have to correct you: it's not the same Edelman. The one in the preceding talk is actually not Alan Edelman; it's somebody else. Edelman-Greene, is that what you were -- >> David Wilson: Oh, really. >> Ioana Dumitriu: It's a different Edelman. It seems to be a popular name. All right. So it is my great pleasure to be talking about an ongoing problem that we're working on at UW. This is joint work with my students Gerandy Brito and Shirshendu Ganguly and with my colleague Chris Hoffman, and for part of it we were joined by Linh Tran, who was a postdoc at UW up until this year. So we're talking about a regular stochastic block model. Whoops. I guess this one. Or not. All right. So I'm going to give you a quick overview: I'm going to tell you what I'm going to tell you, then I'm going to tell you, and then I'm going to tell you what I told you. I'll give you a quick introduction to the field. Actually, it's not so quick; it's probably going to be somewhere between a third and a half of the talk, about the independent-edge binary SBM, which will be partly covered in the introduction as well. Then I will tell you how you change that definition to make it regular, and I will talk about the issues that arise: why certain things become easier, why certain things are slightly harder, but why, overall, the problem seems much more tractable. And I'll tell you about current and future work. All right. So let me, without further ado, talk about the clustering problem. This is a very well-studied problem. Some would say that it has been studied to death, but it's still not solved completely, so it's still worth talking about; in fact, it's still interesting to people. The input is a network with clusters with some sort of properties, possibly overlapping, although I won't be talking about overlaps today. And the goal is to be able, through some algorithm, to detect or recover these clusters accurately and efficiently. I'll talk about what it means to detect and recover in a moment. It has applications in many places: machine learning, community detection, synchronization, channel transmission, and so on. And there are many questions that are still open and quite subtle, especially in the case when overlap is possible, which again is not going to be the case I'll be talking about today. There's a huge body of work, as I mentioned, and it overlaps many fields: optimization, EE, theoretical computer science, and math. There are two approaches that one can take to studying this problem. One of them is to study actual networks and try to see if your algorithm can actually detect the clusters. This is Zachary's network, the network that describes the interactions between members of a karate club that at some point suffered a division: some people went with the old instructor and some people moved to the new one, and essentially this picture describes how the secession took place.
This is the group that went with, or rather stayed with, the instructor, and this is the group that seceded. This is a network that's been really studied to death. It's used as a benchmark for pretty much any algorithm: look, my algorithm can actually cluster Zachary's network correctly, and so on. The other approach is to focus on studying idealized models of networks, like, for example, ones produced by the Erdős-Rényi model, and this is known as the problem of studying spherical cows. That's what we're going to be doing. So here's the stochastic block model, or SBM for short, also known as the planted partition model, if you've seen that in the literature. Essentially it says the following thing; it's very easy to describe. You consider k Erdős-Rényi graphs with sizes n_i and probabilities p_i, independent and nonoverlapping. And on top of that you put, on the join of the graphs, a multipartite Erdős-Rényi graph; by multipartite I mean basically that any two vertices not in the same cluster get joined by an edge with probability q. So you have some k clusters, and then outside of them you put a multipartite Erdős-Rényi graph with some given probability. The question then becomes: under what sort of conditions on all of these parameters (this should have been a lowercase k, sorry) can one recover or detect the presence of the partition? Recovery here is generally understood as complete recovery, although sometimes weak recovery is considered, where you expect to recover all but sublinearly many of the nodes. Detection means that you're able to say whether you believe there's some structure like this at work or whether the network is just random. Okay. So the possibility of recovery has been studied generally via the maximum likelihood estimator and convex relaxations thereof. Recently, though, there's a new approach, which uses structured-matrix decomposition algorithms: you think of the adjacency matrix as being sparse plus low rank, the sparse part being the outside edges, and the low-rank part actually identifying the clusters, being essentially the matrix that corresponds to taking, instead of clusters, cliques on the same nodes. One of the references here is Vinayak, Oymak, and Hassibi. The most general analysis, via impossibility with information-theoretic bounds plus a convex relaxation of the maximum likelihood estimator, is fairly new. It's actually very new: it appears in a paper by Chen and Xu from 2014, and it gives various order-sharp bounds, which means they don't get the constants, but the order of the thresholds is found. For illustration: the only case that has so far really been solved in terms of sharp thresholds is the two-equal-cluster binary case, and one can say that most of the work has been done by an alumnus of the Microsoft group and two ex-students. I'm referring to Elchanan Mossel, Joe Neeman, and Allan Sly. So here's a dictionary of terms. I will talk about a strong recovery regime when there are algorithms that will give you the partition completely. The weak recovery regime means you don't get it exactly, but you get essentially all but sublinearly many nodes labeled correctly, and the others may be mislabeled. There's an approximation regime, where you get a fraction of the nodes correct; the fraction is bigger than 50 percent, but the rest may be mislabeled.
There's a detection regime, where you can guarantee to get just above 50 percent of the nodes correctly labeled, but you can't quantify that; you can't say how much better than guessing you can do. And then there's the impossibility regime, where it is impossible to do better than guessing, essentially. And generally it's because of indistinguishability (I'm not even sure if that's a word) reasons, where your model is essentially indistinguishable from an Erdős-Rényi graph with the correspondingly adjusted probability. So the nomenclature in the field is varied, and it seems that everybody uses their own preferred words. I'm using this, which is kind of a combination between Mossel-Neeman-Sly and Abbe (I'm not sure how to pronounce his name; I should know, because I know him, so it's embarrassing) and Bandeira. This is the nomenclature; I hope you'll be able to remember it. This is the definition of the binary stochastic block model. You start with 2n nodes and you pick n of them to be labeled +1 and the others -1, uniformly at random. Then you add an edge between vertices with the same label with probability p, which in fact is going to be a function of n, and you add edges between vertices with different labels with probability q. And you call the resulting graph model G(2n, p, q). I guess I'm going to have to use both of these notations somehow. Okay. So let me talk briefly about the strong recovery regime. You will need to have p and q at least logarithmic over n, because otherwise, with constant probability (depending on the constant), you'll get isolated vertices, and those you cannot classify. There's a bunch of people who have worked on this problem and made seminal contributions, and it was solved almost completely by Abbe, Bandeira, and Hall in 2014, and completely by Mossel, Neeman, and Sly a few months later. In other words, if you have this regime plus some tiny conditions, which I'm going to show you in a moment, you can do complete recovery. So this is the state of the art. And when you see MNS, it's going to be short for Mossel-Neeman-Sly; you'll see that a lot in this talk, which is why I felt I could shorthand it. So the state of the art is a rather complex characterization, but certain cases can be made more explicit. For example, if you think of p_n and q_n as being roughly a constant times log n over n, although the constants can fluctuate a little bit, then strong recovery is possible if and only if this happens. So in other words, if this number here, or rather if the sequence of numbers here, is strictly positive at all times, then you have strong recovery, and for strong recovery it is necessary that it be at least nonnegative. This matches the Abbe, Bandeira, and Hall result, but they weren't thinking in terms of letting the constants fluctuate, and therefore this is what they got; they didn't get this. >> When you say "strong recovery is possible," does that include a statement about computational complexity? >> Ioana Dumitriu: It's a little vague. Abbe, Bandeira, and Hall were showing this is possible under slightly different conditions. It seems that the algorithm that MNS have come up with is almost linear in the number of edges, so that would suggest yes. >> But I'm -- is that what you're focusing on or just -- >> Ioana Dumitriu: Just the existence, the possibility of strong recovery. But it seems like that's the case; I think that they will probably do a complete analysis. In fact, I was going to say something about this in a little while, but it seems like that might be the case.
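[Editor's note: for readers following along, here is a minimal sketch of sampling the binary SBM G(2n, p, q) just defined. The function name and interface are illustrative, not from the talk.]

```python
import numpy as np

def sample_binary_sbm(n, p, q, seed=None):
    """Sample the binary SBM G(2n, p, q): 2n nodes, n labeled +1 and
    n labeled -1; same-label pairs joined with probability p,
    different-label pairs with probability q.
    Returns (adjacency matrix, label vector)."""
    rng = np.random.default_rng(seed)
    labels = np.array([1] * n + [-1] * n)
    rng.shuffle(labels)
    # Edge-probability matrix: p where labels agree, q where they differ.
    same = np.equal.outer(labels, labels)
    probs = np.where(same, p, q)
    # Draw the upper triangle independently, then symmetrize (no self-loops).
    upper = np.triu(rng.random((2 * n, 2 * n)) < probs, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels
```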
So there's a weak recovery regime, in which you can recover all but sublinearly many of the vertices, and this is again MNS 2014. It's a very nice result; essentially it says that you have to have probabilities slightly better than 1 over n: you want np_n and nq_n to go to infinity, and you have to have this extra condition attached. So they may be sublogarithmic; you can have them sublogarithmic, but because of that you cannot hope for full strong recovery. And it seems that the reason why you do not get more, or in a sense what you cannot get, is caused by the fact that some vertices may be mislabeled, in the sense that they might actually be labeled +1 but have more connections to the wrong set, to the other set. >> Minus sign on the second -- >> Ioana Dumitriu: I'm sorry, this should be a plus sign. This should be an identity. So this is a plus; apologies. Okay. Then there's an approximation regime, which was first studied by Coja-Oghlan (I hope I'm pronouncing it correctly), who showed that if you consider p and q to be a/n and b/n and you have this extra condition for some large constant, then you can detect (bear with me, I'll talk about detectability next) the presence of the partition, but the fraction of recovered vertices is bounded: you cannot do better than a constant fraction of the vertices, and that fraction depends on C. Then an MNS algorithm, and others that I'll talk about soon, use belief propagation to show that this fraction is actually achievable, yielding thus an approximation regime in which you can get a certain fraction of the vertices correct. When C approaches 2, the fraction approaches one-half, which is why nobody expected that any kind of approximation could be done in the lower range. Finally, detection and impossibility. Again, Mossel, Neeman, and Sly have been all over this problem, though it turns out that they had a challenger: Massoulié, at the same time and independently, obtained a different proof of this. They showed that when you're in this regime of p and q being of order 1/n, specifically a/n and b/n, then there exist polynomial-time algorithms to find a correlated partition if and only if this inequality holds. This is a threshold that was conjectured by Krzakala, Moore, and Zdeborová, and it has been reinforced several times since then, in particular in 2012, when Mossel, Neeman, and Sly showed that if you're under that threshold, then the graph is indistinguishable from an Erdős-Rényi graph on 2n vertices whose probability is the average of the two (there should be an n there, I apologize; no, there shouldn't be, so it's correct), and reconstruction is impossible. If you cannot distinguish, you cannot detect. Okay. So, complexity. As mentioned, it's not written in stone, but it seems that there's polynomial-time strong and weak reconstruction, belief propagation is also efficient, and detectability was shown to be polynomial time by both groups who showed detectability. So the bottom line is that there seems to be no regime where reconstruction is possible but not in polynomial time. That's a very interesting thing, because it runs contrary to the widespread belief that there are hard regimes if you have more than two clusters, perhaps clusters that are growing in number or whose sizes are not linear in n. And it has a connection to the minimum bisection problem, which again is known to be hard. So it's kind of an interesting --
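[Editor's note: the detection threshold referred to here is, I believe, the Krzakala-Moore-Zdeborová threshold; a sketch of its usual statement, under the stated p = a/n, q = b/n parametrization:]

```latex
% Detection threshold for the sparse binary SBM with p = a/n, q = b/n
% (conjectured by Krzakala-Moore-Zdeborova; proved by Mossel-Neeman-Sly
% and, independently, by Massoulie): a partition correlated with the
% truth can be found, in polynomial time, if and only if
\[
  (a - b)^2 \;>\; 2\,(a + b).
\]
% Below this threshold the model is indistinguishable from an
% Erdos-Renyi graph of matching average degree, so detection
% (and hence any reconstruction) is impossible.
```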
>> I'm sorry, but I got lost. If there's this graph that is randomly generated and somebody wants to answer all these questions, what is the information that that person has access to? >> Ioana Dumitriu: The adjacency matrix. Thank you, sorry. You have access to the graph: you have a large graph, and you have to test whether the model it's been generated from is the one described here. Okay. So let me introduce this notion of a regular stochastic block model. Basically we do the same thing, except that now, instead of taking Erdős-Rényi graphs, we take uniformly random regular graphs. So for integers d1 and d2, you take two d1-regular uniformly random graphs of size n, and you connect them by an (n, n) bipartite d2-regular graph, also uniformly random. Everything is independent of the other things that you're doing. And of course the question is why, why would you do something like that, and the answer can be manifold. The structure in such a model is a lot more rigid: can you say a lot more than you could before, can you do recovery in other kinds of cases, can you do recovery in the so-called lower regimes, when you have p and q smaller? There's also edge dependence: how does that affect things? Because before, edge independence was playing a pretty heavy role in the calculations; do you use the same methods, or do you come up with others? And lastly, of course, because it's there; we're in the math department. Okay. So the first thing to note, if you remember, is that I mentioned there's an impossibility regime for the Erdős-Rényi stochastic block model: if the relationship between the two probabilities is of a certain nature, then you essentially cannot distinguish between your model and just a bigger Erdős-Rényi model with the average of the probabilities. Here you can always distinguish between this model and just a uniformly random (d1 + d2)-regular graph, even though your model is also of that type: if you look at the fact that you have two d1-regular graphs joined by a bipartite d2-regular graph, that means the graph is actually (d1 + d2)-regular. However, it has a very different distribution, so you can tell them apart. And the reason is that if you count the number of graphs you have in the two sets, one is exponentially smaller. So this is where things diverge for the first time. Unfortunately, of course, distinguishability by itself has no computational value. What we'd really like is to prove uniqueness of the partition. In other words, if you generate your graph like this, you would like to know that there's basically no chance it can also be expressed in the same way with a different partitioning into two classes of vertices. Then you can hope for recovery; if it turns out that you don't have uniqueness, there's no chance of recovery. Okay. So that's what one would like. And we've made progress towards this, very interesting progress, because we can show uniqueness when d2 is less than d1, but only for huge sizes of d2. The interesting stuff happens when d2 is small, so this is very partial progress. But we're not going to stop here. The idea, roughly, is to improve our results on this lemma, the overlap lemma, to which I'm going to refer again. It says that if a second partition exists, if you want, in some case, then the smaller swap set must be large. So if you have two such partitions in your graph, then the set of vertices you'd have to swap must be really large; in other words, you can't just swap a few vertices and hope to get a second partition.
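[Editor's note: before the uniqueness discussion continues, here is a sketch of how one might generate the regular SBM just defined, using networkx. Building the bipartite part as a union of random perfect matchings, with rejection of repeated edges, only approximates the uniform model; all names are illustrative.]

```python
import random
import networkx as nx

def sample_regular_sbm(n, d1, d2, seed=None):
    """Sketch of the regular SBM: two d1-regular graphs on n vertices each
    (requires n*d1 even and n > d1), joined by a d2-regular bipartite graph.
    The bipartite part is a union of d2 random perfect matchings between
    {0..n-1} and {n..2n-1}, retried until no edge repeats; this is an
    approximation to the uniformly random bipartite d2-regular graph."""
    rng = random.Random(seed)
    G = nx.disjoint_union(nx.random_regular_graph(d1, n, seed=rng),
                          nx.random_regular_graph(d1, n, seed=rng))
    while True:
        edges = set()
        for _ in range(d2):
            perm = rng.sample(range(n, 2 * n), n)   # random perfect matching
            edges |= {(i, perm[i]) for i in range(n)}
        if len(edges) == d2 * n:                    # matchings were edge-disjoint
            G.add_edges_from(edges)
            return G
```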
We're working on improving this lemma, upping the fraction from 1/(2 d2) to one-half; one-half is the barrier, because the swap set is the smaller of the two sets. It's very easy to explain why this happens. So suppose that this is your left, let's say, d1-regular graph, and this is your right d1-regular graph, and suppose that a second partition is possible, which essentially swaps the set of vertices B for the set of vertices C. Then it turns out (actually a trivial observation) that if you pick a vertex v in B, the number of connections it has to vertices in A has to be the same as the number of connections it has to vertices in D, because you're about to swap things, and the new, swapped graph still has to be d1-regular within the classes. The problem is that d1 and d2 are different, and we know that v cannot have more than d2 such connections: v has to have precisely d2 connections to C union D, so it cannot have more than d2 connections to D. However, if the swap set is small, the chance of getting a vertex in B all of whose d1 left-side connections go to A rather than to B is significant, and at that point it would have to have the same degree into D, but d1 is bigger than d2, so it can't. So this is essentially a good heuristic for the argument of why you have to have uniqueness of the partition in such a case. Okay. So I think I've just explained this.
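[Editor's note: the degree count sketched above can be written in two lines; a reconstruction, with deg_S(v) denoting the number of neighbors of v in the set S:]

```latex
% Setup: left graph on A \cup B and right graph on C \cup D, both
% d_1-regular, with a bipartite d_2-regular graph in between; the
% candidate second partition swaps B and C, i.e. it is (A \cup C, B \cup D).
% For v \in B, regularity before and after the swap gives
%   \deg_A(v) + \deg_B(v) = d_1  and  \deg_B(v) + \deg_D(v) = d_1,
% and subtracting these two equations yields
\[
  \deg_A(v) \;=\; \deg_D(v) \;\le\; \deg_C(v) + \deg_D(v) \;=\; d_2 .
\]
% But when the swap set is small, some v \in B is likely to have all d_1
% of its left-side edges into A, forcing \deg_A(v) = d_1 > d_2,
% a contradiction; hence no small swap can produce a second partition.
```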
Then we have, and this is I think the only proof that I'll show (I'm going to zoom through the rest), an easy spectral regime. The fact that you are working with a regular graph gives you access to a whole host of very nice, simple linear-algebra properties. As a consequence of these properties we have this theorem, which says that if the difference between the two degrees is bigger than this quantity here, then the second largest eigenvalue of the adjacency matrix is d1 - d2. The first eigenvalue is d1 + d2, since this is a (d1 + d2)-regular graph; but the important thing is that the second largest eigenvalue is d1 - d2, with multiplicity 1, and with eigenvector corresponding to the correct partition. So if you have the adjacency matrix, and you're in this regime, you basically find the second eigenvector and you're good to go: that's going to give you the partition. A further consequence is that the partition is recoverable, and this is due to the fact that the multiplicity here is 1: if you were to have two distinct partitions, both of them would have corresponding eigenvectors with eigenvalue d1 - d2; however, the multiplicity of this eigenvalue is 1. So the partition is unique and recoverable, and it solves the min bisection problem. The proof is simple; it's linear algebra. These two are facts; you can check them (I'm thinking that if people are falling asleep in the audience, they could spend some time actually checking them). You split the adjacency matrix like this. So now you have the adjacency matrices of the two random d1-regular graphs, and this is the adjacency matrix of the random bipartite d2-regular graph, and you just do spectral analysis on them, essentially. So you calculate this and bound it from above (actually I should have put an absolute value here, since it's a symmetric matrix; I should have put an absolute value). This is what happens: if you look at a vector that's orthogonal to the first eigenvector and the second eigenvector, then the value that this quadratic form takes is at most this, and this is strictly smaller than d1 - d2 by the condition. How can you show that? Essentially you split the matrix like I said and then you make a gross overestimate. Split it into these two parts (this should have been an A_1, apologies; remember that A_1 is the part corresponding just to the two d1-regular graphs). On each one of them you have, due to Friedman, another nice result on the second eigenvalue; you bound that, and you get that the overall bound is roughly 2 times the square root of d1, and the same thing holds true for the bipartite graph, where a similar bound on the second eigenvalue was shown by Puder in 2013. You put these together and you get this, and it follows that the second eigenvalue is d1 - d2 with multiplicity 1, et cetera, et cetera. So this is an easy regime; very spectral.
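[Editor's note: the spectral algorithm just described fits in a few lines; a sketch, assuming the adjacency matrix comes from the regular SBM in the easy spectral regime. One could exercise it on the sampler above via, e.g., spectral_partition(nx.to_numpy_array(sample_regular_sbm(500, 10, 3))).]

```python
import numpy as np

def spectral_partition(A):
    """Recover the planted partition in the 'easy spectral regime':
    the top eigenvalue of the (d1+d2)-regular graph is d1 + d2, and when
    d1 - d2 is large enough, the second-largest eigenvalue is d1 - d2
    with multiplicity 1, its eigenvector being (close to) constant-sign
    on each class.  A is the symmetric adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    v2 = eigvecs[:, -2]                    # eigenvector of 2nd-largest eigenvalue
    return np.sign(v2)                     # +/-1 vector labeling the two classes
```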
That's not the only regime in which you can get strong recovery in this model; that's just an easy regime, and we actually did it first because it's easy to explain. It turns out that, essentially using the methods of Massoulié, adapting the methods of Massoulié I should say, we can prove this theorem: if the difference of the degrees is bigger than the square root of d1 + d2, then the partition is strongly recoverable in polynomial time. Now, mind you, Massoulié had detectability; this is much stronger, recoverability. Essentially the methods are adapted to work for the case where you have a different kind of, or almost, independence of edges, because d1 and d2 are fixed and you can use the configuration model for the uniformly random regular graphs; but it's not exactly trivial work. However, you get a lot more. So, Massoulié used the matrix of self-avoiding long walks, in contrast with the MNS strategy of using the matrix of non-backtracking long walks. The difference is that in the second case the entries of the matrix are bigger and the method is not spectral, whereas Massoulié's method is spectral, and we used that because it was more accessible to us. And this is all I'm going to tell you about how to prove it; I had several lemmas that I was hoping to show, which I'm obviously not going to have time for. The idea is to do a local analysis. You have to show that no two cycles are close to each other. Of course this is known for the d-regular graph; it takes a little bit more work to show it in the context of this model, where you put regular graphs together, but it holds: no two cycles are within C log n of each other, so cycles are far apart. Then there's, of course, the connection between path structures and labels of the neighborhood, which in our case, given that we have a regular graph, is very simple to establish, because we know exactly what the neighborhoods look like; whereas in the case of Erdős-Rényi there's still an element of randomness, here it's an exact count. And finally, the truly important ingredient is the following: you have to show spectrum separation of the first two eigenvalues of this matrix, the matrix of self-avoiding walks of length L, from the rest of the spectrum. So, the first eigenvalue; maybe I'm going to show that lemma. If you show spectrum separation of the top two eigenvalues, then essentially you will get a partition that is correlated with the correct, original partition; except that in our case it's not just correlation. In our case it allows you to recover the original partition up to n minus little o of n vertices, and then you correct the mislabels, all of them, by majority rule, and you get complete recovery. This is a pretty standard technique. Okay, I'm going to just show you this; this is the core of the argument, really. We showed that the graph is tangle-free with high probability, and if that's true, then the following estimates hold for these two quantities. So e_n, remember, is the vector of all 1s, and sigma is the vector that gives you the signs. If you look at what happens here, you see that if you scale these two down by the square root of n, they will become unit vectors, both of them. So this n here and here, and this n minus big O of n to the delta, where delta is small, will essentially disappear, and what you're left with tells you that this quantity is essentially going to go to the first eigenvalue of S_L, which is going to be (d1 + d2) to the L, and this is almost an eigenvector; similarly, this will go to what will turn out to be the second eigenvalue, which is (d1 - d2) to the L, and this is close to an eigenvector. You can actually show that it is polynomially close to an eigenvector, provided that you can show good separation from the rest of the spectrum. So if you can show that for any other unit vector that's orthogonal to these two guys the same quantity is much, much smaller, then you're done. And it turns out that that's true: if you look at unit vectors orthogonal to those two, the estimate is much, much smaller. Notice here that you have a (d1 + d2) to the L over 2; you essentially want that to be smaller than (d1 - d2) to the L, and that's what gives you the condition. Finally, there's the question of, okay, we know that the second eigenvector is going to give you the partition up to sublinearly many vertices, which are going to be incorrectly labeled, but couldn't you have two partitions for both of which the second eigenvector does that? Well, no, because in that case the two partitions would overlap very much; the swap would be very small, and we've shown that the swap has to be relatively big. This was the last thing.
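[Editor's note: a sketch of the majority-rule cleanup step mentioned above. Since each vertex has d1 neighbors in its own class and only d2 < d1 in the other, an almost-correct labeling can be corrected by giving each vertex the majority label of its neighbors; the name and interface are illustrative.]

```python
import numpy as np

def majority_correction(A, labels, rounds=3):
    """Boost an almost-correct +/-1 labeling (all but o(n) vertices right)
    to an exact one.  Each vertex adopts the majority label among its
    neighbors: within-class degree d1 exceeds cross-class degree d2, so
    correctly labeled neighborhoods outvote the stray errors."""
    labels = labels.copy()
    for _ in range(rounds):
        votes = A @ labels  # sum of neighbor labels at each vertex
        # Take the sign of the vote; keep the old label on (rare) ties.
        labels = np.where(votes != 0, np.sign(votes), labels)
    return labels
```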
And so this is just the recap. We showed that strong recovery is possible in polynomial time, and we believe that recovery is always possible, because that's what uniqueness tells you: that recovery is always possible. This is a rigid model, so if you have the graph, you could in principle just test all possible partitions and see if one works; of course, that's not efficient, but you could do it. So that's very different from before. And the question is: is there a threshold for the complexity? We show that recovery is possible in polynomial time in a certain regime, and we believe, and believe we'll actually be able to show, that recovery is always possible. This threshold is given by the method; we have no idea if it can be pushed down just yet, but we're working on it. But it's also possible that this is actually an efficiency threshold: perhaps lower than this you cannot get polynomial-time algorithms. And of course then the idea is to generalize to multiple clusters, and I'm going to stop here. Thank you. [applause]. >> David Wilson: Questions? >> I have a question, but it's not mathematical. I understand these results are asymptotic, so in some sense they should apply to large networks. So I'm thinking about something like Facebook. But in Facebook, when I try to think about communities, I would guess that there's actually a very large number of communities, not just two or three or some small finite number. >> Ioana Dumitriu: And there's overlap as well. >> So basically, what are good, practical, motivating examples of a large network which, on this big scale, has only a small number of communities? >> Ioana Dumitriu: That's why I talked about spherical cows. >> What? >> Ioana Dumitriu: That's why I talked about spherical cows. Yes, in general you're completely right. So the idea is to get results that are asymptotically true for many clusters, for overlapping clusters, for clusters of different sizes, and so on. And there's a whole body of literature on that, but no sharp thresholds; thresholds that are perhaps at best order thresholds. So we started off with this example because there's hope here that one can analyze it completely; higher than that, probably not. However, I am actually working with Maryam Fazel and with a couple of her students on a problem just like that, and we're actually making some interesting progress. Generally the algorithms that will produce the clustering will be some sort of convex relaxation of the MLE, and it turns out that in certain regimes they perform well, and the question is what these regimes are. Generally the conditions that you get involve all the parameters in the problem, so you have to start saying: okay, if the clusters are equal, what does that mean in terms of the probabilities associated to each cluster; if the clusters are very separated, how can I play with the probabilities to get an impossibility regime; and things like that. >> Just one relevant example: of course the random graph model is a rough approximation, but clustering into two is something you want to do all the time. For instance, to differentiate legitimate websites from spam websites; that would be two clusters. There are lots of links between the legitimate websites and each other, lots of artificial links created among the spam websites, and the links that go between these groups may be of a different nature. The graph structure is the basis, or one of the tools I should say, for this kind of distinction, though the real picture is very different from these beautiful models. >> So that's nice. So actually there are some real examples. >> There are lots of real examples, and we want to distinguish between the two; so that part is real. These random graph models are -- >> In that case you don't really have two clusters, because the spam may consist of many different spam clusters. >> Right. >> Ioana Dumitriu: Or not clustered points even, which is -- >> But the decision you want to make is binary. >> You want to make a binary decision. >> Ioana Dumitriu: Not necessarily with equal weight or anything. >> What do the counterexamples for d2 equal to 2 look like, so I get some idea when it -- >> Ioana Dumitriu: You can find counterexamples; for example, with d2 equal to 2 you don't have connectivity, so you can have long cycles of the same length and you can essentially swap those. You can construct those.
>> But you're saying you think it's a theorem for any d-regular graph -- >> Ioana Dumitriu: Yes, yes, so there's a theorem that says the following thing: with d greater than or equal to 3, then asymptotically almost surely the graph is connected. >> No, but what do you strongly suspect is true? You suspect it to be a theorem for d-regular graphs. >> Ioana Dumitriu: You can find examples for d2 equal to 2 where the probability of encountering a graph that has two possible partitions is not zero. >> But I'm not asking about the probability. >> Ioana Dumitriu: I'm sorry, I guess I don't understand. >> He's asking about the uniqueness of partitions for -- >> Ioana Dumitriu: Oh, what's the question? >> I thought you were saying that you suspect it's deterministically true that a d-regular graph -- >> Ioana Dumitriu: Not deterministically true; I think it's true with high probability. For example, if you think of a square: okay, so you have n, with n even, and you just split it into n/2 and n/2. On each half you put, I guess, a (d/2)-regular graph or something like that, and then between them a bipartite (d/2)-regular graph. In this case d1 is equal to d2, but you can imagine other cases where it works. >> The idea is these would occur with multiple -- >> Ioana Dumitriu: Yes. No, there's no reason to believe this is deterministic; I think it's with high probability, though. Only with vanishing probability do you expect to encounter a second partition in a graph that has one. Sorry, I didn't understand. >> David Wilson: Any other last questions? Let's thank Ioana again. [applause]