>> Larry Zitnick: Okay. It's my pleasure to introduce Dhruv. He's currently a fifth-year student of Tsuhan Chen, who moved from CMU to Cornell, so Dhruv is actually at Cornell right now finishing up his Ph.D. He has done a lot of great work. I think most of it has been in image labeling, some in segmentation and also in activity recognition. Today he'll be talking about his recent work on MRF inference, part of which he presented at NIPS and part of which is a submission to CVPR this year.

>> Dhruv Batra: Thanks, Larry. Thank you for coming. First of all, I must admit I feel a little bit like Morgan Freeman in The Bucket List. It's one of the things on my list: give a talk at MSR using a Mac - check. So having done that, here's what I'm going to be talking about today. This is the work on outer-planar decomposition, work on MRF inference, which is what I'll mostly be focusing on. This is joint work with Andrew Gallagher, who is at Kodak now; Devi Parikh, who is at TTIC; and our advisor, Tsuhan Chen. So all of us have been at CMU at some point or another. Time remaining, I'll also talk about some applications to interactive co-segmentation, which has been some recent work, and an application to 3-D reconstruction, which I think will be pretty relevant here. But the focus of the talk is going to be the first one.

So let me begin. Let me try and convince you that a number of problems in vision can be formulated as discrete labeling problems. There's the classical two-class segmentation problem, in which, either unsupervised or via some sort of semi-supervised information, you have some scribbles, and you're trying to label every single pixel as foreground or background. Those are the two classes: your labeling sites are pixels, and the labels are foreground and background. This could, of course, be a multi-class segmentation problem, semantic segmentation on the MSRC dataset, where for every single pixel it's not foreground and background now; you're trying to label it with one particular class that you see in this dataset. This could also be a geometric labeling problem. This was work done by Derek Hoiem and also Ashutosh Saxena at Stanford where, for every single pixel, you're trying to label a rough geometric class: ground plane, vertical surface, sky, vertical surface facing left or right. So it's rough geometric information, but once again a discrete labeling problem. There's also been work on name-face association, which can be thought of as a labeling problem: your labeling sites are the faces that you found in images, and your labels are names that you have extracted from captions associated with those images. There's also been work in our group by Andrew Gallagher, where you have an image, you have certain labels, which are image-level labels, and you're trying to propagate them to face-level labels. A priori, this transfer is ambiguous, of course; you have all possible answers. But in his work he tried to use age classifiers and U.S. Social Security data to find first-name priors. So he found Mildred was a really popular name in the 1940s and least popular later on, and if you have some sort of age classifier running on the faces, you can do a better job trying to assign names to faces. But again, at the heart of it, a discrete labeling problem. And then there's classical vision, which hardly needs any explanation here: stereo is a disparity labeling problem.
In optical flow, instead of one-dimensional you have a two-dimensional labeling problem, a two-dimensional discrete motion-flow labeling problem. Denoising, which on the surface doesn't seem like a labeling problem: you're trying to assign every single pixel one of the labels 0 to 255; that's your label space. So all these problems have of course been well studied under the framework of Markov random fields, MRFs. This hardly needs an explanation here, but just so I can get my notation right: we're going to be working with a set of discrete random variables, there is a pairwise MRF or energy function that we're going to define on them, and we're going to be interested in MAP inference. Given a discrete energy function, which is composed of node energies and edge energies, I want to minimize the energy function and find the best labeling under it.

Given that that's the problem we're interested in, it's well known that this problem in its full generality is NP-hard. Faced with an NP-hard problem, you have two choices: solve a subclass exactly, or come up with an approximate algorithm that works on the entire class. Or there is a third option, that you solve the P-equals-NP problem, but we'll leave that to the next lecture. So on the side of exact algorithms for subclasses, classical work showed that if your graph is a tree, we can solve the problem exactly. In vision, of course, there's work done by [indiscernible]: if your energy functions are submodular, then we can solve this problem exactly irrespective of what the graph structure is like. More recently, there was also work at NIPS last year showing that if your graph structure is outer-planar, we can solve it exactly; I'll explain what outer-planar graphs are and what this solution is in a second. Coming to approximate algorithms, the first step, of course, was: let's take BP and apply it to a problem which has loops. And that's, of course, naive loopy belief propagation. There's been work on tree-reweighted message passing by Martin Wainwright, Kolmogorov, and Komodakis recently, and this is where our work on outer-planar decomposition, or OPD, will fit in. It's going to be an approximate algorithm that works for the general case, and I'll point out the connections between outer-planar graphs and outer-planar decomposition in just a second.

Interestingly, in our paper we point out that a lot of these approximate inference algorithms can be thought of as decomposition methods. You give me a problem on a general graph, and what I'm going to try to do is break it down into tractable subcomponents I can solve. This is a trivial visualization; you could have solved the problem on the whole network anyway. I'll try to break it down into tractable subgraphs, which might be trees or chains, that I can solve exactly, and then I'm going to try to merge these solutions together to get a global solution. You can think of it as breaking the energy function down into a sum of energy functions and solving each one of them first. So how do previous works fit into this framework? If you think about BP, you take a graph and you're trying to propagate messages along nodes. So your local problems are nodes and their neighborhoods; you're computing the solution exactly on each neighborhood, and you're passing messages across these neighborhoods.
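To make the energy notation concrete, here is a minimal sketch (mine, not from the talk; the helper name and toy numbers are made up) of the pairwise energy E(x) = sum_i theta_i(x_i) + sum_(i,j) theta_ij(x_i, x_j), with MAP inference done by brute force on a tiny binary problem. The exponential enumeration over labelings is exactly the intractability being discussed.

```python
import itertools
import numpy as np

def brute_force_map(node_energies, edge_energies):
    """Exhaustive MAP: try all 2^n labelings of a binary pairwise MRF.
    The 2^n cost is why exact inference on general graphs is NP-hard."""
    n = len(node_energies)
    best_x, best_e = None, float("inf")
    for x in itertools.product([0, 1], repeat=n):
        e = sum(node_energies[i][x[i]] for i in range(n))          # node terms
        e += sum(th[x[i], x[j]] for (i, j), th in edge_energies.items())  # edge terms
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy 3-cycle with attractive (Potts-like) pairwise terms.
nodes = [np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([0.5, 0.5])]
potts = np.array([[0.0, 0.3], [0.3, 0.0]])
edges = {(0, 1): potts, (1, 2): potts, (0, 2): potts}
print(brute_force_map(nodes, edges))   # -> ((0, 1, 0), 1.1), up to float printing
```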
Tree-reweighted message passing takes a particular graph, breaks it down into spanning trees found in this graph, solves the problem on each of these trees exactly, because we have methods that do that, and then uses a message passing algorithm to combine solutions from these trees. And in this way, we note a natural progression of neighborhoods that are increasing: there's BP, which solves star-like problems exactly; there's TRW, which solves tree-based problems exactly; and there's going to be outer-planar decomposition, which will solve problems on a larger neighborhood called outer-planar graphs, which are strictly larger than trees.

So now let me come to what outer-planar graphs are and why we're interested in them. Outer-planarity is a notion from classical graph theory. A graph is outer-planar, first of all, if it allows a planar embedding: I can draw it in a plane. So that was not a planar embedding; this is a planar embedding, I've drawn the graph in a plane. In addition, all nodes must be accessible from the outside: they must lie on the external unbounded face. So this is not an outer-planar graph, because I can't access this node from the outside, from an unbounded region around the graph, without any edge crossing. There's an alternate definition that is sometimes more useful to think about, which is that I should be able to add an extra node to the graph, connected to all other nodes, and the result should be planar. The two are equivalent: when you can do this, the graph is outer-planar, and it's an if-and-only-if condition. So the definitions are topological, but let me give some examples of what these graphs look like. Well, first of all, all trees are outer-planar: you can draw trees on a plane with all nodes accessible from the outside. But the class contains much more. It contains things that are not trees, that have loops in them. I took this one graph and just dropped an edge, and that's outer-planar. If you're trying to visualize how this is outer-planar, all you have to do is take this node and plop it on the other side. Now it's drawn on a plane and suddenly everything is accessible from the outside. So in your mind, if you're trying to think of outer-planar graphs, visualize a polygon that has all its edges on the inside without crossings; then everything is accessible from the outside. That's a good way of thinking about these graphs.

Why do we care? Because we can do exact inference on outer-planar graphs. So let me try and explain to you how this exact inference algorithm works. This was work done by Schraudolph et al. at NIPS, and they said: we're going to take an outer-planar graph and add an extra node. We have this construction that adds an extra node and connects it to everything else, and there's a way to go from energies to weights on this graph with certain special properties. The property is that every single cut here corresponds to a labeling of your nodes: every time you're in the same segment as this source node you're labeled zero, and every time you're not, you're labeled one. The cost of a cut corresponds to the cost of the labeling. If you've seen these sorts of arguments before, which I'm sure a lot of you have, it reminds you of the Kolmogorov construction. You're right, this is similar to that. There are two key differences.
First, these guys are constructing an undirected graph instead of a directed graph, so you only have to add one node, and you're searching for a global min cut, not an s-t min cut. More importantly, you're not appealing to submodularity: there are no constraints on the parameters, your energy functions are arbitrary. And if you're not appealing to submodularity, your edge weights can be negative, so this min cut can't be solved with max-flow techniques. In fact, it's solved with an algorithm that appeals to perfect matching. I won't go into how these problems are solved; I'd be happy to talk to you after the talk if you want to know the details. But there is a restriction, and I don't want you to think that any problem can be solved with this construction. The restriction is that the graph on the right must be planar; that's when we can solve this problem. And that restriction means that if I make my original problem a little denser, take the four-node graph and make it completely connected, what I get on the right is K-5, the five-node fully connected graph. That's not planar, so I can't solve this problem. That's why this planar-cut algorithm implies an outer-planar constraint: outer-planarity comes from the fact that we can only work with planar graphs.

Given that we can solve energy minimization problems on outer-planar graphs exactly, are we done? Shouldn't vision be solved? Well, it's not, because even though outer-planarity is a larger class than trees, it's still a restricted class. In vision we're typically dealing with graphs that are not outer-planar. A grid graph, for example, as soon as it's three-by-three or bigger, is not outer-planar; you'll have these landlocked nodes that you can't access, so you can't solve this problem exactly. If you form your graphs on superpixels, which a lot of the time we do, break the image into regions and make every region a node, you'll again get graphs that are not outer-planar. So what do we do? We're going to leverage this new class, which is amenable to exact inference, and propose an approximate inference algorithm called outer-planar decomposition.

Let me quickly go over what the algorithm entails. It follows the same trend I mentioned earlier for decomposition methods. We take a non-outer-planar graph (this is the smallest non-outer-planar graph we can find, so I'm just using it as an example) and we represent it as a collection of outer-planar graphs that cover it. Now, this is, of course, an overcomplete representation; I only needed two of these to cover the edge set of my original graph. I'll discuss how many of these subgraphs you need later. But note that all of the graphs on the right are outer-planar. Some of them have not been drawn as outer-planar graphs (for example, there are edge crossings happening), but that's only to visualize the correspondence of which edge has been dropped. These are all outer-planar graphs. The reason I made this decomposition is so that I could add that extra node back, get planar graphs, and do a min cut in each one of them. So I can solve the problem exactly on each one of these subgraphs.
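As a quick aside, the apex-node definition from a moment ago makes outer-planarity easy to test computationally. Here's a minimal sketch, not from the talk (the helper names are mine), using networkx's planarity check, along with a verification of the kind of two-subgraph cover of K-4 just described:

```python
import networkx as nx

def is_outerplanar(G):
    """Apex-node test: G is outer-planar iff adding one extra node
    connected to every vertex leaves the graph planar."""
    H = G.copy()
    H.add_edges_from(("apex", v) for v in list(G.nodes))
    return nx.check_planarity(H)[0]

K4 = nx.complete_graph(4)
print(is_outerplanar(K4))   # False: the smallest non-outer-planar graph

# Covering K4's edge set with two outer-planar subgraphs, as in the talk:
A = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)])   # K4 minus edge (1, 3)
B = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)])   # K4 minus edge (0, 2)
assert is_outerplanar(A) and is_outerplanar(B)
cover = set(map(frozenset, A.edges)) | set(map(frozenset, B.edges))
assert cover == set(map(frozenset, K4.edges))
```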
And you should think of it again as decomposition: I had a problem I couldn't solve, I decomposed it into these overlapping subproblems, and I solved each one of them exactly. But they're not going to agree on the labelings they give the nodes, obviously. If they agreed, we'd be done; we would have solved the problem. And that's why we develop a message passing algorithm that works on top of these decompositions. We actually present not one but four message passing algorithms, which generalize popular message passing algorithms from the literature. The first one is OPD-BP, which is a generalized version of belief propagation: it takes the message passing that belief propagation does and lifts it to outer-planar graphs. In fact, for particular choices of outer-planar graphs you get belief propagation back; it reduces to belief propagation, so it contains belief propagation as a special case. Similarly, we have an algorithm called OPD-DD for dual decomposition (dual decomposition was introduced by Komodakis et al.), and the same property holds: we're lifting the message passing scheme to outer-planar graphs, and if you were to choose your outer-planar graphs to be trees, you would get their work back, with all the guarantees; it would reduce to their DD. Similarly, we can do this for other message passing schemes, but I'll focus on the first two, and that's what you'll see in the experiments. So let me quickly go over the message passing scheme. There's going to be --

>>: It's not s-t min cut, but a global min cut. What prevents you from leaving all the nodes at zero?

>> Dhruv Batra: You can do that, and that solution would give you an energy of zero. But you're trying to minimize energy, and cutting an edge might give you a negative contribution, because there might be negative weights. That's precisely because we're not restricted to positive edge weights.

So let me just quickly go over what this message passing algorithm looks like. We're lifting belief propagation to outer-planar graphs. We have these decompositions, and I'm going to introduce agreement variables, which force these decompositions, which may disagree in the labelings they assign to the nodes, to agree. There are going to be messages that the decompositions send to the agreement variables, and messages the agreement variables send back. So, like I mentioned, there are two types of messages: a message from a decomposition to the agreement variables, and a message from the agreement variables to a decomposition. If you've seen belief-propagation-type derivations before, these messages are going to look intuitive; if you haven't, I'll try to walk you through the intuition. One of the messages is really simple. What does the agreement variable send to a decomposition? It sums up the messages from everybody else and sends that to the decomposition. Think of a message as a confidence: how confident am I of this particular labeling, say of assigning a particular node state zero? What the agreement variables are doing is just sending you the confidences of everybody else: everybody else thinks this node should be state zero, so you're probably better off incorporating that into your decision.
The message from a decomposition to the agreement variables, as I mentioned, is a sort of confidence measure. The equation is there, but let me walk you through the intuition. The decomposition picks a particular node, say the first node; finds the min-cost cut, the best labeling, that assigns this node state zero; then finds the best labeling that assigns this node state one; writes those two energies down as a vector; and then repeats this process for all other nodes. Why is this a measure of confidence? Because if you consider a node, and assigning it state zero and assigning it state one achieve the same minimum energy, you're not confident; you're ambivalent about what the labeling should be. But if by assigning it state one you can reach a significantly lower energy than by assigning it state zero, then you're more confident about state one; you're better off labeling it state one.

>>: These are min-marginals?

>> Dhruv Batra: Yes, this is exactly the concept of min-marginals. You're constraining your energy function at a particular variable, constraining it to a particular state, and optimizing over everything else. Okay. And then you pass these messages back and forth until you converge.

An interesting question to ask is: where do these subgraphs come from? I just started with a toy example and showed you some subgraphs that are outer-planar. If you were doing a tree decomposition, one way to do it is to find a minimum spanning tree. In our case, the interesting question is: can I find the densest subgraph that's outer-planar, or even better (the right objective function here), given an energy function over this graph, can I find the outer-planar subgraph that captures, that retains, most of this energy? That would be the ideal thing to do. Unfortunately, we can't, because in general this maximum outer-planar subgraph problem is NP-hard; if you had such an algorithm, the maximum outer-planar subgraph problem would be a special case of it, so that algorithm can't exist. What is easy is the maximal outer-planar subgraph problem. The difference between maximum and maximal is that maximal says I can't add another edge and still be outer-planar. So it's a local optimality; there can be many maximal subgraphs, and the maximum of those maximals is the maximum. Sorry about the tongue twister. So we can check for outer-planarity, and that leads to an interesting heuristic: start with a spanning tree, keep adding edges until it stops being outer-planar, remove those edges from your graph, and repeat. This is what the process looks like. I have an extremely dense graph; I start with a spanning tree, keep adding edges while it stays outer-planar, then remove those edges and repeat the process a couple of times. So I get these outer-planar subgraphs that are contained in the original graph. And you can think of randomized schemes where you decrease edge weights, so the subgraphs don't have to be mutually exclusive in their edges, and things like that.
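Here is a minimal sketch of that greedy heuristic, assuming the is_outerplanar() helper from the earlier sketch. The spanning-tree seed and the edge-visiting order are unspecified in the talk, so this is one plausible variant, not the paper's exact procedure:

```python
import networkx as nx
# assumes is_outerplanar() from the earlier sketch

def greedy_opd(G):
    """Greedy cover heuristic, roughly as described: seed with a spanning
    tree, add leftover edges while the subgraph stays outer-planar, remove
    the used edges from the pool, repeat until every edge of G is covered.
    Returns a list of outer-planar subgraphs (an overcomplete cover;
    assumes G is connected)."""
    remaining = set(map(frozenset, G.edges))
    cover = []
    while remaining:
        S = nx.Graph(nx.minimum_spanning_tree(G).edges)   # spanning-tree seed
        for e in list(remaining):
            u, v = tuple(e)
            if S.has_edge(u, v):
                continue
            S.add_edge(u, v)
            if not is_outerplanar(S):     # this edge breaks outer-planarity
                S.remove_edge(u, v)
        remaining -= set(map(frozenset, S.edges))
        cover.append(S)
    return cover

parts = greedy_opd(nx.grid_2d_graph(4, 4))
print(len(parts), [p.number_of_edges() for p in parts])
```

Termination is easy to see: a spanning tree plus any single extra edge is still outer-planar, so each pass absorbs at least one uncovered edge.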
Now that I've talked about that sort of method, another thing to do, for grids, is an exact, deterministic decomposition scheme that takes a grid and converts it into two outer-planar graphs that contain all its edges. The first one here captures all the horizontal edges; the second one captures all the vertical edges. So the union is obviously the grid graph. They have interesting comb-like structures: sort of two ladder structures joined at the top by another ladder. And I want you to notice that you can do exact inference on each component. So if you imagine an image, a pixel here and a pixel at the top right can communicate. Of course, the communication is limited; all the information has to flow through a [indiscernible] channel, and that of course is necessary, because you can't solve the general problem. But you can get exact answers on both of these components and then merge them together. Interestingly, this is not just a hack: every planar graph can be decomposed into two outer-planar graphs. That was a theorem proved recently, and there's a linear-time algorithm to do it. And our heuristic is able to find two-subgraph decompositions surprisingly often. In one of my experiments I'll show a Delaunay planar graph, and we're able to find decompositions that are both outer-planar and only two in number.

All right. So let me talk about some experiments. We ran this on a few typical vision problems, like the kind I showed before, but we also wanted to control the energy function, so we tried a synthetic energy problem. The node energies, for states zero and one, were sampled from Gaussians. For the edge energies, we set the diagonal terms to zero and sampled the off-diagonals from Gaussians of increasing variance. Why increase the variance? One way to think about it is that your edge terms are becoming stronger and stronger, so the interaction potentials are increasing and the problems are getting harder. Let me show you some results. There's a lot to take in here, so let me walk you through it. First of all, across the three graphs from left to right, the problems become harder: the sigmas increase and your edge interaction terms become stronger. We're solving an energy minimization problem, so lower is better.

>>: What's the graph structure here?

>> Dhruv Batra: It's K-4. So this is a toy example.

>>: Okay. This is a 4. Okay.

>> Dhruv Batra: It's a K-4. I'm going to show you larger ones, too. So this is a K-4 two-label problem, with sigma being changed from left to right. It's an energy minimization problem, so lower is better. I'm comparing to TRW, BP, and QPBO, which is a generalization of graph cuts, and I'm also plotting the lower bounds. So you have an energy minimization problem, and you have a lower bound that increases with iterations. The lower bound is coming up, the energy is going down, and when the two meet you know you've solved the problem. The interesting thing to note is that when sigma is small, almost every method solves this problem really, really quickly. When sigma is large, here's the TRW lower bound and here's the energy, and there's a big gap between the two. So the problems are hard; these methods are not going to be able to solve them.
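As a side note, the synthetic setup just described is easy to reproduce. Here's a hedged sketch as I read the description (the function name, sigma value, and seed are mine), which pairs naturally with the brute_force_map() helper from the earlier sketch:

```python
import numpy as np

def synthetic_energies(n_nodes, edges, sigma, rng):
    """Synthetic instances as described in the talk: node energies for
    states {0, 1} drawn from a unit Gaussian; edge energies with zero
    diagonal and off-diagonal terms from N(0, sigma^2). Larger sigma
    means stronger interactions and harder, typically non-submodular,
    instances."""
    node_e = [rng.normal(size=2) for _ in range(n_nodes)]
    edge_e = {}
    for (i, j) in edges:
        th = np.zeros((2, 2))
        th[0, 1], th[1, 0] = rng.normal(0.0, sigma, size=2)
        edge_e[(i, j)] = th
    return node_e, edge_e

rng = np.random.default_rng(0)
k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
node_e, edge_e = synthetic_energies(4, k4_edges, sigma=5.0, rng=rng)
# K-4 is small enough to check any solver against brute_force_map().
```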
These are extremely hard non-submodular problems, and that's where OPD, our method, comes in. Here's the lower bound of OPD and here's the energy. You're cutting into the lower bound and you're reducing the energy, so you're better off, because you've shrunk the region where the optimal energy can be. Any questions on this one? Okay. That was the toy example I showed before. We also did this on a 30-by-30 grid. The results are similar. The convergence rates are slightly slower, because it's a larger graph, but on the left the problems are really easy and every method converges, and on the right our lower bound is slightly tighter than the TRW lower bound and our energies are lower. And this was the decomposition scheme followed for these methods. We also did these experiments on the gender face -- I'm sorry?

>>: I'm curious why BP did better than TRW. Is there any --

>> Dhruv Batra: That was actually very surprising to us, too. It's interesting. We find that BP has been very badly maligned; it's not that bad, it's better than a lot of methods. Honestly, I don't know why, but it surprisingly performs very well.

So here's an application that I showed before. This is not a name-face association problem; this is a gender labeling problem on faces. This was some work Andy was doing in his thesis. He has these group shots; we run a face detector and find out where the faces are in the image, and the goal is to label gender. There are weak classifiers based on facial features that give you node potentials saying how confident you are that a face is male or female. In addition, we have a labeled dataset where genders are labeled for all the faces. We construct a graph here by Delaunay triangulation and find pairwise features that describe the relative location and scale of these faces. It's like saying: if you have a person that's slightly taller than the person standing right next to them, you go to the dataset, do a nearest-neighbor lookup, and find that it's more likely that this person is male and this person is female. It's just incorporating those priors into your labeling. Again, I'm showing the minimization of the energy defined over this graph. This is the TRW lower bound, that's the energy, and our method solves the problem exactly within the first iteration: there's only one line and it's converged. Here, the correctly labeled faces are in solid boxes and the incorrect ones in dashed boxes. I'm sure I will be criticized for this, but following convention, male is blue and female is pink. And here it is again on a larger graph structure.

We also tried this on a multi-class object labeling problem, the semantic segmentation problem. You have an image pre-segmented into segments: this is one segment, this is another, this is another. We have nodes that represent those segments and a fully connected graph on those nodes. We have local color and texture classifiers that extract features from these segments and say whether each is one of these categories or not. We have pairwise features that describe co-occurrence and relative location and scale, and you can learn parameters for them from a given dataset and define an energy function. So what's happening is you're again incorporating your priors on these edges; an example follows the sketch below.
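As an aside, the Delaunay graph construction mentioned for the faces (and usable for segments too) is simple to sketch. This is a hypothetical illustration, with names and toy coordinates of my own, not the paper's code:

```python
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

def delaunay_graph(points):
    """Build the pairwise MRF structure by Delaunay triangulation of the
    detections' image positions; triangle edges become MRF edges."""
    tri = Delaunay(points)
    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    for a, b, c in tri.simplices:          # each simplex is a triangle
        G.add_edges_from([(a, b), (b, c), (a, c)])
    return G

faces = np.array([[10, 20], [40, 25], [70, 22], [30, 60], [60, 58]])
print(sorted(delaunay_graph(faces).edges))
```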
To continue that example of edge priors: I've seen a tree next to a road, so if I see a road, if I see a building, if I see a sky, it's more likely that this position should be filled by a tree. And of course here we were wrong about that segment, but we got everything else right. And again, here are the energies achieved by various methods. In this case we see a significant gap between the TRW lower bound and the TRW energy, and we're better here: we're getting lower energies. And this is a pretty interesting application, because it naturally encodes some of the repulsive potentials I was talking about before, where given that you have found a road, your potentials are saying that this other segment cannot be, let's say, water, or things like that. Those are extremely [indiscernible] that are hard to encode with submodular parameters, and that's where most energy minimization methods fail. So it's interesting to see that we can get some improvement here.

>>: Do you know what the ground truth, the minimum energy, is for these? Is OPD getting close to that energy?

>> Dhruv Batra: I don't know. We did check accuracies on this application, and OPD did slightly better. But again, there was no parameter learning, so it's not guaranteed to do any better.

>>: Seems like it doesn't need -- after the first iteration --

>> Dhruv Batra: Basically, this is a small graph, so the problem is fairly well solved. It would be really interesting if this were on pixels, and I'm sure convergence would be slower there.

We also tried this on the standard Middlebury dataset, on the optical flow problem: every single pixel you're trying to label with a two-dimensional motion flow vector, and these are images, as I mentioned, from the Middlebury dataset. We follow the same energy functions as in the standard papers by Szeliski and Baker et al. And it's interesting to see that these datasets are fairly well solved by standard methods. TRW, for example: here's the lower bound and here's the energy, and you see that within a few iterations you've solved the problem. With OPD the improvement, if any, is that we decrease the energy sooner. But there's not much to improve on in these problems.

So I'd say the take-home messages here are that we took a first step towards structures that are topologically more complex than trees, and that's the most exciting thing to me. We've restricted ourselves to tree-based methods because we could solve problems on trees; I want to argue, no, we can solve more things than trees, we just haven't incorporated them. OPD is really useful for hard non-submodular problems, and I think object labeling, semantic segmentation, is a good example of that: any time you're trying to incorporate priors that say this and this cannot be together, and there's a strong repulsive potential, it might be useful to look at OPD. And interestingly, traditional benchmarks might be getting saturated; it might be time to throw them away. I also want to mention some future work. If you think about what we have right now: we can use max-flow min cut, that's the graph cut idea, and that forces us to work with submodular energies.
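For reference, the standard pairwise binary submodularity condition, presumably what the slide here shows (the theta notation is my own), is:

```latex
\theta_{ij}(0,0) + \theta_{ij}(1,1) \;\le\; \theta_{ij}(0,1) + \theta_{ij}(1,0)
\qquad \text{for every edge } (i,j)
```

Repulsive priors of the "these two labels cannot co-occur" kind make the right side small relative to the left, which is exactly how they violate this inequality.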
That inequality is the definition of submodularity, by the way. And what I was leveraging in this work was the planar-cut idea: if your graphs are planar, then we can work with them, and that forces us to work with outer-planarity. These two constraints are orthogonal: one is a constraint on parameters and one is a constraint on the structure of the problem. In general, your problems are going to be neither outer-planar nor submodular. But it's still interesting that we have these black boxes that can solve certain subproblems exactly. So it might be time to think about decompositions, following some of the ideas I mentioned here, that take an arbitrary problem, break it down into a submodular part and an outer-planar part, solve each exactly with whatever black-box algorithm you have, and merge those solutions together. That, I think, might take us to better approximations for the kinds of problems we actually want to solve. Submodularity has gotten us really, really far, and a lot of problems display submodularity, but I think there are still a lot of interesting vision problems that don't, and it might be useful to think about these kinds of issues.

I do want to talk about something else really quickly. This is some work I've been doing over the past year on interactive co-segmentation, with some applications to 3-D reconstruction. I'm going to go through this really quickly, mostly just some videos, and I'd be happy to talk to you more about it. This is mostly joint work with students [indiscernible] at Cornell. We all know interactive single-image segmentation; that problem has been beaten to death. But people don't take images like that. Typical image collections look like this: they take multiple images of the same scene, of the same game that they went to. And it's an interesting question to ask: if I know these images are related, that they contain the same thing, can I do better? Can I segment all of them at the same time, can I co-segment, and can the user guide me in the co-segmentation process? That's the problem we tackled in this paper. We start with a collection of images, you have scribbles on these images, and you're trying to segment not just a single image but the entire collection at the same time. It follows the same basic principles as the single-image setup. You scribble on one or multiple images, we fit appearance models to these scribbles, and then we set up an energy minimization problem over all these images that's solved with graph cuts. So that's the basic setup; a toy sketch of the appearance-model step follows below. But the interesting things happen in how you let a user do this, so let me quickly show some videos.

We actually built a system that allows a user to cut an object out of multiple images. This is the interface we built, on an HP TouchSmart, and you see images that are all related in the sense that they contain the same foreground object. The user says: this is what I'm interested in, this is foreground, and marks out: this is background. And our system, as you'll see next, not only segments that particular image, it segments all the images in the collection. So on the right you'll see cutouts not only from that image but from all those images.
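Before moving on, here's a minimal sketch of the appearance-model step mentioned a moment ago. The function name, GMM component count, and details are my own assumptions, not the paper's code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scribble_unaries(image, fg_scribble, bg_scribble, k=5):
    """Fit one color GMM to foreground-scribbled pixels and one to
    background-scribbled pixels, then use negative log-likelihoods as the
    node energies of the MRF that graph cuts minimizes. In co-segmentation,
    the same models score the pixels of every image in the group.
    fg_scribble / bg_scribble are boolean masks of scribbled pixels."""
    pix = image.reshape(-1, 3).astype(float)
    fg = GaussianMixture(k).fit(pix[fg_scribble.ravel()])
    bg = GaussianMixture(k).fit(pix[bg_scribble.ravel()])
    e_fg = -fg.score_samples(pix)    # energy of labeling a pixel foreground
    e_bg = -bg.score_samples(pix)    # energy of labeling it background
    return e_fg.reshape(image.shape[:2]), e_bg.reshape(image.shape[:2])
```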
The interesting problem here is not that part; that part is fairly straightforward, you can generalize single-image segmentation. The interesting part is: how do you ask users to make corrections across 50 images? You give them some results, and they don't like the results you returned. How do you ask them to give you feedback on 50 images, or 100 images, or extremely large collections? The naive setup would say: what do I do with a single image? I show them the image and ask them to correct it. So I'll show them 50 images and ask them to correct those. What we said was: why not have the segmentation system guide you through this process? Why not have the segmentation system say, here's where I'm confused, why don't you give me more scribbles here? And that's what we developed: the idea of intelligent scribble guidance. Why not have the segmentation algorithm quantify how certain it is about the segmentation? Mind you, we're not saying the system knows where it is incorrect; there's no way of knowing correct or incorrect segmentations. It's just certainty. And if you can quantify your uncertainty, you can reduce the amount of time it takes for people to get to the segmentations they want. So I'm showing you some examples where the user says, you know, this is foreground, this is background. The system might make a mistake; it thinks that grass region is also foreground, so it assigns it foreground, but it's not too confident about it, and it shows you a box that says, why don't you give me more scribbles here? And once that happens, it does a good job of segmenting everything else.

Interestingly, once we had this setup (sorry about that), my co-author immediately saw applications to 3-D reconstruction. His idea was: I see an object out in the real world, I go and take a bunch of pictures of that object, and all I have to do is scribble on it. I just tell you this is foreground, this is background. I can use this interactive co-segmentation technique to get silhouettes, segmentations, and then I can use a shape-from-silhouette algorithm to get a rough volumetric 3-D model of that object. And you can do this for objects you can't take back to a controlled setting: you can't take this stone back to a structured-lighting setup and get a dense reconstruction. We thought it was just a cute, cool idea that you could take a bunch of images, mark out a few scribbles, and get dense reconstructions from that. Let me skip to something that might be more interesting. This was a statue at Cornell; we're at Cornell now, so we had to do that. We also got the Statue of Liberty dataset from Noah [Snavely]. What this algorithm is doing is, given these silhouettes, it's also running a structure-from-motion algorithm to find the camera parameters, so you know where the cameras that took those pictures are, and you're projecting those silhouettes back into space. And mind you, these are not extremely dense reconstructions; this is not the quality you would get out of a laser range scanner. But they're better than sparse point clouds, and that's the idea here. We also did an experiment where we took a video of a person standing on the ground. We're not using the fact that it's a video; we're just subsampling some frames from the video sequence.
We took a bunch of these frames, scribbled on them to say this is the foreground I'm interested in, and we were able to extract a somewhat accurate 3-D reconstruction of that person. And that's just for one object. The thing this audience might find more interesting, given the work done here, is that my colleague also extended this beyond particular objects to do a planar reconstruction of a scene. There's, of course, been a lot of work done on that here. So we used some of the ideas of co-segmentation to take a group of images and mark out planes in the scene, to say this is one particular plane, this is another particular plane: a planar co-segmentation. You're segmenting out not objects but planes in the scene. So we're able not only to render one particular object volumetrically in the scene but also to get a planar approximation of the world we're seeing. And again, it uses the same co-segmentation ideas; you're posing it as a discrete labeling problem over these planes. Let me see if we have -- yeah. This one is, of course, a toy example, but it's still interesting to look at: you have a collection of these images, the user just marks out a few scribbles, and you get this fairly accurate reconstruction of the world. That's me. So there's the foreground being segmented, and there's a planar segmentation of the scene. We had some problems with that result; there's actually a visual hull coming out of the body. All right. So with that I'll stop. If any of this seemed interesting, I'd be happy to talk to you more after this. Thank you.

[applause]

>> Larry Zitnick: Questions?

>>: So for co-segmentation, is the basic idea that the color model or color-texture model is being shared across multiple images at once?

>> Dhruv Batra: The basic idea is that, yes, the appearance is transferable across these images. But it's not just that. We have unary and pairwise appearance terms: we're not only fitting a GMM for what the foreground is supposed to look like and a GMM for what the background is supposed to look like, we're also learning which colors are supposed to be similar to each other and which are not. There's a learning algorithm that learns that these two colors appear in the foreground together, so they should be treated as similar; you'd rather keep them together. But at the heart of it, yes, there's a transferability assumption: what I see in one image should hold true in another. If you change your mind about what the foreground is between these images, then there's nothing we can do.

>>: On your MRF inference on the mesh, with OPD, it looked like instead of having single chains, you had double chains.

>> Dhruv Batra: Right.

>>: Is that the difference? Does that make a big difference in practice?

>> Dhruv Batra: That was just one particular structure for grids, one particular structure we experimented with. Yes, that structure alone does make a difference, but in general the approach has nothing to do with that particular structure. It's that any larger structure you can find that is still outer-planar is denser than trees, so you're solving a bigger chunk of the problem exactly. And it's all about pushing the envelope of what you can solve exactly.
When you can solve a bigger subproblem exactly, then when you merge the pieces together, you're closer to the original problem in some sense.

>>: Have you thought at all about [indiscernible] the choice of structure, which structures to use, especially given the underlying graph, like visible edges in the image and whatnot?

>> Dhruv Batra: Yeah, we have been thinking about extracting structures based on the problem. Imagine a problem where I construct a graph that is outer-planar, say just a square graph with no edges in between, but then I throw in some random edges with zero weights. If you're not looking at the edge weights when you do the decomposition, you can be far off. But if you somehow knew that those edge weights were zero and you chose this as a decomposition, you would have solved the problem exactly. So we are looking at that; the heuristic I talked about is along the same lines. You find a spanning tree that keeps most of the energy, then you keep adding edges based on how strong those edge weights are. So it's hard to describe the structure independent of the problem, but given a problem, we can find a structure that represents most of it.

>>: What were some of the other decompositions besides the comb? What do they look like? Because the comb doesn't propagate, intuitively, except vertically or horizontally; a double-wide chain seems like it should be a marginal improvement over having a single one.

>> Dhruv Batra: Right. Let me try to address the first question, which is what these decompositions look like. Some of them are not that intuitive. This is a Delaunay planar graph, where the blue dots are the nodes, and you're finding these structures; they're both outer-planar and they're subgraphs of that one. The interesting thing is, if you look at the structure and it looks like a tree of triangles, that is an accurate representation. Outer-planar graphs, if you think about them in terms of treewidth, have treewidth two. So they're strictly the next largest structure after trees.

>>: What is treewidth?

>> Dhruv Batra: Treewidth: if you think of this in terms of a junction tree approach, or in terms of triangulations of your graph, the treewidth is given by the largest clique that would be formed in the triangulation (one less than its size). Trees have treewidth one; outer-planar graphs have treewidth two. So on that axis we're not very far from trees; it's precisely the next step. So if it feels like we haven't gotten too far beyond trees, you're right, we haven't. But it's still interesting to see that in our applications it does make a difference. You're better off.

>>: Those triangles could make a difference. This is just an abstract graph, but if it were structure from motion or something, triple relationships are much stronger than pairwise relationships.

>> Dhruv Batra: Yes. And you're incorporating those sorts of constraints at a long-range level.

>>: So we used skeletal graphs to make structure from motion faster, but that's when you're basically solving a linear system, so it's not a graph cut, right?

>> Dhruv Batra: Uh-huh.

>>: And this is -- in each case you're always solving a binary labeling problem, even if you're running it on flow or something?

>> Dhruv Batra: Right.
But the point to remember is that if you did have an algorithm that solved a multi-label subproblem exactly, all the message passing algorithms would be able to incorporate it without any change. I showed you a cut-based algorithm, but if you were willing to use something else, like a junction tree, then all these message passing algorithms would work on that, too. Right now, though, we're restricted to solving binary problems at the core.

>>: So a junction tree -- many people have rediscovered this, but Pearl's may be the original junction tree algorithm, right?

>> Dhruv Batra: In a sense, exactly. It's exponential in treewidth plus one. So you can pay that cost, but since it's exponential in treewidth, solving it on trees is easy, and solving it on outer-planar graphs is just slightly trickier; the treewidth goes up by one, basically.

>>: Okay. So you haven't done that, but I'm trying to figure out how useful this is beyond binary problems. Because in addition to working on binary problems occasionally, like segmentation and things like that, we also do a fair amount of sparse linear solving, or other problems which aren't binary. Does this planar decomposition give you something that could be used, for example, as a preconditioner for a linear system solver?

>> Dhruv Batra: Frankly, I haven't thought about that. But if I change the question to whether this would make a difference for multi-label problems, my intuition is that it would make a difference even there, because right now the way we solve multi-label problems is with an alpha-expansion step on top of these binary engines. So even with alpha expansion, we can see some gains over TRW or BP, which would solve the multi-label problem to begin with. So I'm definitely confident that if we threw away the alpha expansion and this two-label engine and came up with something that solved the multi-label problem exactly, it would do better than the alpha-expansion-plus-two-label approach. It should work, theoretically.

>> Larry Zitnick: Okay. Dhruv, thank you very much.

>> Dhruv Batra: Thanks.

[applause]