>> Larry Zitnick: All right. It's my pleasure to introduce Daniel Freedman. He graduated from Harvard with his PhD a while ago, and after that went on to be a tenured professor at Rensselaer, followed by a stint at multiple industrial research places such as HP and IBM, and finally ended up at MSR in Israel. Some of your earlier work was on active contours and segmentation, I believe. And today he's going to be talking about a lot of his more recent work. So looking forward to it. >> Daniel Freedman: Thank you, Larry. So, yeah. I gave this sort of whimsical title. I decided that it would be more interesting to stick a whole bunch of things together rather than talk about one thing, just to keep you interested. So to start, I want to start with a lemma. If we have a simplicial complex K and we, I’m just joking. We’re not going to start with that. That lemma actually appeared in a paper of mine from about three years ago, and I will talk about something related to it a little later, but hopefully in a way that's a little more fun than that. Okay. So the idea is just to give a high-level overview of a few different topics I’ve worked on over the last three-ish years. Hopefully you'll stay awake and maybe even find it fun. Okay. So just a little bit quickly about me. Larry already mentioned it, but I joined MSR, the ATL in Israel, as a Senior Researcher just about a year ago. Before that I was a physics [inaudible] undergraduate at Princeton, and then I finished my PhD at Harvard in 2000. I was a Professor of Computer Science at RPI for nine years, and at some point in the middle got tenure. I came to Israel on a Fulbright; I was a visiting professor of Applied Math at the Weizmann Institute. Well, this is Troy in the winter, where I used to live, and that’s Zikhron Ya’aqov, also in the winter, where I live now. So we were happy there, and I bounced around a few different places in Israel, but now I'm at MSR and it's been fun. Let me tell you a little bit about our group. The ATL in Israel is divided between two sites, in Haifa and Herzliya; our group in Haifa is the OneVision Group. We try to do computer vision research which is on the boundary with machine learning, generally, although not always. The idea is that it's often useful if it relates to Microsoft business. Our main focus recently has been on things like 3-D sensing and object recognition. I'll talk a little bit later about some of the 3-D sensing stuff, but unfortunately, like I was telling Larry, there are some NDA sorts of things, so I'll just give a kind of high-level overview of that. So the general things I want to talk about: in the first part of the talk I want to talk about things I did that are related to image editing. And this was actually an interesting departure for me, because until that point I was spending my time doing two things, basically: vision and computational geometry slash topology, and this is stuff that's on the boundary with graphics. It's not real graphics, but it's as far as I went into that world. And it was fun because we could actually draw pictures of things, and the results often look nicer than vision results. The second thing is algebraic topology. Obviously, I’m not a topologist; I just play one on TV. I'm interested in how to use this in problems that are of interest to us.
We did do some work, which I won't talk about, where that theorem comes from, which was actually more on the CS theory sort of boundary with this kind of math stuff, so we published in [inaudible] and Discrete and Computational Geometry and places like that. But what I'm interested in talking about today is how you can use these tools in computer vision, in particular homology groups; and hopefully, it's a heavy field from the point of view of machinery, but I'm going to try and give a nice sort of overview, and if you like it, I'd be happy to point you in the direction of some interesting papers you can read, not just from myself, obviously, but sort of overview papers. And I'll talk a little bit about the 3-D sensing work. I guess I'm a little restricted in what I can say because of the NDA, but the problem's an interesting one, so I'll give a flavor of that. And finally, fancy pants features: some sorts of features that we're trying to develop for images. There is no NDA there; it's ongoing, and I just want to give a little taste of the sort of thing we're working on. So those are the things I want to talk about. So let's start with image editing. This is joint work with Pavel Kisilev and Zachi Karni, who were at HP Labs, Craig Gotsman, who is at the Technion in Israel, and Renjie Chen and Ligang Liu, who were at Zhejiang University in China. Okay. So the first problem here is the image resizing or retargeting problem. Some of you may know this, but just to give the overview of what we're trying to do: we want to change the aspect ratio of an image. So we have an image of some aspect ratio, and we want to stretch it or squash it or something like that, and this is a common operation people want to do to, you know, print things for different size albums or whatever. And, of course, you could just rescale the axes in the ordinary way, just by stretching. You could do that, and that would be fine if it's okay for people to become fatter, or alternatively to become thinner, which you'd think people would like, but it actually looks very strange and artificial. So the question really is: is there a better way of doing this? Okay. So that's the problem. Now what does this look like? So here's an image, and if we just do the regular uniform sort of squeezing, what we get is something that looks like that. You can see that there is no person here, I mean there's no single obvious thing that bothers you, but it's clearly not the same image we were looking at before, right? So what we'd like to do instead is something like that. Now what's happened here is you can notice that for the parts that we care about, namely the island and so forth, the aspect ratio's more or less been preserved, so it looks realistic. Now, of course, that has to come at a cost somewhere else, so if you look at the sky, that isn't the case there. And here there's been some more stretching, but the principle is you don't really care about what happened over there or in the water; you don't really see those details, okay? So to draw it again, there's the picture of the important parts, so to speak, where we want to maintain the aspect ratio, the salient parts, and then the yellow parts are unimportant and we are allowed to have some sort of distortion there. And that's the whole game, basically. Okay? Now how do you measure what parts are important?
Well, there are lots of ways to do that. You can do things that are very, very simple, like gradients, which aren't very sophisticated, and that will at least give you something like: well, I know this is a white wall, or maybe this is part of the sky, and so I know that that's not important. Or you can do something more interesting, and there's a lot of work in the vision community on saliency; recently there's even a saliency data set that people run on. All of that can be used, and we take it as external or exogenous for now. We just take that as given, and we are more interested in: given that measure, how do you do the deformation correctly? Okay. So seam carving. I'm not 100 percent sure it was the very first approach to this, but if it wasn't the very first it was the second. It was the one that popularized the problem. So I guess a lot of the time one of the great achievements of an algorithm is not necessarily the algorithm itself but popularizing the problem it's trying to solve, and that's what the seam carving guys did. And it was a very nice piece of work. What they wanted to do is basically do this by removing unimportant seams from the image. Okay? So if you have an image that looks like, well, this isn't a very interesting image, but let's imagine there was something inside: a seam would be a kind of curved column. And I'm going to measure, based on how much saliency there is in that curved column, and I'm going to try and remove a seam that's unimportant, like maybe something like that; assume that there's unimportant stuff under the red, and it would then get squashed. And we just remove seams one after the other. And that was a very nice idea. Now the problem with doing this is that it's discrete, and anything that's discrete will have these [inaudible] discrete sorts of artifacts. So what you might get, so here's an image of a bunch of women, and if you apply seam carving, you can look at this image for a second and you start to notice there's something strange: she has no leg. Right? This is not uncommon, this sort of thing, especially if you have narrow structures like that. This is maybe more than 4 to 3 or 3 to 2, but it's still in that realm; it's not squashing something to a very, very thin kind of wedge, and you end up getting this thing where people lose limbs or whatever. Okay. So what we did was we decided we would go towards a more continuous approach. Obviously, the idea behind that is to avoid these kinds of discrete artifacts, okay? In addition, even when there aren't strong artifacts, it's going to give the whole thing a generally smoother look. The problem is that it's heavier to compute. So you trade one against the other. So what does this look like? This is the image again. And we already saw that that's the result we want. Now what does it look like actually? Sorry. So we're going to overlay a quadrangulation, which begins just as a bunch of squares or rectangles or whatever, let's say squares, and we're going to deform that quadrangulation into another quadrangulation, which might look like that. Now if you look at it, you can see that the thing we were advertising in the previous slide holds. What do I mean by that? Over here in this section, right where you have the island, you have lots of little squares, and they're still squares, almost.
So there isn't much deformation in terms of the aspect ratios. And up here they've become these sort of elongated rectangles. That's the type of thing we're going to expect, and of course in the end these things are no longer rectangles at all, they're some sort of trapezoids or whatever, but the point is that they're more or less close to that. >>: May I ask a quick question? >> Daniel Freedman: Yeah. >>: Basically you're saying to try to preserve the [inaudible] make things even worse for [inaudible]. >> Daniel Freedman: Exactly. Yeah, yeah. You have to pay somewhere. >>: [inaudible] term, I mean, the other thing that happened here, of course, is the salient part got smaller. >> Daniel Freedman: That's right. >>: [inaudible] scale. >> Daniel Freedman: That's a very good point. And, in fact, in about three slides I'll show you: to begin with there's no such term, and then one can add in a term like that. Okay. So before I go on, just to make clear, the question was: you saw that these things have actually become smaller, which is undesirable; you don't want that. To begin with we won't handle that. So let's see what the original objective function and constraints are going to be. Okay. So each quad is going to be defined by four points, which we'll label by one corner (i, j), and then (i+1, j) and so forth around. We're going to have a term that applies to each of the four edges, and I'm just going to show you one of them because they're all identical, okay? So let's say we are looking at this edge right here. We are given the x's and the y's, which are the original coordinates of the four points around the quad, and the tilde variables are the new coordinates, which we're going to compute. We are also going to compute some intermediate variables, which we call the stretch variables, a and b. So what do we want? We want the new quad to be more or less an axis-aligned affine transformation of the old square; in other words, basically a square going to a rectangle. So how does that look formally? Well, we have this term E_ij, which basically just looks at the difference along this edge of the x's and the y's. And we're saying that the delta-x of the new guy should just be a scaled version of the delta-x of the old guy, and likewise for the y's. That scaling factor might be different for x and y, here it's a and here it's b, but that's what we are aiming for. So if it was exactly an axis-aligned deformation, then this energy would be zero, and if it's not exactly, if it becomes some sort of trapezoid, well, hopefully it will be more or less a rectangle. So that's the energy term. Now, that doesn't say anything at all so far about what's important and what's not important. So let's go on and talk about how we're going to introduce that, and we'll introduce it via constraints. We have three types of constraints. The first are not very interesting: just boundary constraints on x-tilde and y-tilde, namely if you live along the boundary you have to stay along the boundary. The image in the end has to be a rectangle. So that's sort of clear. The second sort of constraint is that the stretch variables must be positive. In other words, I can't have some sort of reflections and things flipping over.
Or at least I'm going to try not to. And the third is the interesting one. This is where importance comes in, and it says that the ratio of a to b, well, before we look at this, let's suppose that we've stretched more in the x direction, so it's some square thing and we've pulled it, okay? So the ratio of a to b, remember a is the stretch in x and b is the stretch in y, is lower bounded by one, in other words it must stretch more in x than y, and is upper bounded by something, okay? That something is a per-quad quantity which depends on the importance. So, in fact, this upper bound is going to be a decreasing function of importance. If the quad is super important, I'm going to set s_i to one, which effectively says I have to scale by the exact same amount in both a and b, or x and y, if you like. If it's not important at all I can set this thing to infinity, and then I can do whatever I want; and anything in between. And that's the constraint. Now, you could do this with a soft constraint rather than a hard constraint like this, and we tried that too, but we found that the hard constraint worked much better. Of course, if you stretch more in y than x then this would just flip. Okay. That's the general problem. As for the optimization itself: we've got these variables, the x-tildes, y-tildes, a's and b's, that we want to solve for, and if we collect them together into one big z vector, then it's not too hard to see that we have a quadratic objective function and linear constraints, so we have a quadratic program which we can solve. We solved this in several ways. MATLAB's own solver is much too slow to do this on anything of reasonable size, so we used CVX, and we used some commercial packages at the time, and they all worked fine. Now, some comments about variations you can make to this problem. One is that you could use a different norm in the energy. We had squared terms; instead of squares we could use a one-norm or an infinity-norm or something, and of course it still remains convex. It made absolutely no difference. I mean, very little; you could see some minor changes. So that wasn't very interesting. One thing that is more interesting is prevention of fold-overs. What do I mean by that? This is a problem that plagues continuous methods as opposed to discrete methods. Suppose you have two quads, one beside the other. There's nothing to prevent me from mapping, remember I'm just giving you x-tildes and y-tildes, and I could get something like this where it's folded over on itself. And this is something that graphics people know very well, and vision people, or at least this vision person, knew less about before I did this, but it's a standard thing; you can say that now you don't know how to render, and you can prevent this easily by having extra constraints which just say: can't do that. This x-tilde, for example, has to be bigger than this one, so it'll always be on the right side. And the way we did that, in fact, is not just that it's greater than, but greater than with a certain margin. So you can't have a zero-width rectangle; you have to have some width. And so you can prevent that. And this was interesting, by the way, just because there were methods that try to do this with local optimization that can't do this. It's very simple here, but it actually added something.
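To make the optimization concrete, here is a minimal sketch of a retargeting quadratic program of the kind described above, on a toy grid. It assumes cvxpy as the solver front end (standing in for CVX); the grid size, the per-quad bounds S, and all variable names are illustrative, not taken from the paper.

```python
# Minimal sketch of the retargeting QP on a toy grid (assumes cvxpy).
import numpy as np
import cvxpy as cp

nx, ny = 5, 4                        # grid of quad corners
W0, H0 = 4.0, 3.0                    # original image size
W1, H1 = 2.5, 3.0                    # target size (squash in x)
dx, dy = W0 / (nx - 1), H0 / (ny - 1)  # original quad edge lengths

X = cp.Variable((ny, nx))            # new x-coordinates (x-tilde)
Y = cp.Variable((ny, nx))            # new y-coordinates (y-tilde)
A = cp.Variable((ny - 1, nx - 1))    # per-quad stretch in x
B = cp.Variable((ny - 1, nx - 1))    # per-quad stretch in y
S = np.full((ny - 1, nx - 1), 5.0)   # ratio bound; salient quads get S ~ 1
S[1, 1] = 1.0                        # pretend the middle quad is salient

energy = 0
for i in range(ny - 1):
    for j in range(nx - 1):
        # edge terms: new deltas should be scaled versions of old deltas
        energy += cp.square(X[i, j + 1] - X[i, j] - A[i, j] * dx)
        energy += cp.square(X[i + 1, j + 1] - X[i + 1, j] - A[i, j] * dx)
        energy += cp.square(Y[i + 1, j] - Y[i, j] - B[i, j] * dy)
        energy += cp.square(Y[i + 1, j + 1] - Y[i, j + 1] - B[i, j] * dy)
# (a linear term in the salient quads' stretches could be added here
#  to keep important regions from shrinking, as discussed in the talk)

cons = [X[:, 0] == 0, X[:, -1] == W1,      # boundary constraints
        Y[0, :] == 0, Y[-1, :] == H1,
        A >= 1e-3,                          # positivity: no flips
        B >= A, B <= cp.multiply(S, A),     # squashing in x, so 1 <= b/a <= S
        X[:, 1:] >= X[:, :-1] + 1e-2,       # fold-over prevention with margin
        Y[1:, :] >= Y[:-1, :] + 1e-2]

cp.Problem(cp.Minimize(energy), cons).solve()
```

Note that the ratio constraint 1 <= b/a <= S is written as the two linear inequalities b >= a and b <= S*a, so the whole thing stays a quadratic program with linear constraints, exactly as described.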
And then finally, the question from before: how do we deal with large important regions? Here it's very easy. You can just add in a linear term: we maximize a linear term in the importance, or equivalently minimize minus a linear term. So it's just an extra linear term, and the thing remains a quadratic program. So let's see some pictures of this. We'll start with the image from before, and that was the original, right? And there's the uniform scaling. Here, obviously, it's just uniform; you don't see any terrible artifacts, just what we said before. Here is the seam carving, where, as we pointed out, she'd lost her leg; but actually something else I didn't point out over there: this person sort of disappears. It's not as abrupt as losing the leg, but he just kind of disappears. And here's our result, the quadratic programming result. If you look you can see certain artifacts as well, like, for example, the fact that you get a certain curvature of this line, which was straight before. But on the other hand, the women all look relatively normal compared to the way they looked before, simply because their saliency was considered to be high. I should say that when we compared all the methods, seam carving and the other methods I'll show you, we used the same saliency measure across all of them. So that isn't an issue. Okay. Some other pictures. So here's an example which addresses exactly the issue of salient things becoming small. Here we had the extra linear term which encourages a large saliency, and the house was deemed salient and so actually becomes very large here. It's a bit a matter of taste whether you like that or not, but you can clearly see the effect there. The other methods we're showing: you can see the seam carving, and there were some successes for seam carving; there was something called multi-operator; and there were two more, one of ours called Local-Global, a previous method of ours which didn't have this size term, and another one called Optimized Scale-and-Stretch. I won't go into the details there, but you can see, and again, these things are a bit a matter of taste. There was an attempt to have a competition at one point, organized by, I think it was [Shioby Dunn], who was the original seam carving person. Here, you can see, I think we do a little bit better at keeping the table and the wine more or less in the right aspect, as opposed to pretty much all the methods except Local-Global, which is very similar; that was our previous method and gives similar types of results, although it's a little smaller there. And here again, you see the size issue with the fish. As I said, there was a competition at some point run on several sort of standard sets of images. Now let me show you something we're not good at at all. I sort of alluded to this before: preserving straight lines. You saw that a little bit with the curve of that cement block. Here you can see something a little more blatant. If you look at the bend of the, and it's terrible, right? And there's a hack: you can fix this semi-automatically if you have someone say, I want this line to be straight. And you get something reasonable; what you're basically doing is, just along that line, putting very strong constraints on those stretch variables to say that they're more or less the same.
There's nothing magic; it's not a very satisfying result, though. Okay. So that's the image resizing and retargeting type problem. Continuing right on, what you can do is take an extension of this and do something related, which is a little bit more graphics-y maybe: not images, but shape deformations, and we want natural shape deformations. Okay. So what is the idea? Here shapes are going to replace images, where shapes are some sort of 2-D contour and what's inside of it. And triangles are going to replace quads, and now it's the standard graphics formulation with triangulations. And now we want to say that each triangle should undergo a nearly natural deformation. There were a few pieces of work that tried to do this; I'll tell you our little innovation on it, if you're familiar with the work, and it has to do with what we consider a natural deformation. So natural could be something like Euclidean or similarity, right? Or it could be something a little more complicated. And the more complicated thing we did was the following. Both Euclidean and similarity are, of course, groups of transformations; we ended up coming up with a set that sort of interpolates. We took similarity as the gold standard: in other words, if you had something that was really salient, you'd want it to undergo a similarity transformation in the new deformation, and if it wasn't salient, you could do something that was a bit more deformative. And what was that? Okay, so if something's really not salient at all, you could do whatever you want with it, and that would just be a general affine deformation. So we want to interpolate between similarity and affine somehow, based on how salient you are. There are lots of ways to do this, and our little secret sauce here was the following. Any deformation in 2-D is going to look like some two-by-two matrix times the point plus a two-by-one vector for the translation. So let's focus on the two-by-two matrix. If you take the singular value decomposition, the SVD, what one finds is: for a general transformation, the singular values can be anything nonnegative; for a similarity transformation, the singular values must be the same. Okay? So if you just take a look at the ratio of the singular values, the big one to the little one, you can say: okay, now I'm going to interpolate from one to the other based on how big that ratio is. If the thing is supposed to be really salient, I'm going to make that ratio be close to one, and if it's not salient, it can be whatever it wants. So it's very similar in that regard to what we saw before, only instead of those a's and b's [inaudible] variables, we are now looking at singular values. And that was what we did. Now how does this look in pictures? So here the game is the following. You can see the Kool-Aid guy, and you see these little green dots here; someone's going to move them manually to somewhere they want, and we're going to compute the new transformation subject to the constraints that those green things have to move where we want them to. So you can get something like this: someone specified, I want the hand to move down and whatever, and it computes the deformation. Now, actually, it's pretty good.
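As an aside, the singular-value test just described is easy to play with. Below is a tiny sketch, assuming numpy; the function and variable names are mine, not the paper's.

```python
# Sketch: measure how far a 2x2 deformation is from a similarity
# via the ratio of its singular values (assumes numpy).
import numpy as np

def similarity_deviation(M):
    """Ratio of largest to smallest singular value of a 2x2 transform.
    1.0 means a similarity (uniform scale plus rotation); larger means
    more shear or anisotropy is present."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[1]

R = np.array([[0.0, -1.0], [1.0, 0.0]])   # pure rotation: ratio 1.0
Sh = np.array([[1.0, 0.8], [0.0, 1.0]])   # shear: ratio > 1.0
print(similarity_deviation(R), similarity_deviation(Sh))

# In the deformation energy, salient triangles would constrain this
# ratio to stay near 1, while non-salient ones leave it free, giving
# the similarity-to-affine interpolation described above.
```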
You can see that his left arm, our right, this one here, is a bit weird, right? Because it's all 2-D, so you don't know anything out of plane. But other than that it looks semi-reasonable. There's the frog doing some sort of dance. And you can do this in images too and get something like that: we start here and we move to here and you get a reasonable-looking transformation. Okay. So that's the end of that problem. Now I want to talk about a different problem from this image editing world, which is recoloring. So we are on to something new, but still within image editing. This is a problem of transferring a color scheme. I'm going to have two images, a source and a target; I'm going to take the colors from the source and transfer them to the target. And the key is to retain, and this is some sort of vague hand-wavy thing, the original object's look and feel. I'll show you a picture and you'll see what I mean by that. So here's the source, some sort of crest, and here's the target, if you like football that's Adrian Peterson, and I want to somehow take the colors from the source and transfer them to the target. And of course, as you can imagine, there are lots of different ways to do that. To begin with, I'm going to assume I have a segmentation, and I'm not going to worry about that; you'll see later on that there are some errors occasionally due to the segmentation, but that's something I take as given, and I don't really care about the segmentation. I used a tool of my own that I developed a while ago, but you could use anything. Now, based on what's within the segmentation mask, I'm going to take some sort of distribution over colors, in whatever space you specify, and basically I'm going to compute a color transform, which we'll call big Psi, and then I'll just run the part that's in the mask through that, and I will get this, and that's the final result. Now what do I mean by the look and feel here, other than the fact that, yeah, okay, I got the colors from the source to the target? If you look at the red over here, the little red rectangle, and the corresponding red rectangle, I don't know if you can see with the granularity of the projector, but you retain the wrinkles. You can do things that give you something rough but don't get to that level of detail; what we're going to look for is that that level of detail is preserved. Okay. So that's what we want to do. By the way, this appeared at CVPR a year or two ago, I guess. Two years ago. And a variant appeared at ICPR afterwards. Okay. So which colors match? We're going to be agnostic about this, and we're going to say that the user can specify some sort of ground distance between colors. So I could have something like RGB, and that's generally not a good idea, or I could do HSV or LAB or something like that and match colors on that basis, and that's going to be our ground distance. Or I could do something weirder: I could match things based on their brightness, or something like the opposite, call it one minus brightness, so that something bright in the source matches to something dark in the target, or something more exotic. Whatever it is, I'm not going to specify it; I'm going to use it as a basic tool and then compute things that work based on the ground distance that's specified.
What's interesting, as you'll see, is that because we do something a little more sophisticated, if you want to stick to perceptual color spaces, it turns out that with our approach using RGB is actually just about as good as using LAB, which is sort of interesting; it's not something we anticipated. So what do we do? There are three steps here. The third step is simple; it's just the first two steps that matter. The first step: based on the ground distance, we're going to compute a coarse color transform via the transportation problem. The transportation problem is basically the earthmover's distance. So, in fact, what we are going to do is take something very similar to the EMD, but we're going to add two wrinkles to it. The earthmover's distance, just to remind you: you have the earth in the source and the earth in the target, and based on how much work I have to do to move one distribution onto the other, I'm going to minimize that amount of work. Now, what do we do differently from the earthmover's distance? Well, the first thing is we relax the conservation constraints. That's the most important. So why do we do this? The conservation constraints say that you have to move the right amount of earth from here over to here: all the earth in the source has to be moved, and every bit of the target has to be filled. I don't want to do that, for the following reason. I'll give you a little toy example, but this is something we saw in practice. Suppose you have two images and we're going to match based on brightness. One image is blue and the other is red. So we're just matching light blue to light red and dark blue to dark red. But one image is 50 percent light blue and 50 percent dark blue, and the other image is 40 percent light red and 60 percent dark red. Well, I'm going to be mismatching some of the light and not-light because I don't have the same amounts in both, right? And that's something I want to avoid. So I'm going to relax that conservation and say the conservation must hold only within a certain slack. You can't relax it entirely: if I totally relaxed it then I'd effectively just be doing nearest neighbor, and that would mean that if there's one little noisy pixel, well, I'm possibly matching everything to that. So I don't do that. But I allow some slack. It doesn't have to be totally conserved, but it has to be conserved up to the slack, so I can't move more than, say, twice the amount of what's here over to there, and you set that slack. So that was one change. And that still leaves the whole program a linear program; instead of equality constraints I have linear inequalities that get added into the convex program. The second wrinkle changes that a little, and it's a flow smoothness term. What do I mean by a flow smoothness term? Well, in the original earthmover's distance, you don't have any constraint that says that two colors that are similar in the source ought to map to nearby colors. I don't have anything that says that, so I want to put something like that in. And, depending on how you do it, that will lead you either, if it's an L1-type term, to a still-linear program, or if it's an L2 term, to a quadratic program, and so forth.
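Here is a minimal sketch of the relaxed transportation step on the two-bin toy example above, assuming cvxpy; the slack factor and bin masses are illustrative. With exact conservation the 50/50 versus 40/60 mismatch forces some light-to-dark flow; with slack, the flow can stay diagonal.

```python
# Sketch of the relaxed transportation problem between binned color
# histograms (assumes cvxpy; slack factor and numbers are illustrative).
import numpy as np
import cvxpy as cp

p = np.array([0.5, 0.5])        # source mass: 50% light, 50% dark blue
q = np.array([0.4, 0.6])        # target mass: 40% light, 60% dark red
D = np.array([[0.0, 1.0],       # ground distance between bin centers:
              [1.0, 0.0]])      # light<->light cheap, light<->dark costly

F = cp.Variable((2, 2), nonneg=True)   # flow from source bin i to target bin j
slack = 1.5                            # how far conservation may be violated

cons = [cp.sum(F, axis=1) >= p / slack,   # relaxed conservation on source...
        cp.sum(F, axis=1) <= p * slack,
        cp.sum(F, axis=0) >= q / slack,   # ...and on the target side
        cp.sum(F, axis=0) <= q * slack]
# (a flow smoothness term on F would be added to the objective here)

cp.Problem(cp.Minimize(cp.sum(cp.multiply(D, F))), cons).solve()
print(F.value)   # light maps to light, dark to dark, no forced mixing
```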
That smoothness term is less important than the conservation relaxation, but it still adds something. Okay. So what I said here was that this gives us a coarse color transformation. But a coarse color transformation won't keep the look and feel. Why? Because I'm binning things. I've got a bunch of bins and I'm just mapping one bin to another bin and so forth, right? Remember the look-and-feel part was that wrinkles ought to match to other wrinkles. Wrinkles means subtle changes in color: it's the same color but with subtle changes in brightness. Here it's getting to be something really blocky, and I don't want that. So that leads to the second step: within each bin, instead of having it map to another bin, one bin center to another bin center, it's going to give you a linear transformation within that bin. And that's going to give you what you want. So how do we do this? We look at the flow to a given target bin, right? We have certain source bins that map to it. Remember the earthmover's distance isn't combinatorial; each target bin can have a bunch of source bins that map to it with different weights. So what I can do is compute the mean and covariance of the source bins mapping to a target bin, and I can easily compute the mean and covariance within the target bin just by looking at what's inside it. And now, basically, my second step is to say: I have a mean and covariance of one thing mapping to a mean and covariance of another thing, and I'm going to do this via some sort of affine transformation in color space. So we have all these little local affine transformations, linear transformations plus an offset, and then I'll stitch them together in some simple way, which I'll talk about afterwards. But basically what happens is it's a local linear thing which gives you a global nonlinear thing, and that's good. So the question is really: how do I do this affine transformation? And we call it, it's a stupid name, but I couldn't think of better, an SMSP: the stretch-minimizing structure-preserving transformation. So why do I need this? Isn't mapping between two Gaussians just a really easy thing to--Yeah. >>: Just quickly. These are 2-D, 3-D bins? >> Daniel Freedman: That's right. >>: And you're moving these little bins themselves [inaudible] important too? >> Daniel Freedman: Exactly. That's right. And this one is easy, right? We could just scale so the individual variances match and then do whatever offset we need for the means. You could do that. And, in fact, there was a nice paper on recoloring from 2001, which everyone seems to like to compare with, by Reinhard, which was in SIGGRAPH I think that year, and that does exactly that, and it's a good idea for LAB sometimes. Okay? But, in general, if you're in some sort of general color space, it's a bad idea. And a warning here: this is an animation which I don't think quite worked out the way I wanted, but we'll try anyway. So here we go. Let's say we are now in 2-D, not 3-D. Here's one and here's the other, the source and the target, and I want to map one to the other. So if I just scale to match individual variances, what's going to happen here?
I'm going to push this guy down and pull this one this way, and obviously that's a bad idea because I'm simply amplifying noise and squashing signal. And this is the part that doesn't work well. Yeah. I thought it worked when I did it. But anyway, the point is you end up squashing and stretching the noise, and that's not a good idea. Stretching amplifies noise. Instead, a better idea might be to rotate. And that's very natural. And, again, it didn't work the way I wanted, but the rotation is better. So when we rotate, we want to keep axis lengths preserved. And that is what we call the SMSP: we want to minimize the stretch and we want to preserve the structure. And preserving the structure basically means I want to keep the orthogonal axes orthogonal. So that's the basic idea there. What does this mean in practice? How do we do this? When I say the orthogonal axes, I mean the principal axes, the principal axes of the covariance matrix; the goal is that they should remain orthogonal, and then whatever they're mapping to, I should squash as little as possible. And you can formulate this in a way where you can find the global optimum. In fact, in 3-D it's very simple: it ends up leading to a mixed continuous-combinatorial problem which is easy to solve, because in 3-D the combinatorial part has only six options. But, in general, if you did it in N dimensions, and this might be a useful thing, then you can solve the combinatorial part by the Hungarian method, and you can do it in a reasonable amount of time. I didn't go into great detail about how one does that, but that's one of the nice pieces in the paper. And the third step is: how do you stitch these together? You just do some sort of convex combination of these transformations. So in a given bin I look at nearby bins and see what they map to, and based on where I am in the bin, I take some sort of weighted average. That's just a simple hack that gives you something where you don't get any artifacts. That's all. So let me show you some pictures. The source here is the blue car and the target is the green car, and we want to color the green car blue. Reinhard, this method which basically says let's look in LAB space and just scale the L's and A's and B's, works really well here, because LAB is a good perceptual space and there's only one color there. So it does what it's supposed to. Interestingly, I don't think we do any better here; I don't think we do any worse; we do fine. What's interesting is that we use RGB in this method, and if you did that with Reinhard you wouldn't get good results, not great results anyway; somehow this SMSP transformation more or less figures out what to do. But that wasn't the original impetus for this. The original impetus is that sometimes you have bi-modal or tri-modal distributions or things like that. Here's the example you saw, and Reinhard, because this method is basically trying to fit a Gaussian to something that's multimodal, invents colors.
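Before the rest of the examples, the rotate-rather-than-stretch intuition can be sketched in a few lines, assuming numpy. This is just the intuition, aligning principal axes by a pure rotation; it is not the paper's actual SMSP optimization, which also solves the small assignment problem over axis pairings mentioned above.

```python
# Sketch of the "rotate, don't stretch" idea behind SMSP: align the
# principal axes of the source Gaussian to the target's without scaling.
import numpy as np

def rotate_to_target(cov_src, cov_tgt):
    # principal axes = eigenvectors of the covariance matrices
    _, U = np.linalg.eigh(cov_src)
    _, V = np.linalg.eigh(cov_tgt)
    R = V @ U.T                    # orthogonal map taking source axes to target axes
    if np.linalg.det(R) < 0:       # avoid reflections
        V[:, 0] *= -1
        R = V @ U.T
    return R                       # keeps axis lengths: no noise amplification

cov_s = np.array([[2.0, 0.0], [0.0, 0.1]])   # elongated along x
cov_t = np.array([[0.1, 0.0], [0.0, 2.0]])   # elongated along y
print(rotate_to_target(cov_s, cov_t))        # ~90-degree rotation
```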
Not surprisingly, Reinhard's method mixes the colors together, so that blue and yellow become one Gaussian, and green, purple and white become another color, and you end up getting some funny things in here. With ours, obviously, you get something a little better. Again, here the ground distance was RGB. Now you could look at different ground distances. Here was an example where the ground distance matched bright to dark and dark to bright. You see some artifacts here; in particular, the edges of the segmentation weren't terrific. But in general, the rest of it is reasonable. Now here's another example where it's tri-modal. It was hard to find tri-modal examples; I ended up finding this; apparently this is the most hated third jersey in all of sports. And, well, maybe it looks better here, I don't know, but anyway, it does the color matching. These are sort of standard examples. You could also do something a little different. You could say: what if my source and my target were the same image, except that I take the source to be the lit part and the target to be the part in shadow? So the mask for the source is the part that's illuminated, all of this, and the mask for the target is the shadowed part. And now the ground distance would be based not on brightness but actually on RGB normalized by brightness, right? So R, G, B each over R plus G plus B. And then you get something which does shadow removal pretty well. Okay, it's not perfect; I don't know if you can see from there, there are definitely artifacts around the edges, and there are probably better things one can do for shadow removal, but it wasn't designed to do that, so we thought that was kind of cute. And then just for fun, you can show what happens if you do different kinds of transformations. So here you have a source and a target. Here they're transformed trying to match bright colors, and here they're transformed trying to match bright to dark and dark to bright, and of course you get something that looks a little different. Okay. So that's the end of that section on image editing. Now let's switch gears entirely and talk about some algebraic topology work. This is joint work with my former student Chao Chen, who is now finishing his second postdoc, and Christoph Lampert, who is at IST Austria. Okay. So a short intro to algebraic topology. Traditionally it's considered one of the purest areas of math, but in the last 10 years or so there's been interest in two different directions one can take with algebraic topology. One is computational slash algorithmic results. This is, again, the sort of STOC-SODA crowd, and they're interested in what you can say about the complexity of different things you want to do: computing homology groups and so on. And what's interesting, by the way, I did some work in this as well, and I once talked to a guy who was a pure mathematician about this. He said, oh, yeah, we've done things like that too. We found this one algorithm, so if N is the size of the complex, which is basically the size of the space, the complexity is something like this, and he writes something down. And I said, well, what is that? I don't know that notation. He said, well, that's our own notation. It means two to the two to the two to the two, and so on.
Two hundred and fifty-six times. So basically, if you have something of size two, or even one, you have problems. But he thought this was interesting. So those are the perils of being a pure mathematician dealing with complexity. Anyway, that's one direction. Another direction is to use algebraic topology in fields that are of interest to people in computer science or engineering. This includes computer vision; sensor networks were quite popular; and there's some work in biochemistry as well, from the computational biochemistry crowd. This is the direction we were riding; in particular, we did some work in applying this to vision, in addition to the complexity-type work. The key thing is that this is obviously a very technical field. There's a lot of machinery one has to absorb in order to really get up to speed, and that's a little bit annoying when trying to give a survey. But I'll try to give an overview of what's involved, and like I said, if you're interested I'd be happy to point you to some papers that are very good intros to this area. Now, in general we can say that topology is about invariance to continuous deformations. Felix Klein, the 19th-century German geometer, talked about geometry being the study of invariances. So Euclidean geometry is the study of invariances to the group of Euclidean transformations, and affine geometry is the study of invariances to the group of affine transformations, and so forth. Here we're interested in the much more general idea of invariance to all continuous deformations. So here's the picture. There's a cube, and that is homeomorphic to the sphere; now, we could be talking about the interiors too, but let's say we're talking about the surfaces. And it's not homeomorphic to the doughnut, the torus. This is the standard example, right? Now, the gold standard for saying these things is what's called homeomorphism, and we say X is homeomorphic to Y if there is a function F which maps X to Y such that F and its inverse are continuous. That's the definition, and so the surface of the cube and the surface of the sphere will have such an F, whereas the surface of the sphere and the surface of the torus will not. But that's not at all useful for computation, right? We are computational people; how can I use this? It's hard enough to find an F; what if there is no F? The point here is that there's no such F, so I'd have to show that there's no F. You can't really compute with this. So that's what leads us to the algebraic part. This is point-set topology; this is just the general notion. Now we want to compute, so we want to define algebraic invariants. On each space like this, we're going to define a set of groups. There are two main kinds: homotopy groups and homology groups. Homotopy groups are much more natural; they make more sense; they involve things like a rubber band sliding around. But we are not going to use those, because they are harder to compute with. Homology groups are less intuitive, but they are easier to compute with, so that's what we use. Now, again, what is the idea here? Homeomorphic spaces have isomorphic groups. And, in fact, the groups often have simple structure which allows us to say whether two groups are isomorphic, and that way I can compute the groups for this and for this.
Are they isomorphic, based on a simple rank kind of computation? Yes or no? And I'm done. The converse is not always true: if two groups are isomorphic it doesn't necessarily mean the spaces are homeomorphic, but for most nice-looking spaces, okay, that's hand-wavy, the converse does hold. So it's strictly weaker, but it's still interesting to compute with. Okay. So that's the intro. Now what are homology groups? This is the crux of the matter, and I'm going to give you some intuition; maybe some of you know this already. But this is the part which involves lots of machinery, so let's see if we can get the intuition. Informally, these things count the number of holes in the space, holes of different dimensions. So here there are three holes. That was easy, right? Here, let's think of this thing, I haven't drawn it right because I'm not very good at drawing, but let's think of this thing as the surface of a cylinder, nothing inside. How many holes? Well, I'm actually talking about a kind of 1-D hole, so there's really one there: the kind of thing that goes around. You can think of this as a tunnel, in that case. So those are 1-D examples. What if I have this and I talk about a 2-D example, again, this being the surface, nothing filled in? Well, there's one 2-D hole inside, right? That's a 2-D example: a single 2-D hole. So here we are talking about 1-D holes, here we're talking about 2-D holes. One often calls these tunnels and these voids, but you can have 3-D holes and higher. You can also have 0-D holes, which is a strange concept, but we can deal with that in a second. So that's informal. Formally, these things count the number of non-bounding cycles. So let's draw a picture here to explain what this means. Here again is the torus, always the surface; I'm not interested in what's inside. Here's a cycle. Now this is clearly a cycle; sometimes when we talk about cycles later it's something a little more abstruse, but this is a cycle, and it bounds something. That's not what I want; that's not interesting. And this is a cycle which doesn't bound anything, right? It's not the boundary of anything, because remember the inside is empty, and that is interesting. I mean, maybe not interesting to everyone, but it basically tells us about the number of holes in the space. This thing actually has two non-bounding cycles; you see the other one is the one that goes around this way. Right? Now, even more formally, H is Z quotiented by B. What does that mean? Z is the set of all cycles, B is the set of boundary cycles, and I perform this quotient operation. So let me explain. Each element is not actually just a non-bounding cycle; it's a whole family of them. Here's an example: these two are basically the same guy, right? I just moved one to the other. In other words, if you like, and this is not exactly accurate here, you can think of these guys as rubber bands: I can just slide one along and get from here to here. So this quotient means I'll just call all of these guys the same thing, and that's my coset. So this is one element, and the same thing you can imagine with the other one. There are actually more than two; there are three or four depending on how you count them, because I can add these two guys together.
I can also take something that looks like this plus something that looks like this, there's an addition operation, and I can also take the thing that doesn't bound anything and say, well, that's not interesting to me, but that's also an element. And the key is, the way to think of it is that I'm not going to count those other things, because I think of it the same way I think about the basis of a vector space: there are two orthogonal vectors, and then I can also add in things that are sums of them, right? But there are only two that are really making anything up. Same thing here: I'm going to take the generators, as you say in group theory. Now, to do this formally, there are lots of ways to do it, different types of homology: singular homology, simplicial homology. For simplicial homology, you build, think of building a mesh, or a simplicial complex, which is a more general type of mesh. And then I can start doing things like adding together triangles, with an addition operation. That gives me what are called chains, and then based on those I can define boundary groups and cycle groups, based on whether something has a boundary or is a boundary. We can do all this formally, but for now the intuition is what I want: I'm going to count the number of non-bounding cycles. Now, that was algebraic topology, or homology theory, in one slide. Let's get to vision. I want to do the following problem: curve evolution with topological control. So back to vision, and I'm going to deal with a much more specific case, so the general theory won't be as important. What's curve evolution? It's standard in vision, like snakes: I'm going to evolve a curve because I want it to go somewhere; I want to segment something, or track something. Usually segmentation. And this generalizes to surfaces. So I generally have a partial differential equation that looks like this: dC/dt equals some force F times the normal. It pushes things out in the normal direction, and the strength of that force depends on what I'm trying to do. If I'm trying to drive it to a place where there are strong gradients, that's the standard thing to do, then F depends on the gradients, or I can use something more interesting that maybe depends on the interior of the curve or the surface. It's a standard technique. Now, what I want to do here is take a general equation like this, it doesn't matter what the F is, the F will be application-specific, and evolve the curve while fixing its topology. Here's an example of the sorts of things we used to get. Level sets were a nice technique which allowed you to get this sort of thing: I start with a curve out here, the big red curve, it evolves inwards, and eventually it snaps right around these blue things, and look, the topology has changed. And this was considered an advantage. People said this is great because we can have arbitrary topology: I start with one, I get another one. And sometimes it is an advantage. And sometimes it isn't. Why isn't it? Sometimes I know the topology of the thing I'm looking for. If you're into medical imaging, for example, you might know that the liver, or at least its surface, has the topology of a sphere. The prostate does not.
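As an aside, the H = Z/B computation sketched above really does reduce to linear algebra once you have a simplicial complex. Here is a tiny sketch, assuming numpy and using real coefficients; the example is a hollow triangle, which is topologically a circle.

```python
# Sketch: computing homology reduces to ranks of boundary matrices.
# With real coefficients, beta_k = dim Z_k - dim B_k
#                                = (n_k - rank d_k) - rank d_{k+1}.
import numpy as np

# Vertices a,b,c; oriented edges ab,bc,ca; no filled-in triangle.
# d1 maps edges to vertices: boundary(ab) = b - a, and so on.
d1 = np.array([[-1,  0,  1],
               [ 1, -1,  0],
               [ 0,  1, -1]])

r1 = np.linalg.matrix_rank(d1)   # = 2
beta0 = 3 - r1                   # connected components: 1
beta1 = (3 - r1) - 0             # 1-D holes: kernel of d1, no 2-simplices
print(beta0, beta1)              # -> 1 1: one piece, one tunnel
```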
The prostate has the topology of a torus, because there is this thing going through the middle of it. I might want to use that fact. There are other examples, obviously, in vision itself, where we want to get something which is just a single piece. So we might want instead something that looks more like that. Now you can argue about whether you want that or not, but here the idea is that this white part is noise; I know somehow that the object has to be one piece. There are other methods which do this. There are digital methods which do digital-type topology; they end up with some strange artifacts, like long skinny fingers. There are also continuous methods, but they don't really address topology directly; they say things like: I want to ensure that two pieces which are far in geodesic distance but close in Euclidean distance don't touch each other. What does that mean? If I have two pieces that are far in geodesic distance but close in Euclidean distance, trying to close in on each other, I will not let them; I will repel them. Again, that's a heuristic, and it works sometimes and not other times. In our case, we're going to look at a simple example where we are looking at just curves, not surfaces, and the topology is just the topology of a circle, but this can be generalized to more interesting cases with surfaces. Now here is the term that we are going to add. We have to talk about something called robustness. We're going to focus on curves which are level sets of a function phi. This works nicely with the existing level set formulations of all this curve evolution machinery; usually you take the zero level set of something. Now, the robustness of a homology class, and a homology class, going back to what we said, is a piece of topology, is basically how hard or easy it is to eliminate that class by perturbing the level set function. So let's take a look, to make this clear. Here is a level set function, and here, say, is the zero level set, and it's three pieces, right? And that looks just like this; there's the zero level set. So we have three homology classes. Now I'm going to cheat a little. I want to destroy, let's say, all of them but one, so there's just one left. So let's perturb phi to get exactly one homology class. In this case this is simple, right? We can stop calling them homology classes and call them components, but in general one could do this for weirder cases. So we just want to get down to one component. What could we do? Well, we can do that. Let's see what happened. That piece of the function went down, and this piece of the function went up. Right? When that happens, when I now take the slice, let's go back here, I have lost this piece entirely and these two have become joined. And you can see that exactly there. What's interesting is, if you look at what happened here, what did we raise? We raised a local minimum. And here, we lowered a saddle point. And that's not an accident: critical points are actually crucial to this whole thing. So let's see that. We want to talk about total robustness now, and that's related to this idea of destroying everything but one. The key is that the robustness of a class is closely related to critical points. Let's see it again. So here we have a bunch of different critical points.
We have, I've not drawn all of them, but we have this one that we've labeled m-zero, which we'll ignore for a second, we have this saddle, and we have this local minimum. And what we find is that it's not just that I want to raise one and lower the other: the robustness is actually just equal to the absolute value of the function value at the critical point, taking this as the zero level set, okay? So here the robustness of this guy was just equal to this little, little distance here: I just had to lower this a little bit and I could destroy the thing. Here I had to do a little more work: I had to raise it that high, and then I'd destroyed that class. And as I said, this is not an accident. If you're familiar with Morse theory, this is a sort of generalization of Morse theory. Morse theory says I can look at smooth functions on spaces, and their critical points are very closely related to the topology; it's a very intimate relationship. Here, these are actually not smooth functions at all, right? They're often signed distance functions. So we've generalized: these aren't real critical points, actually, but that's okay, we can work our way around that, and you can define things which act just like critical points but are defined for non-smooth functions; something called tame functions, which is a much more general class. Anyway, I won't get into details, but it works. So what do I want to do? I want to destroy everything but one component, because in the end I want to have just one component, right? I want the topology to be the topology of a circle. So I'm going to define the degree-two total robustness to be the sum of the squares, instead of using absolute values, because it's just nicer, of the function values at all the critical points except for the global minimum. Why am I excluding the global minimum? Because I want to keep one component. So this is the degree-two total robustness, and then I have this theorem, which we proved in the paper, which says that minimizing the total robustness ensures that the topology becomes the topology of a circle. Now, why is that important to me? Obviously if I just minimize this thing I'll get the topology that I want, but what's more interesting is that if I add it to the flow, it drives the evolution towards the correct topology without restricting us. What do I mean by that? It doesn't say we have to have the topology of a circle the whole time; it just drives us towards it. We can start with some other topology, and eventually it will drive us in that direction. Restriction to some fixed topology leads to artifacts: like I said, you often end up with long skinny fingers; you could have something that has the topology of a circle but with all these weird little things coming off of it, and that's kind of fake topology, right? It is the correct topology, but it's weird-looking. So this gets around that. And this term is, which is useful for taking derivatives, continuous in phi, so what we end up with is this flow.
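Before the flow itself, here is a minimal 1-D sketch of the degree-two total robustness just defined, assuming numpy; the critical points of a sampled phi are approximated by interior local extrema, and the function and variable names are mine, not the paper's.

```python
# Sketch of the degree-two total robustness: sum of squared level-set
# values at the critical points, excluding the global minimum.
import numpy as np

def total_robustness(phi):
    crit_vals = []
    for i in range(1, len(phi) - 1):
        left, mid, right = phi[i - 1], phi[i], phi[i + 1]
        if (mid < left and mid < right) or (mid > left and mid > right):
            crit_vals.append(mid)          # local min or local max
    crit_vals = np.array(crit_vals)
    if len(crit_vals) == 0:
        return 0.0
    keep = np.ones(len(crit_vals), dtype=bool)
    keep[np.argmin(crit_vals)] = False     # spare the global minimum
    return float(np.sum(crit_vals[keep] ** 2))

x = np.linspace(0, 4 * np.pi, 400)
phi = np.sin(x) - 0.2     # its zero sublevel set has three components
print(total_robustness(phi))   # minimizing this drives toward one component
```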
The total robustness term is also continuous in φ, which is useful for taking derivatives, and what you end up with is the following flow. The first part is the standard one: if you know the level set formulation, the standard flow one gets is the hyperbolic equation ∂φ/∂t = −F |∇φ| for a speed function F; I won't get into it, but it's the standard conversion from a curve evolution to a level set evolution. The new term is the topological control term, and what's interesting here is that it's basically a sum of delta functions around the critical points, where each one is weighted by zero or one depending on whether that critical point is relevant to the computation. That's the new term, and it drives everything.

So let me show you some pictures of what you get with this. Here are some toy examples; we'll see more realistic ones in a second. This is the geodesic active contour, basically a fancy snake, and you can see there are three components. Here, when you add in topological control, you get one component, and what's changed here is that we've merged components. Now that's not the only way to remove topology; let me show you another example. Here's an octopus, and here's the octopus with topology control. Here we have a different situation: if you look at the blue rectangles, we've actually done something different, which is to remove holes. That's not the same as merging components; I'm actually taking holes away. But we've done a third thing too, which you can see over by the green parts: we've torn handles. This handle here has actually been torn, and now you get something which has the correct topology and none of these extra handles. Here's another example, and you can see similarly a lot of hole removal and some handle tearing.

Here are some pictures of brains. This is 3-D now; I didn't talk about 3-D, but you can do this for 3-D. That's the Chan-Vese flow, which is different from geodesic active contours; it basically tries to segment by keeping the two regions uniform, so the inside should have, if you like, a certain gray level or a certain distribution, and the outside should have a different distribution. And there you can see incorrect surface topology: these handles here and here, though it's maybe a little hard to see. Now what I'm going to show you is what you get by adding our term, and you'll notice, it's a little hard to see, but you don't get those handles anymore; the topology is fixed. But we added other artifacts: things became sort of grainy. That's actually due to the fact that there are many near-critical points, and they all contribute. It didn't look good there, although in terms of the 2-D slices it does look a lot better. That was a subject of ongoing investigation by Chao, to see whether we could improve it.

In the meantime, though, he asked, and this appeared in CVPR a couple of years ago with Chao and Christoph Lampert: can we do this with GrabCut? That would be more interesting, maybe, and it's harder to do because GrabCut is a global optimization; everything before was local, so how do we introduce the term here? Remember, you use GrabCut, really min-cut, to get a segmentation: you label some interior points and some exterior points (we used examples from the GrabCut database), and you let the thing run to give you the best segmentation. So how do we do it here? We run the whole thing once, just the regular min-cut, and then we readjust the unary term based on running our topology correction on the result. The unary term on its own would give you something which isn't quite the right topology, so we fix the topology of the first result, fold it into the unary term, and rerun the whole thing.
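The shape of that two-pass scheme might look like the following sketch. Note that min_cut and enforce_topology are hypothetical stand-ins for a graph-cut solver and the topology-correction step, and LAMBDA is an assumed tuning parameter; this shows the structure of the pipeline, not the paper's implementation.

```python
import numpy as np

LAMBDA = 1.0  # assumed strength of the topology bias (a tuning parameter)

def min_cut(unary_fg, pairwise):
    """Hypothetical stand-in for a min-cut / graph-cut solver."""
    raise NotImplementedError

def enforce_topology(labels):
    """Hypothetical stand-in for the topology-correction step."""
    raise NotImplementedError

def two_pass_segmentation(unary_fg, pairwise):
    """Sketch of the two-pass idea: plain min-cut, topology correction,
    then a second min-cut with unaries biased toward the corrected mask."""
    labels = min_cut(unary_fg, pairwise)       # pass 1: regular min-cut
    fixed = enforce_topology(labels)           # fix the result's topology
    # Lower the foreground cost where the corrected mask says "inside",
    # raise it where it says "outside", and rerun the same solver.
    adjusted = unary_fg - LAMBDA * (2 * fixed.astype(float) - 1)
    return min_cut(adjusted, pairwise)         # pass 2: adjusted unary term
```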
And you can show some very simple theorems about what this guarantees. It doesn't guarantee everything; Chao is a theory guy, he likes to prove everything, and he was disappointed he couldn't prove more here. But it does give us some nice pictures. So here's the standard example. Here is an image and there is the trimap; the idea of a trimap is that you know this part is inside, you know that part is outside, and you have to compute the rest, this black part. So min-cut gives you this, and with our term we get the proper topology here: it's very good at finding these thin structures, and also at filling in this sort of piece, or several pieces, and again here. Christoph did a very in-depth evaluation, actually measuring performance rather than just showing pictures; I like pictures. Okay, so that's it for topology, and, like I said, if you find it interesting I'd be happy to point you towards other reading. Now I want to switch to things I've done recently, in the last year. >>: We're kind of running out of time. >> Daniel Freedman: Oh, are we? It's an hour, not an hour and a half? >>: Yeah, it's more like an hour. >> Daniel Freedman: More like an hour. Okay. How about we do 3-D sensing in five minutes and skip the last part. So, 3-D sensing, and I can't tell you too much anyway due to the NDA, but here is the problem. This is joint work with Eyal Krupka, who is our lab manager, Yoni Smolin, my intern, and Ido Leichter. A time of flight sensor sends out some light and measures how long it takes until it comes back. So here's the picture: the little thing on the left is the camera, the little thing on the right is a piece of surface. I send out a beam, measure it coming back, and now I have the time; of course, what I actually end up measuring is not the time but the phase. The time of flight sensor does this for all directions within the field of view, and that's how you get your picture. In fact, what you compute is an integral of the received signal over time, to get rid of noise. Multipath is the problem we tackled: more than one path arrives at the sensor along the same ray. So there's the camera, there's the surface, that's what we had before. Now let's add in a floor. Now I could have another path that goes like that, and of course it comes back along the same ray, and I don't really know which one is which; the blue one is correct, the other one is incorrect. The problem is that the depth is based on the time of flight, and I don't get either time of flight; what I get is some sort of weird mixture of both paths together. This causes major problems.
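To see why a mixture of paths corrupts the depth, here is a small sketch of the standard continuous-wave time-of-flight model, in which each path contributes a complex phasor and the sensor effectively sees their sum. The modulation frequency and amplitudes are made-up illustrative values, and this is the generic textbook model, not anything specific to our sensor.

```python
import numpy as np

C = 3e8        # speed of light, m/s
F_MOD = 20e6   # modulation frequency, Hz (assumed; a typical CW value)

def depth_from_phase(phase):
    # Standard continuous-wave relation: d = c * phase / (4 * pi * f_mod).
    return C * phase / (4 * np.pi * F_MOD)

def measured_phasor(depths_m, amplitudes):
    """Each return path contributes amplitude * exp(j * phase); the
    sensor effectively measures the sum of all such phasors."""
    phases = 4 * np.pi * F_MOD * np.asarray(depths_m) / C
    return np.sum(np.asarray(amplitudes) * np.exp(1j * phases))

direct = measured_phasor([2.0], [1.0])            # true surface at 2 m
mixed = measured_phasor([2.0, 3.5], [1.0, 0.4])   # plus a floor bounce
print(depth_from_phase(np.angle(direct)))  # recovers ~2.0 m
print(depth_from_phase(np.angle(mixed)))   # biased: neither 2.0 nor 3.5 m
```

The corrupted estimate lands between the two path lengths, which is exactly the weird mixture just described.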
So our idea was to diagnose multipath and hopefully eliminate it, and we wanted to account for many kinds of multipath, not just the simple kind, with a theoretically well-justified yet lightweight algorithm. Why lightweight? Because we really have no time to do this: it isn't merely real-time, it's crazy real-time with almost no computational budget. Now, what is Lambertian multipath? Very quickly: an ideal Lambertian surface gives off an infinitesimal amount of light equally in all directions, so we have a picture like that. A single infinitesimal contribution isn't going to bother you, but here you have an infinite number of infinitesimal amounts, and you have to add them all up, and then you get something. And what do you get? Something like this: say they're all bouncing off different points towards the same nearby point, each one giving off a little bit in that direction, and you end up with this very ugly smeary thing, which looks like this. So this is the regular two-path case; here is intensity versus distance; and here is what you get with the Lambertian case, something that's smeared out like this. And then, of course, you can get three-path, or two-path plus Lambertian, depending on the surfaces involved. And that's what I wanted to tell you: this is a hard problem, an interesting problem, and we have a nice solution, which is great, even though I can't tell you too much about it. That's it. Thank you.