>> Lev Nachmanson: Hello. I'm happy to present Michael Kaufmann, who is a professor at the University of Tubingen. Michael's interests are mainly in graphs, graph visualization and graph drawing, algorithms and complexity, and many other things. So I hope you will enjoy the talk. Here's Michael. >> Michael Kaufmann: Thanks, Lev. It's a pleasure for me to be here. Thanks for the invitation. I will talk about map labeling with leaders. Map labeling is a traditional and old problem: how to label a map, how to put text on maps. I will talk first about map labeling and review the traditional style, and then I go to map labeling with leaders, a special model that we kind of invented. We are a core group with Michael Bekos and Antonios Symvonis from Athens in Greece. And we did this work in the last five, six years. Part of the work was also done together with these four other people. The first paper, for example, was shared with Alexander Wolff from Karlsruhe and the last one with Martin Nollenburg, also from Karlsruhe. And the other two people from Greece, they also shared some of the work in the middle. So what is map labeling? Here's the traditional problem. You have a map -- it's a map of France -- and what you want to do is label some regions of France. And the very simple way to do it is to put a point somewhere in the middle of each region, and then you place a label next to the point. And the rule that I applied here was to place these labels always in the northeast corner of the point. Okay. The problem that you already see is that some of the labels overlap, and that should be avoided. That's the map labeling problem: you want to avoid cluttering of the labels. 
So I will -- as I'm mainly algorithmically oriented, I will restrict myself to very simple mathematical models, and I always want to try to achieve optimal solutions and to find efficient algorithms for such solutions. Okay. And here's, again, the model, the simple model for map labeling. So what you -- normally you model a label by a box, and the box is attached to a point, like here. So you have four points, A, B, C, D. And you can attach the text in the northeast corner. It would look like that. Then you see certain overlaps here. Okay. That's the 1-pos problem, the one position problem. But of course you can also extend this to two positions or three positions, four positions. There are several models there. The 1-pos and 2-pos models are the simplest. So in the 2-pos model, for example, you have two possibilities to place the label. Yeah. But let's come back to the 1-pos problem. And what you see is that the labels of A and B would overlap, so you can only place one of them. Okay. And you model this by a conflict graph. So you say vertex -- you have four vertices, A, B, C, D, and there is an edge when two labels are in conflict, when they overlap. And then you solve the maximum independent set problem. The maximum independent set problem places labels on the vertices of this graph such that at each vertex -- at each edge there's at most one label; that resolves all the conflicts. The maximization problem here of course has the goal to maximize the number of labels that can be placed. For example, you could realize the labels for A and D or for B and C. But nothing more. So that means here you shoot for a maximization problem. But of course maximum independent set is in general NP-complete, and also for this kind of graphs it's NP-complete. Therefore people use heuristics or approximations and exponential-time algorithms to get the solution. Okay. Yeah. So this is a very simple and basic model. But what we want to solve is problems like this. 
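The 1-pos conflict-graph model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates, the label size, and the function names are hypothetical and not taken from the talk's slides, and the search is plain brute force over subsets (exponential time), which is in line with the remark that the problem is NP-complete in general.

```python
from itertools import combinations

def boxes_overlap(p, q, w, h):
    """True if two w-by-h label boxes, each with its lower-left corner at its
    point (the northeast position), share interior area."""
    return abs(p[0] - q[0]) < w and abs(p[1] - q[1]) < h

def max_independent_labels(points, w, h):
    """Brute-force maximum independent set in the label conflict graph.

    points: dict name -> (x, y). Returns (chosen names, set of conflict edges).
    Exponential time; meant only for tiny illustrative inputs.
    """
    names = sorted(points)
    conflicts = {
        frozenset((a, b))
        for a, b in combinations(names, 2)
        if boxes_overlap(points[a], points[b], w, h)
    }
    # Try the largest subsets first and return the first conflict-free one.
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(frozenset(pair) not in conflicts
                   for pair in combinations(subset, 2)):
                return list(subset), conflicts
    return [], conflicts

# Hypothetical coordinates; the resulting conflict graph is the path A-B-C-D.
example = {"A": (0, 0), "B": (2, 1), "C": (5, 2), "D": (7, 3)}
chosen, conflicts = max_independent_labels(example, w=4, h=2)
```

With these made-up coordinates only two of the four labels can be placed at once, which mirrors the "two labels, but nothing more" situation in the example.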
So that's the map that motivated our research. That's a map of Greece, and it shows the infrastructure of the Greek school system. And what you see also, there are a lot of labels. And labels are attached. Here's Athens here. Labels might be attached to cities, like here or there. But they also have big label blocks as attachments. And of course such blocks cannot be placed right at the city. Therefore, you place them somewhere else, and the designer here -- well, for Greece, there's lots of sea around, so there's space to place the labels somewhere around. And you see the labels are connected to the cities by polygonal segments. And these segments will be called leaders in the following. Okay. So that was our attempt: to label such maps automatically, to find algorithms for such maps. But of course we are -- well, we started with a very simple mathematical model where we restricted ourselves to some basics. And this is here. The first model is that we say we have a rectangle, and the points that have to be labeled are in this rectangle. And we said all the labels should be outside of the rectangle. And here we say let's restrict to the case that they are on the right-hand side. Okay. So we invented two different kinds of leaders. I will talk here only about the rectilinear leaders. We also have had other kinds of leaders. But here I want to talk only about rectilinear leaders. The first is the OPO kind of leaders. OPO means orthogonal parallel orthogonal, so each leader consists of, let's say, three segments. The first segment is orthogonal, it runs orthogonal to the side where the label is, and then a little bit in parallel, and then again a little bit orthogonal. So it goes straight to the boundary, and then there's a little routing problem to be [inaudible]. For PO leaders, it's a little bit simpler. 
So we have -- each leader has only two segments, vertical and horizontal in this case. Of course the labels have to be disjoint, they have to lie outside but touch the boundary and so on. Intersections of leaders with other leaders are not allowed, so we don't want to have crossings. Okay. And you might prescribe ports where the leaders touch, or not. So, again, two different models here. So let's come to the first algorithm. Here's an example. So we just consider the problem with all OPO leaders connecting to labels on one side, like here. And the labels have uniform height. They are like this. Okay. And here you see the solution. Maybe it's not so visible, but of course -- so what's the method that we want to have? I remind you: what is the method that avoids crossings of the leaders? Well, you start here and assign that point to the first label. The second to the second. The third to the third. Very easy. So you -- >> [inaudible] >> Michael Kaufmann: Yeah. You just do a sweep line. Okay. Again, a summary of this algorithm. The model is you have a fixed size and fixed ports for the labels. The goal is to find a feasible solution, just a solution, no real optimization. And the method is just a plane-sweep from top to bottom, from bottom to top, whatever. We always did it from bottom to top. Okay. So that means you connect the i-th lowest point to the i-th lowest label. Okay. And, well, to do that, you just have to sort and then you assign. The running time is N log N. Okay. That was the first algorithm. Now here's the second. In the second algorithm we now want to minimize. And what we want to minimize is the number of bends. You remember an OPO leader has two bends or zero bends. If it goes straight -- in the last example there were many leaders that just went straight to the label. I was lucky. Here we want to minimize the number of bends. And that means we maximize the number of straight leaders, because it's either zero or two bends. 
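The first algorithm, the plane-sweep that connects the i-th lowest point to the i-th lowest label, comes down to two sorts and a pairing. The sketch below assumes points are (x, y) tuples and each label is represented only by the y-coordinate of its fixed port on the right-hand side; the function name and this representation are illustrative choices, not the authors' implementation.

```python
def assign_by_sweep(points, port_ys):
    """Pair the i-th lowest point with the i-th lowest label port.

    points:  list of (x, y) sites inside the rectangle.
    port_ys: y-coordinates of the fixed label ports on the right side.
    Returns a list of (point, port_y) pairs from bottom to top.
    Sorting dominates the cost, so this runs in O(n log n) time.
    """
    if len(points) != len(port_ys):
        raise ValueError("need exactly one label port per point")
    pts = sorted(points, key=lambda p: p[1])   # sweep from bottom to top
    ports = sorted(port_ys)
    return list(zip(pts, ports))

pairs = assign_by_sweep([(3, 5), (1, 2), (4, 9)], [30, 10, 20])
```

The sketch only computes the assignment; routing the middle (parallel) segments of the OPO leaders is a separate step that is not shown here.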
And, well, we consider the model where the labels should have uniform height. Let's say H. They should all have the same height. And the method is dynamic programming. You can do it efficiently by dynamic programming. So you define S of K, I. S of K, I is -- okay. I should draw something here. So what you have is a set of labels on the right-hand side. And, well, if there's space, then you are allowed to have some slack here. But what you want to control is this height. That's what I call the stack, because the labels stack on top of each other. And this is S of K, I. So you look at the K lowest points, so this is the first one, this is the second one, this is the Kth. And you compute the lowest height where, for the first K points, you have I of them straight. So you save two I bends. How do we do that? Let's see. So the lowest -- place the lowest label, it just goes at the baseline. The label has height H. Okay. So S of 1, 0 -- no matter what, you just place it here. And if point 1 is up here, then you cannot do this straight. And that means 1, but none straight. The next is 1, with 1 straight. So if point 1 is down here, then the first label is here. But if it's here, and this should be straight, then you move that label a little bit up. But only up to this point. Okay. And this height is Y1. So it's the Y coordinate of the first point. Okay. Good. And then we go to the recursion, and the recursion is the following. So what is S of K, I -- let's say we already have S of K minus 1 and we now consider the Kth point, which is here. Okay. Well, if we already have I straight ones here, then we just place the Kth label on top of the stack. That is this: S of K minus 1, I, plus H. That's the height. 
If not, then we compare this solution to the other solution where we have here S of K minus 1, I minus 1. That means this Kth leader should be straight. And you place it there. And that's -- well, to compute this, you go to Y K minus H, and then you add H on top of it. And that's it. That's all. So here you have your recursion, and then you just fill the table. It's a two-dimensional table. >> [inaudible] >> Michael Kaufmann: Well, you start here and then you just compute these numbers. K runs from 1 to N, I runs from 1 to -- at most to N. >> Is it N squared then? >> Michael Kaufmann: It's N squared. At the end you would have S of N combined. So you need that many entries. Okay. That was the second algorithm. Now I'll go to the third. The third algorithm is: let's allow labels on two sides. Okay. Now we know how to do it on one side with OPO; now we say, well, two sides. But a very simple model. Let's say we have labels of maximum height, such that they just fit on both sides. Well, how would you do that? Well, you just have to decide on which side the leader goes. You have to decide that this leader, that's Sicily, goes -- the label of Sicily is on the left-hand side, otherwise it's on the right-hand side. So somehow you have to split the set in two parts, and then you solve two independent problems, right? Well, that's easy. What you can do is just move here until there are N half on that side and N half on that side, and then you just do that. But maybe it's not optimal. As you see here, here we solve a minimization problem. Namely, we want to minimize the total leader length. Okay. For example, here in Italy, Italy is like a diagonal, right? So there are many points in the north on the left, and in the south on the right. And if you would split it exactly in half, then points that are here in this area would be connected over there. And that would make very long leaders. 
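Returning to the second algorithm, the table S(K, I) can be filled as in the following sketch. It is a reconstruction from the spoken description under several assumptions that the talk does not state in this form: the stack starts at height 0, a leader counts as straight when its point's y-coordinate lies within the vertical extent of its label (sliding ports), and S[k][i] stores the lowest possible top of the stack of the k lowest labels with at least i straight leaders. The function and parameter names are hypothetical.

```python
import math

def max_straight_leaders(ys, label_height, boundary_height):
    """Maximum number of straight (zero-bend) OPO leaders for one-sided
    labels of uniform height, or None if the labels do not fit at all.

    ys: y-coordinates of the points; boundary_height: height of the side.
    """
    ys = sorted(ys)
    n, H, INF = len(ys), label_height, math.inf
    # S[k][i]: lowest stack top after placing the k lowest labels
    # such that at least i of their leaders are straight.
    S = [[INF] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for k in range(1, n + 1):
        y = ys[k - 1]
        for i in range(k + 1):
            # Option 1: k-th leader bends; put its label directly on the stack.
            best = S[k - 1][i] + H
            # Option 2: k-th leader is straight; the label must contain y, so
            # its bottom is max(stack top, y - H), feasible if stack top <= y.
            if i >= 1 and S[k - 1][i - 1] <= y:
                best = min(best, max(S[k - 1][i - 1], y - H) + H)
            S[k][i] = best
    feasible = [i for i in range(n + 1) if S[n][i] <= boundary_height]
    return max(feasible) if feasible else None

result = max_straight_leaders([1, 1, 7], label_height=2, boundary_height=8)
```

The two nested loops fill a table with O(N^2) entries in constant time each, which matches the quadratic running time confirmed in the discussion. Each straight leader saves two bends, so the minimum number of bends is twice the number of non-straight leaders.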
And we want to avoid that. Right? To do that we also use dynamic programming. And here we just say L of K, I is the minimal length. So we sweep again from bottom to top, consider the K lowest points, and we say this is the minimal length where, for the K lowest points, I of them are on the left-hand side and the others on the right-hand side. So we compute the solution of minimal length, and then we add a new one. And, well, the recursion is very similar to the other one. So you just fill a table for different values of K and I, and the running time is O of N squared again. Okay. Third. Now fourth. We allow labels on all four sides. Part one is -- well, I suggested that already. Before, we had Italy and we just divided the set of points in two parts, in two sets: the points whose leaders go to the right, and the others go to the left. Here we would have four sides. So four sets. And there we have to be a little bit more careful because some leaders go horizontal and the others go vertical. If they go to the top side, labels on the top, they have to go vertical. But then we have to avoid that vertical and horizontal leaders cross. So that means we have to partition the problem -- you have to partition the rectangle in something like this, such that here they go to this side, here they go to this side, here they go to the bottom, and here they go to the left. Such that here we have N over 4 -- let's say N over 4, and just assume that we have N over 4 labels on that side, on that side. So we have to partition it in that way. And the right formulation is that those regions should be convex. But that can be done by a rotational sweep of this to find -- this is not straight. So imagine that those points are all sitting in here. Then it's much harder to split into [inaudible] in four parts. But it can be done in log N. But here I want to show something else. 
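The talk only says that the two-sided recursion is "very similar" to the one-sided one, so the following is a hedged reconstruction rather than the authors' formulation. Assumptions: the rectangle spans x from 0 to `width`, both sides offer the same uniform label slots given by the y-coordinates of their ports (`slot_ys`, bottom to top), slots on each side are filled from the bottom, and a leader's length is the Manhattan distance from its point to its port. L[k][i] is the minimum total length when the k lowest points are labeled and i of them go to the left.

```python
import math

def two_sided_min_length(points, width, slot_ys):
    """Minimum total leader length for labels on the left and right sides.

    points: list of (x, y); slot_ys: port y-coordinates of the label slots,
    identical on both sides. All names here are illustrative.
    """
    pts = sorted(points, key=lambda p: p[1])        # sweep bottom to top
    n, m, INF = len(pts), len(slot_ys), math.inf
    if n > 2 * m:
        raise ValueError("not enough label slots for all points")
    L = [[INF] * (n + 1) for _ in range(n + 1)]
    L[0][0] = 0
    for k in range(1, n + 1):
        x, y = pts[k - 1]
        for i in range(k + 1):
            r = k - i                                # points sent to the right
            if i > m or r > m:
                continue                             # one side would overflow
            best = INF
            if i >= 1:   # k-th point takes the i-th slot on the left
                best = L[k - 1][i - 1] + x + abs(y - slot_ys[i - 1])
            if r >= 1:   # k-th point takes the r-th slot on the right
                best = min(best,
                           L[k - 1][i] + (width - x) + abs(y - slot_ys[r - 1]))
            L[k][i] = best
    return min(L[n])

total = two_sided_min_length([(1, 1), (9, 2)], width=10, slot_ys=[2])
```

As in the one-sided case, the table has O(N^2) entries and each is computed in constant time. Whether this simplified slot model yields crossing-free leaders in every case is not argued here.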
So that's actually a map from a Web site of Karlsruhe that shows some addresses, some locations of day cares for children. There are eight such day cares in Karlsruhe. And the guy who designed this just placed -- you really can see -- just placed the addresses on the map and then drew [inaudible] this to the locations without thinking anything. Therefore, you get a mess. So many crossings that you cannot see anything. And he even forgot one line. I think he forgot this one. We solved it optimally, and it looks like this. And optimally means also with minimal total leader length. So here we had -- in this case we had only a feasible solution when you partition. >> So you didn't use the left side over here? >> Michael Kaufmann: Yeah. >> You did not. >> Michael Kaufmann: No, no, no. No. We just said keep it on three sides. >> I see. >> Michael Kaufmann: But the method that we used is the following. We formulated this problem as a matching problem, a bipartite matching problem, as follows. So in bipartite matching you have two sets. One set is the locations of the day cares, the points, right, the coordinates, and the other is the labels. But you don't know to which label each point will be connected. But you know where they are. That means, if there is a leader from here, from that point to this label, then it will cost you. Then this leader will have that length. And that gives the graph. So you have the labels and the points, and you have edges for potential leaders, with the lengths as weights. Then you do a minimum-weight bipartite matching. Very easy. Conceptually, very easy. And that can be done. And you have to prove of course that the solution is crossing free. It's not trivial to prove that. But you can do it with [inaudible] algorithm. This can be done in O of N squared log cube N. [inaudible] algorithm is for matching problems in geometry in the L1-norm, in the Manhattan norm. Okay. Let's come to PO leaders. 
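The matching formulation above can be illustrated with a tiny sketch. To keep it self-contained it does not use the geometric matching algorithm mentioned in the talk; it simply tries every assignment, which is only practical for a handful of points. The Manhattan distance as leader length follows the talk's L1 remark, while the representation of a label by a single port coordinate and all names are assumptions made for this example.

```python
from itertools import permutations

def manhattan(p, q):
    """L1 (Manhattan) distance, used here as the length of a rectilinear leader."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def min_length_assignment(points, ports):
    """Minimum-weight perfect assignment of points to label ports.

    Brute force over all assignments (factorial time): a stand-in for a
    proper minimum-weight bipartite matching algorithm.
    Returns (total length, perm) where point i is matched to ports[perm[i]].
    """
    if len(points) > len(ports):
        raise ValueError("need at least one port per point")
    best_total, best_perm = None, None
    for perm in permutations(range(len(ports)), len(points)):
        total = sum(manhattan(points[i], ports[j]) for i, j in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, list(perm)
    return best_total, best_perm

total_len, perm = min_length_assignment([(1, 1), (2, 5)], [(10, 5), (10, 1)])
```

In this hypothetical instance the cheapest assignment sends each point to the port at its own height. The sketch does not check that the resulting leaders are crossing free; as the talk notes, that property needs a separate proof.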
PO leaders, again, as I said, we have -- it's on the right-hand side. So here I have the OPO solution and the PO solution. Okay. And the first observation is that there are no unique crossing-free routings in PO. Okay. In OPO it's clear. It's unique. So you just go straight to the right-hand side, to the boundary, and then connect it crossing free. So the lowest will go to the lowest and the second will go to the second and so on. Here you see that the second point is connected to the third label. Although, you will see that it could be different. Namely, what you do is just take the OPO solution and flip the first two segments. Okay. If you have an OPO leader that looks like this, then you flip these two segments to here, and that saves one bend and gives a PO leader. But in that case the second label would be connected to the second point. Now, but this is state here. You might create crossings by that operation. How is this possible? Well, if this is here, okay, and you would do the same, so 1 goes to 1, 2 goes to 2, then you create a crossing here. But you can resolve that by saying 2 now is assigned to 1, takes this label, and that one takes this label. That's the resolution of crossings. But you have to do it several times, N squared times eventually. And that leads to a crossing-free solution. And here's an example. It's actually a nice application. In medical maps, you often find labelings of this style. And it says N squared, so you get an N squared time algorithm. Okay. And if you know that, you can do all the algorithms that I had before, the four algorithms you can do now for PO, the same. So that means now we have eight. So here are the corresponding theorems. So for the minimal total leader length, and here it's for the two sides. You can do the same with PO. And I'll just show the example, the corresponding minimal total leader length for PO and OPO. So PO is here. OPO is here. And, well, which one is better? 
>> I think on the left is PO, right? >> Michael Kaufmann: The left is PO, right. >> They have less totals? >> Michael Kaufmann: Yeah, of course. I mean, they make at most one bend, and the other normally makes two bends. And the bends in OPO are very close together. But what is not so good in PO, though for this example it doesn't matter, is that you have additional vertical segments within the map. And here you just go straight outside in the shortest way and you don't disturb much. >> So a user study is needed. >> Michael Kaufmann: Yeah. Okay. Yeah, we extended this to -- well, we said, what will happen if you don't have space for just one stack on one side. So you would have two stacks; can you do the same? The answer is yes, you can still do dynamic programming. But the complexity is much higher because the dynamic programming is much more complicated. And you can even extend this to three stacks, and you can even say, well, if you use, let's say, the second stack for your label, then the rule should be that the bends are next to the stack. So it's clear that the bends that use this are here, but the bends to that label could also be here. Why not. And we say no, we don't allow that. But this also works. Okay. Another more general scenario is that you don't have points but polygons, if you want to label some regions. Like, this is Germany now. And what we do here is -- well, we formulate it, again, as a bipartite weighted matching problem. And we abstract such a complicated region by a rectangle, by a large rectangle, and allow the point to be somewhere on the boundary of this rectangle. And then -- well, if this is the label of Bavaria, this is Bavaria, this here, then it's optimal to place the point somewhere down here. Because then the leader is very short and straight. Okay. 
So for the weight, there you have to compute the minimum Manhattan distances between the polygonal region of Bavaria, of this rectangle, and that label -- or that label and that label and that label, in the formulation as a bipartite weighted matching problem. But this also works. Okay. Next. Here's the motivating example. It's from the Deutsche Welle TV weather forecast map. And you see that somehow they used diagonals, so not straight, and not these two bends and so on, but one bend with a diagonal: it's a diagonal segment and then a straight one, a horizontal straight one. And that was not introduced by us but by bank [inaudible]. But we adopted Martin Nollenburg from this group that invented that, and extended the problem. The problem that they couldn't solve is the following. Imagine that the cities of South America are all at this point, somewhere here. Then you cannot connect from here to down there. It's impossible if you do just diagonal and straight. It's not possible. Therefore we extend this with the following. So we allow three kinds of leaders. Namely, OD, PD, or DO. DO. And if you have only these three -- so it's still -- the angle is not sharp, there is only one bend, and you have a diagonal in it. And we could prove that if you only use two kinds of leaders, namely DO and PD -- go back, DO is this, as in the examples before, and PD is this, so if this point is too close here, then you use PD -- then it works. In O of N cubed time, you can do almost anything. Minimal total leader length. But the crossing resolution phase is very nontrivial. Okay. Another variant was done by Lin, Kao and Yen, not by us. So this is not our work. This is Taiwan, and these are birds that occur in Taiwan. And there are eight different birds, but they occur in many different places. So they consider many-to-one labeling. Okay. So, for example, [inaudible] has four positions. 
Well, there you come back to -- you can take rescue in graph drawing. In graph drawing you have such a problem. Namely, what you have here is that you want to minimize the number of crossings in the two-layer scenario. So we have two layers somehow, and this one is fixed, and you want to rearrange the labels such that the number of crossings is minimal. And that's of course NP-hard. And they propose heuristics for it. Okay. Let's come to my last scenario, and that's mixed labeling. So we had this inner label, this traditional label. That was my introductory example. And now we say, well, if we have clutter, if we have overlap that we cannot avoid, then we use boundary labeling. But only there -- only if it's necessary. So we mix it. And we just take a very simple model: 1-pos, one position, northeast corner, and OPO leaders to the right-hand side. And this gives that solution without cluttering, quite nicely, I think. Although -- yeah. Well, why do we need, for example, these three leaders? I mean, in the middle area you had only this cluttering here. So if you have clutter, if you have overlaps, then you have to use a leader. Well, at least one. Okay. And then, well, which one would you take? Then this leader crosses something else. And then that makes -- >> [inaudible] >> Michael Kaufmann: Yeah. This and then this. Domino effect is the right word for it. Okay. But let's consider these two labels. P to lower [inaudible], which one would you take to be a leader? >> [inaudible] the right. >> Michael Kaufmann: No, that's not the right answer. If you would take this, the topmost of the two, it would cross the other one. So that would cause the other to also become a leader. Then you would have two leaders. That means you always have the lowest as a leader, and for the other you don't know. Maybe also -- >> I see. >> Michael Kaufmann: -- by other reasons. And that's a rule. So here's the model again. 
You have A, B, C, D as inner labels, so you have the rectangles and the leaders. I have just drawn the first segments, not the others. Because now it's clear. Here's the solution for it. So you can realize only B as inner because -- well, and here -- yeah. Here's the observation. If you have two intersecting boxes, assign the leader to the lower of the two points. For A and B, well, if A is an inner label, it intersects B, no matter if B is inner or a leader. So A cannot be an inner label. It must be a leader. And of course this leader destroys that one and that one, and just gives immediately that. Yeah. And that's the crucial observation here. And then you just do a plane-sweep and you say, well, for two intersecting boxes, apply that, assign a leader. And if you count down and you have not assigned a leader to a box, then you can place the inner box there. And you just go from top to bottom in time N log square N. And log square N because you have to solve intersection problems in geometry in the plane. So that's if you go to the right-hand side. If you go to the left, then the thing stops. It's not true anymore. The observation is not true anymore. It's not symmetric, because the point is not in the middle of the label but at the left. So you cannot say that the leader to the left is not thrown by the top one but by the lowest of the two. It's not true. That's a solution. So there we didn't manage anything very good, but maybe it's not possible. We don't know. So we applied a divide and conquer algorithm, and we succeeded with N to the power of O of log N, which is still sub-exponential. Actually, that's an amazing result, because the maximization problem of inner labels is NP-hard. If you maximize the inner labels plus the leaders, you can do it in sub-exponential time. Okay. Conclusion. Well, what did I show? I've shown a nice playground for algorithms to work with, that we developed. 
We developed plane-sweep algorithms, we developed bipartite matching algorithms and so on. Of course dynamic programming, divide and conquer. A wide range of algorithms could be applied here. I learned dynamic programming before -- well, I knew how it works, but I didn't really apply it. And here I really had to work hard with dynamic programming. What is open is better algorithms for the mixed model. I think we wanted to come closer to, well, what we had before, and that was a mixed model. Actually, you should not use boundary labeling if you have space in the middle where you can place the labels inside. But if there are many points and the labels are too big, then you should use the space outside the boundary. Therefore, well, we did a whole range of generalizations, but we still didn't manage to do this, our old motivating blue map. I mean, here, we don't have a rectangle, but the boundary is somehow a polygon. And we don't have nice stacked labels, but these labels are very much distributed. And sometimes they also take care of the islands of Greece. And it's still changing. There are more challenges inside of here. Okay. Thanks a lot. That's the end of my talk. >> Lev Nachmanson: Thank you very much. I have a question. So the approaches are, I assume, very practical, so were they implemented somewhere? >> Michael Kaufmann: Um-hmm. Very implemented. These, what you have seen. >> Lev Nachmanson: [inaudible] are these industrial tools? >> Michael Kaufmann: No, no, just -- >> Lev Nachmanson: I see. In your [inaudible] so what is your observation, is it easier to work with diagonal leaders or rectangular leaders? >> Michael Kaufmann: Graphically more appealing are diagonal leaders. >> Lev Nachmanson: Mathematically? >> Michael Kaufmann: Yeah. But to compute them or to resolve the crossings, it's much easier to do it rectilinear. You have to put in much more effort when you do it diagonal. 
>> Lev Nachmanson: I have no more questions, so just -- thank you. >> Michael Kaufmann: Thank you.