>> Lev Nachmanson: Hello. I'm happy to present Michael Kaufmann who is a professor
at the University of Tubingen.
Michael's interests are mainly in graphs, graph visualization and graph drawing,
algorithms and complexity, and many other things.
So I hope you will enjoy the talk. Here's Michael.
>> Michael Kaufmann: Thanks, Lev. It's a pleasure for me to be here; thanks for the
invitation. I will talk about map labeling with leaders. Map labeling is a traditional,
old problem: how to label maps, how to put text on maps.
I will first review traditional map labeling and then go on to map labeling with leaders,
a special model that we more or less invented. The core group is Michael Bekos and
Antonios Symvonis from Athens, Greece, and myself, and we did this work over the last
five or six years.
Part of the work was also done together with four other people. The first paper, for
example, was joint with Alexander Wolff from Karlsruhe, and the last one with Martin
Nollenburg, also from Karlsruhe. The other two people, from Greece, shared some of the
work in between.
So what is map labeling? Here's the traditional problem. You have a map, here a map of
France, and you want to label some regions of France. The very simple way to do it is to
put a point somewhere in the middle of each region and then place a label next to the
point.
The rule that I applied here was to always place the label at the northeast corner of the
point.
The problem, as you can already see, is that some of the labels overlap, and that should
be avoided. That's the map labeling problem: you want to avoid cluttering of the labels.
As I'm mainly algorithmically oriented, I will restrict myself to very simple mathematical
models, and I always try to achieve optimal solutions and to find efficient algorithms that
compute them.
Here's the simple model for map labeling again. Normally you model a label by a box, and
the box is attached to a point, like here. So you have four points, A, B, C, D, and you can
attach the text at the northeast corner. It would look like this, and you see certain
overlaps here. That's the 1-pos problem, the one-position problem.
Of course you can also extend this to two, three, or four positions. There are several
such models; the 1-pos and 2-pos models are the simplest. In the 2-pos model, for
example, you have two possible positions for each label.
But let's come back to the 1-pos problem. The labels of A and B would overlap, so you
can place only one of them. You model this by a conflict graph: there is a vertex for each
of A, B, C, D, and two vertices are joined by an edge if their labels are in conflict, that
is, if they overlap.
Then you solve the maximum independent set problem: you choose a set of vertices to
label such that every edge has at most one labeled endpoint, which resolves all the
conflicts.
The goal, of course, is to maximize the number of labels. For example, you could realize
the labels for A and D, or for B and C, but nothing more. So this is a maximization
problem.
But maximum independent set is NP-complete in general, and it remains NP-complete for
these kinds of graphs. Therefore people use heuristics, approximation algorithms, or
exponential-time algorithms to get a solution.
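To make the 1-pos model concrete, here is a minimal Python sketch, not code from the talk. It assumes each label is a w-by-h box whose lower-left corner sits on its point (the northeast placement), builds the conflict graph from pairwise box overlaps, and finds a maximum independent set by exhaustive search, which is only viable for tiny instances given the NP-completeness above. The names `conflict_graph` and `max_independent_set` are illustrative.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """Boxes are (x0, y0, x1, y1); touching edges do not count as overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def conflict_graph(points, w, h):
    """Assumption: 1-pos model, each label is a w-by-h box whose lower-left
    corner is its point. Returns the set of conflicting (overlapping) pairs."""
    boxes = {name: (x, y, x + w, y + h) for name, (x, y) in points.items()}
    return {frozenset((u, v)) for u, v in combinations(boxes, 2)
            if boxes_overlap(boxes[u], boxes[v])}


def max_independent_set(vertices, edges):
    """Exhaustive search from the largest subset size downward.
    Exponential time, so only usable on tiny instances."""
    vertices = list(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in edges
                   for pair in combinations(subset, 2)):
                return set(subset)
    return set()
```

For the A, B, C, D example, the only conflict graph whose maximum independent sets are exactly {A, D} and {B, C} is the 4-cycle with edges A-B, A-C, B-D, C-D; on that graph `max_independent_set("ABCD", edges)` returns the set containing A and D.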
So this is a very simple, basic model. But what we want to solve are problems like this
one. This is the map that motivated our research: a map of Greece showing the
infrastructure of the Greek school system.
You also see that there are a lot of labels. Here's Athens. Labels might be attached
directly to cities, like here or there.
But there are also big blocks of labels, and of course such blocks cannot be attached
directly to a city. Therefore the designer placed them somewhere else; Greece has lots of
sea around it, so there is space to place the labels around the map.
You see that the labels are connected to the cities by polygonal segments. In the
following, these segments will be called leaders.
So our goal was to label such maps automatically, to find algorithms for them. But of
course we started with a very simple mathematical model in which we restricted ourselves
to some basics.
This is the first model: we have a rectangle, and the points that have to be labeled lie
inside it. All labels must be placed outside the rectangle, and here we restrict ourselves to
the case where they are on the right-hand side.
We introduced two different kinds of leaders. We have also studied other kinds, but here
I want to talk only about rectilinear leaders.
The first kind is OPO leaders. OPO means orthogonal-parallel-orthogonal, so each leader
consists of three segments: the first runs orthogonal to the side where the label is, the
second runs a little bit parallel to that side, and the third is again a short orthogonal
piece.
So the leader goes straight to the boundary, and then there's a little routing problem to be
[inaudible].
PO leaders are a little simpler: each leader has only two segments, in this case vertical
and horizontal. Of course the labels have to be disjoint; they have to lie outside the
rectangle but touch its boundary, and so on.
Leaders are not allowed to intersect other leaders, so we don't want crossings. And you may
or may not prescribe the ports, the points where the leaders touch the labels. So, again,
there are two different models here.
So let's come to the first algorithm. Here's an example. We consider the problem with
OPO leaders connecting the points to labels on one side, like here, and the labels have
uniform height, like this.
And here you see a solution. Maybe it's not so visible. So what's the method? What we
want, I remind you, is to avoid crossings of the leaders.
Well, you start here and assign the lowest point to the first label, the second point to the
second label, the third to the third. Very easy. So you --
>> [inaudible]
>> Michael Kaufmann: Yeah. You just do a sweep line. Again, a summary of this
algorithm. The model: the labels have fixed size and fixed ports. The goal is to find a
feasible solution, just a solution, no real optimization. And the method is just a
plane sweep from top to bottom or from bottom to top, whatever; we always did it from
bottom to top.
That means you connect the i-th lowest point to the i-th lowest label.
To do that, you just sort and then assign, so the running time is O(N log N).
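A minimal Python sketch of this sweep, assuming fixed ports given as y-coordinates on the right side, exactly one label per point, and points with distinct y-coordinates; `opo_one_sided` is a hypothetical name. Sorting both sequences and pairing them rank by rank is the whole algorithm, so the cost is dominated by the O(N log N) sort.

```python
def opo_one_sided(points, ports):
    """Crossing-free one-sided OPO labeling with fixed ports.

    points: list of (x, y) inside the rectangle (assumed distinct y values).
    ports:  list of label-port y-coordinates on the right side, one per point.
    Returns a list mapping each point index to a port index.
    Every first (orthogonal) segment runs horizontally at its own point's
    height, so only the order next to the labels matters, and a
    rank-preserving assignment keeps the leaders crossing-free.
    """
    if len(ports) != len(points):
        raise ValueError("expected exactly one label per point")
    by_y = sorted(range(len(points)), key=lambda i: points[i][1])  # O(N log N)
    by_port = sorted(range(len(ports)), key=lambda j: ports[j])
    assignment = [0] * len(points)
    for i, j in zip(by_y, by_port):  # i-th lowest point -> i-th lowest label
        assignment[i] = j
    return assignment
```

For points at heights 5, 1 and 3 with ports at heights 0, 4 and 2, the result is `[1, 0, 2]`: the point at height 5 gets the port at height 4, the point at height 1 the port at 0, and the point at height 3 the port at 2.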
That was the first algorithm. Now here's the second. In the second algorithm we want to
minimize something, namely the number of bends. Remember that an OPO leader has either
two bends or zero bends.
It has zero bends if it goes straight; in the last example there were many leaders that just
went straight to the label, so I was lucky. Here we want to minimize the number of bends,
and since every leader has either zero or two bends, that means we maximize the number
of straight leaders.
We consider the model where all labels have the same height, let's say H. The method is
dynamic programming; you can do it efficiently that way.
So you define S(K, I). Let me draw something here. You have a set of labels on the
right-hand side, and if there's space, you are allowed to have some slack between them.
But what you want to control is this height, which I call the stack, because the labels
stack on top of each other. And this is S(K, I).
You look at the K lowest points: this is the first one, this is the second one, this is the
K-th. S(K, I) is the lowest stack height you can achieve for these first K points such
that I of them have straight leaders, so you save 2I bends.
How do we compute it? The lowest label just sits on the baseline, and it has height H.
So for S(1, 0), no matter what, you just place it here, and S(1, 0) = H. If point 1 is up
here, its leader cannot be straight with the label in this position: that is one point, but no
straight leader.
Next is S(1, 1): one point and one straight leader. If point 1 is down here, the first label
stays here. But if the point is higher and its leader should be straight, you move the label
up, but only as far as this point. That height is Y1, the y-coordinate of the first point, so
S(1, 1) is the larger of H and Y1.
Good. Then we go to the recursion, which is the following. Suppose we already have the
values S(K-1, I) for all I, and we now consider the K-th point, here. If we already have I
straight leaders among the first K-1 points, we just place the K-th label on top of the
stack, which gives the height S(K-1, I) + H.
We compare this with the other solution, which starts from S(K-1, I-1): there the K-th
leader should be straight. You place its label as low as possible while it still covers the
K-th point: its bottom goes to the larger of S(K-1, I-1) and YK - H, and then you add H
on top of it. That's all.
So that's the recursion, and then you just fill the table, a two-dimensional table.
>> [inaudible]
>> Michael Kaufmann: Well, you start here and then you just compute these numbers.
K runs from 1 to N, and I runs from 0 up to at most N.
>> Is it N squared then?
>> Michael Kaufmann: It's N squared. At the end you look at the entries S(N, I), so you
need that many entries.
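The recursion can be sketched in Python as follows. This is an interpretation under stated assumptions, not the authors' code: labels of height H stack upward from y = 0 along the right side in point order, a leader counts as straight when its label's vertical extent contains the point's y-coordinate (the label may slide), and the side has total height `side_h`. Under these assumptions the straight case is only possible when S(K-1, I-1) is at most YK, which the talk leaves implicit. `max_straight_leaders` is a hypothetical name.

```python
import math


def max_straight_leaders(ys, label_h, side_h):
    """S[k][i]: lowest possible top of the label stack for the k lowest points
    when exactly i of their OPO leaders are straight (0 bends instead of 2).

    Assumptions: labels of height label_h stack upward from y = 0 in point
    order, and a leader is straight iff its label's vertical extent contains
    the point's y-coordinate. Returns the maximum number of straight leaders
    whose stack fits into height side_h, or None if no stack fits.
    """
    ys = sorted(ys)
    n = len(ys)
    S = [[math.inf] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 0.0
    for k in range(1, n + 1):
        y = ys[k - 1]
        for i in range(k + 1):
            # Bent leader: put the k-th label directly on top of the stack.
            best = S[k - 1][i] + label_h
            # Straight leader: the label's bottom must be at least the current
            # stack top and at least y - H, but no higher than y.
            if i > 0 and S[k - 1][i - 1] <= y:
                best = min(best, max(S[k - 1][i - 1], y - label_h) + label_h)
            S[k][i] = best
    fits = [i for i in range(n + 1) if S[n][i] <= side_h]
    return max(fits) if fits else None
```

The table has (N + 1)^2 entries, each filled in constant time, matching the N squared answer above. Each straight leader saves two bends, so the minimum number of bends is 2(N - result).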
That was the second algorithm. Now I'll go to the third, where we allow labels on two
sides. We now know how to do OPO on one side; now we say, well, two sides.
And again a very simple model: let's say the labels have maximum height, such that they
just fit on both sides. How would you do that? You just have to decide on which side each
leader goes. You decide, for example, that the label of Sicily goes on the left-hand side;
otherwise it would go on the right-hand side. So somehow you have to split the set into two
parts, and then you solve two independent problems. Well, that's easy.
What you can do is move a dividing line until N/2 points are on one side and N/2 on the
other, and then you just do that. But maybe it's not optimal.
As you see here, we now solve an optimization problem: we want to minimize the total
leader length. For example, Italy is like a diagonal: many points in the north are on the
left, and in the south on the right. If you split the points exactly in half, the points in this
area would be connected over there, and that would make very long leaders. We want to
avoid that.
To do that we again use dynamic programming. Here L(K, I) is the minimal length: we
sweep again from bottom to top and consider the K lowest points, and L(K, I) is the
minimal total leader length for these K points when I of them are labeled on the left-hand
side and the others on the right-hand side. We compute the solution of minimal length for
K-1 points, and then we add the next one.
The recursion is very similar to the previous one. You just fill a table for the different
values of K and I, and the running time is O(N^2) again.
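A hedged Python sketch of the two-sided recursion. Assumptions that go beyond the talk: ports are fixed, every port receives exactly one point (so the number of points equals the number of left plus right ports), the rectangle spans `x_left` to `x_right`, an OPO leader's length is approximated by its horizontal run to the boundary plus its vertical offset to the port, and points have distinct y-coordinates, so left and right leaders never cross and each side only needs its order preserved. `two_sided_min_length` is a hypothetical name.

```python
import math


def two_sided_min_length(points, left_ports, right_ports, x_left, x_right):
    """L[k][i]: minimum total leader length for the k lowest points when i of
    them are labeled on the left side and k - i on the right side.

    Within each side the j-th lowest assigned point takes the j-th lowest
    port, which keeps that side's OPO leaders crossing-free. Assumption:
    every port is used, so len(points) == len(left_ports) + len(right_ports).
    """
    pts = sorted(points, key=lambda p: p[1])
    lp, rp = sorted(left_ports), sorted(right_ports)
    n = len(pts)
    if n != len(lp) + len(rp):
        raise ValueError("every port must receive exactly one point")
    L = [[math.inf] * (n + 1) for _ in range(n + 1)]
    L[0][0] = 0.0
    for k in range(1, n + 1):
        x, y = pts[k - 1]
        for i in range(k + 1):
            r = k - i  # points sent to the right so far
            if 0 < i <= len(lp):  # k-th point becomes the i-th left label
                cost = (x - x_left) + abs(y - lp[i - 1])
                L[k][i] = min(L[k][i], L[k - 1][i - 1] + cost)
            if 0 < r <= len(rp):  # k-th point becomes the r-th right label
                cost = (x_right - x) + abs(y - rp[r - 1])
                L[k][i] = min(L[k][i], L[k - 1][i] + cost)
    return L[n][len(lp)]
```

For example, with the rectangle spanning x = 0 to 10, points (2, 1), (8, 2), (2, 3), (8, 4), left ports at heights 1 and 3 and right ports at heights 2 and 4, the result is 8.0: each point goes to its nearer side. Sending the two lower points left and the two upper points right would cost 22 instead.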
That was the third. Now the fourth: we allow labels on all four sides. I already suggested
part one. Before, with Italy, we just divided the set of points into two sets: the points whose
leaders go to the right, and the others, which go to the left. Here we have four sides, so
four sets.
Here we have to be a little more careful, because some leaders go horizontally and
others go vertically: if the labels are on the top side, the leaders have to go vertically.
But then we have to avoid crossings between vertical and horizontal leaders. That means
we have to partition the rectangle into something like this, such that the points here go to
this side, here to this side, here to the bottom, and here to the left. Let's just assume that
we have N/4 labels on each side. So we have to partition it in that way.
The right formulation is that those regions should be convex. That can be done by a
rotational sweep; note that the partition boundary here is not straight. Imagine that all the
points are sitting in here; then it's much harder to split them [inaudible] into four parts.
But it can be done in O(N log N) time.
But here I want to show something else. This is actually a map from a Web site of
Karlsruhe that shows the locations of day care centers for children. There are eight such
day care centers in Karlsruhe. The person who designed it just placed the addresses on the
map, as you can really see, and then drew [inaudible] to the locations without thinking at
all.
Therefore you get a mess: so many crossings that you cannot see anything. He even forgot
one line; I think he forgot this one.
We solved it optimally, and it looks like this. Optimally means here with minimal total
leader length.
With the partition approach, we would only have had a feasible solution in this case.
>> So you didn't use the left side over here?
>> Michael Kaufmann: Yeah.
>> You did not.
>> Michael Kaufmann: No, no, no. We just kept it to three sides.
>> I see.
>> Michael Kaufmann: But the method that we used is the following. We formulated
this problem as a bipartite matching problem. In bipartite matching you have two sets: one
set is the locations of the day care centers, that is, the points with their coordinates, and the
other is the labels. You don't know to which label each point will be connected.
But you do know where the labels are. So if there is a leader from this point to this label,
it will cost you: the leader will have that length. And that gives the graph.
So you have the labels and the points, and edges for potential leaders, weighted by their
lengths. Then you compute a minimum-weight bipartite matching. Conceptually, very
easy, and it can be done.
Of course you have to prove that the solution is crossing-free, and that's not trivial. But
you can do the matching with the [inaudible] algorithm in O(N^2 log^3 N) time. The
[inaudible] algorithm is for geometric matching problems in the L1 norm, the Manhattan
norm.
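The matching formulation can be sketched with the classical Hungarian (Kuhn-Munkres) method, which runs in O(N^3). This is a swapped-in, general-purpose technique, not the specialized L1 geometric matching algorithm with the O(N^2 log^3 N) bound mentioned above, and it does not check the crossing-freeness that the talk says must be proven. Points and label ports are assumed to be (x, y) pairs, the cost of a potential leader is their Manhattan distance, and the function names are illustrative.

```python
def hungarian(cost):
    """Minimum-cost perfect matching on an n x n cost matrix (Kuhn-Munkres
    with potentials, O(n^3)). Returns match where match[row] is the column."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j]: row matched to column j, 0 if none
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def label_by_matching(points, ports):
    """Assumption: edge weight = Manhattan (L1) length of a potential leader.
    Returns (match, total_length) with match[point_index] = port_index."""
    cost = [[abs(px - qx) + abs(py - qy) for (qx, qy) in ports]
            for (px, py) in points]
    match = hungarian(cost)
    return match, sum(cost[i][match[i]] for i in range(len(points)))
```

For instance, points (1, 1) and (9, 9) with ports (10, 9) and (0, 1) are matched crosswise, `[1, 0]`, for a total length of 2.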
Let's come to PO leaders. Again, as I said, the labels are on the right-hand side. Here I
have the OPO solution and the PO solution. The first observation is that crossing-free
routings in PO are not unique.
In OPO it's clear; it's unique. You just go straight to the right-hand boundary and then
connect crossing-free, so the lowest point goes to the lowest label, the second to the
second, and so on.
Here you see that the second point is connected to the third label, although, as you will see,
it could be different. What you do is take the OPO solution and flip the first two
segments: if you have an OPO leader that looks like this, you flip these two segments to
here, which saves one bend and gives a PO leader.
In that case the second point would be connected to the second label. But note that you
might create crossings by this operation. How is this possible? Well, if this point is here
and you do the same, 1 goes to 1 and 2 goes to 2, you create a crossing here.
But you can resolve it by saying that point 2 now takes label 1, and point 1 takes this label.
That's how crossings are resolved.
You may have to do this several times, eventually N^2 times, and that leads to a
crossing-free solution.
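A minimal Python sketch of this crossing-resolution idea for one-sided PO leaders with fixed ports on the right boundary `x_boundary`. Assumptions: points are in general position, each leader runs vertically from its point to its port's height and then horizontally to the boundary, only strict crossings between leaders are considered, and the swap cap is an arbitrary safety net rather than the bound from the talk. It starts from the OPO-style rank-by-rank assignment and swaps the ports of any crossing pair until none remain. All names are hypothetical.

```python
def vertical_hits_horizontal(a, b, x_boundary):
    """a, b: leaders (px, py, port_y). True if a's vertical segment (x = px,
    between py and port_y) strictly crosses b's horizontal segment
    (y = b's port_y, from b's px to the boundary)."""
    ax, ay, aq = a
    bx, _, bq = b
    return bx < ax < x_boundary and min(ay, aq) < bq < max(ay, aq)


def leaders_cross(a, b, x_boundary):
    return (vertical_hits_horizontal(a, b, x_boundary)
            or vertical_hits_horizontal(b, a, x_boundary))


def po_one_sided(points, ports, x_boundary, max_swaps=None):
    """Return the port y-coordinate assigned to each point."""
    n = len(points)
    if len(ports) != n:
        raise ValueError("expected exactly one port per point")
    # Start from the OPO-style assignment: i-th lowest point -> i-th lowest port.
    port_of = [None] * n
    order = sorted(range(n), key=lambda k: points[k][1])
    for i, q in zip(order, sorted(ports)):
        port_of[i] = q
    if max_swaps is None:
        max_swaps = 4 * n * n + 4  # arbitrary safety net (assumption)
    swaps, changed = 0, True
    while changed:
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                a = (*points[i], port_of[i])
                b = (*points[j], port_of[j])
                if leaders_cross(a, b, x_boundary):
                    # Resolve the crossing by exchanging the two labels.
                    port_of[i], port_of[j] = port_of[j], port_of[i]
                    swaps += 1
                    if swaps > max_swaps:
                        raise RuntimeError("swap limit exceeded")
                    changed = True
    return port_of
```

With points (2, 0) and (5, 1), ports at heights 2 and 3, and the boundary at x = 10, the rank-by-rank start gives a crossing, and one swap yields `[3, 2]`: the lower point takes the upper port, as in the example described above.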
Here's an example; it's actually a nice application. In medical maps you often find
labelings of this style. The running time, as I said, is O(N^2).
Once you know that, you can do all four algorithms that I showed before for PO as well.
So now we have eight.
Here are the corresponding theorems: for example, for minimal total leader length, here
for two sides, you can do the same with PO. I'll just show the example, the corresponding
solutions with minimal total leader length for PO and OPO.
PO is here, OPO is here. Well, which one is better?
>> I think on the left is PO, right?
>> Michael Kaufmann: The left is PO, right.
>> They have less total length?
>> Michael Kaufmann: Yeah, of course. I mean, PO leaders make at most one bend, and
OPO leaders normally make two bends, and in OPO the bends are very close together.
But what is not so good in PO, although for this example it doesn't matter, is that you
have additional vertical segments within the map. With OPO you just go straight outside
by the shortest way, and you don't disturb much.
>> So a user study is needed.
>> Michael Kaufmann: Yeah. We also extended this: what happens if there is not enough
space for the labels in one stack on one side? Then you would have two stacks. Can you do
the same? The answer is yes, you can still use dynamic programming, but the complexity is
much higher because the dynamic program is much more complicated.
You can even extend this to three stacks. And you can add a rule: if you use, say, the
second stack for a label, the bends of its leader should be next to that stack. It's clear that
the bends of leaders to this stack are here, but the bends of leaders to that label could also
be here; why not? We say no, we don't allow that. But this also works.
Another, more general scenario is that you have polygons instead of points, if you want to
label regions, like here in Germany.
What we do here is, again, formulate it as a weighted bipartite matching problem. We
abstract such a complicated region by a large rectangle and allow the point to be anywhere
on the boundary of this rectangle.
Then, for the label of Bavaria, this one here, it's optimal to place the point somewhere down
here, because then the leader is very short and straight.
For the weights, you have to compute the minimum Manhattan distances between the region
of Bavaria, that is, this rectangle, and each label, in the formulation as a weighted bipartite
matching problem. This also works.
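For the region variant, the weight between a region and a label is a minimum Manhattan distance. A sketch assuming each region has already been abstracted to an axis-aligned rectangle (x0, y0, x1, y1) and each label is represented by a single port point; a brute-force search over permutations stands in for a real weighted bipartite matching algorithm and is only suitable for a handful of regions. The function names are illustrative.

```python
from itertools import permutations


def l1_rect_to_port(rect, port):
    """Minimum Manhattan distance from any point of the rectangle
    (x0, y0, x1, y1) to the port (qx, qy); the leader may start anywhere
    on the rectangle."""
    x0, y0, x1, y1 = rect
    qx, qy = port
    dx = max(x0 - qx, 0, qx - x1)
    dy = max(y0 - qy, 0, qy - y1)
    return dx + dy


def match_regions(rects, ports):
    """Brute-force minimum-weight assignment of regions to labels.
    Exponential: a stand-in for a real bipartite matching algorithm.
    Returns (total_weight, perm) with perm[region_index] = port_index."""
    best = None
    for perm in permutations(range(len(ports)), len(rects)):
        total = sum(l1_rect_to_port(r, ports[j]) for r, j in zip(rects, perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best
```

For two regions (0, 0, 2, 2) and (0, 5, 2, 7) and label ports (10, 6) and (10, 1), the result is `(16, (1, 0))`: each region is assigned the port at its own height.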
Next. Here's the motivating example, from the Deutsche Welle TV weather forecast map.
You see that the leaders use diagonals, not just rectilinear segments: not two bends but one,
with a diagonal segment followed by a horizontal straight segment.
That style was not introduced by us but by bank [inaudible]. We teamed up with Martin
Nollenburg from the group that introduced it and extended the problem.
The problem they couldn't solve is the following. Imagine that the cities of South America
are all at this point, somewhere here. Then you cannot connect from here to down there
with just a diagonal and a straight segment; it's impossible. Therefore we extended the
model as follows.
We allow three kinds of leaders, namely OD, PD, and DO. With only these three, the angle
is never sharp: there is only one bend, and you enter it with a diagonal. And we could prove
that it works if you use only two kinds of leaders, namely DO, as in the examples before,
and PD, which you use if the point is too close to its label, like here.
In O(N^3) time you can do almost anything, for example minimize the total leader length.
But the crossing-resolution phase is very nontrivial.
Another variant was done by Lin, Kao and Yen, not by us. This is Taiwan, and these are
birds that occur in Taiwan. There are eight different birds, but they occur in many different
places, so they consider many-to-one labeling.
For example, [inaudible] has four positions. Here you can take refuge in graph drawing,
where there is such a problem: you want to minimize the number of crossings in the
two-layer scenario.
So we have two layers; this one is fixed, and you want to rearrange the labels such that the
number of crossings is minimal. That is of course NP-hard, and they propose heuristics for
it.
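For the two-layer view, a small Python sketch: sites are assumed to have fixed positions on one layer, each edge joins a site to the label of the bird seen there, and two edges cross exactly when their endpoints are ordered oppositely on the two layers. The barycenter ordering shown is a standard two-layer crossing-reduction heuristic from graph drawing; it is not necessarily the heuristic Lin, Kao and Yen proposed, and the function names are illustrative.

```python
def count_crossings(edges, label_pos):
    """edges: (site_position, label) pairs; label_pos maps label -> slot.
    Two edges cross iff their endpoints are ordered oppositely on the two
    layers (O(m^2) pairwise check)."""
    segs = [(s, label_pos[lab]) for s, lab in edges]
    return sum(1 for i in range(len(segs)) for j in range(i + 1, len(segs))
               if (segs[i][0] - segs[j][0]) * (segs[i][1] - segs[j][1]) < 0)


def barycenter_order(edges, labels):
    """Barycenter heuristic: order labels by the average position of the
    sites they connect to (labels without sites are put first)."""
    total = {lab: 0.0 for lab in labels}
    count = {lab: 0 for lab in labels}
    for s, lab in edges:
        total[lab] += s
        count[lab] += 1
    return sorted(labels,
                  key=lambda lab: total[lab] / count[lab] if count[lab] else 0.0)
```

With two hypothetical birds where "B" is seen at sites 0 and 1 and "A" at sites 2 and 3, the order A, B produces 4 crossings, while the barycenter order B, A produces none.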
Let's come to my last scenario, mixed labeling. We had the inner labels, the traditional
labels from my introductory example. Now we say: if we have clutter, if we have overlaps
that we cannot avoid, then we use boundary labeling, but only where it's necessary. So we
mix the two.
We take a very simple model: 1-pos, at the northeast corner, with OPO leaders to the
right-hand side. This gives a solution without clutter, quite nicely, I think.
Although, well, why do we need, for example, these three leaders? In the middle area there
was only this one clutter. If you have overlaps, then you have to use a leader, at least one.
And which one would you take? Then this leader crosses something else, and that
makes --
>> [inaudible]
>> Michael Kaufmann: Yeah. This one and then this one. Domino effect is the right word
for it. But let's consider these two labels, P to lower [inaudible]. Which one would you take
as the leader?
>> [inaudible] the right.
>> Michael Kaufmann: No, that's not the right answer. If you take the topmost of the two,
its leader would cross the other label, which would force the other one to be a leader as
well, and then you would have two leaders. That means you always make the lower one a
leader, and about the other you don't know. Maybe it also becomes one --
>> I see.
>> Michael Kaufmann: -- for other reasons. And that's the rule. So here's the model again.
You have A, B, C, D as inner labels, so you have the rectangles, and the leaders. I have
drawn only the first segments of the leaders, not the others, because now it's clear.
Here's the solution for it: you can realize only B as an inner label. Here's the observation:
if you have two intersecting boxes, assign the leader to the lower of the two points.
For A and B: if A is an inner label, it intersects B, no matter whether B is inner or a leader.
So A cannot be an inner label; it must be a leader. And of course this leader destroys that
one and that one, and that immediately gives this solution.
That's the crucial observation here. Then you just do a plane sweep: for two intersecting
boxes, apply that rule and assign a leader. If, on the way down, you have not assigned a
leader to a box, then you can place the inner box there.
You just go from top to bottom, in O(N log^2 N) time; the log^2 N comes from solving
geometric intersection problems in the plane.
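The forcing rules can be illustrated with a simple fixed-point iteration in Python. This is a deliberately naive O(N^3) stand-in, not the O(N log^2 N) plane sweep from the talk. It assumes 1-pos boxes of size w by h at each point's northeast corner, leaders whose first segment runs horizontally to the right from the point to `x_boundary`, and strict (interior) intersections only. Under these assumptions each rule marks only points that must be leaders in any valid solution: of two overlapping inner boxes the lower point, and any inner box cut by a leader (the domino effect). Names are hypothetical.

```python
def boxes_overlap(a, b):
    """Boxes are (x0, y0, x1, y1); only interior overlap counts."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def leader_hits_box(px, py, x_boundary, box):
    """Does the horizontal first segment from (px, py) to x_boundary cut the box?"""
    x0, y0, x1, y1 = box
    return y0 < py < y1 and x1 > px and x0 < x_boundary


def mixed_labeling(points, w, h, x_boundary):
    """Return one flag per point: True = boundary label with a leader,
    False = inner label at the northeast corner."""
    n = len(points)
    boxes = [(x, y, x + w, y + h) for x, y in points]
    leader = [False] * n
    changed = True
    while changed:  # flags only ever switch on, so at most n productive passes
        changed = False
        for i in range(n):
            for j in range(n):
                if i == j or leader[j]:
                    continue
                if not leader[i] and boxes_overlap(boxes[i], boxes[j]):
                    # Rule: of two overlapping inner boxes, the lower point
                    # gets the leader (ties broken by index).
                    low = i if (points[i][1], i) < (points[j][1], j) else j
                    leader[low] = True
                    changed = True
                elif leader[i] and leader_hits_box(*points[i], x_boundary, boxes[j]):
                    leader[j] = True  # domino effect
                    changed = True
    return leader
```

For points (0, 0), (1, 0.5), (3, -0.5), (6, 5) with 2-by-1 boxes and the boundary at x = 10, the first two boxes overlap, so the lower point (0, 0) gets a leader; that leader cuts the box of (3, -0.5), which becomes a leader too, and the result is `[True, False, True, False]`.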
So that's if the leaders go to the right-hand side. If they go to the left, the argument
breaks down: the observation is not true anymore. It's not symmetric, because the point is
not in the middle of its box but at its left side. So you cannot say that you always send the
leader of the lower of the two to the left; that's not true, as this solution shows.
There we didn't manage anything very good, but maybe it's not possible; we don't know.
So we applied a divide-and-conquer algorithm and achieved a running time of N to the
power O(log N), which is still subexponential.
Actually, that's an amazing result, because maximizing the number of inner labels is
NP-hard, yet if you maximize the inner labels plus the leaders, you can do it in
subexponential time.
Conclusion. What did I show? I've shown a nice playground that we developed for
algorithms to work with. We developed plane-sweep algorithms, bipartite matching
algorithms, and so on; of course dynamic programming, and divide and conquer. A wide
range of algorithmic techniques can be applied here.
I learned dynamic programming before; I knew how it works, but I hadn't really applied it.
Here I really had to work hard with dynamic programming.
What is open is better algorithms for the mixed model. We wanted to come closer to what
we had before, and that was a mixed model. Actually, you should not use boundary
labeling if there is space in the middle where you can place the labels inside.
But if there are many points and the labels are too big, then you should use the space
outside the boundary. So we did a whole range of generalizations, but we still haven't
managed to handle our old motivating blue map.
Here we don't have a rectangle; the boundary is some polygon. And we don't have nicely
stacked labels; these labels are widely distributed, and sometimes they also take care of
the islands of Greece. So it's still challenging; there are more challenges in here.
Okay. Thanks a lot. That's the end of my talk.
>> Lev Nachmanson: Thank you very much. I have a question. The approach is, I assume,
very practical, so were these algorithms implemented somewhere?
>> Michael Kaufmann: Um-hmm. They were implemented; that's what you have seen.
>> Lev Nachmanson: [inaudible] are these industrial tools?
>> Michael Kaufmann: No, no, just --
>> Lev Nachmanson: I see. In your [inaudible], what is your observation: is it easier
to work with diagonal leaders or rectangular leaders?
>> Michael Kaufmann: Graphically more appealing are diagonal leaders.
>> Lev Nachmanson: Mathematically?
>> Michael Kaufmann: Yeah. But to compute them, or to resolve the crossings, it's much
easier with rectilinear leaders. You have to put in much more effort with diagonal ones.
>> Lev Nachmanson: I have no more questions, so just -- thank you.
>> Michael Kaufmann: Thank you.