Document 17864885

>> Lev Nachmansen: The first session -- the first talk in the session is on shrinking the search
space for clustered planarity, and it will be given by Karsten Klein.
>> Karsten Klein: So in this talk I will give some new results on the complexity of the clustered
planarity problem. And these are basically derived from graph theoretic reductions of the search
space.
And this work is done by Markus Chimani and me. And my position is also sponsored by Tom
Sawyer Software. So this was a, well, obvious mentioning of the sponsor of the session.
So the clustered planarity problem is a longstanding open problem, basically since clustered graphs were introduced by Feng et al. in the '90s. A clustered graph is just a standard graph together with an inclusion hierarchy that is given by a rooted tree whose leaves are the nodes of the graph, and we would like to draw this clustered graph in a way that the clusters are nicely represented by some simple closed region, like in a [inaudible] divide.
So clustered planarity, an extension of planarity, covers some aspects that are introduced because of this inclusion hierarchy and cannot be covered just by counting edge crossings.
For example, look at this graph without any edges. If I add some matching edges to it, the graph is still planar, and I will tell you in a minute why I have this crossing here. Because now, using this inclusion hierarchy, we can add some clusters that make a cyclic structure out of this nonconnected graph. And I can add a further set of clusters.
And now, as you can see, we have a K33-like structure, which obviously is not planar, but the underlying graph is still nonconnected and planar, and we can even get rid of this crossing that would be needed for the K33 by just routing the edge like this.
But now I have an edge-region crossing. This edge here goes through the middle region that
represents one of the clusters, and we would like to avoid these when we talk of cluster planar
drawings.
The same thing can happen if -- even if the edge leaves the cluster because it can just reenter the
cluster and partition the cluster, for example, to go around such an obstacle.
This is also an edge-region crossing. There's a third kind of crossing besides edge crossings and
edge-region crossings that can occur in a clustered graph drawing when we have two regions that
cross. So we have no edges that are involved here, but the regions cross. And all of these cases
should be avoided when we want to have a cluster planar drawing.
So the concept of cluster planarity basically requires that we have a planar drawing of the
underlying graph. Each edge crosses the boundary of the drawing of each cluster at most once,
which gets rid of the edge-region crossings, and we have a so-called inclusion representation of
the tree T.
An inclusion representation is just a representation of each cluster as a simple closed region
such that the subtree rooted at the cluster is represented within this region.
The green cluster, for example, has a brown and an orange child cluster and three vertices, and the brown and the orange child cluster regions are within the region for the green cluster, as are the vertices and edges in these child clusters.
So C-planar drawings have a planar drawing of the underlying graph and the two additional
requirements.
There has been a lot of work on C-planarity. I don't want to go into detail for all of the classes,
but I will shortly discuss one of these results because we will use the concepts in the further
discussion, and these are the completely connected graphs which were introduced by Cornelsen
and Wagner in 2003.
A completely connected graph is a clustered graph where for each cluster the induced subgraph of the cluster and the induced subgraph of the complement are connected.
In this example, if you look at this cluster, the induced subgraph is connected and also the
induced subgraph of the complement is connected, and the same holds for the other two
clusters, so this is a completely connected graph.
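To make the definition concrete, here is a minimal sketch of such a test in Python with networkx, assuming the clustered graph is given as the underlying graph plus each cluster's set of leaves; the function name and representation are illustrative, not from the authors' implementation.

```python
import networkx as nx

def is_completely_connected(G, clusters):
    """For every cluster, both the induced subgraph of the cluster and the
    induced subgraph of its complement must be connected."""
    for cluster in clusters:
        inside = G.subgraph(cluster)
        outside = G.subgraph(set(G.nodes) - set(cluster))
        if len(inside) > 0 and not nx.is_connected(inside):
            return False
        if len(outside) > 0 and not nx.is_connected(outside):
            return False
    return True
```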
Why is this a useful concept? Well, basically this requirement gets rid of the two additional
properties that we have for cluster planarity in addition to standard planarity, which is that we
don't want to have edge-region crossings and region-region crossings.
And so it's not a surprise that there's this result also by Cornelsen and Wagner stating that a
completely connected clustered graph is C-planar if and only if the underlying graph is planar.
So for clustered graphs that are completely connected, the clustered planarity test is reduced [inaudible].
So how can we exploit this result now? We use again a result by Cornelsen and Wagner -- so it's a great paper, you should read it -- which basically says that a clustered graph is C-planar if and only if it is a subgraph of a C-planar completely connected clustered graph. This means that we can try to find a planar completely connected augmentation of the graph, and if this augmentation exists, we know that the graph is C-planar. And if the augmentation does not exist, we know it cannot be C-planar.
So that brings us to the following situation. We have this general problem where the complexity is unknown. We know that for this very restricted class of graphs, the completely connected graphs, clustered planarity equals planarity. And in between we have an augmentation that could be possible.
And the question that we can ask now is what is the complexity of this augmentation?
Obviously if you can augment a graph to be completely connected, this augmentation should
somehow capture the whole complexity of the clustered planarity problem.
So we can ask questions like which edges for the augmentation are always needed, so we have to
add them, reducing the problem; which could be needed, so they need to be in the pool when we
search in our search space; which are never needed, so we can just remove them from the
problem; or are there, for example, equivalence classes where we know we can just pick one of the edges, but we don't know exactly which one we need.
So here our result comes into play. The search space reduction characterizes a set of sufficient edges, and especially a set that is smaller than the whole set of augmentation edges.
So let's get into details. First off, I'm going to explain a couple of concepts before I explain our
results.
We want to achieve complete connectivity, augmenting a graph or trying to find an augmentation
and then testing just planarity to find if the graph is C-planar or not.
So we have to achieve cluster connectivity for the cluster and connectivity also for the
complement of the cluster. If we look now at a cluster, we have a couple of chunks, which are just the connected components of the induced subgraph, and we would like to connect these chunks.
There are many ways to do that. For example, even for this single node we can connect it to
every other node in all other chunks, so there's a huge search space. And we can do that for all
of the nodes and increase the complexity of our search space.
On the other hand, look at the following situation: if you have subclusters in this cluster that already connect chunks, then we know, because we have to achieve connectivity for all clusters, that all of the subclusters will be connected.
So as some of the chunks are connected now via the subclusters, for the bigger cluster we don't
have to care about connecting this chunk, for example, with this chunk, because they are already
connected over the subclusters.
So instead of connecting all the chunks, it is sufficient to connect these new substructures. This
one here we just connected over that subcluster and this one here which has three subclusters,
and all we need is to connect from top to bottom and not in between the different chunks.
Because this is an important concept, we gave it a name. We call these subcluster-connected chunks bags of a cluster. And instead of connecting chunks for connectivity, we know now that
we only have to interconnect bags. I just use the term interconnect to distinguish between
connecting in a bag and interconnecting different bags.
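As one way to make the bag computation concrete, here is a hedged sketch in Python: chunks are connected components of the induced subgraph, and a union-find merges chunks that touch a common subcluster. The representation (each subcluster as a node set) is an assumption for illustration.

```python
import networkx as nx

def bags_of_cluster(G, cluster_nodes, subclusters):
    """Chunks that share a subcluster are merged into one bag, because
    every subcluster will itself be connected in any valid augmentation."""
    chunks = [set(c) for c in nx.connected_components(G.subgraph(cluster_nodes))]
    parent = list(range(len(chunks)))  # union-find over chunk indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for sub in subclusters:  # each subcluster given as a set of nodes
        touched = [i for i, c in enumerate(chunks) if c & set(sub)]
        for i in touched[1:]:
            parent[find(i)] = find(touched[0])

    bags = {}
    for i, c in enumerate(chunks):
        bags.setdefault(find(i), set()).update(c)
    return list(bags.values())
```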
So for connectivity we interconnect bags. And a further reduction that we can do is have a look
at this bag here. This is a connection to the outside. And imagine this connection would not be
there. So what is the influence now of this bag on the overall C-planarity of the graph? If this
bag is C-planar and this one is C-planar, then I can just add it somewhere in this cluster, and I
don't care where it is. It does not restrict the C-planarity of the remaining graph. And if it is not
C-planar, the whole graph is not C-planar.
So what I can do with this bag that does not have a connection to complement of the cluster is
just test it independently. So in the beginning what we do is a partitioning of the input graph,
removing all of the maximal bags that we have that are not connected to the complement of
their containing cluster.
And in the remainder we just focus on the bags that are connected over vertices to the
complement of the cluster.
In the following I will call these vertices the outeractive vertices.
So this is for the connectivity. And if you remove the cluster, we want to achieve
co-connectivity or connectivity of the complement of the cluster.
Now, here the same mechanism can be applied. If there are subclusters or clusters that connect chunks in the complement, we know, because we also want to achieve connectivity of these clusters, that parts of the chunks in the complement may be connected by these clusters. We don't need to connect where there's already a connection via a subcluster; we only need to connect this part, then the whole connected part here with this part and this part, and that is all we need to do.
So we have a reduction also for the connection of the complement of the cluster.
And basically it's the same concept as for the cluster connectivity, but to make sure that we
distinguish in the following between the bags that we have here and the structures we have here,
we also give these structures a name, and the bags in the complement are called satchels of a cluster. These satchels are connected via inneractive vertices to our cluster.
Okay. So a lot of concepts. And our result on this slide basically is that it is enough to interconnect
the bags for connectivity and to interconnect the satchels for co-connectivity. And there are two
types of connections that we can have for the satchels. The first one is a direct interconnection
which just means, for example, this satchel is connected via an edge to this satchel. And there
can also be another interconnection because there may be parts of the graph that are not satchels.
That is, they're not connected to the cluster. But we can still connect satchels via these parts so
that the complement of the cluster is connected.
And I call this interconnection indirect interconnection in the following.
So now from the concepts to the results. The first result is a lemma that basically states instead
of having a complete connectivity it is enough to use the connectivity of the bags and of the
satchels. How can we see that? Well, in the one direction of the equivalence it's quite simple. If
I have a completely connected augmentation, then obviously all the bags have to be
interconnected and all the satchels have to be interconnected.
For the other direction, to show that we can achieve cluster connectivity, we can do a bottom-up induction in the cluster tree. First we look at clusters that don't have subclusters, like this one here, for example. And we know that then the bags are just chunks; they don't have other subclusters. So we just connect them. So this subcluster is connected. We do the same for this subcluster. And now these bags are connected chunks. And all we do is connect these bags now in the larger cluster, and what we have is cluster connectivity.
So the same basically holds for co-connectivity. We know that each bag is connected and that the
satchels are also basically bags. They will be connected. And also in our augmentation they are
somehow interconnected. And what we have to take care of now is only if there are other parts
of the graph that are not connected because they're not satchels, that we can planarly connect
them to this part.
So how can we see that? We do a top-down induction in the cluster tree and want to make sure that the complement of a cluster is connected. For the root of the cluster tree that is simple, because the complement is empty, so it's connected.
And if we look at any other cluster and we assume that the ancestors are already completely
connected, let's call this one C and the parent P, we have the following situation. The
complement of P is already connected. And we have some satchels of C that are interconnected.
And then there are maybe parts that are not connected so far.
Where can these parts lie in the graph? They cannot lie here because then they are satchels.
They cannot lie here because then they are connected to the complement. So the only way we
can position them is that they are here.
What does it mean that they are here? They don't cross cluster boundaries, which basically
means they are independent. So I can just check them for C-planarity. If they are C-planar, I
just put them somewhere in P and connect the whole thing and I'm done.
Okay. So this proves our lemma. What we now know is that we only need to connect bags and
satchels but we can further reduce our search space. Intuitively we can do the following. We
just use outeractive vertices for the connection of the bags, and we use inneractive or outeractive
vertices for the connection of the complement.
So in this example it would be these two outeractive vertices are connected with each other, and
these two are connected, and then the whole cluster is connected. And for the complement, we just
choose a couple of vertices that are inneractive, this one and this one and this one are inneractive
to this cluster, and this one is inneractive to this cluster.
What we can get rid of are nodes that are inneractive but deeper in the hierarchy than our cluster.
So we only need inneractive or outeractive vertices that are at least at the level of this cluster.
So the theorem, much less intuitive, is saying that a clustered graph with a planar underlying graph is C-planar if there is a planar augmented graph that satisfies basically two criteria. The first one is that the outeractive bags of each cluster are directly interconnected using only outeractive vertices.
And the satchels are interconnected using only vertices that are either inneractive or outeractive
with a level not deeper than the level of the cluster.
So I guess as I only have one minute I'll just skip the proof and come to our last result, the fixed parameter tractability. We have now shown that we can use a shrunken set of edges for the augmentation. Now imagine we have a bounded number of inner- or outeractive vertices; then we know that we only have a bounded number of possible augmentation edges, and this also gives us a bounded number of augmentations. And for each augmentation the problem of deciding C-planarity is then just planarity testing, which means that if we have a bounded number of augmentations, we can basically decide C-planarity in linear time.
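The brute-force idea behind this FPT argument can be sketched as follows; this is illustrative only, assuming the candidate augmentation edges between active vertices have already been computed, and with `is_valid_augmentation` standing in for the complete-connectivity test (for instance the `is_completely_connected` sketch above). The paper's actual algorithm restricts the candidates far more carefully.

```python
from itertools import combinations
import networkx as nx

def is_c_planar_bounded(G, candidate_edges, is_valid_augmentation):
    """k candidate edges give at most 2^k augmentations; each one is
    checked with a linear-time planarity test."""
    for r in range(len(candidate_edges) + 1):
        for subset in combinations(candidate_edges, r):
            H = G.copy()
            H.add_edges_from(subset)
            if is_valid_augmentation(H) and nx.check_planarity(H)[0]:
                return True
    return False
```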
Okay. So to conclude, we have a way to shrink the search space for C-planarity testing. Sorry, these are in some order. We have an FPT algorithm based on that. And we have an ILP-based
implementation of our approach in OGDF.
And what we would like to do is give better characterizations of needed substructures. For
example, not just based on edges but based on paths. And also use further structural information
like triconnected components which partially fix the order of edges, and this could also lead to a
further reduction of the search space.
And also we had yesterday this alphabet of gamma drawings, and I think this is somehow
related, but I'm not really sure how it is related. So this should be something to investigate, and
the conjecture is that C-planarity -- I don't know -- okay. I guess that's it. Thanks.
[applause]
>> Martin Gronemann: Welcome to my talk. I'm Martin Gronemann, and this is joint work with
my supervisor, Michael Junger. And I will talk about a similar topic to Karsten's, drawing clustered graphs but as maps, more specifically topographic maps.
As an introduction, I will tell this old story of how we actually came up with this idea. It was not like we wanted to draw clustered graphs. It's more that we started with hierarchical clustering, and what Karsten referred to as this inclusion hierarchy is drawn to the right, so that is what you get.
You cannot see the edges. And the question at some point is that you do not only want to cluster a network, you want to see the result. And most of these methods are rectangle based, and that works for smaller instances, but if the hierarchy gets deeper and the number of nodes increases, this is a bit difficult to look at.
And, yeah, then at some point you start thinking about what happens if I draw a clustered graph by hand, and you end up with this inclusion representation. And if you add colors to them, if you use map colors, you end up with a topographic map.
This is the result. It's a small graph, a result of our method, and it follows a basic principle. Nodes in different clusters are separated by a valley or water, so something that is lower than the nodes. And nodes that are in the same cluster are on the same plateau or elevation level.
Why is this in general a good idea? Topographic maps are easy to understand. Everyone knows them. Even for people outside [inaudible], you don't have to explain the whole hierarchy thing to them. And, yeah, this turned out to be an advantage that we did not intend to have.
Yeah. This is the outline of the talk and of the algorithm. So I will not talk about the clustering. I will just talk about the layout, which is a treemap approach, and how you can obtain a triangle mesh that describes the elevation. And this is in the end used to do the edge routing as well.
Okay. Let's start with the treemap. A treemap is a very simple thing. You have, in our case, a binary weighted tree given, and our treemaps, unlike the ones that were shown [inaudible] before, are based on convex polygons.
So you start to recursively subdivide the polygons and, yeah, the area of course has to be
proportional to what you want to fit inside this polygon.
And, yeah, this is a small example. You can see how we recursively split this. And in the end
you get a nested structure of convex polygons and not rectangles. And the graph nodes that form
the leaves of the cluster tree are just placed in the center of their corresponding polygon.
Yeah, the question is how do you partition such a polygon. You know that you have to make a cut somewhere so that the area is proportional. But you want a nice shape. So you don't want any very thin polygons, and a nice shape is defined as a small aspect ratio. That basically means that you want as much area as possible but still with a low diameter. The diameter is the maximum distance of two vertices.
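As a concrete reading of this fatness measure, here is a small Python sketch that scores a convex polygon by diameter squared over area, where low values mean fat polygons; the exact definition used by de Berg et al. may differ in constants or normalization.

```python
from itertools import combinations
from math import dist

def shoelace_area(poly):
    """Area of a polygon given as a list of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def aspect_ratio(poly):
    """Diameter squared over area; for a convex polygon the diameter is
    attained between two of its vertices."""
    diameter = max(dist(p, q) for p, q in combinations(poly, 2))
    return diameter ** 2 / shoelace_area(poly)
```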
Yeah. This partitioning -- how you cut the polygon -- is then basically defined only by the orientation of this cutting line. And we use the fat polygon partitioning of de Berg et al. I will not talk about how this actually works.
Okay. We have a layout for the nodes, and what we now do is we generate a triangle mesh for the elevation model. In the beginning we apply a Delaunay triangulation that is constrained by the boundary polygon. So the black dots here are standard vertices.
And when you now consider a triangle in this triangulation, yeah, there are basically two options. Here all three vertices of the triangle correspond to graph nodes in the cluster hierarchy, so to leaves. On the boundary there are these Steiner points from the Delaunay triangulation. We think of them as part of the root. So there are some [inaudible] corner points. This will be the boundary of the map.
So when you now consider the edges of such a triangle, they induce a path in the cluster tree. And they do not only induce a path; there's the lowest common ancestor for each of these three pairs. And what we do now is we refine this mesh further and actually insert representatives for these cluster nodes.
So we subdivide a triangle into four subtriangles, and on each of the edges we insert a new vertex that represents the lowest common ancestor of the two endpoints of that edge. And here on the right you can see how it looks for the complete mesh.
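A minimal sketch of this refinement step, assuming the cluster tree is given by parent pointers and depths, that each mesh vertex is identified with its tree node, and that `midpoint` is a hypothetical helper creating the new mesh vertex on an edge:

```python
def lca(u, v, parent, depth):
    """Lowest common ancestor in the cluster tree via parent pointers."""
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u
        u = parent[u]
    return u

def refine_triangle(tri, parent, depth, midpoint):
    """Split one triangle into four; each new edge midpoint represents
    the LCA of the edge's two endpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    reps = {ab: lca(a, b, parent, depth),
            bc: lca(b, c, parent, depth),
            ca: lca(c, a, parent, depth)}
    subtriangles = [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return subtriangles, reps
```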
Yeah. So far we did everything in the plane. But we want a topographic map, and a topographic map is usually modeled by a 2.5D mesh. So what we now do is we are going to lift this mesh. And to do that we first assign the tree nodes an elevation level. And to think about that, you better turn the tree around so it looks like this. So the root is the lowest point on the map and, yeah, then here these clusters, they are going to form mountains, and on top of that the nodes are located later.
Okay. Now, the idea is straightforward. We just take these elevation levels and transfer them to the mesh, and this results in the 2.5D mesh.
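The lifting itself is simple enough to sketch directly; all names here are illustrative:

```python
def assign_elevations(root, children):
    """Depth in the flipped cluster tree becomes the elevation level:
    the root is the lowest point of the map, leaves end up on top."""
    elevation = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            elevation[child] = elevation[node] + 1
            stack.append(child)
    return elevation

def lift_mesh(vertices_2d, tree_node_of, elevation, scale=1.0):
    """Turn the planar mesh into a 2.5D mesh by adding a z-coordinate."""
    return {v: (x, y, scale * elevation[tree_node_of[v]])
            for v, (x, y) in vertices_2d.items()}
```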
For the final drawing we rasterize the mesh so you get a raster image, and, yeah, you can do this with cubic Bezier triangles to get smooth boundaries. And if I have time I will tell a little bit more about this step. I did it here by hand, so this is what it finally looks like.
Okay. So what is missing? The edges are missing, and the idea is straightforward now. What we can do is use the mesh that describes the landscape as a routing network for some shortest-path-based edge router.
And, yeah, for those who don't know about that, it's very simple. You have an edge here in the graph to the left and the two corresponding nodes in the elevation mesh. And you compute a shortest path just from the starting node to the end node based on some distance metric.
Yeah. To have smooth curves, like the cubic Bezier triangles we use for the elevation map, the nodes found on the shortest path from the source to the target node are just used as control points for a quadratic curve or something like that in the final drawing.
Okay. One problem. When you consider this red edge here from A to D and the shortest path from A to D in the routing network, you notice that there are some unnecessary turns. And while you can get from C to D in this nearly straight line, this does not work for A and D.
And what we do now to solve this problem is we insert additional edges. So A is adjacent to two cluster nodes, the green and the gray one, but not to the third one of that triangle. And we just insert this edge now, and as a result you get a straight line. This is just to compensate for the problems during the subdivision.
Okay. On the right you can see how it looks for the complete mesh. I skipped some parts on the boundary. But there is a problem with the distance function for the shortest paths here. These newly inserted edges allow shortcuts. It's not really shortcuts; it's more that these edges might serve as a bridge over a valley. And so we split the distance function into the Euclidean distance in the plane and, yeah, something more like the tree distance.
So when an edge is kind of a bridge, you have to account for the complete path where you go down the valley and up again.
The best example here is if you go from B to this node: you are crossing this valley here. And if you just take the Euclidean distance, you're not going to account for that.
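A hedged sketch of such a split metric, using networkx's Dijkstra with a callable weight; `tree_cost` is a hypothetical callback returning the down-and-up elevation charge between the two endpoints' tree nodes, not a function from the paper.

```python
from math import dist
import networkx as nx

def route_edge(mesh, source, target, tree_cost):
    """Shortest-path edge routing on the elevation mesh. A mesh edge costs
    its Euclidean length in the plane plus a tree-distance term that
    charges bridge-like edges for the valley they span."""
    def weight(u, v, _data):
        (x1, y1, _z1) = mesh.nodes[u]["pos"]
        (x2, y2, _z2) = mesh.nodes[v]["pos"]
        return dist((x1, y1), (x2, y2)) + tree_cost(u, v)
    return nx.dijkstra_path(mesh, source, target, weight=weight)
```

The inner vertices of the returned path then serve as the control points of the smooth curve mentioned above.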
Yeah. This is the final drawing with all edges routed. And, yeah, I described the treemap layout, the triangle mesh, and the edge routing, and this leads to this elevation model as a raster graphic and the edge layout in the form of control points for curves. And what we did then is we put all these things into tools from cartography, which are usually used to draw real maps. So instead of using something from graph drawing, we went for something that produces real maps.
And for the graph drawing collaboration network, it looks something like this. And if you go down this road even further, you can get something like Google Maps. It's very interesting: while you are doing this, you see there are a lot of common things between cartography and graph drawing. Especially, you can save a lot of work.
And, yeah, you can try it. It should work. You can search for things. It searches for people, and it searches the abstracts of most of the graph drawing articles, I think until 2011. And, yeah, it works like Bing or Google Maps, so it's quite easy to use.
Yeah. Ongoing and future work. One can argue that the treemap approach has the big problem
that it does not care about the edges. And that's true. It does not care about the underlying edge
set of the graph at all.
So these are two pictures from an application in chemistry. And there's the problem that the hierarchy is part of the input. In the original approach the clustering is part of the layout, so there's an indirect link between the edges and the tree. And this is not the case here. And we had to do something about it, so we modified the partitioning approach so that it takes the underlying edges into account. And this works quite well. It could probably be improved a bit. But in general it works quite nicely.
The other thing is the shoreline. You notice that there is a very big difference between water and land. In the GD map, this has no meaning at all. It does have a meaning in this application from chemistry. But, yeah, we are working on this; probably the max modularity communities would be a good place to set the water level at.
And the third most important thing is the edge routing. There are two problems. There is the
visual appearance. You can see here that it's really hard to follow edges. The other thing is the
runtime. The runtime is not acceptable for larger graphs. It's in the end something like all pairs
shortest path and, yeah, it just -- it's not practical. There are probably ways to improve that, but
this question should be answered first.
So what we did is we tried something totally different, like hierarchical edge bundles from Danny Holten, and, yeah, just to resolve this problem by moving these bundles into real parallel bundles -- but this is not -- yeah, it's still in development and it's not that nice.
So if there are any suggestions, we'd be really happy to hear about that. Okay. That's it.
>> Lev Nachmansen: Thanks, Martin.
[applause]
>> Lev Nachmansen: We have time for questions. Questions?
I have a question. Are you filtering out some of the edges when you're showing the map of the
GD collaboration? Are you filtering out some of the edges?
>> Martin Gronemann: No.
>> Lev Nachmansen: You were showing everything?
>> Martin Gronemann: Yeah. Yeah. But still that is a very sparse graph, and I tried, for
example, this [inaudible] graph from -- was it last year, the GD challenge, and that looked really
horrible.
>>: So you may need to remove [inaudible].
>> Martin Gronemann: Yeah. I have to do something about it. Even if you have something like
a hierarchical edge bundling, it's still too much.
>> Lev Nachmansen: Other questions? Thanks, Martin.
[applause]
>> Soroush Alamdari: I'm Soroush Alamdari. This is work with Therese Biedl on [inaudible] drawings. I will get to what they are in a second. So just a review of the basics.
So a planar drawing is a drawing with no crossing. A planar graph is a graph that has some
planar drawing, and a plane graph is a planar graph with a fixed embedding. And a straight-line
drawing is a drawing in which all edges are straight line segments. So, okay, we hopefully know
this.
So what's a rectangle of influence drawing? A rectangle of influence drawing is a straight-line drawing in which, for each edge, the axis-aligned rectangle that has the edge as its [inaudible] is empty, in the sense that it contains no other vertex.
So, for example, this is a rectangle of influence drawing. This one is too, if we consider the rectangles to be open. But this one is not, because of this rectangle and the edge that induces it.
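For concreteness, a direct check of the weak, open variant (the one this talk focuses on) can be sketched like this; `pos` maps vertices to coordinates, and the strict inequalities implement the open rectangles.

```python
def is_weak_open_ri_drawing(pos, edges):
    """For every edge, the open axis-aligned rectangle spanned by its
    endpoints must contain no other vertex."""
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        lo_x, hi_x = sorted((x1, x2))
        lo_y, hi_y = sorted((y1, y2))
        for w, (x, y) in pos.items():
            if w in (u, v):
                continue
            if lo_x < x < hi_x and lo_y < y < hi_y:
                return False
    return True
```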
So it has been studied in strong and weak instances, closed and open, exact and approximate.
Approximate will be the next talk [inaudible].
And so yeah. This talk is the continuation of our work from last year, which was in turn inspired by work of Miura, Matsuno, and Nishizeki.
So we want to characterize RI drawings. By RI drawing I mean open weak planar rectangle of influence drawings.
So the goal is to test if a given graph G has an RI drawing. We study the problem for plane graphs and planar graphs. For planar graphs we proved the problem to be NP-hard. For plane graphs we can't solve the problem as is, so we change the problem: we add some properties to the RI drawing that we are looking for, and then we can solve it.
>>: [inaudible].
>> Soroush Alamdari: Of each. Okay. So I didn't tell you what strong is. So --
>>: [inaudible]?
>> Soroush Alamdari: Yeah. So weak is what I just said; the way I introduced it, it was weak. In a strong RI drawing, if an edge could exist, it should be there. So yeah. So we will rather be talking about weak RI drawings. So thanks for asking.
Yeah. Before going to RI drawings, I want to, well, yeah, introduce a labeling, a type of labeling that may be [inaudible]. So we label the angles of a straight-line drawing by just counting the axes inside each angle. So here we will have labels 1, 0, and 3. And what happens if we have an axis-aligned edge? We assume that edge contributes half a point to each of the two angles that it borders.
So, for example, we have 3.5 and .5. This type of labeling captures the properties of some other labelings that have been proposed for angles, angular labelings. So it might be of interest. And they are uniquely defined for each straight-line drawing.
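As one way to pin this down, here is a sketch that computes the labels around a single vertex; it assumes general position except for possibly axis-aligned edges, which get the half-point rule described above.

```python
from math import atan2, pi

def axis_count_labels(center, neighbors, eps=1e-9):
    """Labels of the angles around `center` (neighbors given as points):
    each angle counts the axis-parallel rays strictly inside it, and an
    axis-aligned edge contributes 0.5 to each angle it borders."""
    dirs = sorted(atan2(y - center[1], x - center[0]) % (2 * pi)
                  for (x, y) in neighbors)
    axes = [0.0, pi / 2, pi, 3 * pi / 2]
    labels = []
    for a, b in zip(dirs, dirs[1:] + [dirs[0] + 2 * pi]):
        label = 0.0
        for axis in axes:
            for ray in (axis, axis + 2 * pi):  # rays are periodic
                if a + eps < ray < b - eps:
                    label += 1.0
                elif abs(ray - a) < eps or abs(ray - b) < eps:
                    label += 0.5
        labels.append(label)
    return labels
```

For instance, a vertex with one axis-aligned edge and one edge at 45 degrees gets labels 0.5 and 3.5, matching the example above.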
So let's go back to rectangle of influence drawings. First let's look at the simplest, like -- yeah, the simplest type of graphs that we can look at, the graphs that have a triangular outer-face.
So if you have a triangular outer-face, the outer-face basically forbids us to place the points anywhere that is hatched here, so the only place that the vertices inside the triangle can be is on the green lines here. So I'm drawing the graph above down there as an RI drawing.
So let's put all four aside; the rest of the graph basically should be drawn on two lines. And that's not hard. Drawing a graph on two lines requires us to find a chain of cycles; that's not hard and can be done. So basically we can find an RI drawing if the outer-face is a triangle.
And more than that, we can find all the axis-count labelings of the outer-face that are realizable in a rectangle of influence drawing of the graph. So this is for graphs that have triangular outer-faces.
So let's go to general graphs. Scary. Would be scary. So I will just overview the algorithm. What we do is that we first remove the insides of filled triangles, the triangles that have stuff inside them, so that the graph is simplified and has no filled triangles anywhere. And doing this we get some angles that are forced to have axis-count label 1 in the final drawing. If we respect these forced labels, in the end we can put back in what we removed in this step.
In the next step we switch to the dual space -- well, to some graph that is defined somewhat weirdly. But it has a vertex for each triangular face and a face for each nontriangular face.
So the blue graph is the dual-like graph. And we translate the angles that should have axis-count label 1 in the final drawing to the corresponding labels of the dual-like graph. So you have a graph in dual space with some of its angles marked.
So this is the step of the algorithm that will fail if we cannot draw our graph as an RI drawing that satisfies those conditions.
So basically we check if this dual-like graph admits an orthogonal drawing that satisfies, again, some properties that I'm not going to go into exactly, but, for example, the labels that we have marked should be right angles. And this can be done in polynomial time. This orthogonal drawing with some angles restricted to be right angles can be computed in polynomial time.
So if this step fails, there is no RI drawing that we want. But if this step goes through, we have an orthogonal drawing without bends. And the next thing we do is that we add some edges and vertices to the graph so that the drawing becomes a rectangular drawing.
From this rectangular drawing we go back to the primal. So remember that we are in the dual space somehow. So we take the dual of this thing, except for the outer-face vertex. This graph now is a supergraph of the graph that we had after removing the insides of filled triangles. So now what we do is we use the structure of the rectangular drawing to extract a labeling for this new graph such that the labeling is realizable as an axis-count labeling of an RI drawing.
So the way we do it is for each angle we count the number of right angles that that angle faces.
So, for example, here we have one for this angle, 0 here, 1, and 2 here. The labeling that is
extracted has some good properties.
So Miura et al. gave some properties of a labeling of the angles of a graph such that, if those properties are satisfied, the graph can be realized as an RI drawing whose axis-count labeling is that labeling. The labeling that we extracted satisfies these properties. So we use their algorithm to construct the RI drawing. Then recall that we added some faces to the graph that turned into some of the vertices of this graph that we already have, so we need to remove those dummy vertices. This gives the frame graph, the graph with the insides of filled triangles removed.
Now, what we have is that the angles that we marked now have axis-count label 1. So we can put things back in. Because we have respected this property throughout the construction, we can put back in the vertices of the filled triangles that we removed, by the algorithm that I initially explained for graphs with a triangular outer-face. So each of these triangles can be filled using that algorithm.
So we have an RI drawing of our initial graph.
So this was for plane graphs. Let's look at planar graphs. For planar graphs we have much more freedom, for we actually choose how we embed the graph. And, for example, for orthogonal drawings without bends, the problem is NP-hard; Garg and Tamassia showed that it is NP-hard.
We use the same idea, the same approach, to prove that this problem -- finding an open RI drawing of a planar graph -- is NP-hard. The reduction is from Not-All-Equal 3-SAT.
So the idea of showing hardness -- and this is what appeared in the paper too -- is to construct a component, a subgraph that is attached to the rest of the graph through two vertices and is connected enough that it admits only one embedding and its mirrored version.
And these two drawings will give us a truth assignment for our variables. So each such component will correspond to a variable, a truth assignment to a variable. So yeah.
So to review: what we give is an algorithm to test if a given graph has an RI drawing that satisfies the nonaligned frame property. So this is the property that we needed to add. It is not a natural property, but it is not very restrictive; it doesn't happen very often.
So yeah. So for plane graphs this is what we have. We need the property for the proof; we can't get rid of it. And the runtime is N to the 1.5.
For planar graphs we prove that the problem is NP-complete even if we add the nonaligned restriction. So for the nonaligned case, this specific question is basically solved. So yeah.
Yeah. So two questions remain. One is faster algorithms -- this one is good if it can be done. The other: can we get rid of the nonaligned constraint? We tried; we couldn't.
Yeah. I finished early. Thank you.
[applause]
>> Lev Nachmansen: Questions.
>>: Is it obvious that the problem [inaudible]?
>> Soroush Alamdari: It's not obvious. The reason for that is that the only thing that you need to specify for an RI drawing is the order of the vertices. And that can be given. So if you give me the order of the vertices in the horizontal and vertical directions, I can tell you if that's an RI drawing or not. So therefore it is in NP.
>>: Is it some type of special algorithm, or is that --
>> Soroush Alamdari: No, no, it's not. So the only thing that you need, to see if a vertex is inside a rectangle defined by two vertices, is whether it is between those two both in the horizontal and in the vertical ordering -- the vertex that we want to test against the other two. Do you see what I mean?
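His argument can be phrased as a one-line combinatorial check; a sketch, where `x_rank` and `y_rank` are the positions of each vertex in the horizontal and vertical orders:

```python
def inside_by_orders(u, v, w, x_rank, y_rank):
    """w lies in the open rectangle of u and v iff it is strictly between
    them in both the horizontal and the vertical order."""
    return (min(x_rank[u], x_rank[v]) < x_rank[w] < max(x_rank[u], x_rank[v])
            and min(y_rank[u], y_rank[v]) < y_rank[w] < max(y_rank[u], y_rank[v]))
```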
>>: [inaudible].
>> Lev Nachmansen: Other questions? I saw another hand earlier.
>>: I didn't quite understand how you moved to the [inaudible] kind of dual in the beginning, very many vertices, the blue --
>>: Yeah. So I tried to -- like I thought it would take long, so I tried to be fast.
So what we do is, for each triangle -- we leave the triangular faces as they are. For nontriangular faces we add a vertex inside each of them. And for each instance of a vertex on a nontriangular face we add three parallel edges to that dummy vertex that we added.
So here you see, for each -- in this instance, for example, this vertex has two instances [inaudible], so it has [inaudible] to the dummy vertex. Now we take the dual. So the reason for this is that when we are drawing our rectangular orthogonal drawing, we want this edge to be able to bend. So we need these parallel edges.
>> Lev Nachmansen: Other questions? Let's thank our speaker again.
>> Emilio Di Giacomo: Good afternoon. This is joint work with [inaudible] Liotta and Henk Meijer. It is again about rectangle of influence drawings, but here we study an approximate version of the problem.
So let P and Q be two points in the plane. The rectangle of influence of P and Q is the axis-aligned rectangle, rho of P and Q, that has P and Q as opposite corners. So this is the rectangle of influence of these two points. And if the boundary of the rectangle is considered to be part of the rectangle, we say that it is a closed rectangle of influence; otherwise we say that it is an open rectangle of influence.
And then a rectangle of influence drawing of a graph is a straight-line drawing such that for every edge U, V the rectangle of influence of U and V does not contain any vertex distinct from U and V, and for every pair of nonadjacent vertices U and V, the rectangle of influence contains at least one other vertex. So this is the strong version of the problem.
So, for example, this is a rectangle of influence drawing. You can see that for the existing edges the rectangles of influence are empty, while for nonadjacent vertices the rectangle of influence contains some other vertices.
Okay. This type of drawing was introduced and studied in this paper by Liotta et al. in 1998. And in this paper they characterize the rectangle of influence drawable graphs for some families like wheels, cycles, trees, outerplanar graphs, and so on.
I will not enter into the details of the result, but what is interesting for this talk is that the class of graphs that can be drawn is very restricted. I mean, not all cycles, not all trees, not all outerplanar graphs can be drawn.
So if you want to draw something more, you need to relax the problem in some way. And one possibility is to use the weak version of the problem, where you just require that for the edges the rectangle of influence does not contain anything, while you don't care about nonadjacent vertices. And we have seen the previous talk about this.
Another possibility is to use the approximate version of the problem. Approximate proximity is a concept that was introduced last year in a paper by Evans et al., where a general framework for the study of approximate proximity drawings was introduced. And we study the problem within this framework. So let me define what an approximate rectangle of influence drawing is.
So let rho of P and Q be the rectangle of influence of P and Q, and let epsilon be a given value. The epsilon-expanded rectangle of influence of P and Q is the rectangle obtained by enlarging rho of P and Q by a factor 1 plus epsilon, while the epsilon-shrunk rectangle of influence of P and Q is the rectangle obtained by shrinking rho of P and Q by a factor 1 over 1 plus epsilon.
So this is a rectangle of influence, this is the expanded version, and that one is the shrunk version for some epsilon.
And now we can define an epsilon 1, epsilon 2 rectangle of influence drawing of a graph. It is a straight-line drawing with the following property: for every edge U, V, the epsilon-1-shrunk rectangle of influence does not contain any vertex [inaudible] from U and V, and for every pair of nonadjacent vertices the epsilon-2-expanded rectangle of influence contains at least one vertex.
So the idea is that for the adjacent vertices I use a small rectangle and for nonadjacent vertices a
larger rectangle. So this makes my life easier.
And so let's look at an example. This is not a rectangle of influence drawing because these two
vertices are not adjacent, but the rectangle of influence is empty. And these two vertices are
adjacent, but the rectangle of influence contains something.
But if we choose epsilon 1 and epsilon 2 equal to 0.5, we have that the expanded rectangle of influence of the two points contains the whole drawing, so it contains some other vertex. And the shrunk rectangle of influence of these two points does not contain anything. So this is an epsilon 1, epsilon 2 rectangle of influence drawing for epsilon 1 and epsilon 2 equal to 0.5.
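A hedged sketch of this check; it assumes the expansion and shrinking are performed about the rectangle's center, which matches the pictures but may differ in detail from the paper's definition.

```python
def scaled_rect(p, q, factor):
    """Axis-aligned rectangle with p, q as opposite corners, scaled about
    its center by `factor` (>1 expands, <1 shrinks)."""
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    hw, hh = abs(p[0] - q[0]) / 2 * factor, abs(p[1] - q[1]) / 2 * factor
    return (cx - hw, cx + hw, cy - hh, cy + hh)

def is_eps_ri_drawing(pos, edges, eps1, eps2, closed=False):
    """Shrunk rectangles of edges must be empty; expanded rectangles of
    non-edges must contain at least one other vertex."""
    def contains(rect, pt):
        x0, x1, y0, y1 = rect
        if closed:
            return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1
        return x0 < pt[0] < x1 and y0 < pt[1] < y1

    nodes = list(pos)
    edge_set = {frozenset(e) for e in edges}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            others = [pos[w] for w in nodes if w not in (u, v)]
            if frozenset((u, v)) in edge_set:
                rect = scaled_rect(pos[u], pos[v], 1 / (1 + eps1))
                if any(contains(rect, pt) for pt in others):
                    return False
            else:
                rect = scaled_rect(pos[u], pos[v], 1 + eps2)
                if not any(contains(rect, pt) for pt in others):
                    return False
    return True
```

For the example just described, `is_eps_ri_drawing(pos, edges, 0.5, 0.5)` would accept the drawing.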
Okay. So we studied which graphs admit an epsilon 1, epsilon 2 rectangle of influence drawing for different values of epsilon 1 and epsilon 2. And we also investigated the area of the drawing; in particular we investigated whether it is possible to obtain polynomial area in some cases.
Okay. This is the list of results. First of all, we proved that every planar graph has both an open
and a closed epsilon 1, epsilon 2 rectangle of influence drawing for every positive epsilon 1 and
epsilon 2. So if epsilon 1 and epsilon 2 are positive, we can draw all planar graphs.
On the other hand, if one of the two parameters is 0, either epsilon 1 or epsilon 2, there exist planar graphs that cannot be drawn. And so motivated by this result we investigated the case when one of the two parameters is 0; in particular we concentrated on the case when epsilon 1 is 0.
And we studied outerplanar graphs. So we proved that every outerplanar graph admits both an open and a closed 0, epsilon 2 rectangle of influence drawing for every epsilon 2 larger than 0. But the drawing has exponential area. So we tried to reduce the area. And we can do it for every outerplanar graph, so every outerplanar graph has a drawing in N to the fourth area, but we need epsilon 2 to be more than 2.
And if epsilon 2 is less than or equal to 2, we are able to draw only binary trees, again in an area given by that formula, where the exponent depends only on epsilon 2, so for fixed epsilon 2 it is a constant.
Okay. So in the remaining part of the talk I will try to give you the idea of the techniques behind these results.
Okay. Let's start with a planar graph and positive epsilon 1 and epsilon 2. Here the technique is very simple. The idea is to construct the drawing one vertex at a time according to the canonical ordering. So at some point I have drawn the graph [inaudible] by the first K minus 1 vertices, and I have to add the K-th vertex VK.
And the idea is to place it sufficiently far from the existing drawing. And what does sufficiently far mean? Well, consider one vertex in the existing drawing. If I place VK very far away, the expanded rectangle of influence of these two points contains the whole drawing, and at the same time the shrunk rectangle of influence does not contain anything. So basically I can decide whether to connect VK to this vertex or not. I can choose. And so I can connect VK to all its adjacent vertices.
And with some geometry and some math, you can see that the distance that we need is given by this equation, and these two terms depend on epsilon 1 and epsilon 2, and on D, which is the diameter of the existing drawing.
There are some other technicalities, but basically this is the idea. Okay. Let's look at outerplanar graphs. In this case epsilon 1 is 0. So for the edges I consider the usual rectangle of influence, while for the [inaudible] I use the expanded rectangle of influence.
Here the idea is to compute a BFS tree of the graph, to draw the tree, and then to add the remaining edges. So this is an outerplanar graph; the bold edges define a BFS tree. And this is another drawing of the same graph.
Okay. Now, in order to draw the tree, I first draw the star induced by the root and its children. And to do this I choose a real number P such that P is at least 2 over epsilon 2 plus 1. And then I draw the root U at the origin and draw child number I at the point with coordinates P to the I minus 1, P to the K minus I, where K is the number of children.
And so, for example, for six children I have a drawing like this, where P is 2. And in this drawing you can see that for the edges the rectangle of influence is empty, so the edges can be there. And now consider two nonadjacent vertices, in particular two consecutive children, UI and UI plus 1. Well, we have that the width and the height of the rectangle of influence, delta X and delta Y, are given by these two equations.
And if we now consider the expanded version of the rectangle and we consider this enlargement, this delta X prime, we have that delta X prime is epsilon 2 over 2 times delta X. And doing some math you can prove that this is larger than P to the I minus 1. But P to the I minus 1 is the x-coordinate of this point. So this means that this distance is more than the distance of this point from the Y axis. And [inaudible] you can prove that this distance delta Y prime is more than the distance of this point from the X axis. And this means that the expanded rectangle of influence contains the origin, so it contains the root of the tree. And so these two points can be nonadjacent.
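The star construction is compact enough to state in a few lines; a sketch with illustrative names, whose output could be fed to a checker like the `is_eps_ri_drawing` sketch earlier.

```python
def star_coordinates(k, eps2):
    """Root at the origin; child i (1-based) at (P**(i - 1), P**(k - i)),
    with P at least 2/eps2 + 1 as described above."""
    P = 2 / eps2 + 1
    pos = {"u": (0.0, 0.0)}
    for i in range(1, k + 1):
        pos[f"u{i}"] = (P ** (i - 1), P ** (k - i))
    return pos
```

For example, six children with P = 2 (which is valid when epsilon 2 is at least 2) reproduce the shape of the example drawing; the coordinates grow exponentially in the number of children, which is exactly the area problem discussed next.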
Okay. Now we have drawn the first two levels, and we can draw the subtrees rooted at each child recursively inside these boxes. And these boxes must satisfy this condition -- I mean, the dimensions must satisfy this condition -- and again I will not enter into detail, but if you satisfy this condition, you can prove that, choosing a vertex from a subtree, say TI, and another vertex from another subtree, say TI plus 1, the expanded rectangle of influence contains UI and UI plus 1. So the two vertices can be nonadjacent.
Okay. So this is the drawing of the tree. But now we have to add the edges of the outerplanar graph that are not in the tree. And since this is a BFS tree -- I mean, since this is an outerplanar graph and we performed a BFS visit -- the edges are either between consecutive children, like 2, 3 and 4, 5, and in this case the rectangle of influence is empty; or you can have edges connecting consecutive vertices that don't have the same parent, like, say, 10, 11 or 12, 13, and also in this case you can prove that the rectangle of influence is empty. And you can also have edges like this one, 12, 4, so from different levels but, again, consecutive points. Also in this case you can prove that the rectangle of influence is empty.
Okay. So this is the technique for outerplanar graphs. And it's easy to see that the drawing has exponential area. I mean, even the drawing of the star has exponential area. So we want to try and reduce the area. And, as I said, we can do it for all outerplanar graphs if epsilon 2 is more than 2.
And indeed the technique is slightly more general, because we can draw not only outerplanar graphs but proper track planar graphs. Proper track planar graphs are basically level planar graphs, so the vertices can be placed on levels in a planar way, and the edges can only be between consecutive levels; this is the meaning of the word proper.
But, on the other hand, we allow edges within the same level. And this is why I call the graphs track planar instead of level planar. But notice that since the drawing must be planar, the edges within a level must connect consecutive vertices on the level.
And it's easy to see that outerplanar graphs are proper track planar graphs.
Okay. Now, if we have a graph like this, I draw it this way. I place the vertices of the first level at x-coordinates 1, 2, 3, 4, 5 and at y-coordinates 5, 4, 3, 2, 1, so they are on a straight line with slope minus 1.
Then I place the next level similarly on another line with slope minus 1, and so on. And then I add the edges. And you can see again that the rectangle of influence of the edges is empty. I mean, it's immediate for these edges because the two points are consecutive; there is nothing in the middle. But it's true also for vertices on different levels, because this rectangle is contained in the strip defined by the two lines with slope minus 1.
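A rough sketch of this placement skeleton; the transcript only fixes the first level at (1, 5) through (5, 1), so the per-level offsets below are an assumption, not the paper's exact constants.

```python
def track_drawing(levels):
    """Place each level on its own line of slope -1. Assumption: level t
    lies on the line x + y = n + 1 - t, with x-coordinates 1, 2, 3, ...
    within the level; the paper's exact offsets may differ."""
    n = sum(len(level) for level in levels)
    pos = {}
    for t, level in enumerate(levels):
        for i, v in enumerate(level, start=1):
            pos[v] = (i, (n + 1 - t) - i)
    return pos
```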
What happens for nonadjacent vertices? Okay. Consider, for example, these two vertices. They are not adjacent. Since epsilon 2 is more than 2, the expanded rectangle of influence contains the two neighboring vertices, 12 and 15. And so the two points can be nonadjacent.
And you can prove that also for nonadjacent vertices on different levels, the expanded rectangle of influence contains some other vertex.
So this is the technique. It's very easy. But what is the area? If you look at the drawing that I showed you, the area is N squared, because it is N times N. So where does the N to the 4th come from? The problem is that I didn't tell the whole truth. Because I told you that if I consider these two vertices and then consider the expanded rectangle of influence, it contains the neighboring vertices.
But what happens if these vertices are not there? What happens if the level has only two vertices? Well, in this case the algorithm doesn't work. Of course we can fix this problem, but to fix it we have to enlarge the drawing, and we get the N to the 4th [inaudible].
Okay. Let's move to the last result, binary trees with epsilon 2 smaller than 2. And this is the most complex technique of our paper. We use a recursive drawing technique based on a greedy path decomposition. A greedy path is a path in the tree that goes from the root to a leaf and is constructed this way: I start from the root, then I choose the subtree with the most vertices, and I continue constructing the path in that subtree.
So I go to U2 because it is the root of the largest subtree. And then [inaudible] go on, and I have a path and a set of subtrees attached to the path. And then I can [inaudible].
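The greedy path itself is easy to compute; a minimal sketch with illustrative names:

```python
def subtree_sizes(root, children):
    """Number of vertices in each subtree, computed recursively."""
    sizes = {}
    def rec(v):
        sizes[v] = 1 + sum(rec(c) for c in children.get(v, []))
        return sizes[v]
    rec(root)
    return sizes

def greedy_path(root, children):
    """Walk from the root to a leaf, always entering the child with the
    most vertices; the subtrees hanging off this path are then drawn
    recursively and attached along it."""
    sizes = subtree_sizes(root, children)
    path, node = [root], root
    while children.get(node):
        node = max(children[node], key=sizes.__getitem__)
        path.append(node)
    return path
```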
So now, the idea of the drawing technique is to draw the path and then to attach to the path the drawings of the subtrees computed recursively. And we assume that the drawing of each subtree [inaudible] satisfies three invariants.
Okay. The first one is that it is a valid drawing. The second one is that the root of the tree is on the left border of the drawing, and there is nothing below it. And finally, the drawing is completely contained in a bounding box whose dimensions are bounded by these two functions.
Okay. So now we place the vertices of the greedy path on a straight line with slope minus 1, and we attach the subtrees to the path. So here the first subtree is gamma 0, and we place it at a vertical distance which is 2 over epsilon times the height of the drawing. Then we place the next one, and this distance is 2 over epsilon times the maximum of the two widths, and this distance is 2 over epsilon times the maximum of the two heights, and so on. We have some special cases when the subtree is only one vertex, but in the end we have something like this.
And now you can see that for the existing edges, again, the rectangle of influence is empty. This is easy to see for these edges and for these edges. And for the vertical edges it is also true, because in the case of the vertical edges the rectangle of influence coincides with the segment, and by the second invariant there is nothing here, no vertex here. And so it's okay.
Now, if we consider nonadjacent vertices, things are slightly more complex because there are many cases to consider; I'll show you just one case. So suppose I have a vertex in gamma 0 and a vertex in gamma 1. Well, the rectangle of influence of these two vertices contains the rectangle of influence of P and Q, of these two points. And if I look at the rectangle of influence of P and Q, its dimensions are delta X and delta Y, and the enlargements delta X prime and delta Y prime are given by these equations. So delta X prime, for example, is epsilon over 2 times delta X.
Which, according to the choice of this distance, is more than both of the widths of the two drawings. And analogously, delta Y prime is larger than the heights of the two drawings. And so the expanded rectangle of influence contains both subdrawings, so these two points -- and in fact every vertex here and every vertex here -- can be nonadjacent.
Okay. About the area: according to invariant three, the drawing is contained in a bounding box whose dimensions are these, and with some calculation you obtain that the area is this one. Okay.
Some open problems. Okay. The first open problem is to devise a technique to compute drawings with polynomial area for general planar graphs when epsilon 2 is smaller than 2. I mean, when epsilon 2 is smaller than 2, we can only draw binary trees. It would be nice to draw all outerplanar graphs.
Then, we studied the case when epsilon 1 is 0 and epsilon 2 is larger than 0, and in this case we studied basically only outerplanar graphs. So it could be interesting to study other families. We have some further, simple results in this case.
And finally, it would be nice to study the symmetric case, when epsilon 1 is larger than 0 and epsilon 2 is 0. Also in this case we have a very simple result. But okay. Thank you for your attention.
[applause]
>> Lev Nachmansen: Questions for Emilio?
>>: I have a question. Given the practical applications, are there any experiments on perception -- when you look at weak, strong --
>> Emilio Di Giacomo: No. Not that I know. I mean, we didn't make an experiment, and I
think that's never been done. I don't know.
>> Lev Nachmansen: Anything else for Emilio? Thanks, Emilio.
[applause]
>> Emden Gansner: So thank you for coming. So this is going to be a talk on graphs and maps, looking at dynamic data, particularly in an area with regard to social networks, which we had some discussion of earlier today.
Okay. So, again, as has been mentioned earlier, we're looking at online social networks, and
they have evolved into a very hot topic -- it's been mentioned several times in various contexts.
But there's a problem. For example, this is a trace of a listing of various tweets coming in over a
small period of time, and it goes on and on and on and on and keeps going -- there's these
millions of people out there constantly tweeting to each other. You know, how they could be
spending their time better, I don't know. But they're doing it. Okay.
So, for example, if you're just looking at the tweets that contain the word "knees," we're seeing
up to something like 34,000 tweets per hour on these things. Okay. Now, this isn't big data, per
se, but it's certainly pleasingly plump data. Okay. And so the question is: is there some way that
we can use visualization to help handle this, to analyze this stuff, to get a handle on this thing.
All right.
So this is where I do a brief change here and pull that down. And with luck it will still be here.
Yes. Okay. So this is going to be a top-down talk. I'm not going to tell you all the stuff in pieces
and only get you to the conclusion at the end. We're going to start with the chase. Okay.
This is a little piece of software we put together called TwitterScope. And the idea is it's going
to give you a chance to look at a view of data from a dynamic flow of tweets coming in, for
every tweet on a very particular topic. So you pick some particular term, and we're looking for
all tweets that contain that term. And we're trying to analyze these things and see how do they
cluster together, how are they related.
Now, we have built in this thing a few basic topics. So let's just start with, say, for example,
visualization, which seems to be appropriate. And what do you mean? Oh, dear. I don't see
Bing on there, do I? Okay. I mean Internet Explorer, sorry. Dear, dear, dear. Okay. Well, this
is going to be a faster demo than I thought.
>> Lev Nachmansen: [inaudible].
>> Emden Gansner: Oh, I don't suppose there would be one of those little browsers [inaudible].
Too much to expect. All right. Well, there goes part of my talk.
>> Lev Nachmansen: Didn't you have some videos?
>> Emden Gansner: I do have videos.
>> Lev Nachmansen: What page?
>> Emden Gansner: On that page?
>> Lev Nachmansen: Yeah. Where you just were.
>> Emden Gansner: Where I just was. Yes. Okay. I could try that.
[multiple people speaking at once]
>> Emden Gansner: Okay. This will be an old video, but we can try it. A little help here?
Okay. Maybe not. Okay. Well, maybe I won't show it.
[multiple people speaking at once]
>> Emden Gansner: Okay. So this is more or less the version you'd see in this thing. So this is
probably a topic of news or something, and these are all being clustered together. And what
you'll see is there are various topics coming in. Each tweet that comes in has an icon associated
with it. And each cluster has a variety of terms that we determine as most identifying and shared
amongst those various tweets.
And then on the left-hand side -- it's very hard to see -- is a timeline, a historical timeline. And
in fact you see the incoming tweets coming in at the top, and each one as it comes in you'll see it
expand so you can see what the tweet looks like. And then you have the timeline on the left here
where you can see the number of tweets coming in over a period of time, and you're able to
move the cursor down to various things and check these things.
I'm not too -- oh, yes, for example, here's a case where you can click on one of these things and
go right to the particular article that it was referring to.
I'm not too concerned about not giving you this demo right now because in fact this is all online,
and I'll give you the URL at the end and you can play with it to your heart's content and see what
was really going on. But this is a rough idea of what's happening.
So let me now stop this and go back to the top before I get any deeper in this thing. Okay. And
back to -- okay. So that was the demo. Anyway, so we put this together. And it's currently
being used by several organizations within AT&T. And this is where I can make a confession.
Another example where my ability to predict things that are popular is totally wrong. I mean, I
really thought this -- felt there was nothing in this thing, and yet it has become very popular
within the company and they're using it within all sorts of places.
For that matter, I didn't think Twitter was going to take off either, so there you are. So in fact if
you want to make money, find something that I think is not going to work and bet on the
opposite, and you'll be set. All right.
Okay. So the rough architecture in this piece of work is largely [inaudible] two pieces: one part
that's doing the data collection from the Twitter side, and the other part that's actually doing the
major analysis and visualization stuff, which is the thing that gets pushed out to the browser.
So the data collection part uses one of the standard APIs provided by Twitter: you're allowed to
ask for all tweets containing one of up to 400 keywords, or if you want you can actually get
1 percent of all tweets, or if you want to pay a lot of money you can get a lot more than that.
But there are various interfaces that Twitter can provide you. And the data is stored
permanently, so we can do this historical checking back in time and looking at things that
occurred in the past.
The other part is this upper [inaudible] up here, which is doing the analysis and the
visualization. Basically it starts with the raw tweets, does some [inaudible] analysis -- very
cursory -- trying to get some information, some way of putting these tweets together, uses that to
construct a graph of relationships, and then does this business about the analysis and the
drawing and the clustering and the map and stuff.
So what I want to do with the rest of the talk is more or less dive into these things in a little bit
more detail.
Okay. As far as the semantic analysis part goes, as with any of these things, the first thing you
have to do is clean the data. All data is dirty. It's going to have stuff you don't really want to use,
because that's going to lead you astray. So you have to remove all the various markup notations
and that kind of thing, all the things telling you about retweets and references to other people.
URLs need to be removed because typically they're truncated and they don't have a lot of good
information in them. And also these things are going to have lots and lots of stop words that
carry just no semantic information whatsoever, so you've got to get rid of the the's and a's and
and's and that type of stuff.
Okay. Once that's done, then we want to construct the similarity matrix for the data, to construct
a graph on the various tweets encoding relationships.
The process we use is term frequency inverse document frequency. It's a mouthful. All right.
It's actually very simple. The term frequency is basically the fraction of times that a particular
term appears in a document, and the inverse document frequency just looks at the other factor:
basically in how many of the documents in that particular class the term appears.
And then the tf-idf number is simply the product of those two things. And to get a document
similarity we can simply take, for two documents, the vectors formed by these values over all
terms, take the cosine between them, and that gives you a notion of similarity.
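[A minimal sketch of the tf-idf and cosine computation just described, assuming the tweets have already been cleaned and tokenized; the function names and the plain log weighting are illustrative choices, not necessarily the TwitterScope code.

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        # docs: list of token lists (tweets after cleaning).
        n = len(docs)
        # Document frequency: in how many tweets each term appears.
        df = Counter(t for d in docs for t in set(d))
        vecs = []
        for d in docs:
            counts = Counter(d)
            vec = {}
            for term, c in counts.items():
                tf = c / len(d)               # term frequency
                idf = math.log(n / df[term])  # inverse document frequency
                vec[term] = tf * idf          # the tf-idf number
            vecs.append(vec)
        return vecs

    def cosine(u, v):
        # Cosine of the angle between two sparse tf-idf vectors.
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0]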
This is a very dense matrix, so to clean it out a bit and not have too many edges we put in a
threshold; we found that .2 works kind of nicely to get rid of the really weak edges. You don't
want too much clutter.
We did look at a more sophisticated technique called latent Dirichlet allocation. This is a much
more sophisticated, potentially more accurate, approach to getting a similarity matrix.
But we basically found that it tended to be too sophisticated. Especially in this case you're
dealing with very small messages, all within 140 characters, so you can't say much, and you
can't rely on too much information. It's like when you're trying to do decryption: if you only
have a bit of text, you can't really do a good job. You need a lot of text to make it worthwhile.
And same thing here. This thing is too expensive and it tends to give misleading clusters,
whereas the simple approach at least guarantees that any messages within the same cluster share
some terms.
Okay. So the next step then is to take the similarity matrix and use that to get the obvious graph.
So if there's a nonzero value in an entry of the matrix, that's an edge. All right.
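[Continuing the sketch: turning the thresholded similarity matrix into a graph, using the .2 cutoff mentioned above. networkx is a convenient stand-in here, not necessarily what the tool uses; cosine and the vectors come from the previous sketch.

    import networkx as nx

    def similarity_graph(vecs, threshold=0.2):
        # One node per tweet; an edge wherever similarity clears the cutoff.
        g = nx.Graph()
        g.add_nodes_from(range(len(vecs)))
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                s = cosine(vecs[i], vecs[j])
                if s > threshold:  # drop the really weak edges
                    g.add_edge(i, j, weight=s)
        return g]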
Here we're starting to get toward the hardware: we're looking at the amount of space we have on
the display to use. And so we figure you typically see up to 500 tweets in a given window in a
browser. That's where it's going to be displayed.
Now, the layout part at this point is done using a combination of multidimensional scaling, and
then to do overlap removal we use the algorithm that was described at Graph Drawing four
years ago.
And then finally, once we've got this layout and everything moved, we take this layout and use
it to create a map very similar to the one described earlier. And that was based on this GvMap
algorithm that Stefan [inaudible] described in this article here.
So you've got something like that, which you saw earlier. So you have basically the collection
of countries, and this would have been water otherwise, just put together in that thing.
Up to this point, this has been a totally static description of the algorithm. But obviously this is
stream data coming through, so we have to handle that. And what we do is basically update the
information roughly every K minutes. We started with 1; 1 was [inaudible] too much, so we
went to about five minutes, and that seems about right.
And we're trying to preserve the users' mental map. And to do that we, first of all, start by
rerunning multidimensional scaling using the previous positions; or, when new tweets enter, we
use a position given by the average of the neighbors. And that gives a fairly stable display in
terms of the relative positioning.
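[A sketch of that stability trick: rerun MDS seeded with the previous positions, and start a brand-new tweet at the average of its neighbors' old positions. scikit-learn's MDS accepts an init array; everything else here is an illustrative assumption.

    import numpy as np
    from sklearn.manifold import MDS

    def stable_layout(dissim, old_pos, neighbors):
        # dissim: n x n dissimilarity matrix; old_pos: dict node -> (x, y);
        # neighbors: dict node -> list of neighbor nodes (for new tweets).
        n = dissim.shape[0]
        init = np.zeros((n, 2))
        for v in range(n):
            if v in old_pos:
                init[v] = old_pos[v]            # reuse the previous position
            else:
                anchored = [old_pos[u] for u in neighbors.get(v, ())
                            if u in old_pos]
                if anchored:                    # average of the neighbors
                    init[v] = np.mean(anchored, axis=0)
        mds = MDS(n_components=2, dissimilarity="precomputed", n_init=1)
        return mds.fit_transform(dissim, init=init)]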
However, you do tend to get some things like rotations and other transformations that get
involved, and we want to keep it very stable, so we have to figure out how to get it back to what
it was before.
And to handle that we do a Procrustes transformation. So basically we're looking for a scaling,
rotation, and translation to take the new drawing and make it match the old drawing.
And that's a well-known problem, so the solution is here, where the X and Y matrices are
basically the N by 2 versions of the x_i's and y_i's. So this gives us the scaling, this gives us
the rotation, and this gives us the translation.
We actually decided just to fix the scaling at 1, because if you don't do that, you tend to
[inaudible] overlap again. Since the nodes are not points but actually take up space, we need
to keep them separate.
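[A sketch of that Procrustes step: the standard SVD solution for the rotation and translation that best map the new positions Y onto the old positions X, with the scaling fixed at 1 as just explained. The helper name is illustrative.

    import numpy as np

    def procrustes_align(X, Y):
        # X, Y: n x 2 arrays of old and new positions.
        xc, yc = X.mean(axis=0), Y.mean(axis=0)
        A = (Y - yc).T @ (X - xc)     # 2 x 2 cross-covariance
        U, _, Vt = np.linalg.svd(A)
        R = U @ Vt                    # optimal rotation, scale fixed at 1
        if np.linalg.det(R) < 0:      # rule out a reflection
            U[:, -1] *= -1
            R = U @ Vt
        t = xc - yc @ R               # translation
        return Y @ R + t              # new drawing aligned to the old one]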
And in fact for heavy streams trying to preserve the mental map actually isn't that important. If
you've got lots and lots and lots of messages coming through, basically the topics are going to
come and go. And so every so often you almost have to just toss out the entire thing and start
over again anyway; so this is more important for topics that are changing more gradually over
time, like visualization, which I would have shown you.
Okay. So there's one thing that we still need to handle in this thing: when we set that threshold
to .2, that oftentimes takes away enough edges that you can actually disconnect the graph.
The graph I showed there wasn't fully connected. So you had these countries which really aren't
connected, and you want to pack them in there. And we have to figure out a way to handle that,
because as time goes on you have this problem that, okay, well, you now redraw the graph and
now you've got this overlapping, and you have to remove that somehow.
And the standard packing algorithms don't necessarily preserve the relationships. Like, for
example, here the [inaudible] is to the left, and this all gets messed up. What we'd like to do is
be able to take these components and reposition them so we have removed the overlap but have
roughly the same layout as we had earlier.
Okay. And so the solution we came up with is to take the PRISM algorithm we used before to
remove overlap, but to extend it so it handles nonrectangular shapes. And instead of just
worrying about getting rid of overlap, we're also going to be using it to get rid of space, to pack
things more tightly.
Okay. So it's done reasonably simply. Instead of just using rectangles, we use polyominoes to
represent the nodes and edges, and use those for collision detection. We want to reduce the
problem down to a tractable form, so to do that we use a proximity graph, in this case a
Delaunay triangulation, as a scaffolding.
So that gives a nice rigidity, and it also gives you a very sparse graph to work with. All we're
going to do now is, for each of the edges in the proximity graph, we check the endpoints and we
project all the polyominoes down to that line. So this is kind of the left-hand side of the green
part, this is the right-hand side of the blue part, and either there's going to be overlap, in which
case we want to push them apart, or there's going to be this extra space there and we can pull
them together. So it will be one of those two situations where you either have the overlap or
a gap.
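[A sketch of that per-edge test: project the cells of the two shapes onto the direction of a proximity-graph edge and read off a signed gap -- positive means extra space to squeeze out, negative means overlap to push apart. Real PRISM-style code works on full polyominoes; the point representation here is a simplification.

    import numpy as np

    def signed_gap(p, q, cells_a, cells_b):
        # p, q: endpoints of a proximity edge (node a before node b).
        # cells_a, cells_b: arrays of 2D points covered by each shape,
        # e.g. polyomino cell centers.
        p, q = np.asarray(p, float), np.asarray(q, float)
        d = (q - p) / np.linalg.norm(q - p)    # unit direction of the edge
        ta = np.asarray(cells_a, float) @ d    # projections onto the line
        tb = np.asarray(cells_b, float) @ d
        return tb.min() - ta.max()             # > 0: gap, < 0: overlap]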
Okay. So how to solve that. Well, what we want to do is set up an ideal length factor to be that
extra gap or overlap we're trying to handle, and we now want to compute a new layout with
these edge lengths, basically either removing the gap or expanding the distance.
And of course for us, typically, if we want to do something, MDS is all we know, so we pull out
MDS, pose the same kind of problem, and we've got the ways to solve it, so we solve it. All
right. This almost works. The slight problem is that, as with most of these things, if you just
apply it straightforwardly, you get moves that are too fast and you'll break up the proximity and
the relationships that you had.
So the way to handle that is to gradually add in this external information. What we do is we
damp the movements. Instead of using this raw t here, we use a scaled version of t, and we pick
these bounds to limit it. And so we iteratively keep moving it and moving it and moving it until
there aren't any more problems in the layout.
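[A sketch of the damping: clamp the raw per-edge factor t before handing the resulting ideal lengths back to the MDS/stress solver. The bound of 1.5 is an illustrative guess, not a quoted constant.

    def damped_ideal_lengths(current, factors, smax=1.5):
        # current: dict edge -> current length; factors: dict edge -> raw t,
        # the expansion (t > 1) or contraction (t < 1) each edge asks for.
        ideal = {}
        for e, t in factors.items():
            t = max(min(t, smax), 1.0 / smax)  # bounded version of the raw t
            ideal[e] = current[e] * t
        return ideal]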
So that, again, mostly works, except that once you've gotten done and you've taken care of the
proximity graph, you look at the result and you say, oh, there are still overlaps. That's because
some of these things have bad aspect ratios, which the proximity graph you've constructed
didn't take care of. And so then you have to add a little bit more extra information, so we put
some more edges into the proximity graph to handle those situations, and then we just repeat.
So here is the first loop, here is the second loop, and we run it. And, again, there's no proof that
this works, but basically in principle and in practice it seemed to do the job.
So an example. This is the algorithm we call STAPACK. This is the initial configuration. And
after one iteration we get this, after a second iteration we get this, and finally after a third
iteration we have a separation. So the areas are now more or less where they were before, in
rough proximity, but they have been moved apart without a lot of extra space.
And if we apply that to the full display: so here would be one situation. Now some more things
pop in, so there's a slight adjustment -- pretty stable, but there have been some changes. Now
some more ones are coming in down here. They'll fill in some spaces, and again you have a
very, very small adjustment as time goes on, as each of these comes in.
And again, as I mentioned, after a while there will be enough changes that you simply can't
have stability anymore, and you throw up your hands and say, okay, forget it, I'm going to start
afresh, and you just dump everything on the floor and start with a new display.
Anyway, so as I mentioned before, you can do your own demo. Please go to this site and try
that. And then we have the various versions, and you can try the various options and play with
them. It's available. And we've made various changes to it; in fact, it's still being worked on.
So, in fact, it may even break when you try it. Anyway, thank you.
[applause]
>> Lev Nachmansen: Questions for Emden?
>>: [inaudible] the edges are not very readable. Do you consider them not so important?
>> Emden Gansner: Yeah. Basically you can zoom in and get more detail on these things. You
can click on the things and do exploration down that way. But at this high level here we don't
view this as being particularly important. The main thing is the clustering at that point.
But, yeah, if you want to delve into more detail, that's possible.
>>: I like it. If you're really looking at temporal sequences over a long time, is there a way that
you can identify critical moments to look at?
>> Emden Gansner: I guess the definition would be what counts as critical. One thing we have
seen -- I mean, one easy way is if you look at the timeline on the left: every so often you will see
a big peak, and that's when something important has happened, and there is --
>>: So there may be [inaudible] and turns, so that may be one indicator of a particular
frequency. The other sort of thing is what happens -- I may have missed it, but do clusters
reform, regroup, and split?
>> Emden Gansner: Yeah, there will be this constant motion of clusters. Because as tweets
come in, whole topics will disappear, new topics will form, and some tweets from one cluster
will now in fact be part of another cluster. Yeah. There will be a constant change in what's
going on.
>>: Yeah. Okay. All right.
>> Emden Gansner: And certainly, if you're really trying to analyze the data, you'll need lots of
other tools that you'll want to run in conjunction with this thing.
>>: So what's the corpus for the TF/IDF scores -- just the history, or all that you've seen so far?
>> Emden Gansner: Yeah, there's a window. We use a particular window -- I forget the exact
size of it -- and you keep shifting that window. And even the summaries we use for the
timeline: if you pick a point and you want to see the keywords at that particular moment in
time, that's in fact a totally separate calculation of the TF/IDF based on another window.
>>: I'd like [inaudible] I'd like to ask a question on behalf of the T1 people in the audience. So
the map is essentially a planar graph, so why don't you use planarity-based methods?
>> Emden Gansner: Because the map isn't -- well, I mean, the map isn't a planar graph.
>>: [inaudible] you're saying if you think of the countries of concern.
>> Emden Gansner: Oh, yes. But the underlying -- the underlying tweets themselves are very
heavily connected and very nonplanar.
>>: They represent [inaudible] time.
>> Emden Gansner: The representation [inaudible] all the countries. Yes. So what would you
have us do?
>>: [inaudible] planar graph and [inaudible] four more years of planar graph.
>> Lev Nachmansen: Any other questions? I forgot to mention that this is indeed a T2 track
paper, and in fact the best paper award was given to that paper, so congratulations.
>> Emden Gansner: Thank you.
[applause]