Document 17864885

>> Lev Nachmansen: The first session -- the first talk in the session is on shrinking the search
space for clustered planarity, and it will be given by Karsten Klein.
>> Karsten Klein: So in this talk I will give some new results on the complexity of the clustered
planarity problem. And these are basically derived from graph theoretic reductions of the search
space.
And this work is done by Markus Chimani and me. And my position is also sponsored by Tom
Sawyer Software. So this was a, well, obvious mentioning of the sponsor of the session.
So the clustered planarity problem is a longstanding open problem, basically since clustered graphs were introduced by Feng et al. in the '90s. A clustered graph is just a standard graph together with an inclusion hierarchy that is given by a rooted tree whose leaves are the nodes of the graph, and we would like to draw this clustered graph in a way that the clusters are nicely represented by some simple closed region, like in a [inaudible] divide.
So clustered planarity, an extension of planarity, covers some aspects that are introduced because of this inclusion hierarchy and cannot be covered just by counting edge crossings.
For example, look at this graph without any edges. If I add some matching edges to it, the graph is still planar, and I will tell you in a minute why I have this crossing here. Because now, using this inclusion hierarchy, we can add some clusters that make a cyclic structure out of this nonconnected graph. And I can add a further set of clusters.
And now, as you can see, we have a K33-like structure, which obviously is not planar, but the underlying graph is still nonconnected and planar, and we can even get rid of this crossing that would be needed for the K33 by just routing the edge like this.
But now I have an edge-region crossing. This edge here goes through the middle region that
represents one of the clusters, and we would like to avoid these when we talk of cluster planar
drawings.
The same thing can happen if -- even if the edge leaves the cluster because it can just reenter the
cluster and partition the cluster, for example, to go around such an obstacle.
This is also an edge-region crossing. There's a third kind of crossing besides edge crossings and
edge-region crossings that can occur in a clustered graph drawing when we have two regions that
cross. So we have no edges that are involved here, but the regions cross. And all of these cases
should be avoided when we want to have a cluster planar drawing.
So the concept of cluster planarity basically requires that we have a planar drawing of the
underlying graph. Each edge crosses the boundary of the drawing of each cluster at most once,
which gets rid of the edge-region crossings, and we have a so-called inclusion representation of
the tree T.
An inclusion representation is just a representation of each cluster as a simple closed region
such that the subtree rooted at the cluster is represented within this region.
The green cluster, for example, has a brown and an orange child cluster and three vertices, and the brown and the orange child cluster regions are within the region for the green cluster, as are the vertices and edges in these child clusters.
So C-planar drawings have a planar drawing of the underlying graph and the two additional
requirements.
There has been a lot of work on C-planarity. I don't want to go into detail for all of the classes,
but I will shortly discuss one of these results because we will use the concepts in the further
discussion, and these are the completely connected graphs which were introduced by Cornelsen
and Wagner in 2003.
A completely connected graph is a clustered graph where for each cluster the induced subgraph of the cluster and the induced subgraph of the complement are connected.
In this example, if you look at this cluster, the induced subgraph is connected and also the
induced subgraph of the complement is connected, and the same holds for the other two
clusters, so this is a completely connected graph.
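To make the definition concrete, here is a minimal sketch of such a test in Python with networkx, assuming the clustered graph is given as the underlying graph plus each cluster's set of leaves; the function name and representation are illustrative, not from the authors' implementation.

```python
import networkx as nx

def is_completely_connected(G, clusters):
    """For every cluster, both the induced subgraph of the cluster and the
    induced subgraph of its complement must be connected."""
    for cluster in clusters:
        inside = G.subgraph(cluster)
        outside = G.subgraph(set(G.nodes) - set(cluster))
        if len(inside) > 0 and not nx.is_connected(inside):
            return False
        if len(outside) > 0 and not nx.is_connected(outside):
            return False
    return True
```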
Why is this a useful concept? Well, basically this requirement gets rid of the two additional
properties that we have for cluster planarity in addition to standard planarity, which is that we
don't want to have edge-region crossings and region-region crossings.
And so it's not a surprise that there's this result also by Cornelsen and Wagner stating that a
completely connected clustered graph is C-planar if and only if the underlying graph is planar.
So for clustered graphs that are completely connected, the clustered planarity test is reduced [inaudible].
So how can we exploit this result now? We use again a result by Cornelsen and Wagner -- so it's a great paper, you should read it -- which basically says that a clustered graph is C-planar if and only if it is a subgraph of a C-planar completely connected clustered graph. This means that we can try to find a planar completely connected augmentation of the graph, and if this augmentation exists, we know that the graph is C-planar. And if the augmentation does not exist, we know it cannot be C-planar.
So that brings us to the following situation. We have this general problem where the complexity is unknown. We know that for this very restricted class of graphs, the completely connected graphs, clustered planarity equals planarity. And in between we have an augmentation that could be possible.
And the question that we can ask now is what is the complexity of this augmentation?
Obviously if you can augment a graph to be completely connected, this augmentation should
somehow capture the whole complexity of the clustered planarity problem.
So we can ask questions like which edges for the augmentation are always needed, so we have to
add them, reducing the problem; which could be needed, so they need to be in the pool when we
search in our search space; which are never needed, so we can just remove them from the
problem; or are there, for example, equivalence classes where we know we can just pick one of the edges, but we don't know exactly which one we need.
So here our result comes into play. The search space reduction characterizes a set of sufficient edges, and especially a set that is smaller than the whole set of augmentation edges.
So let's get into details. First off, I'm going to explain a couple of concepts before I explain our
results.
We want to achieve complete connectivity, augmenting a graph or trying to find an augmentation
and then testing just planarity to find if the graph is C-planar or not.
So we have to achieve cluster connectivity for the cluster and connectivity also for the
complement of the cluster. If we look now at a cluster, we have a couple of chunks, which are just the connected components of the induced subgraph, and we would like to connect these chunks.
There are many ways to do that. For example, even for this single node we can connect it to
every other node in all other chunks, so there's a huge search space. And we can do that for all
of the nodes and increase the complexity of our search space.
On the other hand, look at the following situation: if you have subclusters in this cluster that already connect chunks, then we know, because we have to achieve connectivity for all clusters, that all of the subclusters will be connected.
So as some of the chunks are connected now via the subclusters, for the bigger cluster we don't
have to care about connecting this chunk, for example, with this chunk, because they are already
connected over the subclusters.
So instead of connecting all the chunks, it is sufficient to connect these new substructures. This
one here we just connected over that subcluster and this one here which has three subclusters,
and all we need is to connect from top to bottom and not in between the different chunks.
Because this is an important concept, we gave it a name. We call these subcluster-connected chunks bags of a cluster. And instead of connecting chunks for connectivity, we know now that
we only have to interconnect bags. I just use the term interconnect to distinguish between
connecting in a bag and interconnecting different bags.
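As one way to make the bag computation concrete, here is a hedged sketch in Python: chunks are connected components of the induced subgraph, and a union-find merges chunks that touch a common subcluster. The representation (each subcluster as a node set) is an assumption for illustration.

```python
import networkx as nx

def bags_of_cluster(G, cluster_nodes, subclusters):
    """Chunks that share a subcluster are merged into one bag, because
    every subcluster will itself be connected in any valid augmentation."""
    chunks = [set(c) for c in nx.connected_components(G.subgraph(cluster_nodes))]
    parent = list(range(len(chunks)))  # union-find over chunk indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for sub in subclusters:  # each subcluster given as a set of nodes
        touched = [i for i, c in enumerate(chunks) if c & set(sub)]
        for i in touched[1:]:
            parent[find(i)] = find(touched[0])

    bags = {}
    for i, c in enumerate(chunks):
        bags.setdefault(find(i), set()).update(c)
    return list(bags.values())
```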
So for connectivity we interconnect bags. And a further reduction that we can do is have a look
at this bag here. This is a connection to the outside. And imagine this connection would not be
there. So what is the influence now of this bag on the overall C-planarity of the graph? If this
bag is C-planar and this one is C-planar, then I can just add it somewhere in this cluster, and I
don't care where it is. It does not restrict the C-planarity of the remaining graph. And if it is not
C-planar, the whole graph is not C-planar.
So what I can do with this bag that does not have a connection to complement of the cluster is
just test it independently. So in the beginning what we do is a partitioning of the input graph,
removing all of the maximal bags that we have that are not connected to the complement of
their containing cluster.
And in the remainder we just focus on the bags that are connected over vertices to the
complement of the cluster.
In the following I will call these vertices the outeractive vertices.
So this is for the connectivity. And if you remove the cluster, we want to achieve
co-connectivity or connectivity of the complement of the cluster.
Now, here the same mechanism can be applied. If there are subclusters or clusters that connect chunks in the complement, we know, because we also want to achieve connectivity of these clusters, that parts of the chunks in the complement may be connected by these clusters. We don't need to connect where there's already a connection via a subcluster; we only need to connect this part, then the whole connected part here with this part and this part, and that is all we need to do.
So we have a reduction also for the connection of the complement of the cluster.
And basically it's the same concept as for the cluster connectivity, but to make sure that we
distinguish in the following between the bags that we have here and the structures we have here,
we also give these structures a name, and the bags in the complement are called satchels of a cluster. These satchels are connected via inneractive vertices to our cluster.
Okay. So a lot of concepts. And our result on this slide basically is that it is enough to interconnect
the bags for connectivity and to interconnect the satchels for co-connectivity. And there are two
types of connections that we can have for the satchels. The first one is a direct interconnection
which just means, for example, this satchel is connected via an edge to this satchel. And there
can also be another interconnection because there may be parts of the graph that are not satchels.
That is, they're not connected to the cluster. But we can still connect satchels via these parts so
that the complement of the cluster is connected.
And I call this interconnection indirect interconnection in the following.
So now from the concepts to the results. The first result is a lemma that basically states instead
of having a complete connectivity it is enough to use the connectivity of the bags and of the
satchels. How can we see that? Well, in the one direction of the equivalence it's quite simple. If
I have a completely connected augmentation, then obviously all the bags have to be
interconnected and all the satchels have to be interconnected.
For the other direction, to show that we can achieve cluster connectivity, we can do a bottom-up induction in the cluster tree. First we look at clusters that don't have subclusters, like this one here, for example. And we know that then the bags are just chunks; they don't have other subclusters. So we just connect them. So this subcluster is connected. We do the same for this subcluster. And now these bags are connected chunks. And all we do is connect these bags now in the larger cluster, and what we have is cluster connectivity.
So the same basically holds for co-connectivity. We know that each bag is connected and that the
satchels are also basically bags. They will be connected. And also in our augmentation they are
somehow interconnected. And what we have to take care of now is only if there are other parts
of the graph that are not connected because they're not satchels, that we can planarly connect
them to this part.
So how can we see that? We do a top-down induction in the cluster tree and want to make sure that the complement of a cluster is connected. For the root of the cluster tree that is simple, because the complement is empty, so it's connected.
And if we look at any other cluster and we assume that the ancestors are already completely
connected, let's call this one C and the parent P, we have the following situation. The
complement of P is already connected. And we have some satchels of C that are interconnected.
And then there are maybe parts that are not connected so far.
Where can these parts lie in the graph? They cannot lie here because then they are satchels.
They cannot lie here because then they are connected to the complement. So the only way we
can position them is that they are here.
What does it mean that they are here? They don't cross cluster boundaries, which basically
means they are independent. So I can just check them for C-planarity. If they are C-planar, I
just put them somewhere in P and connect the whole thing and I'm done.
Okay. So this proves our lemma. What we now know is that we only need to connect bags and
satchels but we can further reduce our search space. Intuitively we can do the following. We
just use outeractive vertices for the connection of the bags, and we use inneractive or outeractive
vertices for the connection of the complement.
So in this example it would be these two outeractive vertices are connected with each other, and
these two are connected, and then the whole cluster is connected. And for the complement, we just
choose a couple of vertices that are inneractive, this one and this one and this one are inneractive
to this cluster, and this one is inneractive to this cluster.
What we can get rid of are nodes that are inneractive but deeper in the hierarchy than our cluster.
So we only need inneractive or outeractive vertices that are at least at the level of this cluster.
So the theorem, much less intuitive, is saying that a clustered graph with a planar underlying graph is C-planar if there is a planar augmented graph that satisfies basically two criteria. The first one is that the outeractive bags of each cluster are directly interconnected using only outeractive vertices.
And the satchels are interconnected using only vertices that are either inneractive or outeractive
with a level not deeper than the level of the cluster.
So I guess as I only have one minute I'll just skip the proof and come to our last result, the fixed parameter tractability. We have now shown that we can use a shrunken set of edges for the augmentation. Now imagine we have a bounded number of inner- or outeractive vertices; then we know that we only have a bounded number of possible augmentation edges, and this also gives us a bounded number of augmentations. And for each augmentation the problem of deciding C-planarity is then just planarity testing, which means that if we have a bounded number of augmentations, we can basically decide C-planarity in linear time.
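The brute-force idea behind this FPT argument can be sketched as follows; this is illustrative only, assuming the candidate augmentation edges between active vertices have already been computed, and with `is_valid_augmentation` standing in for the complete-connectivity test (for instance the `is_completely_connected` sketch above). The paper's actual algorithm restricts the candidates far more carefully.

```python
from itertools import combinations
import networkx as nx

def is_c_planar_bounded(G, candidate_edges, is_valid_augmentation):
    """k candidate edges give at most 2^k augmentations; each one is
    checked with a linear-time planarity test."""
    for r in range(len(candidate_edges) + 1):
        for subset in combinations(candidate_edges, r):
            H = G.copy()
            H.add_edges_from(subset)
            if is_valid_augmentation(H) and nx.check_planarity(H)[0]:
                return True
    return False
```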
Okay. So to conclude, we have a way to shrink the search space for C-planarity testing. Sorry, these are in some order. We have an FPT algorithm based on that. And we have an ILP-based
implementation of our approach in OGDF.
And what we would like to do is give better characterizations of needed substructures. For
example, not just based on edges but based on paths. And also use further structural information
like triconnected components which partially fix the order of edges, and this could also lead to a
further reduction of the search space.
And also we had yesterday this alphabet of gamma drawings, and I think this is somehow
related, but I'm not really sure how it is related. So this should be something to investigate, and
the conjecture is that C-planarity -- I don't know -- okay. I guess that's it. Thanks.
[applause]
>> Martin Gronemann: Welcome to my talk. I'm Martin Gronemann, and this is joint work with
my supervisor, Michael Junger. And I will talk about a similar topic to Karsten's, drawing clustered graphs but as maps, more specifically topographic maps.
As an introduction, I will tell this old story of how we actually came up with this idea. It was not like we wanted to draw clustered graphs. It's more that we started with hierarchical clustering, and what Karsten referred to as this inclusion hierarchy is drawn to the right, so that is what you get.
You cannot see the edges. And the question at some point is that you do not only want to cluster a network, you want to see the result. And most of these methods are rectangle based, and that works for smaller instances, but if the hierarchy gets deeper and the number of nodes increases, this is a bit difficult to look at.
And, yeah, then at some point you start thinking about what happens if I draw a clustered graph by hand, and you end up with this inclusion representation. And if you add colors to them, if you use map colors, you end up with a topographic map.
This is the result. It's a small graph, a result of our method, and it follows a basic principle. Nodes in different clusters are separated by a valley or water, so something that is lower than the nodes. And nodes that are in the same cluster are on the same plateau or elevation level.
Why is this in general a good idea? Topographic maps are easy to understand. Everyone knows them. Even for people outside [inaudible], you don't have to explain the whole hierarchy thing to them. And, yeah, this turned out to be an advantage that we did not intend to have.
Yeah. This is the outline of the talk and of the algorithm. So I will not talk about the clustering. I will just talk about the layout, which is a treemap approach, and how you can obtain a triangle mesh that describes the elevation. And this is in the end used to do the edge routing as well.
Okay. Let's start with the treemap. A treemap is a very simple thing. You have, in our case, a binary weighted tree given, and our treemaps, unlike the ones that were shown [inaudible] before, are based on convex polygons.
So you start to recursively subdivide the polygons and, yeah, the area of course has to be
proportional to what you want to fit inside this polygon.
And, yeah, this is a small example. You can see how we recursively split this. And in the end
you get a nested structure of convex polygons and not rectangles. And the graph nodes that form
the leaves of the cluster tree are just placed in the center of their corresponding polygon.
Yeah, the question is how do you partition such a polygon. You know that you have to make a cut somewhere so that the area is proportional. But you want a nice shape. So you don't want any very thin polygons, and a nice shape is defined as a small aspect ratio. That basically means that you want as much area as possible but still with a low diameter. The diameter is the maximum distance of two vertices.
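As a concrete reading of this fatness measure, here is a small Python sketch that scores a convex polygon by diameter squared over area, where low values mean fat polygons; the exact definition used by de Berg et al. may differ in constants or normalization.

```python
from itertools import combinations
from math import dist

def shoelace_area(poly):
    """Area of a polygon given as a list of (x, y) vertices."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

def aspect_ratio(poly):
    """Diameter squared over area; for a convex polygon the diameter is
    attained between two of its vertices."""
    diameter = max(dist(p, q) for p, q in combinations(poly, 2))
    return diameter ** 2 / shoelace_area(poly)
```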
Yeah. This partitioning -- how you cut the polygon -- is then basically defined only by the orientation of this cutting line. And we use the fat polygon partitioning of de Berg et al. I will not talk about how this actually works.
Okay. We have a layout for the nodes, and what we now do is we generate a triangle mesh for the elevation model. In the beginning we apply a Delaunay triangulation that is constrained by the boundary polygon. So the black dots here are standard vertices.
And when you now consider a triangle in this triangulation, yeah, there are basically two options. Here all three vertices of the triangle correspond to graph nodes in the cluster hierarchy, so to leaves. On the boundary there are these Steiner points from the Delaunay triangulation. We think of them as part of the root. So there are some [inaudible] corner points. This will be the boundary of the map.
So when you now consider the edges of such a triangle, they induce a path in the cluster tree. And they do not only induce a path; there's the lowest common ancestor for each of these three pairs. And what we do now is we refine this mesh further and actually insert representatives for these cluster nodes.
So we subdivide a triangle into four subtriangles, and on each of the edges we insert a new vertex that represents the lowest common ancestor of the two endpoints of that edge. And here on the right you can see how it looks for the complete mesh.
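A minimal sketch of this refinement step, assuming the cluster tree is given by parent pointers and depths, that each mesh vertex is identified with its tree node, and that `midpoint` is a hypothetical helper creating the new mesh vertex on an edge:

```python
def lca(u, v, parent, depth):
    """Lowest common ancestor in the cluster tree via parent pointers."""
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u
        u = parent[u]
    return u

def refine_triangle(tri, parent, depth, midpoint):
    """Split one triangle into four; each new edge midpoint represents
    the LCA of the edge's two endpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    reps = {ab: lca(a, b, parent, depth),
            bc: lca(b, c, parent, depth),
            ca: lca(c, a, parent, depth)}
    subtriangles = [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return subtriangles, reps
```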
Yeah. So far we did everything in the plane. But we want a topographic map, and a topographic map is usually modeled by a 2.5D mesh. So what we now do is we are going to lift this mesh. And to do that we first assign the tree nodes an elevation level. And to think about that, you better turn the tree around so it looks like this. So the root is the lowest point on the map and, yeah, then here these clusters, they are going to form mountains, and on top of that the nodes are located later.
Okay. Now, the idea is straightforward. We just take these elevation levels and transfer them to the mesh, and this results in the 2.5D mesh.
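The lifting itself is simple enough to sketch directly; all names here are illustrative:

```python
def assign_elevations(root, children):
    """Depth in the flipped cluster tree becomes the elevation level:
    the root is the lowest point of the map, leaves end up on top."""
    elevation = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            elevation[child] = elevation[node] + 1
            stack.append(child)
    return elevation

def lift_mesh(vertices_2d, tree_node_of, elevation, scale=1.0):
    """Turn the planar mesh into a 2.5D mesh by adding a z-coordinate."""
    return {v: (x, y, scale * elevation[tree_node_of[v]])
            for v, (x, y) in vertices_2d.items()}
```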
For the final drawing we rasterize the mesh so you get a raster image, and, yeah, you can do this with cubic Bezier triangles to get smooth boundaries. And if I have time I will tell a little bit more about this step. I did it here by hand, so this is what it finally looks like.
Okay. So what is missing? The edges are missing, and the idea is straightforward now. What we can do is use the mesh that describes the landscape as a routing network for some shortest-path-based edge router.
And, yeah, for those who don't know about that, it's very simple. You have an edge here in the graph to the left and the two corresponding nodes in the elevation mesh. And you compute a shortest path just from the starting node to the end node based on some distance metric.
Yeah. To have smooth curves, like the cubic Bezier triangles we use for the elevation map, the nodes found on the shortest path from the source to the target node are just used as control points for a quadratic curve or something like that in the final drawing.
Okay. One problem. When you consider this red edge here from A to D and the shortest path from A to D in the routing network, you notice that there are some unnecessary turns. And while you can get from C to D in this nearly straight line, this does not work for A and D.
And what we do now to solve this problem is we insert additional edges. So A is adjacent to two cluster nodes, the green and the gray one, but not to the third one of that triangle. And we just insert this edge now, and as a result you get a straight line. This is just to compensate for the problems during the subdivision.
Okay. On the right you can see how it looks for the complete mesh. I skipped some parts on the boundary. But there is a problem with the distance function for the shortest paths here. These newly inserted edges allow shortcuts. It's not really shortcuts; it's more that these edges might serve as a bridge over a valley. And so we split the distance function into the Euclidean distance in the plane and, yeah, something more like the tree distance.
So when an edge is kind of a bridge, you have to account for the complete path where you go down the valley and up again.
The best example here is if you go from B to this node: you are crossing this valley here. And if you just take the Euclidean distance, you're not going to account for that.
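A hedged sketch of such a split metric, using networkx's Dijkstra with a callable weight; `tree_cost` is a hypothetical callback returning the down-and-up elevation charge between the two endpoints' tree nodes, not a function from the paper.

```python
from math import dist
import networkx as nx

def route_edge(mesh, source, target, tree_cost):
    """Shortest-path edge routing on the elevation mesh. A mesh edge costs
    its Euclidean length in the plane plus a tree-distance term that
    charges bridge-like edges for the valley they span."""
    def weight(u, v, _data):
        (x1, y1, _z1) = mesh.nodes[u]["pos"]
        (x2, y2, _z2) = mesh.nodes[v]["pos"]
        return dist((x1, y1), (x2, y2)) + tree_cost(u, v)
    return nx.dijkstra_path(mesh, source, target, weight=weight)
```

The inner vertices of the returned path then serve as the control points of the smooth curve mentioned above.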
Yeah. This is the final drawing with all edges routed. And, yeah, I described the treemap layout, the triangle mesh, and the edge routing, and this leads to this elevation model as a raster graphic and the edge layout in the form of control points for curves. And what we did then is we put all these things into tools from cartography, which are usually used to draw real maps. So instead of using something from graph drawing, we went for something that produces real maps.
And for the graph drawing collaboration network, it looks something like this. And if you go down this road even further, you can get something like Google Maps. It's very interesting: while you are doing this, you see there are a lot of common things between cartography and graph drawing. Especially, you can save a lot of work.
And, yeah, you can try it. It should work. You can search for things. It searches for people, and it searches the abstracts of most of the graph drawing articles, I think until 2011. And, yeah, it works like Bing or Google Maps, so it's quite easy to use.
Yeah. Ongoing and future work. One can argue that the treemap approach has the big problem
that it does not care about the edges. And that's true. It does not care about the underlying edge
set of the graph at all.
So these are two pictures from an application in chemistry. And there's the problem that the hierarchy is part of the input. In the original approach the clustering is part of the layout, so there's an indirect link between the edges and the tree. And this is not the case here. And we had to do something about it, so we modified the partitioning approach so that it takes the underlying edges into account. And this works quite well. It could probably be improved a bit. But in general it works quite nicely.
The other thing is the shoreline. You notice that there is a very big difference between water and land. In the GD map, this has no meaning at all. It does have a meaning in this application from chemistry. But, yeah, we are working on this; probably the max modularity communities would be a good place to set the water level at.
And the third most important thing is the edge routing. There are two problems. There is the
visual appearance. You can see here that it's really hard to follow edges. The other thing is the
runtime. The runtime is not acceptable for larger graphs. It's in the end something like all pairs
shortest path and, yeah, it just -- it's not practical. There are probably ways to improve that, but
this question should be answered first.
So what we did is we tried something totally different, like hierarchical edge bundles from Danny Holten, and, yeah, just to resolve this problem by moving these bundles into real parallel bundles -- but this is not -- yeah, it's still in development and it's not that nice.
So if there are any suggestions, we'd be really happy to hear about that. Okay. That's it.
>> Lev Nachmansen: Thanks, Martin.
[applause]
>> Lev Nachmansen: We have time for questions. Questions?
I have a question. Are you filtering out some of the edges when you're showing the map of the
GD collaboration? Are you filtering out some of the edges?
>> Martin Gronemann: No.
>> Lev Nachmansen: You were showing everything?
>> Martin Gronemann: Yeah. Yeah. But still that is a very sparse graph, and I tried, for
example, this [inaudible] graph from -- was it last year, the GD challenge, and that looked really
horrible.
>>: So you may need to remove [inaudible].
>> Martin Gronemann: Yeah. I have to do something about it. Even if you have something like
a hierarchical edge bundling, it's still too much.
>> Lev Nachmansen: Other questions? Thanks, Martin.
[applause]
>> Soroush Alamdari: I'm Soroush Alamdari. This is work with Therese Biedl on [inaudible] drawings. I will get to what they are in a second. So just a review of the basics.
So a planar drawing is a drawing with no crossing. A planar graph is a graph that has some
planar drawing, and a plane graph is a planar graph with a fixed embedding. And a straight-line
drawing is a drawing in which all edges are straight line segments. So, okay, we hopefully know
this.
So what's a rectangle of influence drawing? A rectangle of influence drawing is a straight-line drawing in which, for each edge, the axis-aligned rectangle that has the edge as its [inaudible] is empty, in the sense that it contains no other vertex.
So, for example, this is a rectangle of influence drawing. This one is too, if we consider the rectangles to be open. But this one is not, because of this rectangle and the edge that induces it.
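For concreteness, a direct check of the weak, open variant (the one this talk focuses on) can be sketched like this; `pos` maps vertices to coordinates, and the strict inequalities implement the open rectangles.

```python
def is_weak_open_ri_drawing(pos, edges):
    """For every edge, the open axis-aligned rectangle spanned by its
    endpoints must contain no other vertex."""
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        lo_x, hi_x = sorted((x1, x2))
        lo_y, hi_y = sorted((y1, y2))
        for w, (x, y) in pos.items():
            if w in (u, v):
                continue
            if lo_x < x < hi_x and lo_y < y < hi_y:
                return False
    return True
```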
So it has been studied in strong and weak instances, closed and open, exact and approximate.
Approximate will be the next talk [inaudible].
And so yeah. This talk is the continuation of our work from last year, which was in turn inspired by work of Miura, Matsuno, and Nishizeki.
So we want to characterize RI drawings. By RI drawing I mean open weak planar rectangle of influence drawings.
So the goal is to test if a given graph G has an RI drawing. We study the problem for plane graphs and planar graphs. For planar graphs we proved the problem to be NP-hard. For plane graphs we can't solve the problem as is, so we change the problem: we add some properties to the RI drawing that we are looking for, and then we can solve it.
>>: [inaudible].
>> Soroush Alamdari: Of each. Okay. So I didn't tell you what strong is. So --
>>: [inaudible]?
>> Soroush Alamdari: Yeah. So weak is what I just said; the way I introduced it, it was weak. In a strong RI drawing, if an edge could exist, it should be there. So yeah. So we will rather be talking about weak RI drawings. So thanks for asking.
Yeah. Before going to RI drawings, I want to, well, yeah, introduce a labeling, a type of labeling that may be [inaudible]. So we label the angles of a straight-line drawing by just counting the axes inside each angle. So here we will have labels 1, 0, and 3. And what happens if we have an axis-aligned edge? We assume that edge contributes half a point to each of the two angles that it borders.
So, for example, we have 3.5 and .5. This type of labeling captures the properties of some other labelings that have been proposed for angles, angular labelings. So it might be of interest. And they are uniquely defined for each straight-line drawing.
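As one way to pin this down, here is a sketch that computes the labels around a single vertex; it assumes general position except for possibly axis-aligned edges, which get the half-point rule described above.

```python
from math import atan2, pi

def axis_count_labels(center, neighbors, eps=1e-9):
    """Labels of the angles around `center` (neighbors given as points):
    each angle counts the axis-parallel rays strictly inside it, and an
    axis-aligned edge contributes 0.5 to each angle it borders."""
    dirs = sorted(atan2(y - center[1], x - center[0]) % (2 * pi)
                  for (x, y) in neighbors)
    axes = [0.0, pi / 2, pi, 3 * pi / 2]
    labels = []
    for a, b in zip(dirs, dirs[1:] + [dirs[0] + 2 * pi]):
        label = 0.0
        for axis in axes:
            for ray in (axis, axis + 2 * pi):  # rays are periodic
                if a + eps < ray < b - eps:
                    label += 1.0
                elif abs(ray - a) < eps or abs(ray - b) < eps:
                    label += 0.5
        labels.append(label)
    return labels
```

For instance, a vertex with one axis-aligned edge and one edge at 45 degrees gets labels 0.5 and 3.5, matching the example above.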
So let's go back to rectangle of influence drawings. First let's look at the simplest, like -- yeah, the simplest type of graphs that we can look at, the graphs that have a triangular outer-face.
So if you have a triangular outer-face, the outer-face basically forbids us to place the points anywhere that is hatched here, so the only place that the vertices inside the triangle can be is on the green lines here. So I'm drawing the graph above down there as an RI drawing.
So let's put all four aside; the rest of the graph basically should be drawn on two lines. And that's not hard. Drawing a graph on two lines requires us to find a chain of cycles; that's not hard and can be done. So basically we can find an RI drawing if the outer-face is a triangle.
And more than that, we can find all the axis-count labelings of the outer-face that are realizable in a rectangle of influence drawing of the graph. So this is for graphs that have triangular outer-faces.
So let's go to general graphs. Scary. Would be scary. So I will just overview the algorithm. What we do is that we first remove the insides of filled triangles, the triangles that have stuff inside them, so that the graph is simplified and has no filled triangles anywhere. And doing this we get some angles that are forced to have axis-count label 1 in the final drawing. If we respect these forced labels, in the end we can put back in what we removed in this step.
In the next step we switch to the dual space -- well, to some graph that is defined somewhat weirdly. But it has a vertex for each triangular face and a face for each nontriangular face.
So the blue graph is the dual-like graph. And we translate the angles that should have axis-count label 1 in the final drawing to the corresponding labels of the dual-like graph. So you have a graph in dual space with some of its angles marked.
So this is the step of the algorithm that will fail if we cannot draw our graph as an RI drawing that satisfies those conditions.
So basically we check if this dual-like graph admits an orthogonal drawing that satisfies, again, some properties that I'm not going to go into exactly, but, for example, the labels that we have marked should be right angles. And this can be done in polynomial time. This orthogonal drawing with some angles restricted to be right angles can be computed in polynomial time.
So if this step fails, there is no RI drawing that we want. But if this step goes through, we have an orthogonal drawing without bends. And the next thing we do is that we add some edges and vertices to the graph so that the drawing becomes a rectangular drawing.
From this rectangular drawing we go back to the primal. So remember that we are in the dual space somehow. So we take the dual of this thing, except for the outer-face vertex. This graph now is a supergraph of the graph that we had after removing the insides of filled triangles. So now what we do is we use the structure of the rectangular drawing to extract a labeling for this new graph such that the labeling is realizable as an axis-count labeling of an RI drawing.
So the way we do it is for each angle we count the number of right angles that that angle faces.
So, for example, here we have one for this angle, 0 here, 1, and 2 here. The labeling that is
extracted has some good properties.
So Miura et al. gave some properties of a labeling of the angles of a graph such that, if those properties are satisfied, the graph can be realized as an RI drawing whose axis-count labeling is that labeling. The labeling that we extracted satisfies these properties. So we use their algorithm to construct the RI drawing. Then recall that we added some faces to the graph that turned into some of the vertices of this graph that we already have, so we need to remove those dummy vertices. This gives the frame graph, the graph with the insides of filled triangles removed.
Now, what we have is that the angles that we marked now have axis-count label 1. So we can put things back in. Because we have respected this property throughout the construction, we can put back in the vertices of the filled triangles that we removed, by the algorithm that I initially explained for graphs with a triangular outer-face. So each of these triangles can be filled using that algorithm.
So we have an RI drawing of our initial graph.
So this was for plane graphs. Let's look at planar graphs. For planar graphs we have much more freedom, for we actually choose how we embed the graph. And, for example, for orthogonal drawings without bends, the problem is NP-hard; Garg and Tamassia showed that it is NP-hard.
We use the same idea, the same approach, to prove that this problem -- finding an open RI drawing of a planar graph -- is NP-hard. The reduction is from Not-All-Equal 3-SAT.
So the idea of showing hardness -- and this is what appeared in the paper too -- is to construct a component, a subgraph that is attached to the rest of the graph through two vertices and is connected enough that it admits only one embedding and its mirrored version.
And these two drawings will give us a truth assignment for our variables. So each such component will correspond to a variable, a truth assignment to a variable. So yeah.
So to review: what we give is an algorithm to test if a given graph has an RI drawing that satisfies the nonaligned frame property. So this is the property that we needed to add. It is not a natural property, but it is not very restrictive; it doesn't happen very often.
So yeah. So for plane graphs this is what we have. We need the property for the proof; we can't get rid of it. And the runtime is N to the 1.5.
For planar graphs we prove that the problem is NP-complete even if we add the nonaligned restriction. So for the nonaligned case, this specific question is basically solved. So yeah.
Yeah. So two questions remain. One is faster algorithms -- this one is good if it can be done. The other: can we get rid of the nonaligned constraint? We tried; we couldn't.
Yeah. I finished early. Thank you.
[applause]
>> Lev Nachmansen: Questions.
>>: Is it obvious that the problem [inaudible]?
>> Soroush Alamdari: It's not obvious. The reason for that is that the only thing that you need to specify for an RI drawing is the order of the vertices. And that can be given. So if you give me the order of the vertices in the horizontal and vertical directions, I can tell you if that's an RI drawing or not. So therefore it is in NP.
>>: Is it some type of special algorithm, or is that --
>> Soroush Alamdari: No, no, it's not. So the only thing that you need, to see if a vertex is inside a rectangle defined by two vertices, is whether it is between those two both in the horizontal and in the vertical ordering -- the vertex that we want to test against the other two. Do you see what I mean?
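His argument can be phrased as a one-line combinatorial check; a sketch, where `x_rank` and `y_rank` are the positions of each vertex in the horizontal and vertical orders:

```python
def inside_by_orders(u, v, w, x_rank, y_rank):
    """w lies in the open rectangle of u and v iff it is strictly between
    them in both the horizontal and the vertical order."""
    return (min(x_rank[u], x_rank[v]) < x_rank[w] < max(x_rank[u], x_rank[v])
            and min(y_rank[u], y_rank[v]) < y_rank[w] < max(y_rank[u], y_rank[v]))
```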
>>: [inaudible].
>> Lev Nachmansen: Other questions? I saw another hand earlier.
>>: I didn't quite understand how you moved to the [inaudible] kind of dual in the beginning, very many vertices, the blue --
>>: Yeah. So I tried to -- like I thought it would take long, so I tried to be fast.
So what we do is, for each triangle -- we leave the triangular faces as they are. For nontriangular faces we add a vertex inside each of them. And for each instance of a vertex on a nontriangular face we add three parallel edges to that dummy vertex that we added.
So here you see, for each -- in this instance, for example, this vertex has two instances [inaudible], so it has [inaudible] to the dummy vertex. Now we take the dual. So the reason for this is that when we are drawing our rectangular orthogonal drawing, we want this edge to be able to bend. So we need these parallel edges.
>> Lev Nachmansen: Other questions? Let's thank our speaker again.
>> Emilio Di Giacomo: Good afternoon. This is joint work with [inaudible] Liotta and Henk Meijer. It is again about rectangle of influence drawings, but here we study an approximate version of the problem.
So let P and Q be two points in the plane. The rectangle of influence of P and Q is the axis-aligned rectangle, rho of P and Q, that has P and Q as opposite corners. So this is the rectangle of influence of these two points. And if the boundary of the rectangle is considered to be part of the rectangle, we say that it is a closed rectangle of influence; otherwise we say that it is an open rectangle of influence.
And then a rectangle of influence drawing of a graph is a straight-line drawing such that for every edge U, V the rectangle of influence of U and V does not contain any vertex distinct from U and V, and for every pair of nonadjacent vertices U and V, the rectangle of influence contains at least one other vertex. So this is the strong version of the problem.
So, for example, this is a rectangle of influence drawing. You can see that for the existing edges the rectangles of influence are empty, while for nonadjacent vertices the rectangle of influence contains some other vertices.
Okay. This type of drawing was introduced and studied in this paper by Liotta et al. in 1998. And in this paper they characterize the rectangle of influence drawable graphs for some families like wheels, cycles, trees, outerplanar graphs, and so on.
I will not enter into the details of the result, but what is interesting for this talk is that the class of graphs that can be drawn is very restricted. I mean, not all cycles, not all trees, not all outerplanar graphs can be drawn.
So if you want to draw something more, you need to relax the problem in some way. And one possibility is to use the weak version of the problem, where you just require that for the edges the rectangle of influence does not contain anything, while you don't care about nonadjacent vertices. And we have seen the previous talk about this.
Another possibility is to use the approximate version of the problem. Approximate proximity is a concept that was introduced last year in a paper by Evans et al., where a general framework for the study of approximate proximity drawings was introduced. And we study the problem within this framework. So let me define what an approximate rectangle of influence drawing is.
So let rho of P and Q be the rectangle of influence of P and Q, and let epsilon be a given value. The epsilon-expanded rectangle of influence of P and Q is the rectangle obtained by enlarging rho of P and Q by a factor 1 plus epsilon, while the epsilon-shrunk rectangle of influence of P and Q is the rectangle obtained by shrinking rho of P and Q by a factor 1 over 1 plus epsilon.
So this is a rectangle of influence, this is the expanded version, and that one is the shrunk version for some epsilon.
And now we can define an epsilon 1, epsilon 2 rectangle of influence drawing of a graph. It is a straight-line drawing with the following property: for every edge U, V, the epsilon-1-shrunk rectangle of influence does not contain any vertex [inaudible] from U and V, and for every pair of nonadjacent vertices the epsilon-2-expanded rectangle of influence contains at least one vertex.
So the idea is that for the adjacent vertices I use a small rectangle and for nonadjacent vertices a
larger rectangle. So this makes my life easier.
And so let's look at an example. This is not a rectangle of influence drawing because these two
vertices are not adjacent, but the rectangle of influence is empty. And these two vertices are
adjacent, but the rectangle of influence contains something.
But if we choose epsilon 1 and epsilon 2 equal to 0.5, we have that the expanded rectangle of influence of the two points contains the whole drawing, so it contains some other vertex. And the shrunk rectangle of influence of these two points does not contain anything. So this is an epsilon 1, epsilon 2 rectangle of influence drawing for epsilon 1 and epsilon 2 equal to 0.5.
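A hedged sketch of this check; it assumes the expansion and shrinking are performed about the rectangle's center, which matches the pictures but may differ in detail from the paper's definition.

```python
def scaled_rect(p, q, factor):
    """Axis-aligned rectangle with p, q as opposite corners, scaled about
    its center by `factor` (>1 expands, <1 shrinks)."""
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    hw, hh = abs(p[0] - q[0]) / 2 * factor, abs(p[1] - q[1]) / 2 * factor
    return (cx - hw, cx + hw, cy - hh, cy + hh)

def is_eps_ri_drawing(pos, edges, eps1, eps2, closed=False):
    """Shrunk rectangles of edges must be empty; expanded rectangles of
    non-edges must contain at least one other vertex."""
    def contains(rect, pt):
        x0, x1, y0, y1 = rect
        if closed:
            return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1
        return x0 < pt[0] < x1 and y0 < pt[1] < y1

    nodes = list(pos)
    edge_set = {frozenset(e) for e in edges}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            others = [pos[w] for w in nodes if w not in (u, v)]
            if frozenset((u, v)) in edge_set:
                rect = scaled_rect(pos[u], pos[v], 1 / (1 + eps1))
                if any(contains(rect, pt) for pt in others):
                    return False
            else:
                rect = scaled_rect(pos[u], pos[v], 1 + eps2)
                if not any(contains(rect, pt) for pt in others):
                    return False
    return True
```

For the example just described, `is_eps_ri_drawing(pos, edges, 0.5, 0.5)` would accept the drawing.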
Okay. So we studied which graphs admit an epsilon 1, epsilon 2 rectangle of influence drawing for different values of epsilon 1 and epsilon 2. And we also investigated the area of the drawing; in particular we investigated whether it is possible to obtain polynomial area in some cases.
Okay. This is the list of results. First of all, we proved that every planar graph has both an open
and a closed epsilon 1, epsilon 2 rectangle of influence drawing for every positive epsilon 1 and
epsilon 2. So if epsilon 1 and epsilon 2 are positive, we can draw all planar graphs.
On the other hand, if one of the two parameters is 0, either epsilon 1 or epsilon 2, there exist planar graphs that cannot be drawn. And so motivated by this result we investigated the case when one of the two parameters is 0; in particular we concentrated on the case when epsilon 1 is 0.
And we studied outerplanar graphs. So we proved that every outerplanar graph admits both an open and a closed 0, epsilon 2 rectangle of influence drawing for every epsilon 2 larger than 0. But the drawing has exponential area. So we tried to reduce the area. And we can do it for every outerplanar graph, so every outerplanar graph has a drawing in N to the fourth area, but we need epsilon 2 to be more than 2.
And if epsilon 2 is less than or equal to 2, we are able to draw only binary trees, again in an area given by that formula, where the exponent depends only on epsilon 2, so for fixed epsilon 2 it is a constant.
Okay. So in the remaining part of the talk I will try to give you the idea of the techniques behind these results.
Okay. Let's start with a planar graph and positive epsilon 1 and epsilon 2. Here the technique is very simple. The idea is to construct the drawing one vertex at a time according to the canonical ordering. So at some point I have drawn the graph [inaudible] by the first K minus 1 vertices, and I have to add the K-th vertex VK.
And the idea is to place it sufficiently far from the existing drawing. And what does sufficiently far mean? Well, consider one vertex in the existing drawing. If I place VK very far away, the expanded rectangle of influence of these two points contains the whole drawing, and at the same time the shrunk rectangle of influence does not contain anything. So basically I can decide whether to connect VK to this vertex or not. I can choose. And so I can connect VK to all its adjacent vertices.
And with some geometry and some math, you can see that the distance that we need is given by this equation, and these two terms depend on epsilon 1 and epsilon 2, and on D, which is the diameter of the existing drawing.
There are some other technicalities, but basically this is the idea. Okay. Let's look at outerplanar graphs. In this case epsilon 1 is 0. So for the edges I consider the usual rectangle of influence, while for the [inaudible] I use the expanded rectangle of influence.
Here the idea is to compute a BFS tree of the graph, to draw the tree, and then to add the remaining edges. So this is an outerplanar graph; the bold edges define a BFS tree. And this is another drawing of the same graph.
Okay. Now, in order to draw the tree, I first draw the star induced by the root and its children. And to do this I choose a real number P such that P is at least 2 over epsilon 2 plus 1. And then I draw the root U at the origin and draw child number I at the point with coordinates P to the I minus 1, P to the K minus I, where K is the number of children.
And so, for example, for six children I have a drawing like this, where P is 2. And in this drawing you can see that for the edges the rectangle of influence is empty, so the edges can be there. And now consider two nonadjacent vertices, in particular two consecutive children, UI and UI plus 1. Well, we have that the width and the height of the rectangle of influence, delta X and delta Y, are given by these two equations.
And if we now consider the expanded version of the rectangle and we consider this enlargement, this delta X prime, we have that delta X prime is epsilon 2 over 2 times delta X. And doing some math you can prove that this is larger than P to the I minus 1. But P to the I minus 1 is the x-coordinate of this point. So this means that this distance is more than the distance of this point from the Y axis. And [inaudible] you can prove that this distance delta Y prime is more than the distance of this point from the X axis. And this means that the expanded rectangle of influence contains the origin, so it contains the root of the tree. And so these two points can be nonadjacent.
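The star construction is compact enough to state in a few lines; a sketch with illustrative names, whose output could be fed to a checker like the `is_eps_ri_drawing` sketch earlier.

```python
def star_coordinates(k, eps2):
    """Root at the origin; child i (1-based) at (P**(i - 1), P**(k - i)),
    with P at least 2/eps2 + 1 as described above."""
    P = 2 / eps2 + 1
    pos = {"u": (0.0, 0.0)}
    for i in range(1, k + 1):
        pos[f"u{i}"] = (P ** (i - 1), P ** (k - i))
    return pos
```

For example, six children with P = 2 (which is valid when epsilon 2 is at least 2) reproduce the shape of the example drawing; the coordinates grow exponentially in the number of children, which is exactly the area problem discussed next.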
Okay. Now we have drawn the first two levels, and we can draw the subtrees rooted at each child recursively inside these boxes. And these boxes must satisfy this condition -- I mean, the dimensions must satisfy this condition -- and again I will not enter into detail, but if you satisfy this condition, you can prove that, choosing a vertex from a subtree, say TI, and another vertex from another subtree, say TI plus 1, the expanded rectangle of influence contains UI and UI plus 1. So the two vertices can be nonadjacent.
Okay. So this is the drawing of the tree. But now we have to add the edges of the outerplanar graph that are not in the tree. And since this is a BFS tree -- I mean, since this is an outerplanar graph and we performed a BFS visit -- the edges are either between consecutive children, like 2, 3 and 4, 5, and in this case the rectangle of influence is empty; or you can have edges connecting consecutive vertices that don't have the same parent, like, say, 10, 11 or 12, 13, and also in this case you can prove that the rectangle of influence is empty. And you can also have edges like this one, 12, 4, so from different levels but, again, consecutive points. Also in this case you can prove that the rectangle of influence is empty.
Okay. So this is the technique for outerplanar graphs. And it's easy to see that the drawing has exponential area. I mean, even the drawing of the star has exponential area. So we want to try and reduce the area. And, as I said, we can do it for all outerplanar graphs if epsilon 2 is more than 2.
And indeed the technique is slightly more general, because we can draw not only outerplanar graphs but proper track planar graphs. Proper track planar graphs are basically level planar graphs, so the vertices can be placed on levels in a planar way, and the edges can only be between consecutive levels; this is the meaning of the word proper.
But, on the other hand, we allow edges within the same level. And this is why I call the graphs track planar instead of level planar. But notice that since the drawing must be planar, the edges within a level must connect consecutive vertices on the level.
And it's easy to see that outerplanar graphs are proper track planar graphs.
Okay. Now, if we have a graph like this, I draw it this way. I place the vertices of the first level at x-coordinates 1, 2, 3, 4, 5 and at y-coordinates 5, 4, 3, 2, 1, so they are on a straight line with slope minus 1.
Then I place the next level similarly on another line with slope minus 1, and so on. And then I add the edges. And you can see again that the rectangle of influence of the edges is empty. I mean, it's immediate for these edges because the two points are consecutive; there is nothing in the middle. But it's true also for vertices on different levels, because this rectangle is contained in the strip defined by the two lines with slope minus 1.
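A rough sketch of this placement skeleton; the transcript only fixes the first level at (1, 5) through (5, 1), so the per-level offsets below are an assumption, not the paper's exact constants.

```python
def track_drawing(levels):
    """Place each level on its own line of slope -1. Assumption: level t
    lies on the line x + y = n + 1 - t, with x-coordinates 1, 2, 3, ...
    within the level; the paper's exact offsets may differ."""
    n = sum(len(level) for level in levels)
    pos = {}
    for t, level in enumerate(levels):
        for i, v in enumerate(level, start=1):
            pos[v] = (i, (n + 1 - t) - i)
    return pos
```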
What happens for nonadjacent vertices? Okay. Consider, for example, these two vertices. They are not adjacent. Since epsilon 2 is more than 2, the expanded rectangle of influence contains the two neighboring vertices, 12 and 15. And so the two points can be nonadjacent.
And you can prove that also for nonadjacent vertices on different levels, the expanded rectangle of influence contains some other vertex.
So this is the technique. It's very easy. But what is the area? If you look at the drawing that I showed you, the area is N squared, because it is N times N. So where does the N to the 4th come from? The problem is that I didn't tell the whole truth. Because I told you that if I consider these two vertices and then consider the expanded rectangle of influence, it contains the neighboring vertices.
But what happens if these vertices are not there? What happens if the level has only two vertices? Well, in this case the algorithm doesn't work. Of course we can fix this problem, but to fix it we have to enlarge the drawing, and we get the N to the 4th [inaudible].
Okay. Let's move to the last result, binary trees with epsilon 2 smaller than 2. And this is the most complex technique of our paper. We use a recursive drawing technique based on a greedy path decomposition. A greedy path is a path in the tree that goes from the root to a leaf and is constructed this way: I start from the root, then I choose the subtree with the most vertices, and I continue constructing the path in that subtree.
So I go to U2 because it is the root of the largest subtree. And then [inaudible] go on, and I have a path and a set of subtrees attached to the path. And then I can [inaudible].
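The greedy path itself is easy to compute; a minimal sketch with illustrative names:

```python
def subtree_sizes(root, children):
    """Number of vertices in each subtree, computed recursively."""
    sizes = {}
    def rec(v):
        sizes[v] = 1 + sum(rec(c) for c in children.get(v, []))
        return sizes[v]
    rec(root)
    return sizes

def greedy_path(root, children):
    """Walk from the root to a leaf, always entering the child with the
    most vertices; the subtrees hanging off this path are then drawn
    recursively and attached along it."""
    sizes = subtree_sizes(root, children)
    path, node = [root], root
    while children.get(node):
        node = max(children[node], key=sizes.__getitem__)
        path.append(node)
    return path
```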
So now, the idea of the drawing technique is to draw the path and then to attach to the path the drawings of the subtrees computed recursively. And we assume that the drawing of each subtree [inaudible] satisfies three invariants.
Okay. The first one is that it is a valid drawing. The second one is that the root of the tree is on the left border of the drawing, and there is nothing below it. And finally, the drawing is completely contained in a bounding box whose dimensions are bounded by these two functions.
Okay. So now we place the vertices of the greedy path on a straight line with slope minus 1, and we attach the subtrees to the path. So here the first subtree is gamma 0, and we place it at a vertical distance which is 2 over epsilon times the height of the drawing. Then we place the next one, and this distance is 2 over epsilon times the maximum of the two widths, and this distance is 2 over epsilon times the maximum of the two heights, and so on. We have some special cases when the subtree is only one vertex, but in the end we have something like this.
And now you can see that for the existing edges, again, the rectangle of influence is empty. This is easy to see for these edges and for these edges. And for the vertical edges it is also true, because in the case of the vertical edges the rectangle of influence coincides with the segment, and by the second invariant there is nothing here, no vertex here. And so it's okay.
Now, if we consider nonadjacent vertices, things are slightly more complex because there are many cases to consider; I'll show you just one case. So suppose I have a vertex in gamma 0 and a vertex in gamma 1. Well, the rectangle of influence of these two vertices contains the rectangle of influence of P and Q, of these two points. And if I look at the rectangle of influence of P and Q, its dimensions are delta X and delta Y, and the enlargements delta X prime and delta Y prime are given by these equations. So delta X prime, for example, is epsilon over 2 times delta X.
Which, according to the choice of this distance, is more than both of the widths of the two drawings. And analogously, delta Y prime is larger than the heights of the two drawings. And so the expanded rectangle of influence contains both subdrawings, so these two points -- and in fact every vertex here and every vertex here -- can be nonadjacent.
Okay. About the area: according to invariant three, the drawing is contained in a bounding box whose dimensions are these, and with some calculation you obtain that the area is this one. Okay.
Some open problems. Okay. The first open problem is to devise a technique to compute drawings with polynomial area for general planar graphs when epsilon 2 is smaller than 2. I mean, when epsilon 2 is smaller than 2, we can only draw binary trees. It would be nice to draw all outerplanar graphs.
Then, we studied the case when epsilon 1 is 0 and epsilon 2 is larger than 0, and in this case we studied basically only outerplanar graphs. So it could be interesting to study other families. We have some further, simple results in this case.
And finally, it would be nice to study the symmetric case, when epsilon 1 is larger than 0 and epsilon 2 is 0. Also in this case we have a very simple result. But okay. Thank you for your attention.
[applause]
>> Lev Nachmansen: Questions for Emilio?
>>: I have a question. Given the practical applications, are there any experiments on perception -- when you look at weak, strong --
>> Emilio Di Giacomo: No. Not that I know. I mean, we didn't make an experiment, and I
think that's never been done. I don't know.
>> Lev Nachmansen: Anything else for Emilio? Thanks, Emilio.
[applause]
>> Emden Gansner: So thank you for coming. So this is going to be a talk on graphs and maps, looking at dynamic data, particularly in an area with regard to social networks, which we had some discussion of earlier today.
Okay. So, again, as has been mentioned earlier, we're looking at online social networks, and
they have evolved into a very hot topic -- it's been mentioned several times in various contexts.
But there's a problem. For example, this is a trace of a listing of various tweets coming in over a
small period of time, and it goes on and on and on and on and keeps going -- there's these
millions of people out there constantly tweeting to each other. You know, how they could be
spending their time better, I don't know. But they're doing it. Okay.
So, for example, if you're just looking at the tweets that contain the word "knees," we're seeing
up to something like 34,000 tweets per hour on these things. Okay. Now, this isn't big data, per
se, but it's certainly pleasingly plump data. Okay. And so the question is: is there some way that
we can use visualization to help handle this, to analyze this stuff, to get a handle on this thing.
All right.
So this is where I do a brief change here and pull that down. And with luck it will still be here.
Yes. Okay. So this is going to be a top-down talk. I'm not going to tell you all the stuff in pieces
and only get you to the conclusion at the end. We're going to start with the chase. Okay.
This is a little piece of software we put together called TwitterScope. And the idea is it's going
to give you a chance to look at a view of data from a dynamic flow of tweets coming in, for
every tweet on a very particular topic. So you pick some particular term, and we're looking for
all tweets that contain that term. And we're trying to analyze these things and see how do they
cluster together, how are they related.
Now, we have built in this thing a few basic topics. So let's just start with, say, for example,
visualization, which seems to be appropriate. And what do you mean? Oh, dear. I don't see
Bing on there, do I? Okay. I mean Internet Explorer, sorry. Dear, dear, dear. Okay. Well, this
is going to be a faster demo than I thought.
>> Lev Nachmansen: [inaudible].
>> Emden Gansner: Oh, I don't suppose there would be one of those little browsers [inaudible].
Too much to expect. All right. Well, there goes part of my talk.
>> Lev Nachmansen: Didn't you have some videos?
>> Emden Gansner: I do have videos.
>> Lev Nachmansen: What page?
>> Emden Gansner: On that page?
>> Lev Nachmansen: Yeah. Where you just were.
>> Emden Gansner: Where I just was. Yes. Okay. I could try that.
[multiple people speaking at once]
>> Emden Gansner: Okay. This will be an old video, but we can try it. A little help here?
Okay. Maybe not. Okay. Well, maybe I won't show it.
[multiple people speaking at once]
>> Emden Gansner: Okay. So this is more or less the version you'd see in this thing. So this is
probably a topic of news or something, and these are all being clustered together. And what
you'll see is there are various topics coming in. Each tweet that comes in has an icon associated
with it. And each cluster has a variety of terms that we determine as most identifying and shared
amongst those various tweets.
And then on the left-hand side -- it's very hard to see -- is a timeline, a historical timeline. And
in fact you see the incoming tweets coming in at the top, and each one as it comes in you'll see it
expand so you can see what the tweet looks like. And then you have the timeline on the left here
where you can see the number of tweets coming in over a period of time, and you're able to
move the cursor down to various things and check these things.
I'm not too -- oh, yes, for example, here's a case where you can click on one of these things and
go right to the particular article that it was referring to.
I'm not too concerned about not giving you this demo right now because in fact this is all online,
and I'll give you the URL at the end and you can play with it to your heart's content and see what
was really going on. But this is a rough idea of what's happening.
So let me now stop this and go back to the top before I get any deeper in this thing. Okay. And
back to -- okay. So that was the demo. Anyway, so we put this together. And it's currently
being used by several organizations within AT&T. And this is where I can make a confession.
Another example where my ability to predict things that are popular is totally wrong. I mean, I
really thought this -- felt there was nothing in this thing, and yet it has become very popular
within the company and they're using it within all sorts of places.
For that matter, I didn't think Twitter was going to take off either, so there you are. So in fact if
you want to make money, find something that I think is not going to work and bet on the
opposite, and you'll be set. All right.
Okay. So the rough architecture in this piece of work is largely [inaudible] two pieces: one part
that's doing the data collection from the Twitter side, and the other part that's actually doing the
major analysis and visualization stuff, which is the thing that gets pushed out to the browser.
So the data collection part uses one of the standard APIs provided by Twitter: you're allowed to
ask for all tweets containing one of up to 400 keywords, or if you want you can actually get
1 percent of all tweets, or if you want to pay a lot of money you can get a lot more than that.
But there are various interfaces that Twitter can provide you. And the data is stored
permanently, so we can do this historical checking back in time and looking at things that
occurred in the past.
The other part is this upper [inaudible] up here, which is doing the analysis and the
visualization. Basically it starts with the raw tweets, does some [inaudible] analysis -- very
cursory -- trying to get some information, some way of putting these tweets together, uses that to
construct a graph of relationships, and then does this business about the analysis and the
drawing and the clustering and the map and stuff.
So what I want to do with the rest of the talk is more or less dive into these things in a little bit
more detail.
Okay. As far as the semantic analysis part goes, as with any of these things, the first thing you
have to do is clean the data. All data is dirty. It's going to have stuff you don't really want to use,
because that's going to lead you astray. So you have to remove all the various markup notations
and that kind of thing, all the things telling you about retweets and references to other people.
URLs need to be removed because typically they're truncated and they don't have a lot of good
information in them. And also these things are going to have lots and lots of stop words that
carry just no semantic information whatsoever, so you've got to get rid of the the's and a's and
and's and that type of stuff.
Okay. Once that's done, then we want to construct the similarity matrix for the data, to construct
a graph on the various tweets encoding relationships.
The process we use is term frequency inverse document frequency. It's a mouthful. All right.
It's actually very simple. The term frequency is basically the fraction of times that a particular
term appears in a document, and the inverse document frequency just looks at the other factor:
basically in how many of the documents in that particular class the term appears.
And then the tf-idf number is simply the product of those two things. And to get a document
similarity we can simply take, for two documents, the vectors formed by these values over all
terms, take the cosine between them, and that gives you a notion of similarity.
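[A minimal sketch of the tf-idf and cosine computation just described, assuming the tweets have already been cleaned and tokenized; the function names and the plain log weighting are illustrative choices, not necessarily the TwitterScope code.

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        # docs: list of token lists (tweets after cleaning).
        n = len(docs)
        # Document frequency: in how many tweets each term appears.
        df = Counter(t for d in docs for t in set(d))
        vecs = []
        for d in docs:
            counts = Counter(d)
            vec = {}
            for term, c in counts.items():
                tf = c / len(d)               # term frequency
                idf = math.log(n / df[term])  # inverse document frequency
                vec[term] = tf * idf          # the tf-idf number
            vecs.append(vec)
        return vecs

    def cosine(u, v):
        # Cosine of the angle between two sparse tf-idf vectors.
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0]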
This is a very dense matrix, so to clean it out a bit and not have too many edges we put in a
threshold; we found that .2 works kind of nicely to get rid of the really weak edges. You don't
want too much clutter.
We did look at a more sophisticated technique called latent Dirichlet allocation. This is a much
more sophisticated, potentially more accurate, approach to getting a similarity matrix.
But we basically found that it tended to be too sophisticated. Especially in this case you're
dealing with very small messages, all within 140 characters, so you can't say much, and you
can't rely on too much information. It's like when you're trying to do decryption: if you only
have a bit of text, you can't really do a good job. You need a lot of text to make it worthwhile.
And same thing here. This thing is too expensive and it tends to give misleading clusters,
whereas the simple approach at least guarantees that any messages within the same cluster share
some terms.
Okay. So the next step then is to take the similarity matrix and use that to get the obvious graph.
So if there's a nonzero value in an entry of the matrix, that's an edge. All right.
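[Continuing the sketch: turning the thresholded similarity matrix into a graph, using the .2 cutoff mentioned above. networkx is a convenient stand-in here, not necessarily what the tool uses; cosine and the vectors come from the previous sketch.

    import networkx as nx

    def similarity_graph(vecs, threshold=0.2):
        # One node per tweet; an edge wherever similarity clears the cutoff.
        g = nx.Graph()
        g.add_nodes_from(range(len(vecs)))
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                s = cosine(vecs[i], vecs[j])
                if s > threshold:  # drop the really weak edges
                    g.add_edge(i, j, weight=s)
        return g]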
Here we're starting to get toward the hardware: we're looking at the amount of space we have on
the display to use. And so we figure you typically see up to 500 tweets in a given window in a
browser. That's where it's going to be displayed.
Now, the layout part at this point is done using a combination of multidimensional scaling, and
then to do overlap removal we use the algorithm that was described at Graph Drawing four
years ago.
And then finally, once we've got this layout and everything moved, we take this layout and use
it to create a map very similar to the one described earlier. And that was based on this GvMap
algorithm that Stefan [inaudible] described in this article here.
So you've got something like that, which you saw earlier. So you have basically the collection
of countries, and this would have been water otherwise, just put together in that thing.
Up to this point, this has been a totally static description of the algorithm. But obviously this is
stream data coming through, so we have to handle that. And what we do is basically update the
information roughly every K minutes. We started with 1; 1 was [inaudible] too much, so we
went to about five minutes, and that seems about right.
And we're trying to preserve the users' mental map. And to do that we, first of all, start by
rerunning multidimensional scaling using the previous positions; or, when new tweets enter, we
use a position given by the average of the neighbors. And that gives a fairly stable display in
terms of the relative positioning.
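[A sketch of that stability trick: rerun MDS seeded with the previous positions, and start a brand-new tweet at the average of its neighbors' old positions. scikit-learn's MDS accepts an init array; everything else here is an illustrative assumption.

    import numpy as np
    from sklearn.manifold import MDS

    def stable_layout(dissim, old_pos, neighbors):
        # dissim: n x n dissimilarity matrix; old_pos: dict node -> (x, y);
        # neighbors: dict node -> list of neighbor nodes (for new tweets).
        n = dissim.shape[0]
        init = np.zeros((n, 2))
        for v in range(n):
            if v in old_pos:
                init[v] = old_pos[v]            # reuse the previous position
            else:
                anchored = [old_pos[u] for u in neighbors.get(v, ())
                            if u in old_pos]
                if anchored:                    # average of the neighbors
                    init[v] = np.mean(anchored, axis=0)
        mds = MDS(n_components=2, dissimilarity="precomputed", n_init=1)
        return mds.fit_transform(dissim, init=init)]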
However, you do tend to get some things like rotations and other transformations that get
involved, and we want to keep it very stable, so we have to figure out how to get it back to what
it was before.
And to handle that we do a Procrustes transformation. So basically we're looking for a scaling,
rotation, and translation to take the new drawing and make it match the old drawing.
And that's a well-known problem, so the solution is here, where the X and Y matrices are
basically the N by 2 versions of the x_i's and y_i's. So this gives us the scaling, this gives us
the rotation, and this gives us the translation.
We actually decided just to fix the scaling at 1, because if you don't do that, you tend to
[inaudible] overlap again. Since the nodes are not points but actually take up space, we need
to keep them separate.
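[A sketch of that Procrustes step: the standard SVD solution for the rotation and translation that best map the new positions Y onto the old positions X, with the scaling fixed at 1 as just explained. The helper name is illustrative.

    import numpy as np

    def procrustes_align(X, Y):
        # X, Y: n x 2 arrays of old and new positions.
        xc, yc = X.mean(axis=0), Y.mean(axis=0)
        A = (Y - yc).T @ (X - xc)     # 2 x 2 cross-covariance
        U, _, Vt = np.linalg.svd(A)
        R = U @ Vt                    # optimal rotation, scale fixed at 1
        if np.linalg.det(R) < 0:      # rule out a reflection
            U[:, -1] *= -1
            R = U @ Vt
        t = xc - yc @ R               # translation
        return Y @ R + t              # new drawing aligned to the old one]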
And in fact for heavy streams trying to preserve the mental map actually isn't that important. If
you've got lots and lots and lots of messages coming through, basically the topics are going to
come and go. And so every so often you almost have to just toss out the entire thing and start
over again anyway; so this is more important for topics that are changing more gradually over
time, like visualization, which I would have shown you.
Okay. So there's one thing that we still need to handle in this thing: when we set that threshold
to .2, that oftentimes takes away enough edges that you can actually disconnect the graph.
The graph I showed there wasn't fully connected. So you had these countries which really aren't
connected, and you want to pack them in there. And we have to figure out a way to handle that,
because as time goes on you have this problem that, okay, well, you now redraw the graph and
now you've got this overlapping, and you have to remove that somehow.
And the standard packing algorithms don't necessarily preserve the relationships. Like, for
example, here the [inaudible] is to the left, and this all gets messed up. What we'd like to do is
be able to take these components and reposition them so we have removed the overlap but have
roughly the same layout as we had earlier.
Okay. And so the solution we came up with is to take the PRISM algorithm we used before to
remove overlap, but to extend it so it handles nonrectangular shapes. And instead of just
worrying about getting rid of overlap, we're also going to be using it to get rid of space, to pack
things more tightly.
Okay. So it's done reasonably simply. Instead of just using rectangles, we use polyominoes to
represent the nodes and edges, and use those for collision detection. We want to reduce the
problem down to a tractable form, so to do that we use a proximity graph, in this case a
Delaunay triangulation, as a scaffolding.
So that gives a nice rigidity, and it also gives you a very sparse graph to work with. All we're
going to do now is, for each of the edges in the proximity graph, we check the endpoints and we
project all the polyominoes down to that line. So this is kind of the left-hand side of the green
part, this is the right-hand side of the blue part, and either there's going to be overlap, in which
case we want to push them apart, or there's going to be this extra space there and we can pull
them together. So it will be one of those two situations where you either have the overlap or
a gap.
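[A sketch of that per-edge test: project the cells of the two shapes onto the direction of a proximity-graph edge and read off a signed gap -- positive means extra space to squeeze out, negative means overlap to push apart. Real PRISM-style code works on full polyominoes; the point representation here is a simplification.

    import numpy as np

    def signed_gap(p, q, cells_a, cells_b):
        # p, q: endpoints of a proximity edge (node a before node b).
        # cells_a, cells_b: arrays of 2D points covered by each shape,
        # e.g. polyomino cell centers.
        p, q = np.asarray(p, float), np.asarray(q, float)
        d = (q - p) / np.linalg.norm(q - p)    # unit direction of the edge
        ta = np.asarray(cells_a, float) @ d    # projections onto the line
        tb = np.asarray(cells_b, float) @ d
        return tb.min() - ta.max()             # > 0: gap, < 0: overlap]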
Okay. So how to solve that. Well, what we want to do is set up an ideal length factor to be that
extra gap or overlap we're trying to handle, and we now want to compute a new layout with
these edge lengths, basically either removing the gap or expanding the distance.
And of course for us, typically, if we want to do something, MDS is all we know, so we pull out
MDS, pose the same kind of problem, and we've got the ways to solve it, so we solve it. All
right. This almost works. The slight problem is that, as with most of these things, if you just
apply it straightforwardly, you get moves that are too fast and you'll break up the proximity and
the relationships that you had.
So the way to handle that is to gradually add in this external information. What we do is we
damp the movements. Instead of using this raw t here, we use a scaled version of t, and we pick
these bounds to limit it. And so we iteratively keep moving it and moving it and moving it until
there aren't any more problems in the layout.
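[A sketch of the damping: clamp the raw per-edge factor t before handing the resulting ideal lengths back to the MDS/stress solver. The bound of 1.5 is an illustrative guess, not a quoted constant.

    def damped_ideal_lengths(current, factors, smax=1.5):
        # current: dict edge -> current length; factors: dict edge -> raw t,
        # the expansion (t > 1) or contraction (t < 1) each edge asks for.
        ideal = {}
        for e, t in factors.items():
            t = max(min(t, smax), 1.0 / smax)  # bounded version of the raw t
            ideal[e] = current[e] * t
        return ideal]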
So that, again, mostly works, except that once you've gotten done and you've taken care of the
proximity graph, you look at the result and you say, oh, there are still overlaps. That's because
some of these things have bad aspect ratios, which the proximity graph you've constructed
didn't take care of. And so then you have to add a little bit more extra information, so we put
some more edges into the proximity graph to handle those situations, and then we just repeat.
So here is the first loop, here is the second loop, and we run it. And, again, there's no proof that
this works, but basically in principle and in practice it seemed to do the job.
So an example. This is the algorithm we call STAPACK. This is the initial configuration. And
after one iteration we get this, after a second iteration we get this, and finally after a third
iteration we have a separation. So the areas are now more or less where they were before, in
rough proximity, but they have been moved apart without a lot of extra space.
And if we apply that to the full display: so here would be one situation. Now some more things
pop in, so there's a slight adjustment -- pretty stable, but there have been some changes. Now
some more ones are coming in down here. They'll fill in some spaces, and again you have a
very, very small adjustment as time goes on, as each of these comes in.
And again, as I mentioned, after a while there will be enough changes that you simply can't
have stability anymore, and you throw up your hands and say, okay, forget it, I'm going to start
afresh, and you just dump everything on the floor and start with a new display.
Anyway, so as I mentioned before, you can do your own demo. Please go to this site and try
that. And then we have the various versions, and you can try the various options and play with
them. It's available. And we've made various changes to it; in fact, it's still being worked on.
So, in fact, it may even break when you try it. Anyway, thank you.
[applause]
>> Lev Nachmansen: Questions for Emden?
>>: [inaudible] the edges are not very readable. Do you consider them not so important?
>> Emden Gansner: Yeah. Basically you can zoom in and get more detail on these things. You
can click on the things and do exploration down that way. But at this high level here we don't
view this as being particularly important. The main thing is the clustering at that point.
But, yeah, if you want to delve into more detail, that's possible.
>>: I like it. If you're really looking at temporal sequences over a long time, is there a way that
you can identify critical moments to look at?
>> Emden Gansner: I guess the definition would be what counts as critical. One thing we have
seen -- I mean, one easy way is if you look at the timeline on the left: every so often you will see
a big peak, and that's when something important has happened, and there is --
>>: So there may be [inaudible] and turns, so that may be one indicator of a particular
frequency. The other sort of thing is what happens -- I may have missed it, but do clusters
reform, regroup, and split?
>> Emden Gansner: Yeah, there will be this constant motion of clusters. Because as tweets
come in, whole topics will disappear, new topics will form, and some tweets from one cluster
will now in fact be part of another cluster. Yeah. There will be a constant change in what's
going on.
>>: Yeah. Okay. All right.
>> Emden Gansner: And certainly, if you're really trying to analyze the data, you'll need lots of
other tools that you'll want to run in conjunction with this thing.
>>: So what's the corpus for the TF/IDF scores -- just the history, or all that you've seen so far?
>> Emden Gansner: Yeah, there's a window. We use a particular window -- I forget the exact
size of it -- and you keep shifting that window. And even the summaries we use for the
timeline: if you pick a point and you want to see the keywords at that particular moment in
time, that's in fact a totally separate calculation of the TF/IDF based on another window.
>>: I'd like [inaudible] I'd like to ask a question on behalf of the T1 people in the audience. So
the map is essentially a planar graph, so why don't you use planarity-based methods?
>> Emden Gansner: Because the map isn't -- well, I mean, the map isn't a planar graph.
>>: [inaudible] you're saying if you think of the countries of concern.
>> Emden Gansner: Oh, yes. But the underlying -- the underlying tweets themselves are very
heavily connected and very nonplanar.
>>: They represent [inaudible] time.
>> Emden Gansner: The representation [inaudible] all the countries. Yes. So what would you
have us do?
>>: [inaudible] planar graph and [inaudible] four more years of planar graph.
>> Lev Nachmansen: Any other questions? I forgot to mention that this is indeed a T2 track
paper, and in fact the best paper award was given to that paper, so congratulations.
>> Emden Gansner: Thank you.
[applause]