Floyd Warshall with Path Recovery, Leading to Multithreaded All Pairs Shortest Distance
Kevin Kauffman
CPS149s, Fall 2009

Abstract

Introduction
The Floyd Warshall algorithm has been around forever (or at least since Floyd and Warshall got bored on a rainy afternoon a few years back). It is a polynomial time algorithm to compute the shortest distance between all pairs of points in a set. So, basically, if you gave it a map with a bunch of cities and the distances between them, it would tell you the shortest distance between any two of them. This is superior to the more famous Dijkstra's algorithm, because Dijkstra figured it was 'utter nonsense' to calculate the distances between all the points; you can only go from one city to the next anyway, you can't go to all of them. The knock on the Floyd Warshall algorithm, though, is that it takes more time. While Dijkstra is happily moving along at n log n, Floyd and his buddy Warshall are hogging up the slow vehicle lane running n^3. Too bad they hadn't invented HOV lanes, or the pair would have been home free.

So, anyway, once you run your FW on your set of points, you have all your distances. What do you do now? You know that it's 100 miles from New York to Philly, but you have absolutely no idea how to get there. The answer is path recovery. I argue that with minor modifications, you can run FW using about twice as much memory, with the same runtime complexity, and be able to recover the actual route the shortest path takes between any two points. Once this is complete, I present how the idea can be extended to create a multithreaded distance algorithm, which is totally awesome.

Floyd Warshall
The FW, as it stands now, is a way of calculating the distances between all pairs of points in a set. Seeing as there are n^2 pairs, it is quite amazing that you can calculate the lengths of all those shortest paths in only n^3 time. Here's how it works. You start with a nice adjacency matrix in which each node has an infinite distance to every other node (except its neighbors, because it would be sad if your neighbors lived infinitely far away). So you set your neighbors' distances. You are now ready to go. You pick an arbitrary node, and for every pair of nodes, you check whether the shortest path between those two nodes gets shorter when you add the arbitrary node in the middle. So it works like this. If A and B are 10 apart, A and C are 5 apart, and B and C are 2 apart, and your arbitrary node is C, you look and say: is the path from A to B through C shorter than my current path from A to B? If you are smart, you say 'yes', because 7 < 10. In this way, you continue picking arbitrary nodes (really you just choose nodes in order... so it's not really arbitrary, but it kind of is, because it doesn't matter one bit whether you pick them in order or not, so long as they are all picked) and checking all pairs to see if their distance is shortened when using that extra node. The code will be in the appendix by the time I finish this, but I'll explain it anyway. Basically, what you have is a triple for loop iterating over all nodes in the graph. The outer loop represents the picking of arbitrary nodes, and the inner double loop is the mechanism by which you look at all pairs of points under that arbitrary node.
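Before the general recurrence shows up, here is the A, B, C example above run by hand in code. The class name, the indices, and the printout are mine and exist only for illustration; the real algorithm lives in the appendix.

public class ToyRelaxation {
    public static void main(String[] args) {
        // indices: 0 = A, 1 = B, 2 = C, with the distances from the example above
        int[][] dist = {
            { 0, 10,  5 },
            { 10, 0,  2 },
            { 5,  2,  0 }
        };
        int i = 2, j = 0, k = 1;                 // arbitrary node C, pair (A, B)
        int throughC = dist[j][i] + dist[i][k];  // 5 + 2 = 7
        if (throughC < dist[j][k]) {             // 7 < 10, so the detour through C wins
            dist[j][k] = throughC;
        }
        System.out.println(dist[0][1]);          // prints 7
    }
}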
Sitting at the heart of that triple loop is this amazing recurrence relation, which does all the work for you forever:

path[j][k] = min(path[j][k], path[j][i] + path[i][k]);

Basically, if j and k are the two endpoints of the current path you're looking at, and i is the intermediate point, you ask: which is shorter, my current path, or the path where I go to i first and then on to k? You take the shorter one. Do this n^3 times and you've got yourself some shortest distances. Once you've run through your n^3 comparisons, what's left is your adjacency matrix with the distances between all your points. But you're still left with this problem: distances are great, but you've got no directions. Imagine if Google Maps, when you asked for directions from New York to Dallas, just said '1000 miles'. Pretty useless, huh?

Network Routing
It would appear that this section has nothing to do with FW, and you're right. But it gave me the inspiration for my path recovery scheme. Basically, with network routing you have a bunch of these routers and you're trying to send data between them. Now of course, in the naive case, when you've got a packet at D trying to go to V, the packet knows what turn it should take at every router. For this to work, though, you would either need to run Dijkstra's with path recovery every time a packet came through, or you would need to store a map of every path from every node to every other node, which undoubtedly takes up a ton of space. Thank goodness there are un-naive people in the world to come up with good ideas (like FW), so that we don't have to endure the travesty that would have come to pass had every router in the world had to store a full path to every other router in the world. (Prof. Sorin inserts "How did they cope with the storage problem?") So, what we do instead is a sort of dynamic routing. It's dynamic in the sense that the entire path isn't known when the packet is dispatched; the path is figured out along the way. What happens is that each router stores a table of where to forward each packet based on its destination. So if a packet is at D and wants to go to V, the router looks up V in its table, but instead of seeing the entire path, all it gets is one node: the node the packet should be forwarded to. For completeness' sake, we'll say it's Q. It doesn't really matter what letter it is, though, because it's an example. Once the packet gets to Q, Q looks up V in its own table and sees that the packet should go to J next. This continues until the packet ends up at V. (Prof. Sorin inserts "Clever!" here) Now, of course, you may or may not have realized how this relates to FW, but that hardly matters, as I'm about to tell you anyway.

FW with Path Recovery
So as I've said like three times already, FW has a drawback: you can't know where you're going, only how long it will take. But we're about to change that. The basic premise is this: during the calculation of the shortest path distances, we can maintain a second matrix, where the entry at [j][k] is the node you must go to next if you are at j heading to k. In the case of network routing, each table only cares about getting from one node to all the other nodes. But with FW, all the calculations happen together, so you put all the tables together and get your NxN matrix of next hops. Thus, to recover a path, you do the same thing the packet in our network did. If you are trying to get to node V, you look up the row for the node you're at, pan over to the V column, and that entry is the next node in the chain. Then you look up V from that node and obtain the node after that. You repeat this until a node points directly to V, and then, if you were smart and were writing the nodes down as you went, you have your path.
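If you want that lookup loop written out, here is a minimal sketch. It assumes the next-hop table p built later in the paper (and in the appendix), where p[j][k] is the next node on the way from j to k and -1 means "no path"; the class, the method name, and the -1 convention are mine.

import java.util.ArrayList;
import java.util.List;

public class PathRecovery {
    // p[j][k] is the next node on the shortest path from j to k, or -1 if there is no path.
    public static List<Integer> recoverPath(int[][] p, int from, int to) {
        List<Integer> path = new ArrayList<>();
        if (p[from][to] == -1) {
            return path;             // no path at all: hand back an empty list
        }
        path.add(from);
        int current = from;
        while (current != to) {      // keep asking "who's next on the way to 'to'?"
            current = p[current][to];
            path.add(current);
        }
        return path;
    }
}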
With this method, you can effectively store the path from each node to every other node in only n^2 space, which, coupled with the adjacency matrix, only doubles the total space FW takes up, to 2n^2. Now, you might be like 'okay, we're done,' but you'd be very, very wrong. We have this great matrix, but how in the world do we update its values without increasing the complexity? TRIVIAL!!! The answer lies in cleverness. We sort of take the n^3 comparisons from before and make them do some extra work (sort of). So here's how it goes. We start with our adjacency matrix, most of its values infinite and the few neighboring values filled in. Here we can already start filling in the path table (as I'll call it from now on, or P-tabby for short). For each pair of neighbors, before we run the algorithm, we know that if you're at one node and have a path to the other, the path is direct. So if A and B are neighbors to start, in the adjacency matrix we have the distance between them, at [a][b] in the path table we have b (because the next node on the path from a to b is b), and at [b][a] we have a for the same reason. If that last sentence blew your mind, please stop reading, because the rest of the paper will blow much more. We fill in these trivialities and leave the entries for which the path length is infinite as null. Now we can actually start running the algorithm. With our recurrence relation at the middle of the algorithm, there are two options: (a) the path we have is still the shortest path, or (b) there is a new shortest path through this other node. The first case is trivial, because nothing changes in either the adjacency matrix or in P-tabby. The second case is more interesting, because things do change. The first thing that changes is that the distance in the adjacency matrix is updated to the new path length. Obtaining this value is trivial because you must calculate it before the comparison anyway; it is simply the sum of two values looked up in the table. The update to the path table is a little more tricky. Therefore we leave this exercise up to the reader (or the solution is outside the scope of this paper, take your pick). Basically, the value stored in the P table could become the arbitrary node i, it could stay the same, or it could change to some other node. Figuring out which is at the heart of the problem. So here's what you do: since you know the new shortest path is the combination of the path from j to i and the path from i to k, you just look up the value in the P table at [j][i] and copy it into [j][k]. Since you know you're going to visit i, and your shortest path is EXACTLY the path from j to i followed by the path from i to k, the first node on the path from j to k must also be the first node on the path from j to i. In this way you update the path table with no extra calculation, and since copying a value around the matrix costs O(1), your overall complexity is still n^3. When writing the code, you have to be sure to predicate correctly so as to avoid doubling your runtime by adding a hidden second comparison (perhaps in an if statement). Once again, the code is in the appendix. I don't know the figure number yet because I haven't made it yet. If you can't figure out that it's probably the second figure of code, then you do not deserve the knowledge in this paper and you should stop reading.
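In case that figure never materializes, here is roughly the shape of the inner update, trimmed out of the appendix-style code; the surrounding i/j/k loops and the "no edge" guard are left out, and the next paragraph walks through why it looks this way.

// inside the triple loop: i is the intermediate node, (j, k) is the pair being checked
int throughI = adj[j][i] + adj[i][k];   // compute the sum once, before the comparison
if (throughI < adj[j][k]) {             // strictly shorter to detour through i
    adj[j][k] = throughI;               // update the distance table
    p[j][k] = p[j][i];                  // first hop toward k is now the first hop toward i
}
// otherwise: do nothing; both tables stay exactly as they were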
So, in the relation at the middle of the code, you have to finagle a little bit. You eliminate the min statement and put in an if-then flow: IF the distance through i is less than the original distance, THEN change the distance to the distance through i and replace the value in the P table. You don't do anything otherwise. For efficiency's sake, it makes sense to calculate the sum of the path through i before the if statement, to avoid doing the addition twice. And then, after you've run the code, you have your path table, you have your distance table, and you know how to recover the paths. I'd say you're done. (Prof. Sorin inserts "This is one of those ideas I wish I'd thought of.")

Multithreaded Paths
Unfortunately I'm not an expert on Java thread libraries, so I'm not sure I could write the concurrent program to do this (though I could in C++), but this section is a discussion of how to use the previous two sections on network routing and path reconstruction to develop a multithreaded all pairs shortest distance algorithm which could scale to any number of cores. The heart of FW is that distances propagate out from the arbitrary nodes to all other nodes as you move through the for loops. In a multithreaded scheme, instead of sequentially propagating the values from one node to the next, you do them all at the same time. You still have your adjacency matrix to start, but once you split up the threads, the distance and path tables are held individually by each thread, each containing only information pertinent to that particular thread. So thread a (representing node a) only holds information about paths through it, instead of the entire table. The actual adjacency matrix must still exist, though, as a way of moving information between the threads. It is also important that each thread maintains information about its neighbors.

So here is basically how it works. You start with your adjacency matrix, and then you create a thread for each node. Each thread knows its neighbors and the distances to them. It also knows the node which starts the path to every other node in the set: at the start, only the neighbors appear as next hops, the rest are null (because there is no path yet), and the distances to the other nodes are infinite. At each timestep, each thread forwards its distance table to its neighbors. This forwarding, though, is slightly quaint and doesn't necessarily work in a coding atmosphere the way it does in a network. So what really happens is that each thread copies its distance table into its row of the shared adjacency matrix, and pulls its neighbors' information back out of the adjacency matrix when it is needed. Then it has a for loop which iterates over each node in the graph, i, and inside that, a for loop iterating over each of the thread's neighbors, n. Then we have a nice relation which does the work:

distance[i] = min(distance[i], distance[n] + n.distance[i])

Basically, what this says is: if my distance to one of my neighbors plus that neighbor's distance to a node is LESS than my current distance to that node, update my distance. In that case, you would also update your thread's path table to record that to get to node i, you must now go through neighbor n instead of whatever it was previously. As this work completes, you are constantly putting new information into the adjacency matrix and pulling the most recent information out as it is used.
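Since I just admitted I'm no expert on Java threads, take this as a rough, unsynchronized sketch of one round of one node's work rather than a working concurrent program. The shared dist matrix, the neighbors array, the next array, and the method name are all names I'm inventing for illustration; a real version would need synchronization, or at least a barrier between rounds.

public class NodeWorker {
    static final int INF = Integer.MAX_VALUE;        // same "no path yet" sentinel as the appendix code

    // One relaxation round for the thread that owns node 'me'.
    // dist is the shared n x n distance matrix; next[i] is this node's next hop toward i.
    static void relaxOnce(int me, int[] neighbors, int[][] dist, int[] next) {
        for (int i = 0; i < dist.length; i++) {       // every possible destination
            for (int nb : neighbors) {                // but only my direct neighbors
                if (dist[me][nb] == INF || dist[nb][i] == INF) continue;  // nothing to relax
                int throughNb = dist[me][nb] + dist[nb][i];
                if (throughNb < dist[me][i]) {        // neighbor knows a shorter way to i
                    dist[me][i] = throughNb;          // publish my improved distance
                    next[i] = nb;                     // to reach i, head to this neighbor first
                }
            }
        }
    }
}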
With this method running on every thread, the correct distances will propagate throughout the graph until the paths and distances are known for every node. This method is not only nicer because it maps onto parallel applications and multicore processors, but each pass also uses fewer comparisons. With the single threaded FW, you always perform n^3 comparisons. In the multithreaded case, you have n threads, each of which iterates over all n nodes and then over its own neighbors, instead of over all nodes. This looks great on paper: we have reduced the comparisons to some fraction of n^3. But there is a problem. Since we are running them all at the same time, there is no guarantee that information will propagate in the ideal order it follows in FW. Therefore we must run each thread multiple times, until the values for every thread completely stabilize. As a result, this method will run much slower on a single core machine, and will likely take longer in general unless you have a large number of processors with lots of cycles to spare. It is still nice in large systems, though, because each thread only needs to be aware of its neighbors, which in some instances may be a win. If this were actually an academic paper, I would implement both the single threaded and multithreaded schemes on a variety of cores and compare the results for varying types of graphs, but it's 4:16 and this is due at 6, so it's a no go. Maybe in grad school.

Conclusions
Floyd Warshall with path reconstruction is easy and a good thing to know in case it shows up in a programming contest. It's such a great idea that perhaps someone should put it on Wikipedia.

Acknowledgements
I'd like to thank Owen Astrachan for inspiring this paper; I probably wouldn't have written it had he not assigned it. I'd also like to thank Dan Sorin for providing the interjections throughout the paper, which he didn't actually provide but undoubtedly would have had he read this paper. I'd also like to thank Romit Choudhury for teaching me about networks. Lastly, I'd like to thank Google Images for the abstract.

Appendix

// Plain Floyd Warshall (distances only).
// adj[j][k] holds the distance from j to k, with Integer.MAX_VALUE meaning "no edge".
public int[][] FW(int[][] adj) {
    int size = adj[0].length;
    for (int i = 0; i < size; i++) {            // the "arbitrary" intermediate node
        for (int j = 0; j < size; j++) {        // all pairs (j, k)
            for (int k = 0; k < size; k++) {
                // skip if either leg is missing, so MAX_VALUE + x can't overflow
                if (adj[j][i] == Integer.MAX_VALUE || adj[i][k] == Integer.MAX_VALUE) continue;
                adj[j][k] = Math.min(adj[j][k], adj[j][i] + adj[i][k]);
            }
        }
    }
    return adj;
}

// Floyd Warshall with path recovery.
// Returns the path table p, where p[j][k] is the next node on the path from j to k
// (-1 plays the role of "null": no path known). adj is updated in place with the distances.
public int[][] FWwPR(int[][] adj) {
    int size = adj[0].length;
    int[][] p = new int[size][size];
    for (int j = 0; j < size; j++) {            // fill in the trivial, direct paths
        for (int k = 0; k < size; k++) {
            p[j][k] = (adj[j][k] != Integer.MAX_VALUE) ? k : -1;
        }
    }
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            for (int k = 0; k < size; k++) {
                if (adj[j][i] == Integer.MAX_VALUE || adj[i][k] == Integer.MAX_VALUE) continue;
                int temp = adj[j][i] + adj[i][k];   // compute the sum once
                if (temp < adj[j][k]) {             // shorter to go through i
                    adj[j][k] = temp;
                    p[j][k] = p[j][i];              // first hop toward k = first hop toward i
                }
            }
        }
    }
    return p;
}
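For anyone who wants to actually run this, here is a small usage sketch. The four-city map, the Demo class, the Appendix class name (I'm pretending the two appendix methods live in a class called Appendix), and the printed output are all mine; recoverPath is the little helper sketched back in the path recovery section.

public class Demo {
    static final int INF = Integer.MAX_VALUE;   // "no edge", same sentinel as the appendix code

    public static void main(String[] args) {
        // A tiny 4-city map: 0 - 1 - 2 - 3 in a line, plus a long direct road from 0 to 3.
        int[][] adj = {
            { 0,   3,   INF, 20  },
            { 3,   0,   4,   INF },
            { INF, 4,   0,   5   },
            { 20,  INF, 5,   0   }
        };
        int[][] p = new Appendix().FWwPR(adj);  // adj now holds the shortest distances
        System.out.println("distance 0 -> 3: " + adj[0][3]);                          // 12
        System.out.println("route 0 -> 3:    " + PathRecovery.recoverPath(p, 0, 3));  // [0, 1, 2, 3]
    }
}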