Approximate Shortest-Path Computation in Networks

NII International Internship Project
Supervisor: Michael HOULE, Visiting Professor
Scheduling and other optimization problems in transportation networks often involve
the computation of shortest paths within an underlying graph structure, where nodes
represent locations and edges represent the transport links between them. Each edge is
assigned a weight denoting the cost of using the transport link. Classical methods for
computing shortest paths, such as Dijkstra’s algorithm [1], often require many, if not
most, of the graph nodes to be visited, leading to substantial computation costs when
the network is large. However, some optimization heuristics can still perform well
with paths that are short but not necessarily the shortest possible.
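To make this cost concrete, here is a minimal textbook sketch of Dijkstra’s algorithm in Python. The adjacency-list encoding and the name `dijkstra` are illustrative assumptions, not part of the project description. The function stops as soon as the target is settled and also reports how many nodes it settled, which is the quantity that tends to grow with network size.

```python
import heapq
from typing import Dict, List, Tuple

# Illustrative encoding (an assumption): node -> list of (neighbor, nonnegative weight).
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str], int]:
    """Return (distance, path, settled_count) for a shortest source-target path.

    settled_count is the number of nodes whose final distance was fixed before
    the search stopped. If target is unreachable, returns (inf, [], settled_count).
    """
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    settled = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry: u was already settled with a smaller key
        settled.add(u)
        if u == target:
            break  # early exit: the target's distance is now final
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in settled:
        return float("inf"), [], len(settled)
    # Follow predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path, len(settled)
```

For example, on a four-node graph with edges A→B (1), A→C (4), B→C (2), B→D (5) and C→D (1), `dijkstra(g, "A", "D")` returns `(4.0, ['A', 'B', 'C', 'D'], 4)`: every node is settled before the target, even on this tiny graph.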
The overall objective of this project is to design and implement an efficient and
effective search structure for approximate shortest-path computation in edge-weighted
graphs, by adapting an existing practical index for approximate similarity queries of
large multi-dimensional data sets: the spatial approximation sample hierarchy
(SASH) [2]. A SASH is a multi-level structure of random samples, recursively
constructed by building a SASH on a large randomly selected sample of data objects,
and then connecting each remaining object to several of its approximate nearest
neighbors from within the sample. Queries are processed by first locating
approximate neighbors within the sample, and then using the pre-established
connections to discover neighbors within the remainder of the data set. The SASH
index relies on a pairwise distance measure, but otherwise makes no assumptions
regarding the representation of the data. The SASH has been shown to be effective for
query-by-example operations on protein sequence, image, and text data sets, including
one consisting of more than 1 million vectors spanning more than 1.1 million terms,
a dimensionality far in excess of what spatial search indices can handle efficiently.
For sets of this size, the SASH can return a large proportion of the true neighbors
roughly two orders of magnitude faster than sequential search.
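The construction and query procedure described above can be sketched roughly as follows. This is a deliberately simplified, hypothetical version written for illustration, not the published SASH algorithm of [2]: the class name `SimpleSASH`, the parameters `p` (parents per object) and `k_level` (objects kept per level during search), and the fallback used when a search level runs out of candidates are all assumptions. Like the SASH, it needs only a pairwise distance function.

```python
import random
from typing import Callable, Dict, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class SimpleSASH(Generic[T]):
    """Simplified SASH-style index (illustrative sketch, not the algorithm of [2])."""

    def __init__(self, items: Sequence[T], dist: Callable[[T, T], float],
                 p: int = 3, k_level: int = 8, seed: int = 0) -> None:
        self.items: List[T] = list(items)
        if not self.items:
            raise ValueError("need at least one item")
        self.dist = dist
        self.k_level = k_level  # hypothetical: objects kept per level while searching
        order = list(range(len(self.items)))
        random.Random(seed).shuffle(order)
        # Recursive halving: each level is a random sample of about half of the
        # objects in the level below it; levels[0] holds a single root.
        self.levels: List[List[int]] = []
        remaining = order
        while len(remaining) > 1:
            half = len(remaining) // 2
            self.levels.append(remaining[half:])  # objects left out of the sample
            remaining = remaining[:half]          # the sample passed upward
        self.levels.append(remaining)
        self.levels.reverse()
        # children[i]: objects one level down that chose object i as a parent.
        self.children: Dict[int, List[int]] = {i: [] for i in order}
        for depth in range(1, len(self.levels)):
            for obj in self.levels[depth]:
                # Link obj to up to p approximate neighbors in the level above,
                # found by searching the part of the structure already built.
                nearest_above = self._descend(self.items[obj], depth - 1)[-1]
                for parent in nearest_above[:p]:
                    self.children[parent].append(obj)

    def _descend(self, q: T, max_depth: int) -> List[List[int]]:
        """Top-down search; returns the objects kept at each level 0..max_depth."""
        def key(i: int) -> float:
            return self.dist(q, self.items[i])

        frontier = sorted(self.levels[0], key=key)
        kept = [frontier]
        for depth in range(1, max_depth + 1):
            # Candidates at this level are the children of the objects kept above.
            cand = {c for f in frontier for c in self.children[f]}
            if not cand:
                # Simplifying assumption: scan the whole level if the search dies out.
                cand = set(self.levels[depth])
            frontier = sorted(cand, key=key)[: self.k_level]
            kept.append(frontier)
        return kept

    def query(self, q: T, k: int) -> List[int]:
        """Indices of k approximate nearest neighbors of q, closest first."""
        def key(i: int) -> float:
            return self.dist(q, self.items[i])

        kept = self._descend(q, len(self.levels) - 1)
        pool = {i for level in kept for i in level}
        return sorted(pool, key=key)[:k]
```

Raising `k_level` trades speed for accuracy: once it is at least the number of objects, every level is scanned completely and the result coincides with exhaustive search.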
The specific goals of this project are:
• To adapt the hierarchical SASH construction to graph data, where the multi-level
  structure of random samples is replaced by a hierarchy of induced subgraphs of
  the full weighted graph.
• To implement, test, and evaluate the approximate shortest-path structure against
  other techniques (including Dijkstra’s algorithm).
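The first goal can be illustrated, under stated assumptions, by building the hierarchy of induced subgraphs itself. The sketch below assumes an undirected weighted graph stored as nested dictionaries and, by analogy with the SASH's random samples, keeps a uniform random half of the nodes at each level. The function names are hypothetical, and how approximate paths would then be computed over this hierarchy is exactly what the project sets out to design, so it is not shown.

```python
import random
from typing import Dict, List, Set

# Assumed encoding: node -> {neighbor: weight}; undirected edges stored both ways.
WGraph = Dict[int, Dict[int, float]]


def induced_subgraph(graph: WGraph, keep: Set[int]) -> WGraph:
    """Subgraph on `keep` containing exactly the edges of `graph` between kept nodes."""
    return {u: {v: w for v, w in graph[u].items() if v in keep} for u in keep}


def induced_hierarchy(graph: WGraph, seed: int = 0, min_size: int = 1) -> List[WGraph]:
    """Return [G_0, G_1, ...], where G_0 is the full graph and each G_{i+1} is
    induced by a uniform random sample of about half of the nodes of G_i."""
    rng = random.Random(seed)
    levels = [graph]
    nodes = sorted(graph)
    while len(nodes) // 2 >= min_size:
        # Sampling from the current level keeps the node sets nested, so inducing
        # from the full graph gives the same result as inducing from G_i.
        nodes = sorted(rng.sample(nodes, len(nodes) // 2))
        levels.append(induced_subgraph(graph, set(nodes)))
    return levels
```

One consideration this sketch makes visible: an induced subgraph keeps only edges whose endpoints both survive sampling, so a sparse transportation network can split into many disconnected pieces at the coarser levels.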
References
[1] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to
Algorithms, 2nd ed., MIT Press, 2001.
[2] M. E. Houle and J. Sakuma, "Fast approximate similarity search in extremely
high-dimensional data sets", in Proc. 21st IEEE International Conference on Data
Engineering (ICDE 2005), pp. 619-630, Tokyo, Japan, 2005.