Constructing Popular Routes from Uncertain Trajectories

Authors of Paper:
Ling-Yin Wei (National Chiao Tung University, Hsinchu)
Yu Zheng (Microsoft Research Asia)
Wen-Chih Peng (National Chiao Tung University, Hsinchu)

Paper reviewed by:
Aniruddha Desai (University of Washington, Tacoma)

Applications
Scope: Infer popular routes from a set of uncertain trajectories
Trip Planning (Travel / Tourism)
Traffic Management (Transportation)
Animal Movement studies

Spatial Trajectories
What is a trajectory? A sequence of points: Location (Lat, Long) & Time-stamp
What are the moving objects? Humans, Vehicles, Animals etc.
How are the trajectories collected? Ubiquitous location acquisition technologies / devices using GPS

Uncertainty and Inference
Trajectories are generated at low or irregular frequencies.
Routes between consecutive points on trajectories are uncertain.
To infer a popular route we need to measure the similarity between two uncertain trajectories, which is hard to do.

“RICK”
Route Inference framework based on Collective Knowledge
Approach: aggregate uncertain trajectories in a mutually reinforcing way: uncertain + uncertain => certain
Datasets:
◦ Real datasets used for conducting extensive experiments
◦ Check-in dataset from Foursquare: 6,600 trajectories from Manhattan (3 check-ins minimum)
◦ 15,000 taxi trajectories in Beijing

How does it work?
RICK Overview
A user-specified query consists of a location sequence and a time span; RICK infers the top-k popular routes that pass through these locations within the given time span.

Region Construction
Historical uncertain trajectories are used to construct a routable graph in a gridded space, based on their spatiotemporal characteristics.
Grid cell size (“l”) represents the granularity of inferences.
Data points (or grid “cells”) are “spatially close” if: |x - x’| <= 1 and |y - y’| <= 1

Region Construction (cont’d…)
Data points are “st-correlated” (spatio-temporally correlated) if they are spatially close (Rule 1 or Rule 2) and they mutually satisfy a temporal constraint θ.
The connection support of a cell pair is compared against a threshold C to decide connectivity in the graph.
Neighbor: if the connection support of a cell pair is >= C, the two cells are neighbors.

Region Construction (cont’d…)
Region: regions are constructed from individual cell pairs whose connection support meets the specified threshold ‘C’.
Cell pairs are merged into regions using an efficient recursive algorithm.
Time complexity: O(cnm²), where
◦ c = minimum loop iterations
◦ n = size (cardinality) of the set of cells in the grid space
◦ m = size (cardinality) of the dataset

Edge Inference
After the regions are constructed, we infer edges.
Two types of edges:
◦ Edges within each region
◦ Edges among regions

Edge Inference (cont’d…)
Each vertex represents a cell; each edge indicates a transition relationship and has two attributes:
◦ Transition support
◦ Travel time
Virtual bidirected edges between cells (vertices) are generated if the cells are neighbors in a region.
A shortest-path inference approach is used.
The direction, transition support and travel time of each edge on the shortest path are stored.
Redundant edges and edges whose transition support is 0 are eliminated.

Route Inference
Two phases:
◦ Route generation
◦ Route refinement
Route generation:
◦ Top-k coarse routes are discovered with the routable graph

Route Inference (cont’d…)
If a query location cannot be mapped to a graph vertex, we use MINDIST (nearest neighbor algorithm) to find the cells close to the query location.
Local routes: the top-k local routes between any two consecutive cells in the cell sequence are searched by an A*-like algorithm.
The route score is computed based on the range of the time interval between the two query locations.
Based on the top-k local routes, the top-k global routes are searched by a branch-and-bound approach.

Route Inference (cont’d…)
Two-Layer Routing Algorithm
Before searching for local routes, region sequences are generated to reduce the search space, using a lower bound of the transition times between the regions with respect to two given cells. Multiple region sequences are therefore possible.

Route Inference (cont’d…)
Route Refinement:
Use the historical data points (of trajectories that traverse the cells on the rough route) that are located in the cells on the generated route.
Apply linear regression to the set of points in each cell to derive a line segment.
Concatenate the line segments in the order of the inferred route.

Performance Evaluation
Inferred routes are compared against ground truth from raw trajectories.
Two metrics used:
◦ NDTW: normalized dynamic time warping distance
◦ MD: maximum distance between the inferred route and the raw trajectory of the ground truth
RICK is compared with an existing approach, MPR (Most Popular Route), as a baseline.
Time efficiency is tested (avg. query time 0.5 secs).
RICK outperforms the baseline by generating routes 300-700m closer to the ground truth than those of the baseline.
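Reviewer's sketch of the two evaluation metrics: the slides name NDTW and MD without defining them, so the Python below is an illustration under stated assumptions, not the authors' implementation. It assumes routes are lists of planar (x, y) points in meters, that NDTW divides the classic DTW cost by the longer sequence length, and that MD is the largest distance from any inferred-route point to its nearest ground-truth point. The function names `dtw`, `ndtw` and `max_distance` are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two planar points (x, y), assumed to be in meters."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw(route, truth):
    """Classic dynamic time warping cost between two point sequences."""
    n, m = len(route), len(truth)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning route[:i] with truth[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(route[i - 1], truth[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(route, truth):
    """DTW cost divided by the longer sequence length (an assumed normalization)."""
    return dtw(route, truth) / max(len(route), len(truth))


def max_distance(route, truth):
    """Largest distance from a route point to its nearest ground-truth point (assumed reading of MD)."""
    return max(min(dist(p, q) for q in truth) for p in route)
```

For example, with ground truth [(0, 0), (100, 0), (200, 0)] and an inferred route [(0, 0), (100, 30), (200, 0)], only the middle point deviates, by 30 m, so both the DTW cost and MD are 30 under these assumptions.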
Visualization of Results
Visualization of the query “Central Park -> The Museum of Modern Art -> Times Square -> Empire State Building -> SoHo”, for the top-1 (most popular) route inferred by RICK.
Note: The route does not just connect the query locations, but passes through other attractions along the “inferred” most popular route.

Strengths
Thorough / Credible
The authors have conducted extensive experiments on real data. Their results show that the route inference framework is effective, efficient and measurably accurate.
Organized / Easy to understand
The content of the paper is very well organized and can be easily understood even by a naïve reader.
Illustrations (where provided) are very effective in describing spatial concepts.

Weaknesses
Connection Support: not explained sufficiently; diagrams would have helped explain this key concept.
Route generated using A*-like algorithm: the role of the A*-like algorithm in generating the inferred route is not explained adequately.
NDTW: “Normalized dynamic time warping” distance is not explained adequately; diagrams would have helped explain this key performance metric better.

Thank you!
Q&A