Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks

Jie Bao, Chi-Yin Chow, Mohamed F. Mokbel
Department of Computer Science and Engineering, University of Minnesota – Twin Cities

Wei-Shinn Ku
Department of Computer Science and Software Engineering, Auburn University

What Are Range NN Queries?
• k-Range NN queries in Euclidean space
  – Given a spatial region, find the k nearest objects to every point within the region
  – E.g., find the nearest hotel to a shopping mall
• k-Range NN queries in road networks
  – Given a set of road segments, find the k nearest objects to every point on the road segments
[Figure: Region]

Usages of Range NN Queries
• Uncertain locations
  – Measurement imprecision: due to the limitations of the underlying positioning techniques, e.g., 2G/3G and Wi-Fi
  – Sampling imprecision: due to continuous motion, network delays, and location update frequency
  [Figure: iPhone's 3G positioning]
• Privacy-preserving queries
  – Users do not want to reveal their exact location information to service providers
  – Their locations are blurred into spatial areas
  [Figure: 5-anonymous area]

Related Work for k-RNN Queries
• k-nearest neighbor in road networks
  – Query processing without pre-computed information: Incremental Network Expansion (INE), a best-first expansion over the road network [Papadias et al., VLDB 2003]
  – Query processing with pre-computed information: extra pre-computed quad-tree indexes are used to calculate the distances [Samet et al., SIGMOD 2008]
• k-range nearest neighbor in Euclidean space
  – Pre-computed Voronoi diagrams [Chow et al., SSTD 2009]
• k-range nearest neighbor in road networks
  – Range query + INE for every boundary node [Wang and Liu, PVLDB 2009]

Motivating Example
• Computational redundancy in the existing solution
  – Range query + multiple kNN queries [Wang and Liu, PVLDB 2009]
  [Figure panels: Range Search, k-NN for B, k-NN for D, k-NN for F]
  – Total number of road segments searched: 3 + 2 + 5 + 6 = 17
  – Total number of road segments in the map: 6
  – Redundancy ratio:
    (17 - 6) / 6 ≈ 183% (worse with more boundary points)
• Can we provide the results without the computational redundancy?

Problem Definition
• Given:
  – An undirected graph G = (V, E) representing the road network
  – A set of objects O
  – A query region R (a set of road segments)
  – A value k
• Find:
  – An answer set A from O such that A contains the k nearest objects of every point in R, based on the network distance in G
• Objective:
  – Provide A without computational redundancy

Efficient k-RNN Query Processing
• Step 1: Inside query step
• Step 2: Outside network expansion step
  – Multiple search queues
  – Stop after the closest node is searched
  – Switch to the queue with the smallest searched distance
  – Termination condition: a search covers the distance of its kth object

Example: 2-RNN
[Figure: road segment set (range) with nodes B, C and objects P1, P2, P3]
• 1st iteration: search from A; answer set: P1, P2
• 2nd iteration: search from B; answer set: P1, P2
• 3rd iteration: search from C; answer set: P1, P2
• 4th iteration: search from C; answer set: P1, P2, P3
• 5th iteration: search from B; answer set: P1, P2, P3

Distance Calculation
• Case 1: By a pre-computed shortest path table
  – Fast, but requires more storage
• Case 2: Calculation on the fly
  – Keep the distance information as the search expands
• Tradeoff between storage and speed
[Figure callout: Search collision!]
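Sketch: Multi-Queue Expansion
The inside query step and the outside network expansion step can be illustrated with a small Python sketch. It is not the authors' implementation and makes simplifying assumptions: objects sit on nodes, the range is given as a set of nodes plus an explicit list of boundary nodes, distances are computed on the fly (Case 2), and colliding searches do not share work. The names build_graph and k_rnn_sketch and the toy network are hypothetical.

```python
import heapq

def build_graph(edges):
    """Build an undirected adjacency map {node: {neighbor: length}}."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph

def k_rnn_sketch(graph, objects_at, range_nodes, boundary, k):
    """Simplified k-RNN: objects_at maps a node to the objects located on it."""
    # Step 1 (inside query): every object located inside the range.
    answer = {o for n in range_nodes for o in objects_at.get(n, [])}

    # Step 2 (outside expansion): one search queue per boundary node.
    queues = {b: [(0, b)] for b in boundary}
    settled = {b: set() for b in boundary}
    found = {b: [] for b in boundary}  # objects in non-decreasing distance order

    while queues:
        # Switch to the queue with the smallest searched distance.
        b = min(queues, key=lambda q: queues[q][0][0])
        dist, node = heapq.heappop(queues[b])  # closest unsearched node
        if node not in settled[b]:
            settled[b].add(node)
            found[b].extend(objects_at.get(node, []))
            for nbr, length in graph[node].items():
                if nbr not in settled[b]:
                    heapq.heappush(queues[b], (dist + length, nbr))
        # Termination: the search covers the distance of its k-th object
        # (or the network is exhausted).
        if len(found[b]) >= k or not queues[b]:
            answer.update(found[b][:k])
            del queues[b]
    return answer

# Hypothetical toy network, not the one shown on the slides.
G = build_graph([("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("A", "E", 2)])
OBJECTS = {"B": ["P1"], "D": ["P2"], "E": ["P3"]}
print(sorted(k_rnn_sketch(G, OBJECTS, {"A", "B"}, ["A", "B"], 2)))
```

Interleaving the queues by searched distance keeps all searches at a similar radius, which is the point at which a full implementation could detect a search collision and reuse work; this sketch deliberately leaves that part out.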
Pre-computed shortest path table (example):
      A  B  E
A     0  1  2
B     1  0  3
E     2  3  0
C     3  4  2  5
D     5  4  6
P1    2  1  4
P2    4  3  5

Experimental Results
• We evaluate our algorithm without pre-computed results (KRNN-E) and with pre-computed results (KRNN-F)
• Baseline algorithm: [Wang and Liu, PVLDB 2009]
• Road network: Hennepin County, Minnesota, US
  – 39,513 nodes and 54,444 road segments
• Parameter settings:

Parameter                                     Default value   Range
k value                                       10              1 to 20
Number of objects                             600             200 to 1000
Query region size (ratio over total space)    0.018           0.002 to 0.050

Comparison with Baseline (1/2)
a) Impact of different k values
b) Impact of different total numbers of objects on the map
c) Impact of different query region sizes

Comparison with Baseline (2/2)
• Impact of different distributions of the data objects
  – Uniform distribution
  – Normal distribution
    • SD is the standard deviation, used to simulate hot-spot locations such as a downtown area
[Figure: query processing time (s) of Baseline, KRNN-F, and KRNN-E under different POI distributions: Uniform, SD=1, SD=0.1, SD=0.01, SD=0.001]

Tradeoff between Storage and Performance
• Tuning parameter P
  – The percentage of the shortest distance table that is kept
  – Warm-up process with 1000 k-RNN queries
  – The full size of the table is 980 MB

Conclusion
• An efficient algorithm for k-Range Nearest Neighbor (k-RNN) queries in road networks without computational redundancy
  – Privacy-preserving applications
  – Uncertain locations
• Experimental evaluation
  – Our solution outperforms the baseline algorithm
  – Tuning parameter P achieves a tradeoff between storage and performance

Q&A
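Backup sketch: Partial Shortest Distance Table
The storage/performance tradeoff behind tuning parameter P can be illustrated with a small Python sketch. It is only one possible reading, not the authors' implementation: it assumes the table is stored row by row (one row per source node), that at most a fraction P of the rows is kept, and that rows are filled on demand while queries run, in the spirit of a warm-up phase. The class PartialDistanceTable, the helpers, and the toy network are hypothetical.

```python
import heapq

def build_graph(edges):
    """Build an undirected adjacency map {node: {neighbor: length}}."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, {})[v] = length
        graph.setdefault(v, {})[u] = length
    return graph

def dijkstra(graph, source):
    """On-the-fly network distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, length in graph[node].items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

class PartialDistanceTable:
    """Keeps at most a fraction p of the full shortest distance table."""

    def __init__(self, graph, p):
        self.graph = graph
        self.capacity = int(p * len(graph))  # number of stored rows (assumption)
        self.rows = {}
        self.hits = 0
        self.misses = 0

    def distance(self, u, v):
        if u not in self.rows and v in self.rows:
            u, v = v, u  # undirected network: d(u, v) == d(v, u)
        if u in self.rows:
            self.hits += 1  # Case 1: table lookup
            return self.rows[u][v]
        self.misses += 1  # Case 2: calculation on the fly
        row = dijkstra(self.graph, u)
        if len(self.rows) < self.capacity:
            self.rows[u] = row  # fill the table while there is room
        return row[v]

# Hypothetical toy network, not the Hennepin County data set.
G = build_graph([("A", "B", 1), ("B", "C", 2), ("C", "D", 3), ("A", "E", 2)])
table = PartialDistanceTable(G, p=0.4)
results = [table.distance(u, v)
           for u, v in [("A", "D"), ("D", "A"), ("E", "C"), ("B", "D")]]
print(results, table.hits, table.misses)
```

A larger p turns more lookups into table hits at the cost of more stored rows; p = 0 behaves like KRNN-E and p = 1 approaches KRNN-F in this simplified model.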