K-Range Nearest Neighbor in Road Networks

Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks
Jie Bao
Chi-Yin Chow
Mohamed F. Mokbel
Department of Computer Science and Engineering
University of Minnesota – Twin Cities
Wei-Shinn Ku
Department of Computer Science and Software Engineering
Auburn University
What Are Range NN Queries?
• k-Range NN Queries in Euclidean Space
– Given a spatial region, find the k nearest objects to every point within the region
– E.g., find the nearest hotel to a shopping mall
• k-Range NN Queries in Road Networks
– Given a set of road segments, find the k nearest objects to every point on the road segments
Usages of Range NN Queries
• Uncertain locations
– Measurement imprecision: due to the limitations of the underlying positioning techniques, e.g., 2G/3G and Wi-Fi
– Sampling imprecision: due to continuous motion, network delays, and location update frequency
[Figure: iPhone's 3G positioning]
• Privacy-preserving queries
– Users do not want to reveal their exact location information to service providers
– Their locations are blurred into spatial areas
[Figure: 5-anonymous area]
Related Works for k-RNN Queries
• k-Nearest Neighbor in Road Networks
– Query processing without pre-computed information
Incremental Network Expansion (INE): a best-first expansion over the road network
[Papadias et al., VLDB 2003]
– Query processing with pre-computed information
Uses extra pre-computed quad-tree indexes to calculate the distances
[Samet et al., SIGMOD 2008]
• k-Range Nearest Neighbor in Euclidean Space
– Pre-computed Voronoi diagrams
[Chow et al., SSTD 2009]
• k-Range Nearest Neighbor in Road Networks
– Range query + INE for every boundary node
[Wang and Liu, PVLDB 2009]
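The best-first expansion behind INE can be illustrated with a minimal Python sketch. It is not the implementation from the cited paper: it assumes objects sit exactly on graph nodes, and the names `ine_knn`, `graph`, and `objects_at`, as well as the toy network, are hypothetical.

```python
import heapq

def ine_knn(graph, objects_at, source, k):
    """Expand the network best-first from `source` until k objects are found.

    graph:      dict node -> list of (neighbor, edge_length); each undirected
                edge is listed in both directions.
    objects_at: dict node -> list of object ids on that node (an assumption
                made for brevity; real objects lie along road segments).
    Returns up to k (distance, object_id) pairs, nearest first.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    found = []
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale queue entry
        settled.add(u)
        # Nodes are settled in non-decreasing distance order, so objects
        # are discovered nearest first.
        for obj in objects_at.get(u, []):
            found.append((d, obj))
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return sorted(found)[:k]

# Hypothetical toy network, not taken from the slides.
graph = {
    "A": [("B", 1), ("E", 2)],
    "B": [("A", 1), ("C", 2)],
    "C": [("B", 2), ("D", 2)],
    "D": [("C", 2)],
    "E": [("A", 2)],
}
objects_at = {"B": ["P1"], "D": ["P2"], "E": ["P3"]}

print(ine_knn(graph, objects_at, "A", 2))  # [(1.0, 'P1'), (2.0, 'P3')]
```

The search stops as soon as k objects have been collected, so only the part of the network within the distance of the kth object is visited.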
Motivating Example
• Computational redundancy in the existing solution
– Range Query + Multiple kNN Queries
[Wang and Liu, PVLDB 2009]
[Figure: a range search plus k-NN searches for B, D, and F]
Total number of road segments searched: 3 + 2 + 5 + 6 = 17
Total number of the road segments in the map: 6
Redundancy ratio: (17 - 6) / 6 ≈ 183% (worse with more boundary points)
• Can we provide the results without the
computational redundancy?
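The redundancy ratio on this slide can be reproduced from its two totals (17 road segments searched, 6 road segments in the map). The function name `redundancy_ratio` is only an illustrative choice.

```python
def redundancy_ratio(segments_searched, segments_in_map):
    """Fraction of extra work relative to touching every segment exactly once."""
    return (segments_searched - segments_in_map) / segments_in_map

ratio = redundancy_ratio(17, 6)
print(f"{ratio:.0%}")  # 183%
```

A ratio of 0 would mean that each road segment is searched exactly once.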
Problem Definition
• Given:
– An undirected graph G = (V, E) representing the road network
– A set of objects O
– A query region R (a set of road segments)
– A value K
• Find:
– An answer set A from O such that A contains the K nearest objects of every point in R, based on the network distance in G
• Objective:
– Provide A without computational redundancy
Efficient k-RNN Query Processing
• Step 1: Inside Query Step
• Step 2: Outside Network Expansion Step
– Multiple searching queues
– Stop after the closest node is searched
– Switch to the queue with the smallest searched distance
– Termination condition: the search covers the distance of its kth object
Example: 2-RNN
[Figure: road segment set (range) with labels B, C, P1, P2, P3]
– 1st iteration: search from A; answer set: P1, P2
– 2nd iteration: search from B; answer set: P1, P2
– 3rd iteration: search from C; answer set: P1, P2
– 4th iteration: search from C; answer set: P1, P2, P3
– 5th iteration: search from B; answer set: P1, P2, P3
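The interplay of the two steps can be sketched in Python under strong simplifying assumptions: objects sit on nodes, the query region is given as a set of nodes, and every boundary node (a region node with a neighbor outside the region) gets its own queue. The sketch only illustrates how the queues are interleaved and when each one terminates. Each queue still expands independently, so it does not reproduce whatever reuse the authors apply when two searches meet. All names and the toy network are hypothetical.

```python
import heapq

def k_rnn_sketch(graph, objects_at, region, k):
    """Return (answer_set, expansion_order) for a simplified k-RNN query."""
    # Step 1 (inside query): objects on nodes of the query region.
    answer = {obj for n in region for obj in objects_at.get(n, [])}

    # Boundary nodes: region nodes with at least one neighbor outside.
    boundary = sorted(n for n in region
                      if any(v not in region for v, _ in graph[n]))

    # Step 2 (outside expansion): one best-first queue per boundary node.
    heaps = {b: [(0.0, b)] for b in boundary}
    settled = {b: set() for b in boundary}
    found = {b: [] for b in boundary}
    active = set(boundary)
    order = []  # (queue owner, expanded node), one entry per iteration

    while active:
        # Drop stale entries so each heap top is the next unsettled node.
        for b in list(active):
            while heaps[b] and heaps[b][0][1] in settled[b]:
                heapq.heappop(heaps[b])
            if not heaps[b]:
                active.discard(b)  # nothing left to expand
        if not active:
            break
        # Switch to the queue with the smallest searched distance.
        b = min(active, key=lambda q: (heaps[q][0][0], q))
        d, u = heapq.heappop(heaps[b])  # expand only the closest node
        settled[b].add(u)
        order.append((b, u))
        found[b].extend(objects_at.get(u, []))
        if len(found[b]) >= k:
            active.discard(b)  # radius d covers the kth object of b
            continue
        for v, w in graph[u]:
            if v not in settled[b]:
                heapq.heappush(heaps[b], (d + w, v))

    for objs in found.values():
        answer.update(objs)
    return answer, order

# Hypothetical toy network: E -1- A -3- B -1- C, with the region {A, B}.
graph = {
    "E": [("A", 1)],
    "A": [("E", 1), ("B", 3)],
    "B": [("A", 3), ("C", 1)],
    "C": [("B", 1)],
}
objects_at = {"E": ["P1"], "B": ["P2"], "C": ["P3"]}

answer, order = k_rnn_sketch(graph, objects_at, {"A", "B"}, 2)
print(sorted(answer))  # ['P1', 'P2', 'P3']
print(order)
```

In this toy run the queue of A finds P1 and P2, the queue of B finds P2 and P3, and the union of both with the objects inside the region is returned as the answer set.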
Distance Calculation
• Case 1: By a pre-computed
shortest path table
– Fast but more storage
• Case 2: Calculation on the fly
– Keep the distance information as the
searching expands
• Tradeoff between storage and
speed
[Figure label: "Search collision!"]
Shortest path table:
      A    B    E
A     0    1    2
B     1    0    3
E     2    3    0
C     3    4    2    5
D     5    4    6
P1    2    1    4
P2    4    3    5
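The two cases can be contrasted with a short Python sketch. It is one possible reading of the slide, not the authors' code: Case 1 is modeled as a full table built up front, Case 2 as distances computed on first use and then kept. The three-node network is an assumption, chosen so that the distances among A, B, and E agree with the table on the slide; `build_table` and `OnTheFly` are hypothetical names.

```python
import heapq

def dijkstra(graph, source):
    """Network distances from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def build_table(graph):
    """Case 1: pre-compute every pairwise distance (fast lookups, more storage)."""
    return {u: dijkstra(graph, u) for u in graph}

class OnTheFly:
    """Case 2: compute distances when first needed and keep them for reuse."""

    def __init__(self, graph):
        self.graph = graph
        self.cache = {}  # source node -> distances found so far

    def distance(self, u, v):
        if u not in self.cache:
            self.cache[u] = dijkstra(self.graph, u)
        return self.cache[u][v]

# Assumed network: A-B = 1, A-E = 2, B-E = 4 (so the B-E distance is 3 via A).
graph = {
    "A": [("B", 1), ("E", 2)],
    "B": [("A", 1), ("E", 4)],
    "E": [("A", 2), ("B", 4)],
}

table = build_table(graph)
lazy = OnTheFly(graph)
print(table["B"]["E"], lazy.distance("B", "E"))  # 3.0 3.0
```

The table holds one entry per node pair regardless of which queries arrive, whereas the lazy variant stores only the sources that were actually queried.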
Experimental Results
• Evaluate our algorithm without pre-computed results (KRNN-E) and with pre-computed results (KRNN-F)
• Baseline algorithm: [Wang and Liu, PVLDB 2009]
• Road network: Hennepin County, Minnesota, US
• 39,513 nodes and 54,444 road segments
Parameter settings
Parameter                                    Default value   Range
K value                                      10              1 to 20
Number of objects                            600             200 to 1000
Query region size (ratio over total space)   0.018           0.002 to 0.050
Comparison with baseline (1/2)
a) Impact of different k values
b) Impact of different numbers of objects on the map
c) Impact of different query region sizes
Comparison with baseline (2/2)
• Impact of different distributions of the data objects
– Uniform distribution
– Normal distribution
• SD is the standard deviation, used to simulate hot-spot locations such as a downtown area
[Figure: query processing time (s) of Baseline, KRNN-F, and KRNN-E for different POI distributions: Uniform, SD=1, SD=0.1, SD=0.01, SD=0.001]
Tradeoff between storage and performance
• Tuning parameter P
– The percentage of the shortest distance table that is stored
– Warm up process with 1000 k-RNN queries
– Full size of the table is 980 MB
Conclusion
• An efficient algorithm for k-Range Nearest Neighbor (k-RNN) queries in road networks without computational redundancy
– Privacy-preserving applications
– Uncertain locations
• Experimental evaluation
– Our solution outperforms the baseline algorithm
– Tuning parameter P achieves a tradeoff between storage and performance
Q&A