SMashQ: Spatial Mashup Framework for in Time-Dependent Road Networks

advertisement
Distributed and Parallel Databases manuscript No.
(will be inserted by the editor)
SMashQ: Spatial Mashup Framework for k-NN Queries
in Time-Dependent Road Networks
Detian Zhang · Chi-Yin Chow ·
Qing Li · Xinming Zhang · Yinlong Xu
Received: date / Accepted: date
Abstract The k-nearest-neighbor (k-NN) query is one of the most popular
spatial query types for location-based services (LBS). In this paper, we focus on
k-NN queries in time-dependent road networks, where the travel time between
two locations may vary significantly at different time of the day. In practice,
it is costly for a LBS provider to collect real-time traffic data from vehicles
or roadside sensors to compute the best route from a user to a spatial object
The work described in this paper was partially supported by grants from City University of
Hong Kong (Project No. 7002686 and 7002606) and the National Natural Science Foundation
of China under Grant No. 61073185 and 61073038.
Detian Zhang
Department of Computer Science and Technology, University of Science and Technology of
China, Hefei, China
Department of Computer Science, City University of Hong Kong, Hong Kong, China
USTC-CityU Joint Advanced Research Center, Suzhou, China
E-mail: [email protected]
Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong, China
E-mail: [email protected]
Qing Li
Department of Computer Science, City University of Hong Kong, Hong Kong, China
USTC-CityU Joint Advanced Research Center, Suzhou, China
E-mail: [email protected]
Xinming Zhang
Department of Computer Science and Technology, University of Science and Technology of
China, Hefei, China
USTC-CityU Joint Advanced Research Center, Suzhou, China
E-mail: [email protected]
Yinlong Xu
Department of Computer Science and Technology, University of Science and Technology of
China, Hefei, China
USTC-CityU Joint Advanced Research Center, Suzhou, China
E-mail: [email protected]
2
Detian Zhang et al.
of interest in terms of the travel time. Thus, we design SMashQ, a server-side
spatial mashup framework that enables a database server to efficiently evaluate
k-NN queries using the route information and travel time accessed from an
external Web mapping service, e.g., Microsoft Bing Maps. Due to the expensive
cost and limitations of retrieving such external information, we propose three
shared execution optimizations for SMashQ, namely, object grouping, direction
sharing, and user grouping, to reduce the number of external Web mapping
requests and provide highly accurate query answers. We evaluate SMashQ
using Microsoft Bing Maps, a real road network, real data sets, and a synthetic
data set. Experimental results show that SMashQ is efficient and capable of
producing highly accurate query answers.
Keywords Spatial mashups · location-based services · k-nearest-neighbor
queries · time-dependent road networks · Web mapping services
1 Introduction
With the ubiquity of wireless Internet access, GPS-enabled mobile devices and
the advance in spatial database management systems, location-based services
(LBS) have been realized to provide valuable information for their users based
on their locations [15, 17]. LBS are an abstraction of spatio-temporal queries.
Typical examples of spatio-temporal queries include range queries (e.g., “How
many vehicles in a certain area”) [9, 20, 24] and k-nearest-neighbor (k-NN)
queries (e.g., “What are the three nearest gas stations to my car”) [13,20,22,27].
The distance between two locations in a road network is measured in terms
of the network distance, instead of the Euclidean distance, to consider the physical movement constrains of the road network [24]. It is usually defined by the
distance of their shortest path. However, this kind of distance measure would
hide the fact that the user may take longer time to travel to his/her nearest
object of interest (e.g., restaurant and hotel) than other ones due to many realistic factors, e.g., heterogeneous traffic conditions and traffic accidents. Travel
time (e.g., driving time) is in reality a more meaningful and reliable distance
measure for LBS in the road network [6, 7, 10]. Figure 1 depicts an example
where Alice wants to find the nearest clinic for emergency medical treatment.
A traditional shortest-path based NN query algorithm returns Clinic X. However, a driving-time based NN query algorithm returns Clinic Y because Alice
will spend less time to reach Y (3 mins.) than X (10 mins.).
Since travel time is highly dynamic, e.g., the driving time on a segment of
I-10 freeway in Los Angeles, USA between 8:30AM to 9:30AM changes from
30 minutes to 18 minutes, i.e., 40% decrease in driving time [6], it is almost
impossible to accurately predict the travel time between two locations in the
road network based on their network distance. The best way to provide realtime travel time computation is to continuously monitor the traffic in the road
network; however, it is difficult for every LBS provider to do so due to very
expensive deployment cost.
Title Suppressed Due to Excessive Length
3
Clinic X
Distance: 500 meters
Travel time: 10 mins
Alice: Where is my
nearest clinic?
Clinic Y
Distance: 1000 meters
Travel time: 3 mins
Fig. 1 Shortest-path-based nearest object X versus travel-time-based nearest object Y .
A mashup, one of the key Web 2.0 technologies, is a web application that
combines data, representation, and/or functionality from multiple web applications to create a new application [30]. A server-side spatial mashup (or GIS
mashup) provides a cost-effective way for a database server to access the route
and travel time information between two locations in the road network from
external Web mapping services, e.g., Google Maps [11], Yahoo! Maps [32],
Microsoft Bing Maps [19] and government agencies. Web mapping service
providers have many sources to collect data (e.g., historical traffic statistics and
real-time traffic conditions) to estimate the travel time and direction between
any two locations. The application scenario of this work is that location-based
service (LBS) providers can subscribe direction services from Web mapping
services through their APIs to provide services for their users based on current traffic conditions. In other words, our proposed framework enables the
LBS provider to outsource the real-time traffic monitoring and traffic analysis
operations to the Web mapping service provider, and thus, the LBS provider
can focus on its core business activities and reduce its operational costs.
However, existing spatial mashups suffer from the following limitations.
(1) It is costly to access direction information from a Web mapping service, e.g.,
retrieving travel time from the Microsoft MapPoint web service to a database
engine takes 502 ms while the time needed to read a cold and hot 8 KB buffer
page from disk is 27 ms and 0.0047 ms, respectively [18]. (2) There is a charge
on the number of requests issued to a Web mapping service, e.g., Google Maps
allows only 2,500 requests per day for evaluation users and 100,000 requests
per day for business license users [12]. A location-based service provider needs
to pay more to have higher usage limits. (3) The use of retrieved route information is restricted, e.g., the route information must not be pre-fetched, cached,
or stored, except only limited amount of content can be temporarily stored for
the purpose of improving system performance [12]. (4) Existing Web mapping
services only support primitive operations, e.g., the travel direction and time
between two locations. A database server has to issue a large number of exter-
4
Detian Zhang et al.
nal requests to collect small pieces of information from the supported simple
operations to process relatively complex spatial queries, e.g., k-NN queries.
In this paper, we design SMashQ, a framework that uses a server-side
Spatial Mashup paradigm to process k-NN Queries in the time-dependent
road network. Given a set of objects R and a k-NN query with a user U ’s
location, SMashQ finds at most k objects from R with the shortest travel
time from U ’s location. The objectives of SMashQ are to reduce the number
of external Web mapping requests and provide highly accurate query answers.
To accomplish these objectives, SMashQ first prunes the objects that are definitely not part of a query answer from R to find a set of candidate objects for
the query answer. Then, three shared execution optimizations, namely, object
grouping, direction sharing, and user grouping, are designed for SMashQ. The
main idea of the object grouping optimization is to group candidate objects to
a set of intersections I. For each intersection I ∈ I, one external Web mapping
request is issued to retrieve the route and travel time from U to I, and then
the travel time from I to each of its objects is estimated. To further reduce the
number of external requests, the direction sharing optimization aims at using
the route and travel time information from U to an intersection in I to find
that from U to other intersections in I. The user grouping optimization groups
queries to a set of intersections. For each intersection, its queries are processed
at the same time, as they can share the route and travel time information from
the intersection.
To evaluate the performance of SMashQ, we compare SMashQ with two
basic approaches using a real road network, two real-world data sets, synthetic data sets, and Microsoft Bing Maps [19]. Experimental results show
that SMashQ outperforms the basic approaches in terms of the number of
external Web mapping requests and it provides highly accurate travel time
estimation and k-NN query answers.
The remainder of this paper is organized as follows. Section 2 highlights
related work. Section 3 describes the system model. Section 4 presents an
exact algorithm. Section 5 describes the three shared execution optimizations
for SMashQ. Experimental results are analyzed in Section 6. Finally, Section 7
concludes this paper.
2 Related Work
In this section, we first highlight related work in the areas of spatial queries in
distributed systems and shared execution for spatial queries. Then, we focus
on the related work in the areas of spatial queries in road networks and spatial
queries with external data or services.
Spatial queries in distributed systems. Location-based query processing algorithms have been designed for various distributed systems, e.g.,
client-server systems [20, 23], peer-to-peer systems [26], mobile peer-to-peer
systems [4,34], and wireless sensor networks [8,31]. In the client-server systems,
a centralized database server stores all objects and employs query execution
Title Suppressed Due to Excessive Length
5
optimizations to process location-based queries issued by mobile users [20,23].
A distributed spatial index structure has been designed for the peer-to-peer
system to process location-based queries [26]. When a distributed server does
not have enough spatial object information to process a location-based query,
it uses the spatial index structure to enlist other peer servers that store the
required information for help. Mobile peer-to-peer systems enable users to
cache their nearby spatial objects to process location-based queries without
enlisting any server for help [4,34]. Wireless sensor networks (WSNs) are more
challenging than peer-to-peer systems, because WSNs have many limitations
(e.g., limited storage and computational capacity, scarce battery power, and
restricted communication ranges and bandwidth). Spatial query processing algorithms designed for WSNs have to be optimized for such limitations [8, 31].
Shared execution for spatial queries. Shared execution techniques
are one of the widely used query execution optimizations for spatial queries. A
shared execution scheme has been used to enable a system to scale up to a large
number of concurrent continuous range and k-nearest-neighbor queries [20,21].
The basic idea is to group similar queries together and then evaluate a group
of queries at the same time through a spatial join between moving objects
and queries. Another shared execution scheme has been designed to optimize
the query execution by clustering moving objects and queries based on their
common spatial and temporal properties [23]. However, none of existing shared
execution techniques is designed for sharing direction information retrieved
from a Web mapping service to process spatial queries.
Spatial queries in road networks. Existing location-based query processing algorithms can be categorized into two main classes: location-based
queries in time-independent road networks and in time-dependent road networks. (1) Time-independent road networks. In this class, location-based
query processing algorithms assume that the cost or weight of a road segment,
which can be in terms of distance or travel time, is constant (e.g., [14, 24, 25]).
These algorithms mainly rely on pre-computed distance or travel time information of road segments in road networks. However, the actual travel time of
a road segment may vary significantly during different times of the day due to
dynamic traffic on road segments [6,7]. (2) Time-dependent road networks.
The location-based query processing algorithms designed for time-dependent
road networks have the ability to support dynamic weights of road segments
and topology of a road network, which can change with time. Location-based
queries in time-dependent road networks are more realistic but also more challenging. George et al. [10] proposed a time-aggregated graph, which uses time
series to represent time-varying attributes. The time-aggregated graph can be
used to compute the shortest path for a given start time or to find the best
start time for a path that leads to the shortest travel time. In [6,7], Demiryurek
et al. proposed solutions for processing k-NN queries in time-dependent road
networks where the weight of each road segment is a function of time.
Spatial queries with external data or services. In this paper, we focus
on k-NN queries in time-dependent road networks. This paper distinguishes
itself from previous work [6, 7, 10] that it does not model the underlying road
6
Detian Zhang et al.
network based on different criteria. Instead, it employs third party Web mapping services, e.g., Microsoft Bing Maps, to compute the travel time of road
networks and provide the route and travel time information through serverside spatial mashups. Since the use of external Web mapping requests is more
expensive than accessing local data [18], we propose three shared execution
optimizations for SMashQ to reduce the number of such external requests and
provide highly accurate query answers.
There are some query processing algorithms designed to deal with expensive attributes that are accessed from external Web services (e.g., [1, 2, 18]).
To minimize the number of external requests, these algorithms mainly focused
on using either some cheap attributes that can be retrieved from local data
sources [1, 18] or sampling methods [2] to prune a whole set of objects into
a smaller set of candidate objects, and they only issue absolutely necessary
external requests for candidate objects. The closest work to this paper is [18],
where Levandoski et al. developed a framework for processing spatial skyline
queries by pruning candidate objects that are guaranteed not to be part of a
query answer and only issuing an external request for each remaining candidate object to retrieve the travel time from the user to the object. However,
all previous work did not study how to use shared execution techniques and
the road network topology to reduce the number of external requests that is
exactly the problem that we solved in this paper.
3 System Model
In this section, we describe our system architecture, road network model, and
problem definition. Figure 2 depicts the system architecture of SMashQ that
consists of three entities, users, a database server, and a Web mapping service
provider. Users send k-NN queries to the database server at a LBS provider
through their mobile devices at anywhere, anytime. The database server processes queries based on local data (e.g., the location and basic information
of restaurants) and external data (i.e., route and travel time information) accessed from the Web mapping service provider. In general, accessing external
data is much more expensive than accessing internal data [18].
A road map is modeled as a graph G = (V, E) comprising a set V of
vertices with a set E of edges, where each road segment is an edge and each
intersection of road segments is a vertex. For instance, Figure 3a depicts a
1: Location-based
queries
Users
SMashQ
2: Web mapping
requests
3: Route and
travel time
4: Answers
Database Server
(e.g., restaurant data)
(Low Cost Local Data)
Fig. 2 System architecture.
Web Mapping
Service Provider
(High Cost External Data)
Title Suppressed Due to Excessive Length
7
I1
I2
I5
I6
I9
I10
I13
(a) A road map
I14
I3
I4
I7
I8
I11
I12
I15
Road segment
Intersection
(b) A graph model
Fig. 3 Road network model.
real road map that is modeled as an undirected graph (Figure 3b), where an
edge represents a road segment (e.g., I1 I2 and I1 I5 ) and a square represents
an intersection of road segments (e.g., I1 and I2 ).
Our problem can be defined as follows. Given a set of objects R and a NN
query Q = (λ, k) from a user U , where λ is U ’s location, and k is the maximum
number of returned objects, SMashQ returns U at most k objects in R with
the shortest travel time from λ based on the route and travel time information
accessed from a Web mapping service provider, e.g., Microsoft Bing Maps [19].
Since the access to the Web mapping service is expensive, our objectives are
to reduce the number of external Web mapping requests and provide highly
accurate a k-NN query answer.
4 Exact Algorithm
In this section, we present an exact algorithm that produces accurate answers
for k-NN queries. Since there could be a very large number of objects in a
spatial data set, it is extremely inefficient to issue an external Web mapping
request to retrieve the route and travel time from a user to each object in the
data set. To this end, the exact algorithm employs the existing network expansion technique designed for processing range queries in a road network [24]
and existing pruning techniques designed for query processing with external
data [1,2,18] to minimize the number of external Web mapping requests. Given
an underlying road network modeled as a graph G = (V, E), a spatial object
set R, a query Q = (λ, k) from a user U , and U ’s maximum movement speed
U.max speed, the exact algorithm performs an incremental search from U to
find an accurate k-NN query answer.
8
Detian Zhang et al.
Algorithm 1 Exact Algorithm
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
input: G = (V, E), R, Q = (λ, k), U.max speed
Initialize a current answer set A, a node set S, and a priority queue Q
distmax ← ∞; Tmax ← ∞
Ni ← the nearest node from U
N Dist ← Dist(U, Ni )
Q.enqueue(Ni , N Dist); S ← {Ni }
while Q is not empty do
(Ni , N Dist) ← Q.dequeue()
if A contains k objects and N Dist > distmax then
break
end if
if Ni is an object (i.e., Ni ∈ R) then
Issue an external Web mapping request to retrieve T ime(U → Ni )
if A contains less than k objects then
A ← A ∪ {Ni }
if A contains k objects then
Tmax ← the largest travel time of the objects in A
distmax ← U.max speed × Tmax
end if
else if T ime(U → Ni ) < Tmax then
Replace the object with Tmax with Ni in A
Tmax ← the largest travel time of the objects in A
distmax ← U.max speed × Tmax
end if
end if
for Each neighbor node Nj of Ni , where Nj ∈
/ S do
Q.enqueue(Nj , N Dist + Dist(Ni , Nj )); S ← S ∪ {Nj }
end for
end while
return A
Algorithm 1 depicts the pseudo code of the exact algorithm. It considers
the intersections and objects in the graph as two types of nodes that will be
incrementally inserted into a priority queue Q. Each entry in Q is defined as
(Ni , N Dist), where Ni is a node identifier and N Dist is the network distance
from U to Ni .
When we dequeue Q, the node with the smallest distance is returned and
removed from Q. The dequeue operation of Q returns the node with the smallest network distance from U and removes it from Q. Without loss of generality,
we assume that U can only U-turn at an intersection. Initially, U ’s nearest node
Ni with their distance Dist(U, Ni ) is enqueued to Q (Lines 4 to 6 in Algorithm 1). For each iteration, a node (Ni , N Dist) is dequeued from Q. Ni is
processed based on the following two cases:
Case 1: Ni is an intersection. For each neighbor node Nj of Ni that
has not been enqueued to Q, Nj is enqueued to Q along with the network
distance from U to Nj , i.e., Dist(Ni , Nj ) + N Dist, where Dist(Ni , Nj ) is the
distance from Ni to Nj (Lines 26 to 28 in Algorithm 1).
Case 2: Ni is an object. An external Web mapping request is issued
to retrieve the travel time from U to Ni , T ime(U → Ni ). If a current answer A contains less than k objects, Ni is inserted into A (Lines 15 to 18
Title Suppressed Due to Excessive Length
9
3
3
I3
1 I2
2 I4
1 R4 3
R6
R5
1 2 1
3
I8
2 I7 R7
3
2
I5
1 I6 2 R8
R3
U
3
R9
R2 2
2 1
3 1 2
1
I10
I9 R1
I11 R10
I12
3
3
6
3
I13
I14
I15
3
I1
5
2
1
3
User
Object
Intersection
Fig. 4 An example for the exact algorithm.
in Algorithm 1). If Ni is the k-th object inserted into A, we set the largest
travel time of the objects in A to Tmax . In case that A contains k objects
and T ime(U → Ni ) is less than the largest travel time of the objects in A
(i.e., Tmax ), the object with Tmax is replaced with Ni and Tmax is updated
accordingly (Lines 21 to 23 in Algorithm 1). Then, the neighbor nodes of Ni
are processed as in Case 1.
The algorithm repeatedly dequeues nodes from Q until an accurate answer
is found. Whenever Tmax is updated, we calculate the maximum possible network distance distmax that U can travel within the time interval Tmax (i.e.,
distmax = U.max speed × Tmax ). For each iteration, if the network distance
of the node dequeued from Q is larger than distmax , the algorithm terminates
and returns the current answer A to U (Line 9 in Algorithm 1). This is because the travel time from U to any unprocessed object is larger than Tmax ,
and thus, it cannot be part of A.
Example. Figure 4 shows a running example for the exact algorithm,
where a user U issues a query Q = (λ, k = 2), U.λ is represented by a triangle,
10 objects (i.e., R = {R1 , . . . , R10 }) are represented by circles, the distance
of each edge is indicated by a number beside it, and the maximum movement
speed U.max speed is 25 m/s (i.e., 90 km/h). Since U is moving towards I6 and
the distance from U to I6 is 1 km, (I6 , 1) is enqueued to an empty Q. Figure 5
depicts the 12 iterations performed by the algorithm to find an accurate k-NN
query answer.
1. The algorithm dequeues (I6 , 1) from Q. Since I6 is an intersection, its
neighbor nodes (i.e., (R5 , 4), (I5 , 4), and (I10 , 4)) are enqueued to Q.
2. (R5 , 4) is dequeued. Since R5 is an object, an external Web mapping request
is issued to get the travel time from U to R5 , T ime(U → R5 ) = 350
seconds. Since the current answer A is empty, R5 is simply inserted into
10
Detian Zhang et al.
Node Ni Time(U®Ni)
(I6,1)
(R5,4)
(I5,4)
(I10,4)
(R4,5)
(R1,5)
(R2,6)
(I2,6)
(I11,7)
(I14,7)
(I9,7)
(R9,8)
350 s
500 s
400 s
200 s
600 s
Current
Answer A
(I6,1)
(R5,4), (I5,4), (I10,4)
(I5,4), (I10,4), (R4,5)
R5
(I10,4), (R4,5), (R2,6), (I1,9)
R5
(R4,5), (R1,5), (R2,6), (I11,7), (I14,7), (I1,9)
R5
(R1,5), (R2,6), (I2,6), (I11,7), (I14,7), (I1,9)
R5, R4
(R2,6), (I2,6), (I11,7), (I14,7), (I9,7), (I1,9)
R5, R1
(I2,6), (I11,7), (I14,7), (I9,7), (I1,9)
R5, R2
(I11,7), (I14,7), (I9,7), (I1,9), (I3,9)
R5, R2
(I14,7), (I9,7), (R9,8), (I1,9), (I3,9), (R10,9)
R5, R2
(I9,7), (R9,8), (I1,9), (I3,9), (R10,9), (I13,10), (I15,13) R5, R2
(R9,8), (I1,9), (I3,9), (R10,9), (I13,10), (I15,13)
R5, R2
(I1,9), (I3,9), (R10,9), (I13,10), (R8,10), (I15,13)
R5, R2
Priority Queue
Tmax distmax
¥
¥
¥
¥
¥
500 s
400 s
350 s
350 s
350 s
350 s
350 s
350 s
¥
¥
¥
¥
¥
12.5 km
10 km
8.75 km
8.75 km
8.75 km
8.75 km
8.75 km
8.75 km
Fig. 5 The intermediate steps of the exact algorithm.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
A. R4 is the only neighbor of R5 that has not been processed by the
algorithm, so (R4 , 5) is enqueued to Q.
(I5 , 4) is dequeued from Q and two of its neighbor nodes, (R2 , 6) and (I1 , 9),
are enqueued to Q.
This iteration dequeues another intersection (I10 , 4) from Q and enqueues
three of its neighbor nodes (R1 , 5), (I11 , 7), and (I14 , 7) to Q.
The algorithm dequeues (R4 , 5) from Q and issues an external request for
R4 to get T ime(U → R4 ) = 500 seconds. Since A contains less than k = 2
objects, R4 is inserted into A. Then, Tmax is set to the largest travel time
of the objects in A, i.e., 500 seconds. As U.max speed = 25 m/s, distmax
is set to 25m/s × 500s=12.5 km. One of R4 ’s neighbor nodes, i.e., (I2 , 6),
is enqueued to Q.
Since (R1 , 5) is an object, an external request is issued for R1 . As
T ime(U → R1 ) = 400 is less than Tmax , R4 is replaced with R1 . Then,
Tmax and distmax are updated to 400 seconds and 25 × 400 = 10 km, respectively. One of R1 ’s neighbor node, I9 , has not been processed by the
algorithm, so (I9 , 7) is enqueued to Q.
(R2 , 6) is dequeued from Q and an external request is issued. Since
T ime(U → R2 ) = 200 is less than Tmax , R1 is replaced with R2 in A. Then,
Tmax and distmax are updated to 350 seconds and 25 × 350 = 8.75 km,
respectively. As all the neighbor nodes of R2 have been processed by the
algorithm, no node is enqueued to Q.
(I2 , 6) is dequeued from Q. (I3 , 9) is enqueued to Q.
(I11 , 7) is dequeued from Q. (R9 , 8) and (R10 , 9) are enqueued to Q.
(I14 , 7) is dequeued from Q. (I13 , 10) and (I15 , 13) are enqueued to Q.
(I9 , 7) is dequeued from Q and no node is needed to be enqueued to Q.
The algorithm dequeues (R9 , 8) from Q and issues an external request for
R9 to get T ime(U → R9 ) = 600. Since T ime(U → R9 ) is larger than
Title Suppressed Due to Excessive Length
11
Tmax = 350, no update is needed for A. One of R9 ’s neighbor node, i.e.,
(R8 , 10), is enqueued to Q.
After the above 12 iterations, (I1 , 9) is dequeued from Q. Since
N Dist(U, I1 ) = 9 is larger than distmax = 8.75, the algorithm terminates
and returns A = {R5 , R2 } to U as a query answer.
5 SMashQ
Although the exact algorithm can prune a large number of objects to reduce
the number of external Web mapping requests, if the object density in the
vicinity of a querying user is very high, the database server would still need to
issue a large number of external requests. To this end, we propose three shared
execution optimizations, namely, object grouping, direction sharing, and user
grouping, for SMashQ based on the road network topology to further reduce
the number of external requests. This section is organized as follows: Section 5.1 presents the object grouping optimization for SMashQ. Section 5.2
extends SMashQ to support the direction sharing optimization. Section 5.3
discusses the user grouping optimization that further improves SMashQ. Finally, Section 5.4 analyzes the performance of our algorithms.
5.1 Object Grouping Optimization
We observe that many spatial objects are generally located in clusters in real
world. For example, many restaurants are located in a downtown area (Figure 3a). Thus, it makes sense to group nearby objects to a representative point.
The database server issues only one external request to the Web mapping service for an object group. Then, it shares the retrieved travel time and route
information from a user to the representative point among the objects in the
group to estimate the travel time from the user to each object.
There are two challenges in grouping objects: (a) How to select representative points in a road network? and (b) How to group objects to representative
points? Our solution is to select intersections as representative points in a road
network and group objects to them. The reason is twofold: (1) Since there is
only one possible path from the intersection of a group G to each object in G,
it is easy to estimate the travel time from the intersection to each object in
G. (2) A road segment should have the same conditions, e.g., speed limit and
direction. The estimation of travel time would be more nature and accurate
by considering a road segment as a basic unit [16].
Figure 6 gives an example for object grouping where objects R8 , R9 and
R10 are grouped together by intersection I11 . The database server only needs
one external Web mapping request to retrieve the route information and travel
time from U and I11 , and then it estimates the travel time from I11 to each of
its group members.
12
Detian Zhang et al.
I2
R4
R5
I1
I3
R6
I 7 R7
I5
U
I6
I4
R8
I8
R3
R9
R2
I9 R1
I10
I13
I14
I11 R10
I12
I15
Intersection of an object group
User
Intersection
Object
Fig. 6 Object grouping optimization.
5.1.1 Greedy Object Grouping Algorithm
To minimize the number of external Web mapping requests, we should find a
minimal set of intersections that cover all road segments having target objects.
Given a k-NN query issued by a user U and a set R of candidate objects, which
is computed by Algorithm 2, a subgraph Gu = (Vu , Eu ) is extracted from the
graph G of the underlying road network model, where Eu is an edge set where
each edge contains at least one candidate object in R and Vu is a vertex set that
consists of the intersections of the edges in Eu . Algorithm 2 is similar to the
exact algorithm (Algorithm 1), except that after Algorithm 2 issues k external
Web mapping requests for the k-nearest objects of U to find an initial answer,
and then computes Tmax and distmax for the initial answer (Lines 13 to 18
in Algorithm 2). After determining the initial answer, Algorithm 2 does not
issue any external Web mapping requests for other candidate objects. Instead,
Algorithm 2 adds such candidate objects to a candidate object set R (Line 21
in Algorithm 2). The greedy object grouping algorithm returns a vertex set
I ⊆ Vu that is a minimal one covering all the edges in Eu . The object grouping
problem is the vertex cover problem [5], which is NP-complete, so we use a
greedy algorithm to find I.
Algorithm. Algorithm 3 depicts the pseudo code of a greedy object grouping algorithm. The input consists of the graph G of the underlying road network model and a set R of candidate objects, which are found by Algorithm 2,
for a k-NN query issued by a user U . The algorithm first extracts a subgraph
Gu from G and calculates the number of edges connected to a vertex as the
degree of that vertex (Lines 3 to 4 in Algorithm 3). Then, it selects a vertex
v from Vu with the largest degree to I. In case of a tie, the vertex with the
shortest distance to U is selected. Each object on each edge connected to v
Title Suppressed Due to Excessive Length
13
Algorithm 2 Candidate object search algorithm
1: input: G = (V, E), R, Q = (λ, k), U.max speed
2: Initialize a current answer set A, a node set S, a priority queue Q, and a candidate
object set R
3: distmax ← ∞; Tmax ← ∞
4: Ni ← the nearest node from U
5: N Dist ← Dist(U, Ni )
6: Q.enqueue(Ni , N Dist); S ← {Ni }
7: while Q is not empty do
(Ni , N Dist) ← Q.dequeue()
8:
9:
if A contains k objects and N Dist > distmax then
10:
break
11:
end if
12:
if Ni is an object (i.e., Ni ∈ R) then
13:
if A contains less than k objects then
14:
Issue an external Web mapping request to retrieve T ime(U → Ni )
15:
A ← A ∪ {Ni }
16:
if A contains k objects then
17:
Tmax ← the largest travel time of the objects in A
18:
distmax ← U.max speed × Tmax
end if
19:
20:
else
21:
R ← R ∪ {Ni }
22:
end if
23:
end if
24:
for Each neighbor node Nj of Ni , where Nj ∈
/ S do
Q.enqueue(Nj , N Dist + Dist(Ni , Nj )); S ← S ∪ {Nj }
25:
26:
end for
27: end while
28: return R
Algorithm 3 Greedy object grouping algorithm
1: input: Graph G = (V, E) and a set of candidate objects R from Algorithm 2
2: I ← {∅}
3: Extract a subgraph Gu = (Vu , Eu ) from G = (V, E) such that each edge in Eu contains
some objects in R and Vu contains the intersections of the edges in Eu
4: Calculate the degree of each vertex in Vu
5: while Vu is NOT empty do
v ← the vertex with the largest degree in Vu
6:
7:
Assign each object on each edge connected to v to the object group of v
8:
Insert v into I and remove v from Vu
Remove edges connected to v from Eu
9:
10:
Update the degree of each vertex in Vu
11:
Remove the vertex without any edge from Vu
12: end while
is assigned to the object group of v. Note that each candidate object is assigned to one or two object groups. v is removed from Vu and the edges of
v are removed from the Eu . After that, the degrees of the vertices of the removed edges are updated accordingly. The selection process is repeated until
Vu becomes empty (Lines 5 to 12 in Algorithm 3).
Example. Figures 7 and 8 depict an example for the greedy object grouping algorithm, where a subgraph Gu = (Vu , Eu ) extracted from G, as depicted
14
Detian Zhang et al.
I1
Current Answer={R4, R5}
I2
R4
R5
I3
R6
I7 R7
I6
R8
I5
R2
U
I9 R1
I10
I13
I14
I4
I8
R3
R9
I11 R10
I12
I15
Intersection of a group
User
Intersection
Object
Fig. 7 Example of the object grouping optimization.
d=1
I5
I9 I10
d=1
I7
d=1
I5
I11 d=1
d=2 d=1 d=2
(a) I = { I11 }
I12
I9 I10
d=0
I7
d=0
I5
d=0
I12
d=2 d=1
(b) I = { I11, I9 }
I10
d=0
(c) I = { I11, I9 }
Fig. 8 Example of the greedy object grouping algorithm.
in Figure 8a. Note that since R4 and R5 is the current answer A, which is
found by Algorithm 2, the route and travel time from U to R4 and R5 have
been retrieved from the Web mapping service. Thus, R4 and R5 are not candidate objects in R = {R1 , R2 , R8 , R9 , R10 } (Figure 7), which are found by
Algorithm 2. Since edges I5 I9 , I9 I10 , I7 I11 and I11 I12 contain some candidate
objects in R, these four edges constitute Eu and their vertices constitute Vu .
Then, the greedy algorithm calculates the degree for each vertex in Vu . For
example, the degree of I11 is two because two edges I7 I11 and I11 I12 are connected to I11 . Figure 8a shows that I11 has the largest degree and is closer to
U than I9 , I11 is selected to I and the edges connected to I11 are removed from
Gu . Figure 8b shows that the degrees of I7 and I12 of the two deleted edges
are updated accordingly. Any vertex in Gu without any edge (i.e., I7 and I12 )
is removed from Vu . Since I9 has the largest degree, I9 is selected to I and
the edges connected to I9 are removed. Since I5 and I10 have no edge, they
are removed from Vu . Finally, Vu becomes empty, so the greedy algorithm terminates and clusters the candidate objects into two object groups (indicated
by dotted rectangles), I9 = {R1 , R2 } and I11 = {R8 , R9 , R10 }, as indicated in
Figure 7.
Title Suppressed Due to Excessive Length
15
5.1.2 Travel Time Estimation for Grouped Objects
Most Web mapping service providers, e.g., Microsoft Bing Maps, return
turn-by-turn route information for a request. For example, if the database
server sends a request to the Web mapping service to retrieve the route
and travel time information from user U to intersection I9 in Figure 7,
the service returns the turn-by-turn route information, i.e., Route(U →
I9 ) = {⟨Dist(U → I6 ), T ime(U → I6 )⟩, ⟨Dist(I6 → I5 ), T ime(I6 → I5 )⟩,
⟨Dist(I5 → I9 ), T ime(I5 → I9 )⟩}, where Dist(A → B) and T ime(A → B) are
the distance and travel time from location A to location B, respectively.
If a group contains only one object, the database server simply issues one
external request to retrieve the route information from the user to that object.
Otherwise, our algorithm performs two steps to estimate the travel time for
each object and select the best route (if appropriate) as follows.
Step 1: Travel time calculation. Based on whether a candidate object
is on the last road segment of a retrieved route, we can distinguish two cases.
Case 1: An object is on the last road segment. In this case, we assume
that the speed of the last road segment is constant. For example, given the
information of the route from U to I9 (i.e., Route(U → I9 )) retrieved from the
Web mapping service, object R2 is on the last road segment of the route, i.e.,
I5 I9 . Hence, the travel time from U to R2 is calculated as: T ime(U → R2 ) =
2 →I9 )
T ime(U → I9 ) − T ime(I5 → I9 ) × Dist(R
Dist(I5 →I9 ) .
Case 2: An object is NOT on the last road segment. In this case, a candidate object is not on a retrieved route, i.e., the object is on a road segment S
connected to the last road segment S ′ of the route. We assume that the travel
speed of S is the same as that of S ′ . For example, given the retrieved information of the route from U to I9 (i.e., Route(U → I9 )), object R1 is not on the last
road segment of the route, i.e., I5 I9 . Hence, the travel time from U to R1 is cal9 →R1 )
culated as: T ime(U → R1 ) = T ime(U → I9 ) + T ime(I5 → I9 ) × Dist(I
Dist(I5 →I9 ) .
Step 2: Route selection. The greedy object grouping algorithm may
assign a candidate object to two object groups. In this case, the database
server can find the routes from U to the object via each of the intersections.
This step selects the route with the shortest travel time. For example, if both
the intersections of edge I9 I10 , i.e., I9 and I10 , are selected (Figure 7), the travel
time from user U to R1 is selected as T ime(U → R1 ) = min(T ime(U → I9 →
R1 ), T ime(U → I10 → R1 )).
5.1.3 Algorithm
Algorithm 4 depicts the pseudo code of SMashQ with the object grouping
optimization. It first executes Algorithm 2 to find a candidate object set R,
a current answer A, the maximum travel time Tmax of the objects in A, and
the maximum travel distance distmax for a querying user U moving at its
maximum possible speed (i.e., U.max speed) for time period Tmax (Line 2 in
Algorithm 4). Then, it executes the greedy object grouping algorithm (i.e., Algorithm 3) to find an intersection set I. The intersections in I are sorted based
16
Detian Zhang et al.
Algorithm 4 SMashQ with the object grouping optimization
1: input: G = (V, E), R, Q = (λ, k), U.max speed
2: Execute Algorithm 2 to find a set of candidate objects R, a current answer A, Tmax
and distmax
3: Execute the Greedy Object Grouping Algorithm to get a set of intersections I (Algorithm 3)
4: I is sorted according to the intersection’s network distance from U in non-decreasing
order
5: while I is not empty do
Ii ← the intersection in I with the shortest network distance from U
6:
7:
Issue a Web mapping request to retrieve Route(U → Ii ) and T ime(U → Ii )
8:
for Each object Rj in the object group of Ii do
9:
Estimate the time from U to Rj , i.e., T ime(U → Rj )
10:
if T ime(U → Rj ) < Tmax then
11:
Replace the object with Tmax in A with Rj
12:
Tmax ← max{T ime(U → Rl )|Rl ∈ A}
13:
distmax ← U.max speed × Tmax
14:
for Each object Rk ∈ R with the shortest network distance from U > distmax
do
15:
Remove Rk from R and its associated object group(s)
end for
16:
17:
end if
end for
18:
19:
Remove Ii and the intersection with an empty object group from I
20: end while
21: return A
on their shortest network distance from U in non-decreasing order (Line 4 in
Algorithm 4). The algorithm repeatedly processes the intersection with the
shortest network distance Ii ∈ I from U because Ii has a higher chance to
find objects that would be part of a query answer and prune more objects
in R. For each Ii , the algorithm issues an external Web mapping request
to retrieve the route and travel time information from U to Ii , denoted as
Route(U → Ii ) and T ime(U → Ii ), respectively (Line 7 in Algorithm 4). For
each object Rj in the object group of Ii , the travel time from U to Rj is
estimated. If T ime(U → Rj ) < Tmax , the object with Tmax in A is replaced
with Rj . Tmax and distmax are then updated accordingly. After that, for each
object Rk ∈ R with the shortest network distance from U larger than distmax
(i.e., distmax = U.max speed × Tmax ), Rk is pruned from R and its associated
object group(s) (Lines 14 to 16 in Algorithm 4). Ii and the intersection with
an empty object group are removed from I. This pruning process is repeated
until I becomes empty (Lines 5 to 20 in Algorithm 4).
In the running example, the exact algorithm has to issue five external
Web mapping requests, i.e., U → R1 , U → R2 , U → R4 , U → R5 , and
U → R9 .. With the object grouping optimization, SMashQ only needs to issue
four requests, i.e., U → R4 , U → R5 , U → I9 , and U → I11 .
Title Suppressed Due to Excessive Length
17
5.2 Direction Sharing
As described in Section 5.1, the object grouping optimization can effectively
reduce the number of external Web mapping requests. To further reduce the
number of requests, we design the direction sharing optimization to further
boost shared execution in SMashQ. This section is organized as follows. We
present the main idea and challenges of the direction sharing optimization (Section 5.2.1), its technical details (Section 5.2.2), and the algorithm of SMashQ
with the object grouping and direction sharing optimizations (Section 5.2.3).
5.2.1 Main Idea and Challenges
Most Web mapping service providers will return the route Route(A → B) with
the shortest travel time, the turn-by-turn route and travel time information for
a request from location A to location B. Let X be an intermediate intersection
in Route(A → B), i.e., T = Route(A . . . → X → . . . B). We observe that every
prefix in T , i.e., Route(A → X), is also the route with the shortest travel time
from A to X. The correctness of this observation is shown in the following
theorem.
Theorem 1 Given a route T = Route(A . . . → X → . . . B), where X is an
intermediate intersection in T , with the shortest travel time from location A
to location B, every prefix in T (i.e., Route(A → X)) is also the route with
the shortest travel time from location A to intersection X.
Proof A simple cut-and-paste argument is used to prove this theorem [5].
Suppose that there is a route T ′ = Route(A . . . → Y → . . . X), where Y ∈
/ T,
with a travel time shorter than the prefix Route(A → X) in T , then we could
cut Route(A → X) out of T and paste in T ′ , resulting in a route with a shorter
travel time than T , thus contradicting the optimality of T .
⊔
⊓
Based on Theorem 1, given a set of intersections I computed by the greedy
object grouping algorithm for a k-NN query issued from a user U , we should
first process the intersection Ii ∈ I such that the direction information from
U to Ii can be shared with the largest subset I ′ of other intersections in I
(i.e., Ii has the highest sharing ability). The sharing ability of an intersection
→Ii ))∩I|
Ii ∈ I is measured as |Set(Route(U
, where Set(Route(U → Ii )) is a set
|I|
of intersections in Route(U → Ii ). The direction information from U to Ii will
be used to determine the route and travel time from U to each intersection in
I ′ without issuing additional external Web mapping requests.
Consider an example that the greedy object grouping algorithm finds I =
{I5 , I6 , I8 , I9 , I11 , I15 } which are represented by solid squares (Figure 9). If we
know that Route(U → I9 ) passes through I1 , I2 , I5 , I6 , and I9 , its direction
information can be shared with intersections I5 , I6 , and I9 , and thus the
sharing ability of I9 is |{I5 ,I66 ,I9 }| = 36 = 0.5. In this example, only three external
Web mapping requests are needed, i.e., U → I8 , U → I9 , and U → I15 , because
their direction information can be shared with all other intersections in I.
18
Detian Zhang et al.
I2
I1
I3
I4
Route(UoI8)
I7
I8
Route(UoI9)
I5
I6
Route(UoI11)
U
I9
I10
I13
I14
I11
I12
I15
Intersection of an object group
User
Intersection
Fig. 9 Direction sharing optimization.
Since the traffic condition would be very dynamic, as discussed in Section 1,
we could not accurately predict the route between two locations. We will describe a histogram approach (Section 5.2.2) that is employed to estimate the
sharing ability for an intersection based on the historical route information
retrieved from the Web mapping service.
5.2.2 Histogram Approach
In this section, we describe a histogram approach that is used to maintain
historical route information and provide a heuristic measure of sharing ability
for an intersection. Since querying users are mobile users, it is impossible
to store an entry from every possible location to an intersection. Coupled
with the fact that the sharing ability measure is not based on the querying
user’s location, the histogram approach only maintains the historical route
information from one intersection to another one. Given a querying user U who
is moving towards an intersection Iu , Iu is considered as a starting location
for the sharing ability measure.
The histogram approach basically maintains a two-dimensional matrix H in
which an entry H[Ii , Ij ] (1 ≤ i, j ≤ |V |, where V is the number of intersections
in the graph G of the road network model) stores the number of intersections
from intersection Ii to intersection Ij , inclusive of Ii and Ij , based on the latest
route information from Ii to Ij . The rationale behind the histogram approach
is that H[Ii , Ij ] is used to estimate the sharing ability from Ii to Ij because
a route with more intersections has a higher chance to be shared with other
intersections. Initially, if Ii = Ij , H[Ii , Ij ] = 1; otherwise, H[Ii , Ij ] = −1.
Given a route from Iu to Id , we can distinguish two cases for the sharing
ability measure based on the value of H[Iu , Id ]:
Case 1: H[Iu , Id ] > 0. The sharing ability of Id is simply equal to H[Iu , Id ].
Title Suppressed Due to Excessive Length
19
Algorithm 5 SMashQ with the object grouping and direction sharing optimizations
1: input: G = (V, E), R, Q = (λ, k), U.max speed, H
2: Execute Algorithm 2 to find a set of candidate objects R, a current answer A, Tmax
and distmax
3: Execute the Greedy Object Grouping Algorithm to get a set of intersections I (Algorithm 3)
4: I is sorted according to the intersection’s sharing ability in non-decreasing order
5: while I is not empty do
6:
Ih ← the intersection in I with the highest sharing ability
7:
Issue a Web mapping request to retrieve Route(U → Ih ) and T ime(U → Ih )
8:
I ′ ← the intersections included in Set(Route(U → Ih )) ∩ I
9:
for Each intersection Ii ∈ I ′ do
10:
for Each object Rj in the object group of Ii do
11:
Estimate the time from U to Rj , i.e., T ime(U → Rj )
if T ime(U → Rj ) < Tmax then
12:
13:
Replace the object with Tmax in A with Rj
Tmax ← max{T ime(U → Rl )|Rl ∈ A}
14:
15:
distmax ← U.max speed × Tmax
16:
for Each object Rk ∈ R with the shortest network distance from U >
distmax do
17:
Remove Rk from R and its associated object group(s)
18:
end for
19:
end if
20:
end for
21:
end for
22:
Remove Ii , the intersection with an empty object group, and I ′ from I
Update H based on Route(U → Ih ) accordingly
23:
24: end while
25: return A
Case 2: H[Iu , Id ] = −1. Since the histogram has no historical route information from Iu to Id , we employ a shortest path algorithm [5] (e.g., Dijkstra’s
algorithm and A∗ algorithm) to find the shortest network path from Iu and
Id . Then, H[Iu , Id ] and the sharing ability of Id are set to the number of
intersections in the shortest network path, inclusive of Iu and Id .
After SMashQ receives the route and travel time information from U to
an selected intersection Id (i.e., Route(U → Id ) and T ime(U → Id )),
it updates the entry in H corresponds to the route from the first intersection to Id in Route(U → Id ) and every entry in H corresponds to
each prefix route in Route(U → Id ). For example, the database server receives Route(U → I9 ) = {⟨Dist(U → I6 ), T ime(U → I6 )⟩, ⟨Dist(I6 →
I2 ), T ime(I6 → I2 )⟩, ⟨Dist(I2 → I1 ), T ime(I2 → I1 )⟩, ⟨Dist(I1 →
I5 ), T ime(I1 → I5 )⟩, ⟨Dist(I5 → I9 ), T ime(I5 → I9 )⟩} from the Web mapping service provider, it will update H[I6 , I2 ] = 2, H[I6 , I1 ] = 3, H[I6 , I5 ] = 4,
and H[I6 , I9 ] = 5.
5.2.3 Algorithm
Algorithm 5 depicts the pseudo code of SMashQ with the object grouping and direction sharing optimizations. Similar to Algorithm 4, Algorithm 5
20
Detian Zhang et al.
executes Algorithm 2 to find a set R of candidate objects, a current answer
A, the maximum travel time Tmax of the objects in A, and the maximum
travel distance distmax for a querying user U moving at its maximum possible speed (i.e., U.max speed) for time period Tmax (Line 2 in Algorithm 5).
Then, Algorithm 3 is executed to find a set of intersections I. There are three
modifications in the pruning process for the direction sharing optimization.
The first modification is that the intersections in I are sorted by their sharing ability in non-decreasing order, instead of their shortest network distance
from U in Algorithm 4 (Line 4 in Algorithm 5), and the algorithm repeatedly
processes the intersection Ih with the highest sharing ability (Line 6 in Algorithm 5). Since the route and travel time information retrieved for a request
from U to Ih can also be shared with the prefix intersections included in I, the
second modification is to use such information to process the object groups of
a set intersections I ′ included in both the set of intersection in Route(U → Ih )
and I (i.e., I ′ = Set(Route(U → Ih )) ∩ I) (Lines 8 to 22 in Algorithm 5). The
third modification is to update the histogram H based on the retrieved route
information, i.e., Route(U → Ih ), accordingly.
5.3 User Grouping
To further reduce the number of external Web mapping requests for a database
server with a high workload, e.g., a large number of users or continuous queries,
we design another shared execution for grouping users. Similar to object grouping, we consider intersections in the road network as representative points. The
database server only needs to issue one external request from the intersection
of a user group Gu to the intersection of an object group Go to estimate the
travel time from each user Ui in Gu to each object Rj in Go .
The key difference between user grouping optimization and object grouping
optimization is that users are moving. Grouping users has to consider their
movement direction, so a user is grouped to the nearest intersection to which
the user is moving. Figure 10 depicts an example, where user A on edge I9 I10
is moving towards I10 , so A is grouped to the user group of I10 . Similarly,
users D and E are also grouped to the user group of I10 . Users C and B on
edges I1 I2 and I6 I2 , respectively, are both moving to I2 , so they are grouped
to the user group of I2 .
After having grouped users, we process queries in terms of user groups
instead of users. The database server finds an overall query answer A for all
users in a user group, and then extract the query answer from A for each user
in the group. For example, given a user group Gu and the intersection Iu of
Gu , the database server first search the kmax nearest objects to Iu , where kmax
is the largest value of k of the k-NN queries issued by the users in Gu . Then,
kmax and Iu are used to replace k and U in the algorithm of SMashQ with the
object grouping optimization (i.e., Algorithm 4) or the algorithm of SMashQ
with the object grouping and direction sharing optimizations (Algorithm 5) to
find an overall query answer A for all the users in Gu . For the ki -NN query of
Title Suppressed Due to Excessive Length
21
each user Ui in Gu , the database server takes the ki objects with the shortest
travel time in A as Ui ’s query answer Ai , and estimates the travel time from
the user’s location to Iu as in Section 5.1.2 to calculate the travel time from Ui
to each object in Ai . Therefore, each query in Gu can be effectively processed.
5.4 Performance Analysis
We now evaluate the performance of our algorithms. Since accessing travel
time and direction information from an external Web mapping service is more
expensive than local data processing [18], we compare the performance of our
algorithms in terms of the number of external Web mapping requests. Let M
and N be the number of querying users and the average number of candidate
objects per query, respectively.
– The exact algorithm needs to issue N × M external Web mapping queries
to answer all M queries; hence, its cost is CostExact = N × M
– By using the object grouping optimization, SMashQ can reduce the average
number of candidate objects from N to N ′ through grouping objects to
intersections, where N ′ ≤ N ; hence, CostO = N ′ × M .
– If SMashQ employs the object grouping and direction sharing optimization
techniques, it can further reduce N ′ to N ′′ by sharing the direction information among intersections, where N ′′ ≤ N ′ ; hence, CostOD = N ′′ × M .
– When SMashQ employs all the optimization techniques (i.e., object grouping, direction sharing, and user grouping), it can reduce M to M ′ by grouping users to intersections, where M ′ ≤ M ; hence, CostODU = N ′′ × M ′ .
I1
I2
R4
R5
C
B
I4
R6
I7 R7
I6
R8
D
R9
A R1
E
I10
I11
I13
I14
I5
R2
I9
I3
I8
R3
R10
I12
I15
Intersection of a user group
User
Intersection
Object
Fig. 10 User grouping optimization.
22
Detian Zhang et al.
In general, N ′′ ≪ N ′ ≪ N and M ′ ≪ M , so CostExact ≪ CostO ≪
CostOD ≪ CostODU . We will verify our performance analysis through the
experiments in Section 6.
6 Performance Evaluation
In this section, we first give our evaluation model in Section 6.1, including a
basic approach, performance metrics, and default experiment settings. Then,
we analyze the experiment result in Section 6.2.
6.1 Evaluation Model
Basic approach. To our best knowledge, there is no other existing algorithms
about k-NN queries using spatial mashups in time-dependent road networks.
We design a basic approach (denoted as BasicExact), as proposed in our previous work [33]. The basic idea of BasicExact is to issue external Web mapping
requests for the k-nearest objects to find an initial answer, calculate distmax
based on the largest travel time in the initial answer and the user’s maximum
movement speed, and find other objects within the network distance distmax
from the user as a candidate object set R for a query answer. Then, BasicExact issues one external Web mapping request from the querying user to each
candidate object in R to get an exact k-NN query answer. We also consider
the exact algorithm as another basic approach (denoted as Exact) to evaluate
the performance of our SMashQ with only the object grouping and direction
sharing optimizations (denoted as SMashQ-OD) and our SMashQ with all the
three shared execution optimizations (i.e., object grouping, direction sharing,
and user grouping) (denoted as SMashQ-ODU).
Performance metrics. We evaluate the performance of the basic approaches and our SMashQ algorithms in terms of four metrics. (1) The average number of external Web mapping requests per query and (2) the average
response time per query are metrics evaluating the effectiveness of our algorithms. The response time of a query is defined as the sum of the local CPU
processing time and the response time from Microsoft Bing Maps (including the communication cost and the CPU processing time at Microsoft Bing
Maps). It is calculated as the sum of its local CPU query processing time
and the multiplication of the number of external requests and the average response time per external request. From our experimental results, the average
response time per external request from Microsoft Bing Maps is 14.3 milliseconds. (3) The accuracy of estimated travel time (denoted as Acc T ime) is a
metric computed by:
(
Acc T ime = 1 − min
)
|Tb − T |
,1 ,
T
(1)
Title Suppressed Due to Excessive Length
23
(a) The restaurant data
(b) The hotel data
Fig. 11 Two real data sets.
where T and Tb are the actual travel time (retrieved by the exact algorithm)
and the estimated one, respectively, and 0 ≤ Acc T ime ≤ 1. (4) The accuracy
of k-NN query answers (denoted as Acc Ans) evaluates the accuracy returned
by our SMashQ algorithms is calculated by:
Acc Ans =
b ∩ A|
|A
,
|A|
(2)
b is a
where A is an exact query answer returned by the exact algorithm and A
query answer returned by our SMashQ algorithms and 0 ≤ Acc Ans ≤ 1.
Experiment settings. We evaluated the performance of the basic approaches (i.e., BasicExact and Exact) and the SMashQ algorithms (i.e.,
SMashQ-OD and SMashQ-ODU) using C++ and Microsoft Bing Maps [19]
with a real road network of Hennepin County, MN, USA [29]. We select an area
of 8×8 km2 that contains 6,109 road segments and 3,593 intersections, and the
latitude and longitude of its left-bottom and right-top corners are (44.898441,
-93.302791) and (44.970094, -93.204015), respectively. We retrieved two real
data sets within the selected area from Google Maps, i.e., a restaurant data
set of 1,835 restaurants and a hotel data set of 363 hotels, as depicted in Figures 11a and 11c, respectively. Unless mentioned otherwise, we used the real
restaurant data set and uniformly generated 5,000 querying users in the road
network for all the experiments, and the default requested number of nearest
objects of k-NN queries (i.e., the value of k) is 10. The maximum allowed
speed is 90 km per hour.
24
Detian Zhang et al.
20
BasicExact
Exact
1200 SMashQ-OD
SMashQ-ODU
1000
Average Query Response Time (s)
Average No. of External Requests
1400
800
600
400
200
0
16
BasicExact
Exact
SMashQ-OD
SMashQ-ODU
14
12
10
8
6
4
2
0
1
10
20
30
40
Requested Number of Nearest Objects (k)
50
1
(a) Number of external requests
0.96
0.96
0.94
0.94
0.92
0.9
0.88
0.86
0.84
10
20
30
40
Requested Number of Nearest Objects (k)
50
(b) Average query response time
Accuracy of Query Answers
Accuracy of Estimated Travel Time
18
SMashQ-OD
SMashQ-ODU
0.82
0.92
0.9
0.88
0.86
0.84
SMashQ-OD
SMashQ-ODU
0.82
1
10
20
30
40
Requested Number of Nearest Objects (k)
(c) Accuracy of travel time
50
1
10
20
30
40
Requested Number of Nearest Objects (k)
50
(d) Accuracy of query answers
Fig. 12 Effect of the Requested Number of Nearest Objects (k).
6.2 Experiment Results
We evaluated the scalability and efficiency of the basic approaches (i.e.,
BasicExact and Exact) and the SMashQ algorithms (i.e., SMashQ-OD and
SMashQ-ODU) with respect to the number of required number of nearest objects, i.e., the value of k (Section 6.2.1), the number of objects (Section 6.2.2),
and the number of querying users (Section 6.2.3).
6.2.1 Effect of the Requested Number of Nearest Objects
Figure 12 shows the performance of the basic approaches and our SMashQ
algorithms with respect to various numbers of requested number of nearest
objects (i.e., k) from 1 to 50. As depicted in Figure 12a, the SMashQ algorithms (i.e., SMashQ-OD and SMashQ-ODU) outperform the basic approaches
in terms of the average number of external Web mapping requests. The exact algorithm (Exact) reduces the number of external requests by 32% on
average compared to BasicExact. SMashQ-OD further significantly reduces the
number of external requests by 62%. With the user grouping optimization,
SMashQ-ODU yields the least number of external requests. This is because
Title Suppressed Due to Excessive Length
25
20
BasicExact
Exact
1200 SMashQ-OD
SMashQ-ODU
1000
Average Query Response Time (s)
Average No. of External Requests
1400
800
600
400
200
0
1000
2000
3000
4000
Number of Objects
16
BasicExact
Exact
SMashQ-OD
SMashQ-ODU
14
12
10
8
6
4
2
0
1000
5000
(a) Number of external requests
2000
3000
Number of Objects
4000
5000
(b) Average query response time
0.94
0.92
SMashQ-OD
SMashQ-ODU
0.93
Accuracy of Query Answers
Accuracy of Estimated Travel Time
18
0.92
0.91
0.9
0.89
0.88
0.87
0.86
0.85
0.84
1000
2000
3000
4000
Number of Objects
(c) Accuracy of travel time
5000
SMashQ-OD
SMashQ-ODU
0.91
0.9
0.89
0.88
0.87
0.86
0.85
1000
2000
3000
4000
Number of Objects
5000
(d) Accuracy of query answers
Fig. 13 Effect of the number of objects using synthetic data.
SMashQ-ODU has the ability to process queries from several users at the same
time instead of processing them one by one in SMashQ-OD. It is expected that
when k gets larger, there are more candidate objects, and thus the database
server needs to issue more external requests. Since processing external requests
is more expensive than local query processing, the query response time has the
same trend as the number of external requests (Figure 12b).
The two basic approaches BasicExact and Exact provide the exact travel
time and k-NN query answers. Conversely, the shared execution optimizations
designed for SMashQ require travel time estimation to reduce the number of
external requests, so we need to evaluate the accuracy of estimated travel time
(measured by Equation 1) and the accuracy of k-NN query answers (measured
by Equation 2) provided by SMashQ. Figures 12c and 12d depict that both
SMashQ-OD and SMashQ-ODU provide highly accurate travel time (at least
84% by SMashQ-OD and 83% by SMashQ-ODU) and query answers (at least
83% by SMashQ-OD and 82% by SMashQ-ODU), respectively. Both SMashQOD and SMashQ-ODU need to estimate the travel time from an intersection
to each object in an object group, but SMashQ-ODU also needs to estimate
the travel time from each querying user in a user group to an intersection.
As a result, the accuracy of SMashQ-ODU is slightly worse than SMashQ-
26
Detian Zhang et al.
8
BasicExact
Exact
500 SMashQ-OD
SMashQ-ODU
Average Query Response Time (s)
Average No. of External Requests
600
400
300
200
100
0
BasicExact
Exact
SMashQ-OD
6 SMashQ-ODU
7
5
4
3
2
1
0
363(hotels)
1835(restaurants)
Number of Objects
363(hotels)
1835(restaurants)
Number of Objects
(a) Number of external requests
(b) Average query response time
0.95
0.93
SMashQ-OD
SMashQ-ODU
0.94
0.93
0.92
0.91
0.9
0.89
Accuracy of Query Answers
Accuracy of Estimated Travel Time
0.96
SMashQ-OD
SMashQ-ODU
0.92
0.91
0.9
0.89
0.88
363(hotels) 1835(restaurants)
Number of Objects
(c) Accuracy of travel time
363(hotels) 1835(restaurants)
Number of Objects
(d) Accuracy of query answers
Fig. 14 Effect of the number of objects using real data.
OD. When k increases, the accuracy of our SMashQ algorithms improves. The
reason for this is that the larger k value requires them to process objects
further away from the querying user, so the ratio of the distance that we
need to estimate its travel time to the total distance between two locations
decreases. Thus, the accuracy of the travel time estimation improves, as k gets
larger (Figure 12c). The more accurate travel time estimation leads to more
accurate query answers, as depicted in Figure 12d.
The results of this experiment confirm that our SMashQ algorithms not
only effectively reduce the number of external Web mapping requests, but they
also provide highly accurate k-NN query answers.
6.2.2 Effect of the Number of Objects
Both synthetic data and real data sets were used to evaluate the performance
of SMashQ with respect to the number of objects.
Synthetic data. In the synthetic data set, the objects are randomly distributed in the road network. Figure 13 depicts the performance of the basic
approaches and our SMashQ algorithms with respect to the increase of the
number of synthetic objects from 1,000 to 5,000. Figure 13a depicts that our
Title Suppressed Due to Excessive Length
27
14
BasicExact
Exact
SMashQ-OD
SMashQ-ODU
800
700
Average Query Response Time (s)
Average No. of External Requests
900
600
500
400
300
200
100
0
2000
4000
6000
8000
Number of Querying Users
10
8
6
4
2
0
2000
10000
(a) Number of external requests
4000
6000
8000
Number of Querying Users
10000
(b) Average query response time
0.93
0.93
SMashQ-OD
SMashQ-ODU
0.92
Accuracy of Query Answers
Accuracy of Estimated Travel Time
BasicExact
Exact
SMashQ-OD
SMashQ-ODU
12
0.91
0.9
0.89
0.88
0.87
2000
4000
6000
8000
Number of Querying Users
10000
(c) Accuracy of travel time
SMashQ-OD
SMashQ-ODU
0.92
0.91
0.9
0.89
0.88
0.87
2000
4000
6000
8000
Number of Querying Users
10000
(d) Accuracy of query answers
Fig. 15 Effect of the number of querying users.
SMashQ algorithms require much less external requests than the basic approaches, when the number of objects increases, and thus our SMashQ algorithms can reduce the query response time (Figure 13b). The accuracy of travel
time estimation and query answers gets worse, when the number of objects increases, as depicted in Figures 13c and 13d, respectively. This is because when
there are more objects, the required search range is smaller; hence, the ratio
of the distance that we need to estimate its travel time to the total distance
between two locations increases. As a result, a small error in the travel time
estimation has a larger impact on the accuracy measure. However, SMashQOD and SMashQ-ODU still provide accurate travel time estimation and query
answers (over 85% accuracy).
Real data. The experimental results of the two real data sets (363 hotels and 1,835 restaurants) exhibit the same trend as the synthetic data.
The SMashQ algorithms perform much better than the basic approaches in
both the data sets, when there are more objects (Figures 14a and 14b). The
SMashQ algorithms also achieve high accuracy for travel time estimation and
query answers. The accuracy of travel time estimation is over 90% and the
accuracy of query answer is over 88%, as depicted in Figures 14c and 14d,
respectively.
28
Detian Zhang et al.
The experimental results of both the synthetic and real data sets confirm
that our SMashQ algorithms are able to scale up to a large number of objects.
6.2.3 Effect of the Number of Querying Users
Figure 15 depicts the performance of the basic approaches and our SMashQ
algorithms with respect to the increase of the number of querying users from
2,000 to 10,000. The number of querying users only slightly affects the performance of the basic approaches and SMashQ-OD because these algorithms
process k-NN queries one by one (Figures 15a and 15b). Conversely, SMashQODU is able to process a group of queries issued from different users at the
same time. When the number of users increases, there is a higher chance to
form larger user groups, and thus, the number of external Web mapping requests per query of SMashQ-ODU reduces. The number of querying users has
no obvious impact on the accuracy of travel time estimation and query answer,
as depicted in Figures 15c and 15d, respectively. The results of this experiment also verify that the SMashQ algorithms can scale up to a large number
of users.
7 Conclusion
In this paper, we proposed a spatial mashup framework SMashQ for a database
server to process k-nearest-neighbor (NN) queries in time-dependent road networks. SMashQ issues external requests to the Web mapping service, e.g.,
Google Maps and Microsoft Bing Maps, to retrieve the route and travel time
between two locations. Based on the retrieved route and travel time information, SMashQ is able to answer k-NN queries. An exact algorithm was designed
for SMashQ based on the existing pruning techniques for external data. Since
external Web mapping requests are expensive, three shared execution techniques, namely, object grouping, direction sharing, and user grouping, were
proposed for SMashQ to effectively reduce the number of external requests.
We conducted extensive experiments to evaluate the performance of SMashQ
with the shared execution optimizations using Microsoft Bing Maps, the real
road network, the real data sets, and the synthetic data set. The experimental results show that SMashQ significantly reduces the number of external
requests, it provides highly accurate query answers, and it is able to scale up
to large numbers of objects and users.
7.1 Future Research Directions
We have two future research directions for this work. In this paper, we assumed
that the location-based service provider is trusted, so there is no privacy issue
for the user. However, if the user does not trust the service provider, one possible solution is place a location anonymizer between the user and SMashQ that
Title Suppressed Due to Excessive Length
29
is responsible for blurring a user’s exact location into a cloaked road segment
set S that satisfies the user’s privacy requirements, namely, k-anonymity and
minimum road segment length [3]. The location anonymizer can transform the
user’s exact location into a set of external intersections of S, and then send all
these external intersections to SMashQ as the user’s location to find a k-NN
query answer for each external intersection. After the location anonymizer receives the query answers, since it knows the user’s location, it can compute
the exact answer and send it to the user. As a result, SMashQ and the Web
mapping service provider should not be able to identify the actual querying
user with a probability larger than 1/k or pinpoint the user’s location within
S. Stricter privacy requirements lead to more external intersections that have
to be processed by SMashQ. Thus, new query optimization techniques should
also be developed for SMashQ to find query answers for external intersections
efficiently.
Google Maps API has recently provided a new service, called distance
matrix, that returns the network distance and travel time of every combination
from a set of source locations to a set of destination locations defined in a
distance matrix request [28]. Unfortunately, we have chosen Microsoft Bing
Maps to implement SMashQ, but it does not support the distance matrix
service. The second research direction is to investigate how to incorporate
the distance matrix service into SMashQ, overcome its limitations (e.g., the
evaluation license of Google Distance Matrix API limits to 100 locations per
query and 10 seconds and 2,500 locations per 24 hours), and design new query
optimization techniques for SMashQ.
References
1. Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessible
databases. In: Proceedings of the IEEE International Conference on Data Engineering (2002)
2. Chang, K.C.C., Hwang, S.W.: Minimal probing: Supporting expensive predicates for
top-k queries. In: Proceedings of the ACM Conference on Management of Data (2002)
3. Chow, C.Y., Mokbel, M.F., Bao, J., Liu, X.: Query-aware location anonymization for
road networks. GeoInformatica 15(3), 571–607 (2011)
4. Chow, C.Y., Mokbel, M.F., Leong, H.V.: On efficient and scalable support of continuous
queries in mobile peer-to-peer environments. IEEE Trans. on Mobile Computing 10(10),
1473–1487 (2011)
5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms (3.
ed.). MIT Press (2009)
6. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Efficient k-nearest neighbor search
in time-dependent spatial networks. In: Proceedings of the International Conference on
Database and Expert Systems Applications (2010)
7. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: Towards k-nearest neighbor search
in time-dependent spatial network databases. In: International Workshop on Databases
in Networked Systems (2010)
8. Fu, T.Y., Peng, W.C., Lee, W.C.: Parallelizing itinerary-based KNN query processing
in wireless sensor networks. IEEE Trans. on Knowledge and Data Engineering 22(5),
711–729 (2010)
9. Gedik, B., Liu, L.: Mobieyes: A distributed location monitoring service using moving
location queries. IEEE Trans. on Mobile Computing 5(10), 1384–1402 (2006)
30
Detian Zhang et al.
10. George, B., Kim, S., Shekhar, S.: Spatio-temporal network databases and routing algorithms: A summary of results. In: Proceedings of the International Symposium on
Spatial and Temporal Databases (2007)
11. Google Maps: URL http://maps.google.com
12. Google Maps/Google Earth APIs Terms of Service: URL http://code.google.com/
apis/maps/terms.html
13. Hu, H., Xu, J., Lee, D.L.: A generic framework for monitoring continuous spatial queries
over moving objects. In: Proceedings of the ACM Conference on Management of Data
(2005)
14. Huang, X., Jensen, C.S., Saltenis, S.: The islands approach to nearest neighbor querying
in spatial networks. In: Proceedings of the International Symposium on Spatial and
Temporal Databases (2005)
15. Jensen, C.S.: Database aspects of location-based services. In: Location-Based Services,
pp. 115–148. Morgan Kaufmann (2004)
16. Kanoulas, E., Du, Y., Xia, T., Zhang, D.: Finding fastest paths on a road network
with speed patterns. In: Proceedings of the IEEE International Conference on Data
Engineering (2006)
17. Lee, D.L., Zhu, M., Hu, H.: When location-based services meet databases. Mobile
Information Systems 1(2), 81–90 (2005)
18. Levandoski, J.J., Mokbel, M.F., Khalefa, M.E.: Preference query evaluation over expensive attributes. In: Proceedings of the International Conference on Information and
Knowledge Management (2010)
19. Microsoft Bing Maps: URL http://www.bing.com/maps/
20. Mokbel, M.F., Xiong, X., Aref, W.G.: SINA: scalable incremental processing of continuous queries in spatio-temporal databases. In: Proceedings of the ACM Conference on
Management of Data (2004)
21. Mokbel, M.F., Xiong, X., Hammad, M.A., Aref, W.G.: Continuous query processing of
spatio-temporal data streams in PLACE. Geoinformatica 9(4), 343–365 (2005)
22. Mouratidis, K., Papadias, D., Hadjieleftheriou, M.: Conceptual partitioning: An efficient method for continuous nearest neighbor monitoring. In: Proceedings of the ACM
Conference on Management of Data (2005)
23. Nehme1, R.V., Rundensteiner, E.A.: SCUBA: Scalable cluster-based algorithm for evaluating continuous spatio-temporal queries on moving objects. Proceedings of the International Conference on Extending Database Technology (2006)
24. Papadias, D., Zhang, J., Mamoulis, N., Tao, Y.: Query processing in spatial network
databases. In: Proceedings of the International Conference on Very Large Data Bases
(2003)
25. Samet, H., Sankaranarayanan, J., Alborzi, H.: Scalable network distance browsing in
spatial databases. In: Proceedings of the ACM Conference on Management of Data
(2008)
26. Tanin, E., Harwood, A., Samet, H.: Using a distributed quadtree index in peer-to-peer
networks. The International Journal on Very Large Data Bases 16(2), 165–178 (2007)
27. Tao, Y., Papadias, D., Shen, Q.: Continuous nearest neighbor search. In: Proceedings
of the International Conference on Very Large Data Bases (2002)
28. The Google Distance Matrix API (Last visited: May 18, 2012): URL https://
developers.google.com/maps/documentation/distancematrix/
29. TIGER/Line Shapefiles 2009 for: Hennepin County, Minnesota: URL http://www2.
census.gov/cgi-bin/shapefiles2009/county-files?county=27053
30. Vancea, A., Grossniklaus, M., Norrie, M.C.: Database-driven web mashups. In: IEEE
ICWE (2008)
31. Wu, S.H., Chuang, K.T., Chen, C.M., Chen, M.S.: DIKNN: An itinerary-based KNN
query processing algorithm for mobile sensor networks. In: Proceedings of the IEEE
International Conference on Data Engineering (2007)
32. Yahoo! Maps: URL http://maps.yahoo.com
33. Zhang, D., Chow, C.Y., Li, Q., Zhang, X., Xu, Y.: Efficient evaluation of k-NN queries
using spatial mashups. In: Proceedings of the International Symposium on Spatial and
Temporal Databases (2011)
34. Zhu, Q., Lee, D.L., Lee, W.C.: Collaborative caching for spatial queries in mobile P2P
networks. In: Proceedings of the IEEE International Conference on Data Engineering
(2011)
Download