The Min-dist Location Selection Query
Jianzhong Qi Rui Zhang Lars Kulik Dan Lin Yuan Xue
University of Melbourne
8/04/2015
Outline

Background
Algorithms
  Sequential Scan Algorithm
  Quasi-Voronoi Cell
  Nearest Facility Circle
  Maximum NFC Distance
Experiments
Conclusions
Motivation

The min-dist location selection problem
  Problem setting: a set of facilities serving a set of clients
  To set up a new facility, choose a location from a set of
  potential locations that minimizes the average distance
  between each client and her nearest facility
Motivating applications
  Urban planning simulations: deploying public facilities
  Multiplayer online games: placing players
Motivation: urban planning simulation
Modeling urban dynamics [1]
Motivation: online computer games
An online game example [2]
Problem Definition

A set of clients, C
A set of existing facilities, F
A set of potential locations, P
Select a potential location for a new facility to
minimize the average distance between a client and
her nearest facility
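The definition above can be read directly as a brute-force computation. The following is a minimal sketch, assuming 2D points and Euclidean (L2) distance; the function names and the toy data are hypothetical and introduced only for illustration.

```python
import math

def average_distance(clients, facilities):
    """Average distance between each client and her nearest facility."""
    total = sum(min(math.dist(c, f) for f in facilities) for c in clients)
    return total / len(clients)

def select_location_brute_force(clients, facilities, potentials):
    """Return the potential location that minimizes the average distance
    once it is added to the existing facilities."""
    return min(potentials,
               key=lambda p: average_distance(clients, facilities + [p]))

# Toy data (hypothetical): two clients, one existing facility, three candidates.
clients = [(0.0, 0.0), (10.0, 0.0)]
facilities = [(5.0, 0.0)]
potentials = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
best = select_location_brute_force(clients, facilities, potentials)
```

On this toy input the candidate (0, 0) gives the smallest average distance (2.5), so it is selected. The sketch recomputes every client's nearest facility for every candidate, which is exactly the cost the algorithms in the following slides aim to avoid.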
Related Work

The min-dist optimal location problem [3]
  A set of clients C
  A set of existing facilities F
  A candidate region Q
  Compute a location in Q for a new facility to minimize the
  average distance between a client and her nearest facility
Related Work

Location Optimization Problems

Problem   Optim. Function  Solution Space  Distance Function  Datasets
[4]       Max-inf          Continuous      L2                 C, F
[5]       Max-inf          Discrete        L2                 C, F
[6]       Max-inf          Continuous      L1                 C, F
[7]       Max-inf          Discrete        L2                 C, P
[8]       Max-inf          Discrete        L2                 C, F, P
[3]       Min-dist         Continuous      L1                 C, F
[9]       Min-dist         Continuous      Network            C, F, E
[10]      Min-dist         Discrete        L2                 C, P
Proposed  Min-dist         Discrete        L2                 C, F, P
Algorithms: Problem Redefinition

A larger distance reduction means a smaller average client-facility distance
The influence set of p, IS(p)
  $c \in IS(p) \iff dist(c, p) < dist(c, \text{c's nearest existing facility})$
  [Figure: the influence sets IS(p1) and IS(p2)]
The distance reduction of p, dr(p)
  $dr(p) = \sum_{c \in IS(p)} \bigl( dist(c, \text{c's nearest existing facility}) - dist(c, p) \bigr)$
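The link between distance reduction and the average client-facility distance can be spelled out in a short derivation. This is a sketch of the reasoning, not taken from the slides; the shorthand $d_F(c)$ for dist(c, c's nearest existing facility) is introduced here only for brevity.

```latex
% d_F(c): shorthand (introduced here) for dist(c, c's nearest existing facility)
\begin{align*}
\min\{d_F(c),\ dist(c,p)\}
  &= d_F(c) - \max\{0,\ d_F(c) - dist(c,p)\} \\
\sum_{c \in C} \min\{d_F(c),\ dist(c,p)\}
  &= \sum_{c \in C} d_F(c) - \sum_{c \in IS(p)} \bigl(d_F(c) - dist(c,p)\bigr)
   = \sum_{c \in C} d_F(c) - dr(p) \\
\text{average distance after adding } p
  &= \frac{1}{|C|}\Bigl(\sum_{c \in C} d_F(c) - dr(p)\Bigr)
\end{align*}
```

The first sum does not depend on p, so minimizing the average distance over P is the same as maximizing dr(p) over P. Only clients in IS(p) contribute to the second sum, because the max term is zero for every other client.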
Algorithms: Sequential Scan

Sequential Scan Algorithm
  Sequentially check all the potential locations
  For every potential location p
    Sequentially check all the clients, compute IS(p) and dr(p)
  Report the one with the largest dr value
Drawback: repeated dataset accesses
Key algorithm design considerations
  Restrict the search space for IS(p)
  Share the computation for determining the influence sets
  of multiple potential locations
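The sequential scan steps above can be sketched in a few lines. This is an in-memory illustration under stated assumptions (2D points, L2 distance, each client's nearest-facility distance computed once up front); the slides describe a disk-based setting, and the function name and toy data are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def sequential_scan(clients, facilities, potentials):
    """Return (best_location, best_dr) by checking every potential
    location against every client."""
    # Distance from each client to her nearest existing facility
    # (assumption: computed once and reused for every potential location).
    nfd = [min(math.dist(c, f) for f in facilities) for c in clients]

    best_p, best_dr = None, -1.0
    for p in potentials:                  # check all potential locations
        dr = 0.0
        for c, d in zip(clients, nfd):    # check all clients
            dp = math.dist(c, p)
            if dp < d:                    # c is in IS(p)
                dr += d - dp              # accumulate distance reduction
        if dr > best_dr:
            best_p, best_dr = p, dr
    return best_p, best_dr

# Toy data (hypothetical).
clients = [(0.0, 0.0), (10.0, 0.0)]
facilities = [(5.0, 0.0)]
potentials = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
best_p, best_dr = sequential_scan(clients, facilities, potentials)
```

On the toy data the dr values are 5, 0 and 4, so (0, 0) is reported. The inner loop touches every client for every potential location, which is the repeated dataset access named as the drawback.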
Algorithms: Quasi-Voronoi Cell

The existing facilities surrounding a potential location
constrain its search space for IS
The Quasi-Voronoi Cell (QVC) [11]
Algorithms: Nearest Facility Circle

Constrain the search space from the clients' perspective
Nearest facility circle of a client c, NFC(c)
  $p \in NFC(c) \iff c \in IS(p)$
An R-tree on the NFCs
An R-tree on the potential locations
Synchronous traversal
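The NFC test itself is simple to illustrate. The sketch below assumes NFC(c) is the circle centered at c whose radius is the distance from c to her nearest existing facility, and it replaces the synchronous traversal of the two R-trees with a plain nested loop, so it shows only the filter-and-refine check, not the indexing. All names and data are hypothetical.

```python
import math

def nearest_facility_circle(c, facilities):
    """Return (center, radius) of the assumed NFC of client c."""
    radius = min(math.dist(c, f) for f in facilities)
    return c, radius

def influence_sets_from_nfcs(clients, facilities, potentials):
    """Map each potential location p to the clients whose NFC contains p."""
    influence = {p: [] for p in potentials}
    for c in clients:
        center, r = nearest_facility_circle(c, facilities)
        for p in potentials:
            # Filter: is p inside the bounding box of the circle?
            # (an R-tree on the NFCs would store such boxes)
            in_mbr = abs(p[0] - center[0]) <= r and abs(p[1] - center[1]) <= r
            # Refine: is p strictly inside the circle itself?
            if in_mbr and math.dist(center, p) < r:
                influence[p].append(c)
    return influence

# Toy data (hypothetical).
clients = [(0.0, 0.0), (10.0, 0.0)]
facilities = [(5.0, 0.0)]
potentials = [(1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
influence = influence_sets_from_nfcs(clients, facilities, potentials)
```

Here (1, 1) falls inside the NFC of client (0, 0), (9, 0) falls inside the NFC of client (10, 0), and (5, 5) lies inside both bounding boxes but outside both circles, so the refine step rejects it.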
Algorithms: Maximum NFC Distance

An index-reduced version of NFC
  NFC requires two R-trees to index the clients
    One for the NFCs
    The other for the clients
    Inefficient to maintain with clients coming and leaving constantly
Key insight
  Combine two R-trees together
    A single value to describe a region that encloses the NFCs
    of the clients in an R-tree node N
    The Maximum NFC Distance
Algorithms: Maximum NFC Distance

Maximum NFC Distance (MND)
  The largest distance between the points on the NFCs and
  the MBR of a node on the clients
Algorithms: Maximum NFC Distance

Efficient MND Computation
  Only requires checking four points per node
  The four candidate furthest points (CFP): Iv1, Iv2, Ih1, Ih2
  $MND(N) = \max\{dist(I, N) \mid I \in CFP(N)\}$
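One possible reading of the MND computation is sketched below. The slides do not say how the four candidate furthest points are constructed, so this sketch assumes they are the NFC points that reach furthest left, right, down and up among the clients in the node. The mapping of the names Ih1, Ih2, Iv1, Iv2 to those directions, the pruning test, and all function names are assumptions made for illustration.

```python
import math

def mindist(q, rect):
    """Smallest distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def node_mbr(node_clients):
    """MBR of the clients stored in one node."""
    xs = [c[0] for c in node_clients]
    ys = [c[1] for c in node_clients]
    return (min(xs), min(ys), max(xs), max(ys))

def candidate_furthest_points(node_clients, facilities):
    """Assumed CFP: the axis-extreme NFC points reaching furthest per direction."""
    left, right, bottom, top = [], [], [], []
    for c in node_clients:
        r = min(math.dist(c, f) for f in facilities)  # NFC radius
        left.append((c[0] - r, c[1]))
        right.append((c[0] + r, c[1]))
        bottom.append((c[0], c[1] - r))
        top.append((c[0], c[1] + r))
    ih1 = min(left, key=lambda q: q[0])
    ih2 = max(right, key=lambda q: q[0])
    iv1 = min(bottom, key=lambda q: q[1])
    iv2 = max(top, key=lambda q: q[1])
    return [iv1, iv2, ih1, ih2]

def mnd(node_clients, facilities):
    """MND(N) = max{dist(I, N) | I in CFP(N)} under the assumptions above."""
    rect = node_mbr(node_clients)
    cfp = candidate_furthest_points(node_clients, facilities)
    return max(mindist(i, rect) for i in cfp)

def node_may_be_influenced(p, node_clients, facilities):
    """Assumed pruning test: p can lie in some NFC of the node only if it is
    within MND of the node's MBR."""
    return mindist(p, node_mbr(node_clients)) <= mnd(node_clients, facilities)
```

With node clients (0, 0), (4, 0), (4, 2) and one facility at (7, 0), the NFC of client (0, 0) reaches 7 units beyond the left and bottom sides of the MBR, so this sketch yields an MND of 7. A single number per node then stands in for the separate index on the NFCs.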
Experiments: settings

Hardware
  2.66GHz Intel(R) Core(TM)2 Quad CPU, 3GB RAM
Datasets
  Synthetic datasets: Uniform, Gaussian, Zipfian

  Parameter                       Value
  Disk page size                  4KB
  Client set size                 10K, 50K, 100K, 500K, 1000K
  Existing facility set size      0.1K, 0.5K, 1K, 5K, 10K
  Potential location set size     1K, 5K, 10K, 50K, 100K
  μ; σ² (Gaussian distribution)   0; 0.125, 0.25, 0.5, 1, 2
  N; ∂ (Zipfian distribution)     1000; 0.1, 0.3, 0.6, 0.9, 1.2

  Real datasets: populated places and cultural landmarks in US
  and North America [13]
    US: |C| = 15206, |F| = 3008, |P| = 3009
    NA: |C| = 24493, |F| = 4601, |P| = 4602
Experiments: dataset cardinality
MND is as good as NFC in running time and I/O.
They both outperform SS and QVC by one order of magnitude.
Experiments: dataset cardinality
MND reduces the index size by 40% compared to NFC
Experiments: data distribution
[Figures: results on the Gaussian datasets and on the real datasets]
MND shows the best overall performance
Conclusions

A new location optimization problem
  Urban simulation
  Massively multiplayer online games
Two approaches from commonly used techniques
  Quasi-Voronoi Cell
  Nearest Facility Circle
A new approach MND
  High efficiency
  No additional index
References
[1] http://www.simcenter.org.
[2] http://connect.in.com/free-online-games-com/photos-540361-9095265.html.
[3] D. Zhang, Y. Du, T. Xia, and Y. Tao, “Progressive computation of the min-dist
optimal-location query,” in VLDB, 2006.
[4] S. Cabello, J. M. Díaz-Báñez, S. Langerman, C. Seara, and I. Ventura,
“Reverse facility location problems,” in CCCG, 2005.
[5] T. Xia, D. Zhang, E. Kanoulas, and Y. Du, “On computing top-t most influential
spatial sites,” in VLDB, 2005.
[6] Y. Du, D. Zhang, and T. Xia, “The optimal-location query,” in SSTD, 2005.
[7] Y. Gao, B. Zheng, G. Chen, and Q. Li, “Optimal-location-selection query
processing in spatial databases,” TKDE, vol. 21, pp. 1162–1177, 2009.
[8] J. Huang, Z. Wen, J. Qi, R. Zhang, J. Chen, and Z. He, “Top-k most influential
locations selection,” in CIKM, 2011.
[9] X. Xiao, B. Yao, and F. Li, “Optimal location queries in road network databases,” in
ICDE, 2011.
[10] http://www.esri.com/.
[11] I. Stanoi, M. Riedewald, D. Agrawal, and A. E. Abbadi, “Discovery of influence
sets in frequently updated databases,” in VLDB, 2001.
[12] http://www.rtreeportal.org.
Thank you!
Jianzhong Qi
jiqi@student.unimelb.edu.au