The Min-dist Location Selection Query
Jianzhong Qi, Rui Zhang, Lars Kulik, Dan Lin, Yuan Xue
University of Melbourne
8/04/2015

Outline
- Backgrounds
- Algorithms
  - Sequential Scan Algorithm
  - Quasi-Voronoi Cell
  - Nearest Facility Circle
  - Maximum NFC Distance
- Experiments
- Conclusions

Motivation
- The min-dist location selection problem
  - Problem setting: a set of facilities serving a set of clients
  - To set up a new facility, choose a location from a set of potential locations that minimizes the average distance between the clients and their nearest facilities
- Motivating applications
  - Urban planning simulations: deploy public facilities
  - Multiplayer online games: place players

Motivation: urban planning simulation
- Modeling urban dynamics [1]

Motivation: online computer games
- An online game example [2]

Problem Definition
- A set of clients, C
- A set of existing facilities, F
- A set of potential locations, P
- Select a potential location for a new facility to minimize the average distance between a client and her nearest facility

Related Work
- The min-dist optimal location problem [3]
  - A set of clients, C
  - A set of existing facilities, F
  - A candidate region, Q
  - Compute a location in Q for a new facility to minimize the average distance between a client and her nearest facility

Related Work: Location Optimization Problems

Problem    Optim. Function  Solution Space  Distance Function  Datasets
[4]        Max-inf          Continuous      L2                 C, F
[5]        Max-inf          Discrete        L2                 C, F
[6]        Max-inf          Continuous      L1                 C, F
[7]        Max-inf          Discrete        L2                 C, P
[8]        Max-inf          Discrete        L2                 C, F, P
[3]        Min-dist         Continuous      L1                 C, F
[9]        Min-dist         Continuous      Network            C, F, E
[10]       Min-dist         Discrete        L2                 C, P
Proposed   Min-dist         Discrete        L2                 C, F, P

Algorithms: Problem Redefinition
- Larger distance reduction $\Rightarrow$ smaller average client-facility distance
- The influence set of p, IS(p):
  $c \in IS(p) \Leftrightarrow dist(c, p) < dist(c, \text{c's nearest existing facility})$
  (Figure: the influence sets IS(p1) and IS(p2))
- The distance reduction of p, dr(p):
  $dr(p) = \sum_{c \in IS(p)} \big( dist(c, \text{c's nearest existing facility}) - dist(c, p) \big)$

Algorithms: Sequential Scan
- Sequential Scan Algorithm
  - Sequentially check all the potential locations
  - For every potential location p, sequentially check all the clients and compute IS(p) and dr(p)
  - Report the potential location with the largest dr value
- Drawback: repeated dataset accesses
- Key algorithm design considerations
  - Restrict the search space for IS(p)
  - Share the computation for determining the influence sets of multiple potential locations

Algorithms: Quasi-Voronoi Cell
- The existing facilities surrounding a potential location constrain its search space for IS
- The Quasi-Voronoi Cell (QVC) [11]

Algorithms: Nearest Facility Circle
- Constrain the search space from the clients' perspective
- Nearest facility circle of a client c, NFC(c):
  $p \in NFC(c) \Leftrightarrow c \in IS(p)$
- An R-tree on the NFCs
- An R-tree on the potential locations
- Synchronous traversal

Algorithms: Maximum NFC Distance
- An index-reduced version of NFC
- NFC requires two R-trees to index the clients
  - One for the NFCs
  - The other for the clients
  - Inefficient to maintain with clients coming and leaving constantly
- Key insight
  - Combine the two R-trees together
  - The Maximum NFC Distance: a single value to describe a region that encloses the NFCs of the clients in an R-tree node N

Algorithms: Maximum NFC Distance
- Maximum NFC Distance (MND)
  - The largest distance between the points on the NFCs and the MBR of a node of the R-tree on the clients

Algorithms: Maximum NFC Distance
- Efficient MND computation
  - Only requires checking four points per node
  - The four candidate furthest points (CFP): Iv1, Iv2, Ih1, Ih2
  $MND(N) = \max\{ dist(I, N) \mid I \in CFP(N) \}$
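The definitions of IS(p), dr(p), and NFC(c) above can be illustrated with a minimal brute-force sketch. It is not the implementation from the talk: the function names are hypothetical, points are assumed to be 2D tuples under L2 distance, and the R-tree indexes and synchronous traversal are omitted entirely. `sequential_scan` rescans all clients for every potential location, while `nfc_scan` takes the clients' perspective, computing each client's NFC radius once and crediting every potential location that falls inside it.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nfc_radius(c: Point, facilities: Sequence[Point]) -> float:
    """Radius of NFC(c): distance from c to its nearest existing facility."""
    return min(dist(c, f) for f in facilities)


def distance_reduction(p: Point, clients: Sequence[Point],
                       facilities: Sequence[Point]) -> float:
    """dr(p): sum over c in IS(p) of dist(c, nearest facility) - dist(c, p)."""
    total = 0.0
    for c in clients:
        r = nfc_radius(c, facilities)
        d = dist(c, p)
        if d < r:  # c is in IS(p): p is closer than c's nearest facility
            total += r - d
    return total


def sequential_scan(clients: Sequence[Point], facilities: Sequence[Point],
                    potentials: Sequence[Point]) -> Point:
    """Check every potential location in turn; report the largest dr."""
    return max(potentials,
               key=lambda p: distance_reduction(p, clients, facilities))


def nfc_scan(clients: Sequence[Point], facilities: Sequence[Point],
             potentials: Sequence[Point]) -> Point:
    """Client-centric variant: each NFC is computed once and shared
    across all potential locations (no index; assumption of this sketch)."""
    dr: List[float] = [0.0] * len(potentials)
    for c in clients:
        r = nfc_radius(c, facilities)
        for i, p in enumerate(potentials):
            d = dist(c, p)
            if d < r:  # p lies inside NFC(c), so c is in IS(p)
                dr[i] += r - d
    best = max(range(len(potentials)), key=dr.__getitem__)
    return potentials[best]


if __name__ == "__main__":
    clients = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    facilities = [(5.0, 0.0)]
    potentials = [(0.5, 0.0), (9.0, 0.0)]
    print(sequential_scan(clients, facilities, potentials))
    print(nfc_scan(clients, facilities, potentials))
```

Both functions select the same location; the difference is only in how often the nearest-facility distances are recomputed, which is the cost the QVC, NFC, and MND methods aim to cut with indexes.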

Experiments: settings
- Hardware: 2.66GHz Intel(R) Core(TM)2 Quad CPU, 3GB RAM
- Datasets
  - Synthetic datasets: Uniform, Gaussian, Zipfian

    Parameter                          Value
    Disk page size                     4KB
    Client set size                    10K, 50K, 100K, 500K, 1000K
    Existing facility set size         0.1K, 0.5K, 1K, 5K, 10K
    Potential location set size        1K, 5K, 10K, 50K, 100K
    μ; σ² (Gaussian distribution)      0; 0.125, 0.25, 0.5, 1, 2
    N; ∂ (Zipfian distribution)        1000; 0.1, 0.3, 0.6, 0.9, 1.2

  - Real datasets: populated places and cultural landmarks in the US and North America [13]
    - US: |C| = 15206, |F| = 3008, |P| = 3009
    - NA: |C| = 24493, |F| = 4601, |P| = 4602

Experiments: dataset cardinality
- MND is as good as NFC in running time and I/O
- Both outperform SS and QVC by one order of magnitude

Experiments: dataset cardinality
- MND reduces index size by 40% compared to NFC

Experiments: data distribution
(Figure panels: Gaussian, Real)
- MND shows the best overall performance

Conclusions
- A new location optimization problem
  - Urban simulation
  - Massively multiplayer online games
- Two approaches from commonly used techniques
  - Quasi-Voronoi Cell
  - Nearest Facility Circle
- A new approach: MND
  - High efficiency
  - No additional index

References
[1] http://www.simcenter.org.
[2] http://connect.in.com/free-online-games-com/photos-540361-9095265.html.
[3] D. Zhang, Y. Du, T. Xia, and Y. Tao, “Progressive computation of the min-dist optimal-location query,” in VLDB, 2006.
[4] S. Cabello, J. M. Díaz-Báñez, S. Langerman, C. Seara, and I. Ventura, “Reverse facility location problems,” in CCCG, 2005.
[5] T. Xia, D. Zhang, E. Kanoulas, and Y. Du, “On computing top-t most influential spatial sites,” in VLDB, 2005.
[6] Y. Du, D. Zhang, and T. Xia, “The optimal-location query,” in SSTD, 2005.
[7] Y. Gao, B. Zheng, G. Chen, and Q. Li, “Optimal-location-selection query processing in spatial databases,” TKDE, vol. 21, pp. 1162–1177, 2009.
[8] J. Huang, Z. Wen, J. Qi, R. Zhang, J. Chen, and Z. He, “Top-k most influential locations selection,” in CIKM, 2011.
[9] X. Xiao, B. Yao, and F. Li, “Optimal location queries in road network databases,” in ICDE, 2011.
[10] http://www.esri.com/.
[11] I. Stanoi, M. Riedewald, D. Agrawal, and A. E. Abbadi, “Discovery of influence sets in frequently updated databases,” in VLDB, 2001.
[12] http://www.rtreeportal.org.

Thank you!
Jianzhong Qi
jiqi@student.unimelb.edu.au