Efficient Processing of Top-k Spatial Preference Queries
João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg
www.ntnu.no VLDB’ 2011 - Seattle, USA

Outline
• Top-k spatial preference queries
• Current approaches
• Our approach
  – Mapping to distance-score space
  – Query processing
  – Materialization (index construction)
• Experimental evaluation
• Conclusion

Motivation
• Increasing number of Web information systems specialized in location-based queries
• Systems are limited to simple spatial queries
  – Example: return objects in a given spatial location
• Top-k spatial preference query
  – Ranks data objects based on the score of feature objects in their spatial neighborhood
  – Combines spatial and non-spatial scores

Top-k spatial preference queries
• Given a set of data objects and scored feature objects
• Query
  – Spatial neighborhood
  – Features of interest (e.g., bars)
• Returns
  – Ranked set of k best data objects
• Score of a data object
  – Obtained from feature objects in its spatial neighborhood
[Figure: hotels p1, p2, p3, bars b1(0.9), b2(0.6), b3(0.3), and cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8) in the x-y plane, with Top-1 markers.]

Score function
• Aggregation of partial scores
  – Any monotone function: sum, max, and min
• Partial score
  – Score of a data object for a set of feature objects
  – Defined by the score of a single feature object
    • Highest score
    • Satisfies the spatial constraint
• Spatial constraint
  – Range, nearest neighbor, and influence

Example (agg=sum)
• Range: score(p)=1.5
• Nearest neighbor: score(p)=1.0
• Influence: score(p)=0.6

Current approaches
• Naïve
  – Compute the score of all objects, select the top-k
  – Very costly
• State-of-the-art [1,2]
  – Data objects and feature objects are indexed by multi-dimensional indices

[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.:
    “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.

Current approaches
• Probing algorithms (SP and GP)
  – Require computing the score for all objects
• Branch and bound algorithms (BB and BB*)
  – Compute an upper-bound score for the entries in the data objects R-tree
  – Prune entries whose upper-bound score is smaller than the score of the k-th object found
• Feature join algorithm (FJ)
  – Create combinations of feature sets with high score
  – Combinations whose score is smaller than the score of the k-th object found are pruned

Motivation behind our idea…
• Few feature objects are necessary to compute the score of a data object
  – Features not dominated by any other feature in terms of both distance and score
• Nice properties
  – Small size in practice
  – Sufficient to support any neighborhood condition and query parameter
[Figure: hotel p1 and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8) in the x-y plane.]

Our framework
• Mapping to distance-score space
  – Pairs of objects (p, t) with t ∈ Fi to be examined
• Identify SKY(p, Fi)
  – Minimum set of pairs required to compute the score of p according to Fi for any query
• Materialize SKY(p, Fi)
  – Stored in an R-tree, one R-tree Ri per feature set Fi
  – Efficient query processing and maintenance
• Query processing algorithm

Mapping to the distance-score space
• Mapping
  – Pairs (object, feature)
  – Space [distance × score]
• Skyline
  – Minimize: distance
  – Maximize: score
[Figure: hotels p1, p2 and cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); the pairs (p1,c) and (p2,c) are plotted as points in the distance-score space.]

Theoretical properties
• SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – Maintaining SKY(p, Fi) is
    sufficient to answer any spatial preference query (stored in an R-tree)
• SKY(p, Fi) is the minimum set required
  – The data required to process range queries permits processing nn and influence queries
• The proofs of the theorems can be found in the paper

Access to partial scores
• Only node entries that satisfy the spatial constraint are accessed
  – Items are retrieved in decreasing order of score
• Minor modifications to support nn and influence
[Figure: R-tree whose root holds entries e1 and e2, leaf entries of (object, feature) pairs, and a max-heap of items ordered by score.]

Query processing
• Compute top-k data objects progressively aggregating partial scores retrieved from Ri
  – Similar to Fagin’s algorithm (NRA)
• Algorithm
  – Each time an object p is retrieved from Ri, any unseen object p’ in Ri has a score(p’) ≤ score(p)
  – Keep track of lower and upper-bound score of the seen objects
  – Terminates when the lower-bound of the k-th object is better than the upper-bound of the remaining objects

Example (range, r=4.5)
• R1 (restaurant) returns p3(0.8), R2 (bar) returns p1(0.9): 0.8 + 0.9 = 1.7

Object   R1    R2    Score   Upper-bound
p3       0.8   -     0.8     1.7
p1       -     0.9   0.9     1.7

Example (range, r=4.5)
• R1 returns p2(0.6), R2 returns p2(0.6): 0.6 + 0.6 = 1.2

Object   R1    R2    Score   Upper-bound
p3       0.8   -     0.8     1.4
p1       -     0.9   0.9     1.5
p2       0.6   0.6   1.2     1.2

Example (range, r=4.5)
• R1 returns p1(0.2), R2 returns p3(0.3): 0.2 + 0.3 = 0.5

Object   R1    R2    Score   Upper-bound
p3       0.8   0.3   1.1     1.1
p1       0.2   0.9   1.1     1.1
p2       0.6   0.6   1.2     1.2          Top-1

Materialization
• Objects are partitioned into regions
  – The distance among objects in the same region is small
  – The skyline set of the objects in the same region is similar with high probability
• Compute SKY(R, Fi) for the region R
  – SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
• Advantage
  – The feature set is accessed only once to compute the dynamic
    skyline of all objects in the region

Experimental evaluation
• We compare our approach (SFA) against SP, GP, BB, BB*, and FJ algorithms [1,2]
• All approaches are implemented in Java
• Measures: response time, I/O, update time, index construction time, and index size

Variables studied
• Data distribution
  – Uniform (UN), Synthetic (CN), Real (RL)
• Cardinality (objects and features)
  – 50K, 100K, 200K, 400K, 800K, 1600K
• Number of results (k)
  – 10, 20, 30, 40, 50
• Number of feature sets
  – 1, 2, 3, 4, 5
• Query range (r), for range and influence queries
  – 10, 40, 160, 640, 2560

Datasets

Dataset          Number of data objects   Number of feature objects   Dynamic skyline set
Wal-Mart (WM)    11K                      4K                          1.98
Hotels (HT)      11K                      31K                         4.82
Synthetic (CN)   100K                     100K                        11.26
Uniform (UN)     100K                     100K                        12.04

Number of features
[Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets.]

Scalability
[Figure: a) response time varying |Fi|; b) response time varying |O|.]

Real datasets
[Figure: a) range; b) influence; c) nearest neighbor.]

Conclusion
• Top-k spatial preference queries are a useful tool for novel location-based applications
• We propose a new approach for processing top-k spatial preference queries efficiently
  – We find and materialize SKY(p, Fi)
  – We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – The size of SKY(p, Fi) is small in practice
• We propose algorithms to process queries using our index
• The efficiency of our approach is
  verified through experiments on synthetic and real datasets

Thanks!
More information:
João B. Rocha-Junior
joao@idi.ntnu.no
http://www.idi.ntnu.no/~joao
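
Appendix sketch: SKY(p, Fi) in the distance-score space

The framework slides map every pair (p, t) to a point (distance, score) and keep only the pairs that no other pair dominates. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the function names, the café coordinates, the tie-breaking rule, and the default score of 0.0 when no feature qualifies are all hypothetical. Only the café scores are taken from the "Motivation behind our idea" figure. It checks, for a few radii, that the range and nearest-neighbor partial scores computed from the skyline match those computed from all features.

```python
from math import dist


def skyline(p, features):
    """Return the (distance, score) pairs of p that no other pair dominates.

    A pair dominates another when it is at least as near and scores at
    least as high, and is strictly better on one of the two.
    """
    pairs = sorted(
        ((dist(p, loc), score) for loc, score in features),
        key=lambda ds: (ds[0], -ds[1]),  # nearest first, best score first on ties
    )
    result, best = [], float("-inf")
    for d, s in pairs:
        if s > best:  # no earlier (nearer or equally near) pair scores as high
            result.append((d, s))
            best = s
    return result


def range_score(pairs, r):
    """Highest score among pairs within distance r (assumed 0.0 if none)."""
    return max((s for d, s in pairs if d <= r), default=0.0)


def nn_score(pairs):
    """Score of the nearest pair; distance ties go to the higher score."""
    return min(pairs, key=lambda ds: (ds[0], -ds[1]))[1] if pairs else 0.0


# Hypothetical layout: hotel p at the origin, five cafés along the x axis.
p = (0.0, 0.0)
cafes = [((1, 0), 0.5), ((3, 0), 0.6), ((2, 0), 0.2), ((4, 0), 0.4), ((5, 0), 0.8)]

all_pairs = [(dist(p, loc), s) for loc, s in cafes]
sky = skyline(p, cafes)

print(sky)
print(range_score(sky, 4.5), range_score(all_pairs, 4.5))
print(nn_score(sky), nn_score(all_pairs))
```

In this toy layout three of the five cafés survive, which is the kind of reduction the "small size in practice" bullet refers to. The influence constraint is left out because the slides do not state its formula.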
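
Appendix sketch: progressive top-k aggregation

The query processing slides describe an NRA-style loop: read partial scores from each Ri in decreasing order, track lower and upper bounds for the seen objects, and stop once the k-th lower bound cannot be beaten. The Python sketch below is one possible reading of those bullets, not the authors' algorithm. It assumes sum aggregation, non-negative scores, and plain sorted lists standing in for the R-trees Ri; the name top_k and the round-robin access order are hypothetical. The input lists reproduce the partial scores of the "Example (range, r=4.5)" slides.

```python
def top_k(lists, k):
    """NRA-style top-k over per-feature-set lists, aggregating with sum.

    lists[i] holds (object, partial score) in decreasing score order,
    standing in for sorted access to Ri. Scores are assumed to be >= 0,
    and at least one list must be given.
    """
    m = len(lists)
    seen = {}            # object -> {list index: partial score}
    last = [0.0] * m     # last score read from each list
    lower, ranked = {}, []
    for depth in range(max(len(lst) for lst in lists)):
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, s = lst[depth]
                seen.setdefault(obj, {})[i] = s
                last[i] = s
            else:
                last[i] = 0.0  # list exhausted: nothing unseen remains in it
        # Lower bound: what was read so far. Upper bound: fill each missing
        # partial score with the last score read from that list.
        lower = {o: sum(ps.values()) for o, ps in seen.items()}
        upper = {
            o: sum(ps.get(i, last[i]) for i in range(m)) for o, ps in seen.items()
        }
        ranked = sorted(lower, key=lower.get, reverse=True)
        if len(ranked) >= k:
            kth = lower[ranked[k - 1]]
            # Bounds of the other seen objects, plus the bound of any unseen one.
            others = [upper[o] for o in ranked[k:]] + [sum(last)]
            if kth >= max(others):
                break
    # Returned scores are lower bounds; they are exact for objects whose
    # partial scores were all read.
    return [(o, lower[o]) for o in ranked[:k]]


# Partial scores from the example slides (R1: restaurant, R2: bar).
r1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
r2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]

print(top_k([r1, r2], 1))
```

With these inputs the loop cannot stop after the second round, because p1 and p3 still have upper bounds of 1.5 and 1.4 against p2's score of 1.2. It stops after the third round, in line with the three example slides.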