GEOMETRY APPROACH FOR K-REGRET QUERY
ICDE 2014
PENG PENG, RAYMOND CHI-WING WONG
CSE, HKUST

OUTLINE
1. Introduction
2. Contributions
3. Preliminary
4. Related Work
5. Geometry Property
6. Algorithm
7. Experiment
8. Conclusion

1. INTRODUCTION
Multi-criteria Decision Making:
• Design a query which returns a number of "interesting" objects to the user.
Traditional queries:
• Top-k queries
• Skyline queries

1. INTRODUCTION
Top-k queries
• Utility function f(·): [0,1]^d → [0,1].
• Given a particular utility function f, the utility of every point in D can be computed.
• The output is a set of k points with the highest utilities.
Skyline queries
• No utility function is required.
• A point is a skyline point if it is not dominated by any other point in the dataset.
• Assume that a greater value in an attribute is more preferable.
• We say that q is dominated by p if and only if q[i] ≤ p[i] for each i ∈ [1, d] and there exists an i ∈ [1, d] such that q[i] < p[i].
• The output is the set of skyline points.

LIMITATIONS OF TRADITIONAL QUERIES
Traditional queries
• Top-k queries
  • Advantage: the output size is given by the user and is controllable.
  • Disadvantage: the utility function is assumed to be known.
• Skyline queries
  • Advantage: no assumption that the utility function is known.
  • Disadvantage: the output size cannot be controlled.
Recently proposed query (VLDB 2010)
• K-regret queries
  • Advantage: there is no assumption that the utility function is known, and the output size is given by the user and is controllable.

2. CONTRIBUTIONS
• We give some theoretical properties of k-regret queries.
• We give a geometry explanation of a k-regret query.
• We define happy points, the candidate points for the k-regret query.
  • Significance: all existing algorithms and new algorithms to be developed for the k-regret query can use our happy points to find the solution of the k-regret query more efficiently and more effectively.
• We propose two algorithms for answering a k-regret query:
  • GeoGreedy algorithm
  • StoredList algorithm
• We conduct comprehensive experimental studies.

3. PRELIMINARY
Notations in k-regret queries
We have D = {p1, p2, p3, p4}. Let S be a two-point subset of D, used as the running example below.
• Utility function f(·): [0,1]^d → [0,1].
• f(0.5,0.5) is an example, where f(0.5,0.5)(p) = 0.5 · MPG + 0.5 · HP.
• Consider 3 utility functions, namely f(0.3,0.7), f(0.5,0.5) and f(0.7,0.3).
• F = {f(0.3,0.7), f(0.5,0.5), f(0.7,0.3)}.
• Maximum utility: maxUtil(S, f) = max_{p ∈ S} f(p).
  • maxUtil(S, f(0.5,0.5)) = f(0.5,0.5)(p2) = 0.845,
  • maxUtil(D, f(0.5,0.5)) = f(0.5,0.5)(p1) = 0.870.

3. PRELIMINARY
Notations in k-regret queries
• Regret ratio: rr(S, f) = 1 - maxUtil(S, f) / maxUtil(D, f).
  • It measures how bad a user with utility function f feels after receiving the output S. If it is 1, the user feels bad; if it is 0, the user feels happy.
  • maxUtil(S, f(0.3,0.7)) = 0.901, maxUtil(D, f(0.3,0.7)) = 0.901, so rr(S, f(0.3,0.7)) = 1 - 0.901/0.901 = 0;
  • maxUtil(S, f(0.5,0.5)) = 0.845, maxUtil(D, f(0.5,0.5)) = 0.870, so rr(S, f(0.5,0.5)) = 1 - 0.845/0.870 = 0.029;
  • maxUtil(S, f(0.7,0.3)) = 0.811, maxUtil(D, f(0.7,0.3)) = 0.916, so rr(S, f(0.7,0.3)) = 1 - 0.811/0.916 = 0.115.
• Maximum regret ratio: mrr(S) = max_{f ∈ F} rr(S, f).
  • It measures how bad a user feels after receiving the output S; a user feels better when mrr(S) is smaller.
  • mrr(S) = max{0, 0.029, 0.115} = 0.115.
(A small code sketch of these definitions follows.)
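SKETCH: COMPUTING maxUtil, rr AND mrr
To make the notation above concrete, here is a minimal Python sketch of maxUtil, rr and mrr for a finite set F of linear utility functions. The function names (utility, max_util, regret_ratio, max_regret_ratio) and the 2-d points in the usage example are illustrative assumptions of mine; they are not from the paper, and the example data is not the data behind the 0.845/0.870/0.901 values on the slides.

```python
from typing import Sequence

Point = Sequence[float]

def utility(weights: Point, p: Point) -> float:
    """Linear utility f_w(p) = sum_i weights[i] * p[i]."""
    return sum(w * x for w, x in zip(weights, p))

def max_util(points: Sequence[Point], weights: Point) -> float:
    """maxUtil(S, f) = max over p in S of f(p)."""
    return max(utility(weights, p) for p in points)

def regret_ratio(D: Sequence[Point], S: Sequence[Point], weights: Point) -> float:
    """rr(S, f) = 1 - maxUtil(S, f) / maxUtil(D, f)."""
    return 1.0 - max_util(S, weights) / max_util(D, weights)

def max_regret_ratio(D: Sequence[Point], S: Sequence[Point], F: Sequence[Point]) -> float:
    """mrr(S) = max over f in F of rr(S, f)."""
    return max(regret_ratio(D, S, w) for w in F)

if __name__ == "__main__":
    # Hypothetical 2-d dataset (MPG, HP), normalized to [0, 1].
    D = [(0.90, 0.60), (0.50, 0.95), (0.70, 0.80), (0.30, 0.40)]
    S = [D[1], D[2]]
    F = [(0.3, 0.7), (0.5, 0.5), (0.7, 0.3)]
    print(max_regret_ratio(D, S, F))
```

With a finite F this is exact; for the full, infinite class of linear utility functions the geometric computation introduced below is needed instead.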
3. PRELIMINARY
Problem Definition
• Given a d-dimensional database D of size n and an integer k, a k-regret query is to find a set S containing at most k points such that mrr(S) is minimized.
• Let r_opt be the maximum regret ratio of the optimal solution.
Example
• Given a set of points p1, p2, p3, p4, each of which is represented as a 2-dimensional vector.
• A 2-regret query on these 4 points selects 2 points among p1, p2, p3, p4 as the output such that the maximum regret ratio of the selected points is the minimum over all possible selections.

4. RELATED WORK
Variations of top-k queries
• Personalized top-k queries (Information Systems 2009)
  - Partial information about the utility function is assumed to be known.
• Diversified top-k queries (SIGMOD 2012)
  - The utility function is assumed to be known.
  - No assumption on the utility function is made for a k-regret query.
Variations of skyline queries
• Representative skyline queries (ICDE 2009)
  - The importance of a skyline point changes when the data is contaminated.
• K-dominating skyline queries (ICDE 2007)
  - The importance of a skyline point changes when the data is contaminated.
  - We do not need to consider the importance of a skyline point in a k-regret query.
Hybrid queries
• Top-k skyline queries (OTM 2005)
  - The importance of a skyline point changes when the data is contaminated.
• ε-skyline queries (ICDE 2008)
  - No bound is guaranteed and it is unknown how to choose ε.
  - The maximum regret ratio used in a k-regret query is bounded.

4. RELATED WORK
K-regret queries
• Regret-Minimizing Representative Databases (VLDB 2010)
  • First proposed the k-regret query;
  • Proved a worst-case upper bound and a lower bound for the maximum regret ratio of a k-regret query;
  • Proposed the best-known fastest algorithm for answering a k-regret query.
• Interactive Regret Minimization (SIGMOD 2012)
  • Proposed an interactive version of the k-regret query and an algorithm to answer a k-regret query.
• Computing k-Regret Minimizing Sets (VLDB 2014)
  • Proved the NP-completeness of a k-regret query;
  • Defined a new k-regret minimizing set query and proposed two algorithms to answer this new query.

5. GEOMETRY PROPERTY
• Geometry explanation of the maximum regret ratio mrr(S) given an output set S
• Happy points and their properties

GEOMETRY EXPLANATION OF mrr(S)
• Maximum regret ratio: mrr(S) = max_{f ∈ F} rr(S, f).
How to compute mrr(S) given an output set S?
• The function space F can be infinite.
• The method used in "Regret-Minimizing Representative Databases" (VLDB 2010): linear programming.
• It is time-consuming when we have to call linear programming independently for different S's.

GEOMETRY EXPLANATION OF mrr(S)
• Maximum regret ratio: mrr(S) = max_{f ∈ F} rr(S, f).
We compute mrr(S) with a geometry method.
• Straightforward and easily understood;
• Saves time in computing mrr(S).

AN EXAMPLE IN 2-D
Conv(D), where D = {p1, p2, p3, p4, p5, p6}.
[Figure: the points p1-p6 and their convex hull in the unit square.]

AN EXAMPLE IN 2-D
Conv(S), where S = {p1, p2}.
[Figure: the convex hull of p1 and p2 in the unit square.]

GEOMETRY EXPLANATION OF mrr(S)
Critical ratio
• The S-critical point of p, denoted by p', is defined as the intersection between the vector Op and the surface of Conv(S).
• Example: if the facet of Conv(S) crossed by Op lies on the hyperplane 0.8x + 0.7y = 1 and p = (0.67, 0.82), then p' = (0.6, 0.74).
• Critical ratio: cr(p, S) = |Op'| / |Op| (about 0.9 in the example above).

GEOMETRY EXPLANATION OF mrr(S)
Lemma 0:
• mrr(S) = max_{p ∈ D} (1 - cr(p, S)).
• According to the lemma shown above, we first compute cr(p, S) for each p which is outside Conv(S), and then find the greatest value of 1 - cr(p, S), which is the maximum regret ratio of S.

AN EXAMPLE IN 2-D
Suppose that k = 2 and the output set is S = {p1, p3}.
• cr(p5, S) = |Op5'| / |Op5|, cr(p2, S) = |Op2'| / |Op2|, cr(p6, S) = |Op6'| / |Op6|.
So, mrr(S) = max_{p ∈ D} (1 - cr(p, S)) = max{1 - cr(p5, S), 1 - cr(p2, S), 1 - cr(p6, S)}.
[Figure: the critical points p2', p5', p6' on the surface of the hull along the rays Op2, Op5, Op6.]
(A code sketch of this computation follows.)
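SKETCH: CRITICAL RATIO AND LEMMA 0
A sketch of how cr(p, S) and Lemma 0 could be computed, assuming SciPy's ConvexHull. Following the figures, the hull is built on S together with the corner points VC and, for convenience in the ray computation, the origin O; adding O is a simplification of mine and should not affect the upper facets that determine cr. The exit point of the ray Op is read off the hull's facet equations. Function names are illustrative, not the paper's routines.

```python
import numpy as np
from scipy.spatial import ConvexHull

def critical_ratio(p, S):
    """cr(p, S) = |Op'| / |Op|, where p' is the intersection of the ray Op with
    the boundary of the convex hull of S, the corner points VC = {e_1, ..., e_d}
    and the origin O.  Equals the factor t such that t * p lies on the boundary."""
    p = np.asarray(p, dtype=float)
    d = len(p)
    pts = np.vstack([np.asarray(S, dtype=float), np.eye(d), np.zeros(d)])
    hull = ConvexHull(pts)
    # Facet i satisfies normals[i] . x + offsets[i] <= 0 for points inside the hull.
    normals, offsets = hull.equations[:, :-1], hull.equations[:, -1]
    proj = normals @ p
    mask = proj > 1e-12          # only facets the ray Op can exit through
    return float(np.min(-offsets[mask] / proj[mask]))

def mrr_geometric(D, S):
    """Lemma 0: mrr(S) = max over p in D of (1 - cr(p, S)).
    Points inside the hull have cr >= 1 and contribute no regret."""
    return max(0.0, max(1.0 - critical_ratio(p, S) for p in D))
```

For the slide's example, p = (0.67, 0.82) and the facet 0.8x + 0.7y = 1 give t = 1 / (0.8 · 0.67 + 0.7 · 0.82), about 0.90, matching p' = (0.6, 0.74).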
HAPPY POINT
The set VC is defined as a set of d-dimensional points of size d where, for each point vc_i and each j ∈ [1, d], we have vc_i[j] = 1 when j = i and vc_i[j] = 0 when j ≠ i.
In a 2-dimensional space, VC = {vc1, vc2}, where vc1 = (1, 0) and vc2 = (0, 1).
[Figure: the points p1-p6 together with vc1 and vc2.]

HAPPY POINT
In the following, we give an example of Conv({p} ∪ VC) in a 2-dimensional case.
Example: Conv({p} ∪ VC) for a single point p of D.
[Figure: the convex hull formed by one point of D together with vc1 and vc2.]

HAPPY POINT
Definition of domination:
• We say that q is dominated by p if and only if q[i] ≤ p[i] for each i ∈ [1, d] and there exists an i ∈ [1, d] such that q[i] < p[i].
Definition of subjugation:
• We say that q is subjugated by p if and only if q is on or below all the hyperplanes containing the faces of Conv({p} ∪ VC) and q is below at least one hyperplane containing a face of Conv({p} ∪ VC).
• Equivalently, q is subjugated by p if and only if f(q) ≤ f(p) for each f ∈ F and there exists an f ∈ F such that f(q) < f(p).

AN EXAMPLE IN 2-D
• p2 subjugates p4 because p4 is below both the line p2vc1 and the line p2vc2.
• p2 does not subjugate p3 because p3 is above the line p2vc2.
[Figure: p1-p6 with the lines p2vc1 and p2vc2.]

HAPPY POINT
Lemma 1:
• The optimal solution of a k-regret query may contain a point which cannot be found in D_conv.
Example:
• In the example shown below, the optimal solution of a 3-regret query is {p5, p6, p2}, where p2 is not a point in D_conv.
[Figure: a 2-d dataset illustrating Lemma 1.]

AN EXAMPLE IN 2-D
Lemma 2:
• D_conv ⊂ D_happy ⊂ D_skyline
  (D_skyline: the skyline points of D; D_happy: the happy points of D; D_conv: the points of D on the boundary of the convex hull.)
Example:
• D_skyline = {p1, p2, p3, p4, p5, p6}
• D_happy = {p1, p2, p3, p5, p6}
• D_conv = {p1, p2, p5, p6}
[Figure: p1-p6 together with vc1 and vc2.]

HAPPY POINT
All existing studies use D_skyline as the candidate points for the k-regret query.
Lemma 3:
• Let r_opt be the maximum regret ratio of the optimal solution. Then there exists an optimal solution of a k-regret query which is a subset of D_happy when r_opt < 1/2.
• Based on Lemma 3, we compute the solution based on D_happy instead of D_skyline.

6. ALGORITHM
Geometry Greedy algorithm (GeoGreedy)
• Pick d boundary points of the dataset D of size n and insert them into an output set;
• Repeatedly compute the regret ratio for each point which is outside the convex hull constructed from the up-to-date output set, and add the point which currently achieves the maximum regret ratio to the output set;
• The algorithm stops when the output size is k or all the points in D_conv are selected.
Stored List algorithm (StoredList)
• Preprocessing step:
  • Call the GeoGreedy algorithm to return the output of an n-regret query;
  • Store the points of the output set in a list in the order in which they are selected.
• Query step:
  • Return the first k points of the list as the output of a k-regret query.
(A Python sketch of the GeoGreedy loop follows below.)
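SKETCH: THE GEOGREEDY LOOP
A minimal sketch of the GeoGreedy procedure described above, reusing critical_ratio from the geometry sketch. The slides do not spell out which d boundary points start the set or how ties are broken; here the per-dimension maxima are used as the boundary points and the first best point wins, which are assumptions of mine, so treat this as a sketch of the idea rather than the paper's exact algorithm. It assumes k >= d.

```python
import numpy as np
# Assumes critical_ratio() from the geometry sketch above is in scope.

def geo_greedy(D, k):
    """Greedily build an output set: start from d boundary points, then keep
    adding the point outside the current hull with the largest regret
    1 - cr(p, S), until |S| = k or no point remains outside the hull."""
    D = np.asarray(D, dtype=float)
    n, d = D.shape
    # Boundary points: the point with the largest value in each dimension
    # (an assumption; the slides only say "d boundary points").
    S_idx = list({int(np.argmax(D[:, j])) for j in range(d)})
    while len(S_idx) < k:
        S = D[S_idx]
        best_i, best_regret = None, 0.0
        for i in range(n):
            if i in S_idx:
                continue
            regret = 1.0 - critical_ratio(D[i], S)   # > 0 only outside the hull
            if regret > best_regret + 1e-12:
                best_i, best_regret = i, regret
        if best_i is None:       # every remaining point lies inside the hull
            break
        S_idx.append(best_i)
    return D[S_idx]              # points in the order they were selected
```

In practice the candidate loop would run over D_happy (or D_skyline) rather than the whole of D, which is exactly the pruning the happy points provide.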
7. EXPERIMENT
Datasets
• Experiments on synthetic datasets
• Experiments on real datasets:
  • Household dataset: n = 903077, d = 6
  • NBA dataset: n = 21962, d = 5
  • Color dataset: n = 68040, d = 9
  • Stocks dataset: n = 122574, d = 5
Algorithms:
• Greedy algorithm (VLDB 2010)
• GeoGreedy algorithm
• StoredList algorithm
Measurements:
• The maximum regret ratio
• The query time

7. EXPERIMENT
Experiments
• Relationship among D_conv, D_happy and D_skyline
• Effect of happy points
• Performance of our method

RELATIONSHIP AMONG D_conv, D_happy AND D_skyline
Number of points in each candidate set:

Dataset      D_conv    D_happy    D_skyline
Household       926       1332        9832
NBA              65         75         447
Color           124        151        1023
Stocks          396        449        3042

EFFECT OF HAPPY POINTS
Household: maximum regret ratio
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
Household: query time
[Charts: the result based on D_skyline vs. the result based on D_happy.]

PERFORMANCE OF OUR METHOD
Experiments on synthetic datasets
• Maximum regret ratio
[Charts: effect of n; effect of d.]

PERFORMANCE OF OUR METHOD
Experiments on synthetic datasets
• Query time
[Charts: effect of n; effect of d.]

PERFORMANCE OF OUR METHOD
Experiments on synthetic datasets
• Maximum regret ratio
[Charts: effect of k; effect of large k.]

PERFORMANCE OF OUR METHOD
Experiments on synthetic datasets
• Query time
[Charts: effect of k; effect of large k.]

8. CONCLUSION
• We studied the k-regret query in this paper.
• We proposed the set of happy points, a set of candidate points for the k-regret query which is much smaller than the set of skyline points, for finding the solution of the k-regret query more efficiently and effectively.
• We conducted experiments on both synthetic and real datasets.
• Future directions:
  • Average regret ratio minimization
  • Interactive version of the k-regret query

THANK YOU!

GEOGREEDY ALGORITHM
[Figure: pseudocode of the GeoGreedy algorithm.]

GEOGREEDY ALGORITHM
An example in 2-d: in the following, we compute a 4-regret query using the GeoGreedy algorithm.
[Figure: the points p1-p6 in the unit square.]

GEOGREEDY ALGORITHM
Lines 2-4:
• S = {p5, p6}.
[Figure: the hull after the initialization step.]

GEOGREEDY ALGORITHM
Lines 2-4:
• S = {p5, p6}.
Lines 5-10 (Iteration 1):
• Since cr(p2, S) > cr(p1, S) and cr(p1, S) < 1, we add p1 to S.
[Figure: the critical points p1' and p2' with respect to the current hull.]

GEOGREEDY ALGORITHM
Lines 5-10 (Iteration 2):
• After Iteration 1, S = {p1, p5, p6}.
• We can only compute cr(p2, S), which is less than 1, so we add p2 to S.
[Figure: the critical point p2' with respect to the updated hull.]

STOREDLIST ALGORITHM
Stored List algorithm
• Pre-compute the outputs of the GeoGreedy algorithm for k ∈ [1, n].
• The output with a smaller size is a subset of the output with a larger size.
• Store the output of the largest size in a list based on the order of selection.

STOREDLIST ALGORITHM
After two iterations of the GeoGreedy algorithm, the output set is S = {p1, p2, p5, p6}. Since the critical ratio of each unselected point is at least 1, we stop the GeoGreedy algorithm, and S is the output set with the greatest size.
We store the output in a list L which ranks the selected points in the order they were added to S. That is, L = [p5, p6, p1, p2].
When a 3-regret query is issued, we return the set {p5, p6, p1}.
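SKETCH: THE STOREDLIST IDEA
The walkthrough above reduces StoredList to two steps: run GeoGreedy once to exhaustion and remember the selection order, then answer any k-regret query with a prefix of that list. A minimal sketch reusing geo_greedy from the previous sketch (class and method names are illustrative):

```python
class StoredList:
    """Precompute the GeoGreedy selection order once; answer any k-regret
    query by returning the first k stored points."""

    def __init__(self, D):
        # Preprocessing step: an n-regret query, i.e. GeoGreedy run until no
        # point remains outside the hull; the order of selection is kept.
        self.order = geo_greedy(D, len(D))

    def query(self, k):
        # Query step: the first k selected points answer the k-regret query.
        return self.order[:k]
```

This mirrors the slide's walkthrough, where the stored list is L = [p5, p6, p1, p2] and a 3-regret query returns {p5, p6, p1}.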
EFFECT OF HAPPY POINTS
NBA: maximum regret ratio
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
NBA: query time
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
Color: maximum regret ratio
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
Color: query time
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
Stocks: maximum regret ratio
[Charts: the result based on D_skyline vs. the result based on D_happy.]

EFFECT OF HAPPY POINTS
Stocks: query time
[Charts: the result based on D_skyline vs. the result based on D_happy.]

PRELIMINARY
Example: F = {f(0.3,0.7), f(0.5,0.5), f(0.7,0.3)}, where f(a,b)(p) = a · MPG + b · HP.
We have D = {p1, p2, p3, p4}; S is the two-point subset used in the Preliminary section.
Since maxUtil(S, f(0.3,0.7)) = 0.901 and maxUtil(D, f(0.3,0.7)) = 0.901, we have rr(S, f(0.3,0.7)) = 1 - 0.901/0.901 = 0.
Similarly, rr(S, f(0.5,0.5)) = 1 - 0.845/0.870 = 0.029 and rr(S, f(0.7,0.3)) = 1 - 0.811/0.916 = 0.115.
So we have mrr(S) = max{0, 0.029, 0.115} = 0.115.

AN EXAMPLE IN 2-D
Points (normalized): p1, p2, p3, p4, p5, p6.
[Figure: the normalized points p1-p6 together with vc1 and vc2.]
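SKETCH: FILTERING TO HAPPY POINTS
As a closing sketch, one way the candidate set D_happy could be extracted with the subjugation test from the Geometry Property section. Here q is treated as subjugated by p when q lies strictly inside the convex hull of {p} ∪ VC ∪ {O}, which the critical_ratio helper from the geometry sketch already detects (cr(q, {p}) > 1); this reading of the definition, the handling of boundary ties, and the quadratic pairwise loop are simplifications of mine, not the paper's pruning procedure.

```python
import numpy as np
# Assumes critical_ratio() from the geometry sketch above is in scope.

def happy_points(D, eps=1e-9):
    """Keep the points of D that are not subjugated by any other point.
    q is treated as subjugated by p when cr(q, {p}) > 1, i.e. q lies strictly
    inside Conv({p} ∪ VC ∪ {O}); points exactly on the boundary are kept."""
    D = np.asarray(D, dtype=float)
    keep = []
    for i, q in enumerate(D):
        subjugated = any(
            j != i and critical_ratio(q, D[j][None, :]) > 1.0 + eps
            for j in range(len(D))
        )
        if not subjugated:
            keep.append(q)
    return np.array(keep)
```

By Lemma 3, feeding the result to GeoGreedy instead of D_skyline keeps an optimal solution available whenever r_opt < 1/2, which is the speed-up measured in the "Effect of Happy Points" experiments.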