Preference Queries over Sets Xi Zhang Jan Chomicki SUNY at Buffalo April 15, 2011 Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 1 / 21 Tuple Preferences vs Set Preferences Tuple Preferences Well known preferences: top-k, skyline etc. Set Preferences Preferences between sets of tuples. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 2 / 21 Motivating Example Alice is buying 3 books as gifts. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Title a1 a2 a3 a4 a5 a6 a7 a8 ... Genre sci-fi biography sci-fi romance sci-fi romance biography sci-fi ... Preference Queries over Sets Rating 5.0 4.8 4.5 4.4 4.3 4.2 4.0 3.5 ... Price $15.00 $20.00 $25.00 $10.00 $15.00 $12.00 $18.00 $18.00 ... Vendor Amazon B&N Amazon B&N Amazon B&N Amazon Amazon ... April 15, 2011 3 / 21 Motivating Example Alice is buying 3 books as gifts. Title a1 a2 a3 a4 a5 a6 a7 a8 ... Genre sci-fi biography sci-fi romance sci-fi romance biography sci-fi ... Rating 5.0 4.8 4.5 4.4 4.3 4.2 4.0 3.5 ... Price $15.00 $20.00 $25.00 $10.00 $15.00 $12.00 $18.00 $18.00 ... Vendor Amazon B&N Amazon B&N Amazon B&N Amazon Amazon ... She has the following wishes... (C1) Spend as little money as possible. (C2) Get one sci-fi book. (C3) Prioritize (C2) over (C1) Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 3 / 21 Motivating Example Alice is buying 3 books as gifts. Title a1 a2 a3 a4 a5 a6 a7 a8 ... Genre sci-fi biography sci-fi romance sci-fi romance biography sci-fi ... Rating 5.0 4.8 4.5 4.4 4.3 4.2 4.0 3.5 ... Price $15.00 $20.00 $25.00 $10.00 $15.00 $12.00 $18.00 $18.00 ... Vendor Amazon B&N Amazon B&N Amazon B&N Amazon Amazon ... She has the following wishes... (C1) Spend as little money as possible. the cheapest 3 books (C2) Get one sci-fi book. (C3) Prioritize (C2) over (C1) Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 3 / 21 Tuple Preference - Winnow (Chomicki [Cho03]) Tuple Preference: t1 is preferred to t2 t1 ¡C t2 ô Xi Zhang, Jan Chomicki (SUNY at Buffalo) t1 .rating ’sci-fi’ ^ t1.price t2.price Preference Queries over Sets April 15, 2011 4 / 21 Tuple Preference - Winnow (Chomicki [Cho03]) Tuple Preference: t1 is preferred to t2 t1 ¡C t2 ô t1 .rating ’sci-fi’ ^ t1.price t2.price Winnow Operator ωC : return all tuples in a database relation r for which there is no preferred tuple in r ωC prq tt P r| Xi Zhang, Jan Chomicki (SUNY at Buffalo) Dt1 P r.t1 ¡C tu Preference Queries over Sets April 15, 2011 4 / 21 Tuple Preference - Winnow (Chomicki [Cho03]) Tuple Preference: t1 is preferred to t2 t1 ¡C t2 ô t1 .rating ’sci-fi’ ^ t1.price t2.price Winnow Operator ωC : return all tuples in a database relation r for which there is no preferred tuple in r ωC prq tt P r| Dt1 P r.t1 ¡C tu Set Preference - tuple set tt1 , t2 , . . . , tk u is preferred to tuple set tt11, t12, . . . , t1k u Fixed cardinality (k) assumption Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 4 / 21 Profile-based Set Preference k-subsets: subsets of relation r, with fixed cardinality k Set Pref. (C1) (C2) (C3) Quantities of Interest total cost # of sci-fi books total cost, # of sci-fi books Xi Zhang, Jan Chomicki (SUNY at Buffalo) Desired Value or Order Preference Queries over Sets 1 (C2)(C1) April 15, 2011 5 / 21 Profile-based Set Preference k-subsets: subsets of relation r, with fixed cardinality k Set Pref. (C1) (C2) (C3) Quantities of Interest total cost # of sci-fi books total cost, # of sci-fi books Desired Value or Order 1 (C2)(C1) features F1 , F2 , . . . , Fm Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 5 / 21 Profile-based Set Preference k-subsets: subsets of relation r, with fixed cardinality k Set Pref. (C1) (C2) (C3) Quantities of Interest total cost # of sci-fi books total cost, # of sci-fi books Desired Value or Order features F1 , F2 , . . . , Fm profile Xi Zhang, Jan Chomicki (SUNY at Buffalo) 1 (C2)(C1) preferences over profiles xF1, F2, . . . , Fmy Preference Queries over Sets April 15, 2011 5 / 21 Features and Set Preferences Features: mini-SQL Queries F1 F2 SELECT sum(price) FROM $S SELECT count(title) FROM $S WHERE genre=’sci-fi’ Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 6 / 21 Features and Set Preferences Features: mini-SQL Queries F1 F2 SELECT sum(price) FROM $S SELECT count(title) FROM $S WHERE genre=’sci-fi’ Set Preferences s1 s1 s1 Ï Ï Ï C1 C2 C3 s2 s2 s2 ô F1 ps1 q F1 ps2 q. ô F2 ps1 q 1 ^ F2 ps2 q 1. ô pF2 ps1 q 1 ^ F2 ps2 q 1q _pF2 ps1 q 1 ^ F2 ps2 q 1 ^ F1 ps1 q F1 ps2 qq _pF2 ps1 q 1 ^ F2 ps2 q 1 ^ F1 ps1 q F1 ps2 qq. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 6 / 21 Additive Features Assume additive features in this talk. Efficient optimizations exist for additive features. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 7 / 21 Additive Features Assume additive features in this talk. Efficient optimizations exist for additive features. Additive Feature A feature F is additive iff for every subset s of relation r, and every tPrs F ps Y ttuq F psq f ptq F pttuq f ptq Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 7 / 21 Computing the “Best” Sets Naive Algorithm Generate all k-subsets of relation r and compute their profiles. Run the winnow operator over all the profiles and get the “best” profiles. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 8 / 21 Computing the “Best” Sets Naive Algorithm Generate all k-subsets of relation r and compute their profiles. Run the winnow operator over all the profiles and get the “best” profiles. Too many candidate k-subsets! Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 8 / 21 Computing the “Best” Sets Naive Algorithm Generate all k-subsets of relation r and compute their profiles. Run the winnow operator over all the profiles and get the “best” profiles. Too many candidate k-subsets! Example k 3, |r| 1000 Xi Zhang, Jan Chomicki (SUNY at Buffalo) ñ 1000 3 166167000 candidate subsets Preference Queries over Sets April 15, 2011 8 / 21 Superpreference Optimization Goal: Generate as few candidate k-subsets as possible Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 9 / 21 Superpreference Optimization Goal: Generate as few candidate k-subsets as possible “Superpreference” Find a “superpreference” (¡ ) over the relation r, such that t1 ¡ t2 ô s1 Y tt1 u ÏC s1 Y tt2 u. for every (k-1)-subset s1 of r containing neither t1 nor t2 . Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 9 / 21 Superpreference Optimization Goal: Generate as few candidate k-subsets as possible “Superpreference” Find a “superpreference” (¡ ) over the relation r, such that t1 ¡ t2 ô s1 Y tt1 u ÏC s1 Y tt2 u. for every (k-1)-subset s1 of r containing neither t1 nor t2 . Pruning Condition Let coverptq tt1 P r|t1 ¡ tu, i.e. all tuples preferred to t. Dt P s, coverptq s ñ s is not a best subset. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 9 / 21 Superpreference Optimization Goal: Generate as few candidate k-subsets as possible “Superpreference” Find a “superpreference” (¡ ) over the relation r, such that t1 ¡ t2 ô s1 Y tt1 u ÏC s1 Y tt2 u. for every (k-1)-subset s1 of r containing neither t1 nor t2 . Pruning Condition Let coverptq tt1 P r|t1 ¡ tu, i.e. all tuples preferred to t. Dt P s, coverptq s ñ s is not a best subset. Systematic Construction Possible if all the features are additive. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 9 / 21 Example - “Superpreference” Set preference: (C5)X(C6) (C5) Alice wants to spend as little money as possible on sci-fi books. (C6) Alice wants the average rating of books to be as high as possible. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 10 / 21 Example - “Superpreference” Set preference: (C5)X(C6) (C5) Alice wants to spend as little money as possible on sci-fi books. (C6) Alice wants the average rating of books to be as high as possible. Features: F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’ F6 SELECT avg(rating) FROM $S Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 10 / 21 Example - “Superpreference” Set preference: (C5)X(C6) (C5) Alice wants to spend as little money as possible on sci-fi books. (C6) Alice wants the average rating of books to be as high as possible. Features: F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’ F6 SELECT avg(rating) FROM $S Profile preference: s1 ÏC s2 F5ps1q F5ps2q ^ F6ps1q ¡ F6ps2q Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 10 / 21 Example - “Superpreference” Set preference: (C5)X(C6) (C5) Alice wants to spend as little money as possible on sci-fi books. (C6) Alice wants the average rating of books to be as high as possible. Features: F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’ F6 SELECT avg(rating) FROM $S Profile preference: ÏC s2 F5ps1q F5ps2q ^ F6ps1q ¡ F6ps2q “Superpreference” formula (assuming price ¡ 0) s1 t1 ¡ t2 Xi Zhang, Jan Chomicki (SUNY at Buffalo) ¡ t2 .rating ^ t2 .genre ’sci-fi’ ^pt1 .price t2 .price _ t1 .genre ’sci-fi’q. t1 .rating Preference Queries over Sets April 15, 2011 10 / 21 M-relation Goal Avoid redundancy in generating profiles Book: Title a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 Genre sci-fi biography sci-fi romance sci-fi romance biography sci-fi romance history Rating 5.0 4.8 4.5 4.4 4.3 4.2 4.0 3.5 4.0 4.0 Price $15.00 $20.00 $25.00 $10.00 $15.00 $12.00 $18.00 $18.00 $20.00 $19.00 Vendor Amazon B&N Amazon B&N Amazon B&N Amazon Amazon Amazon Amazon Redundancy Example = = pt pt prof ileΓ a1 , a2 , a7 prof ileΓ a1 , a2 , a9 15.00, 13.8 x y uq uq Exchangeable Tuples a7 , a9 zt u prof ileΓ ps Y ta7 uq prof ileΓ ps Y ta9 uq For any 2-subset s of Book a7 , a9 Profile Γ tF5 , F6 u F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’ F6 SELECT sum(rating) FROM $S Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 11 / 21 M-relation Generation Book: Title ... a7 a8 a9 a10 M-relation: Genre ... biography sci-fi romance history Rating ... 4.0 3.5 4.0 4.0 Price ... $18.00 $18.00 $20.00 $19.00 Vendor ... Amazon Amazon Amazon Amazon ... m7,9,10 m8 A5 ... $0.00 $18.00 A6 ... 4.0 3.5 Acnt ... 3 1 Profile Γ tF5 , F6 u F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’ F6 SELECT sum(rating) FROM $S M-relation Generation SQL SELECT CASE WHEN r.genre=’sci-fi’ THEN r.price ELSE 0 END AS A5 , r.rating AS A6 , count(*) AS Acnt FROM r GROUP BY A5 , A6 Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 12 / 21 Set Preference via M-relation Set preference over the original relations M-relation Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets ñ set preference over its April 15, 2011 13 / 21 Conclusions and Future Work Conclusions A formal “logic + SQL” framework for specifying restricted set preferences and implementing set preference queries. Effective optimizations yielding improvements of several orders of magnitude. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 14 / 21 Conclusions and Future Work Conclusions A formal “logic + SQL” framework for specifying restricted set preferences and implementing set preference queries. Effective optimizations yielding improvements of several orders of magnitude. Future Work Query optimization for non-additive features. Set preference elicitation. Embedding “best-subset” generation in relational query languages. Additional set ranking or browsing techniques for result navigation. Relaxing the fixed cardinality assumption: Superpreference depends on it assumption while M-relation does not. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 14 / 21 Thank you! Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 15 / 21 Dataset and Set Preferences Dataset 8000 book quotes from Amazon Schema: xtitle, genre, rating, price, vendory Features SELECT sum(price) FROM $S WHERE genre=’sci-fi’ SELECT sum(rating) FROM $S SELECT sum(rating) FROM $S WHERE genre=’sci-fi’ SELECT sum(price) FROM $S SELECT count(title) FROM $S WHERE genre=’sci-fi’ F10 SELECT sum(rating) FROM $S WHERE rating >= 4.0 F5 F6 F7 F8 F9 and price < 20.00 Set Preferences Set Pref. Name SP1 SP2 SP3 Profile Schema Γ F5 , F6 F9 , F10 F11 , F12 x x x Xi Zhang, Jan Chomicki (SUNY at Buffalo) y y y Profile Pref. Formula C F5 s1 F5 s2 F6 s1 F6 s2 F9 s1 F9 s2 F10 s1 F10 s2 F11 s1 F11 s2 F12 s1 F12 s2 p q p q^ p q¡ p q p q¡ p q^ p q p q p q¡ p q^ p q¡ p q Preference Queries over Sets April 15, 2011 16 / 21 Performance of Different Algorithms 1e+009 1e+008 1e+007 1e+006 100000 10000 1000 100 10 100 MREL SM 300 Xi Zhang, Jan Chomicki (SUNY at Buffalo) 1 500 700 Input Size (n) MREL SM 900 1K 300 500 700 Input Size (n) 0.1 0.01 0.001 0.0001 1e-005 SUPER MREL SM MS SUPER MREL SM MS MS 1 Generate Percentage Set Pref 2 # of sets in Generation (g) NAIVE SUPER 1e+009 1e+008 1e+007 1e+006 100000 10000 1000 100 10 100 MS Generate Percentage Set Pref 1 # of sets in Generation (g) NAIVE SUPER 900 1K Preference Queries over Sets 0.1 0.01 0.001 0.0001 1e-005 April 15, 2011 17 / 21 Performance of Different Algorithms Set Pref 3 Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 18 / 21 Related Work Guha et al. [GGK 03] Problem: find an optimal subset of tuples Set property: aggrpAq parameter Set preference: objective function min { max Binshtok et al. [BBS 07] Problem: find a optimal subset of items Set property: predicate Set preference: TCP-net or scoring function Consider subsets of any cardinality, in the fixed-cardinality case, it is subsumed by our framework desJardins and Wagstaff [dW05] Consider fixed-cardinality set preference Consider two set features: diversity and depth Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 19 / 21 References I Maxim Binshtok, Ronen I. Brafman, Solomon Eyal Shimony, Ajay Mani, and Craig Boutilier. Computing optimal subsets. In AAAI, pages 1231–1236, 2007. Jan Chomicki. Preference formulas in relational queries. ACM Trans. Database Syst., 28(4):427–466, 2003. Marie desJardins and Kiri Wagstaff. DD-pref: A language for expressing preferences over sets. In AAAI, pages 620–626, 2005. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 20 / 21 References II Sudipto Guha, Dimitrios Gunopulos, Nick Koudas, Divesh Srivastava, and Michail Vlachos. Efficient approximation of optimization queries under parametric aggregation constraints. In VLDB, pages 778–789, 2003. Xi Zhang, Jan Chomicki (SUNY at Buffalo) Preference Queries over Sets April 15, 2011 21 / 21