Preference Queries over Sets Xi Zhang Jan Chomicki April 15, 2011

advertisement
Preference Queries over Sets
Xi Zhang
Jan Chomicki
SUNY at Buffalo
April 15, 2011
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
1 / 21
Tuple Preferences vs Set Preferences
Tuple Preferences
Well known preferences: top-k, skyline etc.
Set Preferences
Preferences between sets of tuples.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
2 / 21
Motivating Example
Alice is buying 3
books as gifts.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Title
a1
a2
a3
a4
a5
a6
a7
a8
...
Genre
sci-fi
biography
sci-fi
romance
sci-fi
romance
biography
sci-fi
...
Preference Queries over Sets
Rating
5.0
4.8
4.5
4.4
4.3
4.2
4.0
3.5
...
Price
$15.00
$20.00
$25.00
$10.00
$15.00
$12.00
$18.00
$18.00
...
Vendor
Amazon
B&N
Amazon
B&N
Amazon
B&N
Amazon
Amazon
...
April 15, 2011
3 / 21
Motivating Example
Alice is buying 3
books as gifts.
Title
a1
a2
a3
a4
a5
a6
a7
a8
...
Genre
sci-fi
biography
sci-fi
romance
sci-fi
romance
biography
sci-fi
...
Rating
5.0
4.8
4.5
4.4
4.3
4.2
4.0
3.5
...
Price
$15.00
$20.00
$25.00
$10.00
$15.00
$12.00
$18.00
$18.00
...
Vendor
Amazon
B&N
Amazon
B&N
Amazon
B&N
Amazon
Amazon
...
She has the following wishes...
(C1) Spend as little money as possible.
(C2) Get one sci-fi book.
(C3) Prioritize (C2) over (C1)
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
3 / 21
Motivating Example
Alice is buying 3
books as gifts.
Title
a1
a2
a3
a4
a5
a6
a7
a8
...
Genre
sci-fi
biography
sci-fi
romance
sci-fi
romance
biography
sci-fi
...
Rating
5.0
4.8
4.5
4.4
4.3
4.2
4.0
3.5
...
Price
$15.00
$20.00
$25.00
$10.00
$15.00
$12.00
$18.00
$18.00
...
Vendor
Amazon
B&N
Amazon
B&N
Amazon
B&N
Amazon
Amazon
...
She has the following wishes...
(C1) Spend as little money as possible.
the cheapest 3 books
(C2) Get one sci-fi book.
(C3) Prioritize (C2) over (C1)
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
3 / 21
Tuple Preference - Winnow (Chomicki [Cho03])
Tuple Preference: t1 is preferred to t2
t1
¡C t2
ô
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
t1 .rating
’sci-fi’ ^ t1.price t2.price
Preference Queries over Sets
April 15, 2011
4 / 21
Tuple Preference - Winnow (Chomicki [Cho03])
Tuple Preference: t1 is preferred to t2
t1
¡C t2
ô
t1 .rating
’sci-fi’ ^ t1.price t2.price
Winnow Operator ωC : return all tuples in a database relation r for
which there is no preferred tuple in r
ωC prq tt P r|
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Dt1 P r.t1 ¡C tu
Preference Queries over Sets
April 15, 2011
4 / 21
Tuple Preference - Winnow (Chomicki [Cho03])
Tuple Preference: t1 is preferred to t2
t1
¡C t2
ô
t1 .rating
’sci-fi’ ^ t1.price t2.price
Winnow Operator ωC : return all tuples in a database relation r for
which there is no preferred tuple in r
ωC prq tt P r|
Dt1 P r.t1 ¡C tu
Set Preference - tuple set tt1 , t2 , . . . , tk u is preferred to tuple set
tt11, t12, . . . , t1k u
Fixed cardinality (k) assumption
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
4 / 21
Profile-based Set Preference
k-subsets: subsets of relation r, with fixed cardinality k
Set Pref.
(C1)
(C2)
(C3)
Quantities of Interest
total cost
# of sci-fi books
total cost, # of sci-fi books
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Desired Value or Order
Preference Queries over Sets
1
(C2)™(C1)
April 15, 2011
5 / 21
Profile-based Set Preference
k-subsets: subsets of relation r, with fixed cardinality k
Set Pref.
(C1)
(C2)
(C3)
Quantities of Interest
total cost
# of sci-fi books
total cost, # of sci-fi books
Desired Value or Order
1
(C2)™(C1)
features F1 , F2 , . . . , Fm
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
5 / 21
Profile-based Set Preference
k-subsets: subsets of relation r, with fixed cardinality k
Set Pref.
(C1)
(C2)
(C3)
Quantities of Interest
total cost
# of sci-fi books
total cost, # of sci-fi books
Desired Value or Order
features F1 , F2 , . . . , Fm
profile
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
1
(C2)™(C1)
preferences
over profiles
xF1, F2, . . . , Fmy
Preference Queries over Sets
April 15, 2011
5 / 21
Features and Set Preferences
Features: mini-SQL Queries
F1
F2
SELECT sum(price) FROM $S
SELECT count(title) FROM $S WHERE genre=’sci-fi’
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
6 / 21
Features and Set Preferences
Features: mini-SQL Queries
F1
F2
SELECT sum(price) FROM $S
SELECT count(title) FROM $S WHERE genre=’sci-fi’
Set Preferences
s1
s1
s1
Ï
Ï
Ï
C1
C2
C3
s2
s2
s2
ô F1 ps1 q F1 ps2 q.
ô F2 ps1 q 1 ^ F2 ps2 q 1.
ô pF2 ps1 q 1 ^ F2 ps2 q 1q
_pF2 ps1 q 1 ^ F2 ps2 q 1 ^ F1 ps1 q F1 ps2 qq
_pF2 ps1 q 1 ^ F2 ps2 q 1 ^ F1 ps1 q F1 ps2 qq.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
6 / 21
Additive Features
Assume additive features in this talk.
Efficient optimizations exist for additive features.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
7 / 21
Additive Features
Assume additive features in this talk.
Efficient optimizations exist for additive features.
Additive Feature
A feature F is additive iff for every subset s of relation r, and every
tPrs
F ps Y ttuq F psq f ptq
F pttuq f ptq
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
7 / 21
Computing the “Best” Sets
Naive Algorithm
Generate all k-subsets of relation r and compute their profiles.
Run the winnow operator over all the profiles and get the “best”
profiles.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
8 / 21
Computing the “Best” Sets
Naive Algorithm
Generate all k-subsets of relation r and compute their profiles.
Run the winnow operator over all the profiles and get the “best”
profiles.
Too many candidate k-subsets!
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
8 / 21
Computing the “Best” Sets
Naive Algorithm
Generate all k-subsets of relation r and compute their profiles.
Run the winnow operator over all the profiles and get the “best”
profiles.
Too many candidate k-subsets!
Example
k
3, |r| 1000
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
ñ
1000
3
166167000 candidate subsets
Preference Queries over Sets
April 15, 2011
8 / 21
Superpreference Optimization
Goal: Generate as few candidate k-subsets as possible
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
9 / 21
Superpreference Optimization
Goal: Generate as few candidate k-subsets as possible
“Superpreference”
Find a “superpreference” (¡ ) over the relation r, such that
t1
¡
t2
ô
s1 Y tt1 u ÏC s1 Y tt2 u.
for every (k-1)-subset s1 of r containing neither t1 nor t2 .
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
9 / 21
Superpreference Optimization
Goal: Generate as few candidate k-subsets as possible
“Superpreference”
Find a “superpreference” (¡ ) over the relation r, such that
t1
¡
t2
ô
s1 Y tt1 u ÏC s1 Y tt2 u.
for every (k-1)-subset s1 of r containing neither t1 nor t2 .
Pruning Condition
Let coverptq tt1
P r|t1 ¡ tu, i.e. all tuples preferred to t.
Dt P s, coverptq † s ñ s is not a best subset.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
9 / 21
Superpreference Optimization
Goal: Generate as few candidate k-subsets as possible
“Superpreference”
Find a “superpreference” (¡ ) over the relation r, such that
t1
¡
t2
ô
s1 Y tt1 u ÏC s1 Y tt2 u.
for every (k-1)-subset s1 of r containing neither t1 nor t2 .
Pruning Condition
Let coverptq tt1
P r|t1 ¡ tu, i.e. all tuples preferred to t.
Dt P s, coverptq † s ñ s is not a best subset.
Systematic Construction
Possible if all the features are additive.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
9 / 21
Example - “Superpreference”
Set preference: (C5)X(C6)
(C5) Alice wants to spend as little money as possible on sci-fi books.
(C6) Alice wants the average rating of books to be as high as possible.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
10 / 21
Example - “Superpreference”
Set preference: (C5)X(C6)
(C5) Alice wants to spend as little money as possible on sci-fi books.
(C6) Alice wants the average rating of books to be as high as possible.
Features:
F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’
F6 SELECT avg(rating) FROM $S
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
10 / 21
Example - “Superpreference”
Set preference: (C5)X(C6)
(C5) Alice wants to spend as little money as possible on sci-fi books.
(C6) Alice wants the average rating of books to be as high as possible.
Features:
F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’
F6 SELECT avg(rating) FROM $S
Profile preference:
s1
ÏC s2 F5ps1q F5ps2q ^ F6ps1q ¡ F6ps2q
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
10 / 21
Example - “Superpreference”
Set preference: (C5)X(C6)
(C5) Alice wants to spend as little money as possible on sci-fi books.
(C6) Alice wants the average rating of books to be as high as possible.
Features:
F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’
F6 SELECT avg(rating) FROM $S
Profile preference:
ÏC s2 F5ps1q F5ps2q ^ F6ps1q ¡ F6ps2q
“Superpreference” formula (assuming price ¡ 0)
s1
t1
¡
t2
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
¡ t2 .rating ^ t2 .genre ’sci-fi’
^pt1 .price t2 .price _ t1 .genre ’sci-fi’q.
t1 .rating
Preference Queries over Sets
April 15, 2011
10 / 21
M-relation
Goal
Avoid redundancy in generating profiles
Book:
Title
a1
a2
a3
a4
a5
a6
a7
a8
a9
a10
Genre
sci-fi
biography
sci-fi
romance
sci-fi
romance
biography
sci-fi
romance
history
Rating
5.0
4.8
4.5
4.4
4.3
4.2
4.0
3.5
4.0
4.0
Price
$15.00
$20.00
$25.00
$10.00
$15.00
$12.00
$18.00
$18.00
$20.00
$19.00
Vendor
Amazon
B&N
Amazon
B&N
Amazon
B&N
Amazon
Amazon
Amazon
Amazon
Redundancy Example
=
=
pt
pt
prof ileΓ a1 , a2 , a7
prof ileΓ a1 , a2 , a9
15.00, 13.8
x
y
uq
uq
Exchangeable Tuples a7 , a9
zt
u
prof ileΓ ps Y ta7 uq prof ileΓ ps Y ta9 uq
For any 2-subset s of Book a7 , a9
Profile Γ tF5 , F6 u
F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’
F6 SELECT sum(rating) FROM $S
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
11 / 21
M-relation Generation
Book:
Title
...
a7
a8
a9
a10
M-relation:
Genre
...
biography
sci-fi
romance
history
Rating
...
4.0
3.5
4.0
4.0
Price
...
$18.00
$18.00
$20.00
$19.00
Vendor
...
Amazon
Amazon
Amazon
Amazon
...
m7,9,10
m8
A5
...
$0.00
$18.00
A6
...
4.0
3.5
Acnt
...
3
1
Profile Γ tF5 , F6 u
F5 SELECT sum(price) FROM $S WHERE genre=’sci-fi’
F6 SELECT sum(rating) FROM $S
M-relation Generation SQL
SELECT CASE WHEN r.genre=’sci-fi’ THEN r.price ELSE 0 END AS A5 ,
r.rating AS A6 , count(*) AS Acnt FROM r
GROUP BY A5 , A6
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
12 / 21
Set Preference via M-relation
Set preference over the original relations
M-relation
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
ñ set preference over its
April 15, 2011
13 / 21
Conclusions and Future Work
Conclusions
A formal “logic + SQL” framework for specifying restricted set
preferences and implementing set preference queries.
Effective optimizations yielding improvements of several orders of
magnitude.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
14 / 21
Conclusions and Future Work
Conclusions
A formal “logic + SQL” framework for specifying restricted set
preferences and implementing set preference queries.
Effective optimizations yielding improvements of several orders of
magnitude.
Future Work
Query optimization for non-additive features.
Set preference elicitation.
Embedding “best-subset” generation in relational query languages.
Additional set ranking or browsing techniques for result navigation.
Relaxing the fixed cardinality assumption: Superpreference depends
on it assumption while M-relation does not.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
14 / 21
Thank you!
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
15 / 21
Dataset and Set Preferences
Dataset
8000 book quotes from Amazon
Schema: xtitle, genre, rating, price, vendory
Features
SELECT sum(price) FROM $S WHERE genre=’sci-fi’
SELECT sum(rating) FROM $S
SELECT sum(rating) FROM $S WHERE genre=’sci-fi’
SELECT sum(price) FROM $S
SELECT count(title) FROM $S WHERE genre=’sci-fi’
F10 SELECT sum(rating) FROM $S WHERE rating >= 4.0
F5
F6
F7
F8
F9
and price < 20.00
Set Preferences
Set Pref. Name
SP1
SP2
SP3
Profile Schema Γ
F5 , F6
F9 , F10
F11 , F12
x
x
x
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
y
y
y
Profile Pref. Formula C
F5 s1
F5 s2
F6 s1
F6 s2
F9 s1
F9 s2
F10 s1
F10 s2
F11 s1
F11 s2
F12 s1
F12 s2
p q p q^ p q¡ p q
p q¡ p q^ p q p q
p q¡ p q^ p q¡ p q
Preference Queries over Sets
April 15, 2011
16 / 21
Performance of Different Algorithms
1e+009
1e+008
1e+007
1e+006
100000
10000
1000
100
10
100
MREL
SM
300
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
1
500
700
Input Size (n)
MREL
SM
900 1K
300
500
700
Input Size (n)
0.1
0.01
0.001
0.0001
1e-005
SUPER MREL
SM
MS
SUPER MREL
SM
MS
MS
1
Generate Percentage
Set
Pref 2
# of sets in Generation (g)
NAIVE
SUPER
1e+009
1e+008
1e+007
1e+006
100000
10000
1000
100
10
100
MS
Generate Percentage
Set
Pref 1
# of sets in Generation (g)
NAIVE
SUPER
900 1K
Preference Queries over Sets
0.1
0.01
0.001
0.0001
1e-005
April 15, 2011
17 / 21
Performance of Different Algorithms
Set
Pref 3
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
18 / 21
Related Work
Guha et al. [GGK 03]
Problem: find an optimal subset of tuples
Set property: aggrpAq parameter
Set preference: objective function min { max
Binshtok et al. [BBS 07]
Problem: find a optimal subset of items
Set property: predicate
Set preference: TCP-net or scoring function
Consider subsets of any cardinality, in the fixed-cardinality case, it is
subsumed by our framework
desJardins and Wagstaff [dW05]
Consider fixed-cardinality set preference
Consider two set features: diversity and depth
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
19 / 21
References I
Maxim Binshtok, Ronen I. Brafman, Solomon Eyal Shimony, Ajay
Mani, and Craig Boutilier.
Computing optimal subsets.
In AAAI, pages 1231–1236, 2007.
Jan Chomicki.
Preference formulas in relational queries.
ACM Trans. Database Syst., 28(4):427–466, 2003.
Marie desJardins and Kiri Wagstaff.
DD-pref: A language for expressing preferences over sets.
In AAAI, pages 620–626, 2005.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
20 / 21
References II
Sudipto Guha, Dimitrios Gunopulos, Nick Koudas, Divesh Srivastava,
and Michail Vlachos.
Efficient approximation of optimization queries under parametric
aggregation constraints.
In VLDB, pages 778–789, 2003.
Xi Zhang, Jan Chomicki (SUNY at Buffalo)
Preference Queries over Sets
April 15, 2011
21 / 21
Download