Slides

advertisement
RankSQL: Query Algebra and Optimization for
Relational Top-k Queries
AUTHORS:
Chengkai Li
Kevin Chen-Chuan Chang
Ihab F. Ilyas
Sumin Song
Presenter:
Roman Yarovoy
October 3, 2007
1
Before RankSQL
• Ranking (top-k) queries: Query result is
sorted by rank and limited to top k results.
• Support for ranking was lacking from
RDBMS.
• Previously, isolated cases of top-k query
processing were studied.
• No way to integrate top-k operations with
other relational operations.
2
Previous (traditional) approach
•
Query processing without ranking support:
1. Evaluate select-project-join (SPJ) query and
materialize the result.
2. Sort the result according to a given ranking function.
3. Take only top k tuples.
•
Associated problems:
–
No interest in total order of all the results.
–
Evaluating ranking function(s) can be expensive.
3
Key contribution
Li et al. proposed:
Extending relational algebra to support
ranking as a first-class database
construct.
Consequence: Rank-aware relational query engine 
Rank-aware query optimization.
4
Top-k query: Example 1
• R
• T
TID
a1
a2
p1
p2
p3
TID
b1
b2
p4
p5
r1
27
50
0.5
0.3
0.25
t1
47
55
0.5
0.1
r2
17
60
0.6
0.5
0.65
t2
66
65
0.5
0.7
r3
47
90
0.7
0.7
0.95
t3
27
15
0.5
0.8
r4
87
10
0.8
0.1
0.35
t4
99
95
0.5
0.2
r5
47
70
0.9
0.1
0.75
5
Example 1 (cont’d)
SELECT *
FROM R r, T t
WHERE r.a1=t.b1 AND r.a2>t.b2
ORDER-BY p1+p2+p3+p4+p5
LIMIT 2
(where F = p1+p2+p3+p4+p5)
TID
a1
a2
b1
b2
F
r3/t1
47
90
47
55
2.95
r1/t3
27
50
27
15
2.35
r5/t1
47
70
47
55
2.35
6
Rank-relational algebra
•
•
•
There was no way to express such query in relational
algebra.
Extend relational algebra by adding rank as a first-class
operation.
Based on the observations of first-class constructs (eg.
selection), two requirements are needed to support
ranking:
1.
2.
Splitting – Predicate-by-predicate rank evaluation.
Interleaving – Swapping rank operator with other
operators (i.e. ranking is not only applied after filtering).
7
Ranking Principle
• Def: Given a ranking function F and a set of evaluated
predicates P={p1, p2, … , pn}, maximal-possible score of a
tuple t is defined as:
• Ranking Principle: If FP[t1] > FP[t2], then t1 must be
ranked before t2.
8
Rank-Relation
Def: For monotonic scoring function F(p1, …, pn) and a
subset P of {p1, …, pn}, a relation R augmented with
ranking induced by P is called a rank-relation, denoted
by RP.
• Implicit attribute of RP is the score of tuple t, that is
FP[t].
• Order relationship of RP :
– For all t1, t2 Є RP : t1 < RP t2 ↔ FP[t1] < FP[t2]
9
Operators of rank-relations
• Rank (or μ) operator “adds” a predicate p to set P.
μp(RP) ≡ R P U{p}.
• Example 2: μp1(R{p2}) ≡ R{p1, p2}, where
– i.e.
F=∑(p1, p2, p3).
TID
a1
a2
p1
p2
p3
r3
47
90
0.7
0.7
0.95
2.4
r2
17
60
0.6
0.5
0.65
2.1
r5
47
70
0.9
0.1
0.75
2.0
r4
87
10
0.8
0.1
0.35
1.9
r1
27
50
0.5
0.3
0.25
1.8
F{p1, p2}
10
Extended operators
11
Example 3: Extended Join
πa1,a2,b2(σc (R{p1, p2 p3} JOIN T{p4, p5}))
SELECT r.a1, r.a2, t.b1
FROM R r, T t
WHERE c
ORDER-BY F
LIMIT 2
TID
a1
a2
b2
F {p1, p2, p3, p4, p5}
r2/t4 17 60 99
2.45
r4/t4 87 10 99
1.95
r1/t4 27 50 99
1.75
(F = ∑ P and c = r.a1+r.a2 < t.b1)
12
Extended operators (cont’d)
• Note:
– Cartesian product is defined similarly to join,
but not discussed in the paper.
– Projection operator π has not changed.
– Computation is based on both Boolean and
ranking logical properties.
– Perform Boolean operations and maintain
the order induced by all given ranking
predicates.
13
Equivalence relations
•
•
•
In the extended rank-relational model, ranking is a firstclass construct.
Can derive algebraic equivalences from the definitions
of operators (Proofs are omitted).
Example 4:
1. σc(RP) ≡ (σcR)P
2. RP1 ∩ TP2 ≡ (R ∩ T)P1 U P2
•
Thus, we can interleave the rank operator with other
operators (i.e. push μ down across operators).
14
Equivalence relations (cont’d)
15
Equivalence relations (cont’d)
• Note:
– Proposition 1 states that ranking can be done in
stages (i.e. one predicate at the time).
– By Propositions 2, 3, and 4, the relations hold
commutative and associative laws.
– By Propositions 4 and 5, μ can be swapped with
other operators.
16
Incremental execution
• Blocking operators (eg. sort) lead to materialization of
intermediate results.
• Goal: To avoid materialization and implement a pipelining
execution strategy.
• We want to split rank computation into stages and to
reduce the number of tuples considered in the upcoming
stages.
• We can output (i.e. advance to the next stage) a tuple t,
whenever t has a score which is greater or equal to the
score of any future tuple t′′ .
17
Incremental execution (cont’d)
• Apply μp to RP and maintain priority queue ordered by
P U{p}.
– Let X = set of tuples from preceding stage.
• Draw t′ from X.
• If FP U{p}[t] ≥ FP[t′] and
future t′′ drawn from x,
FP[t′] ≥ FP[t′′]
for any
then FP U{p}[t] ≥ FP U{p}[t′′] and t can be output
(proceed to next stage).
18
Example 5: Top 2 of W
• Given F = AVG(p6, p7, p8)
μp7
• idxScanp6(W)
TID x p6 p7 p8
F
3
0.9
0.8
0.2
7
0.8
0.7
0.1
w3
5
0.7
w4
1
0.5
w1
w2
0.6
0.4
0.1
0.9
μp8
TID
x
F
29/30
w1
3
14/15
w2
9/10
TID
x
F
9/10
w1
3
19/30
7
5/6
w4
1
3/5
w3
5
23/30
w2
7
8/15
w4
1
19/30
w3
5
7/15
5/6
19
Different evaluation plans
• There exist algorithms to implement rank-aware
operators as well as incremental evaluation.
• Efficiency of query evaluation will now depend not
only on the regular operators, but also on the rankaware operators.
• Due to algebraic equivalence laws, we can define
additional evaluation plans.
• Hence, we want a query optimizer to take additional
execution plans into consideration.
20
Rank-aware optimizer
• Extended algebra  Extended search space.
• Impact on enumeration algorithm:
– Li et al. designed a 2-dimension enumeration algorithm:
Dimension 1 = Join size, Dimension 2 = Ranking
predicates.
– The algorithm is exponential in both dimensions.
– Heuristics applied to reduce search space.
• Impact on cost model:
– For ranking queries, it is more difficult to estimate the
query cardinality of the intermediate results, whose
accuracy is the core of the cost model.
– Authors proposed to estimate cardinality by randomly
sampling tuples.
21
Critique
• Erroneous examples.
• No example of “tie-breaking” function.
• Bad explanation of incremental evaluation.
22
Future research directions
• Cardinality estimation: New/improved
techniques for random sampling over joins.
• Dynamically determined/chosen k.
• Exploring physical properties of rank-aware
execution plans.
23
Download