BEST-EFFORT TOP-K QUERY PROCESSING UNDER BUDGETARY CONSTRAINTS Steven Williams

Spring 2016
OUTLINE
1. Top-k query processing
2. Budgetary Constraints
3. Motivating Example
4. Proposed algorithm
5. Results
6. Questions
TOP-K QUERY PROCESSING
• Pre-computed sorted lists over multiple attributes (m lists, n entries each).
• Combine scores by some monotonic aggregation function.
• Two access modes:
– sorted access (Cs)
– random access (Cr)
• Objective: Compute the k objects with the highest scores.
NRA ALGORITHM
f = SUM
R1: a 0.90, b 0.60, c 0.50, …, d 0.40
R2: d 0.87, a 0.85, f 0.25, …, c 0.20
highi: last score read from list i
[worst score, best score] after three sorted accesses per list:
Top-2: a [1.75, 1.75], d [0.87, 1.37]
candidates: b [0.60, 0.85], c [0.50, 0.75], f [0.25, 0.75]
Stop when mink > best score of candidates (here mink = 0.87).
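The interval bookkeeping in the NRA example can be sketched in Python. This is a minimal illustration, not the presenter's code. It assumes the figure is read as R1 = a 0.90, b 0.60, c 0.50 and R2 = d 0.87, a 0.85, f 0.25 (only the first three entries of each list are used), that unknown scores count as 0 in the worst score, and that unread scores are bounded by the last score read from that list. The function name `nra_state` is hypothetical.

```python
from typing import Dict, List, Tuple

SortedList = List[Tuple[str, float]]  # (object id, score), highest score first


def nra_state(lists: List[SortedList], depth: int, k: int):
    """Do `depth` sorted accesses on every list, then return
    (top_k, intervals, can_stop) for the aggregation function f = SUM."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    high = [0.0] * m                        # last score read from each list
    for i, lst in enumerate(lists):
        for obj, score in lst[:depth]:
            seen.setdefault(obj, {})[i] = score
            high[i] = score

    intervals = {}
    for obj, known in seen.items():
        worst = sum(known.values())  # unknown scores counted as 0
        best = worst + sum(high[i] for i in range(m) if i not in known)
        intervals[obj] = (worst, best)

    # Rank by worst score; the first k objects form the current top-k.
    ranked = sorted(intervals, key=lambda o: intervals[o][0], reverse=True)
    top_k, candidates = ranked[:k], ranked[k:]
    if len(top_k) < k:
        return top_k, intervals, False
    min_k = intervals[top_k[-1]][0]
    unseen_best = sum(high)  # bound for objects not seen in any list yet
    can_stop = min_k > unseen_best and all(
        intervals[c][1] < min_k for c in candidates
    )
    return top_k, intervals, can_stop


# Assumed reading of the slide's figure (first three entries per list).
R1 = [("a", 0.90), ("b", 0.60), ("c", 0.50)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25)]

top2, intervals, can_stop = nra_state([R1, R2], depth=3, k=2)
print(top2, can_stop)
```

Under these assumptions the sketch reproduces the slide's intervals, for example [0.87, 1.37] for d and [0.60, 0.85] for b. The extra check against objects not yet seen in any list is part of standard NRA and does not appear on the slide.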
BUDGET CONSTRAINTS
• Cs = 1, Cr = 3, f = SUM, Top-2
• Budget = 10
• NRA: 12 Cs = 12, precision = 0.50
• TA: 7 Cs + 7 Cr = 28, precision = 0
• Given budget, maximize result quality
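The cost arithmetic on this slide can be checked with a short Python sketch. It assumes the figure's counters read as 12 sorted accesses for NRA and 7 sorted plus 7 random accesses for TA, which is an interpretation of the animation residue. The helper name `access_cost` is hypothetical.

```python
def access_cost(n_sorted: int, n_random: int, c_s: int = 1, c_r: int = 3) -> int:
    """Total cost of a run: sorted accesses cost Cs each, random accesses Cr each."""
    return n_sorted * c_s + n_random * c_r


BUDGET = 10

# Assumed reading of the slide: NRA uses sorted accesses only,
# TA pairs each sorted access with a random access.
nra_cost = access_cost(n_sorted=12, n_random=0)
ta_cost = access_cost(n_sorted=7, n_random=7)

for name, cost in [("NRA", nra_cost), ("TA", ta_cost)]:
    status = "within" if cost <= BUDGET else "over"
    print(f"{name}: cost {cost}, {status} budget {BUDGET}")
```

Both totals exceed the budget of 10, which is why neither algorithm can run to completion in this setting.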
MOTIVATING EXAMPLE
(Figure; surviving labels: USELESS, Q)
PROPOSED ALGORITHMS
• Sorted Accesses
– Efficient Plan
– Solution with Adaptive α
• Sorted and Random Accesses
– Efficient Plan
– Solution with Adaptive α
RESULTS UNDER LIMITED BUDGET
K results for unlimited budget
Results for limited budget
EFFICIENT PLAN – SORTED ACCESS
Goal: Find a plan t such that:
$$\operatorname*{arg\,max}_{t \in T \,\wedge\, t \le B} \frac{|R_t \cap R_{exact}|}{|R_t|}$$
Plans for B = 10
Plan: { R1 , 4 }, { R2 , 6 }
The result of the best plan is denoted Ropt.
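For tiny inputs, the plan search can be illustrated by brute force. The sketch below is an assumption-laden illustration, not the method from the talk. It takes Cs = 1, forms the result of a plan as the top-k objects by partial SUM over the prefixes read, and enumerates only plans that spend the whole budget, like the slide's plan { R1 , 4 }, { R2 , 6 } for B = 10. The lists and the names `top_k_seen` and `best_plan` are hypothetical.

```python
from itertools import product


def top_k_seen(lists, depths, k):
    """Top-k objects by partial (worst-case) SUM after reading a prefix of each list."""
    partial = {}
    for lst, depth in zip(lists, depths):
        for obj, score in lst[:depth]:
            partial[obj] = partial.get(obj, 0.0) + score
    # Sort by descending score, breaking ties by object id for determinism.
    return set(sorted(partial, key=lambda o: (-partial[o], o))[:k])


def best_plan(lists, k, budget):
    """Try every plan that spends exactly `budget` sorted accesses (Cs = 1)."""
    r_exact = top_k_seen(lists, [len(lst) for lst in lists], k)
    best_depths, best_precision = None, -1.0
    for depths in product(*(range(len(lst) + 1) for lst in lists)):
        if sum(depths) != budget:
            continue
        r_t = top_k_seen(lists, depths, k)
        precision = len(r_t & r_exact) / len(r_t) if r_t else 0.0
        if precision > best_precision:
            best_depths, best_precision = depths, precision
    return best_depths, best_precision


# Hypothetical toy lists, not the data behind the slide.
L1 = [("a", 0.9), ("b", 0.8), ("c", 0.1), ("d", 0.05)]
L2 = [("c", 0.9), ("d", 0.8), ("a", 0.7), ("b", 0.1)]
plan, precision = best_plan([L1, L2], k=2, budget=3)
print(plan, precision)
```

The search needs the exact result and is exponential in the number of lists, so it can only serve as an offline yardstick for other strategies.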
OBSERVATIONS
B = 180
1. Prefer high scores
2. Prefer large score reductions
Uniform allocation
Non-uniform allocation
SCORE UTILITIES
Score gain:
$$util_{as}(L_i, x) = \frac{1}{x} \sum_{j=pos_i}^{pos_i + x} score_i(j)$$
Score reduction:
$$util_{sr}(L_i, x) = high_i - score_i(pos_i + x)$$
Example, x = 3:
$$util_{as}(R_1, 3) = \frac{1}{3} (0.95 + 0.93 + 0.92) = 0.93$$
$$util_{sr}(R_1, 3) = 0.95 - 0.92 = 0.03$$
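The two utilities can be checked against the slide's numbers (0.95, 0.93, 0.92 with x = 3) with a small Python sketch. The indexing is an assumption chosen to match that worked example: the window is the x scores starting at the current position, and the high score is taken as the first score in that window. The function names are hypothetical.

```python
def score_gain(scores, pos, x):
    """Average score of the x entries starting at position `pos`."""
    window = scores[pos:pos + x]
    if len(window) < x:
        raise ValueError("not enough entries left in the list")
    return sum(window) / x


def score_reduction(scores, pos, x):
    """Drop from the first to the last score of the x-entry window."""
    window = scores[pos:pos + x]
    if len(window) < x:
        raise ValueError("not enough entries left in the list")
    return window[0] - window[-1]


R1 = [0.95, 0.93, 0.92]  # scores from the slide's example
print(round(score_gain(R1, 0, 3), 2))       # 0.93
print(round(score_reduction(R1, 0, 3), 2))  # 0.03
```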
OPTIMIZATION PROBLEM
$$util(L_i, x) = \alpha \cdot gain + (1 - \alpha) \cdot reduction$$
$$\text{maximize} \sum_{i=1}^{m} util(L_i, x) \quad \text{subject to:} \quad \sum_{i=1}^{m} b_i = b$$
(Figure: weights of gain (α) and reduction (1 − α) over time)
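One simple way to approximate this allocation is a greedy loop that repeatedly gives the next batch of x sorted accesses to the list with the highest combined utility. This is a hypothetical heuristic for illustration only, not necessarily the solver used in the talk. It assumes a fixed α, a unit cost per sorted access, and the same windowing as the score-utility example (gain is the mean of the next x scores, reduction is first minus last score of that window). The lists are made up.

```python
def util(scores, pos, x, alpha):
    """alpha * gain + (1 - alpha) * reduction over the next x scores."""
    window = scores[pos:pos + x]
    gain = sum(window) / x
    reduction = window[0] - window[-1]
    return alpha * gain + (1 - alpha) * reduction


def greedy_allocate(lists, budget, x, alpha):
    """Return per-list read depths b_i, assigned in batches of x accesses."""
    depths = [0] * len(lists)
    spent = 0
    while spent + x <= budget:
        # Only lists that still have a full batch of x entries are eligible.
        options = [i for i, s in enumerate(lists) if depths[i] + x <= len(s)]
        if not options:
            break
        best = max(options, key=lambda i: util(lists[i], depths[i], x, alpha))
        depths[best] += x
        spent += x
    return depths


# Hypothetical score lists: one flat with high scores, one steep.
flat = [0.90, 0.89, 0.88, 0.87]
steep = [0.80, 0.50, 0.20, 0.10]

print(greedy_allocate([flat, steep], budget=4, x=2, alpha=1.0))  # favors high scores
print(greedy_allocate([flat, steep], budget=4, x=2, alpha=0.0))  # favors reductions
```

With α = 1 the whole budget goes to the flat list with high scores, and with α = 0 it goes to the steep list, which mirrors the two observations "prefer high scores" and "prefer large score reductions".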
ADAPTIVE α
• α is 1 until we’ve seen k objects
• Afterwards, α is the average probability that an object in the candidate set gets into the top-k:
$$\alpha = \hat{p}_k = \frac{1}{|cand.\ set|} \sum_{c \in cand.\ set} p_k(c)$$
(Figure: p̂k versus spent budget, 0 to 3500; TREC query, k=100)
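The adaptive rule can be written as a short Python function. This is a sketch under assumptions: the per-candidate probabilities of reaching the top-k are taken as given inputs, since the slides do not say how they are estimated, and returning 1 for an empty candidate set is a choice made here, not something the slide states. The name `adaptive_alpha` is hypothetical.

```python
from typing import Sequence


def adaptive_alpha(num_seen: int, k: int, candidate_probs: Sequence[float]) -> float:
    """alpha = 1 until k objects have been seen, then the mean of p_k(c)
    over the candidate set."""
    if num_seen < k:
        return 1.0
    if not candidate_probs:
        return 1.0  # assumption: no candidates to weigh, keep favoring score gain
    return sum(candidate_probs) / len(candidate_probs)


print(adaptive_alpha(num_seen=50, k=100, candidate_probs=[0.2]))
print(adaptive_alpha(num_seen=150, k=100, candidate_probs=[0.5, 0.25, 0.75]))
```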
RANDOM ACCESSES
When to switch from SA to RA?
• Gathering with Sorted (α), then Probing with Random (1 − α), over time
• Switching too early: not enough good candidates, RA is wasted
• Switching too late: not enough RAs to prune the candidates
RANDOM ACCESSES
• Switch from Sorted to Random:
– R = (1 – α) * S
– S – total cost of sorted accesses
– R – total cost of random accesses
• Which items to access?
– maximize expected score
S + R > B
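One possible reading of the switch rule, sketched in Python: after spending S on sorted accesses, the share reserved for random accesses is R = (1 – alpha) * S, and the switch happens once S + R would exceed the budget B. Treating "S + R > B" as the trigger is an assumption about the slide, and both function names are hypothetical.

```python
def random_access_budget(sorted_cost: float, alpha: float) -> float:
    """R = (1 - alpha) * S, the random-access cost matched to sorted cost S."""
    return (1 - alpha) * sorted_cost


def should_switch(sorted_cost: float, alpha: float, budget: float) -> bool:
    """Assumed trigger: switch to random accesses once S + R > B."""
    return sorted_cost + random_access_budget(sorted_cost, alpha) > budget


print(should_switch(sorted_cost=600, alpha=0.5, budget=1000))  # S + R = 900
print(should_switch(sorted_cost=700, alpha=0.5, budget=1000))  # S + R = 1050
```

Under this reading, a smaller α moves the switch point earlier, since more of the budget is reserved for probing.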
RESULTS
EVALUATION METHODS
•percentage of optimal precision: precision_alg / precision_opt
•SME
(Figure labels: Ralg, Rexact, Ropt)
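The first measure can be computed directly from the three result sets. The sketch below assumes precision is the fraction of a returned set that also appears in Rexact, and that Ralg, Ropt and Rexact are plain sets of object ids. The example sets are hypothetical.

```python
def precision(result: set, exact: set) -> float:
    """Fraction of returned objects that belong to the exact top-k."""
    return len(result & exact) / len(result) if result else 0.0


def pct_of_optimal_precision(r_alg: set, r_opt: set, r_exact: set) -> float:
    """precision_alg / precision_opt; 0 if the optimal plan has precision 0."""
    p_opt = precision(r_opt, r_exact)
    return precision(r_alg, r_exact) / p_opt if p_opt else 0.0


# Hypothetical result sets for k = 4.
r_exact = {"a", "b", "c", "d"}
r_opt = {"a", "b", "x", "y"}   # precision 0.5
r_alg = {"a", "x", "y", "z"}   # precision 0.25
print(pct_of_optimal_precision(r_alg, r_opt, r_exact))
```

Normalizing by the optimal plan's precision separates what an algorithm loses through its own choices from what no plan could achieve within the budget.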
RESULTS – SORTED ACCESS
(Figure: percentage of optimal precision vs. Budget (#SA), 500 to 5000; TREC, k=100; series: NRA, KBA, Fair, Ranking)
•Less budget, more improvement
RESULTS – VARIED K
(Figure: percentage of optimal precision vs. k (20, 50, 100); IMDB, B=400; series: NRA, KBA, Fair, Ranking)
•Lower K, more improvement.
RESULTS – NUMBER OF LISTS
(Figure: percentage of optimal precision vs. number of lists, 2 to 6; Zipf, K=100, B=4000; series: NRA, KBA, Fair, Ranking)
•More lists, more improvement.
RESULTS – RANDOM ACCESSES
(Figures: percentage of optimal precision vs. Budget, 500 to 5000; panels: TREC, k=100, Cr=10 and TREC, K=100, Cr=100; series: SA (Ranking), CA, LAST, Adaptive_Expected)
QUESTIONS