Ranking in DB

Laks V.S. Lakshmanan
Dept. of CS
UBC
Why ranking in query answering? 1/3
• Multimedia data – fuzzy querying: e.g., “find top 2 red objects with a soft texture”.

List 1           List 2
Obj   Score      Obj   Score
A     0.9        D     0.85
D     0.8        B     0.80
C     0.4        A     0.75
B     0.3        E     0.65
E     0.1        C     0.60

Combine scores → overall score.
4/8/2015
Why ranking? 2/3
• IR: “find top 5 documents relevant to ‘computational’, ‘neuroscience’ and ‘brain theory’”.
– IR systems maintain full text indexes; inverted lists of
docs w.r.t. each keyword.
– Same Q/A paradigm as before.
• Buying a home: several criteria – price, location,
area, #BRs, school district. ORDER BY query in
SQL.
• Finding hotels while traveling.
Why ranking? 3/3
• Data stream, e.g., of network flow data: “find 10
users with the max. BW consumption and max.
#packets communicated”. – score may be complex
aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al. – pioneering papers in PODS ’96, PODS ’01 and JCSS 2003. The topic has since burgeoned into a field.
• Focus on middleware algorithm, which given a score
combo. function, computes top-k answers by
probing diff. subsystems (or ranked lists).
Computational model
• Naïve method.
• How to compute top-K efficiently?
• Access methods:
– Sorted access (sequential access) [SA].
– Random access [RA].
• Diff. optimization metrics:
– Overall running time of algorithm.
– cost(SA) < cost(RA): minimize RAs.
– RA not possible (#): avoid RAs.
– Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.
#: typical in IR systems.
Fagin’s Algorithm (FA)
• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
– For each new object seen, fetch scores from other
lists by RA. Overall score t(x) = t(x1, …, xm). Store
(obj, score) in set Y.
– In set H, remember each object that has been seen (under SA) in all m lists.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking
ties arbitrarily, and output top K.
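The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `fagin_top_k`, the helper `parse`, and the use of dictionaries to stand in for random access are assumptions made here. Scores are the example's scores multiplied by 100 so that SUM stays exact, and every object is assumed to appear in every list.

```python
def fagin_top_k(lists, k, t=sum):
    """Sketch of FA. `lists` holds m ranked lists of (obj, score) pairs,
    each sorted by descending score and covering the same objects."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]   # stand-in for random access (RA)
    seen_in = {}                            # obj -> indexes of lists where SA saw it
    overall = {}                            # Y: obj -> overall score
    in_all = set()                          # H: objects seen under SA in all m lists
    depth = 0
    while len(in_all) < k and depth < len(lists[0]):
        for i, lst in enumerate(lists):     # one round of parallel sorted access
            obj, _ = lst[depth]
            if obj not in overall:          # new object: fetch its other scores by RA
                overall[obj] = t(lookup[j][obj] for j in range(m))
            seen_in.setdefault(obj, set()).add(i)
            if len(seen_in[obj]) == m:
                in_all.add(obj)
        depth += 1
    ranked = sorted(overall.items(), key=lambda item: -item[1])
    return ranked[:k], depth


def parse(row):
    """Hypothetical helper: 'H95 B90' -> [('H', 95), ('B', 90)]."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]


# The four lists of the FA example, scores multiplied by 100.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

top4, depth = fagin_top_k(LISTS, k=4)
print(top4, depth)
```

With these lists the loop ends after 6 rounds of sorted access. B and E tie at 305, and the stable sort keeps E because it was seen first, which is one way of "breaking ties arbitrarily".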
Example of FA
K = 4, t = SUM. Four lists, each sorted by descending score:

Depth  L1       L2       L3       L4
1      H(0.95)  J(1.00)  C(0.95)  E(1.00)
2      B(0.90)  C(0.95)  J(0.80)  G(0.95)
3      E(0.85)  G(0.85)  D(0.70)  H(0.90)
4      C(0.80)  H(0.80)  H(0.65)  B(0.85)
5      G(0.75)  E(0.75)  G(0.60)  D(0.80)
6      I(0.70)  B(0.75)  B(0.55)  C(0.70)
7      D(0.65)  F(0.60)  I(0.50)  A(0.65)
8      A(0.60)  A(0.50)  E(0.45)  I(0.55)
9      J(0.55)  D(0.40)  F(0.40)  F(0.45)
10     F(0.50)  I(0.30)  A(0.30)  J(0.30)

Progress of the parallel SA:

Depth  New objects (overall score)        Answers seen (under SA) in all 4 lists, i.e., H
1      H 3.30, J 2.65, C 3.40, E 3.05     (none)
2      B 3.05, G 3.15                     (none)
3      D 2.55                             (none)
4      (none)                             H
5      (none)                             H, G
6      I 2.05                             H, G, B, C

Answers seen in >=1 list, i.e., Y (unsorted): B 3.05, C 3.40, D 2.55, E 3.05, G 3.15, H 3.30, I 2.05, J 2.65.
|H| = 4.
FA Example concluded
• A, F – not seen in any list. Yet, we are sure they
can’t make it to top-4. Why?
• Based on where the cursors are now, what’s the
max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high
probability [Fagin: PODS 1996].
• But can be beaten by other algorithms on
specific inputs.
• What about buffer size?
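One way to approach the questions about A and F is a small bound check. The sketch below assumes SUM as t and reads the four cursor scores off the example lists after 6 rounds of sorted access; the variable names are illustrative only.

```python
# Scores under the four cursors after 6 rounds of sorted access in the example
# (I in L1, B in L2, B in L3, C in L4); SUM is assumed as t.
cursor_scores = [0.70, 0.75, 0.55, 0.70]

# An unseen object lies below every cursor, so with a monotone t its overall
# score cannot exceed t applied to the cursor scores.
best_unseen = sum(cursor_scores)

# Overall scores of the four objects in H, as computed in the example.
fully_seen = {"C": 3.40, "H": 3.30, "G": 3.15, "B": 3.05}
kth_score = min(fully_seen.values())

print(round(best_unseen, 2), kth_score)
```

The bound for A and F (2.70) is below the lowest score in H (3.05). The argument relies on t being monotone: if it were not, a lower score in one list could still yield a higher overall score.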
Threshold Algorithm
• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch
its scores from other lists by RA and compute
overall score.
– If |Buffer| < K, add (x, score(x)) to Buffer;
– else if score(x) <= K-th score in Buffer, toss x;
– else replace the bottom of Buffer with (x, score(x)) and re-sort.
• Threshold := t(x1, …, xm), where xi is the last (i.e., lowest) score seen under SA on Li.
• Stop when Buffer holds K objects and Threshold <= K-th score in Buffer.
• Output the top-K objects & scores (in buffer).
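A possible Python rendering of TA follows, as a sketch under stated assumptions: the names `threshold_top_k` and `parse` are made up here, dictionaries stand in for random access, a min-heap plays the role of the sorted buffer, and scores are the example's scores multiplied by 100 so that SUM stays exact. Every object is assumed to appear in every list.

```python
import heapq


def threshold_top_k(lists, k, t=sum):
    """Sketch of TA over m equally long ranked lists of (obj, score) pairs."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]        # stand-in for random access (RA)
    buffer = []                                  # min-heap of (score, obj); root = k-th best
    for depth in range(len(lists[0])):
        for lst in lists:                        # one round of parallel sorted access
            obj, _ = lst[depth]
            if any(o == obj for _, o in buffer): # already in the buffer
                continue
            score = t(lookup[j][obj] for j in range(m))
            if len(buffer) < k:
                heapq.heappush(buffer, (score, obj))
            elif score > buffer[0][0]:
                heapq.heapreplace(buffer, (score, obj))
            # otherwise: toss obj
        threshold = t(lst[depth][1] for lst in lists)
        if len(buffer) == k and threshold <= buffer[0][0]:
            break                                # nothing unseen can beat the buffer
    return sorted(buffer, reverse=True), depth + 1


def parse(row):
    """Hypothetical helper: 'H95 B90' -> [('H', 95), ('B', 90)]."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]


# The four lists used in the examples, scores multiplied by 100.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

top4, depth = threshold_top_k(LISTS, k=4)
print(top4, depth)
```

Unlike FA, the buffer never holds more than K entries, at the price of possibly repeating RAs for an object that was tossed earlier and shows up again under SA.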
TA Example
Same lists L1–L4 as in the FA example; K = 4, t = SUM. x1, …, x4 are the last scores seen under SA; "X" marks an object tossed from the buffer.

Depth  x1    x2    x3    x4    Threshold Bar  Overall scores computed           Tossed (X)
1      0.95  1.00  0.95  1.00  T = 3.90       H 3.30, J 2.65, C 3.40, E 3.05
2      0.90  0.95  0.80  0.95  T = 3.60       B 3.05, G 3.15                    J 2.65, B 3.05
3      0.85  0.85  0.70  0.90  T = 3.30       D 2.55                            D 2.55
4      0.80  0.80  0.65  0.85  T = 3.10       (none)
5      0.75  0.75  0.60  0.80  T = 2.90       (none)

At depth 5: T = 2.90 <= 3.05 ==> can stop!
Buffer: C 3.40, H 3.30, G 3.15, E 3.05.
TA Remarks
TA is Instance Optimal
TA IO Proof (contd.)
Proof (concluded)
No Random Access Algorithm
• What if cost(RA) > cost(SA), or RA isn't allowed?
• Do SA on all lists in parallel. At depth d:
– Maintain the last (i.e., lowest) scores seen, x1, …, xm, one per list.
– Let x be any object seen so far, say in lists 1, …, i (after renumbering), with scores s1, …, si.
• Best(x) = t(s1, …, si, xi+1, …, xm).
• Worst(x) = t(s1, …, si, 0, …, 0).
– TopK contains the K objects with max Worst scores at depth d. Break ties using Best. M = K-th Worst score in TopK.
– Object y is viable if Best(y) > M.
• Stop when TopK contains >= K distinct objects and no object outside TopK is viable. Return TopK.
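The NRA loop can be sketched as follows. This is an illustration only: SUM is hard-wired as t, scores are assumed non-negative, the names `nra_top_k` and `parse` are made up here, and scores are the example's scores multiplied by 100 so that comparisons against M stay exact. Unseen objects are treated as viable while the sum of the cursor scores exceeds M.

```python
def nra_top_k(lists, k):
    """Sketch of NRA with t = SUM over equally long ranked lists of (obj, score)."""
    known = {}                                   # obj -> {list index: score seen under SA}
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):          # sorted access only, no RA
            obj, score = lst[depth]
            known.setdefault(obj, {})[i] = score
        bottoms = [lst[depth][1] for lst in lists]   # x1..xm: last scores seen
        worst = {o: sum(s.values()) for o, s in known.items()}
        best = {o: worst[o] + sum(b for i, b in enumerate(bottoms) if i not in known[o])
                for o in known}
        # TopK: max Worst, ties broken by Best.
        ranked = sorted(known, key=lambda o: (worst[o], best[o]), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            m_k = worst[top[-1]]                 # M: k-th Worst score in TopK
            no_unseen_viable = sum(bottoms) <= m_k
            if no_unseen_viable and all(best[o] <= m_k for o in rest):
                return top, depth + 1
    return ranked[:k], len(lists[0])


def parse(row):
    """Hypothetical helper: 'H95 B90' -> [('H', 95), ('B', 90)]."""
    return [(tok[0], int(tok[1:])) for tok in row.split()]


# The four lists used in the examples, scores multiplied by 100.
LISTS = [
    parse("H95 B90 E85 C80 G75 I70 D65 A60 J55 F50"),
    parse("J100 C95 G85 H80 E75 B75 F60 A50 D40 I30"),
    parse("C95 J80 D70 H65 G60 B55 I50 E45 F40 A30"),
    parse("E100 G95 H90 B85 D80 C70 A65 I55 F45 J30"),
]

top4, depth = nra_top_k(LISTS, k=4)
print(top4, depth)
```

On these lists the sketch needs 8 rounds of sorted access, more than TA, because E stays viable until its last score is read. Note that NRA returns the top-K objects but not necessarily their exact scores.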
NRA Example
Same lists L1–L4 as in the FA example; K = 4, t = SUM. Each entry is [Worst, Best] after SA down to the given depth; "-" means not yet seen.

Obj  Depth 1       Depth 2       Depth 3       Depth 4       Depth 5       Depth 6
B    -             [0.90, 3.60]  [0.90, 3.35]  [1.75, 3.20]  [1.75, 3.10]  [3.05, 3.05]
C    [0.95, 3.90]  [1.90, 3.75]  [1.90, 3.65]  [2.70, 3.55]  [2.70, 3.50]  [3.40, 3.40]
D    -             -             [0.70, 3.30]  [0.70, 3.15]  [1.50, 3.00]  [1.50, 2.95]
E    [1.00, 3.90]  [1.00, 3.65]  [1.85, 3.40]  [1.85, 3.30]  [2.60, 3.20]  [2.60, 3.15]
G    -             [0.95, 3.60]  [1.80, 3.35]  [1.80, 3.25]  [3.15, 3.15]  [3.15, 3.15]
H    [0.95, 3.90]  [0.95, 3.65]  [1.85, 3.40]  [3.30, 3.30]  [3.30, 3.30]  [3.30, 3.30]
I    -             -             -             -             -             [0.70, 2.70]
J    [1.00, 3.90]  [1.80, 3.65]  [1.80, 3.55]  [1.80, 3.45]  [1.80, 3.35]  [1.80, 3.20]

At depth 6, TopK = {C, H, G, B} and M = 3.05; E (Best 3.15) and J (Best 3.20) are still viable, so NRA cannot stop yet.
NRA Features
• What sort of t() do we need to assume, for
NRA to work correctly?
• How large can the buffers get?
• How does the amount of bookkeeping
compare with TA?
• NRA is instance optimal over algorithms that make no RAs (and, of course, no wild guesses).
Combined optimization
• What if we are told that cost(RA) is a known constant multiple of cost(SA)?
• Can we find algorithms better than NRA and TA in this case?
• Combined algorithm = CA. (See Fagin et
al.’s paper for details.)
Worrying about I/O cost
• Based on Bast et al. VLDB 2006.
• Inverted lists of (itemID, score) entries in
desc. score order, as usual, but on disk.
• Each list is split into blocks: entries within a block are sorted by itemID; across blocks, still in desc. score order.
• ⇒ Inverted Block Index (IBI) Algorithm.
• What is an IBI?
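The block layout described above could be built as in the sketch below. It is an assumption-laden illustration: the function name `build_block_index`, the `block_size` parameter, and the sample entries are made up here, and the on-disk format used by Bast et al. is not specified in these slides.

```python
def build_block_index(entries, block_size):
    """Split a list of (itemID, score) entries into blocks.

    Blocks follow descending score order; inside a block, entries are
    ordered by itemID (which helps merging entries for the same item).
    """
    by_score = sorted(entries, key=lambda e: -e[1])
    blocks = []
    for start in range(0, len(by_score), block_size):
        block = by_score[start:start + block_size]
        blocks.append(sorted(block))             # within a block: order by itemID
    return blocks


# Illustrative (itemID, score) entries, scores as integers.
entries = [(25, 70), (38, 50), (14, 50), (83, 50), (17, 20), (61, 10)]
blocks = build_block_index(entries, block_size=3)
print(blocks)
```

Every score in a block is at least as large as every score in the next block, so the first entry read from a new block still bounds the scores of all unread entries from above, up to the spread inside that block.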
A Motivating Example
List 1: Doc17 : 0.8, Doc78 : 0.2, …
List 2: Doc25 : 0.7, Doc38 : 0.5, Doc14 : 0.5, Doc83 : 0.5, …, Doc17 : 0.2, …
List 3: Doc83 : 0.9, Doc17 : 0.7, Doc61 : 0.3, …

Round 1 (SA on 1,2,3)
Doc17 : [0.8 , 2.4]
Doc25 : [0.7 , 2.4]
Doc83 : [0.9 , 2.4]
unseen: ≤ 2.4

Round 2 (SA on 1,2,3)
Doc17 : [1.5 , 2.0]
Doc25 : [0.7 , 1.6]
Doc83 : [0.9 , 1.6]
unseen: ≤ 1.4

Round 3 (SA on 2,2,3!)
Doc17 : [1.5 , 2.0]
Doc83 : [1.4 , 1.6]
unseen: ≤ 1.0

Round 4 (RA for Doc17)
Doc17 : 1.7
all others < 1.7
done!

Note deviation from round-robin.
IBI Algorithm
• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
• pos_i = current cursor position on list Li.
• high_i = score in Li at the current cursor position (upper-bounds the score of unseen items).
• For items d in S:
– E(d): the attributes whose scores are known.
– E~(d): the attributes whose scores are unknown.
– Worst(d) = total score from E(d).
– Best(d) = Worst(d) + Σ {high_i | i ∈ E~(d)}.
(Exactly as in Fagin et al.)
IBI Algorithm (contd.)
• In each round, compute:
– min-k = min{Worst(d) | d ∈ T}.
– bestscore that any unseen doc can have = sum of all high_i’s.
– For dj ∈ S: def_j = min-k – Worst(dj). [Denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [Sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k. ⇒ Stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH
important for performance.
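The per-round bookkeeping can be illustrated with a short Python sketch. The function name `ibi_round_state` and its argument layout are assumptions made here; the data is the state of the motivating example after Round 3, with scores multiplied by 10 to keep the arithmetic exact. Termination additionally checks the bound for unseen docs.

```python
def ibi_round_state(known, high, k):
    """known: item -> {list index: known score}; high: list index -> high_i."""
    worst = {d: sum(s.values()) for d, s in known.items()}
    best = {d: worst[d] + sum(h for i, h in high.items() if i not in known[d])
            for d in known}
    ranked = sorted(known, key=lambda d: worst[d], reverse=True)
    top = ranked[:k]                              # T, in desc. Worst()
    min_k = worst[top[-1]]
    # S with deficits def_j; items whose Best <= min-k are dropped right away.
    deficits = {d: min_k - worst[d] for d in ranked[k:] if best[d] > min_k}
    unseen_best = sum(high.values())              # best score of any unseen doc
    done = not deficits and unseen_best <= min_k
    return top, min_k, deficits, done


# State after Round 3 of the motivating example (scores x 10).
known = {
    "Doc17": {1: 8, 3: 7},
    "Doc78": {1: 2},
    "Doc25": {2: 7},
    "Doc38": {2: 5},
    "Doc14": {2: 5},
    "Doc83": {2: 5, 3: 9},
    "Doc61": {3: 3},
}
high = {1: 2, 2: 5, 3: 3}

top, min_k, deficits, done = ibi_round_state(known, high, k=1)
print(top, min_k, deficits, done)
```

Only Doc83 remains in S, with a deficit of 1 (i.e., 0.1), so the run cannot stop yet; resolving Doc17's one missing score settles the question, which is what Round 4 of the example does with a single RA.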
More on IBI Framework
• Instead of scheduling SAs round-robin (RR), treat the lists differently, based on the expected score reductions at future cursor positions (Knapsack).
• Do SA*RA*: a batch of SAs followed by a batch of RAs.
• Order RAs based on the estimated Prob[dj can get into the top-k answers].