Ranking in DB
Laks V.S. Lakshmanan
Dept. of CS, UBC
4/8/2015

Why ranking in query answering? 1/3
• Multimedia data – fuzzy querying: e.g., “find top 2 red objects with a soft texture”.
  (Figure: two ranked (Obj, Score) lists whose scores are combined into an overall score.)
    Obj    A     D     C     B     E
    Score  0.9   0.8   0.4   0.3   0.1

    Obj    D     B     A     E     C
    Score  0.85  0.80  0.75  0.65  0.60

Why ranking? 2/3
• IR: “find top 5 documents relevant to `computational’, `neuroscience’ and `brain theory’”.
  – IR systems maintain full-text indexes; inverted lists of docs w.r.t. each keyword.
  – Same Q/A paradigm as before.
• Buying a home: several criteria – price, location, area, #BRs, school district. ORDER BY query in SQL.
• Finding hotels while traveling.

Why ranking? 3/3
• Data stream, e.g., of network flow data: “find 10 users with the max. BW consumption and max. #packets communicated”.
  – Score may be a complex aggregation of these two measures.
• In a social net, find 5 items tagged as most relevant to “lawn mowing” and belonging to users socially close to the seeker.
• And now, find top-k recs (recommender systems).
• etc.
• Fagin et al. – pioneering papers PODS’96, 01, JCSS 2003. Burgeoned into a field now.
• Focus on middleware algorithm, which, given a score combo. function, computes top-k answers by probing diff. subsystems (or ranked lists).

Computational model
• Naïve method.
• How to compute top-K efficiently?
• Access methods:
  – Sorted access (sequential access) [SA].
  – Random access [RA].
• Diff. optimization metrics:
  – Overall running time of algorithm.
  – cost(SA) < cost(RA): minimize RAs.
  – RA not possible (#): avoid RAs.
  – Combined optimization.
• Has led to a variety of algorithms.
• Memory vs. disk model.
• For the most part, assume score agg. is a monotone function; use SUM in examples.
#: typical in IR systems.

Fagin’s Algorithm (FA)
• m lists sorted by descending scores.
• Access (SA) all lists in parallel.
  – For each new object seen, fetch scores from other lists by RA. Overall score t(x) = t(x1, …, xm).
    Store (obj, score) in set Y.
  – Remember each object seen (under SA) in all lists in set H.
• Repeat until |H| >= K.
• Sort Y in descending order of scores, breaking ties arbitrarily, and output top K.

Example of FA (K = 4, t = SUM)
  Depth   L1        L2        L3        L4
  1       H(0.95)   J(1.00)   C(0.95)   E(1.00)
  2       B(0.90)   C(0.95)   J(0.80)   G(0.95)
  3       E(0.85)   G(0.85)   D(0.70)   H(0.90)
  4       C(0.80)   H(0.80)   H(0.65)   B(0.85)
  5       G(0.75)   E(0.75)   G(0.60)   D(0.80)
  6       I(0.70)   B(0.75)   B(0.55)   C(0.70)
  7       D(0.65)   F(0.60)   I(0.50)   A(0.65)
  8       A(0.60)   A(0.50)   E(0.45)   I(0.55)
  9       J(0.55)   D(0.40)   F(0.40)   F(0.45)
  10      F(0.50)   I(0.30)   A(0.30)   J(0.30)

  Step-by-step (one row per SA depth):
  Depth   New objects added to Y (overall score)     Set H: seen under SA in all 4 lists
  1       H 3.30, J 2.65, C 3.40, E 3.05             {}
  2       B 3.05, G 3.15                             {}
  3       D 2.55                                     {}
  4       (none)                                     {H}
  5       (none)                                     {H, G}
  6       I 2.05                                     {H, G, B, C}, |H| = 4

  Answers seen in >=1 list, i.e., Y unsorted:
  B 3.05, C 3.40, D 2.55, E 3.05, G 3.15, H 3.30, I 2.05, J 2.65 (A, F not seen).

FA Example concluded
• A, F – not seen in any list. Yet, we are sure they can’t make it to top-4. Why?
• Based on where the cursors are now, what’s the max. possible score for A, F?
• What assumptions are being made about t()?
• FA is shown to be optimal with very high probability [Fagin: PODS 1996].
• But can be beaten by other algorithms on specific inputs.
• What about buffer size?

Threshold Algorithm
• Do parallel SA on all m lists.
• For each object x seen under SA in a list, fetch its scores from other lists by RA and compute overall score.
• If |Buffer| < K, add x to Buffer;
• Else if score(x) <= k-th score in buffer, toss;
• Else replace bottom of buffer with (x, score(x)) & re-sort.
• Stop when threshold <= k-th score in buffer.
• Threshold := t(worst score seen on L1, …, worst score seen on Lm).
• Output the top-K objects & scores (in buffer).

TA Example (same lists L1–L4 as in the FA example; K = 4, t = SUM)
  Threshold Bar per SA depth; X = tossed from the buffer.
  Depth   x1     x2     x3     x4     T      Overall scores computed            Tossed (X)
  1       0.95   1.00   0.95   1.00   3.90   H 3.30, J 2.65, C 3.40, E 3.05
  2       0.90   0.95   0.80   0.95   3.60   B 3.05, G 3.15                     B, J
  3       0.85   0.85   0.70   0.90   3.30   D 2.55                             D
  4       0.80   0.80   0.65   0.85   3.10
  5       0.75   0.75   0.60   0.80   2.90   ==> can stop!

  Buffer at the end: C 3.40, H 3.30, G 3.15, E 3.05.
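The FA and TA steps described above can be replayed on the four example lists. The following is only a minimal sketch under stated assumptions: t = SUM, K = 4, scores stored as integer hundredths to keep comparisons exact, ties broken by object name, and random access simulated with one dictionary per list. The function names `fagin_algorithm` and `threshold_algorithm` are hypothetical, not taken from the slides or the papers.

```python
# Scores are integer hundredths (95 means 0.95); this encoding is an
# assumption of the sketch so that sums and threshold tests are exact.
LISTS = [
    [("H", 95), ("B", 90), ("E", 85), ("C", 80), ("G", 75),
     ("I", 70), ("D", 65), ("A", 60), ("J", 55), ("F", 50)],   # L1
    [("J", 100), ("C", 95), ("G", 85), ("H", 80), ("E", 75),
     ("B", 75), ("F", 60), ("A", 50), ("D", 40), ("I", 30)],   # L2
    [("C", 95), ("J", 80), ("D", 70), ("H", 65), ("G", 60),
     ("B", 55), ("I", 50), ("E", 45), ("F", 40), ("A", 30)],   # L3
    [("E", 100), ("G", 95), ("H", 90), ("B", 85), ("D", 80),
     ("C", 70), ("A", 65), ("I", 55), ("F", 45), ("J", 30)],   # L4
]


def fagin_algorithm(lists, k):
    """FA: do SA until k objects were seen in every list, then rank Y."""
    index = [dict(lst) for lst in lists]       # per-list lookup simulates RA
    seen_in = {}                               # object -> ids of lists where SA saw it
    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        for i, lst in enumerate(lists):
            seen_in.setdefault(lst[depth - 1][0], set()).add(i)
        h = [o for o, ids in seen_in.items() if len(ids) == len(lists)]
        if len(h) >= k:
            break
    y = {o: sum(ix[o] for ix in index) for o in seen_in}       # t = SUM
    ranked = sorted(y.items(), key=lambda p: (-p[1], p[0]))    # ties: by name
    return ranked[:k], depth


def threshold_algorithm(lists, k):
    """TA: buffer of at most k objects; stop once threshold <= k-th score."""
    index = [dict(lst) for lst in lists]
    buffer = {}                                # object -> overall score
    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        for lst in lists:
            obj = lst[depth - 1][0]
            if obj in buffer:
                continue
            score = sum(ix[obj] for ix in index)   # SA score plus RAs elsewhere
            if len(buffer) < k:
                buffer[obj] = score
                continue
            bottom = min(buffer, key=lambda o: (buffer[o], o))
            if score > buffer[bottom]:             # otherwise the object is tossed
                del buffer[bottom]
                buffer[obj] = score
        threshold = sum(lst[depth - 1][1] for lst in lists)
        if len(buffer) == k and threshold <= min(buffer.values()):
            break
    ranked = sorted(buffer.items(), key=lambda p: (-p[1], p[0]))
    return ranked, depth


fa_top, fa_depth = fagin_algorithm(LISTS, 4)
ta_top, ta_depth = threshold_algorithm(LISTS, 4)
print("FA stops at depth", fa_depth, "with", fa_top)
print("TA stops at depth", ta_depth, "with", ta_top)
```

With these assumptions the sketch stops FA at depth 6 and TA at depth 5, as in the two examples. B and E tie at 3.05 for fourth place, and the two functions happen to resolve the tie differently, which the "breaking ties arbitrarily" rule allows.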
TA Example (concluded): at the stopping depth, x1 = 0.75, x2 = 0.75, x3 = 0.60, x4 = 0.80, so T = 2.90.

TA Remarks
TA is Instance Optimal
TA IO Proof (contd.) through Proof (concluded)
[slide bodies not recovered]

No Random Access Algorithm
• What if cost(RA) > cost(SA), or RA wasn’t allowed?
• Do SA on all lists in parallel. At depth d:
  – Maintain worst (i.e., last seen) scores x1, …, xm, one per list.
  – Let x be any object seen in lists {1, …, i}.
    • Best(x) = t(x1, …, xi, xi+1, …, xm), using x’s known scores for lists 1..i and the worst scores seen so far for lists i+1..m.
    • Worst(x) = t(x1, …, xi, 0, …, 0).
  – TopK contains K objects with max worst scores at depth d. Break ties using Best. M = k-th Worst score in TopK.
  – Object y is viable if Best(y) > M.
• Stop when TopK contains >=K distinct objects and no object outside TopK is viable. Return TopK.

NRA Example (same lists L1–L4; each entry is [Worst, Best] at that SA depth)
  Obj   Depth 1        Depth 2        Depth 3        Depth 4        Depth 5        Depth 6
  B                    [0.90, 3.60]   [0.90, 3.35]   [1.75, 3.20]   [1.75, 3.10]   [3.05, 3.05]
  C     [0.95, 3.90]   [1.90, 3.75]   [1.90, 3.65]   [2.70, 3.55]   [2.70, 3.50]   [3.40, 3.40]
  D                                   [0.70, 3.30]   [0.70, 3.15]   [1.50, 3.00]   [1.50, 2.95]
  E     [1.00, 3.90]   [1.00, 3.65]   [1.85, 3.40]   [1.85, 3.30]   [2.60, 3.20]   [2.60, 3.15]
  G                    [0.95, 3.60]   [1.80, 3.35]   [1.80, 3.25]   [3.15, 3.15]   [3.15, 3.15]
  H     [0.95, 3.90]   [0.95, 3.65]   [1.85, 3.40]   [3.30, 3.30]   [3.30, 3.30]   [3.30, 3.30]
  I                                                                                [0.70, 2.70]
  J     [1.00, 3.90]   [1.80, 3.65]   [1.80, 3.55]   [1.80, 3.45]   [1.80, 3.35]   [1.80, 3.20]
  (A, F not seen through depth 6.)

NRA Features
• What sort of t() do we need to assume, for NRA to work correctly?
• How large can the buffers get?
• How does the amount of bookkeeping compare with TA?
• NRA is instance optimal over algo’s not making RA (and of course, not making wild guesses).

Combined optimization
• What if we are told cost(RA) = c · cost(SA), for a given constant c?
• Can we find algo’s better than NRA and TA in this case?
• Combined algorithm = CA. (See Fagin et al.’s paper for details.)
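The NRA bookkeeping described earlier (Worst, Best, M, viability) can be sketched on the same four lists. This is an illustrative sketch, not the authors' implementation. It assumes t = SUM, K = 4, scores as integer hundredths, ties broken by Best and then by object name, and it treats an unseen object as having Best = t of the current cursor scores. The name `nra` is hypothetical.

```python
# Same example lists as in the slides; scores are integer hundredths
# (an assumption of this sketch, to keep comparisons exact).
LISTS = [
    [("H", 95), ("B", 90), ("E", 85), ("C", 80), ("G", 75),
     ("I", 70), ("D", 65), ("A", 60), ("J", 55), ("F", 50)],   # L1
    [("J", 100), ("C", 95), ("G", 85), ("H", 80), ("E", 75),
     ("B", 75), ("F", 60), ("A", 50), ("D", 40), ("I", 30)],   # L2
    [("C", 95), ("J", 80), ("D", 70), ("H", 65), ("G", 60),
     ("B", 55), ("I", 50), ("E", 45), ("F", 40), ("A", 30)],   # L3
    [("E", 100), ("G", 95), ("H", 90), ("B", 85), ("D", 80),
     ("C", 70), ("A", 65), ("I", 55), ("F", 45), ("J", 30)],   # L4
]


def nra(lists, k):
    """NRA: sorted access only; track [Worst, Best] for every seen object."""
    m = len(lists)
    known = {}                     # object -> {list id: score learned by SA}
    worst, best, top = {}, {}, []
    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        for i, lst in enumerate(lists):
            obj, score = lst[depth - 1]
            known.setdefault(obj, {})[i] = score
        cursor = [lst[depth - 1][1] for lst in lists]   # worst score seen per list
        worst = {o: sum(f.values()) for o, f in known.items()}
        best = {o: worst[o] + sum(cursor[i] for i in range(m) if i not in f)
                for o, f in known.items()}
        # TopK: max Worst, ties broken by Best, then by name (sketch's choice).
        ranked = sorted(known, key=lambda o: (-worst[o], -best[o], o))
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        m_k = worst[top[-1]]                            # M = k-th Worst score
        viable = [o for o in rest if best[o] > m_k]
        # An unseen object could still score up to t(cursor) = sum(cursor).
        if not viable and sum(cursor) <= m_k:
            break
    return [(o, worst[o], best[o]) for o in top], depth


top_k, stop_depth = nra(LISTS, 4)
print("NRA stops at depth", stop_depth)
for obj, lo, hi in top_k:
    print(obj, [lo / 100, hi / 100])
```

Under these assumptions the sketch keeps reading past depth 6, because E and J are still viable there (their Best values 3.15 and 3.20 exceed M = 3.05), and stops at depth 8 without any random access. TA stopped at depth 5 in its example but needed RAs, which is the trade-off the combined-optimization question is about.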
Worrying about I/O cost
• Based on Bast et al. VLDB 2006.
• Inverted lists of (itemID, score) entries in desc. score order, as usual, but on disk.
• Blocks sorted by itemID; across blocks still in desc. score order.
• Inverted Block Index (IBI) Algorithm.
• What is an IBI?

A Motivating Example
  List 1          List 2          List 3
  Doc17 : 0.8     Doc25 : 0.7     Doc83 : 0.9
  Doc78 : 0.2     Doc38 : 0.5     Doc17 : 0.7
  ·               Doc14 : 0.5     Doc61 : 0.3
  ·               Doc83 : 0.5     ·
  ·               ·               ·
  ·               Doc17 : 0.2     ·
  ·               ·               ·

  Round 1 (SA on 1,2,3)   Round 2 (SA on 1,2,3)   Round 3 (SA on 2,2,3!)   Round 4 (RA for Doc17)
  Doc17 : [0.8 , 2.4]     Doc17 : [1.5 , 2.0]     Doc17 : [1.5 , 2.0]      Doc17 : 1.7
  Doc25 : [0.7 , 2.4]     Doc25 : [0.7 , 1.6]     Doc83 : [1.4 , 1.6]      all others < 1.7
  Doc83 : [0.9 , 2.4]     Doc83 : [0.9 , 1.6]     unseen: ≤ 1.0            done!
  unseen: ≤ 2.4           unseen: ≤ 1.4

Note deviation from round-robin.
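The intervals in Rounds 2 to 4 follow from simple arithmetic on the cursor scores. The worked steps below are a reading of the example, assuming t = SUM and a top-1 query (which the final "all others < 1.7" suggests), and writing high_i for the score at the current cursor of list i.

```latex
\begin{align*}
\text{Round 2:}\quad & (\mathit{high}_1, \mathit{high}_2, \mathit{high}_3) = (0.2,\ 0.5,\ 0.7) \\
& \mathrm{Worst}(\mathrm{Doc17}) = 0.8 + 0.7 = 1.5, \qquad
  \mathrm{Best}(\mathrm{Doc17}) = 1.5 + \mathit{high}_2 = 2.0 \\
& \mathrm{Best}(\mathrm{Doc25}) = 0.7 + \mathit{high}_1 + \mathit{high}_3 = 1.6, \qquad
  \mathrm{Best}(\mathrm{Doc83}) = 0.9 + \mathit{high}_1 + \mathit{high}_2 = 1.6 \\
& \text{unseen} \le 0.2 + 0.5 + 0.7 = 1.4 \\[4pt]
\text{Round 3:}\quad & (\mathit{high}_1, \mathit{high}_2, \mathit{high}_3) = (0.2,\ 0.5,\ 0.3) \\
& \mathrm{Worst}(\mathrm{Doc83}) = 0.9 + 0.5 = 1.4, \qquad
  \mathrm{Best}(\mathrm{Doc83}) = 1.4 + \mathit{high}_1 = 1.6 \\
& \mathrm{Best}(\mathrm{Doc25}) = 0.7 + 0.2 + 0.3 = 1.2 < 1.5 = \mathrm{Worst}(\mathrm{Doc17})
  \ \Rightarrow\ \text{Doc25 is dropped} \\
& \text{unseen} \le 0.2 + 0.5 + 0.3 = 1.0 \\[4pt]
\text{Round 4:}\quad & \mathrm{Score}(\mathrm{Doc17}) = 1.5 + 0.2 = 1.7 > 1.6 = \mathrm{Best}(\mathrm{Doc83})
\end{align*}
```

Spending both Round 3 accesses on list 2 is what exposes Doc83's second score early. A single random access for Doc17 then settles the query, since no other document can exceed 1.6.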
IBI Algorithm
• Same setting as NRA/CA, except use IBI.
• Maintain two lists: Top-K items (T = d1, …, dk) and StillHaveAShot (SHASH) items (S = dk+1, …, dk+q).
• pos_i = current cursor position on list Li.
• high_i = score in Li at current cursor position (upper-bounds the score of unseen items).
• For items d in S:
  – Which attr scores are known: E(d).
  – Which attr scores are unknown: E~(d).
  – Worst(d) = total score from E(d).
  – Best(d) = Worst(d) + Σ{high_i | i ∈ E~(d)}. (Exactly as Fagin.)

IBI Algorithm (contd.)
• In each round, compute:
  – min-k = min{Worst(d) | d ∈ T}.
  – bestscore that any unseen doc can have = sum of all high_i’s.
  – For dj ∈ S: def_j = min-k – Worst(dj). [denotes deficit below qualification level for top-k.]
• T sorted in desc. Worst(); S sorted in desc. Best(). [sorting on (score, ItemID) for fast processing.]
• Invariant: min-k >= max{Worst(d) | d ∈ S}.
• Termination: when min-k >= max{Best(d) | d ∈ S}.
• Can remove an obj from S whenever its Best <= min-k. Stop when S = {}.
• Early termination AND minimal bookkeeping are BOTH important for performance.

More on IBI Framework
• Instead of scheduling SAs using RR, use a differential approach for diff. lists based on expected score reductions at future cursor positions (Knapsack).
• Do SA*RA*, i.e., a phase of SAs followed by a phase of RAs.
• Order RAs based on estimated Prob[dj can get into top-k answers].
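The per-round bookkeeping (min-k, deficits, pruning of S, termination) can be illustrated on the state of the motivating example after Rounds 3 and 4. This is a sketch under assumptions, not code from Bast et al.: t = SUM, k = 1, scores stored as integer tenths, and the unseen-document bound is also checked before stopping. The function `ibi_bookkeeping` and its argument layout are hypothetical.

```python
def ibi_bookkeeping(known, high, k):
    """One round of bookkeeping: returns (T, S, min_k, deficits, done).

    known: doc -> {list id: score already seen}   (the set E(d))
    high:  list id -> score at the current cursor (high_i)
    """
    worst = {d: sum(s.values()) for d, s in known.items()}
    best = {d: worst[d] + sum(h for i, h in high.items() if i not in known[d])
            for d in known}
    ranked = sorted(known, key=lambda d: (-worst[d], d))   # T in desc. Worst
    top = ranked[:k]
    # With fewer than k candidates nothing can be pruned yet; 0 is a safe
    # floor only under the assumption that scores are non-negative.
    min_k = worst[top[-1]] if len(top) == k else 0
    # S keeps only candidates that still have a shot, in desc. Best.
    still = sorted((d for d in ranked[k:] if best[d] > min_k),
                   key=lambda d: (-best[d], d))
    deficits = {d: min_k - worst[d] for d in still}        # def_j
    unseen_bound = sum(high.values())                      # best score of unseen docs
    done = len(top) == k and not still and unseen_bound <= min_k
    return top, still, min_k, deficits, done


# State after Round 3 of the motivating example, in tenths (8 means 0.8).
known = {
    "Doc17": {1: 8, 3: 7}, "Doc78": {1: 2},
    "Doc25": {2: 7}, "Doc38": {2: 5}, "Doc14": {2: 5},
    "Doc83": {2: 5, 3: 9}, "Doc61": {3: 3},
}
high = {1: 2, 2: 5, 3: 3}
round3 = ibi_bookkeeping(known, high, k=1)

# Round 4: a random access reveals Doc17's score in list 2 (0.2).
known["Doc17"][2] = 2
round4 = ibi_bookkeeping(known, high, k=1)
print(round3)
print(round4)
```

In the Round 3 state only Doc83 remains in S, with a deficit of 0.1 below min-k = 1.5. After the Round 4 random access min-k rises to 1.7, Doc83's Best of 1.6 no longer qualifies, S becomes empty, and the sketch reports termination.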