BEST-EFFORT TOP-K QUERY PROCESSING UNDER BUDGETARY CONSTRAINTS
Steven Williams
Spring 2016

OUTLINE
1. Top-k query processing
2. Budgetary constraints
3. Motivating example
4. Proposed algorithm
5. Results
6. Questions

TOP-K QUERY PROCESSING
• Pre-computed sorted lists over multiple attributes.
• Scores are combined by a monotonic aggregation function.
• Two access modes:
  – sorted access (cost Cs)
  – random access (cost Cr)
• Objective: compute the k objects with the highest aggregate scores.
[Figure: m sorted lists over n objects]

NRA ALGORITHM
Example: f = SUM, k = 2; high_i is the last score read from list R_i under sorted access.

R1: a 0.90, b 0.60, c 0.50, d 0.40, …
R2: d 0.87, a 0.85, f 0.25, c 0.20, …

[worst score, best score] after each round of sorted accesses (one access per list):

Object | Round 1      | Round 2      | Round 3
a      | [0.90, 1.77] | [1.75, 1.75] | [1.75, 1.75]
d      | [0.87, 1.77] | [0.87, 1.47] | [0.87, 1.37]
b      |              | [0.60, 1.45] | [0.60, 0.85]
c      |              |              | [0.50, 0.75]
f      |              |              | [0.25, 0.75]

• After round 3 the top-2 are a and d; the candidates are b, c, and f.
• Stop when min_k > best score of every candidate, where min_k is the smallest worst score in the current top-k.
• Here min_k = 0.87 > 0.85, so NRA stops.

BUDGET CONSTRAINTS
Example: Cs = 1, Cr = 3, f = SUM, top-2, budget B = 10.
• NRA needs 12 sorted accesses: 12 Cs = 12 > B. With budget B = 10, NRA reaches precision = 0.50.
• TA needs 7 sorted and 7 random accesses: 7 Cs + 7 Cr = 28. With budget B = 10, precision = 0.
• Goal: given a budget, maximize result quality.

MOTIVATING EXAMPLE
[Figure]

PROPOSED ALGORITHMS
• Sorted accesses only
  – Efficient plan
  – Solution with adaptive α
• Sorted and random accesses
  – Efficient plan
  – Solution with adaptive α

RESULTS UNDER LIMITED BUDGET
[Figure: top-k results for an unlimited budget vs. results for a limited budget]

EFFICIENT PLAN – SORTED ACCESS
Goal: find a plan t such that
$t^{*} = \arg\max_{t \in \hat{T}} \frac{|R(t) \cap R_{exact}|}{|R(t)|}$
where R(t) is the result of plan t, R_exact is the exact top-k, and $\hat{T}$ is the set of plans whose total cost is at most B.
Example (B = 10): the plan {R1, 4}, {R2, 6}, i.e., 4 sorted accesses on R1 and 6 on R2; its result is denoted R_opt.
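The NRA stopping rule can be checked with a minimal Python sketch, assuming f = SUM, non-negative scores, and one sorted access per list per round. The function name `nra_top_k` and the list encoding are illustrative, not from the slides; the usage keeps only the four entries shown for each list (R1: a 0.90, b 0.60, c 0.50, d 0.40; R2: d 0.87, a 0.85, f 0.25, c 0.20).

```python
from typing import Dict, List, Optional, Set, Tuple

# Each list holds (object_id, score) pairs sorted by descending score.
SortedList = List[Tuple[str, float]]


def nra_top_k(lists: List[SortedList], k: int) -> Tuple[Set[str], int]:
    """Sketch of NRA with f = SUM (scores assumed >= 0).

    Returns the top-k object ids and the number of rounds performed.
    """
    m = len(lists)
    partial: Dict[str, List[Optional[float]]] = {}  # known per-list scores
    high = [0.0] * m  # high_i: last score read from list i
    max_depth = max(len(lst) for lst in lists)
    top: List[str] = []
    for depth in range(max_depth):
        # One round = one sorted access on every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                partial.setdefault(obj, [None] * m)[i] = score
                high[i] = score
            else:
                high[i] = 0.0  # exhausted list: missing scores assumed 0

        def worst(obj: str) -> float:
            return sum(s for s in partial[obj] if s is not None)

        def best(obj: str) -> float:
            return sum(s if s is not None else high[i]
                       for i, s in enumerate(partial[obj]))

        # Current top-k: the k seen objects with the highest worst scores.
        top = sorted(partial, key=worst, reverse=True)[:k]
        if len(top) < k:
            continue
        min_k = min(worst(o) for o in top)
        # Best score of any candidate, including a completely unseen object.
        cand_best = max([best(o) for o in partial if o not in top] + [sum(high)])
        if min_k > cand_best:
            return set(top), depth + 1
    return set(top), max_depth


R1 = [("a", 0.90), ("b", 0.60), ("c", 0.50), ("d", 0.40)]
R2 = [("d", 0.87), ("a", 0.85), ("f", 0.25), ("c", 0.20)]
print(nra_top_k([R1, R2], k=2))
```

On these lists the sketch stops after round 3 with {a, d}, since min_k = 0.87 exceeds b's best score of 0.85 and the unseen bound of 0.50 + 0.25 = 0.75.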
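The cost model (each sorted access costs Cs, each random access Cr) and the search for an efficient sorted-access plan can be sketched by brute force for small inputs. This assumes a plan is a vector of per-list depths and that a plan's result R(t) is the k seen objects with the highest partial SUM scores; that choice of R(t), and the helper names `access_cost`, `plan_result`, and `best_sa_plan`, are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

SortedList = List[Tuple[str, float]]


def access_cost(n_sorted: int, n_random: int, cs: float = 1.0, cr: float = 3.0) -> float:
    """Total cost: each sorted access costs Cs, each random access costs Cr."""
    return n_sorted * cs + n_random * cr


def plan_result(lists: List[SortedList], depths: Tuple[int, ...], k: int) -> Set[str]:
    """Hypothetical R(t): read depths[i] entries of list i, then keep the k
    seen objects with the highest partial SUM (worst) scores."""
    worst: Dict[str, float] = {}
    for lst, d in zip(lists, depths):
        for obj, score in lst[:d]:
            worst[obj] = worst.get(obj, 0.0) + score
    return set(sorted(worst, key=worst.__getitem__, reverse=True)[:k])


def best_sa_plan(lists: List[SortedList], k: int, budget: float, exact: Set[str],
                 cs: float = 1.0) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Enumerate sorted-access plans whose cost fits the budget and keep the one
    maximizing |R(t) & R_exact| / |R(t)|; plans returning fewer than k objects
    are skipped. Exponential in the number of lists: toy inputs only."""
    best_plan: Optional[Tuple[int, ...]] = None
    best_prec = -1.0
    ranges = [range(min(len(lst), int(budget // cs)) + 1) for lst in lists]
    for depths in product(*ranges):
        if access_cost(sum(depths), 0, cs=cs) > budget:
            continue
        result = plan_result(lists, depths, k)
        if len(result) < k:
            continue
        prec = len(result & exact) / len(result)
        if prec > best_prec:
            best_plan, best_prec = depths, prec
    return best_plan, best_prec
```

With the defaults Cs = 1 and Cr = 3, `access_cost(12, 0)` gives 12 and `access_cost(7, 7)` gives 28, the NRA and TA costs from the budget example, both above B = 10.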
OBSERVATIONS (B = 180)
1. Prefer high scores.
2. Prefer large score reductions.
[Figure: uniform allocation vs. non-uniform allocation]

SCORE UTILITIES
• Score gain: $util_{gain}(L_i, x) = \frac{1}{x} \sum_{j=pos_i+1}^{pos_i+x} score_i(j)$
• Score reduction: $util_{red}(L_i, x) = high_i - score_i(pos_i + x)$
where pos_i is the number of entries already read from L_i and high_i = score_i(pos_i) is the last score read.
Example (x = 3): util_gain(L1, 3) = (0.95 + 0.93 + 0.92) / 3 = 0.93; util_red(L1, 3) = 0.95 − 0.92 = 0.03.

OPTIMIZATION PROBLEM
$util(L_i, x) = \alpha \cdot util_{gain}(L_i, x) + (1 - \alpha) \cdot util_{red}(L_i, x)$
$\text{maximize} \sum_{i=1}^{m} util(L_i, b_i) \quad \text{subject to} \quad \sum_{i=1}^{m} b_i = b$
where b_i is the number of sorted accesses assigned to list L_i and b is the number of accesses to allocate.
[Figure: weight α on gain and 1 − α on reduction, over time]

ADAPTIVE α
• α = 1 until k objects have been seen.
• Afterwards, α is the average probability that an object in the candidate set gets into the top-k:
$\alpha = \hat{p}_k = \frac{1}{|cand.set|} \sum_{c \in cand.set} p_k(c)$
[Figure: α = p̂_k vs. spent budget; TREC query, k = 100]

RANDOM ACCESSES
When to switch from sorted accesses (SA) to random accesses (RA)?
• Gathering with sorted accesses (α): with too few good candidates, RAs are wasted.
• Probing with random accesses (1 − α): with too few RAs, the candidates cannot be pruned.
[Figure: gathering and probing over time]

RANDOM ACCESSES
• Switch from sorted to random accesses: R = (1 − α) · S
  – S: total cost of sorted accesses
  – R: total cost of random accesses
• Which items to access? Those that maximize the expected score.
[Figure: S + R > B]

RESULTS

EVALUATION METHODS
• Percentage of optimal precision: precision_alg / precision_opt, i.e., the precision of the algorithm's result R_alg relative to that of the optimal plan's result R_opt, both measured against R_exact.
• SME
[Figure: R_alg, R_exact, R_opt]

RESULTS – SORTED ACCESS
[Figure: percentage of optimal precision vs. budget (#SA, 500–5000); TREC, k = 100; NRA, KBA, Fair, Ranking]
• Smaller budget, larger improvement.

RESULTS – VARIED K
[Figure: percentage of optimal precision vs. k (20, 50, 100); IMDB, B = 400; NRA, KBA, Fair, Ranking]
• Lower k, larger improvement.

RESULTS – NUMBER OF LISTS
[Figure: percentage of optimal precision vs. number of lists (2–6); Zipf, k = 100, B = 4000; NRA, KBA, Fair, Ranking]
• More lists, larger improvement.
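The score utilities and the per-round allocation problem can be sketched in Python, assuming Python's 0-based lists, pos = number of entries already read (pos ≥ 1, so high_i = scores[pos − 1]), and that entries past the end of a list contribute 0 to the gain. The slides do not say how the optimization is solved; the exhaustive search in the hypothetical `allocate` helper is a stand-in that only works for small m and b.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def util_gain(scores: Sequence[float], pos: int, x: int) -> float:
    """(1/x) * sum of the next x scores after the pos entries already read."""
    if x <= 0:
        return 0.0
    return sum(scores[pos:pos + x]) / x


def util_red(scores: Sequence[float], pos: int, x: int) -> float:
    """high_i - score_i(pos_i + x); high_i = scores[pos - 1] (pos >= 1 assumed)."""
    if x <= 0 or pos >= len(scores):
        return 0.0
    last = min(pos + x, len(scores)) - 1  # 0-based index of score_i(pos_i + x)
    return scores[pos - 1] - scores[last]


def util(scores: Sequence[float], pos: int, x: int, alpha: float) -> float:
    """alpha * gain + (1 - alpha) * reduction for one list."""
    return alpha * util_gain(scores, pos, x) + (1.0 - alpha) * util_red(scores, pos, x)


def allocate(lists: List[Sequence[float]], cursors: List[int], b: int,
             alpha: float) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively choose b_1..b_m >= 0 with sum == b maximizing sum of util."""
    best_alloc: Optional[Tuple[int, ...]] = None
    best_val = float("-inf")
    for alloc in product(range(b + 1), repeat=len(lists)):
        if sum(alloc) != b:
            continue
        val = sum(util(s, p, x, alpha) for s, p, x in zip(lists, cursors, alloc))
        if val > best_val:
            best_alloc, best_val = alloc, val
    return best_alloc, best_val


# Hypothetical list: one entry (0.95) already read, next three 0.95, 0.93, 0.92.
L1 = [0.95, 0.95, 0.93, 0.92, 0.50]
print(round(util_gain(L1, 1, 3), 2), round(util_red(L1, 1, 3), 2))
```

With α = 1 the allocation favors lists with high upcoming scores; with α = 0 it favors lists whose bound drops fastest, matching the two observations.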
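The adaptive α rule and the sorted/random cost ratio R = (1 − α) · S can be sketched as two small functions. The estimation of p_k(c) is not shown on the slides, so the sketch takes those probabilities as input; `split_budget` additionally assumes the whole budget is spent (S + R = B), which gives S = B / (2 − α). Both function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def adaptive_alpha(num_seen: int, k: int, p_k: Sequence[float]) -> float:
    """alpha = 1 until k objects have been seen; afterwards the average of
    p_k(c) over the candidate set (p_k values are given, not estimated here).
    statistics.mean raises StatisticsError if the candidate set is empty."""
    if num_seen < k:
        return 1.0
    return mean(p_k)


def split_budget(budget: float, alpha: float) -> Tuple[float, float]:
    """Assumed: S + R = B with R = (1 - alpha) * S, so S = B / (2 - alpha).
    alpha is in [0, 1], so the divisor stays in [1, 2]."""
    s = budget / (2.0 - alpha)
    return s, budget - s


print(split_budget(300.0, 0.5))
```

With α = 1 the whole budget goes to sorted accesses; as α falls, a larger share is reserved for random accesses.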
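The percentage-of-optimal-precision metric is a ratio of two precisions, each measured against the exact top-k. A minimal sketch with hypothetical helper names, using precision = |R ∩ R_exact| / |R| as in the plan objective; it assumes R_opt has non-zero precision.

```python
from typing import Set


def precision(result: Set[str], exact: Set[str]) -> float:
    """|R & R_exact| / |R|, or 0.0 for an empty result."""
    return len(result & exact) / len(result) if result else 0.0


def pct_of_optimal(r_alg: Set[str], r_opt: Set[str], r_exact: Set[str]) -> float:
    """precision_alg / precision_opt (raises ZeroDivisionError if R_opt misses R_exact entirely)."""
    return precision(r_alg, r_exact) / precision(r_opt, r_exact)


exact = {"a", "b", "c", "d"}
print(pct_of_optimal({"a", "b", "y", "z"}, {"a", "b", "c", "x"}, exact))
```

Here the algorithm finds 2 of 4 exact answers and the optimal plan 3 of 4, so the algorithm reaches two thirds of the optimal precision.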
RESULTS – RANDOM ACCESSES
[Figure: percentage of optimal precision vs. budget (500–5000); TREC, k = 100, Cr = 10 and Cr = 100; SA (Ranking), CA, LAST, Adaptive_Expected]

QUESTIONS