BEST-EFFORT TOP-K QUERY PROCESSING UNDER BUDGETARY CONSTRAINTS Steven Williams

Spring 2016

OUTLINE
1. Top-k query processing
2. Budgetary constraints
3. Motivating example
4. Proposed algorithm
5. Results
6. Questions

TOP-K QUERY PROCESSING
• Pre-computed sorted lists over multiple attributes (m lists over n objects).
• Combine scores by some monotonic aggregation function.
• Two access modes:
  – sorted access (cost Cs)
  – random access (cost Cr)
• Objective: compute the k objects with the highest scores.

NRA ALGORITHM
Example with f = SUM and top-2. Each row is one round of sorted access; high_i is the last score read from list R_i.

  Round   R1        R2
  1       a 0.90    d 0.87
  2       b 0.60    a 0.85
  3       c 0.50    f 0.25
  …       …         …
          d 0.40    c 0.20

Score intervals [worst score, best score] as the rounds proceed:
• a: [0.90, 1.77] → [1.75, 1.75]
• d: [0.87, 1.77] → [0.87, 1.47] → [0.87, 1.37]
• b: [0.60, 1.45] → [0.60, 0.85]
• c: [0.50, 0.75]
• f: [0.25, 0.75]
Top-2 after round 3: a and d, with mink = 0.87. The candidates are b, c, and f. NRA stops once mink > best score of the candidates (here 0.87 > 0.85).

BUDGET CONSTRAINTS
• Example setting: top-2, f = SUM, Cs = 1, Cr = 3, budget B = 10.
• NRA: total cost C = 12 (sorted accesses only); precision = 0.50.
• TA: 7 Cs + 7 Cr = 28; precision = 0.
• Given budget B, maximize result quality.

MOTIVATING EXAMPLE
[Figure: motivating example; only the labels "USELESS" and "Q" survive.]

PROPOSED ALGORITHMS
• Sorted Accesses
  – Efficient plan
  – Solution with adaptive α
• Sorted and Random Accesses
  – Efficient plan
  – Solution with adaptive α

RESULTS UNDER LIMITED BUDGET
[Figure: k results for unlimited budget versus results for limited budget.]

EFFICIENT PLAN – SORTED ACCESS
Goal: find a plan t such that

\[ t = \arg\max_{t \in T \,\wedge\, |t| \le B} \frac{|R_t \cap R_{exact}|}{|R_t|} \]

Example for B = 10: the plan {R1, 4}, {R2, 6}; its result is denoted R_opt.
[Figure: plans for B = 10.]

OBSERVATIONS
Setting: B = 180.
1. Prefer high scores
2. Prefer large score reductions
[Figure: uniform allocation versus non-uniform allocation.]

SCORE UTILITIES
Score gain:

\[ util_{as}(L_i, x) = \frac{1}{x} \sum_{j=pos_i}^{pos_i + x} score_i(j) \]

Score reduction:

\[ util_{sr}(L_i, x) = high_i - score_i(pos_i + x) \]

Example with x = 3:
• util_as(R1, 3) = (1/3) * (0.95 + 0.93 + 0.92) = 0.93
• util_sr(R1, 3) = 0.95 − 0.92 = 0.03

OPTIMIZATION PROBLEM

\[ util(L_i, x) = \alpha \cdot gain + (1 - \alpha) \cdot reduction \]

\[ \text{maximize} \sum_{i=1}^{m} util(L_i, x) \quad \text{subject to} \quad \sum_{i=1}^{m} b_i = b \]

[Figure: timeline moving from gain (α) to reduction (1 − α).]

ADAPTIVE α
• α is 1 until we have seen k objects.
• Afterwards, α is the average probability that an object in the candidate set gets into the top-k:

\[ \alpha = \hat{p}_k = \frac{1}{|cand.set|} \sum_{c \in cand.set} p_k(c) \]

[Figure: \hat{p}_k (0 to 1) versus spent budget (0 to 3500); TREC query, k = 100.]

RANDOM ACCESSES
When to switch from SA to RA?
• Gathering with sorted accesses (α) comes first, then probing with random accesses (1 − α).
• Switching too early: not enough good candidates, so RA is wasted.
• Switching too late: not enough RAs left to prune the candidates.

RANDOM ACCESSES
• Switch from sorted to random:
  – R = (1 − α) * S
  – S: total cost of sorted accesses
  – R: total cost of random accesses
  – Condition shown on the slide: S + R > B
• Which items to access?
  – Maximize expected score

RESULTS

EVALUATION METHODS
• Percentage of optimal precision: precision_alg / precision_opt
• SME
[Figure labels: R_alg, R_exact, R_opt.]

RESULTS – SORTED ACCESS
[Figure: percentage of optimal precision (50% to 90%) versus budget in #SA (500 to 5000); TREC, k = 100; series: NRA, KBA, Fair, Ranking.]
• Less budget, more improvement.

RESULTS – VARIED K
[Figure: percentage of optimal precision (20% to 90%) versus k (20, 50, 100); IMDB, B = 400; series: NRA, KBA, Fair, Ranking.]
• Lower k, more improvement.

RESULTS – NUMBER OF LISTS
[Figure: percentage of optimal precision (40% to 100%) versus number of lists (2 to 6); Zipf, k = 100, B = 4000; series: NRA, KBA, Fair, Ranking.]
• More lists, more improvement.
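
The "percentage of optimal precision" measure used in these result plots can be sketched in Python as follows. This is only a minimal illustration: it assumes precision is computed as |R ∩ R_exact| / |R|, as in the plan objective on the Efficient Plan slide, and the function names and sample object ids are hypothetical, not taken from the slides.

```python
def precision(result, exact):
    """Fraction of the returned objects that belong to the exact top-k."""
    result, exact = set(result), set(exact)
    if not result:
        return 0.0
    return len(result & exact) / len(result)


def pct_of_optimal_precision(r_alg, r_opt, r_exact):
    """precision_alg / precision_opt, the ratio named on the Evaluation Methods slide."""
    p_opt = precision(r_opt, r_exact)
    if p_opt == 0.0:
        # Assumption: the ratio is treated as undefined when the best plan finds nothing.
        raise ValueError("optimal plan has zero precision; ratio undefined")
    return precision(r_alg, r_exact) / p_opt


# Hypothetical data: exact top-4, best plan within budget, and an algorithm's answer.
r_exact = ["a", "d", "b", "c"]
r_opt = ["a", "d", "b", "f"]   # 3 of 4 correct, precision 0.75
r_alg = ["a", "d", "f", "g"]   # 2 of 4 correct, precision 0.50
ratio = pct_of_optimal_precision(r_alg, r_opt, r_exact)
print(f"{ratio:.0%}")
```

Normalizing by the precision of the best plan within budget, rather than by 1.0, keeps the measure comparable across budgets where even the optimal plan cannot reach the exact answer.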

RESULTS – RANDOM ACCESSES
[Figure: percentage of optimal precision versus budget (500 to 5000); panels: TREC, k = 100, Cr = 10 and TREC, k = 100, Cr = 100; series: SA (Ranking), CA, LAST, Adaptive_Expected.]

QUESTIONS
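
As a worked illustration of the Score Utilities, Optimization Problem, and Adaptive α slides, the following Python sketch recomputes the slide's example for list R1 with x = 3. It is a sketch under assumptions, not the author's implementation: the function names are hypothetical, the "next x scores" are passed in directly to sidestep the slide's indexing convention, and the per-candidate top-k probabilities are taken as given inputs because the slides do not say how they are estimated.

```python
def util_gain(next_scores):
    """Score gain (util_as): average of the next x scores in a list."""
    return sum(next_scores) / len(next_scores)


def util_reduction(high, next_scores):
    """Score reduction (util_sr): drop from high_i to the score reached after x accesses."""
    return high - next_scores[-1]


def util(high, next_scores, alpha):
    """Combined utility: alpha * gain + (1 - alpha) * reduction."""
    gain = util_gain(next_scores)
    reduction = util_reduction(high, next_scores)
    return alpha * gain + (1 - alpha) * reduction


def adaptive_alpha(seen_objects, k, candidate_probs):
    """alpha = 1 until k objects are seen, then the mean top-k probability of the candidates.

    Assumption: an empty candidate set also yields alpha = 1.
    """
    if seen_objects < k or not candidate_probs:
        return 1.0
    return sum(candidate_probs) / len(candidate_probs)


# Numbers from the Score Utilities slide: x = 3 on list R1, high = 0.95.
r1_next = [0.95, 0.93, 0.92]
print(round(util_gain(r1_next), 2))             # 0.93
print(round(util_reduction(0.95, r1_next), 2))  # 0.03

# Hypothetical candidate probabilities, used only to show how alpha shifts the weighting.
alpha = adaptive_alpha(seen_objects=5, k=2, candidate_probs=[0.2, 0.4, 0.6])
print(round(util(0.95, r1_next, alpha), 3))
```

With α near 1 the utility is dominated by score gain (gathering high-scoring objects); as α falls, score reduction (tightening the best-score bounds of candidates) takes over.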
