A Probabilistic Optimization Framework for the Empty-Answer Problem
Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis

Query = (Alarm, DSL, Manual) on the car DB returns no answer: {}

Ranking results based on user preferences
• IR [Baeza11] and database solutions [Chaudhuri04]
Query relaxation
• Modify some of the query conditions [Mishra09]
(-) Suggests all the modifications together
(-) Does not take user feedback into account

• Suggests one relaxation at a time
• Takes user feedback into account
• Models user preferences
• Optimization-centric relaxation suggestions
  • User-centric (effort, relevance)
  • System-centric (profit)

• Exponential number of relaxations
• Modeling user preferences
• System encoding of different objective functions

A probabilistic optimization framework
• Based on the probability that the user says yes to relaxation Q’ of query Q

Probability of accepting relaxation Q’ of Q
• Prior: belief of the user that an answer will be found in the database
• Pref: likelihood that the user will like the answers of the relaxed query

• Probability to reject a relaxation
• Cost for a relaxation

Maximize profit
• Pref favors solutions with the highest values of individual tuples
• A static function
Maximize answer relevance
• Pref favors solutions with the tuples most relevant to the original query
• A semi-dynamic function (computed only once, with the user query)
Minimize user effort
• Pref favors solutions with the least number of user interactions
• A fully dynamic function (changes at every relaxation)

Minimum Effort Objective

[Figure: relaxation tree for the query (Alarm, DSL, Manual), alternating relaxation nodes and choice nodes, with branch probabilities (0.3, 0.7) and a cost at each node]

Exact algorithm (FastOpt):
• Upper and lower bound for each node
• Pruning can be enabled for this algorithm
Approximate algorithm (CDR):
• Node costs approximated by probability distributions
• Relaxation nodes: min/max distribution of Cost
• Choice nodes: sum distribution of Cost
• Approximated by computing the convolution of the cost distributions
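The min/max at relaxation nodes and the sum at choice nodes can be illustrated with a small recursion for the minimum-effort objective. This is only a sketch under stated assumptions: the class names `Relaxation` and `Choice`, and the effort model (one unit per interaction plus the probability-weighted cost of the yes/no children, with leaves costing 0) are hypothetical, because the cost formula on the slides did not survive extraction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Choice:
    """A proposed relaxation; the user answers yes with probability p_yes."""
    p_yes: float
    yes: "Relaxation"
    no: "Relaxation"


@dataclass
class Relaxation:
    """A query state; the system picks one of the candidate choices."""
    choices: List[Choice] = field(default_factory=list)  # empty list = leaf


def choice_cost(c: Choice) -> float:
    # Assumed effort model: one interaction plus the expected remaining effort.
    return (1.0
            + c.p_yes * relaxation_cost(c.yes)
            + (1.0 - c.p_yes) * relaxation_cost(c.no))


def relaxation_cost(r: Relaxation) -> float:
    # Leaves need no further interaction; otherwise take the cheapest proposal.
    if not r.choices:
        return 0.0
    return min(choice_cost(c) for c in r.choices)


# Illustrative tree (probabilities are made up, not taken from the figure).
leaf = Relaxation()
one_more_step = Relaxation([Choice(1.0, leaf, leaf)])   # always costs 1
root = Relaxation([
    Choice(0.3, leaf, one_more_step),  # 1 + 0.7 * 1 = 1.7
    Choice(0.7, leaf, one_more_step),  # 1 + 0.3 * 1 = 1.3
])
best = relaxation_cost(root)
```

Under this model the system would propose the second relaxation first, since it has the lower expected effort. For the profit and relevance objectives the `min` would become a `max`.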
Idea: prune non-optimal relaxations in advance
• Upper and lower bounds on the cost function
• Prune branches by reasoning on the upper/lower bounds

[Figure: FastOpt example on the query (1,1,1). Each node shows a query state such as (?,1,1), (-,1,1) or (#,1,1) with its [lower, upper] cost bounds, for example [1,3]; each choice branch shows yes/no probabilities, for example yes 33% / no 67%. One branch is marked "Prune!!!".]

Datasets:
• US Home dataset: 38k tuples, 18 attributes
• Car dataset: 100k tuples, 31 attributes
• Synthetic datasets: 20k to 500k tuples
Baseline algorithms:
• Previous works: top-k, query-refinement, ranking
• Random relaxation
• Greedy: choose the first non-empty relaxation, otherwise random

1. Interactive vs non-interactive
• Measure user satisfaction with our interactive approach vs relax-at-once approaches
• 100 Amazon Mechanical Turk users, 5 queries each
2. Objective functions effectiveness
• Compare proposed relaxations with objective function goals (max profit, min effort, max user relevance)
• Three tasks
• 100 users, 5 queries

Scalability results:
• FastOpt (exact): timely exact answers for small queries
• CDR (approximate): real-time answers for queries of size 10, with results close to optimal
User study results:
• Interactive methods preferred over non-interactive
• Objective functions correctly achieve their optimization goals

[Figure: cost vs. query size (3 to 7) for FullTree, FastOpt, CDR, Greedy and Random]
• CDR close to optimal
• Random and Greedy produce 1.5 more relaxations

[Figure: query time (sec, log scale) vs. query size for FullTree, CDR and FastOpt]
• Exponential behaviour
• Efficient for small queries
• 1.4 sec for query size 10!!!
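The bound-based pruning behind FastOpt can be sketched as follows for a minimization objective. This is an illustrative sketch, not the authors' implementation: the function name `prune_by_bounds` and the intervals are hypothetical, and the rule shown (drop a candidate whose lower bound exceeds the smallest upper bound among its siblings) is the standard interval argument, assumed here to match the "upper/lower bounds reasoning" on the slides.

```python
from typing import Dict, Tuple

Bounds = Tuple[float, float]  # (lower, upper) bound on a node's cost


def prune_by_bounds(candidates: Dict[str, Bounds]) -> Dict[str, Bounds]:
    """Keep only candidates that could still be the minimum-cost choice."""
    # No candidate needs to be considered if a sibling is guaranteed
    # to cost at most `best_upper` and this one costs strictly more.
    best_upper = min(upper for _, upper in candidates.values())
    return {
        name: (lower, upper)
        for name, (lower, upper) in candidates.items()
        if lower <= best_upper
    }


# Illustrative intervals for three sibling relaxations.
candidates = {"a": (1.0, 1.9), "b": (2.0, 2.8), "c": (1.0, 3.0)}
kept = prune_by_bounds(candidates)  # "b" is pruned: 2.0 > 1.9
```

Candidate "c" survives even though its upper bound is the largest, because its lower bound still overlaps the best interval. For a maximization objective (profit, relevance) the comparison flips: prune a candidate whose upper bound is below the largest lower bound.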
[Figure: user study, percentage of users (0% to 100%) for Interactive, Multi-Relaxations, top-k and Why-Not on three measures: Favored, Answers Quality, Usability]
• Users prefer interactive systems to relaxations all at once
• Better quality answers

• Introduce a novel, principled, user-centric and interactive approach for the empty-answer problem
• Propose exact and approximate algorithms
• Demonstrate scalability of the proposed techniques with database and query size
• Show effectiveness of the different objective functions
• Verify quality of the answers and superior usability of our interactive approach

[Figure: three panels comparing Dynamic, Semi-Dynamic and Static over query size (3 to 7): number of steps, profit, answer quality]
• Objective functions achieve their goals
• Dynamic and Semi-Dynamic very similar in performance

Idea: use a cost distribution instead of the actual cost.
1. Keep a b-size histogram in each node
2. Construct the first L levels of the tree
3. Expand the branch with the highest probability of being optimal

1. Compute the probability that a node's cost is smaller than its siblings' costs
2. Choose the child with the highest probability
Example: for siblings n1 and n2 with Pr(n1 < n2) = 0.6 and Pr(n2 < n1) = 0.4, expand n1.

[Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.
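For the CDR expansion rule (expand the child most likely to have the smaller cost), the sibling comparison can be sketched with two histograms. This is a minimal sketch under assumptions that are not stated in the slides: both histograms share the same b bins ordered by increasing cost, the two costs are independent, and ties within a bin count for neither side. The name `prob_less` and the histogram values are hypothetical.

```python
from typing import List


def prob_less(h1: List[float], h2: List[float]) -> float:
    """Pr(X1 < X2) for independent costs whose histograms share the same bins."""
    total = 0.0
    tail = 0.0  # probability mass of h2 in bins strictly above the current bin
    for i in range(len(h1) - 1, -1, -1):
        total += h1[i] * tail
        tail += h2[i]
    return total


# Illustrative 3-bin histograms for two sibling nodes.
siblings = {"n1": [0.5, 0.3, 0.2], "n2": [0.2, 0.3, 0.5]}
p_n1 = prob_less(siblings["n1"], siblings["n2"])  # 0.5*0.8 + 0.3*0.5
p_n2 = prob_less(siblings["n2"], siblings["n1"])  # 0.2*0.5 + 0.3*0.2
to_expand = "n1" if p_n1 >= p_n2 else "n2"
```

Here n1 would be expanded, since its cost is more likely to be the smaller one. With more than two siblings, one option is to compare each node against the minimum of the others, which needs the min distribution rather than this pairwise shortcut.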