Adaptive Query Processing with Eddies
Amol Deshpande
University of Maryland

Roadmap
• Adaptive Query Processing: Motivation
• Eddies [AH’00]
• STAIRs [DH’04] and SteMs [RDH’03]
• Experimental Study
• Implementation in PostgreSQL [Des’03]
• Continuous queries [MSHR’02] (very briefly)
• Open problems

Query Processing in Database Systems
[Figure: Declarative Query -> Database System -> Results]

Query Processing: Example
select * from students, enrolled, courses
where students.name = enrolled.name and enrolled.course = courses.course

Students
Name  Level
Joe   Junior
Jen   Senior

Enrolled
Name  Course
Joe   CS1
Jen   CS2

Courses
Course  Instructor
CS2     Smith

Students ⋈ Enrolled
Name  Level   Course
Joe   Junior  CS1
Jen   Senior  CS2

Students ⋈ Enrolled ⋈ Courses
Name  Level   Course  Instructor
Jen   Senior  CS2     Smith

Example Query: Execution Plans
• A query execution plan: (Students ⋈ Enrolled) ⋈ Courses
• An alternate execution plan: Students ⋈ (Courses ⋈ Enrolled)

Cost-based Query Optimization
• Estimate cost of each plan and choose the best
• For the plan (S ⋈ E) ⋈ C:
  Cost(Plan) = f(|S|, |E|, R) + g(|S ⋈ E|, |C|, R)
  where |S|, |E|, |S ⋈ E|, |C| are input sizes and R = runtime parameters

Cost-based Query Optimization
[Figure: Declarative Query -> Query Optimizer -> Compiled Query Plan -> Query Executor -> Results]
Data sources:
• Disk(s)
• Network: wide area data sources, e.g. remote tables, web data sources
• Streaming data, e.g.
stock tickers, network logs, sensor networks

Estimation Errors
Cost = g(|S ⋈ E|, |C|, R)
• Input sizes may not be available
• Erroneous estimation of intermediate result sizes
• Unknown runtime parameters
• Effect on the cost function may be unpredictable
[Figure: query execution plan (Students ⋈ Enrolled) ⋈ Courses]

How to solve this problem?
• More sophisticated estimation techniques
  • Sophisticated summary structures, e.g. MHists [PI’97], Wavelets [VWI’98]
  • Feedback loop in the optimization process, e.g. [SLMK’01, BC’02]
• Adaptive query processing
  • Can’t always build and maintain synopses
  • Runtime environments can be very unpredictable
  • So... adapt query plans midway during execution

Eddies: Extreme Adaptivity
Adaptivity      Systems
static plans    Traditional DBMS
late binding    Dynamic QEP, Parametric, Competitive
interoperator   Query Scrambling, MidQuery Re-opt
intraoperator   XJoin, DPHJ, Convergent QP
per tuple       Eddies

Telegraph & TelegraphCQ (at UC Berkeley)
• Eddies [AH’00]
• SteMs [RDH’03]
• STAIRs [DH’04]
• Continuous queries [MSHR’02, CF’02, C+’03, K+’03]
• Implementation in PostgreSQL [Des’04]
• Fault-tolerance and load balancing [SHB’04]
Other work
• Distributed eddies, Content-based Routing [BB’05]

Eddies [AH’00]
select * from S where pred1(S) and pred2(S)
Plans considered by the optimizer:
• S -> pred1(S) -> pred2(S) -> Output
• S -> pred2(S) -> pred1(S) -> Output
Decision made a priori based on statistics
• Sort by (1-s)/c, where s = selectivity, c = cost

Eddies [AH’00]
select * from S where pred1(S) and pred2(S)
Executing the query using an Eddy
[Figure: S -> Eddy -> Output, with the eddy connected to pred1(S) and pred2(S)]
An eddy operator
• Intercepts tuples from source(s) and output tuples from operators
• Query executed by
routing tuples between the operators
• Uses feedback from the operators to route
Change routing ==> change query execution plan used

Per-tuple State
select * from S where pred1(S) and pred2(S)
Executing the query using an Eddy
Two bitmaps:
1) Ready bits: which operators a tuple can be routed to next
2) Done bits: which operators a tuple has already been through
For selection queries, ready is the bit complement of done.
Example (bit order: pred1, pred2):
• Ready(t1) = [1, 1]: can be routed to either
• Done(t1) = [0, 0]: not done either
• Ready(t2) = [1, 0]: can be routed to pred1
• Done(t2) = [0, 1]: done pred2

Eddies: Routing Policy
• Choosing which operator to route a given tuple to
• The brain of the eddy
Lottery Scheduling [Avnur 00], simplified description:
1. Maintain for each operator: tuples sent, tuples returned, cost per tuple
2. Choose (roughly) based on the above
3. Explore by randomly sending tuples in the wrong orders
[Figure: eddy with pred2(S) (sent = 100, received = 2) and pred1(S) (sent = 10, received = 20); send to one operator 99% of the time, to the other operator 1% of the time]

A Join Query
select * from students, enrolled, courses
where students.name = enrolled.name and enrolled.course = courses.course
[Tables: Students, Enrolled, Courses and their join results, as in the earlier example]

Eddies [AH’00]
• A traditional query plan: a fixed tree of join operators over S, E, and C
• Query execution using an eddy: S, E, and C feed the eddy, which routes tuples between the operators S ⋈ E and E ⋈ C and sends results to Output
• A key difference: tuples can’t be arbitrarily routed to any operator, e.g.
S tuples can’t be routed to E Join C
• Use ready bits to identify this

Query Execution using Eddies
[Figure sequence: the eddy routes tuples between the S ⋈ E operator (HashTable S.Name, HashTable E.Name) and the E ⋈ C operator (HashTable E.Course, HashTable C.Course)]
1. S tuple (Joe, Junior): insert into HashTable S.Name with key hash(joe), then probe HashTable E.Name to find matches. No matches; the eddy processes the next tuple.
2. E tuple (Joe, CS1): insert into HashTable E.Name and probe HashTable S.Name, which holds (Joe, Jr) and (Jen, Sr). The match (Joe, Jr, CS1) returns to the eddy and is inserted into HashTable E.Course.
3. E tuple (Jen, CS2): probe HashTable S.Name to get (Jen, Sr., CS2), then probe HashTable C.Course, which holds (CS2, Smith). The result (Jen, Sr., CS2, Smith) goes to Output.

Per-tuple State
Tuple                Ready [S Join E, E Join C]   Done [S Join E, E Join C]
S: (Joe, Junior)     [1, 0]                       [0, 0]
E: (Joe, CS1)        [1, 1]                       [0, 0]
SE: (Joe, Jr, CS1)   [0, 1]                       [1, 0]

Eddies: Postmortem
[Figure: two query plans, each applied to a different subset of the Students, Enrolled, and Courses tuples]
• Eddy executes different query execution plans for different parts of data

Joins and Lottery Scheduling
• Lottery scheduling doesn’t work well with joins

Example: Delayed Data Sources
SETUP: |S ⋈ E| >> |E ⋈ C|
• Execution plan 1: (S ⋈ E) ⋈ C
• Execution plan 2: S ⋈ (E ⋈ C)
• Cost (Plan 1) > Cost (Plan 2)

Example: Delayed Data Sources
SETUP: |S ⋈ E| >> |E ⋈ C|
• E and C arrive
early; S is delayed
[Figure: arrival timeline for S, E, and C]

Example: Delayed Data Sources (continued)
SETUP: |S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
• S0: sent and received suggest S Join E is the better option
• S - S0: eddy learns the correct sizes and decides to route E to E ⋈ C
• Too late!!
• State got embedded as a result of earlier routing decisions (S0 ⋈ E is already stored in HashTable E.Course)
• Execution plan used: (S ⋈ E) ⋈ C. Query is executed using the worse plan.

Joins and Lottery Scheduling
• Lottery scheduling doesn’t work well with joins
• Not clear how any routing policy can work without reasonable knowledge of the future
  • Whatever the current state in the join operators, an adversary can send tuples to make it look very bad
• Two possible solutions:
  • Allow manipulation of state (STAIRs) [DH’04]
  • Don’t embed state in the operators (SteMs) [RDH’03]

STAIRs [DH’04]
• Expose join state to the eddy
• Provide state management primitives
  • That guarantee correctness of execution
  • That can be used to manipulate embedded state in the operators
• Also allow support for cyclic queries etc.

New Operator: STAIR
Storage, Transformation and Access for Intermediate Results
[Figure: each hash table of the join operators becomes a STAIR: S.Name STAIR, E.Name STAIR, E.Course STAIR, C.Course STAIR, all connected to the eddy]

Query execution using STAIRs
• Similar to using join operators
• Example for tuple s1: build into S.Name STAIR, then probe into E.Name STAIR
[Figure: eddy with the four STAIRs]
STAIR: Operations
• Build (insert): insert the given tuple into the STAIR
• Probe (lookup): find matching tuples for the given tuple
• State management operations: demotion, promotion

State Management Primitive: Demotion
• Replace a tuple in a STAIR with a projection of that tuple
• Example: demoting e2c1 to e2
• Can be thought of as undoing work
[Figure: STAIR contents before and after the demotion]

State Management Primitive: Promotion
• Replace a tuple in a STAIR with the result of joining it with other tuples
• Two arguments:
  • A tuple
  • A join to be used to promote this tuple
• Example: promoting e1 using E ⋈ C
• Can be thought of as precomputation of work
[Figure: STAIR contents before and after the promotion]

STAIRs: Correctness
Theorem: For any sequence of applications of the state management operations, STAIRs will produce the correct query output.
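The four STAIR operations named above (build, probe, demotion, promotion) can be illustrated with a toy hash-table class. This is only a minimal sketch under stated assumptions: tuples are plain Python dicts, the class name `Stair` and its method signatures are hypothetical, and none of the bookkeeping that the real operator and eddy need for the correctness theorem is modeled.

```python
from collections import defaultdict


class Stair:
    """Toy STAIR (hypothetical): a hash table on one attribute, storing dict tuples."""

    def __init__(self, attr):
        self.attr = attr
        self.table = defaultdict(list)

    def build(self, tup):
        """Build (insert): store the tuple under its value for self.attr."""
        self.table[tup[self.attr]].append(tup)

    def probe(self, tup):
        """Probe (lookup): return tup joined with every stored tuple matching on self.attr."""
        return [{**stored, **tup} for stored in self.table.get(tup[self.attr], [])]

    def demote(self, tup, keep):
        """Demotion: replace a stored tuple with its projection onto `keep`.

        Assumes `keep` contains self.attr, so the bucket stays the same.
        """
        bucket = self.table[tup[self.attr]]
        bucket.remove(tup)
        bucket.append({k: v for k, v in tup.items() if k in keep})

    def promote(self, tup, other):
        """Promotion: replace a stored tuple with its join against another STAIR."""
        bucket = self.table[tup[self.attr]]
        bucket.remove(tup)
        bucket.extend(other.probe(tup))


# Promote e2 to e2c1 using the join with C, then demote it back to e2.
e_name = Stair("name")        # plays the role of the E.Name STAIR
c_course = Stair("course")    # plays the role of the C.Course STAIR
c1 = {"course": "CS2", "instructor": "Smith"}
e2 = {"name": "Jen", "course": "CS2"}
c_course.build(c1)
e_name.build(e2)

e_name.promote(e2, c_course)                       # precomputes work
promoted = e_name.table["Jen"][0]
e_name.demote(promoted, keep={"name", "course"})   # undoes that work
```

In this sketch promotion followed by the matching demotion returns the STAIR to its original contents, which mirrors the slides' description of the two primitives as precomputing and undoing work.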
Correct output means: STAIRs will produce every result tuple, and there will be no spurious duplicates.

Lifting Burden of History: Delayed Data Sources
SETUP: |S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
With join operators:
• S0 ⋈ E is stored in HashTable E.Course
• Eddy learns the correct selectivities
• Eddy decides to route E to E ⋈ C
With STAIRs:
• Eddy learns the correct selectivities
• Eddy decides to route E to E ⋈ C
• Eddy decides to migrate E, by promoting E using E ⋈ C
• After the migration, E.Name STAIR holds E ⋈ C, and the remaining tuples S - S0 are processed as (S - S0) ⋈ (E ⋈ C)
[Figure: plan used is the UNION of (S0 ⋈ E) ⋈ C and (S - S0) ⋈ (E ⋈ C)]
• Most of the data is processed using the correct plan

Further Motivating Adaptive State Management
• Eager pre-computation for faster response times
  • Query scrambling [UFA’98]
  • Partial results [RH’02]
• Selective caching of intermediate results
  • Continuous queries over streams
• Cyclic queries
  • Adapting the join spanning tree used

Making State Migration Decisions
• Another policy question
• Optimal migration decisions
  • Require knowledge of future selectivities and the sizes of relations

Alternative: SteMs [RDH’03]
• Don’t embed the state in the operators at all
• Note: not the original motivation for SteMs
  • Focus was on increasing opportunities for adaptivity by breaking up the join operators
• We will focus on
a very simplistic version of the operator

Query Execution using SteMs
• S SteM
  • Store S tuples
  • Allow probes using E tuples, i.e. if an E tuple is routed to it, find matching S tuples
  • Could use any indexing technique to find matches
• E SteM
  • Store E tuples
  • Allow probes using S and C tuples
  • Need to build two internal indexes
• C SteM
[Figure: S, E, and C feed the eddy, which routes tuples to the S SteM, E SteM, and C SteM]

Query Execution using SteMs
[Figure walk-through: E tuple (Jen, CS2) is inserted into the E SteM, probes the S SteM to form (Jen, Sr., CS2), then probes the C SteM to form (Jen, Sr., CS2, Smith)]

Query Execution using SteMs
• State inside the operators is independent of previous routing decisions
  • Because no intermediate tuples are ever stored
• Doesn’t have the same problem as the join or STAIR operators
• Optimal routing policy easy to write down
  • Similarities to queries with only selections
• But not storing intermediate results increases the computation cost significantly

SteMs: Drawbacks
• Recomputation of intermediate result tuples
• Constrained plan choices
  • Available plans depend highly on the arrival order
SETUP: |S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
• S - S0 can only be routed to the E SteM for probing, and is forced to be executed as (S Join E) Join C
• Under the mechanism, there is no way to execute the other plan for this setup
• Though more subtle, the second drawback might be the more important one

Recap
• An eddy operator
  • Can affect the query execution plan(s) used by routing different tuples differently
• Eddy w/ selections: well understood
  • Even if selections are correlated
  • Babu, Munagala et al [SIGMOD 2004, ICDT 2005]

Recap: Eddies for multi-way joins
• Opportunities for adaptivity depend on the join operators used
• Higher adaptivity tends to push logic into the eddy ==> routing policies very important
Join operators compared, starting with sort-merge and hybrid-hash:
• Sort-merge, hybrid-hash: blocking operators; little adaptivity
• Index nested-loop joins, nested-loop joins: similarities to selections; see [AH’00]
• Pipelined/symmetric hash join: suffers from state accumulation problems
• SteMs/STAIRs: policy issues not well understood

Implementation Details
• In PostgreSQL Database System code base
  • In the context of TelegraphCQ project
• Highly efficient implementation [SIGREC’04]
  • Eddy, SteMs, STAIRs export get_next() functions
  • Routing decisions are made per batch
    • Can control batch size
  • Routing decisions made for all possible ready bitmaps
    • Decisions are encoded in arrays that are indexed with ready bits
    • Efficiently find the operator to route to

Results - Overheads (1)
• All plans have identical costs, so adaptivity plays no role
[Figure: overhead results, not recoverable]

Results - Overheads (2)
[Figure: overhead results, not recoverable]

Policies used for experiments
• Routing policy:
  • Observe: selectivities of predicates on base tables; domain sizes of join attributes
  • Compute join selectivities and use them to route tuples
• Migration policy:
  • Tie state migration decisions to routing decisions
  • Follow the routing policy decisions to make sure that most tuples are routed correctly
• Caveats:
  • May end up doing migrations late in the query execution
  • May thrash

State Migration: Illustrative Example
select * from customer c, orders o, lineitem l
where c.custkey = o.custkey and o.orderkey = l.orderkey
and c.nationkey = 1 and c.acctbal > 9000 and l.shipdate > date '1996-01-01'
Setup: lineitem arrives sorted on shipdate
==> selectivity(l.shipdate > …) very low initially
==> orders routed to join with lineitem (bad)
No explicit delays introduced

Illustrative Example (1)
[Figure: results, not recoverable]

Illustrative Example (2)
[Figure: results, not recoverable]

Experiments: Synthetic Workload
• Modeled after the Wisconsin Benchmark
  • 20 tables of varying sizes
• Randomly generated queries
• Environment: rates proportional to table sizes; no
delays, or random initial delays introduced, or random data rates

Traditional vs STAIRs
SteMs vs STAIRs
Joins vs STAIRs
[Figures: experimental results for the three comparisons, not recoverable]

Continuous Query Processing
• Eddies ideal for executing continuous queries over data streams
  • Dynamic runtime conditions make a static plan unsuitable
• Queries typically executed over sliding windows
  • E.g. find average over the last one week
• Note: continuous vs multi-query processing
  • Not identical
  • Data streams literature does not make this difference explicit
  • Application environments tend to have a large number of simultaneous queries

Continuous Query Processing
CACQ [Madden et al 2002]
• Focus on sharing work as much as adaptivity
• Uses SteMs augmented with a deletion operator
  • To handle sliding windows
• Also uses predicate indexes
  • For handling a large number of queries on the same set of streams but with different predicates
  • E.g. millions of stock alerts over a few streams

Some open problems (1)
• Eddies for continuous query processing
  • Much work since CACQ, but not a solved problem
    • E.g. computational inefficiency of SteMs
  • Many other proposed CQ architectures face the same problem
    • MJoins (NiagaraCQ)
    • Stanford STREAM processor (earlier version); later added intermediate result caches
    • Note: these two don’t use eddies explicitly
  • Routing policies for CQ still an open question
    • Different from routing policies for non-CQ queries

Some open problems (2)
• Routing policies
  • Whether eddies will succeed depends on the routing policies
  • Little work so far...
• SteMs, STAIRs
  • Theoretical analysis of optimization space, and practical viability analysis needed
  • Especially in the context of continuous query processing

Some open problems (3)
• Eddies for multi-query processing (non-CQ)
  • SteMs may be sufficient for CQ processing, but not for normal multi-query processing
• Parallel, distributed environments, P2P, Grid...
• Disk
  • Flexibility demanded by adaptive techniques is at odds with the careful scheduling typically done by DBMSs
  • XJoins
  • Very little work on understanding this

Some open problems (4)
• Optimization with expanded plan space
  • Eddies can explore a plan space much larger than the traditional plan space
  • They allow relations to be broken into pieces, with each piece executed separately
  • Can we explore this plan space in a non-adaptive setting?
• Recent work on:
  • Conditional Planning [Deshpande et al, ICDE 2005]
  • Content-based Routing [Babu et al, VLDB 2005]

Summary
• Increasing need for adaptivity
• Eddy: a highly adaptive query processor
  • Executes queries by routing tuples through operators
• SteMs, STAIRs
  • New operators proposed to handle problems with traditional join operators
• Very promising, especially for continuous and wide-area query processing
• Exciting research lies ahead...

The End
Questions?
Fatal Flaw: Burden of Routing History
• Routing decisions get embedded in the state
• Future adaptability is severely constrained
[Figure: eddy with HashTable S.Name, HashTable E.Name, HashTable E.Course, and HashTable C.Course; HashTable E.Course holds intermediate S ⋈ E tuples]

Example: Delayed Data Sources
SETUP: |S ⋈ E| >> |E ⋈ C|
• Execution plan 1: (S ⋈ E) ⋈ C
• Execution plan 2: S ⋈ (E ⋈ C)
• Cost (Plan 1) > Cost (Plan 2)

Example: Delayed Data Sources
SETUP: |S ⋈ E| >> |E ⋈ C|; E and C arrive early; S is delayed
[Figure: arrival timeline for S, E, and C]
• A plan may have to be chosen without any statistical information about the data
• Earliest time sufficient information may be available to choose the optimal plan

Tricky State Configurations: 1
• Want to undo the decision to route E1 to S ⋈ E
• Result S0EC already produced
[Figure: HashTable S.Name holds S0; HashTable E.Name holds E1 and E2C; HashTable E.Course holds S0E1 and E2; HashTable C.Course holds C]

Tricky State Configurations: 2
[Figure: four-relation query over S, E, C, and I, with HashTable S.Name, HashTable E.Name, HashTable E.Course, HashTable C.Course, HashTable C.Instructor, and HashTable I.Instructor holding a mix of base and intermediate tuples such as SE1, E2C1, E2C2I, SE1C1, and SE2C1]
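The claim that routing decisions get embedded in the state can be made concrete with a small sketch. It assumes plain symmetric hash join operators and dict tuples; the names `SymHashJoin` and `run` are hypothetical and not taken from the Telegraph code base. The same E tuple is routed through the two join operators in both possible orders, and the hash tables are inspected afterwards.

```python
from collections import defaultdict


class SymHashJoin:
    """Toy binary symmetric hash join (hypothetical): one hash table per input side."""

    def __init__(self, left_key, right_key):
        self.keys = {"left": left_key, "right": right_key}
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, tup):
        """Insert tup into its own side's table, then probe the opposite side."""
        key = tup[self.keys[side]]
        self.tables[side][key].append(tup)
        other = "right" if side == "left" else "left"
        return [{**match, **tup} for match in self.tables[other].get(key, [])]


def run(first_op_for_e):
    """Route one E tuple through S Join E and E Join C in the given order."""
    se = SymHashJoin("name", "name")        # S Join E: S on the left, E on the right
    ec = SymHashJoin("course", "course")    # E Join C: E on the left, C on the right
    se.process("left", {"name": "Jen", "level": "Senior"})
    ec.process("right", {"course": "CS2", "instructor": "Smith"})
    e = {"name": "Jen", "course": "CS2"}
    if first_op_for_e == "SE":
        out = [r for t in se.process("right", e) for r in ec.process("left", t)]
    else:
        out = [r for t in ec.process("left", e) for r in se.process("right", t)]
    return out, se, ec


out_a, se_a, ec_a = run("SE")   # E routed to S Join E first
out_b, se_b, ec_b = run("EC")   # E routed to E Join C first

# Same query result either way, but different operator state:
stored_after_se_first = ec_a.tables["left"]["CS2"][0]   # an S Join E intermediate tuple
stored_after_ec_first = ec_b.tables["left"]["CS2"][0]   # the bare E tuple
```

Both routings produce the same output tuple, yet the E.Course-side table ends up holding an intermediate S ⋈ E tuple in one case and a base E tuple in the other. That leftover intermediate state is what constrains later routing choices in this toy setting.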