CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads Debabrata Dash, Anastasia Ailamaki, Neoklis Polyzotis 1 High Cost of DB Tuning • Enterprises spend a lot on DBMS DBMS Cost DBA Task Breakdown 1000 $/TB/yr 20 15 Installation Backup Other Tuning & Admin 10 5 0 Disk HW/SW DBA Sources: MS Azure, Forrester Research; 2010 oracle.com/us/products/database/039433.pdf Need to reduce administration and tuning cost 2 A New Approach to Index Tuning Index Tuning: Select indexes that maximize performance Existing Approaches CoPhy Portability No Yes Scalability Sampling/pruning Yes Generality No Yes Quality Feedback Not with constraints Yes Interactivity No Yes CoPhy: Convert to a compact Binary Integer Program (BIP). Solve using mature solvers. − BIP: Minimize f ( x) s.t. g ( x) 0 where f and g are linear, x {0,1}n Outline • Introduction • BIP formulation – – – – Existing formulation Discovering structure Exploiting the structure Benefits • Experimental Results • Conclusion 4 Index Tuning Problem T1 Join T2 Workload Candidates T1 T2 I1 I2 Index Tuning I3 I4 Constraints Optimal Indexes ? ? Existing Approaches Index Advisor Fast What-If Optimizer [INUM07,C-PQO08] • Greedy approaches • Bottom-up [CN97, VZ+00] • Top-down [BC05] • BIP-based approach [CS96, PA07] 6 What-If DBMS Optimizer Existing BIP T1 t1 T2 t2 I1 I2 I3 I4 x 1x 2 x3 x4 {0,1} A1 , A2 , ... are atomic configurations C1 , C2 , ... are corresponding costs {0,1} minimize 1C1 2C2 ... Min. Cost Sucht that : 1 2 1 Select one atomic conf. Index presence 11t11, 1 t1 t2 1, t2 1 ... Program size = O(# T1 Indexes x # T2 Indexes) 7 CoPhy vs. Existing Approaches Index Advisor Fast What-If Optimizer What-If DBMS Optimizer [INUM07,C-PQO08] CoPhy Index Advisor Fast What-If Optimizer 8 What-If DBMS Optimizer Fast What-If: INUM Plan What-If Optimizer T1 Join T2 Template Plan Place Holder Place Holder I1 I3 I1 I4 Instantiated Plan I1 Instantiated Plan I1 I3 I4 A template plan can be reused for many index combinations 9 Cost Structure Cost of template plan under A i[1, n ], a A[ i ] ia Atomic Configuration A Cost of optimal plan under A min{k i[1,n ],a A[i ] kia } Linear Composability of Query Costs Linear Composability is exhibited by both INUM, C-PQO 10 Exploiting Linear Composability minimize t1 11 x1 12 x2 13 t2 21 x3 22 x4 23 t1 T1 T2 t2 I1 I2 I3 I4 x 1x 2 x3 x4 Such that : t1 x1 x2 1 t2 x3 x4 1 Program size = O(# T1 Indexes + # T2 Indexes) BIP Solver explores the index combinations with the knowledge of the objective Exposing the cost model leads to linearly growing BIPs 11 More Complex BIPs We extend the BIP to handle: • Complex queries • Update costs • Complex Constraints [Bruno08]: – – – – – Storage constraint Index constraints Column constraints Generators Soft constraints BIP formulation does not restrict the expressive power of the DBA 12 CoPhy’s Architecture Workload Candidate Generator BIP Generator BIP Solver INUM What-If DBMS Optimizer Constraints CoPhy Selected Bounds Indexes Theorem: CoPhy computes an optimal index configuration 13 Unique Features Enabled by the BIP • Portability: No change to the optimizer – Requires only the what-if APIs • Scalability: By solving large BIPs in seconds – No need to select workload, candidate indexes • Generality: The formulation can be reused • Quality feedback: All modern BIP solvers provide this – Can stop at near-optimal values • Interactive tuning: By solving BIPs incrementally – Interactively add/drop candidate indexes – Enables efficient multi-objective optimization 14 Outline • Introduction • BIP formulation – – – – Existing formulation Discovering structure Exploiting the structure Benefits • Experimental Results • Conclusion 15 Experimental Setup • • • • Two commercial DBMS -- SystemA, SystemB 1 GB TPC-H database 1 GB index size constraint Algorithms: – ToolA, ToolB – the commercial designers – ILP – The state of the art BIP [PA07] – CoPhyA, CoPhyB – Our approach on the systems • Queries generated using 15 TPC-H templates • Metric: Cost with Suggested Indexes Savings 1 Original Cost 16 Better Speedup Comparison ToolA ToolB CoPhyA 94 100 97 94 CophyB 97 94 97 % Savings 80 61 60 40 35 61 32 61 29 20 0 250 500 1000 Workload Size on SystemA 250 500 1000 Workload size on SystemB # of queries SystemA # of queries SystemB # Candidates: CoPhy ~2000, ToolA ~200, ToolB ~50 • Replacing heuristic algorithms improves savings • Using larger set of candidates also helps 17 Tool Execution Time Comparison ToolA CoPhyA Execution time (minutes) 1000 ToolB 419 ILP CoPhy CophyB 66.1 100 39.9 Better 23.1 11.8 10 6.2 6.1 8.3 4.8 3.2 2 2.2 1.2 1 8.3 4.8 2 1 250 500 1000 Workload on SystemA # of queries SystemA 250 500 1000 250 500 1000 Workload on SystemB Workload on SystemA # of queries SystemB # of queries SystemA Scalable index tuning eliminates the workload selection problem 18 Conclusion • Index tuning using a novel compact BIP – – – – Generic, scalable, efficient, and high quality Quality feedback Incremental index selection Multi-objective optimization • Future Work: – Incorporating other workload types – Applying the approach to other tuning problems 19 Backup Sildes 20 BIP for Multiple Plans C1 x11 111 x12 112 x13 113 x14 114 25 y1 x11 x12 y1 x13 x14 y1 y1 25 Matching logic C1 x23 211 x24 212 x23 213 x14 214 25 y2 x23 x24 y2 T1 T2 I1 I2 I3 I4 x25 x26 y2 y1 y2 ... 1 One plan per query Minimize C1 C2 y2 15 Minimize cost x23x24 21 x25x26 More Complex BIPs • Storage constraint x11 x1 Build indexes when used ..... s x S i i Size under a fixed constant 22 CoPhy vs. FLP 2399 Better Tool execution time (seconds) 2500 2000 Solving Build Time 1379 1500 1000 INUM 710 500 499 293 123 0 FLP CoPhy 250 FLP CoPhy 500 # of queries FLP CoPhy 1000 Offloading the search process to the solver improves both the problem construction and solving times 23