SQL Query Optimization: Why Is It So Hard To Get Right?
David J. DeWitt
Microsoft Jim Gray Systems Lab, Madison, Wisconsin
dewitt@microsoft.com
© 2010 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this presentation.

A Third Keynote?
• Got to show off the PDW appliance
• Got to tell you about SQL 11
• The "Impress Index": how can I possibly impress you?
  o Day 1: my boss's boss (Ted Kummert) – "Cool!"
  o Day 2: my boss (Quentin Clark) – "Awesome!"
  o Day 3: me (David DeWitt)

How About a Quiz to Start!
• Who painted this picture? Mondrian? Picasso? Ingres?
• Actually, it was the SQL Server query optimizer!
  o The picture is the plan space for TPC-H query 8 as the parameter values for S_AcctBal and L_ExtendedPrice are varied
  o Each color represents a different query plan
  o Yikes!

Who Is This Guy Again?
• Spent 32 years as a computer science professor at the University of Wisconsin
• Joined Microsoft in March 2008
  o Runs the Jim Gray Systems Lab in Madison, WI
  o The lab is closely affiliated with the DB group at the University of Wisconsin
  o 3 faculty and 8 graduate students working on projects
  o Built the analytics and semijoin components of PDW V1; currently working on a number of features for PDW V2

Today …
• I am going to talk about SQL query optimization
• You voted for this topic on the PASS web site, so don't blame me if you didn't vote and wanted to hear about MapReduce and NoSQL database systems instead
• Starting with the fundamental principles
• My hope is that you will leave understanding why all database systems sometimes produce really bad plans

Anonymous Quote
[quote slide; text not captured in the transcript]

The Role of the Query Optimizer (100,000 ft view)
SQL Statement → Query Optimizer ("magic happens") → Awesome Query Plan
What's the magic?

What's the Magic?
Consider Query 8 of the TPC-H benchmark:

    select o_year,
           sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume)
    from (
        select YEAR(O_ORDERDATE) as o_year,
               L_EXTENDEDPRICE * (1 - L_DISCOUNT) as volume,
               n2.N_NAME as nation
        from PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION n1, NATION n2, REGION
        where P_PARTKEY = L_PARTKEY
          and S_SUPPKEY = L_SUPPKEY
          and L_ORDERKEY = O_ORDERKEY
          and O_CUSTKEY = C_CUSTKEY
          and C_NATIONKEY = n1.N_NATIONKEY
          and n1.N_REGIONKEY = R_REGIONKEY
          and R_NAME = 'AMERICA'
          and S_NATIONKEY = n2.N_NATIONKEY
          and O_ORDERDATE between '1995-01-01' and '1996-12-31'
          and P_TYPE = 'ECONOMY ANODIZED STEEL'
          and S_ACCTBAL <= constant-1
          and L_EXTENDEDPRICE <= constant-2
    ) as all_nations
    group by o_year
    order by o_year

• This is a very big haystack to be searching through: there are about 22 million alternative ways of executing this query!
• Picking a plan should not take hours or days; the QO must select a plan that runs in seconds or minutes, not days or weeks!

Some Historical Background
• Cost-based query optimization was invented by Pat Selinger as part of the IBM System R project in the late 1970s (System R became DB2)
• It remains the hardest part of building a DBMS 30+ years later
  o Progress is hindered by fear of regressions
  o Far too frequently the QO picks an inefficient plan
• The situation is further complicated by advances in hardware and the rest of the DBMS software
  o Hardware is 1000X bigger and faster
  o DB software is 10X faster
  o Queries over huge amounts of data are possible IF the QO picks the right plan
More Precisely: The Role of the Query Optimizer
• Transform SQL queries into an efficient execution plan
• Inside the database system: SQL Query → Parser → logical operator tree → Query Optimizer → physical operator tree → Query Execution Engine
  o Logical operators describe what they do, e.g., union, selection, project, join, grouping
  o Physical operators describe how they do it, e.g., nested-loops join, sort-merge join, hash join, index join

A First Example
• Query: SELECT Avg(Rating) FROM Reviews WHERE MID = 932 (the Reviews table has columns Date, CID, MID, Rating, …)
• The parser produces a logical operator tree: Avg(Rating) over Select MID = 932 over Reviews
• The optimizer can turn it into (at least) two physical plans:
  o Query Plan #1: Avg_agg [Cnt, Sum] ← Filter MID = 932 ← sequential Scan of Reviews
  o Query Plan #2: Avg_agg [Cnt, Sum] ← Index Lookup MID = 932 using the MID index on Reviews

Query Plan #1
• The plan starts by scanning the entire Reviews table
  o The # of disk I/Os will be equal to the # of pages in the Reviews table
  o The I/Os will be sequential; each I/O will require about 0.1 milliseconds (0.0001 seconds)
• The filter predicate "MID = 932" is applied to all rows
• Only rows that satisfy the predicate are passed on to the average computation

Query Plan #2
• The MID index is used to retrieve only those rows whose MID field (attribute) is equal to 932
  o Since the index is not "clustered", about one disk I/O will be performed for each row
  o Each disk I/O will require a random seek and will take about 3 milliseconds (ms)
• The retrieved rows are passed to the average computation

Which Plan Will Be Faster?
• The query optimizer must pick between the two plans by estimating the cost of each
• To estimate the cost of a plan, the QO must:
  o Estimate the selectivity of the predicate MID = 932
  o Calculate the cost of both plans in terms of CPU time and I/O time
• The QO uses statistics about each table to make these estimates
• The "best" plan depends on how many reviews there are for the movie with MID = 932

A Slightly More Complex Query
• Consider the query:

    SELECT * FROM Reviews WHERE 7/1 < date < 7/31 AND rating > 9

• The optimizer might first enumerate three physical plans:
  o A sequential scan of Reviews with both filters applied (estimated cost: 100 seconds)
  o An index lookup on the Date index for 7/1 < Date < 7/31, followed by a filter on Rating > 9
  o An index lookup on the Rating index for Rating > 9, followed by a filter on 7/1 < Date < 7/31
  o (The deck estimates the two index-based plans at 25 and 11 seconds)
• Then, estimate the selectivity factors (SF = .10 for the date predicate, SF = .01 for the rating predicate)
• Then, calculate the total cost of each plan
• Finally, pick the plan with the lowest cost

Query Optimization: The Main Steps
1. Enumerate logically equivalent plans by applying equivalence rules ✓
2. For each logically equivalent plan, enumerate all alternative physical query plans
3. Estimate the cost of each of the alternative physical query plans
4. Run the plan with the lowest estimated overall cost

Equivalence Rules
• Select and join operators commute with each other
• Join operators are associative
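To make these two rules concrete, here is a minimal Python sketch (my illustration, not from the deck) that represents a logical plan as a nested tuple of base tables and uses join commutativity and associativity to enumerate equivalent join orders. Note that it counts pure join-order variants only (12 for three tables, including left/right swaps); the deck's later count of 9 equivalent plans for its example also depends on where the selects and projects are placed, so the numbers differ.

    from itertools import combinations

    def join_orders(relations):
        """Enumerate all join trees over a set of base relations that are
        logically equivalent under join associativity and commutativity."""
        rels = frozenset(relations)
        if len(rels) == 1:
            return [next(iter(rels))]
        trees = []
        # Split the set into every non-empty left/right pair
        # (commutativity gives both orders of each split).
        for k in range(1, len(rels)):
            for left in combinations(sorted(rels), k):
                right = rels - frozenset(left)
                for lt in join_orders(left):
                    for rt in join_orders(right):
                        trees.append((lt, rt))  # (left subtree) JOIN (right subtree)
        return trees

    if __name__ == "__main__":
        plans = join_orders(["Customers", "Reviews", "Movies"])
        for p in plans:
            print(p)
        print(len(plans), "equivalent join orders")  # 12 for three tables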
Equivalence Rules (cont.)
• The select operator distributes over joins
• Project operators cascade, e.g., Project [Name] (Project [CID, Name] (Customers)) ≡ Project [Name] (Customers)

Example of Equivalent Logical Plans
• Find the titles and director names of movies with a rating > 7 from customers residing in NYC:

    SELECT M.Title, M.Director
    FROM Movies M, Reviews R, Customers C
    WHERE C.City = "N.Y." AND R.Rating > 7 AND M.MID = R.MID AND C.CID = R.CID

  o Schemas: Movies (MID, Title, Director, Earnings), Reviews (Date, CID, MID, Rating), Customers (CID, Name, Address, City)
• One possible logical plan:
    Project [M.Title, M.Director]
      ← Join R.MID = M.MID of Movies with
          (Join C.CID = R.CID of Select C.City = "N.Y." (Customers) and Select R.Rating > 7 (Reviews))

Five Logically "Equivalent" Plans
• Starting from the "original" plan, applying the "selects distribute over joins" rule and the "selects commute" rule yields five logically equivalent plans

Four More!
• Applying the join commutativity rule and the select commutativity rule to the original plan yields four more

9 Logically Equivalent Plans, In Total
• All 9 logical plans will produce the same result
• For each of these 9 plans there is a large number of alternative physical plans that the optimizer can choose from

Query Optimization: The Main Steps
1. Enumerate logically equivalent plans by applying equivalence rules ✓
2. For each logically equivalent plan, enumerate all alternative physical query plans ✓
3. Estimate the cost of each of the alternative physical query plans
4. Run the plan with the lowest estimated overall cost

Physical Plan Example
• Assume that the optimizer has:
  o Three join strategies that it can select from: nested loops (NL), sort-merge join (SMJ), and hash join (HJ)
  o Two selection strategies: sequential scan (SS) and index scan (IS)
• Consider one of the 9 logical plans: there are actually 36 possible physical alternatives for this single logical plan (I was too lazy to draw pictures of all 36)
• With 9 equivalent logical plans, there are 324 = 9 * 36 physical plans that the optimizer must enumerate and cost as part of the search for the best execution plan for the query
• And this was a VERY simple query!
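One plausible way to arrive at the deck's 36 (my reading, not stated explicitly on the slide): the example logical plan contains two join operators and two base-table selections, so with three join methods and two access methods there are 3^2 * 2^2 = 36 physical variants. A minimal counting sketch, with the operator counts treated as assumptions:

    # Counting physical alternatives for one logical plan of the example query.
    # Assumptions: 3 join methods, 2 access methods, and a logical plan with
    # 2 join operators and 2 base-table selections (my decomposition of the 36).
    JOIN_METHODS = ["NL", "SMJ", "HJ"]
    ACCESS_METHODS = ["SS", "IS"]

    def physical_alternatives(num_joins: int, num_scans: int) -> int:
        # Each join independently picks a join method; each scanned table picks an access method.
        return len(JOIN_METHODS) ** num_joins * len(ACCESS_METHODS) ** num_scans

    per_logical_plan = physical_alternatives(num_joins=2, num_scans=2)  # 3^2 * 2^2 = 36
    print(per_logical_plan)       # 36 physical plans for one logical plan
    print(9 * per_logical_plan)   # 324 physical plans across the 9 logical plans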
Physical Plan Example (cont.)
• Here is one possible physical plan: Project ← nested-loops join with Movies ← hash join of a sequential scan of Customers and an index scan of Reviews
• Later we will look at how dynamic programming is used to explore the space of logical and physical plans without enumerating the entire plan space

Query Optimization: The Main Steps
1. Enumerate logically equivalent plans by applying equivalence rules ✓
2. For each logically equivalent plan, enumerate all alternative physical query plans ✓
3. Estimate the cost of each of the alternative physical query plans ✓
  o Estimate the selectivity factor and output cardinality of each predicate
  o Estimate the cost of each operator
4. Run the plan with the lowest estimated overall cost

Selectivity Estimation
• The task of estimating how many rows will satisfy a predicate such as Movies.MID = 932
• Plan quality is highly dependent on the quality of the estimates that the query optimizer makes
• Histograms are the standard technique used to estimate selectivity factors for predicates on a single table
• Many different flavors:
  o Equi-Width
  o Equi-Height
  o Max-Diff
  o …

Histogram Motivation
• Consider the # of reviews for each customer (CID values 1 through 20 in the Reviews table, 939 rows in total)
  o Example #1: Predicate CID = 9. Actual sel. factor = 55/939 = .059
  o Example #2: Predicate 2 <= CID <= 3. Actual sel. factor = 135/939 = .144
• In general, there is not enough space in the catalogs to store summary statistics for each distinct attribute value
• The solution: histograms

Equi-Width Histogram Example
• All buckets cover roughly the same key range (here: 1-4, 5-8, 9-12, 13-16, 17-20)
• Example #1: Predicate CID = 9. Actual sel. factor = 55/939 = .059. Estimated sel. factor = (186/4)/939 = .050 (the 9-12 bucket holds 186 rows)
• Example #2: Predicate CID = 5. Actual sel. factor = 10/939 = .011. Estimated sel. factor = (309/4)/939 = .082 (the 5-8 bucket holds 309 rows)
• Yikes! An 8X error!!

Equi-Height Histograms
• Divide the key ranges so that all buckets contain roughly the same number of values
• For the same data, the buckets become 1-5, 6, 7-8, 9-11, 12-15, 16-20, each holding roughly the same number of rows (about 155)

Equi-Width vs. Equi-Height
• Equi-width histogram:
  o Predicate CID = 5: actual sel. factor = 10/939 = .011, estimated = (309/4)/939 = .082
  o Predicate CID = 6: actual sel. factor = 157/939 = .167, estimated = (309/4)/939 = .082
• Equi-height histogram:
  o Predicate CID = 5: actual sel. factor = 10/939 = .011, estimated = (156/5)/939 = .033
  o Predicate CID = 6: actual sel. factor = 157/939 = .167, estimated = (157/1)/939 = .167

Histogram Summary
• Histograms are a critical tool for estimating selectivity factors for selection predicates
• Errors still occur, however!
• Other statistics stored by the DBMS for each table include the # of rows, the # of pages, …

Query Optimization: The Main Steps
1. Enumerate logically equivalent plans by applying equivalence rules ✓
2. For each logically equivalent plan, enumerate all alternative physical query plans ✓
3. Estimate the cost of each of the alternative physical query plans ✓
  o Estimate the selectivity factor and output cardinality of each predicate
  o Estimate the cost of each operator
4. Run the plan with the lowest estimated overall cost
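To tie the histograms above to step 3, here is a small Python sketch (my illustration, not from the deck) of how the equi-width and equi-height histograms produce the selectivity estimates shown for CID = 5 and CID = 6. The bucket boundaries and counts follow the slide's example; the assumption that values are uniformly distributed within a bucket is the standard one the deck describes.

    # Hedged sketch: histogram-based selectivity estimation, assuming values are
    # uniformly distributed within each bucket. Only the buckets needed for the
    # slide's examples are listed; the Reviews example has 939 rows in total.
    TOTAL_ROWS = 939

    def estimate_selectivity(histogram, value):
        """histogram: list of (low, high, row_count) buckets over CID values."""
        for low, high, count in histogram:
            if low <= value <= high:
                distinct_in_bucket = high - low + 1
                return (count / distinct_in_bucket) / TOTAL_ROWS
        return 0.0

    equi_width  = [(5, 8, 309)]                  # the slide's 5-8 bucket holds 309 rows
    equi_height = [(1, 5, 156), (6, 6, 157)]     # the slide's 1-5 and 6 buckets

    print(round(estimate_selectivity(equi_width, 5), 3))   # ~0.082, actual is 10/939 = 0.011
    print(round(estimate_selectivity(equi_height, 5), 3))  # ~0.033
    print(round(estimate_selectivity(equi_height, 6), 3))  # ~0.167, matches the actual 157/939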
Estimating Costs
• Two key costs that the optimizer considers:
  o I/O time – the cost of reading pages from mass storage
  o CPU time – the cost of applying predicates and operating on tuples in memory
• Actual values are highly dependent on the CPU and I/O subsystem on which the query will be run
  o This further complicates the job of the query optimizer
• For a parallel database system such as PDW, the cost of redistributing/shuffling rows must also be considered
  o Were you paying attention, or updating your Facebook page, when I talked about parallel query processing 2 years ago?

An Example
• Query: SELECT Avg(Rating) FROM Reviews WHERE MID = 932
• Two physical query plans:
  o Plan #1: Avg_agg [Cnt, Sum] ← Filter MID = 932 ← sequential Scan of Reviews
  o Plan #2: Avg_agg [Cnt, Sum] ← Index Lookup MID = 932 using the MID index on Reviews
• Which plan is cheaper?

Plan #1
• The Reviews table is 100K pages with 100 rows/page, sorted on date
• The optimizer estimates that 100 rows will satisfy the predicate
• The filter is applied to all 10M rows; at 0.1 microseconds/row, the filter consumes 1 second of CPU time
• The average computation is applied to 100 rows; at 0.1 microseconds/row, it consumes .00001 seconds of CPU time
• Reviews is scanned sequentially at 100 MB/second, so the I/O time of the scan is 8 seconds
• The optimizer estimates a total execution time of 9 seconds

Plan #2
• 100 rows are estimated to satisfy the predicate and are retrieved using the MID index
• Since the table is sorted on the date field (and not the MID field), each row fetch requires a random disk I/O – about .003 seconds per I/O – so the I/O time will be .3 seconds
• The average computation is applied to the 100 rows and consumes .00001 seconds of CPU time
• The optimizer estimates a total execution time of 0.3 seconds
• The estimate for Plan #1 was 9 seconds, so Plan #2 is clearly the better choice

But …
• What if the estimate of the number of rows that satisfy the predicate MID = 932 is WRONG?
  o E.g., 10,000 rows instead of 100 rows
• Plotting execution time against the number of qualifying rows (from 10 to 100,000) shows that the non-clustered index is better for small numbers of rows, while the sequential scan is better for large numbers of rows

Estimating Join Costs
• Three basic join methods:
  o Nested-loops join
  o Sort-merge join
  o Hash join
• They have very different performance characteristics
• It is critical for the optimizer to carefully pick which method to use when

Sort-Merge Join Algorithm
• Join predicate: Reviews.MID = Movies.MID, with Reviews occupying |R| pages and Movies |M| pages
• Sort Reviews on the MID column: cost = 4 * |R| I/Os (unless already sorted)
• Sort Movies on the MID column: cost = 4 * |M| I/Os (unless already sorted)
• "Merge" the two sorted tables by scanning them sequentially in tandem: cost = |R| + |M| I/Os

    For the current row r of Reviews and the current row m of Movies {
        if r.MID = m.MID produce output row
        Advance the r and m cursors
    }

• Total I/O cost = 5*|R| + 5*|M| I/Os
• Main idea: sort R and M on the join column (MID), then scan them to do a "merge" on the join column, and output result tuples
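As a complement to the I/O cost formula above, here is a minimal in-memory Python sketch of the merge phase (my illustration, not the deck's code). It assumes both inputs are already sorted on MID and, for simplicity, that MID values are unique in Movies (MID is the Movies key in the deck's example).

    def merge_join(reviews, movies):
        """Merge phase of a sort-merge join on MID.
        Assumes both inputs are sorted by MID and that MID is unique in `movies`.
        Each input is read exactly once, mirroring the |R| + |M| term in the slide's cost formula."""
        out = []
        i, j = 0, 0
        while i < len(reviews) and j < len(movies):
            r_mid, m_mid = reviews[i]["MID"], movies[j]["MID"]
            if r_mid == m_mid:
                out.append({**reviews[i], **movies[j]})  # one output row per match
                i += 1                                   # many reviews may share a MID
            elif r_mid < m_mid:
                i += 1
            else:
                j += 1
        return out

    reviews = [{"MID": 1, "Rating": 8}, {"MID": 1, "Rating": 4}, {"MID": 2, "Rating": 9}]
    movies  = [{"MID": 1, "Title": "Up"}, {"MID": 2, "Title": "Heat"}, {"MID": 3, "Title": "Big"}]
    print(merge_join(reviews, movies))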
Nested-Loops Join
• Join predicate: Reviews.MID = Movies.MID, with Reviews occupying |R| pages and Movies |M| pages

    For each page Ri, 1 ≤ i ≤ |R|, of Reviews {
        Read page Ri from disk
        For each page Mj, 1 ≤ j ≤ |M|, of Movies {
            Read page Mj from disk
            For all rows r on page Ri {
                For all rows m on page Mj {
                    if r.MID = m.MID produce output row
                }
            }
        }
    }

• I/O cost = |R| + |R| * |M|
• Main idea: scan R, and for each tuple in R probe the tuples in M (by scanning it); output result tuples

Index-Nested Loops Join
• Join predicate: Reviews.MID = Movies.MID, using the MID index on Movies

    For each page Ri, 1 ≤ i ≤ |R|, of Reviews {
        Read page Ri from disk
        For all rows r on page Ri {
            Use the MID index on Movies to fetch the rows with MID = r.MID
            Form an output row for each returned row
        }
    }

• I/O cost = |R| + |R| * (||R||/|R|) * 2
  o ||R|| is the # of rows in R, so ||R||/|R| is the average number of rows of R per page
  o Since Reviews is sorted on the date column (and not on MID), each probe incurs two random disk I/Os: one to the index and one to the Movies table
• Main idea: scan R, and for each tuple in R probe the tuples in M through its index; output result tuples

Estimating Result Cardinalities
• Consider the query: SELECT * FROM Reviews WHERE 7/1 < date < 7/31 AND rating > 9
• Assume Reviews has 1M rows, with the following selectivity factors:
  o 7/1 < date < 7/31: sel. factor 0.1 → 100,000 qualifying rows
  o rating > 9: sel. factor 0.01 → 10,000 qualifying rows
• How many output rows will the query produce?
  o If the predicates are not correlated: .1 * .01 * 1M = 1,000 rows
  o If the predicates are correlated, it could be as high as .1 * 1M = 100,000 rows
• Why does this matter?

This is Why!
• Consider the plan that applies the predicate "rating > 9 and 7/1 < date < 7/31" to Reviews and then joins the result with Movies on R.MID = M.MID
• Assume that:
  o The Reviews table is 10,000 pages with 80 rows/page
  o The Movies table is 2,000 pages
  o The primary index on Movies is on the MID column
• Plotting join time against the selectivity factor of the predicate on Reviews shows that each join algorithm (nested loops, sort-merge, index NL) has a region where it provides the best performance
• The consequences of incorrectly estimating the selectivity of the predicate on Reviews can be HUGE

Multidimensional Histograms
• Used to capture correlation between attributes (a 2-D example: a histogram over two columns at once)

A Little Bit About Estimating Join Cardinalities
• Question: given a join of R and S, what is the range of possible result sizes (in # of tuples)?
• Suppose the join of R and S is on a key of S:

    Students(sid, sname, did), Dorms(did, address)
    SELECT S.sid, D.address FROM Students S, Dorms D WHERE S.did = D.did

  o A student can live in at most one dorm, so each S tuple can match with at most one D tuple
  o Cardinality(S join D) = cardinality of S

Estimating Join Cardinality
• General case: join on {A}, where {A} is a key for neither table
  o Estimate that each tuple r of R generates a uniform number of matches in S and each tuple s of S generates a uniform number of matches in R, e.g.:
    Estimated result size = min( ||R|| * ||S|| / NKeys(A, S),  ||S|| * ||R|| / NKeys(A, R) )

  (equivalently, ||R|| * ||S|| divided by the larger number of distinct join-key values)
• Example: SELECT M.title, R.title FROM Movies M, Reviews R WHERE M.title = R.title
  o Movies: 100 tuples, 75 unique titles (about 1.3 rows per title)
  o Reviews: 20 tuples, 10 unique titles (2 rows per title)
  o Estimate = min(100 * 20 / 10, 20 * 100 / 75) = min(200, 26.6) = 26.6 rows

Query Optimization: The Main Steps
1. Enumerate logically equivalent plans by applying equivalence rules ✓
2. For each logically equivalent plan, enumerate all alternative physical query plans ✓
3. Estimate the cost of each of the alternative physical query plans ✓
  o Estimate the selectivity factor and output cardinality of each predicate
  o Estimate the cost of each operator
4. Run the plan with the lowest estimated overall cost
• How big is the plan space for a query involving N tables? It turns out that the answer depends on the "shape" of the query

Two Common Query "Shapes"
• "Star" join queries: a central table joined with each of the other tables (e.g., A joined with B, C, D, and F)
• "Chain" join queries: A join B join C join D join F
• Number of logically equivalent alternatives:

    # of Tables    Star          Chain
    2              2             2
    4              48            40
    5              384           224
    6              3,840         1,344
    8              645,120       54,912
    10             18,579,450    2,489,344

• In practice, "typical" queries fall somewhere between these two extremes

Pruning the Plan Space
• Solutions:
  o Consider only left-deep query plans (rather than bushy plans) to reduce the search space
  o Use some form of dynamic programming (either bottom-up or top-down) to search the plan space heuristically
• Sometimes these heuristics will cause the best plan to be missed!!
• These are counts of logical plans only:

                     Star Join Queries          Chain Join Queries
    # of Tables      Bushy        Left-Deep     Bushy        Left-Deep
    2                2            2             2            2
    4                48           12            40           8
    5                384          48            224          16
    6                3,840        240           1,344        32
    8                645,120      10,080        54,912       128
    10               18,579,450   725,760       2,489,344    512

• With 3 join methods and n joins in a query, there will be 3^n physical plans for each logical plan
• Example: for a left-deep, 8-table star join query there will be:
  o 10,080 different logical plans
  o 22,044,960 different physical plans!!

Bottom-Up QO Using Dynamic Programming
• Optimization is performed in N passes (if N relations are joined):
  o Pass 1: find the best (lowest cost) 1-relation plan for each relation
  o Pass 2: find the best way to join the result of each 1-relation plan (as the outer/left table) to another relation (as the inner/right table) to generate all 2-relation plans
  o Pass N: find the best way to join the result of an (N-1)-relation plan (as the outer) to the N'th relation to generate all N-relation plans
• At each pass, for each subset of relations, prune all plans except:
  o The lowest cost plan overall, plus
  o The lowest cost plan for each interesting order of the rows
    (interesting orders are those that facilitate the execution of joins, aggregates, and order by clauses later in the query)
• Order by, group by, aggregates, etc. are handled as the final step
• In spite of pruning the plan space, this approach is still exponential in the # of tables
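The deck walks through this bottom-up search on a four-table example next. As a complement, here is a minimal Python sketch (my own illustration) of left-deep, bottom-up enumeration with pruning to the cheapest plan per relation subset. The table sizes, the cost model, and the 1% join selectivity are made up; interesting orders and cross-product avoidance are omitted to keep it short.

    # Toy Selinger-style bottom-up join enumeration (left-deep, cheapest plan per subset).
    from itertools import combinations

    TABLE_ROWS = {"A": 1_000, "B": 10_000, "C": 100, "D": 5_000}  # hypothetical cardinalities

    def join_cost(outer_rows, inner_rows):
        # Toy cost model: pretend every join costs outer * inner "touches".
        return outer_rows * inner_rows

    def best_plan(tables):
        # best[subset] = (cost_so_far, estimated_rows, plan_description)
        best = {frozenset([t]): (0, TABLE_ROWS[t], t) for t in tables}
        for size in range(2, len(tables) + 1):                  # pass 2 .. pass N
            for subset in map(frozenset, combinations(tables, size)):
                candidates = []
                for inner in subset:                            # left-deep: inner is a base table
                    outer = subset - {inner}
                    o_cost, o_rows, o_plan = best[outer]
                    cost = o_cost + join_cost(o_rows, TABLE_ROWS[inner])
                    rows = max(o_rows * TABLE_ROWS[inner] // 100, 1)  # pretend 1% join selectivity
                    candidates.append((cost, rows, f"({o_plan} JOIN {inner})"))
                best[subset] = min(candidates)                  # prune: keep only the cheapest
        return best[frozenset(tables)]

    print(best_plan(["A", "B", "C", "D"]))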
An Example
• Consider a four-table query joining A, B, C, and D (join predicates A.a = B.a, B.b = D.b, B.c = C.c), with selection predicates on some of the tables
• Legend: SS = sequential scan, IS = index scan; the number next to each plan is its estimated cost
• Assume there are 2 alternative join methods for the QO to select from: NLJ (nested loops join) and SMJ (sort merge join)
• First, generate all single-relation plans for all tables: SS A (cost 13), IS A (27), SS B (73), SS C (38), IS C (18), SS D (42), IS D (95), then prune
• After pruning, the surviving single-relation plans are: SS A (13), IS A (27, kept because its output order is interesting), SS B (73), IS C (18), SS D (42)

Then, All Two Relation Plans
• Using the pruned single-relation plans as outers, enumerate every two-relation plan: for each joinable pair, try each surviving access path for the outer and both join methods (NLJ and SMJ), and cost the alternatives
• For example, for A joined with B on A.a = B.a, the deck costs four alternatives at 1013, 315, 822, and 293
• Prune again, keeping only the cheapest plan (plus any plan with an interesting order) for each pair of relations

Next, All Three Relation Plans
• Join each fully pruned two-relation plan (as the outer) with a third relation, again trying both join methods; the deck does this in turn for the two-relation plans that started with A, with B, and with C
• Prune once more, and repeat until plans covering all four relations have been built and the cheapest overall plan remains

You Have Now Seen the Theory
• But the reality is that optimizers still pick bad plans too frequently, for a variety of reasons:
  o Statistics can be missing, out-of-date, or incorrect
  o Cardinality estimates assume uniformly distributed values, but data values are skewed
  o Attribute values are correlated with one another, e.g., Make = "Honda" and Model = "Accord"
  o Cost estimates are based on formulas that do not take into account the characteristics of the machine on which the query will actually be run
  o Regressions happen due to hardware and software upgrades
• What can be done to improve the situation?
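Before turning to the remedies below, here is a small numeric sketch (mine, with made-up car data) of the correlation bullet above: under the independence assumption the optimizer multiplies single-column selectivities, which can badly misestimate the output when the predicates are correlated.

    # Hedged sketch of why correlated predicates break the independence assumption.
    cars = (
        [{"make": "Honda", "model": "Accord"}] * 300   # every Accord is a Honda
        + [{"make": "Honda", "model": "Civic"}] * 200
        + [{"make": "Ford",  "model": "F-150"}] * 500
    )

    total = len(cars)
    sel_make  = sum(c["make"]  == "Honda"  for c in cars) / total   # 0.5
    sel_model = sum(c["model"] == "Accord" for c in cars) / total   # 0.3

    independence_estimate = sel_make * sel_model * total            # 0.15 * 1000 = 150 rows
    actual = sum(c["make"] == "Honda" and c["model"] == "Accord" for c in cars)  # 300 rows

    print(independence_estimate, actual)   # off by 2x here; real data can be far worse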
Opportunities for Improvement
• Develop tools that give us a better understanding of what goes wrong
• Improve plan stability
• Use feedback from the query execution engine to the QO to improve statistics and cost estimates
• Dynamic re-optimization

Towards a Better Understanding of QO Behavior
• The Picasso project – Jayant Haritsa, Indian Institute of Science, Bangalore
  o Bing "Picasso Haritsa" to find the project's web site
  o The tool is available for SQL Server, Oracle, PostgreSQL, DB2, and Sybase
• A simple but powerful idea. For a given query such as

    SELECT * FROM A, B
    WHERE A.a = B.b AND A.c <= constant-1 AND B.d <= constant-2

  o Systematically vary constant-1 and constant-2
  o Obtain the query plan and estimated cost from the query optimizer for each combination of input parameters
  o Plot the results

Example: TPC-H Query 8
• The same TPC-H query 8 shown earlier, with S_ACCTBAL <= constant-1 and L_EXTENDEDPRICE <= constant-2 as the varied parameters

Resulting Plan Space
• SQL Server 2008 R2
• A total of 90,000 queries: 300 different values for both L_ExtendedPrice and S_AcctBal
• 204 different plans!!
  o Each distinct plan is assigned a unique color
  o Even zooming in to the [0,20 : 0,20] region shows many distinct plans
• Key takeaway: if plan choice is so sensitive to the constants used, it will undoubtedly be sensitive to errors in statistics and cardinality estimates
• Intuitively, this seems very bad!

Estimated Execution Costs
• The cost plot exhibits what is probably a "flaw" in the cost model:
  o As L_ExtendedPrice is increased, the estimated cost first increases sharply before decreasing
  o Still a mystery why
• QO is indeed harder than rocket science!!

How Might We Do Better?
• Recall the earlier graph of join algorithm performance as a function of the selectivity factor of the predicate on Reviews
• While the two "nested loops" algorithms are faster at low selectivity factors, they are not as "stable" across the entire range of selectivity factors

"Reduced" Plan Diagrams
• Robustness is somehow tied to the number of plans: fewer plans => more robust plans
• For TPC-H query 8, it is possible to use only 30 plans (instead of 204) by picking more robust plans that are slightly slower (10% max, 2% avg)
• Since each plan covers a larger region, it will be less sensitive to errors in estimating cardinalities and costs

How Might We Do Better?
• At QO time, have the QO annotate compiled query plans with statistics (e.g., expected cardinalities) and check operators
• At runtime, the check operators collect the actual statistics and compare actual vs. predicted
• This opens up a number of avenues for improving QO performance:
  o Build an optimizer that learns
  o Do dynamic reoptimization of "in flight" queries
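As a rough illustration of what such a check operator might do (my sketch, not SQL Server's implementation), the idea is simply to compare the optimizer's estimated cardinality against the row count actually observed at runtime and raise a flag when the two diverge badly. The 5x threshold, the callback, and checking only at end-of-stream are illustrative simplifications.

    # Hedged sketch of a "check" operator: wrap a child operator's output stream,
    # count rows, and compare the actual cardinality to the optimizer's estimate.
    def check_operator(rows, estimated_rows, on_misestimate, threshold=5.0):
        actual = 0
        for row in rows:            # `rows` is the child operator's output stream
            actual += 1
            yield row               # pass rows through unchanged
        ratio = actual / max(estimated_rows, 1)
        if ratio > threshold or ratio < 1.0 / threshold:
            # Feed the observation back: update statistics, or trigger
            # reoptimization of the remainder of the plan from this point.
            on_misestimate(estimated_rows, actual)

    def log_misestimate(estimated, actual):
        print(f"cardinality misestimate: expected ~{estimated}, saw {actual}")

    # Example: the optimizer expected ~100 rows, but the filter produced 10,000.
    filtered = ({"MID": 932, "Rating": r % 10} for r in range(10_000))
    for _ in check_operator(filtered, estimated_rows=100, on_misestimate=log_misestimate):
        pass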
Helping the Optimizer Learn
• The executor runs the instrumented plan, and the check operators record the observed statistics
• A statistics tracker feeds the observed statistics back into the catalogs alongside the original statistics
• Optimization of subsequent queries benefits from the observed statistics

Dynamic Reoptimization
1. The observed output size of the SMJ operator is 5x larger than expected
2. The output of the SMJ is materialized as Tmp1 in tempdb
3. The remainder of the query is returned to the optimizer for reoptimization

Key Points To Remember For The Quiz
• Query optimization is harder than rocket science
  o The other components of a DBMS are trivial in comparison
• Three key phases of QO:
  o Enumeration of the logical plan space
  o Enumeration of alternative physical plans
  o Selectivity estimation and costing
• The QO team of every DB vendor lives in fear of regressions
  o How exactly do you expect them to make forward progress?

And … If You've Spent My Entire Keynote Chatting on Facebook …
• At least check out my lab: "Microsoft Jim Gray Systems Lab"

Many Thanks To:
Rimma Nehme, Pooja Darera, Il-Sung Lee, Jeff Naughton, Jignesh Patel, Jennifer Widom, and Donghui Zhang for their many useful suggestions and their help in debugging these slides

Finally …
Thanks for inviting me to give a talk again