SQL tuning and cardinality

You’re Smarter than a Database Overcoming the optimizer’s bad cardinality estimates About me     Bobby Durrett US Foodservice bobbyandmarielle@yahoo.com Scripts in http://www.geocities.com/bobbyandma rielle/sqltuning.zip What you know What the database knows Before SQL  Example - mainframe Datacom/DB COBOL   List index names Write loops read a from one index i1 where one.c=10 while more table one rows exist get next row read b from two index i2 where two.a = one.a while more table two rows exist get next row print one.a,two.b end while end while SQL  Tell what you want, not how to get it select one.a,two.b from one,two where one.c=10 and one.a=two.a; Pre-SQL versus SQL     Pre-SQL code very efficient – runs in megabytes – VSE mainframe COBOL Labor intensive SQL can be inefficient – runs in gigabytes (if you are lucky!) Much more productive – do in minutes what took hours before – create tables What the database doesn’t know    Optimizer has a limited set of statistics that describe the data It can miscalculate the number of rows a query will return, its cardinality A cardinality error can lead optimizer to choose a slow way to run the SQL Example plan/Cardinality ------------------------------------------------| Id | Operation | Name | Rows | Cost ------------------------------------------------| 0 | SELECT STATEMENT | | 10 | 3 |* 1 | TABLE ACCESS FULL| TEST1 | 10 | 3 ------------------------------------------------   Plan = how Oracle will run your query Rows = how many rows optimizer thinks that step will return Cost = estimate of time query will take, a function of the number of rows How to fix cardinality problems   Find out if it really is a cardinality issue Determine the reason it occurred    Single column Multiple columns Choose a strategy    Give the optimizer more information Override optimizer decision Change the application Four examples   Four examples of how the optimizer calculates cardinality Full scripts and their outputs on portal, pieces on slides – edited for simplicity Step 1: Find out if it really is a cardinality issue  Example 1  Data select a,count(*) from test1 group by a; A COUNT(*) ---------- ---------1 10  Query select * from test1 where a=1; Step 1: Find out if it really is a cardinality issue  Get estimated cardinality from plan ------------------------------------------| Id | Operation | Name | Rows | ------------------------------------------| 0 | SELECT STATEMENT | | 10 | |* 1 | TABLE ACCESS FULL| TEST1 | 10 | ------------------------------------------ Do query for actual number of rows select count(*) from test1 where a=1; Step 1: Find out if it really is a cardinality issue  Plan is a tree – find cardinality and select count(*) on part of query represented by that part of plan. join table join table table Step 2: Understand the reason for the wrong cardinality  Unequal distribution of data:  Within a single column    Last name “Smith” or “Jones” Among multiple columns –   Address Zipcode and State Step 2: Understand the reason for the wrong cardinality  Example 2 - Unequal distribution of values in a single column   1,000,000 rows with value 1 1 row with value 2 select a,count(*) from TEST2 group by a; A COUNT(*) ---------- ---------1 1000000 2 1 Step 2: Understand the reason for the wrong cardinality  SQL statement – returns one row select * from TEST2 where a=2; Step 2: Understand the reason for the wrong cardinality  Plan with wrong number of rows = 500,000  Full scan instead of range scan – 100 times slower --------------------------------------------| Operation | Name | Rows | --------------------------------------------| SELECT STATEMENT | | 500K| | INDEX FAST FULL SCAN| TEST2INDEX | 500K| --------------------------------------------- Step 2: Understand the reason for the wrong cardinality  Column statistics – two distinct values LOW HIGH NUM_DISTINCT ---------- ---------- -----------1 2 2  Table statistic – total # of rows – 1,000,001 NUM_ROWS ---------1000001 Step 2: Understand the reason for the wrong cardinality    Rows in plan = (rows in table)/ (distinct values of column) 500000=1000001/2 Optimizer knew that there were only two values – assumed they had equal number of rows Step 2: Understand the reason for the wrong cardinality  Example 3 - Combinations of column values not equally distributed     1,000,000 rows with values 1,1 1,000,000 rows with values 2,2 1 row with value 1,2 ~ Equal numbers of 1’s and 2’s in each column A B COUNT(*) ---------- ---------- ---------1 1 1000000 1 2 1 2 2 1000000 Step 2: Understand the reason for the wrong cardinality  SQL statement – retrieves one row select sum(a+b) from TEST3 where a=1 and b=2; Step 2: Understand the reason for the wrong cardinality   Plan with wrong number of rows = 500,000 Inefficient full scan ---------------------------------------------| Operation | Name | Rows | ---------------------------------------------| SELECT STATEMENT | | 1 | | SORT AGGREGATE | | 1 | | INDEX FAST FULL SCAN| TEST3INDEX | 500K| ---------------------------------------------- Step 2: Understand the reason for the wrong cardinality  Column statistics C LOW HIGH NUM_DISTINCT - ---------- ---------- -----------A 1 2 2 B 1 2 2  Table statistic – total # of rows – 2,000,001 NUM_ROWS ---------2000001 Step 2: Understand the reason for the wrong cardinality    Rows in plan = (rows in table)/ (distinct values A * distinct values B) 500000=2000001/(2 * 2) Optimizer assumes all four combinations (1,1),(1,2),(2,1),(2,2) equally likely Step 2: Understand the reason for the wrong cardinality  How to tell which assumption is in play?  Select count(*) each column select a,count(*) from TEST3 group by a; select b,count(*) from TEST3 group by b;  count(*) each column combination select a,b,count(*) from TEST3 group by a,b; Step 3: Choose the best strategy for fixing the cardinality problem  Giving the optimizer more information    Overriding optimizer decisions   Histograms SQL Profiles Hints Changing the application Try to use optimizer as much as possible to minimize development work Step 3: Choose the best strategy for fixing the cardinality problem  Giving the optimizer more information – using histograms    Works for unequal distribution within a single column A histogram records the distribution of values within a column in up to 254 “buckets” Works best on columns with fewer than 255 distinct values Step 3: Choose the best strategy for fixing the cardinality problem  Run gather_table_stats command to get histograms on the column – 254 is max number of buckets method_opt=>'FOR ALL COLUMNS SIZE 254' Step 3: Choose the best strategy for fixing the cardinality problem  Plan for Example 2 with correct number of rows with histogram Uses range scan ----------------------------------------| Operation | Name | Rows | ----------------------------------------| SELECT STATEMENT | | 1 | | INDEX RANGE SCAN| TEST2INDEX | 1 | ---------------------------------------- Step 3: Choose the best strategy for fixing the cardinality problem  Column statistics – two buckets LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- ---------- ------------ ----------1 2 2 2  Table statistic – unchanged NUM_ROWS ---------1000001 Step 3: Choose the best strategy for fixing the cardinality problem  Time without histograms (1 second): Elapsed: 00:00:01.00  Time with histograms(1/100th second): Elapsed: 00:00:00.01 Step 3: Choose the best strategy for fixing the cardinality problem  Giving the optimizer more information – using SQL Profiles   Works for unequal distribution among multiple columns Includes information about the relationship between columns in the SQL – correlated columns or predicates Step 3: Choose the best strategy for fixing the cardinality problem  SQL Tuning Advisor gathers statistics on the columns ...DBMS_SQLTUNE.CREATE_TUNING_TASK(... ...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...  Accept the SQL Profile it creates to use the new statistics ...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (... Step 3: Choose the best strategy for fixing the cardinality problem  Example 3 plan with correct number of rows = 1 using SQL profile -------------------------------------------------| Operation | Name | Rows | Bytes | -------------------------------------------------| SELECT STATEMENT | | 1 | 6 | | SORT AGGREGATE | | 1 | 6 | | INDEX RANGE SCAN| TEST3INDEX | 1 | 6 | -------------------------------------------------| Step 3: Choose the best strategy for fixing the cardinality problem  Time without a profile (1 second): Elapsed: 00:00:01.09  Time with a profile(1/100th second): Elapsed: 00:00:00.01 Step 3: Choose the best strategy for fixing the cardinality problem  Overriding optimizer decisions – using hints    Example 4 has unequal distribution of column values across two tables – histograms and SQL Profiles don’t work Hint forces index range scan Small amount of additional code – not like Cobol on mainframe Step 3: Choose the best strategy for fixing the cardinality problem  Example 4 - SMALL table   MANY relates to 1 – there are many rows with value 1 FEW relates to 2 – there are few with value 2 insert into SMALL values ('MANY',1); insert into SMALL values ('FEW',2); Step 3: Choose the best strategy for fixing the cardinality problem  Example 4 - LARGE table: 1,000,000 rows with value 1  1 row with value 2 NUM COUNT(*) ---------- ---------1 1000000 2 1  Step 3: Choose the best strategy for fixing the cardinality problem  SQL statement – returns one row select B.NUM from SMALL A,LARGE B where A.NUM=B.NUM and A.NAME='FEW'; Step 3: Choose the best strategy for fixing the cardinality problem  Plan with wrong number of rows = 125,000 ---------------------------------------------| Operation | Name | Rows | ---------------------------------------------| SELECT STATEMENT | | 125K| | HASH JOIN | | 125K| | TABLE ACCESS FULL | SMALL | 1 | | INDEX FAST FULL SCAN| LARGEINDEX | 1000K| ---------------------------------------------- Step 3: Choose the best strategy for fixing the cardinality problem  Column statistics – two buckets on all columns – using histograms LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- ---------- ------------ ----------1 2 2 2 LOW HIGH NUM_DISTINCT NUM_BUCKETS ---- ---- ------------ ----------FEW MANY 2 2 Step 3: Choose the best strategy for fixing the cardinality problem  Table statistics – SMALL has 2 rows, LARGE 1000001 NUM_ROWS ---------2 NUM_ROWS ---------1000001 Step 3: Choose the best strategy for fixing the cardinality problem     125000=1000001/8 Optimizer appears to assume all eight combinations of the three columns’ values are equally likely Can’t verify formula – references don’t include formula with histograms Even worse without histograms – cardinality is 500000 Step 3: Choose the best strategy for fixing the cardinality problem  No SQL profile from SQL Tuning Advisor: There are no recommendations to improve the statement.  Neither histograms nor SQL profiles help example 4 Step 3: Choose the best strategy for fixing the cardinality problem  Statement with hints:   Use index Don’t do full scan select /*+ INDEX(B LARGEINDEX) NO_INDEX_FFS(B LARGEINDEX) B.NUM from SMALL A,LARGE B where a.NUM=B.NUM and A.NAME='FEW'; */ Step 3: Choose the best strategy for fixing the cardinality problem  Time without a hint (1 second): Elapsed: 00:00:01.03  Time with a hint (1/100th second): Elapsed: 00:00:00.01 Step 3: Choose the best strategy for fixing the cardinality problem  Changing the application   Change your tables so that the optimizer gets your SQL’s cardinality right Requires more work designing tables, but keeps productivity benefits of SQL Step 3: Choose the best strategy for fixing the cardinality problem  Example 4 – moved NAME column to LARGE table and split table in two    One million (‘MANY’,1) rows in LARGEA One (‘FEW’,2) row in LARGEB Query: select NUM from (select * from largea union select * from largeb) where NAME='FEW'; Step 3: Choose the best strategy for fixing the cardinality problem  Plan is just as efficient as with hint:   Number of rows = 2 (reality is 1) Range Scan -------------------------------------------------------------| Id | Operation | Name | Rows | -------------------------------------------------------------| 0 | SELECT STATEMENT | | 2 | | 1 | VIEW | | 2 | | 2 | SORT UNIQUE | | 2 | | 3 | UNION-ALL | | | | 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 | |* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 | | 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 | |* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 | -------------------------------------------------------------- Step 3: Choose the best strategy for fixing the cardinality problem  Time without table change (1 second): Elapsed: 00:00:01.03  Time with table change (1/100th second): Elapsed: 00:00:00.01 Conclusion  SQL improves productivity, optimizer has limits   Identify cases where cardinality is wrong Understand why the database got it wrong    One column Multiple columns Choose best strategy to fix    Give optimizer more info Override optimizer’s choices Redesign tables References     Cost Based Optimizer Fundamentals, Jonathan Lewis Metalink Note:212809.1, Limitations of the Oracle Cost Based Optimizer Metalink Note:68992.1, Predicate Selectivity Histograms – Myths and Facts, Wolfgang Breitling Select Journal, Volume 13, Number 3

SQL tuning and cardinality

Related documents

Products

Support

SQL tuning and cardinality

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib