Guiding Practices for Gathering Database Statistics
Martin Widlake
Database Architecture, Performance & Training
Ora600 Limited
http://mwidlake.wordpress.com
www.ora600.org.uk
ORA600 Ltd 1/45

Abstract
• Guiding practices for Database Statistics gathering.
• The Cost Based Optimizer continues to improve and stats gathering is now more efficient than ever - but it still seems that most Oracle sites struggle with performance issues due to poor stats. It's like the annoying, embarrassing rash that simply won't go away. I will cover the options available and general principles for sorting out the stats issue, which should lead to more stable and good performance, i.e. a more comfortable life. This should calm the annoying rash and give you some potential treatments should it flare up again.
ORA600 Ltd 2/45

Who am I and why am I doing this Talk?
• 20+ years of Oracle experience, mostly as a developer, development DBA, Architect, Performance guy.
• Tested the CBO under V7.3 and became a cautious advocate of it in V8. Been fighting the issues since!
• I keep getting pulled into designing “better” methods of gathering stats for clients and, frankly, I’d rather do other things {thus the presentations and blog postings telling everyone what I know}.
• I am of the opinion that, overall, the CBO now gets 99% of SQL execution plans good enough, if the stats are good.
• Stats gathering can be quite interesting. Honest!
ORA600 Ltd 3/45

These slides will be on the UKOUG web site
I am going to talk around some slides (the ones with pictures and key points on) and skip over some - as we all get tired of reading PowerPoint slides in presentations. The others are there to fill in the chat.
Ask questions. Email me:
mwidlake@btinternet.com
mwidlake.wordpress.com
ORA600 Ltd 4/45

Quick Quiz
• What is the most common version of Oracle you currently use? (8, 9, 10.1, 10.2, 11.1, 11.2)
• What is the latest version you use in production?
• Who relies on the Automated Stats Collection job on their database? (If “Yes”, have you altered the schedule?)
• Who has intermittent performance issues when code goes bad either “over night” or after stats are collected?
• Who has a site-crafted stats gathering regime?
• (If your site wrote its own, did it take 2X, 4X, 8X or more the effort to get right than you expected?)
ORA600 Ltd 5/45

Possibly the Single Most Common Cause of Poor Database Performance
Poor or missing object statistics are probably the single most common and easily fixed cause of poor database performance. Most issues with individual SQL statements performing poorly are fixed by gathering accurate statistics on the tables involved. The worst of all situations is to have statistics on only a few tables: the Cost Based Optimiser is invoked and has to use very poor defaults for everything else.
In my opinion, the introduction of the automated stats gathering process with Oracle 10g was probably the single greatest performance enhancement by Oracle Corp in the last 15 years. And I don’t really like it.
ORA600 Ltd 6/45

Automatic Stats Collection (diagram)
The Auto Stats Job preparation draws on:
• FLUSH_DATABASE_MONITORING_INFO
• Global and Table Prefs (stale_pct, est_pct, method_opt, degree...)
• DBA_TAB_MODIFICATIONS (10% stale by default)
• SYS.COL_USAGE$
• Data Dictionary information and existing statistics
It builds an OBJ_FILTER_LIST (STALE and EMPTY) and runs the gathers in the scheduled window.
ORA600 Ltd 7/45

GATHER_DATABASE_STATS_JOB_PROC
From the 10g Tuning guide: The GATHER_DATABASE_STATS_JOB_PROC procedure collects statistics on database objects when the object has no previously gathered statistics or the existing statistics are stale because the underlying object has been modified significantly (more than 10% of the rows). The DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC is an internal procedure, but it operates in a very similar fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option.
The primary difference is that the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure prioritizes the database objects that require statistics, so that those objects which most need updated statistics are processed first. This ensures that the most-needed statistics are gathered before the maintenance window closes.
That last sentence is the only major change in the 11g documentation.
ORA600 Ltd 8/45

Automated DBMS_STATS Job
• If it works for you, then fine, leave it be and work on something else. If it almost works for you, fix the exceptions, leave the main job alone and work on something else.
• If you have a VLDB (or you downloaded this as you had an issue with stats gathering) it is almost certainly not good enough for you.
• It is an attempt at a single solution to work for every situation and it does not. Even Oracle Corp have admitted it simply does not work for VLDBs; it chokes on large objects.
• Turn it off (maybe leave it running for DICTIONARY stats) and write the replacement. You will write something that does a lot of what this job does. Your replacement will almost certainly be more complex than you initially plan. Sorry.
• There is no one single solution to stats gathering that is right for any large, complex system. {Sorry again}
ORA600 Ltd 9/45

DBA_TAB_MODIFICATIONS
• All insert, update, delete and truncate operations on monitored tables are flushed to this table. So V10 upwards, that is everything.
• Under V9 flushed every 3 hours, under V10.1 every 15 minutes, under V10.2/V11 it is not automatically flushed.
• It is flushed to by schema/db dbms_stats GATHER calls or by calling DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO.
• It seems generally accurate but I have witnessed it missing the odd “insert-into-select” statement. It does not capture direct inserts, appends, things that avoid the SQL layer.
• Increments, including over database restarts.
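As a sketch of the flushing behaviour described above (the schema name is hypothetical): on 10.2/11 you can force the in-memory DML monitoring information out to DBA_TAB_MODIFICATIONS before querying it.

```sql
-- Flush in-memory DML monitoring info (not automatic on 10.2/11):
exec dbms_stats.flush_database_monitoring_info

-- Then see the accumulated change counts:
select table_owner, table_name, partition_name,
       inserts, updates, deletes, truncated, timestamp
from   dba_tab_modifications
where  table_owner = 'MYSCHEMA'      -- hypothetical schema
order  by timestamp desc;
```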
• The row is deleted when stats are gathered on the table OR partition (and only at the correct level).
ORA600 Ltd 10/45

show_tab_mods
-- show_tab_mods
-- Martin Widlake 11-nov-07
-- quick check on recent table changes
-- NB flush if want up-to-date info (15 min interval)
set lines 100 pause on
col obj_name form a50
col no_ins form 9999,999
col no_upd form 9999,999
col no_del form 999,999
select table_owner||'.'||table_name||'-'||partition_name obj_name
      ,inserts no_ins
      ,updates no_upd
      ,deletes no_del
      ,substr(truncated,1,1) T
      ,to_char(timestamp,'YYMMDD hh:mm:ss') last_flush
from dba_tab_modifications
where timestamp > sysdate -31
and table_owner not like 'SYS%'
order by timestamp desc
/
clear colu

Sample output:
OBJ_NAME                                     NO_INS  NO_UPD NO_DEL T LAST_FLUSH
------------------------------------------ -------- ------- ------ - ---------------
MIDDLEOFFICE.POSITIONLIQUIDATIONAUDIT         9,588       0      0 N 080602 10:06:03
MIDDLEOFFICE.POSITIONKEEPINGGROUPS                2       1      0 N 080602 10:06:03
GATEWAY.TREE_RELATIONS                           44       0     43 N 080602 10:06:03
COMMHOME.COMM_TRAD_PCTTRAD_BAND                   4       0      4 N 080602 10:06:03
BOBJ_LOGIN.ORDERS-ORDERS_12345678             1,066   9,381      0 N 080531 06:05:03
BOBJ_LOGIN.ORDERS_HISTORY-OH_54545454         9,379       0      0 N 080531 06:05:03
BOBJ_LOGIN.ORDERAUDIT-ORDERAUDIT23        1,375,610       0      0 N 080530 10:05:02
GATEWAY.DEALINGRECNMSAI                         290   5,052      0 N 080530 10:05:02
(further rows, including recycle-bin BIN$ objects, trimmed)
ORA600 Ltd 11/45

-- mdw 11/05/03
-- mdw 17/01/08 Modified to look at dba_tab_modifications
set pause on pages 24 lines 110
pause 'Any Key>'
colu anlyzd_rows form 99999,999,999
colu tot_rows form 99999,999,999
colu tab_name form a30
colu chngs form 99,999,999
colu pct_c form 999.999
select dbta.owner||'.'||dbta.table_name tab_name
      ,dbta.num_rows anlyzd_rows
      ,to_char(dbta.last_analyzed,'yymmdd hh24:mi:ss') last_anlzd
      ,nvl(dbta.num_rows,0)+nvl(dtm.inserts,0)
       -nvl(dtm.deletes,0) tot_rows
      ,nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0) chngs
      ,(nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0))
       /greatest(nvl(dbta.num_rows,0),1) pct_c
from dba_tables dbta
left outer join dba_tab_modifications dtm
   on dbta.owner = dtm.table_owner
  and dbta.table_name = dtm.table_name
  and dtm.partition_name is null
where dbta.table_name like upper(nvl('&Tab_name','WHOOPS'))
/
clear colu

Sample output:
TAB_NAME                       ANLYZD_ROWS LAST_ANLZD        TOT_ROWS   CHNGS  PCT_C
------------------------------ ----------- --------------- ---------- ------- ------
GATEWAY.ACCESSGROUPS                     4 080212 16:03:12          4       0   .000
MIDDLEOFFICE.ACCOUNTAUDIT        4,725,464 080512 22:37:16  4,898,302 173,738   .037
GATEWAY.ACCOUNTBEHAVIOURS               14 080212 16:03:18         14       0   .000
GATEWAY.ENBZHEERHWV                149,136 080522 22:06:39    150,922   6,156   .041
MM_AUDIT.EFEFEFGHOME_AUDIT         513,230 080509 22:07:34    526,192  12,962   .025
(further rows trimmed)
ORA600 Ltd 12/45

SYS.COL_USAGE$
• Every time
a SQL statement is parsed, information about columns referenced in table joins and WHERE predicates is stored in the internal table SYS.COL_USAGE$.
• This is what DBMS_STATS uses to help decide which of the indexed columns to gather stats on when method_opt “for all indexed columns” is used.
• It might also play a part in deciding which columns to gather histograms on, though I have tested adding very skewed columns to a table and the automatic stats collection does not gather histograms, and neither does a specific call to gather stats with method_opt=>'AUTO'.
• It can also be useful to help identify if an index is missing or is even likely to be used.
ORA600 Ltd 13/45

chk_col_usage
-- chk_col_usage
-- this is a rip-off of Tim Gorman's script to look at the column usage
-- info that, in 9i, 10g and 11 beta at least, is not revealed in a DB view. Gits.
col owner form a22 wrap
col tab_name form a30 wrap
col column_name form a30 wrap
col equal_preds form 9999,999
col eqi_joins form 9999,999
col noneqi_jns form 9999,999
col range_prds form 9999,999
col like_prds form 9999,999
col null_prds form 9999,999
select oo.name owner
, o.name tab_name
, c.name column_name
, u.equality_preds equal_preds
, u.equijoin_preds eqi_joins
, u.nonequijoin_preds noneqi_jns
, u.range_preds range_prds
, u.like_preds like_prds
, u.null_preds null_prds
, u.timestamp ts
from sys.col_usage$ u
, sys.obj$ o
, sys.user$ oo
, sys.col$ c
where o.obj# = u.obj#
and oo.user# = o.owner#
and c.obj# = u.obj#
and c.col# = u.intcol#
and o.name like upper(nvl('&tab_name','%'))||'%'
and oo.name like upper(nvl('&tab_own','%'))||'%'
order by 1,2,3
/
clear colu

Sample output:
OWNER    TAB_NAME                 COLUMN_NAME  EQUAL_PREDS EQI_JOINS NONEQI_JNS RANGE_PRDS LIKE_PRDS NULL_PRDS TS
-------- ------------------------ ------------ ----------- --------- ---------- ---------- --------- --------- --------------------
COMMHOME COMM_TRAD_PCTTRAD_H      TREENODEID             2         0          0          0         0         0 02 JUN 2011 12:44:22
COMMHOME COMM_TRAD_PIPREFUND      CC                     0         3          0          0         0         0 02 JUN 2011 10:59:13
COMMHOME COMM_TRAD_PIPREFUND      HOMEID             4,581        10          0          0         0         0 15 JUN 2011 20:38:44
COMMHOME COMM_TRAD_PIPREFUND      ISLEAF               163         0          0          0         0         0 15 JUN 2011 20:38:44
COMMHOME COMM_TRAD_PIPREFUND      TREENODEID           177       169          0          0         0         0 15 JUN 2011 20:38:44
SYS      TS$                      FLAGS              3,102         0          0          0         0         0 14 JUN 2011 23:33:37
SYS      TS$                      NAME               2,708     1,954          5          0       392         0 15 JUN 2011 12:22:23
SYS      TS$                      ONLINE$            3,553         0          0          0         0         0 15 JUN 2011 06:38:16
SYS      TS$                      TS#                1,132     8,555          0         86         0         0 15 JUN 2011 18:08:32
GATEWAY  TRADINGPERIODPROFILEDATA TRADINGCLOSE           0         0          0          1         0         0 29 MAR 2011 05:23:45
GATEWAY  TRADINGPERIODPROFILES    INSTGROUPID           46       237          0          0         0         0 12 JUN 2011 11:09:17
GATEWAY  TRADINGPERIODPROFILES    PROFILENAME            0        67          0          0         1         0 12 JUN 2011 00:53:36
GATEWAY  TRADINGPERIODPROFILES    SOURCEID              44       234          0          0         0         0 12 JUN 2011 11:09:17
ORA600 Ltd 14/45

Statistics Hierarchy
There is more than one type of “stats” that Oracle can gather, and the different types have different impacts and are best gathered in different ways.
(Diagram, in order of increasing impact:)
• SYSTEM STATISTICS – gather “once”: how fast the hardware is. Multi-Block read vs Single-Block read, plus the speed of your CPU.
• FIXED OBJECT STATISTICS – gather “once”: how big your memory objects are. Areas of memory, number of users, size of caches. The X$ sys-only “objects”.
• DICTIONARY STATISTICS – gather regularly, probably via the auto stats job. Do not enhance or do one-offs. Essentially “normal” stats for sys.obj$-type things.
• OBJECT STATISTICS – gather regularly, via the auto job and enhancements. Tables, Indexes, Columns. What DBAs/Developers mean by “Stats”.
ORA600 Ltd 15/45

System Statistics
• In effect, these stats are just the CPU speed and the relative speeds of Single-Block Reads (SBR) and Multi-Block Reads (MBR).
• The actual speeds of single- and multi-block reads are recorded, in milliseconds, but it is the ratio between them that counts.
• If multi-block reads are found to be the same speed as or faster than single-block reads, Oracle 10 ignores the data collected and does not store it.
• The CBO converts all IO and CPU cost into units of single-block reads. That is what the COST is in explain plan. It is also what you see in AWR.
• Gathering System Statistics may:
  • push Oracle towards or away from high-CPU actions like sorts.
  • alter the likelihood of full table scans and fast full index scans as Oracle better understands the cost of the multi-block actions.
ORA600 Ltd 16/45

System Statistics
• V10 and 11 come with a default set of system statistics. You can GATHER_SYSTEM_STATS with a fake workload or based on activity on your system over a period of time.
• I advise the latter – but ensure your system has a representative workload.
• You only need to gather the System Statistics “once” (but ensure you do so with an “average load”).
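A minimal sketch of the workload-based gather just described. The 120-minute interval is an arbitrary example; pick a window that covers a representative load:

```sql
-- Capture workload system statistics over a 2-hour window:
exec dbms_stats.gather_system_stats(gathering_mode => 'INTERVAL', interval => 120)

-- Review what was recorded (CPUSPEED, SREADTIM, MREADTIM etc.):
select sname, pname, pval1 from sys.aux_stats$;
```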
• Re-gathering is only required if your storage changes (e.g. you add more spindles), if there is a major system change or if the server(s) you use change significantly in CPU utilisation.
• Gathering at day and at night and storing/swapping system stats is often suggested – but seems to be a bit of an urban myth.
• You may wish to gather system stats 4 or 5 times and use DBMS_STATS.SET_SYSTEM_STATS to set the average.
• NB not RAC aware – system stats gathered on one node apply to all.
ORA600 Ltd 17/45

Fixed Object Statistics
• These are statistics on the in-memory “dynamic performance” objects, the X$ and similar tables (what the V$ views sit on).
• Need to gather “once” and only re-gather if something significant changes, such as allocating much more memory to the instance or the number of user sessions greatly increasing.
• Gathering Fixed Object stats will aid internal SQL, checking session details and looking at memory constructs. I have seen a small improvement in parse speed. Certain dictionary queries run faster.
• Re-gather after upgrade etc.
• If they have never been gathered the impact can be significant; I have yet to personally see a major change as a result of re-gathering (I just do so once a year on “just in case” principles).
ORA600 Ltd 18/45

Dictionary Statistics
• Statistics on the internal tables owned by SYS and other internal Oracle users. SYS.OBJ$, SYS.TAB$, SYS.TS$, those tables.
• Are gathered as part of the default statistics gathering job.
• In effect just like gathering schema statistics on the SYS, SYSTEM, OUTLN and other internal users. Supported (and recommended) from V10. On V9, keep with the RBO.
• Can take several hours to gather on a database with tens of thousands of objects or more.
• Can significantly aid parsing and other internal SQL, as well as DBA scripts running on the DBA views and also the underlying tables.
ORA600 Ltd 19/45

Dictionary Statistics
• Gather them regularly. Weekly to monthly.
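The occasional “base” gathers discussed above can be as simple as the calls below; run them from a suitably privileged account and test the impact somewhere non-critical first:

```sql
-- One-off (or post-upgrade) gather of the X$ fixed object stats:
exec dbms_stats.gather_fixed_objects_stats

-- Regular (weekly to monthly) gather of SYS/SYSTEM dictionary stats:
exec dbms_stats.gather_dictionary_stats
```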
• If you use the default automatic statistics gathering job, it is collecting dictionary statistics for you and that is fine.
• If you disable the automatic statistics gathering for your schema stats, either:
  • leave it running for ONLY Dictionary statistics: DBMS_STATS.SET_PARAM('AUTOSTATS_TARGET','ORACLE')
  • or organise regular dictionary stats gathering by your own methods.
• Not gathering Dictionary Stats, especially on a very large/complex database, could lead to very poor dictionary/parse performance.
• To be honest, with 10.1/10.2 at least, even gathering Dictionary Stats for systems with massive numbers of segments can fail to resolve some slow Dictionary performance.
ORA600 Ltd 20/45

That was the pre-amble. Getting the System, Fixed Object and Dictionary stats gathered gives you a solid base to tackle Object Statistics.
ORA600 Ltd 21/45

Why are Stats Key to Performance?
• Used by the Cost Based Optimiser (CBO).
• The CBO examines the SQL statement and works out the various ways in which it could satisfy the query (up to about 2,000 plans under 10g).
• For each step the CBO works out the cost, which is expected IO plus CPU (if turned on), and the cardinality, the number of records that step will return.
• The cardinality is passed back to the next step and can be a multiplier of that step’s cost.
• The costs are added up and Oracle then picks the plan with the lowest overall cost(*) and runs that plan.
(*) This is a slight lie, but the principle is true.
ORA600 Ltd 22/45

Why are Stats Key to Performance?
• The CBO is very logical; it uses just the figures it is presented with and simple calculations to make its decision. No magic involved.
• If those figures are wrong, i.e. the statistics are not representative, then the costs calculated will be incorrect.
• A small error can cascade up the code, cause a large difference to other steps and cause the plan to change.
• With edge cases, a small difference often results in a different plan being chosen.
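To see the cardinality figures the CBO is working from, you can compare the Rows estimate in the plan with the real count. The table and predicate here are purely illustrative:

```sql
-- Hypothetical table and predicate, for illustration only:
explain plan for
  select * from my_big_table where status = 'RARE';

-- The Rows column shows the CBO's cardinality estimate per step:
select * from table(dbms_xplan.display);

-- Compare with reality:
select count(*) from my_big_table where status = 'RARE';
```

A large gap between the estimate and the real count is exactly the “small error” that cascades up the plan.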
• That different plan is often sub-optimal, sometimes seriously so, occasionally hundreds or thousands of times slower.
• Cost/cardinality being too low, often 1, is the most common cause of poor performance, due to prompting nested loop plans and incorrect driving tables.
ORA600 Ltd 23/45

Automated Stats Gathering
• If the automated job is working for you, leave it alone.
• If you turned off the automated job and “wrote your own” under V10 or before and are now on 11 – consider going back to the automated job. It is faster and more accurate.
• If you turned off the automated job and did not do anything about your Dictionary stats, that is bad.
• Under V11 you can control %stale and defaults like method_opt at table level. Consider doing that.
• If you decided to “roll your own” I would advise you leave the automated job running but just for Oracle’s own objects: DBMS_STATS.SET_GLOBAL_PREFS('AUTOSTATS_TARGET','ORACLE')
ORA600 Ltd 24/45

Stats Gathering
• Version 9: swap to DBMS_STATS and write your own.
• Version 10: use the automated job (at least for dictionary stats) and write your own version/exceptions for your tables.
• V11.1 – Test, do not trust me, but I would still say go auto.
• V11.2 – Use the automated job and, if you must intervene, use the default sample size so you get one-pass NDV.
• V11 – look at rolling up stats for partitions and subpartitions – but read up on it extensively. You have to ensure you gather all partitions or sub-partitions.
• With all versions of Oracle you will have exceptions, usually overnight batch or partitioned tables. If you are the DBA, this is part of your job.
ORA600 Ltd 25/45

New Oracle 11 NDV
• One of the most demanding parts of generating object statistics is gathering the Number of Distinct Values (NDV).
• Oracle 11 introduced the single-pass NDV function. It scans the data once, uses much less memory, and is faster and more accurate.
• You have to use ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE.
• You cannot use BLOCK sampling.
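So on 11g, a table-level gather that gets the one-pass NDV looks roughly like this (owner and table name are hypothetical):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname          => 'MYSCHEMA',                    -- hypothetical
    tabname          => 'MY_BIG_TABLE',                -- hypothetical
    estimate_percent => dbms_stats.auto_sample_size,   -- required for one-pass NDV
    cascade          => true);
end;
/
```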
References (Amit Poddar, via Jonathan Lewis, and others):
http://jonathanlewis.files.wordpress.com/2011/12/one-pass-distinct-samplingpresentation.pdf
http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/
https://blogs.oracle.com/optimizer/entry/improvement_of_auto_sampling_statistics_gathering_feature_in_oracle_database_11g
ORA600 Ltd 26/45

The Automated Job may Choke
ORA600 Ltd 27/45

select substr(operation,1,30) operation
      ,to_char(start_time,'DD-MON-YYYY HH24:MI:SS.FF') start_tme
      ,to_char(end_time,'DD-MON-YYYY HH24:MI:SS.FF') end_tme
from sys.WRI$_OPTSTAT_OPR
order by start_time desc

OPERATION                      START_TME                    END_TME
------------------------------ ---------------------------- ---------------------
gather_database_stats(auto)    02-JUN-2011 22:00:02.979206  02-JUN-2011 23:04:37
gather_database_stats(auto)    31-MAY-2011 06:00:02.811976  31-MAY-2011 06:56:21
gather_database_stats(auto)    30-MAY-2011 22:00:01.976379  30-MAY-2011 22:19:31
gather_database_stats(auto)    29-MAY-2011 22:00:01.416256  29-MAY-2011 23:36:14
gather_database_stats(auto)    28-MAY-2011 22:00:02.243542  29-MAY-2011 00:12:18
gather_database_stats(auto)    27-MAY-2011 22:00:03.588237  27-MAY-2011 23:14:24
gather_database_stats(auto)    26-MAY-2011 22:00:01.602425  26-MAY-2011 23:14:05
gather_dictionary_stats        24-MAY-2011 11:42:31.771667  24-MAY-2011 11:42:35
gather_dictionary_stats        24-MAY-2011 11:42:11.396340  24-MAY-2011 11:42:15
gather_database_stats(auto)    24-MAY-2011 06:00:02.905945  24-MAY-2011 06:20:38
gather_database_stats(auto)    23-MAY-2011 22:00:01.732964  23-MAY-2011 22:58:35
gather_database_stats(auto)    22-MAY-2011 22:00:01.421518  23-MAY-2011 06:00:05
gather_database_stats(auto)    21-MAY-2011 22:00:01.942455  22-MAY-2011 06:00:01
gather_database_stats(auto)    20-MAY-2011 22:00:03.066981  21-MAY-2011 06:00:01
gather_database_stats(auto)    19-MAY-2011 22:00:02.571718  19-MAY-2011 23:04:27
gather_database_stats(auto)    17-MAY-2011 06:00:01.462810  17-MAY-2011 08:02:34
gather_database_stats(auto)    16-MAY-2011 22:00:01.096761  16-MAY-2011 23:14:56
ORA600 Ltd
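One way to see what a stuck job is trying to chew on is to ask DBMS_STATS which objects it considers stale, without gathering anything, then look for the very large ones. A sketch using GATHER_SCHEMA_STATS in list mode (the schema name is hypothetical):

```sql
set serveroutput on
declare
  l_objs dbms_stats.objecttab;
begin
  -- 'LIST STALE' reports, without gathering, the objects whose
  -- stats the auto job would consider out of date:
  dbms_stats.gather_schema_stats(
    ownname => 'MYSCHEMA',          -- hypothetical schema
    options => 'LIST STALE',
    objlist => l_objs);
  for i in 1 .. l_objs.count loop
    dbms_output.put_line(l_objs(i).objtype||' '||l_objs(i).objname);
  end loop;
end;
/
```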
28/45

Fixing Choked Stats
• Once the automated job chokes, it will continue to choke. Every night. This is because it tries the same thing each night.
• The longer weekend run should sort things out – until it chokes too.
• Identify the table (look for the gather statement interactively during the window, pull off the list of tables to gather with options=>'LIST STALE', check for large objects with a 10% difference...).
• Do a manual gather with something like: block_sample=>true, estimate_percent=>0.1, degree=>8, method_opt=>'for all columns size 1', cascade=>false, no_invalidate=>false, granularity=>'GLOBAL'.
• Once that has run you can afford to do a larger sample size and do the indexes. Do the PK first.
• You probably need to lock the stats on the table and treat it as an exception.
ORA600 Ltd 29/45

The Biggest “Wrong Stats” Issues
The below are the worst “stats” causes of performance issues, in order, in my opinion based on experience:
1. The stats say a segment is empty and it is not.
2. Your WHERE predicates are out of range for the known column values.
3. The stats say a segment holds 10 times less data than it does. The more orders of magnitude out, the worse.
4. Histograms (there and not needed, or needed and not there. Ouch).
5. The correlation between columns is not understood by the optimiser, e.g. that values in tab1.x “line up” with those in tab2.y.
6. Edge cases that experts get excited about but 99% of us never see.
ORA600 Ltd 30/45

Stats Issues are a VLDB thing?
• I can only go on what I have seen and, to a less trustworthy level (ironic, given the sources), what I have heard...
• OLTP systems with a need for the fastest absolute response time to small data queries are NOT troubled by object stats.
• Edge cases that balance on correlation, or swapping to nested loop from hash, or using a Cartesian join, are specific to OLTP and a set of requirements where stats gathering is, well, redundant.
• Where pain occurs is when a plan that hashes several segments together {often including partition exclusion} swaps to either a nested loop or a Cartesian merge join that is not suitable.
• When it comes down to it, the plan for large volumes of data is right for all volumes of data. If it does it in 30 seconds inefficiently, doing it in 33 seconds efficiently is spot-on good enough.
ORA600 Ltd 31/45

Single Values Expected Outside of Range
[Charts: expected number of rows returned for a single-value predicate as the value moves beyond the known column range (values 0 to 400, 10,000 rows), for Oracle 10 and Oracle 11, each with and without histograms.]
ORA600 Ltd 32/45

Range Values Expected Outside of Range
[Charts: as above, but for range predicates (values 0 to 300).]
ORA600 Ltd 33/45

Histograms and Data Ranges
• Oracle 10 deals with column values being “out of range” by decreasing the cardinality over that range.
• E.g. if the low_value is 200 and the high_value is 300 and there are 1,000 rows, that is 10 rows for the value 225.
• For 350, it is 50% outside the range, so 10 rows is reduced by 50% to 5 rows. At value 400 it drops to 1 row. This is fine if you gather at 10%.
• Histograms massively alter this “out of range” half-life. I have seen massive issues with dates. A table covering 5 years of data, with histograms on the date, can reduce a value only a week out of range to less than 1% of the average value.
• This is a big issue on large tables that have stats gathered only occasionally (as they do not change by a big percentage).
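A quick way to spot this out-of-range date problem is to decode the column's stored high value and compare it with the current date. Low/high values are held in an internal RAW format; DBMS_STATS.CONVERT_RAW_VALUE decodes them. Table and column names here are hypothetical:

```sql
set serveroutput on
declare
  l_high date;
begin
  for r in (select column_name, high_value
            from   dba_tab_columns
            where  table_name  = 'MY_PART_TABLE'    -- hypothetical
            and    column_name = 'TRADE_DATE')      -- hypothetical
  loop
    dbms_stats.convert_raw_value(r.high_value, l_high);
    dbms_output.put_line(r.column_name||' known high value: '||
                         to_char(l_high,'YYYY-MM-DD HH24:MI'));
  end loop;
end;
/
```

If the decoded high value is well behind sysdate, queries for recent data are in the out-of-range decay zone described above.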
ORA600 Ltd 34/45

Histograms and Partitions
• This out-of-range issue is thrown into sharp relief with partitions.
• If the partitions are daily or weekly and have histograms on them, a SQL statement selecting data for the latest hour can become “out of range” sooner than you can believe.
• Spot this by the cardinality being 1 rather than several hundred or thousand.
• It can happen even without histograms, especially with daily partitions.
• Solutions? Use dynamic sampling on the latest daily or weekly partitions (your in-house code either does not collect or deletes such stats), insert half-day stats, or gather very aggressively.
ORA600 Ltd 35/45

Replacing the Automatic Stats Job
The CBO is complex. Stats are in effect a constantly evolving part of your code base. A simple approach will only work on a simple system. (And I really am sorry!)
ORA600 Ltd 36/45

Replacing the AUTO stats job
• Don’t.
• Tweak the current job – make it run at different times, alter the %stale and defaults at table level.
• Lock the stats on the tables that give you issues and treat these as your exceptions.
• If you MUST replace the auto job it will hurt:
  • The initial estimate for a “simple” replacement will be a week or two.
  • If you plan and estimate it, that will come out as four weeks.
  • It will take you 2 months.
• You will emulate most of what the AUTO job does.
• Your solution will probably need to look at segment size and DBA_TAB_MODIFICATIONS, and have several control tables that allow you to specify METHOD_OPT, SAMPLE SIZE and GRANULARITY at table level...
ORA600 Ltd 37/45

Auto Stats Replacement (diagram)
• An OBJ_CTL control table: OWNER, TAB_NAME, PART_NAME, DFLT_STALE, DFLT_SAMPLE, DFLT_METHOD_OPT, IDX_PCT.
• Your stats job draws on STATS_CTL, DBA_TAB_MODS, DBA_SEGMENTS, LIST_STALE/EMPTY and copy rules; it saves old stats (SAVE_STATS) and logs to STATS_LOG.
• Gathering approach: BLOCK SAMPLE, table then index, order by size ascending, ALL COLUMNS SIZE 1, PARALLEL.
ORA600 Ltd 38/45

Block or Row Sample
• When you state an ESTIMATE_PERCENT for a table gather statement, you can also state whether it is a block or row sample.
It defaults to Row.
• Row sample selects the percentage of rows scattered across all blocks. If you have e.g. a 16k block size and over 100 rows per block, a 1% sample size will visit every block.
• Block sample selects whole blocks, which greatly reduces the physical IO.
• Block sampling gives low column cardinalities and is susceptible to getting high-low values that are not close to the true edges.
• The under-sampling of columns varies depending on spread, but my tests show that a 5% block sample size is about as good as a 0.8% row sample size, but still 10 times faster. It is fine for 99% of cases.
• Oracle V11 up – the new NDV and the speed of stats gathering pushes me back towards AUTO_SAMPLE_SIZE.
Breaking news! There appears to be some issue with block sampling on version 10.2. You can use the SAMPLE command in normal SQL select statements, and that is what Oracle does to gather stats. However, BLOCK sampling seems to vary the actual number of blocks it checks for a given sample size. I will investigate when I have time.
ORA600 Ltd 39/45

Write Something to Gather Stats on all Segments
• You need to gather stats on each segment that needs stats. I strongly suggest any in-house code works segment-by-segment such that all table, partition and index segments are processed “as one”.
• You could use GATHER_SCHEMA_STATS and LIST the objects into an array and process them. This gets the objects Oracle would deem in need of stats:
  o STALE lists those objects that have stats but have changed by 10%
  o EMPTY lists those objects with no current stats
  o AUTO is supposed to list both but is buggy in 10.2.0.3
• Alternatively, run through all segments in the schemas you are interested in and use DBA_TAB_MODIFICATIONS directly. Slower, but much more control.
• I used the above on 10.1, but on a current 10.2 system the data dictionary is too slow. We decided to revert to GATHER_SCHEMA_STATS.
ORA600 Ltd 40/45

Do not write code to roll back stats
• Oracle keeps all stats changed for 31 days, by default.
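The built-in history makes point-in-time restore a one-liner. A sketch, with a hypothetical schema and table:

```sql
-- Put a table's stats back to how they were two days ago,
-- using Oracle's built-in 31-day stats history:
begin
  dbms_stats.restore_table_stats(
    ownname         => 'MYSCHEMA',                        -- hypothetical
    tabname         => 'MY_TABLE',                        -- hypothetical
    as_of_timestamp => systimestamp - interval '2' day);
end;
/
```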
• You can alter this with DBMS_STATS.ALTER_STATS_HISTORY_RETENTION, and check it with DBMS_STATS.GET_STATS_HISTORY_RETENTION (usually 31).
• If the stats history is causing you issues, use PURGE_STATS to get rid of it but, be warned, once gone it is gone. But then, how often do you RESTORE stats?
• Unless you REALLY want to be able to identify sets of stats, just use Oracle’s in-built feature where it stores previous stats for 31 days and you can recover them.
• This hidden feature can create a LOT of data, especially if you have lots of partitions with lots of column stats. It goes into the SYSTEM tablespace.
http://mwidlake.wordpress.com/2009/08/03/why-is-my-system-tablespace-so-big/
ORA600 Ltd 41/45

Rolling Back Statistics
• Most people are aware of the potential to have a user statistics table. This is a table you create that you can put stats into using EXPORT_XXXX_STATS and retrieve them from with IMPORT_XXXX_STATS.
• You use CREATE_STAT_TABLE to create the table and give a STATID to sets of stats you wish to export and import.
• One little gotcha. The documentation is not clear: if you state a user statistics table in GATHER_XXXX_STATS then the new stats are NOT placed into the stats table; they are put in the dictionary and the OLD values are put in the stats table.
• Rolling back stats works for system, fixed object, database and object stats. I’ve tested it, it works – at least under 10.1 and 10.2.
• Oracle 11 allows you to create Pending Stats, which is very nice. It beats the manual version I developed for V10 in... ohh... 2005?
ORA600 Ltd 42/45

Rolling Back
• Gathering System, Fixed Object and Dictionary statistics are “system wide” changes. Do you have a suitable system to test this on?
• All have equivalent DBMS_STATS.RESTORE_XXXX_STATS commands that, when I tested on 10.2.0.3, worked correctly (including blanking stats that were previously null).
• You could save the stats being replaced in your own statistics table and recover from there.
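Saving and recovering stats via your own statistics table looks roughly like this; the stats table, object names and STATID are hypothetical:

```sql
-- One-off: create your own statistics table:
exec dbms_stats.create_stat_table(ownname => 'MYSCHEMA', stattab => 'MY_STATS_TAB')

-- Keep a named copy of the current stats before a risky change:
begin
  dbms_stats.export_table_stats(
    ownname => 'MYSCHEMA', tabname => 'MY_TABLE',
    stattab => 'MY_STATS_TAB', statid => 'PRE_RELEASE');
end;
/

-- ...and put them back if the new stats cause problems:
begin
  dbms_stats.import_table_stats(
    ownname => 'MYSCHEMA', tabname => 'MY_TABLE',
    stattab => 'MY_STATS_TAB', statid => 'PRE_RELEASE');
end;
/
```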
The only advantage is that it is easier to interrogate stats you saved into your own stats table.
• Deleting System Stats reinstates the default values seen after install.
• Deleting Fixed Object stats deletes the stats.
• Deleting Dictionary stats is not a very good idea, as it works.
• I said you could restore Dictionary stats but, even though I tested it, I am not going to promise that all stats will be the same after you restore...
ORA600 Ltd 43/45
Tables Cascade or Not?
• If you gather statistics on a Table or Table Partition, the default is to CASCADE to the relevant index segments.
• The default will gather between 1000 and 2000 index blocks. I find this to be overkill.
• By NOT cascading to index partitions but doing the indexes specifically, setting a sample size derived from the index size, you can use smaller sample sizes.
• However, this is more code to write and test. It may be more pragmatic to just cascade, though each index segment may end up taking as long to gather as the table segment.
• Consider global indexes and partitioned global indexes.
ORA600 Ltd 44/45
DBMS_STATS Package
• Many of us still refer to "analyze the table" but of course we all now mean "gather statistics with DBMS_STATS". Don't we.
• Don't use ANALYZE any more, especially not on production systems.
• DBMS_STATS gathers better statistics, including histograms, and cascades down to partitions and sub-partitions as needed.
• Can be run for specific segments (table, index, partitions thereof etc), for a schema or for the database, and can be set to only gather "stale" objects.
• Can use Parallel to make up for the slower running; can set sample size by row and block (see more later).
• In these slides, any procedure or function is in DBMS_STATS unless otherwise specified.
ORA600 Ltd 45/45
Stability and Performance Dichotomy
• To get the best performance, you want the stats to be as up-to-date and accurate as possible.
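The "don't cascade, do the indexes specifically" idea from the Cascade or Not slide above can be sketched as follows. The table, index and sample sizes are illustrative assumptions, not recommendations:

```sql
-- Gather the table (and its partitions) without touching its indexes...
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ORDERS', -
     estimate_percent => 1, cascade => FALSE);

-- ...then each index with its own, usually smaller, sample size,
-- derived from the size of that index segment.
EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname => USER, indname => 'ORDERS_PK', -
     estimate_percent => 0.5);
```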
• When the stats change, it is like having a miniature code release. The processing of the application can change – after all, that is the idea.
Please note, the actual result of the data processing does not change – the functionality is preserved – but the time taken can change, the idea being that it changes for the better.
• How many people here would allow a code release on their business-critical systems at any time?
• Most people are not actually concerned with good performance, they are concerned with poor performance.
• Code can suddenly start running slowly "over night" when stats are gathered. Just one very bad statement can actually make the system unusable.
Ora 600 Ltd 46/45
Why are Bad Stats worse than No Stats?
• As an example, stats showing a table or partition is empty will cause the CBO to think it costs 1 to scan and that only one record will be found. Nested loop access and Cartesian Merge plans will occur.
• More later, but stats saying there is no data later than a month ago will cause the CBO to again assume there is as little as one record to find.
• If there are no stats, Oracle will use dynamic sampling or use defaults, which are usually better than bad stats.
• Spotting bad stats is not as easy as spotting missing stats, especially as Oracle stores high/low values and histograms in internal formats.
• Old stats (a specific type of bad stats) can cause a plan to flip to another plan without notice, with NO CHANGE.
ORA600 Ltd 47/45
Intervention methods for stats
• Stated low-percentage stats gathering for large objects – use BLOCK SAMPLING and do tables and indexes specifically.
• Set statistics manually or use COPY_TABLE_STATS. The tricky part is the column stats and any histograms you have to have.
• Delete and lock stats and allow Dynamic Sampling to occur.
• Gather stats at the lowest level (sub-partition or partition) and allow them to be calculated at higher levels via Incremental Stats.
Made possible via synopses (but needs care to implement).
• The overall aim is always the same. You do not need accurate stats, you need stats that are good enough to give execution plans that will work for large data volumes.
ORA600 Ltd 48/45
When to Gather
• Most sites have stats gathering over night, even if they replace the auto stats gathering job. This may well not be the best time.
• You may be better off collecting stats in the late evening or early morning. You can use PARALLEL to get the job done more quickly.
• You are very unlikely to benefit from gathering all stats in one window.
• If you are processing data into a table, gather the stats at the right time(s). This is probably after the load, or it could be several times in the process.
• So long as only specific gathering is done, there is great benefit to be had gathering stats as your batch process proceeds.
ORA600 Ltd 49/45
When to Gather - Batch Processing
• I'm soooooo tired of this conversation and I have had it so many times... Batch processing means you need to think about stats.
• You have a table that is empty. You stuff it full of data. You use it to load data into your live database. Then you may or may not truncate it. Ask yourself: when is the volume of data significant, and when do you gather stats?
• If it is a global temporary table you may need to set some stats.
• If I was given a pound for each time I have seen batch processing with no concept of stats gathering, I would have £17.50.
• So long as only specific gathering is done, there is great benefit to be had gathering stats as your batch process proceeds.
ORA600 Ltd 50/45
Incremental Stats
• I'm sorry, but I ripped this section out. There is nothing between "you can do it" and "here is how it works" that is not 1 min or 30 mins.
• Do NOT attempt before V11, even though it is back-ported to 10.2.0.4 and .5. I've tried, it was hard. It was not nice.
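On 11g, switching a partitioned table to incremental stats is essentially a one-line table preference. A hedged sketch; the schema, table and partition names are assumptions:

```sql
-- 11g: tell Oracle to maintain global stats from partition-level
-- synopses rather than re-scanning the whole table. Names are made up.
EXEC DBMS_STATS.SET_TABLE_PREFS(ownname => 'MY_APP', tabname => 'SALES', -
     pname => 'INCREMENTAL', pvalue => 'TRUE');

-- Then gather at the partition level; higher-level stats are derived.
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_APP', tabname => 'SALES', -
     partname => 'SALES_2011_Q1', granularity => 'PARTITION');
```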
• You gather stats at the lowest level – sub-partition or partition, at a reasonable sample size. These are summed up. Synopses are used to allow the NDV calculations. Better than the crap I came up with...
• YOU HAVE TO BE RIGOROUS in ensuring all stats are gathered at the lowest levels and that issues do not occur, but it works well if controlled.
• No one seems to mention this, but it all falls apart when you add new columns to a table. It won't handle this as far as I have seen, and those tables with stats that no longer update? Urrrrgghhhh.
ORA600 Ltd 51/45
One Extreme - Dynamic Sampling
Delete the stats from a table and lock them.
• If there are no stats for a table, they are gathered from the table as it is now, thus avoiding issues with the table having recently changed.
• You set the level of DYNAMIC SAMPLING at the instance level. In 10.2 it defaults to 2: gather stats on any segment lacking them.
• At level 3, guesses for filter predicates are checked, and at level 4 correlation between columns is checked. These can greatly improve performance.
• Only the data sampled can be considered, and this is going to be either much less than that considered by a GATHER statement or else take a long time to sample.
• Levels above 4 increase the number of blocks assessed by the dynamic sample, up to 10, which is "every block". Don't do that.
ORA600 Ltd 52/45
Dynamic Sampling
• Dynamic sampling can be set at the database level (optimizer_dynamic_sampling) or in hints.
• I would suggest setting the database level to 3 or 4 on a Data Warehouse, but I've not had much success getting Live sites to do this.
• Dynamic sampling does extend parse times (potentially to several seconds), so it is not suitable for sites with high SQL statement turnover (ie no use of binds). Which is a shame, as binds cause other issues, with histograms and correlation!
• Some people argue that, on DWs, you should delete all user table (and index) stats, lock the tables and use dynamic sampling alone.
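The "delete and lock, let dynamic sampling do the work" extreme described above, as a sketch. The schema, table and column names are assumptions:

```sql
-- Remove the stats and lock them empty, so the auto job leaves the table
-- alone and the CBO dynamically samples it at parse time instead.
EXEC DBMS_STATS.DELETE_TABLE_STATS(ownname => 'MY_DW', tabname => 'BIG_FACT');
EXEC DBMS_STATS.LOCK_TABLE_STATS(ownname => 'MY_DW', tabname => 'BIG_FACT');

-- Or raise the sampling level for a single statement via a hint:
SELECT /*+ dynamic_sampling(f 4) */ COUNT(*)
FROM   my_dw.big_fact f
WHERE  sale_date > DATE '2011-01-01';
```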
• I repeat: not suitable for OLTP-like SQL activity.
ORA600 Ltd 53/45
Other Extreme - Freezing Stats
• DBMS_STATS.LOCK_TABLE_STATS and LOCK_SCHEMA_STATS allow you to lock the stats. You can't lock at the partition or index level.
• The automated stats collection will leave those segments (and their dependents) alone unless you use the force=>TRUE parameter.
• Individual gather/set/delete/import stats statements on a locked table will cause an error.
• Locking populated stats will help preserve an execution plan, which gives stability but prevents improvements. However, you can't stop time.
• If you delete the stats first and then lock them, they stay empty, allowing dynamic sampling to prevail.
• If you lock the stats, people or processes gathering stats THAT SHOULD NOT BE will fail. They can be re-enabled via the force parameter.
Ora 600 Ltd 54/45
Locking Stats
• If you lock the stats on a table then Oracle will not collect them any more.
• If you lock a table with empty stats, the CBO will dynamically sample the table when the SQL statement is parsed. This is fine for long-running SQL, as on Data Warehouses, and utterly unacceptable for OLTP systems where you need the answer in 100ms.
• If you have gathered stats on an empty table and lock the stats, you are in a world of pain, as the CBO will think there are no rows and base its plan on that.
• If you gather stats when you know they will be good, and then lock them, then your execution plans will probably be good.
• If you have a complex situation but know what stats are needed, you can SET those stats and lock the table. No more stats will be gathered. Everything will be fine until the world moves on.
ORA600 Ltd 55/45
Histograms
• Histograms and bind variables do not mix well on Oracle 10 due to bind peeking. Oracle 11 is supposed to fix this.
• In essence, if the first bind values seen by the parse are not typical (or match a low-cardinality plan), the plan chosen can perform very poorly for larger returned data sets.
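If you write your own gathering code, histogram collection is controlled via the METHOD_OPT parameter. A hedged sketch; the table and column names are assumptions:

```sql
-- SIZE 1 = one bucket = no histograms. The 10g default
-- ('FOR ALL COLUMNS SIZE AUTO') is what over-gathers them.
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ORDERS', -
     method_opt => 'FOR ALL COLUMNS SIZE 1');

-- Or keep a histogram only on a column you know benefits from one:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ORDERS', -
     method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS SIZE 254');
```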
• These bad plans can get stuck in the SGA. Eg, a statement that takes 11 minutes to run and is kicked off every 10 minutes – you can't get the code out of the SGA.
• Histograms significantly increase the time taken to gather stats. Oracle 10.1 and 10.2 massively over-gather histograms.
• If you write your own code you can stop histograms being gathered.
• But histograms are really good for some code.
• If only there was a "parse" hint.
ORA600 Ltd 56/45
Further Information
• PL/SQL Packages and Types Reference, DBMS_STATS.
• Blogs - Jonathan Lewis (generally), Doug Burns (recent series on partitions and stats), Christian Antognini; follow links to others.
• My blog – I intend to do some more posts on stats gathering.
• Email me and ask – I'm happy to answer general questions if I can.
• Your system.
ORA600 Ltd 57/45
Table Sample Size
• Generally speaking, the larger the table, the smaller the sample size you need for "good enough" statistics.
• I usually decide on a percentage sample size derived from the number of blocks in the table, aiming for 1000 to 10,000 blocks.
• SIZE_AUTO samples 0.01% then 0.1% (or similar) and sees if the stats change significantly, then increases the percentage size until the stats are stable or a compute is quicker.
• SIZE_AUTO sounds clever but in practice usually works out to be inefficient. It just does not cope with lots of large segments. If you think about it, it almost guarantees gathering at a sample size larger than you need before stopping.
• You may want to gather the table only, with no CASCADE to indexes, as Oracle over-samples indexes with the cascade option (I hardly ever use CASCADE in replacements for the auto job).
ORA600 Ltd 58/45
I over-ran, didn't I? And I kept it to 45 slides. Oh well, next SIG...
ORA600 Ltd 59/45