Managing Cost-Based Optimiser Statistics in PeopleSoft David Kurtz Go-Faster Consultancy Ltd. david.kurtz@go-faster.co.uk www.go-faster.co.uk Who Am I? • Oracle Database Specialist – Independent consultant • System Performance tuning – PeopleSoft ERP – Oracle RDBMS • Book – www.psftdba.com • UKOUG – PeopleSoft Technology SIG Committee Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 2 • If you are an Oracle DBA – Next Week – In Birmingham – 3-days Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 3 Agenda • Cost Based Optimiser Statistics – Permanent Tables • Database/Schema-Wide Maintenance – Interim Tables • Working Storage/Temporary/Reporting • Populated by Batch • Statistics Managed by Batch • I am going to talk about Oracle databases – The principles apply to all databases – The specifics are different Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 4 Cost Based Optimiser • A mathematical model – Given a SQL statement – Uses volumetric statistics – Determines ‘optimal’ execution plan • Like predicting the weather – Doesn’t always produce the best answer • Deficiencies in the model • Inaccuracies in data (statistics) Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 5 What do you mean by ‘optimal’? • An execution plan has a cost – That cost is an estimate of how long the query will take. – In practice that may not be the best execution plan. – The Optimiser might not know enough about the data. • Skew • Correlation Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 6 What do you mean ‘correlation’? • Imagine table of people – Attributes include: Star sign, Month of Birth – How many people born in August? • 1 in 12 – How many people are Leos? • 1 in 12 – How many Leos born in August? • 1 in 144? No. • 1 in 16, We know that data is correlated Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 7 What do you mean ‘correlation’? • Example from HR. PS_JOB (emplid VARCHAR2(11) … ,effdt DATE …) Managing Cost-Based Optimiser Statistics in PeopleSoft • From a real system 931012 rows 142224 EMPLIDs 7456 EFFDTs • There is one day with 77985 rows! ©2009 www.go-faster.co.uk 8 General Principles • Tell the database the truth about the objects it is working on. – At least enough of the truth for the Optimiser to make the right decision • System Statistics reflect performance of disks • Object Statistics reasonably up to date • Histograms to describe skew of individual columns. – Oracle 11g will build histograms on combinations of columns Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 9 Corner Cases • There will be times when that is not enough – Hints (which are directives) • And in extreme cases – Manually adjust statistics – ‘Tuning by Cardinality Feedback’ (Wolfgang Breitling) – http://www.centrexcc.com/papers.html Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 10 System Statistics • Introduced in Oracle 9i • Not automatically collected – In which case arbitrary defaults used • Tells Oracle about – CPU Speed • Cost of code path – Single –v- Multi-block Disk Speed • Relative cost of index –v- scan Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 11 By default, how does Oracle maintain statistics? • Maintenance Window – New in Oracle 10g • Previously you had to define your own job – Default 10pm – 6am weekdays + weekends – Refresh ‘stale’ object statistics • Stale means >10% change • Table monitoring enabled by default – user_tab_statistics – dbms_stats.flush_database_monitoring_info – Automatic Histogram Collection Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 12 Histograms? • Statistics include minimum and maximum value for every column – Optimiser assumes values evenly distributed in that range. • Describe non-uniformities in data distributions – Frequency –v- Height Balanced – Up to 254 ‘buckets’ • Parse Overhead – More input data, more calculations Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 13 Maintaining Statistics • What usually happens! – Leave to to default behaviour • We keep the statistics up to date to maintain system stability – I sometimes see Legacy 9i scripts, that collect stats on everything at high sample size. Managing Cost-Based Optimiser Statistics in PeopleSoft • My Advice – 90% of time, default is fine • Rather like traction control on a car – Take control, at least for tables where you have had to resolve a problem ©2009 www.go-faster.co.uk 14 Taking Control • Techniques: – – – – Lock Statistics Delete Statistics Optimiser Dynamic Sampling Modifying PeopleSoft Behaviour Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 15 Locking Statistics • New in Oracle 10g • Table (and indexes) omitted from database/schema-wide statistics operations • dbms_stats.gather_table_stats & ANALYZE commands generate an error when stats are locked • dbms_stats has force=>TRUE Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 16 Optimiser Dynamic Sampling • Extra statistics collected on-the-fly at parse time. – Usually for when there are no object statistics – Useful for when object statistics may lead to errors in estimation of selectivity. • Recommendation: Set to Level 4 • default is 2 – Enabled for tables with 2 column predicates – This helps where columns correlated Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 17 Working Storage/Reporting Tables • In an ideal world – Clear the table %TruncateTable() – Populate table – Gather statistics %UpdateStats() – Use tables Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 18 %UpdateStats • Application Engine macro – Invokes DDL Model – Dynamically defined in PSDDLMODEL • So you can change this – Used to be ANALYZE TABLE – From PT8.48 delivered with dbms_stats.gather_table_stats • http://blog.psftdba.com/2007/05/changes-to-ddlmodels-in-peopletools.html Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 19 DDL Model Problems (1) %UpdateStats(record name ,[HIGH/LOW]) • Low – DDL Model specifies 1% sample • High – dbms_stats.auto_sample_size • Dynamically calculated by Oracle • often less than 1% Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 20 DDL Model Problems (2) %UpdateStats(record name ,[HIGH/LOW]) • Low – FOR ALL COLUMNS SIZE 1 • Removes all histograms • High – FOR ALL INDEXED COLUMNS SIZE 1 • Removes histograms from indexed columns • Leaves histograms and min/max values on unindexed column values unchanged. Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 21 %UpdateStats • A good idea, but a blunt instrument. • Replace it with custom PL/SQL package – wrapper.ps_stats() • • • • If statistics locked suppress gather Doesn’t analyze Global Temporary Tables Refreshes stale statistics on partitioned tables Meta-data driven to control behaviour per record to override defaults – http://blog.psftdba.com/2009/06/controlling-howupdatestats-collects.html Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 22 COBOL %UpdateStats • Expansion hard coded in COBOL – Still produces ANALYZE in PT8.49 – GP Cobol uses stored statements STORE GPPSERVC_U_STATH %UPDATESTATS(PS_GP_PYE_HIST_WRK); – It is possible to recode statements to call wrapper STORE GPPSERVC_U_STATH BEGIN wrapper.ps_stats(p_ownname=>user,p_tabname=> 'PS_GP_PYE_HIST_WRK'); END;; Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 23 PeopleSoft Temporary Records • Permanent Database Tables used for – Temporary Working Storage • Application Engine automatically clears the data out of the table. – Truncate or Delete • Neither invalidate statistics. – Statistics will reflect the previous data • No point including these tables in maintenance window – Statistics gathered will always be out of date. Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 24 Statistics on Temporary Records • Delete and lock Statistics – To omit them from schema/database-wide maintenance activities • Optimiser Dynamic Sampling – So you don’t need to explicitly collect statistics • But you will sometimes! • %UpdateStats – Or Process needs to refresh statistics – Use wrapper.ps_stats() otherwise ORA-20005 Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 25 Deleting Statistics in Application Engine Programs • Delete statistics on temporary tables on allocation to Application Engine instance – Trigger on PS_AETEMPTBLMGR – Ensures that you don’t use statistics collected by previous Application Engine • With different process instance number – http://blog.psftdba.com/2009/04/statisticsmanagement-for-peoplesoft.html Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 26 Oracle 10g Statistics Retention • By default Oracle 10g retains a history of object statistics for 31 days. – Global setting, not per object or schema – dbms_stats writes previous values to SYS.WRI$_OPTSTAT%HISTORY tables – Lots of concurrent batches with lots of %UpdateStats? • you could have a locking problem 31 days after go live. – http://blog.psftdba.com/2009/06/oracle-10g-statisticshistory-retention.html – You might choose to disable statistics retention. Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 27 Statistics on Partitioned Tables • Partitioning is a strategy that is usually adopted for very large tables. • Gathering Statistics can be time consuming – Have to update partition-level statistics on every partition – Then calculate table-level statistics. Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 28 Time-based Partitioning • Annual/Monthly/Weekly/Periodic partitions – Historical Partitions haven’t changed – Future partitions created in advance are still empty – GATHER STALE only available for database/schema-wide operations • http://blog.psftdba.com/2009/04/statisticsmanagement-for-partitioned.html • Wrapper package can refresh stale partitions only. Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 29 Conclusion • Whatever you decide to do about statistics, make sure it’s your decision. – Defaults may not be the right decision all of the time. • My suggestions in this presentation are all in production on a large system. • The wrapper package is my solution to series of problems Managing Cost-Based Optimiser Statistics in PeopleSoft ©2009 www.go-faster.co.uk 30 Questions? Managing Cost-Based Optimiser Statistics in PeopleSoft David Kurtz Go-Faster Consultancy Ltd. david.kurtz@go-faster.co.uk www.go-faster.co.uk