
STATISTICS ON PARTITIONED OBJECTS

ABSTRACT

Optimal SQL execution plans are rooted in the quality of the statistics describing the underlying objects that are used by Oracle's Cost Based Optimizer. Statistics maintenance is a particular challenge on Data Warehouse systems holding large data volumes that support complex and flexible user reporting requirements. Such systems are likely to use partitioned tables and indexes, which complicate the picture further.

This article is based on the problems encountered and solutions used during the implementation of several large DW systems running on Oracle 10.2 with different workload profiles and requirements and will discuss:

CORE CHALLENGES AND STATISTICS FUNDAMENTALS

The quality/performance trade off

Statistic levels on partitioned objects and how they are used

Key DBMS_STATS parameters and defaults

GLOBAL STATISTICS

Global and Approximate Global statistics

 Statistics aggregation scenarios and pitfalls

ALTERNATIVE STRATEGIES

Setting statistics

Copying statistics

Dynamic Sampling

11G IMPROVEMENTS

Incremental Statistics

INTRODUCTION

The Oracle Cost-Based Optimiser (CBO) evaluates many possible execution plans using a combination of rules and statistics to determine the most efficient execution plan for each SQL statement. The rules are embedded in the RDBMS code and largely beyond your control, but the statistics are specific to your system and data and there are various options available to you when collecting or manipulating those statistics. As the optimiser relies heavily on database object statistics to calculate the estimated row source cardinalities that are one of the primary influences on optimal execution plans, it follows that paying close attention to your object statistics will pay long-term dividends in system performance.

Even when dealing with the most basic example of collecting statistics on non-partitioned tables and indexes, the options are sufficiently complex to make statistics management challenging. To give a few examples (none of which this paper will discuss):


How often should we update statistics?

What percentage of the data should Oracle sample?

Should we gather Histograms on columns?

Over recent versions, Oracle has worked towards providing automatic statistics collection routines and sensible default parameter values to try to make these decisions for you automatically. However, even in cases where the defaults work well for your system, it’s useful to know how they work because you are likely to come across other systems where the defaults aren’t the correct solution. The fact that the options change regularly from release to release indicates that statistics management is a problem that is far from solved.
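As an aside, if you want to see what the current defaults actually are before relying on them, DBMS_STATS exposes them through GET_PARAM in 10g (superseded by GET_PREFS in 11g). This is a minimal sketch rather than one of the paper's scripts:

-- Display a few of the default DBMS_STATS parameter values (10g interface)
select dbms_stats.get_param('ESTIMATE_PERCENT') as estimate_percent,
       dbms_stats.get_param('GRANULARITY') as granularity,
       dbms_stats.get_param('METHOD_OPT') as method_opt
from dual;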

Data Warehouses (DW) and other Very Large Databases (VLDB) would be practically impossible to manage without the Partitioning feature introduced in Oracle 8, because of the high cost of accessing large tables and indexes in their entirety rather than just a subset of partitions representing a fraction of the overall object. However, once you start to use Partitioned Tables and Indexes you will encounter additional levels of complexity, which this paper will focus on. Consider a simple partitioned table and index -

Listing 1 – Create Partitioned Table and Index and INSERT initial rows (INIT.SQL)

CREATE TABLE TEST_TAB1

(

REPORTING_DATE NUMBER NOT NULL,

SOURCE_SYSTEM VARCHAR2(30 CHAR) NOT NULL,

SEQ_ID NUMBER NOT NULL,

STATUS VARCHAR2(1 CHAR) NOT NULL

)

PARTITION BY RANGE (REPORTING_DATE)

SUBPARTITION BY LIST (SOURCE_SYSTEM)

SUBPARTITION TEMPLATE

(SUBPARTITION MOSCOW VALUES ('MOSCOW') TABLESPACE TEST_DATA,

SUBPARTITION LONDON VALUES ('LONDON') TABLESPACE TEST_DATA,

SUBPARTITION SYDNEY VALUES ('SYDNEY') TABLESPACE TEST_DATA,

SUBPARTITION OTHERS VALUES (DEFAULT) TABLESPACE TEST_DATA)

(

PARTITION P_20110131 VALUES LESS THAN (20110201) NOLOGGING NOCOMPRESS,

PARTITION P_20110201 VALUES LESS THAN (20110202) NOLOGGING NOCOMPRESS,

PARTITION P_20110202 VALUES LESS THAN (20110203) NOLOGGING NOCOMPRESS,

PARTITION P_20110203 VALUES LESS THAN (20110204) NOLOGGING NOCOMPRESS,

PARTITION P_20110204 VALUES LESS THAN (20110205) NOLOGGING NOCOMPRESS,

PARTITION P_20110205 VALUES LESS THAN (20110206) NOLOGGING NOCOMPRESS,

PARTITION P_20110206 VALUES LESS THAN (20110207) NOLOGGING NOCOMPRESS,

PARTITION P_20110207 VALUES LESS THAN (20110208) NOLOGGING NOCOMPRESS

);

CREATE UNIQUE INDEX TEST_TAB1_IX1 ON TEST_TAB1

(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)

LOCAL NOPARALLEL COMPRESS 1;

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'MOSCOW', 30000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'MOSCOW', 2000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110205, 'MOSCOW', 2400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 500, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 700, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'SYDNEY', 1200, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'DALLAS', 600, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'DALLAS', 700, 'P');

COMMIT;

There is a range partition per REPORTING_DATE which is list sub-partitioned depending on the SOURCE_SYSTEM that sent the data, with an OTHERS sub-partition for SOURCE_SYSTEMs other than MOSCOW, LONDON and SYDNEY. There are no statistics on this table or index, so as a first step I'll gather statistics using the default values for all parameters other than the schema owner and table name.

Listing 2 – Gather Statistics with Default parameter values (DEFAULT.SQL) 1

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1');

PL/SQL procedure successfully completed.

Those familiar with gathering statistics will expect to see some of the updated statistics in views such as USER_TABLES and

USER_INDEXES.

Listing 3 – View basic table and index statistics (DEFAULT.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 05-FEB-2011 12:38:11 11

SQL> select index_name, last_analyzed, distinct_keys, clustering_factor, global_stats

2 from user_indexes

3 where table_name='TEST_TAB1';

INDEX_NAME LAST_ANALYZED DISTINCT_KEYS CLUSTERING_FACTOR GLO

------------------------------ -------------------- ------------- ----------------- ---

TEST_TAB1_IX1 05-FEB-2011 12:38:11 11 11 YES

As you can see, the statistics were gathered on the 5th of February and the table-level NUM_ROWS=11, which accurately reflects the INSERT statements in Listing 1. However, you should also note the GLOBAL_STATS column, which contains 'YES' for both the table and index. That is the first hint of the additional layer of complexity introduced with partitioned objects, because the statistics we are looking at are no longer just Table or Index statistics but Global Statistics. We will also find that additional statistics were gathered on other levels of the objects, because our 'table' consists of the table itself, the partitions and the sub-partitions. For example, by looking in DBA_TAB_PARTITIONS, we'll see some of those additional statistics.

1 At this stage all of the examples are executed against the default general purpose database created by DBCA as part of the Oracle 10.2.0.4 installation process, with no additional one-off patches applied. This means that you can download Oracle 10.2.0.4 from http://www.oracle.com/technetwork/database/database10g/downloads/index.html and run the scripts in Appendix B yourself to see the full results. In this paper, I will limit the output displayed to that required to illustrate my points.

Listing 4 – View partition statistics (DEFAULT.SQL)

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 05-FEB-2011 12:38:11 0

TEST_TAB1 P_20110201 YES 05-FEB-2011 12:38:11 4

TEST_TAB1 P_20110202 YES 05-FEB-2011 12:38:11 2

TEST_TAB1 P_20110203 YES 05-FEB-2011 12:38:11 1

TEST_TAB1 P_20110204 YES 05-FEB-2011 12:38:11 2

TEST_TAB1 P_20110205 YES 05-FEB-2011 12:38:11 1

TEST_TAB1 P_20110206 YES 05-FEB-2011 12:38:11 1

TEST_TAB1 P_20110207 YES 05-FEB-2011 12:38:11 0

8 rows selected.

These additional Partition statistics also appear to be Global (according to the view) and show the number of rows contained in each partition. As we’ll see later, statistics are not limited to row counts but even the addition of this basic information about each partition is useful to the CBO in determining the profile of our data to arrive at the optimal execution plan.

Note – at this stage I will drop the index and use table statistics to demonstrate the underlying principles that apply to both tables and indexes for the rest of the paper.

COLLECTION GRANULARITY AND STATISTICS LEVELS

There are several logical components of partitioned objects and each level of components has its own set of statistics.

Partitioned Objects

Global Statistics that describe the entire object.

Partition Statistics on each Partition.

Subpartitioned (or Composite Partitioned) Objects

Global Statistics that describe the entire object.

Partition Statistics that describe the entire Partition and all of its Subpartitions.

Subpartition Statistics on each Subpartition.

The CBO can use different combinations of these statistics depending on which components the query must access and the version of Oracle. We’ll revisit how the statistics are used shortly but first of all, how do we gather the different levels of statistics in the first place?

When using Partitioned Tables and Indexes, you can use DBMS_STATS 2 to gather separate statistics on the various components of the objects. The level at which statistics are gathered is governed by the DBMS_STATS GRANULARITY parameter, which allows the following values in Oracle 10g and 11g.

2 In fact you should always use DBMS_STATS for manipulating statistics these days, but you must use DBMS_STATS to gather Partition statistics. The ANALYZE command is for dinosaurs only and supports a fraction of the DBMS_STATS functionality.

DBMS_STATS GRANULARITY value – Statistics Gathered

ALL – Global, Partition and Subpartition

AUTO – Determines granularity based on partitioning type (this is the default)

DEFAULT – Gathers global and partition-level statistics. This option is deprecated, and while currently supported, it is included in the documentation for legacy reasons only. You should use 'GLOBAL AND PARTITION' for this functionality. Note that the default value is now 'AUTO'.

GLOBAL – Global

GLOBAL AND PARTITION – Global and Partition, but does not gather Subpartition stats, even if the object is subpartitioned

PARTITION – Partition (specify PARTNAME for a specific partition. Default is all partitions.)

SUBPARTITION – Subpartition (specify PARTNAME for a specific subpartition. Default is all subpartitions.)
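To make the table above concrete, the calls below sketch how GRANULARITY and PARTNAME combine for our example table. They are illustrative variations on Listing 2 rather than scripts from Appendix B:

-- Global and Partition statistics only; no Subpartition statistics are gathered
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'GLOBAL AND PARTITION');

-- Statistics on a single named partition only
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'PARTITION', partname => 'P_20110206');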

For example, in Listing 2 we gathered statistics on TEST_TAB1 with the default DBMS_STATS options, which implies that we used GRANULARITY => 'AUTO' and that Oracle determined the granularity based on the partitioning type. What did this mean in practice? Well, as we saw in Listings 3 and 4, DBMS_STATS gathered

Global Statistics on the Table (Listing 3)

Partition Statistics on all of the Partitions (Listing 4)

How about the Subpartition Statistics? Here is a small sample of the results that shows that Subpartition Statistics were gathered. (Note that this is different from the previous 9i default value of 'DEFAULT', which did not collect Subpartition Statistics.)

Listing 5 – View subpartition statistics for the P_20110201 partition (DEFAULT.SQL)

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110201'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110201_LONDON YES 05-FEB-2011 12:44:36 1

TEST_TAB1 P_20110201_MOSCOW YES 05-FEB-2011 12:44:36 1

TEST_TAB1 P_20110201_OTHERS YES 05-FEB-2011 12:44:36 1

TEST_TAB1 P_20110201_SYDNEY YES 05-FEB-2011 12:44:36 1

HOW STATISTICS LEVELS ARE USED

Why does Oracle gather statistics about the various levels of partitioned objects? Because the CBO will use different statistics in different situations and although the precise details are likely to change frequently between versions, you can probably rely on a few high-level points.

If a statement accesses multiple partitions, the CBO will use Global Statistics (illustrated in the short example after this list).

If a statement is able to limit access to a single partition, then the partition statistics can be used.


If a statement accesses a single subpartition, then subpartition statistics can be used. However, prior to 10.2.0.4, subpartition statistics are rarely used. The CBO will use BLOCKS in Table Subpartition statistics to calculate the cost of a full scan of the Subpartition but does not use any Index Subpartition Statistics. For practical purposes, it is probably best to assume that Subpartition Statistics are not very valuable in any version prior to 10.2.0.4.

For most applications you will need both Global and Partition Statistics for the CBO to operate effectively, which is why Oracle's default DBMS_STATS options have always included Global Statistics.
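To illustrate the first two points with our test table (these queries are mine rather than from the paper's scripts), a predicate on the partition key allows pruning to a single partition, so the partition-level statistics can be used, whereas a predicate on a non-partitioning column forces the CBO back to the Global statistics:

-- Pruned to the single partition P_20110204, so its Partition Statistics can be used
select count(*) from test_tab1 where reporting_date = 20110204;

-- No predicate on the partition key, so all partitions are accessed and Global Statistics are used
select count(*) from test_tab1 where status = 'P';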

AGGREGATED GLOBAL STATISTICS AND THE QUALITY/PERFORMANCE TRADE-OFF

If the CBO is so dependent on a combination of global and partition statistics, then why not use the default GRANULARITY of AUTO every time we re-gather statistics on updated data? Well, partitioned objects are normally used because accessing the entire object is resource-intensive and slow, and you're aiming for SQL execution plans that access a small subset of partitions instead. Collecting global statistics requires a full table scan and is therefore likely to be a prohibitively expensive operation, particularly if you are regularly adding new partitions, which make the Global Statistics stale even if you have gathered up-to-date statistics on the new partitions.

You might be able to schedule a window outside of your standard operating hours to regather global statistics on large objects occasionally, but this would still mean that your global statistics may become stale. You might also decide to gather global statistics using a very small sample (low ESTIMATE_PERCENT) to speed up the process, but as the sampling percentage falls it will typically affect the quality of the resulting statistics negatively.
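For example, a maintenance-window job might gather only the Global statistics with a deliberately small sample, leaving the routine partition and subpartition gathers to the load process. A minimal sketch; the 1 per cent sample is an illustrative choice, not a recommendation:

-- Occasional maintenance-window gather of Global statistics only, using a small sample
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'GLOBAL', estimate_percent => 1);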

Oracle offers an alternative to gathering Global Statistics, which is to use the statistics on the underlying partitions as the source to generate Aggregated Global Statistics 3 at higher levels of the same object. A valid statistics collection strategy for a composite partitioned object might look something like this:

Gather statistics at the subpartition level for all subpartitions and allow Oracle to generate Aggregated Global Statistics at the Table and Partition levels

As new subpartitions are added or the data changes sufficiently in an existing subpartition to make the statistics stale, only gather statistics on the relevant subpartitions and allow the aggregation process to maintain the Table and Partition level Aggregated Statistics.

I'll empty TEST_TAB1, delete the existing table stats and regather at the SUBPARTITION level. Note that I have not passed in a specific subpartition with the PARTNAME parameter, so all subpartitions will have their statistics gathered.

Listing 6 – Gather subpartition statistics with aggregation and view statistics (AGGREGATION.SQL)

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION' );

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

3 It might help you to think of these as Estimated or Approximate Global Statistics, for reasons that will become apparent.


TEST_TAB1 NO 05-FEB-2011 12:51:27 11

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 NO 05-FEB-2011 12:51:27 0

TEST_TAB1 P_20110201 NO 05-FEB-2011 12:51:27 4

TEST_TAB1 P_20110202 NO 05-FEB-2011 12:51:27 2

TEST_TAB1 P_20110203 NO 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110204 NO 05-FEB-2011 12:51:27 2

TEST_TAB1 P_20110205 NO 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110206 NO 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110207 NO 05-FEB-2011 12:51:27 0

8 rows selected.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110206'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110206_LONDON YES 05-FEB-2011 12:51:27 0

TEST_TAB1 P_20110206_MOSCOW YES 05-FEB-2011 12:51:27 0

TEST_TAB1 P_20110206_OTHERS YES 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110206_SYDNEY YES 05-FEB-2011 12:51:27 0

I can see that although the Subpartition Statistics are global 4, the Table and Partition statistics are not (GLOBAL=NO), but the number of rows looks correct, so the aggregation process appears to have been successful and we have avoided the full table scan overhead of collecting true global statistics. Next I'll simulate what might happen later – new data is inserted into subpartition P_20110206_MOSCOW and so we want to gather fresh statistics on just that subpartition, without the overhead of accessing the entire table.

Listing 7 – Insert additional rows into subpartition, gather subpartition statistics and view results (AGGREGATION.SQL)

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 110000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 240000, 'U');

1 row created.

SQL> COMMIT;

4 Subpartition Statistics are implicitly Global because there is no structural level below subpartitions.


Commit complete.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1',

GRANULARITY => 'SUBPARTITION', PARTNAME => 'P_20110206_MOSCOW' );

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 NO 05-FEB-2011 12:51:27 16

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 NO 05-FEB-2011 12:51:27 0

TEST_TAB1 P_20110201 NO 05-FEB-2011 12:51:27 4

TEST_TAB1 P_20110202 NO 05-FEB-2011 12:51:27 2

TEST_TAB1 P_20110203 NO 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110204 NO 05-FEB-2011 12:51:27 2

TEST_TAB1 P_20110205 NO 05-FEB-2011 12:51:27 1

TEST_TAB1 P_20110206 NO 05-FEB-2011 12:51:27 6

TEST_TAB1 P_20110207 NO 05-FEB-2011 12:51:27 0

8 rows selected.

The table-level NUM_ROWS has increased from 11 to 16 and the partition P_20110206 has increased from 1 to 6, both of which are a perfect reflection of the INSERT activity and at this stage I would argue that the aggregation process looks to be working pretty well. Although this is only a very small table, the resource savings and reduced elapsed time for gathering statistics on a single sub-partition and updating the aggregated statistics at higher levels are likely to be very significant on large partitioned objects and might be the only realistic alternative to having missing or very stale global statistics.

However, there is a significant flaw in this approach. When I inserted new data, I also introduced a new value of 'U' in the STATUS column, and it's only when we begin to look at the column statistics that things start to look less healthy.

Listing 8 – View Aggregated Global column statistics 5 (AGGREGATION.SQL)

SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------------

REPORTING_DATE 6 20110201 20110206 NUMBER

SEQ_ID 14 400 3000000 NUMBER

SOURCE_SYSTEM 5 DALLAS SYDNEY VARCHAR2

STATUS 4 P U VARCHAR2

5 This script uses Greg Rahn's DISPLAY_RAW function to display Column High and Low values in a human-readable form. See References.

SQL>

SQL> select a.column_name, a.partition_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val,

4 b.data_type

5 from user_part_col_statistics a, user_tab_cols b

6 where a.table_name='TEST_TAB1'

7 and a.table_name=b.table_name

8 and a.column_name=b.column_name

9 and a.partition_name='P_20110206'

10 order by a.column_name, a.partition_name;

COLUMN_NAME           PARTITION_NAME   NUM_DISTINCT LOW_VAL              HIGH_VAL             DATA_TYPE

--------------------- ---------------- ------------ -------------------- -------------------- ---------

REPORTING_DATE        P_20110206                  1 20110206             20110206             NUMBER

SEQ_ID                P_20110206                  6 600                  3000000              NUMBER

SOURCE_SYSTEM         P_20110206                  2 DALLAS               MOSCOW               VARCHAR2

STATUS                P_20110206                  3 P                    U                    VARCHAR2

At first glance the table-level statistics look reasonably accurate. For example, the STATUS column has the correct High Value of 'U', which has just appeared in the last set of rows that were inserted. Based on what I've seen to date, Oracle does accurately update the High/Low column values and row counts when generating aggregated statistics, but closer investigation highlights a problem. According to the column statistics, there are 5 distinct SOURCE_SYSTEMs and 4 distinct STATUSes in the table, but that's not true.

Listing 9 – True number of distinct values in test data (AGGREGATION.SQL)

SQL> select source_system, count(*) from test_tab1 group by source_system;

SOURCE_SYSTEM COUNT(*)

------------------------------ ----------

MOSCOW 10

LONDON 1

SYDNEY 2

DALLAS 3

SQL>

SQL> select status, count(*) from test_tab1 group by status;

S COUNT(*)

- ----------

U 1

P 15

As far as the CBO is concerned when looking at these statistics, there are more distinct values for each of these columns in the table than really exist, and the Number of Distinct Values (NDV) is one of the most important inputs to the CBO cardinality calculations.

For example, if I execute a query to return all of the rows with STATUS='U' there should be one row returned, but if I include a /*+ gather_plan_statistics */ hint along with DBMS_XPLAN.DISPLAY_CURSOR to view actual row source execution statistics, I can see the discrepancy between the optimiser's estimated cardinalities (E-Rows) and the actual cardinalities (A-Rows).

Listing 10 – Incorrect cardinality estimate arising from inaccurate NDV (AGGREGATION.SQL)

SQL> select /*+ gather_plan_statistics */ *

2 from test_tab1

3 where status='U';

REPORTING_DATE SOURCE_SYSTEM SEQ_ID S

-------------- ------------------------------ ---------- -

20110206 MOSCOW 240000 U

SQL>

SQL> select * from TABLE(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

-------------------------------

SQL_ID 6fnb7kqfhx69j, child number 0

------------------------------------- select /*+ gather_plan_statistics */ * from test_tab1 where status='U'

Plan hash value: 2028438416

-------------------------------------------------------------------------------------------

| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |

-------------------------------------------------------------------------------------------

| 1 | PARTITION RANGE ALL| | 1 | 4 | 1 |00:00:00.01 | 312 |

| 2 | PARTITION LIST ALL| | 8 | 4 | 1 |00:00:00.01 | 312 |

|* 3 | TABLE ACCESS FULL| TEST_TAB1 | 32 | 4 | 1 |00:00:00.01 | 312 |

-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - filter("STATUS"='U')

19 rows selected.


The optimiser expects there to be 4 rows returned (E-Rows=4) but only 1 row is actually returned (A-Rows=1). How did the optimiser arrive at that estimate? Dividing the number of rows in the table (16) by the number of distinct values in the table for STATUS (4) gives an estimated 4 rows returned. Oracle uses the table-level statistics because we are retrieving data from more than one partition.

Had the NDV been the correct value of 2, the estimated cardinality would actually have been even more inaccurate: with two distinct values of STATUS, Oracle assumes that we would be retrieving 8 rows instead of 1, but that is a data skew issue that might more sensibly be solved with the use of a histogram. Regardless of the effectiveness of the true NDV in this example versus the approximate NDV generated by the aggregation process, NDV inaccuracies with such a small number of rows and only a few distinct values indicate how significant a problem this could become with larger volumes of data.
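If the skew on STATUS were the real concern, a histogram could be requested explicitly when gathering statistics. A sketch, assuming the default maximum of 254 buckets is acceptable; this is not part of AGGREGATION.SQL:

-- Request a histogram on the skewed STATUS column when gathering table statistics
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', method_opt => 'FOR COLUMNS STATUS SIZE 254');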

Why aren't the values accurate? Well, let's compare High/Low values to the Number of Distinct Values (NDV). When we gathered statistics on the new subpartition, we had access to the previous High/Low values at the table level. Here are the column statistics before stats were gathered on the new subpartition.

Listing 11 – Original Table Column Statistics (AGGREGATION.SQL)

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------

REPORTING_DATE 6 20110201 20110206 NUMBER

SEQ_ID 9 400 30000 NUMBER

SOURCE_SYSTEM 6 DALLAS SYDNEY VARCHAR2

STATUS 1 P P VARCHAR2

As the aggregated statistics are updated, Oracle has the current low and high values at all levels of the object and adjusting those based on the underlying statistics is straightforward. The highest value in the new subpartition statistics of 'U' is clearly higher than the existing high value of 'P' at the table and partition levels.

Listing 12 – Table Column Statistics after additional aggregation (AGGREGATION.SQL)

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------

REPORTING_DATE 6 20110201 20110206 NUMBER

SEQ_ID 14 400 3000000 NUMBER

SOURCE_SYSTEM 5 DALLAS SYDNEY VARCHAR2

STATUS 4 P U VARCHAR2

How does Oracle determine the new NDV? Remember, Oracle can't look at any of the data in Partitions or Subpartitions other than the one we're gathering statistics for, in order to minimize the collection overhead. So it has to decide what the new NDV should be based on several inputs:

1. The actual values in STATUS for the rows in the subpartition we can look at.

2. The previously-gathered and stored NDV for the other subpartitions.

3. The previously-aggregated NDV stored at the table and partition levels.

The problem is that Oracle knows the number of distinct values in other subpartitions (we looked at the data previously to calculate them) but not the values themselves because the NDV calculation is a destructive process that loses information about what the distinct values are and only preserves how many there are. Without the full original range of values, how can it say whether the 2 distinct values (P and U) in this subpartition are distinct when compared to the values in the other subpartitions without looking at those subpartitions again?
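You can see the difficulty by comparing the per-subpartition distinct counts with the true table-wide count: the distinct values in different subpartitions overlap, so the stored NDVs cannot simply be added together without revisiting the data. These queries are purely illustrative and not part of the paper's scripts:

-- Distinct STATUS values within each subpartition (approximated here by the partitioning keys)
select reporting_date, source_system, count(distinct status) as subpartition_ndv
from test_tab1
group by reporting_date, source_system;

-- True distinct STATUS values across the whole table
select count(distinct status) as table_ndv from test_tab1;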

In this case we might expect Oracle to be clever and realise that, as there was only one distinct value of P prior to our new subpartition and the only values in the current subpartition are P and U, there can only be 2 distinct values. Unfortunately that is not how global statistics maintenance works prior to Oracle 11g.

The poor NDV results of the aggregation process are probably the main reason Oracle recommends gathering both Global and Partition statistics, and does so by default if we don't request otherwise. However, some systems might have little opportunity to gather true global statistics and, although aggregated statistics are a poor alternative and will lead to sub-optimal execution plans, they are definitely preferable to no global statistics at all, as we'll see shortly.

Please note, though, that if you do use the aggregation process, you must do so correctly and not fall into one of the following traps.

AGGREGATION SCENARIOS AND PITFALLS

Putting aside the fundamental weakness of the aggregation process for a while, the reality is that many systems have used it as the only viable alternative to maintaining global statistics in the past. However, there are strict requirements for the process to work correctly and the implications if you don’t ensure you meet these requirements can be dramatic.

MISSING PARTITION STATISTICS

To illustrate the first potential problem, I’ll continue the previous example that gathered statistics purely at the SUBPARTITION level and allowed Oracle to aggregate those to generate the Table and Partition stats. I’ll add a new partition (and because we have a subpartition template, the subpartitions are created too) much as we might during the first load of data on a new day, then I'll insert some rows into the different new subpartitions.

Listing 13 – Add partition and INSERT new data for 20110208 from Dallas then gather statistics on the relevant subpartition (AGGREGATION2.SQL)

SQL> ALTER TABLE TEST_TAB1

2 ADD PARTITION P_20110208 VALUES LESS THAN (20110209);

Table altered.

SQL> INSERT INTO TEST_TAB1 VALUES (20110208, 'DALLAS', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110208, 'DALLAS', 600, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110208_OTHERS');

PL/SQL procedure successfully completed.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110208'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110208_LONDON NO

TEST_TAB1 P_20110208_MOSCOW NO


TEST_TAB1 P_20110208_OTHERS YES 05-FEB-2011 13:11:15 2

TEST_TAB1 P_20110208_SYDNEY NO

At this stage the subpartition statistics look acceptable for P_20110208_OTHERS; however, examining the Table and Partition statistics highlights a serious issue.

Listing 14 – Table and partition statistics after subpartition statistics collection (AGGREGATION2.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 NO

SQL>

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 NO 05-FEB-2011 13:11:10 0

TEST_TAB1 P_20110201 NO 05-FEB-2011 13:11:10 8

TEST_TAB1 P_20110202 NO 05-FEB-2011 13:11:10 4

TEST_TAB1 P_20110203 NO 05-FEB-2011 13:11:10 2

TEST_TAB1 P_20110204 NO 05-FEB-2011 13:11:10 4

TEST_TAB1 P_20110205 NO 05-FEB-2011 13:11:10 2

TEST_TAB1 P_20110206 NO 05-FEB-2011 13:11:10 7

TEST_TAB1 P_20110207 NO 05-FEB-2011 13:11:10 0

TEST_TAB1 P_20110208 NO

9 rows selected.

The Table level statistics have disappeared as though they have been deleted or were never gathered in the first place! There are apparently no statistics on the new P_20110208 partition either. The underlying cause is that Oracle will only aggregate statistics when all of the underlying components have stats that can be aggregated. At this stage, P_20110208_OTHERS is the only subpartition of P_20110208 that I have gathered stats for, so in the absence of stats for P_20110208_LONDON, P_20110208_MOSCOW and P_20110208_SYDNEY, no aggregated Partition statistics will be set for P_20110208 and, as those are missing, the Table level aggregated statistics are missing too.

Missing statistics is one of the more challenging situations for the optimiser and, prior to Oracle 9.2, could lead to terrible plans because the built-in optimiser default values are unlikely to reflect your data accurately. Current versions of Oracle will try to solve the problem by generating statistics using Dynamic Sampling unless this has been explicitly disabled. For example, if I run a query that must use the table statistics, we can see that Dynamic Sampling takes place because of the default dynamic_sampling setting of 2.

Listing 15 – Table-wide query with missing Table and Partition statistics (AGGREGATION2.SQL)

SQL> show parameter dynamic_sampling

optimizer_dynamic_sampling           integer     2


SQL> select /*+ gather_plan_statistics */ *

2 from test_tab1

3 where source_system='DALLAS';

20110201 DALLAS 400 P

20110204 DALLAS 700 P

20110206 DALLAS 600 P

20110208 DALLAS 400 P

20110208 DALLAS 600 P

SQL> select * from TABLE(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

SQL_ID 544krkd2kvxat, child number 0

------------------------------------- select /*+ gather_plan_statistics */ * from test_tab1 where source_system='DALLAS'

Plan hash value: 3088684340

----------------------------------------------------------------------------------------------

| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |

----------------------------------------------------------------------------------------------

| 1 | PARTITION RANGE ALL | | 1 | 9 | 5 |00:00:00.01 | 100 |

| 2 | PARTITION LIST SINGLE| | 9 | 9 | 5 |00:00:00.01 | 100 |

|* 3 | TABLE ACCESS FULL | TEST_TAB1 | 9 | 9 | 5 |00:00:00.01 | 100 |

----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - filter("SOURCE_SYSTEM"='DALLAS')

Note

-----

- dynamic sampling used for this statement

However, although Oracle has come up with a reasonable plan that assumed 9 rows would be returned rather than the actual 5 rows, dynamic sampling consumes additional resources, is difficult to spot without looking at detailed execution plans and will not work as effectively on larger and more complex data sets. Therefore any missing statistics on components of partitioned objects should be treated very seriously.

The only defence against missing statistics resulting from the aggregation process is to design your statistics collection very carefully if you depend on aggregation.

Let's look at the statistics once the other data feeds complete (additional rows are INSERTed into the other partitions in aggregation2.sql) and we gather the rest of the subpartition statistics.

Listing 16 – Gather remaining subpartition statistics (AGGREGATION2.SQL)

SQL> -- Simulate other loads finishing later

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110208_LONDON');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110208_SYDNEY');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110208_MOSCOW');

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';


TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 NO 05-FEB-2011 13:11:15 37

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 NO 05-FEB-2011 13:11:10 0

TEST_TAB1 P_20110201 NO 05-FEB-2011 13:11:10 8

TEST_TAB1 P_20110202 NO 05-FEB-2011 13:11:10 4

TEST_TAB1 P_20110203 NO 05-FEB-2011 13:11:10 2

TEST_TAB1 P_20110204 NO 05-FEB-2011 13:11:10 4

TEST_TAB1 P_20110205 NO 05-FEB-2011 13:11:10 2

TEST_TAB1 P_20110206 NO 05-FEB-2011 13:11:10 7

TEST_TAB1 P_20110207 NO 05-FEB-2011 13:11:10 0

TEST_TAB1 P_20110208 NO 05-FEB-2011 13:11:15 10

9 rows selected.

That looks much better. The best approach in this particular case is probably to gather statistics on all of the new subpartitions as soon as they are created. Even though the subpartitions will be empty initially and will need stats gathered again immediately after they are loaded, the initial collection will be very cheap and minimise the likelihood of missing statistics leading to aggregation problems.
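In practice that means following every ADD PARTITION immediately with a cheap gather on the new, still-empty subpartitions. A sketch along the lines of Listing 13; the P_20110210 partition and its subpartition names are hypothetical and simply follow the existing naming convention:

ALTER TABLE TEST_TAB1 ADD PARTITION P_20110210 VALUES LESS THAN (20110211);

-- Gather (very cheap) statistics on each new, empty subpartition straight away
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110210_MOSCOW');
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110210_LONDON');
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110210_SYDNEY');
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110210_OTHERS');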

INABILITY TO REPLACE TRUE GLOBAL STATISTICS

To illustrate the next potential aggregation problem, I'll revert to the status of TEST_TAB1 after Listings 1, 2, 3 and 4, i.e. the table has been recreated, populated with initial data and statistics have been gathered using the default granularity of AUTO. The table has global statistics at the Table, Partition and Subpartition levels. This is a sensible default starting point, and it's worth noting how likely this situation is if the table has been created and then the statistics have been updated by Oracle's automatic statistics gathering job.

Imagine next that we are about to insert more data for 20110206 from the MOSCOW source system. As we are introducing new data, we will also regather the statistics.

Listing 17 – Insert data for 20110206/MOSCOW and update statistics 6 (AGGREGATION2.SQL)

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 100000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 3000000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 200000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 110000, 'P');

6 In the AGGREGATION2.SQL script, I’ve inserted a PAUSE point to simulate the delay between the initial population and the subsequent arrival of the new data.


1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 240000, 'U');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION',

PARTNAME => 'P_20110206_MOSCOW');

PL/SQL procedure successfully completed.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110206'

5 order by subpartition_name;

TEST_TAB1 P_20110206_LONDON YES 09-FEB-2011 02:45:17 0

TEST_TAB1 P_20110206_MOSCOW YES 09-FEB-2011 02:45:31 5

TEST_TAB1 P_20110206_OTHERS YES 09-FEB-2011 02:45:17 1

TEST_TAB1 P_20110206_SYDNEY YES 09-FEB-2011 02:45:17 0

I can see that the statistics for the P_20110206_MOSCOW subpartition have been updated because the LAST_ANALYZED time is a little later. How do the Table and Partition statistics look?

Listing 18 – Table and Partition Statistics after subsequent data load (AGGREGATION2.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TEST_TAB1 YES 09-FEB-2011 02:45:17 11

SQL>

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TEST_TAB1 P_20110131 YES 09-FEB-2011 02:45:17 0

TEST_TAB1 P_20110201 YES 09-FEB-2011 02:45:17 4

TEST_TAB1 P_20110202 YES 09-FEB-2011 02:45:17 2

TEST_TAB1 P_20110203 YES 09-FEB-2011 02:45:17 1

TEST_TAB1 P_20110204 YES 09-FEB-2011 02:45:17 2

TEST_TAB1 P_20110205 YES 09-FEB-2011 02:45:17 1

TEST_TAB1 P_20110206 YES 09-FEB-2011 02:45:17 1

TEST_TAB1 P_20110207 YES 09-FEB-2011 02:45:17 0

TEST_TAB1 P_20110208 YES 09-FEB-2011 02:45:17 0

9 rows selected.

Glance at these statistics too quickly and they might appear to be correct, but there are a couple of possible concerns:

1. The LAST_ANALYZED time hasn't changed and reflects the initial data population time.

2. The Table and P_20110206 NUM_ROWS don't appear to have changed.

The reason is that DBMS_STATS will not replace true Global Statistics (GLOBAL=YES) with the results of the aggregation process.

Perhaps this is sensible because it will not replace carefully gathered global statistics with approximate, aggregated statistics and allows you to combine different collection granularities as part of an overall strategy. For example:

1. Initial GRANULARITY => AUTO to ensure Global Statistics are good at all levels.

2. GRANULARITY => SUBPARTITION and individual PARTNAME to ensure fresh statistics on subpartitions.

3. Subsequent gather with GRANULARITY => AUTO when there is sufficient resource availability and the impact on application users is minimal, to improve Global Statistics that have become stale without introducing inaccurate aggregated NDVs.

However, for this to work you need to have decided that this is the strategy you will use and apply it consistently. I've seen a number of cases recently that indicate a lack of understanding of the way the aggregation process works, leading to Global Statistics that are months or years out of date because step 3 of the above process has been neglected. Worse still are the systems I've seen where this has caused problems that have been solved temporarily by gathering statistics with a manual one-off execution at AUTO or DEFAULT granularity, only for the problems to gradually reappear as the Global Statistics are effectively fixed and become more and more stale over time.

_MINIMAL_STATS_AGGREGATION

The examples so far have been based on INSERTing new rows into subpartitions and although that's an approach that might be used for some systems, the likelihood is that on modern DW systems that are constantly receiving new data feeds you will be using partition exchange at some point. I'd say that there are two distinct approaches you are likely to use.

1) Create a temporary load table, load it with data, gather statistics on it and then exchange it with the relevant subpartition in the target table. The advantage here is that the primary statistics gathering activity can take place during the load phase and when the load table is subsequently exchanged with the correct subpartition of the target table, the new statistics will appear.

2) Create a temporary load table, load it with data, exchange it with the relevant subpartition and then gather statistics on the subpartition. The potential problem here is that if it takes some time to gather the subpartition statistics then there will be a time-lag between the data appearing and the relevant statistics.
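For comparison, approach 2) would look roughly like the following sketch, using the LOAD_TAB1 table and P_20110209_MOSCOW subpartition that appear in the listings below; the statement order is mine rather than taken from PARTEX_MSA.SQL:

-- Approach 2: exchange the loaded table in first, then gather statistics on the target subpartition
ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_MOSCOW WITH TABLE load_tab1;

exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname => 'P_20110209_MOSCOW');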

I think that 1) appears to be the most sensible approach, but it can lead to difficulties caused by the hidden _minimal_stats_aggregation parameter. This was introduced in Oracle 8i, not long after the initial introduction of Partitioning. The default setting is a potentially confusing TRUE, which means that Oracle minimises automatic statistics aggregation activity by disabling some activities. Changing this to FALSE re-enables those activities.

First of all I'll recreate TEST_TAB1 as it was at the end of Listing 1 (i.e. before any statistics have been gathered). Then I'll add a new partition (and, by implication, the related subpartitions) and create a separate new table that I'll load the data into.

Listing 19 – Add new partition and create load table (PARTEX_MSA.SQL)

SQL> ALTER TABLE TEST_TAB1

2 ADD PARTITION P_20110209 VALUES LESS THAN (20110210);

Table altered.

SQL> DROP TABLE LOAD_TAB1;

Table dropped.

SQL> CREATE TABLE LOAD_TAB1

2 AS SELECT * FROM TEST_TAB1 WHERE 1=0;

Table created.

SQL> CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1

2 (REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)

3 NOPARALLEL COMPRESS 1;


Index created.

For each of the different source systems, I will INSERT data to LOAD_TAB1, gather statistics on it and then exchange it with the relevant subpartition of TEST_TAB1. (The following listing only shows the activities for the MOSCOW subpartition, but these are repeated for the others, using the same LOAD_TAB1.)

Listing 20 – Load data and gather stats on load table and then exchange partition for MOSCOW (PARTEX_MSA.SQL)

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 900, 'Z');

1 row created.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

PL/SQL procedure successfully completed.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_MOSCOW WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

Having repeated the process for the different subpartitions, all of the P_20110209 subpartitions have statistics that were swapped in as part of the partition exchange operation.

Listing 21 – List subpartition statistics (PARTEX_MSA.SQL)

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110209'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110209_LONDON NO 09-FEB-2011 17:59:17 3

TEST_TAB1 P_20110209_MOSCOW NO 09-FEB-2011 17:59:16 3

TEST_TAB1 P_20110209_OTHERS NO 09-FEB-2011 17:59:17 3

TEST_TAB1 P_20110209_SYDNEY NO 09-FEB-2011 17:59:17 3

The Subpartition statistics look good but we also need to check the Table and Partition statistics.

Listing 22 – List table and partition statistics (PARTEX_MSA.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS


-------------- --- -------------------- ----------

TEST_TAB1 YES 09-FEB-2011 17:58:51 11

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 09-FEB-2011 17:58:51 0

TEST_TAB1 P_20110201 YES 09-FEB-2011 17:58:51 4

TEST_TAB1 P_20110202 YES 09-FEB-2011 17:58:51 2

TEST_TAB1 P_20110203 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110204 YES 09-FEB-2011 17:58:51 2

TEST_TAB1 P_20110205 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110206 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110207 YES 09-FEB-2011 17:58:51 0

TEST_TAB1 P_20110209 NO

9 rows selected.

The original global statistics on the Table are present but don't appear to have been updated to reflect the new data. That makes sense because, as they are true global statistics, they wouldn't be replaced by the aggregation process. More puzzling is the absence of statistics on P_20110209. Global statistics were not gathered on this new partition, but all of the partition's subpartitions have valid statistics, so I would expect the aggregation process to aggregate statistics up to the Partition level.

Aggregation hasn't occurred because it has been disabled by the default setting of TRUE for _minimal_stats_aggregation. That instructs Oracle to minimise aggregation operations and one of the ways it does so is not to aggregate statistics as a result of a partition exchange operation, but to leave you to do that manually by gathering stats on the target table partition. If we were to modify the parameter to a non-default value 7, we would see different behaviour. I ran the same script, but with this small addition that changes the parameter setting at the session level.

Listing 23 – Modify _minimal_stats_aggregation (PARTEX_NON_MSA.SQL)

SQL> -- Note - this is a non-default parameter value

SQL> alter session set "_minimal_stats_aggregation"=FALSE;

Session altered.

This changes the end result to the following.

Listing 24 – List table and partition statistics (PARTEX_NON_MSA.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 09-FEB-2011 17:58:51 11

SQL>

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

7 Because _minimal_stats_aggregation is a hidden parameter, it should only really be set under the guidance of Oracle Support or at your own risk.


TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 09-FEB-2011 17:58:51 0

TEST_TAB1 P_20110201 YES 09-FEB-2011 17:58:51 4

TEST_TAB1 P_20110202 YES 09-FEB-2011 17:58:51 2

TEST_TAB1 P_20110203 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110204 YES 09-FEB-2011 17:58:51 2

TEST_TAB1 P_20110205 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110206 YES 09-FEB-2011 17:58:51 1

TEST_TAB1 P_20110207 YES 09-FEB-2011 17:58:51 0

TEST_TAB1 P_20110209 NO 09-FEB-2011 18:11:42 12

9 rows selected.

Note that the Table-level global statistics have still not been updated by the aggregation process because they are true global statistics, but there are now aggregated statistics on the P_20110209 partition, because all of the relevant subpartitions do have statistics. Unfortunately we have introduced a combination of aggregated and true global statistics, which is likely to invite new problems if we don't gather true global statistics on the P_20110209 partition later.
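That later step might be as simple as the following sketch, which gathers statistics directly on the exchanged-into partition so that it has genuine rather than aggregated partition-level statistics; it is not part of PARTEX_NON_MSA.SQL:

-- Gather statistics directly on the new partition rather than relying on aggregation
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'PARTITION', partname => 'P_20110209');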

MISSING GLOBAL STATISTICS

Another problem encountered in the past year arose when migrating one of our main databases to version 11.2.0.1 and restructuring a large partitioned table at the same time. Our approach was to gather the statistics at the Partition level, one partition at a time, so that when all partitions had statistics they would be aggregated up to the table level as approximate global statistics 8. DBMS_STATS was taking approximately three minutes to gather statistics on each partition, where we'd normally expect it to take a few seconds on the previous system. The two main symptoms were:

1. Many executions of the following statement.

SELECT /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size,
minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln
FROM hist_head$
WHERE obj# = :1 AND intcol# = :2

2. DML row locks on WRI$_OPTSTAT_HISTHEAD_HISTORY.

On deciding that the problem might be caused by the fact that the table had global statistics on the original database but not on the new one, we gathered Global statistics using a very small sample size (less than 1%) to ensure the job would complete. Once the global statistics were in place, the partition statistics gathering performance reflected the few seconds that we expected. We had to regather Global Statistics once all the Partition stats were in place and then we'd have to implement something to keep the Global Statistics up to date on a regular basis.

ALTERNATIVE STRATEGIES

It should be clear by now that those responsible for maintaining statistics on VLDB are faced with a constant battle to ensure that statistics are of a high enough quality to result in reasonable execution plans but can be gathered at a bearable cost. The solutions described so far suffer from either being unworkable or generating low-quality, inaccurate statistics. This section looks at alternative strategies to try to solve both problems.

8 At this point you might wonder why we didn't just use 11g new features and gather statistics at the default granularity and sample size. We did try that several times, but gave up after many hours of running.


DYNAMIC SAMPLING

As a solution to the pervasive problem of missing statistics caused by inadequate statistics collection processes, Oracle 9.2 introduced the capability to gather statistics dynamically at the SQL statement optimisation stage. With the default dynamic_sampling setting of 2, Oracle will perform dynamic sampling when one or more tables accessed in the SQL statement have no statistics, including missing partition statistics. We have seen this process occur already (see 'Missing Partition Statistics' above). If Oracle is able to perform dynamic sampling in order to generate statistics for the objects being accessed, then why bother gathering statistics at all, with all the hard work that entails?

Although, on the face of it, Dynamic Sampling might seem the ideal solution, it has disadvantages:

1) Collection cost is now paid each time a query is optimized. Although this might appear to be a low cost relative to overall resource consumption of typical data warehouse queries, it is being paid repeatedly and generates a recurring load on the system.

2) Because of the recurring cost, the collection needs to be fairly basic so the quality of the resulting statistics is likely to be lower. For example, the number of blocks sampled at dynamic_sampling=2 is a mere 64. This can be increased which will in turn increase the collection cost.
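For an individual statement, the level can be raised with a hint, at the cost of a larger sample each time the statement is optimised. A sketch; level 4 is just an illustrative value:

-- Raise the dynamic sampling level for this statement only
select /*+ dynamic_sampling(t 4) */ count(*)
from test_tab1 t
where source_system = 'DALLAS';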

Dynamic Sampling should be considered a last resort for situations where correct statistics can’t be collected or haven’t been collected in error.

SETTING STATISTICS

A common approach taken by some sites supporting very large databases is to avoid the time and resources spent gathering object statistics by setting statistics directly. It is worth reconsidering why we are gathering statistics in the first place.

To describe database objects and data to the optimiser by setting values in the data dictionary (e.g. Number of Rows, Number of Distinct Values, etc.)

To help the optimiser identify the optimal execution plans

In the (unlikely) situation that we know the precise content of specific objects at all times, why trawl through the tables and indexes examining the content when we could just set the relevant values in the data dictionary manually? DBMS_STATS has several suitable procedures (e.g. GET_COLUMN_STATS, SET_COLUMN_STATS and SET_TABLE_STATS).

We could set the statistics for specific tables, indexes or partitions manually, lock the statistics and choose not to gather statistics again for those objects if their content is unlikely to change again. This is fast and consumes minimal resources.
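A minimal sketch of that approach, using the example table and purely illustrative values for a month-end partition that we expect to end up with roughly half a million rows:

-- Describe the partition as if it were already full, then lock it so the
-- regular gathering job leaves it alone (row count, block count and row length are assumptions)
BEGIN
  dbms_stats.set_table_stats(ownname  => 'TESTUSER',
                             tabname  => 'TEST_TAB1',
                             partname => 'P_20110211',
                             numrows  => 500000,
                             numblks  => 8000,
                             avgrlen  => 120);
  -- LOCK_PARTITION_STATS is available from 11g; on 10.2 the lock would have to be at table level
  dbms_stats.lock_partition_stats(ownname  => 'TESTUSER',
                                  tabname  => 'TEST_TAB1',
                                  partname => 'P_20110211');
END;
/

Column-level values could be set in the same way with SET_COLUMN_STATS where the optimiser needs them to choose the right plans.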

Knowledge of your system and data is an essential requirement, although I would suggest that it's not as important that the statistics are absolutely precise as that they describe the data accurately enough to lead to optimal plans.

There are a couple of cases where this might prove particularly useful.

We add a new partition each month, so the partition starts out empty and fills up gradually over the course of the month. This is likely to lead to fluctuating execution plans as the contents of the partition change day by day.

The CBO is designed to calculate new plans based on changing data distribution but that also implies plan instability which can lead to painful, unpredictable system performance. By setting statistics manually we can 'pretend' that we start off the month with a full partition and have the optimiser evaluate plans based on that fixed assumption.


We add new partitions very frequently and have very little time to gather statistics before users are likely to report against the data. It might prove useful to set some approximate synthetic stats quickly to get reasonable plans and then possibly gather more detailed statistics later when time allows. However, the change from the synthetic statistics to properly gathered statistics may also lead to plan instability.

SETTING STATISTICS SUMMARY

Positives

Very fast and low resource usage method for setting statistics on new partitions.

Potential improvements to plan stability when accessing time-period partitions that start off empty and are filled over time.

Negatives

Does not eliminate the difficulty in maintaining accurate Global Statistics although these could be set directly.

Detailed knowledge of system and data profile is essential and choosing the most appropriate values for new partitions may prove particularly difficult on some systems.

Significant code needs to be developed, particularly when working with composite partitioned objects, which increases the possibility of human error.

COPYING STATISTICS

An extension of the concept of setting statistics for new partitions directly was introduced in Oracle 10.2.0.4 9. DBMS_STATS.COPY_TABLE_STATS is designed to achieve a similar effect to setting statistics manually, at a similarly low cost but with a little more intelligence. For example, when adding a new partition for each new REPORTING_DATE on our example table, maybe it's reasonable to assume that the data volumes and distribution of values are similar to the previous REPORTING_DATE?

If so, why not avoid setting fixed statistics based on fixed assumptions about the contents of a partition, and instead copy the statistics from the most recent partition to the new partition, or from the previous Monday or 15th of the month if the data follows a periodic pattern? The inclusion of the SCALING_FACTOR parameter could also help with growing data volumes.
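As a sketch only (the formal parameter appears in the DBMS_STATS specification as scale_factor; 1.1 is an assumed 10% growth figure), such a copy might look like this:

-- Copy last period's statistics to the new partition, scaled up slightly
-- to allow for expected growth (the scale_factor value is an assumption)
BEGIN
  dbms_stats.copy_table_stats(ownname      => 'TESTUSER',
                              tabname      => 'TEST_TAB1',
                              srcpartname  => 'P_20110201',
                              dstpartname  => 'P_20110211',
                              scale_factor => 1.1);
END;
/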

Here is an example of how this might look, based on my example table. After the initial creation of the table (listings 1 and 2), the table, partition and subpartition statistics look like this.

Listing 25 – List table and partition statistics (DEFAULT.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 11-FEB-2011 22:22:20 11

9 Although there is evidence of COPY_TABLE_STATS being available in previous versions, it is not supported until 10.2.0.4.

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from user_tab_partitions
  3  where table_name='TEST_TAB1'
  4  order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 11-FEB-2011 22:22:20 0

TEST_TAB1 P_20110201 YES 11-FEB-2011 22:22:20 4

TEST_TAB1 P_20110202 YES 11-FEB-2011 22:22:20 2

TEST_TAB1 P_20110203 YES 11-FEB-2011 22:22:20 1

TEST_TAB1 P_20110204 YES 11-FEB-2011 22:22:20 2

TEST_TAB1 P_20110205 YES 11-FEB-2011 22:22:20 1

TEST_TAB1 P_20110206 YES 11-FEB-2011 22:22:20 1

TEST_TAB1 P_20110207 YES 11-FEB-2011 22:22:20 0

8 rows selected.

Next I add a new partition (and the related subpartitions using the subpartition template) to contain rows with a REPORTING_DATE of 20110211, then empty LOAD_TAB1 and insert some appropriate rows into it before using partition exchange to load the data into the target TEST_TAB1.

Listing 26 – Add partition and load data for MOSCOW and 20110211 (COPY_STATS.SQL)

SQL> ALTER TABLE TEST_TAB1

2 ADD PARTITION P_20110211 VALUES LESS THAN (20110212);

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 1000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 30000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 2000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 10000, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 2400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 500, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 1200, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 600, 'P');


1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 700, 'Z');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110211_MOSCOW WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110211_MOSCOW REBUILD UNUSABLE LOCAL

INDEXES;

Table altered.

There are 10 rows in the new P_20110211_MOSCOW subpartition and no data in the other P_20110211 subpartitions. Next I copy the statistics from the P_20110201 partition into the new partition, i.e. I am performing a Partition-level copy.

Listing 27 – Copy Statistics from P_20110201 partition to P_20110211 and check results (COPY_STATS.SQL)

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110211');

PL/SQL procedure successfully completed.

SQL>

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 12-FEB-2011 10:16:54 11

SQL>

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 12-FEB-2011 10:16:54 0

TEST_TAB1 P_20110201 YES 12-FEB-2011 10:16:54 4

TEST_TAB1 P_20110202 YES 12-FEB-2011 10:16:54 2

TEST_TAB1 P_20110203 YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110204 YES 12-FEB-2011 10:16:54 2

TEST_TAB1 P_20110205 YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110206 YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110207 YES 12-FEB-2011 10:16:54 0

TEST_TAB1 P_20110211 YES 12-FEB-2011 10:16:54 4

9 rows selected.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name IN ('P_20110201', 'P_20110211')

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110201_LONDON YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110201_MOSCOW YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110201_OTHERS YES 12-FEB-2011 10:16:54 1

TEST_TAB1 P_20110201_SYDNEY YES 12-FEB-2011 10:16:54 1


TEST_TAB1 P_20110211_LONDON NO

TEST_TAB1 P_20110211_MOSCOW NO

TEST_TAB1 P_20110211_OTHERS NO

TEST_TAB1 P_20110211_SYDNEY NO

8 rows selected.

By performing a Partition-level copy of statistics from P_20110201 to P_20110211 I have new statistics on P_20110211 but no statistics on the new Subpartitions yet. However, that is easily solved by performing four additional copies at the Subpartition level, one for each of the Subpartitions. Also, note that the global statistics at the Table level have not been updated, so copying stats at the Partition and Subpartition levels does not eliminate the requirement to maintain global statistics. That is just the beginning of the flaws in the COPY_TABLE_STATS process.

COPY_TABLE_STATS BUGS AND PATCHES

One of the barriers to more rapid uptake of COPY_TABLE_STATS has been the number of related bugs and one-off patches required. Based on my own experience and discussions with others, this is true of DBMS_STATS in general over recent releases, as much work has gone into improving statistics collection options, but it seems particularly relevant to COPY_TABLE_STATS. I will assume that we are using a minimum 10.2.0.4 patchset with no one-off patches, as this is the first reasonable version to use and the version used for the examples so far.

Subpartition issues

When I executed the partition-level copy in Listing 27 I actually cheated a little and executed the script against a 10.2.0.5 database. Had I executed it on an unpatched 10.2.0.4 database, the output would have looked like this.

Listing 28 – COPY_STATS bug number 8318020 (COPY_STATS.SQL)

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201', dstpartname => 'P_20110211');

BEGIN dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201', dstpartname => 'P_20110211'); END;

*

ERROR at line 1:

ORA-06533: Subscript beyond count

ORA-06512: at "SYS.DBMS_STATS", line 17408

ORA-06512: at line 1

This is caused by bug number 8318020 10 and only applies to composite partitioned objects (i.e. those with subpartitions) when copying statistics at the Partition level. To fix it, I'd recommend one of the following three options. For my examples, I used option 2.

1. Apply Merge Label Request (MLR) 8866627 on top of 10.2.0.4, which also includes other related bug fixes
2. Apply patch set 10.2.0.5
3. Upgrade to 11.2.0.2

10 The information on My Oracle Support is confusing as there are a number of apparently unrelated issues which are described as being due to bug number 8318020. Regardless, there are so many issues that it's prudent to apply suitable patches in advance rather than uncovering more!

A workaround is to copy statistics only at the Subpartition level, as in the next example, but then we would need to maintain statistics at the Partition level using the traditional gathering process.

High/Low Partition Key Values

This is probably the best known of the statistics copying bugs in Oracle 10.2.0.4. I will add a new P_20110212 partition to the existing table and copy statistics from the P_20110201 partition. This time I perform the copy for each of the subpartitions at the Subpartition level to avoid the previous bug.

Listing 29 – List table and partition statistics (COPY_STATS.SQL)

SQL> ALTER TABLE TEST_TAB1

2 ADD PARTITION P_20110212 VALUES LESS THAN (20110213);

Table altered.

SQL> TRUNCATE TABLE LOAD_TAB1;

Table truncated.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'LOAD_TAB1');

PL/SQL procedure successfully completed.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 1000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 30000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 2000, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 10000, 'N');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 2400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 500, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 1200, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 400, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 600, 'P');

1 row created.

SQL> INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 700, 'Z');

1 row created.


SQL> COMMIT;

Commit complete.

SQL> ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110212_MOSCOW WITH TABLE load_tab1;

Table altered.

SQL> ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110212_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

Table altered.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201_OTHERS', dstpartname => 'P_20110212_OTHERS');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201_MOSCOW', dstpartname => 'P_20110212_MOSCOW');

PL/SQL procedure successfully completed.

SQL>

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201_LONDON', dstpartname => 'P_20110212_LONDON');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201_SYDNEY', dstpartname => 'P_20110212_SYDNEY');

PL/SQL procedure successfully completed.

How do the resulting partition and subpartition statistics look?

Listing 30 – List table and partition statistics (COPY_STATS.SQL)

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANAL NUM_ROWS

-------------- ---------------- --- --------- ----------

TEST_TAB1 P_20110131 YES 12-FEB-11 0

TEST_TAB1 P_20110201 YES 12-FEB-11 4

TEST_TAB1 P_20110202 YES 12-FEB-11 2

TEST_TAB1 P_20110203 YES 12-FEB-11 1

TEST_TAB1 P_20110204 YES 12-FEB-11 2

TEST_TAB1 P_20110205 YES 12-FEB-11 1

TEST_TAB1 P_20110206 YES 12-FEB-11 1

TEST_TAB1 P_20110207 YES 12-FEB-11 0

TEST_TAB1 P_20110211 YES 12-FEB-11 4

TEST_TAB1 P_20110212 NO

10 rows selected.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name IN ('P_20110201', 'P_20110212')

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANAL NUM_ROWS

-------------- ------------------ --- --------- ----------

TEST_TAB1 P_20110201_LONDON YES 12-FEB-11 1

TEST_TAB1 P_20110201_MOSCOW YES 12-FEB-11 1

TEST_TAB1 P_20110201_OTHERS YES 12-FEB-11 1

TEST_TAB1 P_20110201_SYDNEY YES 12-FEB-11 1


TEST_TAB1 P_20110212_LONDON YES 12-FEB-11 1

TEST_TAB1 P_20110212_MOSCOW YES 12-FEB-11 1

TEST_TAB1 P_20110212_OTHERS YES 12-FEB-11 1

TEST_TAB1 P_20110212_SYDNEY YES 12-FEB-11 1

8 rows selected.

There are no statistics on the new P_20110212 partition because we didn’t copy statistics at the partition level but the subpartition statistics appear to have been copied correctly. However, when we drill down into the column statistics for the new subpartitions there is a clear problem.

Listing 31 – List column statistics for P_20110201 and P_20110212 subpartitions (COPY_STATS.SQL)

SQL> select a.column_name, a.subpartition_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val

4 from dba_subpart_col_statistics a, dba_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 and (a.subpartition_name LIKE 'P_20110201%'

9 or a.subpartition_name LIKE 'P_20110212%')

10 order by a.column_name, a.subpartition_name;

COLUMN_NAME SUBPARTITION_NAME NUM_DISTINCT LOW_VAL HIGH_VAL

-------------------------- ------------------ ------------ -------------------- --------------------

REPORTING_DATE P_20110201_LONDON 1 20110201 20110201

REPORTING_DATE P_20110201_MOSCOW 1 20110201 20110201

REPORTING_DATE P_20110201_OTHERS 1 20110201 20110201

REPORTING_DATE P_20110201_SYDNEY 1 20110201 20110201

REPORTING_DATE P_20110212_LONDON 1 20110201 20110201

REPORTING_DATE P_20110212_MOSCOW 1 20110201 20110201

REPORTING_DATE P_20110212_OTHERS 1 20110201 20110201

REPORTING_DATE P_20110212_SYDNEY 1 20110201 20110201

SEQ_ID P_20110201_LONDON 1 500 500

SEQ_ID P_20110201_MOSCOW 1 1000 1000

SEQ_ID P_20110201_OTHERS 1 400 400

SEQ_ID P_20110201_SYDNEY 1 700 700

SEQ_ID P_20110212_LONDON 1 500 500

SEQ_ID P_20110212_MOSCOW 1 1000 1000

SEQ_ID P_20110212_OTHERS 1 400 400

SEQ_ID P_20110212_SYDNEY 1 700 700

SOURCE_SYSTEM P_20110201_LONDON 1 LONDON LONDON

SOURCE_SYSTEM P_20110201_MOSCOW 1 MOSCOW MOSCOW

SOURCE_SYSTEM P_20110201_OTHERS 1 DALLAS DALLAS

SOURCE_SYSTEM P_20110201_SYDNEY 1 SYDNEY SYDNEY

SOURCE_SYSTEM P_20110212_LONDON 1 LONDON LONDON

SOURCE_SYSTEM P_20110212_MOSCOW 1 MOSCOW MOSCOW

SOURCE_SYSTEM P_20110212_OTHERS 1 DALLAS DALLAS

SOURCE_SYSTEM P_20110212_SYDNEY 1 SYDNEY SYDNEY

STATUS P_20110201_LONDON 1 P P

STATUS P_20110201_MOSCOW 1 P P

STATUS P_20110201_OTHERS 1 P P

STATUS P_20110201_SYDNEY 1 P P

STATUS P_20110212_LONDON 1 P P

STATUS P_20110212_MOSCOW 1 P P

STATUS P_20110212_OTHERS 1 P P

STATUS P_20110212_SYDNEY 1 P P

32 rows selected.

As we might expect, Oracle has merely copied the high and low values for each column from the P_20110201 subpartitions to the equivalent P_20110212 subpartitions. How effective this is likely to be will be system- and data-dependent, and is an important consideration when choosing whether copying statistics is the best solution. For example, the copy process works well for SOURCE_SYSTEM, but perhaps not for the OTHERS list partition that can store multiple SOURCE_SYSTEMs if the range of SOURCE_SYSTEMs is not static.

However, there is one area where we could reasonably expect the copy process to show a bit more intelligence. I have highlighted the most relevant rows. Despite the fact that Oracle has more than enough information to determine the Low and High Values of the REPORTING_DATE partition key column for the new subpartitions (there is a hard limit on the values of REPORTING_DATE that can be stored in them), it has simply copied those values from the P_20110201 partition.

This issue is also covered by bug 8318020 so can be addressed by any of the three methods described in the previous section.

For example, if I run the previous example on a 10.2.0.5 database, the partition key high and low values are updated intelligently.

Listing 32 – COPY_STATS bug number 8318020 (COPY_STATS.SQL)

SQL> select a.column_name, a.partition_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val

4 from user_part_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 and a.partition_name IN ('P_20110201', 'P_20110211')

9 order by a.column_name, a.partition_name;

COLUMN_NAME PARTITION_NAME NUM_DISTINCT LOW_VAL HIGH_VAL

-------------------------- ---------------- ------------ -------------------- --------------------

REPORTING_DATE P_20110201 1 20110201 20110201

REPORTING_DATE P_20110211 1 20110208 20110212

SEQ_ID P_20110201 4 400 1000

SEQ_ID P_20110211 4 400 1000

SOURCE_SYSTEM P_20110201 4 DALLAS SYDNEY

SOURCE_SYSTEM P_20110211 4 DALLAS SYDNEY

STATUS P_20110201 1 P P

STATUS P_20110211 1 P P

8 rows selected.

The observant might have noticed, however, that I have listed partition statistics here and not the subpartition statistics we’ve been working with so far in this example. The simple reason is that, even in 10.2.0.5 and 11.2.0.1, copying subpartition statistics does not update the high and low value for the range partition key that is implicitly defined by the parent partition. Here is a sample of the results after copying subpartition statistics.

Listing 33 – Sample of subpartition column statistics after Subpartition-level copy (COPY_STATS.SQL)

SQL> select a.column_name, a.subpartition_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val

4 from dba_subpart_col_statistics a, dba_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 and (a.subpartition_name LIKE 'P_20110201%'

9 or a.subpartition_name LIKE 'P_20110212%')

10 order by a.column_name, a.subpartition_name;

COLUMN_NAME SUBPARTITION_NAME NUM_DISTINCT LOW_VAL HIGH_VAL

-------------------------- ------------------ ------------ -------------------- --------------------

REPORTING_DATE P_20110201_LONDON 1 20110201 20110201

REPORTING_DATE P_20110201_MOSCOW 1 20110201 20110201

REPORTING_DATE P_20110212_LONDON 1 20110201 20110201

REPORTING_DATE P_20110212_MOSCOW 1 20110201 20110201
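One possible workaround, offered purely as a sketch rather than anything this paper's scripts do, is to correct the partition key's low and high values on the copied subpartition manually after the copy, using the set-statistics approach discussed earlier. The only REPORTING_DATE value this subpartition can hold is 20110212, so both endpoints are set to that value (an assumption derived from the parent partition's boundaries):

-- Keep the copied NDV, density and null count, but overwrite the min/max of
-- the partition key column on the new subpartition with its real boundary value
DECLARE
  l_distcnt NUMBER;
  l_density NUMBER;
  l_nullcnt NUMBER;
  l_avgclen NUMBER;
  l_srec    dbms_stats.statrec;
  l_vals    dbms_stats.numarray := dbms_stats.numarray(20110212, 20110212);
BEGIN
  dbms_stats.get_column_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1',
                              colname => 'REPORTING_DATE',
                              partname => 'P_20110212_MOSCOW',
                              distcnt => l_distcnt, density => l_density,
                              nullcnt => l_nullcnt, srec => l_srec,
                              avgclen => l_avgclen);
  l_srec.epc    := 2;      -- two endpoints: the new low and high values
  l_srec.bkvals := NULL;
  dbms_stats.prepare_column_values(l_srec, l_vals);
  dbms_stats.set_column_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1',
                              colname => 'REPORTING_DATE',
                              partname => 'P_20110212_MOSCOW',
                              distcnt => l_distcnt, density => l_density,
                              nullcnt => l_nullcnt, srec => l_srec,
                              avgclen => l_avgclen);
END;
/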

ORA-07445 / ORA-03113 while copying List partition statistics


The next bug occurs when copying statistics between List partitions (or, in our case, subpartitions). If I run the script from Listing 29 on an Oracle 10.2.0.5 or 11.2.0.1 database, I see different results when I copy the statistics at the Subpartition level.

Listing 34 – Sample of subpartition column statistics after Subpartition-level copy (COPY_STATS.SQL)

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_MOSCOW', dstpartname => 'P_20110212_MOSCOW');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_LONDON', dstpartname => 'P_20110212_LONDON');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_SYDNEY', dstpartname => 'P_20110212_SYDNEY');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_OTHERS', dstpartname => 'P_20110212_OTHERS');

BEGIN dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname =>

'P_20110201_OTHERS', dstpartname => 'P_20110212_OTHERS'); END;

*

ERROR at line 1:

ORA-03113: end-of-file on communication channel

My session is disconnected when I copy the final P_20110212 subpartition statistics, or at least that is how it appears. If I trace the session, I can see the point of failure and that the underlying error is an ORA-07445 in qospMinMaxPartCol.

Listing 35 – COPY_STATS bug number 10268597 (COPY_STATS.SQL)

PARSING IN CURSOR #44 len=38 dep=1 uid=0 oct=7 lid=0 tim=1242214568228618 hv=624276838 ad='a03acac0'
delete from ind_online$ where obj#= :1
END OF STMT

PARSE #44:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1242214568228606

EXEC #44:c=0,e=141,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1242214568228961

STAT #44 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE IND_ONLINE$ (cr=3 pr=0 pw=0 time=101 us)'

STAT #44 id=2 cnt=0 pid=1 pos=1 obj=731 op='TABLE ACCESS FULL IND_ONLINE$ (cr=3 pr=0 pw=0 time=66 us)'

EXEC #7:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1242214568229627

FETCH #7:c=0,e=158,p=0,cr=12,cu=0,mis=0,r=1,dep=1,og=1,tim=1242214568229851

EXEC #40:c=0,e=167,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1242214568230144

FETCH #40:c=0,e=851,p=0,cr=13,cu=0,mis=0,r=1,dep=1,og=1,tim=1242214568231118

EXEC #38:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=1242214568231322

FETCH #38:c=0,e=99,p=0,cr=12,cu=0,mis=0,r=1,dep=1,og=1,tim=1242214568231540

Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x0,

PC: [0x1a1baf7, qospMinMaxPartCol()+3135]

*** 2010-04-23 14:01:57.888 ksedmp: internal or fatal error

ORA-07445: exception encountered: core dump [qospMinMaxPartCol()+3135]

[SIGSEGV] [Address not mapped to object] [0x000000000] [] []

Current SQL statement for this session:

BEGIN dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1',

srcpartname => 'P_20100209_OTHERS', dstpartname => 'P_20100211_OTHERS'); END;

----- PL/SQL Call Stack -----

object line object

handle number name

0xa0625b88 830 package body SYS.DBMS_STATS

0xa0625b88 18292 package body SYS.DBMS_STATS

0x97cbbe60 1 anonymous block

----- Call Stack Trace -----
calling location          call type  entry point
------------------------  ---------  --------------------
ksedst()+31               call       ksedst1()
ksedmp()+610              call       ksedst()
ssexhd()+629              call       ksedmp()
__restore_rt()+0          call       ssexhd()
qospMinMaxPartCol()+3135  signal     __restore_rt()
spefcmpa()+687            call       qospMinMaxPartCol()
spefmccallstd()+218       call       spefcmpa()
pextproc()+41             call       spefmccallstd()

When I investigated whether this was still an error in 11.2, so that I might file a Service Request with Oracle Support, I soon discovered that the real reason is not an error in a background automatic aggregation process (as I initially suspected, because it occurs when the final subpartition statistics are copied) but that the P_20110212_OTHERS subpartition is a DEFAULT list subpartition. Recall our Subpartition Template:

SUBPARTITION TEMPLATE

(SUBPARTITION MOSCOW VALUES ('MOSCOW') TABLESPACE TEST_DATA,

SUBPARTITION LONDON VALUES ('LONDON') TABLESPACE TEST_DATA,

SUBPARTITION SYDNEY VALUES ('SYDNEY') TABLESPACE TEST_DATA,

SUBPARTITION OTHERS VALUES (DEFAULT) TABLESPACE TEST_DATA )

This is bug number 10268597 - ORA-7445 (qospMinMaxPartCol) [ID 1288540.1] in 10.2.0.5 and 11.2.0.2, which is marked as fixed in 11.2.0.3 and 12.1.0.0, so all current installations will encounter this bug unless Oracle provides a backport one-off patch.

COPY_TABLE_STATS SUMMARY

After my initial excitement about the introduction of COPY_TABLE_STATS as a method to simplify the copying of statistics to future partitions and add some intelligence to the process, I find myself underwhelmed.

Positives

 Very fast and low resource method for setting statistics on new partitions

Potential improvements to plan stability when accessing time-period partitions that are filled over time.

Negatives

Many bugs and related patches although this can be largely resolved by using 10.2.0.5 or 11.2

Does not eliminate the difficulty in maintaining accurate Global Statistics. Whilst copying statistics is not designed to solve that problem, collecting partition and subpartition statistics is the much smaller part of the overall statistics maintenance challenge: the volume of data accessed to maintain statistics on individual partitions is comparatively small and does not tend to grow over time in the way that the full table scans needed to gather global statistics do.


Does not work well with composite partitioned tables. Regardless of whether Oracle feel people should use subpartition statistics or not, why support copying statistics between subpartitions whilst:
o Not setting the High and Low Values of the parent Range partition key columns.
o Not copying subpartition statistics when the parent partition statistics are copied, or at least offering the option.

 Does not work in current releases with List Partitioning where there is a DEFAULT partition

APPROX_GLOBAL AND PARTITION

Although COPY_TABLE_STATS does not help with Global Statistics maintenance, Oracle introduced another new feature in 10.2 that might. The new GRANULARITY option is 'APPROX_GLOBAL AND PARTITION', which is similar to 'GLOBAL AND PARTITION' but uses the aggregation process to generate Approximate Global Statistics and therefore does not need to perform a full table scan. Note, however, that it does perform full scans of any indexes to maintain true global statistics on them. There are several interesting notes in the DBMS_STATS package specification.

-- 'APPROX_GLOBAL AND PARTITION' - This option is similar to

-- 'GLOBAL AND PARTITION'. But the global statistics are aggregated

-- from partition level statistics. It will aggregate all statistics

-- except number of distinct values for columns and number of distinct

-- keys of indexes.

-- The existing histograms of the columns at the table level

-- are also aggregated. The global statistics are gathered

-- (i.e., going back to GLOBAL AND PARTITION behaviour)

-- if partname argument is null or if the aggregation cannot be done

-- e.g., statistics of one of the partitions is missing.

-- This option is useful when you collect statistics for a new

-- partition added into a range partitioned table (for example,

-- a table partitioned by month).

-- The new data in the partition makes the global statistics stale

-- (especially the min/max values of the partitioning column).

-- This stale global statistics may cause suboptimal plans.

-- In this scenario, users can collect statistics for the newly added

-- partition with 'APPROX_GLOBAL AND PARTITION' option so that the

-- global statistics will reflect the newly added range.

-- This option will take less time than 'GLOBAL AND PARTITION' option

-- since the global statistics are aggregated from underlying

-- partition level statistics.

-- Note that, if you are using APPROX_GLOBAL AND PARTITION,

-- you still need to collect global statistics (with granularity =

-- 'GLOBAL' option)

-- when there is substantial amount of change at the table level.

-- For example you added 10% more data to the table.

-- This is needed to get the correct number of distinct values/keys

-- statistic at table level.

I’ll be running the rest of the examples against an Oracle 11.2.0.1 database, where APPROX_GLOBAL AND PARTITION is a valid option although, for reasons that will become apparent, this option is only an appropriate solution for Oracle 10.2 databases.

First the table is truncated, rows inserted for reporting_date 20110201 and statistics gathered using the default GRANULARITY => 'ALL', which will result in true Global Statistics at all levels of TEST_TAB1.

Listing 36 – List table and partition statistics (APPROX_GLOBAL2.SQL)

SQL> TRUNCATE TABLE TEST_TAB1;

Table truncated.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')


PL/SQL procedure successfully completed.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 2000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 4000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 60000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 100, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 300, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 6000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 8000, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 06-FEB-2011 19:26:50 11

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 06-FEB-2011 19:26:50 0


TEST_TAB1 P_20110201 YES 06-FEB-2011 19:26:50 11

TEST_TAB1 P_20110202 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110203 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110204 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110205 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110206 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110207 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110209 YES 06-FEB-2011 19:26:50 0

9 rows selected.

SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110201'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110201_LONDON YES 06-FEB-2011 19:26:49 1

TEST_TAB1 P_20110201_MOSCOW YES 06-FEB-2011 19:26:49 8

TEST_TAB1 P_20110201_OTHERS YES 06-FEB-2011 19:26:50 1

TEST_TAB1 P_20110201_SYDNEY YES 06-FEB-2011 19:26:50 1

SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------

REPORTING_DATE 1 20110201 20110201 NUMBER

SEQ_ID 11 10 60000 NUMBER

SOURCE_SYSTEM 4 DALLAS SYDNEY VARCHAR2

STATUS 2 P U VARCHAR2

The statistics all look as we might expect, so next I'll insert rows for the new reporting_date 20110202 and gather statistics using the new GRANULARITY => 'APPROX_GLOBAL AND PARTITION' for the single P_20110202 partition.

Listing 37 – List table and partition statistics (APPROX_GLOBAL2.SQL)

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 2000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 4000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 60000, 'N');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 100, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 10, 'N');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 300, 'N');


1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 5000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 6000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 7000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 8000, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity => 'APPROX_GLOBAL AND PARTITION', partname => 'P_20110202' );

PL/SQL procedure successfully completed.

Looking at the updated statistics below, it would appear that the row counts on the table have been updated accurately and statistics on the 20110202 partition look correct.

Listing 38 – List table and partition statistics (APPROX_GLOBAL2.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 06-FEB-2011 19:26:50 22

SQL> select table_name, partition_name, global_stats, last_analyzed, num_rows

2 from user_tab_partitions

3 where table_name='TEST_TAB1'

4 order by partition_name;

TABLE_NAME PARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ---------------- --- -------------------- ----------

TEST_TAB1 P_20110131 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110201 YES 06-FEB-2011 19:26:50 11

TEST_TAB1 P_20110202 YES 06-FEB-2011 19:26:50 11

TEST_TAB1 P_20110203 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110204 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110205 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110206 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110207 YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110209 YES 06-FEB-2011 19:26:50 0

9 rows selected.

However, as might be expected from the parameter value passed, subpartition statistics have not been gathered, just as they would not have been with GRANULARITY=>’GLOBAL AND PARTITION’ used to gather partition statistics and true global statistics.

Listing 39 – List subpartition statistics (APPROX_GLOBAL2.SQL)


SQL> select table_name, subpartition_name, global_stats, last_analyzed, num_rows

2 from user_tab_subpartitions

3 where table_name='TEST_TAB1'

4 and partition_name='P_20110202'

5 order by subpartition_name;

TABLE_NAME SUBPARTITION_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- ------------------ --- -------------------- ----------

TEST_TAB1 P_20110202_LONDON YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110202_MOSCOW YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110202_OTHERS YES 06-FEB-2011 19:26:50 0

TEST_TAB1 P_20110202_SYDNEY YES 06-FEB-2011 19:26:50 0

The statistics on the subpartitions are those gathered by the initial GRANULARITY=>'ALL' default and show zero rows. If we are working with composite partitioned objects and have decided that subpartition statistics are valuable, so are gathering them (or, in this case, Oracle has gathered them via GRANULARITY => 'ALL'), then we need to make additional calls to DBMS_STATS with GRANULARITY => 'SUBPARTITION' for each of the underlying subpartitions we've just loaded with data. This seems like an acceptable overhead to maintain the global statistics at the higher levels of the object.
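A sketch of those additional subpartition-level calls, assuming the script runs as TESTUSER and the subpartition names follow the template used earlier:

-- Gather subpartition-level statistics for each subpartition of P_20110202;
-- the Partition and (approximate) Global levels were handled by the previous call
BEGIN
  FOR r IN (SELECT subpartition_name
            FROM   user_tab_subpartitions
            WHERE  table_name     = 'TEST_TAB1'
            AND    partition_name = 'P_20110202')
  LOOP
    dbms_stats.gather_table_stats(ownname     => 'TESTUSER',
                                  tabname     => 'TEST_TAB1',
                                  partname    => r.subpartition_name,
                                  granularity => 'SUBPARTITION');
  END LOOP;
END;
/

As always, let's check the column statistics to verify the quality of what has been gathered.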

Listing 40 – List approximate global column statistics (APPROX_GLOBAL2.SQL)

SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------

REPORTING_DATE 2 20110201 20110202 NUMBER

SEQ_ID 11 10 60000 NUMBER

SOURCE_SYSTEM 4 CHICAGO SYDNEY VARCHAR2

STATUS 2 N U VARCHAR2

Although the column High and Low values have been updated correctly, the NDVs are incorrect again. SOURCE_SYSTEM should show NUM_DISTINCT of 5 and STATUS has three distinct values. These Approximate Global Statistics suffer precisely the same issue as those generated by the aggregation process. Row counts and High/Low values are correct but Oracle simply can’t derive NDVs accurately by looking at the NDV of a single partition.

APPROX_GLOBAL AND PARTITION SUMMARY

What, then, does APPROX_GLOBAL AND PARTITION actually deliver? Very little on the face of it, although I suppose it works better in combination with occasional regathering of stale global statistics, because there are unlikely to be the aggregation issues seen when combining true global statistics with aggregated statistics, as highlighted earlier. It should be clear that APPROX_GLOBAL AND PARTITION used in isolation does not solve the challenge of accurate global statistics.

11G IMPROVEMENTS

INCREMENTAL STATISTICS


Recognising that maintaining Global Statistics is the most formidable challenge for those managing large partitioned objects, Oracle has implemented a more efficient solution in Oracle 11g – Incremental Statistics. The difficulty in maintaining accurate global statistics is that the default process is

i.  Perform a full table scan to gather global statistics.
ii. Gather statistics on modified partitions.

As tables grow to hundreds of millions or billions of rows, i. becomes so expensive as to be unworkable and, as we have seen, the aggregation process to generate Approximate Global Statistics yields very inaccurate NDVs. Oracle 11g resolves these issues by storing synopses in the data dictionary describing the distinct column values contained in the various levels of the table (e.g. Table and Partition). Synopses can be merged to generate aggregated synopses at the higher levels of the object, so the process for regathering statistics for new partitions becomes the much more efficient

i.   Gather statistics and create synopses for the changed partitions only.
ii.  Oracle automatically merges partition-level synopses into a global synopsis.
iii. The global statistics are automatically derived from the partition-level statistics and global synopses.

Storing and processing of synopses is controlled via the INCREMENTAL table preference. This is set to ‘FALSE’ by default so the database uses the historic approach of performing a full table scan to maintain global statistics. To use incremental statistics, all of the following have to be true.

1. The INCREMENTAL value for the partitioned table is TRUE.
2. The PUBLISH value for the partitioned table is the default TRUE.
3. The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table. (Note that these are the default values; a quick way to confirm all three preferences is sketched below.)
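As a quick sanity check (a sketch using the DBMS_STATS.GET_PREFS function available in 11g; preference names as I understand them), the relevant settings can be confirmed for the example table like this:

-- Confirm the preferences that incremental global statistics depend on
SELECT dbms_stats.get_prefs('INCREMENTAL',      'TESTUSER', 'TEST_TAB1') AS incremental,
       dbms_stats.get_prefs('PUBLISH',          'TESTUSER', 'TEST_TAB1') AS publish,
       dbms_stats.get_prefs('ESTIMATE_PERCENT', 'TESTUSER', 'TEST_TAB1') AS estimate_percent,
       dbms_stats.get_prefs('GRANULARITY',      'TESTUSER', 'TEST_TAB1') AS granularity
FROM   dual;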

Requirement 3 is likely to strike fear into the experienced Oracle heart because Oracle’s earlier attempts at AUTO_SAMPLE_SIZE have typically been time-consuming, resource-intensive and difficult to predict. However, 11g uses one-pass distinct sampling 11 that dramatically improves the sampling process so that much higher percentages can be sampled at a much lower cost. It has become sensible to leave ESTIMATE_PERCENT at its default setting of AUTO_SAMPLE_SIZE unless faced with extreme performance problems gathering statistics.

First I will set the INCREMENTAL table preference for TEST_TAB1, repopulate it with some initial data and gather statistics using default values for all other options.

Listing 41 – Modify TEST_TAB1 to use incremental statistics, repopulate and gather statistics (INCREMENTAL.SQL)

SQL> TRUNCATE TABLE TEST_TAB1;

Table truncated.

SQL> exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_prefs('TESTUSER','TEST_TAB1','INCREMENTAL','TRUE')

PL/SQL procedure successfully completed.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

1 row created.

11 This process is described in great detail in Amit Poddar's excellent paper on One pass distinct sampling (see Appendix A).

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 2000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 4000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 60000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 100, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 300, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 6000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 8000, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 13-FEB-2011 11:31:06 11

SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- -------------------

REPORTING_DATE 1 20110201 20110201 NUMBER

SEQ_ID 11 10 60000 NUMBER


SOURCE_SYSTEM 4 DALLAS SYDNEY VARCHAR2

STATUS 2 P U VARCHAR2

As expected, the statistics are global and the column statistics look reasonable but the real challenge arrives as we start to add data to the P_20110202 partition.

Listing 42 – Add 20110202 data, regather statistics and list Table-level statistics (INCREMENTAL.SQL)

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 2000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 4000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 60000, 'N');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 100, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 10, 'N');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 300, 'N');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 5000, 'U');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 6000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 7000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 8000, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

PL/SQL procedure successfully completed.

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from dba_tables

3 where table_name='TEST_TAB1'

4 and owner='TESTUSER'

5 order by 1, 2, 4 desc nulls last;

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 13-FEB-2011 11:31:07 22


SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- --------------------

REPORTING_DATE 2 20110201 20110202 NUMBER

SEQ_ID 11 10 60000 NUMBER

SOURCE_SYSTEM 5 CHICAGO SYDNEY VARCHAR2

STATUS 3 N U VARCHAR2

Pleasingly, the NDVs finally look accurate and repeating the process for further partitions illustrates that all of the Global Statistics are being maintained accurately at a very low cost 12.

12 For a nice illustration of just how effectively costs are reduced, see Greg Rahn’s blog post on Incremental Statistics at http://structureddata.org/2008/07/16/oracle-11g-incremental-global-statistics-on-partitioned-tables/


Listing 43 – Global Statistics with P20110203 and P_20110204 data versus true contents (INCREMENTAL.SQL)

SQL> select table_name, global_stats, last_analyzed, num_rows

2 from user_tables

3 where table_name='TEST_TAB1';

TABLE_NAME GLO LAST_ANALYZED NUM_ROWS

-------------- --- -------------------- ----------

TEST_TAB1 YES 13-FEB-2011 11:31:08 44

SQL> select a.column_name, a.num_distinct,

2 display_raw(a.low_value,b.data_type) as low_val,

3 display_raw(a.high_value,b.data_type) as high_val, b.data_type

4 from user_tab_col_statistics a, user_tab_cols b

5 where a.table_name='TEST_TAB1'

6 and a.table_name=b.table_name

7 and a.column_name=b.column_name

8 order by a.column_name;

COLUMN_NAME NUM_DISTINCT LOW_VAL HIGH_VAL DATA_TYPE

-------------------------- ------------ -------------------- -------------------- --------------------

REPORTING_DATE 4 20110201 20110204 NUMBER

SEQ_ID 14 1 80000 NUMBER

SOURCE_SYSTEM 5 CHICAGO SYDNEY VARCHAR2

STATUS 4 N Z VARCHAR2

SQL> select reporting_date, count(*) from test_tab1 group by reporting_date;

REPORTING_DATE COUNT(*)

-------------- ----------

20110201 11

20110202 11

20110203 11

20110204 11

4 rows selected.

SQL> select source_system, count(*) from test_tab1 group by source_system;

SOURCE_SYSTEM COUNT(*)

------------------------------ ----------

MOSCOW 8

LONDON 1

SYDNEY 1

CHICAGO 22

DALLAS 12

5 rows selected.

SQL> select status, count(*) from test_tab1 group by status;

S COUNT(*)

- ----------

U 8

P 27

N 7

Z 2

4 rows selected.

SQL> select seq_id, count(*) from test_tab1 group by seq_id;

SEQ_ID COUNT(*)

---------- ----------

100 4

1000 4

4000 4

5000 4

1 1

400 1


60000 3

300 4

2000 4

80000 1

8000 3

6000 4

10 4

7000 3

14 rows selected.

The combination of Incremental Statistics and One-pass Distinct Sampling appears to deliver on the promise of a solution to low-cost global statistics maintenance for large partitioned objects that receive regular updates. However, there are a number of issues to be aware of before using Incremental Statistics.

INCREMENTAL STATISTICS BUGS, ISSUES AND USAGE

SYSAUX tablespace usage

The documentation states clearly that the synopsis information will consume space in the SYSAUX tablespace, but this could mean very large data volumes. For example, bug 8411774 was filed for a customer who was concerned that 6GB of their 12GB SYSAUX Tablespace was being consumed by synopses 13. These are stored in two main tables - WRI$_OPTSTAT_SYNOPSIS_HEAD$ and WRI$_OPTSTAT_SYNOPSIS$. Oracle's response to the bug is 'This is not a bug in case INCREMENTAL is set to TRUE on partitioned tables. It works as designed'.
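Before enabling INCREMENTAL widely, it is worth checking how much space the synopsis tables are already using. A simple sketch against DBA_SEGMENTS (run as a suitably privileged user; indexes on these tables are not included):

-- Current size of the synopsis tables in SYSAUX
SELECT segment_name, ROUND(SUM(bytes)/1024/1024) AS size_mb
FROM   dba_segments
WHERE  segment_name IN ('WRI$_OPTSTAT_SYNOPSIS_HEAD$', 'WRI$_OPTSTAT_SYNOPSIS$')
GROUP  BY segment_name;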

We can investigate the contents of these tables to see what is being stored using a query courtesy of Robin Moffatt and his blog post describing a significant Incremental Statistics performance issue at http://rnm1978.wordpress.com/2010/12/31/datawarehousing-and-statistics-in-oracle-11g-incremental-global-statistics/

13 Note that this is not the same as the issue of optimiser statistics history consuming space in the SYSAUX tablespace, which can be resolved by reducing the default statistics history retention period of 31 days.


Listing 44 – Viewing Synopses

SQL> SELECT u.NAME "owner",
         o.NAME "table_name",
         p.subname "partition_name",
         c.NAME "column_name",
         to_char(h.analyzetime, 'YYYY-MM-DD-HH24:MI:SS') "analyse_Time",
         COUNT(*) "hash entries"
     FROM sys.wri$_optstat_synopsis_head$ h
          left join sys.wri$_optstat_synopsis$ s ON h.synopsis# = s.synopsis#
          left join sys.obj$ o ON h.bo# = o.obj#
          left join sys.user$ u ON o.owner# = u.user#
          left join sys.col$ c ON h.bo# = c.obj# AND h.intcol# = c.intcol#
          left join (SELECT bo#, obj# FROM sys.tabpart$
                     UNION ALL
                     SELECT bo#, obj# FROM sys.tabcompart$) tp
               ON h.bo# = tp.bo# AND h.group# = tp.obj# * 2
          left join sys.obj$ p ON tp.obj# = p.obj#
     GROUP BY u.NAME, o.NAME, p.subname, c.NAME, h.analyzetime
     ORDER BY u.NAME, o.NAME, c.NAME;

owner table_name partition_name column_name analyse_Time hash entries

---------- -------------- ---------------- ---------------- ------------------- ------------

TESTUSER TEST_TAB1 P_20110131 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110201 REPORTING_DATE 2011-02-13-13:37:27 1

TESTUSER TEST_TAB1 P_20110202 REPORTING_DATE 2011-02-13-13:37:29 1

TESTUSER TEST_TAB1 P_20110203 REPORTING_DATE 2011-02-13-13:37:52 1

TESTUSER TEST_TAB1 P_20110204 REPORTING_DATE 2011-02-13-13:38:54 1

TESTUSER TEST_TAB1 P_20110205 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110206 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110207 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110211 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110212 REPORTING_DATE 1

TESTUSER TEST_TAB1 P_20110131 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110201 SEQ_ID 2011-02-13-13:37:27 11

TESTUSER TEST_TAB1 P_20110202 SEQ_ID 2011-02-13-13:37:29 11

TESTUSER TEST_TAB1 P_20110203 SEQ_ID 2011-02-13-13:37:52 11

TESTUSER TEST_TAB1 P_20110204 SEQ_ID 2011-02-13-13:38:54 11

TESTUSER TEST_TAB1 P_20110205 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110206 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110207 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110211 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110212 SEQ_ID 1

TESTUSER TEST_TAB1 P_20110131 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110201 SOURCE_SYSTEM 2011-02-13-13:37:27 4

TESTUSER TEST_TAB1 P_20110202 SOURCE_SYSTEM 2011-02-13-13:37:29 1

TESTUSER TEST_TAB1 P_20110203 SOURCE_SYSTEM 2011-02-13-13:37:52 1

TESTUSER TEST_TAB1 P_20110204 SOURCE_SYSTEM 2011-02-13-13:38:54 1

TESTUSER TEST_TAB1 P_20110205 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110206 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110207 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110211 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110212 SOURCE_SYSTEM 1

TESTUSER TEST_TAB1 P_20110131 STATUS 1

TESTUSER TEST_TAB1 P_20110201 STATUS 2011-02-13-13:37:27 2

TESTUSER TEST_TAB1 P_20110202 STATUS 2011-02-13-13:37:29 3

TESTUSER TEST_TAB1 P_20110203 STATUS 2011-02-13-13:37:52 3

TESTUSER TEST_TAB1 P_20110204 STATUS 2011-02-13-13:38:54 4

TESTUSER TEST_TAB1 P_20110205 STATUS 1

TESTUSER TEST_TAB1 P_20110206 STATUS 1

TESTUSER TEST_TAB1 P_20110207 STATUS 1

TESTUSER TEST_TAB1 P_20110211 STATUS 1

TESTUSER TEST_TAB1 P_20110212 STATUS 1

2011-02-13-12:10:01 7

41 rows selected.

As Robin says: "Looking at the data on WRI$_OPTSTAT_SYNOPSIS$, it can be determined that for every table, partition, and column, there is an entry on the header table WRI$_OPTSTAT_SYNOPSIS_HEAD$. There is a unique synopsis number given to each combination, which has one or many entries on the synopsis hash table WRI$_OPTSTAT_SYNOPSIS$. There seems to be one hash for every distinct value in the table/partition/column."

Armed with this knowledge, you should be able to calculate the synopsis storage requirements based on the number of tables, partitions and columns you want to gather Incremental Statistics for, and how many distinct values are likely to be contained in your data. I'd argue that the space requirements are acceptable for the significant benefits on what is already a large database, were it not for the next issue.
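As a very rough sizing sketch for a single table (it assumes one header row per partition/column pair and, pessimistically, that every distinct value appears in every partition, so treat the hash-row figure as an upper bound):

-- Order-of-magnitude estimate of synopsis rows for one table
SELECT p.table_name,
       COUNT(*)            AS approx_header_rows,       -- one per partition/column pair
       SUM(c.num_distinct) AS approx_hash_rows_bound    -- table-level NDV counted once per partition
FROM   user_tab_partitions p
       JOIN user_tab_col_statistics c ON c.table_name = p.table_name
WHERE  p.table_name = 'TEST_TAB1'
GROUP  BY p.table_name;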

Performance issues with large synopses

As Robin describes in his post, the main issue is the impact that large synopsis tables have on the performance of statistics collection. He shows an example of a very small table (a couple of partitions and a couple of rows) for which gathering statistics should take less than a second, but rises to over three minutes when executed on a database that already has a large number of rows in the synopsis tables for other, unrelated tables. Most of the three minutes is spent on the following DELETE statement.

DELETE /*+ use_nl(h s) leading(h s) index(s i_wri$_optstat_synopsis) */
FROM   WRI$_OPTSTAT_SYNOPSIS$ S
WHERE  S.SYNOPSIS# IN (SELECT /*+ unnest */ H.SYNOPSIS#
                       FROM   SYS.WRI$_OPTSTAT_SYNOPSIS_HEAD$ H
                       WHERE  H.BO# = :B2 AND H.GROUP# = :B1 * 2)

As Greg Rahn points out in his comments on the post, this is likely to be a bug in Oracle 11.1.0.7 and, although the issue still exists in 11.2.0.1, I believe from discussion with Robin that it could be a result of the WRI$_OPTSTAT_SYNOPSIS$ table not being partitioned, which it is in 11.2.0.2. I have not tested whether this will resolve the performance problem.

Partition Exchange

As discussed earlier, there are a couple of options when using partition exchange to load data.

1. Gather statistics on the load table before partition exchange

2. Gather statistics on the partition after partition exchange

When using Incremental Statistics, you must use approach 2 because Oracle can’t use the values in the load table to merge synopses and update the synopses at higher levels.
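As a hedged sketch of the exchange-then-gather order this implies, assuming a simple range-partitioned table rather than the composite-partitioned test table, and using illustrative object names:

-- Enable Incremental Statistics for the partitioned table (11g table preference).
exec dbms_stats.set_table_prefs('TESTUSER', 'TEST_TAB1', 'INCREMENTAL', 'TRUE');

-- Exchange the loaded table into the target partition first...
ALTER TABLE test_tab1 EXCHANGE PARTITION p_20110209 WITH TABLE load_tab1;

-- ...then gather on that partition. The partition-level synopsis is created here
-- and merged with the existing synopses to refresh the global statistics without
-- a full table scan.
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', partname => 'P_20110209', granularity => 'AUTO');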

Global Index Statistics

Although Incremental Statistics means that we no longer need to perform a full table scan to update global statistics, because Oracle is able to merge the partition-level synopses to update the table-level synopses, index statistics are handled differently. Oracle will still perform a full scan of the index in order to maintain the global index statistics.

INCREMENTAL STATISTICS SUMMARY


I feel confident saying that the introduction of Incremental Statistics alongside One-pass Distinct Sampling is the solution to maintaining statistics on large partitioned objects that users have been looking for. Once any lingering performance issues are ironed out, I expect it to become the recognized default approach for partitioned tables.

Positives

 Very low resource usage and elapsed time for maintaining global statistics.

 NDVs that are very close to those from a full gather.

Negatives

Requirement to modify any existing gather-and-exchange statistics collection code to exchange-then-gather.

Implementing for existing large tables will require an initial gather of global statistics that involves a full table scan and synopsis generation. This is a price worth paying but needs to be planned.

CONCLUSIONS

Maintaining accurate statistics on partitioned objects is a significant challenge as many designers, developers and architects of large Oracle databases would testify. The fundamental trade-off is between time and resources spent collecting statistics and the quality of the results. Statistics quality has a profound impact on the cost calculations of the optimiser so poor statistics often lead to bad execution plans and extremely poor performance.

The variety of improvements Oracle has developed to address the trade-off complicates the picture, and without a clear strategy it is extremely easy to create new problems while trying to address existing ones. The best strategy for one system could be completely different from another, so this section offers guidelines and directions rather than prescriptive solutions.

Summary of Issues

The Number of Distinct Values (NDV) figures calculated by the aggregation process are very poor, and NDVs are a key input to the CBO’s calculations. Only properly gathered global statistics are likely to result in consistently good execution plans.

DBMS_STATS will only aggregate statistics when all of the underlying components have statistics. If there are any missing statistics, the existing aggregated statistics will also go missing, leading to dynamic sampling.

DBMS_STATS will never overwrite gathered global statistics with aggregated global statistics unless you are using ‘APPROX_GLOBAL AND PARTITION’. You must maintain any true global statistics that you have gathered or they will become stale rapidly.

The default setting of _minimal_stats_aggregation means that, unless you explicitly gather statistics on the partitions you've just exchanged, aggregation will not take place. Using aggregation with a gather-then-exchange approach therefore requires a non-default _minimal_stats_aggregation setting, which is something you must discuss with Oracle Support first.

You must use an exchange-then-gather approach if you are using Incremental Statistics because Oracle won’t maintain synopses on the load table.


There are a number of COPY_TABLE_STATS bugs that are fixed in later releases and one or two that remain unfixed, particularly for composite or list partitioned tables.

APPROX_GLOBAL statistics still suffer from poor aggregated NDVs.

Guidelines and tips

Checking that statistics have been gathered correctly includes looking at the detailed results. It is far too easy to look at NUM_ROWS and LAST_ANALYZED and satisfy yourself that statistics are in place. To check the quality of those statistics, it’s essential to look at High and Low Values and NDVs, and at all levels of the partitioned object.

USER_TAB_STATS_HISTORY can be useful in determining which levels statistics were gathered on and when.
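For example, a quick look at the gather history for the test table might use something like the following sketch (the view has been available since 10g); NULL partition and subpartition names indicate table-level operations.

-- Sketch: when and at which level statistics were last gathered or modified.
select table_name, partition_name, subpartition_name, stats_update_time
from user_tab_stats_history
where table_name='TEST_TAB1'
order by stats_update_time desc;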

Pay particular attention to the GLOBAL_STATS column, which indicates whether the statistics at each level are true (gathered) or aggregated global statistics.

DBMS_STATS is constantly evolving. It is therefore important to be on the latest versions of Oracle where possible and to be aware of the various patches that should be applied where not.

If you have developed your own wrapper code for your statistics gathering processes you should always gather statistics using the same code. Manual statistics gathering exercises are likely to introduce inappropriate DBMS_STATS parameter settings due to human error.

Once you have set your statistics appropriately you should lock them to prevent them being regathered in error in the future, either by manual intervention or by the auto stats gather job. Your statistics management code can unlock any statistics that need to change subsequently.
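A minimal sketch of that approach, using the DBMS_STATS lock and unlock calls (object and partition names are illustrative):

-- Lock the statistics once they have been set or gathered as intended.
exec dbms_stats.lock_table_stats('TESTUSER', 'TEST_TAB1');

-- Later, only the managed statistics code unlocks, refreshes and re-locks them.
exec dbms_stats.unlock_table_stats('TESTUSER', 'TEST_TAB1');
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', partname => 'P_20110209');
exec dbms_stats.lock_table_stats('TESTUSER', 'TEST_TAB1');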

Because bad or missing statistics can cause such serious performance problems, it may be worth implementing scheduled jobs to check that statistics are being gathered correctly according to your overall approach and are in place.

Statistics collection takes time and resources but setting the DEGREE parameter to use parallel execution can make run times more bearable on modern servers with spare capacity.
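For example (the degree value is illustrative and should be sized against the capacity you actually have spare):

-- Gather using parallel degree 4, assuming spare CPU and I/O capacity.
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', degree => 4);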

Recommendations

Use Oracle’s default settings wherever possible. Despite my own bad experiences over the years with default settings causing performance problems, I think there is too strong a tendency to implement site-specific or individual solutions based on past experiences. DBMS_STATS is constantly evolving. However, if default settings do cause performance issues, I’d have no hesitation in filing a Service Request and then implementing an alternative.

Decide on an overall statistics collection strategy, develop any required PL/SQL code and scheduled jobs and stick to it.

Redesign it later if necessary, but don’t chop and change between strategies on a whim or through emergency manual intervention. In most of the worst cases of performance problems caused by missing or out-of-date statistics that I’ve seen, the root cause has been a mixture of approaches applied by people unfamiliar with the system: for example, manually gathering statistics on an object using DBMS_STATS options that are incompatible with the regular statistics collection approach, or inadvertently having both the auto stats gather job and site-specific processes enabled.

Collecting accurate global statistics can be an expensive proposition but is essential to achieve optimal execution plans.


o If using Oracle 11g you should definitely try Incremental Statistics, although ensure that you have all of the relevant patches applied; the best solution is likely to be the 11.2.0.2 patchset.

o If using Oracle 10g or below then you should gather at all levels with GRANULARITY => 'AUTO' if at all possible.

o If that is not possible due to time or resource constraints, a combination of slightly stale Global statistics and up-to-date Partition statistics is probably preferable to Aggregated Global statistics. This implies an approach of gathering statistics as new partitions appear (GRANULARITY => 'PARTITION' with a specific PARTNAME), along with an occasional gather at GRANULARITY => 'GLOBAL' level to refresh the global statistics, as shown in the sketch after this list.

o As the poorest option, use the aggregation process in the knowledge that the global NDVs will be inaccurate and that it is very easy to misuse the aggregation process.
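A minimal sketch of that 10g-style routine, using the partition naming from the test scripts (both calls are illustrative):

-- Gather on each new partition as it is loaded - a cheap, partition-only scan.
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'PARTITION', partname => 'P_20110209');

-- Run far less frequently (e.g. weekly) to refresh the true global statistics.
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'GLOBAL');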

Picking the right sampling percentage (see the example after this list):

o If time allows, use AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT in 11g because it will deliver very high quality statistics at a reasonably low cost using one-pass distinct sampling.

o Use AUTO_SAMPLE_SIZE where time allows on previous versions.

o Use lower ESTIMATE_PERCENT values if it is essential to get some statistics on an object, but be aware that as you reduce the sampling percentage, the quality of the statistics will degrade.
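For example, an 11g gather relying on one-pass distinct sampling might look like this sketch (AUTO_SAMPLE_SIZE is the DBMS_STATS package constant):

-- AUTO_SAMPLE_SIZE allows 11g to use one-pass distinct sampling for near-exact NDVs.
exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', estimate_percent => dbms_stats.auto_sample_size);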

Don’t use Dynamic Sampling as the core component of your statistics collection strategy.

Maintaining statistics on Composite Partitioned objects is slightly trickier and DBMS_STATS support can be a little patchy. I wouldn’t suggest designing your physical data model based on this but consider whether subpartitions are delivering any benefit and whether the statistics on them are actually being used.

Although part of me can understand the resistance to applying new patch-sets because of the additional testing, change management and outages required, I think it's asking for trouble to be on older patch-sets if you're using newer features such as COPY_TABLE_STATS or Incremental Statistics. For example, as far as I'm aware COPY_TABLE_STATS is not truly supported in 10.2.0.3 but I've seen a number of people using it.

Developing an effective statistics maintenance strategy can be a very time-consuming process that requires careful design and testing across large data volumes and a variety of queries. Working within modern development cycles, it can be difficult to justify the time required without obvious business functionality being delivered, but I would argue that nothing is as likely to deliver the business’s performance expectations across a wide range of queries, including those yet to be developed, while requiring less manual tuning effort. Fortunately, Oracle is starting to introduce features that make the technical challenges less formidable.

ACKNOWLEDGEMENTS

I’d like to thank the following individuals:

 Fellow travelers on the long and winding road to stats nirvana – Jari Kuhanen, Vadim Gotovsky and Steven Holder (all UBS), Randolf Geist, Martin Widlake and Jonathan Lewis.

 Robin Moffat for agreeing to let me share the information and queries from his Incremental Statistics performance blog post (see References).

 Greg Rahn (Oracle), for the handy display_raw function and consistently good stats-related posts.


 Maria Colgan (Oracle), for sanity-checking several of the blog posts that spawned the idea for this paper.

ABOUT THE AUTHOR

Doug Burns is an independent consultant with over 20 years’ experience of working with Oracle in a range of industries and applications, and has worked as a course instructor and technical editor for both Oracle UK and Learning Tree International. He can be contacted at dougburns@yahoo.com and this document, other articles and his blog can be found at http://oracledoug.com. He is a member of the Oak Table Network and an Oracle ACE Director.


APPENDIX A - ADDITIONAL RESOURCES

DOCUMENTATION AND SUPPORT NOTES

Managing Optimizer Statistics (11.2) http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/stats.htm

There are also a number of useful resources on My Oracle Support

236935.1 – Global Statistics – An Explanation

44961.1 - Statistics Gathering; Frequency and Strategy

BLOGS

Some of the best information about DBMS_STATS and optimiser behaviour can be found in blog postings because these are likely to delve into specific narrow issues in more detail.

http://blogs.oracle.com/optimizer - The home of the Optimizer Development Group, so posts all tend to be focused on statistics or optimiser behaviour and the quality is uniformly excellent.

http://structureddata.org - The home of Greg Rahn, a performance engineer working in Oracle’s Real World Performance Group. Although the range of subjects is wider than the optimiser, there are many useful related posts there, including the following post that includes the DISPLAY_RAW function I have used throughout the examples.

http://structureddata.org/2008/03/26/choosing-an-optimal-stats-gathering-strategy/

http://oracledoug.com/serendipity - The author’s blog, including the statistics-related posts that originally prompted this paper.

Individual Posts

Performance issue with missing Global Statistics
http://mwidlake.wordpress.com/2010/02/16/stats-need-stats-to-gather-stats/

http://optimizermagic.blogspot.com/2009/02/maintaining-statistics-on-large.html

http://blogs.oracle.com/optimizer/2010/08/dynamic_sampling_and_its_impact_on_the_optimizer.html

http://askdba.org/weblog/2009/06/dbms_stats-copy_table_stats-does-not-alter-lowhigh-value/

http://antognini.ch/2008/10/granularity-approx_global-and-partition/

http://martincarstenbach.wordpress.com/2010/06/07/incrementally-gathering-statistics-in-10g/

http://optimizermagic.blogspot.com/2009/08/understanding-dbmsstatssetprefs.html

http://blogs.oracle.com/optimizer/2011/01/gathering_optimizer_statistics_is_one.html


WHITE PAPERS

One Pass Distinct Sampling by Amit Poddar
http://www.oraclegeek.net/downloads/OnePassDistinctSampling.pdf

Upgrading from Oracle Database 10g to 11g: What to expect from the optimizer by Maria Colgan
http://www.oracle.com/technetwork/database/features/bi-datawarehousing/twp-upgrading-10g-to-11g-what-to-ex-133707.pdf

BUGS AND PATCHES

Patch 8866627: MERGE LABEL REQUEST ON TOP OF 10.2.0.4.0 FOR BUGS 8318020 7475493

This is the most important patch to apply if you are using any of the newer features. It contains fixes for around a dozen bugs and I recommend you apply it unless you are already running 10.2.0.5 or 11.2.0.x. The bugs addressed include: 7475493, 7116357, 7463138, 6718212, 7441785, 6526370, 8318020.

Bug 8719831 - Incomplete statistics on partition level using COPY_TABLE_STATS – fixed in 11.2.0.2

Incorrect density on columns after copying stats

Bug 8318020 - Slow DBMS_STATS / Inaccurate or Missing Statistics - fixed in MLR 8866627, 10.2.0.5 and 11.2.0.1

A range of issues, some apparently unrelated

Bug 10234419 - Extend dbms_stats.copy_table_stats to all range partitioning key columns – fixed in 12.1

Copy stats does not adjust values for partition key columns other than the first column


APPENDIX B – LISTINGS

This appendix contains the full versions of the scripts executed to test the various statistics collection features. The listings in the main body of the paper use small excerpts of output when these scripts were executed.

INIT.SQL

@@defs -- Contains default column width settings etc
spool init.txt

PROMPT ****************************************************

PROMPT First build the table and index and INSERT some data

PROMPT ****************************************************

DROP TABLE TEST_TAB1;

CREATE TABLE TEST_TAB1

(

REPORTING_DATE NUMBER NOT NULL,

SOURCE_SYSTEM VARCHAR2(30 CHAR) NOT NULL,

SEQ_ID NUMBER NOT NULL,

STATUS VARCHAR2(1 CHAR) NOT NULL

)

PARTITION BY RANGE (REPORTING_DATE)

SUBPARTITION BY LIST (SOURCE_SYSTEM)

SUBPARTITION TEMPLATE

(SUBPARTITION MOSCOW VALUES ('MOSCOW') TABLESPACE TEST_DATA,

SUBPARTITION LONDON VALUES ('LONDON') TABLESPACE TEST_DATA,

SUBPARTITION SYDNEY VALUES ('SYDNEY') TABLESPACE TEST_DATA,

SUBPARTITION OTHERS VALUES (DEFAULT) TABLESPACE TEST_DATA)

(

PARTITION P_20110131 VALUES LESS THAN (20110201) NOLOGGING NOCOMPRESS,

PARTITION P_20110201 VALUES LESS THAN (20110202) NOLOGGING NOCOMPRESS,

PARTITION P_20110202 VALUES LESS THAN (20110203) NOLOGGING NOCOMPRESS,

PARTITION P_20110203 VALUES LESS THAN (20110204) NOLOGGING NOCOMPRESS,

PARTITION P_20110204 VALUES LESS THAN (20110205) NOLOGGING NOCOMPRESS,

PARTITION P_20110205 VALUES LESS THAN (20110206) NOLOGGING NOCOMPRESS,

PARTITION P_20110206 VALUES LESS THAN (20110207) NOLOGGING NOCOMPRESS,

PARTITION P_20110207 VALUES LESS THAN (20110208) NOLOGGING NOCOMPRESS

);

CREATE UNIQUE INDEX TEST_TAB1_IX1 ON TEST_TAB1

(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)

LOCAL NOPARALLEL COMPRESS 1;

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'MOSCOW', 30000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'MOSCOW', 2000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110205, 'MOSCOW', 2400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 500, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 700, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'SYDNEY', 1200, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'DALLAS', 600, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'DALLAS', 700, 'P');

COMMIT;


DEFAULT.SQL

@@defs

@@init spool default.txt exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select index_name, last_analyzed, distinct_keys, clustering_factor, global_stats from user_indexes where table_name='TEST_TAB1'; drop index test_tab_ix1; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110201' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110201%' order by a.column_name, a.subpartition_name;


AGGREGATION.SQL

@defs
spool aggregation.txt

TRUNCATE TABLE TEST_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'MOSCOW', 30000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'MOSCOW', 2000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110205, 'MOSCOW', 2400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 500, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 700, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'SYDNEY', 1200, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'DALLAS', 600, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'DALLAS', 700, 'P');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110206' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110206%' order by a.column_name, a.subpartition_name;


INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 100000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 3000000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 200000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 110000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 240000, 'U');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION', PARTNAME =>

'P_20110206_MOSCOW'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110206' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110206' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110206%' order by a.column_name, a.subpartition_name; select reporting_date, count(*) from test_tab1 group by reporting_date; select source_system, count(*) from test_tab1 group by source_system; select status, count(*) from test_tab1 group by status; select /*+ gather_plan_statistics */ * from test_tab1 where status='U'; select * from TABLE(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10000, 'P');


INSERT INTO TEST_TAB1 VALUES (20110202, 'MOSCOW', 300000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'MOSCOW', 20000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'MOSCOW', 10000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110205, 'MOSCOW', 24000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'SYDNEY', 12000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'CHICAGO', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 7000, 'P');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110206' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110206' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110206%' order by a.column_name, a.subpartition_name; spool off


AGGREGATION2.SQL

@default

@aggregation
spool aggregation2.txt

ALTER TABLE TEST_TAB1

ADD PARTITION P_20110208 VALUES LESS THAN (20110209); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110208' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110208' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110208%' order by a.column_name, a.subpartition_name;

-- Simulate one load finishing

INSERT INTO TEST_TAB1 VALUES (20110208, 'DALLAS', 400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'DALLAS', 600, 'P');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname =>

'P_20110208_OTHERS'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows


from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110208' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110208' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110208%' order by a.column_name, a.subpartition_name; show parameter dynamic_sampling select /*+ gather_plan_statistics */ * from test_tab1 where source_system='DALLAS'; select * from TABLE(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- Simulate other loads finishing later

INSERT INTO TEST_TAB1 VALUES (20110208, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'MOSCOW', 30000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'MOSCOW', 2000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'MOSCOW', 10000, 'Z');

INSERT INTO TEST_TAB1 VALUES (20110208, 'MOSCOW', 2400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'SYDNEY', 500, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'SYDNEY', 700, 'P');

INSERT INTO TEST_TAB1 VALUES (20110208, 'LONDON', 1200, 'P'); exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname =>

'P_20110208_LONDON'); exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname =>

'P_20110208_SYDNEY'); exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', granularity => 'SUBPARTITION', partname =>

'P_20110208_MOSCOW'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1';


select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110208' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110208' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110208%' order by a.column_name, a.subpartition_name; spool off


AGGREGATION3.SQL

@defs

@default

@aggregation

@aggregation2
spool aggregation3.txt

TRUNCATE TABLE TEST_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'MOSCOW', 30000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'MOSCOW', 2000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110205, 'MOSCOW', 2400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 500, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 700, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'SYDNEY', 1200, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 400, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'DALLAS', 600, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'DALLAS', 700, 'P');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1') select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110206' order by subpartition_name;

PAUSE

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 100000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 3000000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 200000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 110000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110206, 'MOSCOW', 240000, 'U');

COMMIT; exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'SUBPARTITION', PARTNAME =>

'P_20110206_MOSCOW'); select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110206' order by subpartition_name; select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1'


order by partition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110206' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110206%' order by a.column_name, a.subpartition_name; spool off


PARTEX_MSA.SQL

@defs

@init

@default
pause
spool partex_msa.txt

ALTER TABLE TEST_TAB1

ADD PARTITION P_20110209 VALUES LESS THAN (20110210);

DROP TABLE LOAD_TAB1;

CREATE TABLE LOAD_TAB1

AS SELECT * FROM TEST_TAB1 WHERE 1=0;

CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1

(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)

NOPARALLEL COMPRESS 1;

-- Note - this is the default parameter value
alter session set "_minimal_stats_aggregation"=TRUE;

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_MOSCOW WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_LONDON WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_LONDON REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_SYDNEY WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_SYDNEY REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);


ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_OTHERS WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_OTHERS REBUILD UNUSABLE LOCAL INDEXES; select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110209' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110209' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110209%' order by a.column_name, a.subpartition_name; spool off


PARTEX_NON_MSA.SQL

@defs

@init

@default
spool partex_non_msa.txt

ALTER TABLE TEST_TAB1

ADD PARTITION P_20110209 VALUES LESS THAN (20110210);

DROP TABLE LOAD_TAB1;

CREATE TABLE LOAD_TAB1

AS SELECT * FROM TEST_TAB1 WHERE 1=0;

CREATE UNIQUE INDEX LOAD_TAB1_IX1 ON LOAD_TAB1

(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)

NOPARALLEL COMPRESS 1;

-- Note - this is a non-default parameter value
alter session set "_minimal_stats_aggregation"=FALSE;

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'MOSCOW', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_MOSCOW WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'LONDON', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_LONDON WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_LONDON REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'SYDNEY', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_SYDNEY WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_SYDNEY REBUILD UNUSABLE LOCAL INDEXES;

TRUNCATE TABLE LOAD_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110209, 'CHICAGO', 900, 'Z');
exec dbms_stats.gather_table_stats('TESTUSER', 'LOAD_TAB1', granularity => 'GLOBAL', method_opt => 'FOR ALL COLUMNS SIZE 1', estimate_percent => 1);

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110209_OTHERS WITH TABLE load_tab1;


ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110209_OTHERS REBUILD UNUSABLE LOCAL INDEXES; select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110209' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110209' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110209%' order by a.column_name, a.subpartition_name; spool off


APPROX_GLOBAL1.SQL

@defs
spool approx_global1.txt

TRUNCATE TABLE TEST_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 60000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 300, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 8000, 'P');

COMMIT; exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'APPROX_GLOBAL AND PARTITION'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110201_MOSCOW'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110201_LONDON'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110201_SYDNEY'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110201_OTHERS'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110201' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name


and a.partition_name='P_20110201' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110201%' order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 8000, 'P');

COMMIT; exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110202_OTHERS'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110202_MOSCOW'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110202_LONDON'); exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'SUBPARTITION', partname => 'P_20110202_SYDNEY'); select table_name, global_stats, last_analyzed, num_rows from dba_tables where table_name='TEST_TAB1' and owner='TESTUSER' order by 1, 2, 4 desc nulls last; select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110202' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,


display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110202' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110202%' order by a.column_name, a.subpartition_name; exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'APPROX_GLOBAL AND PARTITION', partname => 'P_20110202'); select table_name, global_stats, last_analyzed, num_rows from dba_tables where table_name='TEST_TAB1' and owner='TESTUSER' order by 1, 2, 4 desc nulls last; select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110202' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110202' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name


and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110202%' order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 8000, 'P');

COMMIT; exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity =>

'APPROX_GLOBAL AND PARTITION', partname => 'P_20110203'); select table_name, global_stats, last_analyzed, num_rows from user_tables where table_name='TEST_TAB1'; select table_name, partition_name, global_stats, last_analyzed, num_rows from user_tab_partitions where table_name='TEST_TAB1' order by partition_name; select table_name, subpartition_name, global_stats, last_analyzed, num_rows from user_tab_subpartitions where table_name='TEST_TAB1' and partition_name='P_20110203' order by subpartition_name; select a.column_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val, b.data_type from user_tab_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name order by a.column_name; select a.column_name, a.partition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from user_part_col_statistics a, user_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.partition_name='P_20110203' order by a.column_name, a.partition_name; select a.column_name, a.subpartition_name, a.num_distinct,

display_raw(a.low_value,b.data_type) as low_val,

display_raw(a.high_value,b.data_type) as high_val,

b.data_type from dba_subpart_col_statistics a, dba_tab_cols b where a.table_name='TEST_TAB1' and a.table_name=b.table_name and a.column_name=b.column_name and a.subpartition_name LIKE 'P_20110203%' order by a.column_name, a.subpartition_name; spool off


APPROX_GLOBAL2.SQL

@defs
spool approx_global2.txt

TRUNCATE TABLE TEST_TAB1; exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 60000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 300, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name='P_20110201'
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name='P_20110201'
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.subpartition_name LIKE 'P_20110201%'
order by a.column_name, a.subpartition_name;


INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity => 'APPROX_GLOBAL AND PARTITION', partname => 'P_20110202');

select table_name, global_stats, last_analyzed, num_rows
from dba_tables
where table_name='TEST_TAB1'
and owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name='P_20110202'
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name='P_20110202'
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.subpartition_name LIKE 'P_20110202%'
order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 1000, 'P');


INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity => 'DEFAULT');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name='P_20110203'
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name='P_20110203'
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.subpartition_name LIKE 'P_20110203%'
order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 400, 'N');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 10, 'Z');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 300, 'Z');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 1, 'P');


INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 80000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', granularity => 'APPROX_GLOBAL AND PARTITION', partname => 'P_20110204');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name='P_20110204'
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name='P_20110204'
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.subpartition_name LIKE 'P_20110204%'
order by a.column_name, a.subpartition_name;

SPOOL OFF


COPY_STATS.SQL
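-- Adds new partitions to TEST_TAB1, loads them by exchanging subpartitions with
-- LOAD_TAB1 and then uses dbms_stats.copy_table_stats to seed the new
-- (sub)partitions with statistics copied from P_20110201, first at partition
-- level and then subpartition by subpartition.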

@defs

@default

spool copy_stats.txt

ALTER TABLE TEST_TAB1

ADD PARTITION P_20110211 VALUES LESS THAN (20110212);

TRUNCATE TABLE LOAD_TAB1;

exec dbms_stats.delete_table_stats('TESTUSER', 'LOAD_TAB1');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 1000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 30000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 2000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 10000, 'N');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 2400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 500, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 1200, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110211, 'MOSCOW', 700, 'Z');

COMMIT;

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110211_MOSCOW WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110211_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110211');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110211')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110211')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110201%' or a.subpartition_name LIKE 'P_20110211%')
order by a.column_name, a.subpartition_name;
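-- The same exchange load is now repeated for P_20110212, but this time the
-- statistics are copied one named subpartition at a time before a final
-- partition-level copy_table_stats call.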

ALTER TABLE TEST_TAB1

ADD PARTITION P_20110212 VALUES LESS THAN (20110213);

TRUNCATE TABLE LOAD_TAB1;

exec dbms_stats.delete_table_stats('TESTUSER', 'LOAD_TAB1');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 1000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 30000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 2000, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 10000, 'N');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 2400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 500, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 1200, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 400, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 600, 'P');

INSERT INTO LOAD_TAB1 VALUES (20110212, 'MOSCOW', 700, 'Z');

COMMIT;

ALTER TABLE test_tab1 EXCHANGE SUBPARTITION P_20110212_MOSCOW WITH TABLE load_tab1;

ALTER TABLE test_tab1 MODIFY SUBPARTITION P_20110212_MOSCOW REBUILD UNUSABLE LOCAL INDEXES;

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_MOSCOW', dstpartname => 'P_20110212_MOSCOW');

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_LONDON', dstpartname => 'P_20110212_LONDON');

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_SYDNEY', dstpartname => 'P_20110212_SYDNEY');

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201_OTHERS', dstpartname => 'P_20110212_OTHERS');

connect TESTUSER/TESTUSER

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110212')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110212')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110201%' or a.subpartition_name LIKE 'P_20110212%')
order by a.column_name, a.subpartition_name;

exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110212');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110212')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110212')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110201%' or a.subpartition_name LIKE 'P_20110212%')
order by a.column_name, a.subpartition_name;

spool off


INCREMENTAL.SQL
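-- Repeats the partition-by-partition load of TEST_TAB1 with the 11g INCREMENTAL
-- table preference enabled, so that each default gather_table_stats call can
-- maintain global statistics without rescanning unchanged partitions.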

@defs

spool incremental.txt

TRUNCATE TABLE TEST_TAB1;

exec dbms_stats.delete_table_stats('TESTUSER', 'TEST_TAB1')

exec dbms_stats.set_table_prefs('TESTUSER','TEST_TAB1','INCREMENTAL','TRUE')
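-- Optional check, not part of the original test script: confirm the INCREMENTAL
-- preference has been recorded for the table before any data is loaded.
select dbms_stats.get_prefs('INCREMENTAL','TESTUSER','TEST_TAB1') as incremental_pref
from dual;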

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 60000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 10, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 300, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'LONDON', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110201, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'SYDNEY', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110201, 'MOSCOW', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name='P_20110201'
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name='P_20110201'
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.subpartition_name LIKE 'P_20110201%'
order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110202, 'CHICAGO', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

select table_name, global_stats, last_analyzed, num_rows
from dba_tables
where table_name='TEST_TAB1'
and owner='TESTUSER'
order by 1, 2, 4 desc nulls last;

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110202')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110202')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110202%' or a.subpartition_name LIKE 'P_20110201%')
order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 1000, 'P');


INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 60000, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 10, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 300, 'N');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 6000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 7000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110203, 'DALLAS', 8000, 'P');

COMMIT;

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110202', 'P_20110203')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110202', 'P_20110203')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110201%' or a.subpartition_name LIKE 'P_20110202%'
     or a.subpartition_name LIKE 'P_20110203%')
order by a.column_name, a.subpartition_name;

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 1000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 2000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 4000, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 400, 'N');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 100, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 10, 'Z');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 300, 'Z');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 5000, 'U');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 6000, 'P');


INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 1, 'P');

INSERT INTO TEST_TAB1 VALUES (20110204, 'CHICAGO', 80000, 'P');
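-- The final gather is wrapped in a 10046 (level 12) trace so that the recursive
-- SQL issued by dbms_stats during the incremental gather can be examined in the
-- resulting trace file.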

COMMIT;

alter session set tracefile_identifier='INCREMENTAL';

alter session set events='10046 trace name context forever, level 12';

exec dbms_stats.gather_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1');

alter session set events='10046 trace name context off';

select table_name, global_stats, last_analyzed, num_rows
from user_tables
where table_name='TEST_TAB1';

select table_name, partition_name, global_stats, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TAB1'
order by partition_name;

select table_name, subpartition_name, global_stats, last_analyzed, num_rows
from user_tab_subpartitions
where table_name='TEST_TAB1'
and partition_name IN ('P_20110201', 'P_20110202', 'P_20110203', 'P_20110204')
order by subpartition_name;

select a.column_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_tab_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
order by a.column_name;

select a.column_name, a.partition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from user_part_col_statistics a, user_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and a.partition_name IN ('P_20110201', 'P_20110202', 'P_20110203', 'P_20110204')
order by a.column_name, a.partition_name;

select a.column_name, a.subpartition_name, a.num_distinct,
       display_raw(a.low_value,b.data_type) as low_val,
       display_raw(a.high_value,b.data_type) as high_val,
       b.data_type
from dba_subpart_col_statistics a, dba_tab_cols b
where a.table_name='TEST_TAB1'
and a.table_name=b.table_name
and a.column_name=b.column_name
and (a.subpartition_name LIKE 'P_20110201%' or a.subpartition_name LIKE 'P_20110202%'
     or a.subpartition_name LIKE 'P_20110203%' or a.subpartition_name LIKE 'P_20110204%')
order by a.column_name, a.subpartition_name;

select reporting_date, count(*) from test_tab1 group by reporting_date;

select source_system, count(*) from test_tab1 group by source_system;

select status, count(*) from test_tab1 group by status;

select seq_id, count(*) from test_tab1 group by seq_id;

SPOOL OFF

