Materialized Views

Materialized views are a data warehousing/decision support system tool that can increase by many orders of magnitude the speed of queries that access a large number (maybe many hundreds of thousands or millions) of records. In basic terms, they allow a user to query potentially terabytes of detail data in seconds (or less). They accomplish this by transparently using pre-computed summarizations and joins of data. These pre-computed summaries would typically be very small compared to the original source data.

Say, for example, your company has a sales database loaded with the details of a million orders, and you want to get a breakdown of sales by region (a common enough query). Each and every record would be scanned, the data aggregated to the region level, and the calculation performed. Using a materialized view, we can store a summary of the sales data by region, and have the system maintain these summaries for us. If you have ten sales regions, this summary will have ten records, so instead of sifting through a million detail records, we’ll query only ten. Furthermore, if someone asks a slightly different question, say for the sales in a specific region, then that query can also obtain the answer from the materialized view.
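As a sketch only (the SALES table, REGION column, and view name here are hypothetical, not objects used later in this chapter), such a summary might be defined like this:

-- Pre-compute the handful of per-region totals once. Queries that
-- aggregate the detail SALES rows by REGION can then be transparently
-- rewritten to read this tiny summary instead.
-- (Depending on version, on-commit refresh of an aggregate like this
-- may also require a materialized view log on SALES.)
create materialized view sales_by_region_mv
build immediate
refresh on commit
enable query rewrite
as
select region, count(*) order_count, sum(sale_amount) total_sales
  from sales
 group by region;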

In this chapter you’ll find out what materialized views are, what they can do and, most importantly, how they work – a lot of the ‘magic’ goes on behind the scenes. Having gone to the trouble of creating it, you’ll find out how to make sure that your materialized view is used by all queries to which the view is capable of providing the answer (sometimes, you know Oracle could use the materialized view, but it is not able to do so simply because it lacks important information). Specifically, we will:

❑ Run through an example that will demonstrate the essential power of materialized views, and help you decide quickly whether it is a feature you might want to use.

❑ Discuss the various parameters and privileges that must be set in order to use this feature.

❑ Investigate, with examples, the use of constraints and dimensions to let the database know when to use a materialized view to answer a query in the most effective manner.

❑ Look at how to use the DBMS_OLAP package to analyze your views.

❑ Round off the chapter with two caveats you should be aware of when using materialized views.

A Brief History

Summary table management, another term for the materialized view, has actually been around for some time in tools such as Oracle Discoverer (an ad-hoc query and reporting tool). Using Discoverer, an administrator would set up various summary tables in the database. The tool would then parse queries before sending them to Oracle. If it determined that a summary table existed that could answer the question more readily, it would rewrite the query to access the summary tables, rather than the underlying table that was originally specified in the query, and submit it to Oracle for processing. This was great, as long as you used that tool to process your queries. If you ran a query in SQL*PLUS, or from your Java JDBC client, then the query rewrite would not (could not) take place. Furthermore, the synchronization between the details (original source data) and the summaries could not be performed or validated for you automatically, since the tool ran outside the database.

Furthermore, since version 7.0, the Oracle database itself has actually implemented a feature with many of the characteristics of summary tables – the Snapshot. This feature was initially designed to support replication, but I would use it myself to 'pre-answer' large queries. So, I would have snapshots that did not use a database link to replicate data from database to database, but rather just summarized or pre-joined frequently accessed data. This was good, but without any query rewrite capability, it was still problematic. The application had to know to use the summary tables in the first place, and this made the application more complex to code and maintain. If I added a new summary then I would have to find the code that could make use of it, and rewrite that code.

In Oracle 8.1.5 (Enterprise and Personal Editions) Oracle took the query rewriting capabilities from tools like Discoverer, the automated refresh and scheduling mechanisms from snapshots (which make the summary tables 'self-maintaining'), and combined these with the optimizer's ability to find the best plan out of many alternatives. This produced the materialized view.

With all of this functionality centralized in the database, now every application can take advantage of the automated query rewrite facility, regardless of whether access to the database is via SQL*PLUS, Oracle Forms, JDBC, ODBC, Pro*C, OCI, or some third party tool. Every Oracle 8i enterprise database can have summary table management. Also, since everything takes place inside the database, the details can be easily synchronized with the summaries, or at least the database knows when they aren't synchronized, and might bypass 'stale' summaries (you control its behavior in this case). By putting the functionality right in front of the data, anything that can access Oracle can take advantage of this functionality.

The same philosophy underpins features such as Fine Grained Access Control (see the section on Openness in Chapter 1 and also Chapter 21, which is dedicated to FGAC). The closer to the data these functions are, the broader the audience that can appreciate them. If you put security outside the database, in an application perhaps, only people who use the application can make use of it (hence the only access to the data is via the application).


What you’ll need to run the Examples

In order to run the examples in this chapter you'll need access to a Personal or Enterprise Edition of Oracle 8.1.5, or higher. This functionality is not provided in the Standard release. You will need a user account with the following privileges (at least):

❑ GRANT CREATE SESSION

❑ GRANT CREATE TABLE

❑ GRANT CREATE MATERIALIZED VIEW

❑ GRANT QUERY REWRITE

The first three privileges above may be granted to a role that you have been granted. The QUERY REWRITE privilege must be granted directly to you.

Additionally, you'll need access to a tablespace with about 30 to 50MB of free space.

Finally, you must be using the Cost-Based Optimizer (CBO) in order to make use of query rewrite. If you do not use the CBO, query rewrite will not take place. In these examples, our optimizer goal will be left at the default of CHOOSE; simply analyzing the tables will ensure we can take advantage of query rewrite.

An Example

A quick example will demonstrate what a materialized view entails. The concept demonstrated below is that of reducing the execution time of a long running query transparently, by summarizing data in the database. A query against a large table will be transparently rewritten into a query against a very small table, without any loss of accuracy in the answer. We'll start with a large table that contains a list of owners of objects, and the objects they own. This table is based on the ALL_OBJECTS data dictionary view:

tkyte@TKYTE816> create table my_all_objects

  2  nologging
  3  as
  4  select * from all_objects
  5  union all
  6  select * from all_objects
  7  union all
  8  select * from all_objects
  9  /

Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

65742 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

131484 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> analyze table my_all_objects compute statistics;

Table analyzed.

On my system, I have the Java option installed so the MY_ALL_OBJECTS table has about 250,000 rows in it after the above. You may have to adjust the number of times you UNION ALL and INSERT, in order to achieve the same effect. Now, we'll execute a query against this table that shows the number of objects owned by each user. Initially, this will require a full scan of the large table we have above:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;

OWNER COUNT(*)

------------------------------ ----------

A 36

B 24

CTXSYS 2220

DBSNMP 48

DEMO 60

DEMO11 36

DEMO_DDL 108

MDSYS 2112

MV_USER 60

ORDPLUGINS 312

ORDSYS 2472

OUR_TYPES 12

OUTLN 60

PERFSTAT 636

PUBLIC 117972

SCHEDULER 36

SCOTT 84

SEAPARK 36

SYS 135648

SYSTEM 624

TESTING 276

TKYTE 12

TTS_USER 48

TYPES 36

24 rows selected.

Elapsed: 00:00:03.35

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;


24 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2525 Card=24 Bytes=120)

1 0 SORT (GROUP BY) (Cost=2525 Card=24 Bytes=120)

2 1 TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS' (Cost=547 Card=262968

Statistics

----------------------------------------------------------

0 recursive calls

27 db block gets

3608 consistent gets

3516 physical reads

0 redo size

1483 bytes sent via SQL*Net to client

535 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

24 rows processed

In order to get the aggregate count, we must count 250,000+ records on over 3,600 blocks. Unfortunately, in our system we ask this question frequently, dozens of times every day. We are scanning almost 30MB of data. We could avoid counting the details each and every time by creating a materialized view of the data. The following demonstrates the basic steps needed to perform this operation. We'll discuss the GRANT and ALTER statements in more detail in the How Materialized Views Work section. In addition to the grants below, you might need the CREATE MATERIALIZED VIEW privilege as well, depending on what roles you have been granted and have enabled:

tkyte@TKYTE816> grant query rewrite to tkyte;

Grant succeeded.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view my_all_objects_aggs
  2  build immediate
  3  refresh on commit
  4  enable query rewrite
  5  as
  6  select owner, count(*)
  7  from my_all_objects
  8  group by owner
  9  /

Materialized view created.

tkyte@TKYTE816> analyze table my_all_objects_aggs compute statistics;

Table analyzed.

Basically, what we've done is pre-calculate the object count, and define this summary information as a materialized view. We have asked that the view be immediately built and populated with data. You'll notice that we have also specified REFRESH ON COMMIT and ENABLE QUERY REWRITE, but more on these in a moment. Also notice that, while we created a materialized view, when we ANALYZE we are analyzing a table. A materialized view creates a real table, and this table may be indexed, analyzed, and so on.
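For example, because the container is just a table, nothing stops us from indexing it like any other table (the index name below is simply an illustration, not a step the chapter relies on):

-- The materialized view's underlying container is a real table, so the
-- usual maintenance operations apply to it (hypothetical index name):
create index my_all_objects_aggs_idx
    on my_all_objects_aggs(owner);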

First, let's see the view in action by issuing the same query again (the query that we used to define the view itself):

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  group by owner;

OWNER COUNT(*)

------------------------------ ----------

A 36

B 24

...

TYPES 36

24 rows selected.

Elapsed: 00:00:00.10

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  group by owner;

24 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=24 Bytes=216)

1 0 TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=24 Bytes=216)

Statistics

----------------------------------------------------------

0 recursive calls

12 db block gets

7 consistent gets

0 physical reads

0 redo size

1483 bytes sent via SQL*Net to client

535 bytes received via SQL*Net from client

3 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

24 rows processed

tkyte@TKYTE816> set autotrace off


From over 3,600 consistent gets (logical I/Os), to just 12. No physical I/O this time around as the data was found in the cache. Our buffer cache will be much more efficient now as it has less to cache. I could not even begin to cache the previous query's working set, but now I can. Notice how our query plan shows we are now doing a full scan of the MY_ALL_OBJECTS_AGGS table, even though we queried the detail table MY_ALL_OBJECTS. When the SELECT OWNER, COUNT(*)... query is issued, the database automatically directs it to our materialized view.

Let's take this a step further by adding a new row to the MY_ALL_OBJECTS table, and committing the change:

tkyte@TKYTE816> insert into my_all_objects
  2  ( owner, object_name, object_type, object_id )
  3  values
  4  ( 'New Owner', 'New Name', 'New Type', 1111111 );

1 row created.

tkyte@TKYTE816> commit;

Commit complete.

Now, we issue effectively the same query again, but this time we're just looking at our newly inserted row:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  where owner = 'New Owner'
  4  group by owner;

OWNER COUNT(*)

------------------------------ ----------

New Owner 1

Elapsed: 00:00:00.01

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  where owner = 'New Owner'
  4  group by owner;

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)

1 0 TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics

----------------------------------------------------------

0 recursive calls

12 db block gets

6 consistent gets

0 physical reads

0 redo size

430 bytes sent via SQL*Net to client

424 bytes received via SQL*Net from client


2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

tkyte@TKYTE816> set autotrace off

The analysis shows that we scanned the materialized view and found the new row. By specifying REFRESH ON COMMIT in our original definition of the view, we requested that Oracle maintain synchronization between the view and the details – when we update the details, the summary will be maintained as well. It cannot maintain synchronization in every case of an arbitrary materialized view, but in the case of a single table summary (as we have) or joins with no aggregation, it can.

Now, one last query:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select count(*)
  2  from my_all_objects
  3  where owner = 'New Owner';

COUNT(*)

----------

1

Elapsed: 00:00:00.00

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select count(*)
  2  from my_all_objects
  3  where owner = 'New Owner';

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)

1 0 SORT (AGGREGATE)

2 1 TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics

----------------------------------------------------------

0 recursive calls

12 db block gets

5 consistent gets

0 physical reads

0 redo size

367 bytes sent via SQL*Net to client

424 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

tkyte@TKYTE816> set autotrace off


We can see that Oracle is smart enough to use the view even when the query appears to be slightly different. There was no GROUP BY clause here, yet the database recognized that the materialized view could still be used. This is what makes materialized views ‘magical’. The end users do not have to be aware of these summary tables. The database will realize for us that the answer already exists and, as long as we enable query rewrite (which we did), will automatically rewrite the query to use them. This feature allows you to immediately impact existing applications, without changing a single query.

Uses of Materialized Views

Why use this feature? The answer is relatively straightforward, and can be given in a single word – performance. By calculating the answers to the really hard questions up front (and once only), we will greatly reduce the load on our machine. We will experience:

❑ Fewer physical reads — There is less data to scan through.

❑ Fewer writes — We will not be sorting/aggregating as frequently.

❑ Decreased CPU consumption — We will not be calculating aggregates and functions on the data, as we will have already done that.

❑ Markedly faster response times — Our queries will return incredibly quickly when a summary is used, as opposed to the details. This will be a function of the amount of work we can avoid by using the materialized view, but many orders of magnitude is not out of the question.

Materialized views will increase your need for one resource – more permanently allocated disk. We need extra storage space to accommodate the materialized views, of course, but for the price of a little extra disk space, we can reap a lot of benefit.

Materialized views work best in a read-only, or read-intensive, environment. They are not designed for use in a high-end OLTP environment. They will add overhead to modifications performed on the base tables in order to capture the changes. There are also concurrency issues with regard to using the REFRESH ON COMMIT option. Consider our summary example from before. Any row that is inserted into, or deleted from, the detail table must update one of the 24 rows in the summary table in order to maintain the count in real time. What this means is that, at most, 24 people can commit at the same time (assuming they each affect a different owner, that is). This does not preclude the use of materialized views in an OLTP environment. For example, if you use full refreshes on a recurring basis (during off-peak time), there will be no overhead added to the modifications, and there will be no concurrency issues. This would allow you to report on yesterday's activities, for example, and not query the live OLTP data for reports.
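As a sketch of that off-peak approach (the view name and 2am schedule are illustrative assumptions, not part of this chapter's examples), a summary can be told to rebuild itself completely each night instead of on commit:

-- Hypothetical nightly summary: a complete refresh is scheduled for
-- 2am each day, so daytime transactions pay no maintenance penalty and
-- reports see the data as of the last refresh.
create materialized view daily_object_counts
build immediate
refresh complete
start with trunc(sysdate) + 1 + 2/24
next  trunc(sysdate) + 1 + 2/24
enable query rewrite
as
select owner, count(*) cnt
  from my_all_objects
 group by owner;

-- A refresh can also be requested manually at any time:
-- exec dbms_mview.refresh( 'DAILY_OBJECT_COUNTS', 'C' );  -- 'C' = complete refresh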

How Materialized Views Work

Materialized views may appear to be hard to work with at first. There will be cases where you create a materialized view, and you know that the view holds the answer to a certain question but, for some reason, Oracle does not. If you dig deep enough, you'll discover why, and it always comes back to the fact that Oracle is just a piece of software, and can only work with the information provided to it. The more meta data provided, the more pieces of information about the underlying data you can give to Oracle, the better. These pieces of information are mundane things that you might not even think about in a data warehouse environment, such as NOT NULL constraints, primary keys, foreign keys, and so on. The meta data provided by these keys and constraints gives the optimizer more information, and hence, more of a chance.


Not only do keys and constraints such as those listed above preserve data integrity, they also add information about the data into the data dictionary, which can be used in query rewrites – hence the term meta data. See the 'Constraints' section for further information.

In the following sections we will look at what you have to do to set up materialized views, some examples of using them, and how adding more information, more meta data, to the database will make materialized views work more often.

Setting Up

There is one mandatory INIT.ORA parameter necessary for materialized views to function, and this is the COMPATIBLE parameter. The value of COMPATIBLE should be set to 8.1.0, or above, in order for query rewrites to be functional. If this value is not set appropriately, query rewrite will not be invoked.

There are two other relevant parameters that may be set at either the system-level (via the INIT.ORA file), or the session-level (via the ALTER SESSION command). They are:

❑ QUERY_REWRITE_ENABLED – Unless the value of this parameter is set to TRUE, query rewrites will not take place. The default value is FALSE.

❑ QUERY_REWRITE_INTEGRITY – This parameter controls how Oracle rewrites queries and may be set to one of three values:

ENFORCED – Queries will be rewritten using only constraints and rules that are enforced and guaranteed by Oracle. There are mechanisms by which we can tell Oracle about other inferred relationships, and this would allow for more queries to be rewritten, but since Oracle does not enforce those relationships, it would not make use of these facts at this level.

TRUSTED – Queries will be rewritten using the constraints that are enforced by Oracle, as well as any relationships existing in the data that we have told Oracle about, but are not enforced by the database. For example, in our initial example we could have created the physical table MY_ALL_OBJECTS_AGGS manually using a parallel, no-logging CREATE TABLE AS SELECT (to speed up the building of the summary table). We could have then created the materialized view, instructing it to use this pre-built table instead of creating the summary table itself. If we wish Oracle to use this pre-built table during a subsequent query rewrite, we must specify a value of TRUSTED. This is because we want Oracle to 'trust' that we have supplied the correct data in the pre-built table – Oracle does not enforce that the data in this table is correct (a sketch of this pre-built table approach follows this list).

STALE_TOLERATED – Queries will be rewritten to use materialized views even if Oracle knows the data contained in the materialized view is 'stale' (out-of-sync with the details). This might be useful in an environment where the summary tables are refreshed on a recurring basis, not on commit, and a slightly out-of-sync answer is acceptable.
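As a minimal sketch of the pre-built table approach described under TRUSTED (MY_OBJECT_COUNTS is our own invented name, used only for illustration), the table and the materialized view share a name, and Oracle simply takes our word for the table's contents:

-- Build the summary ourselves -- perhaps in parallel, with no logging.
create table my_object_counts
nologging
as
select owner, count(*) cnt
  from my_all_objects
 group by owner;

-- Register the existing table as a materialized view. Oracle never
-- verifies its contents, so rewrites against it are only considered
-- when QUERY_REWRITE_INTEGRITY is TRUSTED (or STALE_TOLERATED).
create materialized view my_object_counts
on prebuilt table
enable query rewrite
as
select owner, count(*) cnt
  from my_all_objects
 group by owner;

alter session set query_rewrite_integrity=trusted;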

In the example above, you saw the ALTER SESSION statements that enable this query rewrite magic. Since the example used only relationships and objects enforced by Oracle, the query rewrite integrity could be set to the highest level – ENFORCED.

I also needed to grant QUERY REWRITE privileges to myself. Now, the account I used happened to be a DBA account, which has QUERY REWRITE, so why did I need to grant the privilege directly to myself? The reason is that you cannot create compiled stored objects such as materialized views, stored procedures, and triggers that rely on privileges from a role (the DBA role in this case). See Chapter 23, Invoker and Definers Rights, for a full explanation of the use of roles and compiled stored objects. If you create a materialized view with QUERY REWRITE enabled, but do not have QUERY REWRITE system privilege yourself, you will receive the following error:

create materialized view my_all_objects_aggs

*

ERROR at line 1:

ORA-01031: insufficient privileges

Internal Mechanics

So, now that we can create a materialized view and show that it works, what are the steps Oracle will undertake to rewrite our queries? Normally, when QUERY_REWRITE_ENABLED is set to FALSE, Oracle will take your SQL as is, parse it, and optimize it. With query rewrites enabled, Oracle will insert an extra step into this process. After parsing, Oracle will attempt to rewrite the query to access some materialized view, instead of the actual table that it references. If it can perform a query rewrite, the rewritten query (or queries) is parsed and then optimized along with the original query. The query plan with the lowest cost from this set is chosen for execution. If it cannot rewrite the query, the original parsed query is optimized and executed as normal.

Query Rewrite

When query rewrite is enabled, Oracle will use the following steps to try and rewrite a query with a materialized view.

Full Exact Text Match

In this method, Oracle considers possible exact string matches in the set of available materialized views found in the data dictionary. In the example above, this is the method Oracle would have used for the very first query that used the materialized view. The algorithm used is ‘friendlier’ (more flexible) than a shared pool comparison (which demands an exact byte-for-byte match) as it ignores whitespace, case of characters, and other formatting.
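For instance, against the MY_ALL_OBJECTS_AGGS view created earlier, a query that differs from the view's defining text only in case and whitespace should still be matched this way (a sketch; the rewrite can be confirmed with AUTOTRACE):

-- Textually the same as the view's defining query, apart from case and
-- whitespace -- still a candidate for a full exact text match.
SELECT   owner, COUNT(*)
FROM     my_all_objects
GROUP BY owner;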

Partial Text Match

Starting with the FROM clause, the optimizer compares the remaining text of the materialized view's defining query. This allows for mismatches between items in the SELECT list. If the data you need can be generated from the materialized view (if your SELECT list can be satisfied) Oracle will rewrite the query using the materialized view. The query SELECT LOWER(OWNER) FROM MY_ALL_OBJECTS GROUP BY OWNER; would be an example of a partial text match.
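Run as a sketch, that example would look like this – only the SELECT list differs from the view's defining query, and LOWER(OWNER) can be derived from the OWNER column the view already stores:

-- The text from the FROM clause onwards matches the materialized view;
-- the differing SELECT list can be satisfied from the view's columns.
select lower(owner)
  from my_all_objects
 group by owner;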

General Query Rewrite Methods

These enable the use of a materialized view even if it contains only part of the data, more data than requested, or data that can be converted. The optimizer tests the materialized view's definition against the individual components of the query (SELECT, FROM, WHERE, GROUP BY) to find a match. The checks that Oracle performs against these components are:

❑ Data sufficiency – Can the required data be obtained from a given materialized view? If you ask for column X, and column X is not in the materialized view and, furthermore, it is not retrievable via some join with the materialized view, then Oracle will not rewrite the query to use that view. For example, the query SELECT DISTINCT OWNER FROM MY_ALL_OBJECTS, using our previous example, can be rewritten using our materialized view – the OWNER column is available. The query SELECT DISTINCT OBJECT_TYPE FROM MY_ALL_OBJECTS cannot be satisfied using the materialized view, as the view does not have sufficient data.

❑ Join compatibility – Ensures that any JOIN required by the submitted query can be satisfied by the materialized view.


We can look at an example of join compatibility using MY_ALL_OBJECTS and the following tables:

tkyte@TKYTE816> create table t1 ( owner varchar2(30), flag char(1) );

Table created.

tkyte@TKYTE816> create table t2 ( object_type varchar2(30), flag char(1) );

Table created.

The following query is join-compatible with the materialized view – the query can and will be rewritten using the materialized view:

tkyte@TKYTE816> select a.owner, count(*), b.owner
  2  from my_all_objects a, t1 b
  3  where a.owner = b.owner
  4  and b.flag is not null
  5  group by a.owner, b.owner
  6  /

The database can see that using our materialized view, or the actual base table, would result in the same answer. However, the following query, while similar, is not join-compatible:

tkyte@TKYTE816> select a.owner, count(*), b.object_type
  2  from my_all_objects a, t2 b
  3  where a.object_type = b.object_type
  4  and b.flag is not null
  5  group by a.owner, b.object_type
  6  /

The OBJECT_TYPE column is not in our materialized view, so Oracle cannot rewrite this query using the materialized view.

Grouping Compatibility

This is required if both the materialized view and the query contain a GROUP BY clause. If the materialized view is grouped at the same level, or is grouped at a higher level of detail than is needed, the query will be rewritten to use the materialized view. The query SELECT COUNT(*) FROM MY_ALL_OBJECTS GROUP BY 1; applied against our first example would be a case where the materialized view is grouped at a higher level of detail than is needed. The database can rewrite this query to use our materialized view even though the grouping for the query is not the same as the grouping for the materialized view.

Aggregate Compatibility

This is required if both query and materialized view contain aggregates. It will ensure that the materialized view can satisfy the required aggregates. It can perform some interesting rewrites in some cases. For example, it will recognize that AVG(X) is the same as SUM(X)/COUNT(X), so a query that requires AVG(X) can be satisfied by a materialized view with the SUM and COUNT.
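A sketch of that idea (the view name below is hypothetical and not used elsewhere in this chapter): a summary carrying SUM and COUNT of a column can be used to answer an AVG query against the detail table.

-- Hypothetical summary storing SUM and COUNT per owner.
create materialized view my_all_objects_sums
build immediate
refresh on demand
enable query rewrite
as
select owner,
       sum(object_id)   sum_object_id,
       count(object_id) cnt_object_id
  from my_all_objects
 group by owner;

-- A request for AVG(object_id) can be rewritten by the optimizer as
-- SUM(object_id)/COUNT(object_id) against the summary above.
select owner, avg(object_id)
  from my_all_objects
 group by owner;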

In many cases, simple application of the above rules will allow Oracle to rewrite a query to use a materialized view. In other cases (as we will see in an example below), the database will need a little more help from you. You will need to give it additional information in order for it to recognize that it can use a materialized view to answer a question.


Making sure your View gets used

In this section, we'll look at ways to do this, first by using constraints to help us make use of a query rewrite, and then by using dimensions, which are a means of describing complex relationships – hierarchies of data.

Constraints

I've been asked in the past, 'Why should I use a primary key? Why not just use a unique index?' Well, the answer is that you could, but doesn't the fact you used a primary key say something over and above just using a unique index? In fact it does – it can say a lot. The same goes for the use of foreign keys, NOT NULL constraints and others. Not only do they protect the data, but they also add information about the data into the data dictionary. Using this additional information, Oracle is able to perform a query rewrite more often, in many complex cases.

Consider the following small example. We will copy the EMP and DEPT tables from the SCOTT schema, and create a materialized view that pre-joins the tables together. This materialized view differs from our first example in that it is a REFRESH ON DEMAND materialized view. This means that in order for changes to be applied to it, we will have to refresh it manually:

tkyte@TKYTE816> create table emp as select * from scott.emp;

Table created.

tkyte@TKYTE816> create table dept as select * from scott.dept;

Table created.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view emp_dept
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select dept.deptno, dept.dname, count (*)
  7  from emp, dept
  8  where emp.deptno = dept.deptno
  9  group by dept.deptno, dept.dname
 10  /

Materialized view created.

tkyte@TKYTE816> alter session set optimizer_goal=all_rows;

Session altered.


Since the underlying tables and the resulting materialized view are very small, we force the use of the cost-based optimizer using the ALTER SESSION command, instead of analyzing tables as we would normally. If Oracle knew how small these tables were, it would not do some of the optimizations we would like. Using the ‘default’ statistics, it will behave as if the tables were fairly large.

Now, we have withheld a lot of information from Oracle here. It does not understand the relationship between EMP and DEPT, does not know which columns are primary keys, and so on. Now, let's run a query, and see what happens:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

COUNT(*)

----------

14

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)

1 0 SORT (AGGREGATE)

2 1 TABLE ACCESS (FULL) OF ‘EMP’ (Cost=1 Card=82)

The query has been directed at the underlying EMP table. Now, you and I know that the COUNT(*) query could easily, and more efficiently (especially if the number of employees in each department was large, and there were lots of departments), have been answered from the materialized view. There, we have all of the information we need to get the count of employees. We know this because we are aware of things about the data that we kept from Oracle:

❑ DEPTNO is the primary key of DEPT – This means that each EMP record will join to, at most, one DEPT record.

❑ DEPTNO in EMP is a foreign key to DEPTNO in DEPT – If the DEPTNO in EMP is not a Null value, then it will be joined to a row in DEPT (we won’t lose any non-Null EMP records during a join).

❑ DEPTNO in EMP is NOT NULL – This coupled with the foreign key constraint tells us we won’t lose any EMP records.

These three facts imply that if we join EMP to DEPT, each EMP row will be observed in the resultset at least once and at most once. Since we never told Oracle these facts, it was not able to make use of the materialized view. So, let's make Oracle aware of them:

tkyte@TKYTE816> alter table dept

  2  add constraint dept_pk primary key(deptno);

Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno);

Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null;

Table altered.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

COUNT(*)

----------

14

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)

1 0 SORT (AGGREGATE)

2 1 TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)

Now Oracle is able to rewrite the query using the EMP_DEPT materialized view. Any time that you know Oracle could use a materialized view, but it is not doing so (and you have verified you can use materialized views in general), take a closer look at the data and ask yourself 'What piece of information have I withheld from Oracle?' Nine times out of ten, you'll find a missing piece of meta data that, when included, allows Oracle to rewrite the query.

So, what happens if this is a true data warehouse, and there are tens of millions of records in the above tables? You don't really want the additional effort of verifying a foreign key relationship – you already did that in your data scrubbing routine, didn't you? In this case, you can create a non-validated constraint, one that is used to inform the database about a relationship, but that has not been validated by the database itself. Let's look at the above example again, but this time we'll simulate the loading of data into an existing data warehouse (our example above is our data warehouse). We'll drop our constraints, load the data, refresh the materialized view, and add our constraints back. We'll start by dropping the constraints:

tkyte@TKYTE816> alter table emp drop constraint emp_fk_dept;

Table altered.

tkyte@TKYTE816> alter table dept drop constraint dept_pk;

Table altered.

tkyte@TKYTE816> alter table emp modify deptno null;

Table altered.

Now, in order to simulate our load, I will insert a single new row into EMP (not much of a load, I know, but enough for demonstration purposes). Then, we will refresh our materialized view and tell Oracle to consider it as FRESH:

tkyte@TKYTE816> insert into emp (empno,deptno) values ( 1, 1 );

1 row created.

tkyte@TKYTE816> exec dbms_mview.refresh( 'EMP_DEPT' );

PL/SQL procedure successfully completed.

tkyte@TKYTE816> alter materialized view emp_dept consider fresh;

Materialized view altered.

Now we tell Oracle about the relationships between EMP and DEPT:

tkyte@TKYTE816> alter table dept
  2  add constraint dept_pk primary key(deptno)
  3  rely enable NOVALIDATE
  4  /

Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno)
  4  rely enable NOVALIDATE
  5  /

Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null NOVALIDATE;

Table altered.

So here we have told Oracle that there is a foreign key from EMP to DEPT as before. However, this time, because we will have scrubbed our data prior to loading it into the warehouse, we tell Oracle not to perform any validating checks. The NOVALIDATE option bypasses the checking of the existing data we loaded, and RELY tells Oracle to trust the integrity of the data. Basically, we have told Oracle to trust that if it joins EMP to DEPT by DEPTNO, every row in EMP will be retrieved at least once, and at most once.

In fact, in this case, we have 'lied' to the database. We have inserted a row into EMP that has no corresponding row in DEPT. We are now ready to query:

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> select count(*) from emp;

COUNT(*)

----------

15

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)

1 0 SORT (AGGREGATE)

2 1 TABLE ACCESS (FULL) OF ‘EMP’ (Cost=1 Card=164)

Since we set QUERY_REWRITE_INTEGRITY=ENFORCED, Oracle did not rewrite the query to use the materialized view. We must go down a level in query integrity. We need Oracle to 'trust' us:

tkyte@TKYTE816> alter session set query_rewrite_integrity=trusted;

Session altered.

tkyte@TKYTE816> select count(*) from emp;

COUNT(*)

----------

14

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)

1 0 SORT (AGGREGATE)

2 1 TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)

In this case, Oracle did in fact rewrite the query, but the side effect is that our newly inserted row has not been counted. The ‘wrong’ answer is returned because the ‘fact’ that each row in EMP should be preserved in a join to DEPT is not a fact, given the data we loaded. When the materialized view refreshed, it did not get the newly added EMP row. The data we told Oracle to rely on was not reliable.

This demonstration highlights two important points:

❑ You can use materialized views in a large data warehouse very efficiently without having to perform lots of extra, typically redundant, verifications of the data.

❑ BUT, you had better be 100 percent sure that your data is scrubbed if you ask Oracle to rely on it.

Dimensions

Use of dimensions is another method by which we can give even more information to Oracle. Suppose we have a table of details giving a transaction date and a customer ID. The transaction date points to another table that gives full details of what month the transaction date was in, what quarter of your fiscal year it represents, what fiscal year it was in, and so on. Now, suppose you created a materialized view that stored aggregated sales information at the monthly level. Can Oracle use that view to answer a query for sales data for a particular quarter or year? Well, we know that transaction date implies month, month implies quarter, and quarter implies year, so the answer is that it can, but Oracle doesn’t know this (yet), so although it can, it won’t.

Using a database object called a DIMENSION , we can alert Oracle to these facts so that it will use them to rewrite queries in more cases. A dimension declares a parent/child relationship between pairs of columns. We can use it to describe to Oracle that, within a row of a table, the MONTH column implies the value you’ll find in the QTR column, the QTR column implies the value you’ll find in the YEAR column, and so on. Using a dimension, we can have a materialized view that has fewer details than the detailed records (summarized to the monthly level perhaps). This may still be at a higher level of aggregation than the query requests (the query wants data by quarter, say), but Oracle will recognize that it can use the materialized view to get the answer.

Here is a simple example. We will set up a SALES table to store the transaction date, a customer ID, and the total number of sales. This table will have about 350,000 rows in it. Another table, TIME_HIERARCHY, will store the mapping of transaction date to month, to quarter, to year. If we join the two together, we can obtain aggregate sales by month, quarter, year, and so on. Likewise, if we had a table that mapped a customer ID to a zip code, and the zip code to a region, we could easily join this table to SALES, and aggregate by zip code or region.


In a conventional database schema (one without materialized views and other structures) these operations would succeed, but they would be very slow. For every row in the sales data, we would have to perform an indexed read into the lookup table to convert either the transaction date or customer ID into some other value (a NESTED LOOP JOIN) in order to group by this other value. Enter the materialized view. We can store a summarized rollup of the details, perhaps at the monthly level for the transaction date, and at the zip code level for the customer information. Now, rolling up to quarters or by region becomes a very fast operation.

We'll start by creating the SALES table, and load it with some random test data, generated using the ALL_OBJECTS view:

tkyte@TKYTE816> create table sales
  2  (trans_date date, cust_id int, sales_amount number );

Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into sales
  2  select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
  3         mod(rownum,100) CUST_ID,
  4         abs(dbms_random.random)/100 SALES_AMOUNT
  5  from all_objects
  6  /

21921 rows created.

tkyte@TKYTE816> commit;

Commit complete.

These details will represent one year's worth of data. I set up the TRANS_DATE to simply be the first day of this year plus a number between 1 and 365. The CUST_ID is a number between 0 and 99. The total number of sales is some typically 'large' number (it was a really good year).

My ALL_OBJECTS view contains about 22,000 rows, so after four inserts that consecutively double the size of the table, we'll have about 350,000 records. I am using the /*+ APPEND */ hint simply to avoid the redo log that would otherwise be generated by these large inserts:

tkyte@TKYTE816> begin
  2      for i in 1 .. 4
  3      loop
  4          insert /*+ APPEND */ into sales
  5          select trans_date, cust_id, abs(dbms_random.random)/100
  6          from sales;
  7          commit;
  8      end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

tkyte@TKYTE816> select count(*) from sales;

COUNT(*)

----------

350736


Now we need to set up our TIME_HIERARCHY table, to roll up the date field by month, year, quarter, and so on:

tkyte@TKYTE816> create table time_hierarchy
  2  (day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
  3  organization index
  4  as
  5  select distinct
  6  trans_date DAY,
  7  cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
  8  to_char(trans_date,'mon-yyyy') MON_YYYY,
  9  'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
 10     || to_char(trans_date,'yyyy') QTR_YYYY,
 11  cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
 12  from sales
 13  /

Table created.

In this case, it was simple enough. We generated:

❑ MMYYYY – The month, including the year

❑ MON_YYYY – Same as above, but we 'spelled' out the month

❑ QTR_YYYY – The quarter of the year, including the year

❑ YYYY – The year itself

In general, the computations required to create this table could be much more complex. For example, fiscal year quarters, typically, are not so easily computed, and neither are fiscal years. They do not generally follow the calendar year.

We'll now create the materialized view, SALES_MV. The summary we are creating rolls the data up from individual days to months. We would expect our materialized view to have about 1/30 the number of rows of our SALES table, if the data was evenly distributed:

tkyte@TKYTE816> analyze table sales compute statistics;

Table analyzed.

tkyte@TKYTE816> analyze table time_hierarchy compute statistics;

Table analyzed.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select sales.cust_id, sum(sales.sales_amount) sales_amount,
  7         time_hierarchy.mmyyyy
  8  from sales, time_hierarchy
  9  where sales.trans_date = time_hierarchy.day
 10  group by sales.cust_id, time_hierarchy.mmyyyy
 11  /


Materialized view created.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.mmyyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.mmyyyy
  5  /

MMYYYY SUM(SALES_AMOUNT)

---------- -----------------

12001 3.2177E+11

12002 1.0200E+10

22001 2.8848E+11

32001 3.1944E+11

42001 3.1012E+11

52001 3.2066E+11

62001 3.0794E+11

72001 3.1796E+11

82001 3.2176E+11

92001 3.0859E+11

102001 3.1868E+11

112001 3.0763E+11

122001 3.1305E+11

13 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)

1 0 SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)

2 1 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Bytes

So far, so good – Oracle rewrote the query to use the view, SALES_MV. However, let's see what happens if we issue a query that calls for a higher level of aggregation:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /

QTR_YYYY SUM(SALES_AMOUNT)

------------------------------------------------ -----------------

Q1 FY2001 9.2969E+11

Q1 FY2002 1.0200E+10

Q2 FY2001 9.3872E+11

Q3 FY2001 9.4832E+11

Q4 FY2001 9.3936E+11

Elapsed: 00:00:05.58


Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=5 Bytes=14)

1 0 SORT (GROUP BY) (Cost=8289 Card=5 Bytes=145)

2 1 NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)

3 2 TABLE ACCESS (FULL) OF ‘SALES’ (Cost=169 Card=350736 B

4 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics

----------------------------------------------------------

0 recursive calls

15 db block gets

351853 consistent gets

...

We see that Oracle doesn’t have the knowledge we have. It does not yet know that it could have used the materialized view to answer this particular query, so it used the original SALES table instead, and had to do a lot of work to get the answer. The same thing would happen if we requested data aggregated by fiscal year.

So, let's use a DIMENSION to alert Oracle to the fact that the materialized view would be useful in answering this question. First, we'll create the DIMENSION:

tkyte@TKYTE816> create dimension time_hierarchy_dim
  2  level day is time_hierarchy.day
  3  level mmyyyy is time_hierarchy.mmyyyy
  4  level qtr_yyyy is time_hierarchy.qtr_yyyy
  5  level yyyy is time_hierarchy.yyyy
  6  hierarchy time_rollup
  7  (
  8     day child of
  9     mmyyyy child of
 10     qtr_yyyy child of
 11     yyyy
 12  )
 13  attribute mmyyyy
 14  determines mon_yyyy;

Dimension created.

This tells Oracle that the DAY column of the TIME_HIERARCHY table implies MMYYYY, which in turn implies QTR_YYYY. Finally, QTR_YYYY implies YYYY. Also stated is the fact that MMYYYY and MON_YYYY are synonymous – there is a one-to-one mapping between the two. So, any time Oracle sees MON_YYYY used, it understands it as if MMYYYY was used. Now that Oracle has a greater understanding of the relationships in the data, we can see a marked improvement in our query response times:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /


QTR_YYYY SUM(SALES_AMOUNT)

----------------------------------------------- -----------------

Q1 FY2001 9.2969E+11

Q1 FY2002 1.0200E+10

Q2 FY2001 9.3872E+11

Q3 FY2001 9.4832E+11

Q4 FY2001 9.3936E+11

Elapsed: 00:00:00.20

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=5 Bytes=195)

1 0 SORT (GROUP BY) (Cost=7 Card=5 Bytes=195)

2 1 HASH JOIN (Cost=6 Card=150 Bytes=5850)

3 2 VIEW (Cost=4 Card=46 Bytes=598)

4 3 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)

5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (UNI

6 2 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Byt

Statistics

----------------------------------------------------------

0 recursive calls

16 db block gets

12 consistent gets

...

Well, we went from more than 350,000 logical reads to 12 – not too bad at all. If you run this example, you'll be able to see the difference. The first query took a while (about six seconds); the answer to the second query was on the screen before our hands left the Enter key (about a fifth of a second).

We can use this DIMENSION feature many times on the same base fact table. Consider if we assign a ZIP_CODE and REGION attribute to every customer in our database:

tkyte@TKYTE816> create table customer_hierarchy
  2  ( cust_id primary key, zip_code, region )
  3  organization index
  4  as
  5  select cust_id,
  6  mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
  7  mod( rownum, 6 ) region
  8  from ( select distinct cust_id from sales)
  9  /

Table created.

tkyte@TKYTE816> analyze table customer_hierarchy compute statistics;

Table analyzed.

Next, we recreate our materialized view to be a summary that shows us SALES_AMOUNT by ZIP_CODE and MMYYYY:

tkyte@TKYTE816> drop materialized view sales_mv;

Materialized view dropped.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select customer_hierarchy.zip_code,
  7         time_hierarchy.mmyyyy,
  8         sum(sales.sales_amount) sales_amount
  9  from sales, time_hierarchy, customer_hierarchy
 10  where sales.trans_date = time_hierarchy.day
 11  and sales.cust_id = customer_hierarchy.cust_id
 12  group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
 13  /

Materialized view created.

We'll try to execute a query now, which will show us sales by ZIP_CODE and MMYYYY, and demonstrate that I used the materialized view as expected:

tkyte@TKYTE816> set autotrace
tkyte@TKYTE816> select customer_hierarchy.zip_code,
  2         time_hierarchy.mmyyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
  8  /

1250 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=409 Bytes=204

1 0 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 Bytes=2

Statistics

----------------------------------------------------------

28 recursive calls

12 db block gets

120 consistent gets

...

However, when we ask for information at a different level of aggregation (rolling MMYYYY up to YYYY and ZIP_CODE up to REGION), we see that Oracle does not recognize that it can use the materialized view:

tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /

9 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=9 Bytes=26

1 0 SORT (GROUP BY) (Cost=8289 Card=9 Bytes=261)

2 1 NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)

3 2 NESTED LOOPS (Cost=169 Card=350736 Bytes=6663984)

4 3 TABLE ACCESS (FULL) OF ‘SALES’ (Cost=169 Card=350736

5 3 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)

6 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics

----------------------------------------------------------

0 recursive calls

15 db block gets

702589 consistent gets

...

Oracle understands the time dimension we set up, but it does not yet have any information about how CUST_ID, ZIP_CODE, and REGION relate to each other in our CUSTOMER_HIERARCHY table. What we do to correct this is to rebuild our dimension with two hierarchies in it – one describing the TIME_HIERARCHY and the other describing the CUSTOMER_HIERARCHY table:

tkyte@TKYTE816> drop dimension time_hierarchy_dim
  2  /

Dimension dropped.

tkyte@TKYTE816> create dimension sales_dimension
  2  level cust_id is customer_hierarchy.cust_id
  3  level zip_code is customer_hierarchy.zip_code
  4  level region is customer_hierarchy.region
  5  level day is time_hierarchy.day
  6  level mmyyyy is time_hierarchy.mmyyyy
  7  level qtr_yyyy is time_hierarchy.qtr_yyyy
  8  level yyyy is time_hierarchy.yyyy
  9  hierarchy cust_rollup
 10  (
 11     cust_id child of
 12     zip_code child of
 13     region
 14  )
 15  hierarchy time_rollup
 16  (
 17     day child of
 18     mmyyyy child of
 19     qtr_yyyy child of
 20     yyyy
 21  )
 22  attribute mmyyyy
 23  determines mon_yyyy;

Dimension created.

We dropped the original time hierarchy dimension and created a new, more descriptive one explaining all of the relevant relationships. Now Oracle will understand that the SALES_MV we created is able to answer many more questions. For example, if we ask our 'REGION by YYYY' question again:

tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /

REGION YYYY SALES_AMOUNT

---------- ---------- ------------

0 2001 5.9598E+11

0 2002 3123737106

1 2001 6.3789E+11

2 2001 6.3903E+11

2 2002 3538489159

3 2001 6.4069E+11

4 2001 6.3885E+11

4 2002 3537548948

5 2001 6.0365E+11

9 rows selected.

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=9 Bytes=576)

1 0 SORT (GROUP BY) (Cost=11 Card=9 Bytes=576)

2 1 HASH JOIN (Cost=9 Card=78 Bytes=4992)

3 2 HASH JOIN (Cost=6 Card=78 Bytes=4446)

4 3 VIEW (Cost=3 Card=19 Bytes=133)

5 4 SORT (UNIQUE) (Cost=3 Card=19 Bytes=133)

6 5 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (U

7 3 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 B

8 2 VIEW (Cost=3 Card=100 Bytes=700)

9 8 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)

10 9 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)

Statistics

----------------------------------------------------------

0 recursive calls

16 db block gets

14 consistent gets

...

Oracle was able to make use of both hierarchies in the dimension here, and can now make use of the materialized view. Due to the dimension we created, it performed simple lookups to convert the CUST_ID column into REGION (since CUST_ID implies ZIP_CODE, which implies REGION) and the MMYYYY column into QTR_YYYY, and answered our question almost instantly. Here, we reduced the number of logical I/Os from more than 700,000 to 16. When you consider that the size of the SALES table will only grow over time, and the size of the SALES_MV will grow much more slowly (180 records or so per month), we can see that this query will scale very well.

DBMS_OLAP

The last piece to the materialized view puzzle is the DBMS_OLAP package. This package is used for the following purposes:

❑ To estimate the size of a materialized view, in terms of the number of rows and bytes of storage.

❑ To validate that your dimension objects are correct, given the primary/foreign key relationships you have set up (a brief sketch of this call appears below).

❑ To recommend additional materialized views, and to name views that should be dropped, based on either actual utilization and structure, or just structure alone.

❑ To evaluate the utilization of a materialized view, using procedures that are provided, which will report on the actual usefulness of your materialized views, whether they are actually being used or not.

Unfortunately, the utilization routines are beyond the scope of what I can cover in one chapter; they involve setting up Oracle Trace and the Enterprise Manager Performance Pack. We shall take a look at the other three, however.

In order to use the DBMS_OLAP package, you must have external procedures set up, as most of the DBMS_OLAP code is actually stored in a C library. See Chapter 18, C-Based External Procedures, for set-up instructions if you receive an error such as the following:

ERROR at line 1:
ORA-28575: unable to open RPC connection to external procedure agent
ORA-06512: at "SYS.DBMS_SUMADV", line 6
ORA-06512: at "SYS.DBMS_SUMMARY", line 559
ORA-06512: at line 1

Estimating Size

The ESTIMATE_SUMMARY_SIZE routine will report the estimated number of rows and bytes of storage that the materialized view will consume. Since hindsight is 20/20, we can ask DBMS_OLAP to estimate a figure, and then compare that figure to what we get in reality.

In order to run this procedure, you will need to ensure that you have a PLAN_TABLE installed in your schema. You will find the CREATE TABLE statement in the [ORACLE_HOME]/rdbms/admin directory on your database server, in a file named utlxplan.sql. If you execute this script, it will create the PLAN_TABLE for you. This table is used by the EXPLAIN PLAN facility that, in turn, is used by DBMS_OLAP to estimate the size of the materialized view. With this table in place, we can use the built-in ESTIMATE_SUMMARY_SIZE routine to get an estimate of the number of rows/bytes a materialized view would need if we were to build it. I start with a DELETE STATISTICS on our SALES_MV materialized view. DBMS_OLAP would not normally have access to a materialized view to actually see what the sizes are, so we have to hide it (otherwise DBMS_OLAP will get the exact answer from the data dictionary):

tkyte@TKYTE816> analyze table sales_mv DELETE statistics;

Table analyzed.

tkyte@TKYTE816> declare
  2     num_rows number;
  3     num_bytes number;
  4  begin
  5     dbms_olap.estimate_summary_size
  6     ( 'SALES_MV_ESTIMATE',
  7       'select customer_hierarchy.zip_code,
  8               time_hierarchy.mmyyyy,
  9               sum(sales.sales_amount) sales_amount
 10          from sales, time_hierarchy, customer_hierarchy
 11         where sales.trans_date = time_hierarchy.day
 12           and sales.cust_id = customer_hierarchy.cust_id
 13         group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy',
 14       num_rows,
 15       num_bytes );
 16
 17     dbms_output.put_line( num_rows || ' rows' );
 18     dbms_output.put_line( num_bytes || ' bytes' );
 19  end;
 20  /
409 rows
36401 bytes

PL/SQL procedure successfully completed.

The first parameter to this routine is the name of the plan to be stored in the plan table. This name is not especially relevant, except that you will want to DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = 'SALES_MV_ESTIMATE' after you are done. The second parameter is the query that will be used to instantiate the materialized view. DBMS_OLAP will analyze this query, using the statistics on all of the underlying tables, to guess the size of this object. The remaining two parameters are the outputs from DBMS_OLAP – the row and byte count estimates, with values of 409 and 36401, respectively. Now let's calculate the true values:

tkyte@TKYTE816> analyze table sales_mv COMPUTE statistics;

Table analyzed.

tkyte@TKYTE816> select count(*) from sales_mv;

  COUNT(*)
----------
      1250

tkyte@TKYTE816> select blocks * 8 * 1024
  2    from user_tables
  3   where table_name = 'SALES_MV'
  4  /

BLOCKS*8*1024
-------------
        40960

So, the ESTIMATE_SUMMARY_SIZE routine did very well on the size of the table, but underestimated the number of rows. This is typical of anything that 'estimates' – it will get some things right, and miss on others. I would use this routine for a rough 'best guess' as to the size of an object.
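Once you have compared the estimate with reality, remember the housekeeping step mentioned above; a minimal cleanup, using the statement ID from our example, would simply be:

-- Remove the rows ESTIMATE_SUMMARY_SIZE left behind in the plan table.
delete from plan_table where statement_id = 'SALES_MV_ESTIMATE';
commit;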


Dimension Validation

This routine takes any given dimension, and checks that the hierarchies you have defined are valid. For example, in our earlier example, it would ensure that a CUST_ID implied a ZIP_CODE, which implied a REGION. To see this routine at work, we'll create a 'bad' example to work from. We'll start by creating a table that has a row for every day of this year, with the day, month, and year as attributes:

tkyte@TKYTE816> create table time_rollup
  2  ( day   date,
  3    mon   number,
  4    year  number
  5  )
  6  /

Table created.

tkyte@TKYTE816> insert into time_rollup
  2  select dt, to_char(dt,'mm'), to_char(dt,'yyyy')
  3    from ( select trunc(sysdate,'year')+rownum-1 dt
  4             from all_objects where rownum < 366 )
  5  /

365 rows created.

So, here we have set up a time roll-up similar to our previous example. This time, however, I did not preserve the year in the month attribute, just the two digits that represent the month. If we add one more row to this table:

tkyte@TKYTE816> insert into time_rollup values
  2  ( add_months(sysdate,12),
  3    to_char(add_months(sysdate,12),'mm'),
  4    to_char(add_months(sysdate,12),'yyyy') );

1 row created.

We can see the problem. We will be saying that DAY implies MON, and MON implies YEAR, but that is not true in this case. We'll have one month that implies one of two different years. DBMS_OLAP will do a sanity check for us, to show us our error. First we'll set up the dimension:

tkyte@TKYTE816> create dimension time_rollup_dim
  2  level day  is time_rollup.day
  3  level mon  is time_rollup.mon
  4  level year is time_rollup.year
  5  hierarchy time_rollup
  6  (
  7    day child of mon child of year
  8  )
  9  /

Dimension created.


And then validate it:

tkyte@TKYTE816> exec dbms_olap.validate_dimension( 'time_rollup_dim', user, false, false );

PL/SQL procedure successfully completed.

It looks like it succeeded, but we really have to check the table it creates and populates for us:

tkyte@TKYTE816> select * from mview$_exceptions;

OWNER TABLE_NAME   DIMENSION_NAME   RELATIONSHI BAD_ROWID
----- ------------ ---------------- ----------- ------------------
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA7
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA8
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA9
...

32 rows selected.

If we look at the rows that MVIEW$_EXCEPTIONS points us to, we'll find that they are the rows for the month of MARCH (I ran this in March). Specifically:

tkyte@TKYTE816> select * from time_rollup
  2  where rowid in ( select bad_rowid from mview$_exceptions );

DAY              MON       YEAR
--------- ---------- ----------
01-MAR-01          3       2001
02-MAR-01          3       2001
03-MAR-01          3       2001
04-MAR-01          3       2001
...
30-MAR-01          3       2001
31-MAR-01          3       2001
26-MAR-02          3       2002

32 rows selected.

The problem is evident at this point: MON does not imply YEAR – the dimension is invalid. It would be unsafe to use this dimension, as the wrong answer would result.

It is recommended that your dimensions be validated after they are modified to ensure the integrity of the results you receive from the materialized views that make use of them.
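One way to repair this particular example (a sketch only, following the MMYYYY convention used by TIME_HIERARCHY earlier in the chapter; the column and dimension names below are purely illustrative) is to make the month attribute carry the year as well, so that each month value rolls up to exactly one year:

-- Hypothetical fix: store the month as MMYYYY, so it determines a single year.
alter table time_rollup add ( mmyyyy number );

update time_rollup
   set mmyyyy = to_number( to_char(day,'mmyyyy') );

create dimension time_rollup_dim_fixed
level day    is time_rollup.day
level mmyyyy is time_rollup.mmyyyy
level year   is time_rollup.year
hierarchy time_rollup
(
  day child of mmyyyy child of year
)
/

Running DBMS_OLAP.VALIDATE_DIMENSION against this corrected dimension should leave MVIEW$_EXCEPTIONS empty.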


Recommending Materialized Views

One of the more interesting uses of the DBMS_OLAP package is to have it tell you what materialized views you should consider creating. The RECOMMEND routines do just that.

There are two versions of this routine:

❑ RECOMMEND_MV looks at the structure of the table, the foreign keys that are in place, existing materialized views, statistics on everything, and then develops a list of prioritized recommendations.

❑ RECOMMEND_MV_W goes one step further. If you are using Oracle Trace and the Enterprise Manager Performance Pack, it will look at the queries the system actually processes, and recommend materialized views based on that real-life information.

As a simple example, we'll ask DBMS_OLAP to take a look at our existing 'fact' table, SALES.

A fact table is a table in a star schema that, quite simply, contains facts. The SALES table we used above is a fact table. A fact table typically has two types of columns: columns that contain facts (values such as the SALES_AMOUNT in our SALES table), and columns that are foreign keys to dimension tables (such as TRANS_DATE in our SALES table).
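As a purely illustrative sketch (this is not the SALES DDL used in this chapter, and the table and column names are hypothetical), a minimal fact table of this shape might be declared as:

-- Facts plus foreign keys to the dimension tables.
create table sales_fact
( trans_date    date   references time_hierarchy,      -- foreign key to a dimension
  cust_id       number references customer_hierarchy,  -- foreign key to a dimension
  sales_amount  number                                  -- the fact itself
);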

Let's see what DBMS_OLAP has to say. Before we can do that, we'll need to add the foreign keys. The RECOMMEND routine won't look at the DIMENSION to see what can be done – it needs to see the foreign keys to determine the relationships between tables:

tkyte@TKYTE816> alter table sales add constraint t_fk_time
  2  foreign key( trans_date ) references time_hierarchy
  3  /

Table altered.

tkyte@TKYTE816> alter table sales add constraint t_fk_cust
  2  foreign key( cust_id ) references customer_hierarchy
  3  /

Table altered.

Once we've done that, we are ready to look at our fact table, SALES:

tkyte@TKYTE816> exec dbms_olap.recommend_mv( 'SALES', 10000000000, '' );

PL/SQL procedure successfully completed.

Here we asked RECOMMEND_MV to:

1. Look at the table SALES.

2. Consider a large amount of space to be used for the materialized views (we just passed a really big number).

3. Not feel that it needs to keep any particular materialized view (we passed '' as the list of views to KEEP).

Next, we can either query the tables it populates directly or, more conveniently, use a sample routine to print the contents. To install the sample routine and run the report:

tkyte@TKYTE816> @C:\oracle\RDBMS\demo\sadvdemo

Package created.

Package body created.

Package created.

Package body created.

tkyte@TKYTE816> exec demo_sumadv.prettyprint_recommendations

Recommendation Number = 1
Recommended Action is CREATE new summary:
SELECT CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
       CUSTOMER_HIERARCHY.REGION, COUNT(*), SUM(SALES.SALES_AMOUNT),
       COUNT(SALES.SALES_AMOUNT)
  FROM TKYTE.SALES, TKYTE.CUSTOMER_HIERARCHY
 WHERE SALES.CUST_ID = CUSTOMER_HIERARCHY.CUST_ID
 GROUP BY CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
       CUSTOMER_HIERARCHY.REGION
Storage in bytes is 2100
Percent performance gain is 43.2371266138567
Benefit-to-cost ratio is .0205891079113603

Recommendation Number = 2
...

PL/SQL procedure successfully completed.

DBMS_OLAP looked at the dimensions and existing materialized views, and is now making suggestions for additional materialized views that may usefully be created, given the metadata (primary keys, foreign keys, and dimensions) we have entered into the database.

If we used Oracle Trace, we could go a step further with this recommendation process. Oracle Trace is capable of capturing the actual queries asked of the system, and of recording details about them. DBMS_OLAP will use that information to make even more focused recommendations, recommendations based not only on what is possible, but on the types of questions you actually ask of the data.

Materialized views that are possible, but that your workload would never actually use, are not recommended; only those that would benefit the queries your system really executes are suggested.


Caveats

There are a few considerations to be aware of with regard to using materialized views. We will briefly cover some of them here.

Materialized Views are Not Designed for OLTP Systems

As mentioned above, materialized views typically add overhead to individual transactions and, if created with REFRESH ON COMMIT, will introduce contention. The overhead arises from the need to track the changes made by a transaction – these changes will be maintained either in the session state or in log tables. In a high-end OLTP system, this overhead is not desirable. The concurrency issue comes into play with a REFRESH ON COMMIT materialized view because many rows in the detail fact table point to a single row in the summary table. An update to any one of perhaps thousands of detail records will need to modify a single row in the summary, which naturally inhibits concurrency in a high-update situation.

This does not preclude the use of materialized views with OLTP, in particular materialized views that are REFRESHed ON DEMAND, with a full refresh. A full refresh does not add the overhead of tracking transaction-level changes. Rather, at some point in time, the defining query for the materialized view is executed, and the results simply replace the existing materialized view. Since this is done on demand (or on a timed basis), the refresh may be scheduled for a time when the load is light. The resulting materialized view is especially relevant for reporting purposes – your OLTP data can be transformed using SQL into something that is easy and fast to query, every night. The next day, your online reports of yesterday's activities run as fast as possible, and easily co-exist with your OLTP system. A minimal sketch of this approach follows.
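The materialized view name, its defining query, and the refresh timing below are illustrative only; the point is simply the combination of REFRESH COMPLETE ON DEMAND with a DBMS_MVIEW.REFRESH call run during a quiet period:

-- Hypothetical nightly reporting summary; refreshed in full on demand, not on commit.
create materialized view daily_sales_summary
build immediate
refresh complete on demand
enable query rewrite
as
select trans_date, count(*) cnt, sum(sales_amount) sales_amount
  from sales
 group by trans_date;

-- Run during the nightly batch window; 'C' requests a complete refresh.
begin
   dbms_mview.refresh( 'DAILY_SALES_SUMMARY', 'C' );
end;
/

The refresh call itself could be scheduled with DBMS_JOB (or whatever scheduler you already use for batch work), so the summary is rebuilt before the reporting day begins.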

Query Rewrite Integrity

As we discussed above, this setting has three modes:

❑ ENFORCED – Oracle will only use a materialized view if there is no chance of getting incorrect or stale data.

❑ TRUSTED – Oracle will use a materialized view even if some of the relationships it is relying on have not been validated or enforced by Oracle. This is typical in a data warehouse environment, where many constraints may be present but have not been enforced by Oracle.

❑ STALE_TOLERATED – Oracle will use a materialized view even if it knows the data it is derived from has changed. This is typical in a reporting environment such as the one described in the preceding caveat.

You must understand the ramifications of using each of these modes. ENFORCED will give you the right answer every time, at the expense of not using some materialized views that would speed up query response time. TRUSTED, if what Oracle has been asked to 'trust' turns out to be false, may give a result that would not be achieved if the original source data had been queried instead. We saw an example of this early on with the EMP_DEPT materialized view. STALE_TOLERATED should be used in reporting systems where getting a value that existed a while ago is acceptable. If up-to-the-minute information is mandatory, STALE_TOLERATED should not be used.
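The integrity level is controlled by the QUERY_REWRITE_INTEGRITY parameter, which can be set for the instance or per session. A quick session-level sketch (a stale-tolerant reporting session, then back to fully enforced behaviour):

-- Relax rewrite integrity for this reporting session only.
alter session set query_rewrite_integrity = stale_tolerated;

-- ... run reports that can accept slightly stale summaries ...

-- Return to fully enforced behaviour.
alter session set query_rewrite_integrity = enforced;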


Summary

Materialized views are a powerful data warehouse/decision support system feature. A single materialized view may be used by many different, but related, queries. Best of all, it is 100 percent transparent to the application and the end user. You do not have to teach people what summaries are available – you inform Oracle what is possible via constraints, referential integrity, and dimensions, and it does the rest for you.

Materialized views are the natural evolution, and merging, of features found in the database and in decision support tools. No longer are the summary table management features of Oracle's Discoverer (and other, similar tools) limited to those environments. Now every client, from the lowly SQL*PLUS, to your custom-developed applications, to off-the-shelf reporting tools, can take advantage of the fact that the answer has already been stored for them.

Add to all this the DBMS_OLAP tool. It will not only estimate how much additional storage you need to support a materialized view, but it can also watch how your existing views are used. Based on this, it will recommend dropping some and creating others – to the point of even supplying you with the query it feels you should use in the materialized view.

All in all, materialized views in a read-only/read-intensive environment will definitely pay you back for the additional storage through reduced query response time and reduced resources needed to actually process the queries.
