The Basic “How to” of Indexing

advertisement
All About Indexes
All About Indexes in Oracle
What is an Index?
An index is used to increase read access performance. A book, having an index, allows rapid
access to a particular subject area within that book. Indexing a database table provides rapid
location of specific rows within that table, where indexes are used to optimize the speed of
access to rows. When indexes are not used or are not matched by SQL statements submitted to
that database then a full table scan is executed. A full table scan will read all the data in a table to
find a specific row or set of rows, this is extremely inefficient when there are many rows in the
table.
 It is often more efficient to full table scan small tables. The optimizer will often assess
full table scan on small tables as being more efficient than reading both index and data
space, particularly where a range scan rather than an exact match would be used
against the index.
An index of columns on a table contains a one-to-one ratio of rows between index and indexed
table, excluding binary key groupings, more on this later. An index is effectively a separate table
to that of the data table. Tables and indexes are often referred to as data and index spaces. An
index contains the indexed columns plus a ROWID value for each of those column combination
rows. When an index is searched through the indexed columns rather than all the data in the row
of a table is scanned. The index space ROWID is then used to access the table row directly in the
data space. An index row is generally much smaller than a table row, thus more index rows are
stored in the same physical space, a block. As a result less of the database is accessed when
using indexes as opposed to tables to search for data. This is the reason why indexes enhance
performance.
The Basic “How to” of Indexing
There are a number of important factors with respect to efficient and effective creation and use of
indexing.







The number of indexes per table.
The number of table columns to be indexed.
What datatypes are sensible for use in columns to be indexed?
Types of indexes from numerous forms of indexes available.
How does SQL behave with indexes?
What should be indexed?
What should not be indexed?
Number of Indexes per Table
Whenever a table is inserted into, updated or deleted from, all indexes plus the table must be
updated. Thus if one places ten indexes onto a single table then every change to that table
requires an effective change to a single table and ten indexes. The result is that performance will
be substantially degraded since one insert requires eleven inserts to insert the new row into both
data and index spaces. Be frugal with indexing and be conscious of the potential ill as well as the
good effects produced by indexing. The general rule is that the more dynamic a table is the fewer
indexes it should have.
 A dynamic table is a table changes constantly, such as a transactions table. Catalog
tables on the other hand store information such as customer details; customers
1 of 10
info@oracletroubleshooter.com
All About Indexes
change a lot less often than invoices. Customer details are thus static in nature and
over-indexing may be advantageous to performance.
Number of Columns to Index
Composite indexes are indexes made up of multiple columns. Minimize on the number of
columns in a composite key. Create indexes with single columns. Composite indexes are often a
requirement of traditional relational database table structures.
 With the advent of object-oriented application programming languages such as Java,
sequence identifiers tend to be used to identify every row in every table uniquely. The
result is single column indexes for every table. The only exceptions are generally
many-to-many join resolution entities.
It may sometimes be better to exclude some of the lower-level or less relevant columns from the
index since at that level there may not be much data, if there are not many rows to index it can be
more efficient to read a group of rows from the data space. For instance, a composite index
comprised of five columns could be reduced to the first three columns based on a limited number
of rows traversed as a result of ignoring the last two columns. Look at your data carefully when
constructing indexes. The more columns you add to a composite index the slower the search will
be since there is a more complex requirement for that search and the indexes get physically
larger. The benefit of indexes is that an index occupies less physical space than the data. If the
index gets so large that it is as large as the data then it will become less efficient to read both the
index and data spaces rather than just the data space.
 Most database experts recommend a maximum of three columns for composite keys.
Datatypes of Index Columns
Integers make the most efficient indexes. Try to always create indexes on columns with fixed
length values. Avoid using VARCHAR2 and any object data types. Use integers if possible or
fixed length, short strings. Also try to avaoid indexing on dates and floating-point values. If using
dates be sure to use the internal representation or just the date, not the date and the time. Use
integer generating sequences wherever possible to create consistently sequential values.
Types of Indexes
There are different types of indexes available in different databases. These different indexes are
applicable under specific circumstances, generally for specific search patterns, for instance exact
matches or range matches.
The simplest form of indexing is no index at all, a heap structure. A heap structure is effectively a
collection of data units, rows, which is completely unordered. The most commonly used indexed
structure is a Btree (Binary Tree). A Btree index is best used for exact matches and range
searches. Other methods of indexing exist.
1. Hashing algorithms produce a pre-calculated best guess on general row location and are
best used for exact matches.
2. ISAM or Indexed Sequential Access Method indexes are not used in Oracle.
3. Bitmaps contain maps of zero’s and 1’s and can be highly efficient access methods for readonly data.
4. There are other types of indexing which involve clustering of data with indexes.
In general every index type other than a Btree involves overflow. When an index is required to
overflow it means that the index itself cannot be changed when rows are added, changed or
2 of 10
info@oracletroubleshooter.com
All About Indexes
removed. The result is inefficiency because a search to find overflowing data involves a search
through originally indexed rows plus overflowing rows. Overflow index space is normally not
ordered. A Btree index can be altered by changes to data. The only exception to a Btree index
coping with data changes in Oracle is deletion of rows. When rows are deleted from a table,
physical space previously used by the index for the deleted row is never reclaimed unless the
index is rebuilt. Rebuilding of Btree indexes is far less common than that for other types of
indexes since non-Btree indexes simply overflow when row changes are applied to them.
Oracle uses has the following types of indexing available.
 Btree index. A Btree is a binary tree. General all-round index and common in OLTP systems.
An Oracle Btree index has three layers, the first two are branch node layers and the third, the
lowest, contains leaf nodes. The branch nodes contain pointers to the lower level branch or
leaf node. Leaf nodes contain index column values plus a ROWID pointer to the table row.
The branch and leaf nodes are optimally arranged in the tree such that each branch will
contain an equal number of branch or leaf nodes.
 Bitmap index. Bitmap containing binary representations for each row. A zero implies that a
row does not have a specified value and a 1 denotes that row having that value. Bitmaps are
very susceptible to overflow in OLTP systems and should only be used for read-only data
such as in Data Warehouses.
 Function-Based index. Contains the result of an expression pre-calculated on each row in a
table.
 Index Organized Tables. Clusters index and data spaces together physically for a single
table and orders the merged physical space in the order of the index, usually the primary key.
An index organized table is a table as well as an index, the two are merged.
 Clusters. Partial merge of index and data spaces, ordered by an index, not necessarily the
primary key. A cluster is similar to an index organized table except that it can be built on a
join (more than a single table). Clusters can be ordered using binary tree structures or
hashing algorithms. A cluster could also be viewed as a table as well as an index since
clustering partially merges index and data spaces.
 Bitmap Join index. Creates a single bitmap for one table in a join.
 Domain index. Specific to certain application types using contextual or spatial data, amongst
others.
Indexing Attributes
Various types of indexes can have specific attributes or behaviors applied to them. These
behaviors are listed below, some are Oracle specific and some are not.
 Ascending or Descending. Indexes can be order in either way.
 Uniqueness. Indexes can be unique or non-unique. Primary keys must be unique since a
primary key uniquely identifies a row in a table referentially. Other columns such as names
sometimes have unique constraints or indexes, or both, added to them.
 Composites. A composite index is an index made up of more than one column in a table.
 Compression. Applies to Btree indexes where duplicated prefix values are removed.
Compression speeds up data retrieval but can slow down table changes.
 Reverse keys. Bytes for all columns in the index are reversed, retaining the order of the
columns. Reverse keys can help performance in clustered server environments (Oracle8i
Parallel Server / RAC Oracle9i) by ensuring that changes to similar key values will be better
physically spread. Reverse key indexing can apply to rows inserted into OLTP tables using
sequence integer generators, where each number is very close to the previous number.
When searching for and updating rows with sequence identifiers, where rows are searched
for
 Null values. Null values are generally not included in indexes.
3 of 10
info@oracletroubleshooter.com
All About Indexes
 Sorting (NOSORT). This option is Oracle specific and does not sort an index. This assumes
that data space is physically ordered in the desired manner.
What SQL does with Indexes
In general a SQL statement will attempt to match the structure of itself to an index, the where
clause ordering will attempt to match available indexes and use them if possible. If no index is
matched then a full table scan will be executed. A table scan is extremely inefficient for anything
but the smallest of tables. Obviously if a table is read sequentially, in physical order then an index
is not required. A table does not always need an index.
What to Index
Use indexes where frequent queries are performed with where and order by clause matching the
ordering of columns in those indexes. Use indexing generally on larger tables or multi-table,
complex joins. Indexes are best created in the situations listed below.




Columns used in joins.
Columns used in where clauses.
Columns used in order by clauses.
In most relational databases the order-by clause is generally executed on the subset
retrieved by the where clause, not the entire data space. This is not always unfortunately the
case for Oracle.
 Traditionally the order-by clause should
never include the columns contained in the
where cause. The only case where the order-by clause will include columns contained
in the where clause is the case of the where clause not matching any index in the
database or a requirement for the order by clause to override the sort order of the
where, typically in highly complex, multi-table joins.
 The group-by clause can be enhanced by indexing when the range of values being grouped
is small in relation to the number of rows in the table selected.
What not to Index
Indexes will degrade performance of inserts, updates and deletes, sometimes substantially.





Tables with a small number of rows.
Static tables.
Columns with a wide range of values.
Tables changed frequently and with a low amount of data retrieval.
Columns not used in data access query select statements.
Tuning Oracle SQL Code and Using Indexes
What is SQL Tuning?
Tune SQL based on the nature of your application, OLTP or read-only Data Warehouse. OLTP
applications have high volumes of concurrent transactions and are better served with exact match
SQL where many transactions compete for small amounts of data. Read-only Data Warehouses
require rapid access to large amounts of information at once and thus many records are
accessed at once, either by many or a small number of sessions.
The EXPLAIN PLAN command can be used to compare different versions of SQL statements,
and tune your application SQL code as required. When tuning OLTP applications utilize sharing
of SQL code in PL/SQL procedures and do not use triggers unless absolutely necessary. Triggers
4 of 10
info@oracletroubleshooter.com
All About Indexes
can cause problems such as self-mutating transactions where a table can expect a lock on a row
already locked by the same transaction. This is because triggers do not allow transaction
termination commands such as COMMIT and ROLLBACK. In short, do not use triggers unless
absolutely necessary.
The best approach to tuning of SQL statements is to seek out those statements consuming the
greatest amount of resources (CPU, memory and I/O). The more often a SQL statement is
executed the more finely it should be tuned. Additionally SQL statements executed many times
more often than other SQL statements can cause issues with locking. SQL code using bind
variables will execute much faster than those not. Constant re-parsing of similar SQL code can
over stress CPU time resources.
Tuning is not necessarily a never-ending process but can be iterative. It is always best to take
small steps and then assess improvements. Small changes are always more manageable and
more easier to implement. Use the Oracle performance views plus tools such as TKPROF,
tracing, Oracle Enterprise Manager, Spotlight, automated scripts and other tuning tools or
packages which aid in monitoring and Oracle performance tuning. Detection of bottlenecks and
SQL statements causing problem is as important as resolving those issues. In general tuning falls
into three categories as listed below, in order of importance and performance impact.
1. Data model tuning.
2. SQL statement tuning.
3. Physical hardware and Oracle database configuration.
 Physical
hardware and Oracle database configuration installation will, other than
bottleneck resolution, generally only affect performance by between 10% and 20%.
Most performance issues occur from poorly developed SQL code, with little attention
to SQL tuning during development, probably causing around 80% of general system
performance problems. Poor data model design can cause even more serious
performance problems than SQL code but it is rare because data models are usually
built more carefully than SQL code. It is a common problem that SQL code tuning is
often left to DBA personnel. DBA people are often trained as Unix Administrators, SQL
tuning is conceptually a programming skill; programming skills of Unix Administrators
are generally very low-level, if present at all, and very different in skills requirements
to that of SQL code.
How to Tune SQL
Indexing
When building and restructuring of indexing never be afraid of removing unused indexes. The
DBA should always be aware of where indexes are used and how.
 Oracle9i can automatically monitor index usage using the ALTER INDEX index name
[NO]MONITORING USAGE; command with subsequent selection of the USED column
from the V$OBJECT_USAGE column.
Taking an already constructed application makes alterations of any kind much more complex.
Pay most attention to indexes most often utilized. Some small static tables may not require
indexes at all. Small static lookup type tables can be cached but will probably be force tablescanned by the optimizer anyway; table-scans may be adversely affected by the addition of
unused superfluous indexes. Sometimes table-scans are faster than anything else. Consider the
use of clustering, hashing, bitmaps and even index organized tables, only in Data Warehouses.
Many installations use bitmaps in OLTP databases, this often a big mistake! If you have bitmap
indexes in your OLTP database and are having performance problems, get rid of them!
5 of 10
info@oracletroubleshooter.com
All About Indexes
Oracle recommends the profligate use of function-based indexes, assuming of course there will
not be too many of them. Do not allow too many programmers to create their indexes, especially
not function-based indexes, because you could end-up with thousands of indexes. Application
developers tend to be unaware of what other developers are doing and create indexes specific to
a particular requirement where indexes may be used in only one place. Some DBA control and
approval process must be maintained on the creation of new indexes. Remember, every table
change requires a simultaneous update to all indexes created based on that table.
SQL Statement Reorganisation
SQL statement reorganization encompasses factors as listed below, amongst others.




WHERE clause filtering and joining orders matching indexes.
Use of hints is not necessarily a good idea. The optimizer is probably smarter than you are.
Simplistic SQL statements and minimizing on table numbers in joins.
Use bind variables to minimize on re-parsing. Buying lots of expensive RAM and sizing your
shared pool and database buffer cache to very large values may make performance worse.
Firstly, buffer cache reads are not as fast as you might think. Secondly, a large SQL parsing
shared pool, when not using bind variables in SQL code, will simply fill up and take longer for
every subsequent SQL statement to search.
 Oracle9i has adopted various SQL ANSI standards. The ANSI join syntax standard could
cause SQL code performance problems. The most effective tuning approach to tuning Oracle
SQL code is to remove rows from joins using where clause filtering prior to joining multiple
tables, obviously the larger tables, requiring the fewest rows should be filtered first. ANSI join
syntax applies joins prior to where clause filtering; this could cause major performance
problems.
Nested subquery SQL statements can be effective under certain circumstances. However,
nesting of SQL statements increases the level of coding complexity and if sometimes looping
cursors can be utilized in PL/SQL procedures, assuming the required SQL is not completely adhoc.
 Avoid
ad-hoc SQL if possible. Any functionality, not necessarily business logic, is
always better provided at the application level. Business logic, in the form of
referential integrity, is usually best catered for in Oracle using primary and foreign key
constraints and explicitly created indexes.
Nested subquery SQL statements can become over complicated and impossible for even the
most brilliant coder to tune to peak efficiency. The reason for this complexity could lie in an overNormalized underlying data model. In general use of subqueries is a very effective approach to
SQL code performance tuning. However, the need to utilize intensive, multi-layered subquery
SQL code is often a symptom of a poor data model due to requirements for highly complex SQL
statement joins.
Some Oracle Tricks
Use [NOT] EXISTS Instead of [NOT] IN
In the example below the second SQL statement utilizes an index in the subquery because of the
use of EXISTS in the second query as opposed to IN. IN will build a set first and EXISTS will not.
IN will not utilize indexes whereas EXISTS will.
SELECT course_code, name FROM student
WHERE course_code NOT IN
6 of 10
info@oracletroubleshooter.com
All About Indexes
(SELECT course_code FROM maths_dept);
SELECT course_code, name FROM student
WHERE NOT EXISTS
(SELECT course_code FROM maths_dept
WHERE maths_dept.course_code = student.course_code);
In the example below the nesting of the two queries could be reversed depending on which table
has more rows. Also if the index is not used or not available, reversal of the subquery is required
if tableB has significantly more rows than tableA.
DELETE FROM tableA WHERE NOT EXISTS
(SELECT columnB FROM tableB WHERE tableB.columnB = tableA.columnA);
Use of value lists with the IN clause could indicate a missing entity. Also that missing entity is
probably static in nature and can potentially be cached, although caching causing increased data
buffer size requirements is not necessarily a sensible solution.
SELECT country FROM countries WHERE continent IN ('africa','europe','north america');
Equijoins and Column Value Transformations
AND and = predicates are the most efficient. Avoid transforming of column values in any form,
anywhere in a SQL statement, for instance as shown below.
SELECT * FROM <table name> WHERE TO_NUMBER(BOX_NUMBER) = 94066;
And the example below is really bad! Typically indexes should not be placed on descriptive fields
such as names. A function-based index would be perfect in this case but would probably be
unnecessary if the data model and the data values were better organized.
SELECT * FROM table1, table2
WHERE UPPER(SUBSTR(table1.name,1,1)) = UPPER(SUBSTR(table2.name,1,1));
Transforming literal values is not such a problem but application of a function to a column within a
table in a where clause will use a table-scan regardless of the presence of indexes. When using a
function-based index the index is the result of the function which the optimizer will recognize and
utilize for subsequent SQL statements.
Datatypes
Try not to use mixed datatypes by setting columns to appropriate types in the first place. If mixing
of datatypes is essential do not assume implicit type conversion because it will not always work,
and implicit type conversions can cause indexes not to be used. Function-based indexes can be
used to get around type conversion problems but this is not the most appropriate use of functionbased indexes. If types must be mixed, try to place type conversion onto explicit values and not
columns. For instance, as shown below.
WHERE zip = TO_NUMBER('94066') as opposed to WHERE TO_CHAR(zip) = '94066'
The DECODE Function
The DECODE function will ignore indexes completely. DECODE is very useful in certain
circumstances where nested looping cursors can become extremely complex. DECODE is
intended for specific requirements and is not intended to be used prolifically, especially not with
respect to type conversions. Most SQL statements containing DECODE function usage can be
altered to use explicit literal selection criteria perhaps using separate SELECT statements
combined with UNION clauses. Also Oracle9i contains a CASE statement which is much more
versatile than DECODE and may be much more efficient.
7 of 10
info@oracletroubleshooter.com
All About Indexes
Join Orders
Always use indexes where possible, this applies to all tables accessed in a join, both those in the
driving and nested subqueries. Use indexes between parent and child nested subqueries in order
to utilize indexes across a join. A common error is that of accessing a single row from the driving
table using an index and then to access all rows from a nested subquery table where an index
can be used in the nested subquery table based on the row retrieved by the driving table.
Put where clause filtering before joins, especially for large tables where only a few rows are
required.
Try to use indexes fetching the minimum number of rows. The order in which tables are accessed
in a query is very important. Generally a SQL statement is parsed from top-to-bottom and from
left-to-right. The further into the join or SQL statement, then the fewer rows should be accessed.
Even consider constructing a SQL statement based on the largest table being the driving table
even if that largest table is not the logical driver of the SQL statement. When a join is executed,
each join will overlay the result of the previous part of the join, effectively each section (based on
each table) is executed sequentially. In the example below table1 has the most rows and table3
has the fewest rows.
SELECT * FROM table1, table2, table3
WHERE table1.index = table2.index AND table1.index = table3.index;
Hints
Use them? Perhaps. When circumstances force their use. Generally the optimizer will succeed
where you will not. Hints allow, amongst many other things, forcing of index usage rather than full
table-scans. The optimizer will generally find full scans faster with small tables and index usage
faster with large tables. Therefore if row numbers and the ratios of row between tables are known
then using hints will probably make performance worse. One specific situation where hints could
help are generic applications where rows in specific tables can change drastically depending on
the installation. However, once again the optimizer may still be more capable than any
programmer or DBA.
Use INSERT, UPDATE and DELETE ... RETURNING
When values are produced and contained in insert or update statements, such as new sequence
numbers or expression results, and those values are required in the same transaction by
following SQL statements, the values can be returned into variables and used later without
recalculation of expression being required. This tactic would be used in PL/SQL and anonymous
procedures. Examples are shown below.
INSERT INTO table1 VALUES (test_id.nextval, 'Jim Smith', '100.12', 5*10)
RETURNING col1, col4 * 2 INTO val1, val2;
UPDATE table1 SET name = 'Joe Soap'
WHERE col1 = :val1 AND col2 = :val2;
DELETE FROM table1 RETURN value INTO :array;
Triggers
Simplification or complete disabling and removal of triggers is advisable. Triggers are very slow
and can cause many problems, both performance related and can even cause serious locking
and database errors. Triggers were originally intended for messaging and are not intended for
use as rules triggered as a result of a particular event. Other databases have full-fledged rule
systems aiding in the construction of Rule-Based Expert systems. Oracle triggers are more like
database events than event triggered rules causing other potentially recursive events to occur.
8 of 10
info@oracletroubleshooter.com
All About Indexes
Never use triggers to validate referential integrity. Try not to use triggers at all. If you do use
triggers and have performance problems, their removal and recoding into stored procedures,
constraints or application code could solve a lot of your problems.
Data Model Restructuring
Data restructuring involves partitioning, normalization and even denormalization. Oracle
recommends avoiding the use of primary and foreign keys for validation of referential integrity,
and suggests validating referential integrity in application code. Application code is more prone to
error since it changes much faster. Avoiding constraint-based referential is not necessarily the
best solution.
Referential integrity can be centrally controlled and altered in a single place in the database.
Placing referential integrity in application code is less efficient due to increased network traffic
and requires more code to be maintained in potentially many applications. All foreign keys must
have indexes explicitly created and these indexes will often be used in general application SQL
calls, other than just for validation of referential integrity. Oracle does not create internal indexes
when creating foreign reference keys. In fact Oracle recommends that indexes should be
explicitly created for all primary, foreign and unique hey constraints. Foreign keys not indexed
using the CREATE INDEX statement can cause table locks on the table containing the foreign
key. It is highly likely that foreign keys will often be used by the optimizer in SQL statement
filtering where clauses, if the data model and the application are consistent with each other
structurally, which should be the case.
Views
Do not use views as a basis for SQL statements taking a portion of the rows defined by that view.
Views were originally intended for security and access privileges. No matter what where clause is
applied to a view the entire view will always be executed first. On the same basis also avoid
things such as SELECT * FROM …, GROUP BY clauses and aggregations such as DISTINCT.
DISTINCT will always select all rows first. Do not create new entities using joined views, it is
better to create those intersection view joins as entities themselves; this applies particularly in the
case of many-to-many relationships. Also Data Warehouses can benefit from materialized views
which are views actually containing data, refreshed by the operator at a chosen juncture.
Maintenance of Current Statistics and Cost Based Optimization
Maintain current statistics as often as possible, this can be automated. Cost-based optimization,
using statistics, is much more efficient than rule-based optimization.
Regeneration and Coalescing of Indexes
Indexes subjected to constant DML update activity can become skewed and thus become less
efficient over a period of time. Use of Oracle Btree indexes implies that when a value is searched
for within the tree, a series of comparisons are made in order to depth-first traverse down through
the tree until the appropriate value is found.
Oracle Btree indexes are usually only three levels, requiring three hits on the index to find a
resulting ROWID pointer to a table row. Index searches, even into very large tables, especially
unique index hits, not index range scans, can be incredibly fast.
In some circumstances constant updating of a binary tree can cause the tree to become more
heavily loaded in some parts or skewed. Thus some parts of the tree require more intensive
searching which can be largely fruitless. Indexes should sometimes be rebuilt where the binary
tree is regenerated from scratch, this can be done online in Oracle9i, as shown below.
ALTER INDEX index name REBUILD ONLINE;
9 of 10
info@oracletroubleshooter.com
All About Indexes
Coalescing of indexes is a more physical form of maintenance where physical space chunks
which are fragmented. Index fragmentation is usually a result of massive deletions from table, at
once or over time. Oracle Btree indexes do not reclaim physical space as a result of row
deletions. This can cause serious performance problems as a result of fruitless searches and
very large index files where much of the index space is irrelevant. The command shown below
will do a lot less than rebuilding but it can help. If PCTINCREASE is not set to zero for the index
then extents could vary in size greatly and not be reusable. In that the only option is rebuilding.
ALTER INDEX index name REBUILD COALESCE NOLOGGING;
Disclaimer Notice: This information is available “AS IS”. I am in no way responsible or liable for any mishaps as a result of
using this information.
10 of 10
info@oracletroubleshooter.com
Download