Introduction to the Query Optimizer and Database Engine for DB2

IBM eServer iSeries
Session:
Intro to Query Optimization
DB2 UDB for iSeries
Tom McKinley
IBM Rochester, MN USA
8 Copyright IBM Corporation, 2005. All Rights Reserved.
This publication may refer to products that are not currently
available in your country. IBM makes no commitment to make
available any products referred to herein.
Background / Foundation
IBM eServer iSeries
IBM's DB2 UDB Family
Three code bases...
– Based on the system history, architecture and operating system
ƒ DB2 UDB for Linux, UNIX, Windows (LUW)
ƒ DB2 UDB for z/OS (S/390)
ƒ DB2 UDB for iSeries (AS/400)
© 2005 IBM Corporation
IBM eServer iSeries
DB2 UDB for iSeries
i5 + i5/OS
–System viewed as a database server, not just an application system
–DB2 UDB for iSeries (integrated part of OS/400 or i5/OS)
–Universal Database support
–Data Centric focus
–Business logic moving into the database engine
–SQL (DDL and DML) as primary interface to database
–GUI to operating system and database via iSeries Navigator
© 2005 IBM Corporation
IBM eServer iSeries
iSeries - Logical Partitioning (LPAR)
LPAR-1
LPAR-2
LPAR-3
IXS/IXA
i5/OS
Linux
AIX
Windows***
DB2
UDB
for
Linux
DB2
UDB
for
AIX
DB2
UDB
for
iSeries
DB2
UDB
for
Win
Virtual 1Gbit Ethernet LAN
*** No LPAR support
© 2005 IBM Corporation
IBM eServer iSeries
iSeries i5 i5/OS Architecture
QUERY
M
E
M
O
R
Y
Multiple CPUs
N-way
SMP
Single Level Storage
Single
System
64 bit
POWER
Storage Management
IOP
IOP
IOP
IOP
IOP
Table
© 2005 IBM Corporation
IOP
IOP
IBM eServer iSeries
i5/OS Objects
SQL
i5/OS
schema/collection
library
table
physical file
view
logical file
index
keyed logical file
row
record
column
field
log
journal
© 2005 IBM Corporation
IBM eServer iSeries
i5/OS Objects
SELECT...
FROM Physical File
Library (Schema)
CREATE ALIAS...
Physical File (Table)
Member 1
Member 2
Member 3
Alias_1
SELECT...
FROM Alias_1
Alias_2
SELECT...
FROM Alias_2
Alias_3
SELECT...
FROM Alias_3
© 2005 IBM Corporation
IBM eServer iSeries
i5/OS Objects
ƒ
System
ƒ Library
ƒ
Object
ƒ Type
ƒ Attribute (subtype)
System
ƒ My_Schema
ƒ DB_Table
ƒ *FILE
ƒ PF (physical file)
ƒ System
ƒ My_Schema
ƒ DB_Index
ƒ *FILE
ƒ LF (logical file)
ƒ System
ƒ My_Schema
ƒ DB_View
ƒ *FILE
ƒ LF (logical file)
Must be unique
ƒ
CREATE TABLE My_Schema.DB_Table ...
CREATE INDEX My_Schema.DB_Index ...
CREATE VIEW My_Schema.DB_View ...
© 2005 IBM Corporation
IBM eServer iSeries
i5/OS Objects
One Database Management System with multiple interfaces
Command Language (CL)
Structured Query Language (SQL)
Embedded
ODBC
JDBC
CLI
CREATE TABLE
CRTPF
DB2
High
Level
Language
Native I/O
DB File (PF) object
© 2005 IBM Corporation
SELECT...
FROM...
IBM eServer iSeries
SQL Query Processing
SQL request
Optimize
Open
DB2 UDB for iSeries
© 2005 IBM Corporation
Run
Query Optimization
IBM eServer iSeries
V5R1 Database Architecture
ODBC / JDBC / ADO / DRDA / XDA
Network
Host Server
Static
Compiled
embedded
statements
Native
(Record
I/O)
CLI / JDBC
Dynamic
Extended
Dynamic
Prepare
every
time
Prepare
once and
then
reference
SQL
Optimizer
DB2 UDB
(Data Storage & Management)
© 2005 IBM Corporation
The optimizer and
database engine are
separated at different
layers of the
operating system
IBM eServer iSeries
V5R2 and V5R3 Database Architecture
ODBC / JDBC / ADO .NET / DRDA / XDA
Network
Host Server
Static
Compiled
embedded
statements
Native
(Record
I/O)
CLI / JDBC
Dynamic
Extended
Dynamic
Prepare
every
time
Prepare
once and
then
reference
SQL
Optimizer
DB2 UDB
(Data Storage & Management)
© 2005 IBM Corporation
The optimizer and
database engine
merged to form the SQL
Query Engine, and
much of the work was
moved to SLIC
IBM eServer iSeries
V5R2 and V5R3 Database Architecture
© 2005 IBM Corporation
IBM eServer iSeries
The Query Dispatcher
Determines which engine will optimize and process each query
request
–Only SQL requests are considered for the SQL Query Engine
Initial step for all query optimization that occurs in i5/OS
Ability to “back up” and use the Classic Query Engine when nonstandard indexes are encountered during optimization
Initial goal is to use SQE
© 2005 IBM Corporation
IBM eServer iSeries
The Query Dispatcher – V5R2
Dispatched to CQE if:
–>1 Table (i.e. no joins)
SQE support added into V5R2 - May 2003
–OR & IN predicates
(Latest DB Group + SI07650)
–SMP requested
–Non-Read (INSERT with subselect can use new path)
–LIKE predicates
Not part of
–UNIONS
any
–View or Logical File references
package
–Subquery
–Derived Tables & Common Table expressions, UDTFs
–LOB columns
–LOWER, TRANSLATE, or UPPER scalar function
–CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
–Sort Sequences & CCSID translation between columns
–Distributed queries via DB2 Multisystem
–Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
–ALWCPYDTA(*NO) specified
–Sensitive Cursor
© 2005 IBM Corporation
IBM eServer iSeries
The Query Dispatcher - V5R3
Dispatched to CQE if:
–LIKE predicates
–Logical File references
–UDTFs
–LOB columns
–LOWER, TRANSLATE, or UPPER scalar function
–CHARACTER_LENGTH, POSITION, or SUBSTRING scalar function using UTF-8/16
–Sort Sequences & CCSID translation between columns
–DB2 Multisystem
–Non-SQL queries (QQQQry API, Query/400, OPNQRYF)
–ALWCPYDTA(*NO) specified
–Sensitive Cursor
SQE now optimizes
Only SQE optimizes
–VIEWS, UNIONS, SubQueries
–INSERT, UPDATE, DELETE
–Star Schema Join queries
–INTERSECT
–EXCEPT
© 2005 IBM Corporation
IBM eServer iSeries
The Query Dispatcher
Back up to CQE to complete optimization if any of the following are
encountered:
–Select/omit logical file
–Logical file over multiple members
–Join logical file
–Derived key (s)
Native logical files that perform some intermediate mapping of the fields referenced in the
key. Common ones are renaming fields, adding a translate or only selecting a subset of
the columns
ƒ Specifying an Alternate Collating Sequence (ACS) on a field used for a key will also make
a “derived key” (an implied map occurs within the index)
ƒ
–Sort Sequence (NLSS) specified for index or logical file
ƒ
Probably the trickiest one to detect for users. The index is built while an NLSS table is
specified in the query environment
–Cost to “back up” and revert to CQE adds about 15% to the total optimization
time
–QAQQINI parameter to ignore unsupported logical files
ƒ
Ignore_Derived_Index = *YES
© 2005 IBM Corporation
IBM eServer iSeries
Optimization
The Optimizer
Writes the best? program to fulfill your request
The Optimizer
Provides the recipe
Provides the methods
Does no cooking
© 2005 IBM Corporation
IBM eServer iSeries
Optimization... the intersection of various factors
Server attributes
Server configuration
Version/Release/Modification
Level
Server performance
SMP
Job, Query attributes
The Plan
Table sizes, number of rows
SQL Request
Static
Dynamic
Extended Dynamic
Interfaces
Database design
Views and Indexes (Radix, EVI)
Work
management
© 2005 IBM Corporation
IBM eServer iSeries
(Query) Access Plans
The output of query optimization
(“the recipe and methods”)
Contents
A control structure that contains information on the actions
necessary to satisfy each SQL request
These contents include:
–Access Method
–Info on associated tables and indexes
–Any applicable program and/or environment information
© 2005 IBM Corporation
IBM eServer iSeries
Query Optimization
Cost Based Query Optimization
The DB2 for iSeries Optimizer performs "cost based" optimization
"Cost" is defined as the estimated time it takes to run the request
"Costing" various plans refers to the comparison of a given set of
algorithms and methods in an attempt to identify the "fastest" plan
Optimization is based on time, not on resource utilization
Usually the fastest plan is also the most resource efficient plan, but this is
not necessarily true
The goal of the optimizer is to eliminate I/O as early as possible by
identifying the best path to and through the data
The optimizer has the ability and freedom to "rewrite" the query
© 2005 IBM Corporation
IBM eServer iSeries
Query Phases
Query processing can be divided into four phases:
Query Validation
–Validate the query request
–Validate existing access plan
–Builds internal query structures
Query Dispatcher
–Determine which query engine should complete the processing
Query Optimization
We can affect this...
–Choose most efficient access method
–Builds access plan
Query Execution
–Build the structures needed for query cursor
–Build the structures for any temporary indexes (if needed)
–Builds and activates query cursor (ODP)
–Generate any feedback requested
Debug messages in the job log
DB Monitor records
Visual Explain
© 2005 IBM Corporation
IBM eServer iSeries
Query Optimization Feedback
SQE Plan
Cache
DB Monitor
Data
SQL request
Joblog
Messages
Query Optimization
SQL Info from
PGMs & PKGs
© 2005 IBM Corporation
Visual
Explain
IBM eServer iSeries
Data Access Methods
Cost based optimization dictates that the fastest access method
for a given table will vary based upon selectivity of the query
High
Response
Time
Method 3
Method 2
Method 1
Low
Few
Many
Number of rows searched / accessed
© 2005 IBM Corporation
IBM eServer iSeries
Strategy for Query Optimization
Query optimization will generally follow this simplified strategy:
Gather meta-data and statistics for costing
Selectivity statistics
Indexes available to be costed
Sort the indexes based upon their usefulness
Environmental attributes that may affect the costs
Generate default cost
Build an access plan associated with the default plan
For each index:
Gather information needed specific to this index
Build an access plan based on this index
Cost the use of the index with this access plan
Compare the resulting cost against the cost from the current best plan
© 2005 IBM Corporation
IBM eServer iSeries
Strategy for Query Optimization
Optimizing indexes will generally follow this simplified strategy:
Gather list of indexes for statistics and costing
Sort the list of indexes considering how the index can be used
Local selection
Joining
Grouping
Ordering
Index only access
One index may be useful for statistics, and another useful for implementation
© 2005 IBM Corporation
IBM eServer iSeries
Statistics
All query optimizers rely upon statistics to make plan decisions
–DB2 UDB for the iSeries has always relied upon indexes as its source for stats
–Other databases rely upon manual stats collection for their source
SQE offers a hybrid approach where column stats will be
automatically collected for cases where indexes do not already exist
© 2005 IBM Corporation
IBM eServer iSeries
Sources of Information
Meta-data sources
–Existing indexes (Radix or Encoded Vector)
Best
More accurately describes multi-column key values
ƒ Stats available immediately as the index maintenance occurs
ƒ Selectivity estimates from radix by reading n keys
ƒ Selectivity from EVI by reading symbol table values
ƒ
–Column Statistics
SQE only
ƒ Column Cardinality, Histograms & Frequent Values List
ƒ Constructed over a single column in a table
ƒ Stored internally as a part of the table object after created
ƒ Collected automatically by default for the system
ƒ Stats not immediately maintained as the table changes
ƒ Stats are refreshed as they become “stale” over time
ƒ
Default sources
Worst
–No representation of actual values in columns
© 2005 IBM Corporation
IBM eServer iSeries
SQE Automatic Stats Collection
i5/OS Statistics collection job
–Reactive, based on query requests
–Automatic collection runs in this background job at very low priority
ƒ
QDBFSTCCOL system job
–Statistics Manager continuously analyzes entries in the Plan Cache and
queues up requests for the collection job
–Controlled by system value QDBFSTCCOL
iSeries Navigator graphical interface to manage stats collected by
the system
–API’s also provided to manage the stats
© 2005 IBM Corporation
IBM eServer iSeries
Review
What is the optimizer's job?
What is the optimizer's output?
What are some of the key elements used for cost based
optimization?
What things affect the Access plan?
Look at resources used as well as response time.
© 2005 IBM Corporation
IBM eServer iSeries
Trademarks and Disclaimers
IBM Corporation 1994-2005. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
The following terms are trademarks of International Business Machines Corporation in the United States, other countries, or both:
Rational is a trademark of International Business Machines Corporation and Rational Software Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
SET and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.
Other company, product or service names may be trademarks or service marks of others.
Information is provided "AS IS" without warranty of any kind.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance
characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement
of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages.
IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should
be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized
reseller for the full text of the specific Statement of Direction.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules
with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development
activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary
depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore,
no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Photographs shown are of engineering prototypes. Changes may be incorporated in production models.
© 2005 IBM Corporation