SQL Coding Best Practices for Developers

advertisement
Platform: DB2 for Linux, UNIX, and Windows
SQL Coding Best Practices
for Developers
Phil Gunning
Principal Consultant,
Gunning Technology Solutions, LLC
Session: G2
May 23, 2005
12:30 – 1:40
Outline
•
•
Best Practices
Classes of Predicates
•
•
•
•
Predicate Best Practices
•
•
Index SARGable
Range Delimiting
Data SARGable
Local, Order By, Join Predicates
Constraining Results
•
Filter Factors
•
•
Defaults/Formulas
DB2 Catalog Queries/Explain
•
Sources of Filter Factor information
2
Outline
•
Index Design
•
•
•
•
•
•
Local, Order By, Join predicates
Include Columns
Uniqueness
DB2 Visual Explain/db2exfmt/Design Advisor
Monitor and Evaluate
Summary
3
Best Practices
1. Use Range Delimiting and Index SARGable Predicates wherever possible
2. Understand DB2 predicate rules
3. Specify most restrictive predicates first
4. Select only columns that are needed
5. Adhere to proper index design techniques
6. Understand inputs to the Optimizer
7. Developers and DBAs collaborate to design proper indexes
8. Evaluate all SQL using Visual Explain/db2exfmt
9. Use Design Advisor to tune SQL/SQL Workloads
10. Consistently monitor and review application performance
4
Application
COST
Relational Data Services
Residual predicates
Data Management Services
Data SARGable predicates
Range Delimiting
Index Manager
Index SARGable
Data
5
Classes of Predicates
•
•
Range Delimiting
Index SARGable
•
•
•
Predicates that can use an index for a search argument
Resolved by Index Manager
Data SARGable
6
Predicates
•
Index SARGable
•
•
•
Are not used to bracket an index scan
Can be evaluated from the index if one is chosen
Evaluated by the Index Manager
7
Index SARGable Example
• Col A = 9 and Col C = 4
• Col A can be used as a range delimiting (start-stop)
predicate
• Col C can be used as an Index SARGable predicate, it cannot
be used as a range delimiting since there is no predicate on
Col B
• Starting with columns in the index, from left to right, the first
inequality predicate stops the column matching
8
Predicates
•
Range Delimiting
•
•
•
Used to bracket an index scan
Uses start and stop predicates
Evaluated by the Index Manager
9
Range Delimiting Example
•
•
Col A = 3 and Col B = 6 and Col C = 8
In this case the equality predicates on all the columns of the
index can be applied as start-stop keys and they are all range
delimiting
10
Predicates
•
Data SARGable
•
•
•
Cannot be evaluated by the Index Manager
Evaluated by Data Management Services
Require the access of individual rows from the base table
11
Data SARGable Example
•
•
Col A = 3 and Col B <= 6 and Col D = 9
Col A is used as a start-stop predicate, Col B is used as a
stop predicate, and Col D which is not present in the index is
applied as a Data SARGable predicate during the FETCH
from the table
12
Predicates
•
Residual Predicates
•
•
•
•
•
Cannot be evaluated by the Index Manager
Cannot be evaluated by Data Management Services
Require IO beyond accessing the base table
Predicates such as those using quantified sub-queries (ANY,
ALL, SOME, or IN), LONG VARCHAR, or LOB data
which is stored separately from the table
Are evaluated by Relational Data Services and are the most
expensive type of predicates
13
Residual Predicate Example
•
•
•
•
Col B = 4 and UDF with external action(Col D)
In this case the leading Col A does not have a predicate
Col B can only be used as an Index SARGable predicate
(where the whole index is scanned)
Col D involves a user defined function which will be
applied as a residual predicate
14
RULE#1
•
Use range delimiting and Index SARGable predicates
whenever possible
15
Index Review
• An index is a data structure that contains column values and a pointer to
the table data
• Primary key – Unique Index
• If a primary key is defined, DB2 automatically creates a unique index to
enforce the PK constraint
• Secondary Index
• Created to support access to frequently referenced columns
• Indexes provide efficient access (in terms of CPU and IO) to columns
found in the table
• Just like an index entry in a book, an index in a database enables rapid
lookup of associated table entries
16
Index Characteristics
•
•
•
•
Index entries are usually much smaller (subset) of all table
columns
Can fit more index entries on a page
Allows for more efficient use of buffer pool
Separate index buffer pool
•
•
Enables often used index pages to remain in the buffer pool longer
More logical IO than physical IO
17
A Word About Index Structures
•
•
•
B+ -tree used to store index entries
Provides for a tree structure that is balanced to a constant
depth from the root to the leaf blocks along every branch
Usually more efficient (less costly) than a table scan
18
B+ - Tree Index Example
ROOT
Level 0
Level 1
Level 2
Intermediate
NODE 1
Leaf NODE
L1.1
Intermediate
NODE 2
Leaf NODE
L1.300
Leaf NODE
L2.1
Intermediate
NODE 3
Leaf NODE
L2.600
Leaf NODE
L3.1
19
Intermediate
NODE 4
Leaf NODE
L3.900
Leaf NODE
L4.1
Leaf NODE
L4.1200
Index-Only Access
ROOT
Level 0
Level 1
Level 2
Leaf NODE
L1.1
Leaf NODE
L1.300
Leaf NODE
L2.1
Intermediate
NODE 4
Intermediate
NODE 3
Intermediate
NODE 2
Intermediate
NODE 1
Leaf NODE
L3.1
Leaf NODE
L2.600
20
Leaf NODE
L3.900
Leaf NODE
L4.1
Leaf NODE
L4.1200
Range Delimiting (Start – Stop Predicates)
ROOT
Level 0
Level 1
Level 2
Intermediate
NODE 1
Leaf NODE
L1.1
Intermediate
NODE 2
Leaf NODE
L1.300
Leaf NODE
L2.1
Intermediate
NODE 3
Leaf NODE
L2.600
Leaf NODE
L3.1
21
Intermediate
NODE 4
Leaf NODE
L3.900
Leaf NODE
L4.1
Leaf NODE
L4.1200
Select deptnumb,
deptname from
db2admin.org
Where deptnumb < 20
22
23
Full Table Scan
Query ….
Index
ACCT_IND1
24
Select deptnumb, deptname
from db2admin.org
Where deptnumb =20 and
deptname like 'b%' or
division
= 'midwest' and manager =
88 or location like 'bo%'
25
26
Table Scan Rules of Thumb
• If > 20-25% of the rows will be read, good likelihood of
table scan
• If 0.5 – 20% of the rows are read, likely index access but this
can vary depending on numerous factors
• Exact formulas used are complex and not very useful for
practical purposes
27
Rule #2
•
Understand and apply DB2 predicate rules
28
WITH DEPT_MGR AS (
SELECT DEPTNO, DEPTNAME, EMPNO, LASTNAME, FIRSTNME,
PHONENO
FROM DEPARTMENT D, EMPLOYEE E
WHERE D.MGRNO=E.EMPNO AND E.JOB='MANAGER' ), DEPT_NO_MGR
AS (
SELECT DEPTNO, DEPTNAME, MGRNO AS EMPNO
FROM DEPARTMENT EXCEPT ALL
SELECT DEPTNO, DEPTNAME, EMPNO
FROM DEPT_MGR ), MGR_NO_DEPT (DEPTNO, EMPNO, LASTNAME,
FIRSTNME, PHONENO) AS (
SELECT WORKDEPT, EMPNO, LASTNAME, FIRSTNME, PHONENO
FROM EMPLOYEE
WHERE JOB='MANAGER' EXCEPT ALL
SELECT DEPTNO,EMPNO, LASTNAME, FIRSTNME, PHONENO
FROM DEPT_MGR )
SELECT DEPTNO, DEPTNAME, EMPNO, LASTNAME, FIRSTNME,
PHONENO
FROM DEPT_MGR UNION ALL
SELECT DEPTNO, DEPTNAME, EMPNO, CAST(NULL AS VARCHAR(15))
AS LASTNAME, CAST(NULL AS VARCHAR(12)) AS FIRSTNME,
CAST(NULL AS CHAR(4)) AS PHONENO
FROM DEPT_NO_MGR UNION ALL
SELECT DEPTNO, CAST(NULL AS VARCHAR(29)) AS DEPTNAME,
EMPNO, LASTNAME, FIRSTNME, PHONENO
FROM MGR_NO_DEPT
ORDER BY 4
29
30
31
Created Two Indexes
CREATE INDEX "DB2ADMIN"."AABB" ON "DB2ADMIN"."DEPARTMENT"
("DEPTNO" ASC,
"DEPTNAME" ASC,
"MGRNO" ASC)
PCTFREE 10 CLUSTER MINPCTUSED 10
ALLOW REVERSE SCANS;
CREATE INDEX "DB2ADMIN"."CCDD" ON "DB2ADMIN"."EMPLOYEE"
("EMPNO" ASC, "FIRSTNME" ASC,
"MIDINIT" ASC, "LASTNAME" ASC,
"WORKDEPT" ASC, "PHONENO" ASC)
PCTFREE 10 MINPCTUSED 10
ALLOW REVERSE SCANS;
32
33
34
Index Scan
Read one node
at each
intermediate level
Index on DEPTNO, DEPTNAME, MGRNO
DEPTNAME DEPTNO MGRNO
Read leaf nodes by following sibling pointers
until no matching entry is found
35
Selectivity Catalog Queries
• SELECT INDNAME, NPAGES, CARD, FIRSTKEYCARD AS FIRSTK,
FIRST2KEYCARD AS F2KEY, FIRST3KEYCARD AS F3KEY,
FIRST4KEYCARD AS F4KEY, FULLKEYCARD AS FULLKEY,
NLEAF, NLEVELS AS NLEV, CLUSTERRATIO AS CR,
CLUSTERFACTOR AS CF, UNIQUERULE AS U, T.COLCOUNT AS
TBCOL, I.COLCOUNT AS IXCOL FROM SYSCAT.TABLES T,
SYSCAT.INDEXES I WHERE T.TABSCHEMA = I.TABSCHEMA AND
T.TABSCHEMA = ‘PGUNNING' AND T.TABNAME = I.TABNAME
AND CARD >20000 ORDER BY CARD DESC, 1;
36
INDNAME NPAGES CARD
FIRSTK F2KEY F3KEY F4KEY FULLKEY NLEAF NLEV
PSAITEM_DST516893 5682718
3
9
21
7339
7339 6886
PSBITEM_DST516893 5682718 1207
-1
-1
-1
1207 7412
PS_ITEM_DST516893 5682718
3 14947 1313902 1313906 5682718 136107
PS_PENDING_DST
516882 5682569
3 146890 148686 380119 5682569 172775
PSAITEM_ACTIVITY
577065 2884167
3 2959
-1
-1
2959 3667
PSBITEM_ACTIVITY
577065 2884167
-1
-1
-1
-1
-1
-1
PS_ITEM_ACTIVITY
577065 2884167
3 14947 1313902 1313906 2884167 50034
PSAJRNL_LN119531 1433651
543 1865
1865
-1
1865 1755
PSBJRNL_LN119531 1433651 56152 60820 65131 65268 65268 2184
PSDJRNL_LN119531 1433651 12831 14473 116514
-1 116514 2538
PSFJRNL_LN119531 1433651 1318 3654
3705 65268 65270 2223
37
CF
3
3
5
5
3
-1
4
3
3
3
3
U
8.90E-01 D
9.00E-01 D
8.33E-01 U
9.18E-01 U
7.77E-01 D
-1.00E+00 D
7.15E-01 U
7.70E-01 D
9.86E-01 D
8.58E-01 D
9.68E-01 D
TBCOL
IXCOL
53
53
53
49
91
91
91
44
44
44
44
COLNAMES
5 +APPL_JRNL_ID+
1 +PROCESS_INSTA
8 +BUSINESS_UNIT
10 +GROUP_BU+GROU
2 +BUSINESS_UNIT
4 +DEPOSIT_BU+DE
5 +BUSINESS_UNIT
3 +ACCOUNT+BUSIN
4 +JOURNAL_ID+JO
3 +PROCESS_INSTA
5 +JOURNAL_DATE+
XBOOKING1 Selectivity = Number of Distinct Values /
CARD
1229/389151 = .003
Meets our rule for selectivity < .10
XBOOKING2 Selectivity = Number of Distinct Values /
CARD
111217/389151 = .285
Does not meets our rule for selectivity < .10
38
Data Specification
• Specify the most restrictive predicates first
• Select only those columns needed
• Use business sense when developing reports for end users
• They should not be so voluminous that the average end user will not
be able to use them anyway
• Haven’t we all seen these monster reports that consume lots of CPU
and IO and never get looked at?
39
Fast Retrieval
• OPTIMIZE FOR N ROWS CLAUSE
• Can guide the optimizer to use an access path to quickly
return N Rows
• Also effects the size of the number of rows blocked in the
communications buffer
• Useful when the number of rows you want is significantly
less than total number of rows that could be returned
• Can slow performance if most of the rows are going to be
processed
40
Fetch First
• FETCH FIRST N ROWS ONLY CLAUSE
• Used to restrict fetching to only N rows regardless of number
of rows that there may have been in the result set if not
specified
• FOR FETCH ONLY CLAUSE
• Use when no updates are planned
• Query can take advantage of row blocking
• Only S locks taken on rows retrieved
• Improved concurrency
41
Rule #3 & 4
•
•
Specify most restrictive predicates first
Select only those columns needed
42
Selectivity
•
•
Selectivity of an index column indicates the number of rows
that will satisfy the predicate condition
Formula:
•
•
Selectivity = number of distinct values / number of rows in the table
Selectivity of predicates should be < .10, that is will return
less than 10% of the table to the requesting application or to
the intermediate result set if more than a two-way join
43
Index Design
• Indexes should be created on local, order by and join
predicates
• Frequently access columns with good selectivity
• Number of Indexes
• Determined by business rules
• OLTP
• 3-5 indexes
• Fewer indexes offer fewer choices to the optimizer
• DW
• 5 or more
44
Rule#5
•
Adhere to proper index design techniques
45
DB2 Optimizer
•
•
What inputs does the Optimizer consider/analyze during
statement optimization?
Important to know as some of these inputs can cause
suboptimal access paths if not current
•
•
•
RUNSTATS not current
Buffer pool changes
Configuration parameter changes
46
Buffer pool size (npages)
To determine how much of the buffer pool
may be available for tables/indexes involved.
SORTHEAP DB CFG parameter
To determine if a piped sort can be used.
LOCKLIST
To determine amount of memory available
for storing locks for this access plan.
CPU Speed
Speed of CPUs available.
PREFETCHSIZE
To determine I/O costs.
Value of INTRA_PARALLEL DBM CFG
Parameter
To determine if parallelism may be used.
Type of table space and number of containers To determine I/O costs and degree of I/O
parallelism.
SHEAPTHRES
Determine maximum amount of shared
SORTHEAP available.
DISK Speed
To estimate I/O costs.
47
Degree of clustering
To determine effectiveness of prefetching
and to determine how clustered data is.
Indexes Available
To determine if index access cost.
DFT_DEGREE
Default degree of parallelism.
AVG_APPLS
To determine amount of buffer pool space
available for a query.
MAXLOCKS
Percent of LOCKLIST used by a single
application before lock escalation occurs.
LOCKLIST
Size of memory area reserved for locks.
DFT_QUERYOPT
The default optimization class to be used.
STMTHEAP
Size can effect amount of optimization
conducted.
COMM_BANDWITH
Used for partitioned databases.
MAX_QUERYDEGREE
Maximum number of subagents to be used if
intra_parallel enabled.
48
REOPT Bind Option
•
•
Can be used to enable query reoptimization for dynamic and
static SQL that have host variables, parameter markers or
special registers
Can set the REOPT option to one of three values
•
•
•
None – No reoptimzation will take place, the default behavior
Once – the access plan will use real values the first time and the plan
will be cached in the package cache
Always – the access path will always be compiled and reoptimized
using the values of the parameter markers, host variables, or special
registers known at each execution time
49
Lock Wait Mode
•
•
Application can specify individual lock wait mode strategy
Take one of the following actions when it cannot obtain a
lock:
•
•
•
•
•
Return and SQLCODE or SQLSTATE
Wait indefinitely for a lock
Wait a specified amount of time for a lock
Use value of locktimeout DB CFG parameter
SET CURRENT LOCK TIMEOUT statement
•
•
Specifies number of seconds to wait for a lock
Applies to row, table, index key, and MDC block locks
50
KEEP UPDATE LOCKS
•
•
•
A lock type can be specified for queries that perform updates
Allows FOR UPDATE cursors to take advantage of row
blocking
RR or RS can be used when querying a read only results
table
•
Allows positioned cursor updates to succeed
51
Rule#6
•
Understand inputs to the DB2 Optimizer
52
DB2 Visual Explain
•
•
•
•
•
Use it as part of testing and development process
Developers can use any type of DB2 explain or other SQL
Analysis tool but it must be integrated into the development
process
DBAs also use all types of explain in support of application
development testing and in fixing production problems
Evaluate all SQL using Visual Explain/db2exfmt or some
type of explain tool
Monitor on a recurring basis
53
54
Rule #7
•
Developers and DBAs collaborate to develop applications
that perform when implemented in production
55
Design Advisor
•
•
•
DBAs work with developers using Design Advisor to
evaluate individual SQL statements and workloads to
identify possible index solutions, clustering indexes, MDC
indexes, and MQT recommendations
DBAs use the package cache option to look for high cost
SQL in the package cache
Best used as part of physical design process but use is
ongoing
56
db2advis -d gunprd -I wildsortsql.txt > wildsqlixadvout.txt
execution started at timestamp 2004-08-12-10.25.44.141157
found [1] SQL statements from the input file
Calculating initial cost (without recommmended indexes) [23866.660156]
timerons
Initial set of proposed indexes is ready.
Found maximum set of [1] recommended indexes
Cost of workload with all indexes included [75.079346] timerons
total disk space needed for initial set [ 4.747] MB
total disk space constrained to
[ -1.000] MB
1 indexes in current solution
[23866.6602] timerons (without indexes)
[ 75.0793] timerons (with current solution)
[%99.69] improvement
Trying variations of the solution set.--- execution finished at timestamp 2004-08-12-10.25.45.932376--- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 4.747MB
CREATE INDEX WIZ1 ON "PUSER "."T_FILE" ("FILE_STATUS_NUM" DESC) ;
-- ===========================-Design Advisor tool is finished.
57
Rule #8
•
•
Evaluate all SQL using Visual Explain/db2exfmt or some
type of explain tool
Use Design Advisor to tune SQL statements and workloads
58
RUNSTATS
•
•
•
•
RUNSTATS should be run on a regular schedule
After a reorg, change in prefetchsize, static SQL change, growth in data
What is a regular schedule?
Nightly or Weekly depending on many things
•
•
•
Data changes by 10% or more
Indexes changed or added
Use SAMPLING in V8.1+, and the WITH DISTRIBUTION clause
•
•
•
If simple and straight forward queries and no skewed data then don’t use WITH
DISTRIBUTION
In database with all dynamic SQL like most ERP, CRM, SCM packages
today, RUNSTATS may be needed nightly if data changes as noted
above
Most shops that get consistent performance schedule RUNSTATS either
nightly or weekly
59
Monitoring
•
•
If you don’t know how performance was, you can’t tell why
it is suddenly bad
Application developers can monitor the performance of
applications under their control by:
•
•
•
•
•
Monitoring service levels
Periodic explains in Production
Checking with end users/help desk
Querying the performance repository
DBAs should implement and maintain a continuous
monitoring program using snapshots, snapshot repository,
and event monitors when needed
60
Monitoring
•
•
•
•
Compute performance metrics on an hourly/daily basis and
track and evaluate over time
With such a system established, you will be able to answer
such questions as “What was this SQL running like
yesterday, last week, and last month?”
Were there any database problems today?
Why has this query suddenly gone from running in 30
seconds to 30 hrs?
61
Rule#10
•
•
•
Implement a monitoring solution that provides both realtime and historical performance data
Build canned reports to identify top 10 SQL statements, in
terms of User CPU, System CPU, rows read, sorts, and sort
time
Implement a “closed loop” system where problems are
tracked until they are resolved
62
63
SQL Coding Best Practices for Developers
Session: G2
THANK YOU!
Phil Gunning
Gunning Technology Solutions, LLC
pgunning@gunningts.com
64
Download