The ABCs of Using XML with SQL DB2 Version 9

Utilizing Views, RI and Other
Stuff for Performance
New England DB2 User
Group (NEDB2UG)
March 25, 2009
B.L. “Tink” Tysor
Bayard Lee Tysor, Inc.
www.BLTysor.com
Tink@BLTysor.com
401-965-2688
Bayard Lee Tysor, Inc.
DB2 SQL, DBA & Data Modeling
Consulting & Education
Sheryl Larsen is an internationally recognized researcher,
consultant and lecturer, specializing in DB2 and is known
for her extensive expertise in SQL coding and tuning.
SherylMLarsen@cs.com
www.SMLSQL.com
USA 630-399-3330
Reed Meseck is an internationally recognized researcher,
consultant and lecturer, specializing in very high volume, highly
scalable transaction and data warehouse systems
RMeseck@attglobal.net
BL "Tink" Tysor is an internationally recognized researcher,
consultant and lecturer, specializing in Data Modeling, DB2
SQL and Database Administration.
Tink@BLTysor.com
www.BLTysor.com
USA 401-965-2688
DB2 is a Registered Trademark of IBM Corporation
NEDB2UG March 25,
2010
© Bayard Lee Tysor, Inc. 2009-2010
Outline
• How did we get here?
• Back to the future
• Data Matters
• Getting to know your data
• Using Views for Domains
• Layered Views
• Views Applied
• Relations? Who Needs Them?
• Constraints? Who needs them?
• Manual Query Rewrite
HOW DID WE GET HERE?
Back to the Future
Do You Know This Man?
Dr. Edgar Frank “Ted” Codd
Codd’s 12 Rules Abridged
0. For a system to qualify as a RELATIONAL, DATABASE, MANAGEMENT
system, that system must use its RELATIONAL facilities (exclusively)
to MANAGE the DATABASE.
1. The information rule – Everything is a value in a column in a table.
2. The guaranteed access rule – Every scalar value in the database must
be logically addressable using table name, column name and the
primary key of the containing row.
3. Systematic treatment of null values – Everything has a value, even
nothing (NULL).
4. Active online catalog based on the relational model – The system
must eat its own dogfood, i.e. the catalog is relational and accessed
via SQL.
5. The comprehensive data sublanguage rule – One comprehensive
language (SQL) for expression, definition and implementation.
Codd’s 12 Rules Abridged
6. The view updating rule - All views that are theoretically updatable
must be updatable by the system.
7. High-level insert, update, and delete - The system must support set-at-a-time INSERT, UPDATE, and DELETE operators.
8. Physical data independence - Self-explanatory.
9. Logical data independence - Self-explanatory.
10. Integrity independence – Integrity constraints must be specified
separately from application programs and stored in the catalog. It must
be possible to change such constraints as and when appropriate without
unnecessarily affecting existing applications.
11. Distribution independence – Data access should be transparent
regardless of where the data lives.
12. The nonsubversion rule – No cheaters, no perversions, no backdoors!
If the system provides a low-level (record-at-a-time) interface, then
that interface cannot be used to subvert the system (e.g., by bypassing
a relational security or integrity constraint).
E.F. Codd
• Applications written transparent to physical design
• Applications remain untouched by physical design
changes
• Physical design changes are often needed and
natural in types of stored information
“Future users of large data banks must be protected
from having to know how the data is organized …...”
E.F. Codd, A Relational Model of Data for Large Shared Data Banks
Highlights
• Users must be protected from having to know how the data is
organized
– Hiding real representation from users is okay and even good
– It ensures accuracy and proper JOINs
• Most application programs should remain unaffected when the
internal representation of data is changed
– Changes to base tables should break few or no applications
• Changes in data representation will often be needed as a result
of changes in query, update, and report traffic and natural
growth in the types of stored information
– Change is inevitable
– Changes in traffic are normal and likely
– Growth is natural
Most Applications
get BIGGER
Evolution of Program Abstraction
Service Oriented Architecture
Object Oriented Programming
APIs
3GL Languages
Assembly Language
Hardware Programming
Evolution of Data Abstraction
Layered Views
Views
Relational Concept
File Systems
Hardware Programming
First Normal Form
Remove Multi-Valued Attributes
Before:
CLAIM: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments
After:
CLAIM1: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address
Payments1: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type1: Payment Type Key, Payment Type
Second Normal Form
Remove Duplicate Data From Key Attributes
Before:
CLAIM1: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address
Payments1: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type1: Payment Type Key, Payment Type
After:
POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
CLAIM2: Occurrence ID, Claim ID, Policy ID, Occurrence Date, Claimant Name, Claimant Address
Payments2: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type2: Payment Type Key, Payment Type
Second Normal Form Cont.
Remove Duplicate Data From Key Attributes
Before:
POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
CLAIM2: Occurrence ID, Claim ID, Policy ID, Occurrence Date, Claimant Name, Claimant Address
Payments2: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type2: Payment Type Key, Payment Type
After:
POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
OCCURRENCE2a: Occurrence ID, Policy ID, Occurrence Date
CLAIM2a: Occurrence ID, Claim ID, Claimant Name, Claimant Address
Payments2a: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type2a: Payment Type Key, Payment Type
Third Normal Form Cont.
Remove Duplicate Data From Key Attributes
Before:
POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
OCCURRENCE2a: Occurrence ID, Policy ID, Occurrence Date
CLAIM2a: Occurrence ID, Claim ID, Claimant Name, Claimant Address
Payments2a: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type2a: Payment Type Key, Payment Type
After:
INSURED3: Insured Key, Insured Name, Insured Address, Insured Customer Rating
POLICY3: Policy ID, Insured Key, Policy Effective Date, Policy Expiration Date
OCCURRENCE3: Occurrence ID, Policy ID, Occurrence Date
CLAIM3: Occurrence ID, Claim ID, Claimant Key
CLAIMANT3: Claimant Key, Claimant Name, Claimant Address
Payments3: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type3: Payment Type Key, Payment Type
Star Schema Example
Fact table
Fact_Key: INTEGER
Payment_Key: INTEGER
Policy_ID: INTEGER
Time_Key: SMALLINT
Insured_Key: INTEGER
Claimant_Key: INTEGER
Payment: DECIMAL(,)
Indemnity_Payments: DECIMAL(,)
Medical_Payments: DECIMAL(,)
Expense_Payments: DECIMAL(,)
Payment type dimension
Payment_Key: INTEGER
Payment_Type: CHAR
Time dimension
Time_Key: INTEGER
Date: DATE
Quarter: SMALLINT
Loss_Period: CHAR()
Premium_Period: CHAR()
Policy dimension
Policy_Key: INTEGER
Policy_ID: CHAR
Policy_Effective_Date: DATE
Policy_Expiration_Date: DATE
Insured dimension
Insured_Key: INTEGER
Insured_Name: CHAR
Insured_Address: CHAR()
Insured_Customer_Rating: CHAR()
Claimant dimension
Claimant_Key: INTEGER
Claimant_Name: CHAR
Claimant_Address: CHAR()
To Reduce Costs
[Chart: ELAPSED TIME and CPU COST]
To Reduce Costs
•Development
•Maintenance
•Enhancements
•Fixing Bugs
[Chart: ELAPSED TIME and CPU COST]
Performance –
How Can We Affect It?
• “Grain” of Performance
– Large – “Gross” tuning
• Data Design
  – Database design
  – 3NF, Horizontal Table splits
  – Data placement
  – Partitioning, load balancing
  – Data organization
  – UNION in Views
• Subsystem Parameters
Performance –
How Can We Affect It?
• “Grain” of Performance
– Small – “Fine” tuning
• SQL tuning
  – Rewriting the query
• Index tuning
  – Altering Index design
• Query plan tuning
  – Changing the Optimizer’s Mind
UNION ALL Views
[Figure: regions M, I, USA, N and X charted by REGION SIZE (scale 0, 500, 1000) and Daily Report Frequency. One physical table of all regions with five years of data is replaced by a UNION ALL view, partitioned by region, that presents one logical table of all regions with five years of data.]
DATA MATTERS
Getting to Know Your Data
Parkinson’s Law of Data
Definition:
"Data expands to fill the space available for storage";
Buying more memory encourages the use of more
memory-intensive techniques.
It has been observed over the last 10 years that the
memory usage of evolving systems tends to double
roughly once every 18 months.
Fortunately, memory density available for constant
dollars also tends to double about once every 12
months (see Moore's Law);
Unfortunately, the laws of physics guarantee that the
latter cannot continue indefinitely.
- COPYRIGHT © 2000-2003 WEBNOX CORP. HYPERDICTIONARY
What Matters ….
• Size matters!
− Absolute size
− Relative size
− Measures
  • Rows
  • Bytes
  • Etc.
• Where it matters
− JOINS
− Schema definition
Cardinality - Partitioning
• What Does Your Data Distribution Look Like?
– Ideal and Uniform?
– Less than Ideal and “Clumpy”
• Yesterday’s Partitioning Scheme May Not Work
Today!
[Figure: customers partitioned by name range: A-F, G-L, M-R, S-Z]
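A quick way to see whether a name-range scheme like the one above is still uniform or has gone "clumpy" is to count rows per range. This is only a sketch: the table CUSTOMER and the column CUSTOMER_NAME are hypothetical names, not taken from the slides.

```sql
-- Hypothetical names: CUSTOMER (table), CUSTOMER_NAME (column).
-- Counts rows in each name range used by the partitioning scheme.
SELECT R.NAME_RANGE
      ,COUNT(*) AS ROW_COUNT
FROM (SELECT CASE
               WHEN UPPER(SUBSTR(CUSTOMER_NAME, 1, 1)) BETWEEN 'A' AND 'F' THEN 'A-F'
               WHEN UPPER(SUBSTR(CUSTOMER_NAME, 1, 1)) BETWEEN 'G' AND 'L' THEN 'G-L'
               WHEN UPPER(SUBSTR(CUSTOMER_NAME, 1, 1)) BETWEEN 'M' AND 'R' THEN 'M-R'
               ELSE 'S-Z or other'   -- everything that did not fall in an earlier range
             END AS NAME_RANGE
      FROM CUSTOMER) AS R
GROUP BY R.NAME_RANGE
ORDER BY R.NAME_RANGE;
```

If one range holds several times the rows of the others, yesterday's partitioning scheme probably needs another look.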
Strategies for Performance
[Figure: regions M, I, USA, N and X charted by REGION SIZE (scale 0, 500, 1000) and Daily Report Frequency. One physical table of all regions with five years of data is replaced by a UNION ALL view, with USA further partitioned by subsets of USA, that presents one logical table of all regions with five years of data.]
UNION ALL View
Partitioned by Month
[Figure: each region (M, I, USA, N, X) has its own UNION ALL view over its period tables, labeled from C-4 (C-5 for USA) through Current. Together the UNION ALL views present one logical table of all regions with five years of data.]
Strata
2004
2003
2002
2001
Affinity
Part 1
Part 2
Part 3
Part 4
(Southwest)
(Northwest)
(Southeast)
(Northeast)
Application Servers
Motif / Template Pattern
• Data “Frame” is common
– Note the commonality
• Variant portion is small
– “ABC”, “DEF”, “GHI” in the example
• Commonly found in XML documents,
Web pages, etc.
Vertical Split
VIEW definition hides JOIN from APP
Customer
Customer - Employee
Prior 7 Years
Employee
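A minimal sketch of a view hiding the JOIN of a vertical split from the application. The names CUSTOMER_CORE, CUSTOMER_EXT, V_CUSTOMER and their columns are hypothetical; the slide does not give the actual table layout or join key.

```sql
-- Hypothetical names throughout: CUSTOMER_CORE and CUSTOMER_EXT are the two
-- halves of a vertically split table that share the key CUSTOMER_ID.
CREATE VIEW V_CUSTOMER AS
SELECT C.CUSTOMER_ID
      ,C.CUSTOMER_NAME        -- frequently read columns stay in the "core" half
      ,X.CUSTOMER_NOTES       -- rarely read, wide columns live in the other half
FROM CUSTOMER_CORE C
INNER JOIN CUSTOMER_EXT X
   ON C.CUSTOMER_ID = X.CUSTOMER_ID;

-- The application only ever sees the view:
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM V_CUSTOMER;
```

Because the application names only the view, the split can later be changed or undone without touching program code.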
Horizontal Split
UNION ALL VIEW
Newest
Prior 2 Years
Customer - Employee
Prior 7 Years
Prior 5 Years
Oldest
Horizontal Split
UNION ALL and MQT
“NOW”
Current Period (MQT)
Newest
Prior 2 Years
Customer - Employee
Prior 7 Years
Prior 5 Years
Oldest
Horizontal & Vertical Splitting
UNION ALL, MQT and Vertical
“NOW”
Current Period (MQT)
Newest
Prior 2 Years
Customer - Employee
Customer
Prior 7 Years
Prior 5 Years
Oldest
Patterns of Denormalization
• Collapsing Tables
• Splitting Tables
–Horizontal Split
–Vertical Split
• Adding Redundant Columns
Collapsing Tables
Table A
Table A
Table B
Splitting Tables – Horizontal(Strata)
Table A1
Table A
Table A2
Splitting Tables – Vertical (Striping)
Table A
Table A
Table B
USING VIEWS FOR DOMAINS
Relationship of Domains
Domain of All
Customers
Domain of
Active
Customers
Domain of Active
Customers with
Account Balance
<> 0
Domains as Views
V_CUSTOMERS_ALL
SELECT * FROM V_CUSTOMERS
V_CUSTOMERS_ACTIVE
SELECT * FROM V_CUSTOMERS_ALL
WHERE ACTIVE = 'Y'
V_CUSTOMERS_SENDBILL
SELECT * FROM V_CUSTOMERS_ACTIVE
WHERE BALANCE <> 0
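Written out as DDL, the three domains might look like the sketch below. The view names, ACTIVE and BALANCE come from the slide; V_CUSTOMERS is assumed to be an existing view or table over the base customer data.

```sql
-- V_CUSTOMERS is assumed to exist already (a view or table over the base data).
-- Each view narrows the domain defined by the view beneath it.
CREATE VIEW V_CUSTOMERS_ALL AS
  SELECT * FROM V_CUSTOMERS;

CREATE VIEW V_CUSTOMERS_ACTIVE AS
  SELECT * FROM V_CUSTOMERS_ALL
  WHERE ACTIVE = 'Y';

CREATE VIEW V_CUSTOMERS_SENDBILL AS
  SELECT * FROM V_CUSTOMERS_ACTIVE
  WHERE BALANCE <> 0;
```

A change to the definition of "active" is then made once, in V_CUSTOMERS_ACTIVE, and every view layered on top of it picks it up.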
LAYERED VIEWS
Simple Concepts for Simple Minds
Layered Views
Late
Current
Active Customers
Inactive Customers
All Customers
Base Table
Views Can Change
Late Current
Balance <> 0
Active Customers
Inactive Customers
All Customers
Base Table
Base Tables Can Change
Late Current
Balance <> 0
Active Customers
Inactive Customers
All Customers
Base Table
Base Table
Base Table
Base Table
VIEWS APPLIED
Can you spell
“Viewmiester”?
Views
• Views for Easier Programming
– Reduce complexity for reliability
• Logical data independence
– Ability to modify the physical layout
– No program impact
• Views for performance
– “Skinny” views
– Views that define domains
• Result set is primary keys
• Views for Reuse
– Combine domains using SET operators
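As an illustration of "skinny" domain views whose result set is just primary keys, combined with SET operators, consider the sketch below. All names (CUSTOMERS, CUSTOMER_ID, ACTIVE, BALANCE, V_KEYS_ACTIVE, V_KEYS_OWING) are hypothetical.

```sql
-- Hypothetical base table CUSTOMERS with primary key CUSTOMER_ID.
-- Each skinny view returns only the keys that belong to one domain.
CREATE VIEW V_KEYS_ACTIVE AS
  SELECT CUSTOMER_ID FROM CUSTOMERS WHERE ACTIVE = 'Y';

CREATE VIEW V_KEYS_OWING AS
  SELECT CUSTOMER_ID FROM CUSTOMERS WHERE BALANCE <> 0;

-- Reuse: active customers who owe something
SELECT CUSTOMER_ID FROM V_KEYS_ACTIVE
INTERSECT
SELECT CUSTOMER_ID FROM V_KEYS_OWING;

-- Reuse: active customers who owe nothing
SELECT CUSTOMER_ID FROM V_KEYS_ACTIVE
EXCEPT
SELECT CUSTOMER_ID FROM V_KEYS_OWING;
```

The key lists can then be joined back to the base table only for the rows that survive the set operation.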
Views Can Make Programming Easier
Create VIEW CLAIM AS
SELECT
P.POLICY_ID
,O.OCCURRENCE_ID
,C.CLAIM_ID
,I.INSURED_NAME
,I.INSURED_ADDRESS
,I.INSURED_CUSTOMER_RATING
,P.POLICY_EFFECTIVE_DATE
,P.POLICY_EXPIRATION_DATE
,O.OCCURRENCE_DATE
,T.CLAIMANT_NAME
,T.CLAIMANT_ADDRESS
,COALESCE(MED.MEDICAL_PAYMENTS,0)
AS MEDICAL_PAYMENTS
,COALESCE(IND.INDEMNITY_PAYMENTS,0)
AS INDEMNITY_PAYMENTS
,COALESCE(EXP.EXPENSE_PAYMENTS,0)
AS EXPENSE_PAYMENTS
FROM
POLICY3 P INNER JOIN INSURED3 I
ON P.INSURED_KEY = I.INSURED_KEY
INNER JOIN OCCURRENCE3 O
ON P.POLICY_ID = O.POLICY_ID
INNER JOIN CLAIM3 C
ON C.OCCURRENCE_ID = O.OCCURRENCE_ID
INNER JOIN CLAIMANTS3 T
ON C.CLAIMANT_KEY = T.CLAIMANT_KEY
LEFT OUTER JOIN
(SELECT OCCURRENCE_ID,CLAIM_ID
,PAYMENT AS MEDICAL_PAYMENTS
FROM PAYMENT3
WHERE PAYMENT_TYPE_KEY = 'M') AS MED
ON MED.OCCURRENCE_ID = C.OCCURRENCE_ID
AND MED.CLAIM_ID = C.CLAIM_ID
LEFT OUTER JOIN
(SELECT OCCURRENCE_ID,CLAIM_ID
,PAYMENT AS INDEMNITY_PAYMENTS
FROM PAYMENT3
WHERE PAYMENT_TYPE_KEY = 'I') AS IND
ON IND.OCCURRENCE_ID = C.OCCURRENCE_ID
AND IND.CLAIM_ID = C.CLAIM_ID
LEFT OUTER JOIN
(SELECT OCCURRENCE_ID,CLAIM_ID
,PAYMENT AS EXPENSE_PAYMENTS
FROM PAYMENT3
WHERE PAYMENT_TYPE_KEY = 'E') AS EXP
ON EXP.OCCURRENCE_ID = C.OCCURRENCE_ID
AND EXP.CLAIM_ID = C.CLAIM_ID;
[Diagram: the CLAIM view is built over the following tables]
Insured3: Insured Key, Insured Name, Insured Address, Insured Customer Rating
Policy3: Policy ID, Insured Key (FK), Policy Effective Date, Policy Expiration Date
Occurrence3: Occurrence ID, Policy ID (FK), Occurrence Date
Claim3: Occurrence ID (FK), Claim ID, Claimant Key (FK)
Claimants3: Claimant Key, Claimant Name, Claimant Address
Payment3: Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
Payment Type3: Payment Type Key, Payment Type
CLAIM (view): Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments
Views Can Make Programming Easier
[Diagram: the CLAIM view (Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments) sits over Policy3, Insured3, Occurrence3, Claim3, Claimants3, Payment3 and Payment Type3, which are related through RI]
SELECT
C.INSURED_NAME
,SUM(C.MEDICAL_PAYMENTS)
AS TOTAL_PAYMENTS
FROM
CLAIM C
GROUP BY C.INSURED_NAME
ORDER BY
TOTAL_PAYMENTS DESC
FETCH FIRST 10 ROWS ONLY;
Using Views to Avoid XML
• Can make an XML document look like a
DB2 Column
• Performance could be a problem
Sample XML
<Courses>
<Course ID="B1">
<Title>Basic SQL</Title>
<Instructor ID = "BLT">
<Name>B.L. "Tink" Tysor</Name>
<Phone>401-965-2688</Phone>
<Email>Tink@BLTysor.com</Email>
<Web_Site>www.BLTysor.com</Web_Site>
</Instructor>
<Duration>1 Day</Duration>
<Labs>3 Labs</Labs>
</Course>
<Course ID="I1">
<Title>Intermediate SQL</Title>
<Instructor ID = "BLT">
<Name>B.L. "Tink" Tysor</Name>
<Phone>401-965-2688</Phone>
<Email>Tink@BLTysor.com</Email>
<Web_Site>www.BLTysor.com</Web_Site>
</Instructor>
<Instructor ID = "SML">
<Name>Sheryl M. Larsen</Name>
<Phone>630-399-3330</Phone>
<Email>SMLSQL@Comcast.net</Email>
<Web_Site>www.SMLSQL.com</Web_Site>
</Instructor>
<Duration>2 Days</Duration>
<Labs>6 Labs</Labs>
</Course>
<Course ID = "A2">
<Title>Tuning DB2 SQL for Performance</Title>
<Instructor ID = "SML">
<Name>Sheryl M. Larsen</Name>
<Phone>630-399-3330</Phone>
<Email>SMLSQL@Comcast.net</Email>
<Web_Site>www.SMLSQL.com</Web_Site>
</Instructor>
<Duration>1 Day</Duration>
<Labs>1 Lab</Labs>
</Course>
<Course ID = "X1">
<Title>pureXML</Title>
<Instructor ID = "BLT">
<Name>B.L. "Tink" Tysor</Name>
<Phone>401-965-2688</Phone>
<Email>Tink@BLTysor.com</Email>
<Web_Site>www.SMLSQL.com</Web_Site>
</Instructor>
<Duration>2 Days</Duration>
<Labs>6 Labs</Labs>
</Course>
</Courses>
Using Views to Avoid XML
CREATE VIEW VCLASSES AS
SELECT AC.CLASS_EFF, XT.ID, XT.TITLE, XT.DURATION, XT.LABS
FROM ALL_CLASSES AC ,
XMLTABLE( '$T/Courses/Course' PASSING AC.CLASSES AS "T"
COLUMNS
"ID" CHAR(3) PATH './@ID'
,"TITLE" VARCHAR(30) PATH 'Title'
,"DURATION" CHAR(10) PATH 'Duration'
,"LABS" CHAR(10) PATH 'Labs'
) AS XT
WHERE AC.CLASS_EFF_DTE= '12/01/2008';
SELECT * FROM VCLASSES;
CLASS_EFF   ID  TITLE                           DURATION  LABS
2008-12-01  B1  Basic SQL                       1 Day     3 Labs
2008-12-01  I1  Intermediate SQL                2 Days    6 Labs
2008-12-01  A2  Tuning DB2 SQL for Performance  1 Day     1 Lab
2008-12-01  X1  pureXML                         2 Days    6 Labs
Using Views to Optimize XML
• Does not use indexes
…
WHERE
XMLEXISTS('$a/author[@id = $b/book/authors/author/@id]'
PASSING bookinfo as "b", authorinfo as "a")
…
CREATE INDEX bookAuthorIdx ON books(bookinfo)
GENERATE KEY USING XMLPATTERN '/book/authors/author/@id'
AS SQL DOUBLE;
CREATE INDEX authorIdx ON authors(authorinfo)
GENERATE KEY USING XMLPATTERN '/author/@id'
AS SQL DOUBLE;
• Uses indexes
…
WHERE
XMLEXISTS('$a/author[@id/xs:double(.) =
$b/book/authors/author/@id/xs:double(.)]'
PASSING bookinfo as "b", authorinfo as "a")
…
Exception Based View
[Diagram: reference tables State, County, ZIP Codes, Locality and Coverage, related to the two tables below]
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Rate: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
31 Rows per Coverage (actually >100,000)
Exception Based View cont.
[Diagram: reference tables State, County, ZIP Codes, Locality and Coverage, related to the two tables below]
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
5 Rows (actually approx 5,000)
Exception Based View cont.
Exceptions: the default would be chosen according to the following priorities
COUNTY/ZIP/LOCALITY
if no row then
ZIP/LOCALITY
if no row then
ZIP/COUNTY
if no row then
LOCALITY
if no row then
ZIP
if no row then
COUNTY
if no row then
STATE
Coverage: Coverage ID, Coverage Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
Geo_Pol_Rating: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
Exception Based View cont.
CREATE VIEW GEO_POL_RATING
AS
SELECT
COVERAGE_ID
,GP.STATE_CODE
,GP.COUNTY_ID
,GP.LOCALITY_ID
,GP.ZIP_CODE
,RATE_FACTOR
FROM
COUNTYZIP_LOCALITY_REL GP
INNER JOIN
RATE R
ON
GP.STATE_CODE = R.STATE_CODE
AND
GP.ZIP_CODE = COALESCE(R.ZIP_CODE,GP.ZIP_CODE)
AND
GP.LOCALITY_ID = COALESCE(R.LOCALITY_ID,GP.LOCALITY_ID)
AND
GP.COUNTY_ID = COALESCE(R.COUNTY_ID,GP.COUNTY_ID)
WHERE
R.PRIORITY_KEY =
(SELECT
MIN(R1.PRIORITY_KEY)
FROM
COUNTYZIP_LOCALITY_REL GP1
INNER JOIN
RATE R1
ON
GP1.STATE_CODE = R1.STATE_CODE
AND
GP1.ZIP_CODE = COALESCE(R1.ZIP_CODE,GP1.ZIP_CODE)
AND
GP1.LOCALITY_ID = COALESCE(R1.LOCALITY_ID,GP1.LOCALITY_ID)
AND
GP1.COUNTY_ID = COALESCE(R1.COUNTY_ID,GP1.COUNTY_ID)
WHERE
GP.STATE_CODE = GP1.STATE_CODE
AND
GP.COUNTY_ID = GP1.COUNTY_ID
AND
GP.LOCALITY_ID = GP1.LOCALITY_ID
AND
GP.ZIP_CODE = GP1.ZIP_CODE)
;
The subselect on GP1 and R1 is a Correlated Subquery.
Coverage: Coverage ID, Coverage Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
Geo_Pol_Rating: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
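To make the exception idea concrete, here is a sketch of two RATE rows: a state-wide default and one exception for a single ZIP code. Everything in it is illustrative: the COVERAGE_ID value and its data type, the rate factors, and the PRIORITY_KEY numbering (1 = COUNTY/ZIP/LOCALITY down to 7 = STATE, lowest value wins) are assumptions, not values from the slides.

```sql
-- Illustrative rows only. Assumed PRIORITY_KEY numbering, most specific first:
-- 1 COUNTY/ZIP/LOCALITY, 2 ZIP/LOCALITY, 3 ZIP/COUNTY, 4 LOCALITY, 5 ZIP, 6 COUNTY, 7 STATE

-- State-wide default: all geographic columns below the state are NULL
INSERT INTO RATE
  (COVERAGE_ID, STATE_CODE, COUNTY_ID, ZIP_CODE, LOCALITY_ID, RATE_FACTOR, PRIORITY_KEY)
VALUES
  ('AUTO', 'RI', NULL, NULL, NULL, 5.0000, 7);

-- Exception: one ZIP code gets its own factor
INSERT INTO RATE
  (COVERAGE_ID, STATE_CODE, COUNTY_ID, ZIP_CODE, LOCALITY_ID, RATE_FACTOR, PRIORITY_KEY)
VALUES
  ('AUTO', 'RI', NULL, '02906', NULL, 6.2700, 5);
```

With these two rows, a location in ZIP 02906 matches both rows through the COALESCE predicates, and MIN on the priority key selects the ZIP exception. Any other Rhode Island location matches only the state row. Only the exceptions need to be stored, which is why the table shrinks so much.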
Exception Based View cont.
SELECT
GPR.RATE_FACTOR
FROM
GEO_POL_RATING GPR
WHERE
GPR.STATE_CODE = 'RI'
AND
GPR.COUNTY_ID = 'PROVIDENCE'
AND
GPR.LOCALITY_ID = 'PROVIDENCE'
AND
GPR.ZIP_CODE = '02906'
;
RATE_FACTOR
-----------
6.2700
1 record(s) selected.
Geo_Pol_Rating: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
RELATIONS?
WHO NEEDS THEM?
Subliminal Requirements
Dumb, Simple SQL Join
SELECT E.EMPNO
FROM
EMPLOYEE E,
DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Do Constraints Matter?
No Indexes
or RI
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
What Does a Primary Index Buy Us?
One Primary
Index
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
What Does A Foreign Key
Constraint Buy Us?
One Foreign Key
Constraint
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
What Does a Secondary Index
Buy Us?
One
Secondary
Index & RI
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
Does it Matter with Views?
• Views!
CREATE VIEW JOINVIEW_ED
(EMPNO)
AS
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Same!
SELECT *
FROM
JOINVIEW_ED;
No Indexes
or RI
CREATE VIEW JOINVIEW_ED
(EMPNO)
AS
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Same!
SELECT *
FROM
JOINVIEW_ED;
One Primary
Index
CREATE VIEW JOINVIEW_ED
(EMPNO)
AS
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Same!!
SELECT *
FROM
JOINVIEW_ED;
One Foreign Key
Constraint
CREATE VIEW JOINVIEW_ED
(EMPNO)
AS
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Same!!!
One
Secondary
Index & RI
SELECT *
FROM
JOINVIEW_ED;
CREATE VIEW JOINVIEW_ED
(EMPNO)
AS
SELECT
E.EMPNO
FROM
EMPLOYEE E
,DEPARTMENT D
WHERE
E.WORKDEPT = D.DEPTNO
;
Redundant Join Elimination ..
Very Powerful
Elimination of redundant joins between tables related through an RI constraint
Employee: empno, workdept
Department: deptno
Original View (EmpDeptView):
SELECT * FROM Employee E, Department D
WHERE workdept = deptno
SQL:
SELECT empno, workdept FROM EmpDeptView
WHERE workdept = deptno
Rewritten SQL:
SELECT empno, workdept FROM Employee
WHERE workdept is not null
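The rewrite above depends on the optimizer knowing that every non-null workdept has exactly one matching deptno. A sketch of DDL that declares this follows. The data types and the constraint name emp_dept are hypothetical; the slides show only the column names.

```sql
-- Hypothetical data types and constraint name; column names follow the slide.
CREATE TABLE Department
  (deptno CHAR(3) NOT NULL PRIMARY KEY);

CREATE TABLE Employee
  (empno    CHAR(6) NOT NULL PRIMARY KEY
  ,workdept CHAR(3)                       -- nullable: an employee may have no department
  ,CONSTRAINT emp_dept FOREIGN KEY (workdept)
     REFERENCES Department (deptno));

CREATE VIEW EmpDeptView AS
  SELECT * FROM Employee E, Department D
  WHERE workdept = deptno;
```

With the primary key and foreign key in place, the join to Department can neither drop nor duplicate an Employee row unless workdept is NULL, so a query that asks only for Employee columns can skip the join and test workdept IS NOT NULL instead. Depending on platform and settings, a unique index may have to be created explicitly to complete each primary key.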
CONSTRAINTS?
WHO NEEDS THEM?
Where to Use Them,
Why They Matter!
Constraints
• Primitive constraints
– data type
– NOT NULL
– unique indexes
– DEFAULT
• Table CHECK Constraints
• Referential Integrity
– Primary Key Constraints
– Unique Key Constraints
– Foreign Key Constraints
• Triggers (may be)
• Constraints on Views "WITH CHECK OPTION"
UNION ALL Views
• Providing SELECT Transparency
CREATE VIEW LOGICAL_TABLE ….. AS
UNION ALL
SELECT columns
FROM LOGICAL_TABLE, other tables
WHERE some amazing filters
UNION ALL
SELECT
FROM
UNION ALL
UNION ALL
UNION Query - Rewrite
• Optimizing Access Paths Containing UNION ALL
– DB2 tries to rewrite the query in this sequence:
• Distribute qualified predicates
• Prune the subselects (will also be done for UNIONs)
  – Use BETWEEN, IN or COL op literal for best pruning
• Distribute the joins
  – If results in more than 225 tables, then no distribution
• Distribute the aggregations (SUM & COUNT)
  – To calculate accurate averages even if parallel
• Avoid Materialization
  – Search for index support for each query block
  – Unavoidable for nullable sets of outer joins
  – Unavoidable for > 225 tables after distribution
• Execution – Pruning Continues for :hostvars at execution time!
Constraint Definition (CHECK)
• CHECK Constraints
– Simple predicates (can use AND / OR but no subqueries)
– Limited to data in the row
– Can use deterministic User Defined Functions - very
powerful
– Defined using CREATE TABLE or ALTER TABLE
– Dropped using ALTER TABLE
• CREATE TABLE students
(name varchar(100),
age int,
CONSTRAINT agelimit
CHECK (age >= 5 AND age <= 18));
• ALTER TABLE students DROP CONSTRAINT agelimit;
Constraint Definition (RI)
• Primary Key Constraints
– One per table
– Enforced using unique index on NOT NULL columns
• Unique Key Constraints
– Can define more than one per table
– Enforced using unique index on NOT NULL columns
• Foreign Key Constraints
– One or more columns
– Associated with PRIMARY KEY or UNIQUE KEY constraint
– ON DELETE - CASCADE, SET NULL, RESTRICT, NO ACTION
– ON UPDATE - RESTRICT, NO ACTION
– Referential Integrity can be self-referencing or cyclic
Informational Check Constraint
Example 1: Create an employee table where a minimum
salary of $25,000 is guaranteed by the application
CREATE TABLE emp
(empno INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(20),
firstname VARCHAR(20),
salary INTEGER CONSTRAINT minsalary
CHECK (salary >= 25000)
NOT ENFORCED
ENABLE QUERY OPTIMIZATION);
If later enforcement is desired:
ALTER TABLE emp ALTER CONSTRAINT
minsalary ENFORCED
Informational RI Constraint
Example 2: Create a department table where the
application ensures the existence of departments to
which the employees belong.
CREATE TABLE dept
(deptno INTEGER NOT NULL PRIMARY KEY,
deptName VARCHAR(20),
budget INTEGER);
ALTER TABLE emp ADD COLUMN dept INTEGER NOT NULL
CONSTRAINT dept_exist
REFERENCES dept
NOT ENFORCED
ENABLE QUERY OPTIMIZATION;
EXPLOITING CONSTRAINTS
FOR QUERY OPTIMIZATION
To Prune or Not to Prune?
UNION ALL branch elimination
Data stored in separate tables for each year
Query needs 4Q/1995 data from UNION ALL View
Without Check Constraints
Branch on T94:
Select * from T94
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
Branch on T95:
Select * from T95
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
Branch on T96:
Select * from T96
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
[Diagram: a UNION (U) over three selects (S), one each on T94, T95 and T96]
UNION ALL branch elimination
With check constraints we avoid compiling and executing
redundant branches of the UNION
With Check Constraints
Branch on T94:
Select * from T94
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
and tdate >= '01/01/1994'
and tdate <= '12/31/1994'
Branch on T95:
Select * from T95
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
Branch on T96:
Select * from T96
where tdate >= '10/01/1995'
and tdate <= '12/31/1995'
and tdate >= '01/01/1996'
and tdate <= '12/31/1996'
[Diagram: a UNION (U) over three selects (S), one each on T94, T95 and T96]
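A sketch of the setup behind this slide: one table per year, each with a CHECK constraint on its date range, plus the UNION ALL view. The table names, tdate and the date ranges follow the slides; the amount column, the constraint names and the view name T_ALL are hypothetical.

```sql
-- One table per year; the CHECK constraint tells the optimizer which dates
-- each table can contain. The amount column and constraint names are hypothetical.
CREATE TABLE T94
  (tdate  DATE NOT NULL
  ,amount DECIMAL(11,2)
  ,CONSTRAINT c94 CHECK (tdate >= '01/01/1994' AND tdate <= '12/31/1994'));

CREATE TABLE T95
  (tdate  DATE NOT NULL
  ,amount DECIMAL(11,2)
  ,CONSTRAINT c95 CHECK (tdate >= '01/01/1995' AND tdate <= '12/31/1995'));

CREATE TABLE T96
  (tdate  DATE NOT NULL
  ,amount DECIMAL(11,2)
  ,CONSTRAINT c96 CHECK (tdate >= '01/01/1996' AND tdate <= '12/31/1996'));

-- Hypothetical view name; presents the yearly tables as one logical table
CREATE VIEW T_ALL AS
  SELECT * FROM T94
  UNION ALL
  SELECT * FROM T95
  UNION ALL
  SELECT * FROM T96;

-- 4Q/1995 query: only the T95 branch can qualify
SELECT * FROM T_ALL
WHERE tdate >= '10/01/1995'
  AND tdate <= '12/31/1995';
```

Combining the query predicate with c94 or c96 gives a contradiction, which is what allows those two branches to be dropped.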
Exploiting RI for Query Optimization
• Group By Pushdown
• Group By + Truncated Order By Pushdown
• Rewrite of Outer Join to Inner join
• Better filter factor estimation for multi-column RI joins
– Traditionally we use an independence assumption
– Better column correlation information with RI
• Elimination of redundant joins in star-schema views
• Views often include more tables than query requires
– RI allows us to prove that the joins are redundant
• RI information is exploited when matching queries to Materialized Query Tables (MQTs)
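A hedged sketch of redundant join elimination, using the salesF fact table and store dimension table that appear in the Group By Pushdown slides. The view name sales_by_store is hypothetical. The sketch assumes store.store_id is the primary key of store and salesF.store_id is a NOT NULL foreign key referencing it; whether a given optimizer removes the join depends on platform and version.

```sql
-- Star-schema style view that always joins the fact table to the
-- dimension table (view name is hypothetical).
CREATE VIEW sales_by_store AS
  SELECT f.store_id, f.sales, st.name, st.city
  FROM salesF AS f
       INNER JOIN store AS st
          ON f.store_id = st.store_id;

-- This query uses only fact-table columns. With the RI relationship,
-- every salesF row matches exactly one store row, so the join neither
-- drops nor duplicates rows and the access to store is redundant.
SELECT store_id, SUM(sales) AS total_sales
FROM sales_by_store
GROUP BY store_id;
```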
Group By Pushdown Through RI Joins
Find top 20 stores in terms of total revenue,
and the store name and city information:
select st.store_id, st.name, st.city, sum(f.sales) as sm
from salesF as f, store as st
where f.store_id=st.store_id
group by st.store_id, st.name, st.city
order by sm desc
fetch first 20 rows only;
Tables: salesF (Fact table) references store (Dimension table) through Ref. Integrity.
Group By Pushdown Through RI Joins (Cont.)
Query graph before Group By Pushdown, read top to bottom:
S: store_id, name, city, sm
  Sort: order by sm desc, fetch first 20 rows only
    GB: sum(sales) as sm, group by store_id, name, city
      join: f.store_id=st.store_id /* FK=PK */
        salesF: store_id, sales (100,000 rows)
        store: store_id, name, city (2,000 rows)
Group By Pushdown Through RI Joins
Query graph after Group By Pushdown, read top to bottom:
S: store_id, name, city, sm
  Sort: order by sm desc, fetch first 20 rows only
    join: f.store_id=st.store_id /* FK=PK */
      GB: sum(sales) as sm, group by store_id (output: store_id, sm, 2000 rows)
        salesF
      store: store_id, name, city
Fetch First n Rows (Truncated Sort) Pushdown
Query graph after Group By + Truncated Order By Pushdown, read top to bottom:
S: store_id, name, city, sm
  join: f.store_id=st.store_id /* FK=PK */
    Sort: order by sm desc, fetch first 20 rows only (output: store_id, sm, 20 rows)
      GB: sum(sales) as sm, group by store_id
        salesF
    store: store_id, name, city
z/OS - Use the “Separate the Group By Work” method discussed in the Advanced SQL class, page 42: use nested table expressions.
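Where the optimizer does not perform this pushdown itself, the same plan shape can be written by hand with a nested table expression. The following is a sketch only and is not necessarily the “Separate the Group By Work” method from the class material. It assumes store_id is the primary key of store, that every salesF.store_id has a matching store row (the RI relationship on the slide), and that the DB2 version in use allows ORDER BY and FETCH FIRST inside a nested table expression.

```sql
-- Aggregate and truncate on the fact table first, then join the
-- 20 surviving rows to the dimension table for name and city.
SELECT st.store_id, st.name, st.city, t.sm
FROM (SELECT f.store_id, SUM(f.sales) AS sm
      FROM salesF AS f
      GROUP BY f.store_id
      ORDER BY sm DESC
      FETCH FIRST 20 ROWS ONLY) AS t      -- t is a hypothetical alias
     INNER JOIN store AS st
        ON st.store_id = t.store_id
ORDER BY t.sm DESC;
```

The rewrite matches the original query only because of the FK=PK relationship: grouping by store_id alone is equivalent to grouping by store_id, name, city when store_id determines name and city, and the join cannot drop any of the 20 rows when every fact row has a parent store.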
Exploiting RI When Matching MQTs
• With a Summary table created on 5 tables .........
CREATE TABLE dba.PG_SALESSUM AS (
SELECT l.lineid, pg.pgid, loc.country, loc.state,
YEAR(pdate) AS year, MONTH(pdate) AS month,
SUM(ti.amount) AS amount, COUNT(*) AS count
FROM stars.transitem AS ti, stars.trans AS t, stars.loc AS loc,
stars.pgroup AS pg, stars.prodline AS l
WHERE ti.transid = t.transid AND ti.pgid = pg.pgid
AND pg.lineid = l.lineid AND t.locid = loc.locid
GROUP BY loc.country, loc.state, year(pdate), month(pdate), l.lineid, pg.pgid
) DATA INITIALLY DEFERRED REFRESH IMMEDIATE;
• ...... the query on 3 tables will use the MQT with appropriate RI
between transitem and pgroup and between pgroup and prodline
SELECT YEAR(pdate) AS year, loc.country,
SUM(ti.amount) AS amount, COUNT(*) AS count
FROM stars.transitem AS ti, stars.trans AS t, stars.loc AS loc
WHERE ti.transid = t.transid AND t.locid = loc.locid
AND year(pdate) between 1990 and 1999
GROUP BY year(pdate), loc.country
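The MQT joins two tables (pgroup and prodline) that the 3-table query never mentions, so the optimizer can only use it if it can prove those extra joins neither drop nor duplicate transitem rows. A sketch of the "appropriate RI", written here as informational constraints; enforced foreign keys would serve equally well. The constraint names are hypothetical, and the sketch assumes pgid is the primary key of stars.pgroup, lineid is the primary key of stars.prodline, and the foreign key columns are NOT NULL.

```sql
-- RI between transitem and pgroup
ALTER TABLE stars.transitem
  ADD CONSTRAINT fk_ti_pgroup              -- hypothetical name
      FOREIGN KEY (pgid) REFERENCES stars.pgroup (pgid)
      NOT ENFORCED
      ENABLE QUERY OPTIMIZATION;

-- RI between pgroup and prodline
ALTER TABLE stars.pgroup
  ADD CONSTRAINT fk_pg_prodline            -- hypothetical name
      FOREIGN KEY (lineid) REFERENCES stars.prodline (lineid)
      NOT ENFORCED
      ENABLE QUERY OPTIMIZATION;
```

With these in place the query can be answered by regrouping the MQT rows from (country, state, year, month, lineid, pgid) up to (year, country), summing the stored amount and count columns.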
Constraints Summary
Check and Referential Integrity constraints
push application rules down to the
database
The DB2 Optimizer can exploit constraint
information for better access plans
Informational constraints allow us to
optimize queries without the overhead of
enforcing them
MANUAL QUERY REWRITE
Sometimes Necessary on
all Platforms
A Typical Data Warehouse/BI Query
• Initial cost of 16 million timerons
–WOULD NOT FINISH!
• Multiple DISTINCT Table
Expressions
• Initial join involved all columns
and all rows
• The very wide and very deep set
was dragged through many more
query steps
Before and After
[Figure: skeletons of the query before and after the manual rewrite, showing nested SELECT DISTINCT table expressions combined with INNER JOIN and LEFT JOIN, under GROUP BY ROLLUP.]
Conclusion
• Data Matters
–So Do Constraints
–So Does RI
–So Do Views
–So Do Access Paths
–So Does Good Index Design
–So Do MQTs!
Bibliography
– E.F. Codd – “A Relational Model of Data for Large Shared Data Banks”
– E.F. Codd – “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks”
– Richard Snodgrass, et al – “Temporal Databases”
– Robert R. Stoll – “Set Theory and Logic”
– C.J. Date, et al – “Temporal Data & the Relational Model”
– C.J. Date – “The Database Relational Model: A Retrospective Review and Analysis: A Historical Account and Assessment of E. F. Codd's Contribution to the Field of Database Technology”
– C.J. Date – “An Introduction to Database Systems”, Eighth Edition