Utilizing Views, RI and Other Stuff for Performance
New England DB2 User Group (NEDB2UG)
March 25, 2009
B.L. “Tink” Tysor
Bayard Lee Tysor, Inc.
www.BLTysor.com
Tink@BLTysor.com
401-965-2688

Bayard Lee Tysor, Inc.
DB2 SQL, DBA & Data Modeling Consulting & Education

Sheryl Larsen is an internationally recognized researcher, consultant and lecturer, specializing in DB2, and is known for her extensive expertise in SQL coding and tuning.
SherylMLarsen@cs.com, www.SMLSQL.com, USA 630-399-3330

Reed Meseck is an internationally recognized researcher, consultant and lecturer, specializing in very high volume, highly scalable transaction and data warehouse systems.
RMeseck@attglobal.net

BL "Tink" Tysor is an internationally recognized researcher, consultant and lecturer, specializing in Data Modeling, DB2 SQL and Database Administration.
Tink@BLTysor.com, www.BLTysor.com, USA 401-965-2688

DB2 is a Registered Trademark of IBM Corporation
NEDB2UG March 25, 2010 © Bayard Lee Tysor, Inc. 2009-2010

Outline
• How did we get here? Back to the future
• Data Matters: getting to know your data
• Using Views for Domains
• Layered Views
• Views Applied
• Relations? Who Needs Them?
• Constraints? Who Needs Them?
• Manual Query Rewrite

HOW DID WE GET HERE?
Back to the Future

Do You Know This Man?
Dr. Edgar Frank “Ted” Codd

Codd’s 12 Rules Abridged
0. For a system to qualify as a RELATIONAL, DATABASE, MANAGEMENT system, that system must use its RELATIONAL facilities (exclusively) to MANAGE the DATABASE.
1. The information rule – Everything is a value in a column in a table.
2. The guaranteed access rule – Every scalar value in the database must be logically addressable using table name, column name and the primary key of the containing row.
3. Systematic treatment of null values – Everything has a value, even nothing (NULL).
4. Active online catalog based on the relational model – The system must eat its own dogfood, i.e. the catalog is relational and accessed via SQL.
5. The comprehensive data sublanguage rule – One comprehensive language (SQL) for expression, definition and implementation.
6. The view updating rule – All views that are theoretically updatable must be updatable by the system.
7. High-level insert, update, and delete – The system must support set-at-a-time INSERT, UPDATE, and DELETE operators.
8. Physical data independence – Self-explanatory.
9. Logical data independence – Self-explanatory.
10. Integrity independence – Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
11. Distribution independence – Data access should be transparent regardless of where the data lives.
12. The nonsubversion rule – No cheaters, no perversions, no backdoors! If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system (e.g., by bypassing a relational security or integrity constraint).

E.F. Codd
• Applications written transparent to physical design
• Applications remain untouched by physical design changes
• Physical design changes are often needed and natural in types of stored information
“Future users of large data banks must be protected from having to know how the data is organized …...”
E.F. Codd, A Relational Model of Data for Large Shared Data Banks

Highlights
• Users must be protected from having to know how the data is organized
  – Hiding real representation from users is okay and even good
  – It ensures accuracy and proper JOINs
• Most application programs should remain unaffected when the internal representation of data is changed
  – Changes to base tables should break few or no applications
• Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information
  – Change is inevitable
  – Changes in traffic are normal and likely
  – Growth is natural
Most Applications get BIGGER

Evolution of Program Abstraction
(layers, top to bottom)
• Service Oriented Architecture
• Object Oriented Programming
• APIs
• 3GL Languages
• Assembly Language
• Hardware Programming

Evolution of Data Abstraction
(layers, top to bottom)
• Layered Views
• Views
• Relational Concept
• File Systems
• Hardware Programming

First Normal Form
Remove Multi-Valued Attributes
Before:
• CLAIM: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments
After:
• CLAIM1: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address
• Payments1: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type1: Payment Type Key, Payment Type

Second Normal Form
Remove Duplicate Data From Key Attributes
Before:
• CLAIM1: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address
• Payments1: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type1: Payment Type Key, Payment Type
After:
• POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
• CLAIM2: Occurrence ID, Claim ID, Policy ID, Occurrence Date, Claimant Name, Claimant Address
• Payments2: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type2: Payment Type Key, Payment Type

Second Normal Form Cont.
Remove Duplicate Data From Key Attributes
Before:
• POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
• CLAIM2: Occurrence ID, Claim ID, Policy ID, Occurrence Date, Claimant Name, Claimant Address
• Payments2: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type2: Payment Type Key, Payment Type
After:
• POLICY2: unchanged
• OCCURRENCE2a: Occurrence ID, Policy ID, Occurrence Date
• CLAIM2a: Occurrence ID, Claim ID, Claimant Name, Claimant Address
• Payments2a: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type2a: Payment Type Key, Payment Type

Third Normal Form Cont.
Remove Duplicate Data From Key Attributes
Before:
• POLICY2: Policy ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
• OCCURRENCE2a: Occurrence ID, Policy ID, Occurrence Date
• CLAIM2a: Occurrence ID, Claim ID, Claimant Name, Claimant Address
• Payments2a: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type2a: Payment Type Key, Payment Type
After:
• INSURED3: Insured Key, Insured Name, Insured Address, Insured Customer Rating
• POLICY3: Policy ID, Insured Key, Policy Effective Date, Policy Expiration Date
• OCCURRENCE3: Occurrence ID, Policy ID, Occurrence Date
• CLAIM3: Occurrence ID, Claim ID, Claimant Key
• CLAIMANT3: Claimant Key, Claimant Name, Claimant Address
• Payments3: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type3: Payment Type Key, Payment Type

Star Schema Example
Fact table:
• Fact_Key: INTEGER, Payment_Key: INTEGER, Policy_ID: INTEGER, Time_Key: SMALLINT, Insured_Key: INTEGER, Claimant_Key: INTEGER, Payment: DECIMAL(,), Indemnity_Payments: DECIMAL(,), Medical_Payments: DECIMAL(,), Expense_Payments: DECIMAL(,)
Dimension tables:
• Payment: Payment_Key: INTEGER, Payment_Type: CHAR
• Time: Time_Key: INTEGER, Date: DATE, Quarter: SMALLINT, Loss_Period: CHAR(), Premium_Period: CHAR()
• Policy: Policy_Key: INTEGER, Policy_ID: CHAR, Policy_Effective_Date: DATE, Policy_Expiration_Date: DATE
• Insured: Insured_Key: INTEGER, Insured_Name: CHAR, Insured_Address: CHAR(), Insured_Customer_Rating: CHAR()
• Claimant: Claimant_Key: INTEGER, Claimant_Name: CHAR, Claimant_Address: CHAR()

To Reduce Costs
[Chart: elapsed time plotted against CPU cost]

To Reduce Costs
• Development
• Maintenance
• Enhancements
• Fixing Bugs
[Chart: elapsed time plotted against CPU cost]

Performance – How Can We Affect It?
• “Grain” of Performance – Large – “Gross” tuning
  • Data Design
    – Database design: 3NF, horizontal table splits
    – Data placement: partitioning, load balancing
    – Data organization: UNION in views
  • Subsystem Parameters

Performance – How Can We Affect It?
• “Grain” of Performance – Small – “Fine” tuning
  • SQL tuning: rewriting the query
  • Index tuning: altering index design
  • Query plan tuning: changing the Optimizer’s mind

UNION ALL Views
[Diagram: regions (I, USA, N, X) plotted by region size (0, 500 M, 1000 M) against daily report frequency. One physical table of all regions with five years of data, partitioned by region, is exposed through a UNION ALL view as one logical table of all regions with five years of data.]

DATA MATTERS
Getting to Know Your Data

Parkinson’s Law of Data
Definition: "Data expands to fill the space available for storage"; buying more memory encourages the use of more memory-intensive techniques. It has been observed over the last 10 years that the memory usage of evolving systems tends to double roughly once every 18 months. Fortunately, memory density available for constant dollars also tends to double about once every 12 months (see Moore's Law); unfortunately, the laws of physics guarantee that the latter cannot continue indefinitely.
- COPYRIGHT © 2000-2003 WEBNOX CORP. HYPERDICTIONARY

What Matters ….
• Size matters!
  − Absolute size
  − Relative size
  − Measures: rows, bytes, etc.
• Where it matters
  − JOINs
  − Schema definition

Cardinality - Partitioning
• What Does Your Data Distribution Look Like?
  – Ideal and Uniform?
  – Less than Ideal and “Clumpy”
• Yesterday’s Partitioning Scheme May Not Work Today!
[Diagram: partitions by customer name range: Customers A-F, Customers G-L, Customers M-R, Customers S-Z]

Strategies for Performance
[Diagram: regions (I, USA, N, X) plotted by region size (0, 500 M, 1000 M) against daily report frequency. One physical table of all regions with five years of data, partitioned by subsets of USA, is exposed through a UNION ALL view as one logical table of all regions with five years of data.]

[Diagram: one UNION ALL view per region (M, I, USA, N, X), each over tables partitioned by month, from Current back through C-2, C-3, C-4 and C-5. A top-level UNION ALL view presents one logical table of all regions with five years of data.]

Strata
• 2004
• 2003
• 2002
• 2001

Affinity
• Part 1 (Southwest)
• Part 2 (Northwest)
• Part 3 (Southeast)
• Part 4 (Northeast)
[Diagram: each part is served by its own application servers]

Motif / Template Pattern
• Data “Frame” is common
  – Note the commonality
• Variant portion is small
  – “ABC”, “DEF”, “GHI” in the example
• Commonly found in XML documents, Web pages, etc.

Vertical Split
VIEW definition hides JOIN from APP
[Diagram labels: Customer; Customer - Employee, Prior 7 Years; Employee]

Horizontal Split
UNION ALL VIEW
[Diagram: Customer - Employee, Prior 7 Years, split into strata: Newest, Prior 2 Years, Prior 5 Years, Oldest]
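
The Vertical Split slide says the view definition hides the join from the application. A minimal sketch of that idea follows. All table, column and view names here (CUSTOMER_CORE, CUSTOMER_DETAIL, V_CUSTOMER) are hypothetical and not from the slides, and the DDL assumes DB2 for LUW style syntax; on DB2 for z/OS the primary keys may also need explicitly created unique indexes.

```sql
-- Hypothetical vertical split ("striping"): short, frequently read columns
-- in one table; wide, rarely read columns in another. Both share the key.
CREATE TABLE CUSTOMER_CORE (
    CUST_ID    INTEGER     NOT NULL PRIMARY KEY,
    CUST_NAME  VARCHAR(60) NOT NULL,
    ACTIVE     CHAR(1)     NOT NULL
);

CREATE TABLE CUSTOMER_DETAIL (
    CUST_ID    INTEGER     NOT NULL PRIMARY KEY,
    ADDRESS    VARCHAR(200),
    NOTES      VARCHAR(2000),
    CONSTRAINT FK_DETAIL_CORE FOREIGN KEY (CUST_ID)
        REFERENCES CUSTOMER_CORE (CUST_ID) ON DELETE CASCADE
);

-- The application sees one logical customer; the join lives in the view.
CREATE VIEW V_CUSTOMER (CUST_ID, CUST_NAME, ACTIVE, ADDRESS, NOTES) AS
    SELECT C.CUST_ID, C.CUST_NAME, C.ACTIVE, D.ADDRESS, D.NOTES
    FROM CUSTOMER_CORE C
    LEFT OUTER JOIN CUSTOMER_DETAIL D
        ON D.CUST_ID = C.CUST_ID;

-- Application query: no knowledge of the split is required.
SELECT CUST_ID, CUST_NAME
FROM V_CUSTOMER
WHERE ACTIVE = 'Y';
```

Because the application only names V_CUSTOMER, the two stripes could later be collapsed again, or split differently, by redefining the view rather than changing programs. Whether the optimizer skips the unused CUSTOMER_DETAIL stripe for a query like the last one depends on the DB2 version and platform, so it is worth checking with EXPLAIN.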

Horizontal Split
UNION ALL and MQT
[Diagram: Customer - Employee, Prior 7 Years, split into strata: “NOW” Current Period (MQT), Newest, Prior 2 Years, Prior 5 Years, Oldest]

Horizontal & Vertical Splitting
UNION ALL, MQT and Vertical
[Diagram: Customer - Employee, Prior 7 Years, split vertically (Customer) and horizontally into strata: “NOW” Current Period (MQT), Newest, Prior 2 Years, Prior 5 Years, Oldest]

Patterns of Denormalization
• Collapsing Tables
• Splitting Tables
  – Horizontal Split
  – Vertical Split
• Adding Redundant Columns

Collapsing Tables
[Diagram: Table A and Table B collapsed into a single Table A]

Splitting Tables – Horizontal (Strata)
[Diagram: Table A split into Table A1 and Table A2]

Splitting Tables – Vertical (Striping)
[Diagram: Table A split into Table A and Table B]

USING VIEWS FOR DOMAINS

Relationship of Domains
• Domain of All Customers
  • Domain of Active Customers
    • Domain of Active Customers with Account Balance <> 0

Domains as Views
• V_CUSTOMERS_ALL
    SELECT * FROM V_CUSTOMERS
• V_CUSTOMERS_ACTIVE
    SELECT * FROM V_CUSTOMERS_ALL WHERE ACTIVE = 'Y'
• V_CUSTOMERS_SENDBILL
    SELECT * FROM V_CUSTOMERS_ACTIVE WHERE BALANCE <> 0

LAYERED VIEWS
Simple Concepts for Simple Minds

Layered Views
(layers, top to bottom)
• Late, Current
• Active Customers, Inactive Customers
• All Customers
• Base Table

Views Can Change
(layers, top to bottom)
• Late, Current
• Balance <> 0
• Active Customers, Inactive Customers
• All Customers
• Base Table
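
The layered domains above can be written out as a stack of views, each defined on the one beneath it. This is a sketch under stated assumptions: the base table CUSTOMER_BASE and its columns, the view names V_CUSTOMERS_INACTIVE and V_CUSTOMERS_LATE, and the rule used for “Late” (a due date in the past) are all hypothetical; only V_CUSTOMERS_ALL, V_CUSTOMERS_ACTIVE and V_CUSTOMERS_SENDBILL and their predicates come from the slides.

```sql
-- Hypothetical base table; column names are assumptions for illustration.
CREATE TABLE CUSTOMER_BASE (
    CUST_ID    INTEGER       NOT NULL PRIMARY KEY,
    CUST_NAME  VARCHAR(60)   NOT NULL,
    ACTIVE     CHAR(1)       NOT NULL,
    BALANCE    DECIMAL(11,2) NOT NULL,
    DUE_DATE   DATE
);

-- Layer 1: the domain of all customers.
CREATE VIEW V_CUSTOMERS_ALL AS
    SELECT CUST_ID, CUST_NAME, ACTIVE, BALANCE, DUE_DATE
    FROM CUSTOMER_BASE;

-- Layer 2: active and inactive customers, defined on layer 1.
CREATE VIEW V_CUSTOMERS_ACTIVE AS
    SELECT CUST_ID, CUST_NAME, ACTIVE, BALANCE, DUE_DATE
    FROM V_CUSTOMERS_ALL
    WHERE ACTIVE = 'Y';

CREATE VIEW V_CUSTOMERS_INACTIVE AS
    SELECT CUST_ID, CUST_NAME, ACTIVE, BALANCE, DUE_DATE
    FROM V_CUSTOMERS_ALL
    WHERE ACTIVE = 'N';

-- Layer 3: active customers who should receive a bill.
CREATE VIEW V_CUSTOMERS_SENDBILL AS
    SELECT CUST_ID, CUST_NAME, ACTIVE, BALANCE, DUE_DATE
    FROM V_CUSTOMERS_ACTIVE
    WHERE BALANCE <> 0;

-- Layer 4: billed customers whose payment is late (assumed rule).
CREATE VIEW V_CUSTOMERS_LATE AS
    SELECT CUST_ID, CUST_NAME, ACTIVE, BALANCE, DUE_DATE
    FROM V_CUSTOMERS_SENDBILL
    WHERE DUE_DATE < CURRENT DATE;
```

With this layering, a change to what “active” means is made once, in V_CUSTOMERS_ACTIVE, and every view and program above that layer picks it up.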

Base Tables Can Change
(layers, top to bottom)
• Late, Current
• Balance <> 0
• Active Customers, Inactive Customers
• All Customers
• Base Table, Base Table, Base Table, Base Table

VIEWS APPLIED
Can you spell “Viewmiester”?

Views
• Views for Easier Programming
  – Reduce complexity for reliability
• Logical data independence
  – Ability to modify the physical layout
  – No program impact
• Views for performance
  – “Skinny” views
  – Views that define domains
    • Result set is primary keys
• Views for Reuse
  – Combine domains using SET operators

Views Can Make Programming Easier
Base tables:
• Policy3: Policy ID, Insured Key (FK), Policy Effective Date, Policy Expiration Date
• Insured3: Insured Key, Insured Name, Insured Address, Insured Customer Rating
• Occurrence3: Occurrence ID, Policy ID (FK), Occurrence Date
• Claim3: Occurrence ID (FK), Claim ID, Claimant Key (FK)
• Claimants3: Claimant Key, Claimant Name, Claimant Address
• Payment3: Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK), Payment
• Payment Type3: Payment Type Key, Payment Type
View CLAIM: Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments

    CREATE VIEW CLAIM AS
    SELECT P.POLICY_ID
          ,O.OCCURRENCE_ID
          ,C.CLAIM_ID
          ,I.INSURED_NAME
          ,I.INSURED_ADDRESS
          ,I.INSURED_CUSTOMER_RATING
          ,P.POLICY_EFFECTIVE_DATE
          ,P.POLICY_EXPIRATION_DATE
          ,O.OCCURRENCE_DATE
          ,T.CLAIMANT_NAME
          ,T.CLAIMANT_ADDRESS
          ,COALESCE(MED.MEDICAL_PAYMENTS,0) AS MEDICAL_PAYMENTS
          ,COALESCE(IND.INDEMNITY_PAYMENTS,0) AS INDEMNITY_PAYMENTS
          ,COALESCE(EXP.EXPENSE_PAYMENTS,0) AS EXPENSE_PAYMENTS
    FROM POLICY3 P
    INNER JOIN INSURED3 I
       ON P.INSURED_KEY = I.INSURED_KEY
    INNER JOIN OCCURRENCE3 O
       ON P.POLICY_ID = O.POLICY_ID
    INNER JOIN CLAIM3 C
       ON C.OCCURRENCE_ID = O.OCCURRENCE_ID
    INNER JOIN CLAIMANTS3 T
       ON C.CLAIMANT_KEY = T.CLAIMANT_KEY
    LEFT OUTER JOIN
       (SELECT OCCURRENCE_ID, CLAIM_ID
              ,PAYMENT AS MEDICAL_PAYMENTS
        FROM PAYMENT3
        WHERE PAYMENT_TYPE_KEY = 'M') AS MED
       ON MED.OCCURRENCE_ID = C.OCCURRENCE_ID
      AND MED.CLAIM_ID = C.CLAIM_ID
    LEFT OUTER JOIN
       (SELECT OCCURRENCE_ID, CLAIM_ID
              ,PAYMENT AS INDEMNITY_PAYMENTS
        FROM PAYMENT3
        WHERE PAYMENT_TYPE_KEY = 'I') AS IND
       ON IND.OCCURRENCE_ID = C.OCCURRENCE_ID
      AND IND.CLAIM_ID = C.CLAIM_ID
    LEFT OUTER JOIN
       (SELECT OCCURRENCE_ID, CLAIM_ID
              ,PAYMENT AS EXPENSE_PAYMENTS
        FROM PAYMENT3
        WHERE PAYMENT_TYPE_KEY = 'E') AS EXP
       ON EXP.OCCURRENCE_ID = C.OCCURRENCE_ID
      AND EXP.CLAIM_ID = C.CLAIM_ID;

Views Can Make Programming Easier
[Diagram: the CLAIM view sits on top of the base tables Policy3, Insured3, Occurrence3, Claim3, Claimants3, Payment3 and Payment Type3, which are related through RI]

    SELECT C.INSURED_NAME
          ,SUM(C.MEDICAL_PAYMENTS) AS TOTAL_PAYMENTS
    FROM CLAIM C
    GROUP BY C.INSURED_NAME
    ORDER BY TOTAL_PAYMENTS DESC
    FETCH FIRST 10 ROWS ONLY;

Using Views to Avoid XML
• Can make an XML document look like a DB2 Column
• Performance could be a problem

Sample XML

    <Courses>
      <Course ID="B1">
        <Title>Basic SQL</Title>
        <Instructor ID="BLT">
          <Name>B.L. "Tink" Tysor</Name>
          <Phone>401-965-2688</Phone>
          <Email>Tink@BLTysor.com</Email>
          <Web_Site>www.BLTysor.com</Web_Site>
        </Instructor>
        <Duration>1 Day</Duration>
        <Labs>3 Labs</Labs>
      </Course>
      <Course ID="I1">
        <Title>Intermediate SQL</Title>
        <Instructor ID="BLT">
          <Name>B.L. "Tink" Tysor</Name>
          <Phone>401-965-2688</Phone>
          <Email>Tink@BLTysor.com</Email>
          <Web_Site>www.BLTysor.com</Web_Site>
        </Instructor>
        <Instructor ID="SML">
          <Name>Sheryl M. Larsen</Name>
          <Phone>630-399-3330</Phone>
          <Email>SMLSQL@Comcast.net</Email>
          <Web_Site>www.SMLSQL.com</Web_Site>
        </Instructor>
        <Duration>2 Days</Duration>
        <Labs>6 Labs</Labs>
      </Course>
      <Course ID="A2">
        <Title>Tuning DB2 SQL for Performance</Title>
        <Instructor ID="SML">
          <Name>Sheryl M. Larsen</Name>
          <Phone>630-399-3330</Phone>
          <Email>SMLSQL@Comcast.net</Email>
          <Web_Site>www.SMLSQL.com</Web_Site>
        </Instructor>
        <Duration>1 Day</Duration>
        <Labs>1 Lab</Labs>
      </Course>
      <Course ID="X1">
        <Title>pureXML</Title>
        <Instructor ID="BLT">
          <Name>B.L. "Tink" Tysor</Name>
          <Phone>401-965-2688</Phone>
          <Email>Tink@BLTysor.com</Email>
          <Web_Site>www.SMLSQL.com</Web_Site>
        </Instructor>
        <Duration>2 Days</Duration>
        <Labs>6 Labs</Labs>
      </Course>
    </Courses>

Using Views to Avoid XML

    CREATE VIEW VCLASSES AS
    SELECT AC.CLASS_EFF, XT.ID, XT.TITLE, XT.DURATION, XT.LABS
    FROM ALL_CLASSES AC
        ,XMLTABLE('$T/Courses/Course' PASSING AC.CLASSES AS "T"
           COLUMNS
             "ID"       CHAR(3)     PATH './@ID'
            ,"TITLE"    VARCHAR(30) PATH 'Title'
            ,"DURATION" CHAR(10)    PATH 'Duration'
            ,"LABS"     CHAR(10)    PATH 'Labs'
         ) AS XT
    WHERE AC.CLASS_EFF_DTE = '12/01/2008';

    SELECT * FROM VCLASSES;

    CLASS_EFF   ID  TITLE                           DURATION  LABS
    2008-12-01  B1  Basic SQL                       1 Day     3 Labs
    2008-12-01  I1  Intermediate SQL                2 Days    6 Labs
    2008-12-01  A2  Tuning DB2 SQL for Performance  1 Day     1 Lab
    2008-12-01  X1  pureXML                         2 Days    6 Labs

Using Views to Optimize XML
• Does not use indexes:

    … WHERE XMLEXISTS('$a/author[@id = $b/book/authors/author/@id]'
              PASSING bookinfo AS "b", authorinfo AS "a") …

    CREATE INDEX bookAuthorIdx ON books(bookinfo)
      GENERATE KEY USING XMLPATTERN '/book/authors/author/@id' AS SQL DOUBLE;

    CREATE INDEX authorIdx ON authors(authorinfo)
      GENERATE KEY USING XMLPATTERN '/author/@id' AS SQL DOUBLE;

• Uses indexes:

    … WHERE XMLEXISTS('$a/author[@id/xs:double(.) = $b/book/authors/author/@id/xs:double(.)]'
              PASSING bookinfo AS "b", authorinfo AS "a") …

Exception Based View
[Diagram: entities County, Coverage, ZIP Code, State, Locality, County ZIP Locality REL and Rate]
• County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
• Rate: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
• Rate holds 31 rows per Coverage (actually >100,000)

Exception Based View cont.
[Diagram: the same entities, with Rate redesigned to hold only exceptions]
• County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
• Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
• Rate holds 5 rows (actually approx 5,000)

Exception Based View cont.
Exceptions: the default is chosen according to the following priorities; if no row exists at one level, the next level is used.
1. COUNTY/ZIP/LOCALITY
2. ZIP/LOCALITY
3. ZIP/COUNTY
4. LOCALITY
5. ZIP
6. COUNTY
7. STATE
• Coverage: Coverage ID, Coverage Info
• County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
• Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
• Geo_Pol_Rating (view): Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor

Exception Based View cont.

    CREATE VIEW GEO_POL_RATING AS
    SELECT COVERAGE_ID
          ,GP.STATE_CODE
          ,GP.COUNTY_ID
          ,GP.LOCALITY_ID
          ,GP.ZIP_CODE
          ,RATE_FACTOR
    FROM COUNTYZIP_LOCALITY_REL GP
    INNER JOIN RATE R
       ON GP.STATE_CODE = R.STATE_CODE
      AND GP.ZIP_CODE = COALESCE(R.ZIP_CODE,GP.ZIP_CODE)
      AND GP.LOCALITY_ID = COALESCE(R.LOCALITY_ID,GP.LOCALITY_ID)
      AND GP.COUNTY_ID = COALESCE(R.COUNTY_ID,GP.COUNTY_ID)
    WHERE R.PRIORITY_KEY =
       -- Correlated subquery: best (lowest) priority available for this location
       (SELECT MIN(R1.PRIORITY_KEY)
        FROM COUNTYZIP_LOCALITY_REL GP1
        INNER JOIN RATE R1
           ON GP1.STATE_CODE = R1.STATE_CODE
          AND GP1.ZIP_CODE = COALESCE(R1.ZIP_CODE,GP1.ZIP_CODE)
          AND GP1.LOCALITY_ID = COALESCE(R1.LOCALITY_ID,GP1.LOCALITY_ID)
          AND GP1.COUNTY_ID = COALESCE(R1.COUNTY_ID,GP1.COUNTY_ID)
        WHERE GP.STATE_CODE = GP1.STATE_CODE
          AND GP.COUNTY_ID = GP1.COUNTY_ID
          AND GP.LOCALITY_ID = GP1.LOCALITY_ID
          AND GP.ZIP_CODE = GP1.ZIP_CODE)
    ;

Exception Based View cont.
[Diagram: the Rate table feeds the Geo_Pol_Rating view (Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor)]

    SELECT GPR.RATE_FACTOR
    FROM GEO_POL_RATING GPR
    WHERE GPR.STATE_CODE = 'RI'
      AND GPR.COUNTY_ID = 'PROVIDENCE'
      AND GPR.LOCALITY_ID = 'PROVIDENCE'
      AND GPR.ZIP_CODE = '02906'
    ;

    RATE_FACTOR
    -----------
         6.2700

      1 record(s) selected.

RELATIONS? WHO NEEDS THEM?
Subliminal Requirements

Dumb, Simple SQL Join

    SELECT E.EMPNO
    FROM EMPLOYEE E, DEPARTMENT D
    WHERE E.WORKDEPT = D.DEPTNO
    ;

[Diagram: E joined to D]

Do Constraints Matter?
No Indexes or RI

    SELECT E.EMPNO
    FROM EMPLOYEE E
        ,DEPARTMENT D
    WHERE E.WORKDEPT = D.DEPTNO

What Does a Primary Index Buy Us?
One Primary Index
(same join query as above)

What Does A Foreign Key Constraint Buy Us?
One Foreign Key Constraint
(same join query as above)

What Does a Secondary Index Buy Us?
One Secondary Index & RI
(same join query as above)

Does it Matter with Views?
• Views!

    CREATE VIEW JOINVIEW_ED (EMPNO) AS
    SELECT E.EMPNO
    FROM EMPLOYEE E
        ,DEPARTMENT D
    WHERE E.WORKDEPT = D.DEPTNO
    ;

Same!
No Indexes or RI

    SELECT * FROM JOINVIEW_ED;

(JOINVIEW_ED as defined above)

Same!
One Primary Index

    SELECT * FROM JOINVIEW_ED;

(JOINVIEW_ED as defined above)

Same!!
One Foreign Key Constraint

    SELECT * FROM JOINVIEW_ED;

(JOINVIEW_ED as defined above)

Same!!!
One Secondary Index & RI

    SELECT * FROM JOINVIEW_ED;

(JOINVIEW_ED as defined above)

Redundant Join Elimination .. Very Powerful
Elimination of redundant joins between tables related through an RI constraint
[Diagram: Employee (empno, workdept) references Department (deptno)]
• Original view EmpDeptView:
    SELECT * FROM Employee E, Department D WHERE workdept = deptno
• SQL:
    SELECT empno, workdept FROM EmpDeptView WHERE workdept = deptno
• Rewritten SQL:
    SELECT empno, workdept FROM Employee WHERE workdept is not null

CONSTRAINTS? WHO NEEDS THEM?
Where to Use Them, Why They Matter!

Constraints
• Primitive constraints
  – data type
  – NOT NULL
  – unique indexes
  – DEFAULT
• Table CHECK Constraints
• Referential Integrity
  – Primary Key Constraints
  – Unique Key Constraints
  – Foreign Key Constraints
• Triggers (may be)
• Constraints on Views "WITH CHECK OPTION"

UNION ALL Views
• Providing SELECT Transparency
• View definition:
    CREATE VIEW LOGICAL_TABLE ….. AS
    SELECT … FROM …
    UNION ALL
    SELECT … FROM …
    UNION ALL
    …
• Application query:
    SELECT columns FROM LOGICAL_TABLE, other tables WHERE some amazing filters
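
Returning to the Redundant Join Elimination slide above, the sketch below shows the pieces that have to be in place for that rewrite to be legal. The table and column names follow the slide and the DB2 sample tables; the column types, the LASTNAME and DEPTNAME columns and the constraint name FK_EMP_DEPT are assumptions added for illustration.

```sql
-- Parent table: DEPTNO is the primary key, so each WORKDEPT value
-- can match at most one DEPARTMENT row.
CREATE TABLE DEPARTMENT (
    DEPTNO    CHAR(3)     NOT NULL PRIMARY KEY,
    DEPTNAME  VARCHAR(36) NOT NULL
);

-- Child table: the foreign key guarantees that every non-null WORKDEPT
-- matches a DEPARTMENT row.
CREATE TABLE EMPLOYEE (
    EMPNO     CHAR(6)     NOT NULL PRIMARY KEY,
    LASTNAME  VARCHAR(15) NOT NULL,
    WORKDEPT  CHAR(3),
    CONSTRAINT FK_EMP_DEPT FOREIGN KEY (WORKDEPT)
        REFERENCES DEPARTMENT (DEPTNO)
);

-- A general-purpose view that joins both tables.
CREATE VIEW EMPDEPTVIEW (EMPNO, LASTNAME, WORKDEPT, DEPTNO, DEPTNAME) AS
    SELECT E.EMPNO, E.LASTNAME, E.WORKDEPT, D.DEPTNO, D.DEPTNAME
    FROM EMPLOYEE E, DEPARTMENT D
    WHERE E.WORKDEPT = D.DEPTNO;

-- A query that only needs EMPLOYEE columns from the view.
SELECT EMPNO, WORKDEPT
FROM EMPDEPTVIEW;

-- Logically equivalent form once the join is removed: the join could
-- neither add rows (DEPTNO is unique) nor drop rows other than those
-- with a null WORKDEPT (the foreign key guarantees a match otherwise).
SELECT EMPNO, WORKDEPT
FROM EMPLOYEE
WHERE WORKDEPT IS NOT NULL;
```

Without the foreign key, the optimizer has no proof that every WORKDEPT has a matching DEPTNO, so it must keep the join to get the right answer.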

UNION Query - Rewrite
• Optimizing Access Paths Containing UNION ALL
  – DB2 tries to rewrite the query in this sequence:
    • Distribute qualified predicates
    • Prune the subselects (will also be done for UNIONs)
      – Use BETWEEN, IN or COL op literal for best pruning
    • Distribute the joins
      – If this results in more than 225 tables, then no distribution
    • Distribute the aggregations (SUM & COUNT)
      – To calculate accurate averages even if parallel
    • Avoid materialization
      – Search for index support for each query block
      – Unavoidable for nullable sets of outer joins
      – Unavoidable for > 225 tables after distribution
• Execution
  – Pruning continues for :hostvars at execution time!

Constraint Definition (CHECK)
• CHECK Constraints
  – Simple predicates (can use AND / OR but no subqueries)
  – Limited to data in the row
  – Can use deterministic User Defined Functions - very powerful
  – Defined using CREATE TABLE or ALTER TABLE
  – Dropped using ALTER TABLE
• Defining a constraint:
    CREATE TABLE students
      (name VARCHAR(100),
       age INT,
       CONSTRAINT agelimit CHECK (age >= 5 AND age <= 18));
• Dropping a constraint:
    ALTER TABLE students DROP CONSTRAINT agelimit;

Constraint Definition (RI)
• Primary Key Constraints
  – One per table
  – Enforced using unique index on NOT NULL columns
• Unique Key Constraints
  – Can define more than one per table
  – Enforced using unique index on NOT NULL columns
• Foreign Key Constraints
  – One or more columns
  – Associated with PRIMARY KEY or UNIQUE KEY constraint
  – ON DELETE - CASCADE, SET NULL, RESTRICT, NO ACTION
  – ON UPDATE - RESTRICT, NO ACTION
  – Referential Integrity can be self-referencing or cyclic
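
The RI rules listed above can be sketched in DDL as follows. The tables DEMO_POLICY, DEMO_OCCURRENCE and DEMO_CLAIM, their columns and all constraint names are hypothetical and chosen only to echo the insurance model used earlier. The syntax assumes DB2 for LUW; on DB2 for z/OS the unique indexes that enforce the keys may need to be created explicitly.

```sql
-- Parent table with one primary key and one additional unique key.
CREATE TABLE DEMO_POLICY (
    POLICY_ID      CHAR(10) NOT NULL,
    POLICY_NUMBER  CHAR(12) NOT NULL,
    EFFECTIVE_DATE DATE     NOT NULL,
    CONSTRAINT PK_DEMO_POLICY PRIMARY KEY (POLICY_ID),
    CONSTRAINT UQ_DEMO_POLICY_NUMBER UNIQUE (POLICY_NUMBER)
);

-- Child table: a policy cannot be deleted while occurrences refer to it.
CREATE TABLE DEMO_OCCURRENCE (
    OCCURRENCE_ID   INTEGER  NOT NULL,
    POLICY_ID       CHAR(10) NOT NULL,
    OCCURRENCE_DATE DATE     NOT NULL,
    CONSTRAINT PK_DEMO_OCCURRENCE PRIMARY KEY (OCCURRENCE_ID),
    CONSTRAINT FK_DEMO_OCC_POLICY FOREIGN KEY (POLICY_ID)
        REFERENCES DEMO_POLICY (POLICY_ID)
        ON DELETE RESTRICT
);

-- Grandchild table with a multi-column primary key: deleting an
-- occurrence also deletes its claims.
CREATE TABLE DEMO_CLAIM (
    OCCURRENCE_ID INTEGER NOT NULL,
    CLAIM_ID      INTEGER NOT NULL,
    CONSTRAINT PK_DEMO_CLAIM PRIMARY KEY (OCCURRENCE_ID, CLAIM_ID),
    CONSTRAINT FK_DEMO_CLAIM_OCC FOREIGN KEY (OCCURRENCE_ID)
        REFERENCES DEMO_OCCURRENCE (OCCURRENCE_ID)
        ON DELETE CASCADE
);
```

The choice between RESTRICT and CASCADE here is only illustrative; the delete rule should follow the business meaning of each relationship.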

Informational Check Constraint
Example 1: Create an employee table where a minimum salary of $25,000 is guaranteed by the application

    CREATE TABLE emp
      (empno INTEGER NOT NULL PRIMARY KEY,
       name VARCHAR(20),
       firstname VARCHAR(20),
       salary INTEGER
         CONSTRAINT minsalary CHECK (salary >= 25000)
         NOT ENFORCED
         ENABLE QUERY OPTIMIZATION);

If later enforcement is desired:

    ALTER TABLE emp ALTER CONSTRAINT minsalary ENFORCED

Informational RI Constraint
Example 2: Create a department table where the application ensures the existence of departments to which the employees belong.

    CREATE TABLE dept
      (deptno INTEGER NOT NULL PRIMARY KEY,
       deptName VARCHAR(20),
       budget INTEGER);

    ALTER TABLE emp
      ADD COLUMN dept INTEGER NOT NULL
        CONSTRAINT dept_exist REFERENCES dept
        NOT ENFORCED
        ENABLE QUERY OPTIMIZATION;

EXPLOITING CONSTRAINTS FOR QUERY OPTIMIZATION
To Prune or Not to Prune?

UNION ALL branch elimination
Data stored in separate tables for each year
Query needs 4Q/1995 data from UNION ALL View
Without Check Constraints, all three branches of the UNION (U) are executed:
• Select * from T94 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
• Select * from T95 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
• Select * from T96 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
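
For branch elimination to be possible, each yearly table needs a CHECK constraint that states which dates it can hold. A minimal sketch of such a setup follows. The table names T94, T95, T96 and the column tdate come from the slides; the AMOUNT column, the constraint names and the view name V_TRANS are hypothetical.

```sql
-- One table per year; the CHECK constraint documents (and enforces)
-- the date range that each table may contain.
CREATE TABLE T94 (
    TDATE   DATE          NOT NULL,
    AMOUNT  DECIMAL(11,2) NOT NULL,
    CONSTRAINT CK_T94 CHECK (TDATE BETWEEN '1994-01-01' AND '1994-12-31')
);

CREATE TABLE T95 (
    TDATE   DATE          NOT NULL,
    AMOUNT  DECIMAL(11,2) NOT NULL,
    CONSTRAINT CK_T95 CHECK (TDATE BETWEEN '1995-01-01' AND '1995-12-31')
);

CREATE TABLE T96 (
    TDATE   DATE          NOT NULL,
    AMOUNT  DECIMAL(11,2) NOT NULL,
    CONSTRAINT CK_T96 CHECK (TDATE BETWEEN '1996-01-01' AND '1996-12-31')
);

-- One logical table over all years.
CREATE VIEW V_TRANS (TDATE, AMOUNT) AS
    SELECT TDATE, AMOUNT FROM T94
    UNION ALL
    SELECT TDATE, AMOUNT FROM T95
    UNION ALL
    SELECT TDATE, AMOUNT FROM T96;

-- 4Q/1995 query: the predicate contradicts CK_T94 and CK_T96,
-- so only the T95 branch can return rows.
SELECT TDATE, AMOUNT
FROM V_TRANS
WHERE TDATE BETWEEN '1995-10-01' AND '1995-12-31';
```

The query uses BETWEEN with literals, which matches the earlier advice on predicate forms that prune best. EXPLAIN output is the way to confirm which branches actually remain on a given DB2 version.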

UNION ALL branch elimination
With check constraints we avoid compiling and executing redundant branches of the UNION
With Check Constraints, the constraint predicates are added to each branch of the UNION (U):
• Select * from T94 where tdate >= '10/01/1995' and tdate <= '12/31/1995' and tdate >= '01/01/1994' and tdate <= '12/31/1994'
• Select * from T95 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
• Select * from T96 where tdate >= '10/01/1995' and tdate <= '12/31/1995' and tdate >= '01/01/1996' and tdate <= '12/31/1996'
The T94 and T96 predicates can never be true, so only the T95 branch remains.

Exploiting RI for Query Optimization
• Group By Pushdown
• Group By + Truncated Order By Pushdown
• Rewrite of Outer Join to Inner Join
• Better filter factor estimation for multi-column RI joins
  – Traditionally we use an independence assumption
  – Better column correlation information with RI
• Elimination of redundant joins in star-schema views
• Views often include more tables than query requires
  – RI allows us to prove that the joins are redundant
• RI information is exploited when matching queries to Materialized Query Tables (MQTs)

Group By Pushdown Through RI Joins
Find top 20 stores in terms of total revenue, and the store name and city information:

    select st.store_id, st.name, st.city, sum(f.sales) as sm
    from salesF as f, store as st
    where f.store_id = st.store_id
    group by st.store_id, st.name, st.city
    order by sm desc
    fetch first 20 rows only;

[Diagram: store (dimension table) related to salesF (fact table) through referential integrity]

Group By Pushdown Through RI Joins (Cont.)
Group By Pushdown
[Access plan, top to bottom:]
• S: store_id, name, city, sm
• Sort: order by sm desc, fetch first 20 rows only
• GB: sum(sales) as sm, group by store_id, name, city
• Join: f.store_id=st.store_id /* FK=PK */
  – salesF: store_id, sales (100,000 rows)
  – store: store_id, name, city (2,000 rows)
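
Where the optimizer does not perform this pushdown on its own, the same effect can be approximated by hand with a nested table expression that aggregates the fact table before the join. The sketch below rewrites the top-20 stores query from the slide; the correlation name agg is hypothetical, and the rewrite assumes store_id is the primary key of store, as the FK=PK annotation in the plan indicates.

```sql
-- Aggregate the large fact table first, one row per store_id,
-- then join the much smaller result to the dimension table.
SELECT st.store_id, st.name, st.city, agg.sm
FROM (SELECT f.store_id, SUM(f.sales) AS sm
      FROM salesF AS f
      GROUP BY f.store_id) AS agg
INNER JOIN store AS st
    ON agg.store_id = st.store_id
ORDER BY agg.sm DESC
FETCH FIRST 20 ROWS ONLY;
```

Because store_id is unique in store, name and city are determined by store_id, so grouping by store_id alone before the join gives the same sums as grouping by all three columns after it. The join then processes one row per store instead of every sales row.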
Group By Pushdown Through RI Joins

Plan after Group By pushdown (read bottom-up):

    S      store_id, name, city, sm
      Sort   order by sm desc, fetch first 20 rows only
        join   f.store_id = st.store_id  /* FK=PK */
          GB     sum(sales) as sm, group by store_id; output store_id, sm (2000 rows)
            salesF
          store  store_id, name, city

Fetch First n Rows (Truncated Sort) Pushdown

Plan after Group By + Truncated Order By pushdown (read bottom-up):

    S      store_id, name, city, sm
      join   f.store_id = st.store_id  /* FK=PK */
        Sort   order by sm desc, fetch first 20 rows only; output store_id, sm (20 rows)
          GB     sum(sales) as sm, group by store_id
            salesF
        store  store_id, name, city

z/OS: Use the “Separate the Group By Work” method discussed in the Advanced SQL class (page 42); use nested table expressions.

Exploiting RI When Matching MQTs

• With a summary table created on 5 tables ...

    CREATE TABLE dba.PG_SALESSUM AS (
      SELECT l.lineid, pg.pgid, loc.country, loc.state,
             YEAR(pdate) AS year, MONTH(pdate) AS month,
             SUM(ti.amount) AS amount, COUNT(*) AS count
      FROM stars.transitem AS ti, stars.trans AS t, stars.loc AS loc,
           stars.pgroup AS pg, stars.prodline AS l
      WHERE ti.transid = t.transid AND ti.pgid = pg.pgid
        AND pg.lineid = l.lineid AND t.locid = loc.locid
      GROUP BY l.lineid, pg.pgid, loc.country, loc.state,
               year(pdate), month(pdate)
    ) DATA INITIALLY DEFERRED REFRESH IMMEDIATE;

• ... the query on 3 tables will use the MQT, given appropriate RI between transitem and pgroup and between pgroup and prodline

    SELECT YEAR(pdate) AS year, loc.country,
           SUM(ti.amount) AS amount, COUNT(*) AS count
    FROM stars.transitem AS ti, stars.trans AS t, stars.loc AS loc
    WHERE ti.transid = t.transid AND t.locid = loc.locid
      AND year(pdate) between 1990 and 1999
    GROUP BY year(pdate), loc.country
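Why the RI matters for the MQT match can be shown on a toy version of the star schema. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than DB2, an ordinary table `pg_salessum` instead of a real MQT, a precomputed `pyear` column in place of `YEAR(pdate)`, and made-up rows; `build` and `compare` are hypothetical helper names.

```python
import sqlite3


def build(orphan_item=False):
    """Create a toy star schema plus a 5-table summary table (a stand-in for the MQT)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prodline  (lineid INTEGER PRIMARY KEY);
        CREATE TABLE pgroup    (pgid INTEGER PRIMARY KEY, lineid INTEGER REFERENCES prodline);
        CREATE TABLE loc       (locid INTEGER PRIMARY KEY, country TEXT, state TEXT);
        CREATE TABLE trans     (transid INTEGER PRIMARY KEY, locid INTEGER REFERENCES loc,
                                pyear INTEGER);
        CREATE TABLE transitem (transid INTEGER REFERENCES trans,
                                pgid INTEGER REFERENCES pgroup, amount INTEGER);
        INSERT INTO prodline  VALUES (10);
        INSERT INTO pgroup    VALUES (1, 10), (2, 10);
        INSERT INTO loc       VALUES (100, 'USA', 'MA'), (200, 'USA', 'RI');
        INSERT INTO trans     VALUES (1, 100, 1995), (2, 200, 1995), (3, 100, 1996);
        INSERT INTO transitem VALUES (1, 1, 50), (1, 2, 20), (2, 1, 30), (3, 2, 70);
    """)
    if orphan_item:
        # pgid 99 has no parent in pgroup: the RI the rewrite relies on is broken.
        # (SQLite does not enforce foreign keys by default, so the insert succeeds.)
        conn.execute("INSERT INTO transitem VALUES (3, 99, 5)")
    conn.execute("""
        CREATE TABLE pg_salessum AS
        SELECT l.lineid, pg.pgid, loc.country, loc.state, t.pyear AS yr,
               SUM(ti.amount) AS amount, COUNT(*) AS cnt
        FROM transitem ti, trans t, loc, pgroup pg, prodline l
        WHERE ti.transid = t.transid AND ti.pgid = pg.pgid
          AND pg.lineid = l.lineid AND t.locid = loc.locid
        GROUP BY l.lineid, pg.pgid, loc.country, loc.state, t.pyear
    """)
    return conn


# The 3-table query against the base tables.
BASE_QUERY = """
    SELECT t.pyear, loc.country, SUM(ti.amount), COUNT(*)
    FROM transitem ti, trans t, loc
    WHERE ti.transid = t.transid AND t.locid = loc.locid
    GROUP BY t.pyear, loc.country
    ORDER BY t.pyear, loc.country
"""

# The same question answered by re-aggregating the summary table.
VIA_SUMMARY = """
    SELECT yr, country, SUM(amount), SUM(cnt)
    FROM pg_salessum
    GROUP BY yr, country
    ORDER BY yr, country
"""


def compare(orphan_item):
    conn = build(orphan_item)
    return conn.execute(BASE_QUERY).fetchall(), conn.execute(VIA_SUMMARY).fetchall()


base, via = compare(orphan_item=False)
print(base == via)    # True: with RI intact, the extra joins lose no rows

base2, via2 = compare(orphan_item=True)
print(base2 == via2)  # False: an orphan transitem row is missing from the summary
```

With the RI in place, the joins to pgroup and prodline neither drop nor duplicate transitem rows, which is what lets the summary table answer a query that never mentions those two tables.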
Constraints Summary

• Check and Referential Integrity constraints push application rules down to the database
• The DB2 Optimizer can exploit constraint information for better access plans
• Informational constraints allow us to optimize queries without the overhead of enforcing them

MANUAL QUERY REWRITE
Sometimes Necessary on all Platforms

A Typical Data Warehouse/BI Query

• Initial cost of 16 million timerons – WOULD NOT FINISH!
• Multiple DISTINCT table expressions
• Initial join involved all columns and all rows
• The very wide and very deep set was dragged through many more query steps

Before and After

[Slide: side-by-side outlines of the query before and after the rewrite. Both are built from nested SELECT DISTINCT table expressions combined with INNER JOIN and LEFT JOIN and finished with GROUP BY ROLLUP; the column lists and table names are not shown.]

Conclusion

• Data Matters
  – So Do Constraints
  – So Does RI
  – So Do Views
  – So Do Access Paths
  – So Does Good Index Design
  – So Do MQTs!

Bibliography

– E.F. Codd – “A Relational Model of Data for Large Shared Data Banks”
– E.F. Codd – “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks”
– Richard Snodgrass, et al – “Temporal Databases”
– Robert R. Stoll – “Set Theory and Logic”
– C.J. Date, et al – “Temporal Data & the Relational Model”
– C.J. Date – “The Database Relational Model: A Retrospective Review and Analysis: A Historical Account and Assessment of E. F. Codd's Contribution to the Field of Database Technology”
– C.J. Date – “An Introduction to Database Systems”, Eighth Edition