DB2 Version 9: Overview of pureXML Plus an XPath Primer, XML Column Indexes, Support for Validation Gregg Lippa Themis Inc. http://www.themisinc.com glippa@themisinc.com © 2009 Themis, Inc. All rights reserved. • Gregg Lippa is currently a Senior Technical Advisor at Themis Inc. He teaches DB2-related courses on SQL, Application Programming, and optimization, performance and tuning as well as Java, J2EE, distributed computing and Java-related topics. Gregg has worked with DB2 as a consultant and trainer for over 20 years and with Java for 10 years. • This material is taken from the Themis course DB1091: DB2 9 for z/OS pureXML Features. For more information visit http://www.themisinc.com. • Products and company names mentioned may be trademarks of their respective companies. Mention of third party products or software is for reference only and constitutes neither a recommendation nor an endorsement. © 2009 Themis, Inc. All rights reserved. DB2 Version 9 • pureXML technology – Seamlessly integrating XML with relational data • Including publishing and parsing functions – Storing XML data natively in a database table • Including a new XML data type © 2009 Themis, Inc. All rights reserved. An XML Document <patient id="13579"> <name>John Doe</name> <addr> <street>123 Main</street> <city>Anytown</city> <state>PA</state> <zip>19134</zip> </addr> <phone type="home">610-654-1234</phone> <phone type="work">610-987-4321</phone> <email>jdoe@mymail.com</email> <service> <sdate>2008-09-22</sdate> <reason>flu shot</reason> <descrip>gave flu shot</descrip> <cost>40.00</cost> <copay>10.00</copay> </service> <service> <sdate>2008-05-11</sdate> <reason>sore foot</reason> <descrip>referred to specialist</descrip> <cost>60.00</cost> <copay>10.00</copay> </service> </patient> © 2009 Themis, Inc. All rights reserved. Root Nodes Elements Attributes Values (text) Well-formed Valid XML – Why? • Pervasive • Versatile • Self-describing • Neutral for exchanging data among diverse devices – Universal standard for data interchange • Growth of XML data – XML-based industry and data standards – SOA and Web services; Services-based frameworks • messages are encapsulated as XML – Web 2.0 technologies • XML feeds • Syndication services - rendered as XML files • XML data becoming more critical to enterprise operations © 2009 Themis, Inc. All rights reserved. Previous Approaches to XML Document Storage • File systems – Not storing XML documents in a database – Does not scale well • Stuffing – XML data stored as large objects or VARCHAR – Inefficient for querying the XML data • Shredding – Decomposing XML data into multiple columns & tables – Often leads to complex join requirements – May be difficult to recreate original XML document • Utilizing XML-only database systems – Few options and little expertise available © 2009 Themis, Inc. All rights reserved. XML vs Relational Model • Major differences: – XML data is hierarchical; relational data is tabular – XML data is self-describing; relational data is not – XML data is ordered; relational data is not • Which approach is right for my data? – XML maximizes flexibility of the data structure – Relational data provides performance benefits for data retrieval – Relational data supports referential integrity requirements • Referential constraints cannot be based on XML columns – Data warehouses are oriented toward relational data © 2009 Themis, Inc. All rights reserved. pureXML Capabilities • XML data type – With specialized hierarchical storage structure • Indexing capabilities – Based on data within XML documents • New query languages (XPath and SQL/XML) – New query optimization techniques too • XML schema support – Including validation • Database utilities support • Integration with JDBC, ODBC, Embedded SQL • XML shredding and publishing facilities – For composing and decomposing XML documents © 2009 Themis, Inc. All rights reserved. Benefits of DB2 pureXML Technology • Faster development – Code simplification – Avoiding XML-relational transformations • Increased agility – Versatile XML schema evolution • Quickly modify applications to support new or changing requirements • Improved usability – Exploit previously unmanaged XML data – Speed up query processing through XML-optimized storage and indexing © 2009 Themis, Inc. All rights reserved. pureXML Usage Scenarios • Integration of diverse data sources – DB2 provides the ability to join XML documents • Form processing – Store whole electronic forms in DB2 rather than shredding • Document storage and querying – Store and manage less structured (document-centric) XML data in DB2 • Using XML to support transactions – Service-oriented architectures (SOA) offer message-based transactions – XML data can then be retrieved, updated, searched and analyzed • Syndication and XML feeds – Serve XML feeds through a Web services interface – Provide a repository for XML data to support these feeds © 2009 Themis, Inc. All rights reserved. DB2 Support for XML 0 – XML Storage DB2 ENGINE 1 – Bind in XML 7 2 – Store as XML XML 0 1 Application that makes use of XML 2 4 – Retrieve XML data 4 Textual XML 3 6 8 9 5 Relational XSR (validation) 3 – Shred into Relational 5 – Publish as XML 6 – Bind XML output 7 – XML to XML 8 – XML to Relational 9 – Relational to XML © 2009 Themis, Inc. All rights reserved. pureXML Architecture • DB2 9 – a hybrid database system – One database with both relational and native XML data – Single hybrid database engine handles all processing • Application may combine SQL and SQL/XML – Access relational and XML data in a hybrid database • XML data is stored separately from other table contents – Has its own table space • Supports XML document validation with XML schemas – XML schemas used for validation are registered with DB2 • XML Schema Repository (XSR) © 2009 Themis, Inc. All rights reserved. XML Data Type • DB2 native XML support includes a new XML data type – An XML column holds one XML document for each row • XML data is stored in a parsed tree structure • XML document can also be stuffed into LOB or VARCHAR – Provides advantages in certain scenarios • XML storage requirements – Separate XML tablespace plus space for any needed indexes © 2009 Themis, Inc. All rights reserved. The XML Data Type • Optimized storage – New XML data type • Supports insert, update, and delete • Stores parsed XML documents • Available when creating or altering tables CREATE TABLE PATIENT (PATIENTID CHAR(6), PATIENT_XML XML) CREATE TABLE P2 (PID INT, PTYP CHAR(8), PX1 XML, PX2 XML) – Supports access to nodes within XML document via XPath • Query optimization – CREATE INDEX supports specification of an xmlpattern © 2009 Themis, Inc. All rights reserved. XML Indexes • Indexes are often used to improve query performance • Indexes on XML columns are supported in DB2 9 – Uses an XML pattern (XPath) expression – Indexes paths and values in stored XML documents • XML index entries provide access to document nodes – Not limited to providing access to the beginning of a document – Index keys are created based on XML pattern expressions CREATE INDEX PATINDEX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/sdate' as SQL VARCHAR(10) © 2009 Themis, Inc. All rights reserved. Application Development Support • Supports developing apps that include XML requirements – Language support: C/C++, Java, Assembler, Cobol, PL/I – API support: JDBC, DB2 / ODBC, Embedded SQL, SQLJ – SQL/XML query support – DB2 sample database enhancements • Universal DB2 driver for JDBC enhanced to support XML – Provided extension XML type: com.ibm.db2.DB2Xml import com.ibm.db2.jcc.DB2Xml; DB2Xml xml1 = (DB2Xml) rs.getObject ("patient_xml"); String s = xml1.getDB2String(); InputStream is = xml1.getDB2XMLBinaryStream("UTF-16"); © 2009 Themis, Inc. All rights reserved. DB2 and XPath Navigating Through the XML Tree © 2009 Themis, Inc. All rights reserved. DB2 XPath • XPath is an expression language – Designed by the World Wide Web Consortium (W3C) – Used to navigate XML documents – XPath expressions are similar to file path notations • DB2 XPath can be used: – With the XMLQUERY SQL built-in function • To extract data from an XML column – With the XMLEXISTS SQL predicate • To evaluate data in an XML column – When creating an XML index • To determine the XML document nodes to be indexed '/patient/name' All name elements within patient elements © 2009 Themis, Inc. All rights reserved. XPath Expressions • XPath expressions: the basic building block of Xpath • Types of expressions provided by DB2 XPath: 1. Primary expressions: basic primitives of the language • Include literals, variable references, and function calls 2. Path expressions: locate nodes within a document tree • Include Node Tests and Filter Expressions 3. Arithmetic expressions: add, subtract, multiply, divide, modulus 4. Comparison expressions: compare two values 5. Logical expressions: use boolean logic • Anywhere an expression is expected, any kind of expression can be used – Operands of an expression are typically other expressions © 2009 Themis, Inc. All rights reserved. XPath Data Model Key: Document node Comment node OneEmp.xml Element Node Sample comment Attribute node Employee Text node Name Phone type= "home" 800-555-1234 Phone type= "cell" Address Dept mgr= "bob" 212-321-4321 Sales LastName FirstName Street City State Zipcode Roger Rabbit 123 Main Smallville Wyoming 98765 © 2009 Themis, Inc. All rights reserved. Types of XPath Nodes • Document node encapsulates an XML document – Parent of root element node • Element node encapsulates an XML element – Can have one parent and many children • Attribute node represents an XML attribute – Belongs to an element • Text node encapsulates XML character content – Elements may have these • Processing Instruction (PI) node – Encapsulates XML processing instruction • Comment node encapsulates an XML comment • Namespaces node is considered to be a node in XPath © 2009 Themis, Inc. All rights reserved. XPath Nodes Processing Order 1 Document node Element node 2,5,7 9 Comment node Processing Instruction 10 6,8 Text node Attribute node 4 Namespace node 3 course title th="http:// themis. com/test" <?xml version="1.0"?> <th:course xmlns:th="http://themis.com/test" th:format="instructor led"> <th:title>XML for DB2</th:title> <th:descrip>Lots of information <!-- To be determined --> </th:descrip> <?ourOwnPIforCourseware?> </th:course> descrip name=format value=“instructor led” target= ourOwnPIforCourseware XML for DB2 © 2009 Themis, Inc. All rights reserved. Lots of information To be determined Path Expressions Overview • Path expressions navigate the XML tree structure to locate nodes • Navigation axes are used in XPath; always start at context node – – ForwardAxis moves down through the XML tree: • attribute (@) • child (default) • descendant • self (.) • descendant-or-self (//=/descendant-or-self:node()/) • Comment Element attribute Element Text ReverseAxis moves up through the XML tree: • parent (..) Three parts of an axis step: 1. Axis specifies a direction of movement 2. Node test specifies node selection criteria 3. Predicates (zero or more) filter returned sequence © 2009 Themis, Inc. All rights reserved. parent self Element children descendants Text Path Expressions Syntax • XML document to support upcoming example code: <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908-842-7531</phone> <email>saralee@cakemail.com</email> <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> </patient> • XPath expressions support abbreviated syntax in axis steps @ // .. . is abbreviated syntax for is abbreviated syntax for is abbreviated syntax for is abbreviated syntax for is abbreviated syntax for © 2009 Themis, Inc. All rights reserved. attribute /descendant-or-self::node()/ parent::node() self::node() child::node() Path Expression Examples • ‘ / ’ slash indicates that path begins at root node – Whole XML document • ‘//service’ two slashes at beginning of path expression – Requested node, service, may be located anywhere in the document – Returns the entire service element, including all of its children • ‘//phone/@*’ All attributes appearing under the phone element – Only attribute under phone is: home • ‘//@*’ All attributes in the XML document – All attributes: 11123 home • ‘/patient/name’ All name elements under patient – Only name is Sara Lee • ‘/patient/addr/city/..’ All child elements of addr – Element addr, the parent of city, plus all child elements of addr © 2009 Themis, Inc. All rights reserved. Filter Expressions • A filter expression is a path expression followed by predicates in square brackets – • Filter its result based on applying conditions; for example: All service information of patients who have a $15.00 copay '/patient/*[copay="15.00"]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribe hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> • Set context node to patient and filter based on attribute type rather than on an element /patient/phone[@type="home"] <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908-842-7531</phone> <email>saralee@cakemail.com</email> <service> <sdate>2004-02-29</sdate> <phone type="home">908-842-7531</phone> • Any patient that has email /patient[email] <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> Entire XML document © 2009 Themis, Inc. All rights reserved. </patient> Arithmetic Expressions • Arithmetic expressions: perform operations that involve addition, subtraction, multiplication, division, and modulus • The XPath arithmetic operators: * multiplication div division idiv integer division mod modulus + addition - subtraction • An arithmetic expression results in a numeric value – Or an empty sequence or an error • Place arithmetic expressions in parentheses © 2009 Themis, Inc. All rights reserved. Arithmetic Expression Example • An arithmetic expression to calculate the the remaining balance after the copay (and its return value): /patient/service/(cost – copay) 69.00 © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908-842-7531</phone> <email>saralee@cakemail.com</email> <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84.00</cost> <copay>15.00</copay> </service> </patient> Comparison Expressions – General Comparisons • Comparison expressions allow comparing two values – The comparison operators are = • != < > >= All services with a cost greater than 60 '/patient/service/cost > 60' true • <= All services with a cost greater than 60, but get the patient’s service info instead of just true or false '/patient/service[cost > 60]' <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84.00</cost> <copay>15.00</copay> </service> © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908-842-7531 </phone> <email>saralee@cakemail.com </email> <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84.00</cost> <copay>15.00</copay> </service> </patient> Comparison Expressions – Logical Comparisons • Logical expressions using AND return true if both of two expressions are true • Logical expressions using OR return true if one or both expressions are true • Return phone numbers of the type work or fax (two options shown) '//phone[./@type="work" or ./@type="fax"]' '//phone[@type="work" or @type="fax"]' • The result is this phone info <phone type="work">908-842-7531</phone> <phone type="fax">908-751-2468</phone> © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908-842-7531 </phone> <phone type="fax">908-751-2468 </phone> <email>saralee@cakemail.com </email> <service> <sdate>2004-02-29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84.00</cost> <copay>15.00</copay> </service> </patient> The Built-in Function Library • Built-in functions offered by the DB2 XPath library: – String functions – Numeric functions – Functions working on boolean values – Functions working on sequences • Calls to these function are allowed in an XPath expression anywhere an expression is expected • Example: SELECT XMLQUERY('fn:concat($x,$y)' PASSING 'come ' AS "x", 'together' AS "y") FROM SYSIBM.SYSDUMMY1; © 2009 Themis, Inc. All rights reserved. A Sampling of XPath Functions fn:compare compares two strings to see which one is greater fn:concat concatenates two or more strings into a single string fn:contains determines whether a string contains a given substring fn:count returns the number of values in a sequence fn:normalize-space strips leading and trailing whitespace characters fn:lower-case converts a string to lowercase fn:matches determines whether a string matches a given pattern fn:position returns the position of the context item in the sequence fn:replace replaces characters that match a pattern fn:round returns the integer that is closest to a numeric value fn:string returns the string representation of a value fn:string-length returns the length of a string fn:substring returns a substring of a string fn:upper-case converts a string to uppercase © 2009 Themis, Inc. All rights reserved. XML Namespaces • XML namespaces prevent naming collisions – An XML namespace is a set of names identified by a namespace URI – Distinguishes element types or attribute names with the same name associated with different DTDs or Schemas • Namespaces allow qualifying names of elements and attributes – Contain an optional namespace prefix, a colon, and a local name • Example: two elements with the same name bound to different URIs: <anElement xmlns:p1="someURI" xmlns:p2="otherURI"> <ElementABC> <p1:table>excel</p1:table> <p2:table>dining</p2:table> empty prefix; <table>DB2</table> bound to default element </ElementABC> namespace </anElement> © 2009 Themis, Inc. All rights reserved. Prolog – Namespace Definition • A DB2 XPath expression optionally contains a prolog – Establishes the processing environment • Prolog declaration may specify multiple namespace declarations – May also specify one default namespace declaration • Prolog declaration is always followed by a semicolon (;) – Syntax: declare namespace prefix="namespace string literal"; declare default element namespace "namespace string literal"; – Examples: declare namespace fn="http://www.w3.org/2005/xpath-functions"; declare default element namespace "http://www.xyz.com/movies"; © 2009 Themis, Inc. All rights reserved. Indexes and XML Indexes Built On Values Within XML Documents © 2009 Themis, Inc. All rights reserved. XML Indexing • DB2 supports creating indexes on XML columns – Generated using XML pattern expressions – Support access to nodes in the document • Multiple parts of an XML document can satisfy an XML pattern – Multiple index keys may be generated for insert of a single document • GENERATE KEY USING XMLPATTERN clause of CREATE INDEX – Specifies what you want to index – Contains XML pattern expression Same as before XML CREATE UNIQUE INDEX PATIENT_ID_IX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/@id' AS SQL VARCHAR(5) Required keywords Type of stored index values © 2009 Themis, Inc. All rights reserved. XML node to be indexed Data Types Associated With Pattern Expressions • Keys from XML pattern expression specified in a CREATE INDEX statement must be associated with a data type – May use either DECFLOAT or VARCHAR(n), where n <= 1000 – Value being inserted/indexed must be convertible to this type INSERT INTO PATIENT VALUES('12345', '<patient id="123456"><name>Jim Beam</name></patient>') Value too long for VARCHAR(5) index DSNT408I SQLCODE = -20305, ERROR: AN XML VALUE CANNOT BE INSERTED OR UPDATED BECAUSE OF AN ERROR DETECTED WHEN INSERTING OR UPDATING THE INDEX IDENTIFIED BY 'DBID~132 OBID~23' ON TABLE *N. REASON CODE = 1. © 2009 Themis, Inc. All rights reserved. UNIQUE Keyword in XML Index Definition • The UNIQUE keyword is supported in XML index definitions – However, its meaning is different than in relational index definitions • When creating a relational index, the UNIQUE keyword enforces uniqueness across all rows in the table • When creating an index over XML data, the UNIQUE keyword enforces uniqueness across all documents in an XML column © 2009 Themis, Inc. All rights reserved. Example Queries and Supporting Indexes • Example 1 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/service[copay="10.00"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX copayIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/copay' AS SQL DECFLOAT • Example 2 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/phone[@type="work"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX phoneTypIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/phone/@type' AS SQL VARCHAR(20) • Example 3 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/addr[city="Uptown"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX cityIdx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/addr/city' AS SQL VARCHAR(20) © 2009 Themis, Inc. All rights reserved. XML Schemas, Validation and XML Schema Repository (XSR) © 2009 Themis, Inc. All rights reserved. XML Schema • XML Schema Definition (XSD) defines structure of XML instance documents – – – – Published as a recommendation by W3C Defines elements and attributes permitted in a document Defines parent / child relationship between elements Defines data types, constraints and values for elements and attributes <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="patient"> <xsd:complexType> <xsd:sequence> <xsd:element name="id" type="xsd:integer"/> <xsd:element name="name" type="xsd:string"/> <xsd:element name="addr" type="xsd:string"/> <xsd:element name="dob" type="xsd:date"/> <xsd:element name="amount" type="xsd:double"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> © 2009 Themis, Inc. All rights reserved. XML Schema Management With the XSR • XML schema repository (XSR) – Set of tables that store XML schemas – Created during DB2 installation or migration Only schemas, and not DTDs, may be used for XML validation in DB2 Version 9 • XML schemas may be added to the XSR – Then used to validate XML documents being inserted or updated • Registering XML schema documents (adding to XSR): – Call DB2-supplied stored procedures from a DB2 application – Or invoke a provided JDBC method from a Java application • Removing XML schema documents from the DB2 XSR – Call DB2-supplied stored procedure or invoke JDBC method © 2009 Themis, Inc. All rights reserved. XML Schema Validation • Use SQL INSERT statement to insert data into XML column – Inserted data must be a well-formed XML document • Validate the XML against a registered XML schema during insertion using the DSN_XMLVALIDATE function – User Defined Function • XML validation determines whether the structure, content, and data types of an XML document are valid according to a corresponding schema • Validation is optional INSERT into AutoDealers VALUES( '12345', CURRENT DATE, 'Sams Deals', DSN_XMLValidate(:xmlDealerInfo, SYSXSR.DealerInfoSchema)); © 2009 Themis, Inc. All rights reserved. XML Decomposition • Decomposition, or shredding, is the process of storing XML document content in columns of relational tables – Decomposed data has SQL type of column where it is inserted • An XML schema consists of one or more XML schema documents • Annotated XML schema decomposition – Control of the decomposition process is provided by XML schema annotation © 2009 Themis, Inc. All rights reserved. Resources • DB2 Version 9.1 for z/OS XML Guide (SC18-9858-03) • DB2 9 for z/OS Technical Overview (SG24-7330-00) • DB2 Version 9.1 for z/OS Application Programming and SQL Guide (SC18-9841-01) • DB2 Version 9.1 for z/OS Utility Guide and Reference (SC18-9855-02) • DB2 Version 9.1 for z/OS Application Programming Guide and Reference for Java (SC18-9842-01 ) © 2009 Themis, Inc. All rights reserved.