CSE 636 Data Integration
XML Query Languages
XPath

XPath
• http://www.w3.org/TR/xpath (11/99)
• Building block for other W3C standards:
  – XSL Transformations (XSLT)
  – XML Link (XLink)
  – XML Pointer (XPointer)
  – XQuery
• Was originally part of XSL

Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

Data Model for XPath
/ Document (the root)
  XML PI
  Comment
  Element bib (the root element)
    Element book
      Element publisher
        Text Addison-Wesley
      Element author
        Text Serge Abiteboul
      …
    Element book
      …

XPath: Simple Expressions
/bib/book/year
Result:
  <year> 1995 </year>
  <year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)

XPath: Restricted Kleene Closure
//author
Result:
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <author> Jeffrey D. Ullman </author>
/bib//first-name
Result:
  <first-name> Rick </first-name>

XPath: Functions
/bib/book/author/text()
Result:
  Serge Abiteboul
  Victor Vianu
  Jeffrey D. Ullman
Rick Hull doesn't appear because his name is inside first-name and last-name child elements, not in text directly under author
Functions in XPath:
  – text() = matches the text value
  – node() = matches any node (= * or @* or text())
  – name() = returns the name of the current tag

XPath: Wildcard
//author/*
Result:
  <first-name> Rick </first-name>
  <last-name> Hull </last-name>
* matches any element

XPath: Attribute Nodes
/bib/book/@price
Result: "55"
@price means that price has to be an attribute

XPath: Qualifiers
/bib/book/author[first-name]
Result:
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

XPath: More Qualifiers
/bib/book/author[firstname][address[//zip][city]]/lastname
Result:
  <lastname> … </lastname>
  <lastname> … </lastname>

XPath: More Qualifiers
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]

XPath: Summary
bib                                     matches a bib element
*                                       matches any element
/                                       matches the root
/bib                                    matches a bib element under root
bib/paper                               matches a paper in bib
bib//paper                              matches a paper in bib, at any depth
//paper                                 matches a paper at any depth
paper|book                              matches a paper or a book
@price                                  matches a price attribute
bib/book/@price                         matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname   matches…

XPath: More Details
• An XPath expression, p, establishes a relation between:
  – A context node, and
  – A node in the answer set
• In other words, p denotes a function:
  – S[p] : Nodes → {Nodes}
• Examples:
  – author/firstname
  – . = self
  – .. = parent
  – part/*/*/subpart/../name = part/*/*[subpart]/name

The Root and the Root
<bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the “document element”
• The “root” is above bib
• /bib = returns the document element
• / = returns the root
• Why?
• Because we may have comments before and after <bib>
• They become siblings of <bib>

XPath: More Details
• We can navigate along 13 axes:
  ancestor
  ancestor-or-self
  attribute
  child
  descendant
  descendant-or-self
  following
  following-sibling
  namespace
  parent
  preceding
  preceding-sibling
  self
• We've only seen these, so far: attribute, child, descendant-or-self, parent

XPath: More Details
• Examples:
  – child::author/child::lastname = author/lastname
  – child::author/descendant-or-self::node()/child::zip = author//zip
  – child::author/parent::* = author/..
  – child::author/attribute::age = author/@age
• What does this mean?
  – /bib/book/publisher/parent::*/author
  – /bib//address[ancestor::book]
  – /bib//author/ancestor::*//zip

XPath: Even More Details
• name() = the name of the current node
  /bib//*[name()="book"] same as /bib//book
• What does this mean?
  /bib//*[ancestor::*[name()!="book"]]
  Is it equivalent to the following?
  /bib//*
  /bib//*[name()!="book"]//*
• Navigation axis gives us strictly more power!

XPath: Example
How do we evaluate this XPath expression?
/bib//*[name()!="book"]//*
Let's take it one step at a time
[Figure: document tree with nodes bib, A, B, book, C, D]

XPath: Example
/bib returns the following list of one node:
[Figure: node list; labels as extracted: bib A B book C D]

XPath: Example
/bib//* when executed on the previous node list, returns the following new list of nodes:
[Figure: node list; labels as extracted: A B book C D book C D C D]

XPath: Example
/bib//*[name()!="book"] when executed on the previous node list, eliminates one node:
[Figure: node list; labels as extracted: A B book C D C D]

XPath: Example
/bib//*[name()!="book"]//* gives us the resulting node list of the XPath expression:
[Figure: node list; labels as extracted: book C D C D]

Keys in XML Schema
• We forgot something about XML Schema
  – Keys
  – Key References
• Why?
• XPath is used for keys and key references

Keys in XML Schema
XML:
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>

Keys in XML Schema
XML Schema:
<xs:element name="purchaseReport">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="regions"> … </xs:element>
      <xs:element name="parts"> … </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="numKey">
    <xs:selector xpath="parts/part" />
    <xs:field xpath="@number" />
  </xs:key>
  <xs:keyref name="numKeyRef" refer="numKey">
    <xs:selector xpath="regions/zip/part" />
    <xs:field xpath="@number" />
  </xs:keyref>
</xs:element>

Keys in XML Schema
• In general, two flavors:
<key name="someNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  …
  <field xpath="pk"/>
</key>
<unique name="someNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  …
  <field xpath="pk"/>
</unique>
Note
• All XPath expressions “start” at the element currently being defined
• The fields must identify a single node

Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
  – /a/b | /a/c    OK for selector
  – //a/b/*/c      OK for field
• Note: better than DTD's ID mechanism

Keys in XML Schema
• Examples
Recall: must have a single forename, surname
<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="forename"/>
  <field xpath="surname"/>
</key>
<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>
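
To make the difference between unique, key, and keyref concrete, here is a minimal Python sketch that checks the three constraints by hand on a shortened copy of the purchaseReport document shown earlier. It is only an illustration of the semantics, not how a schema validator works. The helper names (field_values, check_unique, check_key, check_keyref) are made up for this example, and because the standard library's xml.etree.ElementTree supports only a small XPath subset, the field is passed as a plain attribute name ("number") rather than as "@number".

```python
import xml.etree.ElementTree as ET

# Shortened copy of the purchaseReport example (two parts instead of four).
DOC = """
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
"""


def field_values(scope, selector, attr):
    """Value of attribute `attr` (None if absent) for each node the selector finds."""
    return [node.get(attr) for node in scope.findall(selector)]


def check_unique(scope, selector, attr):
    """unique: values that are present must not repeat; a missing field is allowed."""
    present = [v for v in field_values(scope, selector, attr) if v is not None]
    return len(present) == len(set(present))


def check_key(scope, selector, attr):
    """key: every selected node must have the field, and no value may repeat."""
    values = field_values(scope, selector, attr)
    return None not in values and len(values) == len(set(values))


def check_keyref(scope, selector, attr, key_selector, key_attr):
    """keyref: every referencing value must occur among the key values."""
    keys = set(field_values(scope, key_selector, key_attr))
    refs = [v for v in field_values(scope, selector, attr) if v is not None]
    return all(v in keys for v in refs)


# Selectors are evaluated relative to the element that declares the constraint.
root = ET.fromstring(DOC)
print(check_key(root, "parts/part", "number"))        # True: numKey holds
print(check_key(root, "regions/zip/part", "number"))  # False: 455-BX occurs twice
print(check_keyref(root, "regions/zip/part", "number",
                   "parts/part", "number"))           # True: numKeyRef holds
```

The second call shows why the key is declared on parts/part and not on regions/zip/part: the same part number may be ordered in several zip codes, so it can only act as a foreign key there.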

Foreign Keys in XML Schema
• Examples
<keyref name="personRef" refer="fullName">
  <selector xpath=".//personPointer"/>
  <field xpath="@first"/>
  <field xpath="@last"/>
</keyref>

References
• Lecture Slides – Dan Suciu
  – http://www.cs.washington.edu/homes/suciu/COURSES/590DS/06xpath.htm
  – http://www.cs.washington.edu/homes/suciu/COURSES/590DS/14constraintkeys.htm
• BRICS XML Tutorial – A. Moeller, M. Schwartzbach
  – http://www.brics.dk/~amoeller/XML/index.html
• W3C's XPath homepage
  – http://www.w3.org/TR/xpath
• W3C's XML Schema homepage
  – http://www.w3.org/XML/Schema
• XML School
  – http://www.w3schools.com