Quick Intro to XPath
Roger L. Costello
14 December, 2012

Objective
• XML Schema 1.1 uses XPath a lot, so if you don't know XPath, you're at a disadvantage.
• The purpose of this short tutorial is to teach you enough XPath that you won't be at a disadvantage.

XPath is not a standalone language
• XPath requires a host language. Several XML languages currently host XPath: XML Schemas, XSLT, XQuery, XPointer, and Schematron.

An XML document is a tree
This XML document can be represented as a tree, as shown below:

   <?xml version="1.0"?>
   <FitnessCenter>
       <Member>
           <Name>Jeff</Name>
           <FavoriteColor>lightgrey</FavoriteColor>
       </Member>
       <Member>
           <Name>David</Name>
           <FavoriteColor>lightblue</FavoriteColor>
       </Member>
       <Member>
           <Name>Roger</Name>
           <FavoriteColor>lightyellow</FavoriteColor>
       </Member>
   </FitnessCenter>

   Document /
      PI <?xml version="1.0"?>
      Element FitnessCenter
         Element Member
            Element Name
               Text Jeff
            Element FavoriteColor
               Text lightgrey
         Element Member
            Element Name
               Text David
            Element FavoriteColor
               Text lightblue
         Element Member
            Element Name
               Text Roger
            Element FavoriteColor
               Text lightyellow

Terminology: node
Every box in the tree is a node. The tree contains a document node (/), a processing instruction (PI) node, element nodes, and text nodes. (Strictly, the XPath data model does not represent the XML declaration as a node; the diagram draws it as a PI for illustration.)

Children
With respect to a given node, its children are the nodes directly beneath it. For example, the children of the FitnessCenter element are its three Member elements.

Descendants
A node's descendants are all the nodes beneath it, at any depth. The descendants of FitnessCenter are the three Member elements plus every Name, FavoriteColor, and text node inside them.
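The child/descendant distinction can be checked in code. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` instead of an XPath engine; this substitution is an assumption, since the slides use Oxygen XML. ElementTree stores text as an attribute of each element rather than as separate text nodes, so the sketch lists text values separately.

```python
import xml.etree.ElementTree as ET

# The FitnessCenter document from the slides, written compactly.
# ElementTree never exposes the XML declaration, so it is left out.
FITNESS_XML = """<FitnessCenter>
  <Member><Name>Jeff</Name><FavoriteColor>lightgrey</FavoriteColor></Member>
  <Member><Name>David</Name><FavoriteColor>lightblue</FavoriteColor></Member>
  <Member><Name>Roger</Name><FavoriteColor>lightyellow</FavoriteColor></Member>
</FitnessCenter>"""

fitness_center = ET.fromstring(FITNESS_XML)

# Children: the elements directly beneath FitnessCenter (XPath: child::*).
children = [e.tag for e in fitness_center]

# Descendant elements: every element beneath it at any depth (XPath: descendant::*).
# iter() yields the starting element first, so drop that one.
descendants = [e.tag for e in fitness_center.iter()][1:]

# Text content held by the descendants, skipping whitespace-only text.
texts = [e.text for e in fitness_center.iter() if e.text and e.text.strip()]

print(children)     # ['Member', 'Member', 'Member']
print(descendants)  # Member, Name, FavoriteColor, repeated three times
print(texts)        # ['Jeff', 'lightgrey', 'David', 'lightblue', 'Roger', 'lightyellow']
```

FitnessCenter has three children but nine descendant elements, plus the six text values beneath them.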
Context node
Now suppose the first Member element (the one for Jeff) is the context node: the node from which relative XPath expressions are evaluated.

Parent
Its parent is the FitnessCenter element.

Ancestors
Its ancestors are the FitnessCenter element and, above that, the document node.

Siblings
It has 2 siblings: the other two Member elements.
They are following siblings, because they come after it in the document.
It has no preceding siblings.
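The parent, ancestor, and sibling relationships can be computed the same way. Here is a minimal sketch, again using Python's `xml.etree.ElementTree` as a stand-in for XPath. ElementTree elements do not link to their parents, so the sketch builds a child-to-parent map. The helpers `ancestors` and `sibling_split` are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

FITNESS_XML = """<FitnessCenter>
  <Member><Name>Jeff</Name><FavoriteColor>lightgrey</FavoriteColor></Member>
  <Member><Name>David</Name><FavoriteColor>lightblue</FavoriteColor></Member>
  <Member><Name>Roger</Name><FavoriteColor>lightyellow</FavoriteColor></Member>
</FitnessCenter>"""

root = ET.fromstring(FITNESS_XML)

# ElementTree elements do not point to their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Element ancestors of node, nearest first (hypothetical helper).

    ElementTree has no document node, so XPath's root node (/) is not included."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def sibling_split(node):
    """Return (preceding siblings, following siblings) of node (hypothetical helper)."""
    parent = parent_of.get(node)
    if parent is None:          # the outermost element has no element siblings
        return [], []
    kids = list(parent)
    i = kids.index(node)
    return kids[:i], kids[i + 1:]


context = root[0]               # the first Member element (Jeff)
preceding, following = sibling_split(context)

parent_tag = parent_of[context].tag
ancestor_tags = [a.tag for a in ancestors(context)]
following_names = [m.find("Name").text for m in following]
print(parent_tag, ancestor_tags, following_names, len(preceding))
```

With Jeff's Member as the context, the parent is FitnessCenter, the following siblings are David's and Roger's Members, and there are no preceding siblings.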
Here are the capabilities of XPath
• XPath provides a syntax for:
  – navigating around an XML document
  – selecting nodes and values
  – comparing node values
  – performing arithmetic on node values
• XPath provides functions (e.g., concat() and substring()) to facilitate the above.

A second example document
This XML document can be represented as a tree, as shown below:

   <?xml version="1.0"?>
   <Document classification="secret">
       <Para classification="unclassified">
           One if by land, two if by sea;
       </Para>
       <Para classification="confidential">
           And I on the opposite shore will be,
           Ready to ride and spread the alarm
       </Para>
       <Para classification="unclassified">
           Ready to ride and spread the alarm
           Through every Middlesex, village and farm,
       </Para>
   </Document>

   Document /
      PI <?xml version="1.0"?>
      Element Document
         Attribute classification="secret"
         Element Para
            Attribute classification="unclassified"
            Text One if …
         Element Para
            Attribute classification="confidential"
            Text And I …
         Element Para
            Attribute classification="unclassified"
            Text Ready to …

See document.xml in the xpath folder, within the examples folder.

Execute XPath using Oxygen XML
• Type your XPath expression in the XPath box.
• Change the XPath version setting to XPath 1.0.

Use XPath Builder for long XPath expressions.

Please run the XPath expressions
• The following slides contain XPath expressions.
• It's important that you copy each expression and paste it into Oxygen XML to see what it does.
• First, copy the XML document above (the one whose root element is Document), save it to a file, then drag and drop the file into Oxygen XML.
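In the diagram, each attribute is a node attached to its element. The following minimal sketch walks the sample document with Python's `xml.etree.ElementTree` and prints a similar outline. This is an assumption for illustration, not how Oxygen builds its tree. `outline` is a hypothetical helper name. As in XPath, attributes appear under their element but are not among its children.

```python
import xml.etree.ElementTree as ET

DOC_XML = """<Document classification="secret">
  <Para classification="unclassified">
    One if by land, two if by sea;
  </Para>
  <Para classification="confidential">
    And I on the opposite shore will be,
    Ready to ride and spread the alarm
  </Para>
  <Para classification="unclassified">
    Ready to ride and spread the alarm
    Through every Middlesex, village and farm,
  </Para>
</Document>"""


def outline(elem, depth=0):
    """List elem's element, attribute and text nodes as indented lines (hypothetical helper).

    Whitespace-only text is skipped, and text that follows a child element
    (ElementTree's .tail) is ignored to keep the sketch short."""
    pad = "   " * depth
    lines = [f"{pad}Element {elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f'{pad}   Attribute {name}="{value}"')
    if elem.text and elem.text.strip():
        # Collapse the line breaks and indentation inside the text.
        lines.append(f"{pad}   Text {' '.join(elem.text.split())}")
    for child in elem:            # children only: attributes were handled above
        lines.extend(outline(child, depth + 1))
    return lines


document = ET.fromstring(DOC_XML)
for line in outline(document):
    print(line)
```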
Select all Para elements
   /Document/Para
This is an absolute XPath expression: it starts at the root of the document (/).

Establish a context node
In Oxygen XML, click an element to establish it as the "context node". Any relative XPath expression is then evaluated relative to it.

Relative XPath expression
In Oxygen XML, click on <Document> to establish it as the context node, and then type this in the XPath box:
   Para

Select all Para elements, at any depth
   //Para
// selects descendants: every Para element anywhere below the document root.

Select the first Para
   //Para[1]
Strictly, //Para[1] selects every Para that is the first Para child of its parent. All the Para elements here share one parent, so the result is a single element. To get the first Para in the whole document regardless of nesting, write (//Para)[1].

Select the last Para
   //Para[last()]
The same applies to last(): (//Para)[last()] is the last Para in the whole document.

Select the classification attribute of the first Para
   //Para[1]/@classification

Is the Document element's classification top-secret?
   /Document/@classification = 'top-secret'

Is the Document element's classification top-secret or secret?
   (/Document/@classification = 'top-secret') or (/Document/@classification = 'secret')

Logical operators
   A or B
   A and B
   not(A)
Note that not() is a function, not an operator.

Select all Paras with a secret classification
   //Para[@classification = 'secret']

Check that no Para has a top-secret classification
   not(//Para[@classification = 'top-secret'])
A node-set converts to true only if it is non-empty, so not(...) is true exactly when no such Para exists.

Establish a new context node
Make the second Para the context node.

Select the following siblings
   following-sibling::*

Select the first following sibling
   following-sibling::*[1]

Add another element
Add a <Test> element after the last Para.

Select the following Para siblings
   following-sibling::Para

Select all following siblings
   following-sibling::*

Select all preceding siblings
   preceding-sibling::*

Make Document the context
Click on Document to make it the context node.

Equivalent!
   Para[1]
   child::Para[1]
Para[1] is the abbreviated form, because child:: is the default axis.

Make Para[2] the context
Establish the second Para as the context node.

Get the parent element's classification
   ../@classification
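Several of the selections above can be tried without Oxygen. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`. That library implements only a small subset of XPath: relative paths, `.//`, attribute-value predicates, and positions including `last()`. It has no `following-sibling` axis and no functions such as `count()`, so a full engine (for example, lxml's `xpath()` method) is needed for the rest.

```python
import xml.etree.ElementTree as ET

DOC_XML = """<Document classification="secret">
  <Para classification="unclassified">One if by land, two if by sea;</Para>
  <Para classification="confidential">And I on the opposite shore will be,
    Ready to ride and spread the alarm</Para>
  <Para classification="unclassified">Ready to ride and spread the alarm
    Through every Middlesex, village and farm,</Para>
</Document>"""

document = ET.fromstring(DOC_XML)  # the Document element

# ElementTree paths are relative to the element you call them on, so
# /Document/Para becomes "Para" evaluated with Document as the context.
paras = document.findall("Para")

# //Para: ".//" searches descendants at any depth.
paras_anywhere = document.findall(".//Para")

# Positional predicates count among siblings, as in XPath.
first_para = document.find("Para[1]")         # //Para[1]
last_para = document.find("Para[last()]")     # //Para[last()]

# Attribute access: //Para[1]/@classification
first_classification = first_para.get("classification")

# Attribute-value predicate: //Para[@classification = 'secret']
secret_paras = document.findall(".//Para[@classification='secret']")

# The boolean checks from the slides, written with Python operators.
doc_class = document.get("classification")
is_top_secret = doc_class == "top-secret"
is_top_secret_or_secret = doc_class == "top-secret" or doc_class == "secret"
# An empty list is falsy, just as an empty node-set converts to false.
no_top_secret_para = not document.findall(".//Para[@classification='top-secret']")

print(len(paras), first_classification, len(secret_paras))
print(is_top_secret, is_top_secret_or_secret, no_top_secret_para)
```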
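Equivalent!
   ../@classification
   parent::*/@classification
.. abbreviates parent::node(). The parent of this Para is an element, so parent::* selects the same node.

Axes
   following-sibling
   preceding-sibling
   child
   parent
   ancestor
   descendant
   self

Count the number of Para elements
   count(//Para)

Count the number of Para elements with a secret classification
   count(//Para[@classification = 'secret'])

Does the first Para element contain the string "SCRIPT"?
   contains(//Para[1], 'SCRIPT')
contains() is case-sensitive.

Select all nodes containing the string "SCRIPT"
   //node()[contains(., 'SCRIPT')]
In //node(), the node() test is applied along the child axis, so it matches these nodes:
   - element
   - text
   - comment
   - processing instruction (PI)
It does not match these nodes, because they are never children of another node:
   - attribute
   - document
An element's string value includes all of its descendant text. An element therefore matches whenever any text inside it contains "SCRIPT".

Count the number of nodes containing the string "SCRIPT"
   count(//node()[contains(., 'SCRIPT')])

Select the first 20 characters of the first Para
   substring(//Para[1], 1, 20)
Character positions start at 1, not 0.

What's the length of the content of the first Para?
   string-length(//Para[1])

Convert Document's classification to lowercase
   translate(/Document/@classification, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
Each character found in the second argument is replaced by the character at the same position in the third. XPath 2.0 also provides lower-case().

Add a new <Cost> element
Add a <Cost> element and establish Document as the context node.

Multiply Cost by 2
   Cost * 2

N mod X = the remainder of dividing N by X
   Cost mod 2

Arithmetic operators
   +
   -   (leave a space on either side)
   *
   div
   mod
The spaces around - are needed because a hyphen is legal inside XML names: Cost-2 is read as a single element name, while Cost - 2 subtracts. Division is written div, because / is the path separator.

Set the XPath version to XPath 2.0.

Does Document's classification match one in Classifications.xml?
   /Document/@classification = doc('Classifications.xml')/Classifications/li
With =, the comparison is true if the classification equals any one of the li values.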
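Two details above often trip people up: substring() counts from 1, and mod keeps the sign of the dividend. The following minimal sketch mimics substring(), translate(), and mod in Python. It assumes whole-number arguments and ignores XPath's rounding rules. `xpath_substring`, `xpath_translate`, and `xpath_mod` are hypothetical helper names, not part of any library. The sample string is the first Para's text without its surrounding whitespace; in document.xml, substring(//Para[1], 1, 20) also counts the leading line break and indentation.

```python
import math


def xpath_substring(s: str, start: int, length: int) -> str:
    """substring(s, start, length) for whole-number arguments (hypothetical helper).

    Keeps the characters whose 1-based position p satisfies
    start <= p < start + length. Non-integer rounding is not handled."""
    return "".join(c for p, c in enumerate(s, start=1) if start <= p < start + length)


def xpath_translate(s: str, src: str, dst: str) -> str:
    """translate(s, src, dst) (hypothetical helper): each character of s found in src
    becomes the character at the same position in dst, or is removed if dst
    has no character at that position."""
    out = []
    for c in s:
        i = src.find(c)           # only the first occurrence in src counts
        if i == -1:
            out.append(c)         # not in src: copied unchanged
        elif i < len(dst):
            out.append(dst[i])    # mapped to its counterpart
        # otherwise the character is dropped
    return "".join(out)


def xpath_mod(n: float, x: float) -> float:
    """n mod x (hypothetical helper): remainder of truncating division, so the result
    has the sign of n. math.fmod behaves this way; Python's % operator does not."""
    return math.fmod(n, x)


UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

first_para = "One if by land, two if by sea;"
print(repr(xpath_substring(first_para, 1, 20)))   # 'One if by land, two '
print(xpath_translate("SECRET", UPPER, LOWER))    # secret
print(xpath_mod(5, 2), xpath_mod(-5, 2), -5 % 2)  # 1.0 -1.0 1
```

The last line shows the difference: XPath's -5 mod 2 is -1, whereas Python's -5 % 2 is 1.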
Do the first two Paras have the same classification?
   Para[1]/@classification eq Para[2]/@classification

Value comparison operators (XPath 2.0)
   eq means equal
   ne means not equal
   lt means less than
   gt means greater than
   le means less than or equal to
   ge means greater than or equal to
Unlike =, these compare exactly one value with one value. If either operand holds more than one item, the expression raises an error.

If Document's classification is top-secret, then there can be no Para with a classification other than top-secret
   if (/Document/@classification eq 'top-secret')
   then not(//Para[@classification ne 'top-secret'])
   else true()
The else true() branch makes the check pass whenever the Document is not top-secret.

Two built-in functions
   true()
   false()

Cast a value to a numeric type
   number(Cost)

Check that Document's children are multiple Paras, 1 Test, and 1 Cost (and nothing else)
   Para[2] and Test and Cost and empty(* except (Para, Test[1], Cost[1]))
Para[2] requires at least two Paras, and Test and Cost require at least one of each. empty(...) requires that no child remains once all the Paras, the first Test, and the first Cost are removed.

The sum() function
   <?xml version="1.0"?>
   <numbers>
       <number>23</number>
       <number>5</number>
       <number>-41</number>
       <number>50</number>
       <number>12</number>
   </numbers>

   sum(//number) returns 49.0

Check that every Publisher has a string-length le 140
   <BookStore>
       <Book>
           <Title>My Life and Times</Title>
           <Author>Paul McCartney</Author>
           <Date>1998</Date>
           <ISBN>1-56592-235-2</ISBN>
           <Publisher>McMillin Publishing</Publisher>
           <Author>John Ghostwriter</Author>
       </Book>
       <Book>
           <Publisher>Dell Publishing Co.</Publisher>
           <Author>Richard Bach</Author>
           <Date>1977</Date>
           <ISBN>0-440-34319-4</ISBN>
           <Title>Illusions The Adventures of a Reluctant Messiah</Title>
       </Book>
       <Book>
           <ISBN>0-06-064831-7</ISBN>
           <Title>The First and Last Freedom</Title>
           <Author>J. Krishnamurti</Author>
           <Date>1954</Date>
           <Publisher>Harper &amp; Row</Publisher>
       </Book>
   </BookStore>

   every $i in //Publisher satisfies string-length($i) le 140

The XPath every expression
• The form of the every expression is:
     every variable in sequence satisfies boolean-expression
• The result of the expression is either true or false. It is true when the sequence is empty.

Equivalent
   every $i in //Publisher satisfies string-length($i) le 140
   not(//Publisher[string-length(.) gt 140])
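The following minimal Python sketch makes three of these ideas concrete:

- An every expression and the not(...) form agree, including on an empty sequence.
- sum() over the numbers example gives 49.
- The general comparison = differs from the value comparison eq.

It uses `xml.etree.ElementTree` and plain lists as stand-ins for XPath sequences, which is an assumption for illustration. The BookStore XML is trimmed to its Publisher elements. `general_eq` and `value_eq` are hypothetical helper names, and `value_eq` models XPath's empty-sequence result as `None`.

```python
import xml.etree.ElementTree as ET

# The BookStore example, trimmed to the Publisher elements (other children omitted).
BOOKSTORE_XML = """<BookStore>
  <Book><Publisher>McMillin Publishing</Publisher></Book>
  <Book><Publisher>Dell Publishing Co.</Publisher></Book>
  <Book><Publisher>Harper &amp; Row</Publisher></Book>
</BookStore>"""

store = ET.fromstring(BOOKSTORE_XML)
publishers = [p.text for p in store.iter("Publisher")]

# every $i in //Publisher satisfies string-length($i) le 140
every_form = all(len(p) <= 140 for p in publishers)

# not(//Publisher[string-length(.) gt 140])
not_form = not [p for p in publishers if len(p) > 140]

# Like XPath's every, all() is true for an empty sequence.
vacuous = all(len(p) <= 140 for p in [])

# sum(//number) from the numbers example; XPath adds the values as doubles.
total = sum(float(n) for n in ["23", "5", "-41", "50", "12"])


def general_eq(left, right):
    """XPath 2.0 '=' (hypothetical helper): true if some item of left equals some item of right."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """XPath 2.0 'eq' (hypothetical helper): each side must hold at most one item.

    An empty operand yields an empty result (modeled here as None);
    more than one item is a type error."""
    if not left or not right:
        return None
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs single values; use = to compare sequences")
    return left[0] == right[0]


print(every_form, not_form, vacuous, total)   # True True True 49.0
```

The = operator's "any pair matches" behavior is what lets /Document/@classification = doc('Classifications.xml')/Classifications/li test membership in a list. eq would reject that list as an operand.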