Relation:

    name    phone
    John    3634
    Sue     6343
    Dick    6363

… in XML (a tree of row elements, each with a name child and a phone child):

    { row: { name: “John”, phone: 3634 },
      row: { name: “Sue”, phone: 6343 },
      row: { name: “Dick”, phone: 6363 } }

Must be high-level; “SQL for XML”
Must conform to XML Schema
◦ But also work in the absence of schema info
Support simple and complex/nested datatypes
Support universal and existential quantifiers, aggregation
Operations on sequences and hierarchies of document structures
Capability to transform and create XML structures

“A query language that uses the structure of XML intelligently and can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.”

XQuery
is an emerging standard for querying XML documents
is strongly influenced by OQL
is a functional language in which a query is represented as an expression (as opposed to OQL and SQL, which are declarative)
◦ expressions can be nested
◦ filters can strip out fields
◦ has grouping

Extracting information from a database
Generating summary reports on stored data
Searching textual documents on the web
Selecting and transforming XML data to XHTML
Pulling data from databases for application integration
Splitting up an XML document

Useful for both structured and unstructured data
Protocol independent (evaluation with predictable results)
Able to accept a collection of multiple documents
Compatible with other W3C standards

Path expressions
Element constructors
FLWOR (“flower”) expressions
◦ Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.

Expressions are evaluated with respect to a context.
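The relation shown at the start of this section can be turned into its XML form mechanically. The following is a minimal Python sketch of that mapping, not part of the slides; the wrapper element name `relation` and the helper `relation_to_xml` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Rows of the example relation (name, phone), as listed in the slides.
ROWS = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_xml(rows):
    """Build one <row> element per tuple, each with <name> and <phone> children."""
    root = ET.Element("relation")  # hypothetical wrapper element, not in the slides
    for name, phone in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "phone").text = str(phone)
    return root


root = relation_to_xml(ROWS)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Each tuple becomes one `row` subtree, which is the shape the nested `{ row: { name: …, phone: … } }` notation describes.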
The evaluation context includes:
◦ Context item (current node)
◦ Context position (in the sequence being processed)
◦ Context size (of the sequence being processed)
◦ Context also includes namespaces, variables, functions, date, etc.

Examples:

    Bib/paper
    Bib/book/publisher
    Bib/paper/author/lastname

Given an XML document, the value of a path expression p is a set of objects.

[Figure: Doc drawn as a graph of objects. The root Bib (&o1) has paper, book and paper children, with references edges among them; below them are author, title, year, http, publisher and page elements, with firstname/lastname leaves such as “Serge” “Abiteboul” and “Victor” “Vianu”. Each node is labeled with an object identifier (&o12, &o29, &o51, &o71, &206, …).]

    Bib/paper                 = <&o12,&o29>
    Bib/book/publisher        = <&o51>
    Bib/paper/author/lastname = <&o71,&206>

Note that order of elements matters!

An XQuery expression can construct new values or structures.
Example: Consider the path expressions from the previous slide.
◦ Each of them returns a newly constructed sequence of elements
◦ Key point is that we don’t just return existing structures or atomic values; we can re-arrange them as we wish into new structures

FOR-LET-WHERE-ORDERBY-RETURN = FLWOR

    FOR / LET clauses       →  list of tuples
    WHERE clause            →  list of tuples
    ORDERBY / RETURN clause →  instance of the XQuery data model

The for clause uses XPath expressions, and the variable in the for clause ranges over the values in the set returned by XPath.

Simple FLWOR expression in XQuery
◦ Find all accounts with balance > 400, with each result enclosed in an <account_number> .. </account_number> tag

    for $x in /bank-2/account
    let $acctno := $x/@account_number
    where $x/balance > 400
    return <account_number> { $acctno } </account_number>

◦ Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated

The let clause is not really needed in this query, and the selection can be done in XPath.
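To make the clause-by-clause reading of the simple FLWOR query above concrete, here is a hedged Python sketch that mirrors each clause with one line of ordinary code. The sample document, the account numbers and the helper `accounts_over` are hypothetical (the slides do not show the contents of bank-2), and for simplicity the sketch writes the account number as text content of the result element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the slides do not show the bank-2 document.
BANK_XML = """
<bank-2>
  <account account_number="A-101"><balance>500</balance></account>
  <account account_number="A-102"><balance>400</balance></account>
  <account account_number="A-201"><balance>900</balance></account>
</bank-2>
""".strip()


def accounts_over(root, threshold):
    """Mirror for/let/where/return with a loop, an assignment, a test and a constructor."""
    results = []
    for x in root.findall("account"):                 # for $x in /bank-2/account
        acctno = x.get("account_number")              # let $acctno := $x/@account_number
        if float(x.findtext("balance")) > threshold:  # where $x/balance > 400
            out = ET.Element("account_number")        # return <account_number>{...}</account_number>
            out.text = acctno
            results.append(out)
    return results


root = ET.fromstring(BANK_XML)
result = accounts_over(root, 400)
print([ET.tostring(e, encoding="unicode") for e in result])
```

The account with a balance of exactly 400 is filtered out, since the condition is a strict comparison.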
The query can be written as:

    for $x in /bank-2/account[balance > 400]
    return <account_number> { $x/@account_number } </account_number>

FOR $x IN list-expr
◦ Binds $x in turn to each value in the list expr
LET $x := list-expr
◦ Binds $x to the entire list expr
◦ Useful for common sub-expressions and for aggregations

    FOR $x IN document("bib.xml")/bib/book
    RETURN <result> { $x } </result>

Returns:

    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    <result> <book>...</book> </result>
    ...

Notice that the result has several elements.

    LET $x := document("bib.xml")/bib/book
    RETURN <result> { $x } </result>

Returns:

    <result>
      <book>...</book>
      <book>...</book>
      <book>...</book>
      ...
    </result>

Notice that the result has exactly one element.

Find the titles of all books published after 1995:

    FOR $x IN document("bib.xml")/bib/book
    WHERE $x/year > 1995
    RETURN $x/title

Result:

    <title> abc </title>
    <title> def </title>
    <title> ghi </title>

For each author of a book by Morgan Kaufmann, list all books she published:

    FOR $a IN distinct(document("bib.xml")/bib/book[publisher = "Morgan Kaufmann"]/author)
    RETURN <result>
             { $a,
               FOR $t IN /bib/book[author = $a]/title
               RETURN $t }
           </result>

distinct = a function that eliminates duplicates (after converting inputs to atomic values)

Observe how the nested structure of the result elements is determined by the nested structure of the query.

    <result>
      <author>Jones</author>
      <title> abc </title>
      <title> def </title>
    </result>
    <result>
      <author> Smith </author>
      <title> ghi </title>
    </result>

    <big_publishers>
      { FOR $p IN distinct(document("bib.xml")//publisher)
        LET $b := document("bib.xml")//book[publisher = $p]
        WHERE count($b) > 100
        RETURN $p }
    </big_publishers>

◦ For each publisher p
◦ Let the list of books published by p be b
◦ Count the number of books in b, and return p if the count is > 100

count = (aggregate) function that returns the number of elements

Find books whose price is larger than average:

    LET $a := avg(document("bib.xml")/bib/book/price)
    FOR $b IN document("bib.xml")/bib/book
    WHERE $b/price > $a
    RETURN $b

Ordered and unordered collections
◦ /bib/book/author = an ordered collection
◦ distinct(/bib/book/author) = an unordered collection

Examples:
◦ LET $a := /bib/book
  $a is a collection; the statement iterates over all books in the collection
◦ $b/author is also a collection (several authors...)

However:

    RETURN <result> { $b/author } </result>

Returns a single collection!

    <result>
      <author>...</author>
      <author>...</author>
      <author>...</author>
      ...
    </result>

What about collections in expressions?

    $b/price                 →  list of n prices
    $b/price * 0.7           →  list of n numbers??
    $b/price * $b/quantity   →  list of n x m numbers??
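The two aggregate queries above (publishers with many books, and books priced above the average) can be sketched in Python as follows. This is only an illustration under stated assumptions: the three-book document, the prices, and the helpers `big_publishers` and `above_average` are hypothetical, and the count threshold is a parameter so that the tiny sample can exercise it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; the slides do not show the contents of bib.xml.
BIB_XML = """<bib>
  <book><title>abc</title><publisher>Morgan Kaufmann</publisher><price>30</price></book>
  <book><title>def</title><publisher>Morgan Kaufmann</publisher><price>60</price></book>
  <book><title>ghi</title><publisher>Addison-Wesley</publisher><price>90</price></book>
</bib>"""


def big_publishers(books, min_count):
    """Publishers with more than min_count books (count($b) > min_count)."""
    counts = Counter(b.findtext("publisher") for b in books)
    return sorted(p for p, n in counts.items() if n > min_count)


def above_average(books):
    """Titles of books whose price exceeds the average price (avg(...) then filter)."""
    prices = [float(b.findtext("price")) for b in books]
    average = sum(prices) / len(prices)  # LET $a := avg(.../price)
    return [b.findtext("title") for b in books
            if float(b.findtext("price")) > average]  # WHERE $b/price > $a


books = list(ET.fromstring(BIB_XML).iter("book"))
print(big_publishers(books, 1))  # ['Morgan Kaufmann']
print(above_average(books))      # ['ghi']
```

As in the LET-based query, the average is computed once over the whole collection and then reused inside the per-book filter.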
◦ Arithmetic is valid only if each of the two operand sequences has at most one element
◦ Atomization: operands are converted to atomic values first

    $book1/author eq "Kennedy"    Value comparison (each operand must be a single value)
    $book1/author = "Kennedy"     General comparison (true if any author equals "Kennedy")

    <publisher_list>
      { FOR $p IN distinct(document("bib.xml")//publisher)
        ORDERBY $p
        RETURN
          <publisher>
            <name> { $p/text() } </name>
            { FOR $b IN document("bib.xml")//book[publisher = $p]
              ORDERBY $b/price DESCENDING
              RETURN <book> { $b/title, $b/price } </book> }
          </publisher> }
    </publisher_list>

    FOR $h IN //holding
    ORDERBY $h/title
    RETURN <holding>
             { $h/title,
               IF ($h/@type = "Journal")
               THEN $h/editor
               ELSE $h/author }
           </holding>

    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
          contains($p, "sailing") AND contains($p, "windsurfing")
    RETURN $b/title

    FOR $b IN //book
    WHERE EVERY $p IN $b//para SATISFIES
          contains($p, "sailing")
    RETURN $b/title

Before and After
◦ for dealing with order in the input
Filter
◦ deletes some edges in the result tree
Recursive functions
Namespaces
References, links …
Lots more stuff …

XQuery Editor
XQuery Mapper
XQuery Debugger
XQuery Profiler
XQuery Documentation Generator
XML Schema aware query processing
Invoking XQuery from web services
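The SOME and EVERY queries above correspond closely to Python's `any` and `all`. The sketch below illustrates that correspondence on a small hypothetical document; the titles, the paragraphs and the helper names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data chosen to separate the two quantifiers.
DOC_XML = """<library>
  <book><title>Wind and Water</title>
    <para>sailing and windsurfing in one trip</para>
    <para>sailing alone</para></book>
  <book><title>Two Sports</title>
    <para>sailing</para>
    <para>windsurfing</para></book>
  <book><title>No Text</title></book>
</library>"""


def para_text(p):
    """String value of a paragraph: all of its descendant text joined together."""
    return "".join(p.itertext())


def titles_some(root):
    """SOME $p IN $b//para SATISFIES contains sailing AND contains windsurfing."""
    return [b.findtext("title") for b in root.iter("book")
            if any("sailing" in para_text(p) and "windsurfing" in para_text(p)
                   for p in b.iter("para"))]


def titles_every(root):
    """EVERY $p IN $b//para SATISFIES contains sailing."""
    return [b.findtext("title") for b in root.iter("book")
            if all("sailing" in para_text(p) for p in b.iter("para"))]


root = ET.fromstring(DOC_XML)
print(titles_some(root))   # ['Wind and Water']
print(titles_every(root))  # ['Wind and Water', 'No Text']
```

"Two Sports" fails the SOME query because no single paragraph mentions both words. "No Text" passes the EVERY query because a universal quantifier over an empty sequence is true, both for `all` in Python and for `every` in XQuery.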