XQuery

Relation
name    phone
John    3634
Sue     6343
Dick    6363

… in XML
[Figure: the same relation drawn as a tree of three row nodes, each with a name child and a phone child]
{ row: { name: “John”, phone: 3634 },
  row: { name: “Sue”, phone: 6343 },
  row: { name: “Dick”, phone: 6363 } }






Must be high-level; “SQL for XML”
Must conform to XML Schema
◦ But also work in the absence of schema information
Support simple and complex/nested datatypes
Support universal and existential quantifiers, and aggregation
Operations on sequences and hierarchies of document structures
Capability to transform and create XML structures
“A query language that uses the structure of XML intelligently and can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.”



XQuery
is an emerging standard for querying XML documents
is strongly influenced by OQL
is a functional language in which a query is represented as an expression (as opposed to OQL and SQL, which are declarative)
◦ expressions can be nested
◦ filters can strip out fields
◦ has grouping






Extracting information from a database
Generating summary reports on stored data
Searching textual documents on the web
Selecting and transforming XML data to XHTML
Pulling data from databases for application integration
Splitting up an XML document




Useful for both structured and unstructured data
Protocol independent (evaluation with predictable results)
Able to accept a collection of multiple documents
Compatible with other W3C standards

Path expressions
Element constructors
FLWOR (“flower”) expressions
◦ Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.

Expressions evaluated w.r.t. a context:
◦ Context item (current node)
◦ Context position (in sequence being processed)
◦ Context size (of the sequence being processed)
◦ Context also includes namespaces, variables, functions, date, etc.
Examples:
◦ Bib/paper
◦ Bib/book/publisher
◦ Bib/paper/author/lastname
Given an XML document, the value of a path expression p is a set of objects
[Figure: the document Doc drawn as a tree rooted at Bib (&o1), with paper, book, author, title, year, publisher, references, firstname and lastname nodes labelled by object identifiers such as &o12, &o24 and &o29; leaf values include “Serge”, “Abiteboul”, “Victor”, “Vianu” and 1997]
Bib/paper = <&o12,&o29>
Bib/book/publisher = <&o51>
Bib/paper/author/lastname = <&o71,&206>
Note that order of elements matters!
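The path expressions above can be tried out with a small Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and a made-up miniature document: the element names follow the slide, while the helper `path` and the publisher name are assumptions for illustration. Results come back in document order, which is the point of the note above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the bibliography document (content is illustrative).
DOC = """
<Bib>
  <paper><author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author></paper>
  <book><publisher>Example Press</publisher></book>
  <paper><author><firstname>Victor</firstname><lastname>Vianu</lastname></author></paper>
</Bib>
"""

bib = ET.fromstring(DOC)  # the root element plays the role of Bib


def path(root, expr):
    """Evaluate a child-step path such as 'Bib/paper' starting at the root element."""
    first, _, rest = expr.partition("/")
    if first != root.tag:
        return []
    return root.findall(rest) if rest else [root]


papers = path(bib, "Bib/paper")
publishers = [e.text for e in path(bib, "Bib/book/publisher")]
lastnames = [e.text for e in path(bib, "Bib/paper/author/lastname")]
print(len(papers), publishers, lastnames)
```

Each call returns a list rather than a set, so the order of the matched nodes is preserved.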


An XQuery expression can construct new values or structures
Example: Consider the path expressions from the previous slide.
◦ Each of them returns a newly constructed sequence of elements
◦ Key point is that we don’t just return existing structures or atomic values; we can re-arrange them as we wish into new structures
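To make "re-arranging into new structures" concrete, here is a minimal Python sketch of the same idea, not XQuery itself. The input document and the element names `summary` and `entry` are assumptions chosen for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input document.
bib = ET.fromstring("<bib><book><title>abc</title><year>1997</year></book></bib>")

# Build a brand-new <summary> element that re-arranges existing pieces:
# the year becomes an attribute and the title is copied underneath.
summary = ET.Element("summary")
for book in bib.findall("book"):
    entry = ET.SubElement(summary, "entry", year=book.findtext("year"))
    entry.append(copy.deepcopy(book.find("title")))

print(ET.tostring(summary, encoding="unicode"))
```

The input tree is left untouched; the result is a new structure built from copies of its parts.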

FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
FOR / LET clauses → list of tuples
WHERE clause → list of tuples
ORDERBY / RETURN clause → instance of the XQuery data model
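The pipeline above can be mimicked step by step in Python. This is only a sketch of the data flow, assuming a hypothetical list of (title, price) pairs as input; it does not model the full XQuery data model.

```python
# Hypothetical input sequence: (title, price) pairs.
books = [("abc", 30), ("def", 55), ("ghi", 80)]

# FOR / LET: build one tuple of variable bindings per input item.
tuples = [{"b": b, "price": b[1]} for b in books]

# WHERE: drop the tuples that fail the condition.
kept = [t for t in tuples if t["price"] > 50]

# ORDER BY / RETURN: sort what is left, then build one result item per tuple.
ordered = sorted(kept, key=lambda t: t["price"], reverse=True)
result = [t["b"][0] for t in ordered]
print(result)
```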



The for clause uses XPath expressions, and the variable in the for clause ranges over the values in the set returned by XPath
Simple FLWOR expression in XQuery
◦ find all accounts with balance > 400, with each result enclosed in an <account_number> .. </account_number> tag
    for $x in /bank-2/account
    let $acctno := $x/@account_number
    where $x/balance > 400
    return <account_number> { $acctno } </account_number>
◦ Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated
The let clause is not really needed in this query, and the selection can be done in XPath. The query can be written as:
    for $x in /bank-2/account[balance>400]
    return <account_number> { $x/@account_number } </account_number>
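For readers who want to run the account query, here is a hedged Python equivalent using the standard library. The sample bank-2 document is invented, and the account number is written as text content, which is the result the slide intends.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the shape the query expects.
bank = ET.fromstring("""
<bank-2>
  <account account_number="A-101"><balance>500</balance></account>
  <account account_number="A-102"><balance>350</balance></account>
  <account account_number="A-201"><balance>900</balance></account>
</bank-2>
""")

results = []
for x in bank.findall("account"):            # for $x in /bank-2/account
    acctno = x.get("account_number")         # let $acctno := $x/@account_number
    if float(x.findtext("balance")) > 400:   # where $x/balance > 400
        out = ET.Element("account_number")   # return <account_number>{...}</account_number>
        out.text = acctno
        results.append(ET.tostring(out, encoding="unicode"))

print(results)
```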

FOR $x IN list-expr
◦ Binds $x in turn to each value in the list expr

LET $x := list-expr
◦ Binds $x to the entire list expr
◦ Useful for common sub-expressions and for aggregations
FOR $x IN document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
Notice that the result has several elements
LET $x := document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
Notice that the result has exactly one element
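The difference between the two bindings can be sketched in Python: iterating runs the body once per book, while binding the whole sequence runs it once. The three-book document and the helper `wrap` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three books.
bib = ET.fromstring("<bib><book>a</book><book>b</book><book>c</book></bib>")
books = bib.findall("book")


def wrap(items):
    """Build a <result> element whose children are copies of `items`."""
    result = ET.Element("result")
    for item in items:
        result.append(ET.fromstring(ET.tostring(item)))
    return result


# FOR: the body runs once per book, so there is one <result> per book.
for_results = [wrap([b]) for b in books]

# LET: the variable holds the whole sequence, so there is a single <result>.
let_result = wrap(books)

print(len(for_results), len(let_result))
```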
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         { $a,
           FOR $t IN document("bib.xml")/bib/book[author=$a]/title
           RETURN $t }
       </result>
distinct = a function that eliminates duplicates (after converting inputs to atomic values)
Result:
<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
Observe how the nested structure of the result elements is determined by the nested structure of the query.
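The nested query can be followed in a short Python sketch: an outer loop over distinct authors and an inner loop over their titles. The sample data is invented (the publisher "Other House" is an assumption) and was picked so that the output matches the result shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
bib = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
  <book><publisher>Other House</publisher><author>Jones</author><title>def</title></book>
</bib>
""")

books = bib.findall("book")

# Outer loop: distinct authors of Morgan Kaufmann books, in first-seen order.
authors = []
for book in books:
    if book.findtext("publisher") == "Morgan Kaufmann":
        for a in book.findall("author"):
            if a.text not in authors:
                authors.append(a.text)

# Inner loop: for each such author, every title they wrote (any publisher).
results = [
    (a, [b.findtext("title") for b in books
         if a in [x.text for x in b.findall("author")]])
    for a in authors
]
print(results)
```

Each (author, titles) pair corresponds to one <result> element, so the nesting of the loops shows up directly in the shape of the output.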
<big_publishers>
  { FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")/bib/book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p }
</big_publishers>
◦ For each publisher p, let the list of books published by p be b
◦ Count the number of books in b, and return p if the count is > 100
count = (aggregate) function that returns the number of elements
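The FOR / LET / count pattern can be sketched in Python as follows. The function `big_publishers`, its `threshold` parameter and the publisher names are assumptions; the threshold is a parameter so that a three-book sample can stand in for the 100 in the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
bib = ET.fromstring("""
<bib>
  <book><publisher>P1</publisher></book>
  <book><publisher>P2</publisher></book>
  <book><publisher>P1</publisher></book>
</bib>
""")


def big_publishers(root, threshold):
    """Return the publishers with more than `threshold` books under `root`."""
    names = []
    for p in root.iter("publisher"):                 # //publisher
        if p.text not in names:                      # distinct(...)
            names.append(p.text)
    big = []
    for name in names:                               # FOR $p IN ...
        b = [bk for bk in root.findall("book")
             if bk.findtext("publisher") == name]    # LET $b := ...
        if len(b) > threshold:                       # WHERE count($b) > threshold
            big.append(name)                         # RETURN $p
    return big


print(big_publishers(bib, 1))
```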
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
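A minimal Python sketch of the same aggregation, assuming invented prices: the average is computed once over the whole sequence (the LET) and then used inside the per-book filter (the FOR and WHERE).

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data.
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>
""")

books = bib.findall("book")
a = mean(float(b.findtext("price")) for b in books)   # LET $a := avg(...)
above = [b.findtext("title") for b in books           # FOR ... WHERE ... RETURN
         if float(b.findtext("price")) > a]
print(a, above)
```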

Ordered and unordered collections
◦ /bib/book/author = an ordered collection
◦ distinct(/bib/book/author) = an unordered collection

Examples:
◦ LET $a := /bib/book → $a is a collection; a FOR statement over it iterates over all books in the collection
◦ $b/author → also a collection (several authors...)
However:
RETURN <result> { $b/author } </result>
Returns a single collection!
<result> <author>...</author>
         <author>...</author>
         <author>...</author>
         ...
</result>
What about collections in expressions?
$b/price → list of n prices
$b/price * 0.7 → list of n numbers??
$b/price * $b/quantity → list of n x m numbers??
◦ Valid only if the two sequences have at most one element
◦ Atomization
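The rule for arithmetic on sequences can be illustrated with a small Python sketch. Sequences are modelled as plain lists of already-atomized numbers, and the names `atomize_single` and `multiply` are hypothetical helpers, not XQuery functions.

```python
def atomize_single(seq, name):
    """Return the single value of `seq`, or None if it is empty; reject longer sequences."""
    if len(seq) > 1:
        raise TypeError(f"{name} has {len(seq)} items; arithmetic needs at most one")
    return seq[0] if seq else None


def multiply(left, right):
    """Multiply two sequences that each hold at most one number."""
    a = atomize_single(left, "left operand")
    b = atomize_single(right, "right operand")
    if a is None or b is None:
        return []            # an empty operand gives an empty result
    return [a * b]


print(multiply([10], [3]))   # one price times one quantity
print(multiply([], [3]))     # no price at all
```

Under this reading, neither n numbers nor n x m numbers are produced: an operand with several items is simply rejected.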


$book1/author eq "Kennedy" - Value Comparison
$book1/author = "Kennedy" - General Comparison
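The two comparison styles behave differently when a book has several authors. The Python sketch below models sequences as lists; `value_eq` and `general_eq` are hypothetical helper names standing in for eq and =.

```python
def value_eq(left, right):
    """eq: compares two single values; a longer sequence is an error."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq needs operands with at most one item each")
    if not left or not right:
        return None          # empty operand: the result is the empty sequence
    return left[0] == right[0]


def general_eq(left, right):
    """=: true if any item on the left equals any item on the right."""
    return any(l == r for l in left for r in right)


one_author = ["Kennedy"]
two_authors = ["Kennedy", "Smith"]

print(value_eq(one_author, ["Kennedy"]))      # single author: plain equality
print(general_eq(two_authors, ["Kennedy"]))   # several authors: existential match
```

Calling `value_eq(two_authors, ["Kennedy"])` raises an error, which is why the general comparison is the usual choice when an element may repeat.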
<publisher_list>
  { FOR $p IN distinct(document("bib.xml")//publisher)
    ORDER BY $p
    RETURN <publisher>
             <name> { $p/text() } </name>
             { FOR $b IN document("bib.xml")//book[publisher = $p]
               ORDER BY $b/price DESCENDING
               RETURN <book>
                        { $b/title, $b/price }
                      </book> }
           </publisher> }
</publisher_list>
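The two levels of sorting in this query can be sketched in Python: publishers in ascending order, and each publisher's books by descending price. The book records are invented for illustration.

```python
# Hypothetical book records.
books = [
    {"title": "abc", "publisher": "P2", "price": 30.0},
    {"title": "def", "publisher": "P1", "price": 20.0},
    {"title": "ghi", "publisher": "P1", "price": 45.0},
]

publisher_list = []
for p in sorted({b["publisher"] for b in books}):        # distinct, then ORDER BY $p
    mine = [b for b in books if b["publisher"] == p]
    mine.sort(key=lambda b: b["price"], reverse=True)     # ORDER BY $b/price DESCENDING
    publisher_list.append((p, [(b["title"], b["price"]) for b in mine]))

print(publisher_list)
```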
FOR $h IN //holding
ORDER BY $h/title
RETURN <holding>
         { $h/title,
           IF ($h/@type = "Journal")
           THEN $h/editor
           ELSE $h/author }
       </holding>
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing") AND contains($p, "windsurfing")
RETURN $b/title
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
RETURN $b/title
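The existential and universal quantifiers map onto Python's built-in any() and all(). In this sketch each book is a hypothetical title mapped to its list of paragraph texts; note that a book with no paragraphs satisfies the EVERY test, because a universal condition over an empty sequence is true.

```python
# Hypothetical books: title -> list of paragraph texts.
books = {
    "Water Sports": ["sailing and windsurfing are fun", "sailing again"],
    "Sailing Only": ["sailing basics", "advanced sailing"],
    "Cooking": ["boil water"],
    "Empty": [],
}

# SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some = [t for t, paras in books.items()
        if any("sailing" in p and "windsurfing" in p for p in paras)]

# EVERY $p IN $b//para SATISFIES contains(...)
every = [t for t, paras in books.items()
         if all("sailing" in p for p in paras)]

print(some)
print(every)
```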

Before and After
◦ for dealing with order in the input

Filter
◦ deletes some edges in the result tree




Recursive functions
Namespaces
References, links …
Lots more stuff …







XQuery Editor
XQuery Mapper
XQuery Debugger
XQuery Profiler
XQuery Documentation Generator
XML Schema Aware Query processing
Invoking XQuery from web services