XTree for
Declarative XML Querying
Zhuo Chen, Tok Wang Ling,
Mengchi Liu, and Gillian Dobbie
January 2004
1
Outlines





Introduction
Preliminaries
XTree
Algorithm to transform XTree query to XQuery
Conclusion and future works
2
Introduction


How to query XML documents is an important
issue in XML research
Various query languages proposed:


XPath, XQuery, Lorel, XML-GL, XQL, XML-QL, XSLT, YATL, XDuce, rule-based semantic querying, declarative XML querying, etc.
XQuery, which is based on XPath, has been selected as the basis for an official W3C query language for XML
4
Introduction

In this paper, we will




Analyze the limitations of XPath
Propose a new set of syntax rules called XTree,
which is a generalization of XPath
Show how XTree can efficiently replace the
notations of XPath
Give algorithms to convert queries based on
XTree expressions to standard XQuery queries
5
Outlines


Introduction
Preliminaries





Background on XPath
Limitations of XPath
XTree
Algorithm to transform XTree query to XQuery
Conclusion and future works
6
Preliminaries

XPath




A W3C standard
A set of syntax rules for defining parts of an XML
document
It uses paths to identify nodes (elements and
attributes) in XML documents
These path expressions look very much like the paths used in a computer file system
7
Background on XPath

Sample XML document of a bibliography
<bib name="IT">
<book id="b001" year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
</book>
<book id="b002" year="1992">
<title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
</book>
<book id="b003" year="2000">
<title>Data on the Web</title>
<edition>3</edition>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann</publisher>
</book>
<journal id="j001" year="1998">
<title>XML</title>
<editor><last>Date</last><first>C.</first></editor>
<editor><last>Gerbarg</last><first>M.</first></editor>
<publisher>Morgan Kaufmann</publisher>
</journal>
</bib>
8
Background on XPath

XPath examples
/bib/book/@year
  Get attribute "year" of each book
/bib/book/author
  Get element "author" of each book
//author
  Get all elements named "author", regardless of their absolute paths
/bib/book/*
  Get all sub-elements of each book
/bib/book/@*
  Get all attributes of each book
/bib/book[2]
  Get the second book element
/bib/book[last()]
  Get the last book element
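The path expressions above can be tried out with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` module, which supports only a subset of XPath: paths are written relative to the root element, and attributes are read with `get()` or `attrib` rather than `@name` steps. The `BIB` string is an abridged copy of the sample document, and all variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample bibliography (journal and publishers omitted).
BIB = """<bib name="IT">
<book id="b001" year="1994"><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author></book>
<book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
<author><last>Stevens</last><first>W.</first></author></book>
<book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

root = ET.fromstring(BIB)  # root is the <bib> element, so paths start below /bib

years = [b.get("year") for b in root.findall("book")]        # /bib/book/@year
book_authors = root.findall("book/author")                    # /bib/book/author
all_authors = root.findall(".//author")                       # //author
sub_elements = root.findall("book/*")                         # /bib/book/*
attributes = [dict(b.attrib) for b in root.findall("book")]   # /bib/book/@*
second_book = root.find("book[2]")                            # /bib/book[2]
last_book = root.find("book[last()]")                         # /bib/book[last()]

print(years)
print(len(book_authors), len(all_authors), len(sub_elements))
print(second_book.get("id"), last_book.get("id"))
```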
9
Background on XQuery

XQuery



An XML querying language to search XML
documents
Based on XPath
FLWOR statements

For – Let – Where – Order by – Return





The For clause iterates the variable over the result of its expression
The Let clause binds the variable to the result of its expression
Complex queries (nested clauses)
Complex result constructions
User-defined functions
10
Background on XQuery

XQuery example

List the year and title of all books published after 1995
XQuery:
for $book in /bib/book
where $book/@year > 1995
return
<book>
{ $book/@year }
{ $book/title }
</book>
Result:
<book year="2000">
<title>Data on the Web</title>
</book>
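For readers more familiar with general-purpose languages, the FLWOR query above can be approximated in Python. This is only a sketch using the standard-library `xml.etree.ElementTree` module; the function name `books_after` and the abridged `BIB` string are assumptions made for illustration, not part of XQuery.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: only the parts the query touches.
BIB = """<bib>
<book year="1994"><title>TCP/IP Illustrated</title></book>
<book year="1992"><title>Advanced Programming in the Unix Environment</title></book>
<book year="2000"><title>Data on the Web</title></book>
</bib>"""

def books_after(root, year):
    """Mimic: for $book in /bib/book where $book/@year > year return <book>...</book>."""
    results = []
    for book in root.findall("book"):                      # for $book in /bib/book
        if int(book.get("year")) > year:                   # where $book/@year > 1995
            out = ET.Element("book", {"year": book.get("year")})       # { $book/@year }
            ET.SubElement(out, "title").text = book.findtext("title")  # { $book/title }
            results.append(out)
    return results

result = [ET.tostring(b, encoding="unicode")
          for b in books_after(ET.fromstring(BIB), 1995)]
print(result)
```

The for loop plays the role of the For clause, the `if` test plays the role of the Where clause, and the element built inside the loop corresponds to the Return clause.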
11
Limitations of XPath

XPath has some limitations:
1. We can only assign one variable for each XPath expression
  It is just a linear path, unlike the tree structure of XML
  Inefficient: if a query needs to get values from several places, it has to use several paths
2. It is difficult to reveal the relationship among correlated XPaths
  This may cause mistakes if a user does not pay attention when writing a query
  E.g., suppose we want to output the title and author of each book
  XPath 1: /bib/book/title, XPath 2: /bib/book/author
  Wrong! The above two paths are not correlated
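The correlation problem can be made concrete with a small Python sketch, assuming the standard-library `xml.etree.ElementTree` module and an abridged two-book copy of the sample document. Evaluating the two paths independently gives two flat lists with no record of which author belongs to which title; iterating per book keeps the pairs together.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: two books, the second with three authors.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title>
<author><last>Stevens</last></author></book>
<book><title>Data on the Web</title>
<author><last>Abiteboul</last></author>
<author><last>Buneman</last></author>
<author><last>Suciu</last></author></book>
</bib>"""
root = ET.fromstring(BIB)

# Two independent paths: the results are flat lists of different lengths.
titles = [t.text for t in root.findall("book/title")]
authors = [a.findtext("last") for a in root.findall("book/author")]

# Correlated version: both values are reached through the same book element.
correlated = [(book.findtext("title"), a.findtext("last"))
              for book in root.findall("book")
              for a in book.findall("author")]

print(len(titles), len(authors))
print(correlated)
```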
12
Limitations of XPath

XPath has some limitations:
3. XPath is awkward for expressing a query that returns elements at path A while the condition is in a distant path B
  Difficult to distinguish the condition branch from the target branch
  Especially for multiple conditions and nested conditions
  E.g., find the value of the publisher id of a book which has an author with last name "Stevens" and first name "W."
  /bib/book/author[last="Stevens" and first="W."]/../publisher/@pubid
4. XPath expressions are only used in the querying part of XQuery, not in the result construction part
  In XQuery, the result construction part mixes literal text, variable evaluation and even nested sub-queries
  The whole query is difficult to read and comprehend
13
Limitations of XPath

XPath has some limitations:
5. XPath can only bind a variable to the whole node (element or attribute) structure, which is a name-value pair
  If we want to get the substructure of the node, we have to invoke built-in functions
    local-name() to get the node name
    string() to get the string value
  Difficult to query XML documents with unknown structure, or to rename the nodes in the result
  E.g., suppose we do not know the sub-structure of the book element, and we want to restructure books in this way: keep text nodes and sub-elements unchanged, but convert attributes to sub-elements:
for $book in /bib/book
return
<book>
{ $book/text(), $book/* }
{ for $attrib in $book/@*
  return <attribute name="{ local-name($attrib) }" value="{ string($attrib) }"/> }
</book>
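The same restructuring can be sketched in Python, where attribute names and values are available as plain strings. This assumes the standard-library `xml.etree.ElementTree` module; the helper name `attributes_to_elements` and the one-book `BOOK` string are illustrative, and only the leading text of the element is carried over, which is a simplification.

```python
import copy
import xml.etree.ElementTree as ET

# One book from the sample document, abridged.
BOOK = ('<book id="b001" year="1994"><title>TCP/IP Illustrated</title>'
        '<publisher>Addison-Wesley</publisher></book>')

def attributes_to_elements(book):
    """Keep text and sub-elements, and turn each attribute into an <attribute> element."""
    out = ET.Element("book")
    out.text = book.text                       # leading text only (simplification)
    for child in book:                         # sub-elements are copied unchanged
        out.append(copy.deepcopy(child))
    for name, value in book.attrib.items():    # one <attribute> per original attribute
        ET.SubElement(out, "attribute", {"name": name, "value": value})
    return out

restructured = attributes_to_elements(ET.fromstring(BOOK))
print(ET.tostring(restructured, encoding="unicode"))
```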

14
Outlines



Introduction
Preliminaries
XTree





Basic syntax
XTree for querying
XTree for result construction
Algorithm to transform XTree query to XQuery
Conclusion and future works
15
XTree



XTree is a generalization of XPath
XTree has a tree structure like XML
XTree is more efficient than XPath
  In the querying part, one XTree expression can bind multiple variables
    In XQuery, one XPath expression can only bind one variable
  In the result construction part, one XTree expression can be used to define the result format
    Avoid nested structure in the query
    Make the whole query easier to read and understand
  Supports list-valued variables explicitly, and determines their values uniquely
16
XTree syntax

Similar to that of XPath
  / means parent-child hierarchy
  // means no matter how many levels down (ancestor-descendant)
( ) in front to indicate the URL of the document
Sibling tree nodes are enclosed by { }, and separated by commas
  { } can be nested
In XTree, conditions are written directly without { }
Use logic variables as place holders to bind/match the values at their places
  → to assign variables in the querying part
  ← to get values from variables in the result construction part
Only the sub-trees of interest are written in XTree, not the whole XML tree structure
17
XTree for querying


Symbol → will assign values of nodes on the
left side to the variable on the right side
Example. For the sample bibliography document,
suppose we want to get the year and title of each
book, and its authors’ last names and first names

We can use the variables $y, $t, $first, $last to bind
them respectively as in the following XTree
expression:
/bib/book/{@year→$y, title→$t, author/{last→$last, first→$first}}
We can instantiate many variables in one XTree expression
The above XTree expression corresponds to the following 6 XPath expressions in XQuery:
for $book in /bib/book,
$y in $book/@year, $t in $book/title, $author in $book/author,
$last in $author/last, $first in $author/first
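To see what the six for clauses compute, they can be mimicked with nested Python loops. This is a sketch under assumptions: it uses the standard-library `xml.etree.ElementTree` module and an abridged two-book copy of the sample document, and the function name `bind_variables` is illustrative. Each resulting tuple corresponds to one joint binding of $y, $t, $last and $first.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: two books.
BIB = """<bib>
<book year="1994"><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author></book>
<book year="2000"><title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

def bind_variables(root):
    """Return one (y, t, last, first) tuple per joint binding of the variables."""
    rows = []
    for book in root.findall("book"):                   # for $book in /bib/book
        y = book.get("year")                            # $y in $book/@year
        if y is None:
            continue                                    # no @year means no binding
        for t in book.findall("title"):                 # $t in $book/title
            for author in book.findall("author"):       # $author in $book/author
                for last in author.findall("last"):     # $last in $author/last
                    for first in author.findall("first"):  # $first in $author/first
                        rows.append((y, t.text, last.text, first.text))
    return rows

rows = bind_variables(ET.fromstring(BIB))
for row in rows:
    print(row)
```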
18
XTree for querying


XTree allows a user to use path abbreviation as in XPath
Example. Suppose we want to get the last name
and first name elements at whatever depth in the
document, we can write the following XTree
expression:
/bib//{last→$last, first→$first}
The curly braces enclosing the two elements last and first specify that these two elements are siblings.
According to the XML document, the parent of the sibling elements last and first is /bib/book/author or /bib/journal/editor
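A rough Python analogue of this sibling constraint is to visit every element at any depth and keep those that have both a last child and a first child. The sketch assumes the standard-library `xml.etree.ElementTree` module and an abridged copy of the sample document; it reads only the first last/first child of each parent, which is enough here because each author or editor has exactly one of each.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: one book and the journal.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author></book>
<journal><title>XML</title>
<editor><last>Date</last><first>C.</first></editor>
<editor><last>Gerbarg</last><first>M.</first></editor></journal>
</bib>"""
root = ET.fromstring(BIB)

# Visit all elements in document order and keep parents of sibling last/first pairs.
pairs = [(parent.tag, parent.findtext("last"), parent.findtext("first"))
         for parent in root.iter()
         if parent.find("last") is not None and parent.find("first") is not None]

print(pairs)
```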
19
XTree for querying

XTree allows a user to bind variables on the structure of XML
document



A user can assign variable $var on the left side of → symbol
Here $var will bind to the name of the corresponding node
Example. Suppose we want to obtain some attribute
with value “2000” in some book element, and bind
variable $b to that book:
/bib/book→$b/@$attr="2000"

According to the sample document, $b will bind to the
third book, and $attr will bind to the attribute name “year”.
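Binding a variable to an attribute name can be imitated in Python by iterating over each book's attribute dictionary. This sketch assumes the standard-library `xml.etree.ElementTree` module and an abridged copy of the sample document; the book is identified by its id attribute purely for display.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: the first and third books, attributes only.
BIB = """<bib>
<book id="b001" year="1994"><title>TCP/IP Illustrated</title></book>
<book id="b003" year="2000"><title>Data on the Web</title></book>
</bib>"""
root = ET.fromstring(BIB)

# For each book ($b), find every attribute name ($attr) whose value is "2000".
matches = [(book.get("id"), name)
           for book in root.findall("book")
           for name, value in book.attrib.items()
           if value == "2000"]

print(matches)
```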
20
XTree

Two types of variables
Single-valued variables
  $X
  An element instance of the specified path
List-valued variables
  {$X}
  A list of all $X instances
  Explicitly indicated by a pair of curly braces
Note that both sibling nodes and list-valued variables are enclosed by curly braces
  Sibling nodes will have commas as separators in the braces
  List-valued variables do not have commas in the braces
21
List-valued variables

Object-oriented functions of list-valued variables:

Aggregate functions
Suppose list-valued variable {$nums} binds to a list of numbers





{$nums}.count() returns the number of items in the list
{$nums}.avg() returns the average value of items in the list
{$nums}.min() returns the minimum value in the list
{$nums}.max() returns the maximum value in the list
{$nums}.sum() returns the sum of values in the list
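As a point of comparison, the aggregate functions map onto Python built-ins as sketched below. The numbers are the three book years from the sample document, chosen here only as example data.

```python
from statistics import mean

# Example data: the years of the three books in the sample document.
nums = [1994, 1992, 2000]

aggregates = {
    "count": len(nums),   # {$nums}.count()
    "avg": mean(nums),    # {$nums}.avg()
    "min": min(nums),     # {$nums}.min()
    "max": max(nums),     # {$nums}.max()
    "sum": sum(nums),     # {$nums}.sum()
}
print(aggregates)
```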
22
List-valued variables

Object-oriented functions of list-valued variables:

List operations
Suppose list-valued variable {$names} binds to a list of name elements








{$names}.[1-3, 6] returns a sublist of the 1st to 3rd items, and the 6th item
{$names}.last() returns the last item in the list
{$names}.sort() sorts the items in the list in ascending order
{$names}.sort_desc() sorts the items in the list in descending order
{$names}.distinct() eliminates duplicate items in the list
{$names}.random(3) picks out 3 items randomly
$name ∈ {$names} checks whether an item is in the list
{$names’} ⊆ {$names} checks whether the first list is a sub-list of the second list
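The list operations have close Python counterparts, sketched below on plain strings. The helper `pick` is hypothetical and exists only to imitate the `[1-3, 6]` position syntax with 1-based positions; the sub-list test is read here as "every item of the first list occurs in the second", which is an assumption about the intended meaning.

```python
import random

# Example data: last names of the authors and editors in the sample document.
names = ["Stevens", "Stevens", "Abiteboul", "Buneman", "Suciu", "Date", "Gerbarg"]

def pick(items, *positions):
    """Hypothetical helper for [1-3, 6]: positions are 1-based ints or (from, to) pairs."""
    out = []
    for p in positions:
        lo, hi = p if isinstance(p, tuple) else (p, p)
        out.extend(items[lo - 1:hi])
    return out

sublist = pick(names, (1, 3), 6)               # {$names}.[1-3, 6]
last_item = names[-1]                          # {$names}.last()
ascending = sorted(names)                      # {$names}.sort()
descending = sorted(names, reverse=True)       # {$names}.sort_desc()
distinct = list(dict.fromkeys(names))          # {$names}.distinct(), order preserved
sample = random.sample(names, 3)               # {$names}.random(3)
is_member = "Date" in names                    # membership test
is_sublist = all(n in names for n in ["Date", "Suciu"])   # sub-list test

print(sublist, last_item, distinct)
```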
23
Semantics of list-valued variables

Definition 1. The associated path of variable $a (or {$a}) is
the absolute path expression from root to the nodes
represented by $a (or {$a}).



/bib/book→$b/title→$t
the associated path of $t is /bib/book/title.
Definition 2. Variable $a is an ancestor variable of $b if $a
and $b are defined in the same XTree expression, and the
associated path of $a is a prefix of the associated path of $b.


/bib/book→$b/{title→$t, author→$a}
$b is an ancestor variable of $t and $a, but $t is not an ancestor
variable of $a.
24
Semantics of list-valued variables

Definition 3. In an XTree expression, when a variable is
bound to a value in the query evaluation, the variable is
instantiated.



/bib/book/{author→$a/first→$first, title→$t}
In the evaluation, when we reach /bib/book/author, $a is instantiated; when we reach /bib/book/author/first, $first is instantiated.
Definition 4. The value of list-valued variable {$a} is a list of
all instances of $a with all its ancestor variables instantiated.


/bib/book/author→{$a}
{$a} means all the author elements of all the books
/bib/book→$b/author→{$a}
{$a} means all the authors of a certain book $b
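The effect of an ancestor variable on the value of {$a} can be illustrated in Python. In this sketch, which assumes the standard-library `xml.etree.ElementTree` module and an abridged copy of the sample document, the first list has no ancestor variable and spans all books, while the dictionary holds one list per instantiation of $b. Authors are shown by last name only for brevity.

```python
import xml.etree.ElementTree as ET

# Abridged sample data: the first and third books.
BIB = """<bib>
<book id="b001"><author><last>Stevens</last></author></book>
<book id="b003"><author><last>Abiteboul</last></author>
<author><last>Buneman</last></author>
<author><last>Suciu</last></author></book>
</bib>"""
root = ET.fromstring(BIB)

# /bib/book/author→{$a}: no ancestor variable, so one list over all books.
all_authors = [a.findtext("last") for a in root.findall("book/author")]

# /bib/book→$b/author→{$a}: one list per instantiation of $b.
authors_per_book = {b.get("id"): [a.findtext("last") for a in b.findall("author")]
                    for b in root.findall("book")}

print(all_authors)
print(authors_per_book)
```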
25
XTree for result construction



XTree expression can also be used to define the result
format
Symbol ← will get values of variables from right side and
assign them to the expression on the left side
The result construction part is just one XTree expression

No nested structure, unlike the return clause of XQuery



Since XTree already has a tree structure
Easy to read and understand
Must be concrete


No condition checking or uncertainty in the structure
Unlike XTree expressions in the querying part
26
XTree for result construction

Example. We want to list the titles and publishers of books which
are published after 1993, suppose we have bound the variables by
the following XTree expression:
/bib/book/{@year>1993, title→$t, publisher→$p}
We can write the following XTree expression to define the result
format:
/result/recentbook/{title←$t, publisher←$p}

The result format is defined as: under the root result, each recentbook
element will store the title and publisher of that book
<result>
<recentbook>
<title>TCP/IP Illustrated</title>
<publisher>Addison-Wesley</publisher>
</recentbook>
<recentbook>
<title>Data on the Web</title>
<publisher>Morgan Kaufmann</publisher>
</recentbook>
</result>
27
XTree for result construction

Example. For each book, show the title, the number of authors and
the first author, suppose the variable bindings are defined in the
following XTree expression:
/bib/book/{title→$t, author→{$a}}
We can write the following XTree expression to return the result:
/result/book/{title←$t, authNum←{$a}.count(), author←{$a}[1]}



{$a}.count() counts the
number of items in the
{$a} list
{$a}[1] returns the first
item in the {$a} list
Output:
<result>
<book>
<title>TCP/IP Illustrated</title>
<authNum>1</authNum>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Advanced Programming in the Unix Environment</title>
<authNum>1</authNum>
<author><last>Stevens</last><first>W.</first></author>
</book>
<book>
<title>Data on the Web</title>
<authNum>3</authNum>
<author><last>Abiteboul</last><first>Serge</first></author>
</book>
</result>
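The output above can be reproduced with a Python sketch that evaluates the query and construct parts directly. It assumes the standard-library `xml.etree.ElementTree` module and an abridged two-book copy of the sample document; the function name `build_result` is illustrative and is not how an XTree processor would necessarily work.

```python
import copy
import xml.etree.ElementTree as ET

# Abridged sample data: the first and third books.
BIB = """<bib>
<book><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author></book>
<book><title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

def build_result(root):
    """Build /result/book/{title, authNum, author} from the bindings of $t and {$a}."""
    result = ET.Element("result")
    for book in root.findall("book"):
        authors = book.findall("author")                 # author→{$a}, scoped to this book
        for t in book.findall("title"):                  # title→$t
            out = ET.SubElement(result, "book")
            ET.SubElement(out, "title").text = t.text               # title←$t
            ET.SubElement(out, "authNum").text = str(len(authors))  # authNum←{$a}.count()
            if authors:
                first_author = copy.deepcopy(authors[0])            # author←{$a}[1]
                first_author.tail = None                            # drop source whitespace
                out.append(first_author)
    return result

result = build_result(ET.fromstring(BIB))
summary = [(b.findtext("title"), b.findtext("authNum"), b.findtext("author/last"))
           for b in result]
print(summary)
```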
28
XTree for result construction

The right side of ← symbol can be:




A pre-defined variable or invocation of functions on variables
Literal text, indicating static content
Omitted, indicating an empty value
Example. Suppose we want to return a book whose title is
“Computer Architecture”, and which does not have a specified
author, we can write the following XTree expression:
/bib/book/{title←"Computer Architecture", no-author}
It will output the following XML segment:
<bib>
<book>
<title>Computer Architecture</title>
<no-author/>
</book>
</bib>
29
XTree for result construction

A query based on XTree expressions consists of QWOC (Query-Where-Order by-Construct) statements
The Query clause contains one or more XTree expressions for selection and variable binding
The Where clause is optional; it defines constraints
The Order by clause is optional; it defines the ordering
The Construct clause contains one XTree expression to define the output format
30
Outlines




Introduction
Preliminaries
XTree
Algorithm to transform XTree query to XQuery



An algorithm to transform an XTree expression in
the query part to a set of XPath expressions
An algorithm to transform an XTree expression in
the result construction part to some nested XQuery
expressions
Conclusion and future works
31
Transformation algorithm for querying
part

Transform an XTree expression in the querying part
to a set of XPath expressions

Not as trivial as just extracting each path associated with a variable as an XPath expression
Variables may correlate to each other through some common ancestors
We have to use such common ancestors to constrain the descendant variables
The common ancestors we want are just the branching nodes (the nodes just before every pair of curly braces for branching)
Use a stack to store such common ancestors for later use
32
Transformation algorithm for querying
part


Process the XTree expression from left to right, for each
common ancestor of variables (except the root), assign a
single-valued variable on it if it is not originally bound to
a variable
Translate each single-valued variable to be an XPath
expression in a for clause; translate each list-valued
variable to be an XPath expression in a let clause

Try to write the path expression of a variable to be the relative
path of its nearest ancestor variable (make use of the stack)



If it has such ancestor variable, then write its path expression to be
the relative path from that ancestor variable
If it does not have any ancestor variable, then write its path
expression to be the absolute path from the root
The output paths will be in depth-first order of the XTree
33
Transformation algorithm for querying
part
Example:
/bib/{book/{title→$t, author→{$a}},
journal/{title→$jt, editor/{last→$last, first→$first}}}
After assigning variables to the branching nodes:
/bib/{book→$b/{title→$t, author→{$a}},
journal→$j/{title→$jt, editor→$e/{last→$last, first→$first}}}


XPaths generated:
for $b in /bib/book
for $t in $b/title
let $a := $b/author
for $j in /bib/journal
for $jt in $j/title
for $e in $j/editor
for $last in $e/last
for $first in $e/first
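The steps of this transformation can be sketched in Python under some simplifying assumptions. The XTree expression is given as an already-parsed tree of hypothetical `Node` objects rather than as text, the recursion's call stack stands in for the explicit stack of common ancestors, and a fresh variable for an unbound branching node is simply named after the first letter of its element (name clashes are not handled). This is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    step: str                     # one location step, e.g. "book"
    var: Optional[str] = None     # bound variable name without "$", if any
    is_list: bool = False         # True for a list-valued variable {$var}
    children: List["Node"] = field(default_factory=list)

def to_clauses(root: Node) -> List[str]:
    """Emit for/let clauses in depth-first order of the XTree."""
    clauses: List[str] = []

    def visit(node: Node, anchor: str, path: str, is_root: bool) -> None:
        path = f"{path}/{node.step}"
        var = node.var
        # Branching nodes (except the root) get a variable if they have none.
        if var is None and len(node.children) > 1 and not is_root:
            var = node.step[0]    # assumed naming scheme, for illustration only
        if var is not None:
            keyword = f"let ${var} := " if node.is_list else f"for ${var} in "
            # Path is relative to the nearest ancestor variable, else absolute.
            clauses.append(f"{keyword}{anchor}{path}")
            anchor, path = f"${var}", ""
        for child in node.children:
            visit(child, anchor, path, False)

    visit(root, "", "", True)
    return clauses

# The example expression from the slide, built by hand.
tree = Node("bib", children=[
    Node("book", children=[Node("title", "t"), Node("author", "a", is_list=True)]),
    Node("journal", children=[
        Node("title", "jt"),
        Node("editor", children=[Node("last", "last"), Node("first", "first")]),
    ]),
])

clauses = to_clauses(tree)
print("\n".join(clauses))
```

For the example tree, the generated clauses coincide with the list shown above; other expressions may need a more careful naming scheme for the introduced variables.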
34
Transformation algorithm for result
construction part

Transform an XTree expression in the result construction
part to some XQuery expressions

More complicated



We will often encounter nested sub-queries in XQuery
Consider the case that the node name to get the variable
value is different from the node name where the variable was
bound in the querying part
Process the XTree expression step by step



Find the corresponding XPath expression of each variable in the XPaths generated by the previous algorithm
Translate each variable value substitution to some XQuery
statement
Use curly braces { } to form sub-query blocks according to the
structure of the XTree expression in construct clause
35
Transformation algorithm for result
construction part

Example:
query /bib/{book/{title→$t, author→{$a}},
journal/{title→$jt, editor/{last→$last, first→$first}}}
construct /result/{book/{name←$t, authors/{@count←{$a}.count( ), au←{$a}}},
journal/{title←$jt, editor/{first←$first, last←$last}}}

Generated XPath expressions
of the querying part:
for $b in /bib/book
for $t in $b/title
let $a := $b/author
for $j in /bib/journal
for $jt in $j/title
for $e in $j/editor
for $last in $e/last
for $first in $e/first
36
Transformation algorithm for result
construction part

Output:
<result> {
  for $b in /bib/book
  return <book> {
      for $t in $b/title
      return <name> {$t/@*} {$t/*} {$t/text()} </name>
    }
    {
      let $a := $b/author
      return <authors count="{count($a)}"> {
          for $x in $a
          return <au> {$x/@*} {$x/*} {$x/text()} </au>
        }
        </authors>
    }
    </book>
}
{
  for $j in /bib/journal
  return <journal> {
      for $jt in $j/title
      return $jt
    }
    {
      for $e in $j/editor
      return <editor> {
          for $first in $e/first
          return $first
        }
        {
          for $last in $e/last
          return $last
        } </editor>
    } </journal>
} </result>
37
Outlines





Introduction
Preliminaries
XTree
Algorithm to transform XTree query to XQuery
Conclusion and future works


Conclusion
Future works
38
Conclusion


Discussed the limitations of XPath
Proposed a new set of syntax rules called XTree






XTree has a tree structure
In the querying part, one XTree expression can bind
multiple variables
In the result construction part, one XTree expression can
define the result format
List-valued variables are explicitly indicated, and their
values are uniquely determined
XTree is more compact and convenient to use than XPath
Designed algorithms to transform a query based on
XTree expressions to a standard XQuery query
39
Future works

Implement an XTree query parser





Queries based on XTree expressions can be executed
directly
The query evaluation will be more efficient with this approach, since we will have a global view of the whole query tree
Extend the transformation algorithms to support queries
with join, negation, grouping and recursion
Optimize the output XQuery queries of our transformation
algorithms according to the schema of the XML document
Observe the progressive development of XPath to
continuously enhance our XTree
40
Thank you
42