XML
XML and databases
Dennis Andersson, FOI
Andreas Borg, LiU/IDA/PELAB
XML database
• XML in RDBs
– Adding semi-structured features to strongly
typed databases
– Example: MS SQL Server 2005
– Dense vs. sparse
• XML as DBs
– An XML file IS a database
– A set of XML files is also a database
– Semi-structured
<Foo>
  <Bars>
    <Bar Number="2" String="ABC" />
    <Bar Number="1" />
    <Bar String="XTC"> Baz </Bar>
  </Bars>
  <Bar> Booze </Bar>
</Foo>
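To make the "semi-structured" point concrete, here is a minimal Python sketch (standard library only) that loads the Foo document, with attribute values quoted so that it parses, and reads it like a small table. Sibling Bar elements carry different attribute sets, so a missing attribute simply comes back as None. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The slide's example, with attribute values quoted so that it is well-formed.
DOC = """
<Foo>
  <Bars>
    <Bar Number="2" String="ABC" />
    <Bar Number="1" />
    <Bar String="XTC"> Baz </Bar>
  </Bars>
  <Bar> Booze </Bar>
</Foo>
"""

root = ET.fromstring(DOC)

# Every Bar element at any depth, in document order.
# Each row is (Number, String, text); absent attributes yield None.
rows = [
    (bar.get("Number"), bar.get("String"), (bar.text or "").strip())
    for bar in root.iter("Bar")
]

for row in rows:
    print(row)
```

A strongly typed relational table would need nullable columns (or extra tables) to hold the same four rows.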
XML as a DB
• In order to effectively use an XML
document as a database we need:
– A method for persistence
• (a filesystem?)
– To allow placing constraints on data
• (a data model)
– A method for querying
• (a query language)
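The three requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product/price vocabulary and all function names are hypothetical, the "data model" is a single hand-written check standing in for a real schema, and the "query language" is an ordinary function.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save(root, path):
    """Persistence: write the tree to a file."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def load(path):
    """Persistence: read the tree back from a file."""
    return ET.parse(path).getroot()

def violations(root):
    """Constraint: every <product> has exactly one <price> holding digits only."""
    bad = []
    for product in root.iter("product"):
        prices = product.findall("price")
        if len(prices) != 1 or not (prices[0].text or "").strip().isdigit():
            bad.append(product.get("id"))
    return bad

def cheaper_than(root, limit):
    """Query: ids of products whose price is below `limit`."""
    return [p.get("id") for p in root.iter("product")
            if int(p.findtext("price")) < limit]

# Hypothetical sample data.
root = ET.fromstring(
    '<products>'
    '<product id="a"><price>10</price></product>'
    '<product id="b"><price>25</price></product>'
    '</products>'
)
problems = violations(root)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "products.xml")
    save(root, path)
    reloaded = load(path)

result = cheaper_than(reloaded, 20)
print(problems, result)
```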
XQuery background
• W3C defines several XML standards:
– XML Schema: notation for defining new types
of elements and documents
– XSLT: notation for transforming XML
documents from one representation to
another
– XPath: notation for selecting elements within
an XML document
– XQuery: a query language designed
expressly for XML data sources
XQuery
• Design in progress
– Only retrieval
– Updating existing XML documents may follow
• XQuery 1.0
– W3C recommendation 23 January 2007
• Two syntaxes:
– Expressed in XML
– Human-oriented version
XQuery data model
• Sequence: An ordered collection of zero or more items
• Item: Node or atomic value
– Node
• Element
• Attribute
• Text
• Document
• Comment
• Processing instruction
• Namespace nodes
– Atomic value: E.g. strings, integers, decimals
• Typed value: A sequence of zero or more atomic values
• Document order: Each node appears before its children
XQuery Expressions
• Basics (literals, variables, core function library)
• Path expressions (child, descendant, parent …)
• Predicates (e.g. seller = "Smith")
• Element constructors (to construct new elements)
• Iteration and sorting (FLWR: for-let-where-return)
• Arithmetic (+,-,*,div)
• Operations on sequences
• Conditional expressions
• Quantified expressions (some, every)
Example data: items.xml
[Figure: items.xml]
Example data: bids.xml
[Figure: bids.xml; legend: document, element, attribute, text node]
Path expressions
• The result of each step is a sequence of nodes
• The value is the node sequence resulting from the last step
• Q1: List the descriptions of all items offered for sale by Smith.
– Unabbreviated syntax
doc("items.xml")/child::*
  /child::item[child::seller = "Smith"]
  /child::description
– Abbreviated (human-oriented) syntax
doc("items.xml")
  /*/item[seller = "Smith"]/description
Predicates
• Q3: Find the status attribute of the item that is the parent of a given description
$description/../@status
– $description: variable
– ..: parent
– @status: attribute node
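For readers without an XQuery processor at hand, Q1 and Q3 can be approximated with the limited XPath subset in Python's standard library. This is a sketch, not an XQuery implementation: the sample data is a hypothetical stand-in for items.xml, and only the element and attribute names that appear in the queries (item, seller, description, status) come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for items.xml.
ITEMS = """
<items>
  <item status="open">
    <itemno>1001</itemno><seller>Smith</seller>
    <description>Red bicycle</description>
  </item>
  <item status="closed">
    <itemno>1002</itemno><seller>Jones</seller>
    <description>Tennis racket</description>
  </item>
  <item status="open">
    <itemno>1003</itemno><seller>Smith</seller>
    <description>Motorcycle helmet</description>
  </item>
</items>
"""

root = ET.fromstring(ITEMS)

# Q1: /*/item[seller = "Smith"]/description
# `root` is already the document element, so the leading /* step is implicit.
q1 = [d.text for d in root.findall("./item[seller='Smith']/description")]

# Q3: $description/../@status
# ElementTree nodes keep no parent pointer, so build a child -> parent map first.
parent_of = {child: parent for parent in root.iter() for child in parent}
description = root.find("./item/description")  # plays the role of $description
q3 = parent_of[description].get("status")

print(q1)
print(q3)
```

The parent map is needed only because ElementTree lacks a parent axis on individual nodes; a full XPath or XQuery engine evaluates `..` directly.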
Iteration and sorting
• Q4: For each item that has more than ten bids, generate a popular-item element containing the item number, description, and bid count.
for $i in doc("items.xml")/*/item                       (: F: for :)
let $b := doc("bids.xml")/*/bid[itemno = $i/itemno]     (: L: let :)
where count($b) > 10                                    (: W: where :)
order by count($b) descending                           (: sort; early drafts used a trailing sortby :)
return                                                  (: R: return :)
  <popular-item>
    {
      $i/itemno,
      $i/description,
      <bid-count> {count($b)} </bid-count>
    }
  </popular-item>
A model for XML databases
• An XML document is well-formed
– tags are properly nested
– no need to conform to a particular schema
– semi-structured data
– relational and object-oriented modeling techniques become complex
– efficient data models are needed
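The structure of Q4 (iterate over items, bind the matching bids, filter on their count, sort, construct a result element) can be mirrored step by step in Python. The sketch below is an approximation under assumptions: `popular_items` and `make_sample` are hypothetical helper names, and the generated data only imitates the shape of items.xml and bids.xml that the query implies.

```python
import copy
import xml.etree.ElementTree as ET

def popular_items(items_root, bids_root, min_bids=10):
    """Return <popular-item> elements for items with more than `min_bids` bids."""
    selected = []
    for item in items_root.findall("./item"):                   # for
        itemno = item.findtext("itemno")
        bids = [b for b in bids_root.findall("./bid")
                if b.findtext("itemno") == itemno]               # let
        if len(bids) > min_bids:                                 # where
            selected.append((len(bids), item))
    selected.sort(key=lambda pair: pair[0], reverse=True)        # sort, descending
    result = []
    for bid_count, item in selected:                             # return
        element = ET.Element("popular-item")
        element.append(copy.deepcopy(item.find("itemno")))
        element.append(copy.deepcopy(item.find("description")))
        ET.SubElement(element, "bid-count").text = str(bid_count)
        result.append(element)
    return result

def make_sample():
    """Build hypothetical items and bids trees with 12, 3 and 15 bids."""
    items, bids = ET.Element("items"), ET.Element("bids")
    for itemno, text, n_bids in [("1", "Bicycle", 12),
                                 ("2", "Racket", 3),
                                 ("3", "Helmet", 15)]:
        item = ET.SubElement(items, "item")
        ET.SubElement(item, "itemno").text = itemno
        ET.SubElement(item, "description").text = text
        for _ in range(n_bids):
            bid = ET.SubElement(bids, "bid")
            ET.SubElement(bid, "itemno").text = itemno
    return items, bids

items, bids = make_sample()
popular = popular_items(items, bids)
summary = [(p.findtext("itemno"), p.findtext("bid-count")) for p in popular]
print(summary)
```

The nested scan over all bids for every item is quadratic; a query processor is free to use a join strategy instead, which is one reason to state the query declaratively.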
XDD, XML Declarative Description
• A simple yet expressive mechanism
– explicit and implicit info
• A description in XDD consists of
– XML elements
– XML expressions (extended XML elements with variables)
– XML clauses (constraints and relationships)
XML elements
• Ground XML expression = XML element
• Example:
<Element id="1" type="foo">
  <AnotherElement />
  <SubElement>Bar</SubElement>
  <SubElement>Baz</SubElement>
  <SubElement>Boz</SubElement>
</Element>
(Non-ground) XML expressions
• XML element with variable
– Name: $N:name (a name)
– String: $S:name (a string)
– Attribute-value-pair: $P:name (seq of pairs)
– XML-expression: $E:name (seq of expressions)
– Intermediate-expression: $I:name (part of expression)
• Example:
<$N:element id=$S:id $P:att1>
  $E:subelements
</$N:element>
Specifies a generic element that has a string attribute called id and may additionally have any number of other attributes and any number of subelements of any depth
Generalization
<AirTrip from="Bangkok" to="London">
  <Path>
    <City>Bangkok</City>
    <City>Singapore</City>
    <City>London</City>
  </Path>
  <Price>650</Price>
</AirTrip>
a (ground XML expression)
<AirTrip from=$S:from to="London">
  $E:details
</AirTrip>
a’ (generalization of a)
<$I:element>
  <City>Singapore</City>
</$I:element>
a’’ (another generalization of a)
Specialization
<AirTrip from=$S:from to="London">
  $E:details
</AirTrip>
↓ bind $S:from to "Bangkok"
<AirTrip from="Bangkok" to="London">
  $E:details
</AirTrip>
↓ split $E:details into $E:e1 $E:e2
<AirTrip from="Bangkok" to="London">
  $E:e1 $E:e2
</AirTrip>
↓ bind $E:e1 to the Path element
<AirTrip from="Bangkok" to="London">
  <Path>
    <City>Bangkok</City>
    <City>Singapore</City>
    <City>London</City>
  </Path>
  $E:e2
</AirTrip>
↓ bind $E:e2 to the Price element
<AirTrip from="Bangkok" to="London">
  <Path>
    <City>Bangkok</City>
    <City>Singapore</City>
    <City>London</City>
  </Path>
  <Price>650</Price>
</AirTrip>
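The specialization chain can be replayed as a sequence of variable substitutions. The Python sketch below uses a toy encoding that is an assumption of this example, not the XDD formalism: an expression is a `(tag, attributes, children)` tuple, a variable is any string starting with `$`, and only $S- and $E-variables are handled ($N, $P and $I are left out).

```python
def specialize(expr, theta):
    """Apply substitution `theta`; returns a list because an E-variable may
    expand into a sequence of zero or more expressions."""
    if isinstance(expr, str):
        if expr in theta:
            value = theta[expr]
            return list(value) if isinstance(value, list) else [value]
        return [expr]
    tag, attrs, children = expr
    new_attrs = {name: theta.get(value, value) for name, value in attrs.items()}
    new_children = [out for child in children for out in specialize(child, theta)]
    return [(tag, new_attrs, new_children)]

def is_ground(expr):
    """True if no variable is left anywhere in the expression."""
    if isinstance(expr, str):
        return not expr.startswith("$")
    tag, attrs, children = expr
    return (not tag.startswith("$")
            and not any(v.startswith("$") for v in attrs.values())
            and all(is_ground(child) for child in children))

PATH = ("Path", {}, [("City", {}, ["Bangkok"]),
                     ("City", {}, ["Singapore"]),
                     ("City", {}, ["London"])])
PRICE = ("Price", {}, ["650"])

start = ("AirTrip", {"from": "$S:from", "to": "London"}, ["$E:details"])

# The four substitutions, one per step of the chain.
steps = [
    {"$S:from": "Bangkok"},
    {"$E:details": ["$E:e1", "$E:e2"]},
    {"$E:e1": [PATH]},
    {"$E:e2": [PRICE]},
]

current = start
for theta in steps:
    current = specialize(current, theta)[0]

print(is_ground(start), is_ground(current))
```

Reading the steps in reverse order gives generalization: each ground part is replaced by a variable again.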
Extensional XML DB (XDBE)
<Flight number="TG916" airline="TG">
  <Origin>Bangkok</Origin>
  <Destination>London</Destination>
  <Price>750</Price>
</Flight>
<Flight number="SQ61" airline="SQ">
  <Origin>Bangkok</Origin>
  <Destination>Singapore</Destination>
  <Price>150</Price>
</Flight>
<Flight number="SQ320" airline="SQ">
  <Origin>Singapore</Origin>
  <Destination>London</Destination>
  <Price>500</Price>
</Flight>
Constraints (XDBC)
• Example constraints
– A flight cannot have the same origin and destination
– The price of a flight must be an integer
– The price of a flight must be less than 1500
– The flight number must be unique
– Elements in the database must conform to a certain schema
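The first four example constraints can be checked procedurally, which helps show what an XDBC clause has to express declaratively. The Python sketch below is an illustration only: the `Flights` wrapper element, the function name and the message strings are assumptions, the schema-conformance constraint is not covered, and "integer" is approximated as "digits only".

```python
import xml.etree.ElementTree as ET

# The three Flight elements of the extensional database,
# wrapped in a hypothetical <Flights> root so they form one document.
FLIGHTS = """
<Flights>
  <Flight number="TG916" airline="TG">
    <Origin>Bangkok</Origin><Destination>London</Destination><Price>750</Price>
  </Flight>
  <Flight number="SQ61" airline="SQ">
    <Origin>Bangkok</Origin><Destination>Singapore</Destination><Price>150</Price>
  </Flight>
  <Flight number="SQ320" airline="SQ">
    <Origin>Singapore</Origin><Destination>London</Destination><Price>500</Price>
  </Flight>
</Flights>
"""

def violations(root, max_price=1500):
    """Return (flight number, message) pairs for every violated constraint."""
    problems = []
    seen = set()
    for flight in root.iter("Flight"):
        number = flight.get("number")
        price = (flight.findtext("Price") or "").strip()
        if flight.findtext("Origin") == flight.findtext("Destination"):
            problems.append((number, "same origin and destination"))
        if not price.isdigit():
            problems.append((number, "price is not an integer"))
        elif int(price) >= max_price:
            problems.append((number, "price is not below the limit"))
        if number in seen:
            problems.append((number, "duplicate flight number"))
        seen.add(number)
    return problems

db = ET.fromstring(FLIGHTS)
clean = violations(db)

# Add a deliberately invalid (made-up) flight to see the checks fire.
bad = ET.SubElement(db, "Flight", {"number": "SQ61", "airline": "SQ"})
ET.SubElement(bad, "Origin").text = "Oslo"
ET.SubElement(bad, "Destination").text = "Oslo"
ET.SubElement(bad, "Price").text = "1500"
dirty = violations(db)

print(clean)
print(dirty)
```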
XDD database modeling
• XML document
– Formalized as an XDD description containing n ground XML unit clauses (facts, see definition 5)
• Extensional XML DB (XDBE)
– One or more XML documents formalized as above
• Intensional XML DB (XDBI)
– Comprises XML non-unit clauses defining axioms, relationships or deducible knowledge
• Set of structural and integrity constraints (XDBC)
– XML non-unit clauses defining particular constraints
• XDD description: XDB = XDBE ∪ XDBI ∪ XDBC
Intensional XML DB (XDBI)
• Example axiom
– Minimum waiting time between two connecting flights
is 1 hour
• Example deducible information
– There is a flight from Singapore to Bangkok
– There is a flight from Bangkok to London
– Hence there is a 2-step flight from Singapore to
London
• Can be expressed in XML
– see definition 5 and figure 4
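The kind of deduction an intensional rule describes can be imitated with a self-join over the stored flights. The Python sketch below is a hedged illustration: it uses the three Flight elements of the extensional database (wrapped in a hypothetical `Flights` root), builds AirTrip elements in the shape used on the generalization slide, and ignores the waiting-time axiom because the sample flights carry no times. The function name is an assumption.

```python
import xml.etree.ElementTree as ET

FLIGHTS = """
<Flights>
  <Flight number="TG916" airline="TG">
    <Origin>Bangkok</Origin><Destination>London</Destination><Price>750</Price>
  </Flight>
  <Flight number="SQ61" airline="SQ">
    <Origin>Bangkok</Origin><Destination>Singapore</Destination><Price>150</Price>
  </Flight>
  <Flight number="SQ320" airline="SQ">
    <Origin>Singapore</Origin><Destination>London</Destination><Price>500</Price>
  </Flight>
</Flights>
"""

def two_step_trips(root):
    """Derive an <AirTrip> for every pair of connecting flights."""
    flights = [
        (f.findtext("Origin"), f.findtext("Destination"), int(f.findtext("Price")))
        for f in root.iter("Flight")
    ]
    trips = []
    for origin, stop, price1 in flights:
        for start, destination, price2 in flights:
            # Second leg must depart where the first arrives; skip round trips.
            if stop == start and origin != destination:
                trip = ET.Element("AirTrip", {"from": origin, "to": destination})
                path = ET.SubElement(trip, "Path")
                for city in (origin, stop, destination):
                    ET.SubElement(path, "City").text = city
                ET.SubElement(trip, "Price").text = str(price1 + price2)
                trips.append(trip)
    return trips

trips = two_step_trips(ET.fromstring(FLIGHTS))
for trip in trips:
    print(ET.tostring(trip, encoding="unicode"))
```

With this data the only derived trip is Bangkok to London via Singapore at a price of 650, which is not stored anywhere in XDBE.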
XDD querying
• An XML query can be formalized as an
XML non-unit clause (query clause)
• The result of the query is a sequence of all
possible specializations of the query
clause in the database.
• An example query is presented in figure 7
• Can be expressed in XML
– see definition 4 and figure 5
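Query answering as "all specializations that hold in the database" can be approximated by pattern matching with variables. The Python sketch below is a simplification under stated assumptions: the query is a single XML pattern rather than a full clause, only string variables (written `$S:...`) in attribute values and leaf text are supported, extra attributes on the data element are tolerated, and `match`, `unify` and `answers` are hypothetical names.

```python
import xml.etree.ElementTree as ET

FLIGHTS = """
<Flights>
  <Flight number="TG916" airline="TG">
    <Origin>Bangkok</Origin><Destination>London</Destination><Price>750</Price>
  </Flight>
  <Flight number="SQ61" airline="SQ">
    <Origin>Bangkok</Origin><Destination>Singapore</Destination><Price>150</Price>
  </Flight>
  <Flight number="SQ320" airline="SQ">
    <Origin>Singapore</Origin><Destination>London</Destination><Price>500</Price>
  </Flight>
</Flights>
"""

# Query pattern: all flights leaving Bangkok.
QUERY = """
<Flight number="$S:number">
  <Origin>Bangkok</Origin>
  <Destination>$S:dest</Destination>
  <Price>$S:price</Price>
</Flight>
"""

def unify(want, have, bindings):
    """Match a pattern string against a data string, recording variable bindings."""
    if want.startswith("$"):
        if want in bindings:
            return bindings[want] == have
        bindings[want] = have
        return True
    return want == have

def match(pattern, element, bindings=None):
    """Return the bindings that specialize `pattern` into `element`, or None."""
    bindings = dict(bindings or {})
    if pattern.tag != element.tag:
        return None
    for name, want in pattern.attrib.items():
        have = element.get(name)
        if have is None or not unify(want, have, bindings):
            return None
    p_kids, e_kids = list(pattern), list(element)
    if not p_kids:  # leaf: compare text content
        want = (pattern.text or "").strip()
        have = (element.text or "").strip()
        return bindings if unify(want, have, bindings) else None
    if len(p_kids) != len(e_kids):
        return None
    for p_kid, e_kid in zip(p_kids, e_kids):
        bindings = match(p_kid, e_kid, bindings)
        if bindings is None:
            return None
    return bindings

def answers(query_xml, db_root):
    """All variable bindings under which the query pattern occurs in the database."""
    pattern = ET.fromstring(query_xml)
    found = []
    for element in db_root.iter(pattern.tag):
        bindings = match(pattern, element)
        if bindings is not None:
            found.append(bindings)
    return found

results = answers(QUERY, ET.fromstring(FLIGHTS))
for bindings in results:
    print(bindings)
```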
XQuery exercises
• Ex1: Find the names of all conflicts where
Great Britain was a party.
• Ex2: For each ongoing conflict (that has no end-date), generate a conflict-publication element containing the conflict name and publication titles.
• Ex3: Extend the conflict-publication
element of Ex2 to also display the number
of years the conflict has lasted (so far)
XDD Exercises
• Ex4: Create a non-ground XML expression that covers,
at the highest possible level of detail, any conflict in
conflict.xml
• Ex5: Using the expression from Ex4, show through
specialization how the following entity can be derived:
<conflict id="WW3" start="2050" type="fiction">
<name>World war 3</name>
<parties>
<party>Blue side</party>
<party>Red side</party>
</parties>
<casualties>1000000</casualties>
<civiliansKilled>10000</civiliansKilled>
</conflict>