XML and databases
Dennis Andersson, FOI
Andreas Borg, LiU/IDA/PELAB

XML database
• XML in RDBs
  – Adding semi-structured features to strongly typed databases
  – Example: MS SQL Server 2005
  – Dense vs. sparse
• XML as DBs
  – An XML file IS a database
  – A set of XML files is also a database
  – Semi-structured

    <Foo>
      <Bars>
        <Bar Number="2" String="ABC" />
        <Bar Number="1" />
        <Bar String="XTC"> Baz </Bar>
      </Bars>
      <Bar> Booze </Bar>
    </Foo>

XML as a DB
• To use an XML document effectively as a database we need:
  – A method for persistence (a filesystem?)
  – A way to place constraints on data (a data model)
  – A method for querying (a query language)

XQuery background
• W3C defines several XML standards:
  – XML Schema: notation for defining new types of elements and documents
  – XSLT: notation for transforming XML documents from one representation to another
  – XPath: notation for selecting elements within an XML document
  – XQuery: a query language designed expressly for XML data sources

XQuery
• Design in progress
  – Only retrieval
  – Updating existing XML documents may follow
• XQuery 1.0
  – W3C recommendation 23 January 2007
• Two syntaxes:
  – Expressed in XML
  – Human-oriented version

XQuery data model
• Sequence: An ordered collection of zero or more items
• Item: Node or atomic value
  – Node: element, attribute, text, document, comment, processing instruction, or namespace node
  – Atomic value: E.g. strings, integers, decimals
• Typed value: A sequence of zero or more atomic values
• Document order: Each node appears before its children

XQuery Expressions
• Basics (literals, variables, core function library)
• Path expressions (child, descendant, parent …)
• Predicates (e.g. "seller = Smith")
• Element constructors (to construct new elements)
• Iteration and sorting (FLWR: for-let-where-return)
• Arithmetic (+, -, *, div)
• Operations on sequences
• Conditional expressions
• Quantified expressions (some, every)

Example data: items.xml
[Figure not recovered.]

Example data: bids.xml
[Figure not recovered. Surviving labels: document, element, attribute, text node.]

Path expressions
• The result of each step is a sequence of nodes
• The value is the node sequence resulting from the last step
• Q1: List the descriptions of all items offered for sale by Smith.
  – XML

      document("items.xml")/child::*
        /child::item[child::seller = "Smith"]
        /child::description

  – Human-oriented

      document("items.xml")/*/item[seller = "Smith"]/description

Predicates
• Q3: Find the status attribute of the item that is the parent of a given description

      $description/../@status

  – $description: variable
  – ..: parent
  – @status: attribute node

Iteration and sorting
• Q4: For each item that has more than ten bids, generate a popular-item element containing the item number, description, and bid count.

    F  for $i in document("items.xml")/*/item
    L  let $b := document("bids.xml")/*/bid[itemno = $i/itemno]
    W  where count($b) > 10
    R  return
         <popular-item>
           { $i/itemno,
             $i/description,
             <bid-count> {count($b)} </bid-count> }
         </popular-item>
       sortby bid-count descending

A model for XML databases
• An XML document is well-formed
  – tags are properly nested
  – no need to conform to a particular schema
  – semi-structured data
  – relational and object-oriented modeling techniques become complex
  – efficient data models are needed

XDD, XML Declarative Description
• A simple yet expressive mechanism
  – explicit and implicit info
• A description in XDD consists of
  – XML elements
  – XML expressions (extended XML elements with variables)
  – XML clauses (constraints and relationships)

XML elements
• A ground XML expression (one without variables) is an XML element
• Example:

    <Element id="1" type="foo">
      <AnotherElement />
      <SubElement>Bar</SubElement>
      <SubElement>Baz</SubElement>
      <SubElement>Boz</SubElement>
    </Element>

(Non-ground) XML expressions
• XML element with variables

    Variable type              Notation   Stands for
    Name                       $N:name    a name
    String                     $S:name    a string
    Attribute-value-pair       $P:name    seq of pairs
    XML-expression             $E:name    seq of expressions
    Intermediate-expression    $I:name    part of expression

• Example:

    <$N:element id=$S:id $P:att1>
      $E:subelements
    </$N:element>

  Specifies a generic element with a string attribute called id. Beyond that, it can have any number of further attributes and any number of subelements of any depth.

Generalization
• a (ground XML expression)

    <AirTrip from="Bangkok" to="London">
      <Path>
        <City>Bangkok</City>
        <City>Singapore</City>
        <City>London</City>
      </Path>
      <Price>650</Price>
    </AirTrip>

• a' (generalization of a)

    <AirTrip from=$S:from to="London">
      $E:details
    </AirTrip>

• a'' (another generalization of a)

    <$I:element>
      <City>Singapore</City>
    </$I:element>

Specialization
• Each expression below is a specialization of the one before it:

    <AirTrip from=$S:from to="London">
      $E:details
    </AirTrip>

    <AirTrip from="Bangkok" to="London">
      $E:details
    </AirTrip>

    <AirTrip from="Bangkok" to="London">
      $E:e1
      $E:e2
    </AirTrip>

    <AirTrip from="Bangkok" to="London">
      <Path>
        <City>Bangkok</City>
        <City>Singapore</City>
        <City>London</City>
      </Path>
      $E:e2
    </AirTrip>

    <AirTrip from="Bangkok" to="London">
      <Path>
        <City>Bangkok</City>
        <City>Singapore</City>
        <City>London</City>
      </Path>
      <Price>650</Price>
    </AirTrip>

XDD database modeling
• XML document
  – Formalized as an XDD description containing n ground XML unit clauses (facts, see definition 5)
• Extensional XML DB (XDBE)
  – One or more XML documents formalized as above
• Intensional XML DB (XDBI)
  – Comprises XML non-unit clauses defining axioms, relationships or deducible knowledge
• Set of structural and integrity constraints (XDBC)
  – XML non-unit clauses defining particular constraints
• XDD Description: XDB = XDBE ∪ XDBI ∪ XDBC

Extensional XML DB (XDBE)

    <Flight number="TG916" airline="TG">
      <Origin>Bangkok</Origin>
      <Destination>London</Destination>
      <Price>750</Price>
    </Flight>
    <Flight number="SQ61" airline="SQ">
      <Origin>Bangkok</Origin>
      <Destination>Singapore</Destination>
      <Price>150</Price>
    </Flight>
    <Flight number="SQ320" airline="SQ">
      <Origin>Singapore</Origin>
      <Destination>London</Destination>
      <Price>500</Price>
    </Flight>

Intensional XML DB (XDBI)
• Example axiom
  – Minimum waiting time between two connecting flights is 1 hour
• Example deducible information
  – There is a flight from Singapore to Bangkok
  – There is a flight from Bangkok to London
  – Hence there is a 2-step flight from Singapore to London
• Can be expressed in XML
  – see definition 5 and figure 4

Constraints (XDBC)
• Example constraints
  – A flight cannot have the same origin and destination
  – The price of a flight must be an integer
  – The price of a flight must be less than 1500
  – The flight number must be unique
  – Elements in the database must conform to a certain schema

XDD querying
• An XML query can be formalized as an XML non-unit clause (query clause)
• The result of the query is a sequence of all possible specializations of the query clause in the database.
• An example query is presented in figure 7
• Can be expressed in XML
  – see definition 4 and figure 5

XQuery exercises
• Ex1: Find the names of all conflicts where Great Britain was a party.
• Ex2: For each ongoing conflict (one that has no end-date), generate a conflict-publication element containing the conflict name and publication titles.
• Ex3: Extend the conflict-publication element of Ex2 to also display the number of years the conflict has lasted (so far).

XDD Exercises
• Ex4: Create a non-ground XML expression that covers, at the highest possible level of detail, any conflict in conflict.xml
• Ex5: Using the expression from Ex4, show through specialization how the following entity can be derived:

    <conflict id="WW3" start="2050" type="fiction">
      <name>World war 3</name>
      <parties>
        <party>Blue side</party>
        <party>Red side</party>
      </parties>
      <casualties>1000000</casualties>
      <civiliansKilled>10000</civiliansKilled>
    </conflict>
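
As an illustration of the Constraints (XDBC) slide, the first four example constraints can be checked procedurally against the three flights from the Extensional XML DB (XDBE) slide. This is only a sketch in Python using the standard xml.etree.ElementTree module, not the XDD formalism itself (where constraints are XML non-unit clauses). The <Flights> wrapper element and the function name check_constraints are assumptions made for the example, and the schema-conformance constraint is left out.

```python
import xml.etree.ElementTree as ET

# The three flights from the XDBE slide, wrapped in a hypothetical
# <Flights> root element so that they form a single well-formed document.
XDBE = """
<Flights>
  <Flight number="TG916" airline="TG">
    <Origin>Bangkok</Origin>
    <Destination>London</Destination>
    <Price>750</Price>
  </Flight>
  <Flight number="SQ61" airline="SQ">
    <Origin>Bangkok</Origin>
    <Destination>Singapore</Destination>
    <Price>150</Price>
  </Flight>
  <Flight number="SQ320" airline="SQ">
    <Origin>Singapore</Origin>
    <Destination>London</Destination>
    <Price>500</Price>
  </Flight>
</Flights>
"""


def check_constraints(root):
    """Return a list of violations of the four example constraints."""
    violations = []
    seen_numbers = set()
    for flight in root.findall("Flight"):
        number = flight.get("number")
        origin = flight.findtext("Origin")
        destination = flight.findtext("Destination")
        price_text = flight.findtext("Price", default="").strip()

        # A flight cannot have the same origin and destination.
        if origin == destination:
            violations.append(f"{number}: origin equals destination")

        # The price must be an integer, and it must be less than 1500.
        try:
            price = int(price_text)
        except ValueError:
            violations.append(f"{number}: price is not an integer")
        else:
            if price >= 1500:
                violations.append(f"{number}: price is not below 1500")

        # The flight number must be unique.
        if number in seen_numbers:
            violations.append(f"{number}: duplicate flight number")
        seen_numbers.add(number)
    return violations


if __name__ == "__main__":
    print(check_constraints(ET.fromstring(XDBE)))
```

For the three flights on the slide the returned list is empty, since none of them violates the constraints.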
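
The kind of deducible information described on the Intensional XML DB (XDBI) slide, a 2-step flight derived from two connecting flights, can be sketched as a simple join over the flights from the Extensional XML DB (XDBE) slide. The sketch below is Python with xml.etree.ElementTree rather than an XDD non-unit clause. The <Flights> wrapper, the function name two_step_flights and the result layout are assumptions for illustration. Summing the two prices is also an assumption, and the waiting-time axiom is ignored because the slide data carries no departure or arrival times.

```python
import xml.etree.ElementTree as ET

# The three flights from the XDBE slide, wrapped in a hypothetical
# <Flights> root element so that they form a single well-formed document.
XDBE = """
<Flights>
  <Flight number="TG916" airline="TG">
    <Origin>Bangkok</Origin>
    <Destination>London</Destination>
    <Price>750</Price>
  </Flight>
  <Flight number="SQ61" airline="SQ">
    <Origin>Bangkok</Origin>
    <Destination>Singapore</Destination>
    <Price>150</Price>
  </Flight>
  <Flight number="SQ320" airline="SQ">
    <Origin>Singapore</Origin>
    <Destination>London</Destination>
    <Price>500</Price>
  </Flight>
</Flights>
"""


def two_step_flights(root):
    """Derive 2-step connections: flight A ends where flight B starts."""
    flights = [
        (
            f.get("number"),
            f.findtext("Origin"),
            f.findtext("Destination"),
            int(f.findtext("Price")),
        )
        for f in root.findall("Flight")
    ]
    derived = []
    for n1, o1, d1, p1 in flights:
        for n2, o2, d2, p2 in flights:
            # Connect only when the first leg's destination is the second
            # leg's origin, and skip round trips back to the start.
            if d1 == o2 and o1 != d2:
                derived.append(
                    {
                        "origin": o1,
                        "via": d1,
                        "destination": d2,
                        "flights": (n1, n2),
                        "price": p1 + p2,  # assumption: prices simply add up
                    }
                )
    return derived


if __name__ == "__main__":
    for trip in two_step_flights(ET.fromstring(XDBE)):
        print(trip)
```

With the slide data this yields a single connection, Bangkok to London via Singapore on SQ61 and SQ320, whose summed price of 650 matches the AirTrip element used on the Generalization and Specialization slides.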