Managing Data
IST400/600
Jian Qin
3/24/2009

What did you get from this video?
• The Machine is Us/ing Us
  http://www.youtube.com/watch?v=NLlGopyXT_g

What does it mean for data management?
• Machine-readable data
• Machine-understandable data
• The Machine is Data
The challenge is: how can we make data machine-understandable?

XML and enabling technologies
• XML DTD and schema languages
• XML transformation and stylesheets
• XML linking
• XML and programming languages
• XML and databases

[Figure: an XML record, <dataset> <title> <references> <author> <publication> <year> <subject> <date> … </dataset>. Tags carry semantic meaning and structure and are extensible; the syntax is well-formed and validated.]

XML Technologies and Components
[Figure: XML technologies and components, grouped under Data model, Presentation, Programming, Transformation, and Navigation & search. Components shown: SGML, XML, HTML, XHTML; DTDs, schemas, and vocabularies for scientific, business, legal, medical, and computer telephony domains; server-side XML, client-side HTML, multipresentations; CSS, XSL-FO, DOM, SAX, XSLT, XPath, XPointer, XLink, XInclude, XQuery.]

Why XML?
• Separation of content and presentation
  – Create content once and display that content with different presentations
  – Benefits include:
    • Content reuse and repurposing
    • A consistent look and feel across presentations
    • Easier maintenance and updates

What Are the Benefits of XML?
• Structure: to model data to any level of complexity
• Extensibility: to define new tags as needed
• Validation: to check data for structural correctness
• Media independence: to publish content in multiple formats
• Vendor and platform independence: to process any conforming document using standard commercial software or even simple text tools
• Single-source document creation

Encoding metadata records
• An example of encoding and transformation
  – Requires three files to work together:
    • Schema file (defines the elements and structure of the XML file)
    • XML file (contains the data content, which conforms to the structure defined by the schema)
    • XSL file (relies on the structure defined in the schema file to transform the XML file into an HTML file for presentation, or into another XML file for structural change)

The book catalog XML
Partial XML file. For the complete file, please see the WebCT learning module for this week.

Book catalog schema

XSL stylesheet for the book catalog
Run demo

Schema is the key
Schema modes: a) single encoding schema, b) multiple encoding schemas, and c) networked encoding schemas

Encoding schemas: structures (1)
[Figures: Dublin Core XML schemas; EAD 2002 schema structure. Labels: single encoding schema; multiple encoding schemas.]

Encoding schemas: structures (2)
[Figure: DLESE metadata application profile XML schema structure]

XML AND DATABASES

Is XML a database?
• Yes: only in the sense that it is a collection of data
• Sort of: it has
  – Storage (XML instances)
  – Schemas (DTDs and XML schemas)
  – Query languages (XQuery, XPath, etc.)
  – Programming interfaces (DOM, SAX, JDOM)
• However, it doesn't have:
  – Indexes
  – Security
  – Data integrity
  – Multi-user access
  – Triggers
  – …
Source: Bourret, Ronald. (2005). XML and databases. http://www.rpbourret.com/xml/XMLAndDatabases.htm

Database vs. XML
Scenario: A digital collection catalog allows users to search, browse, and order copies, as well as check order status.
Questions:
• Why do you want to use a database?
• Why do you want to use XML?
• Which function will benefit the most from each of the two approaches?

Data-Centric XML
• Characteristics of data-centric XML
  – Fairly regular structure
  – Fine-grained data (that is, the smallest independent unit of data is at the level of a PCDATA-only element or an attribute)
  – Little or no mixed content
  – Typically used where XML serves as a data transport
• Examples:
  – Real-time data feed from field instruments
  – Simple metadata such as author and reference information
  – Patient records
  – …
• Not important: physical structure, such as the order of sibling elements, whether data is stored in attributes or PCDATA-only elements, and whether entities are used

Example of data-centric XML
<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact class="vcard">
  <Name class="fn">John Smith</Name>
  <Address class="adr">
    <Street class="street-address">10 Tremont St.</Street>
    <City class="locality">Boston</City>
    <State class="region">MA</State>
  </Address>
  <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>

Document-Centric XML (1)
• Characteristics of document-centric XML:
  – Irregular or complicated structure
  – Larger-grained data (the smallest unit of data might be at the level of an element with mixed content or the entire document itself)
  – Lots of mixed content
  – The order in which sibling elements and PCDATA occur is almost always significant
  – Examples: books, email, advertisements, and almost any XHTML document
  – Document-centric documents are generally designed for human consumption
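The sketch below illustrates the data-centric versus document-centric contrast in Python, using only the standard library's xml.etree.ElementTree. It reads the Point-of-Contact record field by field, so shuffling the sibling elements yields the same values. REORDERED_XML is a hypothetical rearrangement of that record. PARA_XML is a hypothetical mixed-content fragment whose text only makes sense in document order. The names contact_to_dict, para_text, and the sample strings are illustrative assumptions, not part of the course materials.

```python
import xml.etree.ElementTree as ET

# The data-centric Point-of-Contact record from the slide (ASCII hyphens in names).
CONTACT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact class="vcard">
  <Name class="fn">John Smith</Name>
  <Address class="adr">
    <Street class="street-address">10 Tremont St.</Street>
    <City class="locality">Boston</City>
    <State class="region">MA</State>
  </Address>
  <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>"""

# Hypothetical variant: the same record with sibling elements shuffled.
REORDERED_XML = """<Point-of-Contact class="vcard">
  <Telephone class="tel">617-123-4567</Telephone>
  <Address class="adr">
    <State class="region">MA</State>
    <City class="locality">Boston</City>
    <Street class="street-address">10 Tremont St.</Street>
  </Address>
  <Name class="fn">John Smith</Name>
</Point-of-Contact>"""

# Hypothetical mixed-content paragraph in the document-centric style.
PARA_XML = '<Para>Order <Link URL="Order.html">a wrench</Link> today.</Para>'


def contact_to_dict(xml_text: str) -> dict:
    """Read fine-grained fields by path; sibling order plays no role."""
    # Encode to bytes so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "name": root.findtext("Name"),
        "street": root.findtext("Address/Street"),
        "city": root.findtext("Address/City"),
        "state": root.findtext("Address/State"),
        "telephone": root.findtext("Telephone"),
    }


def para_text(xml_text: str) -> str:
    """Concatenate mixed content in document order; here order carries meaning."""
    return "".join(ET.fromstring(xml_text).itertext())


if __name__ == "__main__":
    print(contact_to_dict(CONTACT_XML) == contact_to_dict(REORDERED_XML))  # True
    print(para_text(PARA_XML))  # Order a wrench today.
```

Reading fields by path is what makes data-centric XML easy to move into tables. Mixed content has no such flat shape, because its meaning depends on the interleaving of text and elements.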
Document-Centric XML (2)
<Product>
  <Name>Turkey Wrench</Name>
  <Developer>Full Fabrication Labs, Inc.</Developer>
  <Summary>Like a monkey wrench, but not as big.</Summary>
  <Description>
    <Para>The turkey wrench, which comes in both right- and left-handed versions (skyhook optional), is made of the finest stainless steel. The Readigrip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
    <Para>You can:</Para>
    <List>
      <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
      <Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item>
      <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
    </List>
    <Para>The turkey wrench costs just $19.99 and, if you order now, comes with a hand-crafted shrimp hammer as a bonus gift.</Para>
  </Description>
</Product>

Document-Centric XML (3)
• For a simple system to handle document-centric XML, you will need at least five tables:
  – Attribute definition: defines attributes, including their type, legal values, and so on
  – Element/attribute association: defines which attributes apply to which elements
  – Content model definition: defines which elements can contain which other elements
  – Attribute values: contains attribute values and pointers to the appropriate rows in the attribute definition and element/attribute association tables
  – Element values: contains element values (PCDATA or pointers to other element values), the order in which the element occurs in its parent, a pointer to the row that contains the value of the parent element, and a pointer to the appropriate row in the element/attribute association table
• Converting XML documents into a database is not always the best solution.

Data, Documents, and Databases
• The distinction is not always clear
  – A data-centric document, such as a self-descriptive data file, might contain large-grained, irregularly structured data, such as an abstract or project description
  – A document-centric document, such as a user's guidebook, might contain fine-grained, regularly structured data (often metadata)

Approaches in XML storage
• Convert the XML data to tables, store them in a relational database, and translate queries to SQL
  – Simpler
  – Short-term solution
• Design a database management system especially for XML data

Transferring Data
• Information that cannot be stored in databases:
  – DTDs
  – Physical structure
    • Entity definition and usage
    • The order in which attribute values and sibling elements occur
    • The way in which binary data is stored (Base64 vs. unparsed entity vs. something else)
    • CDATA sections
    • Encoding information
• Data retrieved from databases:
  – Contains no CDATA sections or entity usage
  – Has no inherent order

Mapping Doc Structure to DB Structure
• Transferring data from XML documents to databases, or vice versa, requires mapping one structure to the other:
  – Table-based mapping
  – Object-relational mapping

Table-based mapping (1)
• XML → database
  – XML document → table
  – Element / attribute → column
• Database → XML
  – Table → <database>
  – Row → <row>
  – Column → <column1>, <column2>, …

Table-based mapping (2)
Source: http://www.rpbourret.com/xml/DTDToDatabase.htm

Object-Relational Mapping
Source: http://www.rpbourret.com/xml/DTDToDatabase.htm

Storing XML: native XML database (1)
• The database is specialized for storing XML data and stores all components of the XML model intact.
• A native XML database may not actually be a standalone database at all.
Source: Staken, Kimbro. (2001). Introduction to native XML databases.
Available at: http://www.xml.com/pub/a/2001/10/31/nativexmldb.html

Oracle's XMLType: a native XML DB
• XMLType is a native data type used to store and manage XML documents in columns or tables
• XML can be stored in one of two ways:
  – An XMLType column in a relational table
  – An XML object in an XMLType table

Template-Driven Mapping (1)
• No predefined mapping between document structure and database structure
• Commands are embedded in a template that is processed by the data transfer middleware
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

Template-Driven Mapping (2)
Result from the query:
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 1998 13:43</Depart>
      <Arrive>Dec 13, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>

SQL-based query language
• Uses modified SELECT statements whose results are transformed to XML
SELECT Orders.SONumber,
       XMLELEMENT(NAME "Order",
                  XMLATTRIBUTES(Orders.SONumber AS SONumber),
                  XMLELEMENT(NAME "Date", Orders.Date),
                  XMLELEMENT(NAME "Customer", Orders.Customer)) AS xmldocument
FROM Orders
Example order result from the query:
<Order SONumber="123">
  <Date>10/29/02</Date>
  <Customer>Gallagher Industries</Customer>
</Order>

Summary
• XML documents need database management system functions to solve issues in storage, security, concurrency control, version control, and data and referential integrity
• Data conversion uses two approaches: table-based and object-relational
• Native XML databases specialize in storing XML data and store all components of the XML model intact
• XML query languages are closely related to XPath, XML Schema, and the XML Infoset

Exercise: data conversion
• Convert XML data to a relational database
• Convert relational database data into XML files
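For the exercise, here is a minimal sketch of the table-based mapping in Python. It assumes the standard library's sqlite3 (with an in-memory database) and xml.etree.ElementTree.

- xml_to_table treats each child of the root as a row and each attribute or PCDATA-only child element as a column.
- table_to_xml goes the other way, emitting a <database> root, one <row> per record, and one element per column.

The function names, the sample catalog, and the book table are hypothetical. The code assumes a regular, data-centric structure and trusted table and column names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-centric input: every child of the root is one row.
CATALOG_XML = """<catalog>
  <book id="b1"><title>First title</title><year>2001</year></book>
  <book id="b2"><title>Second title</title><year>2005</year></book>
</catalog>"""


def row_values(row: ET.Element) -> dict:
    """Table-based mapping: attributes and PCDATA-only children become columns."""
    values = dict(row.attrib)
    for child in row:
        values[child.tag] = child.text
    return values


def xml_to_table(conn: sqlite3.Connection, xml_text: str, table: str) -> None:
    """XML document -> table; element/attribute -> column."""
    rows = [row_values(r) for r in ET.fromstring(xml_text)]
    columns = list(rows[0])  # assumes every row has the same fields as the first
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # Names are interpolated directly, which is acceptable only for trusted input.
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [[r.get(c) for c in columns] for r in rows],
    )
    conn.commit()


def table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Table -> <database>; row -> <row>; column -> child element."""
    # Relational results have no inherent order, so sort by SQLite's rowid.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cursor.description]
    root = ET.Element("database")
    for record in cursor:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, record):
            ET.SubElement(row_el, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    xml_to_table(conn, CATALOG_XML, "book")
    print(conn.execute("SELECT id, title, year FROM book ORDER BY rowid").fetchall())
    print(table_to_xml(conn, "book"))
```

The round trip shows some of the losses listed under Transferring Data:

- The id attribute comes back as an element.
- The original indentation is gone.
- Row order is recovered only because the query sorts explicitly.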