3/24/2009
Managing Data
IST400/600
Jian Qin
What did you get from this video?
• The Machine is Us/ing Us
http://www.youtube.com/watch?v=NLlGopyXT_g
What does it mean for data management?
• Machine readable data
• Machine understandable data
• The Machine is Data
The challenge is: how can we make machine‐understandable data?
XML and enabling technologies
• XML DTD and Schema Language
• XML Transformation and Stylesheets
• XML Linking
• XML and programming languages
• XML and databases
Tags have
- semantic meaning
- structure
- extensibility
<dataset>
<title>
<references>
<author>
<publication>
<year>
<subject>
<date>
…
</dataset>
Syntax is
- well-formed
- validated
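The two syntax properties above can be told apart in a few lines of Python. This is a minimal sketch, assuming the standard-library `xml.etree.ElementTree` parser, which checks well-formedness only. The helper names `is_well_formed` and `unexpected_tags`, the sample documents, and the `ALLOWED` tag set are hypothetical; the second helper is a toy stand-in for validation, not a real DTD or schema validator.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the <dataset> sketch on the slide.
ALLOWED = {"title", "references", "author", "publication",
           "year", "subject", "date"}

def is_well_formed(xml_text):
    """Return True if the text parses as XML (tags properly nested and closed)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def unexpected_tags(xml_text):
    """Toy validity check: list child tags that the vocabulary does not allow."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if child.tag not in ALLOWED]

good = "<dataset><title>Lake survey</title><year>2009</year></dataset>"
broken = "<dataset><title>Lake survey</dataset>"  # <title> is never closed
odd = "<dataset><title>Lake survey</title><colour>blue</colour></dataset>"
```

Here `broken` fails the well-formedness check, while `odd` is well-formed but would not pass validation against a vocabulary that lacks `<colour>`.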
XML Technologies and Components
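[Figure: diagram of XML technologies and components. Category labels: Data model; Presentation; Programming; Transformation; Navigation & search. Technologies shown: SGML, XML, HTML, XHTML, DTDs, Schemas, Vocabularies, CSS, XSL-FO, XSLT, DOM, SAX, XPath, XPointer, XLink, XInclude, XQuery. Vocabulary domains: Scientific, Business, Legal, Medical, Computer, Telephony. Delivery labels: Server-side XML, Client-side HTML, Multipresentations.]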
Why XML?
• Separation of content and presentation
– Create content once and display that content with different presentations
– Benefits include:
• Allows content reuse and repurposing
• Allows a consistent look and feel across presentations
• Easier maintenance and updates
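The idea of creating content once and presenting it in different ways can be sketched in Python. This is a minimal illustration, assuming the standard-library `xml.etree.ElementTree`; the sample catalog, the element names, and the functions `to_html` and `to_text` are all hypothetical and stand in for what a stylesheet would normally do.

```python
import xml.etree.ElementTree as ET
from html import escape

# One content source (hypothetical sample data).
CONTENT = """<catalog>
  <book><title>XML Basics</title><author>A. Writer</author></book>
  <book><title>Data &amp; Documents</title><author>B. Editor</author></book>
</catalog>"""

def to_html(root):
    """Presentation 1: an HTML bulleted list."""
    items = "".join(
        "<li><em>{}</em> by {}</li>".format(
            escape(book.findtext("title")), escape(book.findtext("author")))
        for book in root.findall("book"))
    return "<ul>{}</ul>".format(items)

def to_text(root):
    """Presentation 2: plain text, one book per line."""
    return "\n".join(
        "{} / {}".format(book.findtext("title"), book.findtext("author"))
        for book in root.findall("book"))

root = ET.fromstring(CONTENT)
html_view = to_html(root)
text_view = to_text(root)
```

Both views are derived from the same `CONTENT`; changing the look of either one does not require touching the data.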
What Are the Benefits of XML?
• Structure ‐‐ to model data to any level of complexity
• Extensibility ‐‐ to define new tags as needed
• Validation ‐‐ to check data for structural correctness
• Media independence ‐‐ to publish content in multiple formats
• Vendor and platform independence ‐‐ to process any conforming document using standard commercial software or even simple text tools
• Single‐source document creation
Encoding metadata records
• An example of encoding and transformation
– Requires three files to work together:
• Schema file (defining the elements and structure of XML file)
• XML file (containing the data content that conforms to the structure defined by the schema)
• XSL file (relies on the structure defined in the schema file to transform the XML file into HTML for presentation or into another XML file for structural change)
The book catalog XML
Partial XML file. For the complete file, please see the WebCT learning module for this week.
Book catalog schema
XSL Stylesheet for the book catalog
Schema is the key
Schema modes:
a) single encoding schema,
b) multiple encoding schemas, and
c) networked encoding schemas
Encoding schemas: structures (1)
• EAD 2002 schema structure: single encoding schema
• Dublin Core XML schemas: multiple encoding schemas
Encoding schemas: structure (2)
DLESE metadata application profile XML schema structure
XML AND DATABASE
Is XML a database?
• Yes: only in the sense that it is a collection of data
• Sort of:
– Storage (XML instances)
– Schemas (DTDs and XML schemas)
– Query languages (XQuery, XPath, etc.)
– Programming interfaces (DOM, SAX, JDOM)
• However, it doesn't have:
– Indexes
– Security
– Data integrity
– Multi‐user access
– Triggers
– …
Bourret, Ronald. (2005). XML and databases.
http://www.rpbourret.com/xml/XMLAndDatabases.htm
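The two programming interfaces named above, DOM and SAX, differ in how they expose a document. This is a minimal sketch, assuming Python's standard-library `xml.dom.minidom` and `xml.sax` modules; the sample document and the class name `TitleHandler` are hypothetical. DOM loads the whole tree into memory, while SAX reports events as the parser reads the input.

```python
import xml.dom.minidom
import xml.sax

DOC = "<catalog><book><title>A</title></book><book><title>B</title></book></catalog>"

# DOM: parse the whole document into a tree, then navigate it.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# SAX: react to start/end/character events while the document streams by.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles
```

Neither interface provides indexes, transactions, or access control, which is the gap the list above points to.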
Database vs. XML
Why use a database? Why use XML?
Scenario: A digital collection catalog allows users to search, browse, and order copies, as well as check order status.
Questions:
• Why do you want to use a database?
• Why do you want to use XML?
• Which function will benefit the most from each of the two approaches?
Data-Centric XML
• Characteristics of data‐centric XML
– Fairly regular structure
– Fine‐grained data (that is, the smallest independent unit of data is at the level of a PCDATA‐only element or an attribute)
– Little or no mixed content
– XML is used as a data transport
– Physical structure is not important: the order of sibling elements, whether data is stored in attributes or PCDATA‐only elements, and whether entities are used
• Examples:
– Real‐time data feed from field instruments
– Simple metadata such as author and reference information
– Patient records
– …
Example of data-centric XML
<?xml version="1.0" encoding="UTF-8"?>
<Point-of-Contact class="vcard">
  <Name class="fn">John Smith</Name>
  <Address class="adr">
    <Street class="street-address">10 Tremont St.</Street>
    <City class="locality">Boston</City>
    <State class="region">MA</State>
  </Address>
  <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>
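Because data-centric XML is regular and fine-grained, it flattens naturally into a record. The sketch below assumes Python's standard-library `xml.etree.ElementTree`; the helper name `to_record` is hypothetical. It also reverses the top-level siblings to illustrate that sibling order carries no meaning for this kind of document.

```python
import xml.etree.ElementTree as ET

CONTACT = """<Point-of-Contact class="vcard">
  <Name class="fn">John Smith</Name>
  <Address class="adr">
    <Street class="street-address">10 Tremont St.</Street>
    <City class="locality">Boston</City>
    <State class="region">MA</State>
  </Address>
  <Telephone class="tel">617-123-4567</Telephone>
</Point-of-Contact>"""

def to_record(xml_text):
    """Flatten every leaf (text-only) element into a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {el.tag: el.text.strip()
            for el in root.iter()
            if len(el) == 0 and el.text}

record = to_record(CONTACT)

# Reverse the order of the top-level siblings and flatten again.
root = ET.fromstring(CONTACT)
root[:] = list(reversed(list(root)))
reordered_record = to_record(ET.tostring(root, encoding="unicode"))
```

The two dictionaries hold the same data, so nothing was lost by reordering the elements.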
Document-Centric XML (1)
• Characteristics of document‐centric XML:
– Irregular or complicated structure
– Larger‐grained data (the smallest unit of data might be at the level of an element with mixed content or the entire document itself)
– Lots of mixed content
– The order in which sibling elements and PCDATA occur is almost always significant
• Examples: books, email, advertisements, and almost any XHTML document
• Document‐centric documents are generally designed for human consumption
Document-Centric XML (2)
<Product>
  <Name>Turkey Wrench</Name>
  <Developer>Full Fabrication Labs, Inc.</Developer>
  <Summary>Like a monkey wrench, but not as big.</Summary>
  <Description>
    <Para>The turkey wrench, which comes in both right- and left-handed
    versions (skyhook optional), is made of the finest stainless steel. The
    Readigrip rubberized handle quickly adapts to your hands, even in the
    greasiest situations. Adjustment is possible through a variety of custom
    dials.</Para>
    <Para>You can:</Para>
    <List>
      <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
      <Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item>
      <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
    </List>
    <Para>The turkey wrench costs just $19.99 and, if you order now, comes with
    a hand-crafted shrimp hammer as a bonus gift.</Para>
  </Description>
</Product>
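Mixed content, where text and child elements are interleaved, is what makes order significant in document-centric XML. The sketch below assumes Python's standard-library `xml.etree.ElementTree`; the `<Para>` shown is a hypothetical variation on the slide's example (the `<Price>` element is invented for illustration), and `pieces` is a hypothetical helper. ElementTree stores the text before the first child in `.text` and the text after each child in that child's `.tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with mixed content (text interleaved with elements).
PARA = ('<Para>The turkey wrench costs just <Price>$19.99</Price> and comes '
        'with a <Link URL="Order.html">bonus gift</Link>.</Para>')

def pieces(element):
    """List text runs and child elements in document order."""
    out = []
    if element.text:
        out.append(("text", element.text))
    for child in element:
        out.append((child.tag, "".join(child.itertext())))
        if child.tail:
            out.append(("text", child.tail))
    return out

para = ET.fromstring(PARA)
ordered = pieces(para)
sentence = "".join(para.itertext())
```

Shuffling the entries in `ordered` would scramble the sentence, which is why a storage scheme for such documents has to record position.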
Document-Centric XML (3)
• For a simple system to handle document‐centric XML, you will need at least five tables:
– Attribute definition: defines attributes, including their type, legal values, and so on
– Element/attribute association: defines which attributes apply to which elements
– Content model definition: defines which elements can contain which other elements
– Attribute values: contains attribute values and pointers to the appropriate rows in the attribute definition and element/attribute association tables
– Element values: contains element values (PCDATA or pointers to other element values), the order in which the element occurs in its parent, a pointer to the row that contains the value of the parent element, and a pointer to the appropriate row in the element/attribute association table
• Converting XML documents into database tables is not always the best solution
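The five tables listed above could be laid out roughly as follows. This is only a sketch, assuming SQLite through Python's standard-library `sqlite3`; every table and column name is hypothetical, and the `element_name` column is an added convenience that the slide does not mention. The sample rows are inserted out of order on purpose to show why the element values table needs a position column.

```python
import sqlite3

# Hypothetical schema: one table per item in the list above.
SCHEMA = """
CREATE TABLE attribute_definition (
    attr_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT,
    legal_values TEXT
);
CREATE TABLE element_attribute_association (
    assoc_id INTEGER PRIMARY KEY,
    element_name TEXT NOT NULL,
    attr_id INTEGER REFERENCES attribute_definition(attr_id)
);
CREATE TABLE content_model_definition (
    parent_element TEXT NOT NULL,
    child_element TEXT NOT NULL
);
CREATE TABLE attribute_value (
    value TEXT,
    attr_id INTEGER REFERENCES attribute_definition(attr_id),
    assoc_id INTEGER REFERENCES element_attribute_association(assoc_id)
);
CREATE TABLE element_value (
    value_id INTEGER PRIMARY KEY,
    element_name TEXT NOT NULL,   -- assumption: not named on the slide
    pcdata TEXT,
    position INTEGER NOT NULL,    -- order of the element within its parent
    parent_id INTEGER REFERENCES element_value(value_id),
    assoc_id INTEGER REFERENCES element_attribute_association(assoc_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A <List> with two <Item> children, stored out of document order.
rows = [
    (1, "List", None, 0, None, None),
    (2, "Item", "Read more about wrenches", 1, 1, None),
    (3, "Item", "Order your own turkey wrench", 0, 1, None),
]
conn.executemany("INSERT INTO element_value VALUES (?, ?, ?, ?, ?, ?)", rows)

# Document order is recovered only by sorting on the stored position.
items = [row[0] for row in conn.execute(
    "SELECT pcdata FROM element_value WHERE parent_id = 1 ORDER BY position")]
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
```

Even this tiny list needs self-referencing rows and explicit ordering, which hints at why shredding document-centric XML into tables is often awkward.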
Data, Documents, and Databases
• Distinction is not always clear
– A data‐centric document, such as a self‐descriptive data file, might contain large‐grained, irregularly structured data, such as an abstract or project description
– A document‐centric document, such as a user's guidebook, might contain fine‐grained, regularly structured data (often metadata)
Approaches in XML storage
• Convert the XML data to tables, store them in a relational database, and translate queries to SQL
– Simpler
– Short‐term solution
• Design a database management system especially for XML data
Transferring Data
• Information that cannot be stored in databases:
– DTDs
– Physical structure
• Entity definition and usage
• The order in which attribute values and sibling elements occur
• The way in which binary data is stored (Base64 v. unparsed entity v. something else)
• CDATA sections
• Encoding information
• Information retrieved from databases:
– Contains no CDATA sections or entity usage
– Has no guaranteed order in the retrieved data
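The loss of physical structure can be seen in a small round trip. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` and `sqlite3`; the `<note>` document and the `notes` table are hypothetical. Once the parser hands over the value, the database stores only the characters, so the CDATA section cannot be reconstructed on the way out.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORIGINAL = "<note><![CDATA[x < y]]></note>"

# Parsing yields the character data only; the CDATA wrapper is not kept.
value = ET.fromstring(ORIGINAL).text

# Store the value in a (hypothetical) table and read it back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES (?)", (value,))
(stored,) = conn.execute("SELECT body FROM notes").fetchone()

# Rebuild the XML from the stored value.
rebuilt = ET.Element("note")
rebuilt.text = stored
round_tripped = ET.tostring(rebuilt, encoding="unicode")
```

The rebuilt document carries the same data as the original but escapes the `<` character instead of using a CDATA section.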
Mapping Doc Structure to DB Structure
• Transferring data from XML documents to databases, or vice versa, requires mapping the document structure to the database structure
– Table‐based mapping
– Object‐relational mapping
Table-based mapping (1)
• XML → Database
– XML document → Table
– Element / Attribute → Column
• Database → XML
– Table → <database>
– Row → <row>
– Column → <column1>, <column2>, …
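A table-based mapping in both directions can be sketched with SQLite. This is a minimal illustration, assuming Python's standard-library `sqlite3` and `xml.etree.ElementTree`; the `books` table, the `title` and `year` columns, the sample data, and the function names are all hypothetical. Each `<row>` element becomes one table row and each child element becomes one column.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_IN = """<database>
  <row><title>XML Basics</title><year>2001</year></row>
  <row><title>Data Modeling</title><year>2005</year></row>
</database>"""

COLUMNS = ("title", "year")  # fixed column list assumed by this sketch

def xml_to_table(xml_text, conn):
    """XML -> database: one <row> per table row, one child element per column."""
    conn.execute("CREATE TABLE books (title TEXT, year TEXT)")
    for row in ET.fromstring(xml_text).findall("row"):
        conn.execute("INSERT INTO books VALUES (?, ?)",
                     tuple(row.findtext(name) for name in COLUMNS))

def table_to_xml(conn):
    """Database -> XML: wrap each record in <row>, each value in a column element."""
    root = ET.Element("database")
    for record in conn.execute("SELECT title, year FROM books ORDER BY rowid"):
        row = ET.SubElement(root, "row")
        for name, value in zip(COLUMNS, record):
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
xml_to_table(XML_IN, conn)
xml_out = table_to_xml(conn)
```

The mapping works only because the document already has a flat, table-like shape; nested or mixed content would not fit this scheme.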
Table-based mapping (2)
Source: http://www.rpbourret.com/xml/DTDToDatabase.htm
Object-Relational Mapping
Source: http://www.rpbourret.com/xml/DTDToDatabase.htm
Storing XML: native XML database (1)
• The database is specialized for storing XML data and stores all components of the XML model intact.
• A native XML database may not actually be a standalone database at all.
Source: Staken, Kimbro. (2001). Introduction to native XML databases. Available at: http://www.xml.com/pub/a/2001/10/31/nativexmldb.html
Oracle’s XMLType: a native XML DB
• XMLType is a native data type that is used to store and manage XML documents in columns or tables
• XML can be stored in one of two ways:
– An XMLType column in a relational table
– An XML object in an XMLType table
Template-Driven Query (1)
• No predefined mapping between document structure and database structure
• Embed commands in a template that is processed by the data transfer middleware
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
Template-Driven Mapping (2)
Result from the query:
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 1998 13:43</Depart>
      <Arrive>Dec 13, 1998 01:21</Arrive>
    </Row>
    ...
  </Flights>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
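The role of the data transfer middleware can be imitated in a few lines. This is a sketch only, assuming Python's standard-library `sqlite3` and `xml.etree.ElementTree` in place of real middleware; the function `process_template`, its `table_tag` parameter, and the `Flights` table definition are hypothetical. Each `<SelectStmt>` element is replaced by the rows its query returns.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
<Intro>The following flights have available seats:</Intro>
<SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
<Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""

def process_template(template, conn, table_tag="Flights"):
    """Replace each <SelectStmt> with the rows returned by its query."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)
        names = [column[0] for column in cursor.description]
        results = ET.Element(table_tag)
        for record in cursor:
            row = ET.SubElement(results, "Row")
            for name, value in zip(names, record):
                ET.SubElement(row, name).text = str(value)
        results.tail = child.tail   # keep the surrounding whitespace
        root[index] = results       # swap the command for its result
    return root

# Hypothetical table holding the sample flight from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber TEXT, Depart TEXT, Arrive TEXT)")
conn.execute(
    "INSERT INTO Flights VALUES ('ACME', '123', 'Dec 12, 1998 13:43', 'Dec 13, 1998 01:21')")

result = process_template(TEMPLATE, conn)
```

The template author controls the document shape, and the query text decides which columns appear, so no mapping between document and database structure has to be defined in advance.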
SQL-based query language
• Use modified SELECT statements, the results of which are transformed to XML
SELECT Orders.SONumber,
       XMLELEMENT(NAME "Order",
                  XMLATTRIBUTES(Orders.SONumber AS SONumber),
                  XMLELEMENT(NAME "Date", Orders.Date),
                  XMLELEMENT(NAME "Customer", Orders.Customer)) AS xmldocument
FROM Orders
Example order result from the query:
<Order SONumber="123">
  <Date>10/29/02</Date>
  <Customer>Gallagher Industries</Customer>
</Order>
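SQLite has no `XMLELEMENT` function, so the query above cannot be run there directly. As a rough emulation, the sketch below runs a plain SELECT and builds the same element structure on the client side, assuming Python's standard-library `sqlite3` and `xml.etree.ElementTree`; the table definition and the function `order_as_xml` are hypothetical. In a database that supports SQL/XML, the server would do this work inside the query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical Orders table holding the sample order from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (SONumber TEXT, Date TEXT, Customer TEXT)")
conn.execute("INSERT INTO Orders VALUES ('123', '10/29/02', 'Gallagher Industries')")

def order_as_xml(record):
    """Build <Order SONumber=...><Date/><Customer/></Order> from one result row."""
    so_number, date, customer = record
    order = ET.Element("Order", {"SONumber": so_number})
    ET.SubElement(order, "Date").text = date
    ET.SubElement(order, "Customer").text = customer
    return ET.tostring(order, encoding="unicode")

documents = [order_as_xml(record) for record in
             conn.execute("SELECT SONumber, Date, Customer FROM Orders")]
```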
Summary
• XML documents need database management system functions to solve issues in storage, security, concurrency control, version control, and data and referential integrity
• Data conversion uses two approaches: table‐based and object‐relational
• Native XML databases specialize in storing XML data and store all components of the XML model intact
• XML query languages are closely related to XPath, XML Schema, and XML Infoset
Exercise: data conversion
• Convert XML data to a relational database
• Convert relational database data into XML files