IT420: Database Management and
Organization
XML
21 April 2006
Adina Crăiniceanu
www.cs.usna.edu/~adina
Overview
 From HTML to XML
 DTDs
 Transforming XML: XSLT
Kroenke, Database Processing
Introduction
 Database processing and document processing
need each other
 Database processing needs document processing for
transmitting and expressing database views
 Document processing needs database processing for
storing and manipulating data
 Internet expansion made the need obvious
XML
 XML: Extensible Markup Language, developed
in the late 1990s
 Hybrid of document processing and database
processing
 It provides a standardized yet customizable way to
describe the content of documents
 A recommendation from the W3C
 XML = data + structure
 XML generated by applications
 XML consumed by applications
 Easy access: across platforms, organizations
XML
What is this? What does it mean?
<h2>Madison</h2>
HTML: How to display information on a
browser.
HTML: no “semantic” information, i.e. no
meaning ascribed to tags
XML: Semantic information
<presidents>
<name>Madison</name>
</presidents>
<US_cities>
<Wisconsin>
<name>Madison</name>
</Wisconsin>
</US_cities>
<US_Colleges>
<name>Madison, U of Wisc</name>
<name>Madison, James (JMU)</name>
</US_Colleges>
XML vs. HTML
 XML is better than HTML because
It provides a clear separation between document
 structure
 content
 materialization
 It is standardized but allows for extension by developers
 XML tags represent the semantics of their data
Why is XML important with regard
to databases?
 XML provides a standardized way to
describe, validate, and materialize any
database view.
 Share information between disparate systems
 Materialize data any way you want
 Display data on the web
 Display data on a salesperson's computer
 Display data on mobile device
How does XML work?
Three Primary Components to XML
 Data has a structure
 Document Type Definitions (DTDs)
 XML Schemas can be used to describe the
content of XML documents
 Data has content
 XML document
 Data has materializations
 Extensible Stylesheet Language Transformations
(XSLT)
If we want to share information, is
structure important?
 Structure provides meaning
What is the meaning of this bit stream??
10111011000101110110100101010101101010110110101….
The bit stream has meaning if we assign structure
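A small sketch of the point above: the same bytes yield different values depending on the structure we assign to them. The four bytes below are arbitrary and chosen only for illustration; they are not taken from the bit stream on the slide.

```python
import struct

raw = b"ABCD"  # four arbitrary bytes, used only as an illustration

# Structure 1: four ASCII characters.
as_text = raw.decode("ascii")

# Structure 2: one big-endian unsigned 32-bit integer.
(as_uint32,) = struct.unpack(">I", raw)

# Structure 3: two big-endian unsigned 16-bit integers.
as_uint16_pair = struct.unpack(">HH", raw)

print(as_text)         # ABCD
print(as_uint32)       # 1094861636
print(as_uint16_pair)  # (16706, 17220)
```

Without an agreed structure, a receiver cannot tell which of these readings the sender intended.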
Example: XML DTD & Document
XML DTD
 XML document consists of two sections:
 Document Type Declaration, which holds or references the DTD
 The declaration begins with <!DOCTYPE document_type_name
 Document data
 XML documents can be
 Type-valid: well-formed and conforming to its DTD
 Well-formed but not type-valid, because either
 It violates the structure of its DTD, or
 It has no DTD
 DTD may be stored externally so many documents can
be validated against the same DTD
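A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser. This parser is non-validating: it reports whether a document is well-formed but does not check it against a DTD, so type-validity would need a validating parser. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML. No DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<customer><name>Lynda</name></customer>"
bad = "<customer><name>Lynda</customer></name>"  # tags closed in the wrong order

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```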
Create XML Documents from
Relational DB Data
 Most RDBMS can output data in XML
format
 MySQL: mysql -u root --xml
 For SQL Server:
 SELECT . . . FOR XML RAW | AUTO [, ELEMENTS] | EXPLICIT
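To illustrate the general idea of turning query results into XML, here is a rough sketch using Python's standard sqlite3 and xml.etree.ElementTree modules as a stand-in for a full RDBMS. The table, the helper rows_to_xml, and the element names resultset and row are assumptions made for this example; the sketch does not reproduce the exact output format of MySQL or SQL Server, and it assumes column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(cursor, root_tag="resultset", row_tag="row"):
    """Wrap each result row in an element, one child element per column."""
    columns = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for col, value in zip(columns, record):
            # NULL becomes an empty element in this sketch.
            ET.SubElement(row, col).text = "" if value is None else str(value)
    return root

# In-memory stand-in database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Michelle", "Correlli"), ("Lynda", "Jaynes")],
)

cur = conn.execute("SELECT firstname, lastname FROM customer ORDER BY lastname")
xml_text = ET.tostring(rows_to_xml(cur), encoding="unicode")
print(xml_text)
```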
Lab exercise
 Restore some database in MySQL
 Open MySQL command line using
 mysql -u root --xml
XSLT
 XSLT (Extensible Stylesheet Language Transformations) may be
used to materialize (transform) XML documents
using an XSL document
 From XML documents into HTML or into XML in
another format
 XSLT is a declarative transformation language
 XSLT uses stylesheets to indicate how to
transform the elements of the XML document
into another format
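Python's standard library has no XSLT processor, so the sketch below is not XSLT. It is a hand-written transformation that plays the role a stylesheet would play: it maps elements of a source XML document to an HTML list. The sample data is a trimmed, hypothetical customer list and the function name to_html is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed sample input (addresses omitted for brevity).
SOURCE = """<customerlist>
  <customer><name><firstname>Michelle</firstname><lastname>Correlli</lastname></name></customer>
  <customer><name><firstname>Lynda</firstname><lastname>Jaynes</lastname></name></customer>
</customerlist>"""

def to_html(xml_text: str) -> str:
    """Map each customer element to an HTML list item."""
    src = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for customer in src.findall("customer"):
        first = customer.findtext("name/firstname")
        last = customer.findtext("name/lastname")
        ET.SubElement(ul, "li").text = f"{last}, {first}"
    return ET.tostring(ul, encoding="unicode", method="html")

html = to_html(SOURCE)
print(html)  # <ul><li>Correlli, Michelle</li><li>Jaynes, Lynda</li></ul>
```

An XSLT stylesheet would express the same mapping declaratively, as templates matched against the source elements, rather than as an explicit loop.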
Example: External DTD
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT customerlist (customer+)>
<!ELEMENT customer (name, address)>
<!ELEMENT name (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address (street+, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
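One way to read the element declarations above is as regular expressions over child element names. The sketch below translates them by hand into Python regexes and checks an element tree against them. It is an illustration only, not a DTD parser or a real validator: the names CONTENT_MODELS, TEXT_ONLY, and conforms are hypothetical, and it ignores text content and the document's root declaration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the declarations into regexes over child tag names
# (each child name is followed by one space).
CONTENT_MODELS = {
    "customerlist": r"(customer )+",
    "customer": r"name address ",
    "name": r"firstname lastname ",
    "address": r"(street )+city state zip ",
}
# Elements declared as (#PCDATA): text only, no child elements.
TEXT_ONLY = {"firstname", "lastname", "street", "city", "state", "zip"}

def conforms(element) -> bool:
    """Check child-element order recursively against the models above."""
    children = "".join(child.tag + " " for child in element)
    if element.tag in TEXT_ONLY:
        return children == ""
    model = CONTENT_MODELS.get(element.tag)
    if model is None or re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)

valid = ET.fromstring(
    "<address><street>2 Elm Street</street><city>New York City</city>"
    "<state>NY</state><zip>02123-7445</zip></address>"
)
invalid = ET.fromstring(  # violates (street+, ...): no street element
    "<address><city>New York City</city><state>NY</state>"
    "<zip>02123-7445</zip></address>"
)

print(conforms(valid))    # True
print(conforms(invalid))  # False
```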
Example: XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE customerlist SYSTEM "http://localhost/Support-Files-Chap-13-XML/CustomerList.dtd">
<?xml-stylesheet type="text/xsl" href="http://localhost/Support-Files-Chap-13-XML/CustomerList-StyleSheet.xsl"?>
<customerlist>
<customer>
<name>
<firstname>Michelle</firstname>
<lastname>Correlli</lastname>
</name>
<address>
<street>1824 East 7th Avenue</street>
<street>Suite 700</street>
<city>Memphis</city>
<state>TN</state>
<zip>32123-7788</zip>
</address>
</customer>
<customer>
<name>
<firstname>Lynda</firstname>
<lastname>Jaynes</lastname>
</name>
<address>
<street>2 Elm Street</street>
<city>New York City</city>
<state>NY</state>
<zip>02123-7445</zip>
</address>
</customer>
</customerlist>
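A consuming application might read such a document as sketched below, assuming Python's standard xml.etree.ElementTree. The embedded document is a shortened copy with one customer and a relative, hypothetical DTD reference; the parser used here does not fetch or apply the external DTD.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the customer list; the SYSTEM identifier is not fetched.
DOCUMENT = """<!DOCTYPE customerlist SYSTEM "CustomerList.dtd">
<customerlist>
  <customer>
    <name><firstname>Michelle</firstname><lastname>Correlli</lastname></name>
    <address>
      <street>1824 East 7th Avenue</street><street>Suite 700</street>
      <city>Memphis</city><state>TN</state><zip>32123-7788</zip>
    </address>
  </customer>
</customerlist>"""

root = ET.fromstring(DOCUMENT)

customers = []
for customer in root.iter("customer"):
    first = customer.findtext("name/firstname")
    last = customer.findtext("name/lastname")
    customers.append({
        "name": f"{first} {last}",
        # street+ in the DTD means one or more street elements.
        "streets": [s.text for s in customer.findall("address/street")],
        "zip": customer.findtext("address/zip"),
    })

print(customers)
```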
XSL Stylesheet for CustomerList
Example: XML → Browser
Show XSL document example
CustomerList.xml
XML Review
 STRUCTURE: DTD or XML Schema
 CONTENT: XML document
 MATERIALIZATIONS: XSL document
Sharing Data: Transparency
[Diagram: Business A and Business B each hold a database of raw data, produce XML data through an XSL transformation, and validate it against a DTD; the two businesses SHARE data through an agreed-upon structure.]
Example XML Industry
Standards
 Accounting
 Extensible Financial Reporting Markup Language (XFRML)
 Architecture and Construction
 Architecture, Engineering, and Construction XML (aecXML)
 Automotive
 Automotive Industry Action Group (AIAG)
 XML for the Automotive Industry (SAE J2008)
 Banking
 Banking Industry Technology Secretariat (BITS)
 Bank Internet Payment System (BIPS)
 Electronic Data Interchange
 Data Interchange Standards Association (DISA)
 XML/EDI Group
What About XML Queries?
 XPath
 A single-document language for “path
expressions”
 Not unlike regular expressions on tags
 E.g. /Contract/*/UnitPrice,
/Contract//UnitPrice, etc.
 XSLT
 XPath plus a language for formatting output
 XQuery
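The two path expressions above can be tried with the limited XPath subset in Python's standard xml.etree.ElementTree. In this sketch the paths are written relative to the root Contract element, and the Item and Options elements and the prices are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

contract = ET.fromstring("""<Contract>
  <Item><UnitPrice>10.00</UnitPrice></Item>
  <Item><Options><UnitPrice>2.50</UnitPrice></Options></Item>
</Contract>""")

# /Contract/*/UnitPrice: UnitPrice elements exactly two levels below the root.
grandchildren = [e.text for e in contract.findall("./*/UnitPrice")]

# /Contract//UnitPrice: UnitPrice elements at any depth below the root.
descendants = [e.text for e in contract.findall(".//UnitPrice")]

print(grandchildren)  # ['10.00']
print(descendants)    # ['10.00', '2.50']
```

The `*` step matches exactly one intermediate element, while `//` matches any number of intermediate levels, which is why the second query also finds the nested price.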
Conclusions
 XML: The new universal data exchange
format
 Unlike HTML, XML = data + semantics
 STRUCTURE: DTD or XML Schema
 CONTENT: XML document
 MATERIALIZATIONS: XSL document
 More flexible than relational model
 More difficult to query: an open research area