XML-intro

XML – a data sharing standard
DSC340
Mike Pangburn
Data-sharing challenge
• Companies need to “talk” across different locations, systems, and software
• Most database systems (and other software applications, e.g., Excel) can read text files formatted as XML.
Blind Men and Elephants
The ol’ standby: tab (or comma) delimited data… can be pretty useless!
Who: authored it? Whom to contact about the data?
What: are the contents of the database?
When: was it collected? Processed? Finalized?
Where: was the study done?
Why: was the data collected?
How: was the data collected? Processed? Verified?
Metadata
• Literally, “data about data”
• “A set of data that describes and gives information about other data” (Oxford English Dictionary)
Early Example of Metadata
Is HTML a good way to share data?
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML: Tags provide meaning (metadata)
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley
</publisher>
<year> 1995 </year>
</book>
…
</bibliography>
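As a minimal sketch of why the tags matter, the snippet below reads a bibliography like the one above using Python's standard `xml.etree.ElementTree` module. Python is an assumption here (the slides do not prescribe a language), and `list_books` is a hypothetical helper name. The title stays elided, as on the slide.

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the slide; the title is elided there, so it stays elided here.
BIBLIOGRAPHY_XML = """
<bibliography>
  <book>
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""


def list_books(xml_text):
    """Return one dict per <book>, using tag names to locate each field."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title").strip(),
            "authors": [a.text.strip() for a in book.findall("author")],
            "publisher": book.findtext("publisher").strip(),
            "year": int(book.findtext("year")),
        })
    return books


books = list_books(BIBLIOGRAPHY_XML)
print(books[0]["authors"])
```

Because each value sits inside a named tag, the program can ask for "the authors" or "the year" directly. The HTML version only says which text is italic or on a new line.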
XML vs. HTML
HTML
<h1> DELTA </h1>
<h2> 101 </h2>
<h3> Atlanta </h3>
<h3> Brussels </h3>
<p> flight information </p>
XML
<airline>DELTA </airline>
<number>101 </number>
<from> Atlanta </from>
<to> Brussels </to>
<description> flight information </description>
Another example: iTunes Library is XML
<dict>
<key>Track ID</key><integer>617</integer>
<key>Name</key><string>Take Five</string>
<key>Artist</key><string>Dave Brubeck Quartet</string>
<key>Album</key><string>Time Out</string>
<key>Genre</key><string>Jazz</string>
<key>Kind</key><string>AAC audio file</string>
<key>Size</key><integer>5892093</integer>
<key>Total Time</key><integer>363578</integer>
….
</dict>
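The key/value layout above is Apple's property-list (plist) XML format, which Python's standard `plistlib` module can read. The sketch below is illustrative only: it uses a subset of the keys shown, and it wraps the `<dict>` in a minimal `<plist>` root (an assumption, so the fragment can be parsed on its own).

```python
import plistlib

# A subset of the track entry from the slide, wrapped in a minimal <plist> root
# (the wrapper is an assumption made so this fragment parses on its own).
TRACK_PLIST = b"""<plist version="1.0">
<dict>
  <key>Track ID</key><integer>617</integer>
  <key>Name</key><string>Take Five</string>
  <key>Artist</key><string>Dave Brubeck Quartet</string>
  <key>Album</key><string>Time Out</string>
  <key>Genre</key><string>Jazz</string>
  <key>Size</key><integer>5892093</integer>
</dict>
</plist>"""

# Each <key> becomes a dictionary key; <integer> and <string> become int and str.
track = plistlib.loads(TRACK_PLIST, fmt=plistlib.FMT_XML)
print(track["Name"], "by", track["Artist"])
print(type(track["Size"]).__name__)
```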
Acct./Fin XML example
• XBRL (eXtensible Business Reporting Language) is a language for the electronic communication of business and financial data.
• For example, company net profit has its own unique tag.
• Edgar Online (SEC company information) uses XBRL
  • The solution will allow users to request XBRL formatted data for all US equities from within Microsoft Excel, custom templates, and the web.
HTML vs. XML
• HTML started with very few tags, but…
  • HTML now has many tags, and more keep being added over time
  • Messy, yet not customizable
• XML has very few standard tags
  • You add custom tags that are particular to your data needs
XML helps solve the “Is this an elephant?” problem
More strict tagging rules than HTML
The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag like
this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Opening and closing tags must have the same
case:
<message>This is correct</message>
<Message>This is incorrect</message>
More strict tagging rules than HTML
In HTML, improperly nested tags like the following are
frowned upon, but will not cause an error:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within
each other like this:
<b><i>This text is bold and italic</i></b>
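The stricter rules can be checked mechanically. As a minimal sketch (Python and the helper name `is_well_formed` are assumptions, not part of the slides), the snippet below feeds the examples above to the standard-library XML parser and records which ones it rejects. The paragraph examples are wrapped in a hypothetical `<doc>` root, since an XML document needs exactly one top-level element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "closed tags": "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>",
    "missing closing tags": "<doc><p>This is a paragraph<p>This is another paragraph</doc>",
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>This text is bold and italic</b></i>",
    "good nesting": "<b><i>This text is bold and italic</i></b>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

A browser would quietly render all five as HTML. An XML parser stops at the first violation, which is what makes XML dependable for data exchange.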
Visualizing XML data: a “Tree”
<data>
<person id="o555">
<name> Mary </name>
<address>
<street> Maple </street>
<no> 345 </no>
<city> Seattle </city>
</address>
</person>
<person>
<name> John </name>
<address> Thailand </address>
<phone> 23456 </phone>
</person>
</data>
[Figure: the XML above drawn as a tree. Element nodes: data, person, name, address, street, no, city, phone. Attribute node: id (value o555). Value nodes: Mary, Maple, 345, Seattle, John, Thailand, 23456.]
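The tree view can also be produced by a program. The following is a small sketch, assuming Python's standard `xml.etree.ElementTree`; the function name `describe` is hypothetical. It walks the `<data>` document and labels each element node, attribute node, and value node.

```python
import xml.etree.ElementTree as ET

DATA_XML = """<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>"""


def describe(element, depth=0):
    """Return indented lines naming each element, attribute, and value node."""
    indent = "  " * depth
    lines = [f"{indent}element: {element.tag}"]
    for attr_name, attr_value in element.attrib.items():
        lines.append(f"{indent}  attribute: {attr_name} = {attr_value}")
    text = (element.text or "").strip()
    if text:  # only leaf-like elements carry a non-blank text value here
        lines.append(f"{indent}  value: {text}")
    for child in element:  # recurse into child elements, one level deeper
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(DATA_XML))
print("\n".join(tree_lines))
```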
Database data vs. XML Data
Database table “persons”:
name    phone
John    3634
Sue     6343
Dick    6363

XML:
<persons>
<row> <name>John</name> <phone>3634</phone> </row>
<row> <name>Sue</name> <phone>6343</phone> </row>
<row> <name>Dick</name> <phone>6363</phone> </row>
</persons>
[Figure: the XML drawn as a tree. The root node persons has three row children, each with a name and a phone.]
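To make the correspondence concrete, here is a minimal sketch that copies each `<row>` of the XML into a row of a database table. It assumes Python with the standard `sqlite3` and `xml.etree.ElementTree` modules and an in-memory database; `load_persons` is a hypothetical helper name.

```python
import sqlite3
import xml.etree.ElementTree as ET

PERSONS_XML = """<persons>
  <row><name>John</name><phone>3634</phone></row>
  <row><name>Sue</name><phone>6343</phone></row>
  <row><name>Dick</name><phone>6363</phone></row>
</persons>"""


def load_persons(xml_text):
    """Create an in-memory 'persons' table with one record per <row> element."""
    rows = [
        (row.findtext("name").strip(), row.findtext("phone").strip())
        for row in ET.fromstring(xml_text).findall("row")
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (name TEXT, phone TEXT)")
    conn.executemany("INSERT INTO persons VALUES (?, ?)", rows)
    return conn


conn = load_persons(PERSONS_XML)
table = conn.execute("SELECT name, phone FROM persons ORDER BY rowid").fetchall()
print(table)
```

Phone numbers are stored as text here, a design choice for the sketch, since they are identifiers rather than quantities.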
XML data can usually fit a DBMS table…

Consider the case of: missing attributes
<person> <name> John</name>
<phone>1234</phone>
</person>
<person> <name>Joe</name>
</person>
(Joe has no phone!)

An acceptable fit with a table, even though blanks are deemed a bit undesirable in database tables:
name    phone
John    1234
Joe     -
…but the fit is not always good

Problematic case: repeated attributes
<person> <name> Mary</name>
<phone>2345</phone>
<phone>3456</phone>
</person>
(Mary has two phones!)

A poor fit with a table, because a database design rule is that you should not have two columns with the same name:
name    phone   phone
Mary    2345    3456
If the XML data had distinct “home” and “cell” tags (columns), the issue would go away.
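One common way around the repeated-phone problem, offered here as an illustration rather than as the slides' recommendation, is to store phones in a second table with one row per (name, phone) pair. The sketch below assumes Python's standard `xml.etree.ElementTree`. The `<people>` wrapper element and the `to_tables` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three cases from the slides, wrapped in a hypothetical <people> root.
PEOPLE_XML = """<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</people>"""


def to_tables(xml_text):
    """Split the XML into a persons list and a phones list of (name, phone) pairs."""
    persons, phones = [], []
    for person in ET.fromstring(xml_text).findall("person"):
        name = person.findtext("name").strip()
        persons.append(name)
        # Zero, one, or many <phone> elements all map cleanly to rows.
        for phone in person.findall("phone"):
            phones.append((name, phone.text.strip()))
    return persons, phones


persons, phones = to_tables(PEOPLE_XML)
print(persons)
print(phones)
```

With this layout Joe simply has no rows in the phones list and Mary has two, so neither blank cells nor duplicate column names are needed.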
Formatting (visualizing) XML
data
• Formatting (visualizing in a web page) XML data requires a separate file
  • What is that separate file? It’s called a “style sheet”
• There are a couple of different standards for style sheets
  • XML-specific standard: XSL
  • Flexible standard: CSS
    • CSS can be used with both .xml and .html files
    • We will look at CSS after XML