XML Tutorial - SDSC Staff Home Pages

XML: The Big Picture and Some Gory Details
(A brief tutorial with an eye towards e-records and archival)
Bertram Ludaescher
ludaesch@sdsc.edu
Data Intensive Computing Environments
(DICE) Group
San Diego Supercomputer Center, UCSD
DICE Members
Staff
• Reagan Moore
• Chaitan Baru
• Amarnath Gupta
• Bertram Ludäscher
• Richard Marciano
• Arcot Rajasekar
• Wayne Schroeder
• Michael Wan
• Ilya Zaslavsky
• Bing Zhu
• + NN * 4
Students
• Pratik Mukhopadhyay
• Azra Mulic
• Kevin Munroe
• Paul Nguyen
• Michail Petropolis
• Nicholas Puz
• Pavel Velikhov
• +/- NN
XML Tutorial, Bertram Ludäscher
Tutorial Outline
• Roadmap & Overview
• What about XML vs. E-records and Archives?
(or: why it’s good to be here ;-)
• XML 101
• XML 232
• Querying & Transforming XML
• Mediation of Information using XML (MIX)
• Other Projects...
Some History
(or: from fat via lean…
• SGML (Standard Generalized Markup Language)
– ISO Standard, 1986, for data storage & exchange
– Metalanguage for defining languages (through DTDs)
– A famous SGML language: HTML!!
– Separation of content and display
– Used in U.S. gvt. & contractors, large manufacturing companies, technical info. publishers, ...
– SGML reference is 600 pages long
• XML (eXtensible Markup Language)
– W3C (World Wide Web Consortium, http://www.w3.org/XML/) recommendation in 1998
– Simple subset (80/20 rule) of SGML: “ASCII of the Web”, “Semantic Web”
– XML specification is 26 pages long
… to skinny and back! )
• Canonical XML
– “normalization”, equivalence testing of XML documents
• SML (Simple Markup Language)
– “Reduce to the max”: No Attributes / No Processing Instructions (PI) / No
DTD / No non-character entity-references / No CDATA marked sections /
Support for only UTF-8 character encoding / No optional features
• XML Schema
– XML Schema definition language
– Back to complex:
• Part I (Structures), Part II (Data Types), Part III, er, Part 0 (Primer)
• X-Zoo (Xoo?), “Brave New X-World”
– Specifications: CSS • Digital Signatures • ebxml Project Teams • ebXML • IETF Specifications • Internationalization • IOTP (Internet Open Trading Protocol) • OASIS • Requirements Documents • SMIL • SVG (Scalable Vector Graphics) • Topic Maps • W3C Activity Pages • W3C Notes • W3C Standards • W3C Standards-in-progress • WAP • WebDAV • XHTML • XLink • XPath • XSLT
– Vocabularies: DTDs • Music • P3P • RDF • RSS • SMIL • W3C Standards • W3C Standards-in-progress • WML • XHTML • XSL FO's • XSLT • XUL
– Vertical Industries: Advertising • Commerce • Consortiums • Construction • Food • Insurance • Legal • Medical • Music • OASIS • Real Estate • Science • Space Exploration • Telecommunications • Travel • Weather
… but …
FEAR NOT!
Back to the Future
(or Archival for the Past...)
A time traveler sends a message in the virtual bottle, containing parts of the universal library of human and post-human mankind, back into the last third of the 20th century...
• ... when the Web, XML, WAP, B2B, and Petabytes were unheard of
• ... RAM was so precious that it was ok to deal with nibbles
• ... MS-DOS was still called CP/M
• ... and in fact Bill hadn’t moved into the garage yet but worked on a homework assignment by Christos, trying to sort pancakes faster (Gates, W.H. and Papadimitriou, C. "Bounds for Sorting by Prefix Reversal." Discr. Math. 27, 47-57, 1979.)
• Task: make sense out of the futuristic message in the past!
Our past futurist’s (future archeologist’s?)
supercomputer looked like this …
62k CP/M VER 2.23 (Z80/DJDMA/VT100)
A>dir
A: ARK      COM : ASM      COM : CLS      COM : COPY     ASM
A: CPM2     HLP : CBIOS    ASM : CBOOT    ASM : DDT      COM
A: DDTZ     COM : DUMP     COM : ED       COM : EDFILE   COM
A: ERAQ     COM : FORMAT   ASM : FORMAT   COM : HELP     COM
A: HELP     HLP : LIB      COM : LINK     COM : LINK     HLP
A: LOAD     COM : LS       COM : LT       COM : LU       COM
A: LU       HLP : MAC      COM : MAC      HLP : MOUNT    ASM
A: MOVCPM   COM : PIP      COM : PTRDSK   ASM : PTRDSK   COM
A: PUTCPM   ASM : PUTCPM   COM : SAP      COM : SQ       COM
A: STAT     COM : SUBMIT   COM : SURVEY   COM : SYSGEN   SUB
A: THISSIM  HLP : UNARK    COM : UNCR     COM : UNERASE  COM
A: UNZIP    COM : USQ      COM : VDE      COM : XSUB     COM
A: MBASIC   HLP : MBASIC   COM : WS       HLP
A>mbasic
BASIC-80 Rev. 5.22
[CP/M Version]
32783 Bytes free
Ok
Ever wondered where the 8 letter filenames, 3 letter extensions came from? ;-)
Message in the bottle: 1
ÐÏ^Qࡱ^Zá^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@>^@^C^@þÿ
^@^F^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@#^@^@^@^@^@^@^@^@^P^@^@%^@^@
^@^A^@^@^@þÿÿÿ^@^@^@^@"^@^@^@ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿì¥Á^@
q^@^D^@^@^@^R¿^@^@^@^@^@^@^P^@^@^@^@^@^D^@^@Ç^G^@^@^N^@bjbjt+t+^@^@
^@
^@Some Quotations from the Universal Library^M1 Famous Quotes^M1.1 By William I^M[2, Sonnet
XVIII]^MShall I compare thee to a summer's day?^MThou art more lovely and more temperate.^MRough winds
do shake the darling buds of May,^MAnd summer's lease hath all too short a date.^MSometime too hot the eye
of heaven shines,^MAnd often is his gold complexion dimmed.^MAnd every fair from fair some declines,^MBy
chance or nature's changing course untrimmed.^MBut thy eternal summer shall not fade,^MNor lose possession
of that fair thou owest,^MNor shall Death brag thou wander'st in his shade^MWhile in eternal lines to time thou
growest.^MSo long as men can breathe, or eyes can see,^MSo long live this, and this gives life to thee.^M1.2
By William II^M[1, p.265]^M\223The obvious mathematical breakthrough would be development of^Man easy
way to factor large prime numbers."^MReferences^M[1] W. H. Gates. The Road Ahead. Viking Penguin,
1995.^M[2] W. Shakespeare. The Sonnets of
Shakespeare.609.^M^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ^A^@þÿ^C^@^@ÿÿÿÿ^F^B^@^@^@^@^@À^@^@^
@^@^@^@F^X^@^@^@Microsoft Word
Document^@^@^@^@MSWordDoc^@^P^@^@^@Word.Document.8^@ô9²q^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^
Message in the bottle: 2
{\rtf1\ansi\ansicpg1252\uc1
\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose02020603050405020304}Times New Roman;}\
{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}^M
{\f17\froman\fcharset238\fprq2 Times New Roman CE;}{\f18\froman\fcharset204\fprq2 Times New Roman
Cyr;}{\f20\froman\fcharset161\fprq2 Times New R\
oman Greek;}{\f21\froman\fcharset162\fprq2 Times New Roman Tur;}^M
…
Some Quotations from the Universal Library^M
\par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 1 Famous Quotes^M
\par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.1 By William I^M
\par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [2, Sonnet XVIII]^M
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 Shall I compare thee to a summer's day?^M
\par Thou art more lovely and more temperate.^M
\par Rough winds do shake the darling buds of May,^M
…
\par }\pard\plain \s3\sb240\sa60\keepn\widctlpar\outlinelevel2\adjustright \f1\cgrid {\cgrid0 1.2 By William II^M
\par }\pard\plain \s4\sb240\sa60\keepn\widctlpar\outlinelevel3\adjustright \b\f1\cgrid {\cgrid0 [1, p.265]^M
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 \ldblquote The obvious mathematical breakthrough would be
development of^M
\par an easy way to factor large prime numbers."^M
\par }\pard\plain \s2\sb240\sa60\keepn\widctlpar\outlinelevel1\adjustright \b\i\f1\cgrid {\cgrid0 References^M
\par }\pard\plain \widctlpar\adjustright \fs20\cgrid {\f1\fs24\cgrid0 [1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.^M
\par [2] W. Shakespeare. The Sonnets of Shakespeare. 1609.}{\fs28 ^M
\par }}
Message in the bottle: 3
%!PS-Adobe-2.0
%%Creator: dvipsk 5.58f Copyright 1986, 1994 Radical Eye Software
%%Title: msg.dvi
%%Pages: 1
…
/X{S N}B /TR{translate}N /isls false N /vsize 11 72 mul N /hsize 8.5 72
mul N /landplus90{false}def /@rigin{isls{[0 landplus90{1 -1}{-1 1}
ifelse 0 0 0]concat}if 72 Resolution div 72 VResolution div neg scale
…
TeXDict begin 39158280 55380996 1000 600 600 (msg.dvi)
@start /Fa 16 117 df<0000000001C0000000000003C0000000000003C00000000000
07C000000000000FC000000000000FC000000000001FC000000000001FE000000000003F
E000000000003FE000000000007FE00000000000FFE00000000000EFE00000000001EFE0
0000000001CFE000000000038FE000000000038FE000000000070FE000000000070FE0
…
%%EndSetup
1 0 bop 659 872 a Ff(Some)44 b(Quotations)f(from)f(the)i(Univ)l(ersal)h
(Library)515 1470 y Fe(1)134 b(F)-11 b(amous)45 b(Quotes)515
1669 y Fd(1.1)112 b(By)37 b(William)d(I)515 1822 y Fc([2)o(,)d(Sonnet)h
(XVI)s(I)s(I])722 2004 y Fb(Shall)c(I)g(compare)e(thee)i(to)f(a)g
(summer's)g(da)n(y?)722 2104 y(Thou)h(art)f(more)f(lo)n(v)n(ely)h(and)g
(more)g(temp)r(erate.)722 2204 y(Rough)g(winds)h(do)f(shak)n(e)g(the)h
(darling)e(buds)i(of)g(Ma)n(y)-7 b(,)722 2303 y(And)28
b(summer's)g(lease)e(hath)i(all)f(to)r(o)h(short)f(a)g(date.)722
2403 y(Sometime)h(to)r(o)f(hot)h(the)g(ey)n(e)f(of)h(hea)n(v)n(en)e
(shines,)722 2503 y(And)i(often)g(is)g(his)f(gold)g(complexion)g
Message in the bottle: 4
\documentclass{article}
\begin{document}
\title{Some Quotations from the Universal Library}
...
\section{Famous Quotes}
\subsection{By William I}
\textbf{\cite[Sonnet XVIII]{shakespeare-sonnets-1609}}
\begin{verse}
Shall I compare thee to a summer's day?\\
Thou art more lovely and more temperate. \\
Rough winds do shake the darling buds of May, \\
And summer's lease hath all too short a date. \\
Sometime too hot the eye of heaven shines, \\
And often is his gold complexion dimmed. \\
…
\qquad So long as men can breathe, or eyes can see,\\
\qquad So long live this, and this gives life to thee. \\
\end{verse}
...
\bibliographystyle{abbrv}
\bibliography{msg}
\end{document}
Message in the bottle: 5
<HTML>
<HEAD>
<TITLE>Some Quotations from the Universal Library</TITLE>
</HEAD>
<BODY>
<B><FONT FACE="Arial" SIZE=5><P>Some Quotations from the Universal Library</P>
</FONT><I><FONT FACE="Arial"><P>1 Famous Quotes</P>
</B></I><P>1.1 By William I</P>
<B><P>[2, Sonnet XVIII]</P></B>
<P>Shall I compare thee to a summer's day?</P>
<P>Thou art more lovely and more temperate.</P>
<P>Rough winds do shake the darling buds of May,</P>
<P>And summer's lease hath all too short a date.</P>
<P>Sometime too hot the eye of heaven shines,</P>
<P>And often is his gold complexion dimmed.</P>
...
<P>So long as men can breathe, or eyes can see,</P>
<P>So long live this, and this gives life to thee.</P>
<P>1.2 By William II</P>
<B><P>[1, p.265]</P>
</B><P>"The obvious mathematical breakthrough would be development of</P>
<P>an easy way to factor large prime numbers."</P>
<B><I><P>References</P>
</B></I><P>[1] W. H. Gates. The Road Ahead. Viking Penguin, 1995.</P>
<P>[2] W. Shakespeare. The Sonnets of Shakespeare. 1609.</P></FONT></BODY>
</HTML>
Message in the bottle: 6
<?xml version="1.0"?>
<universal_library>
<books>
<book> <title>Some Quotations from the Universal Library</title>
<section> <title>Famous Quotes</title>
<subsection> <title>By William I</title>
<quote bibref="shakespeare-sonnets-1609">
<title>Sonnet XVIII</title>
<verse>
<line>Shall I compare thee to a summer's day?</line>
<line>Thou art more lovely and more temperate. </line>
<line>Rough winds do shake the darling buds of May, </line>
</verse>
…
<subsection> <title>By William II</title>
<quote bibref="gates-road-ahead-1995">
<title>Page 265</title>
<line>``The obvious mathematical breakthrough would be development of an easy way to factor large prime
numbers.’’</line>
</quote>
</subsection>
</section>
</book>
…
</books>
</universal_library>
XML as a Self-Describing Format
• can be “understood” using any (archaic CP/M) editor
• can be parsed easily
• contains its own structure (=parse tree) in the data
=> allows the e-archeologist to rediscover schema and
content (=semantics!?)
• may also include an explicit schema description (DTD)
=> “meta-model”: definition of a language w.r.t. which it is
valid
• allows separation of marked-up content from
presentation (=>style sheets)
• as a self-describing format good for “archival into the
past” => not bad for archival into the future
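The "rediscover the schema" point can be made concrete with a few lines of code. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened, hypothetical copy of the sixth message in the bottle; the function name rediscover_schema is invented for illustration. It only recovers which element names occur under which parent, not order or multiplicity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened, hypothetical copy of "Message in the bottle: 6".
MESSAGE = """<universal_library>
  <books>
    <book><title>Some Quotations from the Universal Library</title>
      <section><title>Famous Quotes</title>
        <subsection><title>By William I</title>
          <quote bibref="shakespeare-sonnets-1609">
            <title>Sonnet XVIII</title>
            <verse>
              <line>Shall I compare thee to a summer's day?</line>
              <line>Thou art more lovely and more temperate.</line>
            </verse>
          </quote>
        </subsection>
      </section>
    </book>
  </books>
</universal_library>"""


def rediscover_schema(root):
    """Map each element name to the set of child element names seen under it."""
    schema = defaultdict(set)
    for parent in root.iter():
        # update() also registers leaf elements, with an empty child set
        schema[parent.tag].update(child.tag for child in parent)
    return dict(schema)


schema = rediscover_schema(ET.fromstring(MESSAGE))
for name, children in schema.items():
    print(name, "->", sorted(children) or "(text only)")
```

No DTD was needed: the element structure comes straight out of the data, which is what makes the format attractive for an e-archeologist.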
Some thoughts on how XML can help
with e-record management...
• Assumption: represent e-records in XML
=> self-describing format (good for archival)
=> get a semistructured data model (flexible: encode regular tables,
nested structures, objects, or even (cleaned up) HTML)
=> many tools (and many more to come -- (re)use code):
parsers, validators, query languages, storage
=> standards (good for interoperation, integration, etc):
generic standards (XML, DTDs, XML Schema, XPath,...)
community/industry standards (=specific markup languages)
...thoughts continued
• “E-Record Quality Assurance”:
– by “subscribing” to a certain XML DTD/XML Schema/XML ???, you can make
sure that “the same language is spoken”
– validation using DTDs provides a first simple quality control:
• are the right tags used?
• is the nesting of elements ok w.r.t. the DTD?
• are the order and multiplicity of elements ok?
– if you need more => use validation w.r.t. an XML Schema
• now: check also data types
• use specialization and other mechanisms from object-oriented modeling
• more integrity checking possible (cardinalities,…)
– still want more integrity checks (ICs) or even “policies”?
=> use a declarative rule language for specifying the constraints and policies at
design time. Implement them at run time, e.g., by adding the ICs to the XML
DTD/Schema/…
=> checking ICs and policies is similar to issuing specific queries against the data
=> use query processors (relational DBs, XML DBs, XML tools) for integrity
checking when possible
=> for evolution of records, look at versioning models for databases and temporal
database models and query languages
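The idea that checking integrity constraints resembles issuing queries can be sketched as follows. This is an illustration only, assuming Python's standard xml.etree.ElementTree; the two constraints and the sample records are hypothetical and not taken from any real e-records policy. Each constraint is a query that returns the violating elements, so an empty result means the constraint holds.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the second quote violates both constraints.
RECORDS = """<records>
  <quote bibref="shakespeare-sonnets-1609">
    <verse><line>Shall I compare thee to a summer's day?</line></verse>
  </quote>
  <quote>
    <verse/>
  </quote>
</records>"""

# Constraint name -> query returning the elements that violate it.
CONSTRAINTS = {
    "every quote cites a source":
        lambda root: [q for q in root.iter("quote") if not q.get("bibref")],
    "every verse has at least one line":
        lambda root: [v for v in root.iter("verse") if v.find("line") is None],
}


def check(root):
    """Return the number of violations found for each constraint."""
    return {name: len(query(root)) for name, query in CONSTRAINTS.items()}


report = check(ET.fromstring(RECORDS))
for name, violations in report.items():
    print(f"{name}: {violations} violation(s)")
```

In a real system the queries would more likely be written in a declarative rule or query language and evaluated by a database or XML query processor, as the slide suggests.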
Back to XML: Different Perspectives
• Document (SGML) Community
– data = linear text documents
– mark up (annotate) text pieces to describe context,
structure, semantics of the marked text
• Database Community
– XML as a (most prominent) example of the
semistructured data model
=> captures the whole spectrum from highly
structured, regular data to unstructured data
More Perspectives on XML
• "XML is the cure for your data exchange, information
integration, e-commerce, [x-2-y, U name it] problems”
(“snake oil/silver bullet theory”)
• "XML is nothing but (another) syntax (for Lisp, trees,…)”
(“nothing new under the sun”)
(books (book (author “Shakespeare” )
(title “Sonnets”)
(verse (line “Shall I compare…” )
(line …) …)))
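The "just another syntax for trees" view is easy to demonstrate: a few lines suffice to print an XML document in the Lisp-like notation above. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name to_sexpr is made up, and attributes and mixed content are deliberately ignored.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element as (name "text" child...), ignoring attributes."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append('"%s"' % text)
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring(
    "<books><book><author>Shakespeare</author>"
    "<title>Sonnets</title></book></books>"
)
sexpr = to_sexpr(doc)
print(sexpr)  # (books (book (author "Shakespeare") (title "Sonnets")))
```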
So what is XML (all about)?
Executive Summary:
• XML = HTML – idiosyncrasies (simplified syntax)
+ user-definable ("semantic") tags
• Separation of data and its presentation
=> simple, very flexible data exchange format:
semistructured data model
=> new applications:
• Information exchange (B2B), sharing (diglib),
integration ("mediation"), archival, ...
• Web site management (XML+XSL stylesheets), ...
Many X-cellent(?) Acronyms...
• XML (Extensible Markup Language)
• XML Namespaces
• XML DTDs, XML Schema
• RDF (Resource Description Framework)
• XSL (Extensible Stylesheet Language)
• XPath (= XSLT ∩ XPointer), XLink
• XQL, XML-QL (XML Query Language)
• XMAS (XML Matching And Structuring language)
• eXcelon, ...
=> XML++ (i.e. += X-tensions) >> just syntax
=> a family of technologies (XML extensions, tools, ... )
=> generic standards and industry/community standards
XML Applications & Industry Initiatives
http://www.oasis-open.org/cover/xml.html#applications
• Advertising: adXML -- place an ad onto an ad network or to a single vendor
• Literature: Gutenberg -- convert the world’s great literature into XML
• Directories: dirXML -- Novell’s Directory Services Markup Language (DSML)
• Web Servers: apacheXML -- parsers, XSL, web publishing
• Travel: openTravel -- information for airlines, hotels, and car rental places
• News: NewsML -- creation, transfer and delivery of news
• Human Resources: XML-HR -- standardization of HR/electronic recruiting XML definitions
• International Dvt: IDML -- improve the mgt. and exchange of info. for sustainable development
• Voice: VoxML -- markup language for voice applications
• Wireless: WAP (Wireless Application Protocol) -- wireless devices on the World Wide Web
• Weather: OMF -- Weather Observation Markup Format (simulation)
• Geospatial: ANZMETA -- distributed national directory for land information
• Banking: MBA -- Mortgage Bankers Association of America --> credit report, loan file, underwriting…
• Healthcare: HL7 -- DTDs for prescriptions, policies & procedures, clinical trials
• Math: MathML (Mathematical Markup Language)
• Surveys: DDI (Data Documentation Initiative) -- “codebooks” in the social and behavioral sciences
XML E-commerce Initiatives
• CommerceNet
– eCo Framework: XML specs. to support interoperability among e-businesses
– Commerce One Common Business Library (CBL): set of business components, docs. in DTD, XDR, SOX
– BizTalk: Microsoft spec. based on XML schemas
– cXML (Commerce XML): tag-sets for e-procurement into BizTalk
• Electronic Data Interchange (EDI)
– RosettaNet: common format for online ordering
– FpML (Financial products Markup Language): sharing of financial data (interest rate & foreign exchange products)
• Open Buying on the Internet (OBI)
– OBI: high volume b2b purchasing transactions over the Internet (Office Depot, Lockheed, barnesandnoble, AX...
• E-commerce and XML
– VISA Invoices: The Visa Extensible Markup Language (XML) Invoice Specification provides a comprehensive list of data elements contained in most invoices, including: Buyer/Supplier, Shipping, Tax, Payment, Currency, Discount, and Line Item Detail.
• B2B Integration
– code360: XML-Broker is middleware software that manages XML based transactions
– Bluestone XML Suite: enables developing and deploying e-commerce, electronic data interchange, application integration and supply chain management applications. Bluestone XML Suite products include: XML-Server, VisualXML, XML-Contact and XwingML.
– webMethods: provides companies with integrated direct links to buyers and suppliers
What’s Wrong with HTML?
Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
“Object Fusion in Mediator Systems”. In VLDB 96.
HTML confuses presentation
with content
<DT>
<IMG SRC="greenball.gif" > 
<A NAME="object-fusion"></A>
Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
<A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">
"ObjectFusion in Mediator Systems".</A>
In <I>VLDB 96.</I>
</DT>
...What’s Wrong with HTML...
No Explicit Structure, Semantics, or Object-Orientation
<DT>
<IMG SRC="greenball.gif">
<A NAME="object-fusion"></A>
Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
<A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">
"ObjectFusion in Mediator Systems".</A>
In <I>VLDB 96.</I>
</DT>
Nothing in the markup says which text is the Author (the list of names), the Title (the link text), or the Conference ("VLDB 96").
... And Some Repercussions
• Lack of schema/semantics when querying the
Web (HTML):
– "find documents (books, papers, ...)
where author = Michael Jackson"
(... and learn how software engineering meets the moon walker ...)
– "create a list of M. Jackson's books and (if available)
their prices"
=> HTML is inappropriate for
– data exchange
– automation of information management (retrieval, manipulation, integration)
XML is Based on Markup
• Markup indicates structure and semantics
• Decoupled from presentation
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
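Once the data carries semantic tags, a request such as "list M. Jackson's books and, if available, their prices" becomes a short program instead of a screen-scraping exercise. The sketch below assumes Python's standard xml.etree.ElementTree and an invented catalog: the element names, titles, and prices are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; titles and prices are placeholders.
CATALOG = """<catalog>
  <book><author>M. Jackson</author><title>Book A</title><price>42.00</price></book>
  <book><author>M. Jackson</author><title>Book B</title></book>
  <book><author>S. Abiteboul</author><title>Book C</title><price>50.00</price></book>
</catalog>"""


def books_by(root, author):
    """Return (title, price) pairs; price is None when no <price> is given."""
    return [
        (book.findtext("title"), book.findtext("price"))
        for book in root.iter("book")
        if book.findtext("author") == author
    ]


catalog = ET.fromstring(CATALOG)
for title, price in books_by(catalog, "M. Jackson"):
    print(title, price if price is not None else "(no price listed)")
```

The optional price is handled naturally: a missing element simply yields None, which is the kind of irregularity a semistructured model is meant to tolerate.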
Elements and their Content
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
• element name: e.g., bibliography
• element: start tag, content, and end tag together, e.g., <paper ...> ... </paper>
• element content: nested elements, e.g., the <author> elements inside <authors>
• character content: text, e.g., Object Fusion in Mediator Systems
• empty element: <fullPaper source="fusion"/>
Element Attributes
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
• attribute name: ID
• attribute value: "object-fusion"
XML = Labeled Ordered Trees
bibliography
  paper
    authors
      author: Yannis
      author: Serge
      ...
    fullpaper
    title: Object Fusion
    ...
  paper
    ...
<bibliography>
  <paper ...>
    <authors>
      <author>Yannis</author>
      <author>Serge</author>
      ...
    </authors>
    <title>Object Fusion</title>
    ...
  </paper>
</bibliography>
• semistructured data
• labeled trees/graphs
• can also represent relational and object-oriented data
In Search of the Lost Structure &
Semantics
• How do I learn and use the element structure of a document?
• How do I share structure and metadata/semantics with my community?
• How to make all this automatable?
Adding Structure and Semantics
• XML Document Type Definitions (DTDs):
– define the structure of "allowed" documents (i.e., valid wrt. a DTD)
– roughly a database schema
=> improve query formulation, execution, ...
• XML Schema
– defines structure and data types
– allows developers to build their own libraries of interchanged data types
• XML Namespaces
– identify your vocabulary
XML DTDs as Extended CFGs
XML DTD:
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
Grammar:
bibliography → paper*
paper → authors fullPaper? title booktitle
authors → author+
lhs = element (name)
rhs = regular expression over elements + strings (PCDATA)
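Because each right-hand side is a regular expression over element names, checking a content model amounts to regular-expression matching on the sequence of child names. The sketch below illustrates this with Python's standard re and xml.etree.ElementTree modules. The content models are translated to regexes by hand, the name is_valid is invented, and attributes, #PCDATA, and EMPTY are not checked, so this is not a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, hand-translated to regexes over child names.
# Every child name is followed by one space so names cannot run together.
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}


def is_valid(elem):
    """Check the child sequence of elem, and of all descendants, against the models."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            return False
    return all(is_valid(child) for child in elem)


good = ET.fromstring(
    "<bibliography><paper><authors><author>Yannis</author></authors>"
    "<title>Object Fusion</title><booktitle>VLDB 96</booktitle></paper></bibliography>"
)
bad = ET.fromstring(  # title appears before authors
    "<bibliography><paper><title>Object Fusion</title>"
    "<authors><author>Yannis</author></authors>"
    "<booktitle>VLDB 96</booktitle></paper></bibliography>"
)
print(is_valid(good), is_valid(bad))
```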
Document Type Definitions (DTDs)
Define and Constrain Element Names & Structure
Element Type Declarations:
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Attribute List Declarations:
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #IMPLIED>
Element Declarations
<!-- sequence of 0 or more paper -->
<!ELEMENT bibliography (paper*)>
<!-- authors followed by optional fullPaper, followed by title, followed by booktitle -->
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!-- sequence of 1 or more author -->
<!ELEMENT authors (author+)>
<!-- character content -->
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #IMPLIED>
Element Content Declarations
Declaration            Meaning
<element 2>            Exactly one <element 2>
R?   (cardinality)     Zero or one instance of R
R*   (cardinality)     Zero or more instances of R
R+   (cardinality)     One or more instances of R
R1 | R2 | … | Rn       One instance of R1 or R2 or … or Rn
R1, R2, …, Rn          Sequence of R’s, order matters
#PCDATA                Character content
EMPTY                  Empty element
(#PCDATA | e)*         Mixed content
ANY                    Anything goes
Attributes
<person ID="yannis"> Yannis’ info </person>
<bibliography>
  <paper ID="object-fusion" ROLE="publication">
    <authors>
      <author authorRef="yannis">Y.Papakonstantinou</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <related papers="semistructured-data mediators"/>
  </paper>
</bibliography>
• ID: object identity attribute
• ROLE: CDATA (character data)
• authorRef: IDREF, an intradocument reference
• source: reference to an external ENTITY
Attribute Types
Type               Meaning
ID                 Token unique within the document
IDREF              Reference to an ID token
IDREFS             Reference to multiple ID tokens
ENTITY             External entity (image, video, …)
ENTITIES           External entities
CDATA              Character data
NMTOKEN            Name token
NMTOKENS           Name tokens
NOTATION           Data other than XML
Enumeration        Choices
Conditional Sec    INCLUDE & IGNORE declarations
Attributes may be: REQUIRED, IMPLIED (optional)
can have: default values, which may be FIXED
Uses of XML Entities
• Physical partition
– size, reuse, "modularity", … (both XML docs & DTDs)
• Non-XML data
– unparsed entities => binary data
• Non-standard characters
– character entities
• Shorthand for phrases & markup
Entities & Physical Structure
A logical element can be split into multiple physical entities:
Mylife.xml:
  DTD...
  <mylife>
    Chap1.xml:
      <teen>yada yada</teen>
    Chap2.xml:
      <adult>blah blah..</adult>
  </mylife>
External Text Entities
External text entity declaration (the SYSTEM identifier is a URL):
<!ENTITY chap1 SYSTEM "chap1.xml">
Entity reference:
<mylife> &chap1; &chap2;</mylife>
Logically equivalent to inlining the file contents:
<mylife> <teen>yada yada</teen>
<adult> blah blah</adult>
</mylife>
Types of Entities
• Internal (to a doc) vs. External (=> use URI)
• General (in XML doc) vs. Parameter (in DTD)
• Parsed (XML) vs. Unparsed (non-XML)
Internal Text Entities
Internal Text Entity Declaration
<!ENTITY WWW "World Wide Web">
Entity Reference
<p>We all use the &WWW;.</p>
Logically equivalent to actually appearing
<p>We all use the World Wide Web.</p>
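The equivalence can be observed with any parser that reads the internal DTD subset. The following sketch assumes Python's standard xml.etree.ElementTree (which uses the Expat parser); the DOCTYPE wrapper around the slide's declaration is added here so that the example is a complete document.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and referenced in the content.
DOC = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

p = ET.fromstring(DOC)
print(p.text)  # the application sees the replacement text, not the reference
```

Note that parsers do not necessarily fetch external entities such as chap1.xml in the same way; that behaviour depends on the parser and its configuration.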
Unparsed (& "Binary") Entities
Declare an external and unparsed entity:
<!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>
Declare attribute type to be ENTITY:
<!ATTLIST fullPaper source ENTITY #REQUIRED>
Element with ENTITY attribute:
<fullPaper source="fusion"/>
NOTATION declaration (helper app):
<!NOTATION ps SYSTEM "ghostview.exe">
From Docs to Data: XML Schema
• XML DTDs (part of the XML spec.)
– flexible, semistructured data model (nesting, ANY, ?,
*, |, ...)
– but document-oriented (SGML heritage)
– no support for namespaces, datatypes, inheritance
(e.g., type of book.title may be different from
poem.title)
• XML Schema (W3C working draft)
– schema definition language in XML
– data-oriented: data types
– extends capabilities of DTD
XML Schema: Example
Define an order "record" with (mandatory) fields and an
(optional) attribute:
<type name="Order" >
<element name="name" type="string" />
<element name="street" type="string" />
<element name="zip" type="integer" />
<...>
<attribute name="orderDate" type="date" />
</type>
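What "data-oriented" buys over a DTD can be sketched without a schema processor: the declared types become checks on the element and attribute values. The code below is only an illustration in plain Python with the standard xml.etree.ElementTree module. It hard-codes the three fields and the attribute shown above, maps the schema types string, integer, and date to Python's str, int, and date.fromisoformat as an approximation, and uses an invented function name, type_errors.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Approximate Python counterparts of the declared schema types.
FIELD_TYPES = {"name": str, "street": str, "zip": int}   # mandatory elements
ATTRIBUTE_TYPES = {"orderDate": date.fromisoformat}      # optional attribute


def type_errors(order):
    """Return a list of human-readable problems found in one Order element."""
    errors = []
    for field, convert in FIELD_TYPES.items():
        value = order.findtext(field)
        if value is None:
            errors.append(f"missing element <{field}>")
            continue
        try:
            convert(value)
        except ValueError:
            errors.append(f"<{field}> is not a valid {convert.__name__}")
    for attr, convert in ATTRIBUTE_TYPES.items():
        value = order.get(attr)
        if value is not None:
            try:
                convert(value)
            except ValueError:
                errors.append(f"@{attr} is not a valid date")
    return errors


good = ET.fromstring(
    '<Order orderDate="2000-05-01"><name>A. Buyer</name>'
    "<street>1 Main St</street><zip>92093</zip></Order>"
)
bad = ET.fromstring(
    '<Order orderDate="yesterday"><name>A. Buyer</name><zip>9209x</zip></Order>'
)
print(type_errors(good))
print(type_errors(bad))
```

A DTD could only say that zip contains character data; the type information is what lets the second order be rejected.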
XML Schema: Example
New types can be derived by extension or restriction:
<type name="personName">
<element name="title" minOccurs="0"/>
<element name="forename" minOccurs="0" maxOccurs="*"/>
<element name="surname"/>
</type>
<type name="extendedName" source="personName" derivedBy="extension">
<element name="generation" minOccurs="0"/>
</type>
<type name="simpleName" source="personName" derivedBy="restriction">
<restrictions>
<element name="title" maxOccurs="0"/>
<element name="forename" minOccurs="1" maxOccurs="1"/>
</restrictions>
</type>
W3C Work on XML Schemas
• Structures:
– Specify complex element structure and
– Set constraints on the permitted values of the
content of those elements
• Datatypes:
– Sets forth a standard of content datatypes and
– Sets rules for generating new types from them
Further Approaches
• RELAX (REgular LAnguage description for XML)
– Standardized by INSTAC XML SWG of Japan.
– Compared with DTD, RELAX has new features:
• RELAX grammars are represented in the XML instance syntax
• RELAX borrows rich data types of XML Schema Part 2
• RELAX is namespace-aware
• many others
– XML-Data, XML-DR, DCD, SOX, DDML, DSD, Schematron...
Normalized Data/Metadata
Representation
• Resource Description Framework (RDF)
– Metadata model
– The designer can describe objects, add properties to
define and describe them, and also make complicated
statements about the objects (statements about
relationships between resources).
– The specification comes in two sections:
• Model & Syntax (viewed as directed, labeled
graphs)
• RDF Schemas (using an XML vocabulary)
Resource Description Framework (RDF)
• Metadata is useful for information retrieval (esp. if no
other schema info or semantics is available)
• Idea: representation independent encoding of metadata
as triples (Resource, PropertyType, Value):
– (uri1, DC:creator, uri2), (uri2, vCard:name, smith), ...
uri1 --DC:creator--> uri2 --vCard:name--> smith
• "Semantic Net"
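The triple encoding is independent of any XML syntax, which a few lines of plain Python can show. This is a toy sketch, not an RDF library: the two triples are the ones from the slide, while the helper names match and creator_names are invented for illustration.

```python
# (Resource, PropertyType, Value) triples from the slide.
TRIPLES = [
    ("uri1", "DC:creator", "uri2"),
    ("uri2", "vCard:name", "smith"),
]


def match(pattern, triples=TRIPLES):
    """Return the triples matching a 3-tuple pattern; None acts as a wildcard."""
    return [
        triple for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


def creator_names(resource):
    """Follow DC:creator, then vCard:name: a two-step walk through the graph."""
    names = []
    for _, _, creator in match((resource, "DC:creator", None)):
        names += [value for _, _, value in match((creator, "vCard:name", None))]
    return names


print(creator_names("uri1"))  # ['smith']
```

Each triple is one labeled edge of the directed graph, so queries are path traversals over edges rather than navigation of a document tree.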
Identifying Vocabularies
• My element may not be your element:
– geometry context: <element>line</element>
– chemistry context: <element>oxygen</element>
– SGML/XML context: ....
=> use XML namespaces to identify the vocabulary
XML Namespaces
• mechanism for globally unique tag names:
<h:html xmlns:xdc="http://www.xml.com/books"
xmlns:h="http://www.w3.org/HTML/1998/html4">
<h:head><h:title>Book Review</h:title></h:head>
...
<xdc:bookreview>
<xdc:title>XML: A Primer</xdc:title>
...
</h:html>
=> mix of different tag vocabularies without confusion
• namespaces only identify the vocabulary; additional mechanisms required for structure and meaning of tags
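How a namespace-aware parser keeps the two title elements apart can be seen in a short sketch. It assumes Python's standard xml.etree.ElementTree, which expands each prefixed name to {namespace-URI}local-name. The document is a completed version of the slide's fragment; the h:body wrapper and the closing tags are added here to make it well-formed.

```python
import xml.etree.ElementTree as ET

DOC = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview><xdc:title>XML: A Primer</xdc:title></xdc:bookreview>
  </h:body>
</h:html>"""

# Prefixes are local shorthand; only the URIs identify the vocabularies.
NS = {
    "h": "http://www.w3.org/HTML/1998/html4",
    "xdc": "http://www.xml.com/books",
}

root = ET.fromstring(DOC)
html_title = root.findtext("h:head/h:title", namespaces=NS)
book_title = root.findtext(".//xdc:title", namespaces=NS)
print(root.tag)     # {http://www.w3.org/HTML/1998/html4}html
print(html_title)   # Book Review
print(book_title)   # XML: A Primer
```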
Processing XML
• Non-validating parser:
– checks that XML doc is syntactically well-formed
• Validating parser:
– checks that XML doc is also valid w.r.t. a given DTD
• Parsing yields tree/object representation:
– Document Object Model (DOM) API
• Or a stream of events (open/close tag, data):
– Simple API for XML (SAX)
DOM Structure Model and API
• hierarchy of Node objects:
– document, element, attribute, text, comment, ...
• language independent programming DOM API:
– get... first/last child, prev/next sibling, childNodes
– insertBefore, replace
– getElementsByTagName
– ...
• alternative event-based SAX API (Simple API for XML)
– does not build a parse tree (reports events when encountering begin/end tags)
– for (partially) parsing large documents
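A few of the DOM navigation calls can be tried with Python's standard xml.dom.minidom, one of many DOM implementations. The sketch below uses the bibliography example from earlier slides, written without whitespace between tags so that firstChild and nextSibling land on elements rather than on whitespace text nodes.

```python
from xml.dom import minidom

DOC = (
    '<bibliography><paper ID="object-fusion"><authors>'
    "<author>Y.Papakonstantinou</author>"
    "<author>S. Abiteboul</author>"
    "<author>H. Garcia-Molina</author>"
    "</authors>"
    "<title>Object Fusion in Mediator Systems</title>"
    "<booktitle>VLDB 96</booktitle></paper></bibliography>"
)

dom = minidom.parseString(DOC)          # builds the whole tree in memory

authors = dom.getElementsByTagName("authors")[0]
first = authors.firstChild              # first <author> element
second = first.nextSibling              # its right-hand neighbour
last = authors.lastChild

# Each <author> element has a single text node child holding the name.
names = [a.firstChild.data for a in dom.getElementsByTagName("author")]
paper_id = dom.getElementsByTagName("paper")[0].getAttribute("ID")

print(names, paper_id)
```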
DOM Summary
• Object-Oriented approach to traverse the XML node tree
• Automatic processing of XML docs
• Manipulation & Updating of XML on client & server
• Database interoperability mechanism
• Memory-intensive
SAX Event-Based API
• Pros:
–
–
–
–
The whole file doesn’t need to be loaded into memory
XML stream processing
Simple and fast
Allows you to ignore less interesting data
• Cons:
– limited expressive power (query/update) when working
on streams
=> application needs to build (some) parse-tree when
necessary
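A small sketch of the event-based style, using xml.sax from Python's standard library. The handler class TitleHandler and the sample document are invented for illustration. It keeps only title text and ignores all other data, and the only state it builds is what the application itself needs.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of <title> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None          # a list only while inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None

handler = TitleHandler()
xml.sax.parseString(
    b"<bookstore><book><title>A</title><price>9</price></book>"
    b"<book><title>B</title><price>12</price></book></bookstore>",
    handler,
)
print(handler.titles)   # ['A', 'B']
```

Anything that needs to look back at earlier parts of the document (a join, a reordering) would have to be remembered by the handler, which is the limitation noted under Cons.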
Querying XML
• What can be done to XML so far:
– generation: from HTML, DBs, manually, …
– parsing: with/without DTD (valid/well-formed XML)
– accessing: APIs for XML applications:
• DOM (in memory, tree-based), SAX (event-based)
• Now: Query languages for XML
– XML-QL, XMAS, XPath, XSL(T), XQL, ...
Querying XML
• Why not just query XML with SAX or DOM?
– SAX: very simple “event-based” queries: ok
– DOM: simple navigational queries (getChildNodes,
getNextSibling, getElementsByTagName,…): ok
• But: these are “low-level” APIs
– ≈ iterator/cursor API for RDBs (but more powerful!)
– used to write XML applications
– “high-level” querying, restructuring and
transformation (and updates??) is tedious
– => analogue to high-level relational query languages
(SQL, QBE, Logic (Datalog), …)
=> Query languages for XML
Querying XML
• No "official" W3C XML QL yet (but bits and pieces)
• numerous quite different XML QLs are popping up
• some XML QL overviews, comparisons, and resources:
– XML Query Languages: Experiences and Exemplars
(co-authored by several XML QL gurus)
– XML and Query Languages (Oasis Cover Pages)
– Comparative Analysis of Five XML Query Languages (A.
Bonifati, S. Ceri)
– A Data Model and Algebra for XML Query (Philip Wadler et al.;
“functional (Haskell) perspective”)
– XML-QL vs XSLT queries (Geert Jan Bex and Frank Neven; for
(future) XSLT experts only ;-)
– Introduction to XMAS (the XML QL of the MIX project)
Querying XML
• Different XML QL paradigms depending on the
community:
– (relational, oo, semistructured) database perspective
• Lorel, YaTL, XML-QL, XMAS, FLORA/FLORID, ...
– document processing perspective
• XQL, XSL(T), XPath, ...
– functional programming perspective
• QLs with structural recursion, …
Important QL Features (DB Perspective)
– typical parts of a query:
• (match) pattern (selects parts of the source XML tree
without looking at data)
• filter condition (selects further, now looking at the
data)
• answer construction (putting the results together,
possibly reordered, grouped, etc.)
– reordering based on nested queries, grouping, sorting, or
Skolem functions
– tag variables, path expressions for defining the patterns
without requiring knowledge of the DTD
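The three typical parts of a query can be mimicked by hand with Python's standard xml.etree.ElementTree. This is only a sketch over a made-up bookstore document, not the syntax of any of the query languages named in this tutorial.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<bookstore>"
    "<book><title>A</title><price>8</price></book>"
    "<book><title>B</title><price>15</price></book>"
    "<book><title>C</title><price>5</price></book>"
    "</bookstore>"
)

# 1. Match pattern: purely structural, binds each <book> child of the root.
matches = source.findall("book")

# 2. Filter condition: now look at the data.
cheap = [b for b in matches if float(b.findtext("price")) < 10]

# 3. Answer construction: build a new tree, reordered by title (descending).
answer = ET.Element("cheapTitles")
for b in sorted(cheap, key=lambda b: b.findtext("title"), reverse=True):
    ET.SubElement(answer, "title").text = b.findtext("title")

print(ET.tostring(answer, encoding="unicode"))
# <cheapTitles><title>C</title><title>A</title></cheapTitles>
```

A declarative query language states these three parts directly and leaves the loops to the query engine.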
Sample Queries with XQL/XPath
• Find the root element (bookstore) of this document:
/bookstore
• Find all author elements anywhere within the current document:
//author
• Find all books where the value of the style attribute on the book is equal to the value of the specialty attribute of the bookstore element at the root of the document:
//book[/bookstore/@specialty = @style]
• Find all books with author/first-name equal to 'Bob' and all magazines with price less than 10:
//(book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10])
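For comparison, here is roughly what the four queries compute, written against Python's standard xml.etree.ElementTree. That module supports only a small XPath subset, so the attribute comparison and the union are done in plain Python. The bookstore document is a made-up sample.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<bookstore specialty="novel">'
    '<book style="novel"><author><first-name>Bob</first-name></author></book>'
    '<book style="textbook"><author><first-name>Ann</first-name></author></book>'
    '<magazine><price>8</price></magazine>'
    '<magazine><price>12</price></magazine>'
    '</bookstore>'
)

# /bookstore : the root element
root_tag = doc.tag

# //author : all author elements anywhere in the document
authors = doc.findall(".//author")

# //book[/bookstore/@specialty = @style]
# Two attributes cannot be compared inside an ElementTree predicate,
# so the comparison is done in Python.
same_style = [b for b in doc.iter("book")
              if b.get("style") == doc.get("specialty")]

# //(book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10])
bobs = [b for b in doc.iter("book")
        if b.findtext("author/first-name") == "Bob"]
cheap_mags = [m for m in doc.iter("magazine")
              if float(m.findtext("price")) < 10]
union = bobs + cheap_mags

print(root_tag)                            # bookstore
print(len(authors))                        # 2
print([b.get("style") for b in same_style])  # ['novel']
print([e.tag for e in union])              # ['book', 'magazine']
```

Each one-line path expression turns into several lines of navigation code, which is the argument for a dedicated query language.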
Presenting XML: Extensible Stylesheet
Language (XSL)
• Why Stylesheets?
– separation of content (XML) from presentation
(XSL)
• Why not just CSS for XML?
– XSL is far more powerful:
• selecting elements
• transforming the XML tree
• content based display (result may depend on
data)
XSL Overview
• XSL stylesheets are denoted in XML syntax
• XSL components:
1. a language for transforming XML documents
(XSLT: integral part of the XSL specification)
2. an XML formatting vocabulary
(Formatting Objects: >90% of the formatting
properties inherited from CSS)
XSLT Processing Model
[Figure: a transformation takes an XML source tree and an XSLT stylesheet as input and produces a result tree (XML, HTML, csv, text, …)]
XSLT Processing Model
• XSL stylesheet: collection of template rules
• template rule:
(pattern → template)
• main steps:
– match pattern against source tree
– instantiate template (replace current node “.” by the
template in the result tree)
– select further nodes for processing
• control can be a mix of
– recursive processing ("push": <xsl:apply-templates>
...)
– program-driven ("pull": <xsl:for-each> ...)
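The "push" control flow can be imitated in a few lines of Python. This is a toy model of the processing steps, not an XSLT processor. Patterns here are plain tag names, nodes without a rule are simply skipped (real XSLT has built-in default rules), and all rule names and the sample document are made up.

```python
import xml.etree.ElementTree as ET

# Each template rule is (pattern -> template): a tag name mapped to a function
# that adds output to the result tree and selects further nodes to process.
def bookstore_rule(node, out):
    ul = ET.SubElement(out, "ul")
    apply_templates(list(node), ul)               # push: process all children

def book_rule(node, out):
    li = ET.SubElement(out, "li")
    apply_templates(node.findall("title"), li)    # push: only the titles

def title_rule(node, out):
    ET.SubElement(out, "b").text = node.text

RULES = {"bookstore": bookstore_rule, "book": book_rule, "title": title_rule}

def apply_templates(nodes, out):
    """Match each node against the rules and instantiate the matching template."""
    for node in nodes:
        rule = RULES.get(node.tag)
        if rule is not None:          # simplification: unmatched nodes are skipped
            rule(node, out)

source = ET.fromstring(
    "<bookstore><book><title>A</title><price>8</price></book>"
    "<book><title>B</title></book></bookstore>"
)
result = ET.Element("html")
apply_templates([source], result)
print(ET.tostring(result, encoding="unicode"))
# <html><ul><li><b>A</b></li><li><b>B</b></li></ul></html>
```

A "pull" version would instead loop over the selected nodes inside one rule, as xsl:for-each does.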
But first: some syntactic sugar, PLEASE...
• instead of something complicated like
y=f(x)
• in the brave new XSLT world you can “simply” write
this as:
<xsl:variable name="y">
  <xsl:call-template name="f">
    <xsl:with-param name="x" select="$x"/>
  </xsl:call-template>
</xsl:variable>
Template Rule: Example
<xsl:template match="product">
  <table>
    <xsl:apply-templates select="sales/domestic"/>
  </table>
  <table>
    <xsl:apply-templates select="sales/foreign"/>
  </table>
</xsl:template>
(pattern: the value of the match attribute; template: the content of <xsl:template>)
(i) match pattern: process <product> elements
(ii) instantiate template: replace each <product> with two HTML tables
(iii) select the <product> grandchildren (“sales/domestic”, “sales/foreign”) for further processing
Match/Select Patterns
• match patterns ⊂ select patterns = XPath expressions (defined in
http://w3.org/TR/xpath)
• Examples:
– /mybook/chapter[2]/section/*
– chapter|appendix
– chapter//para
– div[@class="appendix" and position() mod 2 = 1]//para
– ../@lang
XSLT Processing Flavors: Recursive Descent
Processing
• take some XML file on books: books.xml
• now prepare it with style: books.xsl
• and enjoy the result: books.html
• the recipe for cooking this was:
java com.icl.saxon.StyleSheet books.xml books.xsl > books.html
• and now some different flavors: books2.xsl books3.xsl
Creating the Result Tree...
• Literal result elements: non-XSL elements (e.g.,
HTML) appear “literally” in the result tree
• Constructing elements:
<xsl:element name = "…">
  attribute & children definition
</xsl:element>
(similar for xsl:attribute, xsl:text, xsl:comment, …)
• Generating text:
<xsl:template match="person">
  <p>
    <xsl:value-of select="@first-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="@surname"/>
  </p>
</xsl:template>
Creating the Result Tree...
• Further XSL elements for ...
– Numbering
• <xsl:number value="position()" format="1 "/>
– Conditions
• <xsl:if test="position() mod 2 = 0"> ... </xsl:if>
– Repetition...
Creating the Result Tree: Repetition
<xsl:template match="/">
  <html>
    <head>
      <title>customers</title>
    </head>
    <body>
      <table>
        <tbody>
          <xsl:for-each select="customers/customer">
            <tr>
              <th>
                <xsl:apply-templates select="name"/>
              </th>
              <xsl:for-each select="order">
                <td>
                  <xsl:apply-templates/>
                </td>
                ...
  </html>
</xsl:template>
Creating the Result Tree: Sorting
<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/last"/>
      <xsl:sort select="name/first"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>
<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/first"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last"/>
  </li>
</xsl:template>
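The effect of the two xsl:sort elements (primary key name/last, secondary key name/first) can be reproduced with a tuple sort key in Python. The employee data is invented, and Python compares strings by code point, whereas an XSLT processor's text ordering may be language-dependent.

```python
import xml.etree.ElementTree as ET

employees = ET.fromstring(
    "<employees>"
    "<employee><name><first>Bob</first><last>Smith</last></name></employee>"
    "<employee><name><first>Ann</first><last>Smith</last></name></employee>"
    "<employee><name><first>Cy</first><last>Jones</last></name></employee>"
    "</employees>"
)

# Primary key name/last, secondary key name/first, as in the two xsl:sort elements.
ordered = sorted(
    employees.findall("employee"),
    key=lambda e: (e.findtext("name/last"), e.findtext("name/first")),
)

# One <li> per employee: first name, a space, last name.
ul = ET.Element("ul")
for e in ordered:
    li = ET.SubElement(ul, "li")
    li.text = f'{e.findtext("name/first")} {e.findtext("name/last")}'

print(ET.tostring(ul, encoding="unicode"))
# <ul><li>Cy Jones</li><li>Ann Smith</li><li>Bob Smith</li></ul>
```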
More on XSL
• XSL(T):
– Conflict resolution for multiple applicable rules
– Modularization <xsl:include> <xsl:import>
– …
• XSL Formatting Objects
– a la CSS
• XPath (navigation syntax + functions)
= XSLT ∩ XPointer
• ...
The MIX Project:
Mediation of Information using XML
Joint effort between SDSC and the UCSD CSE
Department
Mediation of Information using XML (MIX)
[Figure: an XML query goes in and XML comes back. Wrappers export XML view document(s) over a data source (e.g. home ads) and a legacy source; a native XML database exports XML view document(s) as well. Export: schema & metadata (DTD, RDF, …) and capabilities.]
Integrated / Mediated views
[Figure: a mediator provides an integrated XML view, specified by a view definition in an XML query language, on top of the XML view document(s) exported by wrappers over data sources and by an XML data source.]
A Typical Mediation Scenario
[Figure: the user interface sends a query to the mediator (integrated views over heterogeneous sources) and receives results. The mediator sends query “fragments” to wrappers, which convert the incoming query and the outgoing data. Wrapped sources: SQL database, GIS, HTML.]
MIX Components
• MIXm Mediator tool-kit
– allows definition of views across multiple resources
– views are expressed in a declarative query language
– query engine to execute queries on views
• XML Matching And Structuring (XMAS) query
language
– operates on a given set of XML documents to produce a
new XML document, using XMAS algebra
An XML Query (XMAS)
Match pattern (with sources):
$C: <*.condo>
      <address zip=$Z/>
    </condo> AT www.condo.com
AND
$S: <*.school type=elementary>
      <address zip=$Z/>
    </school> AT schools.org
Answer construction:
<folder>
  $C
  $S for $S
</folder> for $C
Source document (fragment):
...
<RealEstateAgent>
  <name>J. Smith</name>
  <condos>
    <condo>
      <address ... zip=92037>
      <price>$170k OBO</price>
      <bedrooms>2</bedrooms>
    </condo>
  </condos>
</RealEstateAgent>
Answer document (fragment):
<condosAndSchools>
  <folder>
    <condo>
      <address ... zip=92037>
      <price>$170k OBO</price>
      <bedrooms>2</bedrooms>
    </condo>
    <school>
      <name>La Jolla High</name>
      <address … zip=92037>
    </school>
    <school>…</school>
  </folder>
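Read procedurally, the XMAS query on this slide joins condos and elementary schools on the shared zip variable $Z and builds one folder per condo holding that condo and its schools. The Python sketch below spells out that reading. The two source documents, the school names, and the helper zip_of are made up, and skipping condos without a matching school is an assumption about the intended semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the documents at www.condo.com and schools.org.
condo_src = ET.fromstring(
    "<RealEstateAgent><condos>"
    '<condo><address zip="92037"/><bedrooms>2</bedrooms></condo>'
    '<condo><address zip="92122"/><bedrooms>1</bedrooms></condo>'
    "</condos></RealEstateAgent>"
)
school_src = ET.fromstring(
    "<schools>"
    '<school type="elementary"><name>S1</name><address zip="92037"/></school>'
    '<school type="high"><name>S2</name><address zip="92037"/></school>'
    '<school type="elementary"><name>S3</name><address zip="92037"/></school>'
    "</schools>"
)

def zip_of(elem):
    """Return the zip attribute of the element's <address> child."""
    return elem.find("address").get("zip")

answer = ET.Element("condosAndSchools")
for condo in condo_src.iter("condo"):                    # $C: <*.condo>
    schools = [s for s in school_src.iter("school")      # $S: <*.school ...>
               if s.get("type") == "elementary"
               and zip_of(s) == zip_of(condo)]            # shared variable $Z
    if schools:                      # assumed: no matching school, no binding
        folder = ET.SubElement(answer, "folder")          # <folder> ... for $C
        folder.append(condo)
        folder.extend(schools)                            # $S for $S

names = [s.findtext("name") for s in answer.iter("school")]
print(len(answer), names)   # 1 ['S1', 'S3']
```

The declarative query states only the pattern and the shape of the answer; the nested loop above is one possible evaluation strategy among many.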
MIX components...
• DOM-VXD: DOM Virtual XML Document extension
– a “lazy” implementation of DOM. Supports browsing/
navigation of XML documents with a server-side,
“compute as you go” model
• Blended Browsing and Querying (BBQ) interface
– supports navigation and querying of XML documents
– generates XMAS queries on mediator views
– generates XMAS queries modified by DOM-VXD operations
to incrementally evaluate the result set, to support
navigation of XML documents
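The “compute as you go” idea can be illustrated with a node whose children are produced only when a client navigates to them. This is a generic lazy-evaluation sketch in Python, not the DOM-VXD interface; the class LazyNode, its child method, and the answers generator are all hypothetical.

```python
from typing import Callable, Iterator, List, Optional

class LazyNode:
    """A node whose children are produced only when the client navigates to them."""

    def __init__(self, label: str,
                 produce_children: Optional[Callable[[], Iterator["LazyNode"]]] = None):
        self.label = label
        self._produce = produce_children
        self._source: Optional[Iterator["LazyNode"]] = None
        self._seen: List["LazyNode"] = []      # children materialized so far

    def child(self, index: int) -> Optional["LazyNode"]:
        """Return the index-th child, pulling from the source only as far as needed."""
        if self._source is None:
            self._source = self._produce() if self._produce else iter(())
        while len(self._seen) <= index:
            nxt = next(self._source, None)
            if nxt is None:
                return None                    # fewer children than requested
            self._seen.append(nxt)
        return self._seen[index]

fetched = []   # records which answers the hypothetical source had to compute

def answers() -> Iterator[LazyNode]:
    for i in range(1000):
        fetched.append(i)
        yield LazyNode(f"answer{i}")

root = LazyNode("result", answers)
print(root.child(1).label)   # answer1
print(fetched)               # [0, 1]
```

Only the two answers the client actually visited were computed; the remaining 998 are never touched unless navigation reaches them.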
Navigation driven evaluation
[Figure: the client sends navigation commands to the lazy mediator and receives the result. The lazy mediator holds the view definition q( s1 … sn ) and issues source navigation commands to the XML sources s1 ... sn.]
Blended Browsing and Querying UI (BBQ)
Another MIX Example: CDL/AMICO Mediator
Prototype
[Figure: AMICO/XML demo. Components shown: BBQ interface, XMAS query, MIXm mediator with a view based on the AMICO DTD, XML doc of paintings, wrapper over a MARC database, two AMICO XML databases, SRB/MCAT, HPSS, request for image (X.509), tif file.
Q1: Find title, type, and image ID of paintings
Q2: Find creator and related metadata]
XSL Stylesheet for AMICO Answer Docs
... and the Result (+BBQ)
[Screenshots: BBQ query composition; XML answer document; XSL rendered output]
Projects at DICE/SDSC
• National Archives and Records
Administration, NARA
– Persistent Archives and Electronic Records
• NHPRC/NARA
• XML and GIS
– aXioMap
• I2T: An Information Integration Testbed
for Digital Government
Projects at SDSC (… cont)
• AMICO
– In conjunction with the California Digital Library
(CDL)
– Part of the NSF DLI-2 project
• ESRI
• Community of Science, Inc.
• Networked Earthquake Engineering
Simulation (NEES)
– NSF program
Information Based Computing
[Figure labels: Data Storage, Collection Building, Information Management, Archival Storage, Digital Library.
Applications: Digital Sky, Neuroscience, Protein Data Bank, Molecular Structures, Earth Systems Science.
Digital Libraries: CDL, UCB - Elib, UCSB - ADL, Stanford - SDLIP, U Michigan - UMDL.]
Integrating Data Set Management
• Model-Based Information Management
– Rule-based ontology mapping, conceptual-level mediation - CMIX
• Data Grid
– Data federation across multiple libraries - MIX
• Digital Library
– Interoperable services for information discovery and presentation - SDLIP
• Data Collection
– Tools for managing data set collections on databases - MCAT
• Data Handling
– Systems for data retrieval from remote storage - SRB
• Persistent Archives
– Storage of data collections for 30 years
Model-Based Mediation
• Knowledge-based mediation
– conceptual-level integration
• Rule-based ontology maps
– map source XML to CM to FL (ontologies, views)
• Models for exporting
– rules
– integrity constraints
– query capabilities
– data & schema (XML/DTDs)
Federation of Brain Data
[Figure: model-based mediation over brain data sources. Sources shown: CCB, Montana SU; surface atlas, Van Essen Lab; stereotaxic atlas, LONI; MCell, CNL, Salk; NCMIR, UCSD. Other labels: Result (XML/XSLT), PROTLOC, Result (VML), ANATOM.]
Further Information
• xml.com
• w3.org
• xml.org
• ibm.com/xml
• ...
• Mediation of Information using XML (MIX):
– www.npaci.edu/DICE/MIX/
– www.db.ucsd.edu/Projects/MIX/