Lecture 6

CS 502: Computing Methods for
Digital Libraries
Lecture 6
DTDs
1
Markup and Style Sheets
[Diagram: a document (content and structure) and a style sheet are inputs to rendering software, which produces the formatted document]
2
Computer Systems for Markup and
Style Sheets
[Diagram labels: Server(s); Client; style sheet; document with XML markup; DTD; rendering software; formatted document]
3
Document with XML Markup
(Metadata)
<?xml version="1.0"?>
<!DOCTYPE dlib-meta0.1 SYSTEM "http://www.dlib.org/dlib/dlibmeta01.dtd">
<dlib-meta0.1>
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
Continued on next slide
4
Document with XML Markup
(Metadata) - 2
Continued from previous slide
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type =
"URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
<language>English</language>
Continued on next slide
5
Document with XML Markup
(Metadata) - 3
Continued from previous slide
<relation rel-type = "InSerial">
<serial-name>D-Lib Magazine</serial-name>
<issn>1082-9873</issn>
<volume>6</volume>
<issue>1</issue>
</relation>
<rights>Copyright (c) David M. Levy</rights>
</dlib-meta0.1>
6
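As a minimal sketch of how software might read a record like the one above, the following Python uses the standard library's xml.etree.ElementTree on a shortened copy of the metadata. The helper name read_record is hypothetical, and this parser only checks that the record is well formed; it does not fetch or apply the DTD named in the DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the D-Lib metadata record shown on the slides.
RECORD = """<?xml version="1.0"?>
<!DOCTYPE dlib-meta0.1 SYSTEM "http://www.dlib.org/dlib/dlibmeta01.dtd">
<dlib-meta0.1>
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<date date-type = "publication">January 2000</date>
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type =
"URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
</dlib-meta0.1>"""


def read_record(xml_text):
    """Hypothetical helper: collect title, creators, and identifiers by uri-type."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "creators": [c.text for c in root.findall("creator")],
        "identifiers": {i.get("uri-type"): i.text for i in root.findall("identifier")},
    }


record = read_record(RECORD)
print(record["title"])
print(record["identifiers"]["DOI"])
```

Element content is read with findtext and findall, and attribute values such as uri-type with get, which mirrors the split between elements and attributes in the markup.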
The D-Lib Magazine DTD - 1
<!-- DTD to mark up the metadata elements in D-Lib Magazine -->
<!-- William Y. Arms, Cathy Rey, March 8, 1999 Updated June 16,
1999 -->
<!ELEMENT dlib-meta0.1 (title, creator+, publisher, date, type,
identifier+, language*, relation, rights+)>
<!-- Element names are from the Dublin Core set of 15 names. -->
<!-- Attributes are used to clarify the usage by D-Lib Magazine. -->
Continued on next slide
7
The D-Lib Magazine DTD - 2
<!ELEMENT title (#PCDATA)>
<!-- Title as supplied with all punctuation -->
<!ELEMENT creator (#PCDATA)>
<!-- This element is repeated for each author or other creator -->
<!-- It contains the name of the author as provided, -->
<!-- without affiliation or contact information. -->
<!ELEMENT publisher (#PCDATA)>
<!-- Publisher is "Corporation for National Research Initiatives" -->
Continued on next slide
8
The D-Lib Magazine DTD - 3
<!ELEMENT date (#PCDATA)>
<!ATTLIST date
date-type CDATA #FIXED "publication">
<!-- Issue date, e.g., "July 1995", or "July/August 1998" -->
<!ELEMENT type (#PCDATA)>
<!ATTLIST type
resource-type CDATA #FIXED "work">
<!-- D-Lib Magazine assigns metadata to works -->
<!-- The default type is an "article" -->
Continued on next slide
9
The D-Lib Magazine DTD - 4
<!ELEMENT identifier (#PCDATA)>
<!ATTLIST identifier
uri-type (DOI | URL) #REQUIRED>
<!-- Every work should have a single DOI and one or more URLs. -->
10
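The enumerated type (DOI | URL) lets a validating parser reject other uri-type values, but the rule in the comment (a single DOI and one or more URLs) is more than identifier+ can express. A sketch of how an application might check it, assuming a hypothetical helper check_identifiers and Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

ALLOWED_URI_TYPES = {"DOI", "URL"}  # mirrors the enumerated type (DOI | URL)


def check_identifiers(root):
    """Return a list of problems found in the <identifier> children of root."""
    problems = []
    types = [i.get("uri-type") for i in root.findall("identifier")]
    for t in types:
        if t is None:
            problems.append("identifier is missing the required uri-type attribute")
        elif t not in ALLOWED_URI_TYPES:
            problems.append("uri-type %r is not DOI or URL" % t)
    if types.count("DOI") != 1:
        problems.append("expected exactly one DOI, found %d" % types.count("DOI"))
    if types.count("URL") < 1:
        problems.append("expected at least one URL")
    return problems


good = ET.fromstring(
    '<m><identifier uri-type="DOI">10.1045/january2000-levy</identifier>'
    '<identifier uri-type="URL">http://www.dlib.org/dlib/january00/01levy.html</identifier></m>'
)
bad = ET.fromstring('<m><identifier uri-type="ISBN">x</identifier></m>')
print(check_identifiers(good))
print(check_identifiers(bad))
```

The wrapper element m in the test data is a placeholder for illustration only; in a real record the identifiers sit inside dlib-meta0.1.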
The D-Lib Magazine DTD - 5
<!ELEMENT relation (serial-name, (issn, volume, issue)*)>
<!ATTLIST relation
rel-type CDATA #FIXED "InSerial">
<!ELEMENT serial-name (#PCDATA)>
<!ELEMENT issn (#PCDATA)>
<!ELEMENT volume (#PCDATA)>
<!ELEMENT issue (#PCDATA)>
<!-- The serial name is "D-Lib Magazine". -->
<!-- The ISSN is "1082-9873". -->
<!-- Volume corresponds to year of publication, 1995 is "1". -->
<!-- The issue is a count of the actual issues in the volume. -->
Continued on next slide
11
The D-Lib Magazine DTD - 6
<!ELEMENT language (#PCDATA)>
<!-- The name of the language in English, e.g., "English", "French",
"Japanese" -->
<!ELEMENT rights (#PCDATA)>
<!-- The copyright statement as given on the work. -->
12
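A content model such as the one declared for dlib-meta0.1 behaves like a regular expression over child element names. The sketch below illustrates that idea in Python for the top-level element only; it is not a DTD validator, and the names CONTENT_MODEL and matches_content_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Top-level content model from the DTD:
# (title, creator+, publisher, date, type, identifier+, language*, relation, rights+)
# Each child name is followed by a space so that names cannot run together.
CONTENT_MODEL = re.compile(
    r"title (creator )+publisher date type (identifier )+(language )*relation (rights )+"
)


def matches_content_model(root):
    """True if the sequence of child element names fits the content model."""
    sequence = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None


valid = ET.fromstring(
    "<dlib-meta0.1><title/><creator/><creator/><publisher/><date/><type/>"
    "<identifier/><identifier/><language/><relation/><rights/></dlib-meta0.1>"
)
no_creator = ET.fromstring(
    "<dlib-meta0.1><title/><publisher/><date/><type/>"
    "<identifier/><relation/><rights/></dlib-meta0.1>"
)
print(matches_content_model(valid))
print(matches_content_model(no_creator))
```

The operators carry over directly: + means one or more, * means zero or more, and the comma fixes the order of the children.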
Constructing a DTD: Grammar
Every DTD has a grammar that defines:
• entities
• elements
The grammar is expressed as a set of rules that can be
processed automatically.
13
Constructing a DTD: Parameters
A parameter entity is a named shorthand, referenced as %name; within the DTD, e.g.,
<!ENTITY % Shape "(rect|circle|poly|default)">
<!ENTITY % pub "Éditions Gallimard" >
Example. Given the following declarations:
<!ENTITY % pub "Éditions Gallimard" >
<!ENTITY book "La Peste: Camus, © 1947 %pub;." >
The replacement text for the entity "book" is:
La Peste: Camus, © 1947 Éditions Gallimard.
14
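The substitution in this example can be sketched in a few lines of Python. This is a simplified illustration, not how a real XML processor is implemented: it handles a single level of %name; references only, and the function name expand_parameter_entities is hypothetical.

```python
import re

# Matches a parameter-entity reference such as %pub;
PE_REFERENCE = re.compile(r"%([A-Za-z_:][\w.:-]*);")


def expand_parameter_entities(literal, parameter_entities):
    """Replace each %name; in an entity value with its declared text."""
    return PE_REFERENCE.sub(lambda m: parameter_entities[m.group(1)], literal)


# Declarations from the example: pub is a parameter entity, book a general entity.
parameter_entities = {"pub": "Éditions Gallimard"}
book = expand_parameter_entities("La Peste: Camus, © 1947 %pub;.", parameter_entities)
print(book)
```

A reference to an undeclared parameter entity raises KeyError in this sketch, which loosely corresponds to the processor reporting an error.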
An Example (DTD for XHTML)
Objective:
Design a markup specification that:
(a) is correct XML
(b) is similar to HTML, so that
users of HTML can learn it easily
existing HTML documents can be converted
(c) has features that permit long-term growth of the web
15
Some Assumptions
• Full Unicode and UTF-8 support
• All tags are structural
no <b>, <font>, etc.
• Empty tags defined as necessary
e.g., <br />, <img />
• Enforce syntax rules
e.g., <p> </p>
correct nesting
16
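The syntax rules listed here are what any XML parser enforces. A small sketch using Python's standard xml.etree.ElementTree shows the effect; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


closed = is_well_formed("<p>Line one<br />Line two</p>")    # empty tag written as <br />
unclosed = is_well_formed("<p>Line one<br>Line two</p>")    # <br> is never closed
overlapping = is_well_formed("<p><em>text</p></em>")        # tags closed in the wrong order
print(closed, unclosed, overlapping)
```

Only the first fragment is accepted, whereas typical HTML browsers of the period would have displayed all three.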
A Minimal Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>
17
Constructing a DTD: Entities
<!ENTITY nbsp "&#160;">
<!-- no-break space = non-breaking space, U+00A0 ISOnum -->
<!ENTITY iexcl "¡">
<!-- inverted exclamation mark, U+00A1 ISOnum -->
<!ENTITY cent "¢">
<!-- cent sign, U+00A2 ISOnum -->
<!ENTITY pound "£">
<!-- pound sign, U+00A3 ISOnum -->
<!ENTITY curren "¤">
<!-- currency sign, U+00A4 ISOnum -->
<!ENTITY yen "¥">
<!-- yen sign = yuan sign, U+00A5 ISOnum -->
18
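To see general entities like these in use, the sketch below declares two of them in an internal DTD subset and lets Python's standard xml.etree.ElementTree expand the references. The values are written as numeric character references, and the sample paragraph is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring two of the entities, then a document that uses them.
DOCUMENT = """<!DOCTYPE p [
<!ENTITY pound "&#163;">
<!ENTITY yen "&#165;">
]>
<p>&pound;5 or &yen;700</p>"""

paragraph = ET.fromstring(DOCUMENT)
print(paragraph.text)
```

The parser replaces &pound; and &yen; with the declared characters before the application sees the text, so the application never handles the entity names itself.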
Constructing a DTD: Entities
Latin-1 characters
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
Special characters
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
Symbols
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
19
The Full Example (XHTML)
The full DTD is:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
20