CS 502: Computing Methods for Digital Libraries

Lecture 6

DTDs

Markup and Style Sheets

[Diagram labels: document content and structure; style sheet; rendering software; formatted document]

Computer Systems for Markup and Style Sheets

[Diagram labels: Server(s); Client; style sheet; document with XML markup; DTD; rendering software; formatted document]

Document with XML Markup (Metadata)

    <?xml version="1.0"?>
    <!DOCTYPE dlib-meta0.1 SYSTEM "http://www.dlib.org/dlib/dlibmeta01.dtd">
    <dlib-meta0.1>
      <title>Digital Libraries and the Problem of Purpose</title>
      <creator>David M. Levy</creator>
      <publisher>Corporation for National Research Initiatives</publisher>
      <date date-type = "publication">January 2000</date>
      <type resource-type = "work">article</type>

Continued on next slide

Document with XML Markup (Metadata) - 2

Continued from previous slide

      <identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
      <identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
      <language>English</language>

Continued on next slide

Document with XML Markup (Metadata) - 3

Continued from previous slide

      <relation rel-type = "InSerial">
        <serial-name>D-Lib Magazine</serial-name>
        <issn>1082-9873</issn>
        <volume>6</volume>
        <issue>1</issue>
      </relation>
      <rights>Copyright (c) David M. Levy</rights>
    </dlib-meta0.1>

The D-Lib Magazine DTD - 1

    <!-- DTD to mark up the metadata elements in D-Lib Magazine -->
    <!-- William Y. Arms, Cathy Rey, March 8, 1999
         Updated June 16, 1999 -->

    <!ELEMENT dlib-meta0.1 (title, creator+, publisher, date, type,
                            identifier+, language*, relation, rights+)>

    <!-- Element names are from the Dublin Core set of 15 names. -->
    <!-- Attributes are used to clarify the usage by D-Lib Magazine. -->

Continued on next slide

The D-Lib Magazine DTD - 2

    <!ELEMENT title (#PCDATA)>
    <!-- Title as supplied with all punctuation -->

    <!ELEMENT creator (#PCDATA)>
    <!-- This element is repeated for each author or other creator -->
    <!-- It contains the name of the author as provided, -->
    <!-- without affiliation or contact information. -->

    <!ELEMENT publisher (#PCDATA)>
    <!-- Publisher is "Corporation for National Research Initiatives" -->

Continued on next slide

The D-Lib Magazine DTD - 3

    <!ELEMENT date (#PCDATA)>
    <!ATTLIST date date-type CDATA #FIXED "publication">
    <!-- Issue date, e.g., "July 1995", or "July/August 1998" -->

    <!ELEMENT type (#PCDATA)>
    <!ATTLIST type resource-type CDATA #FIXED "work">
    <!-- D-Lib Magazine assigns metadata to works -->
    <!-- The default type is an "article" -->

Continued on next slide

The D-Lib Magazine DTD - 4

    <!ELEMENT identifier (#PCDATA)>
    <!ATTLIST identifier uri-type (DOI | URL) #REQUIRED>
    <!-- Every work should have a single DOI and one or more URLs. -->

The D-Lib Magazine DTD - 5

    <!ELEMENT relation (serial-name, (issn, volume, issue)*)>
    <!ATTLIST relation rel-type CDATA #FIXED "InSerial">

    <!ELEMENT serial-name (#PCDATA)>
    <!-- The serial name is "D-Lib Magazine". -->
    <!ELEMENT issn (#PCDATA)>
    <!-- The ISSN is "1082-9873". -->
    <!ELEMENT volume (#PCDATA)>
    <!-- Volume corresponds to year of publication, 1995 is "1". -->
    <!ELEMENT issue (#PCDATA)>
    <!-- The issue is a count of the actual issues in the volume. -->

Continued on next slide

The D-Lib Magazine DTD - 6

    <!ELEMENT language (#PCDATA)>
    <!-- The name of the language in English as: "English", "French", "Japanese" -->

    <!ELEMENT rights (#PCDATA)>
    <!-- The copyright statement as given on the work. -->

Constructing a DTD: Grammar

Every DTD has a grammar that defines:
• entities
• elements

The grammar is expressed as a set of rules that can be processed automatically.
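
As a rough illustration of rules that can be processed automatically, the content model declared for dlib-meta0.1 can be read as a regular expression over the names of the child elements. The sketch below is not a DTD validator: it assumes a flat, comma-separated sequence with optional ?, * or + suffixes, it ignores attributes and nested groups, and the helper names model_to_regex and children_match are hypothetical, introduced only for this example. It uses Python's standard library.

```python
import re
import xml.etree.ElementTree as ET

# Content model copied from the <!ELEMENT dlib-meta0.1 ...> declaration.
CONTENT_MODEL = ("title, creator+, publisher, date, type, "
                 "identifier+, language*, relation, rights+")


def model_to_regex(model: str) -> re.Pattern:
    """Turn a flat DTD sequence into a regex over child element names.

    Only a comma-separated sequence with ?, * or + suffixes is handled;
    choice groups and nested groups are out of scope for this sketch.
    """
    parts = []
    for item in model.split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        # Each child is encoded as "name " so that names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))


def children_match(xml_text: str, model: str) -> bool:
    """Check the root element's children against the content model."""
    root = ET.fromstring(xml_text)
    encoded = "".join(f"{child.tag} " for child in root)
    return model_to_regex(model).fullmatch(encoded) is not None


# The metadata record from the earlier slides (DOCTYPE omitted).
RECORD = """<dlib-meta0.1>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <publisher>Corporation for National Research Initiatives</publisher>
  <date date-type="publication">January 2000</date>
  <type resource-type="work">article</type>
  <identifier uri-type="DOI">10.1045/january2000-levy</identifier>
  <identifier uri-type="URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
  <language>English</language>
  <relation rel-type="InSerial">
    <serial-name>D-Lib Magazine</serial-name>
    <issn>1082-9873</issn>
    <volume>6</volume>
    <issue>1</issue>
  </relation>
  <rights>Copyright (c) David M. Levy</rights>
</dlib-meta0.1>"""

if __name__ == "__main__":
    print(children_match(RECORD, CONTENT_MODEL))  # True
    # Removing the only creator violates "creator+".
    broken = RECORD.replace("<creator>David M. Levy</creator>", "")
    print(children_match(broken, CONTENT_MODEL))  # False
```

A validating XML parser does considerably more than this (attribute checks, nested content models, entity handling), but the same idea applies: each ELEMENT declaration is a rule that software can apply mechanically.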

Constructing a DTD: Parameters

A parameter entity is a shorthand notation, e.g.,

    <!ENTITY % Shape "(rect|circle|poly|default)">
    <!ENTITY % pub "&#xc9;ditions Gallimard" >

Example. Given the following declarations:

    <!ENTITY % pub "&#xc9;ditions Gallimard" >
    <!ENTITY book "La Peste: Camus, &#xA9; 1947 %pub;." >

The replacement text for the entity "book" is:

    La Peste: Camus, © 1947 Éditions Gallimard.

An Example (DTD for XHTML)

Objective: Design a markup specification that
(a) is correct XML
(b) is similar to HTML, so that
    • users of HTML can learn it easily
    • existing HTML documents can be converted
(c) has features that permit long-term growth of the web

Some Assumptions

• Full Unicode and UTF-8 support
• All tags are structural: no <b>, <font>, etc.
• Empty tags defined as necessary, e.g., <br />, <img />
• Enforce syntax rules, e.g., <p> </p>, correct nesting

A Minimal Document

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Virtual Library</title>
      </head>
      <body>
        <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
      </body>
    </html>

Constructing a DTD: Entities

    <!ENTITY nbsp "&#160;">
    <!-- no-break space = non-breaking space, U+00A0 ISOnum -->
    <!ENTITY iexcl "&#161;">
    <!-- inverted exclamation mark, U+00A1 ISOnum -->
    <!ENTITY cent "&#162;">
    <!-- cent sign, U+00A2 ISOnum -->
    <!ENTITY pound "&#163;">
    <!-- pound sign, U+00A3 ISOnum -->
    <!ENTITY curren "&#164;">
    <!-- currency sign, U+00A4 ISOnum -->
    <!ENTITY yen "&#165;">
    <!-- yen sign = yuan sign, U+00A5 ISOnum -->

Constructing a DTD: Entities

Latin-1 characters

    <!ENTITY % HTMLlat1 PUBLIC
        "-//W3C//ENTITIES Latin 1 for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">

Special characters

    <!ENTITY % HTMLspecial PUBLIC
        "-//W3C//ENTITIES Special for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">

Symbols

    <!ENTITY % HTMLsymbol PUBLIC
        "-//W3C//ENTITIES Symbols for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">

The Full Example (XHTML)

The full DTD is: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
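
To see what general entity declarations such as those on the Entities slides do in practice, the following minimal sketch declares two of them in an internal DTD subset and lets a parser expand the references. It assumes Python's standard xml.etree.ElementTree module, which expands entities declared in the internal subset but does not fetch external files such as xhtml-lat1.ent. The element name price and the helper expanded_text are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset with two of the Latin-1 declarations from the slide.
# The root element name "price" is an assumption made for this sketch.
DOC = """<!DOCTYPE price [
  <!ENTITY pound "&#163;">
  <!ENTITY yen   "&#165;">
]>
<price>&pound;10 or &yen;1500</price>"""


def expanded_text(xml_text: str) -> str:
    """Parse the document and return the root text after entity expansion."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # The references &pound; and &yen; are replaced by U+00A3 and U+00A5.
    print(expanded_text(DOC))

    # Without the declarations the same reference is an error.
    try:
        ET.fromstring("<price>&pound;10</price>")
    except ET.ParseError as err:
        print("parse error:", err)
```

The second call shows why XHTML pulls in the entity sets through its DTD: a name such as pound has no meaning to an XML parser unless some declaration defines it.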