714: Metadata Encoding Records in XML: DC, MARCXML and MODS
Margaret E.I. Kipp - kipp@uwm.edu
https://pantherfile.uwm.edu/kipp/public/courses/714

Encoding Metadata in XML

HTML
● HTML is an acronym for HyperText Markup Language
● HTML is a markup language widely used on the Web
  – http://www.w3.org/TR/html4/
● HTML and XML are related and are both based on SGML
● SGML stands for Standard Generalized Markup Language
  – Developed as a publishing industry standard

From SGML to HTML and XML
● SGML was developed first. HTML and XML were developed from SGML
● HTML combines definitions of what an item is with how it is displayed
● HTML may also use CSS to define layout and display
● XML separates these two aspects of a document just as SGML does

XML vs HTML
● HTML was designed to display data
● HTML describes how to display data (e.g. font, bold, double spaced)
● XML was designed to describe data
● XML tags data with field/element names
● An HTML document describes how to display a paragraph; an XML document describes it as an abstract

XML
● HTML and XML were designed to implement parts of SGML on the web
● XML stands for eXtensible Markup Language; it is called extensible because it can be modified with a DTD (or an XSD)

Document Type Definition (DTD)
● A DTD describes SGML (and XML) document types; it is written in its own declaration syntax rather than in XML element syntax
● It describes
  – Information elements that the document handles (e.g. title, chapters, etc.)
  – Relationships between information elements
    – A chapter contains sections
    – A title comes at the top of the document

XML Schema (XSD)
● replacement for DTDs
● stores information about elements and relationships
● stored in XML format, unlike DTDs

Markup
● Markup is everything in a document that is not content (e.g. font, layout, graphics)
● XHTML, XML and HTML share similar syntax in their markup
● e.g. <html></html> are the tags that enclose an entire page in XHTML and HTML
● for XML these tags could be
  – <?xml version="1.0"?>
  – <metadata></metadata>

Basic HTML Page
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  <html>
    <head></head>
    <body>
      <p>This is a basic HTML page.</p>
    </body>
  </html>

Basic XML Document
  <?xml version="1.0"?>
  <books xmlns="book.xsd"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="book.xsd">
    <book>
      <author>Jane Q. Public</author>
      <title>Metadata for Everybody</title>
      <identifier>www.example.org/metadataforeverybody</identifier>
    </book>
  </books>

XML Elements
● XML documents are based on elements
● Elements = tags in angle brackets <>
● methods of writing an element
  – <name>value</name>
  – <name/> (= <name></name>): empty element, no value but may have attributes
  – <name attributes="attribute1">value</name>
● elements can be nested: <b><i>style</i></b>

Syntax Rules for Attributes
● Attribute names are separated from their values by the = sign. The equal sign can be surrounded by whitespace.
● Attribute values can be enclosed in single or double quotes, but most people use double quotes.
● Attribute names must be unique within an element (i.e. an attribute cannot be repeated)

More Attribute Rules
● Elements or tags cannot be placed inside attributes.
● Attributes must have a value, but the value could be empty.
  – <name attribute=""/>
● Attributes are separated by a space.
  – <name attrib1="one" attrib2="two"/>

Well Formed Documents
● XML documents must be well formed
● Well formed documents have correct syntax (no mistakes in use and order of tags)
● All elements must be properly nested and closed.
  You can only close the outer element after all child elements are closed
  – <a><b></a></b> not well-formed
  – <a><b></b></a> well formed

Simple DTD: Book
  <!ELEMENT books (book+)>
  <!ELEMENT book (authors,title)>
  <!ELEMENT authors (author+)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
● specifies a books object which can contain multiple book objects (+)
● each book has one or more authors and a title

Simple XSD: Book
  <?xml version="1.0"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="books">
      <xs:complexType><xs:sequence>
        <xs:element ref="book" maxOccurs="unbounded"/>
      </xs:sequence></xs:complexType>
    </xs:element>
    <xs:element name="book">
      <xs:complexType><xs:sequence>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence></xs:complexType>
    </xs:element>
  </xs:schema>

Simple XML Document
  <?xml version="1.0"?>
  <!DOCTYPE books SYSTEM "book.dtd">
  <books>
    <book>
      <authors>
        <author>Jane Q. Public</author>
      </authors>
      <title>Metadata for Everybody</title>
    </book>
    <book>
      <authors>
        <author>Marcia Lei Zeng</author>
        <author>Jian Qin</author>
      </authors>
      <title>Metadata</title>
    </book>
  </books>

Encoding DC in HTML
● HTML documents contain two main parts, a <head> and a <body>: the body is the visible portion of the webpage, the head contains the metadata
● HTML and XHTML share an element set
● other HTML tags: http://www.w3schools.com/tags/default.asp

DC in HTML Metadata
● two tags are of interest for storing metadata: meta and link
● meta syntax: <meta name="[property name]" scheme="[value]" content="[value]" />
  – meta stores metadata; property name = element name; a lang attribute can also be added; scheme is optional
● link syntax: <link rel="[property name]" href="[URI]" />
  – stores references or relationships

Examples of DC in HTML: META
● meta: meta is the tag, name and content are the attributes
  <meta name="DC.title" lang="en" content="Metadata for Everybody" />
  <meta name="DC.creator" content="Jane Q. Public" />
  <meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2008" />

Examples of DC in HTML: LINK
● link: link is the tag, rel and href are the attributes (you may recognise href from the anchor or <a> tag)
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
● a lang attribute can also be added

More DC Examples
● Examples of sites that encode DC metadata in HTML
  – http://dlist.sir.arizona.edu/
  – http://eprints.rclis.org/
  – http://dspace.mit.edu/

DC in HTML (excerpt)
  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.creator" content="Coleman, Anita Sundaram" xml:lang="en_US" />
  <meta name="dc.date" content="2004-12" xml:lang="en_US" />
  <meta name="DC.format" content="application/pdf" xml:lang="en_US" />

DC in HTML (screenshot)

Encoding DC in XML
● XML provides a formal syntax for describing the relationships between the entities, elements and attributes in an XML document [Zeng and Qin]
● an XML document consists of a root element, matching the name of the defined XML schema or DTD, and a set of elements
● element syntax: <name attribute="[value]">content</name>
● may have xml:lang or other attributes

Example of Encoding DC in XML
  <?xml version="1.0"?>
  <metadata xmlns="http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:dcterms="http://purl.org/dc/terms/">
    <dc:title>Metadata for Everybody</dc:title>
    <dc:creator>Jane Q. Public</dc:creator>
    <dc:date scheme="DCTERMS.W3CDTF">2008</dc:date>
  </metadata>

More XML Examples
● https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/dcinxmlonlinethesis.xml
● http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=10&recordSchema=dc
● http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:0804.2273&metadataPrefix=oai_dc

DC in XML (excerpt)
  <zs:record>
    <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
    <zs:recordPacking>xml</zs:recordPacking>
    <zs:recordData>
      <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns="http://purl.org/dc/elements/1.1/"
                 xsi:schemaLocation="info:srw/schema/1/dc-schema http://www.loc.gov/standards/sru/resources/dc-schema.xsd">
        <title>3-D dinosaur adventure.</title>
        <creator>Knowledge Adventure, Inc.</creator>
        <creator>Copyright Collection (Library of Congress) DLC</creator>
        <type>software, multimedia</type>
        <type>Educational games.</type>
        <type>Video games.</type>
        <publisher>Glendale, CA : Knowledge Adventure,</publisher>
        <date>c1995.</date>
        <language>eng</language>
        <subject>Dinosaurs--Juvenile software.</subject>
        <identifier>URN:ISBN:1569972133</identifier>
      </srw_dc:dc>
    </zs:recordData>
    <zs:recordPosition>1</zs:recordPosition>
  </zs:record>

DC in XML (screenshot)

In Class Exercise: DC in XML
● XML encode the DC record for an item from the class 2 exercise (or another existing DC record) using the full DCTERMS schema
● Use the following XML template. Be sure to right click to save. Do not open directly from browser. Edit with notepad++ or Oxygen (Never Word).
  – https://pantherfile.uwm.edu/kipp/public/courses/shared/dcinxmltemplate-allterms.xml
● You can delete elements you are not using, but be sure to delete all of the element from start tag to end tag.
● Check that your XML is well formed by opening it in a browser, or use http://validator.w3.org/

MARCXML and MODS

Example Record
Metadata by Marcia Lei Zeng and Jian Qin
http://lccn.loc.gov/2008015176

Part of a MARC Record
100 1_ |a Zeng, Marcia Lei, |d 1956-
245 10 |a Metadata / |c Marcia Lei Zeng and Jian Qin.
260 __ |a New York : |b Neal-Schuman Publishers, |c c2008.
300 __ |a xvii, 365 p. : |b ill. ; |c 23 cm.
504 __ |a Includes bibliographical references (p. 327-353) and index.
505 0_ |a Introduction -- Current standards -- Schemas : structure and semantics -- Schemas : syntax -- Metadata records -- Metadata services -- Metadata quality measurement and improvement.
650 _0 |a Metadata.
700 1_ |a Qin, Jian, |d 1956-

MARC leader
01271cam 2200277 a 4500001000900000005001700009008004100026906004500067925004400112
9550166001560100017003220200038003390200035003770400018004120500
0220043008200140045210000290046624500460049526000490054130000350
0590504006400625505018500689650001400874700002200888856008300910
#15258260#20090831125647.0#080411s2008 nyua b 001 0 eng # #a7#bcbc#corignew#d1#eecip#f20#gy-gencatlg#0 #aacquire#b2 shelf copies#xpolicy default# #alh39 2008-04-11#ilh39 2008-04-11#elh39 2008-04-11 to Dewey#aaa20 2008-04-15#aps04 2008-06-20 1 copy rec'd., to CIP ver.#flh36 2008-06-27#glh36 2008-06-27 to BCCD# #a 2008015176# #a9781555706357 (pbk. : alk. paper)# #a1555706355 (pbk. : alk. paper)# #aDLC#cDLC#dDLC#00#aZ666.7#b.Z46 2008#00#a025.3#222#1 #aZeng, Marcia Lei,#d1956-#10#aMetadata /#cMarcia Lei Zeng and Jian Qin.# #aNew York :#bNeal-Schuman Publishers,#cc2008.# #axvii, 365 p. :#bill. ;#c23 cm.# #aIncludes bibliographical references (p. 327-353) and index.#0 #aIntroduction -- Current standards -- Schemas : structure and semantics -- Schemas : syntax -- Metadata records -- Metadata services -- Metadata quality measurement and improvement.# 0#aMetadata.#1 #aQin, Jian,#d1956-#41#3Table of contents only#uhttp://www.loc.gov/catdir/toc/ecip0816/2008015176.html##
http://www.loc.gov/marc/bibliographic/ecbdlist.html

MARC Fixed Fields
● The leader and fixed fields specify language, format and explicitly spell out how long each of the other MARC fields is...
● take the following chunk from the MARC record: 245004600495260004900541
● this specifies that the 245 field is 46 characters long and starts at position 495 in the record's field data (positions are counted from the end of the leader and directory)
● the 260 field is 49 characters long and starts at 541

Why MARCXML?
● designed to eliminate the need to specify length of fields
● uses XML standard to encode MARC records
● exact duplicate of the fields in a MARC record; does not duplicate the field lengths and start positions, as this information is no longer needed
● conversion from MARC to MARCXML is exact (lossless); it is not a crosswalk, as all fields can be exactly represented

MARCXML Examples
● http://lccn.loc.gov/2008015176/marcxml
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Metadata /</subfield>
    <subfield code="c">Marcia Lei Zeng and Jian Qin.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">New York :</subfield>
    <subfield code="b">Neal-Schuman Publishers,</subfield>
    <subfield code="c">c2008.</subfield>
  </datafield>
● http://apps.appl.cuny.edu:5661/U-CUN01?version=1.1&operation=searchRetrieve&query=dc.creator=%22william+faulkner%22&startRecord=1&maximumRecords=10

MODS (Metadata Object Description Schema)
● MARC allows cataloguers to create complex records (rich metadata) and provides a much more expressive element set than Dublin Core, but is complex to use
● MODS was designed to allow complex data to be encoded in a more interoperable format and to allow existing MARC records to be translated to other formats
● http://www.loc.gov/standards/mods/

Features of MODS
● originates from MARC (inherited semantics and a subset of fields)
● uses language based tags rather than numeric
● regroups similar elements from MARC (e.g. 1XX, 7XX which are both creator fields)
● uses attributes to refine elements
● does not assume the use of AACR2 as a cataloguing standard, so allows for introduction of RDA

MODS Elements
● MODS is hierarchical, elements may have subelements
● MODS has two root elements which may hold elements (mods and modsCollection)
● MODS has 20 top level elements which may have subelements
● elements and subelements may have attributes
● Outline: http://www.loc.gov/standards/mods/mods-outline.html
● Schema: http://www.loc.gov/standards/mods/v3/mods-3-3.xsd

MODS Encoding and Display
● MODS uses XML to encode the content of a record but does not specify a display format (just like RDA, which does not specify a display format, only what content should be present)
● MODS will use XML Stylesheets for formatting (XSLT)

MODS titleInfo
1. titleInfo
Subelements: title; subTitle; partNumber; partName; nonSort
Attributes: ID; xlink; lang; xml:lang; script; transliteration; type (enumerated: abbreviated, translated, alternative, uniform); authority (see: LOC Authorities); displayLabel

MODS Excerpt
  <mods xmlns="http://www.loc.gov/mods/v3"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
        version="3.4">
  <titleInfo><title>Metadata</title></titleInfo>
  <name type="personal"><namePart>Zeng, Marcia Lei</namePart><namePart type="date">1956-</namePart><role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role></name>
  <name type="personal"><namePart>Qin, Jian</namePart><namePart type="date">1956-</namePart></name>
● http://lccn.loc.gov/2008015176/mods

MODS Title Entry
  <titleInfo><title>Metadata</title></titleInfo>
● titleInfo is the top level element for holding information about the title
● title is a subelement of titleInfo which holds the title proper (as defined in AACR2)
● subTitle would hold a subtitle
● partNumber, partName etc. would handle chapters and other portions of a whole work

MODS Name Entry
  <name type="personal">
    <namePart>Zeng, Marcia Lei</namePart>
    <namePart type="date">1956-</namePart>
    <role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role>
  </name>
● two parts: name and role
● name: creator name and dates
● role: indicates if this creator was the main or an added entry

MODS Subject Entry
  <subject authority="lcsh">
    <topic>Metadata</topic>
  </subject>
● indicates that the subject is an LCSH subject heading

MODS, MARC and DC Structure
● MODS has a hierarchical structure, DC is flat
● DC is used by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
● MODS is the format used by the Metadata Encoding and Transmission Standard (METS) and large scale web archiving projects like the Library of Congress American Memory Project
● common goal is making information discoverable

MARC 245 to MODS titleInfo
● 245 $a$f$g$k → <title> with no <titleInfo> type attribute, and
● 245 $b → <subTitle>
● 245 $n (and $f$g$k following $n) → <partNumber>
● 245 $p (and $f$g$k following $p) → <partName>
● 245 ind2 is not 0 → <nonSort> around characters excluded from sort as indicated in indicator value

MODS Examples
● https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/modsinxml-onlinethesis.xml
● http://www.americanhistoryonline.org/sru?operation=searchRetrieve&version=1.1&query=dog&recordSyntax=mods&maximumRecords=10
● http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=10&recordSchema=mods

In Class Exercise: MARCXML or MODS
● Encode one of the records from the metadata creation exercise in week 2 in MODS.
● Use the following record as a template. Be sure to right click to save. Edit with notepad++ or Oxygen (Never Word).
  – https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/modsinxml-onlinethesis.xml
● You can simply replace the existing values with the values from your record.
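
Checking an edited MODS record (sketch)
The well-formedness rule from the earlier slides and the MODS title, name and subject entries can be checked with a short script before handing in the exercise. The following is only a minimal sketch, assuming Python and its standard xml.etree.ElementTree module. The record text is adapted from the MODS excerpt in these slides with a subject entry added, and the helper names is_well_formed and summarize_mods are hypothetical, not part of any MODS tooling. It tests well-formedness only; it does not validate against the MODS schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the MODS excerpt; the "mods" prefix is our own choice.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Sample record adapted from the slides (schemaLocation attributes omitted for brevity).
RECORD = """<?xml version="1.0"?>
<mods xmlns="http://www.loc.gov/mods/v3" version="3.4">
  <titleInfo><title>Metadata</title></titleInfo>
  <name type="personal">
    <namePart>Zeng, Marcia Lei</namePart>
    <namePart type="date">1956-</namePart>
    <role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role>
  </name>
  <name type="personal">
    <namePart>Qin, Jian</namePart>
    <namePart type="date">1956-</namePart>
  </name>
  <subject authority="lcsh"><topic>Metadata</topic></subject>
</mods>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML (no schema validation is done)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def summarize_mods(xml_text):
    """Pull the title, personal/corporate names and subject topics out of a MODS record."""
    root = ET.fromstring(xml_text)
    title = root.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
    names = []
    for name in root.findall("mods:name", MODS_NS):
        for part in name.findall("mods:namePart", MODS_NS):
            # A namePart without a type attribute holds the name itself;
            # type="date" parts hold the dates and are skipped here.
            if part.get("type") is None:
                names.append(part.text)
    subjects = [t.text for t in root.findall("mods:subject/mods:topic", MODS_NS)]
    return {"title": title, "names": names, "subjects": subjects}


if __name__ == "__main__":
    print(is_well_formed(RECORD))
    print(summarize_mods(RECORD))
```

To try it on your own file, read the file's text and pass it to the same two functions. A record that fails is_well_formed usually has an element that was only partly deleted or closed in the wrong order.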