714: Metadata
Encoding Records in XML: DC,
MARCXML and MODS
Margaret E.I. Kipp - kipp@uwm.edu
https://pantherfile.uwm.edu/kipp/public/courses/714
1
Encoding Metadata in XML
2
HTML
●
HTML is an acronym for hypertext markup language
●
HTML is a markup language widely used on the Web
●
http://www.w3.org/TR/html4/
●
HTML and XML are related and are both based on SGML
●
SGML stands for Standard Generalized Markup Language
●
Developed as a publishing industry standard
3
From SGML to HTML and XML
●
SGML was developed first. HTML and XML were developed from SGML
●
HTML combines definitions of what an item is with how it is displayed
●
HTML may also use CSS to define layout and display
●
XML separates these two aspects of a document just as SGML does
4
XML vs HTML
●
HTML was designed to display data
●
HTML describes how to display data (e.g. font, bold, double spaced)
●
XML was designed to describe data
●
XML tags data with field/element names
●
An HTML document describes how to display a paragraph; an XML document describes what the paragraph is (e.g. an abstract)
5
XML
●
HTML and XML were designed to implement parts of SGML on the web
●
XML stands for eXtensible Markup Language; it is extensible because new element sets can be defined with a DTD (or an XSD)
6
Document Type Definition (DTD)
●
A DTD describes SGML and XML document types, using its own declaration syntax rather than XML element syntax. It describes
●
Information elements that the document handles (e.g. title, chapters, etc.)
●
Relationships between information elements
–
A chapter contains sections
–
A title comes at the top of the document
7
XML Schema (XSD)
●
replacement for DTDs
●
store information about elements and
relationships
●
stored in XML format unlike DTDs
8
Markup
●
Markup is everything in a document that is not content (e.g. font, layout, graphics)
●
XHTML, XML and HTML share similar syntax in their markup
●
e.g. <html></html> are the tags that enclose an entire page in XHTML and HTML
●
for XML these tags could be
●
<?xml version="1.0"?>
●
<metadata></metadata>
9
Basic HTML Page
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head></head>
<body>
<p>This is a basic HTML page.</p>
</body>
</html>
10
Basic XML Document
<?xml version="1.0"?>
<books xmlns="book.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="book.xsd">
<book>
<author>Jane Q. Public</author>
<title>Metadata for Everybody</title>
<identifier>www.example.org/metadataforeverybody</identifier>
</book>
</books>
11
XML Elements
●
XML documents are based on elements
●
Elements = tags in angle brackets <>
●
methods of writing an element
●
<name>value</name>
●
<name/> (= <name></name>)
–
empty element, no value but may have attributes
●
<name attributes="attribute1">value</name>
●
elements can be nested <b><i>style</i></b>
12
Syntax Rules for Attributes
●
Attribute names are separated from their values by the = sign. The equals sign can be surrounded by whitespace.
●
Attribute values can be enclosed in single or double quotes, but most people use double quotes.
●
Attribute names must be unique within an element (i.e. an attribute cannot be repeated on the same element)
13
More Attribute Rules
●
Elements or tags cannot be placed inside attributes.
●
Attributes must have a value, but the value could be empty.
●
<name attribute=""/>
●
Attributes are separated by whitespace.
●
<name attrib1="one" attrib2="two"/>
14
Well Formed Documents
●
XML documents must be well formed
●
Well formed documents have correct syntax (no mistakes in use and order of tags)
●
All elements must be properly nested and closed. You can only close the outer element after all child elements are closed
●
<a><b></a></b> not well-formed
●
<a><b></b></a> well formed
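The nesting rule can be checked mechanically. A minimal sketch, assuming Python and its standard xml.etree.ElementTree parser; the helper name is_well_formed is made up for illustration and is not part of the course materials:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two examples from the slide.
print(is_well_formed("<a><b></a></b>"))  # improperly nested
print(is_well_formed("<a><b></b></a>"))  # properly nested
```

This only tests well-formedness. Checking a document against a DTD or XSD (validity) needs a validating parser, which the Python standard library does not provide.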
15
Simple DTD: Book
●
<!ELEMENT books (book+)>
●
<!ELEMENT book (authors,title)>
●
<!ELEMENT authors (author+)>
●
<!ELEMENT author (#PCDATA)>
●
<!ELEMENT title (#PCDATA)>
●
specifies a books element which can contain one or more book elements (+)
●
each book has an authors element (holding one or more author elements) and a title
16
Simple XSD: Book
●
<?xml version="1.0"?>
●
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
●
<xs:element name="books">
●
<xs:complexType><xs:sequence>
●
<xs:element ref="book" maxOccurs="unbounded"/>
●
</xs:sequence></xs:complexType></xs:element>
●
<xs:element name="book">
●
<xs:complexType><xs:sequence>
●
<xs:element name="author" type="xs:string"/>
●
<xs:element name="title" type="xs:string"/>
●
</xs:sequence></xs:complexType>
●
</xs:element></xs:schema>
17
Simple XML Document
<?xml version="1.0"?>
<!DOCTYPE books SYSTEM "book.dtd">
<books>
<book>
<authors>
<author>Jane Q. Public</author>
</authors>
<title>Metadata for Everybody</title>
</book>
<book>
<authors>
<author>Marcia Lei Zeng</author>
<author>Jian Qin</author>
</authors>
<title>Metadata</title>
</book>
</books>
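Once a document like this is well formed, a program can walk its element tree. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the function name list_books is hypothetical, and the DOCTYPE line is left out because this parser does not validate against the DTD anyway:

```python
import xml.etree.ElementTree as ET

# Same records as on the slide, without the DOCTYPE declaration.
BOOKS_XML = """<books>
<book>
<authors>
<author>Jane Q. Public</author>
</authors>
<title>Metadata for Everybody</title>
</book>
<book>
<authors>
<author>Marcia Lei Zeng</author>
<author>Jian Qin</author>
</authors>
<title>Metadata</title>
</book>
</books>"""


def list_books(xml_text):
    """Return a list of (title, [authors]) tuples, one per book element."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        title = book.findtext("title")
        authors = [a.text for a in book.findall("authors/author")]
        result.append((title, authors))
    return result


for title, authors in list_books(BOOKS_XML):
    print(title, "by", "; ".join(authors))
```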
18
Encoding DC in HTML
●
HTML documents contain two main parts, a <head> and a <body>: the body is the visible portion of the webpage; the head contains the metadata
●
HTML and XHTML share an element set
●
other HTML tags:
http://www.w3schools.com/tags/default.asp
19
DC in HTML Metadata
●
two tags are of interest for storing metadata:
meta and link
●
meta syntax: <meta name="[property name]"
scheme="[value]" content="[value]" />
●
meta stores metadata; property name = element name; a lang attribute can also be added; scheme is optional
●
link syntax: <link rel="[property name]" href="[URI]" />
●
stores references or relationships
20
Examples of DC in HTML: META
●
meta: meta is the tag, name and content are
the attributes
●
<meta name="DC.title" lang="en"
content="Metadata for Everybody" />
●
<meta name="DC.creator" content="Jane Q.
Public" />
●
<meta name="DC.date"
scheme="DCTERMS.W3CDTF"
content="2008" />
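The meta pattern is regular enough to generate from a record. A minimal sketch, assuming Python; the function name dc_meta and its parameters are invented for illustration, and the attribute order simply follows the examples on this slide:

```python
from html import escape


def dc_meta(name, content, scheme=None, lang=None):
    """Build one <meta> tag for a Dublin Core element (hypothetical helper)."""
    attrs = [("name", "DC." + name)]
    if lang:
        attrs.append(("lang", lang))
    if scheme:
        attrs.append(("scheme", scheme))
    attrs.append(("content", content))
    # Escape quotes and angle brackets so values cannot break the attribute.
    rendered = " ".join('{}="{}"'.format(k, escape(v, quote=True)) for k, v in attrs)
    return "<meta " + rendered + " />"


print(dc_meta("title", "Metadata for Everybody", lang="en"))
print(dc_meta("creator", "Jane Q. Public"))
print(dc_meta("date", "2008", scheme="DCTERMS.W3CDTF"))
```

Escaping the values matters because an attribute value cannot contain an unescaped copy of its own quote character.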
21
Examples of DC in HTML: LINK
●
link: link is the tag, rel and href are the
attributes (you may recognise href from the
anchor or <a> tag)
●
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
●
a lang attribute can also be added
22
More DC Examples
●
Examples of sites that encode DC metadata in
HTML
●
http://dlist.sir.arizona.edu/
●
http://eprints.rclis.org/
●
http://dspace.mit.edu/
23
DC in HTML (excerpt)
●
<link rel="schema.DCTERMS"
href="http://purl.org/dc/terms/" />
●
<link rel="schema.DC"
href="http://purl.org/dc/elements/1.1/" />
●
<meta name="DC.creator" content="Coleman,
Anita Sundaram" xml:lang="en_US" />
●
<meta name="dc.date" content="2004-12"
xml:lang="en_US" />
●
<meta name="DC.format"
content="application/pdf" xml:lang="en_US" />
24
DC in HTML (screenshot)
25
Encoding DC in XML
●
XML provides a formal syntax for describing the
relationships between the entities, elements
and attributes in an XML document [Zeng and
Qin]
●
an XML document consists of a root element,
matching the name of the defined XML schema
or DTD and a set of elements
●
element syntax: <name
attribute="[value]">content</name>
●
may have xml:lang or other attributes
26
Example of Encoding DC in XML
<?xml version="1.0"?>
<metadata
xmlns="http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/dc.xsd"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">
<dc:title>Metadata for Everybody</dc:title>
<dc:creator>Jane Q. Public</dc:creator>
<dc:date scheme="DCTERMS.W3CDTF">2008</dc:date>
</metadata>
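Reading such a record back requires the namespace URI, not just the dc: prefix. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper dc_values is hypothetical, and the sample keeps only the dc namespace declaration from the slide for brevity:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used in the search paths below.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

RECORD = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Metadata for Everybody</dc:title>
<dc:creator>Jane Q. Public</dc:creator>
<dc:date scheme="DCTERMS.W3CDTF">2008</dc:date>
</metadata>"""


def dc_values(xml_text, element):
    """Return the text of every dc:<element> directly under the root."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall("dc:" + element, DC_NS)]


print(dc_values(RECORD, "title"))
print(dc_values(RECORD, "creator"))
```

The prefix in the search path is local to the program. Matching is done on the namespace URI, so a record that used a different prefix for the same URI would still be found.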
27
More XML Examples
●
https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/dcinxmlonlinethesis.xml
●
http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=10&recordSchema=dc
●
http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:0804.2273&metadataPrefix=oai_dc
28
DC in XML (excerpt)
<zs:record><zs:recordSchema>info:srw/schema/1/dcv1.1</zs:recordSchema>
<zs:recordPacking>xml</zs:recordPacking>
<zs:recordData><srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://purl.org/dc/elements/1.1/"
xsi:schemaLocation="info:srw/schema/1/dc-schema
http://www.loc.gov/standards/sru/resources/dc-schema.xsd">
<title>3-D dinosaur adventure.</title>
<creator>Knowledge Adventure, Inc.</creator>
<creator>Copyright Collection (Library of Congress) DLC</creator>
<type>software, multimedia</type>
<type>Educational games.</type>
<type>Video games.</type>
<publisher>Glendale, CA : Knowledge Adventure,</publisher>
<date>c1995.</date>
<language>eng</language>
<subject>Dinosaurs--Juvenile software.</subject>
<identifier>URN:ISBN:1569972133</identifier>
</srw_dc:dc></zs:recordData><zs:recordPosition>1</zs:recordPosition></zs:record>
29
DC in XML (screenshot)
30
In Class Exercise: DC in XML
●
XML encode the DC record for an item from the class 2
exercise (or another existing DC record) using the full
DCTERMS schema
●
Use the following XML template. Be sure to right click to save. Do not open it directly from the browser. Edit with Notepad++ or Oxygen (never Word).
●
https://pantherfile.uwm.edu/kipp/public/courses/shared/dcinxmltemplate-allterms.xml
●
You can delete elements you are not using, but be sure to delete the whole element, from start tag to end tag.
●
Check your XML by opening it in a browser (which reports well-formedness errors) or use http://validator.w3.org/
31
MARCXML and MODS
32
Example Record
Metadata by Marcia Lei Zeng and Jian Qin
http://lccn.loc.gov/2008015176
Part of a MARC Record
100 1_ |a Zeng, Marcia Lei, |d 1956-
245 10 |a Metadata / |c Marcia Lei Zeng and Jian Qin.
260 __ |a New York : |b Neal-Schuman Publishers, |c
c2008.
300 __ |a xvii, 365 p. : |b ill. ; |c 23 cm.
504 __ |a Includes bibliographical references (p. 327-353)
and index.
505 0_ |a Introduction -- Current standards -- Schemas :
structure and semantics -- Schemas : syntax -- Metadata
records -- Metadata services -- Metadata quality
measurement and improvement.
650 _0 |a Metadata.
700 1_ |a Qin, Jian, |d 1956-
MARC leader
01271cam 2200277 a
4500001000900000005001700009008004100026906004500067925004400112
9550166001560100017003220200038003390200035003770400018004120500
0220043008200140045210000290046624500460049526000490054130000350
0590504006400625505018500689650001400874700002200888856008300910
#15258260#20090831125647.0#080411s2008 nyua b 001 0 eng #
#a7#bcbc#corignew#d1#eecip#f20#gy-gencatlg#0 #aacquire#b2 shelf
copies#xpolicy default# #alh39 2008-04-11#ilh39 2008-04-11#elh39 2008-04-11
to Dewey#aaa20 2008-04-15#aps04 2008-06-20 1 copy rec'd., to CIP ver.#flh36
2008-06-27#glh36 2008-06-27 to BCCD# #a 2008015176# #a9781555706357
(pbk. : alk. paper)# #a1555706355 (pbk. : alk. paper)#
#aDLC#cDLC#dDLC#00#aZ666.7#b.Z46 2008#00#a025.3#222#1 #aZeng,
Marcia Lei,#d1956-#10#aMetadata /#cMarcia Lei Zeng and Jian Qin.# #aNew
York :#bNeal-Schuman Publishers,#cc2008.# #axvii, 365 p. :#bill. ;#c23 cm.#
#aIncludes bibliographical references (p. 327-353) and index.#0 #aIntroduction
-- Current standards -- Schemas : structure and semantics -- Schemas : syntax
-- Metadata records -- Metadata services -- Metadata quality measurement and
improvement.# 0#aMetadata.#1 #aQin, Jian,#d1956-#41#3Table of contents
only#uhttp://www.loc.gov/catdir/toc/ecip0816/2008015176.html##

http://www.loc.gov/marc/bibliographic/ecbdlist.html

35
MARC Fixed Fields
●
The leader, directory and fixed fields specify language and format, and explicitly spell out how long each of the other MARC fields is
●
take the following chunk from the MARC record:
245004600495260004900541
●
this specifies that the 245 field is 46 characters
long and starts at position 495 in the record
●
the 260 field is 49 characters long and starts at
541
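The chunk above is two directory entries. Each entry is 12 characters: a 3 character tag, a 4 digit field length and a 5 digit starting position. A minimal sketch of reading them, assuming Python; the function name parse_directory is made up for illustration:

```python
def parse_directory(directory):
    """Split a MARC directory string into (tag, length, start) tuples.

    Each entry is 12 characters: tag (3), field length (4), start (5).
    """
    entries = []
    for i in range(0, len(directory), 12):
        chunk = directory[i:i + 12]
        if len(chunk) < 12:
            break  # ignore a trailing partial entry
        tag = chunk[0:3]
        length = int(chunk[3:7])
        start = int(chunk[7:12])
        entries.append((tag, length, start))
    return entries


for tag, length, start in parse_directory("245004600495260004900541"):
    print(tag, "length", length, "starts at", start)
```

Note that 495 + 46 = 541, so the 260 field begins exactly where the 245 field ends.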
36
Why MARCXML?
●
designed to eliminate the need to specify length
of fields
●
uses XML standard to encode MARC records
●
reproduces the leader, control fields and variable fields of a MARC record exactly; it drops the directory of field lengths and starting positions, as this information is no longer needed
●
conversion from MARC to MARCXML is exact (lossless); it is not a crosswalk, as all fields can be exactly represented
37
MARCXML Examples
●
http://lccn.loc.gov/2008015176/marcxml
●
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Metadata /</subfield>
<subfield code="c">Marcia Lei Zeng and Jian Qin.</subfield>
</datafield>
●
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">New York :</subfield>
<subfield code="b">Neal-Schuman Publishers,</subfield>
<subfield code="c">c2008.</subfield>
</datafield>
●
http://apps.appl.cuny.edu:5661/U-CUN01?version=1.1&operation=searchRetrieve&query=dc.creator=%22william+faulkner%22&startRecord=1&maximumRecords=10
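Because tags and subfield codes are now attributes, a field can be found by name rather than by character offset. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper name subfields and the enclosing record element are assumptions added so the excerpt parses on its own, and the namespace is the MARC21 slim namespace used by MARCXML:

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The two datafields from the slide, wrapped in a record element.
MARCXML_SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Metadata /</subfield>
<subfield code="c">Marcia Lei Zeng and Jian Qin.</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">New York :</subfield>
<subfield code="b">Neal-Schuman Publishers,</subfield>
<subfield code="c">c2008.</subfield>
</datafield>
</record>"""


def subfields(xml_text, tag):
    """Return (code, value) pairs for every datafield with the given tag."""
    root = ET.fromstring(xml_text)
    found = []
    for field in root.findall("marc:datafield", MARC_NS):
        if field.get("tag") == tag:
            for sub in field.findall("marc:subfield", MARC_NS):
                found.append((sub.get("code"), sub.text))
    return found


print(subfields(MARCXML_SAMPLE, "245"))
print(subfields(MARCXML_SAMPLE, "260"))
```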
38
MODS (Metadata Object Description Schema)
●
MARC allows cataloguers to create complex
records (rich metadata) and provides a much
more expressive element set than Dublin Core,
but is complex to use
●
MODS was designed to allow complex data to
be encoded in a more interoperable format and
to allow existing MARC records to be translated
to other formats
●
http://www.loc.gov/standards/mods/
Features of MODS
●
originates from MARC (inherited semantics and
a subset of fields)
●
uses language based tags rather than numeric
●
regroups similar elements from MARC (e.g.
1XX, 7XX which are both creator fields)
●
uses attributes to refine elements
●
does not assume the use of AACR2 as a
cataloguing standard so allows for introduction
of RDA
MODS Elements
●
MODS is hierarchical, elements may have
subelements
●
MODS has two root elements which may hold
elements (mods and modsCollection)
●
MODS has 20 top level elements which may
have subelements
●
elements and subelements may have attributes
●
Outline: http://www.loc.gov/standards/mods/mods-outline.html
●
Schema: http://www.loc.gov/standards/mods/v3/mods-3-3.xsd
MODS Encoding and Display
●
MODS uses XML to encode the content of a
record but does not specify a display format
(just like RDA which does not specify a display
format only what content should be present)
●
MODS will use XML Stylesheets for formatting
(XSLT)
MODS titleInfo
1. titleInfo
Subelements:
title
subTitle
partNumber
partName
nonSort
Attributes:
ID; xlink; lang; xml:lang; script; transliteration
type (enumerated: abbreviated, translated, alternative, uniform)
authority (see: LOC Authorities)
displayLabel
MODS Excerpt
●
<mods xmlns="http://www.loc.gov/mods/v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/mods/v3
http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
version="3.4">
●
<titleInfo><title>Metadata</title></titleInfo>
●
<name type="personal"><namePart>Zeng, Marcia Lei</namePart>
<namePart type="date">1956-</namePart>
<role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role></name>
●
<name type="personal"><namePart>Qin, Jian</namePart>
<namePart type="date">1956-</namePart></name>
●
http://lccn.loc.gov/2008015176/mods
MODS Title Entry
●
<titleInfo><title>Metadata</title></titleInfo>
●
titleInfo is the top level element for holding
information about the title
●
title is a subelement of titleInfo which holds the
title proper (as defined in AACR2)
●
subTitle would hold a sub title
●
partNumber, partName etc. would handle
chapters, and other portions of a whole work
MODS Name Entry
●
<name type="personal"><namePart>Zeng, Marcia Lei</namePart>
<namePart type="date">1956-</namePart>
●
<role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role>
●
</name>
●
two parts: name and role
●
name: creator name and dates
●
role: indicates if this creator was the main or an
added entry
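The name and role parts can be pulled out of a MODS record in the same way as any other namespaced XML. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper names_with_roles is hypothetical, and the sample reuses the two name entries from the MODS excerpt inside a bare mods element:

```python
import xml.etree.ElementTree as ET

MODS_NS = {"m": "http://www.loc.gov/mods/v3"}

MODS_SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
<name type="personal">
<namePart>Zeng, Marcia Lei</namePart>
<namePart type="date">1956-</namePart>
<role><roleTerm type="text" authority="marcrelator">creator</roleTerm></role>
</name>
<name type="personal">
<namePart>Qin, Jian</namePart>
<namePart type="date">1956-</namePart>
</name>
</mods>"""


def names_with_roles(xml_text):
    """Return (name, [role terms]) for each top level name element."""
    root = ET.fromstring(xml_text)
    out = []
    for name in root.findall("m:name", MODS_NS):
        # Keep only untyped nameParts; typed ones (e.g. date) are skipped.
        parts = [p.text for p in name.findall("m:namePart", MODS_NS)
                 if p.get("type") is None]
        roles = [r.text for r in name.findall("m:role/m:roleTerm", MODS_NS)]
        out.append((" ".join(parts), roles))
    return out


print(names_with_roles(MODS_SAMPLE))
```

In this sample only the first name carries a role term, which is how the two entries are told apart.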
MODS Subject Entry
●
<subject authority="lcsh">
●
<topic>Metadata</topic>
●
</subject>
●
indicates that the subject is an LCSH subject
heading
MODS, MARC and DC Structure
●
MODS has a hierarchical structure, DC is flat
●
DC used by the Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH)
●
MODS is a descriptive metadata format commonly used within the Metadata Encoding and Transmission Standard (METS) and by large scale web archiving projects like the Library of Congress American Memory Project
●
common goal is making information
discoverable
MARC 245 to MODS titleInfo
●
245 $a$f$g$k: <title> with no <titleInfo> type attribute
●
245 $b: <subTitle>
●
245 $n (and $f$g$k following $n): <partNumber>
●
245 $p (and $f$g$k following $p): <partName>
●
245 ind2 is not 0: <nonSort> around the characters excluded from sorting, as indicated by the indicator value
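The mapping above can be sketched as a small conversion function. This is a simplified illustration in Python, not the Library of Congress conversion: the function name title_info_from_245 is invented, $f$g$k handling and punctuation clean-up are left out, and the second title in the usage lines is a made-up example.

```python
import xml.etree.ElementTree as ET

# Subfield code to MODS subelement, per the mapping listed above.
SUBFIELD_TO_MODS = {"a": "title", "b": "subTitle", "n": "partNumber", "p": "partName"}


def title_info_from_245(subfields, ind2="0"):
    """Build a titleInfo element from 245 (code, value) pairs.

    ind2 is the second indicator: the number of leading characters
    to exclude from sorting. Unmapped subfields (e.g. $c) are skipped.
    """
    title_info = ET.Element("titleInfo")
    skip = int(ind2) if ind2.isdigit() else 0
    for code, value in subfields:
        if code not in SUBFIELD_TO_MODS:
            continue
        if code == "a" and skip > 0:
            # Wrap the non-sorting characters in their own element.
            ET.SubElement(title_info, "nonSort").text = value[:skip]
            value = value[skip:]
        ET.SubElement(title_info, SUBFIELD_TO_MODS[code]).text = value
    return title_info


record_245 = [("a", "Metadata /"), ("c", "Marcia Lei Zeng and Jian Qin.")]
print(ET.tostring(title_info_from_245(record_245, "0"), encoding="unicode"))

# Hypothetical title with a four character non-sorting prefix ("The ").
print(ET.tostring(title_info_from_245([("a", "The metadata handbook")], "4"), encoding="unicode"))
```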
MODS Examples
●
https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/modsinxml-onlinethesis.xml
●
http://www.americanhistoryonline.org/sru?operation=searchRetrieve&version=1.1&query=dog&recordSyntax=mods&maximumRecords=10
●
http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=10&recordSchema=mods
In Class Exercise: MARCXML or MODS
●
Encode one of the records from the metadata creation
exercise in week 2 in MODS.
●
Use the following record as a template. Be sure to right
click to save. Edit with notepad++ or Oxygen (Never
Word).
●
https://pantherfile.uwm.edu/kipp/public/courses/714/metadataexamples/modsinxml-onlinethesis.xml
●
You can simply replace the existing values with your
record.