Homework #2

LIS 7410
Metadata Standards and Interoperability Study
Written by: Jamie Rivers
Dr. Wu; Spring 2013
Rivers, Jamie
LIS 7410 Homework 2
1
Introduction
One of the dilemmas inherent in the rise of digital libraries is the need to describe a variety
of resources in a variety of formats and make them easily accessible. Traditionally, libraries
have relied on MARC records and the AACR2 descriptive framework for metadata needs.
Unfortunately, these tools are rarely used outside the library profession, which limits their
usefulness for digital libraries that are creating electronic bibliographic metadata.
Reese explains that the “days of a homogenous bibliographic standard for all content are
coming to an end as more specialized descriptive formats are needed to describe the various
types of materials being produced today and into the future” (86). Two metadata schemas
that have achieved considerable success, and that allow non-traditional materials to be discovered
through a variety of access points, are the Dublin Core (DC) and the Metadata Object Description
Schema (MODS). This paper explores the functionality of DC, including the role it plays in
resource description, and compares DC with MODS in interoperability and functionality.
The Dublin Core
According to Chen and Reilly, Dublin Core (DC) is “the most widely used descriptive metadata
schema among digital libraries” (86). For those who are creating or studying digital libraries,
understanding Dublin Core is fundamental due to its prominence in the field of digital
metadata. Dublin Core arose out of concerns about the difficulty of locating educational materials
on the Internet. In response, the Online Computer Library Center (OCLC) and the National Center
for Supercomputing Applications (NCSA) held a meeting in Dublin, Ohio, that focused on
three goals:
1. Decide what descriptive elements would be needed to promote findability of
all documents on the Web.
2. Explore how to create a solution that would be flexible for past, present, and
future online publication on the Web.
3. Explore how to promote the usage of such a solution if it exists (Reese, 124).
This meeting resulted in a set of 13 (later extended to 15) “metadata elements that are
designed specifically for non-specialist use” and are intended “for the description of electronic
materials” such as Web pages or sites (Witten, 294). Appendix A lists the Dublin Core
metadata elements and their definitions for reference.
Metadata for electronic sources must be well organized and complete in order to make
resources accessible. Yasser explains that when metadata does not fully describe
a resource, the resource is all but invisible to potential users. Although a digital library may be
skillfully compiled, “poorly created metadata would corrupt services offered by the library”
(52). The purpose of digital libraries is to provide fast and reliable access to unique and useful
sources of information. If “findability” is hindered by poor metadata, it does not matter how
valuable the resources contained in the library are. The resources will not be found; therefore,
they will not be used. Dublin Core’s 15 metadata elements make up the Unqualified Dublin
Core and allow resources to be described in a simple and concise manner. The Unqualified
Dublin Core can, however, be refined by qualifiers (Qualified Dublin Core), allowing DC to be
customized to local needs. Qualifiers allow a great deal of specificity within
the DC. For example, the “date” element can be qualified as “date.available”, “date.issued”,
and “date.copyright”. This gives creators of metadata the ability to design a wide variety of
access points for resources while maintaining compatibility within the DC format “because
metadata elements can always be reduced down to the core 15 unqualified elements” (Reese,
128). The Dublin Core Metadata Initiative (DCMI) lists suggested refinements at
http://dublincore.org/documents/usageguide/qualifiers.shtml for the convenience of users.
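The reduction Reese describes, often called the Dublin Core “dumb-down” principle, can be illustrated with a short Python sketch. It assumes qualified names are written in the dotted style used above (“date.issued”); real systems encode qualifiers in other ways as well, and the function names here are hypothetical rather than part of any DCMI tool.

```python
# Sketch of the Dublin Core "dumb-down" idea: any qualified element can be
# reduced to one of the 15 unqualified elements. Assumes the dotted naming
# style ("date.issued"); function names are illustrative, not a real API.

DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dumb_down(qualified_name: str) -> str:
    """Return the unqualified element for a name such as 'date.issued'."""
    base = qualified_name.split(".", 1)[0].strip().lower()
    if base not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {qualified_name!r}")
    return base


def to_unqualified(record: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Reduce every (element, value) pair; values and their order are kept."""
    return [(dumb_down(name), value) for name, value in record]


if __name__ == "__main__":
    qualified = [
        ("title", "Fifty years of television"),
        ("date.issued", "1991"),
        ("date.copyright", "1991"),
    ]
    for element, value in to_unqualified(qualified):
        print(f"{element}: {value}")
```

Running it prints the title followed by two plain “date” entries: the values survive the reduction, but the distinction between the issue date and the copyright date is no longer recorded.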
The more specific and descriptive the metadata that creators provide, the greater
the chance that users will find important resources. DC not only allows elements to be refined,
it also allows them to be repeated as many times as necessary. Additionally, all
DC elements are optional and may be presented in any order (Understanding Metadata, 3). This
flexibility allows digital libraries to tailor metadata to their own needs.
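Because no element is required, any element may repeat, and order does not matter, a checker for simple Dublin Core only needs to confirm that each element name is one of the fifteen. The hypothetical Python function below is a minimal sketch under that assumption; it does not validate the values themselves.

```python
from collections import Counter

# The 15 elements of the Dublin Core Metadata Element Set (see Appendix A).
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def check_simple_dc(record: list[tuple[str, str]]) -> Counter:
    """Count element occurrences in a simple DC record.

    No element is required, any element may repeat, and order is ignored,
    so the only failure is an element name outside the 15 elements.
    """
    unknown = [name for name, _ in record if name not in DC_ELEMENTS]
    if unknown:
        raise ValueError(f"unknown elements: {unknown}")
    return Counter(name for name, _ in record)


if __name__ == "__main__":
    # Two creators, no title, arbitrary order: still a valid simple DC record.
    record = [("subject", "Television"), ("creator", "A"), ("creator", "B")]
    print(check_simple_dc(record))
```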
Another significant aspect of DC is that it is commonly expressed in Extensible Markup Language
(XML) syntax. Metadata created in binary formats, such as MARC 21, is difficult for laypersons to
interpret. Figure 1 is an example of a MARC 21 record provided by the Library of Congress.
Only someone trained in the use of MARC records could effectively use this data.
Figure 1: Sample MARC 21 metadata for the book “Fifty Years of Television”
LDR    *****nam##22*****#a#4500
001    <control number>
003    <control number identifier>
005    19920331092212.7
007    ta
008    820305 s1991#### nyu#### ###### #001#0# eng##
020 ## $a0845348116 :$c$29.95 (£19.50 U.K.)
020 ## $a0845348205 (pbk.)
040 ## $a<organization code>$c<organization code>
050 14 $aPN1992.8.S4$bT47 1991
082 04 $a791.45/75/0973$219
100 1# $aTerrace, Vincent,$d1948-
245 10 $aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
246 1# $a50 years of television
260 ## $aNew York :$bCornwall Books,$cc1991.
300 ## $a864 p. ;$c24 cm.
Source: http://lib2.dss.go.th/elib/marc21/book.html
Figure 2 provides an example of DC metadata for comparison. Note that the XML encoding
clearly links elements such as “title” and “publisher” to the appropriate information, so
users do not necessarily need to be metadata experts to interpret the DC metadata.
Figure 2: Sample Dublin Core metadata for a fictitious website
<?xml version="1.0"?>
<metadata
  xmlns="http://example.org/myapp/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>
    UKOLN
  </dc:title>
  <dc:description>
    UKOLN is a national focus of expertise in digital information
    management. It provides policy, research and awareness services
    to the UK library, information and cultural heritage communities.
    UKOLN is based at the University of Bath.
  </dc:description>
  <dc:publisher>
    UKOLN, University of Bath
  </dc:publisher>
  <dc:identifier>
    http://www.ukoln.ac.uk/
  </dc:identifier>
</metadata>
Source: http://dublincore.org/documents/dc-xml-guidelines/
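Because each value in Figure 2 sits inside a named, namespaced tag, general-purpose XML tools can read the record without any knowledge of library cataloging. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree module; the record is abridged (the description and schema-location attributes are omitted) and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Abridged version of the Figure 2 record.
RECORD = """<?xml version="1.0"?>
<metadata xmlns="http://example.org/myapp/"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>UKOLN</dc:title>
  <dc:publisher>UKOLN, University of Bath</dc:publisher>
  <dc:identifier>http://www.ukoln.ac.uk/</dc:identifier>
</metadata>"""


def read_dc(xml_text: str) -> list[tuple[str, str]]:
    """Return (element, value) pairs for every direct dc:* child of the root."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"  # ElementTree writes namespaced tags as {uri}local
    return [
        (child.tag[len(prefix):], (child.text or "").strip())
        for child in root
        if child.tag.startswith(prefix)
    ]


if __name__ == "__main__":
    for element, value in read_dc(RECORD):
        print(f"{element}: {value}")
```

The script prints “title: UKOLN”, “publisher: UKOLN, University of Bath”, and “identifier: http://www.ukoln.ac.uk/”, one per line.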
According to Reese, the use of XML serves three important functions: “XML (1) makes data
more transparent, (2) makes the data less susceptible to data corruption, and (3) reduces the
likelihood of data lockup” (97). Because a variety of metadata schemas use XML, the ability to
share and exchange bibliographic information grows as more schemas adopt it. Chen and Reilly cite
METS, DC, and MIX as an example. Because all three schemas use XML, their records can be
combined, “making it possible to implement the blending of DC records and MIX records into
one METS record” (87). Metadata created in XML, whether structural, administrative,
or descriptive, can be easily merged into a cohesive whole or shared with other
XML systems. Dublin Core’s use of XML therefore provides many present and future benefits when
compared with non-XML formats.
Who is using Dublin Core? According to the publication Understanding Metadata, there “are
hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data
from the Internet” (3). A list of projects and organizations that currently use DC can be found at
http://dublincore.org/projects/index.shtml. Because of DC’s flexibility, it has become “one of
the most widely used metadata schemas within the library community” and is supported by
digital platforms such as DSpace and CONTENTdm (Reese, 128). Additionally, many library
protocols such as SRU and Z39.50 now “provide standard methods for utilizing Dublin Core as
the metadata capture language” (129). Although DC lacks the standardization that librarians
have traditionally valued, its flexibility has made it a valuable descriptive
tool for the wide variety of resources that must be accessed electronically. Its international
adoption arguably makes DC the most influential metadata schema for describing
resources made available via the World Wide Web.
Metadata Object Description Schema (MODS) and the Dublin Core (DC)
MODS was created in 2002 in order to “carry selected data from existing MARC 21 records as
well as to enable the creation of original resource description records”
(http://www.loc.gov/standards/mods/mods-overview.html). Whereas MARC fields use numeric
tags, MODS uses XML to create language-based tags similar to the Dublin Core. MODS allows
for a full description of electronic resources because its “elements are richer than the Dublin
Core; its elements are more compatible with library data. . .and it is simpler to apply than the
full MARC 21 bibliographic format” (Understanding Metadata, 6). MODS offers 20
top-level elements and also allows attributes to be applied to them. Figure 3
shows how the top-level element “name” can be enriched with subelements and
detailed attributes.
Figure 3: Sample MODS metadata for the “name” element
<name type="personal">
  <namePart type="given">Jack</namePart>
  <namePart type="family">May</namePart>
  <namePart type="termsOfAddress">I</namePart>
  <description lang="eng">District Commissioner</description>
  <description lang="fre">Préfet de région</description>
</name>
Source: http://www.loc.gov/standards/mods/userguide/generalapp.html
In addition to the granularity provided by MODS, the above example demonstrates how it
“provides records with highly structured data, whereas simple Dublin Core is flat” (Gunning, 9).
Qualified Dublin Core allows for the refinement of metadata; however, richness of description
was not the primary intention of the schema. Whereas DC originated as a deliberately simple
metadata scheme, MODS originated as a more detail-oriented alternative to schemas like DC. Witten
suggests that “MODS allows richer descriptions than Dublin Core and is a useful alternative
when qualified Dublin Core is inadequate for describing your resources” (297).
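The difference between structured and flat description can be shown in a few lines of Python. The sketch below, whose function names and name-ordering rule are hypothetical, reads the name parts from an abridged, un-namespaced copy of Figure 3 and then collapses them into a single string of the kind a simple Dublin Core creator element would hold. Once collapsed, software can no longer reliably tell which part is the family name.

```python
import xml.etree.ElementTree as ET

# The Figure 3 fragment, abridged (descriptions omitted) and without the MODS
# namespace that a complete MODS record would declare.
NAME_XML = """<name type="personal">
  <namePart type="given">Jack</namePart>
  <namePart type="family">May</namePart>
  <namePart type="termsOfAddress">I</namePart>
</name>"""


def name_parts(xml_text: str) -> dict[str, str]:
    """Map each namePart's type attribute to its text."""
    root = ET.fromstring(xml_text)
    return {part.get("type"): part.text for part in root.findall("namePart")}


def flatten(parts: dict[str, str]) -> str:
    """Collapse the parts into one string, as a flat dc:creator would hold.
    The 'family, given, terms of address' order is an assumed convention."""
    order = ("family", "given", "termsOfAddress")
    return ", ".join(parts[key] for key in order if key in parts)


if __name__ == "__main__":
    parts = name_parts(NAME_XML)
    print(parts["family"])   # structured: the family name is directly addressable
    print(flatten(parts))    # flat: one undifferentiated string
```

The script prints “May” and then “May, Jack, I”.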
MODS shares the advantages of schemas based on XML syntax. Because both
Dublin Core and MODS are XML-based, they are more interoperable than many other
metadata schemas. Interoperability is the “ability of various systems and organizations to be
linked and to share data” (Gunning, 7-8). Interoperable metadata is not only more likely to be
used outside its local context but also easier to migrate to new or future systems. The
“MODS to Dublin Core Metadata Element Set Mapping Version 3” provides element
equivalencies “for converting a MODS record to Dublin Core”
(http://www.loc.gov/standards/mods/mods-dcsimple.html). Because MODS contains more
elements than Dublin Core, several MODS elements sometimes correspond to a single
DC element, and in such cases some metadata may be lost in conversion. For example, the
three MODS elements abstract, note, and tableOfContents are all represented by the single
Dublin Core element “description,” so a converted record can no longer indicate which
description is the table of contents. Likewise, the “Dublin Core Metadata Element Set Mapping
to MODS Version 3” provides information to “be used for converting a Dublin Core record to MODS”
(http://www.loc.gov/standards/mods/dcsimple-mods.html). Whereas converting from MODS
to DC results in some data loss, converting from DC to MODS may require values to be refined
into more specific sub-elements. Because a single DC element can correspond to several MODS
elements, many DC values must be assigned a default MODS element before conversion can
succeed. For example, the DC element “format” has three equivalents in MODS:
<physicalDescription><internetMediaType>
<physicalDescription><extent>
<physicalDescription><form>.
The Library of Congress’s mapping guide tells users to default the DC value to
“<physicalDescription><form>” in order to avoid data loss. Although this can be labor-intensive,
it is still much easier than migrating from a binary system such as MARC to a text-based XML schema.
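Both conversion problems described above can be sketched in Python. The table and function names are hypothetical and cover only the elements discussed in this section, not the full Library of Congress mappings. In one direction, three distinct MODS elements collapse into “description”; in the other, a “format” value defaults to <physicalDescription><form>, with an illustrative refinement (an assumption, not part of the LC guidance) that sends values shaped like MIME types to <internetMediaType>.

```python
import re
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical, partial MODS-to-DC table: only the elements discussed above.
MODS_TO_DC = {
    "abstract": "description",
    "note": "description",
    "tableOfContents": "description",
}


def mods_to_dc(fields):
    """Convert (MODS element, value) pairs to (DC element, value) pairs.

    Returns the converted pairs and the names this partial table cannot map.
    """
    converted, unmapped = [], []
    for name, value in fields:
        if name in MODS_TO_DC:
            converted.append((MODS_TO_DC[name], value))
        else:
            unmapped.append(name)
    return converted, unmapped


# Illustrative test for values shaped like Internet Media Types ("text/html").
MIME_LIKE = re.compile(r"[a-z]+/[a-z0-9.+-]+", re.IGNORECASE)


def dc_format_to_mods(value):
    """Place a dc:format value inside <physicalDescription>.

    Defaults to <form>; a MIME-shaped value is routed to <internetMediaType>
    instead (an assumed refinement, not part of the LC guidance).
    """
    target = "internetMediaType" if MIME_LIKE.fullmatch(value.strip()) else "form"
    phys = ET.Element(f"{{{MODS_NS}}}physicalDescription")
    ET.SubElement(phys, f"{{{MODS_NS}}}{target}").text = value
    return phys


if __name__ == "__main__":
    dc, unmapped = mods_to_dc([
        ("abstract", "A guide to series and pilots."),
        ("tableOfContents", "Introduction; Series; Pilots"),
    ])
    print(dc)  # both values now share the element name "description"
    print(ET.tostring(dc_format_to_mods("print"), encoding="unicode"))
```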
Both Dublin Core and MODS have advantages and disadvantages. Dublin Core is a
flexible, general-purpose metadata schema with worldwide acceptance. This international support
results in dialogue, consensus, and “strong leadership in the shape of the Dublin Core Initiative”
(Reese, 129). The refinements provided by Qualified Dublin Core allow users to customize
metadata, providing granularity that is often necessary. Unfortunately, many of the DC elements
are ambiguous, resulting in confusion about how best to apply terms and “inconsistencies
within certain element fields” (Gunning, 11). Although the simplicity of Dublin Core is one of its
strengths, it is also a weakness. Because DC elements lack granularity, “a great deal of data,
both real and contextual, is lost when data needs to be moved between metadata formats”
(Reese, 129). MODS, on the other hand, has an expanded element set plus refinement options,
which makes it a very descriptive and detailed schema. This detail benefits more than
descriptive metadata: many elements, such as the “physical description” element, can also
function as a “check-sum.” For example, Gunning explains that if a record shows “a document
is 67KB, but the document is actually 38KB, or an item indicated in the table of contents is not
present in the actual document, the document’s integrity may have been compromised” (11).
The ability of MODS to support administrative metadata in this way is a strength not fully
shared by DC. MODS’s primary weakness is its “need to maintain compatibility with the library
community’s MARC legacy data” (Reese, 133). Because it was created as a sort of crosswalk
between MARC and XML schemas, MODS must retain its ability to reorganize MARC elements.
This requirement has the potential to constrain MODS, keeping it from gaining the international
reach that Dublin Core enjoys.
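Gunning's example of a recorded size that no longer matches the file can be expressed as a simple check. The Python sketch below is illustrative only: the “67KB” string format, the 1 KB = 1024 bytes convention, and the function names are assumptions. It also shows a SHA-256 comparison, the kind of true checksum commonly stored for fixity checking in digital preservation, as a stricter alternative to comparing sizes.

```python
import hashlib
import re


def parse_extent_kb(extent: str) -> int:
    """Read a size recorded as e.g. '67KB' or '67 KB' (assumed format)."""
    match = re.fullmatch(r"\s*(\d+)\s*KB\s*", extent, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized extent: {extent!r}")
    return int(match.group(1))


def size_matches(recorded_extent: str, content: bytes) -> bool:
    """Compare the recorded size with the actual size, assuming 1 KB = 1024
    bytes and rounding the actual size to the nearest whole KB."""
    return parse_extent_kb(recorded_extent) == round(len(content) / 1024)


def fixity_matches(recorded_sha256: str, content: bytes) -> bool:
    """A stricter check: compare a recorded SHA-256 digest with the file's."""
    return hashlib.sha256(content).hexdigest() == recorded_sha256.strip().lower()


if __name__ == "__main__":
    document = b"x" * (38 * 1024)  # stands in for a 38KB file
    print(size_matches("67KB", document))  # False: possible integrity problem
    print(size_matches("38KB", document))  # True
```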
Conclusion
Those interested in creating digital collections must also be concerned with
how the materials in their collections will be described. Simply stated, resources that cannot be
located cannot be used. Metadata schemas such as the Dublin Core and the Metadata Object
Description Schema provide formats for the creation and sharing of metadata. Reese reminds
us that “we find ourselves in a place where multiple standard formats have been developed to
allow description to happen in the schema best suited for a particular material type” (118).
Digital librarians must analyze the needs of their collections and the needs of their users. These
needs must then be compared to the strengths and weaknesses of the available metadata
formats before a final decision is made. Whether implementers choose MARC, Dublin Core,
MODS, TEI, EAD, or any of the other schemas, the most important factor to consider is whether
or not the descriptive elements will allow users to locate resources. If the answer to that
single question is “no,” then another metadata format should be considered. As
demonstrated by the discussions of Dublin Core and MODS, each schema offers benefits and
shortcomings. The benefits of a clear and descriptive metadata schema, however, will always
outweigh the schema’s limitations.
Appendix A:
Dublin Core Metadata Elements
Source: http://dublincore.org/documents/dces/
Term Name: contributor
Definition: An entity responsible for making contributions to the resource.
Comment: Examples of a Contributor include a person, an organization, or a service.
Term Name: coverage
Definition: The spatial or temporal topic of the resource, the spatial applicability of the
resource, or the jurisdiction under which the resource is relevant.
Comment: Spatial topic may be a location specified by its geographic coordinates. Temporal
topic may be a named period, date, or date range. A jurisdiction may be a named
administrative entity or place. Recommended best practice is to use a controlled vocabulary
such as the Thesaurus of Geographic Names [TGN].
Term Name: creator
Definition: An entity primarily responsible for making the resource.
Comment: Examples of a Creator include a person, an organization, or a service. Typically, the
name of a Creator should be used to indicate the entity.
Term Name: date
Definition: A point or period of time associated with an event in the lifecycle of the resource.
Comment: Date may be used to express temporal information at any level of granularity.
Term Name: description
Definition: An account of the resource.
Comment: Description may include but is not limited to: an abstract, a table of contents, a
graphical representation, or a free-text account of the resource.
Term Name: format
Definition: The file format, physical medium, or dimensions of the resource.
Comment: Examples of dimensions include size and duration. Recommended best practice is
to use a controlled vocabulary such as the list of Internet Media Types [MIME].
Term Name: identifier
Definition: An unambiguous reference to the resource within a given context.
Comment: Recommended best practice is to identify the resource by means of a string
conforming to a formal identification system.
Term Name: language
Definition: A language of the resource.
Comment: Recommended best practice is to use a controlled vocabulary such as RFC 4646.
Term Name: publisher
Definition: An entity responsible for making the resource available.
Comment: Examples of a Publisher include a person, an organization, or a service. Typically,
the name of a Publisher should be used to indicate the entity.
Term Name: relation
Definition: A related resource.
Comment: Recommended best practice is to identify the related resource by means of a
string conforming to a formal identification system.
Term Name: rights
Definition: Information about rights held in and over the resource.
Comment: Typically, rights information includes a statement about various property rights
associated with the resource, including intellectual property rights.
Term Name: source
Definition: A related resource from which the described resource is derived.
Comment: The described resource may be derived from the related resource in whole or in
part. Recommended best practice is to identify the related resource by means of a string
conforming to a formal identification system.
Term Name: subject
Definition: The topic of the resource.
Comment: Typically, the subject will be represented using keywords, key phrases, or
classification codes. Recommended best practice is to use a controlled vocabulary.
Term Name: title
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.
Term Name: type
Definition: The nature or genre of the resource.
Comment: Recommended best practice is to use a controlled vocabulary such as the DCMI
Type Vocabulary. To describe the file format, physical medium, or dimensions of the resource,
use the Format element.
Works Cited
Chen, Mingyu and Michele Reilly. “Implementing METS, MIX, and DC for Sustaining Digital
Preservation at the University of Houston Libraries.” Journal of Library Metadata 11.2
(2011): 83-99. Web. 10 Feb. 2013.
http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=60828147&site=ehost-live&scope=site
DCMI Home: Dublin Core Metadata Initiative. DCMI, Feb. 1, 2013. Web. 10 Feb. 2013.
http://dublincore.org/
Gunning, Tashina. “Metadata Creation at Institutional Repositories.” PNLA Quarterly 75.4
(2011): 5-18. Web. 13 Feb. 2013.
http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=67047604&site=ehost-live&scope=site
Metadata Object Description Schema: MODS. The Library of Congress, Feb. 14, 2013. Web. 15
Feb. 2013. http://www.loc.gov/standards/mods/
Reese, Terry and Kyle Banerjee. Building Digital Libraries: A How-To-Do-It Manual. New York:
Neal-Schuman Publishers, Inc., 2008. Print.
Understanding Metadata. Bethesda, MD: NISO Press, 2004. Web.
http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Witten, Ian H., David Bainbridge, and David M. Nichols. How to Build a Digital Library. 2nd ed.
Burlington, MA: Elsevier, 2010. Print.
Yasser, Chutter M. “An Analysis of Problems in Metadata Records.” Journal of Library Metadata
11.2 (2011): 51-62. Web. 11 Feb. 2013.
http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=60828145&site=ehost-live&scope=site