LIS 7410 Metadata Standards and Interoperability Study
Written by: Jamie Rivers
Dr. Wu; Spring 2013

Rivers, Jamie LIS 7410 Homework 2

Introduction

One of the dilemmas inherent in the rise of digital libraries is the need to describe a variety of resources in a variety of formats and make them easily accessible. Traditionally, libraries have relied on MARC records and the AACR2 descriptive framework for their metadata needs. Unfortunately, these tools are used almost exclusively within the library community, which makes them a poor fit for digital libraries that create electronic bibliographic metadata. Reese explains that the “days of a homogenous bibliographic standard for all content are coming to an end as more specialized descriptive formats are needed to describe the various types of materials being produced today and into the future” (86). Two metadata schemas that have experienced a great degree of success, and that allow for the discovery of nontraditional materials via a variety of access points, are the Dublin Core (DC) and the Metadata Object Description Schema (MODS). This paper will explore the functionality of DC, including the role it plays in resource description. Attention will be given to how DC compares with MODS in interoperability and functionality.

The Dublin Core

According to Chen and Reilly, Dublin Core (DC) is “the most widely used descriptive metadata schema among digital libraries” (86). For those who are creating or studying digital libraries, understanding Dublin Core is fundamental because of its prominence in the field of digital metadata. The DC arose out of concerns about the difficulty of locating educational materials on the Internet. As a result of these concerns, the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA) met in Dublin, OH to focus on three goals:

1. Decide what descriptive elements would be needed to promote findability of all documents on the Web.
2. Explore how to create a solution that would be flexible for past, present, and future online publication on the Web.
3. Explore how to promote the usage of such a solution if it exists (Reese, 124).

This meeting resulted in a set of 13 (later extended to 15) “metadata elements that are designed specifically for non-specialist use” and are intended “for the description of electronic materials” such as Web pages or sites (Witten, 294). Appendix A provides a list of the Dublin Core metadata elements and their definitions for reference.

Metadata for electronic sources must be well organized and complete in order to provide access to available resources. Yasser explains that when metadata does not fully describe a resource, the resource is all but invisible to potential users. Although a digital library may be skillfully compiled, “poorly created metadata would corrupt services offered by the library” (52). The purpose of digital libraries is to provide fast and reliable access to unique and useful sources of information. If “findability” is hindered by poor metadata, it does not matter how valuable the resources contained in the library are. The resources will not be found; therefore, they will not be used.

Dublin Core’s 15 metadata elements comprise the Unqualified Dublin Core and allow resources to be described in a simple and concise manner. The Unqualified Dublin Core can, however, be refined by qualifiers (Qualified Dublin Core), allowing DC to be customized based on local needs. Using qualifiers allows for a great deal of specificity within the DC. For example, the “date” element can be qualified to “date.available”, “date.issued” and “date.copyright”. This gives creators of metadata the ability to design a wide variety of access points for resources while maintaining compatibility within the DC format “because metadata elements can always be reduced down to the core 15 unqualified elements” (Reese, 128).
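The reduction of qualified elements to the 15 unqualified ones can be illustrated with a short Python sketch. This is only a minimal illustration, not a DCMI tool: the function name dumb_down, the list-of-pairs record layout, and the sample values are all hypothetical, and the dotted notation simply follows the “date.issued” examples above.

```python
# The 15 elements of the Dublin Core Metadata Element Set (see Appendix A).
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dumb_down(qualified_record):
    """Reduce a qualified record to the unqualified Dublin Core elements.

    qualified_record is a list of (name, value) pairs. A name is either a
    bare element ("title") or element.qualifier ("date.issued"). This
    record layout is an assumption made for the sketch.
    """
    simple = {}
    for name, value in qualified_record:
        element = name.split(".", 1)[0]  # drop the qualifier, keep the element
        if element not in DC_ELEMENTS:
            continue  # not a Dublin Core element; ignored in this sketch
        # DC elements are repeatable, so values are collected in a list.
        simple.setdefault(element, []).append(value)
    return simple


# Hypothetical sample record; the values are made up for illustration.
record = [
    ("title", "Fifty Years of Television"),
    ("date.issued", "1991"),
    ("date.copyright", "1991"),
    ("date.available", "2013-02-10"),
]
simple_record = dumb_down(record)
print(simple_record)
```

All three qualified dates end up as repeated values of the single element “date”. The record remains valid Unqualified Dublin Core, but the distinction among the kinds of date is no longer recorded.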
The Dublin Core Metadata Initiative (DCMI) lists suggested refinements at http://dublincore.org/documents/usageguide/qualifiers.shtml for the convenience of users. The greater the degree of specificity and description metadata creators can provide, the greater the chance that users will find important resources. DC not only allows for the refinement of elements, it also allows elements to be repeated as many times as necessary. Additionally, all DC elements are optional and may be presented in any order (Understanding Metadata, 3). This flexibility allows digital libraries to customize metadata to their own needs.

Another significant aspect of DC’s format is its use of Extensible Markup Language (XML) syntax. Metadata created in binary formats, such as MARC 21, is difficult for laypersons to interpret. Figure 1 is an example of a MARC 21 record provided by the Library of Congress. Only someone trained in the use of MARC records could effectively use this data.

Figure 1: Sample MARC 21 metadata for the book “Fifty Years of Television”

LDR      *****nam##22*****#a#4500
001      <control number>
003      <control number identifier>
005      19920331092212.7
007      ta
008      820305 s1991#### nyu#### ###### #001#0# eng##
020  ##  $a0845348116 :$c$29.95 (£19.50 U.K.)
020  ##  $a0845348205 (pbk.)
040  ##  $a<organization code>$c<organization code>
050  14  $aPN1992.8.S4$bT47 1991
082  04  $a791.45/75/0973$219
100  1#  $aTerrace, Vincent,$d1948
245  10  $aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
246  1#  $a50 years of television
260  ##  $aNew York :$bCornwall Books,$cc1991.
300  ##  $a864 p. ;$c24 cm.

Source: http://lib2.dss.go.th/elib/marc21/book.html

Figure 2 offers an example of DC metadata for comparison. It is important to note that the XML-based schema clearly links elements such as “title” and “publisher” to the appropriate information.
Users do not necessarily need to be metadata experts to interpret the DC metadata.

Figure 2: Sample Dublin Core metadata for a fictitious website

<?xml version="1.0"?>
<metadata
  xmlns="http://example.org/myapp/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/myapp/ http://example.org/myapp/schema.xsd"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>
    UKOLN
  </dc:title>
  <dc:description>
    UKOLN is a national focus of expertise in digital information
    management. It provides policy, research and awareness services
    to the UK library, information and cultural heritage communities.
    UKOLN is based at the University of Bath.
  </dc:description>
  <dc:publisher>
    UKOLN, University of Bath
  </dc:publisher>
  <dc:identifier>
    http://www.ukoln.ac.uk/
  </dc:identifier>
</metadata>

Source: http://dublincore.org/documents/dc-xml-guidelines/

According to Reese, the use of XML serves three important functions: “XML (1) makes data more transparent, (2) makes the data less susceptible to data corruption, and (3) reduces the likelihood of data lockup” (97). Because a variety of metadata schemas use XML, the ability to share and exchange bibliographic information increases with their use. Chen and Reilly use METS, DC, and MIX as an example. Since each of these schemas uses XML, they can be exchanged, “making it possible to implement the blending of DC records and MIX records into one METS record” (87). If metadata is created in XML, whether it is structural, administrative, or descriptive, it can be easily merged into a cohesive whole or shared with other XML systems. Dublin Core’s use of XML provides many present and future benefits when compared to non-XML formats.

Who is using Dublin Core?

According to the publication Understanding Metadata, there “are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data from the Internet” (3).
A list of projects and organizations that currently use DC can be found at http://dublincore.org/projects/index.shtml. Because of DC’s flexibility, it has become “one of the most widely used metadata schemas within the library community” and is widely used by digital platforms such as DSpace and CONTENTdm (Reese, 128). Additionally, many library protocols such as SRU and Z39.50 now “provide standard methods for utilizing Dublin Core as the metadata capture language” (Reese, 129). Although DC lacks the standardization that librarians have traditionally valued, its flexibility has proven very valuable for describing the wide variety of resources that must be accessed electronically. It is used internationally, making DC arguably the most influential metadata schema for describing resources made available via the World Wide Web.

Metadata Object Description Schema (MODS) and the Dublin Core (DC)

MODS was created in 2002 in order to “carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records” (http://www.loc.gov/standards/mods/mods-overview.html). Whereas MARC fields use numeric tags, MODS uses XML to create language-based tags similar to those of the Dublin Core. MODS allows for a full description of electronic resources because its “elements are richer than the Dublin Core; its elements are more compatible with library data. . .and it is simpler to apply than the full MARC 21 bibliographic format” (Understanding Metadata, 6). Not only does MODS offer 20 top-level elements, it also allows for the application of attributes to these elements. Figure 3 provides an example of how the top-level element “name” can be enhanced by the addition of detailed attributes.

Figure 3: Sample MODS metadata for the “name” element

<name type="personal">
  <namePart type="given">Jack</namePart>
  <namePart type="family">May</namePart>
  <namePart type="termsOfAddress">I</namePart>
  <description lang="eng">District Commissioner</description>
  <description lang="fre">Préfet de région</description>
</name>

Source: http://www.loc.gov/standards/mods/userguide/generalapp.html

In addition to the granularity provided by MODS, the above example demonstrates how it “provides records with highly structured data, whereas simple Dublin Core is flat” (Gunning, 9). Qualified Dublin Core allows for the refinement of metadata; however, richness of description was not the primary intention of the schema. Whereas DC originated as a simple metadata scheme, MODS originated as a more detail-oriented alternative to schemas like the DC. Witten suggests that “MODS allows richer descriptions than Dublin Core and is a useful alternative when qualified Dublin Core is inadequate for describing your resources” (297).

MODS enjoys the advantages of schemas that are based on XML syntax. Because both Dublin Core and MODS are XML-based systems, they are more interoperable than many other metadata schemas. Interoperability is the “ability of various systems and organizations to be linked and to share data” (Gunning, 7-8). When metadata is interoperable, it is not only more likely to be used outside of a local context, it is also easier to migrate to new or future systems.

The “MODS to Dublin Core Metadata Element Set Mapping Version 3” provides element equivalencies “for converting a MODS record to Dublin Core” (http://www.loc.gov/standards/mods/mods-dcsimple.html). Because MODS contains more elements than Dublin Core, there are instances where more than one MODS element is the equivalent of one DC element. It is in such circumstances that some metadata may be lost in conversion.
For example, the three MODS elements “abstract”, “note”, and “tableOfContents” are all represented by the single element “description” in Dublin Core. When a record moves to DC, the distinction among the three is lost, and the table of contents may no longer be identifiable as such.

Likewise, the “Dublin Core Metadata Element Set Mapping to MODS Version 3” provides information to “be used for converting a Dublin Core record to MODS” (http://www.loc.gov/standards/mods/dcsimple-mods.html). Whereas converting from MODS to DC results in some data loss, values converted from DC to MODS may need to be refined to more specific sub-elements. Because a single DC element can correspond to several MODS elements, many DC elements must be assigned a default MODS target before conversion can successfully occur. For example, the DC element “format” has three equivalents in MODS: <physicalDescription><internetMediaType>, <physicalDescription><extent>, and <physicalDescription><form>. The Library of Congress’s mapping guide tells users to default the DC value to “<physicalDescription><form>” in order to avoid data loss. Although this can be labor intensive, it is still much easier than migrating from a binary system such as MARC to a text-based XML schema.

Both Dublin Core and MODS have their advantages and disadvantages. Dublin Core is a flexible general metadata schema with worldwide acceptance. This international support results in dialogue, consensus, and “strong leadership in the shape of the Dublin Core Initiative” (Reese, 129). The refinements provided by the Qualified Dublin Core allow users to customize metadata, resulting in often-necessary granularity. Unfortunately, many of the DC elements are ambiguous, resulting in confusion about how best to apply terms and in “inconsistencies within certain element fields” (Gunning, 11). Although the simplicity of Dublin Core is one of its strengths, it is also a weakness. Because DC elements lack granularity, “a great deal of data, both real and contextual, is lost when data needs to be moved between metadata formats” (Reese, 129).
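The loss of granularity in a MODS-to-DC conversion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Library of Congress crosswalk itself: the MODS_TO_DC table covers only the three elements discussed above, the function name mods_to_dc is hypothetical, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Partial crosswalk: only the three MODS elements named in the text,
# all of which map to the single DC element "description".
MODS_TO_DC = {
    "abstract": "description",
    "note": "description",
    "tableOfContents": "description",
}

# Hypothetical MODS fragment, made up for illustration.
SAMPLE_MODS = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <abstract>A guide to series and pilots.</abstract>
  <note>Includes index.</note>
  <tableOfContents>Series -- Pilots</tableOfContents>
</mods>
"""


def mods_to_dc(mods_xml):
    """Return a list of (dc_element, value) pairs for the mapped elements."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for child in root:
        # ElementTree reports namespaced tags as "{namespace}localname".
        local = child.tag.replace("{" + MODS_NS + "}", "")
        if local in MODS_TO_DC:
            pairs.append((MODS_TO_DC[local], (child.text or "").strip()))
    return pairs


dc_pairs = mods_to_dc(SAMPLE_MODS)
for element, value in dc_pairs:
    print(element, "=", value)
```

Every value in the output carries the same label, “description”. Nothing in the resulting DC record indicates which value was originally the abstract, the note, or the table of contents, which is the kind of contextual loss Reese describes.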
MODS, on the other hand, has an expanded element set plus refinement options. This results in a very descriptive and detailed schema. This is an advantage not only for descriptive metadata; some elements, such as the “physical description” element, can also function as a “check-sum”. For example, Gunning explains that if a record shows “a document is 67KB, but the document is actually 38KB, or an item indicated in the table of contents is not present in the actual document, the document’s integrity may have been compromised” (11). The ability of MODS to be used for administrative metadata is a strength not fully supported by DC. MODS’s primary weakness is its “need to maintain compatibility with the library community’s MARC legacy data” (Reese, 133). Because it was created as a sort of crosswalk between MARC and XML schemas, MODS must retain its ability to reorganize MARC elements. This purpose has the potential to constrain MODS, keeping it from becoming the international power that Dublin Core has become.

Conclusion

In conclusion, those interested in creating digital collections must also be concerned with how the materials in their collections will be described. Simply stated, resources that cannot be located cannot be used. Metadata schemas such as the Dublin Core and the Metadata Object Description Schema provide formats for the creation and sharing of metadata. Reese reminds us that “we find ourselves in a place where multiple standard formats have been developed to allow description to happen in the schema best suited for a particular material type” (118). Digital librarians must analyze the needs of their collections and the needs of their users. These needs must then be compared to the strengths and weaknesses of the available metadata formats before a final decision is made.
Whether implementers choose MARC, Dublin Core, MODS, TEI, EAD, or any of the other schemas, the most important factor to consider is whether or not the descriptive elements will allow users to locate resources. If the answer to that single question is “no”, then another metadata format should be considered. As demonstrated by the discussions of Dublin Core and MODS, each schema offers benefits and shortcomings. The benefits of a clear and descriptive metadata schema, however, will always outweigh the schema’s limitations.

Appendix A: Dublin Core Metadata Elements

Source: http://dublincore.org/documents/dces/

Term Name: contributor
Definition: An entity responsible for making contributions to the resource.
Comment: Examples of a Contributor include a person, an organization, or a service.

Term Name: coverage
Definition: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
Comment: Spatial topic may be a location specified by its geographic coordinates. Temporal topic may be a named period, date, or date range. A jurisdiction may be a named administrative entity or place. Recommended best practice is to use a controlled vocabulary such as the Thesaurus of Geographic Names [TGN].

Term Name: creator
Definition: An entity primarily responsible for making the resource.
Comment: Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

Term Name: date
Definition: A point or period of time associated with an event in the lifecycle of the resource.
Comment: Date may be used to express temporal information at any level of granularity.

Term Name: description
Definition: An account of the resource.
Comment: Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.

Term Name: format
Definition: The file format, physical medium, or dimensions of the resource.
Comment: Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME].

Term Name: identifier
Definition: An unambiguous reference to the resource within a given context.
Comment: Recommended best practice is to identify the resource by means of a string conforming to a formal identification system.

Term Name: language
Definition: A language of the resource.
Comment: Recommended best practice is to use a controlled vocabulary such as RFC 4646.

Term Name: publisher
Definition: An entity responsible for making the resource available.
Comment: Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

Term Name: relation
Definition: A related resource.
Comment: Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.

Term Name: rights
Definition: Information about rights held in and over the resource.
Comment: Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights.

Term Name: source
Definition: A related resource from which the described resource is derived.
Comment: The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system.

Term Name: subject
Definition: The topic of the resource.
Comment: Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.

Term Name: title
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.

Term Name: type
Definition: The nature or genre of the resource.
Comment: Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary. To describe the file format, physical medium, or dimensions of the resource, use the Format element.

Works Cited

Chen, Mingyu and Michele Reilly. “Implementing METS, MIX, and DC for Sustaining Digital Preservation at the University of Houston Libraries.” Journal of Library Metadata 11.2 (2011): 83-99. Web. 10 Feb. 2013. http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=60828147&site=ehostlive&scope=site

DCMI Home: Dublin Core Metadata Initiative. DCMI, Feb. 1, 2013. Web. 10 Feb. 2013. http://dublincore.org/

Gunning, Tashina. “Metadata Creation at Institutional Repositories.” PNLA Quarterly 75.4 (2011): 5-18. Web. 13 Feb. 2013. http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=67047604&site=ehostlive&scope=site

Metadata Object Description Schema: MODS. The Library of Congress, Feb. 14, 2013. Web. 15 Feb. 2013. http://www.loc.gov/standards/mods/

Reese, Terry and Kyle Banerjee. Building Digital Libraries: A How-To-Do-It Manual. New York: Neal-Schuman Publishers, Inc., 2008. Print.

Understanding Metadata. Bethesda, MD: NISO Press, 2004. Web. http://www.niso.org/publications/press/UnderstandingMetadata.pdf

Witten, Ian H., David Bainbridge, and David M. Nichols. How to Build a Digital Library. 2nd ed. Burlington, MA: Elsevier, 2010. Print.

Yasser, Chutter M. “An Analysis of Problems in Metadata Records.” Journal of Library Metadata 11.2 (2011): 51-62. Web. 11 Feb. 2013.
http://www.lib.lsu.edu.libezp.lib.lsu.edu/apps/onoffcampus.php?url=http://search.ebscohost.com.libezp.lib.lsu.edu/login.aspx?direct=true&db=lih&AN=60828145&site=ehostlive&scope=site