Enhancing the Quality of Metadata: Modular Approach to Digital Resource Lifecycle Management

Daniel Gelaw Alemneh & Mark E. Phillips
IS&T, Archiving-2007 Conference
May 23, 2007, Arlington, Virginia
University of North Texas
University of North Texas (UNT) Libraries Digital Initiatives
Collaborative Initiatives
• CyberCemetery
  • GPO
  • NARA – Affiliated Archive
• Texas Register Archive
  • Secretary of State’s Office
• Texas Laws and Resolutions Archive
  • Secretary of State’s Office
• The Portal to Texas History
  • 45 Libraries & Museums
• Web-at-Risk Project
  • California Digital Library
  • New York University
University of North Texas (UNT) Libraries Digital Initiatives
Library Digital Collections
• Congressional Research Service Archive
  • 9500+ CRS Reports
• Portal to Texas History
  • 20,000 records
• World War Poster Collection
  • 500 WWI and WWII Posters
• Advisory Commission on Intergovernmental Relations
  • 408 reports = 47,874 pages
• Federal Communications Commission (FCC) Record
  • 136 issues = 43,115 pages (6 of 21 volumes completed)
• GovDocs A to Z digitization project
  • 186 scanned, 500+ in queue
• Jean-Baptiste Lully Collection
  • 27 scores = 10,000 pages
Metadata Environment
• Metadata-based digital resource management activities
• UNT Libraries descriptive metadata is based on locally qualified Dublin Core
• Detailed technical and preservation metadata elements
• Web based metadata creation and editing
• Interoperability
• Metadata Crosswalks
  • MODS
  • MARC
  • oai_dc
  • PREMIS
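A crosswalk from locally qualified Dublin Core to oai_dc can be sketched as dropping the qualifiers and keeping only the 15 unqualified elements. The record layout (element, qualifier, value triples), the qualifier names, the sample values, and the function name `to_oai_dc` are assumptions for illustration; the slides do not show the internal UNT format.

```python
from collections import defaultdict

# The 15 unqualified Dublin Core elements carried by oai_dc.
OAI_DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(record):
    """Drop qualifiers and keep only elements that exist in oai_dc.

    `record` is a list of (element, qualifier, value) triples; this
    layout is hypothetical.
    """
    simple = defaultdict(list)
    for element, _qualifier, value in record:
        if element in OAI_DC_ELEMENTS and value:
            simple[element].append(value)
    return dict(simple)


# Made-up sample record.
record = [
    ("title", "officialtitle", "Example title"),
    ("date", "creation", "2007-05-23"),
    ("subject", "keyword", "Example subject"),
    ("meta", "metadataCreator", "jdoe"),  # local-only element, dropped
]
print(to_oai_dc(record))
```

Dropping qualifiers loses detail (for example, which kind of date a value is), which is one reason crosswalked records need their own quality checks.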
Metadata Quality
• The two aspects of digital library data quality:
• The quality of the data in the objects themselves
• The quality of the metadata associated with the objects
• Poor metadata quality:
• Ambiguities
• Poor recall
• Poor precision
• Inconsistency of search results
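The effect on recall and precision can be illustrated with a small sketch. The records and the exact-match search below are made up for the example; they are not data from the UNT collections.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up records: id -> subject value as entered by a cataloger.
subjects = {
    1: "World War, 1914-1918",
    2: "World War, 1914-1918",
    3: "Wrold War, 1914-1918",  # letter transposition
    4: "Agriculture",
}
relevant = {1, 2, 3}  # all three records are about the same topic

# An exact-match search misses the misspelled record.
retrieved = {rid for rid, s in subjects.items() if s == "World War, 1914-1918"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here every retrieved record is relevant, but one relevant record is never found, so recall drops while precision stays high.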
Metadata Quality …
• Most common errors:
  • Incorrect Data:
    • Letter transposition
    • Letter omission
    • Letter insertion
    • Letter substitution or misstrokes
  • Missing Data
    • Elements and values not present at all (null)
    • Insufficient or incomplete data
  • Ambiguous Data
    • Confusing or inconsistent data, e.g. multiple spellings, multiple possible meanings, mixed case, initials, etc.
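The four kinds of incorrect data listed above are all single-character edits, so a value that is one edit away from a known term can be labeled automatically. The following is a minimal sketch; the function name and the sample strings are hypothetical.

```python
def classify_single_edit(typed, intended):
    """Name the single-character error that turns `intended` into `typed`.

    Returns "transposition", "omission", "insertion", "substitution",
    or None when the strings are equal or differ by more than one edit.
    """
    if typed == intended:
        return None
    lt, li = len(typed), len(intended)
    # Find the first position where the two strings differ.
    i = 0
    while i < min(lt, li) and typed[i] == intended[i]:
        i += 1
    if lt == li:
        # Two adjacent letters swapped, rest identical.
        if (i + 1 < lt
                and typed[i] == intended[i + 1]
                and typed[i + 1] == intended[i]
                and typed[i + 2:] == intended[i + 2:]):
            return "transposition"
        # One letter replaced, rest identical.
        if typed[i + 1:] == intended[i + 1:]:
            return "substitution"
    elif lt == li - 1 and typed[i:] == intended[i + 1:]:
        return "omission"
    elif lt == li + 1 and typed[i + 1:] == intended[i:]:
        return "insertion"
    return None


for typed in ["Txeas", "Texs", "Texxas", "Tezas", "Tex"]:
    print(typed, "->", classify_single_edit(typed, "Texas"))
```

Missing and ambiguous data need different checks (empty-value counts and value normalization), since they cannot be detected by comparing against a single intended string.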
Factors Influencing Metadata Quality
• Local requirements:
  • Object heterogeneity
    • What type of objects will the repository contain?
  • Granularity
    • How will they be described?
  • Functionality
    • What functionality is required?
    • How will it be interfaced?
Factors Influencing Metadata Quality …
• Collaborative requirements:
  • Diversity of users
    • How can diverse information-seeking behaviors best be met?
  • Interoperability
    • Will metadata be meaningful within aggregations of various kinds?
    • What is required for interoperability? (Structure, semantics, & syntax)
  • Digital rights issues
    • Will access restrictions be imposed?
    • Are requirements formal or informal?
    • Are there other access and associated digital rights issues?
Factors Influencing Metadata Quality…
• Training Issues
• Necessary expertise to create and manage rigorous metadata
• Metadata quality can be determined to a great extent by:
• knowledge of the source, and
• knowledge of the methodology used to create the statement
• Cost
• Rigorous metadata is resource intensive and too costly
UNT Metadata Quality Assurance Mechanisms & Tools
• The two main stages of metadata quality assurance:
  • Pre-ingest
    1. Metadata creation tools (templates)
  • Post-ingest
    2. Metadata analysis tools (web-based tools)
Quality Assurance Mechanisms and Tools: Templates
1. Metadata Creation Tools (Templates)
• Validates mandatory elements
• Metadata Template Creator
• Template Reader
• Controlled vocabularies (UNTLBS)
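The pre-ingest checks named above (mandatory elements and controlled vocabularies) can be sketched as a small validator. The set of mandatory elements, the vocabulary contents, and the record layout are assumptions for illustration, not the actual UNT template rules.

```python
# Assumed mandatory elements; the real template may require others.
MANDATORY = ("title", "creator", "date", "subject")

# Hypothetical controlled vocabulary for one element.
VOCAB = {"type": {"text", "image", "map"}}


def validate(record):
    """Return a list of problems found in `record`.

    `record` maps an element name to a list of string values
    (a hypothetical layout).
    """
    problems = []
    for element in MANDATORY:
        values = [v for v in record.get(element, []) if v and v.strip()]
        if not values:
            problems.append(f"missing mandatory element: {element}")
    for element, allowed in VOCAB.items():
        for value in record.get(element, []):
            if value not in allowed:
                problems.append(
                    f"{element}: '{value}' not in controlled vocabulary"
                )
    return problems


record = {
    "title": ["Example poster"],
    "creator": [" "],          # whitespace only counts as missing
    "date": ["1943"],
    "type": ["Image"],         # wrong case, so not in the vocabulary
}
for problem in validate(record):
    print(problem)
```

Rejecting a record at creation time is cheaper than finding the same problem after ingest, which is the motivation for validating in the template.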
UNT Metadata Quality Assurance Mechanisms & Tools…
2. Metadata Analysis Tools
• NULL values
• List/browse all values (by element and qualifier)
• List authority values
• Graphical reports and other fun stuff
  • Clickable maps by institution and collection
  • Word clouds by element
  • Records added over time and other graphical reports
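Two of the post-ingest reports above, null values and a browse list of all values for an element, reduce to simple counting. The sketch below assumes records are dictionaries mapping an element name to a list of string values; that layout, the function names, and the sample data are hypothetical.

```python
from collections import Counter


def null_report(records, elements):
    """Count, per element, the records where it is absent or empty."""
    report = {}
    for element in elements:
        report[element] = sum(
            1 for record in records
            if not any((v or "").strip() for v in record.get(element, []))
        )
    return report


def value_counts(records, element):
    """Frequency of each value of `element`: the data behind a
    browse list or a word cloud."""
    return Counter(
        v.strip()
        for record in records
        for v in record.get(element, [])
        if v and v.strip()
    )


# Made-up sample records.
records = [
    {"title": ["Poster A"], "subject": ["Posters", "World War"]},
    {"title": ["Poster B"], "subject": ["posters"]},
    {"title": [""], "subject": []},
]
print(null_report(records, ["title", "subject", "creator"]))
print(value_counts(records, "subject").most_common())
```

Listing every distinct value side by side makes mixed-case variants such as "Posters" and "posters" easy to spot by eye.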
Summary
• Determine level of quality required
• Partners may have much in common, but they have diverse and
sometimes conflicting metadata requirements.
• Determine nature of gap and how to close it
• effectiveness, efficiency, practicability, scalability
• Machine versus human error handling
• How much of the process can be automated?
• Human review of results is still essential (e.g. highlighted items)
• Compromise
• One size does not fit all!
• Prioritize
• Resources very unlikely to be available to meet all requirements
• Test the workflow
• Test, retest, and evaluate the quality cycle continuously
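The split between automation and human review mentioned in the summary can be sketched as follows: a script highlights rare values that closely resemble frequent ones, and a person decides whether each is a real error. This uses Python's standard `difflib`; the cutoff of 0.85, the function name, and the sample values are assumptions chosen for the example.

```python
from collections import Counter
from difflib import get_close_matches


def flag_for_review(values, cutoff=0.85):
    """Pair each value used only once with a similar, more frequent value.

    The result is a list of (suspect, likely_intended) pairs meant to be
    highlighted for a human reviewer, not corrected automatically.
    """
    counts = Counter(values)
    frequent = [value for value, count in counts.items() if count > 1]
    flagged = []
    for value, count in counts.items():
        if count == 1:
            match = get_close_matches(value, frequent, n=1, cutoff=cutoff)
            if match:
                flagged.append((value, match[0]))
    return flagged


# Made-up subject values.
values = ["World War", "World War", "Wrold War", "Agriculture"]
print(flag_for_review(values))
```

A rare value is not necessarily wrong ("Agriculture" is legitimate here), which is why the output is a review queue rather than a set of automatic fixes.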
Questions?