Metadata Management

The Harold B. Lee Library is considered a leader in the library cataloging and authority profession,
and it is important to continue this tradition by also being a leader in setting and following metadata
standards. Digital projects require an investment of time, effort, and money to create resources that
should be accessible. The goal of creating digital resources is to let users find, identify, select, and
obtain them (FISO), and metadata is the key to reaching each of these steps for access and use.
Since metadata is the vital component in providing description and access for digital resources,
metadata expertise should always be built into the workflow of digital projects. Metadata work needs to
be planned and accounted for during the discussion period as well as during metadata creation, field
mapping, and technical loading into the digital asset management system chosen for the project.
Maintenance after a project is completed is ongoing, and migration to new software and systems is an
issue that needs attention in every digital project. Departmental decisions can have far-reaching
effects on metadata and digital collections because of the large number of digital objects and the
variety of collection owners/curators.
Digital resources in the Harold B. Lee Library include:
2.57 million items in CONTENTdm
21,530 items in Internet Archive
6,000 URLs and 26 million documents in Archive-It
Bibliographies in ATOM – 224,357 in Utah County Obituary Index; 14,621 in the Mormon Bibliography;
39,192 in Relief Society Magazine
24 journals in Open Journal Systems, or OJS (migrated from CONTENTdm)
Digital collection owners/curators:
Special Collections, Family History, European Studies, Institute for the Preservation of Ancient Religious
Texts, Ancient Textual Imaging Group, Art Department, Humanities Department, Religion Department,
Maxwell Institute, Daughters of the Utah Pioneers, Springville, BYU Hawaii, BYU Idaho, LDS Business
College.
Examples of metadata problems that arose when projects lacked appropriate metadata management:
1. LIT reformatted internal copyright links from HTML to PHP. The metadata unit was not notified until
AFTER the change had been made. Understandably, LIT wants to keep internal links as secure as
possible, but they were unaware that this change affected over 2 million metadata records in
CONTENTdm. It required a review of all 180 digital collections to note the copyright link used
in each, since many are custom designed and all are written and approved by Carl Johnson. Because
CONTENTdm uses 16 unique copyright links, LIT then needed to create 16 replacement links.
After that, the links had to be replaced within each digital collection. This was an
unexpected project that the metadata unit had not planned for. Having a voice within LIT
would help both that department and the metadata unit stay aware of the ‘domino effect’ whenever
an action is taken that can affect metadata records.
2. The Relief Society Magazine Index began as a curator's bibliographic database and evolved
into a digital project when it was selected for scanning into Internet Archive. The index was
used as the source of metadata, but when it was matched with the digital scans it proved
not to be up to standards and needed corrections on over 39,000 items. Although it was
approved as a digital project, there was no advance metadata review of the issues involved, which
proved more far-reaching than expected and took three times longer than anticipated to resolve. It is
important for a digital project to stay within its defined scope and not allow ‘scope creep’ as it
proceeds, because scope creep always affects metadata production, maintenance, and
review.
3. In connection with the Florence Nightingale exhibition, Special Collections digitized specific
historical documents to supplement the physical exhibit with an online exhibit. With the exhibit
deadline fast approaching, the metadata unit was not given enough advance notice to
complete the metadata for loading into the digital collection. Creating metadata is more time-consuming
and detailed than scanning, and this must be accounted for when planning projects.
4. A music reference bibliography database was maintained by Special Collections music students,
who added metadata over several years. The database became corrupted through repeated manipulation
and excessive copying and pasting from Excel. Correcting the metadata became a crisis for the
curator, who needed it fixed before it could be added to a music website. Reviewing metadata created by
other sources is a vital step in digital project maintenance.
5. Most digitization projects come from Special Collections. The curators are the ‘owners’
and subject specialists of these collections and therefore have the responsibility to provide
metadata for these projects. Metadata now comes from the archival finding aid. The DI Lab maps it
into Dublin Core fields as they scan the images, and it is then loaded in batches into
CONTENTdm. In this workflow, projects are approved by the curator once the finding aid is
complete, and the metadata unit does not see the metadata until after lab students have
loaded it.
6. Journals originally digitized into CONTENTdm have been migrated into OJS. While these journals
were in CONTENTdm, rich subject analysis metadata was created at the article level to aid
searching and access (FISO). This metadata was lost in the migration to OJS, and with it many
thousands of hours of work.
7. Items scanned into Internet Archive require metadata before they can be loaded into the system. All
of this metadata comes from our catalog records. Problems in this procedure arise when:
a. Catalog records are very minimal or are only acquisition records, so there is not
enough information in the record to load into Internet Archive. The missing
information is then created by lab students who are not trained in cataloging.
b. DI Lab students choose the wrong record to match with the item in hand, so the item
is loaded with the wrong metadata.
c. The library does not own the materials, so there is no catalog record to match with the
item. The necessary metadata is created by the lab from the item in hand or taken
from outside records for loading into Internet Archive, and it is not first reviewed by the
metadata unit. Examples include the Clarence Dixon Taylor materials, which contain items
belonging to donors who brought them in to be digitized, and the Brussels opera music, which
contains items scanned in the Brussels Archive and added to the Internet Archive collection.
This has added approximately 1,700 titles to a cataloging backlog.
d. Items that belong to the library have not yet been cataloged, so the lab creates the
metadata and an additional cataloging backlog results after
scanning.
Suggestions to improve metadata management:
1. Thorough metadata consultation on every digital project, starting with the proposal,
continuing through the finished project, and extending to monitoring of any later
changes. This would include involving and consulting the metadata unit in all departments
whose actions may affect metadata.
2. More oversight of projects done by units outside the library as well as by departments
inside the library.
3. Make the metadata unit responsible for loading batches of digital objects into
CONTENTdm. This would allow the metadata to be reviewed as it is loaded.
4. Have the metadata unit enter metadata into the Dublin Core fields before uploading
into CONTENTdm. Because the metadata unit's students are trained for this work, they have the
expertise needed to supply the correct information and format for each digital object.
5. Eliminate ‘scope creep’ on digital projects, since it produces changes in metadata
creation.