Digital Resources Integration Research

This document contains research collected about possible methods for integration of digital resources into LINKcat. This is meant
to guide future catalogers when it comes time to catalog these resources, and provide options for generating MARC records.
What MARC records are available?
-
Project Gutenberg: offers MARC records for its more than 48,000 free ebooks. Scroll down on this
page to access them: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
(Note: the automatically generated MARC records served by Project Gutenberg are unavailable as of
June 2014 but will hopefully be available again soon. You can still download MARC records
served offsite by the University of Adelaide here: http://ebooks.adelaide.edu.au/meta/pg/ )
o Problems:
 We’ve downloaded a sample of the MARC records from the University of
Adelaide source and found that the level of detail in the records is not
adequate and that the records will not be easy to batch edit using MarcEdit.
Subject headings are all in the 653 field instead of the 650 field (which is not a
difficult change). They are also all placed in subfield a and separated by em
dashes ( – ) instead of being split into proper subfields (which is very
time-consuming to change).
 We’ve also downloaded the catalog RDF file provided by Project Gutenberg and
processed the records through MarcEdit to view the .mrk files, and the quality
of the metadata is also not acceptable. Subject headings suffer from the same
problem and are placed into the 690 field (local subject access).
This process can be recreated by downloading the RDF zip file and unzipping it:
http://www.gutenberg.org/cache/epub/feeds/catalog.rdf.zip. In MarcEdit, click
on MARC Tools and select the RDF file as the input file. Set the output file to
save in your desired location and make sure that the file extension is .mrc.
Download the Project Gutenberg RDF => MARC XSLT file from the MarcEdit
website in order to process the RDF file correctly:
http://marcedit.reeset.net/downloads. Once it’s downloaded, select the
Project Gutenberg RDF => MARC file from the XML conversions list and press
execute. MarcEdit will produce a .mrc file. To view the file and batch edit the
records, repeat the process using MarcBreaker: import the .mrc file and export
it as an .mrk file. Opening the .mrk file in MarcEdit will allow you to view and
edit the records. When you’re finished, don’t forget to save it as a .mrc file again
before importing the records into Koha.
 We believe that each individual record of these 48,000 will need to be manually
edited, at least for subject cataloging.
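Part of the subject heading cleanup described above could be scripted against the .mrk file before it is saved back to .mrc. The following is a minimal sketch in Python, not a tested workflow. It assumes that each 653 line holds a single subfield a, that the parts of the heading are separated by a double hyphen, an en dash or an em dash, and that every part after the first can be written as subfield x (general subdivision). The function names, the choice of subfield x and the second indicator 4 (source not specified) are assumptions made for illustration; geographic, chronological and form subdivisions would still need a cataloger's review.

```python
import re

# Separators assumed between heading parts: "--", en dash, or em dash,
# with or without surrounding spaces.
SEPARATOR = re.compile(r"\s*(?:--|\u2013|\u2014)\s*")


def split_subject_line(line, source_tag="653", target_tag="650", subdivision_code="x"):
    """Rewrite one .mrk line such as '=653  \\\\$aA -- B' as '=650  \\4$aA$xB'.

    Lines that are not a source_tag field with a leading $a are returned unchanged.
    """
    match = re.match(r"=" + re.escape(source_tag) + r"\s+..\$a(.*)$", line)
    if match is None:
        return line
    parts = [part for part in SEPARATOR.split(match.group(1).strip()) if part]
    if not parts:
        return line
    body = "$a" + parts[0]
    # Assumption: every later part is treated as a general subdivision.
    body += "".join("$" + subdivision_code + part for part in parts[1:])
    # First indicator blank (backslash in .mrk), second indicator 4.
    return "=" + target_tag + "  \\4" + body


def convert_mrk_text(text):
    """Apply split_subject_line to every line of a .mrk file's text."""
    return "\n".join(split_subject_line(line) for line in text.split("\n"))
```

Running the converted text through a spot check before loading it is advisable, since a heading that legitimately contains a dash would also be split.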
o Examples of other libraries with Project Gutenberg:
 The Colorado Library Consortium (CLiC) uploaded 466 MARC records for Project
Gutenberg titles into their catalog (Koha). They received the MARC records for
the most popular titles on Project Gutenberg and sent them to Marcive, a
bibliographic services company, for cleanup. Then catalogers from CLiC and other
partners made further edits before loading them into the catalog. Article
citation: Colorado Consortium Adds Free Ebooks to Catalogs. (2011). Library
Journal, 136(2), 16. PDF: C:\Users\scaprac\Desktop\colorado clic adds
ebooks.pdf
The metadata in the records is fairly bare-bones but an improvement over the
originals. Here is a keyword search of Project Gutenberg titles in their catalog:
http://www.aspencat.info/cgi-bin/koha/opac-search.pl?q=project+gutenberg
Here is the E-Discover the Classics website and the link to import their MARC
records for the project: http://www.clicweb.org/e-discover-home/58-ediscover/e-discover/191-importing-marc-records
 Crawford County Federated Library System has integrated Project Gutenberg
titles into their library catalog (Koha)
http://meadvillelibrary.org/ebooks/project-gutenberg
Their records have extremely minimal metadata: http://catalog.ccfls.org/cgi-bin/koha/opac-search.pl?idx=kw&idx=kw&idx=kw&sort_by=relevance&limit=mcccode%3AGUTENBERG&do=Search
 Contra Costa County Public Library has integrated Project Gutenberg into their
library catalog (TLC) http://guides.ccclib.org/project_gutenberg
Their metadata is fairly thorough (especially with subject headings):
http://catalog.ccclib.org/#section=search&term=gutenberg%20ebook&dbTab=ls2pac
 Glendowie College has integrated Project Gutenberg into their library catalog
(Koha). Their metadata is not very good, and they’ve kept the uncontrolled
subject headings in their records so that clicking on them does nothing:
http://gclibrary.koha.kiwi.nz/cgi-bin/koha/opac-search.pl?q=pb:Project%20Gutenberg
o Other interesting info
 Post about integrating free e-books into Koha (using Project Gutenberg titles) http://bywatersolutions.com/2013/05/06/koha-ebooks/
 Paper on “Integration of Gutenberg on-line books into the Austrian TESTLAB
catalogue” - http://www.snv.jussieu.fr/inova/publi/ntevh/integration.htm
 Terry Reese (developer of MarcEdit) blog post on Project Gutenberg files:
http://blog.reeset.net/archives/271#comments
 Terry Reese blog post on replacing fields and subfields using MarcEdit (will help
with subject heading issue) - http://blog.reeset.net/archives/1295
 More guidance on using MarcEdit to batch process, including a section on
messed up subject headings - http://journal.code4lib.org/articles/8336
-
Making of America: offers MARC records for their digital collection. Click here to access:
http://quod.lib.umich.edu/m/moagrp/moa-marc.html. Access the listed FTP site and download
the .zip file titled “moa2006.zip.” The .zip file contains several sets of .mrc records, which can
be imported and edited in MarcEdit.
o We have performed little analysis on these records and have not yet decided whether
to catalog them, especially since libraries that use the records may have to pay
a fee. This record set will need to be discussed in the future.
-
OCLC records: Some collections in the inventory have been cataloged in OCLC at the collection
or sub-collection level. Most of the UWDC collections can be found in OCLC along with several
other collections throughout the inventory. If an OCLC record is available, its OCLC number will
be provided within the collection’s profile.
o Pros: There are numerous collection and sub-collection level records available on OCLC
that could be easily imported into the catalog. This may be the easiest first step we
can take toward integration because it is the most familiar and least complicated set of
records, and the MARC cataloging would require little tweaking.
o Cons: The process of gathering these records together may be lengthy since they must
be searched for and downloaded individually, and batch edits may only be appropriate
for records created by the same institution.
o Other news: WorldCat has released an API that integrates MarcEdit with OCLC
records. According to the site, libraries can “upload new records to WorldCat or create
new WorldCat-derived records through MarcEdit. In addition, libraries can now update
WorldCat holdings, with options to add or delete holdings in batch.” Read about it here:
https://oclc.org/news/announcements/2013/marcedit.en.html
What are other methods of getting MARC records?
Many of the collections and items within the collections have been described using Dublin Core or
another metadata schema instead of MARC, and there are no MARC records available for download. We
are investigating methods to extract that metadata and convert it into MARC through a fairly automated
process (instead of transferring the metadata via crosswalk manually).
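As an illustration of what an automated crosswalk involves, the sketch below turns one Dublin Core record, held as a Python dictionary, into MarcEdit-style mnemonic (.mrk) field lines. It is a minimal sketch under stated assumptions, not an established workflow. The field choices loosely follow the Library of Congress Dublin Core to MARC crosswalk (for example creator to 720 and subject to 653), but the exact mapping, the indicators and the function name are assumptions that would need review. It produces field lines only; a leader and 008 would still have to be supplied.

```python
BLANK = "\\\\"  # two blank indicators, written as backslashes in .mrk files

# Hypothetical mapping: Dublin Core element -> (MARC tag, indicators, subfield code).
SIMPLE_FIELDS = {
    "title": ("245", "00", "a"),
    "description": ("520", BLANK, "a"),
    "language": ("546", BLANK, "a"),
    "subject": ("653", BLANK, "a"),
    "creator": ("720", BLANK, "a"),
    "identifier": ("856", "40", "u"),
}


def dublin_core_to_mrk(record):
    """Return .mrk field lines for a dict of Dublin Core element -> list of values."""
    fields = []
    for element, (tag, indicators, code) in SIMPLE_FIELDS.items():
        for value in record.get(element, []):
            fields.append((tag, indicators + "$" + code + value))
    # Publisher and date share one 260 field ($b and $c).
    imprint = "".join("$b" + name for name in record.get("publisher", []))
    imprint += "".join("$c" + date for date in record.get("date", []))
    if imprint:
        fields.append(("260", BLANK + imprint))
    # Stable sort keeps repeated fields in their original order.
    fields.sort(key=lambda field: field[0])
    return "\n".join("=" + tag + "  " + data for tag, data in fields)
```

Because every subject lands in an uncontrolled 653 field, records produced this way would still need subject cataloging and authority control afterwards.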
-
MarcEdit
o MarcEdit is a powerful application that has the capability of translating certain metadata
schemas, as well as tab delimited data, into MARC.
o We are still investigating how to grab Dublin Core metadata (and metadata in other
schemas) that is presented as HTML on the web rather than as files, and how to
import it into MarcEdit so it can be transformed into MARC. (Emailed Terry Reese
7/17.)
 My email to Terry Reese:
Hello Terry,
I was hoping you could help me understand if this task is possible or not.
I'm helping to integrate some digital resources into my library's ILS, and
many of the resources I'd like to integrate are described using Dublin Core.
I only have access to the public view of these records rather than files
that I normally would be able to import into MarcEdit. Is it possible to
transform this Dublin Core metadata into MARC for use in the OPAC using
MarcEdit?
Here is an example of a couple items I'd like to convert to MARC:
http://digicoll.library.wisc.edu/WebZ/initialize?sessionid=0:javascript=true
:dbchoice=1:active=1:entityCurrentPage=Search1:dbname=WI:style=WI:next=NEXTC
MD%7FQUERY?&context;:term=WI.HistAgSchaf.bib:index=oi%3A:fmtclass=multifulln
f:bad=error/badsearch.html:entitytoprecno=1:entitycurrecno=1:entitytempjds=T
RUE:numrecs=12:next=NEXTCMD%7FFETCH?&context;:recno=1:resultset=1:format=F:n
ext=html/nffull.html:bad=error/badfetch.html:entityresultsrecno=1%7F%7F
http://dcms.beloit.edu/cdm/compoundobject/collection/histories/id/6259/rec/1
Thank you so much for your time and guidance.

Terry Reese’s reply:
So, I know that the second link is a CONTENTdm database, so you could
harvest records from that pretty easy using the OAI-PMH gateway so long as
it has been enabled. The first link however, I'm not familiar with the
software being used. If the software supports some kind of harvesting
format like OAI-PMH then yes, you could automatically harvest entire
collections. If it doesn't, you could use the Generate Record from URL tool
in the MarcEditor -- though I'm not sure it will provide that much
information for you, and would link to the metadata page rather than the
item which I'm assuming is linked to on the page.
Anyway -- if you can find out if there is any support for things like
oai-pmh, or a different API - then there might be an option.
o Generate MARC from URL tool – MarcEditor
 Following Terry Reese’s advice, I tried out the “generate MARC from URL” tool
in the MarcEditor. (To access this tool, open MarcEdit, and select the
MarcEditor icon from the menu or find MarcEditor under “File.” Once in the
MarcEditor, click on “Tools” and select “Generate MARC from URL” from the
drop-down menu. Paste in the URL of the record that you wish to generate, and
click OK.)
I tested this with the UWDC Dublin Core record I sent Terry in my email. Here is
the result:
=LDR 00000nam 2200000Ka 45e0
=008 140721suuuu\\\\xx\\\\\\\\\\\\000\0\eng\d
=245 00$aThe State of Wisconsin Collection Record Display$h[electronic resource]
=520 \\$a Save this record Title: A history of agriculture in Wisconsin Author: Schafer, Joseph, 1867-1941 Date: 1922 Publisher: Madison, Wisconsin: State Historical Society of Wisconsin LCSH Subjects: Agriculture-Wisconsin--History Language: English Type: Text Format: text/html / text/sgml / image/jpeg / xiii, 212 p. ill
=856 40$qtext/html$uhttp://digicoll.library.wisc.edu/WebZ/initialize?sessionid=0:javascript=true:dbchoice=1:active=1:entityCurrentPage=Search1:dbname=WI:style=WI:next=NEXTCMD%7FQUERY?&context;:term=WI.HistAgSchaf.bib:index=oi%3A:fmtclass=multifullnf:bad=error/badsearch.html:entitytoprecno=1:entitycurrecno=1:entitytempjds=TRUE:numrecs=12:next=NEXTCMD%7FFETCH?&context;:recno=1:resultset=1:format=F:next=html/nffull.html:bad=error/badfetch.html:entityresultsrecno=1%7F%7F
We can see that this is not a very useful way to generate MARC records, as the
program doesn’t recognize many of the elements included in the record. The
program can also only recognize one record at a time, so pasting the URL of a web
page that contains multiple records does not work.
I also tested this tool with the ContentDM record, and the tool could
only recognize the title of the work.
 The conclusion for this tool is that it will not be useful for our purposes.
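To illustrate what is lost in the result above: the elements are all present in the 520 field, but only as one run of labelled text. The sketch below, in Python, splits such a run back into separate elements. It is an illustration only and does not change the conclusion about the tool. The list of labels is taken from the single UWDC record shown above and is an assumption; other records or other sites may use different labels, and a value that itself contains one of the labels followed by a colon would be split incorrectly.

```python
import re

# Labels observed in the one sample record above; treat this list as an assumption.
LABELS = ["Title", "Author", "Date", "Publisher", "LCSH Subjects",
          "Language", "Type", "Format"]


def split_labeled_text(text, labels=LABELS):
    """Split 'Title: ... Author: ...' style text into a dict of label -> value."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r"):\s*"
    )
    # re.split with one capture group yields [preamble, label, value, label, value, ...].
    pieces = pattern.split(text)
    return {label: value.strip() for label, value in zip(pieces[1::2], pieces[2::2])}
```

The resulting dictionary could then be mapped to MARC fields by hand or by a crosswalk, which is the step the tool itself does not perform.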
-
Harvesting metadata via OAI-PMH
o Terry Reese’s blog post on OAI-PMH harvesting using MarcEdit: http://blog.reeset.net/archives/497 (also see his article with instructions on the process
in the articles section below)
o ContentDM collections allow OAI-PMH harvesting as long as the feature has been
enabled by the hosting institution. Here is the ContentDM page on enabling OAI-PMH
harvesting: http://contentdm.org/help6/server-admin/oai.asp
According to the website, the base URL for these OAI repositories is:
http://your.CdmWebsite.address/oai/oai.php.
o We know that Recollection Wisconsin harvests metadata from its 200+ collections via
OAI-PMH harvesting. Since MarcEdit has the capability of transforming OAI into MARC,
it would be worth looking into how Recollection Wisconsin harvests their metadata
and replicating their process to generate MARC records for all of the collections in
Recollection Wisconsin. Recollection Wisconsin contact information (Emily Pfotenhauer
is the Program Manager): http://recollectionwisconsin.org/contact
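For reference, an OAI-PMH harvest outside MarcEdit can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a replacement for MarcEdit's harvester. It assumes the repository exposes unqualified Dublin Core under the standard oai_dc metadata prefix and that the base URL follows the ContentDM pattern quoted above. The function names are hypothetical, and error handling (for example OAI error responses or network failures) is left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_bytes):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_bytes)
    records = []
    for record in root.iter(OAI + "record"):
        fields = {}
        for element in record.iter():
            if element.tag.startswith(DC) and element.text and element.text.strip():
                name = element.tag[len(DC):]
                fields.setdefault(name, []).append(element.text.strip())
        if fields:  # deleted records carry a header but no metadata
            records.append(fields)
    token = root.find(".//" + OAI + "resumptionToken")
    next_token = (token.text or "").strip() if token is not None else ""
    return records, next_token or None


def harvest(base_url, set_spec=None):
    """Yield Dublin Core records from base_url, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if not token:
            break
        # A resumptionToken must be sent without the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

A call such as harvest("http://your.CdmWebsite.address/oai/oai.php") would only work where the hosting institution has enabled OAI-PMH; each yielded record is a dictionary of Dublin Core element names to lists of values.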
-
Other tools for metadata
o The Dublin Core website lists several tools that may be helpful for our collections
that are described using Dublin Core and do not have MARC records:
http://dublincore.org/tools/
 We’ve only done a preliminary scan of these resources and have not yet
evaluated if they will be useful for this project.
 One of particular interest is the Editor-Convertor Dublin Core metadata online
tool by the Kirovohrad Regional Universal Research Library. You’re supposed to
be able to paste in a URL, and the tool will recognize the metadata on the page
and enter it into its online form. You can then edit the metadata if you wish and
convert it to MARC by pressing a button. This may help us automate the process
a little bit, but the tool rarely succeeds in recognizing the metadata on the
page. The only advantage to using this tool over the “Generate MARC from URL”
tool in MarcEdit is that the user can edit the metadata fields before processing
the record into MARC.
http://library.kr.ua/dc/dceditunie.html
o The Library of Congress Bibliographic Enrichment Advisory Team developed a tool called
the Web Cataloging Assistant, which allows users to paste a URL to a PDF or to an abstract
page for a publication and generates a MARC record from it. Librarians still must
perform authority control and subject cataloging, and the tool generally only works for
established monographs. Read more here: http://www.loc.gov/catdir/beat/webcat.html
o This document provides information on getting MARC records from metadata
crosswalks using XML:
http://drtc.isibang.ac.in:8080/xmlui/bitstream/handle/1849/127/H_aditya_crosswalk.pdf?sequence=2
-
More resources
o Terry Reese’s blog post on MarcEdit support for JSON files (which is how DPLA offers
their data) - http://blog.reeset.net/archives/1208
Articles about integrating e-resources into the library catalog
-
Beall, J. (2009). Free Books: Loading Brief MARC Records for Open-Access Books in an Academic
Library Online Catalog. Cataloging & Classification Quarterly, 47(5), 452-463.
http://eprints.rclis.org/15841/
This article details the process that Auraria Library went through to harvest metadata from
HathiTrust (MBooks), along with the problems they encountered and how they dealt with them. They
used MarcEdit to harvest the OAI-PMH records and convert them to MARC.
-
Hill, H. & Bossaller, J. Public library use of free e-resources. Journal of Librarianship and
Information Science. 45(2) 103-112. doi: 10.1177/0961000611435253
Abstract for this article: “This article describes a multi-method research project examining the
use of various freely available online collections and projects, such as Project Gutenberg, the
Internet Archive, and Creative Commons-licensed ebooks, by public libraries. This research
begins with the questions: what are libraries doing with freely available materials? Are there
barriers to incorporating them into the collection? What role are librarians playing in expanding
access and awareness of these resources?”
-
Zhang, L., & Jin, M. (2014). Cataloging E-Books: Dealing with Vendors and Various Other
Problems. Serials Librarian, 67(1), 76-80. doi: 10.1080/0361526X.2014.899295
This article details some of the issues libraries face when cataloging ebooks. Though this article
focuses on paid ebooks supplied through vendors, there is still some relevant information here.
-
Panchyshyn, R. (2013). Asking the Right Questions: An E-Resource Checklist for Documenting
Cataloging Decisions for Batch Cataloging Projects. Technical Services Quarterly, 30(1), 15-37.
doi: 10.1080/07317131.2013.735951
This article provides an excellent checklist for cataloging e-books. While these catalogers are
mostly dealing with vendor records, they nevertheless encounter similar issues as ours with
batch editing MARC records. They also provide an extensive literature review on batch-editing
MARC records for e-books.
-
Brown, C. C., & Meagher, E. S. (2008). Cataloging free e-resources: is it worth the investment?.
Interlending & Document Supply, 36(3), 135-141. doi: 10.1108/02641610810897845
This article breaks down the pros and cons of adding free e-resources to the library OPAC. They
also provide a list of freely available resources. This article could be useful in making decisions
about the collections that are available to add to the catalog.
-
Harcourt, K., Wacker, M., & Wolley, I. (2007). Automated Access Level Cataloging for Internet
Resources at Columbia University Libraries. Library Resources & Technical Services, 51(3), 212-225.
This article discusses how Columbia University dealt with the impossible workload for catalogers
to catalog e-resources, both paid and freely available. Their solution was to provide access-level
records for e-resources instead of full descriptive records because it proved to be far more cost
and time effective.
-
Reese, T. (2009). Automated Metadata Harvesting: Low-Barrier MARC Record Generation
from OAI-PMH Repository Stores Using MarcEdit. Library Resources & Technical Services,
53(2), 121-134.
This article provides instructions for harvesting metadata via OAI-PMH using MarcEdit (written
by the developer of MarcEdit). This will be very useful for some of the ContentDM sites that
have enabled OAI-PMH harvesting.
Other helpful resources
-
Terry Reese’s (developer of MarcEdit) blog - http://blog.reeset.net/
-
Terry Reese’s YouTube channel (with lots of tutorials on MarcEdit) - https://www.youtube.com/channel/UC7OLudoObYgiN_EmyDtZ_DQ