Digital Resources Integration Research

This document contains research collected about possible methods for integrating digital resources into LINKcat. It is meant to guide future catalogers when it comes time to catalog these resources, and to provide options for generating MARC records.

What MARC records are available?

- Project Gutenberg: offers MARC records for their over 48,000 free ebooks. Scroll down here to access: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs (Note: the automatically generated MARC records served by Project Gutenberg are unavailable as of June 2014 but will hopefully be available again soon. You can still download MARC records served offsite by the University of Adelaide here: http://ebooks.adelaide.edu.au/meta/pg/ )

   o Problems: We’ve downloaded a sample of the MARC records from the University of Adelaide source and found that the level of detail in the records is not adequate and that the records will not be easy to batch edit using MarcEdit. Subject headings are all in the 653 field instead of the 650 field (which is not a difficult change). They are also all put into subfield a and separated by dashes ( – ) instead of being coded in proper subfields (which is very time consuming to change).

     We’ve also downloaded the catalog RDF file provided by Project Gutenberg and processed the records through MarcEdit to view the .mrk files, and the quality of that metadata is also not acceptable. Subject headings suffer from the same problem and are placed into the 690 field (local subject access).

     This process can be recreated by downloading the RDF zip file and unzipping it: http://www.gutenberg.org/cache/epub/feeds/catalog.rdf.zip
     In MarcEdit, click on MARC Tools and select the RDF file as the input file. Set the output file to save in your desired location and make sure that the file extension is .mrc. Download the Project Gutenberg RDF => MARC XSLT file from the MarcEdit website in order to process the RDF file correctly: http://marcedit.reeset.net/downloads
     Once it’s downloaded, select the Project Gutenberg RDF => MARC file from the XML conversions list and press Execute. MarcEdit will produce a .mrc file. To view the file and batch edit the records, run the .mrc file through MarcBreaker and export it as a .mrk file. Opening the .mrk file in MarcEdit will allow you to view and edit the records. When you’re finished, don’t forget to save it as a .mrc file again before importing the records into Koha.

     We believe that each of these 48,000 records will need to be manually edited, at least for subject cataloging.

   o Examples of other libraries with Project Gutenberg:

     The Colorado Library Consortium (CLiC) uploaded 466 MARC records for Project Gutenberg titles into their catalog (Koha). They received the MARC records for the most popular titles on Project Gutenberg and sent them to Marcive, a bibliographic services company, for cleanup. Then catalogers from CLiC and other partners made further edits before loading them into the catalog.
     Article citation: Colorado Consortium Adds Free Ebooks to Catalogs. (2011). Library Journal, 136(2), 16.
     PDF: C:\Users\scaprac\Desktop\colorado clic adds ebooks.pdf
     The metadata in the records is fairly bare-bones but an improvement over the originals.
     Here is a keyword search of Project Gutenberg titles in their catalog: http://www.aspencat.info/cgi-bin/koha/opac-search.pl?q=project+gutenberg
     Here is the E-Discover the Classics website and the link to import their MARC records for the project: http://www.clicweb.org/e-discover-home/58-ediscover/e-discover/191-importing-marc-records

     Crawford County Federated Library System has integrated Project Gutenberg titles into their library catalog (Koha): http://meadvillelibrary.org/ebooks/project-gutenberg
     Their records have extremely minimal metadata: http://catalog.ccfls.org/cgibin/koha/opacsearch.pl?idx=kw&idx=kw&idx=kw&sort_by=relevance&limit=mcccode%3AGUTENBERG&do=Search

     Contra Costa County Public Library has integrated Project Gutenberg into their library catalog (TLC): http://guides.ccclib.org/project_gutenberg
     Their metadata is fairly thorough (especially with subject headings): http://catalog.ccclib.org/#section=search&term=gutenberg%20ebook&dbTab=ls2pac

     Glendowie College has integrated Project Gutenberg into their library catalog (Koha).
     Their metadata is not very good, and they’ve kept the uncontrolled subject headings in their records, so clicking on them does nothing: http://gclibrary.koha.kiwi.nz/cgi-bin/koha/opacsearch.pl?q=pb:Project%20Gutenberg

   o Other interesting info:

     Post about integrating free e-books into Koha (using Project Gutenberg titles): http://bywatersolutions.com/2013/05/06/koha-ebooks/
     Paper on “Integration of Gutenberg on-line books into the Austrian TESTLAB catalogue”: http://www.snv.jussieu.fr/inova/publi/ntevh/integration.htm
     Terry Reese (developer of MarcEdit) blog post on Project Gutenberg files: http://blog.reeset.net/archives/271#comments
     Terry Reese blog post on replacing fields and subfields using MarcEdit (will help with the subject heading issue): http://blog.reeset.net/archives/1295
     More guidance on using MarcEdit to batch process records, including a section on malformed subject headings: http://journal.code4lib.org/articles/8336

- Making of America: offers MARC records for their digital collection. Click here to access: http://quod.lib.umich.edu/m/moagrp/moa-marc.html
  Access the listed FTP site and download the .zip file titled “moa2006.zip.” The .zip file contains several sets of .mrc records, which can be imported and edited in MarcEdit.

   o We have performed little analysis on these records and have not yet decided whether to catalog them, especially since libraries that use the records may have to pay a fee. This record set will need to be discussed in the future.

- OCLC records: Some collections in the inventory have been cataloged in OCLC at the collection or sub-collection level. Most of the UWDC collections can be found in OCLC, along with several other collections throughout the inventory. If an OCLC record is available, its OCLC number will be provided within the collection’s profile.

   o Pros: There are numerous collection and sub-collection level records available on OCLC that could be easily imported into the catalog.
     This may be the easiest first step we take toward integration because it is the most familiar and least complicated set of records, and the MARC cataloging would require little tweaking.

   o Cons: The process of gathering these records together may be lengthy, since they must be searched for and downloaded individually, and batch edits may only be appropriate for records created by the same institution.

   o Other news: WorldCat has released an API that works to integrate MarcEdit and OCLC records. According to the site, libraries can “upload new records to WorldCat or create new WorldCat-derived records through MarcEdit. In addition, libraries can now update WorldCat holdings, with options to add or delete holdings in batch.” Read about it here: https://oclc.org/news/announcements/2013/marcedit.en.html

What are other methods of getting MARC records?

Many of the collections, and the items within them, have been described using Dublin Core or another metadata schema instead of MARC, and there are no MARC records available for download. We are investigating methods to extract that metadata and convert it into MARC through a fairly automated process (instead of manually transferring the metadata via a crosswalk).

- MarcEdit

   o MarcEdit is a powerful application that can translate certain metadata schemas, as well as tab-delimited data, into MARC.

   o We are still investigating how to grab Dublin Core metadata (and metadata in other schemas) that is presented in HTML on the web rather than as files, and how to import it into MarcEdit so it can be transformed into MARC. (emailed Terry Reese 7/17)

     My email to Terry Reese:

     Hello Terry,
     I was hoping you could help me understand if this task is possible or not. I'm helping to integrate some digital resources into my library's ILS, and many of the resources I'd like to integrate are described using Dublin Core.
     I only have access to the public view of these records rather than files that I normally would be able to import into MarcEdit. Is it possible to transform this Dublin Core metadata into MARC for use in the OPAC using MarcEdit? Here is an example of a couple items I'd like to convert to MARC:
     http://digicoll.library.wisc.edu/WebZ/initialize?sessionid=0:javascript=true:dbchoice=1:active=1:entityCurrentPage=Search1:dbname=WI:style=WI:next=NEXTCMD%7FQUERY?&context;:term=WI.HistAgSchaf.bib:index=oi%3A:fmtclass=multifullnf:bad=error/badsearch.html:entitytoprecno=1:entitycurrecno=1:entitytempjds=TRUE:numrecs=12:next=NEXTCMD%7FFETCH?&context;:recno=1:resultset=1:format=F:next=html/nffull.html:bad=error/badfetch.html:entityresultsrecno=1%7F%7F
     http://dcms.beloit.edu/cdm/compoundobject/collection/histories/id/6259/rec/1
     Thank you so much for your time and guidance.

     Terry Reese’s reply:

     So, I know that the second link is a CONTENTdm database, so you could harvest records from that pretty easy using the OAI-PMH gateway so long as it has been enabled. The first link however, I'm not familiar with the software being used. If the software supports some kind of harvesting format like OAI-PMH then yes, you could automatically harvest entire collections. If it doesn't, you could use the Generate Record from URL tool in the MarcEditor -- though I'm not sure it will provide that much information for you, and would link to the metadata page rather than the item which I'm assuming is linked to on the page. Anyway -- if you can find out if there is any support for things like oai-pmh, or a different API - then there might be an option.

   o Generate MARC from URL tool – MarcEditor

     Following Terry Reese’s advice, I tried out the “Generate MARC from URL” tool in the MarcEditor.
     (To access this tool, open MarcEdit, and select the MarcEditor icon from the menu or find MarcEditor under “File.” Once in the MarcEditor, click on “Tools” and select “Generate MARC from URL” from the drop-down menu. Paste in the URL of the record that you wish to generate, and click OK.)

     I tested this with the UWDC Dublin Core record I sent Terry in my email. Here is the result:

     =LDR 00000nam 2200000Ka 45e0
     =008 140721suuuu\\\\xx\\\\\\\\\\\\000\0\eng\d
     =245 00$aThe State of Wisconsin Collection Record Display$h[electronic resource]
     =520 \\$a Save this record Title: A history of agriculture in Wisconsin Author: Schafer, Joseph, 1867-1941 Date: 1922 Publisher: Madison, Wisconsin: State Historical Society of Wisconsin LCSH Subjects: Agriculture-Wisconsin--History Language: English Type: Text Format: text/html / text/sgml / image/jpeg / xiii, 212 p. ill
     =856 40$qtext/html$uhttp://digicoll.library.wisc.edu/WebZ/initialize?sessionid=0:javascript=true:dbchoice=1:active=1:entityCurrentPage=Search1:dbname=WI:style=WI:next=NEXTCMD%7FQUERY?&context;:term=WI.HistAgSchaf.bib:index=oi%3A:fmtclass=multifullnf:bad=error/badsearch.html:entitytoprecno=1:entitycurrecno=1:entitytempjds=TRUE:numrecs=12:next=NEXTCMD%7FFETCH?&context;:recno=1:resultset=1:format=F:next=html/nffull.html:bad=error/badfetch.html:entityresultsrecno=1%7F%7F

     We can see that this is not a very useful way to generate MARC records, as the program doesn’t recognize many of the elements included in the record: everything except the page title ends up as one block of text in the 520 field. The program also can only recognize one record at a time, so pasting the URL of a web page that contains multiple records does not work.

     I additionally tested this tool out with the ContentDM record, and the tool could only recognize the title of the work. The conclusion for this tool is that it will not be useful for our purposes.
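
     The 520 field in the result above still carries the Dublin Core elements as one run of labeled text. As a rough sketch only (this is not a MarcEdit feature, and it does not change the conclusion above), a short script could split that text back into separate elements for a cataloger to review. The function name split_labeled_text and the label list are assumptions made for illustration; the labels are simply the ones visible in this single sample record, and another site's record display would need its own list.

```python
import re

# Labels as they appear in the 520 text of the sample record above.
# This list is an assumption drawn from that single record.
LABELS = ["Title", "Author", "Date", "Publisher", "LCSH Subjects",
          "Language", "Type", "Format"]


def split_labeled_text(text, labels=LABELS):
    """Split 'Label: value Label: value ...' text into a dict."""
    # Build one pattern that matches any known label followed by a colon.
    # Longer labels go first so none is shadowed by a shorter one.
    alternation = "|".join(
        re.escape(label) for label in sorted(labels, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(%s):\s*" % alternation)

    fields = {}
    matches = list(pattern.finditer(text))
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


sample = (
    "Save this record Title: A history of agriculture in Wisconsin "
    "Author: Schafer, Joseph, 1867-1941 Date: 1922 "
    "Publisher: Madison, Wisconsin: State Historical Society of Wisconsin "
    "LCSH Subjects: Agriculture-Wisconsin--History Language: English "
    "Type: Text Format: text/html / text/sgml / image/jpeg / xiii, 212 p. ill"
)

for label, value in split_labeled_text(sample).items():
    print(label + ": " + value)
```

     The split depends entirely on the label text being predictable, so it would break as soon as a value itself contains one of the labels followed by a colon. That fragility is one more reason to prefer a structured source such as OAI-PMH where it is available.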
- Harvesting metadata via OAI-PMH

   o Terry Reese’s blog post on OAI-PMH harvesting using MarcEdit: http://blog.reeset.net/archives/497 (also see his article with instructions on the process in the articles section below)

   o ContentDM collections allow OAI-PMH harvesting as long as the feature has been enabled by the hosting institution. Here is the ContentDM page on enabling OAI-PMH harvesting: http://contentdm.org/help6/server-admin/oai.asp
     According to the website, the base URL for these OAI repositories is: http://your.CdmWebsite.address/oai/oai.php

   o We know that Recollection Wisconsin harvests metadata from its 200+ collections via OAI-PMH. Since MarcEdit can transform OAI records into MARC, it would be worth looking into how Recollection Wisconsin harvests its metadata and replicating that process to generate MARC records for all of the collections in Recollection Wisconsin. Recollection Wisconsin contact information (Emily Pfotenhauer is the Program Manager): http://recollectionwisconsin.org/contact

- Other tools for metadata

   o The Dublin Core website lists several tools that may be helpful for our collections that are described using Dublin Core and do not have MARC records: http://dublincore.org/tools/
     We’ve only done a preliminary scan of these resources and have not yet evaluated whether they will be useful for this project. One of particular interest is the Editor-Convertor Dublin Core metadata online tool by the Kirovohrad Regional Universal Research Library. You’re supposed to be able to paste in a URL, and the tool will recognize the metadata on the page and enter it into its online form. You can then edit the metadata if you wish and convert it to MARC by pressing a button. This may help us automate the process a little bit, but in our tests the tool rarely succeeded in recognizing the metadata on the page.
     The only advantage to using this tool over the “Generate MARC from URL” tool in MarcEdit is that the user can edit the metadata fields before processing the record into MARC. http://library.kr.ua/dc/dceditunie.html

   o The Library of Congress Bibliographic Enrichment Advisory Team developed a tool called the Web Cataloging Assistant, which allows users to paste a URL to a PDF or to an abstract page for a publication, and it will generate a MARC record from it. Librarians still must perform authority control and subject cataloging, and the tool generally only works for established monographs. Read more here: http://www.loc.gov/catdir/beat/webcat.html

   o This document provides information on getting MARC records from metadata crosswalks using XML: http://drtc.isibang.ac.in:8080/xmlui/bitstream/handle/1849/127/H_aditya_crosswalk.pdf?sequence=2

- More resources

   o Terry Reese’s blog post on MarcEdit support for JSON files (which is how DPLA offers their data): http://blog.reeset.net/archives/1208

Articles about integrating e-resources into the library catalog

- Beall, Jeffrey. Free Books: Loading Brief MARC Records for Open-Access Books in an Academic Library Online Catalog. Cataloging & Classification Quarterly, 2009, vol. 47, n. 5, pp. 452-463. http://eprints.rclis.org/15841/
  This article details the process that Auraria Library went through to harvest metadata from HathiTrust (MBooks), along with the problems they encountered and how they dealt with them. They used MarcEdit to harvest the OAI-PMH records and convert them to MARC.

- Hill, H. & Bossaller, J. Public library use of free e-resources. Journal of Librarianship and Information Science. 45(2) 103-112. doi: 10.1177/0961000611435253
  Abstract for this article: “This article describes a multi-method research project examining the use of various freely available online collections and projects, such as Project Gutenberg, the Internet Archive, and Creative Commons-licensed ebooks, by public libraries.
  This research begins with the questions: what are libraries doing with freely available materials? Are there barriers to incorporating them into the collection? What role are librarians playing in expanding access and awareness of these resources?”

- Zhang, L., & Jin, M. (2014). Cataloging E-Books: Dealing with Vendors and Various Other Problems. Serials Librarian, 67(1), 76-80. doi: 10.1080/0361526X.2014.899295
  This article details some of the issues libraries face when cataloging ebooks. Though it focuses on paid ebooks supplied through vendors, there is still some relevant information here.

- Panchyshyn, R. (2013). Asking the Right Questions: An E-Resource Checklist for Documenting Cataloging Decisions for Batch Cataloging Projects. Technical Services Quarterly, 30(1), 15-37. doi: 10.1080/07317131.2013.735951
  This article provides an excellent checklist for cataloging e-books. While these catalogers are mostly dealing with vendor records, they nevertheless encounter issues similar to ours with batch editing MARC records. They also provide an extensive literature review on batch editing MARC records for e-books.

- Brown, C. C., & Meagher, E. S. (2008). Cataloging free e-resources: is it worth the investment? Interlending & Document Supply, 36(3), 135-141. doi: 10.1108/02641610810897845
  This article breaks down the pros and cons of adding free e-resources to the library OPAC. The authors also provide a list of freely available resources. This article could be useful in making decisions about which of the available collections to add to the catalog.

- Harcourt, K., Wacker, M., & Wolley, I. (2007). Automated Access Level Cataloging for Internet Resources at Columbia University Libraries. Library Resources & Technical Services, 51(3), 212-225.
  This article discusses how Columbia University dealt with the unmanageable workload that cataloging e-resources, both paid and freely available, placed on its catalogers.
  Their solution was to provide access-level records for e-resources instead of full descriptive records, because this proved to be far more cost and time effective.

- Reese, T. (2009). Automated Metadata Harvesting: Low-Barrier MARC Record Generation from OAI-PMH Repository Stores Using MarcEdit. Library Resources & Technical Services, 53(2), 121-134.
  This article provides instructions for harvesting metadata via OAI-PMH using MarcEdit (written by the developer of MarcEdit). This will be very useful for some of the ContentDM sites that have enabled OAI-PMH harvesting.

Other helpful resources

- Terry Reese’s (developer of MarcEdit) blog: http://blog.reeset.net/
- Terry Reese’s YouTube channel (with lots of tutorials on MarcEdit): https://www.youtube.com/channel/UC7OLudoObYgiN_EmyDtZ_DQ
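
To make the OAI-PMH option discussed in this document more concrete, here is a minimal sketch, written in Python rather than done through MarcEdit, of the two steps involved: building a ListRecords request against a repository's base URL, and mapping the simple Dublin Core in a response to MARC mnemonic (.mrk style) lines. Several things here are assumptions for illustration: the base URL is the placeholder pattern quoted from the ContentDM help page, the set name "histories" and the example.org link are made up, and the element-to-field mapping is a simplified one loosely following the Library of Congress Dublin Core to MARC crosswalk, not MarcEdit's actual stylesheet. No network request is made; the sample XML stands in for a harvested response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Simplified, assumed mapping from Dublin Core elements to MARC mnemonic
# prefixes (tag, indicators, subfield). Review before any real use.
DC_TO_MARC = {
    "title": r"=245  00$a",
    "creator": r"=720  \\$a",
    "subject": r"=653  \\$a",
    "description": r"=520  \\$a",
    "publisher": r"=260  \\$b",
    "date": r"=260  \\$c",
}


def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request for simple Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def records_to_mrk(xml_text):
    """Turn each oai_dc record in a response into a list of .mrk lines."""
    root = ET.fromstring(xml_text)
    output = []
    for record in root.iter(OAI_NS + "record"):
        lines = []
        for element in record.iter():
            value = (element.text or "").strip()
            # Skip anything that is not a non-empty Dublin Core element.
            if not element.tag.startswith(DC_NS) or not value:
                continue
            name = element.tag[len(DC_NS):]
            if name == "identifier" and value.startswith("http"):
                lines.append("=856  40$u" + value)
            elif name in DC_TO_MARC:
                lines.append(DC_TO_MARC[name] + value)
        output.append(lines)
    return output


# Hypothetical response, using the sample record's values from this document.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:histories/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A history of agriculture in Wisconsin</dc:title>
          <dc:creator>Schafer, Joseph, 1867-1941</dc:creator>
          <dc:subject>Agriculture-Wisconsin--History</dc:subject>
          <dc:date>1922</dc:date>
          <dc:identifier>http://example.org/item/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://your.CdmWebsite.address/oai/oai.php", "histories"))
for mrk_lines in records_to_mrk(SAMPLE_RESPONSE):
    print("\n".join(mrk_lines))
```

A real harvest would also need to follow the resumptionToken that OAI-PMH repositories return when a result set spans several responses, and the subjects would still land in 653 as uncontrolled terms, so the subject cataloging work described earlier would remain. MarcEdit's own OAI harvester, as described in the Reese article, handles the harvesting and conversion without any scripting.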