Putting it all together for Digital Assets

advertisement
Putting it all
together for
Digital
Assets
Jon Morley
Beck Locey
LDS Church History Library
• Our collections consist of manuscripts, books, Church records,
photographs, oral histories, architectural drawings, pamphlets,
newspapers, periodicals, maps, microforms, and audiovisual
materials. The collection continues to grow annually and is a
prime resource for the study of Church history. Our collections
contains approximately:
• 270,000 books, pamphlets, magazines, and newspapers
• 240,000 collections of original, unpublished records (journals,
diaries, correspondence, minutes, etc.)
• 3.5 million blessings for Church members
• 13,000 photograph collections
• 23,000 audiovisual items
Digitization Objective
Patrons –
Globalization – To provide
access to library content to 14 million church
members and the public throughout
worldwide. Remote sites for new content.
Internal Operations –
To create an
automated digital pipeline from digitizing
content to patron consumption.
Extending to crowdsourcing in future.
Agenda
• The first half of this presentation
will focus on “what” we want to
accomplish with an automated
digital pipeline.
• The second half will focused on
“how” we built the digital pipeline.
What we want to accomplish
DIGITAL PIPELINE
Physical Assets
Digital Content
Church
History
Library
Rosetta
2011
EAD Tool
Aleph
Primo
(Encoded
Archival
Description)
(Master
Record)
(Discovery)
2010
Rosetta Dual Role
Two instances of Rosetta
1. DRPS – Digital Records Preservation
System (dark archive)
2. DCMS – Digital Content Management
System (public display)
Digital Pipeline
Ingest Collection
Rosetta
EAD Tool
Aleph
(Encoded
Archival
Description)
(Master
Record)
PID
555 tag
855 tag
Primo
Harvest
& Index
EAD Tool
Interface
Click on
Thumbnail
Patron can a item to be
digitized.
Organizing and
displaying
Collections
EAD Tool
(Encoded Archival Description)
A finding aid that adds a viewable structure to a collection.
-
Many large, complex collections
-
Aleph stores only the collection level descriptive metadata
-
Add / edit / delete component level metadata
-
Rosetta 3.0 doesn’t have enough functionality for our
collections.
EAD Tool - Staff View
EDIT
Drag and drop
XML or CSV files
Challenges
• Configuration is fairly complex within and between systems
• Large Batch Ingest has been difficult
• Re-ingesting content into Rosetta has been problematic
(preservation system)
• Collections Management
– Built EAD tool to manage collections
– Serials?
• PDF Content
– Progressive PDF download in Adobe Reader X is not fully supported
until Rosetta v3
– 2 searches to see text within a PDF
Successes
– Ingested over 57,000 Family history books ranging
from 1 MB to 800 MB for the Family History
department. (books.familysearch.org)
– Ingested 700,000 files. Most are linked to
collections for Church History library.
– Consistent user experiences with large files
– Good response times
• 75,000 views / Month
– Once the pipelines are established, they work well.
– Able to create customizable solutions for multiple
institutions.
Under Development
– Restricted content
• Based on IP addresses
• authentication
– Multiple viewing experiences (Responsive
Design)
– Reporting capabilities
– Monitor usage
How we glued it all together
DIGITAL PIPELINE
Digital Pipeline
Church
History
Library
Rosetta
EAD Tool
Aleph
(Encoded
Archival
Description)
(Master
Record)
Primo
Catalog in Aleph
• Physical assets are
cataloged in Aleph
(collection level).
• Staff assigns a call
number and Aleph
assigns a BIB number.
• While browsing, a patron
requests that the asset be
digitized.
Call #: MS 2877 4
BIB #: 000114027
Aleph
Digitized into Rosetta
• The asset(s) from the
Aleph collection are
digitized.
• A custom tool (SIP tool)
queries the Aleph SRU
server using BIB # to get
collection level metadata.
• Digitized assets, BIB #
(CMS ID) and metadata
are ingested into Rosetta.
PIDs get assigned.
BIB #: 000114027
Call #: MS 2877 4
Title: Parley P. Pratt…
Query
Response
Aleph SRU
000114027
MS 2877 4
Parley P. Pratt…
+
Rosetta
Rosetta to Aleph / EAD Tool
• Every night a custom script
(cron job) queries Rosetta
using the OAI harvester.
Query
Response
Rosetta OAI
Dublin
Core
• The PIDs and item titles from
Rosetta are inserted into the
EAD Tool or prepared for
Aleph (856 tags).
• Every night a custom script
(job_list) ingests the metadata
into Aleph.
EAD Tool
Aleph
From Aleph to Primo
• Every night Aleph publishes
all data sets to Primo (publish06 in job_list).
• Aleph data sets include
collection level metadata plus
856 tags (Rosetta PID) or 555
tags (EAD link).
• During harvest, Primo creates
links from 555 or 856 tags
which point to the EAD Tool
and Rosetta respectively.
Publish
MARC
XML
Aleph
Primo
Digital Pipeline Notes
• Aleph BIB # provides the linkage between
the various systems.
– Call # provides the reference
• Linux scripting glues it all together.
– Cron jobs, job_list, wget and custom logs work
together to get data, re-format data, move data
and start new jobs.
– NFS shares allow us to move data around
easily. Rights are a bit of a hassle.
• Timing matters.
– Pull from Rosetta, update EAD Tool, send to
Aleph, publish to Primo, and run Primo pipes.
Backup Slides
Rosetta OAI Harvester
• Linux script using wget requests
Rosetta data from last 24 hours
– Query: by publication set with from / until
– Response: Dublin Core
Query
Response
Linux script
Rosetta OAI
Wget http://<host>.<domain>.org/oaiprovider/request?verb=ListRecords&
metadataPrefix=oai_dc&set=<pubset>&from=2012-08-20T20:00:00Z&
until=2012-08-21T19:59:59Z
Aleph SRU Server
• Aleph SRU server response to Rosetta
– Query: by BIB # (CMIS ID)
– Response: Dublin Core (or MARC XML)
Query
Response
Aleph SRU
Rosetta
Aleph SRU Server URL
• http://<your-aleph-host>.<your-domain>.org:5661/
<xyz01>?version=1.1&operation=searchRetrieve...
• …&query=dc.callno=“MS 318”&maximumRecords=1
• …&query=rec.id=000082419&maximumRecords=1
• …&query=dc.title=“Book of
Mormon”&maximumRecords=5
• …&query=dc.subject=Apostles&maximumRecords=10
• …&query=dc.creator=“John
Taylor”&maximumRecords=10
Download