Slides for The Purdue University Research Repository

THE PURDUE UNIVERSITY RESEARCH
REPOSITORY:
HUBZERO CUSTOMIZATION FOR DATASET
PUBLICATION AND DIGITAL PRESERVATION
Carly Dearborn, MSIS
Digital Preservation and Electronic Records Archivist
Amy Barton, MLS
Assistant Professor of Library Science, Metadata Specialist
Neal Harmeyer, MLS
Digital Archivist
WHAT IS PURR?
TECHNICAL AND INSTITUTIONAL
INFRASTRUCTURE
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
A BRIEF OVERVIEW:
The Purdue University Research Repository (PURR) is a research
collaboration and data management solution for Purdue researchers
and their collaborators.
• Data management support
• A workspace for researchers to collaborate on research and publish
datasets online
• Access to published datasets with unique Digital Object Identifier
(DOI)
• Long-term preservation component
• https://purr.purdue.edu
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
A CUSTOMIZED INSTANCE OF HUBZERO ®
• PURR utilizes HUBzero as its foundation: https://hubzero.org
• Designed to facilitate virtual communities, online collaboration,
research, and teaching
• Built on the open source LAMP (Linux, Apache, MySQL, and PHP)
platform with the Joomla! Content Management System (CMS)
• PURR was specially customized for data stewardship, which includes
a workflow for the curation, publication, dissemination, and
preservation of datasets
• The unique customizations of the HUB software will be added to the base
HUBzero package in the next release
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
COLLABORATIVE INSTITUTIONAL INFRASTRUCTURE
• Collaborative effort
– Purdue University Libraries
– Information Technology at Purdue (ITaP)
– Office of the Vice President for Research (OVPR)
• Governed by an Executive Committee, Steering Group, and a Working Group
• PURR Libraries team
– Project Director 50%
– Digital Data Repository Specialist 100%
– Two Software Developers 100%
– Metadata Specialist 20%
– Digital Archivist 25%
– Two Graduate Assistants 50%
– Graduate Assistant 25%
DATA PRESERVATION
ISO 16363 & OAIS
DIGITAL PRESERVATION IN PURR
THE DATA DELUGE
• Long-term data management plans required by many federal funding
agencies
• Trustworthy repositories, sound metadata creation and capture, open
standards for file formats, and information literacy vital to longevity of
digital resources
• Working Group drafted the PURR Digital Preservation Policy using the
Trustworthy Repositories Audit & Certification (TRAC) checklist as a guiding document.
• TRAC/ISO 16363 influenced documentation such as mission
statement, policies, job descriptions, business plan, etc.
DIGITAL PRESERVATION IN PURR
DEVELOPING POLICIES AND STRATEGIES
• PURR’s preservation mandate and its organizational commitment.
• PURR commits to preservation for a period of 10 years after which
the content is subject to the Libraries’ selection criteria and archival
appraisal
• Preservation strategies: full preservation, bit-level preservation and no
preservation.
• All objects receive bit-level maintenance, a DOI permanent identifier,
PREMIS preservation metadata, onsite and offsite backups, regular
virus checks, and regular rotation to new storage media.
• PURR accepts all file formats but recommends formats which are
more sustainable long-term.
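The bit-level maintenance listed above depends on being able to tell whether a stored file has changed. The following is a minimal sketch of such a check, assuming SHA-256 checksums recorded at ingest. The function names and sample file are hypothetical and are not taken from PURR's code.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Compare the file's current digest with the one recorded earlier."""
    return sha256_of(path) == recorded_digest


# A temporary file stands in for a preserved object (hypothetical content).
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "dataset.csv")
    with open(sample, "wb") as handle:
        handle.write(b"id,value\n1,42\n")
    recorded = sha256_of(sample)             # digest stored at ingest time
    unchanged = fixity_ok(sample, recorded)  # bits are still intact

    with open(sample, "ab") as handle:       # simulate silent corruption
        handle.write(b"x")
    corrupted = fixity_ok(sample, recorded)  # digest no longer matches
```

Reading in chunks keeps memory use flat for large datasets, which is why the sketch avoids loading whole files.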
DIGITAL PRESERVATION IN PURR
OAIS & DISTRIBUTED DIGITAL PRESERVATION
• The Open Archival Information System (OAIS) Reference Model is a
standard in digital preservation and an ISO standard – ISO 14721
• Producers submit a content item for publication with appropriate Dublin
Core metadata – this acts as the Submission Information Package
(SIP)
• The Content Information (CI) is then bundled together with
Preservation Description Information (PDI) using the Library of Congress
BagIt specification. This is the Archival Information Package (AIP)
• Unlike in most OAIS repositories, the Dissemination Information
Package (DIP) is derived not from the AIP but from the SIP.
• In February 2013, Purdue joined The MetaArchive Cooperative
[Diagram: OAIS workflow in PURR. Labels: SIP, purr.purdue.edu, Designated Community, DIP, HUBzero (Linux, Apache, MySQL, PHP, Joomla!), BagIt, AIP, Media, Backup (MetaArchive), LOCKSS access. Diagram designed by: Sriram Kiran Valavala]
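The BagIt packaging step mentioned on this slide can be sketched as follows. This is a minimal illustration of the bag layout (a bag declaration, a data/ payload directory, and a checksum manifest), not PURR's AIP generator. The function name, the file names, the version value 0.97, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import os
import tempfile


def make_bag(bag_dir, payload):
    """Write a minimal BagIt bag: bagit.txt, data/ and a SHA-256 manifest.

    `payload` maps flat file names to bytes (hypothetical input shape).
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        with open(os.path.join(data_dir, name), "wb") as handle:
            handle.write(content)
        checksum = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{name}\n")
    # Bag declaration: the tag file that marks the directory as a bag.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as handle:
        handle.write("BagIt-Version: 0.97\n")
        handle.write("Tag-File-Character-Encoding: UTF-8\n")
    # Payload manifest: one "checksum  path" line per payload file.
    manifest_path = os.path.join(bag_dir, "manifest-sha256.txt")
    with open(manifest_path, "w", encoding="utf-8") as handle:
        handle.writelines(manifest_lines)
    return manifest_lines


with tempfile.TemporaryDirectory() as workdir:
    bag = os.path.join(workdir, "aip-bag")
    lines = make_bag(bag, {"survey.csv": b"q1,q2\n3,4\n", "readme.txt": b"notes"})
    bag_files = sorted(os.listdir(bag))
```

The manifest is what later makes fixity checks possible: each payload file can be re-hashed and compared with its recorded line.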
PURR METADATA
WEAVING OF STANDARDS
FOR PRESERVATION
METADATA & AIP CREATION TOOL
TALKING POINTS
• Metadata Overview
• Dataset Publication Process
• Archival Information Package Generation
• Metadata Generation
METADATA GOALS FOR PURR
DATASET METADATA
• Capture all pertinent information about the dataset file for long-term
preservation
• Descriptive metadata
• Administrative metadata
– Technical metadata
– Structural metadata
– Rights metadata
– Preservation metadata
METADATA GENERATION
METADATA STANDARDS FOR PURR
• Metadata Encoding and Transmission Standard (METS)
• Wrapper
• DCMI Metadata Terms (dcterms)
• Descriptive metadata
• Metadata Object Description Schema (MODS)
• Dataset ownership
• Access condition
• Preservation Metadata: Implementation Strategies (PREMIS)
• Preservation metadata
METADATA GENERATION
WHY METS?
• METS acts as a structured container in which other standard
metadata schemas can be embedded internally or referenced
externally.
• Structure:
• Descriptive Section <mets:dmdSec>
• Administrative Section <mets:amdSec>
• Technical Section <mets:techMD>
• Rights Section <mets:rightsMD>
• Digital Provenance Section <mets:digiprovMD>
• File Section <mets:fileSec>
• File Structure Section <mets:structMap>
Schema callouts on the slide: DCTERMS, PREMIS, PREMIS & MODS, METS
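To make the container idea concrete, here is a minimal sketch that builds an empty METS skeleton with the sections listed above, using Python's standard library. It shows nesting only: the ID values are hypothetical, no embedded metadata is included, and the result is not meant to be schema-valid.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


root = ET.Element(mets("mets"))
ET.SubElement(root, mets("dmdSec"), ID="dmd-1")        # descriptive metadata
amd = ET.SubElement(root, mets("amdSec"), ID="amd-1")  # administrative metadata
for child in ("techMD", "rightsMD", "digiprovMD"):
    # Technical, rights, and provenance sections nest inside amdSec.
    ET.SubElement(amd, mets(child), ID=f"{child}-1")
ET.SubElement(root, mets("fileSec"))                   # file inventory
ET.SubElement(root, mets("structMap"))                 # structural map

section_order = [element.tag.split("}")[1] for element in root]
admin_sections = [element.tag.split("}")[1] for element in amd]
skeleton = ET.tostring(root, encoding="unicode")
```

Each of the inner sections would normally carry an `mdWrap` (embedded metadata) or `mdRef` (external pointer), which is the embed-or-reference choice described on this slide.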
METADATA GENERATION
WHY QUALIFIED DUBLIN CORE?
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Our Digital Library Software Developer, Brandon Beatty,
developed OAI-PMH functionality
• The code was submitted to the HUBzero development group and
added to the core HUBzero code.
• HUBzero now comes standard with OAI-PMH functionality
• A contribution for the greater good
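For readers unfamiliar with OAI-PMH, a harvest request is an ordinary HTTP GET with a `verb` parameter. The sketch below only builds such a URL and makes no network call. The base URL is a placeholder, since the actual endpoint path on a hub is not given in these slides, and `oai_dc` is the protocol's baseline Dublin Core prefix rather than a confirmed PURR setting.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; the real path on a given hub may differ.
BASE_URL = "https://hub.example.org/oaipmh"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2013-01-01")
query = parse_qs(urlsplit(url).query)
```

A harvester would repeat the request with the `resumptionToken` returned by the server until the record list is exhausted.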
METADATA GENERATION
WHY PREMIS?
• PREMIS is a robust preservation standard that captures digital
preservation activities applied to a digital object.
• Intellectual Entity
– A coherent set of content that is reasonably described as a unit
(dataset).
• Objects
– A discrete unit of information in digital form.
• Events
– An action that involves at least one object or agent known to the
preservation repository.
• Rights
– Assertions of one or more rights or permissions pertaining to an object
and/or agent.
• Agents
– A person, organization, or software program associated with
preservation events in the life of an object.
Data Dictionary for Preservation Metadata
(http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
METADATA GENERATION
PREMIS EVENTS FOR PURR
Event Name / Event Description / Event Preservation Explanation
• capture
– Event Description: Initial capture of the publication data from the user; the first event in the event stream.
– Event Preservation Explanation: Preserving capture would help with HUBzero debugging, as it tells us when the SIP capture/creation process started.
• in-revision
– Event Description: Generated when a SIP must be revised before it can be approved (it would occur between capture and validation).
– Event Preservation Explanation: Preserving in-revision would help with HUBzero debugging, as in-revision is a major change in the SIP status. Note that in-revision only occurs if the SIP is sent back to the author(s) for revision before it can be approved for AIP status.
• validation
– Event Description: Validation of the SIP to ensure it is ready to become an AIP.
– Event Preservation Explanation: Validation is one of the major steps in a SIP's journey to AIP status, so it should be preserved.
• ingestion
– Event Description: Creation of the AIP from the approved SIP.
– Event Preservation Explanation: Ingestion is the creation of the AIP, so it should be preserved.
• fixity check
– Event Description: Periodic event where the fixity of the files in the AIP is re-validated.
– Event Preservation Explanation: Preserving fixity check would help with HUBzero debugging, as it tells when there is a problem with the preservation process.
• replication
– Event Description: Copying the AIP bit-for-bit to another location for preservation purposes (as in LOCKSS).
– Event Preservation Explanation: As replication creates another copy of the AIP, it should be preserved for debugging purposes.
• migration
– Event Description: Transforming the AIP and its contents into a more contemporary format.
– Event Preservation Explanation: As migration creates a newer, automatically generated version of the AIP, it should be preserved for debugging purposes.
Data Dictionary for Preservation Metadata
(http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
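The ordering implied by the event descriptions (capture first, an optional in-revision step, then validation, then ingestion, with fixity check, replication, and migration applying only once an AIP exists) can be expressed as a small transition table. This is an illustrative sketch under those stated assumptions; the names are hypothetical, and the rule that each step happens at most once before ingestion is a simplification not stated in the slides.

```python
# Events that operate on an existing AIP, so they can only follow ingestion.
AIP_EVENTS = {"fixity check", "replication", "migration"}

# Which event may directly follow which (assumed from the descriptions).
ALLOWED_NEXT = {
    "capture": {"in-revision", "validation"},
    "in-revision": {"validation"},
    "validation": {"ingestion"},
    "ingestion": AIP_EVENTS,
}
for name in AIP_EVENTS:
    ALLOWED_NEXT[name] = AIP_EVENTS


def stream_is_plausible(events):
    """Check an event stream against the assumed ordering."""
    if not events or events[0] != "capture":
        return False  # capture is described as the first event
    for previous, current in zip(events, events[1:]):
        if current not in ALLOWED_NEXT.get(previous, set()):
            return False
    return True


normal = stream_is_plausible(
    ["capture", "validation", "ingestion", "fixity check", "replication"]
)
revised = stream_is_plausible(["capture", "in-revision", "validation", "ingestion"])
skipped_validation = stream_is_plausible(["capture", "ingestion"])
```

A check like this is one way the recorded event stream could support the debugging use described in the explanations.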
METADATA GENERATION
WHY MODS?
• Capture Dataset Ownership
• <name> The name of a person, organization, or event (conference,
meeting, etc.) associated in some way with the resource.
• <affiliation> The name of an organization, institution, etc. with which the
entity recorded in <name> was associated at the time that the resource
was created.
• <role> Designates the relationship (role) of the entity recorded in name to
the resource described in the record.
• <accessCondition> Information about restrictions imposed on access to
a resource.
• <mods:accessCondition type="restriction on access">publically accessible</mods:accessCondition>
• <mods:accessCondition type="restriction on access">embargoed until 2015-06-30</mods:accessCondition>
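A consumer of this metadata could read the access condition back out to decide whether a dataset is still embargoed. The sketch below assumes the free-text pattern "embargoed until YYYY-MM-DD" seen in the sample above and treats the embargo date itself as still restricted. The wrapper element and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
PREFIX = "embargoed until "

# Hypothetical wrapper around the access condition shown on the slide.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:accessCondition type="restriction on access">embargoed until 2015-06-30</mods:accessCondition>
</mods:mods>"""


def embargo_end(mods_xml):
    """Return the embargo end date, or None when no embargo is recorded."""
    root = ET.fromstring(mods_xml)
    for node in root.findall("mods:accessCondition", MODS_NS):
        text = (node.text or "").strip()
        if text.startswith(PREFIX):
            return date.fromisoformat(text[len(PREFIX):])
    return None


def is_accessible(mods_xml, today):
    """True when there is no embargo or the embargo date has passed."""
    end = embargo_end(mods_xml)
    return end is None or today > end


end_date = embargo_end(SAMPLE)
before = is_accessible(SAMPLE, date(2015, 1, 1))
after = is_accessible(SAMPLE, date(2015, 7, 1))
```

Because the date lives in free text, any change in wording would break this parse; a dedicated date attribute would be more robust, but the slides do not show one.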
METADATA GENERATION
AIP CREATION
CC0 - Creative Commons
Creative Commons Attribution Unported 3.0 License
Creative Commons Attribution-NoDerivs Unported 3.0 License
Creative Commons Attribution-NonCommercial-ShareAlike Unported 3.0 License
Creative Commons Attribution-ShareAlike Unported 3.0 License
Creative Commons Attribution-NonCommercial Unported 3.0 License
Creative Commons Attribution-NonCommercial-NoDerivs Unported 3.0 License
PURR
Puuuurrrrrrrrrrr….
METADATA GENERATION
PREMIS EVENT CAPTURED
<mets:digiprovMD ID="METS-digiprovMD-premis-event-unpacking-20130312T112352-processId-17937-seq-1">
  <mets:mdWrap MDTYPE="PREMIS:EVENT">
    <mets:xmlData>
      <premis:event>
        <premis:eventIdentifier>
          <premis:eventIdentifierType>HUBzero</premis:eventIdentifierType>
          <premis:eventIdentifierValue>premis-event-unpacking-20130312T112352-processId-17937-seq-1</premis:eventIdentifierValue>
        </premis:eventIdentifier>
        <premis:eventType>unpacking</premis:eventType>
        <premis:eventDateTime>2013-03-12T11:23:52+00:00</premis:eventDateTime>
        <premis:eventDetail>tool: HUBzero</premis:eventDetail>
        <premis:eventOutcomeInformation>
          <premis:eventOutcome>unpackaged</premis:eventOutcome>
        </premis:eventOutcomeInformation>…
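An event record of this shape can be assembled programmatically. The following is a rough sketch using Python's standard library, not the HUBzero implementation. The namespace URI assumes PREMIS 2.x, and the function name, identifier value, and outcome value are hypothetical.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed: PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def premis(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_type, timestamp, identifier, outcome, tool="HUBzero"):
    """Build a premis:event element mirroring the fields shown above."""
    event = ET.Element(premis("event"))
    ident = ET.SubElement(event, premis("eventIdentifier"))
    ET.SubElement(ident, premis("eventIdentifierType")).text = tool
    ET.SubElement(ident, premis("eventIdentifierValue")).text = identifier
    ET.SubElement(event, premis("eventType")).text = event_type
    ET.SubElement(event, premis("eventDateTime")).text = timestamp
    ET.SubElement(event, premis("eventDetail")).text = f"tool: {tool}"
    info = ET.SubElement(event, premis("eventOutcomeInformation"))
    ET.SubElement(info, premis("eventOutcome")).text = outcome
    return event


# Hypothetical values for a fixity check event.
event = build_event(
    "fixity check", "2013-03-12T11:23:52+00:00", "example-event-1", "pass"
)
xml_text = ET.tostring(event, encoding="unicode")
```

In a METS document the resulting element would sit inside `mets:xmlData` within a `mets:digiprovMD` section, as in the sample above.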
METADATA GENERATION
METS WRAPPER & DC TERMS
<mets:dmdSec ID="METS-dmdSec-doi__10.5072__FK250925">
  <mets:mdWrap MDTYPE="DC">
    <mets:xmlData>
      <mets:dcterms>
        <dcterms:creator>Amy Barton</dcterms:creator>
        <dcterms:date>2013-01-07T16:40:43-05:00</dcterms:date>
        <dcterms:description>projectName: Metadata Project</dcterms:description>
        <dcterms:description>projectAlias: metadata</dcterms:description>
        <dcterms:description>publicationState: Draft under review</dcterms:description>
        <dcterms:description>publicationVersion: 1</dcterms:description>
        <dcterms:description>abstract: A metadata workshop was developed based on subject liaison librarians’ feedback in a Qualtrics survey.</dcterms:description>
        <dcterms:description>notes: The dataset contains survey data.</dcterms:description>
        <dcterms:description>synopsis: Subject Librarian survey and resulting metadata workshop.</dcterms:description>
        <dcterms:format>BagIt</dcterms:format>
        <dcterms:identifier>doi:10.5072/FK250925</dcterms:identifier>
        <dcterms:publisher>Purdue University Research Repository</dcterms:publisher>
        <dcterms:rights>CC0 - Creative Commons</dcterms:rights>
        <dcterms:subject>Instruction</dcterms:subject>
        <dcterms:subject>Metadata</dcterms:subject>
        <dcterms:subject>Survey data</dcterms:subject>
        <dcterms:subject>Library Science</dcterms:subject>
        <dcterms:title>Metadata Madness Workshop:</dcterms:title>
        <dcterms:type>dataset</dcterms:type>
      </mets:dcterms>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
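In the record above, several `dcterms:description` values carry a "label: value" prefix (projectName, publicationState, and so on). A harvester could split these back into named fields. The sketch below assumes that convention holds for every description and uses a hypothetical `record` wrapper with the namespace declared, since the slide sample omits namespace declarations.

```python
import xml.etree.ElementTree as ET

DCTERMS_NS = "http://purl.org/dc/terms/"

# Hypothetical, namespace-complete excerpt of the record on the slide.
SAMPLE = """<record xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:description>projectName: Metadata Project</dcterms:description>
  <dcterms:description>publicationVersion: 1</dcterms:description>
  <dcterms:description>notes: The dataset contains
    survey data.</dcterms:description>
  <dcterms:identifier>doi:10.5072/FK250925</dcterms:identifier>
</record>"""


def labelled_descriptions(xml_text):
    """Split 'label: value' descriptions into a dict keyed by label."""
    root = ET.fromstring(xml_text)
    fields = {}
    for node in root.iter(f"{{{DCTERMS_NS}}}description"):
        label, _, value = (node.text or "").partition(":")
        # Collapse line wraps and repeated spaces inside the value.
        fields[label.strip()] = " ".join(value.split())
    return fields


fields = labelled_descriptions(SAMPLE)
```

Splitting on the first colon only keeps values that themselves contain colons intact.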
METADATA GENERATION
METS TECHNICAL SECTION
METADATA GENERATION
METS RIGHTS
METADATA GENERATION
MODS DATASET OWNERSHIP
METADATA GENERATION
PREMIS AGENT
METADATA GENERATION
PREMIS EVENT
METADATA GENERATION
METS FILES AND STRUCTURE MAP
WANT TO LEARN MORE?
PURR CONTACTS
• Visit https://purr.purdue.edu/
• Digital Data Repository Specialist:
Courtney Matthews at matthew6@purdue.edu
SPECIAL THANKS TO:
• Neal Harmeyer, Digital Archivist
• Brandon Beatty, Digital Library Software Developer
• Courtney Matthews, Digital Data Repository Specialist
• Mark Fisher, Digital Library Software Developer
REFERENCES
• Faniel, Ixchel M., Zimmerman, Ann (2011) “Beyond the Data Deluge: A Research Agenda for
Large-Scale Data Sharing and Reuse.” The International Journal of Digital Curation 6(1): 59
• Lee, C., and Tibbo, H. “Digital Curation and Trusted Repositories: Steps toward Success” (2007).
Journal of Digital Information. http://journals.tdl.org/jodi/index.php/jodi/article/view/229/183
• Klimeck, G., McLennan, M., Brophy, S.P., Adams, G.B., & Lundstrom, M.S. (2008). “nanoHUB.org:
Advancing Education and Research in Nanotechnology,” Computing in Science and
Engineering, 10(5): 17, 19, 21
• Witt, M. (2012). “Curation Service Models: Purdue University Research Repository” Libraries and
Staff Presentations. Paper 3. http://docs.lib.purdue.edu/lib_fspress/3
• Witt, M. (2012). Co-designing, Co-developing, and Co-implementing an Institutional Data Repository
Service. Journal of Library Administration, 52(2). DOI:10.1080/01930826.2012.655607