Preserving Electronic Mailing Lists: The H-Net Archive
Lisa M. Schmidt
Electronic Records Archivist, MATRIX
Michigan State University
409 Natural Science Building
East Lansing, MI 48823
001-517-884-2463
lisa.schmidt@matrix.msu.edu
ABSTRACT
This poster illustrates an NHPRC-funded project to evaluate and
improve upon preservation practices for the H-Net academic
e-mail list archive.
Keywords
Digital curation, digital preservation, electronic record
preservation, e-mail list archives, e-mail preservation, trustworthy
digital repository.
1. INTRODUCTION
H-Net—an international consortium of scholars and teachers with
the oldest collection of born-digital and content-moderated arts,
humanities, and social science material on the Internet—is hosted
by MATRIX, a digital humanities research center at Michigan
State University. MATRIX received a grant from the National
Historical Publications and Records Commission (NHPRC) to
assess and improve upon the preservation practices for the H-Net
e-mail lists. This collection of more than one million e-mail
messages on more than 180 public networks and more than 230
private (administrative) lists is considered a valuable scholarly
resource. The Center for Research Libraries (CRL) and Online
Computer Library Center (OCLC)’s Trustworthy Repositories
Audit & Certification (TRAC): Criteria and Checklist was the
primary tool used in the assessment. As a testbed for use of the
TRAC, this H-Net preservation project will interest archivists and
other information professionals who manage large collections of
electronic records.
2. HOW H-NET WORKS
H-Net runs on LISTSERV software. Users subscribed to a public
e-mail list send messages in plain text, with no attachments, to the
list editor for approval. After approval, messages may take from a
few seconds to several days to post, at which point they are stored
in files called “notebooks.” Each notebook is a concatenation of
the messages posted during a seven-day period. A log browse
cache extracts key metadata and creates an MD5 hash for each
message, and this metadata is written to a database cache. A web
browser interface allows users to retrieve individual messages,
each of which the system identifies by the combination of a
notebook file name and the message’s MD5 hash.
3. PRESERVATION ASSESSMENT
3.1 Use of the TRAC
The Trustworthy Repositories Audit & Certification (TRAC):
Criteria and Checklist was used to assess the preservation
practices for the H-Net e-mail lists. The TRAC includes three
sections: Organizational Infrastructure; Digital Object
Management; and Technologies, Technical Infrastructure, and
Security. Each section consists of a number of criteria that require
supporting documentation as proof of fulfillment. Institutions
wishing to become third-party repositories for other
organizations’ digital archives may use the TRAC to establish the
requisite credentials. In the case of H-Net, MATRIX used the
TRAC as a self-assessment, to evaluate H-Net’s efficacy as a
preservation environment and to identify areas that required
improvement.
3.2 Existing Preservation Practices
3.2.1 OAIS Model
The H-Net preservation system follows the Open Archival
Information System (OAIS) model, a reference model for an
archive that has accepted the responsibility to preserve
information and make it available for a designated community.
Figure 1. H-Net mapped to the OAIS model
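The ingest and retrieval path described in Section 2 can be sketched as a minimal Python model, extended with the fixity checks planned in Section 4.2.1. This is an illustration, not H-Net’s implementation. The following are hypothetical: the notebook naming scheme (one notebook per ISO calendar week), the `Archive` and `Notebook` classes, and the in-memory dictionary standing in for the database cache. The per-list digital signatures are not modeled. The sketch shows why, under the informal measures of Section 3.2.4, an altered message surfaced as a broken URL: its identifier contains an MD5 hash that the stored text no longer matches.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import date


def md5_hex(text: str) -> str:
    """MD5 of a message's text, used here as both a lookup key and a fixity value."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def notebook_name(list_name: str, posted: date) -> str:
    """Hypothetical naming scheme: one notebook per ISO calendar week."""
    year, week, _ = posted.isocalendar()
    return f"{list_name}.{year}-W{week:02d}"


@dataclass
class Notebook:
    name: str
    messages: list[str] = field(default_factory=list)

    def text(self) -> str:
        # A notebook is the concatenation of the messages posted that week.
        return "\n".join(self.messages)


class Archive:
    def __init__(self) -> None:
        self.notebooks: dict[str, Notebook] = {}
        # Stand-in for the database cache: (notebook name, MD5) -> message index.
        self.cache: dict[tuple[str, str], int] = {}
        # SHA-256 (one SHA-2 variant) recorded when a notebook is completed.
        self.notebook_digests: dict[str, str] = {}

    def post(self, list_name: str, posted: date, message: str) -> tuple[str, str]:
        """Append an approved message to its weekly notebook; return its identifier."""
        name = notebook_name(list_name, posted)
        notebook = self.notebooks.setdefault(name, Notebook(name))
        notebook.messages.append(message)
        key = (name, md5_hex(message))
        self.cache[key] = len(notebook.messages) - 1
        return key

    def retrieve(self, name: str, digest: str) -> str | None:
        """Return the message, or None (a 'broken URL') if it is missing or altered."""
        index = self.cache.get((name, digest))
        if index is None:
            return None
        message = self.notebooks[name].messages[index]
        return message if md5_hex(message) == digest else None

    def complete(self, name: str) -> str:
        """Close a notebook: validate its message hashes, then record a SHA-256 digest."""
        for (nb, digest), index in self.cache.items():
            if nb == name and md5_hex(self.notebooks[name].messages[index]) != digest:
                raise ValueError(f"message {digest} in {name} failed its fixity check")
        sha = hashlib.sha256(self.notebooks[name].text().encode("utf-8")).hexdigest()
        self.notebook_digests[name] = sha
        return sha

    def weekly_check(self) -> list[str]:
        """Names of completed notebooks that are missing or whose digest has changed."""
        failures = []
        for name, expected in self.notebook_digests.items():
            notebook = self.notebooks.get(name)
            if notebook is None:
                failures.append(name)
                continue
            actual = hashlib.sha256(notebook.text().encode("utf-8")).hexdigest()
            if actual != expected:
                failures.append(name)
        return failures
```

Under these assumptions, two messages dated 5 and 7 March 2008 on a list named `h-test` both land in notebook `h-test.2008-W10`. After `complete()` records that notebook’s digest, overwriting either message makes `retrieve()` return `None` and makes `weekly_check()` report the notebook. SHA-256 is used only as a representative SHA-2 digest; the text does not specify a digest length.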
3.2.2 Backup and Storage
At the time of assessment, the backup and storage system for
H-Net and MATRIX as a whole was based on daily incremental
and weekly full backups to tapes that were cycled through the
system approximately every six weeks and replaced as needed.
Monthly full, “permanent” backup tapes were also made and kept
in a secured room.
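The rotation described above can be expressed as a small scheduling function. This is a sketch under stated assumptions. The text gives the cadence (daily incremental, weekly full, monthly “permanent,” reuse after roughly six weeks) but not the days. The Sunday full backup, the first-Sunday monthly tape, and the exact 42-day reuse interval are therefore illustrative choices, not MATRIX’s documented settings.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumptions for illustration only: the source gives the cadence but not the days.
FULL_BACKUP_WEEKDAY = 6         # weekly full backup on Sundays (Monday == 0)
ROTATION = timedelta(weeks=6)   # tapes re-enter the cycle after about six weeks


def jobs_for(day: date) -> list[str]:
    """Backup jobs run on a given day under the assumed schedule."""
    if day.weekday() != FULL_BACKUP_WEEKDAY:
        return ["daily-incremental"]
    jobs = ["weekly-full"]
    if day.day <= 7:
        # First Sunday of the month: also write a monthly "permanent" tape.
        jobs.append("monthly-permanent")
    return jobs


def reusable_on(written: date, job: str) -> date | None:
    """Date a tape may be overwritten; permanent tapes are never recycled."""
    if job == "monthly-permanent":
        return None
    return written + ROTATION
```

Under these assumptions, Sunday 2 March 2008 would produce both a weekly full tape and a monthly permanent tape. A weekly tape written on 9 March 2008 would return to the pool on 20 April 2008.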
3.2.3 Format
H-Net messages are created and stored in ASCII, UTF-8, and
other plain text formats. Because plain text is a non-proprietary,
archival format, no migration plan is needed at this time.
Attachments to messages on the private lists, however, are in
proprietary formats that require a migration strategy.
3.2.4 Authenticity
Informal authenticity measures were in place at the time of
assessment. Authors and editors could check messages after
posting to ensure their correctness. When users attempted to
retrieve a message through the browser interface, they would
receive notification of a broken URL if the selected message had
been altered, since the altered text no longer matched the MD5
hash in its identifier. Fixity was not established for messages or
notebook files.
4. PRESERVATION IMPROVEMENTS
4.1 Backup and Archival Storage
Backup and storage improvements for MATRIX include
establishing reciprocal and offsite storage arrangements; creating
more than one set of “permanent” backup tapes; putting all
“permanent” backup tapes on a retention schedule; and
establishing a backup log. Archival copies of the H-Net data will
be made annually to tapes that will be refreshed every five years,
and “dark” and distributed archival storage options are being
explored.
4.2 Authenticity
4.2.1 Fixity
Fixity must be established for individual messages on submission
and for notebook files on creation. If they are calculated at the
time of ingest, the MD5 hashes already generated for discovery
purposes can also serve as fixity values for individual messages.
SHA-2 message digests will be calculated for the notebook files.
Message hashes will be validated when a notebook is completed,
and notebook hashes will be validated weekly. Digital signatures
will be generated for each list, then updated and validated as
needed to guard against notebook deletion.
4.2.2 Other Authenticity Measures
Notebook modification rights will be restricted to authorized
MATRIX personnel.
4.3 Attachments
Preservation of attachments to messages on the private lists must
begin with providing browser access to those lists. For the most
common formats, conversion tools will be kept in reserve, or
pointers will be provided to websites that offer conversion tools. A
technology watch will be established to track the availability and
use of new formats and versions.
4.4 Other Improvements
4.4.1 Technical
Links within messages will be preserved and redirected to
archived websites as needed. Shorter persistent URLs for
messages will be mapped to the actual URLs, making it easier for
researchers to cite H-Net messages.
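A persistent URL service of the kind described above can be as simple as a two-way table from short codes to current locations. The sketch below is hypothetical: the resolver prefix, the sequential base-36 codes, and the `PersistentUrls` class are illustrative assumptions, not H-Net’s design. Its `retarget` method shows the point of the indirection. If a message or a linked website moves, for instance to an archived copy, only the table changes, and published citations keep working.

```python
from __future__ import annotations

import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols for short codes


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a compact base-36 string."""
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or "0"


class PersistentUrls:
    """Maps short, stable URLs to the current location of each message."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix                # hypothetical resolver address
        self.targets: dict[str, str] = {}   # short code -> actual URL
        self.codes: dict[str, str] = {}     # actual URL -> short code

    def mint(self, target: str) -> str:
        """Return the short URL for a target, creating one if needed."""
        code = self.codes.get(target)
        if code is None:
            code = to_base36(len(self.targets) + 1)
            self.targets[code] = target
            self.codes[target] = code
        return self.prefix + code

    def resolve(self, short_url: str) -> str | None:
        """Actual URL a citation should redirect to, or None if unknown."""
        if not short_url.startswith(self.prefix):
            return None
        return self.targets.get(short_url[len(self.prefix):])

    def retarget(self, short_url: str, new_target: str) -> None:
        """Point an existing short URL at a new location; citations stay valid."""
        code = short_url[len(self.prefix):]
        del self.codes[self.targets[code]]
        self.targets[code] = new_target
        self.codes[new_target] = code
```

Sequential codes are short and collision-free but require a single authority to issue them. A hash-derived code would avoid the central counter at the cost of handling collisions.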
4.4.2 Administrative
A succession plan is being negotiated with another institution in
case MATRIX can no longer host H-Net. Supporting documentation
for the criteria laid out in the TRAC checklist is being gathered,
and policies are being created, to ensure the soundness of H-Net
as a preservation system and trusted digital repository.
5. ACKNOWLEDGMENTS
MATRIX and Michigan State University thank the National
Historical Publications and Records Commission (NHPRC) for
funding this project.
6. REFERENCES
[1] The Center for Research Libraries (CRL) and Online
Computer Library Center Inc. (OCLC). “Trustworthy
Repositories Audit & Certification: Criteria and Checklist.”
Version 1.0. February 2007.
http://www.crl.edu/PDF/trac.pdf.
[2] Consultative Committee for Space Data Systems. “Reference
Model for an Open Archival Information System (OAIS).”
Blue Book 1, Issue 1, CCSDS Secretariat, January 2002.
http://public.ccsds.org/publications/archive/650x0b1.pdf.
[3] H-Net: Humanities and Social Sciences Online. 1995-2008.
http://www.h-net.org.
[4] H-Net: Humanities and Social Sciences Online. H-Net:
Preserving and Improving Access to Specialized Electronic
Mailing List Archives. 2007-2008. http://www.h-net.org/archive.
[5] International Research on Permanent Authentic Records in
Electronic Systems (InterPARES). 1999-2008.
http://www.interpares.org.
[6] Lee, Bronwyn, Gerard Clifton, and Somaya Langley.
“PREMIS Requirement Statement Project Report.”
Appendix 2: Recommended list of supported formats.
Australian Partnership for Sustainable Repositories (APSR),
National Library of Australia. July 2006.
http://www.apsr.edu.au/publications/presta.pdf.
[7] MATRIX: The Center for Humane Arts, Letters, and Social
Sciences Online. http://www.matrix.msu.edu.
[8] Research Libraries Group (RLG) and Online Computer
Library Center Inc. (OCLC). “Trusted Digital Repositories:
Attributes and Responsibilities.” May 2002.
http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf.
[9] Schmidt, Lisa M. “Preservation of the H-Net E-Mail Lists:
Current Practices.” March 2008.
http://www.h-net.org/archive/documentation/HNet%20Current%20Practices%20Post2.pdf.
[10] Schmidt, Lisa M. “Preservation of the H-Net E-Mail Lists:
Suggested Improvements.” August 2008.
http://www.h-net.org/archive/documentation/hnetpresimprov.pdf.