Preserving Electronic Mailing Lists: The H-Net Archive

Lisa M. Schmidt
Electronic Records Archivist, MATRIX
Michigan State University
409 Natural Science Building
East Lansing, MI 48823
001-517-884-2463
lisa.schmidt@matrix.msu.edu

ABSTRACT
This poster illustrates an NHPRC-funded project to evaluate and improve upon preservation practices for the H-Net academic e-mail list archive.

Keywords
Digital curation, digital preservation, electronic record preservation, e-mail list archives, e-mail preservation, trustworthy digital repository.

1. INTRODUCTION
H-Net, an international consortium of scholars and teachers with the oldest collection of born-digital and content-moderated arts, humanities, and social science material on the Internet, is hosted by MATRIX, a digital humanities research center at Michigan State University. MATRIX received a grant from the National Historical Publications and Records Commission (NHPRC) to assess and improve upon the preservation practices for the H-Net e-mail lists. This collection of more than one million e-mail messages on more than 180 public networks and more than 230 private (administrative) lists is considered a valuable scholarly resource.

The Center for Research Libraries (CRL) and Online Computer Library Center (OCLC)’s Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist was the primary tool used in the assessment. As a testbed for use of the TRAC, this H-Net preservation project will interest archivists and other information professionals who manage large collections of electronic records.

2. HOW H-NET WORKS
H-Net runs on LISTSERV software. Users subscribed to a public e-mail list send messages in plain text, with no attachments, to the list editor for approval. Once approved, a message may take from a few seconds to several days to post, at which point it is stored in a file called a “notebook.” Each notebook is a concatenation of the messages posted during a seven-day period. A log browse cache extracts key metadata and creates an MD5 hash for each message, and this metadata is written to a database cache. A web browser interface allows users to retrieve individual messages, each of which the system identifies through the combination of a notebook file name and the message’s MD5 hash.

3. PRESERVATION ASSESSMENT
3.1 Use of the TRAC
The Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist was used to assess the preservation practices for the H-Net e-mail lists. The TRAC comprises three sections: Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, and Security. Each section consists of a number of criteria that require supporting documentation as proof of fulfillment. Institutions wishing to become third-party repositories for other organizations’ digital archives may use the TRAC to establish the requisite credentials. In the case of H-Net, MATRIX used the TRAC for self-assessment to ensure its efficacy as a preservation environment and to highlight areas that required improvement.

3.2 Existing Preservation Practices
3.2.1 OAIS Model
The H-Net preservation system follows the Open Archival Information System (OAIS) model, a reference model for an archive that has accepted the responsibility to preserve information for a designated community.

Figure 1. H-Net mapped to the OAIS model

3.2.2 Backup and Storage
At the time of assessment, the backup and storage system for H-Net and MATRIX as a whole was based on daily incremental and weekly full backups to tapes that were cycled through the system approximately every six weeks and replaced as needed. Monthly full, “permanent” backup tapes were also made and kept in a secured room.

3.2.3 Format
H-Net messages are created and stored in ASCII, UTF-8, and other plain text formats. As plain text is a non-proprietary, archival format, there is no need for a migration plan at this time. However, attachments to messages on the private lists are in proprietary formats that require a migration strategy.

3.2.4 Authenticity
Only informal authenticity measures were in place at the time of assessment. Authors and editors could check messages after posting to ensure their correctness. When attempting to retrieve a message through the browser interface, users would receive notification of a broken URL if the selected message had been altered. Fixity was not established for messages or notebook files.

4. PRESERVATION IMPROVEMENTS
4.1 Backup and Archival Storage
Backup and storage improvements for MATRIX include establishing reciprocal and offsite storage arrangements; creating more than one set of “permanent” backup tapes; putting all “permanent” backup tapes on a retention schedule; and establishing a backup log. Archival copies of the H-Net data will be made annually to tapes that will be refreshed every five years, and “dark” and distributed archival storage options are being explored.

4.2 Authenticity
4.2.1 Fixity
Fixity must be established for individual messages on submission and for notebook files on creation. If calculated at the time of ingest, the MD5 hashes generated for discovery purposes may also serve for fixity checks on the messages. SHA-2 message digests will be calculated for the notebook files. Message hashes will be validated at the time of notebook completion, and notebook hashes will be validated on a weekly basis. Digital signatures will be generated for each list and updated and validated as needed to guard against notebook deletion.

4.2.2 Other Authenticity Measures
Notebook modification rights will be restricted to authorized MATRIX personnel.

4.3 Attachments
Preservation of attachments to messages on the private lists must begin with providing browser access to those lists. For the most common formats, conversion tools will be kept in reserve, or pointers to websites containing such tools will be provided. A technology watch will be established to keep up with the availability and usage of new formats and versions.

4.4 Other Improvements
4.4.1 Technical
Links within messages will be preserved and redirected to archived websites as needed. Shorter persistent URLs for messages will be mapped to the actual URLs, making it easier for researchers to cite H-Net messages.

4.4.2 Administrative
A succession plan is being negotiated with another institution in case MATRIX can no longer host H-Net. Supporting documentation for the criteria laid out by the TRAC checklist is being gathered, and policies are being created, to ensure the soundness of H-Net as a preservation system and trusted digital repository.

5. ACKNOWLEDGMENTS
MATRIX and Michigan State University thank the National Historical Publications and Records Commission (NHPRC) for funding this project.

6. REFERENCES
[1] The Center for Research Libraries (CRL) and Online Computer Library Center Inc. (OCLC). “Trustworthy Repositories Audit & Certification: Criteria and Checklist.” Version 1.0. February 2007. http://www.crl.edu/PDF/trac.pdf.
[2] Consultative Committee for Space Data Systems. “Reference Model for an Open Archival Information System (OAIS).” Blue Book 1, Issue 1, CCSDS Secretariat, January 2002. http://public.ccsds.org/publications/archive/650x0b1.pdf.
[3] H-Net: Humanities and Social Sciences Online. 1995-2008. http://www.h-net.org.
[4] H-Net: Humanities and Social Sciences Online. H-Net: Preserving and Improving Access to Specialized Electronic Mailing List Archives. 2007-2008. http://www.h-net.org/archive.
[5] International Research on Permanent Authentic Records in Electronic Systems (InterPARES). 1999-2008. http://www.interpares.org.
[6] Lee, Bronwyn, Gerard Clifton, and Somaya Langley. “PREMIS Requirement Statement Project Report.” Appendix 2: Recommended list of supported formats.
Australian Partnership for Sustainable Repositories (APSR), National Library of Australia. July 2006. http://www.apsr.edu.au/publications/presta.pdf.
[7] MATRIX: The Center for Humane Arts, Letters, and Social Sciences Online. http://www.matrix.msu.edu.
[8] Research Libraries Group (RLG) and Online Computer Library Center Inc. (OCLC). “Trusted Digital Repositories: Attributes and Responsibilities.” May 2002. http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf.
[9] Schmidt, Lisa M. “Preservation of the H-Net E-Mail Lists: Current Practices.” March 2008. http://www.h-net.org/archive/documentation/HNet%20Current%20Practices%20Post2.pdf.
[10] Schmidt, Lisa M. “Preservation of the H-Net E-Mail Lists: Suggested Improvements.” August 2008. http://www.h-net.org/archive/documentation/hnetpresimprov.pdf.
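
As a rough illustration of the fixity checks proposed in Section 4.2.1, the following Python sketch records an MD5 hash for each message and a SHA-2 digest for the notebook, then re-validates both. It is not the H-Net implementation: the function names, the manifest layout, the modeling of a notebook as newline-joined messages, and the choice of SHA-256 as the SHA-2 variant are all assumptions made for this example.

```python
import hashlib


def message_md5(message):
    """Return the MD5 hex digest of one plain-text message (UTF-8 encoded)."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()


def notebook_sha256(messages):
    """Return a SHA-256 (SHA-2 family) hex digest for a whole notebook.

    Assumption: a notebook is modeled as its messages joined by newlines;
    the actual LISTSERV notebook layout is not described in the text.
    """
    notebook_bytes = "\n".join(messages).encode("utf-8")
    return hashlib.sha256(notebook_bytes).hexdigest()


def make_manifest(messages):
    """Record fixity values when the notebook is completed (hypothetical layout)."""
    return {
        "message_md5": [message_md5(m) for m in messages],
        "notebook_sha256": notebook_sha256(messages),
    }


def verify(messages, manifest):
    """Return a list of detected problems; an empty list means fixity holds."""
    problems = []
    current = [message_md5(m) for m in messages]
    if len(current) != len(manifest["message_md5"]):
        problems.append("message count changed")
    # Compare each stored message hash with a freshly computed one.
    for index, (stored, now) in enumerate(zip(manifest["message_md5"], current)):
        if stored != now:
            problems.append(f"message {index} altered")
    # The notebook-level digest is the check that would run weekly.
    if notebook_sha256(messages) != manifest["notebook_sha256"]:
        problems.append("notebook digest mismatch")
    return problems


if __name__ == "__main__":
    notebook = ["first post", "second post"]
    manifest = make_manifest(notebook)
    print(verify(notebook, manifest))
    print(verify(["first post", "second post (edited)"], manifest))
```

In this sketch the per-message check corresponds to validation at notebook completion and the notebook-level check to the weekly validation; the digital signatures per list mentioned in Section 4.2.1 are not modeled here.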