Preserving Digital Records for the Long-Term: Building a Trustworthy Digital Repository at the Archives of Ontario
Association for Manitoba Archives – April 29th, 2011
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
ontario.ca/archives

Agenda
• Archives of Ontario – A Brief Introduction
• Digital Preservation Challenge
• Digital Preservation at the Archives of Ontario
• Trustworthy Digital Repository (TDR)
  – What is it
  – Why do we need it
  – What has been done
  – What is being done
  – What’s next
• TDR & ECM
• Digital Preservation Collaboration

Archives of Ontario: A Brief Introduction
• The Archives was established in 1903.
• Provides leadership to collect, manage and preserve the records of Ontario and to promote and facilitate their use by present and future generations.
• Recently became part of the Information, Privacy and Archives Division of the Corporate Chief Information Office.
• The Archives is made up of three integrated program delivery areas:
  – Collections Development and Management
  – Customer Service and Outreach
  – Recordkeeping Support

Digital Preservation Challenge

The Digital Environment
• Digital records encompass email, audiovisual recordings, textual documents, websites, images, etc.
• Digital records are pervasive in all aspects of our personal and working life.
• The creation of digital information is exploding at an exponential rate.
• There are some similarities but many differences between digital and analog records.

The Digital Environment – Government
• Ontario Public Service (OPS) digital records experience mirrors what is happening in other jurisdictions.
• Currently, 98% of new information created in the OPS is in digital format only.
• The implementation of the Enterprise Content Management (ECM) system will shift government recordkeeping from paper to electronic media across the OPS, with the electronic form of the record, rather than the paper record, considered authoritative.
• The complexity of long-term digital preservation, coupled with the explosive growth of archival digital records in the next few years, presents the Archives with a critical challenge: the volume of potentially archival digital records across the OPS is roughly estimated to reach 100 terabytes by 2013.
• Under the Archives and Recordkeeping Act, 2006, the Archives is mandated to preserve and make available archival electronic records for as long as required.

Long-term Digital Preservation - Volume Impact to the Archives
[Chart: Volume of Potential Archival Electronic Records in OPS, in terabytes (axis 0 to 120), for 2007, 2008, 2011 and 2012]
[Chart: Volume of Electronic Information in OPS, in terabytes (axis 0 to 2500), for 2007, 2008, 2011 and 2012; data labels 600, 780, 1713 and 2226]
• Approximately 85 TB of the electronic information created in the OPS in 2011 is of archival value and will potentially have to be transferred to the AO eventually. (Literature suggests that 3-5% of government paper records are archival; 85 TB is about 5% of the 1,713 TB shown for 2011.)
• The OPS is managing about 1.7 PB of electronic information in 2011. (Source: Managing Information Assets in the OPS: The Future is Now)
• The current total volume of digital records collections in the Archives is 5.5 TB.
• The average annual volume increase rate is approximately 400% (1998-2010).
• With future ECM implementations in the OPS, there will be more rigour in transferring archival electronic records to the Archives.

What is Digital Preservation?
• Digital Preservation is the management of digital information to ensure it is accessible and understandable over time.
OR
• Digital Preservation encompasses a broad range of activities designed to extend the usable life of digital files and protect them from media failure, physical loss, and obsolescence.
• However, it is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality.

Digital Preservation Threats
• File Format and Software Obsolescence
• Hardware and Media Obsolescence
• Physical Threats

Digital Preservation Strategies
Basic
• Bitstream Copying (backups)
• Refreshing
• Durable/Persistent Media (e.g. Gold CDs)
• Analog Backups (e.g. microfilm)
Expensive – Not Feasible
• Technology Preservation (‘computer museum’)
• Digital Archaeology (data recovery)
Preferred Approaches
• Migration (most preferred approach currently)
• Normalization (reliance on standard format – PDF/A)
• Emulation (e.g. Universal Virtual Computer)
• Encapsulation (‘wrapping’)

Digital Preservation Standards - ISO
• ISO 14721:2003 - Open Archival Information System (OAIS) - Reference model
• Metrics for Digital Repository Audit and Certification, RED BOOK, CCSDS, Oct 2009
• ISO/TR 18492:2005 - Long-term preservation of electronic document-based information
• ISO 19005-1:2005 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)
• ISO 15801 - Electronic imaging - Information stored electronically - Recommendations for trustworthiness and reliability

Digital Preservation at the Archives of Ontario

Existing Archival Digital Records Program
• Program has existed since 1997.
• Program is focused on the long-term preservation of archival digital records.
• 2 full-time employees – Senior Coordinators, Archival Electronic Records.
• Created Electronic Records Online section of AO website in 2009.

Existing Archival Digital Repository
• The existing digital repository is on a virtual server maintained by Infrastructure Technology Services (ITS).
• Current digital holdings are about 5.5 TB, consisting of some 1.5 TB of archival born-digital records and 4 TB of digitized images (mostly VS records).
• These digital records are in various formats: MS Office documents, e-mails, HTML, digital audio and video files, databases, digital images, websites, etc.
• The existing repository is not adequate to meet future operational requirements, as it offers little functionality to preserve and secure the digital records properly or to make them accessible online.

Transfer of Digital Records
• The Archives of Ontario currently acquires archival digital records from Ontario public bodies and private donors.
• The Guideline for Transferring Electronic Records to the Archives of Ontario was revised in September 2009.
• The guideline assists with the transfer of archival digital records to the Archives in accordance with an approved records series that has a final disposition of ‘Transfer to Archives’.
• The guideline applies to all Ontario government public bodies that are subject to the requirements of the Archives and Recordkeeping Act, 2006.

Transfer of Digital Records – Cont’d
• Originating public bodies are responsible for ensuring that all digital records in their custody remain readable, accessible, secure and free of viruses, and are able to satisfy legal and evidentiary requirements throughout their lifecycle.
• Digital records are to be transferred in a software-independent format whenever possible, or in a format the Archives finds acceptable.
• In general, the Archives will not acquire specialized software applications and their ongoing licenses.
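
The requirement that transferred records remain secure and able to satisfy evidentiary requirements is often supported in practice by fixity checking. The following is a minimal sketch and is not taken from the Archives’ guideline: it assumes a transfer arrives as a folder of files, builds a SHA-256 manifest before sending, and re-checks it on receipt. The names `build_manifest` and `verify_manifest` and the folder layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(transfer_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum (hypothetical helper)."""
    return {
        path.relative_to(transfer_dir).as_posix(): sha256_of(path)
        for path in sorted(transfer_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(transfer_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = transfer_dir / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(relative_path)
    return problems
```

Under these assumptions, the sending body would run `build_manifest` before transfer and the receiving side would run `verify_manifest` afterwards; an empty list means every file arrived unchanged.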

Transfer of Digital Records – Cont’d
• Transfer Procedures
  – Consult with Archives
  – Identify Records for Transfer
  – Complete a Test Transfer
  – Transfer Official Records and Documentation
  – Confirm Receipt of Records Transfer

Trustworthy Digital Repository (TDR)

Trustworthy Digital Repository (TDR) – What is it?
Definition: ‘a mission to provide reliable, long-term access to managed digital resources to its Designated Community, now and into the future’
Taken from ‘Audit and Certification of Trustworthy Digital Repositories’ - October 2009

TDR - What is it - Cont’d
• A TDR is a long-term solution for the preservation of digital records of archival value.
• It will be driven by the Archives’ business requirements and will be modelled on ISO standards and other best practices.

TDR - What is it - Key Components
The TDR will be modelled on ISO standards – the OAIS Reference Model, and Audit and Certification of Trustworthy Digital Repository. The Archives’ TDR will be certified once an international/national certification process is developed.
[Diagram: key components of the TDR; the only surviving label is ‘Staff’]

TDR – What is it - OAIS Reference Model
[Figure: OAIS Reference Model diagram]

TDR - Why do we need it?
• Ensures the Archives meets its mandated statutory obligations as per the Archives and Recordkeeping Act, 2006.
• Meets the priority for long-term digital preservation as identified in Ontario’s Five Year Corporate I&IT Plan (2008-2013).
• Meets the government’s priority of strengthening front-line service delivery by greatly improving services to the public at the Archives. The TDR will provide ‘anytime, anywhere’ remote 24/7 online access to archival digital records.

TDR - Why do we need it? Cont’d
• To preserve any type of electronic record,
• Created using any type of application,
• On any computing platform,
• Delivered on any digital media,
• From any public body in the Ontario Government and any private donor,
• To provide discovery and delivery to anyone with an interest and legal right of access,
• For present and future generations …
Revised from: http://www.archives.gov/era (U.S. National Archives and Records Administration, Electronic Records Archives)

TDR - What has been done?
• Full Business Case
  – Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution
• Request for Information (RFI) for a trusted digital repository solution
  – Identified 5 vendors with viable long-term digital preservation repository solutions
• High-level Functional Requirement Analysis for the future trustworthy digital repository
  – For main entities and functions of digital repository
• IT Governance Process
  – Gate 0 approval and Gate 1 GGRC endorsement

TDR – What has been done - Full Business Case
• Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution
• Other options which have been analyzed for the development of a TDR are:
  – Utilize an integrated open source software (OSS) solution
  – Acquire a commercial custom system
  – Develop a digital preservation system in-house
  – Rely on OPS public bodies to preserve archival digital records

TDR – What has been done - Request for Information
• The RFI was well received by potential vendors, none of whom had difficulty with the concepts and constructs (such as the OAIS Reference Model and TDR) contained in the RFI document. A wealth of valuable information was received from the 7 respondents.
• All 5 TDR-focused submissions meet or exceed the basic requirements for a TDR as outlined in the RFI and demonstrate the availability of modifiable off-the-shelf (MOTS) products on the digital repository market.
• The estimated cost of purchasing and implementing such a solution (including software, hardware, customization, integration and implementation) varies from $400,000 to $2,000,000.
• The adoption of Open Source Software (OSS) applications seems inevitable. Among the 5 TDR-focused submissions, 3 solutions include OSS components, while the other 2 are made up entirely of OSS applications.
• The OAIS Reference Model and the other TDR-related standards and best practices are widely accepted and followed by the solution providers.
• Using any of the proposed solutions will not, by itself, guarantee the TDR’s compliance with the OAIS Reference Model and Trustworthy Repositories Audit & Certification.

TDR – What has been done - High-level Functional Requirement Analysis
35 Use Cases were developed for the main Entities and Functions of a TDR:
  – Ingest (7)
  – Archival Storage (8)
  – Data Management (4)
  – Access (4)
  – Administration (7)
  – Preservation Planning (5)

Ingest (Entity)
OAIS (Function) | TDR (Function) | Comparison
(none) | Manage Transfer Agreement | Move Transfer Agreement Management from Administration to Ingest
Receive Submission | Receive SIP Submission |
Quality Assurance | Perform SIP Quality Assurance |
Generate AIP | Generate AIP |
Generate Descriptive Information | Extract Descriptive Metadata |
Coordinate Updates | (none) | Delete Coordinate Updates, and incorporate the functionalities into Generate AIP and Extract Descriptive Metadata under Ingest
(none) | Notify Transfer Result | Add Notify Transfer Result

TDR - What has been done - High-level Functional Requirement Analysis cont’d
Use Case Template
[Figure: use case template]

TDR - What has been done - High-level Functional Requirement Analysis cont’d
RADR – Potential Integration with other IT applications
[Diagram, dated Friday, October 15, 2010: RADR and its entities (Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning), exchanging SIPs, AIPs, DIPs, descriptive metadata, queries, search results and reports with the Producer’s EIM system, Digitization, the CTS, the Series Management Database, the ADD, the Federated Search and CSS, and the Consumer, through the numbered interfaces described in the notes]
Notes:
1. TDR (Ingest) interfaces to Producers’ systems, especially their EIM Open Text system, for the transfer of SIPs.
2. TDR (Ingest) interfaces to Digitization projects in coordinating transfers of digitized images.
3. TDR (Ingest) interfaces to the CTS in coordinating transfers of mixed physical/digital records. Functionality might be very limited at the early stage of RADR implementation.
4. TDR (Ingest) interfaces to the Series Management Database to collect records schedule information. Functionality might be very limited at the early stage of RADR implementation.
5. TDR (Ingest) integrates with the ADD to cooperate on metadata capture and the description of digital records.
6. TDR (Data Management) interfaces to the ADD for proper storage and maintenance of metadata, especially duplicate descriptive metadata.
7. TDR (Access) interfaces to the AO Federated Search Engine and Customer Service System (CSS) in assisting users’ searching and ordering activities. The TDR doesn’t interact with users directly; however, it is responsible for preparing query results, reports and DIPs for the Search Engine and/or CSS to deliver.

TDR - What has been done - High-level Functional Requirement Analysis cont’d
Archival process within Ingest Functions
Ingest Functions:
  – Manage Transfer Agreement
  – Receive SIP Submission
  – Perform SIP Quality Assurance
  – Generate AIP
  – Extract Descriptive Metadata
  – Notify Transfer Result
Integrated Archival Process:
  – Scheduling records
  – Developing Transfer Arrangement
  – Receiving, Selection
  – Quality checking, Selection, Accessioning, Culling
  – Arrangement, Description, Metadata Capture, Creation of AIPs
  – Extraction of descriptive metadata
  – Notifying producers about transfer status
Reengineering the digital records management process is one of the biggest challenges we are facing. We mapped the archival process onto OAIS Entities and Functions.

TDR - What has been done - High-level Functional Requirement Analysis cont’d
Structure of Policies and Procedures
The Archives Fundamental Digital Preservation Policies & Procedures
  – Digital Preservation Policy
  – Digital Preservation Strategic Plan
  – Digital Preservation Method
  – Digital Records File Format Guideline
  – Digital Collection Policy
  – Digital Records Selection and Culling Guideline
TDR Overall Policies & Procedures
  – TDR Mission Statement
  – TDR Security Policy
  – Backup and Recovery Policy
  – TDR Naming/Numbering Convention
  – TDR User Access Control
  – TDR Contingency Plan
  – System Configuration Manual
  – …
Recommended TDR Entity-specific Policies & Procedures (for Ingest, Archival Storage, Data Management, Access, Administration and Preservation Planning)
  – Digital Records Transfer Guideline
  – TDR Media Management Guideline
  – TDR AIP Packaging Standard
  – TDR AIP Migration Procedure
  – TDR Database Administration Policy
  – DIP Packaging Standard
  – TDR Import and Export Guideline
  – Technology Monitoring Guideline
  – …

TDR - What is being done - Open Source Software (OSS) Experiments
OSS testing: objectives
• Test functionalities of various products
• Assess the feasibility of utilizing these tools in the interim
• Validate and refine the detailed functional requirements for the TDR
• Inform revisions to the Archives’ existing digital records guidelines and associated policies
• Determine appropriate preservation tools
• Further understand our existing electronic records, identify preservation risks and potential mitigation approaches

TDR - What is being done - Open Source Software (OSS) Experiments – Cont’d
OSS testing: tools to be tested
• Tools which validate file formats and extract technical metadata:
  – DROID (created by The National Archives of UK)
  – JHOVE (created by Harvard University)
  – NLNZ Metadata Extraction Tool (created by the National Library of New Zealand)
• Tools which convert digital objects to open formats:
  – XENA (created by the National Archives of Australia)
• Tools which manage the object assessment and ingest process:
  – Archivematica (created by Artefactual Systems)
• Preservation testbed environment and project management software:
  – Planets Comparator, Planets Testbed, Planets Plato

TDR - What is being done - Open Source Software (OSS) Experiments – Cont’d
Technical Inventory of Digital Records in the AO’s eRepository
• Identify the file formats and the other technical features of digital records in the Archives’ holdings
• Identify records requiring immediate preservation action
• Assess preservation risks of digital records in the Archives’ holdings
• Determine priorities for future preservation operations
• Inform revisions to current procedures

TDR – Next Steps?
• Work will proceed in-house on developing detailed functional requirements for the TDR.
• Explore options for the development of the TDR.
• Creation of long-term digital preservation strategy.
• Creation of long-term digital preservation policy.
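
The technical inventory described earlier starts by identifying file formats across the holdings. As a rough sketch only, the snippet below counts files and bytes per file extension in a directory tree. Extensions are a weak stand-in for the signature-based identification that tools such as DROID perform, and `inventory_by_extension` is a hypothetical helper, not part of any of the tools listed.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root: Path) -> dict[str, tuple[int, int]]:
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    counts = Counter()
    sizes = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Lower-case the suffix so ".PDF" and ".pdf" are grouped together.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in sorted(counts)}
```

A summary like this could help rank formats by volume when setting preservation priorities, but it would not replace format validation with the dedicated tools.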

TDR - Detailed Requirements – Preliminary Plan
• Deliverables
  – Detailed requirement specifications for all 6 Entities (Ingest, Archival Storage, Data Management, Access, Preservation Planning and Administration) of a future TDR, to be developed and validated
  – Detailed workflow for the management of archival digital records, from receiving, selection and accessioning through archival description and storage to search and ordering, to be developed and validated
• Objectives
  – Provide a sound foundation for the future development and implementation of a TDR in the Archives
  – Ensure the future TDR fits well into the overall Archives business environment, meets actual business requirements and works smoothly with the other IT applications already in place
  – Follow related ISO standards and digital preservation/TDR best practices

TDR - Detailed Requirements - Reference Materials

TDR - Detailed Requirements - Reference Materials cont’d

TDR - Detailed Requirements - Methodology

TDR & ECM

Linkages with ECM
• Long-term digital preservation begins at the desktop, with active records.
• Proper recordkeeping during all stages of the IM lifecycle will ensure that records can be properly managed in the TDR.
• A preservation policy is required to mitigate risks to legacy digital records.
• IT and information management areas need to partner to address challenges, incorporating recordkeeping requirements.

Linkages with ECM Cont’d
• Elements of a TDR can be applied to non-archival active/semi-active records that have long-term retention requirements.
• The TDR ensures the sustainability of an Enterprise Content Management (ECM) strategy by providing a trustworthy exporting channel and permanent repository for archival digital records initially managed by the ECM system.

TDR vs. ECM/RDMS
TDR ≠ ECM
• Have different objectives.
• Use different standards.
• Look forward to future developments such as an integrated solution with both records management and long-term digital preservation capabilities.

TDR vs. ECM/RDMS Cont’d
Objectives
  – ECM (RDMS as major component): To regain control over electronic records/information by providing system tools to capture, classify and apply retention schedules and access controls to e-records.
  – Trustworthy Digital Repository: To preserve and provide access to digital records/information, free from dependence on any specific hardware and software, for as long as required.
Functions
  – ECM (RDMS as major component): Capture, File plan, Retention and disposition, Access control, Document management, Workflow, Collaboration
  – Trustworthy Digital Repository: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration
Standards/Best Practices
  – ECM (RDMS as major component): ISO 15489; DOD 5015.2; MoReq; Functional Requirements for ERMS (ICA 2008); etc.
  – Trustworthy Digital Repository: ISO 14721:2003 Open Archival Information System (OAIS) Reference model; ISO 20652:2006 Producer-archive interface - Methodology abstract standard; Trustworthy Repositories Audit & Certification: Criteria and Checklist V1.0; etc.
Suppliers
  – ECM (RDMS as major component): Open Text, EMC2 (Documentum), HP (Trim), IBM (Filenet), etc.
  – Trustworthy Digital Repository: Lockheed Martin, Tessella, Ex Libris, IBM, SUN, HP, Microsoft

TDR vs. ECM/RDMS Cont’d
[Diagram: records lifecycle stages (Active, Semi-active, Inactive). ECM Repositories: ‘Almost all public electronic records’ and ‘Public electronic records with long retention periods’. Arrow: ‘Transfer of archival electronic records into the Archives’ Repository’. Archives’ TDR: ‘All archival electronic records that have fulfilled their retention periods’.]

Digital Preservation Collaboration: Pan-Canadian Efforts & External/Internal Partnerships

Collaboration - Goals
• Like the Archives of Ontario, other archives and many areas of government are facing preservation challenges.
• Promote awareness of long-term digital preservation.
• Bring key stakeholders together.
• Collectively share the knowledge gained from the important work being done in the Archives and across government.

National Digital Preservation Working Group (NDPWG)
• The group was established by the Archives of Ontario in August 2008. 8 meetings have been held to date.
• The mandate of the group is to provide a forum for practitioners in the field of digital preservation to share ideas and expertise, and to discuss best practices and lessons learned.
• The membership includes:
  – Saskatchewan
  – Manitoba
  – Nova Scotia
  – Nunavut
  – Northwest Territories
  – Yukon
  – Alberta
  – Library and Archives Canada
• The Archives of Ontario is the current chair of the NDPWG.

Canadian Preservation Cooperation Strategy
• Library and Archives Canada (LAC) visited the Archives on July 27th, 2010, to discuss a number of digital preservation projects on which they could work collaboratively with the Archives.
• Subsequent to the meeting, the Archives, LAC and the Saskatchewan Archives Board agreed to develop a Canadian Preservation Cooperation Strategy on Digital Preservation that outlines the principles of the group and its proposed projects.
• Meetings have been held to develop work plans and other planning documents.
• The Canadian Preservation Cooperation Strategy was presented at the National, Provincial and Territorial Archivists Conference (NPTAC) on Friday 22 October 2010.
• The first joint project is the Canadian Registry of Digital Storage Media – final draft completed.

Canadian TDR Network
• Initiative started by LAC and the University of Alberta in March 2010.
• Emerged out of the process that built the Canadian Digital Information Strategy.
• The idea is to start with a small group of pioneering institutions that will begin a process of understanding and articulating the issues involved in building a TDR network.
• The short-term goal is to create a coalition from which the group can begin to build its preservation capacity.
• Kick-off meeting held November 26th at LAC.
• Development of a strategy and vision document is underway (by LAC, University of Alberta Library, Archives of Ontario, University of British Columbia Library).

Academic Partnerships - iSchool
• The Archives has partnered with the Faculty of Information (iSchool) at the University of Toronto on a number of digital preservation activities:
  – Attended the Digital Preservation Reading Course led by Dean Seamus Ross from February to April 2010.
  – Hosted a practicum (internship) for iSchool student Suzanne Leblanc from May to August 2010. She completed a survey and report on digital preservation file formats for digital video.
  – Attended the iSchool-hosted Digital Curation Matters conference, June 16-17, 2010.
  – Have explored the possibility of employing PhD students and jointly applying for grant funding for preservation research projects.

International Liaisons
• Have had numerous interactions with international digital preservation jurisdictions.
  – Hosted delegations from international archives including:
    • Hefei City, Anhui Province, China – April 17, 2009
    • National Archives of Japan – March 19, 2010
    • Malaysia National Archives – April 30, 2010
  – Ongoing information sharing with colleagues in the USA, UK, Australia and New Zealand.

Plans for Ontario Government
• Creation of a Digital Preservation Collaboration Committee
• Launching a Digital Preservation OPSpedia (internal social networking) site
• Setting up a digital preservation web presence on the Archives’ internet and intranet sites

Thank You! Questions?

Contact Information
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
Ryan.Carpenter@ontario.ca, 416-327-8174

Lijuan Yu
Senior Coordinator, Archival Electronic Records
Archives of Ontario
Lijuan.Yu@ontario.ca, 416-327-1588