2011 Digital Preservation/Media At Risk slides

Preserving Digital Records for the Long-Term:
Building a Trustworthy Digital Repository at the
Archives of Ontario
Association for Manitoba Archives – April 29th, 2011
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
ontario.ca/archives
Agenda
• Archives of Ontario – A Brief Introduction
• Digital Preservation Challenge
• Digital Preservation at the Archives of Ontario
• Trustworthy Digital Repository (TDR)
– What is it
– Why do we need it
– What has been done
– What is being done
– What’s next
• TDR & ECM
• Digital Preservation Collaboration
Archives of Ontario: A Brief Introduction
• The Archives was established in 1903
• Provides leadership to collect, manage and preserve the
records of Ontario and to promote and facilitate their use by
present and future generations
• Recently became part of Information, Privacy and Archives
Division of Corporate Chief Information Office.
• Archives is made up of three integrated program delivery
areas:
– Collections Development and Management
– Customer Service and Outreach
– Recordkeeping Support
Digital Preservation Challenge
The Digital Environment
• Digital records encompass email,
audiovisual recordings, textual documents,
websites, images, etc.
• Digital records are pervasive in all aspects
of our personal and working life.
• The creation of digital information is
exploding at an exponential rate.
• Some similarities but many differences
between digital and analog records.
The Digital Environment – Government
• Ontario Public Service (OPS) digital records experience mirrors what is
happening in other jurisdictions.
• Currently, 98% of new information created in the OPS is in digital format only.
• The implementation of the Enterprise Content Management (ECM) system will
shift government recordkeeping from paper to electronic media across the OPS;
the electronic form of the record, rather than the paper record, will be
considered authoritative.
• The complexity of long-term digital preservation, coupled with the explosive
growth of archival digital records over the next few years, presents the
Archives with a critical challenge: the volume of potentially archival digital
records across the OPS is roughly estimated to reach 100 terabytes by 2013.
• Under the Archives and Recordkeeping Act, 2006, the Archives is mandated to
preserve and make available archival electronic records for as long as required.
Long-term Digital Preservation - Volume Impact to
the Archives
[Chart: Volume of Potential Archival Electronic Records in OPS, in terabytes
(axis 0 to 120), for 2007, 2008, 2011 and 2012]
[Chart: Volume of Electronic Information in OPS, in terabytes (axis 0 to 2500),
for 2007, 2008, 2011 and 2012; data labels 600, 780, 1713 and 2226]
• The OPS is managing about 1.7 PB of electronic information in 2011.
(Source: Managing Information Assets in the OPS: The Future is Now)
• Approximately 85 TB of electronic information created in OPS in 2011 is of
archival value and will potentially have to be transferred to the AO eventually.
(Literature suggests that 3-5% of government paper records are archival.)
• The current total volume of digital records collections in the Archives is 5.5 TB.
• The average annual volume increase rate is approximately 400% (1998-2010)
• With future ECM implementations in the OPS, there will be more rigour in
transferring archival electronic records to the Archives.
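The 85 TB estimate on this slide is consistent with applying the upper end of the 3-5% archival share to the 1713 TB data label on the OPS volume chart. A minimal sketch of that arithmetic follows. Applying a percentage reported for paper records to electronic information is an assumption, and the function name is illustrative only.

```python
def archival_volume_tb(total_tb: float, archival_share: float) -> float:
    """Return the portion of total_tb expected to be of archival value."""
    return total_tb * archival_share


OPS_VOLUME_2011_TB = 1713  # data label from the OPS volume chart (about 1.7 PB)

# Lower and upper bounds from the 3-5% range quoted from the literature.
low = archival_volume_tb(OPS_VOLUME_2011_TB, 0.03)
high = archival_volume_tb(OPS_VOLUME_2011_TB, 0.05)

print(f"Estimated archival volume for 2011: {low:.0f} to {high:.0f} TB")
```

The upper bound works out to about 85.7 TB, which matches the slide's figure of approximately 85 TB.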
What is Digital Preservation?
• Digital Preservation is the management of digital
information to ensure it is accessible and understandable
over time.
OR
• Digital Preservation encompasses a broad range of
activities designed to extend the usable life of digital files
and protect them from media failure, physical loss, and
obsolescence.
• However, it is one thing to preserve a bitstream, but
quite another to preserve the content, form, style,
appearance, and functionality.
Digital Preservation Threats
• File Format and Software
Obsolescence
• Hardware and Media Obsolescence
• Physical Threats
Digital Preservation Strategies
Basic
• Bitstream Copying (backups)
• Refreshing
• Durable/Persistent Media (e.g. Gold CDs)
• Analog Backups (e.g. microfilm)
Expensive – Not Feasible
• Technology Preservation (‘computer museum’)
• Digital Archaeology (data recovery)
Preferred Approaches
• Migration (most preferred approach currently)
• Normalization (reliance on standard format – PDF/A)
• Emulation (e.g. Universal Virtual Computer)
• Encapsulation (‘wrapping’)
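As an illustration of the most basic strategy listed, bitstream copying, the sketch below copies a file and confirms that the copy is bit-for-bit identical by comparing SHA-256 checksums. The checksum step is an assumption added for illustration (the slide does not prescribe any verification method), and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large records need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_bitstream(source: Path, target: Path) -> str:
    """Copy source to target and verify that the two bitstreams match."""
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {target}")
    return after
```

A copy verified this way protects only the bitstream. As noted earlier in the deck, preserving content, form, and functionality needs the other strategies, such as migration or normalization.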
Digital Preservation Standards - ISO
• ISO 14721:2003 - Open Archival Information System (OAIS) - Reference model
• Metrics for Digital Repository Audit and Certification, RED BOOK, CCSDS,
Oct 2009
• ISO/TR 18492:2005 - Long-term preservation of electronic document-based
information
• ISO 19005-1:2005 - Document management - Electronic document file format for
long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)
• ISO 15801 - Electronic imaging - Information stored electronically -
Recommendations for trustworthiness and reliability
Digital Preservation at the
Archives of Ontario
Existing Archival Digital Records
Program
• Program has existed since 1997.
• Program is focused on the long-term
preservation of archival digital records.
• 2 full-time employees – Senior Coordinators,
Archival Electronic Records.
• Created Electronic Records Online section of
AO website in 2009.
Existing Archival Digital Repository
• Existing digital repository is on a virtual server maintained by
Infrastructure Technology Services (ITS).
• Current digital holdings are about 5.5 TB, consisting of some
1.5 TB of archival born-digital records and 4 TB of digitized
images (mostly VS records).
• These digital records are in various formats: MS Office
documents, e-mails, HTML, digital audio and video files,
databases, digital images, and websites etc.
• Existing repository is not adequate to meet future operational
requirements as it offers little functionality to preserve and
secure the digital records properly or make them accessible
online.
Transfer of Digital Records
• The Archives of Ontario currently acquires archival
digital records from Ontario public bodies and private
donors.
• The Guideline for Transferring Electronic Records to the
Archives of Ontario was revised in September 2009.
• The guideline assists with the transfer of archival digital records
to the Archives in accordance with an approved records series
that has a final disposition of ‘Transfer to Archives’.
• This guideline applies to all Ontario government public
bodies that are subject to the requirements of the
Archives and Recordkeeping Act, 2006.
Transfer of Digital Records – Cont’d
• Originating public bodies are responsible for
ensuring that all digital records in their custody
remain readable, accessible, secure, free of
viruses, and are able to satisfy legal and
evidentiary requirements throughout their lifecycle.
• Digital records are to be transferred in a software
independent format whenever possible, or in a
format the Archives finds acceptable.
• In general, the Archives will not acquire
specialized software applications and their ongoing
licenses.
Transfer of Digital Records – Cont’d
• Transfer Procedures
– Consult with Archives
– Identify Records for Transfer
– Complete a Test Transfer
– Transfer Official Records and
Documentation
– Confirm Receipt of Records Transfer
Trustworthy Digital Repository
(TDR)
Trustworthy Digital Repository (TDR) – What
is it?
Definition:
‘a mission to provide reliable, long-term access to
managed digital resources to its Designated
Community, now and into the future’
Taken from ‘Audit and Certification of Trustworthy Digital Repositories’ - October
2009
TDR - What is it - Cont’d
• A TDR is a long-term solution for the
preservation of digital records of archival
value.
• It will be driven by the Archives’ business
requirements and will be modelled on ISO
standards and other best practices as
well.
TDR - What is it - Key Components
• TDR will be modelled on ISO standards – OAIS Reference Model, and Audit and
Certification of Trustworthy Digital Repository.
• The Archives’ TDR will be certified once an international/national
certification process is developed.
[Diagram of key components; the only surviving label is ‘Staff’]
TDR – What is it - OAIS Reference Model
TDR - Why do we need it?
• Ensures the Archives meets its mandated statutory
obligations as per the Archives and Recordkeeping Act,
2006.
• Meets the priority for long-term digital preservation as
identified in Ontario’s Five Year Corporate I&IT Plan
(2008-2013).
• Meets the government’s priority of strengthening front-line
service delivery by greatly improving services to the public
at the Archives. TDR will provide ‘anytime, anywhere’
remote 24/7 online access to archival digital records.
TDR - Why do we need it? Cont’d
• To preserve any type of electronic record,
• Created using any type of application,
• On any computing platform,
• Delivered on any digital media,
• From any public body in the Ontario Government and any private donor,
• To provide discovery and delivery to anyone with an interest and legal right
of access,
• For present and future generations …
Revised from: http://www.archives.gov/era (U.S. National Archives and Records
Administration, Electronic Records Archives)
TDR - What has been done?
• Full Business Case
– Main recommendation: Acquire a Modifiable Off-the-Shelf
(MOTS) solution or a Commercial Off-the-Shelf (COTS) solution
• Request for Information (RFI) for a trusted digital repository solution
– Identified 5 vendors with viable long-term digital preservation
repository solutions
• High-level Functional Requirement Analysis for the future
trustworthy digital repository
– For main entities and functions of digital repository
• IT Governance Process
– Gate 0 approval and Gate 1 GGRC endorsement
TDR – What has been done - Full Business
Case
• Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a
Commercial Off-the-Shelf (COTS) solution
• Other options which have been analyzed for the development of a TDR are:
– Utilize an integrated open source software (OSS) solution
– Acquire a commercial custom system
– Develop a digital preservation system in-house
– Rely on OPS public bodies to preserve archival digital records
TDR – What has been done - Request for
Information
• The RFI has been well received by potential vendors, with none finding
difficulty with the concepts and constructs (such as the OAIS Reference Model
and TDR) contained in the RFI document. A wealth of valuable information was
received from the 7 respondents.
• All 5 TDR-focused submissions meet or exceed the basic requirements for a TDR
as outlined in the RFI and demonstrate the availability of modifiable
off-the-shelf (MOTS) products on the digital repository market.
• The estimated cost of purchasing and implementing such a solution (including
software, hardware, customization, integration, and implementation, etc.)
varies from $400,000 to $2,000,000.
• The adoption of Open Source Software (OSS) applications seems inevitable.
Among the 5 TDR-focused submissions, 3 solutions comprise OSS components, while
the 2 other solutions are completely made up of OSS applications.
• The OAIS Reference Model and the other TDR-related standards and best
practices are widely accepted and followed by the solution providers.
• The use of any proposed solution will not, on its own, guarantee the TDR’s
compliance with the OAIS Reference Model and Trustworthy Repositories Audit &
Certification.
TDR – What has been done - High-level
Functional Requirement Analysis
35 Use Cases were developed for main Entities and Functions of a TDR:
– Ingest (7)
– Archival Storage (8)
– Data Management (4)
– Access (4)
– Administration (7)
– Preservation Planning (5)
Ingest (Entity)
OAIS (Function) | TDR (Function) | Comparison
(none) | Manage Transfer Agreement | Move Transfer Agreement Management from Administration to Ingest
Receive Submission | Receive SIP Submission |
Quality Assurance | Perform SIP Quality Assurance |
Generate AIP | Generate AIP |
Generate Descriptive Information | Extract Descriptive Metadata |
Coordinate Updates | (none) | Delete Coordinate Updates, and incorporate the functionalities into Generate AIP and Extract Descriptive Metadata under Ingest
(none) | Notify Transfer Result | Add Notify Transfer Result
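The Ingest comparison above can also be written down as a small data structure, which makes the additions and removals relative to OAIS easy to list. This is only a sketch: the row pairing follows one reading of the slide's table, and the class and variable names are hypothetical.

```python
from typing import NamedTuple, Optional


class FunctionMapping(NamedTuple):
    oais: Optional[str]  # function in the OAIS Ingest entity, if any
    tdr: Optional[str]   # function in the TDR Ingest entity, if any


# Row pairing as read from the slide's comparison table (an interpretation).
INGEST_MAPPING = [
    FunctionMapping(None, "Manage Transfer Agreement"),
    FunctionMapping("Receive Submission", "Receive SIP Submission"),
    FunctionMapping("Quality Assurance", "Perform SIP Quality Assurance"),
    FunctionMapping("Generate AIP", "Generate AIP"),
    FunctionMapping("Generate Descriptive Information", "Extract Descriptive Metadata"),
    FunctionMapping("Coordinate Updates", None),
    FunctionMapping(None, "Notify Transfer Result"),
]

# Functions the TDR adds to Ingest, and OAIS functions it drops from Ingest.
added = [m.tdr for m in INGEST_MAPPING if m.oais is None]
dropped = [m.oais for m in INGEST_MAPPING if m.tdr is None]

print("Added to Ingest:", added)
print("Removed from Ingest:", dropped)
```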
TDR - What has been done - High-level
Functional Requirement Analysis cont’d
Use Case Template
TDR - What has been done - High-level Functional
Requirement Analysis cont’d
RADR – Potential Integration with other IT applications
Friday, October 15, 2010
[Diagram: RADR and its potential interfaces with other IT applications.
External systems shown: Producer (EIM System), Digitization, CTS, the Series
Management Database, ADD, Federated Search + CSS, and Consumer. OAIS entities
shown: Ingest, Data Management, Access, Archival Storage, Administration, and
Preservation Planning. Flows shown: SIP, AIP, DIP, descriptive metadata,
metadata maintenance, query result/report, and search results. The numbered
interfaces (1 to 7) are explained in the notes.]
Notes:
1. TDR (Ingest) interfaces to Producers’ systems, especially their EIM Open Text System, for the transfer of SIPs.
2. TDR (Ingest) interfaces to Digitization projects in coordinating transfers of digitized images.
3. TDR (Ingest) interfaces to the CTS in coordinating transfers of mixed physical/digital records. Functionality might be very limited at the early stage of RADR implementation.
4. TDR (Ingest) interfaces to the Series Management Database to collect records schedule information. Functionality might be very limited at the early stage of RADR implementation.
5. TDR (Ingest) integrates with the ADD to cooperate on metadata capture and describing digital records.
6. TDR (Data Management) interfaces to ADD in proper storage and maintenance of metadata, especially duplicate descriptive metadata.
7. TDR (Access) interfaces to the AO Federated Search Engine and Customer Service System (CSS) in assisting users’ searching and ordering activities. TDR doesn’t interact with users directly; however, TDR is responsible for preparing query results, reports and DIPs for the Search Engine and/or CSS to deliver.
TDR - What has been done - High-level
Functional Requirement Analysis cont’d
Archival process within Ingest
[Diagram: two parallel rows, in the order shown on the slide]
Functions:
– Manage Transfer Agreement
– Receive SIP Submission
– Perform SIP Quality Assurance
– Generate AIP
– Extract Descriptive Metadata
– Notify Transfer Result
Integrated Archival Process:
– Scheduling records
– Developing Transfer Arrangement
– Receiving, Selection
– Quality checking, Selection, Accessioning, Culling
– Arrangement, Description, Metadata Capture, Creation of AIPs
– Extraction of descriptive metadata
– Notifying producers about transfer status
Reengineering of digital records management process is one of the biggest challenges we are
facing. We mapped the archival process into OAIS Entities and Functions.
TDR - What has been done - High-level Functional
Requirement Analysis cont’d
Recommended Structure of Policies and Procedures
TDR Entity-specific Policies & Procedures (Ingest, Archival Storage, Data
Management, Access, Administration, Preservation Planning)
– Digital Records Transfer Guideline
– TDR Media Management Guideline
– TDR AIP Packaging Standard
– TDR AIP Migration Procedure
– TDR Database Administration Policy
– DIP Packaging Standard
– TDR Import and Export Guideline
– Technology Monitoring Guideline
– ……
TDR Overall Policies & Procedures
– TDR Mission Statement
– TDR Security Policy
– Backup and Recovery Policy
– TDR Naming/Numbering Convention
– TDR User Access Control
– TDR Contingency Plan
– ……
– System Configuration Manual
– Digital Collection Policy
– Digital Records Selection and Culling Guideline
The Archives Fundamental Digital Preservation Policies & Procedures
– Digital Preservation Policy
– Digital Preservation Strategic Plan
– Digital Preservation Method
– Digital Records File Format Guideline
TDR- What is being done - Open Source
Software (OSS) Experiments
OSS testing: objectives
• Test functionalities of various products
• Assess the feasibility of utilizing these tools for interim
• Validate and refine the detailed functional requirements
for the TDR
• Inform revisions to the Archives’ existing digital records
guidelines and associated policies
• Determine appropriate preservation tools
• Further understand our existing electronic records,
identify preservation risks, and potential mitigation
approaches
TDR- What is being done - Open Source
Software (OSS) Experiments – Cont’d
OSS testing: tools to be tested
• Tools which validate file formats and extract technical metadata:
– DROID (created by The National Archives of UK)
– JHOVE (created by Harvard University)
– NLNZ (created by the National Library of New Zealand)
• Tools which convert digital objects to open formats:
– XENA (created by the National Archives of Australia)
• Tools which manage the object assessment and ingest process:
– Archivematica (created by Artefactual Systems)
• Preservation testbed environment and project management
software:
– Planets Comparator, Planets Testbed, Planets Plato
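To give a sense of what format-identification tools do, the sketch below is a toy illustration of signature-based identification, the general idea of matching a file's leading bytes against known patterns. It does not reproduce DROID, JHOVE, or the NLNZ tool; the small signature table and the function name are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"PK\x03\x04": "ZIP container (also used by OOXML and ODF)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy MS Office)",
}


def identify(path: Path) -> Optional[str]:
    """Return a format label if the file starts with a known signature."""
    longest = max(len(signature) for signature in SIGNATURES)
    with path.open("rb") as handle:
        header = handle.read(longest)
    for signature, label in SIGNATURES.items():
        if header.startswith(signature):
            return label
    return None  # unknown format: a candidate for manual review
```

Production tools go much further, for example by validating internal structure and extracting technical metadata, which is why the experiments compare several of them.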
TDR- What is being done - Open Source
Software (OSS) Experiments – Cont’d
Technical Inventory of Digital Records in the AO’s eRepository
• Identify the file formats and the other technical
features of digital records in the Archives
holdings
• Identify records requiring immediate
preservation action
• Assess preservation risks of digital records in
the Archives’ holdings
• Determine priorities for future preservation
operations
• Inform revisions to current procedures
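A first pass at such an inventory could be as simple as counting files by extension and flagging formats on a watch list, as sketched below. The watch list is hypothetical (a real one would come from the Archives' own risk assessment), the function names are illustrative, and extensions are a weaker signal than the format-validation tools listed in the experiments.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list, for illustration only.
AT_RISK_EXTENSIONS = {".wpd", ".wk1", ".dbf"}


def inventory(root: Path) -> Counter:
    """Count files under root by lower-cased extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def needs_attention(counts: Counter) -> dict:
    """Return the part of the inventory that is on the watch list."""
    return {ext: n for ext, n in counts.items() if ext in AT_RISK_EXTENSIONS}
```

Calling `inventory()` on the repository root and passing the result to `needs_attention()` yields a short list that could feed the priority-setting step.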
TDR – Next Steps?
• Work will proceed in-house on developing
detailed functional requirements for the
TDR.
• Explore options for the development of the
TDR.
• Creation of long-term digital preservation
strategy.
• Creation of long-term digital preservation
policy.
TDR - Detailed Requirements – Preliminary
Plan
• Deliverables
– Detailed requirement specifications for all 6 Entities (Ingest, Archival
Storage, Data Management, Access, Preservation Planning and
Administration) of a future TDR to be developed and validated
– Detailed workflow for the management of archival digital records,
starting from receiving, selection, accessioning, through archival
description, storage to search and ordering etc. to be developed and
validated
• Objectives
– Provide a sound foundation for the future development and
implementation of a TDR in the Archives;
– Ensure the future TDR can fit well into the overall Archives business
environment, meet actual business requirements, work smoothly with
the other IT applications already in place, and
– Follow related ISO standards and digital preservation/TDR best
practices.
TDR - Detailed Requirements - Reference
Materials
TDR - Detailed Requirements - Reference
Materials cont’d
TDR - Detailed Requirements - Methodology
TDR & ECM
Linkages with ECM
• Long-term digital preservation begins at the desktop, with active records.
• Proper recordkeeping during all stages of IM lifecycle will
ensure that records can be properly managed in TDR.
• Preservation policy required to mitigate risks to legacy
digital records.
• IT and information management areas need to partner to
address challenges, incorporating recordkeeping
requirements.
Linkages with ECM Cont’d
• Elements of a TDR can be applied to non-archival active/semi-active
records that have long-term retention requirements.
• TDR ensures the sustainability of an Enterprise
Content Management (ECM) strategy by
providing a trustworthy exporting channel and
permanent repository for archival digital records
initially managed by ECM system.
TDR vs. ECM/RDMS
TDR ≠ ECM
• Have different objectives.
• Use different standards.
• Look forward to future developments such as an
integrated solution with both records
management and long-term digital preservation
capabilities.
TDR vs. ECM/RDMS Cont’d
ECM (RDMS as major component) compared with Trustworthy Digital Repository (TDR)
Objectives
– ECM: To regain control over electronic records/information by providing
system tools to capture, classify and apply retention schedules and access
controls to e-records.
– TDR: To preserve and provide access to digital records/information, free
from dependence on any specific hardware and software, for as long as required.
Functions
– ECM: Capture, File plan, Retention and disposition, Access control, Document
management, Workflow, Collaboration
– TDR: Ingest, Archival Storage, Data Management, Preservation Planning,
Access, and Administration
Standards/Best Practices
– ECM: ISO 15489; DOD 5015.2; MoReq; Functional Requirements for ERMS (ICA
2008); etc.
– TDR: ISO 14721:2003: Open Archival Information System (OAIS) Reference
model; ISO 20652:2006 Producer-archive interface - Methodology abstract
standard; Trustworthy Repositories Audit & Certification: Criteria and
Checklist V1.0; etc.
Suppliers
– ECM: Open Text, EMC2 (Documentum), HP (Trim), IBM (Filenet), etc.
– TDR: Lockheed Martin, Tessella, Ex Libris, IBM, SUN, HP, Microsoft
TDR vs. ECM/RDMS Cont’d
[Diagram: records lifecycle stages Active, Semi-active, and Inactive. Labels
shown: ‘ECM Repositories’, with ‘Almost all public electronic records’ and
‘Public electronic records with long retention periods’; ‘Transfer of archival
electronic records into the Archives' Repository’; and ‘Archives’ TDR’, with
‘All archival electronic records that have fulfilled their retention periods’.]
Digital Preservation Collaboration:
Pan-Canadian Efforts &
External/Internal Partnerships
Collaboration - Goals
• Similar to the Archives of Ontario, other archives
and many areas of government are facing
preservation challenges.
• Promote the awareness of long-term digital
preservation.
• Bring key stakeholders together.
• Collectively share the knowledge gained from
the important work being done in the Archives
and across government.
National Digital Preservation Working Group
(NDPWG)
• The group was established by the Archives of Ontario in
August 2008. Eight meetings have been held to date.
• The mandate of the group is to provide a forum for
practitioners in the field of digital preservation to share ideas
and expertise, discuss best practices and lessons learned.
• The membership includes:
– Saskatchewan – Manitoba – Nova Scotia – Nunavut
– Northwest Territories – Yukon – Alberta
– Manitoba – Library and Archives Canada
• The Archives of Ontario is the current chair for the NDPWG.
Canadian Preservation Cooperation
Strategy
• Library and Archives Canada (LAC) visited the Archives on July 27th,
2010, to discuss a number of digital preservation projects on which
the two institutions could work collaboratively.
• Subsequent to the meeting, the Archives, LAC and the
Saskatchewan Archives Board agreed to develop a Canadian
Preservation Cooperation Strategy on Digital Preservation that
outlines the principles of the group and its proposed projects.
• Meetings have been held to develop work plans and other planning
documents.
• Canadian Preservation Cooperation Strategy was presented at
National, Provincial and Territorial Archivists Conference (NPTAC)
on Friday 22 October 2010.
• First joint project is Canadian Registry of Digital Storage Media –
final draft completed.
Canadian TDR Network
• Initiative started by LAC and the University of Alberta in March 2010.
• Emerged out of the process that built the Canadian Digital
Information Strategy.
• Idea is to start with a small group of pioneering institutions that will
begin a process of understanding and articulating the issues
involved with building a TDR network.
• The short-term goal is to create a coalition from which the group can
begin to build its preservation capacity.
• Kick-off meeting held November 26th at LAC.
• Development of a strategy and vision document is underway (by
LAC, University of Alberta Library, Archives of Ontario, University of
British Columbia Library).
Academic Partnerships - iSchool
• The Archives has partnered with the Faculty of Information
(iSchool) at the University of Toronto on a number of
digital preservation activities:
– Attended Digital Preservation Reading Course led by Dean
Seamus Ross from February – April 2010.
– Hosted practicum (internship) for iSchool student Suzanne
Leblanc from May-August 2010. She completed a survey and
report on digital preservation file formats for digital video.
– Attended iSchool hosted Digital Curation Matters conference
June 16-17 2010.
– Have explored the possibility of employing PhD students and jointly
applying for grant funding for preservation research projects.
International Liaisons
• Have had numerous interactions with
international digital preservation jurisdictions.
– Hosted delegations from international archives including:
• Hefei City, Anhui Province, China – April 17, 2009
• National Archives of Japan - March 19, 2010
• Malaysia National Archives – April 30, 2010
– Ongoing information sharing with colleagues in the USA, UK,
Australia and New Zealand.
Plans for Ontario Government
• Creation of Digital Preservation
Collaboration Committee
• Launching a Digital Preservation
OPSpedia (internal social networking)
site
• Setting up digital preservation web
presence on the Archives’
inter/intranet
Thank You!
Questions?
Contact Information
Ryan Carpenter
Senior Coordinator, Archival Electronic Records
Archives of Ontario
Ryan.Carpenter@ontario.ca, 416-327-8174
Lijuan Yu
Senior Coordinator, Archival Electronic Records
Archives of Ontario
Lijuan.Yu@ontario.ca, 416-327-1588