Terry Harrison – notes on

Seminal reading:
_________________________________________________________________
Preserving Digital Information: Final Report and Recommendations
Task Force on Archiving of Digital Information – Research Libraries Group
http://www.rlg.org/ArchTF/
_________________________________________________________________
This document takes a broad look at the goals and requirements of preservation of digital objects from a
nationwide (and possibly even larger) perspective. A distributed archive solution is recommended, for
reasons of both cost and redundancy. A digital archive certification mechanism will be needed to ensure
the capabilities of participating archives. – TH
Background:
At the end of 1994 the Commission on Preservation and Access (CPA) and RLG created a Task Force on
Archiving of Digital Information charged with investigating and recommending means to ensure
"continued access indefinitely into the future of records stored in digital electronic form." In May 1996, the
21-member task force, co-chaired by Donald Waters and John Garrett, completed their final report. Both
RLG and CPA have made this widely available. (In 1997, CPA merged with the Council on Library
Resources to become CLIR—the Council on Library and Information Resources.)
This is perhaps the first commissioned study on digital archiving.
64 pages
Status: currently reading
_________________________________________________________________
Thoughts:
- Status of emulation software / hardware
- Migration – to move from one technology to another
- Refresh – to periodically recopy (e.g. to a drive w. fewer hours on it)
- Digital archives are different from digital libraries
- Archives need a “certification” process (to assure competency)
- Archives need to be able to exercise an aggressive rescue function for digital information
Features of digital landscape
Stakeholders
Hardware obsolescence – no new parts for old machines
- 1960 Census – only two UNIVAC type II-A machines were left in the world when the Census Bureau decided to attempt to refresh the data
- 1964 – the first email message was not saved; it is unclear which research group sent it: MIT, Carnegie Institute of Technology, or Cambridge University
- 1960s LUNR Project – Land Use and Natural Resources Inventory Project; a computerized map of NY depicting patterns of land usage and identifying natural resources. By the 1980s the data could not be retrieved from the computer tapes, leaving only printouts and transparency overlays.
- 1985 – Committee on the Records of Government: “The United States is in danger of losing its memory.”
“If we are effectively to preserve for future generations the portion of this rapidly expanding corpus of
information in digital form that represents our cultural record, we need to understand the costs of doing so
and we need to commit ourselves technically, legally, economically and organizationally to the full
dimensions of the task. Failure to look for trusted means and methods of digital preservation will certainly
exact a stiff, long-term cultural penalty.” p.4
Refreshing is good, but not a complete solution
- 2-5 year technical obsolescence cycle (shorter than media shelf life)
- Hardware- and software-dependent records may not be forward compatible
- Expensive for hardware and software to maintain backward compatibility
- Proprietary systems often don’t work w. competing products
- Emulators could help
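A refresh only counts as preservation if the new copy is bit-for-bit identical to the old one. The sketch below assumes SHA-256 as the checksum (the notes do not name one) and uses hypothetical helper names `sha256_of` and `refresh`; it copies a file to new storage and compares digests before trusting the copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Hypothetical refresh step: copy source to new media, then verify it.

    Returns the verified digest; raises OSError if the copy does not match.
    """
    expected = sha256_of(source)
    shutil.copy2(source, destination)  # copy bytes plus timestamps/permissions
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"refresh of {source} failed the fixity check")
    return actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old_media = Path(tmp) / "old_drive_record.dat"
        new_media = Path(tmp) / "new_drive_record.dat"
        old_media.write_bytes(b"1960 census record")
        print("refreshed, sha256 =", refresh(old_media, new_media))
```

This also illustrates why refreshing is not a complete solution: it keeps the bits intact but does nothing about the hardware and software needed to interpret them.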
Migration
- Defined as “periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to another”
- Migration may not yield perfect copies on newer technologies
- E.g., converting .psd to .jpeg is a lossy transition
- Forward migration of information to a new standard or application program is “time consuming, costly, and more complex than simply refreshing”
Legal and Organizational Issues
- Always complex
- “Bits know no borders”
Pseudo-summary
- “(The) greatest fear about the life of information in the digital future: namely that owners or custodians who can no longer bear the expense and difficulty (of moving digital information forward in the digital future) will deliberately or inadvertently, through a simple failure to act, destroy the objects without regard for future use.”
The Need for a Deep Infrastructure
- w. “various systematic supports”
- Many different aspects of the digital environment will have to be addressed in different ways – TH
- No one-stop solution – TH
Conceptual Framework
- A national system of digital archives envisioned
- Long-term storage and access goals (issues not always dealt w. by digital libraries)
- Repository criteria:
   o Development of an archive certification standard suggested
   o “Aggressive fail-safe mechanism” to rescue information that is endangered at its current location
Plan of Work
- Archival responsibility starts at creation (w. the creator)
- “We can afford to continue and increase economic and social investments in digital information objects and in the responsibilities for them on the information superhighway if, and only if, we also create the archival means for the knowledge the objects and repositories contain to endure and redound to the benefit of future generations”
INFORMATION OBJECTS IN THE DIGITAL LANDSCAPE p.11
Integrity of digital information: determined by content, fixity, reference, provenance, and context. Integrity is what gives the digital information object its value.
Content:
- Defining content as a unique bit configuration, verified with checksums, works but is limiting, esp. where the content depends on particular hardware/software (think of a word-processing document).
- Bits versus use: saving an image as a JPEG is lossy, but it aids storage and use.
- Save the bits? Save the idea conveyed? Save it so that it can be used? The answers will differ for different kinds of information.
Fixity:
- An object’s integrity is lost if it is constantly changing (e.g., document revisions can obscure the original document).
- Watermarking of digital objects (e.g., to mark a canonical version)
- Snapshots in time (e.g., for databases) – reminiscent of the Internet Archive’s Wayback Machine
Reference:
- “Must be able to locate it definitively and reliably over time…”
- URNs, URLs
- Must take into account provenance and context
Provenance: tracing where a digital object came from
- Helpful in sorting out multiple versions, derived works, source of data (instrumentation), migration, transformations, and authenticity
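Provenance is easier to trust when the recorded history is itself tamper-evident. As one possible approach (hash chaining is an assumed technique here, not something the report prescribes), the hypothetical `ProvenanceLog` class stores each event together with the SHA-256 of the previous entry, so altering any entry that has a successor breaks the chain.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Hypothetical append-only provenance history for one digital object."""

    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _hash(entry: dict) -> str:
        # Canonical JSON (sorted keys) so the same entry always hashes the same way.
        return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, event: str, agent: str) -> None:
        # Link each new event to the one before it; the first event has no predecessor.
        prev = self._hash(self.entries[-1]) if self.entries else None
        self.entries.append({"event": event, "agent": agent, "prev": prev})

    def verify(self) -> bool:
        # Every entry must carry the hash of the entry immediately before it.
        return all(
            later["prev"] == self._hash(earlier)
            for earlier, later in zip(self.entries, self.entries[1:])
        )


if __name__ == "__main__":
    log = ProvenanceLog()
    log.record("created as .psd", "creator")
    log.record("migrated .psd to .jpeg (lossy)", "archive A")
    log.record("rescued from archive A", "archive B")
    print(log.verify())  # True
    log.entries[1]["event"] = "migrated .psd to .png"  # tamper with history
    print(log.verify())  # False
```

The chain alone does not catch an edit to the most recent entry; an archive using this idea would likely also need to keep the latest hash somewhere independent.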
Context: how a digital object interacts with “the wider digital environment” p.18
- Technical context: hardware/software dependencies (e.g., a disk may need a special drive; a format may require a special application)
- Emulators are helpful in dealing w. some of these issues (e.g., video game emulation)
- Medium – e.g., capturing the experience of using a CD-ROM
Stakeholder Interests p.19 - Those interested in making, appending, or using an object. Must be careful
not to corrupt the actual object.
ARCHIVAL ROLES AND RESPONSIBILITIES p.21
- Best plan: distributed digital archives
- Must be stored and maintained in an accessible form p.22
- “Fail-safe mechanism” to protect records in danger of neglect, destruction, or abandonment (one agency could rescue another’s endangered materials)
- Current and proposed copyright law does not provide for an aggressive fail-safe mechanism
- Possibility: legally mandated depositories – creators would be bound to place a copy of their digital work in a certified digital archive in a standard archival format
Appraisal and Selection – an ongoing process; knowing where copies are held reduces redundant acquisition
- "Which things do you keep?"
- Decisions on migration when it fundamentally alters the work
Accession – preparation of an object for archiving
- "Carefully packing them up"
- Metadata
- Deaccessions should be announced publicly so rescue efforts can be made
- Access control – terms and conditions (e.g., to meet copyright requirements)
Storage
- "Where to put them"
- TH – new cost metrics are making tape libraries no longer cost-effective
- Online, near-line (e.g., robotic jukeboxes), off-line
- REDUNDANCY – cover your bum! – TH
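Redundancy only helps if the copies are checked against each other. A minimal sketch, assuming each site can hand over its copy of an object and that a strict majority of matching SHA-256 digests identifies the good version (a simple voting rule chosen for illustration, not taken from the report); `audit_replicas` and the site names are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare copies of one object held at several sites.

    Returns the digest agreed on by a strict majority of sites and a sorted
    list of the sites whose copy differs (candidates for repair).
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    # Fingerprint every copy with SHA-256 so copies are compared by digest.
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority we cannot tell which copy is authentic.
    if votes * 2 <= len(digests):
        raise ValueError("no majority among replicas; manual review needed")
    damaged = sorted(site for site, digest in digests.items() if digest != winner)
    return winner, damaged


if __name__ == "__main__":
    copies = {
        "archive_a": b"LUNR land-use map",
        "archive_b": b"LUNR land-use map",
        "archive_c": b"LUNR land-use m4p",  # simulated bit rot
    }
    digest, damaged = audit_replicas(copies)
    print(damaged)  # ['archive_c']
```

With only two copies that disagree there is no majority, which is one argument for keeping three or more copies in a distributed archive.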
Access
- "How to get to them. How to protect them from unauthorized users"
- Prevent unauthorized use, protect intellectual property rights (facilitate transactions between rightsholders and users)
Systems Engineering
- "When to copy and how to do so"
- Helps determine when digital archives should migrate to new hardware and software p.27
MIGRATION STRATEGIES
Change Media
- E.g., text digital objects are printed out and stored on microfilm (a long-lasting format with a low operational barrier)
- Migration may cause “flattening” of a non-standard object (e.g., a microfilm version of a spreadsheet cannot maintain its internal computational capabilities)
Change Format
- Good to transform to a standard format, but this may be lossy (e.g., JPEG is a lossy image compression)
Incorporate Standards
- Archives’ technological infrastructure should conform to widely adopted standards p.29
Build Migration Paths
- Work w. industry/donors on backwards compatibility and migration paths
- Support the development of industry standards w. these issues in mind
- Vendors don’t often build migration paths between/with competitors
Using Processing Centers
- Develop centers that specialize in reformatting and migration of obsolete materials
- Emulation (both hardware and software) is valuable here
- Development of a national laboratory for digital preservation (modeled after the National Media Laboratory)
Managing Costs and Finances
- Storage costs declining
- Unknown rights-management costs
- Investment in systems engineering and infrastructure is critical for a distributed archive system
Cost Modeling p.31
- Hard to determine the costs of archiving the different kinds of digital objects
- Yale model
   o Used to compare a traditional paper-based archive with its digital equivalent
   o Digital model most effective when resources are distributed
Financing
- Who will pay for all of this?
- Tax incentives and accounting rules favoring preservation?
- Direct charging for digital information?
- APS (American Physical Society) and ACM (Association for Computing Machinery) are facing these issues as they create their own digital libraries
SUMMARY
- “First line of defense against loss of valuable digital information resides w. the creator, providers and owners of digital information”
- Need for deep infrastructure to support a distributed digital archive
- Need a sufficient number of trusted organizations capable of storing, migrating, and providing access to the digital information
- Need a process of digital archive certification to develop trust
- “Certified archives must have the right and duty to pursue an aggressive rescue function … for valuable digital information in jeopardy of destruction, neglect or abandonment by its current custodian.”