Lesk-preservation.ppt

Digital Preservation
Preservation through permanent media
Preservation through emulation
Preservation through distributed duplicates
Metadata and labeling
Whose responsibility?
Digital has its risks
We know the first telegraph message:
“what hath God wrought”
Samuel F. B. Morse, Washington to Baltimore, 24 May 1844
We know the first telephone call:
“Watson, come here, I need you”
Alexander Graham Bell, Boston, March 10, 1876
We do not know the first e-mail message!
(Spring 1964, but Cambridge UK? MIT? or CMU?)
How safe is digital?
The Wall Street Journal, October 5, 1993.
By the way, Hanson was split up into four companies in January 1996.
They no longer make cigarettes or batteries. But they still make bricks.
What gets lost?
Less than you would think - hard to find examples
Usually, failure to access information while something
was going wrong.
Partial loss of obsolete tapes from 1960 US Census
Loss of some Brazilian Landsat tapes
Sometimes, staff issues: old GDR records.
Conversion to standards
Normal strategy involves use of standard software
formats and regular hardware copying. This may
involve updates of content (e.g., early computer files
may not even have had lower case letters).
Librarians want to avoid proprietary formats but this
can be hard:
some formats are virtually monopolies (Microsoft)
patents can suddenly surface (JPEG)
Permanent Media
Not in favor in research community. Optical discs and
punched cards outlasted their equipment.
DVD-RAM now popular, following CD-ROM
Not just technological obsolescence to be feared:
entertainment community might make it impossible to buy
PCs that can read and write arbitrary files.
Clockwise from top
left: DECtape, 1/4 inch
cartridge, 8mm, DAT,
1/2 inch reel, and
Univac I steel tape.
Emulation
Jeff Rothenberg is the leader here.
Goal: preserve old files by emulating equipment.
Example: the laser readers for vinyl sound disks.
First effort: 1986 UK Domesday project.
BBC Domesday project - 1986
Attempt to re-do the survey of
England done in 1086.
Unfortunately, published on 12-inch laser disc in a format that
died quickly in the marketplace.
Leeds Univ. and U. Michigan
now trying to emulate original
hardware.
Duplication and sharing
Protection against loss via multiple copies
Individual backups are traditional, but do not
protect against organizational bankruptcy and if
not regularly exercised might not be dependable
NEC (Yianilos): shareable file systems.
Stanford has two projects on sharing files.
LOCKSS
Vicky Reich, Stanford; David Rosenthal, Sun.
LOCKSS: time and consensus
The LOCKSS project is particularly interesting for
two reasons: running slowly and relying on a
combination of consensus and reputation.
A. By making it fast to find one copy of something,
but slow to find all copies, it becomes difficult for a
vandal to find and destroy all copies of a file.
B. By relying on a weighted polling system in which
a site can gain weight only by agreeing with many
prior decisions, it is difficult even for an insider to
insist on installing bad versions of files.
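Point B can be illustrated with a toy weighted poll. This is only a sketch of the general idea, not the actual LOCKSS protocol: the function names, the gain and penalty values, and the use of a content hash as the "vote" are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_poll(votes, weights):
    """Tally a poll over one file.

    votes:   site name -> content hash that site holds (hypothetical format)
    weights: site name -> non-negative reputation weight
    Returns (winning_hash, winner's share of the total weight cast).
    Assumes at least one vote is present.
    """
    totals = defaultdict(float)
    for site, digest in votes.items():
        totals[digest] += weights.get(site, 0.0)
    winner = max(totals, key=totals.get)
    total = sum(totals.values())
    share = totals[winner] / total if total else 0.0
    return winner, share


def update_weights(votes, weights, winner, gain=1.0, penalty=0.5):
    """Return new weights after a poll.

    Sites that agreed with the outcome gain a fixed amount (slow growth);
    sites that disagreed have their weight multiplied by `penalty`.
    Both constants are illustrative assumptions.
    """
    new = dict(weights)
    for site, digest in votes.items():
        w = new.get(site, 0.0)
        new[site] = w + gain if digest == winner else w * penalty
    return new
```

In this sketch a newly arrived site starts with weight 0, so its vote for a bad version counts for nothing until it has agreed with many earlier polls; an insider who suddenly votes for a bad version loses half of its accumulated weight each time.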
ARCHSIM
Arturo Crespo, Hector Garcia-Molina
Simulation system for a multiple-copy digital
preservation archive. Easy evaluation of effect of
the number of copies, the reliability of each device,
and the frequency of testing and error detection.
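The kind of question such a simulator answers can be sketched with a small Monte Carlo model. This is not ARCHSIM itself; the failure model (independent per-step failures, full repair at each audit) and every parameter name below are simplifying assumptions.

```python
import random


def simulate_loss(copies, p_fail, check_every, periods, trials=2000, seed=0):
    """Estimate the probability that every copy is lost within `periods` steps.

    Assumed model: each surviving copy fails independently with probability
    `p_fail` per time step. Every `check_every` steps an audit detects failed
    copies and restores them from any survivor. The object is lost for good
    once no copy survives.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        alive = copies
        lost = False
        for step in range(1, periods + 1):
            # Count how many of the surviving copies fail during this step.
            alive -= sum(1 for _ in range(alive) if rng.random() < p_fail)
            if alive == 0:
                lost = True
                break
            if step % check_every == 0:
                alive = copies  # audit: rebuild failed copies from a survivor
        losses += lost
    return losses / trials
```

Calling simulate_loss with different values of copies, p_fail and check_every gives a rough feel for the three factors named above: more copies, more reliable devices, and more frequent testing each lower the estimated loss rate.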
Metadata and labeling
It’s no good having the data if you can’t find it.
Issues about labeling copies: are slightly different
versions equivalent? What do we need to retain
about format? What about access control?
There are some technical answers (checksums), but
thought is also needed.
Current focus: EAD (Encoded Archival Description),
and OAI (Open Archives Initiative).
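As an illustration of the checksum answer mentioned above, the sketch below compares copies of a file held at several sites and reports which ones differ from the majority. The site names and the choice of SHA-256 are assumptions; any strong hash would serve.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_divergent(copies):
    """copies: site name -> file contents as bytes (hypothetical layout).

    Returns the sorted names of sites whose copy does not match the most
    common digest. If two digests are equally common, the one treated as
    the majority is arbitrary.
    """
    digests = {name: sha256_of(data) for name, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)
```

A checksum only says whether two copies are bit-for-bit identical. It cannot say whether two slightly different versions are equivalent, which is where the thought is needed.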
Whose responsibility?
What if the publisher will not let subscribers do
archiving and copying?
What if the publisher provides temporary access
only to encrypted files, and then goes bankrupt?
What if the file format is only usable by a program
that might become obsolete, and the file owner
refuses permission to migrate to a new format?
Elsevier has promised to maintain its files.
A “right of aggressive rescue” has been proposed.
But we don’t really have an answer.
Conclusions
We know about technology: migration to
standard formats, and sharing of files, are
probably good answers.
We don’t know about economics, law and
society.
Should there be compulsory clear-text
deposit of electronic resources?
How should digital preservation be funded?
Should we select or just keep everything?