Digital Preservation
Preservation through permanent media
Preservation through emulation
Preservation through distributed duplicates
Metadata and labeling
Whose responsibility?

Digital has its risks
We know the first telegraph message: “what hath God wrought”
  Samuel F. B. Morse, Baltimore, 24 May 1844
We know the first telephone call: “Watson, come here, I need you”
  Alexander Graham Bell, Boston, March 10, 1876
We do not know the first e-mail message!
  (Spring 1964, but Cambridge UK? MIT? or CMU?)

How safe is digital?
The Wall Street Journal, October 5, 1993.
By the way, Hanson was split up into four companies in January 1996. They no longer make cigarettes or batteries. But they still make bricks.

What gets lost?
Less than you would think: it is hard to find examples.
Usually, the loss is a failure to access information while something else was going wrong.
Partial loss of obsolete tapes from the 1960 US Census.
Loss of some Brazilian Landsat tapes.
Sometimes, staff issues: old GDR records.

Conversion to standards
The normal strategy involves use of standard software formats and regular hardware copying.
This may involve updates of content (e.g., early computer files may not even have had lower-case letters).
Librarians want to avoid proprietary formats, but this can be hard:
  some formats are virtual monopolies (Microsoft)
  patents can suddenly surface (JPEG)

Permanent Media
Not in favor in the research community.
Optical discs and punched cards outlasted their equipment.
DVD-RAM now popular, following CD-ROM.
Not just technological obsolescence to be feared: the entertainment community might make it impossible to buy PCs that can read and write arbitrary files.
Clockwise from top left: DECtape, 1/4 inch cartridge, 8mm, DAT, 1/2 inch reel, and Univac I steel tape.

Emulation
Jeff Rothenberg is the leader here.
Goal: preserve old files by emulating equipment.
Example: the laser readers for vinyl sound disks.
First effort: 1986 UK Domesday project.

BBC Domesday project - 1986
Attempt to re-do the survey of England done in 1085.
Unfortunately, published on 12-inch laser disc in a format that died quickly in the marketplace.
Leeds Univ. and U. Michigan are now trying to emulate the original hardware.

Duplication and sharing
Protection against loss via multiple copies.
Individual backups are traditional, but they do not protect against organizational bankruptcy, and if not regularly exercised they might not be dependable.
NEC (Yianilos): shareable file systems.
Stanford has two projects on sharing files.

LOCKSS
Vicky Reich, Stanford; David Rosenthal, Sun.

LOCKSS: time and consensus
The LOCKSS project is particularly interesting for two design choices: it runs slowly, and it relies on a combination of consensus and reputation.
A. Finding one copy of something is fast, but finding all copies is slow, so it is difficult for a vandal to find and destroy all copies of a file.
B. Polling is weighted, and a site can gain weight only by agreeing with many prior decisions, so it is difficult even for an insider to insist on installing bad versions of files.

ARCHSIM
Arturo Crespo, Hector Garcia-Molina
Simulation system for a multiple-copy digital preservation archive.
Easy evaluation of the effect of the number of copies, the reliability of each device, and the frequency of testing and error detection.

Metadata and labeling
It’s no good having the data if you can’t find it.
Issues about labeling copies: are slightly different versions equivalent?
What do we need to retain about format? What about access control?
There are some technical answers (checksums), but thought is also needed.
Current focus: EAD (Encoded Archival Description) and OAI (Open Archives Initiative).

Whose responsibility?
What if the publisher will not let subscribers do archiving and copying?
What if the publisher provides temporary access only to encrypted files, and then goes bankrupt?
What if the file format is only usable by a program that might become obsolete, and the file owner refuses permission to migrate to a new format?
Elsevier has promised to maintain its files.
A “right of aggressive rescue” has been proposed.
But we don’t really have an answer.

Conclusions
We know about technology: migration to standard formats and sharing of files are probably good answers.
We don’t know about economics, law and society.
Should there be compulsory clear-text deposit of electronic resources?
How should digital preservation be funded?
Should we select or just keep everything?
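
Backup: a toy multiple-copy simulation
The ARCHSIM slide mentions evaluating the effect of the number of copies, the reliability of each device, and the frequency of testing. The Python sketch below is not ARCHSIM and does not reproduce its model. It is a minimal Monte Carlo illustration under assumed rules: each intact copy fails silently and independently with a fixed yearly probability, a periodic audit repairs damaged copies from any surviving good one, and the object is lost once no intact copy remains. The function name simulate_archive and all of its parameters are hypothetical.

```python
import random


def simulate_archive(copies, annual_failure_prob, audit_interval_years,
                     years=100, trials=2000, seed=0):
    """Estimate the fraction of trials in which an object is lost.

    Assumed model (illustrative only, not ARCHSIM's):
      - each intact copy fails silently with `annual_failure_prob` per year;
      - every `audit_interval_years` years all copies are checked and any
        damaged copy is replaced from a surviving intact one;
      - the object is lost as soon as no intact copy remains.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    losses = 0
    for _ in range(trials):
        intact = copies
        for year in range(1, years + 1):
            # Count this year's independent failures among intact copies.
            failed = sum(rng.random() < annual_failure_prob
                         for _ in range(intact))
            intact -= failed
            if intact == 0:
                losses += 1  # nothing left to repair from
                break
            if year % audit_interval_years == 0:
                intact = copies  # audit found the damage and re-copied
    return losses / trials


if __name__ == "__main__":
    # Vary one factor at a time: copy count, then audit frequency.
    for n in (1, 2, 3):
        print("copies:", n, "loss rate:", simulate_archive(n, 0.05, 5))
    for interval in (1, 5, 20):
        print("audit every", interval, "years, loss rate:",
              simulate_archive(3, 0.05, interval))
```

Varying copies, annual_failure_prob, and audit_interval_years one at a time shows how each factor moves the estimated loss rate. The numbers are only as meaningful as the assumed failure model, which ignores correlated failures such as organizational bankruptcy.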