18 Media Selection and Migration

This best practice addresses two topics: weighing the reliability of digital storage media against their
cost, and the inevitable need to migrate existing data to different (usually newer) storage technology.
Digital storage media are sometimes intertwined with digital file formats, but formats are dealt with in
Best Practice 2.8.
Table of Contents
18.1 Media Selection
18.2 Digital Storage Media in Use
18.3 Handling & Storage of Optical Media
18.4 Risks and Reliability Considerations
18.5 Error Detection & Correction: Media validation / Fixity tests
18.6 Media Migration
18.7 References and Additional Sources
18.1 Media Selection
There are many types of data storage media and selection of the best type(s) for a given situation can be
complex. Factors such as total data size, rate of data growth, user access needs, desired length of
retention, preservation needs, data value, and available budget can all affect the suitability of storage
types for a given situation. Even in cases where we know a data set should be online, some of those
factors will affect which online storage array(s) are used and what sort of backup or replication
configuration is used.
• For questions about online storage needs, start by contacting IT Infrastructure & Software
  Development (ISD). Depending on the project needs and funding, storage might be provided to
  you by ISD, Grainger Library, or outsourced.
• For questions about experience with optical media, contact Digital Content Creation (DCC).
• For questions about the Library’s Fedora digital repository currently in planning stages, contact
  Tom Habing.
18.2 Digital Storage Media in Use
At the Library, several types of digital media are currently used to store digitized content. Each one
presents tradeoffs with respect to reliability, expected longevity, ease of access and validation, and
various costs (including purchase, maintenance, labor, energy, and space). To keep this document
manageable, it refers only to storage media in use by the UIUC Library.
• Magnetic disk drives
    o Online: Networked storage volumes (includes SAN and server-attached storage disk arrays)
        ▪ Positive considerations: High density; highly accessible & available; high speed;
          automated validation is very feasible; duplication and remote replication is fast
          and relatively easy; most flexible option; many variations available.
        ▪ Negative considerations: High cost; high energy use; most susceptible to data
          loss by human error.
    o Offline: Disk drives dismounted and stored unpowered.
        ▪ Positive considerations: Inexpensive; very high density; no energy use while
          offline; fast data storage and retrieval while connected (temporarily online); can
          be connected to servers for multi-user access if needed.
        ▪ Negative considerations: Should be duplicated to multiple physical disks;
          periodic verification is feasible but requires some labor; not all hard disks are
          designed for external storage; some hard disks still fail with power cycling.
• Magnetic tape, offline (various formats)
    o Summary: Not recommended for primary storage but can be effective for
      duplicate/backup copies.
    o Negative considerations: Hundreds of tape formats become effectively obsolete quicker
      than the media deteriorates; often recorded using proprietary hardware and/or data
      encoding; medium labor costs; periodic verification may be prohibitively costly;
      dependability is variable.
    o Positive considerations: Very low energy use; data density can be much higher than
      optical; can be cost-effective for some storage scenarios (most often backups of primary
      online storage); CITES has a highly reliable and scalable tape architecture, but does not
      (yet) provide archiving capabilities beyond one year.
• Optical discs
    o CD-R and DVD-R
        ▪ Choose discs using phthalocyanine dye with gold reflective layer(s).
        ▪ Negative considerations: High labor costs; poor accessibility; periodic
          verification is not cost-effective; low density by today’s standards.
        ▪ Positive considerations: Low cost for equipment; very low energy use; write-once
          mode of operation.
    o CD-RW and DVD-RW [Not recommended for data preservation]
• Externally-hosted repositories
    o Digital preservation systems (e.g. HathiTrust)
    o Other (e.g. Internet Archive)
18.3 Handling & Storage of Optical Media
To maximize the longevity and readability of optical media (CD-R, DVD-R, etc.), the following are
recommended [1]:
• Select only discs manufactured with phthalocyanine dye and gold reflective layer(s).
• Record discs using moderate recording speeds. High-speed recording increases the likelihood
  they’ll be unreadable on many other systems.
• Immediately after recording them, re-read and verify the data against the original. Preferably
  (though less efficiently) this should be done on a different optical disc drive from a different
  manufacturer.
• Handle them only by the outer edge or center hole.
• Don’t purchase huge quantities of optical discs in advance of need. Their pre-recorded shelf life
  is relatively short.
• Don’t apply labels or other items to the discs. Label them carefully using a CD-safe permanent
  marker.
• Store them in individual jewel cases upright.
• Store them in a cool, dry area with stable temperature and humidity and no direct UV light.
  NARA recommends 62-70 degrees F (+/- 2 degrees fluctuation), 35-50% relative humidity (+/- 5%
  fluctuation). Other sources recommend different ranges for different optical media types [2].
A longer but similar list of these recommendations is on page vi of NIST Special Publication 500-252 [3].
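The "re-read and verify against the original" step in the list above can be sketched as follows. This is a minimal illustration, not the procedure used by DCC: the function name `compare_trees` and both directory arguments are hypothetical, and it assumes the recorded disc is mounted as an ordinary directory next to the original files.

```python
import filecmp
from pathlib import Path


def compare_trees(original: Path, copy: Path) -> list[str]:
    """Return relative paths that are missing from `copy` or differ byte-for-byte."""
    relative_files = [
        path.relative_to(original).as_posix()
        for path in sorted(original.rglob("*"))
        if path.is_file()
    ]
    # shallow=False forces a comparison of file contents rather than
    # trusting matching size and modification time.
    _match, mismatch, errors = filecmp.cmpfiles(
        original, copy, relative_files, shallow=False
    )
    # `errors` holds files that could not be compared, e.g. missing on the copy.
    return sorted(mismatch + errors)
```

An empty result means every file in the original was found on the copy with identical contents. Files that exist only on the copy are not reported by this sketch.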
18.4 Risks and Reliability Considerations
Every digital storage medium is subject to partial and total data loss. The causes for loss include human
error, software or hardware malfunction, physical media deterioration, mechanical failure, damage from
electromagnetic fields or environmental conditions, theft, disaster damage (fire, flood, earthquake,
etc.), and eventual unreadability once the hardware and software needed to read or interface with a
given medium become obsolete or unavailable. These disparate causes require different solutions to
address their risk of occurrence.
Best practice for increasing the reliability of digital storage media always involves one or more means of
creating redundancy in the data to significantly reduce the statistical likelihood of actual information
loss even when the inevitable failure occurs with any specific digital media storage unit. Best practices
also require methods of detecting data corruption in the media. Moreover, all highly reliable and
disaster-resistant storage systems require that the data reside in at least two physical locations as
geographically distant as feasible.
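As a rough illustration of why redundancy reduces the likelihood of loss, the sketch below computes the chance that every copy fails in some period. It rests on a simplifying assumption that is not made in this document: that copies fail independently and with the same probability. The function name and the 5% figure are hypothetical and chosen only for the arithmetic.

```python
def probability_all_copies_lost(per_copy_failure: float, copies: int) -> float:
    """Chance that every copy fails, assuming independent, equally likely failures."""
    if not 0.0 <= per_copy_failure <= 1.0:
        raise ValueError("per_copy_failure must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return per_copy_failure ** copies


if __name__ == "__main__":
    # Hypothetical 5% chance that any single copy fails in a given period.
    for n in (1, 2, 3):
        print(n, probability_all_copies_lost(0.05, n))
```

In practice failures are often correlated (copies in the same room, or discs from the same batch), so the real risk is higher than this model suggests. That correlation is one reason for keeping copies in geographically distant locations.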
Even with high-quality CD-R and DVD-R media with a gold reflective layer, which DSD and DCC have used
for some collections, their experience has shown significant media failure rates, both immediately after
recording and upon later attempts to read the discs.
18.5 Error Detection & Correction: Media validation / Fixity tests
Even “best practice” RAID-protected storage volumes suffer from data loss which ordinarily goes
undetected [4][5]. To address these issues, a few highly resilient file systems have been developed. Most
of these are very expensive proprietary systems out of our reach, but the Library’s IT Infrastructure &
Software Development (ISD) unit has begun working with Sun’s open source ZFS [6] in a new pair of storage
systems for this additional security.
For any long-term digital preservation system this type of silent data loss must be addressed at a level
above the hardware using software methods of recurring validation and recovery. In practice, this can
be done by a digital preservation system running proactive fixity checks, by an advanced file system
like ZFS, or both. These systems all incorporate the computation and storage of one or more checksums
(e.g. CRC) or stronger digest hashes (e.g. MD5, SHA-256, etc.) of the files and file system metadata.
Later we can reread the files, recompute each checksum/hash, and compare it to the original. Any
difference indicates data corruption on the media and should trigger restoring that data from another copy.
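The compute, store, and later recompute cycle described above can be sketched as follows. This is a minimal illustration using SHA-256 from Python's standard library, not the implementation used by ZFS or by any particular preservation system; the function names and the in-memory manifest are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_fixity_failures(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of_file(target) != expected:
            failures.append(relative_path)
    return failures
```

A real system would store the manifest separately from the data it describes, so that a single media failure cannot damage both, and would restore any path reported by `find_fixity_failures` from another copy.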
Note, however, that running such fixity checks is rarely feasible in offline storage scenarios because of
high labor requirements. In addition to the increased convenience of access to stored material, this is a
strong argument in favor of using online or automated near-line storage systems, despite typically
higher cost and energy use.
18.6 Media Migration
Since all storage media eventually deteriorate and/or become obsolete and inefficient, long-term data
storage requires periodic migrations to newer physical media. This involves re-selecting the most
appropriate medium at that point in time, followed by a process of copying all the desired data from the
old medium to the new and verifying its integrity.
Once migration is completed, the old media may be retired or destroyed as appropriate, unless they still
have some useful lifespan remaining and are intentionally being retained as an additional backup copy.
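The copy-and-verify step of a migration can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed Library procedure: both media are assumed to be mounted as ordinary directories, the target is assumed not to lie inside the source, and the names `file_digest` and `migrate_tree` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(source_root: Path, target_root: Path) -> list[str]:
    """Copy every file to target_root; return relative paths that failed verification."""
    mismatches = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        target = target_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file timestamps
        if file_digest(source) != file_digest(target):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty result means every copied file matched its source at the time of the check. Because a file read back immediately may be served from the operating system's cache rather than from the new medium itself, a later independent re-read is a stronger test before the old media are retired.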
18.7 References and Additional Sources
NARA Technical Information Paper No. 12: “Digital-Imaging and Optical Digital Data Disk Storage
Systems: Long-Term Access Strategies for Federal Agencies”.
http://www.archives.gov/preservation/technical/imaging-storage-report.html
Optical Storage Technology Association (OSTA) – Understanding CD-R and CD-RW Longevity.
http://www.osta.org/technology/cdqa13.htm
[1] NARA Frequently Asked Questions (FAQs) about Optical Storage Media: Storing Temporary Records on CDs and
DVDs. http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html
[2] NIST Special Publication 500-252. Care and Handling of CDs and DVDs – A Guide for Librarians and Archivists.
http://www.itl.nist.gov/iad/894.05/docs/CDandDVDCareandHandlingGuide.pdf, pg. 16, table 3.
[3] NIST Special Publication 500-252, pg. vi.
[4] Summary of CERN’s data storage reliability study.
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
[5] Carnegie Mellon Univ. paper “Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to
You?” http://www.usenix.org/events/fast07/tech/schroeder.html
[6] The presentation at http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf
summarizes the many benefits of ZFS compared to traditional online storage systems, including how it provides
end-to-end file integrity and recovery. The most relevant pages are 12-18, 21-23, and 41.