
An Implementation Plan for the
Transition of the NSIDC DAAC V0
Archive to MAID Technology
April 2004
Prepared Under Contract NAS5-03099
RESPONSIBLE AUTHOR
Vincent J. Troisi
NSIDC DAAC
University of Colorado
Boulder, Colorado
Date
1. Introduction
The Version 0 archive at the NSIDC DAAC is stored on DLT IV tapes, which reside in a
StorageTek 9710 Digital Tape Library (Timberwolf). The StorageTek 9710 is configured
with DLT 7000 drives. Both the tape drives and the Digital Tape Library have been
designated as end-of-life by the respective vendors. The archive server is an SGI Origin
200 which hosts the AMASS File System Management Software. The SGI server was
purchased in 1997. The NSIDC DAAC plans for FY04-FY05 include a refresh of the SGI
server and the Timberwolf subsystem to newer technologies.
This document provides a high-level description for an approach to transition the current
Version 0 archive from an architecture based on tape storage to a disk architecture that
only recently has been made available to the consumer. The technology proposed for the
disk subsystem element of the V0 archive architecture at NSIDC is MAID, massive arrays
of idle disks.
2. Description of MAID Technology
MAID refers to massive arrays of idle disks. MAID is similar to RAID solutions with a
significant difference: not all of the disks in the MAID storage array are spinning all of the
time. Many of the disks in a MAID subsystem remain powered off until information
residing on the disk is requested or the disk is exercised by an algorithm embedded in
the controller. By reducing the number of active disks in the array, the MAID technology
requires less power and generates less heat than current RAID subsystems.
Current MAID architectures employ Advanced Technology Attachment (ATA) disk drives.
The current generation of serial ATA (SATA) drives provides 250 gigabytes of storage and
bus speeds of 150 MB/sec on dedicated serial paths. The next generation SATA-II drives
scheduled for release during 2004 will provide a 300 MB/sec bus speed. Engineered for
frequent power up and power down cycles, SATA drives are often found in laptop
configurations. SATA disk drives take at most 10 seconds to power up.
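The power-on-demand behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the controller logic of any actual MAID product: the class name, the idle timeout value, and the bookkeeping are hypothetical, and the periodic disk-exercising algorithm mentioned above is not modeled. The only figure taken from this document is the 10-second power-up time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaidArray:
    """Toy model of a MAID array: disks stay powered off until they are read."""

    num_disks: int
    spin_up_seconds: float = 10.0   # worst-case power-up time quoted in the text
    idle_timeout: float = 300.0     # hypothetical idle period before power-off
    last_access: dict = field(default_factory=dict)  # disk id -> time of last access

    def read(self, disk: int, now: float) -> float:
        """Return the delay in seconds before data on `disk` becomes available."""
        if not 0 <= disk < self.num_disks:
            raise ValueError("no such disk")
        self._power_down_idle(now)
        # A disk that is already spinning answers immediately; an idle one
        # must power up first.
        delay = 0.0 if disk in self.last_access else self.spin_up_seconds
        self.last_access[disk] = now + delay
        return delay

    def _power_down_idle(self, now: float) -> None:
        """Power off every disk that has been idle longer than the timeout."""
        for disk, accessed in list(self.last_access.items()):
            if now - accessed > self.idle_timeout:
                del self.last_access[disk]

    def active_disks(self) -> int:
        """Number of disks currently spinning."""
        return len(self.last_access)
```

In this model a first read of a disk pays the spin-up delay, a repeat read shortly afterward does not, and only the disks that have been read recently count as active, which is the source of the power and heat savings.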
Fully configured MAIDs provide a high-capacity storage subsystem with the potential of
storing volumes of information in the hundreds of terabytes. However, a fully configured
MAID rack consumes much less floor space than an automated tape library providing
similar storage capacity. For instance, a Copan Systems MAID rack that provides 224
terabytes of raw disk storage requires only 10 square feet of floor space plus three-foot
access areas in front and rear of the rack.
A MAID provides large capacity at a higher storage density than tape storage solutions.
Configured as virtual tape libraries (VTLs), MAID subsystems emulate the capabilities of
digital tape libraries, using individual disk drives as storage units in place of the
individual magnetic tape cartridges used in digital tape libraries. Since the initial
implementation of MAID is to emulate tape libraries, storage applications for MAID
subsystems are similar to those for nearline tape subsystems. These applications include
storage of fixed content data and reference information such as satellite observations,
derived geophysical products, related documents and multi-media objects, email, and
historical analog records that have been converted to digital format. Unlike nearline
storage requirements for automated tape libraries, MAID applications are referred to as
Inline storage where static data and information remain online at all times.
The current implementation of MAID technology enables organizations to transition their
data and information to online media with little or no impact to the investments made in
the applications that were developed for the purpose of accessing the data and
information stored in nearline automated tape libraries. Copan Systems provides VTL
emulations by using the FalconStor VTL software. The FalconStor product emulates a
number of the more popular automated tape library subsystems. Specifically, the
FalconStor software emulates both the StorageTek 9710 digital tape library and the
Quantum DLT 7000 tape drive.
3. Description of the V0 Archive and Interfaces
The NSIDC DAAC Version 0 archive stores both heritage data sets and active data sets
such as the SSM/I time series, AVHRR polar pathfinder collections, various in situ data
sets, including collections being acquired during the various AMSR-E validation field
campaigns. The data stored in the Version 0 archive are rarely modified. Both data
collections generated by the NSIDC DAAC and data collections transferred to NSIDC by
external data producers are ingested into the Version 0 archive. The volume of data
stored in the archive is approximately twelve (12) terabytes; this volume does not include
those data collections stored off-line. Assuming current rates of ingest into the Version 0
archive, the NSIDC DAAC archive services staff expects the StorageTek 9710 digital
tape library to reach capacity early in fiscal year 2005.
Routine science processing is performed on data stored in the Version 0 archive to
generate higher-level science products. Data residing in the Version 0 archive are
distributed to the scientific and research communities on tape media and CD-R or DVD
media; users can also receive the data electronically from the NSIDC DAAC Version 0 ftp
server. In addition, custom products are derived from data stored in the Version 0 archive
by researchers using the GISMO and PSQ client interfaces. The GISMO and PSQ clients
provide users with the ability to extract subsets of the archived data and to receive these
subsets in any number of map projections and formats.
The diagram below depicts interfaces to the Version 0 archive.
Figure 1. NSIDC DAAC V0 Archive Interfaces
[Diagram not reproduced. Components shown: User Communities; Off-site Archive; Tape and
CD-R/DVD Production; GISMO/PSQ requests; Science Processing; External Providers; ftp
services; SGI Origin 200; AMASS FSMS; StorageTek 9710 (400 DLT IV tapes); Quantum DLT
7000 (5). Flows are labeled Data in, Data out, and Data Request.]
4. Impacts Associated with a Transition of NSIDC Version 0 Archive to MAID Technology
NSIDC DAAC management believes the benefits to the DAAC, the user communities,
and to ESDIS Project for transitioning the Version 0 archive from an automated tape
library to the MAID architecture greatly outweigh any risks associated with such a
transition. As mentioned earlier in this document, the DAAC archive collections must be
moved to another technology no later than the end of fiscal year 2005. In order to
complete the transition of the DAAC Version 0 collections to an alternative archive
solution, migration of the data in the archive should commence before the end of fiscal
year 2004. Besides the risk in not beginning a migration strategy soon, this section
identifies the benefits and the risks for implementing MAID as a solution for transitioning
the NSIDC DAAC Version 0 archive to an alternate technology.
Benefits
NSIDC staff will not be required to make changes to the applications that interface to the
Version 0 archive. The DAAC will continue to use AMASS as the file system
management software used to emulate a Unix filesystem, since the MAID is a virtual tape
library that emulates the StorageTek 9710 and DLT 7000 tape drives.
The data and information residing in the archive will be online, or more correctly, Inline
since the MAID emulates an automated tape library. Hence, access times to the
information stored on the MAID will be greatly reduced. Retrievals from the MAID storage
architecture will be direct access rather than the sequential access required with
tape-based solutions.

File access times for the MAID may be a little over 10 seconds if the disk drive
containing the target data is powered off, while the optimal average file access
time for tape is 60 seconds.

Moving cartridges into drives in automated tape libraries can cause access
delays as much as two minutes.
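The access-time figures above can be put side by side with a short calculation. This is a minimal sketch: the constants come from the two items above, while treating the two-minute cartridge delay as additive to the 60-second tape average is an assumption of the sketch, and the function names are illustrative.

```python
# Figures quoted in the text above, in seconds.
MAID_SPIN_UP = 10        # "a little over 10 seconds" when the disk is powered off
TAPE_OPTIMAL_AVG = 60    # optimal average file access time for tape
TAPE_MOUNT_DELAY = 120   # cartridge moves can add as much as two minutes


def tape_access_seconds(needs_mount: bool) -> int:
    """Tape access time, optionally including the worst-case cartridge delay.

    Adding the mount delay to the optimal average is an assumption.
    """
    return TAPE_OPTIMAL_AVG + (TAPE_MOUNT_DELAY if needs_mount else 0)


def speedup(needs_mount: bool) -> float:
    """How many times faster a read from a powered-off MAID disk is than tape."""
    return tape_access_seconds(needs_mount) / MAID_SPIN_UP
```

Under these assumptions a read from a powered-off MAID disk is roughly 6 times faster than the optimal tape case, and roughly 18 times faster when a cartridge must first be moved into a drive.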
The MAID configuration requires less floor space than an automated tape library with
similar capacities. The power and cooling requirements are very similar to those of a
server located in the Version 0 facility that is scheduled for relocation to another floor. Hence,
the system can be installed in the DAAC Version 0 facility without augmentation to power,
cooling, or floor space.
Tape media deteriorates over time, which often results in loss of data if the data is not
transferred to new media periodically. The risk of losing data due to disk failure is greatly
reduced when appropriate RAID levels are implemented for disk storage arrays; data is
recovered automatically when the information on a failed disk volume is reestablished
onto one of the spare disks in the array.
As a pathfinder for MAID subsystems, the NSIDC DAAC will document its experiences
using the MAID architecture in an operational environment. The DAAC will provide these
lessons-learned reports to the ESDIS Project. The DAAC believes these experiences will
be invaluable in assessing whether this technology is a viable solution for upgrading or
for replacing automated tape library subsystems as archive storage elements in current
and future Earth Science data systems.
Risks
MAID is a cutting edge technology. Over the past several months the NSIDC DAAC has
been in negotiations with a local MAID manufacturing firm for the purpose of acquiring a
MAID subsystem as a replacement for the StorageTek 9710 subsystem. The name of the
company is Copan Systems. Copan expects to announce its MAID product during
April 2004. During these negotiations the NSIDC DAAC has investigated risks associated with
the MAID such as disk failure, data corruption, and lack of redundancy. The DAAC feels
these risks are no more severe than with any archive technology. The MAID has a
sufficient scale of redundancy to mitigate issues related to disk failure. MAID is similar to
RAID, and the MAID subsystem being evaluated by the DAAC incorporates redundancy
at the Logical Unit level as well as at the disk level. In addition, the MAID subsystem
NSIDC is considering purchasing has a unique disk management function that monitors
and predicts potential failure of any disk, automatically moving the data from a disk
predicted to fail to a healthy spare disk.
The NSIDC DAAC intends to work closely with the MAID vendor to assess risks related
to network vulnerabilities of the MAID subsystem. The MAID manufactured by Copan
Systems is deployed with a Linux operating system. NSIDC staff will conduct a rigorous
review of the operating system environment in order to assess the security policies of the
configuration.
As a final mitigation strategy, the DAAC will retain the Version 0 data in the current
StorageTek tape library until the Copan MAID subsystem passes the DAAC acceptance
criteria. If the MAID subsystem fails DAAC acceptance criteria, NSIDC will still have time
to implement another migration strategy.
5. Plan for Migrating NSIDC Version 0 Archive to MAID Technology
This section defines the tasks required for migrating the NSIDC DAAC Version 0 archive
to MAID technology.
The DAAC will conduct the following activities prior to the implementation of the data
migration from the StorageTek 9710 digital tape library to the MAID architecture:

Complete a site review by Copan Systems engineers to verify that adequate floor
space, electrical power, and cooling are available for the MAID subsystem.

Procure the two-shelf MAID subsystem (56 terabyte raw disk capacity) including
standard warranty and support agreement from Copan Systems. Additional
shelves (28 terabyte raw disk capacity) can be purchased at any time. The
subsystem configuration supports up to eight shelves.

Procure a server from Sun Microsystems with Fibre Channel interfaces. The
server will host the AMASS FSM software.

Negotiate with ADIC for a temporary AMASS license to install on the Sun server
until completion of the data migration activities.

Complete the relocation of the ESS/HSA server to the NSIDC DAAC ECS facility.
This activity must be completed in order to ensure sufficient floor space is
available for the MAID.

Develop test scenarios to validate operational needs are met with the MAID
subsystem.

Schedule installation and checkout of the Sun server.

Schedule installation and checkout of the MAID subsystem.

Review network security policies of the MAID subsystem and, as necessary, make
recommendations to the manufacturer to enhance the subsystem security.

Conduct operational tests against the integrated Sun server and MAID
subsystem configuration.
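The shelf capacities quoted in the procurement item above can be checked against the archive volume with a short calculation. This is a sketch only: the shelf size, shelf limit, and archive volume are the figures given in this document, while the usable fraction of raw capacity (after redundancy and spare disks) is a hypothetical parameter that the document does not state.

```python
import math

SHELF_TB = 28      # raw capacity per shelf, from the procurement item
MAX_SHELVES = 8    # maximum number of shelves the subsystem supports
ARCHIVE_TB = 12    # approximate current Version 0 archive volume


def raw_capacity_tb(shelves: int) -> int:
    """Raw disk capacity in terabytes for a given number of shelves."""
    if not 1 <= shelves <= MAX_SHELVES:
        raise ValueError(f"shelves must be between 1 and {MAX_SHELVES}")
    return shelves * SHELF_TB


def shelves_needed(data_tb: float, usable_fraction: float) -> int:
    """Shelves required to hold data_tb.

    usable_fraction is the share of raw capacity left after redundancy and
    spares; it is a hypothetical input, not a vendor figure.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    shelves = max(1, math.ceil(data_tb / (SHELF_TB * usable_fraction)))
    if shelves > MAX_SHELVES:
        raise ValueError("data volume exceeds a fully configured subsystem")
    return shelves
```

For example, raw_capacity_tb(2) gives the 56 terabytes of the two-shelf configuration and raw_capacity_tb(8) gives the 224 terabytes of a full rack. With an assumed usable fraction of 0.75, shelves_needed(ARCHIVE_TB, 0.75) indicates that a single shelf would hold the current archive, leaving the second shelf for growth.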
The following activities will begin after the MAID subsystem has been validated for
operational use:

Retrieve data via AMASS from the StorageTek 9710 and ingest the data to the
MAID archive configuration.

Update metadata database to reflect location of the data stored in the MAID
archive subsystem.

Remove the StorageTek 9710 subsystem and the SGI server from the Version 0
architecture.

Transfer permanent AMASS license from the SGI server to the Sun server.
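Because AMASS presents the archive as a Unix filesystem, the retrieve-and-ingest step listed above can be sketched as a verified copy between two directory trees. This is an illustrative sketch, not the DAAC's migration procedure: the function names are hypothetical, the checksum comparison is an added safeguard not described in this plan, and the metadata database update is represented only by the returned mapping of old relative paths to new locations.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(source_root: Path, target_root: Path) -> dict:
    """Copy every file under source_root to target_root and verify each copy.

    Returns a mapping of relative path -> new location, which a separate
    step could use to update the metadata database.
    """
    new_locations = {}
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        target = target_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch for {relative}")
        new_locations[str(relative)] = str(target)
    return new_locations
```

In use, source_root would be the mount point of the AMASS filesystem on the existing server and target_root the mount point backed by the MAID; both paths are site-specific and are left to the caller. Note that the verification step reads each source file a second time, which may be costly when the source is tape.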
The diagram below depicts interfaces to the Version 0 archive after the MAID subsystem
(Copan Systems 200t) is fully functional. Note there is little difference from the interfaces
to the archive subsystem shown in Figure 1.
Figure 2. NSIDC DAAC Version 0 Archive Interfaces with MAID Subsystem
[Diagram not reproduced. Components shown: User Communities; Off-site Archive; Tape and
CD-R/DVD Production; GISMO/PSQ requests; Science Processing; External Providers; ftp
services; Sun Microsystems Server; AMASS FSMS; Copan Systems 200t (2-shelf subsystem).
Flows are labeled Data in, Data out, and Data Request.]