A NOAA/NASA Pilot Project for the Earth Observing System (EOS)

A NOAA/NASA Pilot Project for the
Preservation of MODIS Data from the
Earth Observing System (EOS)
Robert H. Rank
NOAA/NESDIS
Kenneth R. McDonald
NASA/GSFC
Topics
• Data Life Cycle Background
• NASA’s Earth Science Data
• NOAA Data Centers & Mission
• Guiding Principles
• CLASS Overview and Goals
• ‘Long-Term’ Archive Challenges
• MODIS Pilot Project
– Reference Model
– OAIS Responsibilities
– Data Submission Agreement (DSA)
– Schedule
– Expected Outcomes
• Experience Using OAIS
• Conclusion
Data Life Cycle
A simple model showing the four major lifecycle entities within a
context of an overall set of guiding policies
[Figure: the four lifecycle entities (mission, science product generation,
active archive, long-term archive) enclosed by Overall Lifecycle Policies]
Some of these functions may be grouped together in any given
mission or project
Long Term Archive Requirements
• NASA shares the responsibility for stewardship of its
Earth science data resources with NOAA and USGS.
– NASA holds the responsibility for its data during the life of
each mission plus four years.
– NOAA and USGS provide the long term archive for ocean
and atmosphere data and land processes data, respectively.
• Agreements are in place between NASA and NOAA
and between NASA and USGS that document these
responsibilities.
NASA’s Earth Observing System (EOS)
Data and Information System (EOSDIS)
[Figure: EOSDIS end-to-end data flow]
Stages shown:
• Data Acquisition
• Flight Operations, Data Capture, Initial Processing, Backup Archive
• Data Transport to DAACs/SIPS
• Science Data Processing, Info Mgmt, Data Archive, & Distribution
• Distribution, Access, Interoperability, Reuse
Elements shown:
• EOS Spacecraft, Tracking & Data Relay Satellite (TDRS), White Sands
Complex (WSC), EOS Polar Ground Stations
• Data Processing & Mission Control, Instrument Teams and SIPSs,
Distributed Active Archive Centers
• Internet (search, order, distribution) and Media (distribution)
• Research Users, Education Users, Public, Value-Added Providers
• Interagency Data Centers, Int’l Partners & Data Centers
• RESACs, RACs, ESIP 2/3’s
EOSDIS Overview
• EOSDIS Functions:
– A production capability for standard data products from EOS
instruments
– An “active archive” of Earth science data from EOS and other
past and present missions
– A distributed information framework (data centers, SIPS,
networks, interoperability infrastructure)
• EOSDIS Operations:
– Supporting EOS missions since 1999 and heritage data archives
since 1994.
– Operations at 8 DAACs and 13 SIPS.
– Total archive of over 4 petabytes, growing at 4 terabytes per day
– Over 200,000 distinct users obtaining data from DAACs
– Annual distribution of 33 million data products, 2 TB per day.
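The operations figures above imply a few derived quantities. The following is a minimal sketch of that arithmetic, assuming decimal units (1 PB = 1000 TB), a 365-day year, and that the quoted rates stay constant. The variable names are illustrative and do not come from EOSDIS.

```python
# Figures quoted on the slide (decimal units assumed: 1 PB = 1000 TB).
ARCHIVE_PB = 4.0           # total archive size
GROWTH_TB_PER_DAY = 4.0    # archive growth rate
DIST_TB_PER_DAY = 2.0      # distribution volume
PRODUCTS_PER_YEAR = 33e6   # data products distributed annually
DAYS_PER_YEAR = 365        # assumption

# Yearly archive growth in petabytes.
growth_pb_per_year = GROWTH_TB_PER_DAY * DAYS_PER_YEAR / 1000

# Days until the archive doubles if growth stays constant.
days_to_double = ARCHIVE_PB * 1000 / GROWTH_TB_PER_DAY

# Average size of a distributed product in megabytes (1 TB = 1e6 MB).
avg_product_mb = DIST_TB_PER_DAY * 1e6 * DAYS_PER_YEAR / PRODUCTS_PER_YEAR

print(f"growth: {growth_pb_per_year:.2f} PB/year")
print(f"doubling time at constant rate: {days_to_double:.0f} days")
print(f"average distributed product: {avg_product_mb:.1f} MB")
```

Under these assumptions the archive grows by roughly 1.5 PB per year and would double in about 1000 days, which gives a sense of the volumes a long-term archive has to plan for.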
NOAA’s National Data Centers - Environmental Data Stewards
• Scientific Data Stewardship (SDS) is ownership, knowledge,
utilization, and application of the data
• CLASS is the Information Technology infrastructure (hardware and
software environment, and tools) underpinning SDS
• Data Rescue preserves and makes available historical data sets
from obsolete media
NOAA’s National Data Centers
• NOAA’s National Data Centers are major archive, access, and
assessment sites maintaining, processing, and distributing
environmental and geospatial data.
– National Climatic Data Center – WWW.NCDC.NOAA.GOV
• Asheville, NC
– National Coastal Data Development Center – WWW.NCDDC.NOAA.GOV
• Stennis, MS
– National Geophysical Data Center – WWW.NGDC.NOAA.GOV
• Boulder, CO
– National Oceanographic Data Center – WWW.NODC.NOAA.GOV
• Silver Spring, MD
NOAA’s National Data Centers
(Continued)
• These Centers provide long-term stewardship for most of
NOAA’s environmental and geospatial data, and a broad range
of user services.
• They serve as both:
– Centers of Data -- facilities where extensive collections of given
environmental parameter(s) are maintained because of individual or
institutional research or operational requirements
– Agency Record Centers -- facilities where data is made accessible to a
large user community, as well as being preserved and protected to
certain standards
Guiding Principles
All NOAA environmental data will…
• be made accessible to broader data integration efforts, such as the
Global Earth Observation System of Systems (GEOSS)
• reside in secure archives conforming to National Archives and
Records Administration (NARA) and Continuity Of Operations
(COOP) standards
• be maintained at the highest standards of scientific data stewardship
• be searchable using advanced data discovery tools, to facilitate
interdisciplinary studies
• be accessible through a common portal available to the scientific
community, commercial sector, and general public based on
advanced access tools
Comprehensive Large Array-data
Stewardship System (CLASS)
NOAA’s National Data Centers and their worldwide clientele
look to CLASS as the sole NOAA IT infrastructure project in which “all” of
NOAA’s current and future environmental data sets will reside.
CLASS provides permanent, secure storage, and safe, efficient data discovery
and access between the Data Centers and the customers.
WHY a CLASS?
• Fulfill NOAA’s legal requirement to provide for archive
and access to its data
• The source for the vast majority of observational
environmental data generated by NOAA.
• Provide critical products to Customers:
– Public and Private Research & Development efforts
• Colleges and Universities
– Federal, State, and Local Climatologists
– Agriculture Users, Drought Monitors, and Flood Management
– Accident Investigators & Legal Community
– Coastal Monitoring, Algae Blooms, and Fishing Management
CLASS Overview
• CLASS is a web-based data archive and distribution
system for NOAA’s environmental data
• CLASS is an evolving system that will support
additional “campaigns,” a broader user base, and new
functionality as implementation continues over the next 10
years
• CLASS is the principal IT system supporting NOAA’s
responsibility as environmental data stewards
– CLASS concurrently supports both ongoing operations and new requirements
implementation
CLASS Campaigns
– NOAA and Department of Defense (DoD) Polar-orbiting Operational
Environmental Satellites (POES) and Defense Meteorological Satellite
Program (DMSP)
– NOAA Geostationary Operational Environmental Satellites (GOES)
– EUMETSAT Meteorological Operational Satellite (Metop) Program
– NOAA NEXT generation weather RADAR (NEXRAD) Program and
future dual polarized and phased-array radars.
– National Aeronautics and Space Administration (NASA) Earth Observing
System (EOS) Moderate Resolution Imaging Spectroradiometer (MODIS)
– The NPOESS Preparatory Project (NPP)
– National Polar-orbiting Operational Environmental Satellite System
(NPOESS)
– National Centers for Environmental Prediction Model Datasets,
including Reanalysis Products
CLASS GOALS
• Give any potential customer access to all NOAA (and
possibly non-NOAA) data through a single portal
• Eliminate the need to keep creating “stovepipe” systems
for each new type of data and, as much as possible, reuse
already polished portions/modules of existing legacy
systems
• Describe a cost-effective architecture that primarily
handles large array data sets but can also handle
smaller data sets
CLASS Summary
• A NOAA-wide Data Management System (DMS) can evolve from CLASS by
initially integrating with the NOAA National Data Centers and ultimately with
the NOAA Centers of Data
• The CLASS backbone will provide the DMS for large-array (largely NESDIS)
data sets, but also provide secure archival services to other NESDIS and
NOAA users who participate in the NMMR and NOAA-Server
• This approach will leverage the resources of CLASS, NVDS, SDS, and the
various funding vehicles being used by non-NESDIS NOAA organizational
components
• This semi-distributed architecture, with central data and metadata archives,
is built on international standards and will allow future integration of NOAA
systems into GEOSS
• CLASS will be the NOAA archive for NPP/NPOESS, EOS and GOES-R data
• CLASS is accessible via the web at www.class.noaa.gov
LTA Challenges
• What data are needed for long-term archive?
• How is long-term preservation achieved?
• What services do users need to deal with these data
volumes?
• What are the people vs. machine issues?
• How will new technology help?
• Metrics for assessing how we are doing…
• National Research Council Panel enabled to help address
this issue
…challenges
• The archived information must be useable by
consumers who are separated in time, distance
and background from the producers
– producers no longer available
• cannot answer questions on ad-hoc basis
– producers’ software not supported - may be obsolete
• knowledge captured by the software becomes unavailable
– documentation is lost over time
...challenges
• The user community will change over time
– new community will be unfamiliar with the background to the
information
– may use different analysis environment
– may want to combine information from many sources
• The archive will change over time
– migration to new technology - hardware/software
• may require reorganization of information
• possible changes in implicit relationships
– migration to different institutions
– possible changes to management, data structure, file format
MODIS Pilot Project
• Purpose is to define system interfaces and implement
transfer capability.
– Established MODIS L0 and L1B as initial candidates for
data transfer - L0 is stable and L1B has high user demand.
– Team includes representatives from ESDIS, GES DAAC,
MODIS SDST, CLASS/Suitland, NCDC, NGDC, Fairmont,
WV.
• Established the collaboration tools and methods.
– Pilot Plan – Actions and Schedules
– Working Group Charter
• Following Open Archival Information System
Standard (OAIS) as an LTA Reference Model.
Pilot Schedule Highlights
• CLASS Operational at NSOF (ingest node) – Jan 2006
• Prototype 1 week MODIS L0 transfer – Feb 2006
• Prototype 1 month L0 transfer (6x rate) – Mar 2006
• Evaluate L0 continuous feed (40 days) – Jun 2006
• DSA for MODIS L1B – Feb 2006
• ICD for MODIS L1B – Apr 2006
• Prototype 1 week L1B transfer – May 2006
• Prototype 1 month L1B transfer (6x rate) – Jun 2006
• Evaluate continuous feed (20 days) – Jul 2006
• Access and Delivery Capability – Aug 2006
• Pilot Project Evaluation Report – Oct 2006
• Project Plan for NOAA MODIS LTA – Dec 2006
• NOAA/NASA Panel Report – Dec 2006
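The “6x rate” prototype transfers in the schedule can be put in rough numbers. The sketch below assumes that “6x rate” means moving data at six times the rate at which it was originally acquired, and that a month is 30 days. Both are interpretations rather than statements from the pilot plan, and the function names are hypothetical.

```python
def transfer_days(data_days: float, rate_multiple: float) -> float:
    """Days needed to move `data_days` worth of data at `rate_multiple` x real time."""
    if rate_multiple <= 0:
        raise ValueError("rate_multiple must be positive")
    return data_days / rate_multiple


def backlog_catch_up_days(backlog_days: float, rate_multiple: float) -> float:
    """Days to clear a backlog while new data keeps arriving at 1x real time."""
    if rate_multiple <= 1:
        raise ValueError("cannot catch up unless rate_multiple > 1")
    # Net progress against the backlog is (rate_multiple - 1) days of data per day.
    return backlog_days / (rate_multiple - 1)


one_month_at_6x = transfer_days(30, 6)            # one month of data (30 days assumed)
one_year_backlog = backlog_catch_up_days(365, 6)  # hypothetical one-year backlog
print(one_month_at_6x, one_year_backlog)
```

Under this reading, a month of data moves in about five days, and a transfer link that only matches the acquisition rate could never work off an existing backlog while new data keeps arriving.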
MODIS Pilot Project –
Expected Outcomes
• NASA and NOAA will have a better hands-on
understanding of the capabilities, conventions (e.g.,
data model), standards, and processes of their respective
systems.
• A draft set of interface documentation (DSA, ICD,
etc.).
• An interface between EOSDIS and CLASS - defined
and exercised.
• An actual demonstration of CLASS support for EOS
data.
• The foundation for the development of a sound
NASA/NOAA LTA plan.
Reference Model
• A Reference Model is needed to provide a common
framework for discussion & description
• A major aim is to facilitate a much wider understanding
of what is required to preserve information for the long
term
• Facilitates description and comparison of archives
• Provides a basis for further standardization
– help broaden the market for commercial providers
...Reference Model
• We are particularly concerned with Long-Term
Preservation of digital information
– long term is long enough to be concerned about
changing technologies
– not just bit preservation
– starting point for model addressing non-digital
information
......Reference Model
• But this work is also of use for “Short-Term archives”
because
– technological change is rapid (years, not decades)
– the short-term archive may eventually hand information
over to another, longer-term, archive
Areas for Standards to follow
• Interfaces between OAIS type archives
• Submission to OAIS (SIP)
• Dissemination from OAIS (DIP)
• Search & retrieve metadata from OAIS
– Sufficient information should be provided to ensure the rendered
content may be interpreted and understood by its intended users.
• Information migration
– Procedures should indicate the file format and version to be
created and software used to create it.
• Provenance
– A description of the content history, including its origins,
changes to the object or its content over time, and its chain of
custody (if known).
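The migration and provenance items above can be pictured as a small record structure. This is a minimal sketch, not a format defined by OAIS or used by CLASS. The class names, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MigrationEvent:
    """One migration: the format and version created and the software used."""
    date: str                        # ISO 8601 date of the migration
    target_format: str               # file format created
    target_version: str              # version of that format
    software: str                    # software used to create it
    custodian: Optional[str] = None  # chain of custody, if known


@dataclass
class ProvenanceRecord:
    """Content history: origin plus the ordered list of changes over time."""
    object_id: str
    origin: str
    events: List[MigrationEvent] = field(default_factory=list)

    def record_migration(self, event: MigrationEvent) -> None:
        self.events.append(event)

    def current_format(self) -> Optional[str]:
        """Format and version after the latest migration, or None if never migrated."""
        if not self.events:
            return None
        last = self.events[-1]
        return f"{last.target_format} {last.target_version}"


# Placeholder values, for illustration only.
record = ProvenanceRecord(object_id="granule-001", origin="example producer")
record.record_migration(
    MigrationEvent("2006-06-01", "FormatA", "2", "example-converter 1.0")
)
print(record.current_format())
```

Keeping the events as an append-only list preserves the order of changes, which is what lets a later consumer reconstruct how the object reached its current form.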
OAIS Responsibilities
• Negotiates & accepts Submission IPs
• Determines communities which need to be able to
understand Content Information
• Ensures information to be preserved is understandable
to designated communities
• Assumes sufficient control of information to be able to
ensure long-term preservation
• Follows policies & procedures to ensure information is
preserved
• Makes the information available to the designated
communities in appropriate forms
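One concrete piece of “accepting Submission IPs” is checking that what arrived matches what the producer said it sent. The sketch below compares submitted files against a producer-supplied checksum manifest. OAIS does not prescribe this mechanism, and the slides do not say the pilot uses it. SHA-256, the manifest layout, and the function names are assumptions.

```python
import hashlib
from typing import Dict, List


def sha256_hex(payload: bytes) -> str:
    """Hex digest used as the fixity value for one file."""
    return hashlib.sha256(payload).hexdigest()


def validate_submission(files: Dict[str, bytes], manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the submission can be accepted."""
    problems = []
    # Files the producer listed but did not deliver.
    for name in sorted(manifest.keys() - files.keys()):
        problems.append(f"missing file: {name}")
    # Files delivered but not listed in the manifest.
    for name in sorted(files.keys() - manifest.keys()):
        problems.append(f"unlisted file: {name}")
    # Files present on both sides whose content does not match.
    for name in sorted(files.keys() & manifest.keys()):
        if sha256_hex(files[name]) != manifest[name]:
            problems.append(f"checksum mismatch: {name}")
    return problems


# Hypothetical submission held in memory for illustration.
payload = {"granule.dat": b"example bytes"}
listing = {"granule.dat": sha256_hex(b"example bytes")}
print(validate_submission(payload, listing))
```

A check like this covers only bit-level integrity. Making the content understandable to the designated communities, as the responsibilities above require, needs documentation and metadata beyond checksums.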
Data Submission Agreement (DSA)
• Include all the information that is necessary for the
producer to provide data products to the archive and for
the archive to receive the data products from the producer
– It seems a daunting (or rather an impossible) task to collect all of
the information listed above in a single document in a timely
fashion.
• Need a high-level agreement in place before we proceed to
specify the details of the Producer-Archive interface and
to design the respective systems.
– There is no way of compiling operational information until near
the start of the operational phase.
DSA Groupings
• High-level agreement
– the content is rather static (i.e., temporally stable) and provides a framework
for both the Producer and the Archive to move on to defining details.
• Detailed level interface and some functional specification
– the content is somewhat dynamic (i.e., changing with time) and requires
the Producer and the Archive to do some in-depth studies.
• Operational information
– the content is not available until near the time when the data flows commence
(e.g., IP addresses, host directory names, operations contacts)
• Quasi-static metadata details
– the definite content is hard to come by, especially for planned spacecraft
missions.
• More??
DSA Groupings
• With these considerations in mind, we suggest that the
Producer-Archive Agreement be:
– Divided into several separate documents
– Each signed at a different management/technical
level and at a different time:
• Memorandum of Agreement (MOA)
• Interface Control Document (ICD)
• Operations Agreement (OA)
• Quasi-Static Metadata Specification (QSMS)
• Others?
DSA Groupings
• The MOA should be developed early on and signed by
high-level management of both parties.
– It should provide a firm ground for detailed level technical work
to proceed.
– Any details that will become clearer later or simply are unknown
will have to be deferred to the lower level components of the
agreement (i.e., ICD, OA, and QSMS).
• Once the MOA is signed, both parties may start
developing the ICD and QSMS.
– These form the basis for the design of the physical systems (for both
the Producer and the Archive).
• The OA can wait until the time of the system I&T
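The sequencing described above can be written down as a small dependency table. This is a sketch only. The slides state that the ICD and QSMS follow the MOA and that the OA can wait until system I&T. Making the OA depend on the ICD is an added assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Set

# Prerequisites per document. ICD and QSMS after MOA comes from the slides;
# the OA's dependency on the ICD is an assumption for illustration.
PREREQUISITES: Dict[str, List[str]] = {
    "MOA": [],
    "ICD": ["MOA"],
    "QSMS": ["MOA"],
    "OA": ["MOA", "ICD"],
}


def ready_to_start(signed: Set[str]) -> List[str]:
    """Documents not yet signed whose prerequisites have all been signed."""
    return sorted(
        doc
        for doc, needs in PREREQUISITES.items()
        if doc not in signed and all(n in signed for n in needs)
    )


print(ready_to_start(set()))           # nothing signed yet
print(ready_to_start({"MOA"}))         # after the MOA is signed
print(ready_to_start({"MOA", "ICD"}))  # after the ICD is signed as well
```

Writing the order down this way makes it easy to see which documents are blocked at any point, which is the practical argument for splitting the agreement into separately signed parts.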
DSA Groupings
• Depending on the circumstances, the Producer and the
Archive may include additional documents.
– There are certain items that do not belong in the MOA and yet
are not covered by the ICD.
– We may call it Supplement to the MOA (yet still separate from
the MOA).
• This approach of creating the Producer-Archive
Agreement in multiple volumes and releasing each
sequentially in time appears to be far better than the
current approach of creating a single volume agreement.
Use of OAIS
• Benefits
– Good overall framework of terms, functions and processes to
structure the LTA discussion
– Identifies a set of documents to capture and record requirements
and specifications
• Challenges
– Timing - starting to use OAIS in the middle of the data life cycle
of EOS data has been difficult
– Complexity of EOS LTA requirements - numbers of products,
data volumes, processing S/W
– Overload on Data Submission Agreement - Interface
Requirements Document, Interface Control Document,
Operations Agreement
Conclusions
• Transfer of NASA’s Earth science data to NOAA for long-term
preservation and stewardship is a major undertaking
• NOAA/NASA MODIS Pilot Project - way to get started
• Project provides great case study for use of OAIS
Reference Model
– Services to project and source of feedback on RM
• Still learning how to best use OAIS
• Expect that as OAIS is more widely used, over entire data
life cycle, it will be even more valuable