A NOAA/NASA Pilot Project for the Preservation of MODIS Data from the Earth Observing System (EOS)

Robert H. Rank, NOAA/NESDIS
Kenneth R. McDonald, NASA/GSFC

Topics
• Data Life Cycle Background
• NASA’s Earth Science Data
• NOAA Data Centers & Mission
• Guiding Principles
• CLASS Overview and Goals
• ‘Long-Term’ Archive Challenges
• MODIS Pilot Project
  – Reference Model
  – OAIS Responsibilities
  – Data Submission Agreement (DSA)
  – Schedule
  – Expected Outcomes
• Experience Using OAIS
• Conclusion

Data Life Cycle
A simple model showing the four major lifecycle entities within the context of an overall set of guiding policies.
[Figure: the four lifecycle entities (mission, science product generation, active archive, long-term archive) set within Overall Lifecycle Policies]
Some of these functions may be grouped together in any given mission or project.

Long Term Archive Requirements
• NASA shares the responsibility for stewardship of its Earth science data resources with NOAA and USGS.
  – NASA holds the responsibility for its data during the life of each mission plus four years.
  – NOAA and USGS provide the long-term archive for ocean and atmosphere data and land processes data, respectively.
• Agreements are in place between NASA and NOAA and between NASA and USGS that document these responsibilities.
NASA’s Earth Observing System (EOS) Data and Information System (EOSDIS)
[Figure: EOSDIS data flow. Stages: Data Acquisition; Flight Operations, Data Capture, Initial Processing, Backup Archive; Data Transport to DAACS/SIPS; Science Data Processing, Info Mgmt, Data Archive, & Distribution; Distribution, Access, Interoperability, Reuse. Labeled elements: EOS Spacecraft; Tracking & Data Relay Satellite (TDRS); White Sands Complex (WSC); EOS Polar Ground Stations; Data Processing & Mission Control; Instrument Teams and SIPSs; Distributed Active Archive Centers; RESACs; RACs; ESIP 2/3’s; Internet (search, order, distribution); Media (distribution); Research Users; Education Users; Public; Value-Added Providers; Interagency Data Centers; Int’l Partners & Data Centers]

EOSDIS Overview
• EOSDIS Functions:
  – A production capability for standard data products from EOS instruments
  – An “active archive” of Earth science data from EOS and other past and present missions
  – A distributed information framework (data centers, SIPS, networks, interoperability infrastructure)
• EOSDIS Operations:
  – Supporting EOS missions since 1999 and heritage data archives since 1994
  – Operations at 8 DAACs and 13 SIPS
  – Total archive of over 4 petabytes, growing at 4 terabytes per day
  – Over 200,000 distinct users obtaining data from DAACs
  – Annual distribution of 33 million data products, 2 TB per day

NOAA’s National Data Centers - Environmental Data Stewards
• Scientific Data Stewardship (SDS) is ownership, knowledge, utilization, and application of the data
• CLASS is the Information Technology infrastructure (hardware and software environment, and tools) underpinning SDS
• Data Rescue preserves and makes available historical data sets from obsolete media

NOAA’s National Data Centers
• NOAA’s National Data Centers are major archive, access, and assessment sites maintaining, processing, and distributing environmental and geospatial data.
  – National Climatic Data Center – WWW.NCDC.NOAA.GOV
    • Asheville, NC
  – National Coastal Data Development Center – WWW.NCDDC.NOAA.GOV
    • Stennis, MS
  – National Geophysical Data Center – WWW.NGDC.NOAA.GOV
    • Boulder, CO
  – National Oceanographic Data Center – WWW.NODC.NOAA.GOV
    • Silver Spring, MD

NOAA’s National Data Centers (Continued)
• These Centers provide long-term stewardship for most of NOAA’s environmental and geospatial data, and a broad range of user services.
• They serve as both:
  – Centers of Data: facilities where extensive collections of given environmental parameter(s) are maintained because of individual or institutional research or operational requirements
  – Agency Record Centers: facilities where data is made accessible to a large user community, as well as being preserved and protected to certain standards

Guiding Principles
All NOAA environmental data will…
• be made accessible to broader data integration efforts, such as the Global Earth Observation System of Systems (GEOSS)
• reside in secure archives conforming to National Archives and Records Administration (NARA) and Continuity Of Operations (COOP) standards
• be maintained at the highest standards of scientific data stewardship
• be searchable using advanced data discovery tools, to facilitate interdisciplinary studies
• be accessible through a common portal, based on advanced access tools, available to the scientific community, commercial sector, and general public

Comprehensive Large Array-data Stewardship System (CLASS)
NOAA's National Data Centers and their worldwide customers look to CLASS as the sole NOAA IT infrastructure project in which “all” of NOAA’s current and future environmental data sets will reside. CLASS provides permanent, secure storage, and safe, efficient data discovery and access between the Data Centers and the customers.

WHY a CLASS?
• Fulfill NOAA’s legal requirement to provide for archive of and access to its data
• Serve as the source for the vast majority of observational environmental data generated by NOAA
• Provide critical products to customers:
  – Public and Private Research & Development efforts
    • Colleges and Universities
  – Federal, State, and Local Climatologists
  – Agriculture Users, Drought Monitors, and Flood Management
  – Accident Investigators & Legal Community
  – Coastal Monitoring, Algae Blooms, and Fishing Management

CLASS Overview
• CLASS is a web-based data archive and distribution system for NOAA’s environmental data
• CLASS is an evolving system that will support additional “campaigns,” a broader user base, and new functionality as implementation continues over the next 10 years
• CLASS is the principal IT system supporting NOAA’s responsibility as environmental data stewards
  – CLASS concurrently supports both ongoing operations and new requirements implementation

CLASS Campaigns
  – NOAA and Department of Defense (DoD) Polar-orbiting Operational Environmental Satellites (POES) and Defense Meteorological Satellite Program (DMSP)
  – NOAA Geostationary Operational Environmental Satellites (GOES)
  – EUMETSAT Meteorological Operational Satellite (Metop) Program
  – NOAA NEXT generation weather RADAR (NEXRAD) Program and future dual polarized and phased-array radars
  – National Aeronautics and Space Administration (NASA) Earth Observing System (EOS) Moderate Resolution Imaging Spectroradiometer (MODIS)
  – The NPOESS Preparatory Project (NPP)
  – National Polar-orbiting Operational Environmental Satellite System (NPOESS)
  – National Centers for Environmental Prediction Model Datasets, including Reanalysis Products

CLASS GOALS
• Give any potential customer access to all NOAA (and possibly non-NOAA) data through a single portal
• Eliminate the need to keep creating “stovepipe” systems for each new type of data, and instead reuse, as much as possible, already polished portions/modules of existing legacy systems
• Describe a cost-effective architecture that primarily handles large array data sets but is also capable of handling smaller data sets

CLASS Summary
• A NOAA-wide Data Management System (DMS) can evolve from CLASS by initially integrating with the NOAA National Data Centers and ultimately with the NOAA Centers of Data
• The CLASS backbone will provide the DMS for large-array (largely NESDIS) data sets, but also provide secure archival services to other NESDIS and NOAA users who participate in the NMMR and NOAA-Server
• This approach will leverage the resources of CLASS, NVDS, SDS, and the various funding vehicles being used by non-NESDIS NOAA organizational components
• This semi-distributed architecture, with central data and metadata archives built on international standards, will allow future integration of NOAA systems into GEOSS
• CLASS will be the NOAA archive for NPP/NPOESS, EOS and GOES-R data
• CLASS is accessible via the web at www.class.noaa.gov

LTA Challenges
• What data are needed for long-term archive?
• How is long-term preservation achieved?
• What services do users need to deal with these data volumes?
• What are the people vs. machine issues?
• How will new technology help?
• Metrics for assessing how we are doing…
• National Research Council Panel enabled to help address this issue

...challenges
• The archived information must be usable by consumers who are separated in time, distance and background from the producers
  – producers no longer available
    • cannot answer questions on an ad hoc basis
  – producers’ software not supported, may be obsolete
    • knowledge captured by the software becomes unavailable
  – documentation is lost over time

...challenges
• The user community will change over time
  – new community will be unfamiliar with the background to the information
  – may use different analysis environment
  – may want to combine information from many sources
• The archive will change over time
  – migration to new technology (hardware/software)
    • may require reorganization of information
    • possible changes in implicit relationships
  – migration to different institutions
  – possible changes to management, data structure, file format

MODIS Pilot Project
• Purpose is to define system interfaces and implement transfer capability.
  – Established MODIS L0 and L1B as initial candidates for data transfer: L0 is stable and L1B has high user demand.
  – Team includes representatives from ESDIS, GES DAAC, MODIS SDST, CLASS/Suitland, NCDC, NGDC, Fairmont, WV.
• Established the collaboration tools and methods.
  – Pilot Plan
  – Actions and Schedules
  – Working Group Charter
• Following the Open Archival Information System (OAIS) standard as an LTA Reference Model.
Pilot Schedule Highlights
• CLASS Operational at NSOF (ingest node) – Jan 2006
• Prototype 1 week MODIS L0 transfer – Feb 2006
• Prototype 1 month L0 transfer (6x rate) – Mar 2006
• Evaluate L0 continuous feed (40 days) – Jun 2006
• DSA for MODIS L1B – Feb 2006
• ICD for MODIS L1B – Apr 2006
• Prototype 1 week L1B transfer – May 2006
• Prototype 1 month L1B transfer (6x rate) – Jun 2006
• Evaluate continuous feed (20 days) – Jul 2006
• Access and Delivery Capability – Aug 2006
• Pilot Project Evaluation Report – Oct 2006
• Project Plan for NOAA MODIS LTA – Dec 2006
• NOAA/NASA Panel Report – Dec 2006

MODIS Pilot Project – Expected Outcomes
• NASA and NOAA will have a better hands-on understanding of the system capabilities, conventions (e.g., data model), standards, and processes of their respective systems.
• A draft set of interface documentation (DSA, ICD, etc.).
• An interface between EOSDIS and CLASS, defined and exercised.
• An actual demonstration of CLASS support for EOS data.
• The foundation for the development of a sound NASA/NOAA LTA plan.
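The "6x rate" entries in the pilot schedule suggest a simple piece of arithmetic: replaying a fixed span of data faster than real time shortens the transfer proportionally. The sketch below is illustrative only. It assumes that "6x rate" means six times the real-time data production rate and that a month is 30 days; neither is stated in the slides, and the function name is hypothetical.

```python
def replay_duration_days(days_of_data: float, rate_multiplier: float) -> float:
    """Wall-clock days needed to transfer `days_of_data` days of data
    when the transfer runs at `rate_multiplier` times the real-time rate."""
    if days_of_data < 0:
        raise ValueError("days_of_data must be non-negative")
    if rate_multiplier <= 0:
        raise ValueError("rate_multiplier must be positive")
    return days_of_data / rate_multiplier


if __name__ == "__main__":
    # Assumed inputs: a 30-day month replayed at 6x the real-time rate.
    print(replay_duration_days(30, 6))  # 5.0
    # A one-week prototype at the real-time rate takes the full week.
    print(replay_duration_days(7, 1))   # 7.0
```

Under these assumptions a one-month prototype transfer would occupy about five days of wall-clock time, which is one reason an accelerated replay is a practical way to exercise an interface before committing to a continuous feed.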
Reference Model
• A Reference Model is needed to provide a common framework for discussion & description
• A major aim is to facilitate a much wider understanding of what is required to preserve information for the long term
• Facilitates description and comparison of archives
• Provides a basis for further standardization
  – helps broaden the market for commercial providers

...Reference Model
• We are particularly concerned with Long-Term Preservation of digital information
  – long term is long enough to be concerned about changing technologies
  – not just bit preservation
  – starting point for model addressing non-digital information

......Reference Model
• But this work is also of use for “Short-Term archives” because
  – technological change is rapid (years, not decades)
  – the short-term archive may eventually hand information over to another, longer-term, archive

Areas for Standards to follow
• Interfaces between OAIS type archives
• Submission to OAIS (SIP)
• Dissemination from OAIS (DIP)
• Search & retrieve metadata from OAIS
  – Sufficient information should be provided to ensure the rendered content may be interpreted and understood by its intended users.
• Information migration
  – Procedures should indicate the file format and version to be created and software used to create it.
• Provenance
  – A description of the content history, including its origins, changes to the object or its content over time, and its chain of custody (if known).
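The migration and provenance items above describe information that must travel with each archived object. One way to picture this is as a small record structure with a completeness check. The sketch below is a hypothetical illustration, not a structure defined by OAIS, CLASS, or EOSDIS: every class name, field name, and sample value is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceEvent:
    """One step in the content history (origin, change, or custody transfer)."""
    date: str
    agent: str
    action: str


@dataclass
class MigrationRecord:
    """What a migration produced and which software produced it."""
    target_format: str
    target_version: str
    software: str


@dataclass
class ArchivePackage:
    """Hypothetical package carrying content description plus history."""
    identifier: str
    file_format: str
    format_version: str
    provenance: List[ProvenanceEvent] = field(default_factory=list)
    migrations: List[MigrationRecord] = field(default_factory=list)

    def missing_information(self) -> List[str]:
        """List the items named on the slide that this package lacks."""
        missing = []
        if not self.file_format or not self.format_version:
            missing.append("file format and version")
        if not self.provenance:
            missing.append("provenance")
        for index, record in enumerate(self.migrations):
            if not (record.target_format and record.target_version and record.software):
                missing.append(f"migration record {index}")
        return missing


if __name__ == "__main__":
    # All values below are placeholders, not real product metadata.
    package = ArchivePackage(
        identifier="example-granule-001",
        file_format="EXAMPLE-FORMAT",
        format_version="1.0",
        provenance=[ProvenanceEvent("2006-02-01", "producer", "created")],
    )
    print(package.missing_information())  # []
```

A check like `missing_information` could run at submission time, so that gaps in format or provenance information are caught while the producer is still available to fill them.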
OAIS Responsibilities
• Negotiates & accepts Submission IPs
• Determines communities which need to be able to understand Content Information
• Ensures information to be preserved is understandable to designated communities
• Assumes sufficient control of information to be able to ensure long-term preservation
• Follows policies & procedures to ensure information is preserved
• Makes the information available to the designated communities in appropriate forms

Data Submission Agreement (DSA)
• Includes all the information that is necessary for the producer to provide data products to the archive and for the archive to receive the data products from the producer
  – It seems a daunting (or rather an impossible) task to collect all of the information listed above in a single document in a timely fashion.
• Need a high-level agreement in place before we proceed to specify the details of the Producer-Archive interface and to design the respective systems.
  – There is no way of compiling operational information until near the start of the operational phase.

DSA Groupings
• High-level agreement: the content is rather static (i.e., temporally stable) and provides a framework for both the Producer and the Archive to move on to defining details.
• Detailed level interface and some functional specification: the content is somewhat dynamic (i.e., changing with time) and requires the Producer and the Archive to do some in-depth studies.
• Operational information: the content is not available until near the time when the data flows commence (e.g., IP addresses, host directory names, operations contacts).
• Quasi-static metadata details: the definite content is hard to come by, especially for planned spacecraft missions.
• More??
DSA Groupings
• With these considerations in mind, we suggest that the Producer-Archive Agreement be
  – Divided into several separate documents
  – Signed document by document, each at a different management/technical level and at a different time:
    • Memorandum of Agreement (MOA)
    • Interface Control Document (ICD)
    • Operations Agreement (OA)
    • Quasi-Static Metadata Specification (QSMS)
    • Others?

DSA Groupings
• The MOA should be developed early on and signed by high-level management of both parties.
  – It should provide a firm ground for detailed level technical work to proceed.
  – Any details that will become clearer later or simply are unknown will have to be deferred to the lower level components of the agreement (i.e., ICD, OA, and QSMS).
• Once the MOA is signed, both parties may start developing the ICD and QSMS.
  – These form the basis for the design of the physical systems (for both the Producer and the Archive).
• The OA can wait until the time of the system I&T.

DSA Groupings
• Depending on the circumstances, the Producer and the Archive may include additional documents.
  – There are certain items that do not belong in the MOA and yet are not covered by the ICD.
  – We may call such a document a Supplement to the MOA (yet still separate from the MOA).
• This approach of creating the Producer-Archive Agreement in multiple volumes and releasing each sequentially in time appears to be far better than the current approach of creating a single volume agreement.
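The staged release of the agreement documents described above can be pictured as a small dependency graph. The sketch below is a minimal illustration using Python's standard `graphlib` module (Python 3.9 or later). The MOA-before-ICD and MOA-before-QSMS edges follow the slides; placing the OA after both the ICD and the QSMS is an assumption made here, since the slides say only that the OA can wait until system I&T. The function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each document maps to the documents that must be signed before work on it starts.
PREREQUISITES = {
    "MOA": set(),            # developed early, signed by high-level management
    "ICD": {"MOA"},          # may start once the MOA is signed
    "QSMS": {"MOA"},         # may start once the MOA is signed
    "OA": {"ICD", "QSMS"},   # ASSUMPTION: I&T presupposes the designed interface
}


def signing_order():
    """Return one valid order in which the documents could be completed."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


def ready_to_start(signed):
    """Return the unsigned documents whose prerequisites are all signed."""
    signed = set(signed)
    return sorted(
        doc
        for doc, needs in PREREQUISITES.items()
        if doc not in signed and needs <= signed
    )


if __name__ == "__main__":
    print(signing_order())
    print(ready_to_start({"MOA"}))  # ['ICD', 'QSMS']
```

The order of the ICD and QSMS relative to each other is not fixed, which matches the slides: both can proceed in parallel once the MOA is in place.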
Use of OAIS
• Benefits
  – Good overall framework of terms, functions and processes to structure the LTA discussion
  – Identifies a set of documents to capture and record requirements and specifications
• Challenges
  – Timing: starting to use OAIS in the middle of the data life cycle of EOS data has been difficult
  – Complexity of EOS LTA requirements: numbers of products, data volumes, processing S/W
  – Overload on Data Submission Agreement: Interface Requirements Document, Interface Control Document, Operations Agreement

Conclusions
• Transfer of NASA’s Earth science data to NOAA for long-term preservation and stewardship is a major undertaking
• NOAA/NASA MODIS Pilot Project is a way to get started
• Project provides a great case study for use of the OAIS Reference Model
  – Services to project and source of feedback on RM
• Still learning how to best use OAIS
• Expect that as OAIS is more widely used, over the entire data life cycle, it will be even more valuable