Scope and Purpose of the Document

Document Purpose and Scope
This document states the objectives, structure, resource costs and risks for the “Data
Exchange” effort, a collaboration between NOAA’s IOOS Data Integration Framework
(DIF) and NSF’s OOI Cyberinfrastructure (CI) programs. This collaboration arose
directly from NOAA’s and NSF’s participation in and commitment to the advancement
of the IOOS DMAC Subsystem. The scope of the effort described in this document is
funded through NOAA cooperative agreement #NA17RJ1231 with the Scripps
Institution of Oceanography, University of California, San Diego. The Data Exchange
effort is part of, and dependent on, the larger OOI Cyberinfrastructure effort funded
through the JOI Subaward, JSA 7-11, which is in turn funded by NSF contract
OCE0418967 with the Consortium for Ocean Leadership, Inc. A successful outcome of this
effort depends on funding from both sources.
Objectives
The primary objective of this effort is to deploy a scalable “Data Exchange” prototype
with server-side data processing capability for use by an initial set of active ocean
modeling communities to efficiently exchange large model datasets in whole or in part
while preserving the original content and structure of the dataset. The modeling
communities participating in this effort are NERACOOS, MARCOOS and SCCOOS.
With the successful conclusion of this effort, a second effort will be proposed to prepare
and promote the Data Exchange for broader use by the IOOS community.
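To illustrate what exchanging a dataset "in part" can look like from a client's side, the following minimal sketch builds an OPeNDAP (DAP2) request URL with an index-based hyperslab constraint. The host, dataset path, variable name and index ranges are hypothetical and are not taken from this effort; only the `[start:stride:stop]` constraint syntax and the response suffixes come from the DAP2 protocol.

```python
from urllib.parse import quote


def dap2_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for an index-based hyperslab of one variable.

    slices is a sequence of (start, stride, stop) tuples, one per dimension;
    stop is inclusive, as in DAP2 constraint expressions.
    """
    if response not in ("ascii", "dods", "dds"):
        raise ValueError("unsupported DAP2 response suffix: " + response)
    hyperslab = "".join(
        "[{}:{}:{}]".format(start, stride, stop) for start, stride, stop in slices
    )
    # Leave brackets and colons unescaped so the constraint stays readable.
    constraint = quote(variable + hyperslab, safe="[]:")
    return "{}.{}?{}".format(dataset_url, response, constraint)


# Hypothetical dataset: first time step, a 10 x 20 index window of "temp".
url = dap2_subset_url(
    "http://example.org/thredds/dodsC/model/run1.nc",
    "temp",
    [(0, 1, 0), (0, 1, 9), (0, 1, 19)],
)
print(url)
```

Requesting the whole dataset is the same call pattern without a constraint; the subset form only changes how much data the server has to transmit.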
Arising from the primary objective, the effort will provide the IOOS DIF with a platform
on which to test the Web Services and Data Encoding being developed to distribute the
seven “Core Variables” (Currents, Water Level, Sea Temperature, Salinity/Conductivity,
Surface Winds, Waves and Chlorophyll) to their four initial customer groups. The
process of developing and deploying the Data Exchange in the context of operational user
communities will drive the refinement of the OOI Cyberinfrastructure requirements,
design and technology choices prior to the start of construction of the OOI. Finally, this
effort will also provide the IOOS DMAC with practical insight into viable strategies for
realizing an integrated national ocean observing cyberinfrastructure.
Structure
The effort is a collaboration comprising the ocean modeling community, the OPeNDAP
technology providers, the OOI CI development team and the funding agencies. Each
constituency is represented by a set of stakeholders who will meet monthly as the
oversight team to assess progress and decide on any change in direction needed
to address emergent concerns. Rich Signell (NERACOOS), Scott Glenn
(MARCOOS) and Yi Chao (SCCOOS) will represent the modeling communities. Roy
Mendelssohn (ERDDAP), Steve Hankin (F-TDS), John Caron (TDS) and Bill Howe
(GridFields) will represent the core application technologies comprising the Data
Exchange. Michael Meisinger (OOI CI) and Matthew Arrott (OOI CI) will represent the
development team. Charles Alexander and Jeff de La Beaujardière (NOAA IOOS DIF)
and John Orcutt and Frank Vernon (OOI CI PI and Deputy PI) will represent the interests
of the funding agencies. John Orcutt is PI of both the OOI CI and the NOAA Cooperative
Agreement.
The proposed effort will span 12 months and comprises two overlapping phases. The first
phase focuses on development and the second on operations and community
engagement. The development phase lasts six months. The operational phase starts after
the first release cycle and ends six months after the last development release.
The development phase will deliver the scope of functionality discussed below for the
Data Exchange over five incremental release cycles. Each release cycle has a duration of
six weeks. The first week of a cycle is focused on design, the next four on development
and the last week on preparing the cycle’s deliverables for release into operations.
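The cycle structure described above can be sketched as simple date arithmetic. This is only an illustration: it assumes that cycles start back to back every six weeks from the project start date of 23 Mar 2009 given in the schedule, and it uses the release names from that schedule. The end dates listed in the schedule extend a few days beyond each six-week window; the sketch does not model that overlap.

```python
from datetime import date, timedelta

CYCLE_WEEKS = 6          # stated duration of each release cycle
DESIGN_WEEKS = 1         # first week: design
DEVELOPMENT_WEEKS = 4    # next four weeks: development
RELEASE_NAMES = ["Zinc", "Aluminum", "Nickel", "Monel", "Titanium"]


def release_cycles(project_start, names=RELEASE_NAMES):
    """Return one dict per cycle with the start date of each of its phases."""
    cycles = []
    for index, name in enumerate(names):
        design_start = project_start + timedelta(weeks=CYCLE_WEEKS * index)
        development_start = design_start + timedelta(weeks=DESIGN_WEEKS)
        release_prep_start = development_start + timedelta(weeks=DEVELOPMENT_WEEKS)
        cycles.append({
            "name": name,
            "design_start": design_start,
            "development_start": development_start,
            "release_prep_start": release_prep_start,
        })
    return cycles


cycles = release_cycles(date(2009, 3, 23))
for cycle in cycles:
    print(cycle["name"], cycle["design_start"].isoformat())
```

Under these assumptions the computed cycle start dates fall on 23 Mar, 4 May, 15 Jun, 27 Jul and 7 Sep 2009.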
The operational phase will deliver feedback into the iterative development process and a
final report summarizing the user communities’ usage, their assessment of the deployed
system and the stakeholders’ recommendations for future effort. Each month during the
operational phase, the development team will hold a review meeting with the user
communities to elicit feedback. During the six months of operations following the final
development release, any further development will focus on refinements to the system
that increase its applicability to the targeted modeling communities.
The Data Exchange’s central premise is to provide modeling communities with an
effective community infrastructure for publishing datasets and server-side functions. The
infrastructure will:
1. Register and transmit Datasets of any structural type supported by OPeNDAP,
2. Register and transmit Virtual Datasets authored using NcML,
3. Register and execute “Ferret” conformant server-side Functions,
4. Register and trigger Subscriptions that follow the evolution of a Dataset,
5. Register and execute “Data Exchange” conformant Tasks,
6. Link data subscription notifications to the execution of a “Data Exchange” Task,
7. Register and manage “Data Exchange” Communities to delineate and control
access to Datasets, Functions, Subscriptions and Tasks.
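The link between Subscriptions and Tasks (items 4 to 6 above) can be pictured with a small in-memory sketch. Everything here is hypothetical: the class, method names, dataset identifier and task are illustrative stand-ins, not the Data Exchange design, which this document does not specify at that level.

```python
from collections import defaultdict


class ExchangeRegistry:
    """Hypothetical in-memory stand-in for a registry of Tasks and Subscriptions."""

    def __init__(self):
        self.tasks = {}                         # task name -> callable
        self.subscriptions = defaultdict(list)  # dataset id -> task names

    def register_task(self, name, func):
        self.tasks[name] = func

    def subscribe(self, dataset_id, task_name):
        """Link updates of a dataset to the execution of a registered task."""
        if task_name not in self.tasks:
            raise KeyError("unknown task: " + task_name)
        self.subscriptions[dataset_id].append(task_name)

    def notify_update(self, dataset_id, update):
        """Run every task linked to the dataset and return their results."""
        return [
            self.tasks[name](dataset_id, update)
            for name in self.subscriptions.get(dataset_id, [])
        ]


registry = ExchangeRegistry()
registry.register_task(
    "log_new_timestep",
    lambda dataset_id, update: "{} advanced to {}".format(dataset_id, update["time"]),
)
registry.subscribe("example/model-run", "log_new_timestep")
results = registry.notify_update("example/model-run", {"time": "2009-06-15T00:00Z"})
print(results)
```

A dataset with no subscriptions simply produces no task executions, which keeps publishing independent of whether anyone is listening.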
The community infrastructure will focus on the following core concerns:
1. Transparency - Existing OPeNDAP publishers and consumers will be able to use
the infrastructure without making changes to their current practices and processes,
2. Elasticity - The infrastructure will automatically adjust its computing and storage
capacity to meet demand,
3. Fault-Tolerance - The infrastructure will continue to operate and self-heal in the
presence of any infrastructure component failure; i.e., network, storage, computer
and/or process.
The development effort comprises the following activities:
1. Acquisition and deployment of the OPeNDAP technologies within the OOI CI
Cloud Computing Platform. The technologies involved are: Unidata’s THREDDS
Data Server (TDS), NOAA PMEL’s Ferret extensions to TDS (F-TDS), and Bill
Howe’s GridFields.
2. Design and development of the Data Exchange application logic and web
interface for life cycle management of Datasets, Functions, Subscriptions, Tasks
and Communities.
3. Design and development of the OOI CI Cloud Computing Platform, which
comprises the Messaging Service, the Provisioning Service, the Monitoring
Service and the Component Deployment Pipeline.
4. Engagement of the target communities to promote the Data Exchange and elicit feedback.
5. Operation and maintenance of the Data Exchange.
6. Management of the effort.
Activities 1, 2 and 4 are funded by the NOAA cooperative agreement. The JOI Subaward
under the NSF OOI contract funds activities 3 and 6. Activity 5 is supported by both
funding sources.
Deliverables and Schedule
As stated, the Data Exchange will be developed and released incrementally through five
iterations. All of the identified work is initially scheduled for the first four iterations. The
fifth iteration remains open to manage any overflow and/or rework that is required. Each
iteration is assigned a name, duration and set of deliverables.
Project Start – 23 Mar 09
Project End – 19 Mar 10
Iteration 1 – Zinc Release, 23 Mar to 8 May
• F-TDS Cloud Deployment Prototype
• Ferret Function Prototype - Time average for Rectilinear Grids
• Dataset Publishing Model Design
• Data Exchange User Interface Design
Iteration 2 – Aluminum Release, 4 May to 19 Jun
• Community Lifecycle & Participation Mgmt
• Dataset Publishing
• Dataset Caching (full copy of content)
• F-TDS Cloud Deployment
• Ferret Server-Side Functions for Rectilinear Grids
• Subscription Model and Task Mgmt Design Review
• Dataset Publishing Operational Review
• F-TDS Server-Side Functions Operational Review
Iteration 3 – Nickel Release, 15 Jun to 31 Jul
• Dataset & Community Subscription
• Subscription-driven Tasking
• GridField Function - Time average for Unstructured Grids
• Community-Deployed Function Lifecycle Mgmt Design Review
• Subscription & Tasking Operational Review
Iteration 4 – Monel Release, 27 Jul to 11 Sep
• NcML Virtual Dataset Publishing
• User-supplied Functions Lifecycle Mgmt
• Dataset Caching (subset copy of content)
• GridField Functions for Unstructured Grids
• MATLAB Functions
• Virtual Dataset Publishing Operational Review
• Unstructured Grid Server-Side Functions Operational Review
• User-supplied Function Publishing Operational Review
Iteration 5 – Titanium Release, 7 Sep to 23 Oct
• Overflow and rework tasks
• Development’s Findings and Recommendations
Operations and Community Assessment – 19 Oct 09 to 19 Mar 10
• Data Exchange Usage Report
• Users’ Assessment Report
• Stakeholders’ Findings and Recommendations
Resources & Cost
Resources and associated costs needed for Activities 1, 2, 3 (partial) and 4, funded by
the NOAA cooperative agreement, are:
• Application Designer – 3 mo ($XXX)
• Application Developer – 6 mo ($XXX)
• Infrastructure Developer – 6 mo ($XXX)
• Travel – 6 trips ($XXX)
Note: Costs are fully burdened; they include benefits and indirect costs where applicable.
Risks
The three areas of risk are:
1. OOI priorities may change.
2. Data Exchange infrastructure subsystems may be delivered late.
3. The application technologies (TDS, Ferret, GridFields) may prove more difficult
to adapt to a scalable deployment model.
1. The two risks arising from a change in OOI priorities are:
a. OOI construction starts in the third quarter of 2009 – The likelihood is high and
the consequence is low. The deliverables for this effort are a subset of the
first release of the OOI CI. The Data Exchange effort would be
synchronized with the OOI CI construction effort, which would expand
the scope and lengthen the schedule but would not increase cost.
b. The OOI may reassess the priorities and funding allocations for the Pilot
Period, which could delay the infrastructure components. The likelihood is
moderate and the consequence is moderate. The functionality associated
with communities, tasking and user-supplied functions would be dropped.
2. The two risks arising from delays in the Data Exchange infrastructure are:
a. Messaging Service is late – Likelihood is moderate and consequence is
low. The scale of the Data Exchange deployment would be limited to a
single execution site, specifically Amazon EC2.
b. Provisioning Service is late – Likelihood is low and consequence is
moderate. The lack of on-demand capacity scheduling would require
manual allocation of computing resources, which would increase both
labor and computing costs during the operational phase.
3. The likelihood of an application technology being more difficult to adapt to the
scalable deployment environment is moderate and the consequence is high.
Difficulty with any of the three core technologies will delay the schedule and
increase labor costs. These can be offset by reducing the functional and
operational scope of the Data Exchange as suggested above in 1.b.
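The likelihood and consequence ratings above can be collected into a small risk register for comparison. The ratings are the ones stated in this section; the numeric scale and the likelihood-times-consequence score are assumptions made for this sketch only and are not a method used in this document.

```python
# Assumed ordinal scale; the document only uses the words, not numbers.
LEVELS = {"low": 1, "moderate": 2, "high": 3}

# (risk id, short description, likelihood, consequence) as rated in the text.
RISKS = [
    ("1.a", "OOI construction starts in third quarter of 2009", "high", "low"),
    ("1.b", "OOI reassesses Pilot Period priorities and funding", "moderate", "moderate"),
    ("2.a", "Messaging Service is late", "moderate", "low"),
    ("2.b", "Provisioning Service is late", "low", "moderate"),
    ("3", "Application technology is harder to adapt", "moderate", "high"),
]


def exposure(risk):
    """Assumed score: ordinal likelihood multiplied by ordinal consequence."""
    _, _, likelihood, consequence = risk
    return LEVELS[likelihood] * LEVELS[consequence]


def ranked(risks):
    """Sort risks from highest to lowest assumed exposure (stable for ties)."""
    return sorted(risks, key=exposure, reverse=True)


for risk in ranked(RISKS):
    print(risk[0], exposure(risk), risk[1])
```

Under this assumed scoring, risk 3 ranks first, followed by 1.b and 1.a, with 2.a and 2.b tied.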