Rapid Prototype System - Mississippi State University

Preliminary Design for RPC
Preliminary Design
Of the
Rapid Prototyping Capability
Version 1.0
By RPC Project Team
Mississippi State University
Robert Moorhead, P.I.
David Shaw, Co-P.I.
May 3, 2006
TABLE OF CONTENTS
1. Introduction
1.1. Purpose
1.2. System Scope
1.3. References
1.4. Overview
2. General System Description
2.1. System context
2.2. System modes and states
2.3. Major system capabilities
2.4. Major System Constraints
2.5. User Characteristics
2.6. Assumptions and dependencies
2.7. Operational scenarios
3. System Capabilities, Conditions, and Constraints
3.1. Physical
3.2. System performance characteristics
3.3. System security
3.4. Information management
3.5. System operations
3.6. Policy and Regulation
3.7. System life cycle sustainment
4. System Interfaces
4.1. System Administrator Interface
4.2. Model Specialist Interface
4.3. Data Specialist Interface
4.4. Domain Specialist Interface
4.5. Analyst Interface
4.6. Data Provider Interface
4.7. Access to Remote Resources Interface
1. Introduction
1.1. Purpose
The purpose of the Preliminary Design (PD) of the Rapid Prototyping Capability (RPC)
is to provide a “black-box” description of what the RPC should do, in terms of the
system’s interactions or interfaces with its external environment [1]. As such, the PD
presents a basis for developing an agreement between the developers (Mississippi State
University) and the customers (NASA and NASA research communities) on intended
functionality, capabilities and operational characteristics of the system to be developed. It
provides viewpoints about fundamental aspects of the proposed RPC so that the customer
may develop insight about the system and provide opinion and feedback to the design
process that may be incorporated into the RPC Implementation Plan (IP). Similarly, the
PD considers major functionalities needed by the end users of the RPC, who will
comprise multi-disciplinary teams that will likely include model experts, scientists,
researchers, data providers, and potentially agency partners that operate computational
models and decision support tools (DST) as part of their mission.
In general, the purpose of the PD is to provide
- Assurance to the customer that the developers at MSU understand the customer’s
needs and are responsive to them
- An early opportunity for bidirectional feedback between the customer and
developers
- A method for the customer and the developers to identify problems and
misunderstandings early in the design, while they are relatively inexpensive to correct
- A basis for the system qualification to establish that the system meets the
customer’s needs
- Protection for the developers, providing a baseline for system capabilities and a
basis of determining when the construction of the system is complete
- Support for the developer’s program planning, design, and development efforts
- Aid in assessing the effects of the inevitable requirement changes
- Opportunity for the customer to provide feedback and opinions that may be easily
incorporated into the IP
- Increased protection against customer and developer misunderstandings as the
development progresses.
1.2. System Scope
The fully developed Rapid Prototyping Capability (RPC) will allow research results
harvested from the Solutions Network (SN) to be identified for RPC evaluation. The RPC
fills a critical role in reducing the time typically required to assess the effect of new
or future data streams on model outcomes. In an RPC evaluation, model developers and
owners will systematically evaluate research capabilities, using specific NASA Earth-Sun
system science research results in a simulated operational environment. The aim is to
evaluate components and/or configurations that could be considered for verification,
validation, and benchmarking for transition from research to operations and/or into an
integrated system solution (ISS). Figure 1 illustrates the interface between the RPC and
external systems, including the SN and ISS components of NASA’s Earth Science
Application Plan.
Figure 1: The Solutions Network will provide NASA with the ability to identify NASA science
results as candidates for RPC evaluation. Important criteria for RPC evaluations are to identify
successful science results, baseline the conditions of the current system, identify new or future
NASA data streams that might mitigate or enhance the current solution system, define the
existing application schema, develop an RPC evaluation team, identify stakeholders in the
solution, identify a pathway to ISS, and develop the data and model resources needed to
conduct the RPC evaluation.
1.3. References
[1] IEEE Guide for Developing System Requirements Specifications, IEEE Std 1233,
1998 Edition
1.4. Overview
This “Preliminary Design” of the RPC contains a series of system-level viewpoints that
are intended to convey a common basis of understanding for developing agreement and
eliciting feedback and opinions from the Preliminary Design Review (PDR) team sufficient
to move forward in developing and cooperatively finalizing an RPC Implementation Plan.
The level of detail provided in the PD will enable PDR team members to understand major
functional aspects of the system at a black-box level, to relate the major interfaces of
the RPC to external systems, and to understand what is required of users to gain access
to the RPC and what users can expect to accomplish through use of the RPC.
The organization of the PD follows the IEEE Std 1233 for System Requirements
Specifications and omits component sections that are deemed not applicable to the RPC
design and functional implementation. The PD contains major sections that include (2)
General System Description; (3) System Capabilities, Conditions, and Constraints; and
(4) System Interfaces. Some degree of parallelism is incorporated with the RPC
Capabilities Document to ensure that a common basis is presented that is consistent
between the PD and Capabilities Document.
2. General System Description
2.1. System context
The RPC will provide the capability to rapidly evaluate innovative methods of linking
science observations from current and near-term sensors and output from NASA models.
The same capability will facilitate the demonstration and evaluation of improvements in
future sensor systems and models and will provide a systematic way to extend the
benefits of Earth system science research for society. It is therefore anticipated that the
RPC will provide the capability to rapidly assess the efficacy of applications in Earth-Sun
System Science, thus enabling the examination, evaluation, and benchmarking of
competing methods.
To achieve these goals, the RPC will provide the capability to integrate and provide
access to the tools needed to evaluate the use of a wide variety of current and future
NASA sensors and research results, model outputs, and knowledge, collectively referred
to as “resources”. It is assumed that the resources are geographically distributed, and
the RPC will therefore support location transparency of the resources.
Figure 2: The RPC concept as an integration platform for composing, executing, and
analyzing numerical experiments for Earth-Sun System Science supporting the location
transparency of resources.
2.2. System modes and states
During the life cycle of the RPC node, new resources and tools will be integrated with it,
increasing the repertoire of experiments and analyses that can be performed. Before an
experiment can be performed (a particular model using a particular data source) two
conditions must be satisfied. First, the model must be installed at some computing facility
accessible to RPC users, and configured to run; second, the data must be configured so
that it can be used by the model. The data configuration may involve developing tools for
the data conversions (format translations, subsetting, deriving values of variables not
included in the original data products, geo-processing, etc). Consequently, from the point
of view of performing a particular experiment and analysis, the RPC can be in two
distinct states:
- ready for the experiment and analysis by end users
- requiring action of specialists for installing and configuring the model and its data
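The two states can be illustrated with a small readiness check. This is a minimal sketch and not part of the RPC design: the names `ExperimentState` and `Experiment` and the two boolean flags are hypothetical, chosen only to mirror the two conditions stated above.

```python
from dataclasses import dataclass
from enum import Enum


class ExperimentState(Enum):
    READY = "ready for the experiment and analysis by end users"
    NEEDS_SPECIALIST = "requiring action of specialists"


@dataclass
class Experiment:
    """A particular model paired with a particular data source (hypothetical)."""
    model_name: str
    data_source: str
    model_installed: bool   # condition 1: model installed and configured to run
    data_configured: bool   # condition 2: data configured for use by the model

    def state(self) -> ExperimentState:
        # Both conditions must hold before end users can run the experiment.
        if self.model_installed and self.data_configured:
            return ExperimentState.READY
        return ExperimentState.NEEDS_SPECIALIST
```

An experiment whose data conversion tools are still missing would report `NEEDS_SPECIALIST` even when the model itself is already installed.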
2.3. Major system capabilities
The RPC must support at least two major categories of experiments (and subsequent
analysis): comparing results of a particular model as fed with data coming from different
sources, and comparing different models using the data coming from the same source.
Figure 3: Two major categories of experiments and subsequent analysis to be supported
by RPC.
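As a rough illustration, each category can be viewed as a list of (model, data source) runs whose outputs are then compared. The function names and the `Run` alias below are hypothetical and only sketch the idea; model and source names are placeholders.

```python
from typing import List, Tuple

Run = Tuple[str, str]  # (model name, data source name)


def same_model_different_sources(model: str, sources: List[str]) -> List[Run]:
    """Category 1: one model fed with data coming from different sources."""
    return [(model, source) for source in sources]


def different_models_same_source(models: List[str], source: str) -> List[Run]:
    """Category 2: different models fed with data coming from the same source."""
    return [(model, source) for model in models]
```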
The realization of such experiments requires the following capabilities:
1. Discovery, semantic understanding, secure access and transport
mechanisms for data products available from the known data providers
(Science Data Manager)
2. Data assimilation and geo-processing tools for all data transformations
needed to match a given data product (or products) to the model input
requirements, and support for organizing the data processing into
workflows built from reusable and interoperable modules, including both
the workflow specification mechanisms and the workflow enacting
engine (Interoperable Geo-processing Environment)
3. Model management:
i. Catalog of available models, model metadata catalog (including
input and output model requirements), and mechanisms for
integrating new models with RPC
ii. mechanisms for creating runtime environments; data staging (in
and out); job scheduling, remote execution, and monitoring
iii. mechanisms for storing model outputs together with metadata and
provenance information (all information needed to recreate the
output data set); the metadata enable search and discovery of
model outputs
4. Tools for model output analysis (including visualizations), tools for
quantitatively comparing model outputs, and tools for model benchmarking
(Performance Metrics Workbench)
2.4. Major System Constraints
Only models and data made available to RPC users and integrated with the RPC node can
be used to perform experiments. Installation and/or integration of models as well as
integration and geo-processing of data need to be performed by a respective specialist,
and the time needed to accomplish that task will depend on the complexity of a
particular model and data set(s). Running a model may take a long time, depending on
the complexity of the model and a particular configuration of the model. It is in many
cases unreasonable to expect that the experiments can be performed in real time.
2.5. User Characteristics
The RPC will have five categories of users:
1. System administrators – responsible for deployment, configuration, and
maintenance of the system, and its users (for access control purposes)
2. Application specialists – responsible for installation and configuration of the
model on computational systems accessible to the RPC users, and integrating
these models with the RPC (which includes definition of the input and output data
requirements).
3. Data processing specialists – responsible for the development and the deployment
of the tools for data transformations
4. Domain specialists – responsible for defining, configuring (creating workflows
for data processing, setting model parameters, etc), and executing experiments
5. Analysts – domain specialists responsible for performing the data analysis
A single user (depending on his or her expertise) may assume one or more roles (perhaps
all except for the system administration).
2.6. Assumptions and dependencies
1. The RPC will depend on data and models provided by third parties.
2. Access to remote computational and storage facilities will be controlled according
to policies established by the facility owners (stakeholders). It is assumed that
these policies will allow RPC users to submit and monitor jobs on these systems,
which may require penetrating firewalls. It is possible that the access privileges
will be different for different users, depending on organizational membership,
nationality, or other factors beyond the control of the RPC system developers.
2.7. Operational scenarios
To perform an experiment the following operations will need to be done:
1. Design of experiment – identification of models and data sets to be used
2. Assessment whether the models and data are currently integrated with the RPC
node
3. Filing requests to model and data specialists, as needed; the specialists issue a
notification when the models and data are available
4. Configuration of the experiment (setting the model parameters, configuring the
data (e.g., ROI, timeframe, etc.))
5. Asynchronous run and monitoring of the model
6. Analysis
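Step 5 (asynchronous run and monitoring) can be sketched with Python's standard `concurrent.futures` module. This is only an illustration under stated assumptions: `run_model` is a hypothetical stand-in for a long-running model, and a real RPC run would be submitted to a remote facility rather than to a local thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_model(parameters: dict) -> dict:
    """Hypothetical stand-in for a long-running model execution."""
    time.sleep(0.1)
    return {"status": "completed", "parameters": parameters}


def run_and_monitor(parameters: dict, poll_seconds: float = 0.02) -> dict:
    """Submit the run without blocking, then poll until it finishes."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_model, parameters)
        polls = 0
        while not future.done():
            # A real client would report progress to the user at this point.
            polls += 1
            time.sleep(poll_seconds)
        result = future.result()
    result["polls"] = polls
    return result
```

Because the submission returns immediately, the caller stays free to monitor progress or to abort, which matters when a run takes far longer than an interactive session.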
3. System Capabilities, Conditions, and Constraints
3.1. Physical
The RPC node will be installed on a dedicated, stand-alone system [of components]
consisting of standard commercially available computing nodes, data storage and hosting
middleware servers. The RPC shall be modular in its logical design; therefore, best
practices in developing a system with desired stability and configuration management
characteristics would suggest that the physical implementation of the RPC should parallel
the logical design of the system. This suggested practice would result in developing core
RPC modular capabilities on separate computing nodes, thereby providing enhanced
isolation of functional blocks aggregated by modular grouping. Interfaces between
physically isolated logical modules could be limited and provide an enhanced flow-of-control among modules such that there is not undue competition for computational
resources for a given RPC evaluation. The RPC system will be complemented with
remote resources – high performance computing and storage facilities as needed by the
models to be used in the experiments. The RPC will be relocatable (can be moved from
one geographical location to another), and the access to the remote resources will require
standard internet connections.
The initial configuration of the RPC node will match the requirements of the experiments
needed to perform evaluation of the RPC node. It will be possible to change the initial
configuration (e.g., adding additional computational nodes or expanding the storage
capacity) to match new requirements at a later time.
3.2. System performance characteristics
The RPC node is an integration platform for enabling rapid linking of science results
coming from NASA science data missions, NASA models, and NASA partners. The
overall performance of the system – the time needed to perform an experiment – thus
depends on the performance of its components and will vary with the size of the data and
speeds of the available networks and will vary from model to model.
The primary goal of the RPC node is to provide the capability to rapidly prototype the
assimilation of new or future NASA data products and/or model derived data streams into
model applications that have generated demonstrable scientific results of merit and
stakeholder interest. However, there is no established benchmark to quantitatively specify
what “rapid” means. The reference point is the current practice – manual configuration of
data and models, whereas the expectation is that the RPC approach will considerably
speed up the process, in particular for repeated experiments, after the baseline data and
models are set up. However, the initial phase – setting the baseline data and models –
may prove to be time consuming as it will involve model integration, data acquisition and
simulation, and the development of new components for geoprocessing the data.
Achieving “rapid prototyping” capability is the focus of this project. It is expected that
“Rapid Prototyping” performance benefits will best be realized through the reusability of
configured geoprocessing tasks to provide model-ready input data to a model that has
been fully integrated into the RPC. It is this “reuse” capability that will enable the rapid
evaluation of new data types. By associating existing geoprocessing workflows with new
data types, the rapid assimilation of next-generation data into configured models should
be readily achievable.
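The reuse idea can be sketched as a registry in which a geoprocessing workflow is defined once and then associated with any number of data types. Everything named below (`WorkflowRegistry`, `subset_region`, `convert_format`, and the dictionary-based data representation) is a hypothetical illustration, not the RPC's actual workflow mechanism.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]  # one reusable geoprocessing module


def subset_region(data: dict) -> dict:
    """Placeholder step: mark the data as clipped to a region of interest."""
    return {**data, "subset": True}


def convert_format(data: dict) -> dict:
    """Placeholder step: mark the data as translated to the model's format."""
    return {**data, "model_format": True}


class WorkflowRegistry:
    """Stores named workflows and maps data types onto them."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[Step]] = {}
        self._workflow_for: Dict[str, str] = {}

    def define(self, name: str, steps: List[Step]) -> None:
        self._workflows[name] = list(steps)

    def associate(self, data_type: str, workflow_name: str) -> None:
        # Reuse: a new data type points at an already configured workflow.
        if workflow_name not in self._workflows:
            raise KeyError(f"unknown workflow: {workflow_name}")
        self._workflow_for[data_type] = workflow_name

    def prepare(self, data_type: str, data: dict) -> dict:
        """Run the associated workflow to produce model-ready input."""
        for step in self._workflows[self._workflow_for[data_type]]:
            data = step(data)
        return data
```

In this sketch, evaluating a new data product takes a single `associate` call instead of a new workflow, which is the source of the expected speed-up for repeated experiments.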
3.3. System security
The system will be made available only to registered users and the access will be
controlled by the user id, password and the role. The system administrators will have
privileges to register users, assign roles to the users, install/uninstall software, and backup
and restore data. Model and data specialists will have privileges to install and configure
the models, and install geo-processing modules, respectively. Domain experts will have
privileges to configure, run, monitor and control experiments, and analyze results.
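The privileges listed above amount to a role-to-action mapping. The sketch below shows one way to express it; the action names are hypothetical labels for the privileges in the text, and authentication by user id and password is deliberately left out.

```python
from enum import Enum
from typing import Set


class Role(Enum):
    SYSTEM_ADMINISTRATOR = "system administrator"
    MODEL_SPECIALIST = "model specialist"
    DATA_SPECIALIST = "data specialist"
    DOMAIN_EXPERT = "domain expert"


# Action names are illustrative labels for the privileges described above.
PRIVILEGES = {
    Role.SYSTEM_ADMINISTRATOR: {
        "register_user", "assign_role", "install_software",
        "uninstall_software", "backup_data", "restore_data",
    },
    Role.MODEL_SPECIALIST: {"install_model", "configure_model"},
    Role.DATA_SPECIALIST: {"install_geoprocessing_module"},
    Role.DOMAIN_EXPERT: {
        "configure_experiment", "run_experiment", "monitor_experiment",
        "control_experiment", "analyze_results",
    },
}


def is_allowed(roles: Set[Role], action: str) -> bool:
    """A user may hold several roles; any one of them can grant the action."""
    return any(action in PRIVILEGES[role] for role in roles)
```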
The RPC will provide access to remote resources. These resources will be protected by
following the policies and procedures established by their respective stakeholders. The
RPC will provide support for the delegation of credentials to enable “a single log-on”
feature. Accessing the remote resources will depend on the ability to penetrate firewalls,
if any.
3.4. Information management
Relevant RPC data resources, operational use, metadata, and associated information will
be managed following the commercial practice using DBMS systems (including the
standard backup/restore capabilities).
3.5. System operations
1. System human factors: no specialized interfaces will be implemented
2. System maintainability: the system will be implemented following the current
industry standards and open source software, and designed to support extensibility
3. System reliability: best effort approach will be adopted; Service level agreements
(SLA) will be defined in the later phases of the project
3.6. Policy and Regulation
The Implementation Plan to be developed will provide insight as to Policies and
Regulations that might include requirements for a specific Work Breakdown Structure to
be developed to manage and track the status of development of the RPC and components
thereof. Policy consideration may be incorporated to describe the degree of detail needed
to describe the work, schedule of activities, and allocation of resources appropriate to the
scope of the task, the level of effort, and the priority development of the RPC
components being considered.
As the RPC develops into a viable simulation system, it is expected that activities
requiring RPC resources will be requested and coordinated among those selecting an
RPC for evaluation, the RPC team conducting a specific evaluation, and RPC developers
who will be required to maintain and evolve the RPC to support requirements for
integrating new model applications, data products, and geoprocessing tasks. As the RPC
evolves to meet new or changing requirements, configuration management practices,
version control, and developmental practices will be followed to ensure that capabilities
in development will be isolated from operational RPC capabilities. Simply stated,
development activities, testing, and integration of new functionalities into the RPC
should be “contained” through the use of segregated physical or virtual systems that may
be isolated from the operational instance of the RPC. As new capabilities mature through
development processes, configuration “check-in” procedures will be followed to ensure
the orderly integration of the new “proven” capabilities. Although these statements are
forward looking, it is likely that such new capabilities will be added in an orderly fashion
and released for use following a defined policy. It is likely that such activities will
involve proactive participation of an RPC technical working group.
3.7. System life cycle sustainment
Systems development life-cycle (SDLC) or Total Life Cycle Systems Management are
fundamental approaches to systems engineering processes which provide policy-based
and structural underpinning to the development of resources for information technology
and process automation. SDLC phases that include Project Definition, User Requirements
Definition, System/Data Requirements Definition, Analysis and Design, System
Build/Prototype/Pilot, Implementation and Training, and Sustainment are useful for
ensuring the viability of the initial or continuing capability. Typically, the SDLC process
involves the use of status tracking for phases of the project. Given a working and
operational RPC, some level of consideration in the IP shall be given to support needs,
technology reviews for enhancements, or updates to the hardware and software to provide
sustainment and continuity of operations. Typical practices in maintenance and service,
business process reviews, and system reviews will likely be core components of the
sustainment solution for the RPC and will likely require the proactive participation of an
RPC technical working group and direction from an advisory or steering group.
4. System Interfaces
The RPC node has five categories of users, each requiring a dedicated interface. In addition,
the RPC interacts with two classes of external systems: data providers and remote
computing and storage facilities.
4.1. System Administrator Interface
The administrator interface must support the administrator tasks:
- registering and de-registering users and assigning roles
- maintaining the user credentials needed to access remote resources
- monitoring the system status and usage
- backing up and restoring data and software; recovery from faults
- deployment of new software components and services
4.2. Model Specialist Interface
The model specialist is responsible for deploying and integrating the models into the RPC
environment. The models can be installed locally on RPC node hardware and/or at
a remote computing facility. To deploy the model, generic access to the operating
system is needed. To integrate the model with the RPC, the specialist must “register” the
model, that is, generate a metadata record that describes the model in terms of its
functionality, the runtime requirements (location of the executable, environmental
variables, the structure of the working directory, etc.), model parameters and definition of
the input and output datasets. The model specialist interface must thus support the
registration of new models and editing of the metadata of the existing models. In
addition, the model specialist interface must provide support for the testing of the
correctness of the model deployment.
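A registration record of the kind described above might look like the following sketch. The field names, the `ModelCatalog` class, and its methods are assumptions made for illustration; the Preliminary Design does not prescribe a schema.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical metadata record generated when a model is registered."""
    name: str
    functionality: str
    executable: str                                    # location of the executable
    environment: Dict[str, str] = field(default_factory=dict)
    working_directory: List[str] = field(default_factory=list)
    parameters: Dict[str, str] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)    # input dataset definitions
    outputs: List[str] = field(default_factory=list)   # output dataset definitions


class ModelCatalog:
    """Supports registering new models and editing existing metadata."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        if record.name in self._records:
            raise ValueError(f"model already registered: {record.name}")
        self._records[record.name] = record

    def edit(self, name: str, **changes) -> ModelRecord:
        if "name" in changes:
            raise ValueError("the model name is the catalog key and cannot be edited")
        updated = replace(self._records[name], **changes)
        self._records[name] = updated
        return updated

    def get(self, name: str) -> ModelRecord:
        return self._records[name]
```

Keeping the record immutable and replacing it on edit is one design choice among several; it makes each version of the metadata easy to retain for provenance purposes.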
4.3. Data Specialist Interface
The data specialist identifies the data providers and designs the geo-processing procedure
for transforming the original data product to match the model input data requirements.
The design of the geo-processing may require the development and deployment of
software components to perform specified tasks. It follows that the data specialist
interface must provide support for
- searching data products from known data providers
- assessing the structure and syntax of available data products
- assessing the model input data requirements
- discovering and evaluating the geo-processing modules already integrated with
the RPC node
- integrating new geo-processing modules with the RPC node
- composing the geo-processing process from available components
- testing of the correctness of the geo-processing procedure
4.4. Domain Specialist Interface
The domain specialist designs and performs experiments. To support these activities, the
domain specialist interface must support:
- Discovery of available models and data through the RPC facilities
- Filing and receiving requests for new models and data
- Configuring experiments by
o Connecting a particular model with particular data
o Setting the model parameters
o Configuring datasets (region of interest, timeframe, etc.)
- Submitting models for execution
- Monitoring the model progress
- Controlling the model execution (e.g., aborting it, if needed)
- Verifying that the model completed successfully (e.g., by examining a log file
generated by the model, running a test application, etc.)
4.5. Analyst Interface
The analyst analyzes the experiment outcome. The analyst interface must:
- Allow queries of the output databases to find the model outputs of interest
- Provide access to model outputs
- Provide access to model provenance (when and in what circumstances the model
has been run, e.g., what input data sets have been used, what the values of
the model parameters were, etc.)
- Provide access to tools (visualizations or otherwise) enabling access to the results
of the experiments
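The query and provenance requirements above can be sketched with a simple in-memory record. The `ProvenanceRecord` fields and the `find_outputs` function are hypothetical; the design only states that outputs are stored with metadata and provenance and are managed using DBMS systems.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance stored alongside one model output."""
    output_id: str
    model_name: str
    run_time: datetime            # when the model was run
    input_datasets: List[str]     # what input data sets were used
    parameters: Dict[str, str]    # values of the model parameters


def find_outputs(records: List[ProvenanceRecord],
                 model_name: Optional[str] = None,
                 input_dataset: Optional[str] = None) -> List[str]:
    """Return ids of outputs matching every filter that was supplied."""
    matches = []
    for record in records:
        if model_name is not None and record.model_name != model_name:
            continue
        if input_dataset is not None and input_dataset not in record.input_datasets:
            continue
        matches.append(record.output_id)
    return matches
```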
4.6. Data Provider Interface
The RPC must define interfaces that allow acceptance of data streams coming from data
providers.
4.7. Access to Remote Resources Interface
The RPC must define interfaces for invoking Grid services such as allocating and
monitoring remote resources, accepting notifications about status changes (e.g., a job has
completed), and transferring data between the RPC node and remote resources as well as
between remote resources. In addition, defined interfaces must support
delegation of user credentials to satisfy the access control requirements and policies of
the remote resources.
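Accepting status notifications can be sketched as a small tracker that validates each reported change and forwards it to interested listeners. The job states, the transition table, and the `JobTracker` class are assumptions for illustration and are not taken from any particular Grid middleware.

```python
from typing import Callable, Dict, List

# Hypothetical job life cycle: which status may follow which.
ALLOWED = {
    "submitted": {"running", "failed"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

Listener = Callable[[str, str], None]  # called with (job_id, new_status)


class JobTracker:
    """Records remote job status and relays status-change notifications."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def submit(self, job_id: str) -> None:
        self._status[job_id] = "submitted"

    def notify(self, job_id: str, new_status: str) -> None:
        """Accept a notification from a remote resource."""
        current = self._status[job_id]
        if new_status not in ALLOWED[current]:
            raise ValueError(f"invalid transition: {current} -> {new_status}")
        self._status[job_id] = new_status
        for listener in self._listeners:
            listener(job_id, new_status)

    def status(self, job_id: str) -> str:
        return self._status[job_id]
```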
--- END ---