Supporting Science Through Workflows: Infrastructure, Architecture and Modeling David Woollard

National Aeronautics and
Space Administration
Jet Propulsion Laboratory
Supporting Science Through Workflows:
Infrastructure, Architecture and Modeling
David Woollard
NASA Jet Propulsion Laboratory
University of Southern California
Agenda
» Motivation
» Classification of in silico Experimentation
» Research Problem
» Related Work
» Introduction to Workflow Systems
» Research Goals
» Methodology
» Refactoring existing software
» Domain Specific Software Architecture
» Evaluation
» Conclusions & Future Work
D.M. Woollard. Supporting Science Through Workflows.
Motivation
• The nature of scientific investigations has changed.
• Two major trend lines:
– For many researchers, computer simulation has replaced in vivo and in vitro science.
– Collaborations are growing (system of systems science).
• New discoveries in materials science, chemistry,
physics, planetary science, and even social sciences
are made via in silico experimentation.
in silico Experimentation
• Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology.
[Figure: Lone Researcher diagram [Kepner 03] with labels Theory, Development, Execution, Practice; phase diagram (Discovery, Production, Distribution)]
in silico Experimentation
• Production is the engineering of replicating an experiment on large volumes of data.
We will focus on Production Systems in this talk.
[Figure: phase diagram (Discovery, Production, Distribution)]
in silico Experimentation
• Distribution is a phase in which data is dispersed to peers for review and further experimentation, including:
– Papers
– Federated Data
– Digital Libraries
[Figure: phase diagram (Discovery, Production, Distribution)]
The Role of Technology
• In silico science, especially system of systems
science, is facilitated by the Grid.
“The sharing that we are concerned with is not primarily file
exchange but rather direct access to computers, software,
data, and other resources, as is required by a range of
collaborative problem-solving and resource-brokering
strategies emerging in industry, science, and engineering.”
The Anatomy of the Grid (2001)
Research Problem
• Scientists harness complex hardware and software systems in order to conduct scientific research in silico.
• Once algorithms and processes are established, production systems are created to produce large volumes of data.
• Designing a production system is a complex engineering task as well as a complex scientific task.
Meeting these production requirements forces either scientists to engineer a production system or software engineers to rewrite scientific code. Both outcomes are inefficient and costly.
Introduction to Workflows
[Figure: diagram relating Production Systems, Workflows, and Grid Systems; a workflow graph of tasks T0 to T4]
• Grid Systems have traditionally focused on creating Virtual Organizations.
• In Grids, workflows orchestrate processing tasks in production systems.
• Workflows are a processing model that incorporates actors, tasks, data, and rules.
• Workflow management systems execute tasks on data once the task’s dependencies are satisfied based on rules.
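The execution rule above (run a task once its dependencies are satisfied) can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular workflow management system: the task names T0 to T4 echo the slide's figure, but the dependency edges and the task bodies are assumptions made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies, data):
    """Run each task once all of its dependencies have completed.

    tasks: maps a task name to a callable that takes and returns the data.
    dependencies: maps a task name to the set of tasks it must wait for.
    Returns the execution order and the final data.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the rules are circular
    executed = []
    while sorter.is_active():
        # get_ready() returns only tasks whose dependencies are all done.
        for name in sorted(sorter.get_ready()):
            data = tasks[name](data)
            executed.append(name)
            sorter.done(name)
    return executed, data


def make_task(name):
    """Hypothetical task body: record the task's name in the data."""
    def task(data):
        return data + [name]
    return task


# Hypothetical dependency rules; the slide does not specify the edges.
TASKS = {name: make_task(name) for name in ("T0", "T1", "T2", "T3", "T4")}
DEPENDENCIES = {"T1": {"T0"}, "T2": {"T0"}, "T3": {"T1", "T2"}, "T4": {"T3"}}

if __name__ == "__main__":
    order, result = run_workflow(TASKS, DEPENDENCIES, [])
    print(order)
```

T1 and T2 become ready together once T0 finishes; a real system could dispatch them to different Grid resources instead of running them in turn.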
Workflow System Model
Workflows Workflows Everywhere
Karajan
Wings
Askalon
Unicore
Gridbus
Taverna
Grid Workflow
Triana
GridAnt
ICENI
VDS
Condor-G
GrADS
Pegasus
Kepler
Yawl
GridFlow
OODT
DAG-Man
SciFlow
Bottom-up Taxonomy
• Yu & Buyya presented a
taxonomy [Yu & Buyya 05]
– Based on workflow
properties like model
representation and
scheduling policy
– Illustration of divergence
in the field
No existing taxonomy classifies workflow systems by their interface to task code.
Insights from an Architect
• Each production workflow task is a complex software application
with two primary stakeholders: the scientist and the engineer.
• Software architectures are a system’s blueprint–its form,
elements, and rationale [Perry & Wolf, 92].
• An architecture provides appropriate views for each
stakeholder in addition to encapsulation of computation and
communication. These are the architecture’s components,
connectors and topology.
• Reification of architectural elements in code is a method of
bridging the gap between design and implementation. First-class connectors and explicit interfaces are such reifications.
Research Goals
• Develop a Domain Specific Software Architecture
(DSSA) for tasks in scientific workflows.
• Develop a methodology for refactoring existing
scientific code into this DSSA.
• Minimize overhead (computation time and memory
footprint).
• Maximize science code reuse.
D.M. Woollard. Supporting Science Through Workflows.
Decomposing Software
• Decomposition, the first step in the approach, is a process in
which scientific modules are identified and control flow
determined.
• Scientific modules are like functions: they have internal scope
and a single entry and exit point. In graph theoretic terms, the
call dominancy tree for the basic blocks in the module has only
one source and one sink.
• The proper level of decomposition is dependent on both
scientific functionality and engineering requirements. Therefore,
it should be “tunable.”
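The single-entry, single-exit criterion can be checked mechanically. The sketch below assumes a candidate module's basic blocks are given as a directed graph in a plain dictionary (a hypothetical representation; the talk does not name a tool or data format). It only counts sources and sinks and performs no dominance analysis.

```python
def sources_and_sinks(graph):
    """Return (sources, sinks) of a directed graph given as {node: [successors]}."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    has_predecessor = {s for successors in graph.values() for s in successors}
    sources = sorted(nodes - has_predecessor)           # blocks nothing flows into
    sinks = sorted(n for n in nodes if not graph.get(n))  # blocks with no successor
    return sources, sinks


def is_candidate_module(graph):
    """True when the block graph has exactly one source and one sink."""
    sources, sinks = sources_and_sinks(graph)
    return len(sources) == 1 and len(sinks) == 1


# Hypothetical block graphs, invented for illustration.
SINGLE_EXIT = {
    "entry": ["init"],
    "init": ["compute", "skip"],
    "compute": ["exit"],
    "skip": ["exit"],
    "exit": [],
}
TWO_EXITS = {"entry": ["ok", "abort"], "ok": [], "abort": []}

if __name__ == "__main__":
    print(is_candidate_module(SINGLE_EXIT), is_candidate_module(TWO_EXITS))
```

Running the check on larger or smaller regions of the code is one way to think about "tuning" the level of decomposition: a region qualifies as a module only if it still has one way in and one way out.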
[Figure: steps of the approach (Decomposition, Architecting, Deployment)]
“Injecting” Architecture
• In the second part of the approach, these modules must be
“architected” into a workflow task with connectors to services
at appropriate levels (to satisfy production requirements).
• We use Prism-MW wrappers to encapsulate and
componentize these decomposed modules. This provides us
with a standard interface and utilities at the module level for
employing event-based communication.
• We use the Exogenous Connector style [Lau et al.] to mimic the
original control and data flow in the workflow task and augment
these connectors with a specialized version of the invoking
connector.
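The idea behind exogenous connectors is that components only compute, while connectors own all control and data flow. The following is a plain-Python sketch of that idea, not the Prism-MW API and not the authors' implementation: the class names `Invoke` and `Sequence`, the `before` hook, and the two toy modules are all hypothetical.

```python
class Invoke:
    """Invocation connector: the only place a wrapped module is called.

    The optional `before` hook stands in for production services such as
    logging or provenance capture (an assumption for illustration).
    """

    def __init__(self, module, before=None):
        self.module = module
        self.before = before

    def run(self, data):
        if self.before is not None:
            self.before(self.module.__name__, data)
        return self.module(data)


class Sequence:
    """Composition connector: owns control flow, passing data left to right."""

    def __init__(self, *connectors):
        self.connectors = connectors

    def run(self, data):
        for connector in self.connectors:
            data = connector.run(data)
        return data


# Decomposed "scientific modules": they compute and never call each other.
def calibrate(samples):
    return [s - 1.0 for s in samples]


def average(samples):
    return sum(samples) / len(samples)


calls = []  # filled by the service hook, one entry per invocation
task = Sequence(
    Invoke(calibrate, before=lambda name, data: calls.append(name)),
    Invoke(average, before=lambda name, data: calls.append(name)),
)

if __name__ == "__main__":
    print(task.run([2.0, 3.0, 4.0]))
```

Because the science functions never reference each other or the hook, an engineer can change the services attached to `Invoke` without touching scientific code.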
[Figure: steps of the approach (Decomposition, Architecting, Deployment)]
Deploying to the Grid
• Deployment is the last step in our approach.
• We currently deploy the resulting workflow component into the
OODT Science Data System environment. This is a grid
workflow management system used at JPL.
• We should note that this choice is purely for the sake of
developer convenience; the approach should be deployable to
any target workflow management system.
[Figure: steps of the approach (Decomposition, Architecting, Deployment)]
SWSA Architecture
Scientific Workflow Software Architecture
(SWSA), a domain specific software
architecture for workflow tasks.
Preliminary Evaluation
• We chose a canonical scientific application (matrix multiplication) implemented in both Fortran and C.
• Six different metrics were taken:
– Execution time for:
• Base application
• Wrapper (no data exchanged)
• Wrapper (data exchanged)
– Memory footprint for:
• Base application
• Wrapper (no data exchanged)
• Wrapper (data exchanged)
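A harness for collecting these six measurements might look like the sketch below. It is an illustration only: the original study used Fortran and C, whereas here a pure-Python matrix multiplication stands in for the base application, list copying stands in for data exchange, and the relative-overhead formula is an assumed definition rather than one stated in the talk.

```python
import time
import tracemalloc


def matmul(a, b):
    """Base application: naive matrix multiplication on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def wrapped_no_exchange(a, b):
    """Wrapper that only forwards the call (no data crosses the boundary)."""
    return matmul(a, b)


def wrapped_with_exchange(a, b):
    """Wrapper that copies inputs and output, standing in for data exchange."""
    a_copy = [row[:] for row in a]
    b_copy = [row[:] for row in b]
    return [row[:] for row in matmul(a_copy, b_copy)]


def measure(func, a, b):
    """Return (seconds, peak_bytes), using separate runs for time and memory."""
    start = time.perf_counter()
    func(a, b)
    seconds = time.perf_counter() - start

    tracemalloc.start()  # traced separately because tracing slows execution
    func(a, b)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return seconds, peak


def overhead_percent(wrapped, base):
    """Assumed definition: relative increase over the base, in percent."""
    return 100.0 * (wrapped - base) / base


if __name__ == "__main__":
    n = 40
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i * j % 7) for j in range(n)] for i in range(n)]
    for func in (matmul, wrapped_no_exchange, wrapped_with_exchange):
        seconds, peak = measure(func, a, b)
        print(f"{func.__name__}: {seconds:.6f} s, peak {peak} bytes")
```

Single runs like this are noisy; a fuller evaluation would repeat each measurement and report an average.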
Preliminary Evaluation
Refactoring Methodology Example:
Molecular Dynamics Simulation
Performance results are very
promising:
Time Overhead: 1.85%
Code Reuse: 96.77%
Conclusions & Future Work
• Scientific Workflow Software Architecture (SWSA) improves
upon existing workflow systems by providing:
– A methodology for accessing services.
– A separation of concerns between scientific algorithms and
production features of code.
– A clean separation of roles between the scientist and the engineer.
• Satisfies the “cult of performance.”
• Future Work
– Extended evaluation on more advanced simulation codes.
– Expansion of the architecture to support parallel codes.
Thank You
For more information, please see:
• D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. “Scientific Software as Workflows: From
Discovery to Distribution.” To appear in IEEE Software Special Issue on Developing Scientific
Software, 2008.
• D. Woollard, D. Freeborn, E. Kay-Im, S. LaVoie. “Case Studies in Science Data Systems:
Meeting Software Challenges in Competitive Environments.” To appear in Proceedings of the
10th International Conference on Space Operations (SpaceOps-2008), AIAA press, Heidelberg,
Germany, May 2008.
• D. Woollard. “Supporting Scientific Workflows Through First-Class Connectors.” Qualifying
Examination Report. University of Southern California. May, 2007.
• D. Woollard, C. Mattmann, and N. Medvidovic. “Injecting Software Architectural Constraints
into Legacy Scientific Applications.” USC Center for Software Engineering Technical Report,
USC-CSE-2007-701, January 2007.
Portions of this research were conducted at the Jet Propulsion
Laboratory managed by the California Institute of Technology under a
contract with the National Aeronautics and Space Administration.