Workflow Orchestration: Conducting Science Efficiently on the Grid David Woollard March 17, 2009

National Aeronautics and
Space Administration
Jet Propulsion Laboratory
Workflow Orchestration: Conducting
Science Efficiently on the Grid
March 17, 2009
David Woollard
NASA Jet Propulsion Lab
4800 Oak Grove Drive
Pasadena, CA 91108
Dept. of Computer Science
University of Southern California
Los Angeles, CA 90089
Validating Computational Science
• Computational science, like all science, requires
validation
• Validation comes in two forms:
– Scaling (in data and computation)
– Independent replication
• Both forms require significant computational
resources
– Grid is a promising resource
Workflow Orchestration - March 17, 2009.
Vision of the Grid
• University of Southern California, Center for Software and Systems Engineering
• NASA Jet Propulsion Laboratory, Science Data Systems Section
• Aerospace Corporation
• Northrop Grumman
• Boeing Corporation
• NASA Ames Research Center, Columbia Supercomputing Center
• Lawrence Livermore National Lab
• University of California San Diego, Supercomputing Center
Vision of the Grid
Like the power grid, the
computational Grid should
scale to the demands of
individual users.
Workflow-Based Specification
• Workflows orchestrate processes on the Grid
• Workflows are a processing model that incorporates
tasks, data, and rules.
• Workflow management systems execute each task on
the Grid, using its data, once the rules establish that
the task's dependencies are satisfied.
[Figure: workflow graph of Task 1 through Task 5]
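The execution rule described above (run a task once its dependencies are satisfied) can be sketched in a few lines of Python. This is a minimal illustration, not the workflow engine from the talk: the function `run_workflow`, the task bodies, and the dependency edges among the five tasks are all assumptions made for the example, since the slide does not specify them.

```python
from collections import deque


def run_workflow(tasks, dependencies):
    """Run each task once all of its upstream tasks have completed.

    tasks: dict mapping task name -> callable taking a dict of upstream results
    dependencies: dict mapping task name -> list of upstream task names
    """
    remaining = {name: set(dependencies.get(name, ())) for name in tasks}
    ready = deque(name for name, deps in remaining.items() if not deps)
    results, order = {}, []
    while ready:
        name = ready.popleft()
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](inputs)
        order.append(name)
        # Rule: a task becomes ready when its last dependency is satisfied.
        for other, deps in remaining.items():
            if name in deps:
                deps.remove(name)
                if not deps:
                    ready.append(other)
    if len(order) != len(tasks):
        raise ValueError("cyclic or unsatisfiable dependencies")
    return order, results


# Hypothetical task bodies and edges (not taken from the slides).
tasks = {
    "task1": lambda inputs: 1,
    "task2": lambda inputs: inputs["task1"] + 1,
    "task3": lambda inputs: inputs["task2"] * 2,
    "task4": lambda inputs: inputs["task2"] * 3,
    "task5": lambda inputs: sum(inputs.values()),
}
dependencies = {
    "task2": ["task1"],
    "task3": ["task2"],
    "task4": ["task2"],
    "task5": ["task3", "task4"],
}

order, results = run_workflow(tasks, dependencies)
print(order)
print(results["task5"])
```

A real workflow management system would dispatch the ready tasks to Grid resources in parallel; the sketch runs them sequentially to keep the dependency rule visible.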
Scaling the Experiment
[Figure: the five-task workflow (Task 1 through Task 5) scaled out across Laboratory, Institution, Co-laboratory, Other Institutions, and @Home resources]
Independent Replication
[Figure: the five-task workflow (Task 1 through Task 5) replicated independently by a Collaborator and a 3rd Party]
Heterogeneous Environments
[Figure: the same five-task workflow (Task 1 through Task 5) shown three times: on Workflow Engine 1 over Grid Infrastructure 1, on Workflow Engine 1 over Grid Infrastructure 2, and on Workflow Engine 2 over Grid Infrastructure 2. Site labels: Laboratory, Institution, Collaborator, Co-laboratory, 3rd Party]
Research Challenge
• Scientific validation requires:
– Scaling
– Replication
• Existing technologies exhibit three challenges:
– Require scientists to become engineers or vice versa
– Existing workflow specifications entwine scientific and
engineering concerns
– Existing workflow specifications are not portable
A Model-Driven Approach
[Figure: model levels paired with their realizations]
• Computation Independent Model: Workflow Model
• Implementation Independent Model: Domain-Specific Software Architecture
• Implementation: Deployment
Agenda
In the rest of this talk, we will cover:
• Models versus languages
• The role of software architecture
• Transforming workflows to domain-specific software
architectures
• Performance
• Future Work and Conclusions
A Plethora of Workflow Languages
• Yu & Buyya presented a
taxonomy [Yu & Buyya 05]
– Based on workflow properties
like model representation and
scheduling policy
– Illustration of divergence in
the field
• As of last year, researchers
such as Osterweil et al.
[Osterweil 08] still advocated more
advanced language
features
• Considered a Grand
Challenge [Gil, et al. 07]
Making Decisions in Design Space
• Existing workflow languages violate separation of
concerns
– Scientists should work in languages applicable to the design
space, not the solution space
– Engineers should not have to become scientists to be able to
scale workflow-based systems
• If workflow languages become the realm of the
scientist, how does the software engineer effect
change?
– Through manipulation of the system at the architectural level
Orchestration Through Connectors
• Lau, et al., have proposed
exogenous connectors [Lau,
et al. 06].
– encapsulate both control and
data flow in a software system
– can be hierarchically composed
to simulate control flow
• Control can be managed
through several constructs:
– Sequence
– Conditional
– Branch & Bound
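The idea that connectors, rather than components, own control and data flow can be illustrated with a small sketch. It is only loosely in the spirit of exogenous connectors and is not the formal model of [Lau, et al. 06]; the class names `Component`, `Sequence`, and `Conditional`, and the example functions, are hypothetical. Only the sequence and conditional constructs are shown.

```python
class Component:
    """Wraps a unit of computation; it never calls other components itself."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def invoke(self, data):
        return self.func(data)


class Sequence:
    """Connector that invokes its children in order, piping data through."""

    def __init__(self, *children):
        self.children = children

    def invoke(self, data):
        for child in self.children:
            data = child.invoke(data)
        return data


class Conditional:
    """Connector that routes control and data to one of two children."""

    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false

    def invoke(self, data):
        branch = self.if_true if self.predicate(data) else self.if_false
        return branch.invoke(data)


# Hypothetical components A, B, and C.
a = Component("A", lambda x: x + 1)
b = Component("B", lambda x: x * 2)
c = Component("C", lambda x: -x)

# Connectors compose hierarchically: a Sequence containing a Conditional.
pipeline = Sequence(a, Conditional(lambda x: x > 0, b, c))

print(pipeline.invoke(1))   # A yields 2, which is positive, so B runs
print(pipeline.invoke(-3))  # A yields -2, which is not positive, so C runs
```

Because connectors and components share the same `invoke` interface, a connector can stand wherever a component can, which is what makes hierarchical composition possible in this sketch.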
[Figure: connector compositions over components A, B, and C]
Invoking Connectors
• Different Grid infrastructures
interact with tasks in multiple
ways [Woollard 08]:
– Synchronous communication
– Events
– Web services
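As a rough illustration of why the invocation style matters, the sketch below hides two of the listed styles (synchronous communication and events) behind a common `invoke` method. The classes `SynchronousInvoker` and `EventInvoker` are assumptions made for this example and do not come from the cited work; web-service invocation is omitted because it would require a network endpoint.

```python
import itertools
import queue


class SynchronousInvoker:
    """Calls the task directly and blocks until it returns."""

    def invoke(self, task, data):
        return task(data)


class EventInvoker:
    """Posts a request event; a later dispatch step actually runs the task."""

    def __init__(self):
        self.events = queue.Queue()
        self.results = {}
        self._ids = itertools.count()

    def invoke(self, task, data):
        event_id = next(self._ids)
        self.events.put((event_id, task, data))
        return event_id  # a ticket, not the result

    def dispatch(self):
        while not self.events.empty():
            event_id, task, data = self.events.get()
            self.results[event_id] = task(data)


def double(x):
    return 2 * x


sync_result = SynchronousInvoker().invoke(double, 21)

event_invoker = EventInvoker()
ticket = event_invoker.invoke(double, 21)
done_before_dispatch = ticket in event_invoker.results
event_invoker.dispatch()
event_result = event_invoker.results[ticket]

print(sync_result, done_before_dispatch, event_result)
```

The task itself (`double`) is identical in both cases; only the connector that invokes it changes, which is the separation the slide argues for.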
Custom Handlers
[Figure: Exogenous Connector, Invoking Connector, Custom Handler, Internal Logic, Component, and Services, linked by control-flow and data-flow arrows]
SWSA: A Domain Architecture
Implementation
• Prism-MW, an
architecturally-aware
middleware
– Components, Connectors,
Topologies, and Architecture
are reified as first-class
elements
• Exogenous connectors,
invoking connectors, and
component wrappers
around tasks are built with
Prism-MW
Performance Studies
• Overhead that connectors induce in computation time and memory
[Woollard, et al. 09].
• Impact of architectural deployment on computation [Woollard, et
al. 09].
– Modified an existing time series workflow used at JPL
– Deployed the system using OPeNDAP and grid technology
to co-locate data and computation
– Reduced typical analysis time from 9+ hours to under 2 minutes
Deployment & Optimization
• In the future, we plan to
utilize advanced
architectural modeling and
deployment analysis to
guide software engineers
in deployment strategy
Conclusion
• Computational science requires validation
• Existing grid and workflow technologies are promising, but lack
support for scaling and replication across heterogeneous Grid
environments
• A model-driven approach allows scientists to manipulate
workflow specifications, while software engineers can effect change in the
transformed software architectures
Thank You
[Yu & Buyya 05] Yu, J. and Buyya, R. A Taxonomy of Workflow Management Systems for
Grid Computing. Journal of Grid Computing 3(3-4): pp. 171-200. 2005.
[Osterweil 08] Osterweil, L., et al. Experience in using a process language to define
scientific workflow and generate dataset provenance. In Proceedings of the 16th ACM
SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta,
Georgia. 2008.
[Gil, et al. 07] Gil, Y., et al. Examining the Challenges of Scientific Workflows. IEEE
Computer 40(12): pp. 24-32. 2007.
[Lau, et al. 06] Lau, K., et al. A Software Component Model and its Preliminary
Formalisation. In F.S. de Boer et al., editors, Proceedings of the Fourth International
Symposium on Formal Methods for Components and Objects, Lecture Notes in
Computer Science 4111: pp. 1-21. 2006.
[Woollard 08] Woollard, D. Supporting the Engineering Aspects of e-Science Through
Workflow Services. Proceedings of the First Brazilian e-Science Workshop, Campinas,
Brazil, 2008.
[Woollard, et al. 09] Woollard, D., et al. Injecting Software Architectural Constraints into
Legacy Scientific Applications. To appear in Proceedings of the ICSE 2009 Workshop
on Software Engineering for Computational Science and Engineering. Vancouver,
Canada, 2009.