National Aeronautics and Space Administration
Jet Propulsion Laboratory

Supporting Science Through Workflows: Infrastructure, Architecture and Modeling
David Woollard
NASA Jet Propulsion Laboratory
University of Southern California

Agenda
» Motivation
» Classification of in silico Experimentation
» Research Problem
» Related Work
» Introduction to Workflow Systems
» Research Goals
» Methodology
» Refactoring existing software
» Domain Specific Software Architecture
» Evaluation
» Conclusions & Future Work

D.M. Woollard. Supporting Science Through Workflows.

Motivation
• The nature of scientific investigations has changed.
• Two major trend lines:
– Simulation via computer has for many replaced in vivo and in vitro science.
– Collaborations are growing (system of systems science).
• New discoveries in materials science, chemistry, physics, planetary science, and even social sciences are made via in silico experimentation.

in silico Experimentation
• Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology.
[Figure: the Discovery, Production, and Distribution phases, with labels Theory, Practice, Development, Execution, and Lone Researcher [Kepner 03]]

in silico Experimentation
• Production is the engineering of replicating an experiment on large volumes of data.
We will focus on Production Systems in this talk.

in silico Experimentation
• Distribution is a phase in which data is dispersed to peers for review and further experimentation, including:
– Papers
– Federated Data
– Digital Libraries

The Role of Technology
• In silico science, especially system of systems science, is facilitated by the Grid.
“The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.”
The Anatomy of the Grid (2001)

Research Problem
• Scientists harness complex hardware and software systems in order to conduct scientific research in silico.
• Once algorithms and processes are established, production systems are created to produce large volumes of data.
• Designing a production system is a complex engineering task as well as a complex scientific task.
Meeting these production requirements forces either scientists to engineer a production system or software engineers to rewrite scientific code. This is both inefficient and costly.

Introduction to Workflows
[Figure: Production Systems, Workflows, and Grid Systems, with workflow tasks T0 to T4]
Grid Systems have traditionally focused on creating Virtual Organizations. In Grids, workflows orchestrate processing tasks in production systems.
Workflows are a processing model that incorporates actors, tasks, data, and rules.
Workflow management systems execute tasks on data once the task’s dependencies are satisfied based on rules.

Workflow System Model

Workflows Workflows Everywhere
Karajan, Wings, Askalon, Unicore, Gridbus, Taverna, Grid Workflow, Triana, GridAnt, ICENI, VDS, Condor-G, GrADS, Pegasus, Kepler, Yawl, GridFlow, OODT, DAG-Man, SciFlow

Bottom-up Taxonomy
• Yu & Buyya presented a taxonomy [Yu & Buyya 05]
– Based on workflow properties like model representation and scheduling policy
– Illustration of divergence in the field
No taxonomy by interface to task code.

Insights from an Architect
• Each production workflow task is a complex software application with two primary stakeholders: the scientist and the engineer.
• Software architectures are a system’s blueprint: its form, elements, and rationale [Perry & Wolf, 92].
• An architecture provides appropriate views for each stakeholder in addition to encapsulation of computation and communication. These are the architecture’s components, connectors and topology.
• Reification of architectural elements in code is a method of bridging the gap between design and implementation. First-class connectors and explicit interfaces are such reifications.
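
As a minimal sketch of what reifying architectural elements in code can look like, the Python fragment below gives each component an explicit interface and makes the connector a first-class object. All names here (`Component`, `PipelineConnector`, `Scale`, `Offset`) are hypothetical illustrations and are not taken from any system named in this talk.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class Component(Protocol):
    """Explicit interface: the only way a connector talks to a component."""

    def process(self, data: list[float]) -> list[float]: ...


class Scale:
    """Hypothetical science module: multiplies every value by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, data: list[float]) -> list[float]:
        return [x * self.factor for x in data]


class Offset:
    """Hypothetical science module: adds a constant to every value."""

    def __init__(self, delta: float) -> None:
        self.delta = delta

    def process(self, data: list[float]) -> list[float]:
        return [x + self.delta for x in data]


class PipelineConnector:
    """First-class connector: owns the communication between components.

    The components never call each other. The connector passes each
    component's output to the next one and records what it routed.
    """

    def __init__(self, components: Sequence[Component]) -> None:
        self.components = list(components)
        self.log: list[str] = []

    def run(self, data: list[float]) -> list[float]:
        for component in self.components:
            data = component.process(data)
            self.log.append(type(component).__name__)
        return data


connector = PipelineConnector([Scale(2.0), Offset(1.0)])
result = connector.run([1.0, 2.0, 3.0])
```

Because the control and data flow live in the connector, a module can be reordered or replaced without touching the code of the other modules.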

Research Goals
• Develop a Domain Specific Software Architecture (DSSA) for tasks in scientific workflows.
• Develop a methodology for refactoring existing scientific code into this DSSA.
• Minimize overhead (computation time and memory footprint).
• Maximize science code reuse.

Agenda
» Motivation
» Classification of in silico Experimentation
» Research Problem
» Related Work
» Introduction to Workflow Systems
» Research Goals
» Methodology
» Refactoring existing software
» Domain Specific Software Architecture
» Evaluation
» Conclusions & Future Work

Decomposing Software
• Decomposition, the first step in the approach, is a process in which scientific modules are identified and control flow is determined.
• Scientific modules are like functions: they have internal scope and a single entry and exit point. In graph theoretic terms, the call dominancy tree for the basic blocks in the module has only one source and one sink.
• The proper level of decomposition is dependent on both scientific functionality and engineering requirements. Therefore, it should be “tunable.”
[Figure: the three steps of the approach: Decomposition, Architecting, Deployment]

“Injecting” Architecture
• In the second part of the approach, these modules must be “architected” into a workflow task with connectors to services at appropriate levels (to satisfy production requirements).
• We use Prism-MW wrappers to encapsulate and componentize these decomposed modules. This provides us with a standard interface and utilities at the module level for employing event-based communication.
• We use the Exogenous Connector style [Lau et al.] to mimic the original control and data flow in the workflow task and augment these connectors with a specialized version of the invoking connector.

Deploying to the Grid
• Deployment is the last step in our approach.
• We currently deploy the resulting workflow component into the OODT Science Data System environment. This is a grid workflow management system used at JPL.
• We should note that this choice is purely for the sake of developer convenience; the approach should be deployable to any target workflow management system.

SWSA Architecture
Scientific Workflow Software Architecture (SWSA), a domain specific software architecture for workflow tasks.

Preliminary Evaluation
• We chose a canonical scientific application (matrix multiplication) implemented in both Fortran and C.
• Six different metrics were taken:
– Execution time for:
  • Base application
  • Wrapper (no data exchanged)
  • Wrapper (data exchanged)
– Memory footprint for:
  • Base application
  • Wrapper (no data exchanged)
  • Wrapper (data exchanged)

Preliminary Evaluation
Refactoring Methodology Example: Molecular Dynamics Simulation
Performance results are very promising:
Time Overhead: 1.85%
Code Reuse: 96.77%
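
One way to read a time-overhead figure is as the relative increase in execution time of the wrapped application over the base application. The sketch below assumes that definition, which the slides do not spell out. `percent_overhead`, `matmul`, `wrapped_matmul`, and `time_call` are hypothetical names, and the pass-through wrapper only stands in for a real component wrapper.

```python
import time


def percent_overhead(base: float, wrapped: float) -> float:
    """Relative increase of `wrapped` over `base`, in percent (assumed definition)."""
    if base <= 0:
        raise ValueError("base measurement must be positive")
    return (wrapped - base) / base * 100.0


def matmul(a, b):
    """Naive matrix multiplication on lists of lists (the base application)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def wrapped_matmul(a, b):
    """Hypothetical stand-in for a wrapped module: delegates to the base code."""
    return matmul(a, b)


def time_call(func, *args, repeats: int = 5) -> float:
    """Best-of-N wall-clock time for func(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Small square test matrices; the size is an arbitrary choice for the sketch.
n = 40
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

base_time = time_call(matmul, a, b)
wrapped_time = time_call(wrapped_matmul, a, b)
overhead = percent_overhead(base_time, wrapped_time)
```

Taking the best of several runs reduces the influence of scheduler noise, which matters when the expected overhead is only a few percent.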

Conclusions & Future Work
• Scientific Workflow Software Architecture (SWSA) improves upon existing workflow systems by providing:
– A methodology for accessing services.
– A separation of concerns between scientific algorithms and production features of code.
– A clean separation of roles between the scientist and the engineer.
• Satisfies the “cult of performance.”
• Future Work
– Extended evaluation on more advanced simulation codes.
– Expansion of the architecture to support parallel codes.

Thank You
For more information, please see:
• D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. “Scientific Software as Workflows: From Discovery to Distribution.” To appear in IEEE Software Special Issue on Developing Scientific Software, 2008.
• D. Woollard, D. Freeborn, E. Kay-Im, and S. LaVoie. “Case Studies in Science Data Systems: Meeting Software Challenges in Competitive Environments.” To appear in Proceedings of the 10th International Conference on Space Operations (SpaceOps-2008), AIAA press, Heidelberg, Germany, May 2008.
• D. Woollard. “Supporting Scientific Workflows Through First-Class Connectors.” Qualifying Examination Report. University of Southern California. May 2007.
• D. Woollard, C. Mattmann, and N. Medvidovic. “Injecting Software Architectural Constraints into Legacy Scientific Applications.” USC Center for Software Engineering Technical Report, USC-CSE-2007-701, January 2007.

Portions of this research were conducted at the Jet Propulsion Laboratory managed by the California Institute of Technology under a contract with the National Aeronautics and Space Administration.