National Aeronautics and Space Administration
Jet Propulsion Laboratory

Workflow Orchestration: Conducting Science Efficiently on the Grid
March 17, 2009

David Woollard
NASA Jet Propulsion Lab, 4800 Oak Grove Drive, Pasadena, CA 91108
Dept. of Computer Science, University of Southern California, Los Angeles, CA 90089

Validating Computational Science
• Computational science, like all science, requires validation
• Validation comes in two forms:
  – Scaling (in data and computation)
  – Independent replication
• Both forms require significant computational resources
  – Grid is a promising resource

Vision of the Grid
• University of Southern California Center for Software and Systems Engineering
• NASA Jet Propulsion Laboratory Science Data Systems Section
• Aerospace Corporation
• Northrop Grumman
• Boeing Corporation
• NASA Ames Research Center Columbia Supercomputing Center
• Lawrence Livermore National Lab
• University of California San Diego Supercomputing Center

Like the power grid, the computational Grid should scale to the demands of individual users.

Workflow-Based Specification
• Workflows orchestrate processes on the Grid
• Workflows are a processing model that incorporates tasks, data, and rules
• Workflow management systems execute tasks on the Grid using data once each task’s dependencies are satisfied, based on rules
[Figure: workflow graph connecting Task 1, Task 2, Task 3, Task 4, and Task 5]
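The execution rule on the Workflow-Based Specification slide (a task runs once its dependencies are satisfied) can be illustrated with a minimal sketch. This is not the workflow engine used in the talk: the function `run_workflow`, the names `TASKS` and `DEPENDENCIES`, and the shape of the five-task graph are all hypothetical, chosen only to make the rule concrete.

```python
from collections import deque


def run_workflow(tasks, dependencies):
    """Run each task once all of its prerequisite tasks have finished.

    tasks: dict mapping a task name to a callable that receives the dict of
           results produced so far and returns that task's output data.
    dependencies: dict mapping a task name to the names it depends on.
    Returns (execution order, results by task name).
    """
    remaining = {name: set(dependencies.get(name, ())) for name in tasks}
    ready = deque(sorted(name for name, deps in remaining.items() if not deps))
    order, results = [], {}

    while ready:
        name = ready.popleft()
        results[name] = tasks[name](results)
        order.append(name)
        # Rule: a task becomes runnable when its last dependency completes.
        for other in sorted(remaining):
            deps = remaining[other]
            if name in deps:
                deps.remove(name)
                if not deps:
                    ready.append(other)

    if len(order) != len(tasks):
        raise ValueError("workflow has a cycle or an unknown dependency")
    return order, results


# Hypothetical five-task workflow; the graph shape is an assumption,
# not a reconstruction of the figure on the slide.
TASKS = {
    "task1": lambda r: 1,
    "task2": lambda r: r["task1"] + 1,
    "task3": lambda r: r["task1"] * 10,
    "task4": lambda r: r["task2"] + r["task3"],
    "task5": lambda r: r["task4"] * 2,
}
DEPENDENCIES = {
    "task2": ["task1"],
    "task3": ["task1"],
    "task4": ["task2", "task3"],
    "task5": ["task4"],
}

if __name__ == "__main__":
    order, results = run_workflow(TASKS, DEPENDENCIES)
    print(order)
    print(results["task5"])
```

In this sketch the tasks, the data they exchange (the `results` dict), and the rule for when a task may start are kept separate, which mirrors the three ingredients the slide names. A real Grid workflow system would also dispatch ready tasks to remote resources, which is left out here.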

Scaling the Experiment
[Figure: the Task 1 to Task 5 workflow; site labels: Laboratory, Institution, Co-laboratory, Other Institutions, @Home]

Independent Replication
[Figure: the Task 1 to Task 5 workflow; site labels: Collaborator, 3rd Party]

Heterogeneous Environments
[Figure: three copies of the Task 1 to Task 5 workflow, running on Workflow Engine 1 / Grid Infrastructure 1, Workflow Engine 1 / Grid Infrastructure 2, and Workflow Engine 2 / Grid Infrastructure 2; site labels: Laboratory, Institution, Collaborator, Co-laboratory, 3rd Party]

Research Challenge
• Scientific validation requires:
  – Scaling
  – Replication
• Existing technologies exhibit three challenges:
  – Require scientists to become engineers or vice versa
  – Existing workflow specifications entwine scientific and engineering concerns
  – Existing workflow specifications are not portable

A Model-Driven Approach
Computation Independent Model       Workflow Model
Implementation Independent Model    Domain-Specific Software Architecture
Implementation                      Deployment

Agenda
In the rest of this talk, we will cover:
• Models versus languages
• The role of software architecture
• Transforming workflows to domain-specific software architectures
• Performance
• Future Work and Conclusions

A Plethora of Workflow Languages
• Yu & Buyya presented a taxonomy [Yu & Buyya 05]
  – Based on workflow properties like model representation and scheduling policy
  – Illustration of divergence in the field
• As of last year, researchers such as Osterweil, et al. [Osterweil 08] still advocated more advanced language features
• Considered a Grand Challenge [Gil, et al. 07]

Making Decisions in Design Space
• Existing workflow languages violate separation of concerns
  – Scientists should work in languages applicable to the design space, not the solution space
  – Engineers should not have to become scientists to be able to scale workflow-based systems
• If workflow languages become the realm of the scientist, how does the software engineer effect change?
Manipulation of the system at the architectural level

Orchestration Through Connectors
• Lau, et al., have proposed exogenous connectors [Lau, et al. 06]
  – Encapsulate both control and data flow in a software system
  – Can be hierarchically composed to simulate control flow
• Control can be managed through several constructs:
  – Sequence
  – Conditional
  – Branch & Bound
[Figure: connector diagrams over components A, B, and C]

Invoking Connectors
• Different Grid infrastructures interact with tasks in multiple ways [Woollard 08]:
  – Synchronous communication
  – Events
  – Web services
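The idea behind the Orchestration Through Connectors slide, that control and data flow live in connectors rather than in the components themselves, can be sketched as follows. This is an illustrative assumption, not the connector model of [Lau, et al. 06] or the talk's implementation: the classes `Sequence` and `Conditional` and the components `load`, `sort_values`, `summarize`, and `skip` are hypothetical names. Only the sequence and conditional constructs are shown.

```python
from typing import Callable

# A component is any callable from a data dict to a data dict.
# Components never call each other; only connectors decide what runs next.
Component = Callable[[dict], dict]


class Sequence:
    """Connector that invokes its children in order, threading data through."""

    def __init__(self, *children: Component):
        self.children = children

    def __call__(self, data: dict) -> dict:
        for child in self.children:
            data = child(data)
        return data


class Conditional:
    """Connector that routes control to one of two children."""

    def __init__(self, predicate: Callable[[dict], bool],
                 if_true: Component, if_false: Component):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false

    def __call__(self, data: dict) -> dict:
        branch = self.if_true if self.predicate(data) else self.if_false
        return branch(data)


# Hypothetical components standing in for workflow tasks.
def load(data: dict) -> dict:
    return {**data, "values": [3, 1, 2]}


def sort_values(data: dict) -> dict:
    return {**data, "values": sorted(data["values"])}


def summarize(data: dict) -> dict:
    return {**data, "summary": sum(data["values"])}


def skip(data: dict) -> dict:
    return {**data, "summary": None}


# Connectors are themselves callable, so they compose hierarchically.
pipeline = Sequence(
    load,
    Conditional(lambda d: len(d["values"]) > 0,
                Sequence(sort_values, summarize),
                skip),
)

if __name__ == "__main__":
    print(pipeline({}))
```

Because a connector has the same call shape as a component, a whole sub-workflow can be nested wherever a single task could appear. How a connector actually reaches a task (a synchronous call, an event, or a web service) is the job the Invoking Connectors slide assigns to a separate layer and is not modeled here.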

Custom Handlers
[Figure: diagram with elements labeled Exogenous Connector, Invoking Connector, Services, Internal Custom Logic, Handler, and Component, linked by Control Flow and Data Flow]

SWSA: A Domain Architecture
[Figure: SWSA architecture]

Implementation
• Prism-MW, an architecturally-aware middleware
  – Components, Connectors, Topologies and Architecture are reified as first class elements
• Exogenous connectors, invoking connectors, and component wrappers around tasks are built with Prism

Performance Studies
• Overhead induced in computation time and memory by connectors [Woollard, et al. 09]
• Impact of architectural deployment on computation [Woollard, et al. 09]
  – Modified an existing time series workflow used at JPL
  – Deployed the system using OpenDAP and grid technology to co-locate data and computation
Reduced typical analysis from 9+ hours to under 2 minutes

Deployment & Optimization
• In the future, we plan to utilize advanced architectural modeling and deployment analysis to guide software engineers in deployment strategy

Conclusion
• Computational science requires validation
• Existing grid and workflow technologies are promising, but lack support for scaling and replication across heterogeneous Grid environments
• A model-driven approach allows scientists to manipulate workflow specifications, while software engineers can effect change in the transformed software architectures

Thank You

[Yu & Buyya 05] Yu, J. and Buyya, R. A Taxonomy of Workflow Management Systems for Grid Computing. Journal of Grid Computing 3(3-4): pp. 171-200. 2005.
[Osterweil 08] Osterweil, L., et al. Experience in using a process language to define scientific workflow and generate dataset provenance. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Atlanta, Georgia. 2008.
[Gil, et al. 07] Gil, Y., et al. Examining the Challenges of Scientific Workflows. IEEE Computer 40(12): pp. 24-32. 2007.
[Lau, et al. 06] Lau, K., et al. A Software Component Model and its Preliminary Formalisation. In F.S. de Boer et al., editors, Proceedings of Fourth International Symposium on Formal Methods for Components and Objects, Lecture Notes in Computer Science 4111: pp. 1-21. 2006.
[Woollard 08] Woollard, D. Supporting the Engineering Aspects of e-Science Through Workflow Services. Proceedings of the First Brazilian e-Science Workshop, Campinas, Brazil, 2008.
[Woollard, et al. 09] Woollard, D., et al. Injecting Software Architectural Constraints into Legacy Scientific Applications. To appear in Proceedings of the ICSE 2009 Workshop on Software Engineering for Computational Science and Engineering. Vancouver, Canada, 2009.