Workshop on the Future of Scientific Workflows
Break Out #2: Workflow System Design
Moderators: Chris Carothers (RPI), Doug Thain (ND)

Overall Questions
• What is needed at the extreme scales?
• How will extreme scales affect how we:
  1. Design workflow systems (our focus!)
  2. Use workflow systems
  3. Validate the results of workflow systems
• What are the key challenges?
• Are there challenges unique to IS & DA?
• Are there challenges common to both?

Our Charge/Questions
• What are the different high-level factors and design decisions that need to be considered and made?
• Different coupling models, the role of storage and buffering, supported data models and their definition, and adaptivity of the workflow are addressed.

Challenges Today
– Wide-area WMSs submitting to machines designed for interactive logins run into some barriers (data movement, too).
– Security is part of the issue: a solution designed today breaks when policy changes. Is this a technical issue or a policy issue?
– Technology: the WMS needs some facility for managing the time-limited nature of credentials, plus human-in-the-loop interaction, exception handling, etc.
– Reasoning about robustness: cost of retry, diagnosing security failures, etc.
• Need for transparency across the levels of the system
– Storage management: to handle intermediate storage, the WMS must be able to allocate storage with a time limit and deal with allocation failures. Ilkay: must request double the expected storage use in order to succeed on XSEDE.
• Can't reserve or effectively allocate across the memory/storage hierarchy (oversubscription)
– Example use cases: analysis code pulling remote data over the network at runtime; the need to "park" data temporarily between stages of an application, perhaps on an external system. Applications are a mixture of supercomputer + database + low-end applications.
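The storage-management challenge (allocate intermediate storage with a time limit, and cope when an allocation fails) can be sketched as a lease-based pool. This is a minimal illustration under stated assumptions: `StoragePool`, `Lease`, and `AllocationError` are hypothetical names, capacity is a single byte count, and the default 2× padding only echoes the XSEDE anecdote rather than any general rule.

```python
import time
from dataclasses import dataclass


class AllocationError(Exception):
    """Raised when the (hypothetical) pool cannot satisfy a storage request."""


@dataclass
class Lease:
    name: str
    size: int          # bytes reserved
    expires_at: float  # clock time after which the space may be reclaimed


class StoragePool:
    """Hypothetical intermediate-storage pool that hands out time-limited leases."""

    def __init__(self, capacity: int, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.leases: dict[str, Lease] = {}

    def _reclaim_expired(self) -> None:
        now = self.clock()
        for name in [n for n, lease in self.leases.items() if lease.expires_at <= now]:
            del self.leases[name]

    def free(self) -> int:
        self._reclaim_expired()
        return self.capacity - sum(lease.size for lease in self.leases.values())

    def allocate(self, name: str, expected: int, ttl: float, padding: float = 2.0) -> Lease:
        """Reserve padding * expected bytes for ttl seconds, or raise AllocationError."""
        available = self.free()  # also reclaims expired leases
        if name in self.leases:
            raise AllocationError(f"lease {name!r} is already held")
        size = int(expected * padding)
        if size > available:
            raise AllocationError(f"need {size} bytes for {name!r}, only {available} free")
        lease = Lease(name, size, self.clock() + ttl)
        self.leases[name] = lease
        return lease

    def release(self, name: str) -> None:
        self.leases.pop(name, None)


# Usage with a manual clock so expiry is deterministic.
now = [0.0]
pool = StoragePool(capacity=100, clock=lambda: now[0])
a = pool.allocate("stage1-output", expected=30, ttl=60)  # reserves 60 bytes

failed = False
try:
    pool.allocate("stage2-output", expected=30, ttl=60)  # needs 60, only 40 free
except AllocationError as err:
    failed = True  # a real WMS might wait, shrink the request, or go elsewhere
    print("allocation failed:", err)

now[0] = 61.0  # the first lease has expired and its space is reclaimed
b = pool.allocate("stage2-output", expected=30, ttl=60)
```

Making expiry explicit means space held by a stage that never releases it is eventually recovered, while the failure path forces the caller to decide what to do instead of silently oversubscribing.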
Future Architectures
• Assumptions about the ecosystem:
  – Machines: fat nodes, heterogeneous, storage on nodes, deep memory hierarchies, …
  – Outside the machine: SDNs, clouds, reserved experimental facilities, …
• Problems on today's machines are magnified on future systems:
  – Metrics of success are moving away from FLOPS, which affects how the WMS does its job.
  – Coordination of policies across facilities: allocation, security, API, etc.
  – Need for a uniform representation of workflows.
  – Supporting workflow composition.
  – The multiplicity of WMS implementations makes it difficult to share solutions to these problems.
• Planning, provisioning, and scheduling:
  – Experience: one big meta-scheduler is not effective.
  – Alternative: separate provisioning from scheduling.
  – Each resource scheduler needs autonomy, but must also expose sufficient transparency and control.
  – Problem: can one component effectively mix provisioning, planning, and scheduling, or do we separate them?
• Performance, predictability, and such:
  – Want predictable performance of workflow operations.
  – Reliable resource allocation, or adaptability, so that delays do not cascade.
  – Need benchmarks, models, mini-apps, and simulations to evaluate systems and implementations.
• Reproducibility, portability, and integrity:
  – The WMS is in a good place to track provenance and reproducibility.
  – But it needs transparency from the other components to pull out the relevant data.
  – And the number of components and their pace of change make storing the provenance data itself a challenge to be managed.
  – Frank: Is the end user willing to pay the price for that benefit?

Data Management
• What we expect to see:
  – The global filesystem will be slow and unpredictable.
  – Total I/O capacity is limited.
  – Competition between apps for intermediate storage.
• What does this mean?
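The "separate provisioning from scheduling" alternative can be illustrated with a small sketch. It assumes a pilot-job-style model in which a provisioner decides how many fixed-size slots to acquire and a separate scheduler places tasks only onto slots already held; `Provisioner`, `Slot`, `schedule`, the slot size, and the first-fit policy are all hypothetical choices, not a description of any particular WMS.

```python
import math
from dataclasses import dataclass


@dataclass
class Slot:
    """A provisioned resource, e.g. a node held by a pilot job (hypothetical)."""
    slot_id: int
    free_cores: int


class Provisioner:
    """Decides how much to acquire; knows nothing about individual tasks."""

    def __init__(self, slot_cores: int, max_slots: int):
        self.slot_cores = slot_cores
        self.max_slots = max_slots

    def provision(self, demand_cores: int) -> list[Slot]:
        wanted = math.ceil(demand_cores / self.slot_cores)
        count = min(wanted, self.max_slots)  # respect the allocation limit
        return [Slot(i, self.slot_cores) for i in range(count)]


def schedule(tasks: list[tuple[str, int]], slots: list[Slot]):
    """Decides where each task runs, using only already-provisioned slots (first fit)."""
    placements: dict[str, int] = {}
    unplaced: list[str] = []
    for name, cores in tasks:
        slot = next((s for s in slots if s.free_cores >= cores), None)
        if slot is None:
            unplaced.append(name)  # reported back as a provisioning signal
            continue
        slot.free_cores -= cores
        placements[name] = slot.slot_id
    return placements, unplaced


tasks = [("a", 4), ("b", 8), ("c", 4), ("d", 16)]
provisioner = Provisioner(slot_cores=8, max_slots=3)
slots = provisioner.provision(sum(cores for _, cores in tasks))
placements, unplaced = schedule(tasks, slots)
print(placements, unplaced)  # {'a': 0, 'b': 1, 'c': 0} ['d']
```

Task `d` stays unplaced because no provisioned slot is large enough. In this split, the scheduler keeps its autonomy over placement and simply exposes that outcome, leaving it to the provisioner to decide whether to acquire a larger resource.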
  – Need to provision storage and network bandwidth as a first-class concern, coupled with the allocation for FLOPS, I/O, etc.
  – Need advanced mechanisms (e.g., data staging) within the machine for sharing data between running apps without going through the global filesystem. This involves naming, rendezvous, garbage collection, …
  – To deal with unexpected events at runtime, we either need to overprovision or have the ability to re-provision at runtime. Example: if the consumer of some data is slow to start up, we need to allocate more storage OR pause the producer.
  – Dantong: the memory hierarchy will result in many different sharing mechanisms; we need some visibility to start the right ones.
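The "pause the producer" option for a slow-starting consumer can be sketched with a bounded in-memory buffer: once the staging area is full, the producer blocks until the consumer drains it. This assumes a single producer and consumer in one process and uses Python's standard `queue.Queue` as a stand-in for an in-machine staging service; `CAPACITY`, the chunk count, and the 0.2 s start-up delay are arbitrary illustrative values.

```python
import queue
import threading
import time

CAPACITY = 4     # hypothetical staging capacity, in chunks
N_CHUNKS = 10
DONE = object()  # sentinel marking the end of the stream

staging: queue.Queue = queue.Queue(maxsize=CAPACITY)  # stands in for in-machine staging
consumed: list[int] = []
depths: list[int] = []  # buffer occupancy seen by the producer after each put


def producer() -> None:
    for chunk in range(N_CHUNKS):
        staging.put(chunk)  # blocks (pauses the producer) while the buffer is full
        depths.append(staging.qsize())
    staging.put(DONE)


def consumer() -> None:
    time.sleep(0.2)  # simulate a consumer that is slow to start up
    while True:
        item = staging.get()
        if item is DONE:
            break
        consumed.append(item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"consumed {len(consumed)} chunks; peak buffer occupancy {max(depths)} of {CAPACITY}")
```

The other option named in the notes, allocating more storage, would correspond to raising `CAPACITY` (or growing the buffer at runtime) instead of letting `put` block; the trade-off is idle producer time versus extra storage that must be provisioned.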