
Workshop on the Future of Scientific
Workflows Break Out #2:
Workflow System Design
Chris Carothers (RPI), Doug Thain (ND)
Overall Questions
• What is needed at the extreme scales?
• How will extreme scales affect how we:
1. Design workflow systems (our focus!)
2. Use workflow systems
3. Validate the results of workflow systems
• What are the key challenges?
• Are there challenges unique to IS & DA?
• Are there challenges common to both?
Our Charge/Questions
• What are the different high-level factors and
design decisions that need to be considered
and made?
• We address different coupling models, the role
of storage and buffering, supported data models
and their definition, and the adaptivity of the
workflow.
Challenges Today
– Wide-area workflow management systems (WMS) submitting to machines
designed for interactive logins run into barriers, data movement among them.
– Security is part of the issue: a solution designed today breaks when policy
changes. Is this a technical issue or a policy issue?
– Technology: the WMS needs facilities for managing the time-limited nature
of credentials, human-in-the-loop interaction, exception handling, etc.
– Reasoning about robustness: the cost of retry, diagnosing security failures, etc.
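The credential-lifetime problem above can be sketched in a few lines. This is a hypothetical illustration, not any existing WMS API: `Credential`, `ensure_valid_for`, and `notify_operator` are invented names, and the 12-hour refresh window is an arbitrary assumption standing in for site policy.

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch: a WMS-side guard that refuses to dispatch work
# whose expected runtime exceeds the remaining credential lifetime,
# refreshing automatically where policy allows and otherwise escalating
# to a human (human-in-the-loop). All names here are hypothetical.

class Credential:
    def __init__(self, expires_at, refreshable=False):
        self.expires_at = expires_at
        self.refreshable = refreshable

    def remaining(self):
        return self.expires_at - datetime.now(timezone.utc)

def ensure_valid_for(cred, expected_runtime, notify_operator):
    """Return True only if the credential will outlive the task."""
    if cred.remaining() >= expected_runtime:
        return True
    if cred.refreshable:
        # Assumed site policy: automatic refresh grants 12 more hours.
        cred.expires_at = datetime.now(timezone.utc) + timedelta(hours=12)
        return cred.remaining() >= expected_runtime
    # Human-in-the-loop: pause the task and ask an operator to re-authenticate.
    notify_operator("credential expires before task completes; re-auth needed")
    return False
```

The point of the sketch is the exception path: the WMS must be able to pause work and route a security failure to a person, rather than retry blindly.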
• Need for transparency across the levels of system
– Storage management: to handle intermediate storage, the WMS must be able to
allocate storage with a time limit and deal with allocation failures. Ilkay:
must request double the expected storage use in order to succeed on XSEDE.
• Can’t reserve or effectively allocate across the memory/storage hierarchy, leading to oversubscription.
– Example use cases: analysis code pulling remote data over the network at
runtime; the need to “park” data temporarily between stages of an application,
perhaps on an external system. Applications are a mixture of
supercomputer, database, and low-end components.
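The storage-management point above, time-limited allocations that can fail plus the observed need to over-request (e.g., doubling expected use), can be sketched as follows. This is a toy model: `StoragePool`, `provision_intermediate`, and the lease representation are all invented for illustration.

```python
# Illustrative sketch: requesting intermediate storage with an explicit
# lease (time limit) and a safety factor, degrading gracefully when the
# padded request is refused. The 2x default echoes the observation that
# expected storage use had to be doubled to succeed in practice.

class AllocationError(Exception):
    pass

class StoragePool:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.leases = []  # list of (size_gb, lease_hours)

    def allocate(self, size_gb, lease_hours):
        used = sum(size for size, _ in self.leases)
        if used + size_gb > self.capacity_gb:
            raise AllocationError(f"{size_gb} GB not available")
        self.leases.append((size_gb, lease_hours))
        return len(self.leases) - 1  # lease handle

def provision_intermediate(pool, expected_gb, lease_hours, safety=2.0):
    """Try a padded request first; fall back to the unpadded size."""
    for factor in (safety, 1.0):
        try:
            return pool.allocate(expected_gb * factor, lease_hours)
        except AllocationError:
            continue
    raise AllocationError("even the unpadded request failed")
```

The fallback loop is the essential behavior: allocation failure is an expected event the workflow must plan around, not an error to crash on.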
Future Architectures
Assumptions about the ecosystem:
– Machines: Fat nodes, heterogeneous, storage on nodes, deep memory hierarchies, …
– Outside the machine: SDNs, clouds, reserved experimental facilities, …
Problems on today’s machines are magnified on future systems.
Planning, provisioning, and scheduling.
Experience: One big meta-scheduler not effective
Alternative: separate provisioning from scheduling.
Each resource scheduler needs autonomy, but must also expose sufficient transparency and control.
Problem: Can one component effectively mix provisioning, planning, and scheduling, or do we separate?
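The alternative named above, separating provisioning from scheduling, can be made concrete with a minimal sketch. The `Provisioner` and `Scheduler` classes and their methods are hypothetical, meant only to show the division of labor: one layer acquires a pool, the other places tasks strictly within it, and each exposes its state to the other.

```python
# Illustrative sketch of separating provisioning from scheduling.
# A Provisioner acquires resources up front; a Scheduler places tasks
# only within the provisioned pool. Both keep autonomy but expose
# enough state (transparency) for the other to react. Names invented.

class Provisioner:
    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.granted = 0

    def provision(self, nodes):
        """Acquire a block of nodes; returns the size actually granted."""
        grant = min(nodes, self.total_nodes - self.granted)
        self.granted += grant
        return grant

class Scheduler:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.running = []  # list of (task, nodes)

    def schedule(self, task, nodes_needed):
        """Place a task only if the provisioned pool can hold it."""
        in_use = sum(n for _, n in self.running)
        if in_use + nodes_needed > self.pool_size:
            return False  # scheduler defers; provisioner may grow the pool
        self.running.append((task, nodes_needed))
        return True

# Usage: provision a pool, then schedule within it.
prov = Provisioner(total_nodes=64)
pool = prov.provision(40)
sched = Scheduler(pool_size=pool)
ok = sched.schedule("analysis-stage", 16)
```

The `return False` path is where the transparency requirement bites: the scheduler must be able to tell the provisioner *why* it deferred, rather than both layers guessing at each other's state.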
Performance, predictability, and such
Metrics of success are moving away from FLOPS, which affects how the WMS does its job.
Coordination of policies across facilities: allocation, security, API, etc.
Need for uniform representation of workflows.
Supporting workflow composition.
Multiplicity of WMS implementations makes it difficult to share solutions to these problems.
We want predictable performance of workflow operations, and
reliable resource allocation or adaptability so that delays do not cascade.
Need benchmarks, models, mini-apps, simulations, to evaluate systems and implementations.
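One minimal form such a model could take is a critical-path estimate of workflow makespan from per-task durations and dependencies, giving benchmarks a common prediction to compare WMS implementations against. The `makespan` function and the example pipeline below are invented for illustration.

```python
# Illustrative mini-model: estimate a workflow's makespan as the
# critical-path length of its task DAG. A real benchmark would add
# resource contention, queue waits, and data-movement costs.

def makespan(durations, deps):
    """durations: task -> seconds; deps: task -> prerequisite tasks."""
    finish = {}

    def done(task):
        if task not in finish:
            start = max((done(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    return max(done(t) for t in durations)

# Hypothetical pipeline: a simulation feeding two parallel consumers,
# both of which must finish before archiving.
tasks = {"sim": 100.0, "reduce": 20.0, "viz": 15.0, "archive": 5.0}
deps = {"reduce": ["sim"], "viz": ["sim"], "archive": ["reduce", "viz"]}
```

Even a model this small lets one ask the cascading-delay question quantitatively: perturb one task's duration and see how far the finish time moves.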
Reproducibility, portability, and integrity.
The WMS is in a good position to track provenance and support reproducibility.
But it needs transparency from the other components to pull out the relevant data.
And the number of components and the pace of execution make storing the provenance data itself a challenge to be managed.
Frank: Is the end user willing to pay the price for that benefit?
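As a sketch of what the WMS can capture on its own, a per-task provenance record might bundle inputs, code version, and an environment digest. The `provenance_record` function and its field names are hypothetical; the point is that even this minimal record, multiplied by every task and component, illustrates both the value Frank questions and the storage burden noted above.

```python
import hashlib
import json
import time

# Illustrative sketch: minimal per-task provenance a WMS could record
# without help from other components. Field names are invented.

def provenance_record(task_name, input_files, code_version, env):
    """Build a small, hashable provenance entry for one task execution."""
    digest = hashlib.sha256(
        json.dumps({"inputs": sorted(input_files), "env": env},
                   sort_keys=True).encode()
    ).hexdigest()
    return {
        "task": task_name,
        "code_version": code_version,
        "inputs": sorted(input_files),
        "env_digest": digest,      # compact stand-in for the full environment
        "recorded_at": time.time(),
    }
```

Hashing the environment instead of storing it whole is one answer to the volume problem, at the cost of being able to verify a rerun but not reconstruct it.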
Data Management
• What we expect to see.
– Global filesystem will be slow and unpredictable.
– Total I/O capacity is limited.
– Competition for intermediate storage between apps.
• What does this mean?
– Need to provision storage and network bandwidth as a first-class concern, coupled
with the allocation for FLOPS, IO, etc…
– Need advanced mechanisms (e.g., data staging) within the machine for data
sharing between running apps, not using the global filesystem. Involves naming,
rendezvous, garbage collection…
– To deal with unexpected events at runtime, we either need to overprovision or
have the ability to re-provision at runtime. Example: if the consumer of data is slow to
start up, then we need to allocate more storage OR pause the producer.
– Dantong: the memory hierarchy will result in many different sharing mechanisms;
we need some visibility to start the right ones.
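The "pause the producer" option above is classic backpressure, and can be sketched with a bounded staging buffer between two coupled applications. Here Python's standard `queue.Queue` stands in for an in-machine staging area; in a real system this would be node-local or burst-buffer storage, not an in-process queue.

```python
import queue
import threading

# Illustrative sketch: a bounded staging buffer between a producer and a
# slow-to-start consumer. When the buffer fills, the producer blocks
# (backpressure) instead of spilling to the slow global filesystem.

staging = queue.Queue(maxsize=4)  # stands in for provisioned intermediate storage

def producer(n_items):
    for i in range(n_items):
        staging.put(f"chunk-{i}")  # blocks while the buffer is full

def consumer(out):
    while True:
        item = staging.get()
        if item is None:  # sentinel: no more data
            break
        out.append(item)

results = []
t_prod = threading.Thread(target=producer, args=(10,))
t_cons = threading.Thread(target=consumer, args=(results,))
t_prod.start()
t_cons.start()
t_prod.join()
staging.put(None)
t_cons.join()
```

The `maxsize` parameter is exactly the provisioning decision discussed above: size it for the expected rate mismatch, or accept that the producer will stall.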