Scientific Workflow Requirements
Carole Goble, University of Manchester, UK

Bertram Ludaescher, SDSC, USA
Attendees included
Bob Mann
Anthony Mayer
Austin Tate
Bertram Ludäscher
Geoffrey Fox
Jeffrey Grethe
Matthew Shields
Mike Wilde
Simon Cox
Carole Goble
Antoon Goderis
Earl Ecklund
Alan Bundy
Albert Burger
Jessica Chen-Burger
And a bunch more whose names we didn’t get
Scientific Workflow Requirements
Characterise scientific workflows,
identify their requirements, and
compare/contrast them with business
workflow requirements.
Some science stakeholders
neuroscience, astronomy, engineering
Few business stakeholders
Goals
Identify
Those requirements which are
fundamental and crucial
Those requirements which are desirable
but optional
Those characteristics that are found in
business workflows but are
inappropriate or unnecessary for
scientific workflows
Result
Inform the selection of appropriate
workflow languages
Suggest the commonalities and
dissimilarities between different
workflows for various problems or
communities
Inform workflow models, lifecycles
(workflow creation, registration,
enactment and termination) and
architectures.
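The lifecycle stages listed above (creation, registration, enactment, termination) can be pictured as a small state machine. This is an illustrative sketch only. The stage names come from the slide, but the transition table, the `WorkflowLifecycle` class and the example workflow name are hypothetical assumptions and do not represent an agreed model.

```python
from enum import Enum, auto

class Stage(Enum):
    """Lifecycle stages named on the slide."""
    CREATED = auto()
    REGISTERED = auto()
    ENACTING = auto()
    TERMINATED = auto()

# Hypothetical transition table: which stages may follow each stage.
ALLOWED = {
    Stage.CREATED: {Stage.REGISTERED},
    Stage.REGISTERED: {Stage.ENACTING, Stage.TERMINATED},
    Stage.ENACTING: {Stage.TERMINATED},
    Stage.TERMINATED: set(),
}

class WorkflowLifecycle:
    """Hypothetical tracker for one workflow's stage and the path taken so far."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.CREATED
        self.history = [Stage.CREATED]

    def advance(self, target: Stage) -> None:
        """Move to `target`, rejecting transitions the table does not allow."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.name} -> {target.name} is not allowed")
        self.stage = target
        self.history.append(target)

wf = WorkflowLifecycle("example-workflow")  # hypothetical name
for stage in (Stage.REGISTERED, Stage.ENACTING, Stage.TERMINATED):
    wf.advance(stage)
print([s.name for s in wf.history])  # ['CREATED', 'REGISTERED', 'ENACTING', 'TERMINATED']
```

An explicit table like this makes it straightforward to attach different environments or policies to each stage later, as the lifecycle discussion suggests.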
Methodology
Matrices of requirements against
application workflows
System requirements
Functional requirements
Language requirements
Post-it harvesting
Retrospective rationalisation.
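One way to hold the requirements-against-workflows matrices is as a mapping from (requirement, workflow) pairs to the three categories named under Goals. This is a minimal sketch. The requirement names, the workflow labels `wf_A` and `wf_B` and their ratings are placeholders rather than collected data, and `fundamental_everywhere` is a hypothetical helper for spotting commonalities.

```python
from enum import Enum

class Need(Enum):
    """The three categories from the Goals slide."""
    FUNDAMENTAL = "fundamental and crucial"
    OPTIONAL = "desirable but optional"
    UNNECESSARY = "inappropriate or unnecessary"

# Placeholder matrix: rows are requirements, columns are application workflows.
# Names and ratings are hypothetical, for illustration only.
matrix = {
    ("provenance tracking", "wf_A"): Need.FUNDAMENTAL,
    ("provenance tracking", "wf_B"): Need.FUNDAMENTAL,
    ("mid-run parameter change", "wf_A"): Need.OPTIONAL,
    ("mid-run parameter change", "wf_B"): Need.UNNECESSARY,
}

def fundamental_everywhere(matrix, workflows):
    """Requirements rated FUNDAMENTAL for every listed workflow (a commonality)."""
    requirements = {req for req, _ in matrix}
    return sorted(
        req for req in requirements
        if all(matrix.get((req, wf)) is Need.FUNDAMENTAL for wf in workflows)
    )

print(fundamental_everywhere(matrix, ["wf_A", "wf_B"]))  # ['provenance tracking']
```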
A Scientist Writes
“Work in my problem solving
environment so that I don’t need to
change the way I work.”
User facing
Reflect the modelling paradigm of the scientist.
Varies between experiments, disciplines
Which user would that be then?
Creators, users, auditors, validators (I know it’s right when I see it,
but I can’t write it)
Biologists vs bioinformaticians, and those transitioning
between the two
Different users different environments
Appropriate levels of abstraction.
User models -> workflow models
Simple to use & intuitive creation, deployment,
execution and debugging environments
Supporting Scientific Practice
Incremental, exploratory, prototypical: TYPE A
Got the data, now get the Nature paper before the next guy
Large-scale production: TYPE B
Got the idea, now get the data for many experiments, and
even many teams, communities, blah blah
Migration from TYPE A to TYPE B.
Capture of TYPE A for later non-interactive replay in
a parameterised fashion.
Workflow creation paradigms
by example, plagiarism, drag and drop
Provenance tracking
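Capturing a TYPE A session for later parameterised replay and tracking its provenance can share a single mechanism: log every step as it runs. The sketch below assumes that each step is a plain function that takes the previous result plus keyword parameters. `Session`, `replay`, `scale` and `threshold` are hypothetical names, and the toy steps stand in for real analysis services.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[..., Any]

class Session:
    """Hypothetical recorder for an exploratory (TYPE A) session: each step
    receives the previous result plus keyword parameters, and every call is logged."""

    def __init__(self, data: Any) -> None:
        self.data = data
        self.trace: List[Tuple[str, Dict[str, Any]]] = []  # replayable recipe
        self.provenance: List[Dict[str, Any]] = []         # what produced what

    def apply(self, step: Step, **params: Any) -> Any:
        result = step(self.data, **params)
        self.trace.append((step.__name__, params))
        self.provenance.append(
            {"step": step.__name__, "params": params, "input": self.data, "output": result}
        )
        self.data = result
        return result

def replay(trace, registry: Dict[str, Step], data: Any, **overrides: Any) -> Any:
    """Re-run a captured trace non-interactively (TYPE B style), replacing any
    parameter whose name appears in `overrides`."""
    for name, params in trace:
        merged = {k: overrides.get(k, v) for k, v in params.items()}
        data = registry[name](data, **merged)
    return data

# Two toy steps standing in for real analysis services.
def scale(xs, factor):
    return [x * factor for x in xs]

def threshold(xs, cutoff):
    return [x for x in xs if x >= cutoff]

session = Session([1, 2, 3, 4])
session.apply(scale, factor=10)      # [10, 20, 30, 40]
session.apply(threshold, cutoff=25)  # [30, 40]

registry = {"scale": scale, "threshold": threshold}
print(replay(session.trace, registry, [5, 6, 7], cutoff=60))  # [60, 70]
```

Here `cutoff=60` overrides the recorded value, while `factor=10` is reused from the trace. Because overrides are matched by parameter name alone, two steps that share a parameter name would both be changed. A real system would probably need step-qualified names.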
Cool tools, right tools
I love my vi editor
Diagramming tools, text tools
Tools that work on all workflows; use whichever you like,
whenever you like.
Good tools! Easy tools! Friendly tools!
For the domain user (which user?) not
the computer scientist ☺
Cat skinning
Multiple scripting language support
Multiple ways to write a workflow
One size does not fit all
Transparency and control
Looking under the hood and inside the box
observe, trace, compare, muse, fettle & fiddle.
What should be transparent?
Do users need to know what format data is in or
just that it is an image?
Unveil at different levels of detail, through the
layers (wedding cakes, stacks)
Opaque to some users some of the time,
drillable by others some of the time
Role, authorisation, policy
Scientist knows best
User interaction
Creation, Discovery, Enactment
Single-user interaction with workflow execution
Choice between paths of execution in specific states
Parameter modification mid-run
Collaborative multi-user interaction in creation
Reusing workflows -> Modularisation
Reusing wfs with different parameters and datasets
Joining up wfs from different areas, different
disciplines and across scales
E-science crosses disciplines!!
No support for “extreme team wf creation”
Collaborative multi-user interaction in execution?
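Modifying parameters mid-run and choosing between execution paths both require the workflow to pause at well-defined states and accept input. A Python generator offers a compact sketch of this. `steerable_run` is a hypothetical name: it yields its state after each item, accepts a new `gain` via `send`, and finally asks which of two hypothetical branches, `sum` or `max`, to take.

```python
from typing import Any, Dict, Generator, List

def steerable_run(
    data: List[float], gain: float
) -> Generator[Dict[str, Any], Any, float]:
    """Hypothetical steerable workflow. It pauses after each item so the user can
    send a new `gain`, then pauses once more to ask which path to take."""
    results: List[float] = []
    for i, x in enumerate(data):
        results.append(x * gain)
        # Expose the current state; the caller may send back {"gain": ...}.
        update = yield {"done": i + 1, "gain": gain, "last": results[-1]}
        if update:
            gain = update.get("gain", gain)
    # Choice between paths at a specific state.
    branch = yield {"choose": ["sum", "max"]}
    return sum(results) if branch == "sum" else max(results)

run = steerable_run([1.0, 2.0, 3.0], gain=2.0)
print(next(run))                 # {'done': 1, 'gain': 2.0, 'last': 2.0}
print(run.send({"gain": 10.0}))  # {'done': 2, 'gain': 10.0, 'last': 20.0}
print(run.send(None))            # {'done': 3, 'gain': 10.0, 'last': 30.0}
print(run.send(None))            # {'choose': ['sum', 'max']}
try:
    run.send("sum")              # the user picks the "sum" path
except StopIteration as stop:
    final = stop.value
print(final)                     # 52.0
```

The generator keeps the workflow's state suspended between interactions, which loosely mirrors the idea of suspending a run to prompt the user.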
Legacy and Extensibility
Ingesting legacy and external applications & services
May not run on every platform; may need an
emulator.
Heterogeneity – of types, platforms, etc.
Include arbitrary services available within the user’s
domain or hacked up by the users.
Simon’s piece of Matlab hackery – dark matter
services.
Discovering, reusing, wrapping external services
On-the-fly development and assimilation
Suspending the workflow, or prompting the user
Largely for the prototypical exploratory workflows.
Massaging, lubrication, facilitating, gluing without programming!
Easy to extend to meet specific or unique requirements
More on workflow sorts
Batch vs interactive
Dataflow vs control flow vs state-driven
Incremental, exploratory, prototypical
vs large-scale production (and migration
from the former to the latter).
Workflow lifecycles
Prototypical workflow development to
production run
Different parts of the lifecycle might
need different environments and
policies
Different sorts of users will interact at
different points in the lifecycle.
Security, trust and validation
Guarantees
That a provisioned service is what it says it is and
follows all notification mandates.
Models of soundness and well-behavedness
at different levels
500 lambs follow 10-15 shepherds (or wolves?)
Validate at the right time, not every time.
Confidence in someone else’s stuff
I can look at it to check it but I can’t write it.
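One reading of "validating at the right time rather than every time" is to check a workflow's structure once, when it is registered, instead of on every enactment. This sketch assumes that workflows are dataflow graphs written as `{task: set of upstream tasks}`. `validate` and `Registry` are hypothetical, and the two checks shown (undefined inputs and cycles) cover only a small slice of what soundness might mean.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # task -> upstream tasks it consumes

def validate(deps: Graph) -> List[str]:
    """Two structural checks: every input is produced by a defined task, and
    the graph has no cycles. Returns a list of problems (empty = passes)."""
    problems = [
        f"{task} needs undefined task {d}"
        for task, ups in deps.items()
        for d in sorted(ups)
        if d not in deps
    ]
    try:
        tuple(TopologicalSorter(deps).static_order())
    except CycleError as err:
        problems.append(f"cycle: {err.args[1]}")
    return problems

class Registry:
    """Hypothetical registry: validation happens once, at registration."""

    def __init__(self) -> None:
        self._workflows: Dict[str, Graph] = {}

    def register(self, name: str, deps: Graph) -> None:
        problems = validate(deps)
        if problems:
            raise ValueError("; ".join(problems))
        self._workflows[name] = deps

    def enact(self, name: str) -> List[str]:
        # Already validated at registration, so no re-check on each run.
        return list(TopologicalSorter(self._workflows[name]).static_order())

registry = Registry()
registry.register("chain", {"a": set(), "b": {"a"}, "c": {"b"}})
print(registry.enact("chain"))             # ['a', 'b', 'c']
print(validate({"x": {"missing"}}))        # ['x needs undefined task missing']
```

`graphlib` requires Python 3.9 or later.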
Business vs Scientific
It’s all the same and it’s all different
Use cases and scenarios needed.
Classify business and scientific workflows against
Matthew’s Stack
Drivers
Science workflow driven by scientific questions, outcomes
and vanity.
Business workflow driven by business processes & goals
and $£€
Granularity
Business languages for the coarse-grained level of SWFs
Scientists hack at the fine-grained level
Business vs Scientific
Individualism vs Corporations
Ratios -- more creators than users in
science?
What is the Scientific Business Process?
A techy writes
Formal underpinning in CS theory
What is the underlying formal theoretic
model? What is the natural scripting
language?
Dataflow is functional & parallel
Control flow is imperative & sequential?
SWF creation as programming.
What are the languages?
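The contrast between dataflow and control flow above can be made concrete. The sketch below assumes Python 3.9+ for `graphlib`. In the dataflow version, each task fires as soon as its inputs exist, so the independent `stats` and `plot` tasks are submitted to a thread pool together. The control-flow version instead walks an explicit, imperative sequence. The task graph `DEPS` and `run_task` are hypothetical stand-ins for real services.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical dataflow graph: each task -> the tasks whose outputs it consumes.
DEPS = {
    "load": set(),
    "clean": {"load"},
    "stats": {"clean"},
    "plot": {"clean"},
    "report": {"stats", "plot"},
}

def run_task(name, inputs):
    """Hypothetical stand-in for a real service: returns a string recording its inputs."""
    return f"{name}({', '.join(sorted(inputs))})"

def run_dataflow(deps):
    """Dataflow: a task fires once all its inputs exist; tasks that become
    ready together (here 'stats' and 'plot') run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            futures = {
                n: pool.submit(run_task, n, [results[d] for d in deps[n]])
                for n in ready
            }
            for n, fut in futures.items():
                results[n] = fut.result()
                sorter.done(n)
    return results

def run_control_flow(order, deps):
    """Control flow: an explicit, imperative sequence executed one step at a time."""
    results = {}
    for n in order:
        results[n] = run_task(n, [results[d] for d in deps[n]])
    return results

sequence = ["load", "clean", "stats", "plot", "report"]
assert run_dataflow(DEPS) == run_control_flow(sequence, DEPS)
print(run_dataflow(DEPS)["report"])
# report(plot(clean(load())), stats(clean(load())))
```

Both versions produce identical results here. The difference is that dataflow derives the schedule, including any parallelism, from the data dependencies, whereas control flow requires the author to spell out the order.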
Next Steps
Write this up!
Harvest some business use cases from
Forrester-report-style sources (and get
Tony Hey to pay)
Collect scientific workflow examples
Develop matrices of system, functional
and language requirements against
these examples.
Er … that’s it!
Fin