Taverna Workbench Stuart Owen University of Manchester, UK

stuart.owen@manchester.ac.uk
What is a workflow
• Data workflows
– A task is invoked once its expected data has been received and, when complete, passes any resulting data downstream.
– B starts when it receives data from A.
– C and D run in parallel when they receive data from B.
– E starts once it has received data from both C and D.
• Control workflows
– A task is invoked once the tasks it depends on have completed.
– B starts when A has completed.
– C and D run in parallel once B has completed.
– E starts once both C and D have completed.
[Figure: workflow graph of tasks A, B, C, D, E and F]
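The data-workflow rule above (a task is invoked once its expected data has been received) can be sketched in a few lines of Python. This is only an illustration, not how Taverna's enactor is implemented: `run_dataflow`, the task table and the string-building task bodies are hypothetical names made up for this example. Tasks run one after another here, so C and D, which could run in parallel, simply run in the same pass.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task table: name -> (upstream task names, function of upstream outputs).
Task = Tuple[List[str], Callable[..., str]]

def run_dataflow(tasks: Dict[str, Task]) -> Dict[str, str]:
    """Invoke each task once all of its input data is available."""
    results: Dict[str, str] = {}
    pending = dict(tasks)
    while pending:
        # A task is ready once every upstream task has produced its data.
        ready = [name for name, (deps, _) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("cycle or missing upstream task")
        for name in ready:
            deps, func = pending.pop(name)
            # Pass the upstream results downstream as arguments.
            results[name] = func(*(results[d] for d in deps))
    return results

workflow: Dict[str, Task] = {
    "A": ([], lambda: "a"),
    "B": (["A"], lambda a: a + "b"),
    "C": (["B"], lambda b: b + "c"),
    "D": (["B"], lambda b: b + "d"),
    "E": (["C", "D"], lambda c, d: c + "+" + d),
}

outputs = run_dataflow(workflow)
```

After the run, `outputs["E"]` holds the combination of what C and D produced, which in turn both derive from B's output.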
Advantages of workflows
[Figure: excerpt of a raw nucleotide sequence listing, positions 12181 to 12781]
Advantages of workflows
• High-level abstraction
– Easier to understand and modify.
– Easier to describe and discuss with others.
– Describes what you want to do, not how to do it.
• Automation
• Systematic
• Sharing and re-use
– Either on its own, or within other workflows!
Workflows within Taverna
• Predominantly based around the flow of data, but control constraints are allowed as well.
• Service-oriented workflows. Services may or may not be grid-enabled.
• High-level GUI approach separated from lower-level coding: you don't have to be a coder to build a workflow.
• Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.
Taverna 1.4 Workbench
• Integral part of the myGrid project
• Java based, runs on Windows, Mac OS, Linux, Solaris
• Open source and user driven development
• Taverna in OMII-UK
– Dedicated team of developers focused on design,
implementation, testing and support – leading to production
quality software.
– Development of Taverna 2.0
Taverna 1.4 workbench
[Figure: layered architecture of the Taverna Workbench]
• Application data flow layer
– Scufl (Simple Conceptual Unified Flow Language)
– Scufl + Workflow Object Model: Scufl graph + service introspection
• Execution flow layer
– Workflow Execution: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Freefluo Workflow enactor
• Processor invocation layer
– Enactor and Processors: Web Service, Soaplab, BioMOBY, Local App, ?
Nested workflows
• A processor can be a workflow
itself.
• Encourages the reuse of
workflows within a more
complex scenario.
• Greater abstraction of an
overall process making it more
manageable.
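One way to picture "a processor can be a workflow itself" is the small sketch below. It is an illustration under stated assumptions, not Taverna's object model: `Processor`, `Workflow` and the `run` method are hypothetical names, and the workflow is reduced to a simple linear chain of steps.

```python
from typing import Callable, List

class Processor:
    """Hypothetical single step: wraps a function from input to output."""
    def __init__(self, func: Callable[[str], str]) -> None:
        self.func = func

    def run(self, data: str) -> str:
        return self.func(data)

class Workflow(Processor):
    """A linear chain of processors that can itself be used as a processor."""
    def __init__(self, steps: List[Processor]) -> None:
        self.steps = steps

    def run(self, data: str) -> str:
        # Feed the output of each step into the next one.
        for step in self.steps:
            data = step.run(data)
        return data

# Inner workflow: tidy up a sequence string.
clean = Workflow([Processor(str.strip), Processor(str.upper)])
# Outer workflow reuses the inner workflow as one of its steps.
outer = Workflow([clean, Processor(lambda s: s[::-1])])
```

Because `Workflow` exposes the same `run` interface as `Processor`, the outer workflow does not need to know that its first step is itself a workflow.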
Iterations
• Scufl handles iterations implicitly.
• That is, Taverna handles it automagically; there's no need for the user to indicate that an iteration is required.
• Taverna recognises the data mismatch (a list arriving where a single item is expected) and repeatedly runs the task over each data element in the list.
• The iteration strategy for multiple inputs can be configured.
– “Cross product”: all against all
– “Dot product”: first against first, second against second, etc.
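The implicit iteration and the two strategies named above can be sketched as follows. This is a minimal illustration, not Taverna code: the function names `implicit_iteration`, `cross_product`, `dot_product` and `pair` are invented for the example, and truncating to the shorter list in the dot product is an assumption of this sketch.

```python
from itertools import product
from typing import Callable, List, Union

def implicit_iteration(task: Callable[[str], str],
                       value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Run the task once per element if a list arrives where one item is expected."""
    if isinstance(value, list):
        return [task(item) for item in value]
    return task(value)

def cross_product(task: Callable[[str, str], str],
                  xs: List[str], ys: List[str]) -> List[str]:
    """All against all: run the task on every (x, y) combination."""
    return [task(x, y) for x, y in product(xs, ys)]

def dot_product(task: Callable[[str, str], str],
                xs: List[str], ys: List[str]) -> List[str]:
    """First against first, second against second, and so on.

    zip stops at the shorter list; that choice is an assumption of this sketch.
    """
    return [task(x, y) for x, y in zip(xs, ys)]

def pair(x: str, y: str) -> str:
    """Hypothetical two-input task: just joins its inputs."""
    return x + y
```

With inputs `["a", "b"]` and `["1", "2"]`, the cross product invokes the task four times and the dot product twice.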
What about when a service fails?
• Most services are owned by other people
• No control over service failure
• Some are research level
• Workflows are only as good as the services they connect!
• To help, Taverna can:
– Notify failures
– Instigate retries
– Set criticality
– Substitute alternative services
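The four coping mechanisms listed above (notification, retries, criticality, alternates) can be combined in one small sketch. It is not Taverna's fault-handling code: `invoke_with_fallback` and its parameters are hypothetical, and treating "critical" as "failure aborts the run" is an assumption made for this example.

```python
from typing import Callable, List, Optional

def invoke_with_fallback(services: List[Callable[[str], str]],
                         data: str,
                         retries: int = 2,
                         critical: bool = True) -> Optional[str]:
    """Try each service in turn, retrying each one before moving to an alternate."""
    failures: List[str] = []
    for service in services:
        name = getattr(service, "__name__", repr(service))
        for attempt in range(1 + retries):
            try:
                return service(data)
            except Exception as exc:
                # Notify: this sketch only records the failure message.
                failures.append(f"{name} attempt {attempt + 1}: {exc}")
    if critical:
        # Assumed meaning of "critical": the failure stops the workflow.
        raise RuntimeError("all services failed: " + "; ".join(failures))
    return None  # non-critical: the rest of the workflow may carry on

def flaky_primary(data: str) -> str:
    """Hypothetical third-party service that is currently down."""
    raise ConnectionError("service down")

def mirror(data: str) -> str:
    """Hypothetical alternative service offering the same operation."""
    return data.upper()
```

Calling `invoke_with_fallback([flaky_primary, mirror], "acgt")` exhausts the retries on the primary and then succeeds on the alternate.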
Provenance Data?
• Supports scientific method and best practice
• Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and the process by which a resource was generated.
• The Who?, What?, When?, Where? and Why? about resources.
• Stored as RDF triples
• Also available as OWL, opening it up to complex reasoning
Provenance Record
[Figure: a provenance record linking an Input to its Results]
Typed Workflow Run
[Figure: provenance ontology graph. Classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Properties: runs, executed, launchedBy, belongsTo. Example resources: urn:lsid:…:workflow:6, urn:lsid:..:wfInstance:8, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51]
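To make "stored as RDF triples" concrete, the sketch below holds a handful of subject, property, object triples in a plain Python list and answers a simple provenance question. It is an illustration only: the `ex:` identifiers are hypothetical placeholders (not the real LSIDs), the pairing of each property with its subject and object is an assumption, and a real system would use an RDF store rather than a list.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical resources; which resources each property connects is assumed.
triples: List[Triple] = [
    ("ex:wfInstance8", "runs", "ex:workflow6"),
    ("ex:wfInstance8", "launchedBy", "ex:person4"),
    ("ex:wfInstance8", "executed", "ex:processRun84"),
    ("ex:wfInstance8", "executed", "ex:processRun51"),
    ("ex:person4", "belongsTo", "ex:orgHY7"),
]

def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def organisation_of_run(store: List[Triple], run: str) -> Optional[str]:
    """Follow launchedBy then belongsTo to find the organisation behind a run."""
    for _, _, person in match(store, s=run, p="launchedBy"):
        for _, _, org in match(store, s=person, p="belongsTo"):
            return org
    return None
```

Chaining two pattern matches like this is the kind of "who ran what, and for whom" question a provenance record is meant to answer.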
Provenance Browser
New plans for Taverna 2.0
Evolving challenges
• Long running data intensive workflows
• Manipulation of confidential or otherwise protected
information
• Use with classical grid systems
• Publishing and sharing of workflows
• Better use of provenance
Runtime Service Binding
• Service definition consists of an abstract description
• Resolved at workflow runtime to one or more concrete
resources by a broker
• Allows load balancing or economic model based service
selection over grid environments
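The idea of resolving an abstract service description at runtime can be sketched with a tiny broker. This is not the Taverna 2.0 design: the `Broker` class, its `register` and `resolve` methods, and the round-robin policy standing in for load balancing are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Service = Callable[[str], str]

class Broker:
    """Hypothetical broker: maps an abstract service name to concrete services."""
    def __init__(self) -> None:
        self._registry: Dict[str, List[Service]] = {}
        self._next: Dict[str, int] = {}

    def register(self, abstract_name: str, service: Service) -> None:
        self._registry.setdefault(abstract_name, []).append(service)

    def resolve(self, abstract_name: str) -> Service:
        """Pick a concrete service at run time (round-robin as simple load balancing)."""
        candidates = self._registry[abstract_name]
        index = self._next.get(abstract_name, 0)
        self._next[abstract_name] = (index + 1) % len(candidates)
        return candidates[index]

broker = Broker()
broker.register("reverse", lambda s: "site1:" + s[::-1])
broker.register("reverse", lambda s: "site2:" + s[::-1])

# The workflow names only the abstract service; binding happens per invocation.
calls = [broker.resolve("reverse")(seq) for seq in ["ab", "cd", "ef"]]
```

Swapping the round-robin rule for a cost-based choice would give the economic-model selection mentioned above without changing the workflow definition.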
Processor Dispatch Stack
3rd party data transfers
• Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and
data provider
– Allows restricted access to sensitive data
• Automatic de-reference when a reference type is linked to
a value type within a workflow.
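The reference-passing idea above can be illustrated with a small sketch in which data stays with its provider until a value-typed input needs it. Everything here is hypothetical: `DATA_STORE` stands in for a remote data provider, and `Reference`, `invoke`, `forward` and `sequence_length` are names invented for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a remote data provider.
DATA_STORE: Dict[str, str] = {"ref:seq1": "acgtacgt"}

class Reference:
    """Points at data held elsewhere instead of carrying the data itself."""
    def __init__(self, key: str) -> None:
        self.key = key
        self.fetches = 0  # counts how often the data was actually transferred

    def dereference(self) -> str:
        self.fetches += 1
        return DATA_STORE[self.key]

def invoke(service: Callable[[Any], Any], argument: Any, expects_value: bool) -> Any:
    """Dereference only when a reference is linked to a value-typed input."""
    if expects_value and isinstance(argument, Reference):
        argument = argument.dereference()
    return service(argument)

def forward(ref: Reference) -> Reference:
    """Reference-typed service: passes the reference on, the data never moves."""
    return ref

def sequence_length(value: str) -> int:
    """Value-typed service: needs the actual data."""
    return len(value)

ref = Reference("ref:seq1")
same_ref = invoke(forward, ref, expects_value=False)
length = invoke(sequence_length, same_ref, expects_value=True)
```

The data is fetched exactly once, at the point where the value-typed service consumes it, rather than round-tripping through every stage.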
Streaming Data
• Allow execution of downstream workflow stages on
partially complete results from upstream.
[Figure: Service 1, Service 2 and Service 3, shown without and with streaming]
– Non-streaming (Taverna 1): the entire iteration must complete at each stage.
– Streamed data: Service 2 starts operating on partial results from Service 1.
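The difference between the two modes can be shown with Python generators, which are lazy and so behave like a stream. This is a sketch of the concept only, not of Taverna 2.0 internals: `service1`, `service2`, the arithmetic they perform and the shared `log` list are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

log: List[str] = []  # records the order in which the services handle items

def service1(items: Iterable[int]) -> Iterator[int]:
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2

def service2(items: Iterable[int]) -> Iterator[int]:
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1

def run_streamed(items: List[int]) -> List[int]:
    # Lazy chaining: service2 handles each result as soon as service1 yields it.
    return list(service2(service1(items)))

def run_batch(items: List[int]) -> List[int]:
    # Non-streaming: service1 finishes the whole list before service2 starts.
    stage1 = list(service1(items))
    return list(service2(stage1))
```

Both functions return the same results; only the interleaving recorded in `log` differs, which is exactly the point of streaming.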
Conclusions
• Taverna and its source code are free to download.
– http://taverna.sourceforge.net
• Taverna is being adopted by a number of disciplines outside its bio-science origins, including chemoinformatics, social science and astronomy.
• Open architecture and support for plugins to cope with an open world, allowing expansion into other areas
• User driven development
– Taverna users mailing list
– Taverna hackers mailing list
• Production quality software within OMII-UK
Acknowledgements
• The myGrid group, past and present.
• OMII-UK
• All our users
• Carole Goble
• Katy Wolstencroft
• Daniele Turi
• Matthew Gamble
• Tom Oinn
• Paul Fisher