Taverna Workbench
Stuart Owen
University of Manchester, UK
stuart.owen@manchester.ac.uk

What is a workflow
• Data workflows
  – A task is invoked once its expected data has been received, and when complete it passes any resulting data downstream.
  – B starts when it receives data from A.
  – C and D run in parallel when they receive data from B.
  – E starts once it has received data from both C and D.
• Control workflows
  – A task is invoked once the tasks it depends on have completed.
  – B starts when A has completed.
  – C and D run in parallel once B has completed.
  – E starts once both C and D have completed.
[Diagram: example workflow with tasks A, B, C, D, E and F]

Advantages of workflows
[Image: a listing of raw nucleotide sequence data]
• High-level abstraction
  – Easier to understand and modify.
  – Easier to describe and discuss with others.
  – Describes what you want to do, not how to do it.
• Automation
• Systematic
• Sharing and re-use
  – Either on its own, or within other workflows!

Workflows within Taverna
• Predominantly based on the flow of data, but control constraints can be added as well.
• Service-oriented workflows. Services may or may not be grid-enabled.
• High-level GUI approach separated from lower-level coding: you don't have to be a coder to build a workflow.
• Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.

Taverna 1.4 Workbench
• Integral part of the myGrid project
• Java-based; runs on Windows, Mac OS, Linux and Solaris
• Open source and user-driven development
• Taverna in OMII-UK
  – Dedicated team of developers focused on design, implementation, testing and support, leading to production-quality software.
  – Development of Taverna 2.0

Taverna 1.4 workbench
[Screenshot of the Taverna 1.4 workbench]

SCUFL (Simple Conceptual Unified Flow Language)
[Architecture diagram, top to bottom:]
• Application data flow layer: the Taverna Workbench, Scufl and the Workflow Object Model, Scufl graph and service introspection
• Execution flow layer: workflow execution by the Freefluo workflow enactor, providing list management, the implicit iteration mechanism, MIME and semantic type decoration, fault management and service alternates
• Processor invocation layer: processors for Web Service, Soaplab, BioMOBY, Local App and Enactor services, plus further processor types (shown as "?")

Nested workflows
• A processor can itself be a workflow.
• Encourages the reuse of workflows within a more complex scenario.
• Greater abstraction of an overall process, making it more manageable.

Iterations
• Scufl handles iteration implicitly.
• That is, Taverna handles it "automagically": there is no need for the user to indicate that an iteration is required.
• Taverna recognises the mismatch when a list arrives at a task that expects a single item, and repeatedly runs the task over each element in the list.
• The iteration strategy for tasks with multiple inputs can be configured:
  – "Cross product": all against all
  – "Dot product": first against first, second against second, and so on

What about when a service fails?
• Most services are owned by other people.
• No control over service failure.
• Some are research-level.
• Workflows are only as good as the services they connect!
• To help, Taverna can:
  – Notify failures
  – Instigate retries
  – Set criticality
  – Substitute alternative services

Provenance Data?
• Supports the scientific method and best practice.
• Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and about the process by which the resource was generated.
• The Who?, What?, When?, Where? and Why? of resources.
• Stored as RDF triples.
• Also available as OWL, opening it up to complex reasoning.

Provenance Record
[Diagram: a provenance record linking a workflow input to several results]

Workflow Provenance Ontology
[Diagram: a typed workflow run, shown as ontology classes and as LSID-identified instances]
• Classes: a WorkflowRun runs a Workflow, is launchedBy an Experimenter and executed ProcessRuns; an Experimenter belongsTo an Organization.
• Instances: urn:lsid:..:wfInstance:8 runs urn:lsid:…:workflow:6, is launchedBy urn:lsid:…:person:4 and executed urn:lsid:…:processRun:84 and urn:lsid:…:processRun:51; urn:lsid:…:person:4 belongsTo urn:lsid:…:org:HY7.

Provenance Browser
[Screenshot of the provenance browser]

New plans for Taverna 2.0

Evolving challenges
• Long-running, data-intensive workflows
• Manipulation of confidential or otherwise protected information
• Use with classical grid systems
• Publishing and sharing of workflows
• Better use of provenance

Runtime Service Binding
• A service definition consists of an abstract description.
• It is resolved at workflow run time to one or more concrete resources by a broker.
• Allows load balancing or economic-model-based service selection over grid environments.

Processor Dispatch Stack
[Diagram of the processor dispatch stack]

Third-party data transfers
• Allows 'in place' referencing of data
  – Large data sets no longer make a round trip between the workflow engine and the data provider.
  – Allows restricted access to sensitive data.
• Automatic de-referencing when a reference type is linked to a value type within a workflow.

Streaming Data
• Allows downstream workflow stages to execute on partially complete results from upstream.
[Diagram: Service 1 → Service 2 → Service 3]
• Non-streaming (Taverna 1): the entire iteration must complete at each stage.
• Streamed data: Service 2 starts operating on partial results from Service 1.

Conclusions
• Taverna and its source code are free to download.
  – http://taverna.sourceforge.net
• Taverna is being adopted by a number of disciplines outside its bioscience origins, including chemoinformatics, social science and astronomy.
• Open architecture and support for plugins to cope with an open world, allowing expansion into other areas.
• User-driven development
  – Taverna users mailing list
  – Taverna hackers mailing list
• Production-quality software within OMII-UK

Acknowledgements
• The myGrid group, past and present
• OMII-UK
• All our users
• Carole Goble
• Katy Wolstencroft
• Daniele Turi
• Matthew Gamble
• Tom Oinn
• Paul Fisher
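The "What is a workflow" slide says that in a data workflow a task fires once all its expected data has arrived, and then passes its results downstream. The sketch below runs the A to E example in that way. It assumes tasks are plain Python callables and uses made-up arithmetic in place of real services. The function `run_dataflow` is hypothetical and not part of Taverna.

```python
from collections import defaultdict


def run_dataflow(tasks, edges, initial):
    """Fire each task once data from every upstream task has arrived.

    tasks:   task name -> callable taking a dict of inputs keyed by upstream name
    edges:   (upstream, downstream) pairs
    initial: task name -> input value for tasks with no upstream
    Returns the outputs and the "waves" of tasks that became ready together.
    """
    upstream, downstream = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        upstream[dst].append(src)
        downstream[src].append(dst)

    received = defaultdict(dict)  # task -> {upstream name: value received so far}
    outputs, waves = {}, []
    ready = [t for t in tasks if not upstream[t]]
    while ready:
        # Tasks in one wave are independent and could run in parallel;
        # this sketch simply runs them one after another.
        waves.append(sorted(ready))
        next_ready = []
        for t in ready:
            inputs = received[t] if upstream[t] else {"in": initial[t]}
            outputs[t] = tasks[t](inputs)
            # Pass the result downstream; a task becomes ready only when
            # data from all of its upstream tasks has been received.
            for d in downstream[t]:
                received[d][t] = outputs[t]
                if len(received[d]) == len(upstream[d]):
                    next_ready.append(d)
        ready = next_ready
    return outputs, waves


# The A..E example from the slide, with made-up arithmetic as the "services".
tasks = {
    "A": lambda i: i["in"] * 2,
    "B": lambda i: i["A"] + 1,
    "C": lambda i: i["B"] * 10,
    "D": lambda i: i["B"] - 1,
    "E": lambda i: i["C"] + i["D"],
}
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
outputs, waves = run_dataflow(tasks, edges, {"A": 3})
print(waves)         # [['A'], ['B'], ['C', 'D'], ['E']]
print(outputs["E"])  # 76
```

Tasks reported in the same wave (here C and D) became ready together and could run in parallel. A control workflow would instead fire a task once its predecessors have finished, whatever data they produced. For this graph both rules give the same waves.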
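The "Iterations" slide describes implicit iteration and two strategies for combining several list inputs. The sketch below imitates that behaviour under simplifying assumptions. `implicit_map`, `iterate` and `search` are hypothetical names. Rejecting unequal lists in the dot product is this sketch's own choice, not a documented Scufl rule.

```python
from itertools import product


def implicit_map(task, value):
    """Run a single-item task over a (possibly nested) list, keeping its shape.

    Mimics the slide: when a list arrives at a task that expects one item,
    the task is run once per element.
    """
    if isinstance(value, list):
        return [implicit_map(task, v) for v in value]
    return task(value)


def iterate(task, strategy, **inputs):
    """Combine several list-valued inputs with a "cross" or "dot" strategy."""
    names = list(inputs)
    lists = [inputs[n] for n in names]
    if strategy == "cross":
        # All against all.
        combos = product(*lists)
    elif strategy == "dot":
        # First against first, second against second, and so on.
        # This sketch rejects unequal lengths; Taverna's own rule may differ.
        if len({len(lst) for lst in lists}) > 1:
            raise ValueError("dot product needs lists of equal length")
        combos = zip(*lists)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [task(**dict(zip(names, combo))) for combo in combos]


def search(seq, db):
    # Hypothetical stand-in for a service such as a sequence search.
    return f"{seq}@{db}"


upper = implicit_map(str.upper, ["acgt", ["ttga", "ccat"]])
cross = iterate(search, "cross", seq=["s1", "s2"], db=["pdb", "uniprot"])
dot = iterate(search, "dot", seq=["s1", "s2"], db=["pdb", "uniprot"])
print(upper)  # ['ACGT', ['TTGA', 'CCAT']]
print(cross)  # ['s1@pdb', 's1@uniprot', 's2@pdb', 's2@uniprot']
print(dot)    # ['s1@pdb', 's2@uniprot']
```

With two lists of length n, the cross product makes n × n calls and the dot product makes n. That difference is why the strategy is worth configuring for expensive services.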
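The failure-handling options on the "What about when a service fails?" slide are notifying failures, retries, criticality and alternative services. They can be combined roughly as below. This is an illustrative sketch only. `invoke`, `ServiceFailure` and the two toy services are hypothetical and do not reproduce Taverna's own configuration or retry policy.

```python
import time


class ServiceFailure(Exception):
    """Raised by a (hypothetical) remote service call that fails."""


def invoke(services, args, retries=2, delay=0.0, critical=True, notify=print):
    """Call the primary service; on failure retry it, then fall back to alternates.

    services: primary service first, then alternates in order of preference
    retries:  extra attempts per service after the first failure
    critical: if False, return None instead of failing the whole workflow
    notify:   callback used to report each failure
    """
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(*args)
            except ServiceFailure as exc:
                notify(f"{service.__name__} attempt {attempt + 1} failed: {exc}")
                if attempt < retries:
                    time.sleep(delay)  # back off before retrying
    if critical:
        raise ServiceFailure("primary, alternates and retries all failed")
    return None


def primary(x):
    raise ServiceFailure("timeout")  # simulates an unavailable service


def alternate(x):
    return x * 2


failures = []
result = invoke([primary, alternate], (21,), retries=1, notify=failures.append)
skipped = invoke([primary], (1,), retries=0, critical=False, notify=failures.append)
print(result)         # 42
print(len(failures))  # 3
```

The first call fails twice on `primary` (one attempt plus one retry) and then succeeds on `alternate`. The second call marks the step as non-critical, so the failure yields `None` instead of aborting.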
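The provenance slides say records are stored as RDF triples. The ontology diagram relates a WorkflowRun to its Workflow, its Experimenter and its ProcessRuns. Assuming those relationships are read correctly from the diagram, a toy in-memory triple store shows how a "Who?" question becomes two lookups. Plain strings replace real RDF IRIs here. A real system would use a proper RDF store and query language rather than the hypothetical `objects` and `who_ran` helpers.

```python
# Identifiers copied from the slide; the "…" and ".." elisions are in the original.
WORKFLOW = "urn:lsid:…:workflow:6"
RUN = "urn:lsid:..:wfInstance:8"
PERSON = "urn:lsid:…:person:4"
ORG = "urn:lsid:…:org:HY7"
PROC_84 = "urn:lsid:…:processRun:84"
PROC_51 = "urn:lsid:…:processRun:51"

# (subject, predicate, object) triples. Plain strings stand in for full RDF
# IRIs; "type" is a simplified stand-in for rdf:type.
triples = {
    (RUN, "type", "WorkflowRun"),
    (RUN, "runs", WORKFLOW),
    (RUN, "launchedBy", PERSON),
    (RUN, "executed", PROC_84),
    (RUN, "executed", PROC_51),
    (PERSON, "type", "Experimenter"),
    (PERSON, "belongsTo", ORG),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def who_ran(run):
    """Answer 'Who?' for a run: each experimenter and their organization."""
    return [(person, objects(person, "belongsTo"))
            for person in objects(run, "launchedBy")]


print(objects(RUN, "executed"))
print(who_ran(RUN))
```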
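The "Runtime Service Binding" slide describes a broker that resolves an abstract service description to concrete resources at run time. That idea can be pictured as below. The `Broker` class, the `load` and `cost` fields, the abstract name `blast` and the endpoint names are all hypothetical. They illustrate the load-balancing and economic-model selection the slide mentions, not Taverna 2.0's actual broker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteService:
    endpoint: str  # hypothetical endpoint name
    load: float    # current load, 0.0 (idle) to 1.0 (saturated)
    cost: float    # hypothetical price per invocation


class Broker:
    """Resolves an abstract service description to a concrete resource at run time."""

    def __init__(self, registry):
        # registry: abstract service name -> list of ConcreteService candidates
        self.registry = registry

    def resolve(self, abstract, policy="load"):
        candidates = self.registry.get(abstract, [])
        if not candidates:
            raise LookupError(f"no concrete service for {abstract!r}")
        if policy == "load":  # load balancing: least busy resource
            return min(candidates, key=lambda s: s.load)
        if policy == "cost":  # simple economic model: cheapest resource
            return min(candidates, key=lambda s: s.cost)
        raise ValueError(f"unknown policy: {policy!r}")


broker = Broker({
    "blast": [
        ConcreteService("site-a", load=0.9, cost=1.0),
        ConcreteService("site-b", load=0.2, cost=3.0),
        ConcreteService("site-c", load=0.5, cost=0.5),
    ],
})
print(broker.resolve("blast").endpoint)                 # site-b
print(broker.resolve("blast", policy="cost").endpoint)  # site-c
```

The workflow only ever names `blast`. Which site actually runs it is decided at the moment of invocation, so the choice can change between runs.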
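The "Third-party data transfers" slide describes passing references through the workflow. A reference is de-referenced only when it reaches a point that needs the value itself. The sketch below assumes a hypothetical in-memory `Provider` in place of a remote data store. It also uses a boolean `port_accepts_refs` flag in place of Taverna's reference and value types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """A pointer to data held by a data provider, not the data itself."""
    location: str


class Provider:
    """Hypothetical stand-in for a remote data store; counts how often data is fetched."""

    def __init__(self, data):
        self.data = data
        self.fetches = 0

    def fetch(self, location):
        self.fetches += 1
        return self.data[location]


def deliver(value, port_accepts_refs, provider):
    """Pass references through untouched; de-reference only at a value-typed port."""
    if isinstance(value, DataRef) and not port_accepts_refs:
        return provider.fetch(value.location)
    return value


provider = Provider({"store://genome/chr1": "ACGT" * 3})
ref = DataRef("store://genome/chr1")

passed_on = deliver(ref, port_accepts_refs=True, provider=provider)  # still a reference
print(provider.fetches)  # 0
value = deliver(ref, port_accepts_refs=False, provider=provider)     # automatic de-reference
print(provider.fetches, value)  # 1 ACGTACGTACGT
```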
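The "Streaming Data" slide contrasts non-streaming (Taverna 1) behaviour with streamed behaviour, and the difference maps naturally onto Python generators. This sketch uses two hypothetical toy services. The event log records when each service touches each item.

```python
def service1(items, log):
    for x in items:
        log.append(f"s1:{x}")
        yield x + 1


def service2(items, log):
    for x in items:
        log.append(f"s2:{x}")
        yield x * 10


def run_stage(stage, items, log):
    """Non-streaming: the whole stage completes before anything is passed on."""
    return list(stage(items, log))


batch_log = []
batch_out = run_stage(service2, run_stage(service1, [1, 2, 3], batch_log), batch_log)

stream_log = []
# Streaming: chained generators hand each item downstream as soon as it is ready.
stream_out = list(service2(service1([1, 2, 3], stream_log), stream_log))

print(batch_log)   # ['s1:1', 's1:2', 's1:3', 's2:2', 's2:3', 's2:4']
print(stream_log)  # ['s1:1', 's2:2', 's1:2', 's2:3', 's1:3', 's2:4']
```

In the streamed run, service 2 handles the first item before service 1 has produced the second, which is the overlap the slide describes. Both runs produce [20, 30, 40].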