myGrid, Taverna and e-Science: A User's Perspective
By Paul Fisher
fisherp@cs.man.ac.uk
http://www.cs.man.ac.uk/~fisherp/

The Research Challenge
• Aim: use a ‘middle out’ approach with workflows
• Use microarray data from case studies
• Gather genotype and phenotype data separately
• Map genotype to phenotype
Diagram labels: Genotype (gene), Microarray, Phenotype (hair colour)

The Research Challenge
A simple workflow searches a database for sequences homologous to our query sequence. The output file is then compared with a previously collected one to identify new hits in the database.
Diagram legend: data input; web service or processor; data output
All workflows are implemented in the Taverna workbench.

Expectations of e-Research
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgt
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Williams-Beuren syndrome case study

Expectations of e-Research
How did I become aware of e-Science?
- MSc in Bioinformatics: created a new web service to aid with BLAST comparisons
- An e-Science project funded by the EPSRC: a PhD applying Grid and workflow technology to real biological problems
Level of involvement: two tiers
- Bioinformaticians build the workflows in Taverna
- Biologists use the workflows through a (yet to be built) portal
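The comparison step in the simple homology-search workflow, which checks a new output file against a previously collected one for new database hits, can be sketched in Python. This is a minimal sketch, not the Taverna implementation. It assumes both runs were saved in BLAST tabular format (-outfmt 6), where the second column holds the subject sequence ID. The function names hit_ids and new_hits are hypothetical.

```python
from typing import Iterable, Set


def hit_ids(lines: Iterable[str]) -> Set[str]:
    """Collect subject sequence IDs from one BLAST run.

    Assumption: lines are BLAST tabular output (-outfmt 6), tab-separated,
    with the subject ID in the second column. Blank lines and '#' comment
    lines are skipped.
    """
    ids: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line: no subject ID column
        ids.add(fields[1])
    return ids


def new_hits(previous: Iterable[str], current: Iterable[str]) -> Set[str]:
    """Return subject IDs found in the current run but not the previous one."""
    return hit_ids(current) - hit_ids(previous)


if __name__ == "__main__":
    # Hypothetical example data standing in for two saved BLAST output files.
    old_run = ["q1\tseqA\t99.0", "q1\tseqB\t87.2"]
    new_run = ["# BLASTN output", "q1\tseqA\t99.0", "q1\tseqB\t87.2", "q1\tseqC\t91.4"]
    print(sorted(new_hits(old_run, new_run)))  # ['seqC']
```

A set difference keeps the comparison independent of hit order, which can change between runs. A fuller comparison might also filter by E-value or report hits whose scores changed, and this sketch does neither.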
Functional & Non-Functional Requirements
- Concrete use-case scenarios from biologists
- A means of capturing generic work processes and automating them
- Reliable, stable web services
- Appropriate visualisation tools (a user interface capable of displaying complex information)
- Freely available datasets and appropriate access to databases

Usability Requirements
- Taverna is already platform independent (written in Java)
- Running a workflow must take significantly less time and data handling than performing the same steps manually
- Errors are limited by using automated methods
- End users should see only data inputs and outputs, not the underlying workflows
- Interaction: the ability to inspect workflow provenance and to include interaction steps from Taverna in the workflow itself

Lessons Learnt
• No portal implemented yet (just a user interface to Taverna)
- Automates generic, time-consuming, error-prone processes
- Reduces errors in the manual "workflow" system
- A large team of developers is currently needed to provide a sustainable, usable means of implementing workflows
- Limited visualisation at present, not suitable for bench biologists
• Alternatives to a portal system depend on the builder/user architecture
- The user requires an environment for inputting data and exporting results, with no need to see services or underlying mechanisms
- The workflow builder requires a more complex environment to view services, related workflows and metadata, and to fully test the system

Future Plans
• Phased development from developer to super user to user
• Integrate reliable web services into the default component list
• Provide a means of large-scale data upload
• Improve scalability and memory management
• Implement a data cache for repeated experiments, accompanied by an interface (portal)
• Increase the number of workflow components to support more biological analyses

Thank you