myGrid, Taverna and e-Science: A User's Perspective, by Paul Fisher

fisherp@cs.man.ac.uk
http://www.cs.man.ac.uk/~fisherp/
The Research Challenge
• Aim: use a ‘middle-out’ approach with workflows
• Use microarray data from case studies
• Gather genotype and phenotype data separately
• Map genotype to phenotype
[Figure labels: Genotype (gene), Phenotype (hair colour), Microarray]
The Research Challenge
A simple workflow searches a database for sequences homologous to our query sequence. The output file is then compared with one collected previously to find new hits in the database.
[Figure legend: data input; web service or processor; data output]
All workflows are implemented in the Taverna workbench.
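The comparison step described above can be sketched outside Taverna in a few lines of Python. This is only an illustration of the idea, not the workflow itself: it assumes that each search report is a tab-separated hit table with the subject (hit) identifier in the second column, and the function names `parse_hit_ids` and `new_hits` are hypothetical.

```python
def parse_hit_ids(report_text):
    """Return hit identifiers from a tab-separated hit table.

    Assumption: the identifier of the matched database sequence is in
    column 2, and lines starting with '#' are comments.
    """
    ids = []
    for line in report_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def new_hits(previous_ids, current_ids):
    """Return identifiers present in the current run but not the previous one.

    Order of first appearance in the current run is preserved and
    duplicates are reported once.
    """
    seen = set(previous_ids)
    fresh = []
    for hit_id in current_ids:
        if hit_id not in seen:
            seen.add(hit_id)
            fresh.append(hit_id)
    return fresh


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    previous = "q1\thitA\t99.0\nq1\thitB\t95.0\n"
    current = "q1\thitA\t99.0\nq1\thitC\t97.5\nq1\thitB\t95.0\n"
    print(new_hits(parse_hit_ids(previous), parse_hit_ids(current)))
```

Keeping the parsing separate from the comparison means a different report format only needs a different parser; the set difference stays the same.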
Expectations of e-Research
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgt
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Williams-Beuren syndrome case study
Expectations of e-Research
How did I become aware of e-Science?
- MSc Bioinformatics: created a new web service to aid BLAST comparisons
- PhD on an EPSRC-funded e-Science project, solving real biology problems using Grid and workflow technology
Level of involvement
- Two tiers: bioinformaticians building the workflows in Taverna, and biologists using the workflows through a (yet to be built) portal
Functional & Non-Functional Requirements
- Concrete use case scenarios from biologists
- A means of capturing generic work processes and automating them
- Reliable, stable web services
- Appropriate visualisation tools (a user interface capable of displaying complex information)
- Freely available datasets and appropriate access to databases
Usability Requirements
- Taverna is already platform independent (written in Java)
- Running a workflow must take significantly less time and data handling than doing the same work manually
- Automated methods limit errors
- End users should not see the underlying workflows, just data inputs and outputs
- Interaction to inspect workflow provenance, and to include interaction steps from Taverna in the workflow itself
Lessons Learnt
• No portal implemented yet (just a user interface to Taverna)
  - Automates generic, time-consuming, error-prone processes
  - Reduces errors in the manual “workflow” system
  - A large team of developers is currently needed to provide a sustainable, usable means of workflow implementation
  - Limited visualisation at present; not suitable for bench biologists
• Alternatives to a portal system depend on the builder/user architecture
  - The user requires an environment for inputting data and exporting results, with no need to see services or underlying mechanisms
  - The workflow builder requires a more complex environment to view services, related workflows and metadata, and to fully test the system
Future Plans
• Phased development from Developer to Super User to User
• Integrate reliable web services into the default component list
• Provide a means of large-scale data upload
• Improve Taverna's scalability and memory management
• Implement a data cache for repeated experiments, accompanied by an interface (portal)
• Increase the number of workflow components to support more biological analyses
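One way to picture the planned data cache for repeated experiments is a store keyed on the workflow and its inputs, so that an identical run is answered from the cache instead of being re-executed. The sketch below is an assumption-laden illustration in plain Python, not part of Taverna or myGrid: the class `ResultCache`, its methods, and the `run` callback are all hypothetical, and it assumes the inputs can be serialised as JSON.

```python
import hashlib
import json


class ResultCache:
    """In-memory cache of workflow results (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(workflow_name, inputs):
        # Serialise with sorted keys so that the same inputs always
        # produce the same key, whatever order they were supplied in.
        payload = json.dumps(
            {"workflow": workflow_name, "inputs": inputs}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_run(self, workflow_name, inputs, run):
        """Return the cached result, calling run(inputs) only on a miss."""
        k = self.key(workflow_name, inputs)
        if k not in self._store:
            self._store[k] = run(inputs)
        return self._store[k]


if __name__ == "__main__":
    def fake_workflow(inputs):
        # Stand-in for an expensive workflow run.
        print("running workflow for", inputs)
        return {"length": len(inputs["query"])}

    cache = ResultCache()
    cache.get_or_run("homology_search", {"query": "acatttctac"}, fake_workflow)
    # The second, identical request is served from the cache.
    cache.get_or_run("homology_search", {"query": "acatttctac"}, fake_workflow)
```

A real cache would also need to account for changes in the underlying databases and services, which this sketch ignores.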
Thank you