
Portals and myGrid
Stefan Rennick Egglestone
Mixed Reality Laboratory
University of Nottingham
Introduction to myGrid
• a computer science pilot project working in
the field of bioinformatics
• a consortium of the European
Bioinformatics Institute, IT Innovations, 5
universities and some industrial partners
• ends June 2005; other projects will
develop the infrastructure further
Presentation aims
• Introduce myGrid
• Introduce bioinformatics
• Introduce portal work in myGrid
• Show some screenshots of portlets
Introduction to bioinformatics
• how to store, process and publish large
volumes of biological data
• large databases, access and analysis
services
• composite processes involve multiple
databases and services
• Automation through workflows
Data in bioinformatics
• Commonly genetic sequences
– DNA: GCGCATAGCGATGA
– Protein: MAHPLGPHGVANA
• Meta information
– Species, chromosome
– Interesting features
– Equipment used
– First published paper referring to sequence
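As an illustration of the slide above, a sequence and its meta information could be held together in one record. This is only a minimal sketch: the class name `SequenceRecord` and its field names are hypothetical and do not come from myGrid or from any database schema; the two sequences are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SequenceRecord:
    """One sequence plus the kinds of meta information listed on the slide."""
    sequence: str                       # the letters of the sequence
    kind: str                           # "DNA" or "protein"
    species: Optional[str] = None
    chromosome: Optional[str] = None
    features: List[str] = field(default_factory=list)  # interesting features
    equipment: Optional[str] = None     # equipment used
    first_paper: Optional[str] = None   # first published paper referring to the sequence


# The example sequences from the slide; all meta information is left unset.
dna = SequenceRecord(sequence="GCGCATAGCGATGA", kind="DNA")
protein = SequenceRecord(sequence="MAHPLGPHGVANA", kind="protein")
```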
Data storage
• 3 international databases aim to store all
DNA sequences (EMBL, GenBank, DDBJ)
• Protein sequences in SwissProt
• Journals require submission before
publication
• Smaller databases hold specialist
information
Using bioinformatics data
• Database access services
– Fetch sequence for given ID
– Fetch similar sequences
• Sequence analysis
– Look for interesting regions of sequence
• Sequence prediction
– Predict proteins generated by DNA sequence
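To make the "sequence prediction" item concrete, the sketch below translates a DNA sequence into protein letters using the standard genetic code. It is a deliberately naive illustration: it reads only the first reading frame and stops at the first stop codon, whereas real prediction services do considerably more. The function name `translate` is an assumption made for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in the first reading frame, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")  # "X" = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("ATGGCCCATTAA")` reads the codons ATG, GCC and CAT and then stops at TAA.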
Service interface types
• Web-page
• Command-line tool set
• Programming language library client
• SOAP web-service with WSDL interface
Using services
• Often need to combine services with
different interface types
• Cut-and-paste from web-page to file and
run command-line tool
• Repetitive and time-consuming
• Can be automated using scripts
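The manual cut-and-paste step described above might be scripted roughly as follows. This is a sketch under stated assumptions, not a real myGrid tool: `fetch` stands for whatever call retrieves a sequence for an ID, and `command` for whatever command-line analysis tool accepts a file path as its last argument. So that the example runs anywhere, a dictionary stands in for the database service and the Python interpreter stands in for the analysis tool.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List


def run_tool_on_sequence(sequence: str, command: List[str]) -> str:
    """Save the sequence to a temporary file and run a command-line tool on it."""
    with tempfile.NamedTemporaryFile("w", suffix=".seq", delete=False) as handle:
        handle.write(sequence + "\n")
        path = handle.name
    try:
        result = subprocess.run(command + [path], capture_output=True,
                                text=True, check=True)
        return result.stdout.strip()
    finally:
        os.remove(path)  # always clean up the temporary file


def fetch_and_analyse(seq_id: str, fetch: Callable[[str], str],
                      command: List[str]) -> str:
    """Replace 'copy from web-page, paste to file, run tool' with one call."""
    return run_tool_on_sequence(fetch(seq_id), command)


# Stand-ins (assumptions for illustration only): a dict instead of a database
# access service, and a one-line Python program instead of an analysis tool.
FAKE_DATABASE = {"example-id": "GCGCATAGCGATGA"}
COUNT_LETTERS = [sys.executable, "-c",
                 "import sys; print(len(open(sys.argv[1]).read().strip()))"]
```

Calling `fetch_and_analyse("example-id", FAKE_DATABASE.__getitem__, COUNT_LETTERS)` runs the stand-in tool on the stand-in sequence and returns its output as text.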
Workflows
myGrid workflow technology
• Freefluo workflow enactor
• Taverna – graphical workbench allowing
users to
– Author workflows
– Enact and browse results
• myGrid Information Repository
Authoring a workflow
Enacting a workflow
Browsing results
Including services in workflows
• Service invocation done by processor
• Generic processor for SOAP/WSDL web-services
• Custom processor can wrap custom client
• SOAPlab exposes command-line tools as
web-services
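The idea that each service is wrapped by a processor, and that a workflow connects processors together, can be sketched as below. This is not the Freefluo or Taverna API; the `Step` tuple layout and the `enact` function are assumptions invented for this illustration. A processor is simply any callable, and a step runs as soon as all of its named inputs are available.

```python
from typing import Any, Callable, Dict, List, Tuple

# (output name, processor, names of the values the processor consumes)
Step = Tuple[str, Callable[..., Any], List[str]]


def enact(steps: List[Step], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run every step once its inputs exist; return all named values."""
    values = dict(inputs)
    pending = list(steps)
    while pending:
        ready = [step for step in pending
                 if all(name in values for name in step[2])]
        if not ready:
            stuck = [step[0] for step in pending]
            raise ValueError(f"steps can never run (missing input or cycle): {stuck}")
        for output, processor, input_names in ready:
            values[output] = processor(*(values[name] for name in input_names))
        pending = [step for step in pending if step not in ready]
    return values


# Hypothetical three-step workflow; the steps may be listed in any order.
example_steps: List[Step] = [
    ("report", lambda seq, n: f"{seq}:{n}", ["sequence", "length"]),
    ("length", len, ["sequence"]),
    ("sequence", {"example-id": "GCGC"}.get, ["id"]),
]
```

Here `enact(example_steps, {"id": "example-id"})` first fetches the sequence, then computes its length, then builds the report, regardless of the order in which the steps were declared.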
Portal in myGrid
• Taverna/Freefluo is a production workflow
system, so its interface can't be changed
freely for experiments
• Some interface limitations
– Difficult to start a new workflow using the
results of a previous enactment
– Complex interface, so takes time to master
Text services work
• If enactment of a workflow produces a
SwissProt protein sequence record, the
PubMed ID of the first paper referring to
this protein can be extracted from it
• Add extra workflow stages which look up
related papers
• Might like to re-run these stages as a
separate workflow on any new papers
found
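The extraction step described above might look like the following sketch. It assumes the record is in the SwissProt flat-file text format, where reference cross-reference lines begin with `RX` and carry entries of the form `PubMed=<digits>;`, and it assumes "first paper" means the first reference listed in the record. The function names are hypothetical.

```python
import re
from typing import List, Optional

PUBMED_PATTERN = re.compile(r"PubMed=(\d+)")


def pubmed_ids(record: str) -> List[str]:
    """Return every PubMed ID found on RX lines, in record order."""
    ids: List[str] = []
    for line in record.splitlines():
        if line.startswith("RX"):  # reference cross-reference line
            ids.extend(PUBMED_PATTERN.findall(line))
    return ids


def first_pubmed_id(record: str) -> Optional[str]:
    """Return the PubMed ID of the first listed reference, if there is one."""
    ids = pubmed_ids(record)
    return ids[0] if ids else None
```

The full list returned by `pubmed_ids` could feed the extra workflow stages that look up related papers.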
Input form
Monitoring progress
Results
MIR portal work
• Taverna/Freefluo/MIR interface caters for
expert users
• Large numbers of users who won’t write
workflows but might enact them
• Provide a simpler workflow enactment
interface
• Portal useful – all biologists have browser
on their desk
Collections of workflows
View workflow
View workflow results
View individual output param
Further details
• www.mygrid.org.uk
• Twiki.mygrid.org.uk
• Stefan Rennick Egglestone
(sre@cs.nott.ac.uk)
• Ian Roberts (i.roberts@dcs.sheff.ac.uk)
• Presentation and notes will be at
www.mrl.nott.ac.uk/~sre