Professor Carole Goble http://www.mygrid.org.uk
Contact mygrid@cs.man.ac.uk
Personalised extensible environments for data-intensive
in silico experiments in biology
• Biology is a multi-faceted & increasingly multi-disciplinary science.
• Bioinformatics is an “e-Science”.
– Discovery is done in silico on results obtained from experiments using a number of analysis
& data resources.
• Molecular biology & genomics are our particular focus.
• Has anyone studied the effect of neurotransmitters on the circadian rhythms in
Drosophila?
• How do the functions of the clusters of proteins from my experiment interrelate?
What are the proteins with a particular function?
• Is a structure known for this protein and what other proteins have a similar structure?
• Can I build a homology 3D model?
• What is known about the homologous protein?
• Large amounts of data & many applications.
• Highly heterogeneous.
– Different types, algorithms, forms, implementations, communities, service providers
• Highly complex and inter-related.
• Highly volatile.
• Obstacles Everywhere
1. Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila?
2. How do the functions of the clusters of proteins from my experiment interrelate? And what are the proteins with a particular function?
3. Is a structure known for this protein and what other proteins have a similar structure?
4. Can I build a homology 3D model?
5. What is known about the homologous protein?
• Who else has asked this question & can I use/adapt their approach?
– Workflow.
• What were the results at each stage?
– Dynamic Data Repositories.
• When was P12345 last updated? Which BLAST did I use?
– Provenance.
• Has PDB changed since I last ran this?
– Notification.
• Personalisation.
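The provenance and notification questions above ("Which BLAST did I use?", "Has PDB changed since I last ran this?") can be illustrated with a small record that stores what was run and against which resource versions. This is a minimal sketch, not myGrid's actual data model: the class `ProvenanceRecord`, the function `stale_resources`, and all version labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class ProvenanceRecord:
    """Hypothetical record of one experiment step (not a myGrid schema)."""
    step: str                          # what was done
    service: str                       # which tool was used, e.g. "BLAST"
    service_version: str               # answers "Which BLAST did I use?"
    input_ids: Tuple[str, ...]         # identifiers of the inputs, e.g. ("P12345",)
    resource_versions: Dict[str, str]  # database name -> version seen at run time
    ran_at: datetime                   # when the step was run


def stale_resources(record: ProvenanceRecord,
                    current_versions: Dict[str, str]) -> List[str]:
    """Return the resources whose current version differs from the recorded one.

    A resource missing from `current_versions` is treated as unchanged.
    """
    return sorted(
        name
        for name, recorded in record.resource_versions.items()
        if current_versions.get(name, recorded) != recorded
    )


# Illustrative values only; the version labels are made up.
record = ProvenanceRecord(
    step="similarity search",
    service="BLAST",
    service_version="hypothetical-version",
    input_ids=("P12345",),
    resource_versions={"PDB": "release-A", "SWISS-PROT": "release-X"},
    ran_at=datetime(2002, 1, 1, tzinfo=timezone.utc),
)

# A notification service could run this check whenever a database announces
# a new release and alert the scientist if the list is non-empty.
changed = stale_resources(record, {"PDB": "release-B", "SWISS-PROT": "release-X"})
```

Here `changed` lists only the resources whose version moved since the run, which is the information a notification would need to carry.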
• Straightforward discovery, interoperation, fusion, sharing of data, knowledge and workflows.
• Explicit management of workflows.
– information & processes & best practice.
• Improving quality of experiments & data.
– provenance & propagating change.
• Scientific discovery is personal & global.
– personalisation & collaborative working.
• Security, ownership -> valuable assets.
– Users, developers, maintainers.
– Biologists.
– Bioinformaticians, resource providers.
– Tool builders, system administrators.
[Figure: myGrid users. Labels: biologists; bioinformaticians; tool builders; bioinformatics tool builders; service provider; systems administrators; IS specialists; infrequent; problem specific.]
1. e-Scientists
– Environment built on toolkits for service access, personalisation & community.
– Gene function expression analysis (fly & yeast).
– Annotation workbench for the PRINTS pattern database.
2. Developers
– Protocols and service descriptions.
– myGrid-in-a-Box developers' kit of core services.
– Reference implementation services & applications.
– Bio services – already delivered.
[Figure: architecture diagram. Labels: Applications; Client Framework; Semantic Services; Admin; Info. Extraction; Portal; User Agent; Collaboration; Data; Workflow; Ontology; Metadata Services; Personalisation; Coordination Services; Provenance; Directory; Governance; Networked Services.]
[Figure: component diagram. Labels: Portal; Personal Repository; Workflow Repository; Workflow Enactment; Metadata: Ontology; Metadata: Service Directory; Repository Client; Ontology Client; Meta Data: Service Type Directory.]
How do the functions of the clusters of proteins from my experiment interrelate?
[Figure: workflow enactment, with numbered steps 1 to 4. Labels: Workflow Client (1); Workflow Enactment; Service Selection Client; Service Directory; Bioinformatic Services (2); Provenance Data; Repos. Client (3); Personal Repository (4).]
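The labels in the enactment figure (Workflow Enactment, Service Selection, Service Directory, Provenance) suggest an engine that looks up a service for each workflow step and records what it did. The following is a minimal sketch of that idea under stated assumptions, not the myGrid enactment engine: `ServiceDirectory`, `enact`, and the toy string services are hypothetical stand-ins for real bioinformatics services.

```python
from typing import Callable, Dict, List, Tuple

# A "service" here is any function from input data to output data.
Service = Callable[[str], str]


class ServiceDirectory:
    """Hypothetical directory mapping a service type to registered services."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[Tuple[str, Service]]] = {}

    def register(self, service_type: str, name: str, service: Service) -> None:
        self._by_type.setdefault(service_type, []).append((name, service))

    def select(self, service_type: str) -> Tuple[str, Service]:
        """Pick the first registered service of the requested type."""
        candidates = self._by_type.get(service_type, [])
        if not candidates:
            raise LookupError(f"no service registered for type {service_type!r}")
        return candidates[0]


def enact(workflow: List[str], data: str,
          directory: ServiceDirectory) -> Tuple[str, List[dict]]:
    """Run each step in order, feeding each output into the next step.

    Returns the final result and a provenance trail with one entry per step.
    """
    provenance: List[dict] = []
    for service_type in workflow:
        name, service = directory.select(service_type)
        result = service(data)
        provenance.append(
            {"step": service_type, "service": name, "input": data, "output": result}
        )
        data = result
    return data, provenance


# Toy services standing in for real analysis tools.
directory = ServiceDirectory()
directory.register("normalise", "toy-upper", lambda s: s.upper())
directory.register("reverse", "toy-reverse", lambda s: s[::-1])

final, trail = enact(["normalise", "reverse"], "acgt", directory)
```

The workflow itself names only service types, so a different provider can be swapped in through the directory without editing the workflow. The trail keeps the intermediate result of every stage.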
1. Ontologies, Protocols & APIs.
2. Database access from the Grid.
– Reference implementation for UK DBTF.
3. Process enactment on the Grid.
4. Provenance services.
5. Metadata services.
– From Semantic Web: DAML+OIL, RDF(S).
6. Personalisation services.
7. Reference implementation of OGSA.
[Figure: technologies used.]
• Grid Computing: Globus, Sun Grid Engine, Condor, DS (Jini, CORBA). An early adopter for OGSA.
• Agents: ACL, methodology.
• Web Technologies: SOAP, WSDL, UDDI, WSFL; DAML+OIL, OWL, RDF(S).
• Carole Goble
• Norman Paton
• Brian Warboys
• Stephen Pettifer
• Luc Moreau
• Dave De Roure
• Chris Greenhalgh
• Tom Rodden
• John Brooke
• Paul Watson
• Alan Robinson
• Rob Gaizauskas
• Robert Stevens
• Ian Horrocks
• Neil Wipat
• Matthew Addis
• Nick Sharman
• Rich Cawley
• Simon Harper
• Karon Mee
• Simon Miles
• Vijay Dailani
• Xiaojian Liu
• Tom Oinn
• Martin Senger
• Milena Radenkovic
• Kevin Glover
• Angus Roberts
• Chris Wroe
• Mark Greenwood
• Phil Lord
• Neil Davis
• Darren Marvin
• Justin Ferris
• Peter Li
• Nedim Alpdemir
• Luca Toldo
• Robin McEntire
• Anne Westcott
• Tony Storey
• Bernard Horan
• Paul Smart
• Robert Haynes
• myGrid aims to develop infrastructure middleware for an e-Biologist’s workbench.
• The setting is bioinformatics but the results are intended to be generally applicable to e-Science.
• A mix of standard, vanguard and bleeding edge technologies, advanced development and (some) research.
• Academic & commercial partnership.
• The myGrid project is timely & reflects a community desire to “collaborate, or die”.