NeSC Review 2003
NeSC
2003-09-30
ODD-Genes Background
Science enabled by ODD-Genes
Automating routine statistical conditioning of highly variable microarray results.
Discovering related data sources
Querying discovered data sources for relevant data
Identifying significant targets for focussed investigation
Caveats & further work
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
SunDCG’s TOG software allows for job submission on remote compute resources
OGSA-DAI provides access, control and discovery of data resources
ODD-Genes used to investigate Wilms' tumour
Routine statistical conditioning of microarray results
Data-driven discovery of novel targets for investigation and potential therapy
Collaborative project
NeSC/EPCC, Edinburgh, UK
Scottish Centre for Genomic Technology and Informatics (GTI), Edinburgh, UK
Human Genetics Unit (HGU) at MRC, Western General Hospital, Edinburgh, UK
Choose analysis to perform
Automates analysis process
Provides predetermined workflow
Can run more than one analysis at a time
Multiple reproducible avenues for investigation
Reduces cost (human, machine), increases availability
TOG enables this by allowing access to HPC resources
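The idea of a predetermined workflow that runs several analyses at once can be sketched as below. This is only an illustration: a local thread pool stands in for job submission through TOG to remote HPC resources, and the two analysis functions (`log2_transform`, `z_score`) and the `run_analyses` helper are hypothetical names, not part of ODD-Genes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def log2_transform(values):
    """Hypothetical conditioning step: log2 of each intensity."""
    return [math.log2(v) for v in values]


def z_score(values):
    """Hypothetical conditioning step: centre and scale the values."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Predetermined workflow: a fixed, named set of analyses (assumed for illustration).
ANALYSES = {"log2": log2_transform, "z_score": z_score}


def run_analyses(values, names):
    """Submit every requested analysis at once and collect results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(ANALYSES[name], values) for name in names}
        return {name: future.result() for name, future in futures.items()}


results = run_analyses([1.0, 2.0, 4.0, 8.0], ["log2", "z_score"])
```

Because the workflow is a fixed mapping from names to functions, repeating the call with the same input reproduces the same set of results.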
Results of conditioning can be analysed and investigated
Researcher potentially has several views of the data to explore, all presented in parallel (cf. the traditional serialised, manual process)
Researcher can reproduce this initial condition for repeated analyses
Researcher need not perform each step manually and serially, or ask dedicated statistician to do so.
Multiple views of data
Raw
Heat Map
Cluster Map
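A minimal sketch of how the three views might be derived from one expression matrix. The slides do not say how ODD-Genes builds its heat map or cluster map, so everything here is an assumption: the text colour ramp, the greedy nearest-neighbour row ordering (a stand-in for a real clustering method), and the toy `expression` values.

```python
import math


def heat_map(matrix, ramp=".:+*#"):
    """Map each value to a ramp character, scaled over the whole matrix."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant matrix

    def cell(v):
        return ramp[int((v - lo) / span * (len(ramp) - 1))]

    return ["".join(cell(v) for v in row) for row in matrix]


def cluster_order(matrix):
    """Greedy ordering: each row is followed by its nearest unplaced row."""
    if not matrix:
        return []
    order = [0]
    remaining = list(range(1, len(matrix)))
    while remaining:
        last = matrix[order[-1]]
        nearest = min(remaining, key=lambda i: math.dist(last, matrix[i]))
        remaining.remove(nearest)
        order.append(nearest)
    return order


# Toy data: rows are genes, columns are samples (invented for illustration).
expression = [[0.0, 0.1, 0.9], [1.0, 0.9, 0.0], [0.1, 0.0, 1.0]]

views = {
    "raw": expression,
    "heat_map": heat_map(expression),
    "cluster_map": [expression[i] for i in cluster_order(expression)],
}
```

All three views are computed from the same input, so they can be shown side by side rather than produced one after another.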
Wilms' tumour study takes a new direction: two genes appear significant in early development
Researchers would like more info on these genes…
OGSA-DAI uses keywords to locate relevant data resources
May return data resources previously unknown to researcher
Researcher selects most interesting data resource to query for information about gene
Researcher selects Mouse Atlas: a narrow, deep database of spatial gene expression in mouse embryonic development
Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions
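Keyword-based discovery of registered data resources can be illustrated with the sketch below. It does not use the OGSA-DAI API: the `DataResource` type, the `discover` function, the in-memory `REGISTRY` and the keyword sets (loosely derived from the descriptions above) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    """Hypothetical registry entry: a name plus descriptive keywords."""
    name: str
    keywords: frozenset


def discover(registry, query_terms):
    """Return names of resources sharing a keyword with the query, best match first."""
    terms = {t.lower() for t in query_terms}
    scored = [(len(terms & r.keywords), r) for r in registry]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [r.name for _, r in hits]


# Illustrative registry; the keywords are assumptions, not real metadata.
REGISTRY = [
    DataResource("Mouse Atlas",
                 frozenset({"mouse", "embryo", "spatial", "gene expression"})),
    DataResource("GTI database",
                 frozenset({"microarray", "genome-wide", "gene expression"})),
    DataResource("Unrelated archive", frozenset({"astronomy"})),
]

matches = discover(REGISTRY, ["gene expression", "embryo"])
```

The researcher only supplies keywords, so a resource they had never heard of can still appear in `matches` as long as its host registered suitable keywords.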
OGSA-DAI returns data from query
Data and annotation displayed
Data contains references to related images
Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
These show that the genes are stem cell markers
Targets for focussed investigation, potential therapy
ODD-Genes is a demonstrator
Need to develop production applications for both routine statistical processing and data resource discovery and query
Need to parameterise routine conditioning appropriately to complete automation
ODD-Genes requires Grid infrastructure
Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)
However, alternatives are often proprietary, expensive and less flexible
ODD-Genes requires registration by data-hosts
Discovery depends on a critical mass of registered data sources