ODD-Genes: Accelerating data-driven scientific discovery NeSC Review 2003



NeSC

2003-09-30

Introduction

ODD-Genes Background

Science enabled by ODD-Genes

Automating routine statistical conditioning of highly variable microarray results.

Discovering related data sources

Querying discovered data sources for relevant data

Identifying significant targets for focussed investigation

Caveats & further work

ODD-Genes Background

ODD-Genes is a demonstrator

Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery

SunDCG’s TOG software allows for job submission on remote compute resources

OGSA-DAI provides access, control and discovery of data resources

ODD-Genes used to investigate Wilms Tumour

Routine statistical conditioning of microarray results

Data-driven discovery of novel targets for investigation and potential therapy

Collaborative project

NeSC/EPCC, Edinburgh, UK

Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)

Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

SunDCG - Enabling Routine Statistical Conditioning

Choose analysis to perform

Automates analysis process

Provides predetermined workflow

Can run more than one analysis at a time

Multiple reproducible avenues for investigation

Reduces cost (human, machine), increases availability

TOG enables this by allowing access to HPC resources
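
The points above (choose an analysis, follow a predetermined workflow, run more than one analysis at a time) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the slides do not say which statistics the conditioning uses, so the log2, median-centring and z-score steps are stand-ins, the input values are made up, and local threads stand in for the remote HPC jobs that TOG would submit. This is not the TOG interface.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical conditioning steps; the slides do not name the actual statistics.
def log2_transform(values):
    return [math.log2(v) for v in values]

def median_centre(values):
    m = statistics.median(values)
    return [v - m for v in values]

def z_score(values):
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Each named analysis is a fixed ("predetermined") sequence of steps,
# so rerunning it on the same input reproduces the same result.
WORKFLOWS = {
    "log2_median_centred": [log2_transform, median_centre],
    "log2_z_score": [log2_transform, z_score],
}

def run_workflow(name, raw):
    data = list(raw)
    for step in WORKFLOWS[name]:
        data = step(data)
    return name, data

def run_analyses(names, raw):
    # Submit every chosen analysis at once rather than one after another.
    # Local threads are a stand-in for jobs on remote compute resources.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_workflow, n, raw) for n in names]
        return dict(f.result() for f in futures)

raw_intensities = [1.0, 2.0, 4.0, 8.0, 16.0]  # made-up values for illustration
results = run_analyses(["log2_median_centred", "log2_z_score"], raw_intensities)
```

Here `results` maps each analysis name to its conditioned values, giving the researcher several views of the same raw data from a single request.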

SunDCG - Conditioning Results

Results of conditioning can be analysed and investigated

Researcher potentially has several views of the data to explore, all presented in parallel (cf. the traditional serialised, manual process)

Researcher can reproduce this initial condition for repeated analyses

Researcher need not perform each step manually and serially, or ask dedicated statistician to do so.

OGSA-DAI - Results Investigation

Multiple views of data

Raw

Heat Map

Cluster Map

Wilms Tumour study takes a new direction: two genes appear significant in early development

Researchers would like more info on these genes…

OGSA-DAI - Data Resource Discovery

OGSA-DAI uses keywords to locate relevant data resources

May return data resources previously unknown to researcher

Researcher selects most interesting data resource to query for information about gene

Researcher selects the Mouse atlas: a narrow, deep database of spatial gene expression in mouse embryonic development

Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions
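
Keyword-based discovery of data resources, as described above, can be sketched in a few lines of Python. This is a toy in-memory registry under stated assumptions, not the OGSA-DAI API: the `DataResource` type, the `discover` function, the keyword sets and the overlap-count ranking are all hypothetical, and the two entries are only loosely based on the databases named in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataResource:
    name: str
    description: str
    keywords: frozenset

# Hypothetical registry entries; the keyword sets are invented for illustration.
REGISTRY = [
    DataResource(
        "Mouse atlas",
        "Narrow, deep: spatial gene expression in mouse embryonic development",
        frozenset({"mouse", "embryo", "development", "spatial", "gene expression"}),
    ),
    DataResource(
        "GTI database",
        "Broad, shallow: genome-wide gene expression across organisms, stages and conditions",
        frozenset({"genome-wide", "gene expression", "multiple organisms"}),
    ),
]

def discover(registry, query_keywords):
    """Return resources sharing at least one keyword, best match first."""
    wanted = {k.lower() for k in query_keywords}
    scored = [(len(wanted & r.keywords), r) for r in registry]
    hits = [(score, r) for score, r in scored if score > 0]
    # Rank by number of shared keywords (an assumed, simple relevance measure).
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]

matches = discover(REGISTRY, ["gene expression", "development", "embryo"])
```

With these made-up keywords, both resources match the query and the researcher would then pick one to query. A search can also surface a resource the researcher did not previously know about, provided its host has registered it.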

OGSA-DAI - Data Resource Query

OGSA-DAI returns data from query

Data and annotation displayed

Data contains references to related images

Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression

These show that the genes are stem cell markers

Targets for focussed investigation, potential therapy

ODD-Genes Caveats & Further Work

ODD-Genes is a demonstrator

Need to develop production applications for both routine statistical processing and data resource discovery and query

Need to parameterise routine conditioning appropriately to complete automation

ODD-Genes requires Grid infrastructure

Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)

However, alternatives are often proprietary, expensive and less flexible

ODD-Genes requires registration by data-hosts

Needs a critical mass of registered data sources
