ODD-Genes Demo All Hands Meeting 2003 Nottingham September 2

ODD-Genes Demo
All Hands Meeting 2003
Nottingham
September 2nd-4th 2003
Introduction
ODD-Genes Background
Science demonstrated by ODD-Genes
Technology behind ODD-Genes
Background (1)
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable eScience
Demonstrates how these technologies help achieve
real scientific results
Grid technologies demonstrated are OGSA-DAI
and SunDCG
OGSA-DAI provides access, control and discovery of
data resources
SunDCG’s TOG software allows for job submission
on remote compute resources
Background (2)
ODD-Genes used to investigate Wilms Tumour
Analysis of microarray experiments
Investigation of analysis results
Collaborative project
EPCC, Edinburgh, UK
Scottish Centre for Genomic Technology and
Informatics, Edinburgh, UK (GTI)
Human Genetics Unit at MRC, Western General
Hospital, Edinburgh, UK (HGU)
ODD-Genes Application
ODD-Genes hosted at
GTI
First stage is to analyse
microarray data
Wilms Tumour data:
18 microarray chips
Each chip holds 22,000
genes
Analysis Selection
Have chosen data to analyse
Choose analysis to perform
Automates analysis process
Provides predetermined
workflow
Can run more than one
analysis at a time
Multiple reproducible avenues
for investigation
TOG enables this by allowing
access to HPC resources
HPC Resource Selection
Choose where to run
analyses
Time/cost trade-off
Remote = fast, expensive
Local = slow, cheap
Ask scheduler (Grid
Engine) to decide
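The time/cost trade-off above can be sketched as a small selection routine. This is an illustration only: the `Resource` class, the run-time and cost figures, and `choose_resource` are hypothetical and do not represent Grid Engine's actual scheduling policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    name: str
    est_hours: float      # hypothetical estimated run time
    cost_per_hour: float  # hypothetical charge for using the resource

    @property
    def total_cost(self) -> float:
        return self.est_hours * self.cost_per_hour


def choose_resource(resources: List[Resource], deadline_hours: float) -> Resource:
    """Pick the cheapest resource that meets the deadline.

    If nothing meets the deadline, fall back to the fastest resource.
    """
    in_time = [r for r in resources if r.est_hours <= deadline_hours]
    if in_time:
        return min(in_time, key=lambda r: r.total_cost)
    return min(resources, key=lambda r: r.est_hours)


# Made-up figures matching the slide: local is slow and cheap,
# remote is fast and expensive.
LOCAL = Resource("local", est_hours=12.0, cost_per_hour=0.0)
REMOTE = Resource("remote", est_hours=1.0, cost_per_hour=50.0)

if __name__ == "__main__":
    print(choose_resource([LOCAL, REMOTE], deadline_hours=24).name)
    print(choose_resource([LOCAL, REMOTE], deadline_hours=2).name)
```

With a relaxed deadline the sketch prefers the cheap local resource; with a tight one it pays for the remote resource.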
Job Confirmation
Confirm details before
submission
Job submission will
schedule analysis job on
requested resource
Job Progress
Monitor job’s progress
Gives researchers
access to their results
Results are stored in a
database at GTI
Analysis Results
Results of the analyses
can be investigated
Results Investigation
OGSA-DAI pulls results from
database
Multiple views of data
Raw
Heat Map
Cluster Map
Wilms Tumour study
takes a new direction
two genes appear
significant in early
development
Researchers would like
more info on these
genes…
Data Resource Discovery
Require information on
genes
Search for related
external data resources
Researcher defines
search using keywords
External Data Resource Query (1)
OGSA-DAI uses
keywords to locate data
resources
May return data
resources unknown to
the researcher
Select data resource to
query for information
about gene
External Data Resource Query (2)
OGSA-DAI returns data
from query
Data contains references
to related images
These show that the
genes are stem cell
markers
OGSA-DAI makes such
discoveries possible
ODD-Genes Application Summary
ODD-Genes demonstrates Grid technologies
can aid scientific discovery
TOG provides access to remote HPC resource
allows GTI to automate their microarray analysis
workflow
will allow GTI to investigate generalising this workflow
OGSA-DAI opens up areas of collaboration
allows researchers to discover little-known data
resources
provides researchers with the means to interact with
these resources
ODD-Genes Technical Details
We have seen:
the science demonstrated by ODD-Genes
how Grid technologies have enabled the science
Now, take a detailed look at the technologies involved
SunDCG
Collaboration between Sun and EPCC
“Transfer-queue Over Globus” (TOG) software
provides secure job submission and control on
remote compute resource
integrates Sun’s Grid Engine with Globus 2.2.x
Grid Engine is a resource manager
normally used to manage compute resources which
share a file system
TOG allows Grid Engine to access remote resources
managed by their own Grid Engine
SunDCG: analysis submission
Analysis job submitted to local
TOG enabled Grid Engine
TOG uses GridFTP to securely
transfer microarray data to
EPCC’s remote HPC resource
TOG uses GRAM to securely
submit analysis to Grid Engine
managing EPCC’s HPC
resource
SunDCG: analysis running
Analysis runs at remote
location
Progress information read from
job’s output stream
Output stream securely
transferred from remote site
using GASS
SunDCG: analysis complete
Analysis complete
TOG uses GridFTP to securely
transfer all analysis results
from remote HPC resource
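The three SunDCG slides above describe a fixed sequence of steps, each handled by a different Globus component. The sketch below models only that ordering. It does not call Globus or Grid Engine; `TOG_STEPS`, `run_job` and `dry_run` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Tuple

# (step, Globus component, what happens), in the order the slides describe.
TOG_STEPS: List[Tuple[str, str, str]] = [
    ("stage-in", "GridFTP", "transfer microarray data to the remote HPC resource"),
    ("submit", "GRAM", "submit the analysis to the remote Grid Engine"),
    ("monitor", "GASS", "stream the job's output back to read progress"),
    ("stage-out", "GridFTP", "transfer all analysis results back"),
]


def run_job(job_name: str, execute: Callable[[str, str, str], bool]) -> List[str]:
    """Run each step in order and stop at the first failure.

    `execute(job_name, step, component)` stands in for the real transfer or
    submission call and returns True on success. Returns the completed steps.
    """
    completed: List[str] = []
    for step, component, _description in TOG_STEPS:
        if not execute(job_name, step, component):
            break
        completed.append(step)
    return completed


def dry_run(job_name: str, step: str, component: str) -> bool:
    """Placeholder executor that only reports what would be done."""
    print(f"{job_name}: {step} via {component}")
    return True


if __name__ == "__main__":
    run_job("wilms-analysis", dry_run)
```

Passing the executor in as a callable keeps the ordering logic separate from whatever mechanism actually performs each step.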
OGSA-DAI
Grid middleware which builds upon OGSI
Defines a set of services and interfaces for
access, control and discovery of heterogeneous
data resources
Higher-level services can be built upon OGSA-DAI base services. For example:
Distributed query processing services
Data federation services
OGSA-DAI: retrieve analysis
results
Request GDS from
GDSF at GTI
GDS allows ODD-Genes to interact with
GTI’s Oracle
database
Request data via
Perform Document
contains SQL query
references an XSLT
to format results for
UI
Successive
interaction uses same
GDS
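To make the idea of a Perform Document concrete, here is a minimal sketch of an XML request that carries an SQL query and a reference to an XSLT stylesheet. The element and attribute names (`performDocument`, `sqlQuery`, `transform`, `href`), the table and column names, and the stylesheet URL are all assumptions made for illustration; this is not the OGSA-DAI schema.

```python
import xml.etree.ElementTree as ET


def build_perform_document(sql: str, xslt_url: str) -> str:
    """Build an illustrative request: one SQL query plus one XSLT reference."""
    root = ET.Element("performDocument")                 # hypothetical element
    query = ET.SubElement(root, "sqlQuery", name="q1")   # hypothetical element
    query.text = sql
    # The transform names the query it consumes and the stylesheet to apply.
    ET.SubElement(root, "transform", input="q1", href=xslt_url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical query and stylesheet location.
doc = build_perform_document(
    "SELECT gene_id, expression FROM analysis_results WHERE job_id = 42",
    "http://example.org/results-to-html.xsl",
)

if __name__ == "__main__":
    print(doc)
```

Because the query and the formatting instruction travel in one document, the client can ask for display-ready results in a single request.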
OGSA-DAI: data source discovery
Registry at EPCC
contains handles to
external data
sources
ODD-Genes
queries the registry
with researcher’s
keywords
Handles to
matching GDSFs
returned
HGU identified as a
possible source of
information
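The keyword lookup against the registry can be sketched as follows. The registry contents, the `gdsf://` handle format and the overlap-count ranking are hypothetical; the real registry and its matching rules are not described in these slides.

```python
from typing import Dict, List, Set

# Hypothetical registry: factory handle -> descriptive keywords.
REGISTRY: Dict[str, Set[str]] = {
    "gdsf://hgu.example/gene-expression": {"gene", "expression", "mouse", "development"},
    "gdsf://example.org/protein-structures": {"protein", "structure", "gene"},
    "gdsf://example.org/weather": {"climate", "rainfall"},
}


def find_factories(keywords: List[str], registry: Dict[str, Set[str]] = REGISTRY) -> List[str]:
    """Return handles whose keywords overlap the search terms, best match first."""
    wanted = {k.lower() for k in keywords}
    scored = [(len(wanted & kws), handle) for handle, kws in registry.items()]
    # Sort by descending overlap, then by handle so the order is stable.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [handle for score, handle in ranked if score > 0]


if __name__ == "__main__":
    for handle in find_factories(["gene", "development"]):
        print(handle)
```

A search of this kind can surface resources the researcher did not already know about, which is the point the slides make about discovery.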
OGSA-DAI: retrieve external data
ODD-Genes
automatically
queries HGU for
required gene
Similar to data
retrieval at GTI
XML Database
Perform
Document
contains XPath
No transform is
required by ODD-Genes
Perform
Document doesn’t
contain XSLT
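An XPath-style query over an XML resource can be sketched as below. The sample document, gene symbols and image paths are invented for illustration and are not HGU data; Python's `xml.etree.ElementTree` also supports only a subset of XPath, so this stands in for a full XML database query.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample data: each gene record references related images.
SAMPLE = """
<genes>
  <gene symbol="GENE_A"><image href="images/gene_a_stage1.png"/></gene>
  <gene symbol="GENE_B"><image href="images/gene_b_stage1.png"/></gene>
</genes>
""".strip()


def image_refs(xml_text: str, symbol: str) -> List[str]:
    """Return the image references stored for one gene symbol."""
    root = ET.fromstring(xml_text)
    # Select <image> children of the <gene> whose symbol attribute matches.
    path = f".//gene[@symbol='{symbol}']/image"
    return [img.get("href") for img in root.findall(path)]


if __name__ == "__main__":
    print(image_refs(SAMPLE, "GENE_A"))
```

The result is used as returned, with no stylesheet step, mirroring the case where the Perform Document carries no XSLT.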
Conclusions
ODD-Genes has demonstrated how OGSA-DAI
and TOG can enable e-Science
“This project has demonstrated how Grid technologies can be used to enable
true e-Science – discoveries that would not otherwise have been achieved
without this infrastructure in place.”
Professor Peter Ghazal, Director, GTI
Further Information
ODD-Genes
http://www.epcc.ed.ac.uk/oddgenes/
OGSA-DAI
http://www.ogsadai.org.uk/
http://www.epcc.ed.ac.uk/gridserve/
SunDCG and TOG
http://www.epcc.ed.ac.uk/sungrid/
http://gridengine.sunsource.net/project/gridengine/tog.html
EPCC
http://www.epcc.ed.ac.uk/
Scottish Centre for Genomic Technology and Informatics
http://www.gti.ed.ac.uk/
MRC Human Genetics Unit
http://www.hgu.mrc.ac.uk/