ODD-Genes Demo
All Hands Meeting 2003, Nottingham
September 2nd-4th 2003

Introduction
  - ODD-Genes Background
  - Science demonstrated by ODD-Genes
  - Technology behind ODD-Genes

Background (1)
  - ODD-Genes is a demonstrator
    - Demonstrates how Grid technologies enable e-Science
    - Demonstrates how these technologies help achieve real scientific results
  - Grid technologies demonstrated are OGSA-DAI and SunDCG
    - OGSA-DAI provides access, control and discovery of data resources
    - SunDCG’s TOG software allows for job submission on remote compute resources

Background (2)
  - ODD-Genes used to investigate Wilms Tumour
    - Analysis of microarray experiments
    - Investigation of analysis results
  - Collaborative project
    - EPCC, Edinburgh, UK
    - Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
    - Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

ODD-Genes Application
  - ODD-Genes hosted at GTI
  - First stage is to analyse microarray data
  - Wilms Tumour data: 18 microarray chips
  - Each chip holds 22,000 genes

Analysis Selection
  - Have chosen data to analyse
  - Choose analysis to perform
  - Automates analysis process
    - Provides predetermined workflow
  - Can run more than one analysis at a time
    - Multiple reproducible avenues for investigation
  - TOG enables this by allowing access to HPC resources

HPC Resource Selection
  - Choose where to run analyses
  - Time/cost trade-off
    - Remote = fast, expensive
    - Local = slow, cheap
    - Ask scheduler (Grid Engine) to decide

Job Confirmation
  - Confirm details before submission
  - Job submission will schedule analysis job on requested resource

Job Progress
  - Monitor job’s progress
  - Gives researchers access to their results
  - Results are stored in a database at GTI

Analysis Results
  - Results of the analyses can be investigated

Results Investigation
  - OGSA-DAI pulls results from database
  - Multiple views of data
    - Raw
    - Heat Map
    - Cluster Map
  - Wilms Tumour study takes a new direction
    - Two genes appear significant in early development
    - Researchers would like more info on these genes…

Data Resource Discovery
  - Require information on genes
  - Search for related external data resources
  - Researcher defines search using keywords

External Data Resource Query (1)
  - OGSA-DAI uses keywords to locate data resources
  - May return data resources unknown to the researcher
  - Select data resource to query for information about gene

External Data Resource Query (2)
  - OGSA-DAI returns data from query
  - Data contains references to related images
  - These show that the genes are stem cell markers
  - OGSA-DAI makes such discoveries possible

ODD-Genes Application Summary
  - ODD-Genes demonstrates that Grid technologies can aid scientific discovery
  - TOG provides access to remote HPC resource
    - allows GTI to automate their microarray analysis workflow
    - will allow GTI to investigate generalising this workflow
  - OGSA-DAI opens up areas of collaboration
    - allows researchers to discover little-known data resources
    - provides researchers with the means to interact with these resources

ODD-Genes Technical Details
  - We have seen:
    - the science demonstrated by ODD-Genes
    - how Grid technologies have enabled the science
  - Now, take a detailed look at the technologies involved

SunDCG
  - Collaboration between Sun and EPCC
  - “Transfer-queue Over Globus” (TOG) software
    - provides secure job submission and control on remote compute resource
    - integrates Sun’s Grid Engine with Globus 2.2.x
  - Grid Engine is a resource manager
    - normally used to manage compute resources which share a file system
  - TOG allows Grid Engine to access remote resources managed by their own Grid Engine

SunDCG: analysis submission
  - Analysis job submitted to local TOG-enabled Grid Engine
  - TOG uses GridFTP to securely transfer microarray data to EPCC’s remote HPC resource
  - TOG uses GRAM to securely submit analysis to Grid Engine managing EPCC’s HPC resource

SunDCG: analysis running
  - Analysis runs at remote location
  - Progress information read from job’s output stream
  - Output stream securely transferred from remote site using GASS

SunDCG: analysis complete
  - Analysis complete
  - TOG uses GridFTP to securely transfer
    all analysis results from remote HPC resource

OGSA-DAI
  - Grid middleware which builds upon OGSI
  - Defines a set of services and interfaces for access, control and discovery of heterogeneous data resources
  - Higher level services can be built upon OGSA-DAI base services. For example:
    - Distributed query processing services
    - Data federation services

OGSA-DAI: retrieve analysis results
  - Request GDS from GDSF at GTI
  - GDS allows ODD-Genes to interact with GTI’s Oracle database
  - Request data via Perform Document
    - contains SQL query
    - references an XSLT to format results for UI
  - Successive interaction uses same GDS

OGSA-DAI: data source discovery
  - Registry at EPCC contains handles to external data sources
  - ODD-Genes queries the registry with researcher’s keywords
  - Handles to matching GDSFs returned
  - HGU identified as a possible source of information

OGSA-DAI: retrieve external data
  - ODD-Genes automatically queries HGU for required gene
  - Similar to data retrieval at GTI
  - XML Database
    - Perform Document contains XPath
  - No transform is required by ODD-Genes
    - Perform Document doesn’t contain XSLT

Conclusions
  - ODD-Genes has demonstrated how OGSA-DAI and TOG can enable e-Science
  - “This project has demonstrated how Grid technologies can be used to enable true e-Science – discoveries that would not otherwise have been achieved without this infrastructure in place.”
    Professor Peter Ghazal, Director, GTI

Further Information
  - ODD-Genes
    - http://www.epcc.ed.ac.uk/oddgenes/
  - OGSA-DAI
    - http://www.ogsadai.org.uk/
    - http://www.epcc.ed.ac.uk/gridserve/
  - SunDCG and TOG
    - http://www.epcc.ed.ac.uk/sungrid/
    - http://gridengine.sunsource.net/project/gridengine/tog.html
  - EPCC
    - http://www.epcc.ed.ac.uk/
  - Scottish Centre for Genomic Technology and Informatics
    - http://www.gti.ed.ac.uk/
  - MRC Human Genetics Unit
    - http://www.hgu.mrc.ac.uk/
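
Illustration: Perform Document shapes

The two OGSA-DAI retrieval slides describe a Perform Document that carries an SQL query plus an XSLT reference (GTI's relational database) and one that carries only an XPath expression (HGU's XML database). The Python sketch below shows that difference in shape only. The element names, attribute names, table, path and stylesheet URL are all hypothetical placeholders; this is not the OGSA-DAI schema and should not be sent to a real Grid Data Service.

```python
import xml.etree.ElementTree as ET


def build_perform_document(query, query_kind, xslt_url=None):
    """Build an illustrative Perform Document as an XML string.

    All element and attribute names here are hypothetical; they are
    NOT the real OGSA-DAI schema. Only the overall shape (a query,
    optionally followed by a transform) follows the slides.
    """
    if query_kind not in ("sql", "xpath"):
        raise ValueError("query_kind must be 'sql' or 'xpath'")
    root = ET.Element("performDocument")
    statement = ET.SubElement(root, "query", {"kind": query_kind})
    statement.text = query
    if xslt_url is not None:
        # Optional step: point at a stylesheet that formats results for the UI.
        ET.SubElement(root, "transform", {"xslt": xslt_url})
    return ET.tostring(root, encoding="unicode")


# Relational case (GTI): SQL query plus an XSLT reference.
gti_doc = build_perform_document(
    "SELECT gene_id, score FROM analysis_results",  # hypothetical table
    "sql",
    xslt_url="http://example.org/results-to-html.xsl",  # placeholder URL
)

# XML database case (HGU): XPath only, no transform requested.
hgu_doc = build_perform_document(
    "/genes/gene[@symbol='GENE_A']",  # hypothetical path and gene symbol
    "xpath",
)

print(gti_doc)
print(hgu_doc)
```

Keeping the transform optional mirrors the slides: the same request mechanism serves both databases, and only the query language and the presence of the XSLT step differ.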