GridQTL: High performance QTL analysis via the Grid
University of Edinburgh
Roslin Institute

Overview
• People
• Science / biology
• Objectives
• e-Science: Why the Grid?
• Management
• Summary

People
Institutions: University of Edinburgh Institute of Evolutionary Biology; Roslin Institute; NeSC
Sara Knott, Dave Berry, Chris Haley, Peter Visscher, Denise Ecklund, Dirk-Jan De Koning

Rationale for QTL analysis
• QTL = quantitative trait locus
• Biology: Understanding genetic variation by dissecting complex traits
  – basic biology
  – applications in agriculture
  – applications in medicine
[Figure: histogram of stature in women; Mean = 169.1, Std. Dev = 6.40, N = 1785]

Dissection of Complex Traits
[Figure: flow diagram. Labels, in order: GridQTL; Genetics; Sib pairs; Chromosome Region; Association Study; Genomics; Physical Mapping/Sequencing; Candidate Gene Selection/Polymorphism Detection; Mutation Characterization/Functional Annotation]

QTL Express: User-friendly web-based software to map QTL in outbred populations
George Seaton, Sara Knott, Chris Haley, Peter Visscher
Roslin Institute; University of Edinburgh
http://QTL.cap.ed.ac.uk/

Example: Obesity in pigs
[Figure: QTL scan. Axes: Position (cM), 0 to 120; Test statistic (F); No. of observations] [Knott et al., 1998]

Present and future paradigms
                 Now            Future
# individuals    100s           1000s
# phenotypes     10s            10000s
# markers        100s           100000s
# analyses       100s           O(10^6 – 10^8)
models           simple         complex
data sources     homogeneous    heterogeneous
Present statistical algorithms and computer platforms will be inadequate for future analysis
~400 individuals, ~9000 phenotypes, ~3000 markers (Nature. 2004 430:743-7.)
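
As a rough check on the scale in the table above, the short Python sketch below counts analyses for the study cited from Nature (~9000 phenotypes, ~3000 markers). It assumes one test per phenotype-marker pair; that counting rule and the function name `pairwise_tests` are illustrative assumptions, not something the slides state.

```python
def pairwise_tests(n_phenotypes: int, n_markers: int) -> int:
    """Number of single-phenotype, single-marker tests (assumed counting rule)."""
    return n_phenotypes * n_markers


# Approximate figures quoted on the slide for the Nature 2004 study.
tests = pairwise_tests(9000, 3000)

# Check that the count falls inside the O(10^6 - 10^8) range given for "Future".
in_future_range = 10**6 <= tests <= 10**8
print(tests, in_future_range)
```

Under this assumption the count is 27 million tests, inside the "Future" range. Models with gene-gene interactions or multiple traits would push the count higher.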

Objectives
To develop & apply a grid-based platform for robust and fast multiple trait mapping of multiple quantitative trait loci in simple and complex pedigrees and disseminate the results

Objectives: Grid implementation
• Transform existing QTL Express to be grid compatible
• Deliver an essential analysis component for integrative biology workflow
• Integrate new analytical approaches and grid components to deploy GridQTL

Objectives: QTL analysis
• Robust multiple trait algorithms
  – Expression QTL
• Methods and algorithms for gene-gene interactions
• Combining gene-phenotype associations within and between families

Objectives: dissemination
• Dissemination of the developed grid applications and QTL mapping algorithms
  – The existing QTL Express user-base
  – (Inter)national postgraduate courses
• Scientific publications
• e-Science meetings
• Websites and internet postings

Why the Grid?
• Grid helps achieve our key goals to
  – Scale-up analysis complexity
    • Analyse more individuals, phenotypes and markers
  – Provide a ‘growing’ public service to the research community
  – Provide a component for integrative biology
    • Make QTL analysis services available in a larger workflow
• Using the grid we can leverage
  – Essential computation and data storage resources
  – Existing middleware to manage these resources
But we need to build on top of the middleware to get what we need to effectively support multi-trait analysis

GridQTL Portal – The Challenges
• Execute QTL analyses on grid computing resources
  – Describe parallel computation requirements
  – Automatic task-level decomposition of analysis requests
  – Schedule, monitor and re-start decomposed tasks
• Provide a secure and private data space for each researcher
  – Synchronise application input and output
  – Enable analysis re-start from intermediate results
• Be a robust public service

GridQTL Portal
[Figure: portal architecture. Components: Analysis Portlet; Data Mgr; Meta Sched; Analysis 1 to Analysis 5; UK e-Science Grid or NGS Resources]
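
The challenges of task-level decomposition and re-starting failed tasks can be illustrated with a minimal Python sketch. It is not the GridQTL implementation. It assumes an analysis request splits into independent (trait, chromosome) tasks, uses a local thread pool as a stand-in for grid resources, and all names (`Task`, `decompose`, `run_with_restart`, `flaky_worker`, the trait names) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Task:
    trait: str
    chromosome: int


def decompose(traits, chromosomes):
    """Split one analysis request into independent (trait, chromosome) tasks."""
    return [Task(t, c) for t, c in product(traits, chromosomes)]


def run_with_restart(tasks, worker, max_attempts=3, max_workers=4):
    """Run tasks in parallel and re-submit failed ones, up to max_attempts rounds."""
    results = {}
    pending = list(tasks)
    for _ in range(max_attempts):
        if not pending:
            break
        # Leaving the 'with' block waits for every submitted task to finish.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {task: pool.submit(worker, task) for task in pending}
        failed = []
        for task, future in futures.items():
            try:
                results[task] = future.result()
            except Exception:
                failed.append(task)  # monitor: remember it for the next round
        pending = failed
    return results, pending


_seen = set()


def flaky_worker(task):
    """Stand-in for a real QTL scan; fails once for each chromosome-2 task."""
    if task.chromosome == 2 and task not in _seen:
        _seen.add(task)
        raise RuntimeError("simulated node failure")
    return f"scan of {task.trait} on chromosome {task.chromosome}"


tasks = decompose(["backfat", "growth"], [1, 2, 3])
results, unfinished = run_with_restart(tasks, flaky_worker)
print(len(tasks), len(results), unfinished)
```

Keeping completed results keyed by task means only failed tasks are re-run, which is the same idea as re-starting an analysis from intermediate results rather than from scratch.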

Management
• Research partners
  – science partner coordination
    • Knott, De Koning, Haley, Visscher
  – science & e-science coordination
    • All PIs
    • Seaton and NeSC software engineers
• User group
  – current QTL Express users
• Scientific Advisory Board
  – science and e-science academics

Summary
GridQTL provides an essential core component of a future integrated system incorporating genetic, phenotypic, transcription and comparative information to allow prediction from sequence to consequence