GridQTL
A Grid Portal for QTL Mapping of Compute Intensive Datasets

People
University of Edinburgh (Institute of Evolutionary Biology, National eScience Centre)
• Sara Knott, Ian White, George Seaton, Jules Hernandez, Jean-Alain Grunchec, Dave Berry, John Allen
Roslin Institute & R(D)SVS
• Chris Haley, Dirk-Jan de Koning, Burak Karacaören, Wenhua Wei, Suzanne Rowe

What is GridQTL?
• 5-year Research Council funded programme
• AIM: To develop a Grid-based platform for the analysis of multiple traits with high-density markers and complex genetic models
• Develop the Grid platform
• Develop the genetic analysis
• Implement analysis on platform
• And keep it free!

History: QTL Express
• User-friendly web-based software to map QTL in outbred populations
• Fast, robust analysis of structured outbred populations
• Variance component estimation for general pedigrees
• http://qtl.cap.ed.ac.uk
• Java servlet platform

New developments
• 100,000s of marker genotypes per individual (SNPs, AFLPs)
• 10,000s of phenotypes, e.g. through microarray technology
• 1000s of individuals

New Models
• Epistasis
• Combined linkage and linkage disequilibrium mapping (LDLA)
• Expression QTL (eQTL)
• Genome-wide SNP association scans

New requirements
• New analyses are CPU intensive
• Need to store data and intermediate results
• One QTL study, many analyses
• Batch submission and data retrieval
• Solution: distributed computing => Grid

Grid
Distributed Computing
• Loosely coupled
• Heterogeneous
• Single administration
Grid Computing
• Large scale
• Cross-organizational
• Geographical distribution
• Distributed management
Cluster
• Tightly coupled
• Homogeneous
• Cooperative working
Source: Hiro Kishimoto, GGF17 Keynote, May 2006

Development of QTL analyses
• QTL mapping methods for multiple traits
• QTL mapping methods for high-density multiple trait data
• Methods to detect two-QTL epistatic interactions
• Variance components methods to map QTL by linkage disequilibrium in general pedigrees

GridQTL Analysis portfolio
• Experimental crosses
• Epistasis, Multi-trait, eQTL
• General pedigrees
• VC, LDLA, LD
• Sib-pairs, Half-sibs, single large full-sib family
• Opportunistic additions

Development of Grid Platform
• Transform existing QTL Express Java Servlet platform to be Grid compatible
• Implemented using GridSphere (www.gridsphere.org)
• Ultimately, GridQTL as analysis component for integrative biology

GridQTL Portal
[Diagram: GridQTL Portal with Data Mgr and Analysis Portlet, Analysis 1 to 5, Meta Sched, and UK e-Science Grid or NGS resources]

Implementation Challenges
• QTL software developed in many programming languages
• Transform stand-alone scientific applications to Grid applications
• Licensing of commercial components (software and libraries)
• Saturation and robustness of UK Grid

Where are we now?
• QTL Express migrated to GridSphere
• Want to test-drive GridQTL?
• E-mail Sara Knott for login details
• http://www.GridQTL.org.uk
• Contacts:
• Sara Knott, s.knott@ed.ac.uk
• Mail us your suggestions

Acknowledgements
• Peter Visscher & Denise Ecklund
• Scientific Advisory Board:
• Dr. Peter Visscher, QIMR, Brisbane
• Dr. Stephen McGough, Imperial College, London
• Dr. Robert Stevens, University of Manchester
• Dr. Lon Cardon, University of Oxford
• Dr. Mike Goddard, University of Melbourne
• Dr. Orjan Carlborg, Uppsala University
• Dr. Robin Thompson, Rothamsted Research