ComparaGRID
  Andy Law
  eScience Grant-holders’ workshop
  November 2004

Collaborators
  Roslin Institute
  EBI
  Institute for Food Research
  John Innes Centre
  Manchester
  Newcastle University (Maths & Stats)
  Newcastle University (Computing Science)
  SCRI

Genomics and Bioinformatics

Aim
  To develop ‘enabling technologies’ for comparative genomics

Why?
  … an example …

Identification of QTL
  What is the actual gene controlling the trait?
  Comparative genomics
  A tool to help?

Identification of QTL gene
  [Diagram: markers A to E aligned between Species A and Species B; a bracket on Species A is labelled "QTL is in here somewhere".]

Identification of QTL gene
  [Diagram: the same alignment of markers A to E, with Gene 1 to Gene 4 added on Species B; a bracket on Species B is labelled "These are potential candidate genes".]

Pig Fat QTL

Linkage and RH maps
  [Figure labels: Fat; Trait location; Linkage Map; Radiation Hybrid Map]

Human homology
  [Figure labels: Pig; Fat; Trait location; Linkage Map; Radiation Hybrid Map; Cytogenetic Map]

Physical clones
  [Figure labels: Pig; Human; BAC1; BAC2; Fat; BAC3; Trait location; Linkage Map; Radiation Hybrid Map; Cytogenetic Map; Physical Mapping]

Chicken EST homologues
  [Figure labels: Pig; Chicken; Human; BAC1; EST1; BAC2; Fat; EST2; BAC3; Trait location; Linkage Map; Radiation Hybrid Map; Cytogenetic Map; Physical Mapping]

Expression data
  [Figure labels: Pig; Chicken; Human; BAC1; EST1; BAC2; Fat; EST2; BAC3; Trait location; Linkage Map; Radiation Hybrid Map; Cytogenetic Map; Physical Mapping; Expression Analysis]

Supporting literature
  [Figure labels: Pig; Chicken; Human; BAC1; EST1; BAC2; Fat; EST2; BAC3; Trait location; Linkage Map; Radiation Hybrid Map; Linked References; Cytogenetic Map; Physical Mapping; Expression Analysis]

Problem
  We can compute this, computers can’t

Making the links
  Define a vocabulary that describes links
  A ‘is an alias of’ B
  C ‘is contained by’ D
    • Ergo D ‘contains’ C
  E ‘is homologous/orthologous to’ F
  G ‘differs from’ G1
  … etc. …

Storing/presenting the links
  Links are data too…
  Assertions of identity, similarity etc. should be treated in exactly the same way as a gene sequence or a spot on a micro-array

Ideal outcomes
  An ontology of comparative genomics relationships
  An infrastructure for rapidly generating genome-to-genome comparisons
  Applications that utilise these to generate problem-focussed hypotheses

Making the links
  Different name, same thing…
  TGF-B1, TGFB1, Tgfb1, Transforming Growth Factor Beta 1, TGF 1
  TGF-B1, TGF-B4, TGF-B5

Making the links
  Same name, different thing…
  There are at least 6 different markers recorded as ‘GH’ within ARKdb-pig
  Some primer pairs amplify multiple loci, so the same anonymous symbol has been assigned to multiple chromosomal locations

Making the links
  Gene families
  TGF-B1, TGF-B2, TGF-B3, TGF-B4, TGF-B5
  Chicken, human have 3, Xenopus has 2

Making the links
  Fat QTLs
  Abdominal fat pad, shoulder, back, interstitial (marbling)
  Other phenotypes
  Are chicken wings equivalent to arms or limbs in general?
  What about Drosophila wings?
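
Illustrative sketch (not part of the original slides)
  The link vocabulary on the "Making the links" slide, and the point that "links are data too", can be sketched in a few lines of Python. Everything named here is a hypothetical illustration rather than the ComparaGRID design: the Link record, its source field, the INVERSE table, and the assumption that ‘is an alias of’, ‘is homologous to’ and ‘differs from’ are symmetric. The sketch stores each assertion as an ordinary record and derives "D ‘contains’ C" from "C ‘is contained by’ D".

```python
from dataclasses import dataclass

# Hypothetical vocabulary: each relation maps to its inverse.
# Treating these three relations as symmetric is an assumption.
INVERSE = {
    "is an alias of": "is an alias of",
    "is contained by": "contains",
    "contains": "is contained by",
    "is homologous to": "is homologous to",
    "differs from": "differs from",
}


@dataclass(frozen=True)
class Link:
    """One assertion, stored as a record like any other datum."""
    subject: str
    relation: str
    obj: str
    source: str = "unknown"  # hypothetical provenance field

    def inverse(self) -> "Link":
        # Swap subject and object and look up the inverse relation.
        return Link(self.obj, INVERSE[self.relation], self.subject, self.source)


def close_under_inverse(links):
    """Return the given links plus every inverse assertion they imply."""
    closed = set(links)
    closed.update(link.inverse() for link in links)
    return closed


def aliases_of(name, links):
    """Names directly asserted to be aliases of `name`, in either direction."""
    return {
        link.obj
        for link in close_under_inverse(links)
        if link.subject == name and link.relation == "is an alias of"
    }


links = [
    Link("C", "is contained by", "D"),
    Link("TGFB1", "is an alias of", "TGF-B1"),
    Link("Tgfb1", "is an alias of", "TGF-B1"),
]
```

  With these records, close_under_inverse(links) also contains Link("D", "contains", "C"), and aliases_of("TGF-B1", links) collects the alternative symbols. The "same name, different thing" case (for example the several ‘GH’ markers) would need identifiers that are more specific than the bare symbol, which this sketch does not attempt.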