Bioinformatics for Farm Animal Genomics at Roslin
Andy Law & Alan Archibald

Overview
• Bioinformatics
• Farm Animal Genomics
• Databases
  - ARKdb
  - resSpecies
• Integration

Bioinformatics Activities
• The interface between computer science and biology

[Figure: the Roslin Bioinformatics Group sits between computer scientists, who produce new algorithms and new tools, and biologists, who use tools and scripts. The group integrates distinct tools, provides access to and maintains tools, and writes automation scripts.]

Genomics and Bioinformatics

Roslin Bioinformatics: worldwide reputation
• Construction and population of databases
  - Resource databases: resSpecies, radiation hybrid database
  - Genome databases: ARKdb, TCAGdb
  - Genetic diversity databases

Roslin Bioinformatics: worldwide reputation
• Anubis
  - The first web-delivered graphical user interface
• Webintool
  - A web-page scripting program
  - Several years ahead of its time
  - Written by one person

Roslin Bioinformatics: internal role
• Support the genomics programmes
• Provide other, more generalised assistance (e.g. sequence analysis)
• Provide tools, and advice on their use
• Don’t provide an analysis service
  - Make routine data handling easier

Farm Animal Genomics
• Genome Mapping
• Quantitative Trait Locus (QTL) Identification
• (Sequencing?)
• Causative Gene Identification (Physiology, Biochemistry, Pathways…)

Genome Mapping
• What is a Genome Map?
  - A means of identifying points within the genome, e.g.:
    - Chromosome banding patterns (cytogenetic locations)
    - Genetic linkage maps (DNA sequence)

[Figure: Cytogenetic Map]

[Figure: Genetic Linkage Map, from ARKdb Maps]

ARKdb presence
• Main Roslin node (www.thearkdb.org, roslin.thearkdb.org)
• Mirrors at Iowa and Texas (iowa.thearkdb.org, texas.thearkdb.org)
• New mirror in Australia (oz.thearkdb.org, angis.thearkdb.org)

QTL Identification
• Take two lines that differ for the trait of interest
• Cross them
• Cross the F1 animals
• Analyse the F2 animals

[Figure: Roslin Pig QTL Population, a Large White × Meishan cross]

[Figure: Roslin Chicken QTL Population]

QTL Identification
• Analyse the F2 animals
  - Measure the trait
  - Determine genotypes
  - Analyse to associate the trait with inheritance patterns

[Figure: QTL mapping. Pedigree records, trait records and genotypes feed the analysis programs.]

Further points...
• Input data file formats are:
  - Complex
  - Unforgiving
  - Difficult to update incrementally

Crimap input file

    1
    3
    A197 GGQW Z113
    AF1
    10
    1  0 0 1   1 2   1 1   1 5
    2  0 0 0   1 2   1 2   2 3
    3  0 0 1   1 1   1 1   4 6
    4  0 0 0   3 3   1 1   3 3
    5  2 1 1   1 2   1 2   1 3
    8  4 3 0   1 3   1 1   3 4
    11 8 5 1   1 2   1 2   0 0
    12 8 5 1   1 3   1 1   3 3
    13 8 5 0   2 2   1 1   1 4
    14 8 5 0   1 3   1 2   1 4

QTL-mapping: other problems
• Sharing data
  - The genotyping lab may be different from the lab that recorded the traits
  - Analysis may be performed by a different lab
  - Populations may overlap
• Need… an easily accessible database

[Figure: QTL mapping, with resSpecies holding the pedigree records, trait records and genotypes that feed the analysis programs.]

resSpecies
• Designed to be generic and species-neutral (for all the species I knew would be required at the outset)
• Handles mapping and QTL experiments
• Entirely web-operable

Analysis programs
• Developed / used by the Department of Genetics & Biometry
• Regression-based methods
  - Knott & Haley (QTL Express)
• Monte Carlo methods
  - Gibbs sampling
  - Simulation studies

[Figure: Identification of QTL. Profiles for Shoulder, Back and Loin plotted across Marker 1 to Marker 7, against a threshold line.]

• What is the actual gene controlling the trait?
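The F2 design and the regression-based analysis listed above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not from the slides: the two parental lines are fixed for alternative alleles ('A' and 'B') at a single marker that sits exactly at a purely additive QTL, and the F2 analysis is reduced to a least-squares regression of the trait on the number of 'B' alleles. Regression methods used in practice, such as Haley-Knott interval mapping, instead infer QTL genotypes at positions between flanking markers. The effect size, noise level, sample size and function names are hypothetical.

```python
import random
from collections import Counter


def make_f2(n_animals, rng):
    """Simulate F2 genotypes at one biallelic marker.

    Assumes parental lines fixed for 'A' and 'B', so every F1 parent is A/B
    and each F2 animal receives one random allele from each F1 parent.
    """
    f1_alleles = ("A", "B")
    return ["".join(sorted((rng.choice(f1_alleles), rng.choice(f1_alleles))))
            for _ in range(n_animals)]


def phenotype(genotype, rng, effect_per_b=1.0, noise_sd=1.0):
    """Hypothetical additive QTL: each 'B' allele adds effect_per_b, plus noise."""
    return genotype.count("B") * effect_per_b + rng.gauss(0.0, noise_sd)


def regress(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(2004)
f2 = make_f2(5000, rng)
counts = Counter(f2)  # expect roughly 1 AA : 2 AB : 1 BB
traits = [phenotype(g, rng) for g in f2]

# Associate the trait with inheritance: regress trait on B-allele count
slope, intercept = regress([g.count("B") for g in f2], traits)
print({g: counts[g] for g in ("AA", "AB", "BB")})
print(f"estimated effect per B allele: {slope:.2f}")
```

Repeating such a test at many positions along a chromosome gives a profile that can be compared with a significance threshold.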
Identification of QTL gene
• Positional candidate
  - Note which markers flank the QTL
  - Use those markers to identify the corresponding region of the genetic map
  - Look at the genes known to map to that region to identify potential candidate genes

Identification of QTL gene
• The QTL region will probably cover at least 30 cM
• The chicken genetic map is approximately 3,500 cM
• Vertebrates have 20,000 to 35,000 genes
  - So a 30 cM region contains roughly 170 to 300 genes

Identification of QTL gene
• Farm animals have relatively few genes mapped
• Mouse and human have thousands of ESTs and genes mapped
  - … plus evolving sequence assemblies

Comparative Gene Mapping
[Figure: loci A to E mapped in both Species A and Species B.]

Identification of QTL gene
[Figure: the QTL lies somewhere within an interval of the Species A map. Genes mapped only in Species B (Gene 1 to Gene 4) lie among the homologous loci; those falling in the corresponding interval are potential candidate genes.]

Comparative Gene Mapping
• Requirements
  - Some degree of conservation of genomic order
  - Mapping of a large number of coding regions in a variety of species
  - Good evidence to confirm homology between any pair of loci in two species

Integration
• Can also add in other data types

[Figure: Pig Fat QTL]

[Figure, built up over several slides: the pig fat trait location placed on the linkage map and radiation hybrid map ("Linkage and RH maps"), then linked through the cytogenetic map to human ("Human homology").]
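The gene-count estimate and the positional-candidate step can both be written down directly. This is a minimal sketch: `expected_genes` assumes genes are spread evenly along the genetic map (real gene density varies), and `positional_candidates` assumes, as the comparative mapping requirements state, that gene order is conserved and that homology of the flanking loci between the two species is established. Both function names, and all map positions and gene names, are hypothetical, loosely mirroring the Species A/Species B diagram.

```python
def expected_genes(interval_cm, map_length_cm, genome_genes):
    """Genes expected in an interval if genes were spread evenly along the map."""
    return interval_cm / map_length_cm * genome_genes


# 30 cM of a ~3,500 cM chicken map, for genomes of 20,000 and 35,000 genes
low = expected_genes(30, 3500, 20_000)
high = expected_genes(30, 3500, 35_000)
print(f"about {low:.0f} to {high:.0f} genes")  # about 171 to 300 genes

# Hypothetical species-B positions for loci A-E (homology with species A
# assumed established), plus genes mapped only in species B
ANCHORS = {"A", "B", "C", "D", "E"}
SPECIES_B_MAP = {
    "A": 2.0, "Gene1": 3.5, "B": 5.0, "Gene2": 6.5, "C": 8.0,
    "Gene3": 9.5, "D": 11.0, "Gene4": 12.5, "E": 14.0,
}


def positional_candidates(flanking, map_b, anchors):
    """Genes lying between the species-B positions of the QTL's flanking markers."""
    left, right = flanking
    lo, hi = sorted((map_b[left], map_b[right]))
    inside = [(pos, name) for name, pos in map_b.items()
              if lo < pos < hi and name not in anchors]
    return [name for _, name in sorted(inside)]


# The QTL in species A is assumed to be flanked by loci B and D
print(positional_candidates(("B", "D"), SPECIES_B_MAP, ANCHORS))  # ['Gene2', 'Gene3']
```

Sorting the flanking positions means the result does not depend on marker order, which may be inverted between species.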
[Figure, continued: physical mapping with pig BAC clones BAC1 to BAC3 ("Physical clones"), chicken EST homologues EST1 and EST2 ("Chicken EST homologues"), expression analysis ("Expression data") and linked references ("Supporting literature") are added alongside the pig, chicken and human maps.]

Making the links
• Different name, same thing…
  - TGF-B1, TGFB1, Tgfb1, Transforming Growth Factor Beta 1, TGF β1
  - TGF-B1, TGF-B4, TGF-B5

Making the links
• Same name, different thing…
  - There are at least 6 different markers recorded as ‘GH’ within ARKdb-pig
  - Some primer pairs amplify multiple loci, so the same anonymous symbol has been assigned to multiple chromosomal locations

Making the links
• Gene families
  - TGF-B1, TGF-B2, TGF-B3, TGF-B4, TGF-B5
  - Chicken and human have 3; Xenopus has 2

Making the links
• Fat QTLs
  - Abdominal fat pad, shoulder, back, interstitial (marbling)

Making the links
• Other phenotypes
  - Are chicken wings equivalent to arms, or to limbs in general?
  - What about Drosophila wings?
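Some of the ‘different name, same thing’ cases are purely typographic and can be reduced to a common key; others, such as TGF-B4 and TGF-B5 being listed as the same thing as TGF-B1, cannot be derived from the strings at all and need a curated alias table. A minimal sketch of both layers follows. `symbol_key`, `CURATED_ALIASES` and `canonical` are hypothetical names, and the alias entries simply restate the slide. A real table would also need to record which species each alias applies to, since the same symbols also name distinct members of the TGF-B gene family.

```python
import re
from collections import defaultdict

# Greek letters sometimes written in gene names, mapped to Latin stand-ins
GREEK_TO_LATIN = {"β": "B", "α": "A", "γ": "G"}


def symbol_key(name):
    """Collapse case, punctuation, spacing and Greek letters into one comparison key."""
    for greek, latin in GREEK_TO_LATIN.items():
        name = name.replace(greek, latin)
    return re.sub(r"[^A-Z0-9]", "", name.upper())


# Hypothetical curated table restating the slide; species context is not modelled
CURATED_ALIASES = {
    "TRANSFORMINGGROWTHFACTORBETA1": "TGFB1",
    "TGFB4": "TGFB1",
    "TGFB5": "TGFB1",
}


def canonical(name):
    """Typographic key first, then the curated alias lookup."""
    key = symbol_key(name)
    return CURATED_ALIASES.get(key, key)


names = ["TGF-B1", "TGFB1", "Tgfb1", "Transforming Growth Factor Beta 1",
         "TGF β1", "TGF-B4", "TGF-B5"]
groups = defaultdict(list)
for name in names:
    groups[canonical(name)].append(name)
print(dict(groups))
```

The typographic layer is safe to apply automatically; the curated layer is where the ‘same name, different thing’ problem has to be handled explicitly.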
Making the links
• Ontologies
  - Graphs of controlled vocabularies
  - Not perfect
  - Current debate in MGED is moving towards references to ontologies and collections of ontology-to-ontology mappings

Making the links
• Ontologies provide a means to define hierarchies of attributes and functions
• We need a way to define relationships between instances of physical ‘things’ rather than their functions or attributes

Making the links
• Define a vocabulary that describes links
  - A ‘is an alias of’ B
  - C ‘is contained by’ D
    - Ergo D ‘contains’ C
  - E ‘is homologous/orthologous to’ F
  - G ‘differs from’ G1

Making the links
• More importantly, the vocabulary defines external data references
  - A ‘has a sequence accession of’ AC012345
  - B ‘is defined at’ http://whatever.com

Integration
• Technical issues…
  - Systems were developed stand-alone
    - Fine for ‘point-and-click’
    - Less good for automated/bulk analysis

Integration
• Re-engineer systems
• Define Application Programming Interfaces (APIs)
  - Define structured data interchange formats
  - Use APIs to integrate data from different systems

[Figure, built up over several slides: a user wanting novel analyses must reach resSpecies, ARKdb, the Radiation Hybrid Database and the Diversity Databases each through its own interface. A single Application Programming Interface over all of them then supports those novel analyses, and is extended to array expression data and sequence & homology data.]
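The link vocabulary described under ‘Making the links’ maps naturally onto subject, relation, object triples. The sketch below stores such triples and derives the implied reverse links the slide points out (C ‘is contained by’ D, ergo D ‘contains’ C). Which relations are treated as symmetric or as having an inverse is an assumption of this example, as are the `LinkStore` class and its method names; external references such as sequence accessions and URLs are stored as ordinary one-way links.

```python
from collections import defaultdict

# Assumed relation semantics for this sketch (not a published vocabulary)
INVERSE = {"is contained by": "contains", "contains": "is contained by"}
SYMMETRIC = {"is an alias of", "is homologous/orthologous to", "differs from"}


class LinkStore:
    """Store typed links between things, adding implied reverse links."""

    def __init__(self):
        self._links = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, subject, relation, obj):
        self._links[subject].add((relation, obj))
        if relation in INVERSE:
            self._links[obj].add((INVERSE[relation], subject))
        elif relation in SYMMETRIC:
            self._links[obj].add((relation, subject))
        # Anything else (e.g. external references) is stored one way only

    def links(self, subject):
        """All links recorded for a subject, sorted for stable output."""
        return sorted(self._links.get(subject, set()))


store = LinkStore()
store.add("A", "is an alias of", "B")
store.add("C", "is contained by", "D")
store.add("E", "is homologous/orthologous to", "F")
store.add("G", "differs from", "G1")
store.add("A", "has a sequence accession of", "AC012345")
store.add("B", "is defined at", "http://whatever.com")

print(store.links("D"))  # [('contains', 'C')]
print(store.links("B"))
```

Keeping the inference in `add` means every query sees both directions of a link without the caller having to know which way it was recorded.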
[Figure: the same Application Programming Interface over resSpecies, ARKdb, the Radiation Hybrid Database, the Diversity Databases, array expression data and sequence & homology data, annotated "The GRID!" with a question mark.]

Farm Animal Genomics
• The ultimate goal is to identify causative genes
• Comparative genomics and data integration will play a large part
• Complexity, not volume
  - Need to focus on infrastructure