THE HUMAN GENOME SERIES
MAMMALIAN GENES
I. Conservation and Slow Evolution (today)
II. Functional Innovation and Rapid Change (Feb 10)
[Figure: "Your genome!" divided into SLOW (Feb 3) and FAST (Feb 10)]

Questions
• Are we ‘just’ E. coli, except more so?
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?

Theodosius Dobzhansky (1900-1975)
“Nothing in Biology makes sense except in the light of Evolution”

• Are we ‘just’ E. coli, except more so?

Jacques Monod (1972), 1965 Nobel laureate
“What is true for E. coli is true for the elephant”

“What is true for E. coli is true for the elephant?”
Genes: 5.4k versus ~30k

Mode of Protein Evolution
• De novo creation
• Gene fusion / fission
• Gene duplication
• Rapid sequence change
• Pseudogenisation

Genomes and Timelines
[Figure: divergence timelines for Archaea, Invertebrates, Rodents (75 Mya) and Chimpanzee (5 Mya); scale marks at 3000, 1000, 100, 10 and 1 Mya]
THE ORIGIN AND EVOLUTION OF MODEL ORGANISMS. Hedges, SB. Nature Reviews Genetics 3, 838-849 (2002)

Sequencing; Assembly; DNA Repeats; Gene Prediction; Genome Comparison; Gene Comparison

Gene Number
• Walter Gilbert [1980s]: 100k
• Antequera & Bird [1993]: 70-80k
• John Quackenbush et al. (TIGR) [2000]: 120k
• Ewing & Green [2000]: 30k
• Tetraodon analysis [2001]: 35k
• Human Genome Project (public) [2001]: ~31k
• Human Genome Project (Celera) [2001]: 24-40k
• Mouse Genome Project (public) [2002]: 25k-30k
• Lee Rowen [2003]: 25,947

Complexity & Gene Number?
[Figure: bar charts of gene count by organism; x-axis labels Human, Cress, Fly, Worm, S. pombe and Maize]

“Revealed: the secret of human behaviour.
Environment, not genes, key to our acts”
“We simply do not have enough genes for this idea of biological determinism to be right. The wonderful diversity of the human species is not hard-wired in our genetic code. Our environments are critical.”
J. Craig Venter, February 10, 2001

Complexity?
• Is ‘culture’ proportional to population size?
• Is the complexity of the WWW proportional to its size?
• Combinatorial argument
• Genetic interactions; alternative splicing; non-genic regulation; post-transcriptional & post-translational modifications

Complexity of Protein Sequences
[Figure: architecture numbers in 4 eukaryotic proteomes (Human, Fly, Worm, Yeast), by function; series TM, extra, intra. Data generated using SMART]

Orthologues and Paralogues
[Figure: gene tree; labels Cenancestor, SP1, SP2, DP2, A1, B1, C1, C2]
C1 and C2 are paralogues
A1 and B1 and (C1 and C2) are orthologues

Only 1,195 human genes were found that had single orthologues in worm and fly.
Approx. 95% of human genes do not have obvious orthologues in fly and worm.
Data from Rich Copley and Peer Bork

Extracellular signalling proteins are among the most different between animals
[Figure: comparison of Drosophila, Human and C. elegans; values shown: 220, 119, 12]

Antifreeze protein type III from Antarctic eel pout (Lycodichthys dearborni)
Few sequence-based findings. For example … [359 residues]

Are we polyploid?
[Figure: histogram for Human(x):Fly(1):Worm(1); x-axis: No. of human paralogues (1 to 11+); y-axis: Frequency (0 to 1400). Richard Copley]

Segmental Duplication in the Human Genome
Bailey et al. Science. 2002 297: 1003-7.
Am J Hum Genet. 2003 73: 823-34

Horizontal Gene Transfer?
• The claim: “113 of these genes are widespread among bacteria, but, among eukaryotes, appear to be present only in vertebrates. These genes [may have] entered the vertebrate (or prevertebrate) lineage by horizontal transfer from bacteria.”
Stanhope et al. Nature 2001 Jun 21; 411(6840): 940-4.
“Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates.”

The coral Acropora millepora shares a surprisingly large number of genes with vertebrates.
Curr Biol. 2003 Dec 16; 13(24): 2190-5.
Gene loss is a powerful force in shaping gene repertoire.

“What is true for E. coli is true for the elephant”?

‘New Domains’
• 23 of 94 InterPro families: Defense and Immunity, e.g. IL, interferons, defensins
• 17 of 94 InterPro families: Peripheral nervous system, e.g. Leptin, prion, ependymin
• 4 of 94 InterPro families: Bone and cartilage: GLA, LINK, Calcitonin, osteopontin
• 3 of 94 InterPro families: Lactation: Caseins (α, β, κ), somatotropin
• 2 of 94 InterPro families: Vascular homeostasis: Natriuretic peptide, endothelin
• 5 of 94 InterPro families: Dietary homeostasis: Glucagon, bombesin, colipase, gastrin, IGF-BP
• 18 of 94 InterPro families: Other plasma factors: Uteroglobin, FN2, RNase A, GM-CSF etc.

Pseudogenes
• Two types: processed and non-processed
• 70% processed vs 30% non-processed
• ~20,000
Torrents et al. Genome Res. 2003 13: 2559-67.

SNPs
• Human single nucleotide polymorphisms (SNPs) are the most frequent type of DNA variation in the human population.
• They occur with an average density of about 1 per 1,000 nucleotides of a genotype.
• Non-synonymous coding SNPs (nsSNPs) are the group of SNPs believed to have the highest impact on phenotype.
• Ditto for SNPs in regulatory regions.

Synonymous change: TTA (Leu) → TTG (Leu)
Non-synonymous change: TTA (Leu) → TTT (Phe)

What’s the difference between a mutation and a polymorphism?
Frequency! A frequency of 1% for the polymorphic allele is usually taken as the threshold between mutation and polymorphism.

An example of a polymorphic variant which disrupts a critical disulphide bond. Although this variant (260 Cys→Tyr) in HLA-H protein is strongly associated with hereditary haemochromatosis, its frequency is as high as 6% in Northern Europeans, with up to 14% in Ireland.
from Sunyaev et al. HMG 2001, Vol. 10, No. 6, 591-597

Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?

After the break …
Comparative Genomics: Humans vs Rodents

Human and mouse c-kit mutations show similar phenotypes.
The utility of mouse as a biomedical model for human disease is enhanced when mutations in orthologous genes give similar phenotypes in both organisms. In a visually striking example of this, the same pattern of hypopigmentation is seen in (a) a patient with the piebald trait and (b) a mouse with dominant spotting, both resulting from heterozygous mutations of the c-kit proto-oncogene.

Rodents as models for human disease
• All but a handful of human genes have orthologous counterparts in the mouse and rat genomes.
• In general, disease genes are not under different selective constraints relative to all other genes.
• Rodents are good model organisms for human disease.

Mouse equivalents of human disease variants
Hs normal:  MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal:  MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL

Equivalent disease variants?
– 23 human disease-associated sequence variants whose variant amino acids are normal in the mouse. Including:
• Breast Cancer (BRCA1 and BRCA2)
• Cystic Fibrosis (CFTR)
• Type 2D LGMD (SGCA)
• Becker Muscular Dystrophy (DMD)
– These variants are unlikely to be of value in understanding human disease.

Mouse vs Human
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents?

More organisms … more comparisons … ~1000 more genes identified …
Guigó, R. et al. PNAS (2003) 100, 1140-1145

Sequence conservation
[Figure 25 of the mouse genome paper: sequence conservation between mouse and human genes. Nature 420, 520-562]

Slow Evolution
The human spermidine synthase gene (SRM) and its mouse orthologue (Srm). The fifth exon in the mouse gene (green) is interrupted by an intron in the human orthologue.

Orthologues and Paralogues
[Figure: gene tree; labels Cenancestor, SP1, SP2, DP2, A1, B1, C1, C2]
C1 and C2 are paralogues
A1 and B1 and (C1 and C2) are orthologues

Human and mouse “local synteny”
“Syntenic” regions contain orthologues!

Human and mouse chromosomes: global orthology

How do we link genomes & genes to evolution?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents?

Domain-regions are more conserved
[Figure: percentage of sequences per interval (0% to 30%) against percentage identity (0% to 100%) for full-length proteins, domain-free regions and domain-containing regions]

Mouse-Human Orthologues: % Identity
• sites not in domains: 64.4%
• cSNP sites: 67.1%
• all sites: 70.1%
• sites in domains: 88.9%
• disease sites: 90.3%
Little selection at cSNP sites
Significant selection at functional sites

A model of neutral evolution
• KS: the number of synonymous substitutions per synonymous site
• takes advantage of the redundant genetic code
• 4D sites: GCx (Ala), CCx (Pro), TCx (Ser), ACx (Thr), CGx (Arg), GGx (Gly), CTx (Leu), GTx (Val)
• “how much would a gene have changed if selection had not acted upon it?”
Thomas et al., Nature 424, 788-793

Neutral rates vary
see also Hardison et al. Genome Res. 2003 13: 13-26.

Variation in rates of mutation or rates of repair?
• Transcription-associated mutational strand asymmetry (Phil Green et al. Nature Genetics 33: 514-7)
• Associated with transcription-coupled repair processes (Majewski, Am J Human Genet 73, 688-692)
• Genes transcribed in the germline at high levels are, when mutated, repaired more readily than those not transcribed in the germline.
• Majewski estimates that 71%-91% of genes are transcribed in the germline!

Tissue-specific genes’ Ks
[Figure: median Ks-value (0 to 0.8) of tissue-specific genes, by tissue]
Winter et al. Genome Research 14:54-61, 2004

A model for non-neutral evolution
• KA: the number of non-synonymous (amino acid changing) substitutions per non-synonymous site
• What proportion of possible amino acid-changing substitutions has occurred?

KA/KS (dN/dS, ω): a model of selective pressure
• KA/KS << 1 (towards 0.0, conserving): purifying selection
• KA/KS > 1 (above 1.0, diversifying): positive diversifying selection

Domain-regions under higher purifying selection
[Figure: percentage of sequences per interval (0% to 25%) against KA/KS (0.00 to 0.70) for full-length proteins, domain-free regions and domain-containing regions]

Domain-regions are under higher purifying selection
[Figure: percentage of sequences per interval (0% to 100%) against KA/KS (0.00 to 0.50) for full-length proteins, domain-free regions and domain-containing regions]

Higher purifying pressures in enzymes
Catalytic domains are
• more conserved
• under higher purifying selection
than non-catalytic domains

Selective Pressures vary with cellular compartment
For 521 domain families of known locale, KA/KS values:
• Secreted >> Nuclear > Cytoplasmic

Questions
• Are we ‘just’ E. coli, except more so? NO.
• Where do new genes come from? Next week.
• Do all genes evolve at the same rate? NO.
• Do all tissues & organs evolve at the same rate? NO.
• Where do we fit in the tree of life? Mammals!
• What specifies the differences between us and rodents, or us and chimps? Next week.
• What specifies the elevated complexity of us versus other animals? Unknown.
• Can we understand sequence variation among humans? Hopefully, we will.
• How can gene function contribute to behaviour? Next week.

MRC Functional Genetics Unit, Oxford
Leo Goodstadt, Richard Emes, Eitan Winter, Steve Rice, Scott Beatson, Nick Dickens, Caleb Webber, Michael Elkaim, Jose Duarte
Ensembl (Ewan Birney, Michele Clamp, Abel Ureta-Vidal); Richard Copley (WTCHG, Oxford); Ziheng Yang (UCL); The Human, Mouse and Rat Genome Sequencing Consortia; UCSC

Bibliography
Human Genome Papers:
  Lander et al. Nature (2001) 409, 860-921.
  Venter et al. Science (2001) 291, 1304-1351.
Mouse Genome Paper:
  Waterston et al. Nature (2002) 420, 520-62.
Rat Genome Paper:
  submitted.
Comparative genomics & evolutionary rates:
  Hardison et al. Genome Res. (2003) 13, 13-26.
Adaptive evolution of genomes:
  Emes et al. Hum Mol Genet. (2003) 12, 701-9.
  Wolfe & Li. Nat Genet. (2003) 33 Suppl, 255-65.
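
Worked example: synonymous changes, 4D sites and KA/KS
As a rough illustration of the KS and KA ideas in these slides, the sketch below labels a single-codon change as synonymous or non-synonymous, lists the fourfold-degenerate (4D) codon families, and interprets a KA/KS ratio. It is only a minimal sketch under stated assumptions, not the method of any cited paper: it assumes the standard genetic code, the function names are invented for illustration, and the `tolerance` band around 1.0 is a hypothetical choice (the slides only distinguish "<< 1" from "> 1").

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(before, after):
    """Label a codon change by whether the encoded amino acid changes."""
    if GENETIC_CODE[before] == GENETIC_CODE[after]:
        return "synonymous"
    return "non-synonymous"


def fourfold_degenerate_prefixes():
    """Two-base prefixes whose third position never changes the amino acid."""
    prefixes = ("".join(pair) for pair in product(BASES, repeat=2))
    return sorted(
        p for p in prefixes
        if len({GENETIC_CODE[p + third] for third in BASES}) == 1
    )


def selection_regime(ka, ks, tolerance=0.05):
    """Interpret omega = KA/KS; `tolerance` is a hypothetical band around 1.0."""
    if ks <= 0:
        raise ValueError("KS must be positive to form the ratio")
    omega = ka / ks
    if omega > 1 + tolerance:
        return "positive (diversifying) selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


if __name__ == "__main__":
    print(classify_change("TTA", "TTG"))   # the slide's Leu -> Leu example
    print(classify_change("TTA", "TTT"))   # the slide's Leu -> Phe example
    print(fourfold_degenerate_prefixes())
    print(selection_regime(ka=0.05, ks=0.6))
```

The eight prefixes returned by `fourfold_degenerate_prefixes()` correspond to the 4D codon families listed on the neutral-evolution slide (GCx, CCx, TCx, ACx, CGx, GGx, CTx, GTx). Real KA and KS estimates additionally correct for multiple substitutions at a site, which this sketch does not attempt.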