Questions
• Are we ‘just’ E. coli, except more so?
• Where do new genes come from?
• Do all genes evolve at the same rate?
• Do all tissues & organs evolve at the same rate?
• Where do we fit in the tree of life?
• What specifies the differences between us and rodents, or us and chimps?
• What specifies the elevated complexity of us versus other animals?
• Can we understand sequence variation among humans?
• How can gene function contribute to behaviour?

Where do new genes come from? ‘New Domains’
• 23 of 94 InterPro families: Defense and immunity, e.g. IL-1, interferons, defensins
• 17 of 94 InterPro families: Peripheral nervous system, e.g. leptin, prion, ependymin
• 4 of 94 InterPro families: Bone and cartilage: GLA, LINK, calcitonin, osteopontin
• 3 of 94 InterPro families: Lactation: caseins (α, β, κ), somatotropin
• 2 of 94 InterPro families: Vascular homeostasis: natriuretic peptide, endothelin
• 5 of 94 InterPro families: Dietary homeostasis: glucagon, bombesin, colipase, gastrin, IGF-BP
• 18 of 94 InterPro families: Other plasma factors: uteroglobin, FN2, RNase A, GM-CSF, etc.

Stepping through structure and sequence space: the FGF / IL-1 beta-trefoil story
[Figure: structure and sequence relationships among beta-trefoil proteins (FGFs, interleukin-1s). Extracellular (cell-cell signalling): FGF (vertebrates, invertebrates), IL-1α (vertebrates). Intracellular (actin-binding proteins): fascin (vertebrates, invertebrates, fungi), hisactophilin (Dictyostelium).]
J Mol Biol. 2000 Oct 6;302(5):1041-7.

Gene Genesis
• Positive selection often leads to the erosion of sequence similarity.
• If this erosion is extensive, homology cannot be inferred by database search strategies.
• If, concomitantly, there is positive selection for duplication of these genes, this gives the appearance of a new gene/domain family that lacks antecedents.
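The erosion argument on the Gene Genesis slide can be made concrete with a toy calculation. This is a minimal sketch assuming a textbook equal-input substitution model for proteins (20 equally frequent amino acids, substitutions at a constant rate per lineage); it is not a method from these slides, and the rates, divergence time and the 25% "detectable homology" threshold are hypothetical values chosen only for illustration.

```python
import math

def expected_identity(subs_per_site: float, k: int = 20) -> float:
    """Expected fraction of identical aligned sites between two sequences that
    differ by `subs_per_site` substitutions per site, under a k-state
    equal-input model (the protein analogue of Jukes-Cantor)."""
    return 1.0 / k + (k - 1) / k * math.exp(-k / (k - 1) * subs_per_site)

def identity_after(rate_per_lineage: float, myr: float) -> float:
    """Identity between two orthologues after `myr` million years, with each
    of the two lineages accumulating `rate_per_lineage` substitutions/site/Myr."""
    return expected_identity(2.0 * rate_per_lineage * myr)

# Hypothetical illustration values (assumptions, not taken from the lecture).
DETECTION_THRESHOLD = 0.25            # assumed identity below which searches miss homology
MYR = 450.0                           # assumed divergence time
BASELINE_RATE = 0.0005                # substitutions/site/Myr under purifying selection
ACCELERATED_RATE = 5 * BASELINE_RATE  # sustained positive selection (assumed 5x)

if __name__ == "__main__":
    for label, rate in [("baseline", BASELINE_RATE), ("accelerated", ACCELERATED_RATE)]:
        ident = identity_after(rate, MYR)
        status = "detectable" if ident >= DETECTION_THRESHOLD else "looks like a 'new' family"
        print(f"{label:11s} identity = {ident:.2f} -> {status}")
```

With these made-up numbers the baseline pair stays near 64% identity, while the five-fold accelerated pair falls to about 14%, below the assumed threshold, so a similarity search would no longer link it to its ancestors even though it descends from an old gene.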
Copley, Goodstadt & Ponting, Current Opinion in Genetics & Development (2003) 13:623-628

Conservation and Selection over Time
[Figure: conservation (% identity) of orthologues plotted against time of divergence (0 to 450 Myr) for mouse-rat, human-mouse and human-fugu comparisons, with the % of orthologues found in fugu; data points labelled a to j.]

Do all tissues & organs evolve at the same rate?
[Figure: percentage of sequences (0 to 100%) against KA/KS (0 to 0.40) for cytoplasmic, nuclear and secreted domains.]
Need to investigate expression of tissue-specific genes.

Su et al. Large-scale analysis of the human and mouse transcriptomes. PNAS (2002) 99(7):4465-4470. http://expression.gnf.org

Tissue Specificity of a Gene: TS
• TS: a gene's fractional expression in a tissue, relative to the sum of its expression in all tissues.
• maxTS (the gene's highest TS over all tissues): an indicator of tissue specificity.
• Divide the data into 5 sets:
  (1) maxTS ≤ 0.1
  (2) 0.1 < maxTS ≤ 0.2
  (3) 0.2 < maxTS ≤ 0.3
  (4) 0.3 < maxTS ≤ 0.4
  (5) maxTS > 0.4

Protein secretion accounts for much of the elevation in KA/KS for tissue-specific genes.
[Figure (Eitan Winter): tissues (e.g. thymus, blood, brain, liver, kidney, trachea, testis) ordered by evolutionary rate, from slow (KA/KS = 0.04) to fast (KA/KS = 0.13), and by protein secretion, from low (12.2%) to high (50%); series: all, non-secreted, secreted, non-disease and disease genes.]

Housekeeping genes are under-represented among disease genes.
[Figure (Eitan Winter): tissues (e.g. trachea, blood, brain, liver, testis, kidney) ordered by human disease (%), from low (5.0%) to high (39%); series: all, non-secreted, secreted, non-disease and disease genes.]

Tissue-specific genes' KS
[Figure: median KS value (0 to 0.8) of tissue-specific genes in each tissue; tissues include brain, testis, pituitary, amygdala, thyroid, uterus, adrenal gland, prostate, ovary, thymus, kidney, pancreas, salivary gland, liver, placenta, heart, trachea, lung and spleen.]
Winter et al.
Genome Research (2004) 14:54-61

Tissue/Organ Evolution
• Mammalian tissues & organs are evolving at different rates, according to the genes that are specifically expressed in them.
• Perhaps this is not too surprising, since there are mammalian-specific tissues & organs!
• Tissue-specific genes are ‘mutating’ at different rates, possibly owing to transcription-coupled repair in the germline.
• Mendelian disease acts non-uniformly among genes and tissues.

Human-Mouse Orthologues' Expression Profile Correlations
[Figure (Eitan Winter): distribution of Pearson correlation coefficients (-1 to 1) between human and mouse expression profiles, for orthologue pairs versus random pairs; y-axis: % of pairs (0 to 18).]

Pan troglodytes genome
• 4X coverage
• Average nucleotide divergence from human of just 1.2%

How do the 2 gene complements differ?
• Gene duplications observed in the human genome.
• Lack of N-glycolylneuraminic acid (Neu5Gc) in humans, due to a mutation in CMP-sialic acid hydroxylase (Chou et al. PNAS 95(20):11751-6).
• Mutation in a Siglec (sialic acid receptor) (Angata et al. JBC 276:40282-7).

How do the Great Apes differ from us?
• Rare HIV progression to AIDS
• Resistant to malarial infection
• Menopause rare
• Coronary atherosclerosis rare
• Epithelial cancers rare
• Alzheimer’s disease pathology incomplete

FOXP2
• A point mutation in FOXP2 co-segregates with a disorder in a family in which half of the members have impaired linguistic and grammatical abilities.
• Human FOXP2 contains missense mutations and a pattern of nucleotide polymorphism that strongly suggest this gene has been the target of selection during recent human evolution.
Enard et al. Nature 418, 869-872
Figure 2. Silent and replacement nucleotide substitutions mapped on a phylogeny of primates. Bars represent nucleotide changes; grey bars indicate amino acid changes. P < 0.001.

Loss of Olfactory Receptor Genes Coincides with the Acquisition of Full Trichromatic Vision in Primates. PLoS Biol. 2004 Jan;2(1):E5.
Epub 2004 Jan 20. Gilad et al.
Figure 2. The proportion of olfactory receptor (OR) pseudogenes in 20 species.

Table 1. Biological processes showing the strongest evidence for positive selection. The top panel includes the categories showing the greatest acceleration in the human lineage; the bottom panel includes the categories with the greatest acceleration in the chimp lineage.
Clark et al. Inferring Nonneutral Evolution from Human-Chimp-Mouse Orthologous Gene Trios. Science (2003) 302:1960-1963

Biological process | Number of genes* | PMW (human/Model 2)* | PMW (chimp/Model 2)*
Categories showing the greatest acceleration in the human lineage
Olfaction | 48 | 0 | 0.9184
Sensory perception | 146 (98) | 0 (0.026) | 0.9691 (0.9079)
Cell surface receptor-mediated signal transduction | 505 (464) | 0 (0.0386) | 0.199 (0.0864)
Chemosensory perception | 54 (6) | 0 (0.1157) | 0.9365 (0.7289)
Nuclear transport | 26 | 0.0003 | 0.2001
G protein-mediated signaling | 252 (211) | 0.0003 (0.1205) | 0.2526 (0.0773)
Signal transduction | 1030 (989) | 0.0004 (0.0255) | 0.0276 (0.0092)
Cell adhesion | 132 | 0.0136 | 0.3718
Ion transport | 237 | 0.0247 | 0.8025
Intracellular protein traffic | 278 | 0.0257 | 0.8099
Transport | 391 | 0.0326 | 0.7199
Metabolism of cyclic nucleotides | 20 | 0.0408 | 0.1324
Amino acid metabolism | 78 | 0.0454 | 0.0075
Cation transport | 179 | 0.0458 | 0.8486
Developmental processes | 542 | 0.0493 | 0.2322
Hearing | 21 | 0.0494 | 0.9634
Categories with the greatest acceleration in the chimp lineage
Signal transduction | 1030 (989) | 0.0004 (0.0255) | 0.0276 (0.0092)
Amino acid metabolism | 78 | 0.0454 | 0.0075
Amino acid transport | 23 | 0.1015 | 0.0102
Cell proliferation and differentiation | 82 | 0.3116 | 0.0182
Cell structure | 174 | 0.2633 | 0.0233
Oncogenesis | 201 | 0.3132 | 0.0267
Cell structure and motility | 239 | 0.2208 | 0.0299
Purine metabolism | 35 | 0.9127 | 0.0423
Skeletal development | 44 | 0.2876 | 0.0438
Mesoderm development | 168 | 0.5813 | 0.0439
Other oncogenesis | 39 | 0.2777 | 0.0469
DNA repair | 49 | 0.9363 | 0.0477
* The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.
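Table 1 reports each category's evidence as a PMW value. Assuming, as the subscript suggests, that PMW is a Mann-Whitney U-test P value comparing genes in one category against all remaining genes, the core of such a test can be sketched as below. The per-gene score (larger meaning stronger acceleration), the one-sided direction, the normal approximation without tie or continuity correction, and the toy scores are all assumptions for illustration; the sketch does not reproduce Clark et al.'s Model 2 likelihood analysis.

```python
import math
from typing import Sequence

def mann_whitney_u(category: Sequence[float], others: Sequence[float]) -> float:
    """U statistic for `category`: the number of (category, other) pairs in
    which the category score is larger, counting ties as one half."""
    u = 0.0
    for x in category:
        for y in others:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def one_sided_p(category: Sequence[float], others: Sequence[float]) -> float:
    """P value that `category` scores tend to be LARGER than `others`, using the
    large-sample normal approximation (no tie or continuity correction)."""
    n1, n2 = len(category), len(others)
    u = mann_whitney_u(category, others)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail standard normal probability

# Hypothetical per-gene acceleration scores (toy data, not from Clark et al.).
CATEGORY = [2.1, 3.5, 4.0, 5.2, 6.3]                 # genes in one category
OTHERS = [0.1, 0.4, 0.9, 1.2, 1.5, 2.0, 2.2, 0.7]    # all remaining genes

if __name__ == "__main__":
    print(mann_whitney_u(CATEGORY, OTHERS))  # 39.0
    print(f"one-sided P = {one_sided_p(CATEGORY, OTHERS):.4f}")
```

Only 1 of the 40 category-versus-other pairs goes the "wrong" way, so U = 39 and the one-sided P is small. The explicit double loop keeps the pair-counting definition of U visible; a rank-sum formulation gives the same U more efficiently for genome-scale categories.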
Table 2. Molecular functions showing the strongest evidence for positive selection. The table includes only human-accelerated categories, because the only categories accelerated in the chimp lineage are chaperones (P = 0.0124), cell adhesion molecules (P = 0.0220), and extracellular matrix (P = 0.0333).

Molecular function | Number of genes* | PMW (human/Model 2)* | PMW (chimp/Model 2)*
G protein coupled receptor | 199 (153) | 0 (0.2533) | 0.8689 (0.6776)
G protein modulator | 62 | 0.0008 | 0.3776
Receptor | 448 | 0.0030 | 0.9798
Ion channel | 134 | 0.0043 | 0.8993
Extracellular matrix | 97 (95) | 0.0120 (0.0178) | 0.1482 (0.1593)
Other G protein modulator | 32 | 0.0149 | 0.4441
Extracellular matrix glycoprotein | 44 (42) | 0.0178 (0.0269) | 0.1579 (0.1765)
Voltage-gated ion channel | 62 | 0.0219 | 0.6692
Other hydrolase | 95 | 0.0260 | 0.4823
Oxygenase | 46 | 0.0303 | 0.4792
Protein kinase receptor | 37 | 0.0314 | 0.6911
Transporter | 214 | 0.0338 | 0.1836
Ligand-gated ion channel | 45 | 0.0405 | 0.9503
Microtubule binding motor protein | 22 | 0.0421 | 0.6385
Microtubule family cytoskeletal protein | 54 | 0.0467 | 0.2815
* The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.

• “Smell, Hearing Genes Differ between Chimps and Humans”, Genome News Network, January 9, 2004
• “The 2.5Gb mouse genome sequence reveals about 30,000 genes, with 99% having direct counterparts in humans.” Nature editorial, 5 December 2002.

Questions
• Are we ‘just’ E. coli, except more so? Not at all.
• Where do new genes come from? Old genes!
• Do all genes evolve at the same rate? No.
• Do all tissues & organs evolve at the same rate? No.
• Where do we fit in the tree of life? Primates!
• What specifies the differences between us and rodents, or us and chimps? The jury is out. Duplicates?
• What specifies the elevated complexity of us versus other animals? The jury is out.
• Can we understand sequence variation among humans? Not yet (Lon’s lecture?)
• How can gene function contribute to behaviour? Seen in rodents, but not yet in primates.
Near Future Genome Sequencing Capacity (NHGRI)
Year | 7X genomes (3 Gb) | 1X genomes (3 Gb)
2003 | 2.5 | 18
2004 | 4.9 | 34
2005 | 6.2 | 43
2006 | 8.4 | 59

Sampling the placental mammal phylogeny
[Figure: placental mammal phylogeny (Murphy et al. Science (2001) 294:2348-2351).]

MRC Functional Genetics Unit, Oxford: Leo Goodstadt, Richard Emes, Eitan Winter, Steve Rice, Scott Beatson, Nick Dickens, Caleb Webber, Michael Elkaim, Jose Duarte, Zoe Birtle, Tania Oh
Ensembl (Ewan Birney, Michele Clamp, Abel Ureta-Vidal); Richard Copley (WTCHG, Oxford); Ziheng Yang (UCL); the Human, Mouse and Rat Genome Sequencing Consortia; UCSC

Bibliography
• Human genome papers: Lander et al. Nature (2001) 409:860-921; Venter et al. Science (2001) 291:1304-1351.
• Mouse genome paper: Waterston et al. Nature (2002) 420:520-562.
• Rat genome paper: submitted.
• Comparative genomics & evolutionary rates: Hardison et al. Genome Res. (2003) 13:13-26.
• Adaptive evolution of genomes: Emes et al. Hum Mol Genet. (2003) 12:701-709; Wolfe & Li, Nat Genet. (2003) 33 Suppl:255-265.