From Reductionism Comes New Science: Protein Structure Data Reveals How Environmental Pressures Shape Evolution
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
PHAR 201 Lecture 08, 2012

Introduction
• Previously we reviewed one system of reductionism – SCOP
• SCOP is used to assign superfamilies and families to complete proteomes in another resource called SUPERFAMILY
• Today we will see how this is used to do new science (Dupont et al. PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107)
• We cast this new science in the context of the Gaia hypothesis

The SCOP Hierarchy v1.75
Based on 38221 structures
[Figure: the SCOP hierarchy with the number of entries at each level: 7 classes, 1195 folds, 1962 superfamilies, 3902 families, 110800 domains]

The Gaia Hypothesis
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/), "land" or "earth", from the Greek Γαῖα, is a Greek goddess personifying the Earth.
Gaia: a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet. (James Lovelock)

We Show Some Support for the Gaia Hypothesis
Emergent properties of an organism have been influenced by the environment
These organisms in turn have influenced the environment

Nature’s Reductionism
There are ~20^300 possible proteins, far more than all the atoms in the Universe
11.2M protein sequences from 10,854 species (source: RefSeq)
38,221 protein structures yield 1195 domain folds (SCOP 1.75)

What Does Nature’s Reductionism Tell Us?
• The advent of a new fold is a big deal
• From new folds come new function(s)
• Are these new folds enough to distinguish “species”?
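
The size comparison on the "Nature's Reductionism" slide can be checked with a few lines of arithmetic. This is only a sketch: the chain length of 300 and the alphabet of 20 come from the slide's 20^300, while the figure of roughly 10^80 atoms in the observable Universe is an outside order-of-magnitude estimate assumed here, not a number from the lecture. The names `log10_sequence_space` and `LOG10_ATOMS_IN_UNIVERSE` are illustrative.

```python
import math

# From the slide: 20 amino acids, chain of 300 residues (20^300).
AMINO_ACIDS = 20
CHAIN_LENGTH = 300

# Assumption (not from the lecture): a commonly quoted order-of-magnitude
# estimate of about 10^80 atoms in the observable Universe.
LOG10_ATOMS_IN_UNIVERSE = 80


def log10_sequence_space(chain_length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of possible sequences of a given length."""
    return chain_length * math.log10(alphabet)


log10_sequences = log10_sequence_space(CHAIN_LENGTH)
excess = log10_sequences - LOG10_ATOMS_IN_UNIVERSE

print(f"20^{CHAIN_LENGTH} is about 10^{log10_sequences:.1f}")
print(f"That exceeds the assumed atom count by a factor of about 10^{excess:.1f}")
```

Working in log10 keeps the numbers readable; Python integers could also hold 20**300 exactly, but the exponent is what the comparison needs.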

To Answer this Question We Only Need to Make Use of Existing Resources
• SCOP – Further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced proteomes

Method – Distance Determination
SCOP + SUPERFAMILY → presence/absence data matrix (superfamilies (FSF) by organisms) → distance matrix

Presence/absence data matrix (excerpt):
FSF       C. intestinalis   C. briggsae   F. rubripes
a.1.1     1                 1             1
a.1.2     1                 1             1
a.10.1    0                 0             1
a.100.1   1                 1             1
a.101.1   0                 0             0
a.102.1   0                 1             1
a.102.2   1                 1             1

Distance matrix:
                  C. intestinalis   C. briggsae   F. rubripes
C. intestinalis   0                 101           109
C. briggsae                         0             144
F. rubripes                                       0

The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies within a given proteome
Yang, Doolittle and Bourne 2005 PNAS 102(2): 373-378

Moreover…
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés & Caetano-Anollés (2003) Genome Research 13:1563
[Figure: Venn diagram of the distribution of SCOP folds (765 total) among the three kingdoms, as taken from SUPERFAMILY: Eukaryota (650), Archaea (416), Bacteria (564). Region labels as shown: 135, 153/14, 10, 21/2, 118, 310/0, 645/49, 387, 9/1, 12, 29/0, 17, 42, 68/0. Paired values are "any genome / all genomes".]
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
• First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• At present found exclusively in Archaea.
Reference: InterPro IPR004804

Let us Take This a Step Further: Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction – disulfides are more prevalent in folds (organisms) that evolved later
• This would seem to hold true
[Figure: Venn diagram of the share of SCOP folds (708 total) containing disulfide bonds across Eukaryota, Archaea and Bacteria. Region labels as shown: 31.9% (43/135), 0% (0/10), 0% (0/2), 4.7% (18/387), 14.4% (17/118), 5.9% (1/17), 16.7% (7/42).]
• Can we take this further?

Recap So Far
• Structure is a useful tool to study evolution since it is conserved over longer periods of geological time
• A coarse-grained characterization of structure, namely superfamily, distinguishes between species
• There is a tantalizing suggestion that proteomes may contain imprints of their ancient environment

Consider Changes in Metal Ion Concentrations
Chris Dupont, Scripps Institution of Oceanography (now JCVI)
Bioinformatics Final Exam 2004
Dupont, Yang, Palenik, Bourne.
PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107

Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution

Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
[Figure: panels for oxygen, zinc, iron, cobalt and manganese. x-axis: billions of years before present (4.5 to 0). y-axis: concentration (O2 in arbitrary units; Zn and Fe in moles L-1). Symbols for Bacteria, Archaea and Eukarya are drawn along the top.]
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated; therefore both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al. 2003 Inorganica Chimica Acta 356: 308-318

Making the Metallome of Each Species – Can Only be Done from Structure
1. Start with SCOP
2. Each {super}family level assignment was checked manually for metal binding
3. All the structures representing the family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. The SUPERFAMILY database was used to map to proteomes
6. 23 Archaea, 233 Bacteria, 57 Eukaryota
7. Cu, Ni, Mo ignored (<0.3% of proteome)

Levels of Ambiguity
• An ambiguous superfamily binds different metals or has members that are not known to bind metals
• Ditto families
• Approx. 50% of superfamilies and 10% of families are ambiguous
• Only unambiguous families used in this study

Superfamily Distribution As Well As Overall Content Has Changed
[Figure: two panels, "Bacteria Fe superfamilies" and "Eukaryotic Fe superfamilies", each listing the same Fe superfamilies: a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5]

Metallomes are Very Diverse (Discriminatory)
[Figure: quantile plot over the unique Fe-binding fold families (108 total). Axes: percent of Bacterial proteomes in which a fold family occurs (x, 0 to 100) and average copy number (♦, 0 to 14).]
• A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
• This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦).
• Few Fe-binding folds are in most proteomes.
• Widespread Fe-binding folds are not necessarily abundant.
• Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.

Metal Binding Proteins are Not Consistent Across Superkingdoms
[Figure, panel A: total Zn-binding domains in a proteome versus total domains in a proteome, on logarithmic axes, for Archaea, Bacteria and Eukarya. Panel B: slope of the fitted power law for Zn, Fe, Mn and Co in each Superkingdom.]
Since these data are derived from current species they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis

Power Laws: Fundamental Constants in the Evolution of Proteomes
A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.). Power laws, scale-free networks, and genome biology

Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments

Do the Metallomes Contain Further Support for this Hypothesis?
Superkingdom   Fold family                    %             Fe-binding   O2
Eukarya        Cytochrome P450                0.44 ± 0.48   heme         yes
Eukarya        Cytochrome c3-like             0.13 ± 0.3    heme         no
Eukarya        Cytochrome b5                  0.12 ± 0.09   heme         no
Eukarya        Purple acid phosphatase        0.11 ± 0.08   amino        no
Eukarya        Penicillin synthase-like       0.07 ± 0.1    amino        yes
Eukarya        Hypoxia-inducible factor       0.07 ± 0.04   amino        yes
Eukarya        Di-heme elbow motif            0.06 ± 0.01   heme         no
Archaea        4Fe-4S ferredoxins             1.80 ± 0.7    Fe-S         no
Archaea        MoCo biosynthesis proteins     1.60 ± 0.3    Fe-S         no
Archaea        Heme-binding PAS domain        1.10 ± 1.0    heme         no
Archaea        HemN                           0.80 ± 0.20   Fe-S         note 1
Archaea        α-helical ferredoxin           0.60 ± 0.16   Fe-S         no
Archaea        Biotin synthase                0.55 ± 0.1    Fe-S         no
Archaea        ROO N-terminal domain-like     0.5 ± 0.1     amino        note 2
Bacteria       High potential iron protein    0.38 ± 0.25   Fe-S         no
Bacteria       Heme-binding PAS domain        0.3 ± 0.4     heme         note 1
Bacteria       MoCo biosynthesis proteins     0.21 ± 0.15   Fe-S         no
Bacteria       HemN                           0.2 ± 0.15    Fe-S         no
Bacteria       4Fe-4S ferredoxins             0.2 ± 0.2     Fe-S         no
Bacteria       Cytochrome c                   0.14 ± 0.2    heme         no
Bacteria       α-helical ferredoxin           0.12 ± 0.09   Fe-S         no

Overall percent of Fe bound by:
Superkingdom   Fe-S      heme      amino
Eukarya        21 ± 9    47 ± 19   32 ± 12
Archaea        68 ± 12   13 ± 14   19 ± 6
Bacteria       47 ± 11   22 ± 12   31 ± 16

1. Some, but not all, PAS domains actually sense oxygen
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway

e- Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters
• Fe bound by S
• Cluster held in place by Cys
• Generally negative reduction potentials
• Very susceptible to oxidation

Cytochromes
• Fe bound by heme (and amino acids)
• Generally positive reduction potentials
• Less susceptible to oxidation

The Importance of “Small Class” Zn Folds to Eukarya
[Figure: scatter plot of total “small class” Zn-binding domains versus total number of domains in a proteome, on logarithmic axes, for Bacteria, Archaea and Eukarya; and a Venn diagram of the distribution of 53 unique small class Zn families across the three Superkingdoms. Venn region labels as shown: 30/53 and 18/28; 5/53 and 0/28; 0/53 and 0/28; 7/53 and 0/28; 0/53 and 0/28; 11/53 and 9/28; 0/53 and 1/28.]

Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• This impacted metal concentrations in the ocean
• Organisms used new metals in new ways to evolve new biological processes, e.g. complex signaling
• This in turn further impacted the environment

A Final Thought
Perhaps We Should Study Both the Life Sciences and Earth Sciences Together?
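
As a closing worked sketch, the distance-determination step from the "Method – Distance Determination" slide can be illustrated with the seven-superfamily excerpt shown there. The lecture does not state which distance measure was used, so the Hamming distance below (the number of superfamilies present in one proteome but absent in the other) is an assumption chosen for simplicity, and the function names are hypothetical. The values 101, 109 and 144 on the slide come from the full superfamily set, so this excerpt will not reproduce them.

```python
from itertools import combinations

# Excerpt of the presence/absence matrix from the Method slide
# (1 = superfamily assigned to the proteome, 0 = not assigned).
SUPERFAMILIES = ["a.1.1", "a.1.2", "a.10.1", "a.100.1",
                 "a.101.1", "a.102.1", "a.102.2"]
PRESENCE = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def hamming_distance(a, b):
    """Count positions where two presence/absence vectors differ.

    Assumed metric: the lecture does not say which distance was used.
    """
    if len(a) != len(b):
        raise ValueError("vectors must cover the same superfamilies")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(presence):
    """Build a symmetric organism-by-organism distance table."""
    names = list(presence)
    dist = {name: {name: 0} for name in names}
    for first, second in combinations(names, 2):
        d = hamming_distance(presence[first], presence[second])
        dist[first][second] = d
        dist[second][first] = d
    return dist


matrix = distance_matrix(PRESENCE)
for organism, row in matrix.items():
    print(organism, [row[other] for other in PRESENCE])
```

A matrix of this kind is the input a tree-building method would take; which method the original study used is not stated on the slides.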