Evolutionary Insights from Protein Structure
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
Support Open Access – All the work here does
Dalhousie December 2007

Agenda
• Why is protein structure useful?
• Tree construction using protein structure
• One protein superfamily in more detail
• Environmental influence
• On-going work
– The role of calcium over time
– Applying structural domain combinations
– Co-evolution of kinases and phosphatases

Why is protein structure useful?
[Figure: structures of PKA, ChaK (“Channel Kinase”), phosphoinositide-3 kinase (D) and actin-fragmin kinase (E)]

The Key is Nature’s Reductionism
• There are ~20^300 possible proteins of 300 residues, far more than all the atoms in the Universe
• ~6.7M protein sequences from 4734 species (source: RefSeq)
• 34,494 protein structures yield only 1086 folds (SCOP 1.73)

• It follows that structure is more conserved than sequence
• Hence, structure comparison reveals relationships not detectable from sequence alone
• Stated another way, structure offers the opportunity to look at more distant evolutionary relationships

Potential Problems in Using Structure on a Proteomic Scale
• Is structural space populated well enough?
• Is proteome coverage by structure, with current detection methods, sufficient?
– Currently 50-70%

Initial Bold Question: With this level of coverage, and assuming we know a high percentage of all folds, is structure useful in discriminating species?
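To make the scale of the reductionism bullet concrete: assuming a protein of 300 residues drawn from the 20 standard amino acids, and taking the commonly quoted order-of-magnitude estimate of ~10^80 atoms in the observable universe (a figure the talk does not give), the comparison works out as follows.

$$
20^{300} = 10^{300\,\log_{10} 20} \approx 10^{300 \times 1.301} \approx 10^{390} \gg 10^{80}
$$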
Tree Construction Using Protein Structure
Russ Doolittle, Professor, Center for Molecular Genetics, UCSD
Song Yang, Former Graduate Student, Department of Chemistry and Biochemistry, UCSD
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

To Answer this Question We Only Need to Make Use of Existing Resources
• SCOP – further catalogs Nature’s reductionism into structural domains, folds, superfamilies and families
• SUPERFAMILY assigns the above to fully sequenced proteomes

Use of SCOP Superfamilies
• Using structure, how do you distinguish convergent from divergent evolution?
• The SCOP notion of a superfamily, supported by evidence of weak sequence relationships, can be used to discount convergence.

Structural Organization (SCOP v1.73)
• Classes: 7
• Folds: 1086
• Superfamilies: 1777
• Families: 3464
• Domains: 97178

Is Structure a Useful Discriminator? Maybe…
Distribution among the three kingdoms, as taken from SUPERFAMILY
[Figure: Venn diagram of SCOP folds (765 total) among Eukaryota (650), Archaea (416) and Bacteria (564); each region also carries superfamily counts given as any genome / all genomes]
• Folds per region:
– Eukaryota only: 135
– Eukaryota and Archaea only: 10
– Archaea only: 2
– All three: 387
– Eukaryota and Bacteria only: 118
– Archaea and Bacteria only: 17
– Bacteria only: 42
• Superfamily counts shown in the regions (any genome / all genomes): 153/14, 21/2, 645/49, 9/1, 310/0, 29/0, 68/0
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563

The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase (TGT), C2 domain
• First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• Was found exclusively in Archaea
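The fold counts in the Venn diagram on the “Is Structure a Useful Discriminator? Maybe…” slide can be checked against the kingdom totals printed beside it. The assignment of each number to a region is an inference from the extracted figure rather than something the slide states, so this short sketch only confirms that the assignment is self-consistent, and then derives each kingdom's share of folds found in no other kingdom.

```python
# Fold counts per Venn region, keyed by the set of superkingdoms sharing them.
# The region-to-number mapping is inferred from the figure (an assumption).
FOLDS_BY_REGION = {
    frozenset({"Eukaryota"}): 135,
    frozenset({"Eukaryota", "Archaea"}): 10,
    frozenset({"Archaea"}): 2,
    frozenset({"Eukaryota", "Archaea", "Bacteria"}): 387,
    frozenset({"Eukaryota", "Bacteria"}): 118,
    frozenset({"Archaea", "Bacteria"}): 17,
    frozenset({"Bacteria"}): 42,
}


def kingdom_total(kingdom: str) -> int:
    """Total folds observed in a kingdom: the sum of every region containing it."""
    return sum(n for members, n in FOLDS_BY_REGION.items() if kingdom in members)


def unique_fraction(kingdom: str) -> float:
    """Share of a kingdom's folds that occur in no other kingdom."""
    return FOLDS_BY_REGION[frozenset({kingdom})] / kingdom_total(kingdom)


for k in ("Eukaryota", "Archaea", "Bacteria"):
    print(f"{k}: {kingdom_total(k)} folds, {unique_fraction(k):.1%} unique")
```

The per-kingdom sums reproduce the 650, 416 and 564 on the slide, and Eukaryota has by far the largest share of kingdom-specific folds (135 of 650).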
(Reference for d.17.6: InterPro IPR004804)

Method – Distance Determination
SCOP and SUPERFAMILY → presence/absence data matrix (organisms × FSF, i.e. SCOP fold superfamilies) → distance matrix

Presence/absence data matrix (1 = superfamily present in the proteome, 0 = absent)
SCOP superfamily | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1

Distance matrix
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0

Is Structure a Useful Discriminator? Yes
[Figure: tree separating Archaea, Bacteria and Eukaryota]
• The method cleanly placed all species in their correct superkingdoms

Presence/Absence vs. Abundance
• Abundance fails to distinctly separate the three superkingdoms
• Presence/absence succeeds in distinctly separating the three superkingdoms
• Why?
– Emergence or loss of an FSF is a major evolutionary event
– Emergence of a new FSF may lead to 1 to n new functions
– Gene loss is likely; loss of an entire FSF is less likely
– Horizontal gene transfer is only relevant if it introduces an FSF
– Not affected by gene duplication
– Coverage and sensitivity, while not perfect, are sufficient

Trees of Archaea
[Figure: our structure-based tree of Archaea beside the NCBI taxonomy. Crenarchaeota: Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix. Euryarchaeota: Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi; Thermoplasma volcanium, Thermoplasma acidophilum; Halobacterium sp. NRC-1; Archaeoglobus fulgidus; methanogens Methanosarcina mazei, Methanosarcina acetivorans, Methanocaldococcus jannaschii, Methanopyrus kandleri, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus]

Our Tree of Bacteria
• 123 Bacteria
• Parasitic bacteria are not grouped with their full-gene-complement counterparts
• They are sorted into proper groupings that mirror the overall tree
• A few anomalies
[Figure: groups include Clostridiales, Bacilli, Deinococcus, Actinobacteria, Planctomycetacia, Spirochaetes, βγ-proteobacteria, α-proteobacteria, Thermotogae, Fusobacteria, Bacteroidetes, Cyanobacteria, Chlorobia, ε-proteobacteria, Aquificales and Chlamydiae, plus parasitic clusters: Mollicutes (parasitic Firmicutes), parasitic Spirochaetes, parasitic α-proteobacteria, parasitic γ-proteobacteria and parasitic Actinobacteria]

Eukaryotes – Anomalies May Point to Genome Problems
• The frog genome appears contaminated with bacterial genes

A Closer Look at One Superfamily: The Protein Kinase-Like Superfamily
Eric Scheeff
Scheeff & Bourne (2005) PLoS Comp. Biol. 1(5): e49

The Protein Kinase-like Superfamily
• A large superfamily important to signal transduction in eukaryotes and many bacteria.
• Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on a target protein, producing a range of downstream signaling effects.
• PKA: an example of a typical protein kinase (TPK) fold, shown in “open book” format

The Protein Kinase-Like Superfamily
• A range of different families, all phosphotransferases
• A variety of different targets
Family | Structural representative | Phosphorylates | Biological result
Typical protein kinases (TPKs) | Protein kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-fragmin kinase (AFK) | Actin-fragmin kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline kinase (CK) | Choline | Part of the pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside kinases | Aminoglycoside kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
• All possess a core cassette of elements shared with the TPKs:
– ATP binding
– Catalysis
• Structures can be highly variable, particularly in the substrate-binding regions

Method
• Begin with a multiple structure alignment, using CE-MC (NAR 2004), of 30 “comparable” TPKs and APKs, then correct it manually in a pairwise manner over 1-2 person-years
• Review the literature on each structure
• Review the associated sequence alignments derived from structure

[Figure: structures of PKA, ChaK (“Channel Kinase”), phosphoinositide-3 kinase (D) and actin-fragmin kinase (E)]

Can We Propose an Evolutionary History for the Protein Kinase-Like Superfamily?
• Bayesian inference of phylogeny (MrBayes)
• Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
• But sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
• Adding a matrix of structural characteristics (similar to morphological characteristics) produces a well-supported combined model
• Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination
Example columns (structural characteristics):
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0: kinked, 1: straight)
4) State of Strand 4 (0: kinked, 1: straight)
5) α-Helix D present
PDB ID | Group | 1 2 3 4 5
1BO1 | Atypical | 0 0 0 0 1
1IA9 | Atypical | 1 1 1 1 0
1E8X | Atypical | 1 0 1 1 1
1CJA | Atypical | 1 0 1 1 1
1NW1 | Atypical | 1 0 1 0 0
1J7U | Atypical | 1 0 1 0 1
1CDK | AGC | 1 1 1 0 1
1O6L | AGC | 1 1 1 0 1
1OMW | AGC | 1 1 1 0 1
1H1W | AGC | 1 1 1 0 1
1MUO | Other | 1 1 1 0 1
1TKI | CAMK | 1 0 1 0 1
1JKL | CAMK | 1 0 1 0 1
1A06 | CAMK | 1 0 1 0 1
1PHK | CAMK | 1 0 1 0 1
1KWP | CAMK | 1 0 1 0 1
1IA8 | CAMK | 1 0 1 0 0
1GNG | CMGC | 1 0 1 0 1
1HCK | CMGC | 1 0 1 0 1
1JNK | CMGC | 1 0 1 0 1
1HOW | CMGC | 1 0 1 0 1
1LP4 | Other | 1 0 1 0 1
1F3M | STE | 1 0 1 0 1
1O6Y | Other | 1 0 1 0 1
1CSN | CK1 | 1 0 1 0 1
1B6C | TKL | 1 0 1 0 1
2SRC | TK | 1 0 1 0 1
1LUF | TK | 1 0 1 0 1
1IR3 | TK | 1 0 1 0 1
1M14 | TK | 1 0 1 0 1
1GJO | TK | 1 0 1 0 1

Proposed Evolutionary History for the Protein Kinase-Like Superfamily
• Suggests a distinctive history for the atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
• The TPK portion of the tree shows a high degree of agreement with the Manning tree
• Branching is supported by the species representation of kinase families
[Figure: proposed tree. Atypical kinase families in blue (APH, CK, AFK, PI3K, PIPKIIβ, ChaK); typical protein kinase groups (subfamilies) in red (AGC, CAMK, CMGC, TKL, CK1, TK); branch labels give the posterior probability of each branch (values shown include 0.64, 0.78, 0.85, 0.97 and 1.0)]

Has the Environment had an Influence on Modern Day Proteomes?
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne (2006) PNAS 103(47) 17822-17827

Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the Earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later
• This would seem to hold true
[Figure: Venn diagram of SCOP folds (708 total) giving, for each region, the percentage of folds that contain disulfide bonds]
– Eukaryota only: 31.9% (43/135)
– Eukaryota and Archaea only: 0% (0/10)
– Archaea only: 0% (0/2)
– All three: 4.7% (18/387)
– Eukaryota and Bacteria only: 14.4% (17/118)
– Archaea and Bacteria only: 5.9% (1/17)
– Bacteria only: 16.7% (7/42)
• Can we take this further?

Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
[Figure: theoretical deep-ocean levels of oxygen (arbitrary units), zinc and iron (mol L-1), and cobalt and manganese over the last 4.5 billion years (x-axis: billions of years before present), with the proposed diversification periods of Bacteria, Archaea and Eukarya marked at the top]
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, so both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea of the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al. (2003) Inorganica Chimica Acta 356: 308-318

Making the Metallome of Each Species – Can Only be Done from Structure
1. Start with SCOP
2. Each superfamily- and family-level assignment was checked manually for metal binding
3. All the structures representing a family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. The SUPERFAMILY database was used to map assignments to proteomes
6. 23 Archaea, 233 Bacteria and 57 Eukaryota were included
7. Cu, Ni and Mo were ignored (<0.3% of the proteome)

Levels of Ambiguity
• An ambiguous superfamily binds different metals or has members that are not known to bind metals
• The same definition applies to families
• Approx. 50% of superfamilies and 10% of families are ambiguous
• Only unambiguous families were used in this study

Superfamily Distribution As Well As Overall Content Has Changed
[Figure: abundance of individual Fe-binding superfamilies (SCOP a.1.1 through g.41.5) in Bacteria and in Eukaryotes]

Metallomes are Discriminatory
[Figure: quantile plot of the 108 unique Fe-binding fold families; (x) percent of Bacterial proteomes in which a fold family occurs, (♦) average copy number]
• A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
• This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦).
• Few Fe-binding folds are in most proteomes.
• Widespread Fe-binding folds are not necessarily abundant.
• Similar trends are observed for Zn, Mn and Co in all three Superkingdoms.

Metal Binding Proteins are Not Consistent Across Superkingdoms
[Figure: (A) total Zn-binding domains vs. total domains in a proteome (log-log); (B) slope of the fitted power law for Zn, Fe, Mn and Co in Archaea, Bacteria and Eukarya]
• Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis

Power Laws: Fundamental Constants in the Evolution of Proteomes
• A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained, in the case of genome reduction).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.), Power Laws, Scale-Free Networks and Genome Biology

Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power-law slopes describing Eukarya and Prokarya correlate with the shifts in trace metal geochemistry that accompanied the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor of each Superkingdom

Do the Metallomes Contain Further Support for this Hypothesis?
Superkingdom | Fold family | % | Fe binding | O2
Eukarya | Cytochrome P450 | 0.44 ± 0.48 | heme | yes
Eukarya | Cytochrome c3-like | 0.13 ± 0.3 | heme | no
Eukarya | Cytochrome b5 | 0.12 ± 0.09 | heme | no
Eukarya | Purple acid phosphatase | 0.11 ± 0.08 | amino | no
Eukarya | Penicillin synthase-like | 0.07 ± 0.1 | amino | yes
Eukarya | Hypoxia-inducible factor | 0.07 ± 0.04 | amino | yes
Eukarya | Di-heme elbow motif | 0.06 ± 0.01 | heme | no
Archaea | 4Fe-4S ferredoxins | 1.80 ± 0.7 | Fe-S | no
Archaea | MoCo biosynthesis proteins | 1.60 ± 0.3 | Fe-S | no
Archaea | Heme-binding PAS domain | 1.10 ± 1.0 | heme | see note 1
Archaea | HemN | 0.80 ± 0.20 | Fe-S | no
Archaea | α-helical ferredoxin | 0.60 ± 0.16 | Fe-S | no
Archaea | Biotin synthase | 0.55 ± 0.1 | Fe-S | no
Archaea | ROO N-terminal domain-like | 0.5 ± 0.1 | amino | see note 2
Bacteria | High potential iron protein | 0.38 ± 0.25 | Fe-S | no
Bacteria | Heme-binding PAS domain | 0.3 ± 0.4 | heme | see note 1
Bacteria | MoCo biosynthesis proteins | 0.21 ± 0.15 | Fe-S | no
Bacteria | HemN | 0.2 ± 0.15 | Fe-S | no
Bacteria | 4Fe-4S ferredoxins | 0.2 ± 0.2 | Fe-S | no
Bacteria | Cytochrome c | 0.14 ± 0.2 | heme | no
Bacteria | α-helical ferredoxin | 0.12 ± 0.09 | Fe-S | no
Notes:
1. Some, but not all, PAS domains actually sense oxygen
2. The rubredoxin:oxygen oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway

Overall percent of Fe bound by each ligand type
Superkingdom | Fe-S | heme | amino
Eukarya | 21 ± 9 | 47 ± 19 | 32 ± 12
Archaea | 68 ± 12 | 13 ± 14 | 19 ± 6
Bacteria | 47 ± 11 | 22 ± 12 | 31 ± 16

e- Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters | Cytochromes
Fe bound by S | Fe bound by heme (and amino acids)
Cluster held in place by Cys |
Generally negative reduction potentials | Generally positive reduction potentials
Less susceptible to oxidation | Very susceptible to oxidation

Agenda
• Why is protein structure useful?
• Tree construction using protein structure
• One protein superfamily in more detail
• Environmental influence
• On-going work
– The role of calcium over time
– Applying structural domain combinations
– Co-evolution of kinases and phosphatases

The Role of Calcium
• Calcium concentrations have not fluctuated over evolutionary time scales to the same degree as iron and zinc
• Low diffusion rate and rapid kinetics
• Calcium is important for maintaining cell structure
• Calcium became a very important signaling molecule in multicellular organisms

Calcium – Positive Selection Across All Superkingdoms
Figure 1. Power-law scaling for calcium-binding domains. The abundances of Ca-binding domains in Archaea, Bacteria and Eukaryotes are plotted against the total number of structural domains in a proteome. The power-law equations and R² values, which describe the slope of the line and the quality of the power-law fit respectively, are included next to the corresponding line label. The circled point represents Rhodopirellula baltica, which has a large number of arylsulfatases.

Calcium – Uni vs. Multicellular
Figure 4. Diversity plot of calcium-binding proteins across the three domains of life and between unicellular (Uni) and multicellular (Multi) Eukaryotes.
The x-axis is unlabelled because the fold family (FF) represented by each tick mark changes with the Superkingdom.

Structural Domain Combinations
• Definition of a structural domain
– Compact, spatially distinct
– Folds in isolation
– Recurs
• Importance
– Understanding the structure and function of the whole protein

Domain Trees Might Provide Insights into Horizontal Gene Transfer
[Figure: domain tree spanning Chlamydiales, Alveolata, Rhodophyta, Cyanobacteria, Metazoa and Actinobacteria]
• a.1.1.3: phycocyanin-like phycobilisome proteins, a light-harvesting antenna of photosystem II
– Exists only in Cyanobacteria and, among Eukaryotes, in only one red alga

Protein Kinases and Phosphatases
• Protein kinases and phosphatases are components of numerous signal transduction pathways
• They are responsible for regulating many cellular processes
• They are implicated in many cancers and other diseases
• They comprise a significant portion of genomes
– At least 518 protein kinase genes in the human genome (Manning et al. (2002) Science 298:1912-1934)
– At least 107 protein tyrosine phosphatase genes in the human genome (Alonso et al. Cell. 2004 Jun 11;117(6):699-711)

Example: ADF/Cofilin
• The cofilin/ADF (actin depolymerizing factor) family remodels the actin filaments of the cytoskeleton
• These proteins sever actin filaments and increase the rate at which monomers leave the filament’s pointed end
• Cofilin/ADF proteins are phosphorylated at a conserved N-terminal serine (Ser3)
• When phosphorylated, cofilin/ADF is unable to bind actin and is thus inactive
• When dephosphorylated, cofilin/ADF can bind and depolymerize actin

Phosphorylation and Dephosphorylation of ADF/Cofilin
• Two serine/threonine kinase families can phosphorylate (deactivate) ADF/cofilin
– LIMK
– TESK
• Two phosphatase families have been identified that dephosphorylate ADF/cofilin
– Slingshot (SSH) phosphatases
– Chronophin (CIN)

Coordinated Divergence
• The Slingshot phosphatase family and the TESK and LIMK protein kinase families appear to have emerged at the same point in the eukaryotic tree
• They also underwent an apparent gene duplication at the same time (after the divergence of Ciona)
• Can the point of divergence be pinpointed more accurately as more organisms are sequenced?
[Figure: eukaryotic tree marking the points of emergence and gene duplication]

Parting Comments
• Structure plays a useful role at various levels of detail in the study of evolution
• Much of the data used here are sitting on the Web for anyone to apply
• Perhaps we should do more to train students in both the life sciences and the earth sciences?
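Because the SCOP and SUPERFAMILY data are public, the presence/absence distance behind the structure-based trees is easy to recompute. The sketch below assumes that the distance between two proteomes is simply the number of superfamilies present in one but not the other (a Hamming distance over the 0/1 matrix); the talk does not spell out its exact distance measure or tree-building algorithm, so treat this as an illustration rather than the published method. It uses only the seven example rows from the method slide, so its distances are tiny compared with the 101, 109 and 144 obtained from full proteomes.

```python
from itertools import combinations

SUPERFAMILIES = ["a.1.1", "a.1.2", "a.10.1", "a.100.1", "a.101.1", "a.102.1", "a.102.2"]

# 1 = superfamily present in the proteome, 0 = absent (example rows from the slide).
PRESENCE = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def presence_distance(a: list[int], b: list[int]) -> int:
    """Number of superfamilies present in exactly one of the two proteomes."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(presence: dict[str, list[int]]) -> dict[tuple[str, str], int]:
    """Distances for every unordered pair of proteomes (assumed Hamming measure)."""
    return {
        (p, q): presence_distance(presence[p], presence[q])
        for p, q in combinations(presence, 2)
    }


for (p, q), d in distance_matrix(PRESENCE).items():
    print(f"{p} vs {q}: {d}")
```

Shared absences such as a.101.1 add nothing to the distance. A tree could then be built from such a matrix with a standard distance method (neighbor joining, for example); which method the original study applied is not stated here.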
Parting Comments
• The reductionism used here seems useful, but there is a growing sense that protein structure represents more of a continuum
– perhaps composed of unique fragments at the sub-fold level
– the Russian Doll effect
• Evidence is growing that proteins from different superfamilies may share a functional site but nothing else; does this speak to a very distant evolutionary relationship?

Acknowledgements
• Kristine Briedis
• Andrew Butcher
• Russ Doolittle
• Chris Dupont
• Eric Scheeff
• Song Yang
• The Whole Group
• NSF & NIH
Support Open Access – All the work here does

Backpocket

The importance of “small class” Zn folds to Eukarya
[Figure: (A) total “small class” Zn-binding domains vs. total number of domains in a proteome (log-log) for Bacteria, Archaea and Eukarya; (B) distribution of the 53 unique small-class Zn families across the superkingdoms; the trace-metal history plot is repeated alongside]

Conclusions
• Metallomes have diverse compositions, yet the total abundances conform to evolutionary constants
• These constants exhibit Superkingdom-specific differences consistent with ancient changes in geochemistry, a hypothesis further supported by the roles of Zn and Fe
• These results provide genome-based evidence for the theory of Anbar and Knoll that Eukaryotic diversification and oxygen-related changes in trace metal chemistry are linked
• Prokaryotes likely diverged in anoxic environments, while Eukaryotes diverged in oxic environments (supported by the fossil record)

Possible Flaws in the Argument
• Proteome coverage: currently only 40% of Eukaryotic and 55% of Prokaryotic proteomes are covered by structural families
– Estimate that 90% of the unannotated space is covered by existing families

Possible Flaws in the Argument
• Genome bias: there is a disproportionate number of thermophiles among Archaea, whereas the Eukaryotes are almost entirely aerobic
– Bacteria have a better distribution
– The dataset does include the Eukaryotic anaerobic amitochondriate parasite Encephalitozoon cuniculi, which has metallomic features typical of aerobic Eukaryotes
– Principal component analysis shows that oxygen tolerance and environment have little effect on the observed trends; phylogenetic groupings are apparent, however (suggesting vertical inheritance)

Possible Flaws in the Argument
• Objection: Zn concentrations are associated solely with increased complexity, not the environment
– Eukaryotes of varying complexity follow the same power law
– Zn finger abundance is not consistent with complexity
– 3 Zn superfamilies found in both Prokaryotes and Eukaryotes are more abundant across all Eukaryotes

Manual Annotation of SCOP (1.68) Superfamilies and Families
• 281 of the 1495 superfamilies have at least one metal-associated structure at the domain level
• ~50% of the 281 metal-associated superfamilies are ambiguous, as are ~10% of the families
• Zn-associated superfamilies are the most prevalent, followed by Fe, Cu, Mn, and Co = Mo = Ni
Dupont, Briedis, Yang, Palenik, Bourne 2005, in preparation
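The annotation rules behind the metallomes (a family counts as binding a metal only if every representative structure binds that metal; otherwise it is ambiguous and dropped) amount to a filter followed by a per-proteome count. The sketch below illustrates that logic only: the family names, metal annotations and domain assignments are hypothetical placeholders, whereas in the study the annotation was done by hand with the literature and proteome assignments came from the SUPERFAMILY database.

```python
from collections import Counter

# Hypothetical annotations: for each family, the set of metals observed in each
# representative structure (an empty set means no metal was observed).
FAMILY_STRUCTURES = {
    "fam_A": [{"Zn"}, {"Zn"}, {"Zn"}],  # every structure binds Zn
    "fam_B": [{"Fe"}, {"Fe"}],          # every structure binds Fe
    "fam_C": [{"Zn"}, {"Fe"}],          # members bind different metals
    "fam_D": [{"Mn"}, set()],           # one member binds no metal
}


def unambiguous_metal(structures):
    """Return the one metal bound by every structure, or None if ambiguous."""
    if not structures:
        return None
    shared = set.intersection(*structures)
    seen = set.union(*structures)
    # Unambiguous only if all structures bind exactly the same single metal.
    if len(shared) == 1 and shared == seen:
        return next(iter(shared))
    return None


def metallome(domain_families, annotations):
    """Count one proteome's domains per metal, skipping ambiguous or unknown families."""
    counts = Counter()
    for family in domain_families:
        metal = annotations.get(family)
        if metal is not None:
            counts[metal] += 1
    return counts


ANNOTATIONS = {fam: unambiguous_metal(s) for fam, s in FAMILY_STRUCTURES.items()}

# Hypothetical family assignments for the domains of one proteome
# (fam_X has no metal annotation at all).
proteome = ["fam_A", "fam_A", "fam_B", "fam_C", "fam_D", "fam_X"]
print(metallome(proteome, ANNOTATIONS))
```

Requiring every structure to agree is what discards the hypothetical fam_C and fam_D. The slides report that about 50% of metal-associated superfamilies but only about 10% of families are ambiguous, and only unambiguous families were used.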
Thioredoxin FSF domains
[Figure: thioredoxin FSF domains vs. total domains in a proteome (log-log) for Bacteria, Archaea and Eukaryotes]
• Follows an orderly progression through evolution: domain duplication events remain proportional to genome size
• Occasionally follows a power-law distribution
• Gives rough estimates of domain abundance, e.g. thioredoxins ≈ 1% of the “global” proteome

All Fe-S FSF domains
[Figure: Fe-S FSF domains vs. total domains for Bacteria, Archaea and Eukaryotes, with fits y = 4×10^-5 x^1.8193 (R² = 0.6911) and y = 0.0082x - 2.4099 (R² = 0.6004); smaller panels show cytochrome c and cytochrome P450 domains vs. total domains]
• Fe-S domains make up 1-2% of the proteome in Archaea, 0.7-0.8% in Bacteria and 0.01-0.05% in Eukaryotes
• Cytochrome c evolved after the Bacteria/Archaea split
• Proliferation of cytochrome P450 in Eukaryotes

Case study II: Fe vs. Zn
• From 4 Mya to the present:
– Fe concentrations in the ocean have fallen 10,000-fold
– Zn concentrations have risen 10,000,000-fold

Fe Binding
[Figure: Fe domains vs. total domains for Bacteria, Archaea and Eukaryotes; fitted power laws y = 0.0002x^1.6711 (R² = 0.7846), y = 0.0805x^0.7764 (R² = 0.6667) and y = 0.0001x^1.6317 (R² = 0.6998)]
• 2-3% of Bacterial and Archaeal proteomes are Fe-binding
• 0.5-1.5% of Eukaryotic proteomes

Zn Binding
[Figure: Zn domains (+phosphatases) vs. total domains for Bacteria, Archaea and Eukaryotes; fitted power laws y = 1.0657x^0.5155 (R² = 0.788), y = 0.0935x^0.8044 (R² = 0.8511) and y = 0.0349x^1.0281 (R² = 0.8464)]
• 1.5-2.5% of Bacterial and Archaeal proteomes are Zn-binding
• 4.5-5% of Eukaryotic proteomes

Zn Binding by Kingdom
[Figure: stacked bars (0-100%) for Archaea, Bacteria and Eukaryotes comparing Zn sites with soft ligands only vs. hard and soft ligands]
• Hard ligands: Asp, Glu, Ser, Tyr
• Soft ligands: Cys, His
• Zn: from Lewis acid reactions to informational systems (Zn fingers are >60% of Zn-containing superfamilies in Eukaryotes!)

Future Work
• Ca concentrations have also changed dramatically – is this evident in modern proteomes and, if so, what are the evolutionary implications?
• Proteins associated with the nervous system
– 9% before a rapid expansion .5 Mya
– around the time of the TK transition
• c.19: ubiquitous Mg binding
• Evolution of photosynthesis
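The power-law slopes discussed in the talk, and the fits quoted on the backpocket slides (for example y = 0.0349x^1.0281 for Zn), come from fitting y = a·x^b, which is a straight line in log-log space. A minimal least-squares sketch follows. The proteome sizes are synthetic and the counts are generated exactly from that one quoted Zn fit, so the fit should recover a ≈ 0.0349 and b ≈ 1.0281 with R² ≈ 1. The tolerance used to call a slope “about 1” is an arbitrary assumption, and R² here is computed in log space, which may differ from how the quoted R² values were obtained.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log10(x), log10(y).

    Returns (a, b, r_squared); r_squared is computed in log-log space.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    log_a = mean_y - b * mean_x        # intercept = log10(a)
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - mean_y) ** 2 for v in ly)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return 10 ** log_a, b, r_squared


def interpret_slope(b, tol=0.05):
    """Label an exponent per the 'Power Laws' slide; tol is an arbitrary choice."""
    if abs(b - 1.0) <= tol:
        return "in equilibrium with genome growth"
    if b > 1.0:
        return "preferentially duplicated (or retained)"
    return "growing more slowly than the proteome"


# Synthetic proteome sizes (hypothetical), with counts generated exactly
# from the quoted Zn fit y = 0.0349 x^1.0281, so the fit should recover it.
sizes = [1_000, 3_000, 10_000, 30_000, 100_000]
counts = [0.0349 * x ** 1.0281 for x in sizes]
a, b, r2 = fit_power_law(sizes, counts)
print(f"a = {a:.4f}, b = {b:.4f}, R^2 = {r2:.3f} -> {interpret_slope(b)}")
```

With real proteomes the points scatter around the line, which is why the slides report R² values well below 1; the interpretation of the exponent (1 versus greater than 1) is unchanged.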