Rational Structural Proteomics and Genomics

Perfect alignment of protein families is critical to:
1) Drug design and control of function
2) Understanding protein folding and evolution
3) Tracing the origin and evolution of the genetic code
4) Creating a rooted evolutionary tree
5) Correcting errors in the gene bank

Glycine is the key to perfect alignment.

Outline
• The Gene Bank
• The evolutionary tree
• The ribosome
• Conserved ribosomal protein lengths
• GARP based alignment
• Separating Gram+ and Gram- bacteria
• Connecting alpha-proteobacteria with mitochondria
• Separating mitochondrial and cytosolic RibPros in fungi
• Placing Archaea on the evolutionary tree
• Why GARP?
• The Last Universal Common Ancestor, LUCA

Tree of Life
The earliest use of genomic analysis to create an evolutionary tree used 16S ribosomal RNA and led to the postulation of a third kingdom, Archaea. Subsequent trees based on proteins present in all species contradicted one another. Horizontal gene transfer is considered the source of the disparity, and many have concluded that it is impossible to determine a rooted tree of all species. Because it is unlikely that the ribosome or its integral proteins arose from horizontal transfer, we are exploring the potential to trace evolution with each of the 52 ribosomal proteins and to compare the 52 resulting trees for consistency. In the course of doing so we have discovered the power of having all members of a protein family perfectly aligned.

A Tree of Species Evolution
[Figure: un-rooted tree. Jamerdon Dean, Junior]
[Figure: Haeckel's rooted tree, with labels Plants, Animals, Root]

Ribosome
• Essential to species survival
• Produces all proteins critical to existence
• Unlikely to be affected by horizontal gene transfer
Picture from http://www.biologyreference.com/Re-Se/Ribosome.html
Fiona Hennig, City Honors Junior

The Hypothesis
An accurate phylogenetic tree can be based on perfect alignment of ribosomal proteins.

Method
Create a search vector that finds and aligns all members of each ribosomal protein family in the gene bank.

Tests of Accuracy
1. All members of the family are found with no false hits.
2. Separation of Gram+ and Gram- bacteria, and of bacteria, eukaryotes and archaea, can be achieved on the basis of one or two positions in the search vector.

Lengths of 1600 L1 Bacterial Ribosomal Proteins
• 95% between 228 aa and 241 aa
• Shortest 216 aa
• Longest 241 aa

GARP Based Minimum Universal Fingerprint (MUF) for Bacterial Ribosomal Protein L1
MX(0,22)X(26)[AST]X[DE]XxX(5,8)X(4)xXRXXXX[LM]PXGXGX(15,17)[AS]XXXX[GA]X(5)X(1,11)XxX(3)DX(5)PX(1,10)[GA]XXXGXX(23,25)XXGX(28)NX(12)PX(7)xX(20,35)
• 10 100% conserved identities (8 GARP)
• 5 sites of occupancy of two or more "similarities" (2 GARP)
• 5 INDELs of precise location and length
• Captures 1600 L1s, no false hits, and 17 eukaryotes

GARP based alignment of 1600 bacterial L1 ribosomal proteins
29 identities (98%), 16 GARP, 5 INDELs

The Power of Perfect Alignment
MX(0,22)X(26)[AST]X[DE]XxX(5,8)X(4)[z2]XRXXXX[LM]PXGXGX(15,17)[AS]XXXX[GA]X(5)X(1,11)X[z1]X(3)DX(5)PX(1,10)[GA]XXXGXX(23,25)XXGX(28)NX(12)PX(7)xX(20,35)
Two-site separation of Gr+ and Gr- bacteria: z1 (120) and z2 (67) separate Gr+ from Gr-.
Z1: {WYFR}, [WYFR]
Z2: X, {ML}, [WYFR], [ML]
861 G-; 584 G+ (all Firmicutes); 134 Gr- (80 Bacteroidetes, 50 delta-proteobacteria); 118 Gr+ (all Actinobacteria)

Single site isolation of an entire class
[ED] X: 344 Gr-, 309 Gammaproteobacteria (entire class)

GARP conservation
PHYLUM < CLASS < ORDER < GENUS

RibPro S19 Vector
• M(0,24)X(21)RX(9)[GARSDEN]X(6)XGX(7)[LIVMP]X(8)[LFI][GARS][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)
• This vector aligns 3385 bacteria [1403 G+, 1982 G-], 2063 eukaryota, and 142 archaea.
• The only 100% conserved residues in all 5410 species are a glycine (G) and an arginine (R).
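
The search vectors in this deck read like PROSITE-style patterns. As a rough sketch (not the authors' actual search tool), the notation can be translated into a regular expression and run against candidate sequences. The translation rules are assumptions inferred from the vectors themselves: X or x is any residue, [AST] is a set of allowed residues, {D} is a set of excluded residues, and (n) or (m,n) is a repeat count. The function name vector_to_regex and the toy vector and sequences are hypothetical; placeholders such as [z1] would have to be replaced by concrete residues before translation.

```python
import re

# One vector element: an allowed set, an excluded set, or a single letter,
# optionally followed by a repeat count such as (26) or (0,22).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def vector_to_regex(vector):
    """Translate a search vector into a compiled, fully anchored regex."""
    vector = re.sub(r"\s+", "", vector)  # drop stray spaces left by extraction
    parts = []
    pos = 0
    for match in ELEMENT.finditer(vector):
        if match.start() != pos:
            raise ValueError(f"unrecognized text at offset {pos}")
        pos = match.end()
        element, low, high = match.groups()
        if element in ("X", "x"):
            core = "."                          # assumed: any residue
        elif element.startswith("["):
            core = element                      # allowed residues
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element.upper()              # literal residue
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(core)
    if pos != len(vector):
        raise ValueError(f"unrecognized text at offset {pos}")
    return re.compile("^" + "".join(parts) + "$")


# The S19 vector as written on the slide.
S19_VECTOR = ("M(0,24)X(21)RX(9)[GARSDEN]X(6)XGX(7)[LIVMP]X(8)[LFI][GARS]"
              "[DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)")
s19 = vector_to_regex(S19_VECTOR)

# A toy vector and two made-up sequences, for illustration only.
toy = vector_to_regex("MX(0,2)R[ST]{D}GX(1,3)")
print(toy.pattern)
print(bool(toy.match("MARSNGAA")), bool(toy.match("MARSDGAA")))
```

Anchoring the pattern at both ends means a hit accounts for the whole sequence, including the variable-length N- and C-terminal stretches, which is one plausible reading of how a vector both finds and aligns family members.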

Rasheen Powell, City Honors Junior

Sample of Alignment of 6069 S19 Ribosomal Proteins
24 residues conserved at 90% identity

95% conserved residues in ribosomal protein S19
100% Gly

Separating G+ and G- Bacteria
• G+ bacteria have a single membrane, while G- bacteria have two membranes.
Rasheen Powell, City Honors Junior

S19 Separation of G+ and G-
• M(0,24)X(21)RX(9)[GSDRENA]X(6)ZGX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)
• Single amino acid (Z) separation
• Z = D (aspartic acid): G+
• Z = N (asparagine): G-
[Figure labels: 123 Archaea, LUCA, Gr+, Gr-]

How can we determine why amino acid changes accompany the evolution of G+ to G- bacteria?
• Locate the sites of changes on the three-dimensional structures of the ribosomal proteins.
• Determine how these changes are related to protein/protein and protein/ribosomal RNA interactions.
• Determine how these interactions are related to the evolution of ribosomal structure and function.

Ribosomal interactions (----) between S19 and rRNA that changed as G+ species evolved into G-, revealed by X-ray crystal structure analysis

Two S19 Ribosomal Proteins in Fungi
The S19 MUF finds 270 fungal examples of two distinct types based on length and conserved residues. One is approximately 150 aa long and the other approximately 90 aa long. The shorter of the two resembles alpha-proteobacterial S19 and the longer resembles that of the metazoa and archaea.
• The Cytosolic Protein Search Vector: X(0,33)X(13)[LIM]X(16)RRXXX[RHK]GX(19,30)X(3)XX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]XGX(7)[LIVMP]X(7)X[LFI][GASR][DEA][FYME]X(0,30)
• 143 cytosolic S19
• The Mitochondrial Protein Search Vector: X(0,25)XSX[WY]KX(10,27)X(3)XX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]XGX(7)[LIVMP]X(7)X[LFI][GASR][DEA][FYME]X(0,30)
• 122 mitochondrial S19
• Cytosolic: 140 to 163 aa long, with a 60 to 70 aa N-terminal addition to the 65 aa S19 core protein present in all S19 copies in all species.
The addition contains a 95% conserved RXRRX(3)RG sequence also found in Metazoa and Archaea but not in bacteria.
• Mitochondrial: 85 to 105 aa long, with a 28 to 38 aa N-terminal addition to the 65 aa S19 core. The addition contains a 95% conserved RSXWKGP sequence found in all alpha-proteobacteria but not in any other bacteria, eukaryota or archaea.

Residues in ribosomal protein S12b/23e that are 95% conserved throughout evolution
Blue: Bacteria
Orange: Fungi
Green: 95% conserved in all

Both proteins can be aligned because of the similarities across all S19s. However, the difference in length and the additional conserved sequence segments can also be located and observed in 3-dimensional structures.
• Alpha-proteobacterial S19 is homologous with mitochondrial S19 in fungi.
• Archaeal and metazoan S19 is homologous with cytosolic S19 in fungi.
Yellow: additional conserved core residues
Red: distinguishing amino acids at the N-terminus

RibPro S19 Vector
• X(0,50)X(3)[Z2]X[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF][Z1]GX(7)[LIVMP]X(7)[Z3][LFI][GASR][DEA][FYME]X(0,25)
The only 100% conserved residues are a glycine (G) and a proline (P).
The vector aligns 6069 species: 3556 bacteria [1487 G+, 2087 G-], 2355 eukaryota, and 147 archaea.

Positions that parse the data by kingdoms
• Z1=D: G+ bacteria; {D}: all other species
• Z2=W: all G+, 95% of G-, and plants; {W}: all other Eukaryota and Archaea
• Z2={W}, Z3=K: remaining 5% G- and plants and fungal mitochondria
• Z3={K}: fungal cytosolic, metazoa, all Archaea

S19 Taxonomic Distinguishers & Conserved Values
P H G: Full Conservation; N: G+/G-; FEGL; W: G+ & G- vs Euk & Ark; K: Mitochondrial/Cytosolic

Why GARP?
• Gly swings both ways. The vast majority of all known folds have three or more Gly residues that turn in a way that only they can. These Glys are retained in all members of a fold family (i.e., all bacterial ribosomal proteins).
• Pro has a constrained conformation.
While Pros can be replaced by other amino acids, specific energetic stability may be lost, which accounts for their conservation.
• Arg provides positive charge to balance negatively charged rRNA and can form direct interactions with specific nucleotides of rRNA and tRNA, as has been reported (L1 and L9).
• Ala is a major building block of alpha helices and beta strands.

Ribosomal Protein L11 Glycines
3-Dimensional Crystal Structure 1MMS of the Ribosomal Protein L11 Showing Positions and Locations of Glycine Residues as well as the sign (+ or -) of their Phi values
10 glycines: 2 in alpha helices, 7 in loops, 1 in beta sheets; 2 interacting with rRNA, 3 on the outside
Figure labels (residue, sign of Phi), shown against 23S rRNA: 16,-  25,+  29,-  32,+  51,+  84,+  88,-  98,-  130,-  136,+
A 16 Glycine -88.9, 78.5
A 29 Glycine -62.0, -12.2
A 32 Glycine 132.0, -21.8
A 51 Glycine 101.1, -30.8
A 84 Glycine 64.3, 16.5
B 84 Glycine 75.2, 29.6
A 88 Glycine -89.8, 161.0
B 88 Glycine -81.5, 170.1
A 98 Glycine -174.6, -177.9
B 98 Glycine 172.8, 179.0
A 130 Glycine -63.1, -35.3
B 130 Glycine -59.8, -41.0
A 136 Glycine 61.5, 38.3

Looking for LUCA
Hypothesis
Because (a) average GARP content increases in going from eukaryotes to G- bacteria to G+ bacteria, (b) all 8 GC-only codons encode only GARP, (c) GC-rich DNA is more stable, and (d) many G+ Actinobacteria have a GC content of over 72%, we looked for LUCA in Actinobacteria.
Method
Examine amino acid and codon bias in Actinobacteria.

Why GARP? II
• GC-rich DNA melts at a higher temperature than AT-rich DNA because of additional hydrogen bonds.
• The most stable GC-only codons may have been the first to acquire amino acid definitions.
The eight GC-only codons are:
G: GGG-Gly, GGC-Gly
A: GCG-Ala, GCC-Ala
R: CGG-Arg, CGC-Arg
P: CCC-Pro, CCG-Pro

Looking for LUCA
Discovery
The wobble base position in ALL codons used in ALL putative proteins in several species of Actinomycetales is 97% GC.
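
A wobble-position GC figure like the 97% quoted above can be computed directly from a coding sequence. The sketch below is only an illustration of the arithmetic, assuming an in-frame sequence that starts at the first codon; the function name wobble_gc_fraction and the 15-base toy sequence are hypothetical and are not taken from any Actinomycetales genome.

```python
from collections import Counter


def wobble_gc_fraction(coding_sequence):
    """Fraction of complete codons whose third (wobble) base is G or C."""
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    wobble = Counter(seq[i + 2] for i in range(0, usable, 3))
    total = sum(wobble.values())
    if total == 0:
        raise ValueError("sequence contains no complete codon")
    return (wobble["G"] + wobble["C"]) / total


# Toy sequence: codons GTG GCC CGC GGA ACT, three of five end in G or C.
toy_cds = "GTGGCCCGCGGAACT"
print(f"{wobble_gc_fraction(toy_cds):.2f}")
```

Running the same function over every annotated coding sequence in a genome and pooling the counts would give a genome-wide figure; per-gene values are what would expose genes with unusual A/T-ending codon use.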
These species lack most of the tRNAs for codons ending in A or T, and the 3% of the codons that end in A or T are disproportionately present in hypothetical proteins, between alternative starts, or constitute de facto stop codes. S-ribosomal proteins and tRNA synthetases of these species use codons ending in G or C almost exclusively.

Codon Bias in Cellulomonas flavigena
• Cellulomonas flavigena has a GC content of 75%.
• Use of codons ending in G or C is 97%, of codons ending in A or T is 3%, and of codons beginning and ending in A or T is 0.4%.

Codon use in Cfla

Use of codons ending in A or T in 49 RibPros of Cfla
[CG]X[AT]      #      [AT]X[AT]        #
33. R-CGT      57     49. T-ACT        3
34. G-GGT      52     50. T-ACA        2
35. A-GCT      20     51. S-TCT        1
36. P-CCT      10     52. S-AGT        0
37. G-GGA      4      53. C-TGT        0
38. A-GCA      8      54. S-TCA        0
39. R-CGA      4      55. RSG*-AGA     0
40. V-GTT      3      56. WCU-TGA      43*STOP
41. D-GAT      1      57. F-TTT        0
42. LT-CTT     1      58. Y-TAT        0
43. P-CCA      0      59. N-AAT        0
44. LT-CTA     0      60. I-ATT        0
45. V-GTA      0      61. L-TTA        0
46. H-CAT      0      62. IM-ATA       0
47. Q-CAA      0      63. KN-AAA       0
48. E-GAA      0      64. *QY-TAA      3
18 A/T-ending codons, TTG, and ATG NOT used in 6634 codons of Cfla RibPros
Three stop codes: TAG

Use of codons ending in A or T in 18 tRNA Synthetases of Cfla
[CG]X[AT]      #      [AT]X[AT]        #
33. R-CGT      59     49. T-ACT        0
34. G-GGT      76     50. T-ACA        7
35. A-GCT      23     51. S-TCT        0
36. P-CCT      7      52. S-AGT        0
37. G-GGA      19     53. C-TGT        1
38. A-GCA      41     54. S-TCA        1
39. R-CGA      7      55. RSG*-AGA     0
40. V-GTT      3      56. WCU-TGA      18*STOP
41. D-GAT      9      57. F-TTT        3
42. LT-CTT     0      58. Y-TAT        1
43. P-CCA      0      59. N-AAT        0
44. LT-CTA     0      60. I-ATT        1
45. V-GTA      0      61. L-TTA        0
46. H-CAT      3      62. IM-ATA       0
47. Q-CAA      0      63. KN-AAA       0
48. E-GAA      8      64. *QY-TAA      0
14 A/T-ending codons not used in 10,814 codons in Cfla tRNA synthetases

22 tRNAs not found in Cellulomonas flavigena (Cfla) and the use of their cognate codons in RibPros and tRNA synthetases
[GC]X[GC]: 3. R-CGC, 8. LT-CTC, 13. H-CAC (counts 125+, 70+, 41+)
[AT]X[GC]: 24. F-TTC 33+
[CG]X[AT]: 34. G-GGT 40; 35. A-GCT 11; 36. P-CCT 1; 39. R-CGA 0; 40. V-GTT 2; 41. D-GAT 0; 42. LT-CTT 0; 44. LT-CTA 0; 46. H-CAT 0
[AT]X[AT]: 49. T-ACT 1; 51. S-TCT 0; 52. S-AGT 0; 53. C-TGT 0; 57. F-TTT 0; 58. Y-TAT 0; 59. N-AAT 0; 60. I-ATT 0; 62. IM-ATA 0
13 of these codons are not used in RibPros or tRNA synthetases. 10 of them end in T. 9 are used. 4 of the 5 most used end in C.

LUCA in Actinomycetales
Evidence that LUCA was a GC-rich species of the Gram-positive bacterial order Actinomycetales includes:
1. Extreme GC content (97%) of the wobble base position of ALL putative proteins identified in these species
2. The reliably identified proteins in these species are encoded by genes that do not use 14 codons that end in A or T
3. Many codons with A or T in the wobble base position encode different amino acids in different species
4. No tRNAs for the majority of the unused codons are found in the genomes of these species
5. The absence of protein S21 from the ribosomes of all actinobacteria suggests that S21 arose later in the evolution of the ribosome of other phyla
6. Over 80% of the putative proteins in these species have full-length multiple open reading frames, including genes in which all six frames are ORFs.
7. The start code in these species is more often GTG (Val) than ATG (Met).
8. Cys and Trp residues are absent from the ribosomal proteins of these species to a greater extent than in other bacteria
9. All the members of each of the other 20 ribosomal protein families can be aligned with sufficient accuracy to trace them all back to a common origin in Actinomycetales

Potential Applications
• Design antibiotics to destroy ribosomal function of specific classes, orders, and genera of bacteria.
• Use details of co-evolution of species, substrate, cofactor, and function of families of enzymes to design selective inhibitors.
• Trace evolution of any enzyme family.
• Control species distribution of candidates for drug design.
• Determine G+ ancestor of first G- bacteria.
• Identify bacterial precursor to first mitochondrion.
• Determine order of evolution of eubacteria and archaea.
• Determine order of evolution of tRNA synthetases I and II.

Conclusions
On the basis of these data and our previously published work we conclude that five basic assumptions about the genetic code that appear in most textbooks are false.
• The universal code is not universal.
• Not all species now on earth use a code "frozen in time" as claimed by Watson and Crick.
• Codon use is not determined by the tRNA population in a cell.
• Wobble base variation is not a reliable gauge of the time course of mutational change.
• Evolution of bacterial genes is not orders of magnitude faster than that of eukaryote genes.

Flawed Gene Bank Annotation
• Every newly annotated bacterial genome has over 500 genes without orthologs, termed ORFans (Open Reading Frames without ancestors). While it is clear that a significant number of these ORFans are nonsense, a routine technique to distinguish bona fide genes from nonsense has not been found.
• Through analysis of bias in amino acid distribution and codon use we are able to identify nonsense codes and nonsense sequences that reliably identify tens of thousands of putative gene products that are entirely or partially nonsense.

Sources of Errors in Gene Annotation
• Incorrect assembly of DNA due to insufficient overlap redundancy, common in extremely high or low GC regions.
• Wrong strand choice and wrong frame choice caused by programmed maximization of gene length. This is especially challenging in high GC genomes having multiple ORFs (MORFs) of equal length.
• Wrong start selection due to the standard practice of maximizing gene length and the fact that, in addition to Met (atg), Val (gtg) and Leu (ttg) are common start codes. This is further complicated by the absence of upstream start signals (TATA, CAT boxes, etc.) in bacteria.
• Unjustified assumption that the standard genetic code is appropriate to all bacterial genomes until variants are established by biochemical analysis.

Genetic Code Redundancy
• Because 64 possible three-"letter" codons can be created from four nucleotides and only 22 amino acids are encoded by them, the code is redundant.
• It was assumed that all species used the same "universal" code. This was a premature conclusion.
• Now at least 14 codons have been proven to have different definitions in different species. Some codons have as many as four definitions.
• Most of the codons with variable definitions are in the AT-rich half of the genetic code.

Once ribosomal proteins adjust to a significant change in environment they undergo little further change for hundreds of millions of years. Significant changes in ribosomal protein sequence accompany the appearance of a new class, order, family or genus. The magnitude of change decreases in that order.

Sample Alignment of 5598 S19 Ribosomal Proteins
First row: one of 673 with major indel (Opisthokonta and Archaea).
Last row: one of xxx with C-terminal addition
[Better illustrated with Prosite Colors]

90% conserved residues in ribosomal protein S19

Sample of Alignment of 1495 S19 Ribosomal Proteins with DG conservation
1487 G+ bacteria, 5 Archaea, and 3 Eukaryota with W=H
46 residues conserved at 90% identity

Sample of Alignment of 1757 S19 Ribosomal Proteins with {D}G conservation
1653 G- and 54 G+ bacteria*, 60 Eukaryota and 1 Archaea with W=H
35 residues conserved at 90% identity (20: GPRK)
* X(0,5)KK[GS]X(10,25)X(3)xX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]{D}GX(7)[LIVMP]X(7)x[LFI][GASR][DEA][FYME]X(0,25)
17 Erysipelotrichaceae (Firmicutes)

Separation of the S19 RibPros in bacteria, chloroplasts and plants from those in spirochaetes, Opisthokonta, and archaea by a single site (Z = W versus {W}).
X(0,74)X(19)ZX[RGWCKT]X(5)PX(3)[GARSDEN]X(6)XGX(7)[LIVMP]X(8)[LFI][GARS][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)

Sample of Alignment of 480 S19 Ribosomal Proteins with WK {D}G conservation
269 Alpha-Proteobacteria and 211 Eukaryota (mitochondrial)
20 residues conserved at 90% identity (10: GPRK)
X(0,11)X(4)WK[GS]X(10,25)X(3)xX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]{D}GX(7)[LIVMP]X(7)x[LFI][GASR][DEA][FYME]X(0,25)

S19 Archaea Alignment
• X(0,50)X(3)[YFW][RHK]GX(15,35)XPX(3)[KR][RS]X(3)[RK][GRQENV]X(16,35)X(5)RX(9)[GSDRENA]X(6)XGX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• Additional 55 residue prefix adding two indels
• Aligns 192 eukaryotes as well as 142 archaea

GARP conservation
PHYLUM < CLASS < ORDER < GENUS

46 Escherichia and 8 Shigella L1 ribosomal proteins
Amino acid sequence homology: only 4 mutations in 3 sequence positions.
DNA sequence homology:
• 4 sites of nonsynonymous mutation
• Only 16 sites of synonymous mutation: 15 of 235 possible wobble bases and 1 first-base synonymous mutation.
Mutations not correlated with genus variation

DNA Sequence Alignment of E. coli and Shigella, Local ClustalX
[Figure legend: GGA GGC GGC GGT GGC GGT; One Base Change; Multiple Base Changes]

Analysis of DNA
• Of 22 glycine codes: 10 GGC, 9 GGT, 2 GGA, 1 variable position (50% GGC / 50% GGT)
• 95% total conservation
  - What does it mean?
  - Total AA conservation all the way down to the DNA level
  - Over billions of years DNA has not changed in these wobble bases, showing how greatly conserved through evolution these bacteria actually are
  - The codon bias is not due to bias in tRNA in the genus or species

Separating Chloroplasts from other Eukaryota
• X(0,74)X(8)[TSIVM]XX[RK]X(5)[PFQLNH]X[FMYLS]X(6)[VI]XxGX(7)[LIVMPTCF]X(8)[LFIVMTP][GASRK][DEAQ][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• X = W: early chloroplasts and 95% of Viridiplantae
• X = H: alveolata, Opisthokonta [fungi, metazoa (including humans)]
X(0,17)X(25)[TSIVM]WX[RK]X(5)[PFQLNH]X[FMYLS]X(6)[VI]XxGX(7)[LIVMPTCF]X(8)[LFIVMTP][GASRK][DEAQ][FYME]XX[STP]X(5)[HKFYMT]X(0,25) finds 1615 eukaryotes (1588 Viridiplantae, 14 early chloroplasts, 12 stramenopiles)

• X(0,31)X(14)GX(9)[PASYQV][KRINT][KQ][PS][NHS][SA][AG]X[RI][KPRH]X(5)[LIFM]X(1,2)X(7)[YFHSNRLAQ][LIVACT][VMGPSQAT]X(3)ZX(0,67)
Z    Bac    Euk    Vir    Strm   Fun    Metzo        Arch   Total
H    3806   792    516    48     107    (76 bilat)   0      4602
C    0      386    47     6      125    161          0      386
A    0      0      0      0      0      0            109    109
G    0      4      0      0      0      0            37     43
S    26     5                                               31

Gly (G), Ala (A), Arg (R), and Pro (P) are the key to perfect alignment.

Why GARP?
• Gly: Most protein folds have three or more Gly residues that turn in a way that only they can. Such Glys are found in all bacterial ribosomal proteins.
• Pro has a constrained conformation that provides consistent stability.
• Arg provides positive charge to balance negatively charged rRNA and forms direct interactions with rRNA and tRNA.
• Ala is a major building block of alpha helices and beta strands.
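
The link this deck draws between GARP and GC-only codons can be checked mechanically. The sketch below assumes the standard genetic code (which the deck itself argues does not hold for every species), lists every codon built only from G and C, and computes the GARP fraction of a protein sequence. The names STANDARD_CODE, gc_only and garp_fraction and the ten-residue toy peptide are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons that contain no A or T at any of the three positions.
gc_only = {
    codon: amino_acid
    for codon, amino_acid in STANDARD_CODE.items()
    if set(codon) <= {"G", "C"}
}


def garp_fraction(protein):
    """Fraction of residues that are Gly, Ala, Arg or Pro."""
    residues = protein.strip().upper()
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(residues.count(aa) for aa in "GARP") / len(residues)


for codon in sorted(gc_only):
    print(codon, gc_only[codon])
print(garp_fraction("GARPGARPLK"))  # made-up peptide, 8 of 10 residues are GARP
```

Under the standard code the eight GC-only codons split two each among Gly, Ala, Arg and Pro, so a genome pushed toward extreme GC content would be expected to favor exactly these four residues.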

GC (guanine-cytosine) base pairs have three hydrogen bonds while AT (adenine-thymine) base pairs have only two. GC-rich DNA melts at a higher temperature than AT-rich DNA. GC-rich codons may have been the first to acquire amino acid definitions.
The eight GC-only codons are:
G: GGG-Gly, GGC-Gly
A: GCG-Ala, GCC-Ala
R: CGG-Arg, CGC-Arg
P: CCC-Pro, CCG-Pro

How do we determine the three-dimensional structures of proteins?
X-ray crystallography!
Herbert Hauptman was awarded the Nobel Prize in 1985 for developing methods of X-ray crystal structure determination.
The determination of the structure of the ribosome was awarded the Nobel Prize in 2009.
The 3D structures of all the ribosomal proteins are in the Protein Data Bank.

Sample alignment of S19 proteins of 142 Archaea, 19 Alveolata, 234 Opisthokonta and 66 Viridiplantae

Kingdom distribution of SRibPro Homologs

The 60 cyanobacteria have a sequence position in their S19 protein that is fully occupied by glutamine. This position is occupied by glutamine in only three of the 1297 S19 proteins in G+ bacteria and 195 of the 3985 S19 proteins in G- bacteria and eukaryotes (60 of which are the cyanobacteria). The species of the other 135 S19 RibPros having glutamine in this position may provide evidence of a link between the ribosomes of cyanobacteria and those of other specific phyla, classes, orders or genera. The distribution of glutamines in the 5410 S19 proteins is given in the following table. Four sequence positions in G+ bacteria have greater than 10% glutamine occupancy, only one position in G- bacteria, eukaryotes and archaea has greater than 10% occupancy (23%), and cyanobacteria have five sites of greater than 10% glutamine occupancy. The five cyanobacteria sites correspond to three of those in G+ and one in G-/Arc/Euk, but the 100% Gln site is peculiar to cyanobacteria.
Adding the 100% conserved Q captures 82 hits in SwP: 36 cyanobacteria, 22 G- bacteria and 24 eukaryota (the usual chloroplasts (21), cyanelle, and plastids). The 5 glutamine sites retain appreciable glutamine occupancies (25%, 18%, 16%, 50% and 100%). It may be that a subset of cyanobacteria and chloroplasts share full glutamine occupancy in these five sites. If so, would it have any significance?

Positions in S19 occupied by 5% or greater Glutamine
Pos    G+         G-/Eu/Arc    Cyan
32     61%        3%           40%
75     35         2            45
89     19         10           23
102    0          23           53
109    28         0            0
111    0.2 (3)    4.9 (195)    100 (60)

Multiple copies of some RibPros?
S18 has 70 more actinobacterial homologs than other S RibPros. The majority are in three taxa: 34 Mycobacterium, 26 Streptomyces and 7 Nocardiaceae. These constitute additional copies of S18 in these 67 species. A major difference between the two copies of S18 in Mycobacterium tuberculosis is that one copy has an additional conserved cysteine at its C-terminus while the other has only the two conserved cysteines that are found in all Mycobacterium. The positions of these cysteines in the folded protein need to be mapped.

• MX(5,8)PFX(3,6)X(21)RX(9)[GSDRENA]X(6)NGX(7)[LIVMP]X(5){AIT}XX[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• A modification of the (NG) S19 MUF set (shown here) captures 1946 eukaryotes, 1925 G- bacteria and 123 archaea. This separates the plant world from Opisthokonta and archaea and at the same time subdivides the G- bacteria.
The S19 RibPros of the plant world have total lengths closer to those of the G- bacteria and have the highest sequence homology with the S19s of 63 cyanobacteria.

• MX(0,15)WKX(0,40)RX(9)[GSDRENA]X(6)NGX(7)[LIVMP]X(5){AIT}XX[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
This WK modification captures 97:
• Chromera velia (1)
• Cyanophora paradoxa (1)
• Dictyosteliida (3)
• Hemiselmis andersenii (1)
• Malawimonas jakobiformis (1)
• Opisthokonta (70): Fungi (69) [Batrachochytrium dendrobatidis (1), Dikarya (68)], Trichoplax adhaerens (1)
• Reclinomonas americana (1)
• Stramenopiles (4)
• Viridiplantae (15): Chlorophyta (7), Streptophytina (8)

Homology in S19 of 1535 Bac and 1485 plants
The introduction of the PF residues isolates 1486 S19s of the plant world and 1489 G- bacteria from all Archaea and all but 46 G+ bacteria. The 3022 S19s have 23 of 92 residues 95% conserved (9 GARP).

Peeling the ribosomal onion
Hsiao, C. et al. Mol Biol Evol 2009 26:2415-2425; doi:10.1093/molbev/msp163

Fluorescent labeling of ribosomal proteins L1 and L9 within the 50S ribosomal subunit.
Fei J et al. PNAS 2009;106:15702-15707 ©2009 by National Academy of Sciences

The Power of Perfect Alignment
Ribosomal proteins will be used to demonstrate the power of Gly, Ala, Arg, and Pro (GARP) based perfect alignment.

Distribution of 4 SRibPros by Bacterial Phylum

Fully Conserved Gly and the Asn before it in G- Bacteria, Eukaryotes and Archaea

Cyanobacteria Begat Chloroplasts
MX(0,22)X(26)[AST]X[DE]XZ3X(5,8)X(4)xXRXXXX[LM]PXGXGX(15,17)[AS]XXXX[GA]X(5)X(1,11)X{w}X(3)DX(5)PX(1,10)[GA]XXXGXX(23,25)XXGX(28)NX(12)PX(7)Z4X(20,35)
120{w}, Z3(h) and Z4[wf] = three-site co-isolation of the entire class of cyanobacteria (44) and 14 chloroplasts.
The chloroplasts have higher sequence homology with cyanobacteria than with other bacterial L1s.

[Figure labels: LUCA, Gr+, Gr-]
[Figure panels: S19 Mitochondrial w/o N-terminus; S19 Cytosolic w/o N-terminus; Mitochondrial; Cytosolic]

Two S19 Ribosomal Proteins in Fungi
• The S19 MUF finds 265 fungal examples of two distinct types based on length and conserved residues. One is approximately 150 aa long and the other approximately 90 aa long. The shorter of the two resembles bacterial S19 and the longer resembles that of the metazoa and archaea.
• 143 cytosolic S19: 140 to 163 aa long, with a 60 to 70 aa N-terminal addition to the 65 aa S19 core protein present in all S19 copies in all species. The addition contains a 95% conserved RXRRX(3)RG sequence also found in Metazoa and Archaea but not in bacteria.
• 122 mitochondrial S19: 85 to 105 aa long, with a 28 to 38 aa N-terminal addition to the 65 aa S19 core. The addition contains a 95% conserved RSXWKGP sequence found in all alpha-proteobacteria but not in any other bacteria, eukaryota or archaea.

Mitochondrial and Cytosolic S19 in Fungi
• The following modification to the S19 MUF vector isolates 110 mitochondrial and 125 cytosolic S19 proteins:
• X(0,74)X(21)XX(5)PX(3)[GSDRENA]X(5)[HYF][DNST]GX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]X(0,35)
• The H, Y and F separate fungal mitochondrial and cytosolic S19 proteins. They have two distinctly different subsets with a difference in overall length. They share highly conserved residues in their central core with all bacterial S19 RibPros but differ significantly throughout the sequences and especially at the N and C termini.
• Mitochondrial: X(0,55)X(15)[LI]P[QNPR][FM][VIC]G[LIVA]X[FL]XX[HY][NT]GX(0,50)
• Cytosolic: X(0,40)V[KR]TH[LCM]R[DN][ML][LIP]X(0,60)
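
The two fungal vectors above can be used as a simple classifier. The following is a minimal sketch, assuming the vectors translate to regular expressions in the obvious way (X as any residue, bracketed sets as allowed residues, parenthesized numbers as repeat counts); the regexes were translated by hand from the slide, and the function name classify_fungal_s19 and the test sequences are hypothetical, synthetic strings rather than real S19 proteins.

```python
import re

# Hand-translated from the mitochondrial and cytosolic vectors on the slide.
MITOCHONDRIAL = re.compile(
    r"^.{0,55}.{15}[LI]P[QNPR][FM][VIC]G[LIVA].[FL]..[HY][NT]G.{0,50}$"
)
CYTOSOLIC = re.compile(r"^.{0,40}V[KR]TH[LCM]R[DN][ML][LIP].{0,60}$")


def classify_fungal_s19(sequence):
    """Label a sequence by which of the two vectors it satisfies."""
    seq = sequence.strip().upper()
    is_mito = MITOCHONDRIAL.match(seq) is not None
    is_cyto = CYTOSOLIC.match(seq) is not None
    if is_mito and is_cyto:
        return "ambiguous"
    if is_mito:
        return "mitochondrial"
    if is_cyto:
        return "cytosolic"
    return "unclassified"


# Synthetic strings built only to satisfy each pattern; not real proteins.
mito_like = "A" * 15 + "LPQFVGLAFAAHNG"
cyto_like = "MS" + "VKTHLRDML" + "AA"

for name, seq in [("mito_like", mito_like), ("cyto_like", cyto_like), ("other", "MKKK")]:
    print(name, classify_fungal_s19(seq))
```

Returning "ambiguous" and "unclassified" explicitly, rather than forcing every sequence into one of two bins, keeps false hits visible, which matches the deck's own accuracy test of finding all family members with no false hits.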