
Rational Structural Proteomics and Genomics
Perfect alignment of protein families is critical to:
1) Drug design and control of function
2) Understanding protein folding and evolution
3) Tracing the origin and evolution of the genetic code
4) Creating a rooted evolutionary tree
5) Correcting errors in the gene bank
Glycine is the key to Perfect Alignment.
Outline
The Gene Bank
The evolutionary tree
The ribosome
Conserved ribosomal protein lengths
GARP based alignment
Separating Gram+ and Gram- bacteria
Connecting alpha-proteobacteria with mitochondria
Separating mitochondrial and cytosolic RibPros in fungi
Placing archaea on the evolutionary tree
Why GARP?
The Last Universal Common Ancestor, LUCA
Tree of Life
The earliest use of genomic analysis to create an evolutionary tree used 16S
ribosomal RNA and led to the postulation of a third kingdom, the archaea.
Subsequent trees based on proteins present in all species were
contradictory. Horizontal gene transfer is considered the source of the
disparity, and many have concluded that it is impossible to
determine a rooted tree of all species.
Because it is unlikely that the ribosome or its integral proteins arose from
horizontal transfer, we are exploring the potential to trace evolution with each
of the 52 ribosomal proteins and to compare the 52 resulting trees for
consistency.
In the course of doing so we have discovered the power of having all
members of a protein family perfectly aligned.
A Tree of Species Evolution
Un-rooted Tree
Jamerdon Dean, Junior
Haeckel’s Rooted Tree (figure labels: Plants, Animals, Root)
Ribosome
• Essential to species survival
• Produce all proteins critical to existence
• Unlikely to be affected by horizontal gene transfer
Picture from http://www.biologyreference.com/Re-Se/Ribosome.html
Fiona Hennig,
City Honors Junior
The Hypothesis
An accurate phylogenetic tree can be based on perfect
alignment of ribosomal proteins.
Method
Create a search vector that finds and aligns all members of
each ribosomal protein family in the gene bank.
Tests of Accuracy
1. All members of the family are found with no false hits.
2. Gram+ and Gram- bacteria, and bacteria, eukaryotes and archaea,
can be separated on the basis of one or two positions in the
search vector.
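Test 1 amounts to comparing the set of sequences a vector retrieves with the set of known family members. The sketch below assumes both are available as sets of sequence identifiers; the function name `score_search` and the identifiers are hypothetical placeholders, not GenBank accessions.

```python
def score_search(hits: set[str], family: set[str]) -> dict[str, object]:
    """Compare the sequences a search vector retrieved with a reference family."""
    false_hits = hits - family  # retrieved but not members of the family
    missed = family - hits      # members the vector failed to retrieve
    return {
        "found": len(hits & family),
        "false_hits": sorted(false_hits),
        "missed": sorted(missed),
        "passes_test_1": not false_hits and not missed,
    }


# Hypothetical identifiers, for illustration only.
reference = {"seq1", "seq2", "seq3"}
print(score_search({"seq1", "seq2", "seq3"}, reference)["passes_test_1"])  # True
print(score_search({"seq1", "seq2", "seqX"}, reference))
```

A vector passes only when both difference sets are empty, which is the "all members found, no false hits" criterion stated above.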
Lengths of 1600 bacterial L1 ribosomal proteins: 95% are between 228 aa and 241 aa.
Shortest 216 aa, longest 241 aa.
GARP Based Minimum Universal Fingerprint (MUF) for
Bacterial Ribosomal Protein L1
MX(0,22)X(26)[AST]X[DE]XxX(5,8)X(4)xXRXXXX[LM]PXGXGX(15,17)
[AS]XXXX[GA]X(5)X(1,11)XxX(3)DX(5)PX(1,10)[GA]XXXGXX(23,25)
XXGX(28)NX(12)PX(7)xX(20,35)
10 100% conserved identities (8 GARP)
5 sites of occupancy by two or more
“similarities” (2 GARP)
5 indels of precise location and length
Captures 1600 L1s, no false hits and 17
eukaryotes
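The search vectors read like PROSITE-style patterns, so one way to apply them is to translate each into a regular expression. The converter below is a sketch: the notation rules in its docstring are inferred from the vectors on these slides rather than taken from the authors' software, and the name `muf_to_regex` is hypothetical.

```python
import re


def muf_to_regex(vector: str) -> str:
    """Translate a MUF-style search vector into a Python regular expression.

    Notation assumed here (inferred from the slides, not a formal spec):
      X or x   any single residue (x marks a site singled out for analysis)
      X(n)     exactly n residues of any kind
      X(n,m)   between n and m residues of any kind
      [ABC]    one of the listed residues
      {ABC}    any residue except the listed ones
      other capital letters are literal residues
    """
    vector = re.sub(r"\s+", "", vector).rstrip(".")  # tolerate line breaks, final period
    parts = []
    i = 0
    while i < len(vector):
        ch = vector[i]
        if ch in "Xx":
            rep = re.match(r"\((\d+)(?:,(\d+))?\)", vector[i + 1:])
            if rep:
                low, high = rep.groups()
                parts.append(f".{{{low},{high}}}" if high else f".{{{low}}}")
                i += 1 + rep.end()
            else:
                parts.append(".")
                i += 1
        elif ch == "[":
            end = vector.index("]", i)
            parts.append("[" + vector[i + 1:end] + "]")
            i = end + 1
        elif ch == "{":
            end = vector.index("}", i)
            parts.append("[^" + vector[i + 1:end] + "]")
            i = end + 1
        elif ch.isupper():
            parts.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    return "".join(parts)


# The bacterial L1 vector, with the line breaks exactly as printed on the slide.
L1_MUF = """MX(0,22)X(26)[AST]X[DE]XxX(5,8)X(4)
xXRXXXX[LM]PXGXGX(15,17)[AS]XX
XX[GA]X(5)X(1,11)XxX(3)DX(5)PX(1,10
)[GA]XXXGXX(23,25)XXGX(28)NX(12)
PX(7)xX(20,35)"""
l1_pattern = re.compile(muf_to_regex(L1_MUF))
print(l1_pattern.pattern[:40])
```

Placeholder sites such as [z1], [Z2] or Z should be replaced by an explicit residue set before conversion; otherwise the converter reads their letters as literal residues.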
GARP-based alignment of 1600 bacterial L1 ribosomal proteins
29 identities (98%), 16 GARP, 5 indels
The Power of Perfect Alignment:
MX(0,22)X(26)[AST]X[DE]XxX(5,8)X(4)[z2]XRXXXX[LM]PXGXGX(15,17)
[AS]XXXX[GA]X(5)X(1,11)X[z1]X(3)DX(5)PX(1,10)[GA]XXXGXX(23,25)
XXGX(28)NX(12)PX(7)xX(20,35)
Two site based separation of Gr+ and Gr- bacteria.
z1 (position 120) and z2 (position 67) separate Gr+ from Gr- bacteria.
Branch labels (figure): Z1 = {WYFR} or [WYFR]; Z2 = X, {ML}, [WYFR] or [ML]
861 G584 G+ All Firmicutes
134 Gr- (80 Bacteroidetes, 50 delta Pro)
118 Gr+ All Actinobacteria
Single-site isolation of an entire class (branch labels: [ED], X)
344 Gr-, 309 Gammaproteobacteria (entire class)
GARP conservation PHYLUM < CLASS < ORDER < GENUS
RibPro S19 Vector
• M(0,24)X(21)RX(9)[GARSDEN]X(6)XGX(7)[LIVMP]X(8)[LFI][GARS][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25).
• This vector aligns 3385 bacteria [1403 G+, 1982
G-], 2063 eukaryota, and 142 archaea.
• The only 100% conserved residues in
all 5410 species are a Glycine (G) and
an Arginine (R).
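Statements such as "the only 100% conserved residues are a Gly and an Arg" come from counting, column by column, how often the most common residue occurs in the alignment. A minimal sketch, assuming the alignment is already a list of equal-length strings with '-' for gaps; the name `conserved_columns` and the toy sequences are invented for illustration.

```python
from collections import Counter


def conserved_columns(alignment: list[str], threshold: float = 1.0) -> list[tuple[int, str, float]]:
    """Return (column, residue, fraction) for columns whose most common residue
    reaches `threshold`. Gaps ('-') count against conservation."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        residue, n = counts.most_common(1)[0]
        fraction = n / len(alignment)
        if residue != "-" and fraction >= threshold:
            result.append((col + 1, residue, fraction))  # 1-based column numbers
    return result


# Toy alignment: only column 2 (G) and column 5 (R) are invariant.
toy = ["AGKLR", "SGKMR", "AGELR", "TG-LR"]
print(conserved_columns(toy))        # [(2, 'G', 1.0), (5, 'R', 1.0)]
print(conserved_columns(toy, 0.75))
```

Lowering `threshold` to 0.9 or 0.95 gives the "conserved at 90% identity" style of count used on the following slides.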
Rasheen Powell
City Honors Junior
Sample of Alignment of 6069 S19 Ribosomal Proteins
24 residues conserved at 90% identity
95% conserved residues in ribosomal protein S19
100 % Gly
Separating G+ and G- Bacteria
• G+ bacteria have a single membrane, while G- bacteria have two membranes.
S19 Separation of G+ and G- Bacteria
• M(0,24)X(21)RX(9)[GSDRENA]X(6)ZGX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)
• Single amino acid (Z) separation
• Z = D (aspartic acid): G+
• Z = N (asparagine): G-
123 Archaea
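Because the G+/G- split rests on one position, classification reduces to reading the residue at Z. The sketch below hand-translates the core of the vector (from RX(9) onward) into a regular expression with a named group at Z. It omits the leading M(0,24)X(21) and trailing X(0,25) elements, so it locates the core rather than matching whole proteins; the names `gram_class_from_z` and `synthetic_core` are hypothetical.

```python
import re

# Core of the S19 vector from RX(9) onward, translated by hand; the Z site is
# captured as the named group "z".
S19_CORE = re.compile(
    r"R.{9}[GSDRENA].{6}(?P<z>.)G.{7}[LIVMP].{8}"
    r"[LFI][GASR][DEA][FYME].{2}[STP].{5}[HKFYMT]"
)


def gram_class_from_z(protein: str) -> str:
    """Classify a bacterial S19 sequence by the residue found at site Z."""
    match = S19_CORE.search(protein)
    if match is None:
        return "no S19 core found"
    z = match.group("z")
    return {"D": "G+", "N": "G-"}.get(z, f"unassigned (Z = {z})")


def synthetic_core(z: str) -> str:
    """Build an artificial sequence that satisfies the core pattern (testing only)."""
    return ("R" + "A" * 9 + "G" + "A" * 6 + z + "G" + "A" * 7 + "L" + "A" * 8
            + "LGDF" + "AA" + "S" + "A" * 5 + "H")


print(gram_class_from_z(synthetic_core("D")))  # G+
print(gram_class_from_z(synthetic_core("N")))  # G-
```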
Figure: evolutionary tree labelled LUCA, Gr+ and Gr-
How can we determine why amino acid
changes accompany the evolution of G+ to G- bacteria?
Locate the sites of changes on the three dimensional
structures of the ribosomal proteins.
Determine how these changes are related to
protein/protein and protein/ribosomal RNA interactions.
Determine how these interactions are related to the
evolution of ribosomal structure and function.
Ribosomal interactions (----) between S19 and rRNA that
changed as G+ species evolved into G- species, revealed by X-ray
crystal structure analysis
Two S19 Ribosomal Proteins in Fungi
The S19 MUF finds 270 fungal examples of two distinct types based on length and conserved
residues. One is approximately 150 aa long and the other approximately 90 aa long. The
shorter of the two resembles alpha-proteobacterial S19 and the longer resembles those of
metazoa and archaea.
• The Cytosolic Protein Search Vector:
X(0,33)X(13)[LIM]X(16)RRXXX[RHK]GX(19,30)X(3)XX[RGWCKT]
X(5)PX(3)[GARDENS]X(4)[VIL][HYF]XGX(7)[LIVMP]X(7)X[LFI][GASR][DEA][FYME]X(0,30)
• 143 Cytosolic S19
• The Mitochondrial Protein Search Vector:
X(0,25)XSX[WY]KX(10,27)X(3)XX[RGWCKT]X(5)
PX(3)[GARDENS]X(4)[VIL][HYF]XGX(7)[LIVMP]X(7)X[LFI][GASR][DEA][FYME]X(0,30)
• 122 Mitochondrial S19
• Cytosolic S19: 140 to 163 aa long, with a 60 to 70 aa N-terminal
addition to the 65 aa S19 core protein present in all S19 copies in
all species. The addition contains a 95% conserved RXRRX(3)RG
sequence also found in Metazoa and Archaea but not in bacteria.
• Mitochondrial S19: 85 to 105 aa long, with a 28 to 38 aa N-terminal
addition to the 65 aa S19 core. The addition contains a 95% conserved
RSXWKGP sequence found in all alpha-proteobacteria but not in any
other bacteria, eukaryota or archaea.
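The two fungal S19 types can be told apart by length together with the conserved N-terminal signatures. The following heuristic is a sketch, not the authors' search vectors: it assumes the signature lies inside the N-terminal addition (first 70 residues for the cytosolic type, first 38 for the mitochondrial type) and uses the length ranges quoted above. The function name and test sequences are hypothetical.

```python
import re

# The two N-terminal signatures quoted on the slide, written as regular expressions.
CYTOSOLIC_SIG = re.compile(r"R.RR.{3}RG")   # RXRRX(3)RG
MITOCHONDRIAL_SIG = re.compile(r"RS.WKGP")  # RSXWKGP


def fungal_s19_type(protein: str) -> str:
    """Assign a fungal S19 sequence to the cytosolic or mitochondrial type.

    Heuristic: the signature must lie within the assumed N-terminal window and
    the total length must fall in the range quoted for that type.
    """
    n = len(protein)
    if 140 <= n <= 163 and CYTOSOLIC_SIG.search(protein[:70]):
        return "cytosolic"
    if 85 <= n <= 105 and MITOCHONDRIAL_SIG.search(protein[:38]):
        return "mitochondrial"
    return "unclassified"


# Artificial sequences built only to exercise the function.
cytosolic_like = "M" + "RARRAAARG" + "A" * 140   # 150 residues
mitochondrial_like = "M" + "RSAWKGP" + "A" * 82  # 90 residues
print(fungal_s19_type(cytosolic_like), fungal_s19_type(mitochondrial_like))
```

Requiring both the length and the signature mirrors the slide's point that the two copies differ in overall length as well as in their conserved N-terminal segments.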
Residues in ribosomal protein S12b/23e that are
95% conserved throughout evolution
Blue: Bacteria
Orange: Fungi
Green: 95% conserved in all
Both proteins can be aligned because of the similarities shared by all S19s.
The differences in length and the additional conserved sequence
segments can also be located and observed in the 3-dimensional structures.
Alpha-proteobacterial S19, homologous with mitochondrial S19 in fungi
Archaea and metazoa S19, homologous with cytosolic S19 in fungi
Yellow: additional conserved core residues
Red: distinguishing amino acids at the N-terminus
RibPro S19 Vector
• X(0,50)X(3)[Z2]X[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF][Z1]GX(7)[LIVMP]X(7)[Z3][LFI][GASR][DEA][FYME]X(0,25)
The only 100% conserved residues are a Glycine (G) and a Proline (P).
The vector aligns 6069 species: 3556 bacteria [1487 G+, 2087 G-], 2355
eukaryota, and 147 archaea.
Positions that parse the data by kingdom:
Z1 = D: G+ bacteria; {D}: all other species.
Z2 = W: all G+, 95% of G-, and plants; {W}: all other eukaryota and archaea.
Z2 = {W}, Z3 = K: remaining 5% of G-, plants, and fungal mitochondria.
Z3 = {K}: fungal cytosolic, metazoa, all archaea.
S19 Taxonomic Distinguishers & Conserved Values
Figure labels: P, H, G: full conservation; N: G+/G-; FEGL; W: G+ & G- vs Euk & Arch; K: mitochondrial/cytosolic
Why GARP?
• Gly swings both ways. The vast majority of all known folds have three or more
Gly residues that turn in a way that only they can. These Glys are retained
in all members of a fold family (i.e. all bacterial ribosomal proteins).
• Pro has a constrained conformation. While Pros can be replaced by other amino
acids, specific energetic stability may be lost, accounting for their conservation.
• Arg provides positive charge to balance negatively charged rRNA and can
form direct interactions with specific nucleotides of rRNA and tRNA,
as has been reported (L1 and L9).
• Ala is a major building block of alpha helices and beta strands.
Ribosomal Protein L11: Gly residues
2 interacting with rRNA
3 on the outside
Figure: L11 bound to 23S rRNA, each Gly labelled with its residue number and the sign (+ or -) of its phi value
10 Glycine residues:
2 in alpha helices
7 in loops
1 in beta sheets
3-Dimensional Crystal Structure 1MMS of the Ribosomal Protein L11 Showing Positions and Locations of
Glycine Residues as well as the sign (+ or -) of their Phi values
Chain | Gly residue | Phi (°) | Psi (°)
B | 98 | 172.8 | 179.0
A | 136 | 61.5 | 38.3
B | 88 | -81.5 | 170.1
B | 84 | 75.2 | 29.6
A | 88 | -89.8 | 161.0
A | 84 | 64.3 | 16.5
A | 16 | -88.9 | 78.5
A | 29 | -62.0 | -12.2
A | 32 | 132.0 | -21.8
A | 130 | -63.1 | -35.3
A | 51 | 101.1 | -30.8
B | 130 | -59.8 | -41.0
A | 98 | -174.6 | -177.9
Figure: second view of L11 with 23S rRNA, each Gly labelled with its residue number and phi sign
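The table pairs each Gly with its backbone dihedral angles. Positive phi values fall in a region of the Ramachandran plot that residues with side chains rarely occupy but glycine reaches easily, so filtering on phi > 0 picks out the turns "only Gly can make". A sketch using the values transcribed from the slide; the pairing of A 130 with (-63.1, -35.3) is inferred from the slide layout, and `positive_phi` is a hypothetical name.

```python
# Gly (chain, residue, phi, psi) in degrees for structure 1MMS, as transcribed.
# Assumption: A 130 is paired with (-63.1, -35.3) based on the slide layout.
GLY_ANGLES = [
    ("B", 98, 172.8, 179.0), ("A", 136, 61.5, 38.3), ("B", 88, -81.5, 170.1),
    ("B", 84, 75.2, 29.6), ("A", 88, -89.8, 161.0), ("A", 84, 64.3, 16.5),
    ("A", 16, -88.9, 78.5), ("A", 29, -62.0, -12.2), ("A", 32, 132.0, -21.8),
    ("A", 130, -63.1, -35.3), ("A", 51, 101.1, -30.8), ("B", 130, -59.8, -41.0),
    ("A", 98, -174.6, -177.9),
]


def positive_phi(angles: list[tuple[str, int, float, float]]) -> list[tuple[str, int]]:
    """Return (chain, residue) for glycines whose phi angle is positive."""
    return [(chain, res) for chain, res, phi, _psi in angles if phi > 0]


print(positive_phi(GLY_ANGLES))
# [('B', 98), ('A', 136), ('B', 84), ('A', 84), ('A', 32), ('A', 51)]
```

Residue 98 has phi near ±180° in both chains (172.8 in B, -174.6 in A); at that wrap-around point the sign alone says little about the conformation.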
Looking for LUCA
Hypothesis
Because (a) average GARP content increases in going
from eukaryotes to G- bacteria to G+ bacteria, (b) all 8
GC-only codons encode only GARP, (c) GC-rich DNA is
more stable, and (d) many G+ Actinobacteria have a
GC content of over 72%, we looked for LUCA in
Actinobacteria.
Method
Examine amino acid and codon bias in Actinobacteria
Why GARP? II
• GC-rich DNA melts at a higher temperature than AT-rich DNA
because of additional hydrogen bonds.
• The most stable GC-only codons may have been the first to acquire
amino acid definitions.
The eight GC-only codons are:
GGG-Gly, GGC-Gly (G)
GCG-Ala, GCC-Ala (A)
CGG-Arg, CGC-Arg (R)
CCC-Pro, CCG-Pro (P)
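Claim (b) and the list of eight GC-only codons can be checked by enumerating all 64 codons under the standard genetic code (NCBI translation table 1, written here as the conventional 64-letter string in TCAG order) and keeping those built only from G and C. A sketch:

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

gc_only = sorted(codon for codon in CODE if set(codon) <= {"G", "C"})
print({codon: CODE[codon] for codon in gc_only})
print(sorted({CODE[codon] for codon in gc_only}))  # ['A', 'G', 'P', 'R']
```

Each of the four amino acids receives exactly two of the eight GC-only codons, matching the pairs listed above.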
Looking for LUCA
Discovery
The wobble base position in ALL codons used in ALL
putative proteins in several species of Actinomycetales
is 97% GC. These species lack most of the tRNAs for
codons ending in A or T, and the 3% of codons that
end in A or T are disproportionately present in
hypothetical proteins, between alternative starts, or
constitute de facto stop codons.
S-Ribosomal proteins and tRNA Synthetases of these
species use codons ending in G or C almost exclusively.
Codon Bias in Cellulomonas flavigena
• Cellulomonas
flavigena has a GC
content of 75%
• Use of codons ending in
G or C is 97%, of codons
ending in A or T is 3%, and
of codons beginning and
ending in A or T is 0.4%.
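The two figures quoted for C. flavigena are different measures: overall GC content counts every base, while wobble (third-position) GC, often called GC3, counts only the last base of each codon. A minimal sketch of both, assuming an in-frame coding sequence; the function names and the example sequence are invented.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def wobble_gc_fraction(cds: str) -> float:
    """Fraction of codons whose third (wobble) base is G or C, i.e. GC3.

    Assumes an in-frame coding sequence; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)


example = "ATGGCGCCCAAATAG"  # invented 5-codon sequence
print(round(gc_fraction(example), 3), wobble_gc_fraction(example))  # 0.533 0.8
```

Because the third position is the most freely variable, GC3 can sit well above overall GC content, as in the 97% versus 75% reported for C. flavigena.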
Codon use in Cfla
Use of codons ending in A or T in 49 RibPros of Cfla
[CG]X[AT] #
33. R-CGT 57
34. G-GGT 52
35. A-GCT 20
36. P-CCT 10
37. G-GGA 4
38. A-GCA 8
39. R-CGA 4
40. V-GTT 3
41. D-GAT 1
42. LT-CTT 1
43. P-CCA 0
44. LT-CTA 0
45. V-GTA 0
46. H-CAT 0
47. Q-CAA 0
48. E-GAA 0
[AT]X[AT] #
49. T-ACT 3
50. T-ACA 2
51. S-TCT 1
52. S-AGT 0
53. C-TGT 0
54. S-TCA 0
55. RSG*-AGA 0
56. WCU-TGA 43 *STOP
57. F-TTT 0
58. Y-TAT 0
59. N-AAT 0
60. I-ATT 0
61. L-TTA 0
62. IM-ATA 0
63. KN-AAA 0
64. *QY-TAA 3
18 A/T-ending codons, TTG, and ATG NOT used in 6634 codons of Cfla RibPros
Three stop codons TAG
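The tables group codons by whether the first and third bases are G/C or A/T, giving four classes of 16 codons each. The sketch below assigns that class label and tallies the codons of a coding sequence by class; it writes all labels as [GC]/[AT], whereas the slides use [GC] and [CG] interchangeably. The function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def codon_class(codon: str) -> str:
    """Label a codon by whether its first and third bases are G/C or A/T."""
    first = "[GC]" if codon[0] in "GC" else "[AT]"
    third = "[GC]" if codon[2] in "GC" else "[AT]"
    return f"{first}X{third}"


def class_usage(cds: str) -> Counter:
    """Tally the in-frame codons of a coding sequence by class."""
    cds = cds.upper()
    return Counter(codon_class(cds[i:i + 3]) for i in range(0, len(cds) - 2, 3))


all_codons = ["".join(c) for c in product("ACGT", repeat=3)]
print(Counter(codon_class(c) for c in all_codons))  # four classes, 16 codons each
print(class_usage("ATGGCGCCCAAATAG"))
```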
Use of codons ending in A or T in 18 tRNA Synthetases of Cfla
[CG]X[AT] #
33. R-CGT 59
34. G-GGT 76
35. A-GCT 23
36. P-CCT 7
37. G-GGA 19
38. A-GCA 41
39. R-CGA 7
40. V-GTT 3
41. D-GAT 9
42. LT-CTT 0
43. P-CCA 0
44. LT-CTA 0
45. V-GTA 0
46. H-CAT 3
47. Q-CAA 0
48. E-GAA 8
[AT]X[AT] #
49. T-ACT 0
50. T-ACA 7
51. S-TCT 0
52. S-AGT 0
53. C-TGT 1
54. S-TCA 1
55. RSG*-AGA 0
56. WCU-TGA 18 *STOP
57. F-TTT 3
58. Y-TAT 1
59. N-AAT 0
60. I-ATT 1
61. L-TTA 0
62. IM-ATA 0
63. KN-AAA 0
64. *QY-TAA 0
14 A/T-ending codons not used in 10,814 codons in Cfla tRNA Synthetases
22 tRNAs not found in Cellulomonas flavigena (Cfla) and the use of their
cognate codons in RibPros and tRNA Synthetases
[GC]X[GC]: 3. R-CGC, 8. LT-CTC, 13. H-CAC (uses shown: 125+, 70+ and 41+)
[AT]X[GC]: 24. F-TTC 33+
[CG]X[AT]: 34. G-GGT 40; 35. A-GCT 11; 36. P-CCT 1; 39. R-CGA 0; 40. V-GTT 2; 41. D-GAT 0; 42. LT-CTT 0; 44. LT-CTA 0; 46. H-CAT 0
[AT]X[AT]: 49. T-ACT 1; 51. S-TCT 0; 52. S-AGT 0; 53. C-TGT 0; 57. F-TTT 0; 58. Y-TAT 0; 59. N-AAT 0; 60. I-ATT 0; 62. IM-ATA 0
13 of these codons are not used in RibPros or tRNA synthetases; 10 of
them end in T.
9 are used; 4 of the 5 most used end in C.
LUCA in Actinomycetales
Evidence that LUCA was a GC rich species of the Gram positive
bacterial order Actinomycetales includes:
1. Extreme GC content (97%) of the wobble base position of ALL
putative proteins identified in these species
2. The reliably identified proteins in these species are encoded
by genes that do not use 14 codons that end in A or T
3. Many codons with A or T in the wobble base position encode
different amino acids in different species
4. No tRNAs for the majority of the unused codons are found in
the genomes of these species
5. The absence of protein S21 from the ribosomes of all
actinobacteria suggests that S21 arose later in the evolution
of the ribosomes of other phyla
6. Over 80% of the putative proteins in these species have full
length multiple open reading frames including genes in
which all six frames are ORFs.
7. The start codon in these species is more often GTG (Val) than
ATG (Met).
8. The absence of Cys and Trp residues in ribosomal proteins of
these species is greater than in other bacteria
9. All the members of each of the other 20 ribosomal protein
families can be aligned with sufficient accuracy to trace
them all back to a common origin in Actinomycetales
Potential Applications
• Design antibiotics to destroy ribosomal function of specific
classes, orders, and genera of bacteria.
• Use details of the co-evolution of species, substrate, cofactor,
and function of families of enzymes to design selective
inhibitors
• Trace evolution of any enzyme family
• Control species distribution of candidates for drug design
• Determine the G+ ancestor of the first G- bacteria.
• Identify the bacterial precursor to the first mitochondrion
• Determine the order of evolution of eubacteria and archaea
• Determine the order of evolution of tRNA Synthetases I and II.
Conclusions
On the basis of these data and our previously published work we conclude that
five basic assumptions about the genetic code that appear in most textbooks
are false.
• The universal code is not universal.
• All species now on earth do not use a code “frozen
in time” as claimed by Watson and Crick.
• Codon use is not determined by the tRNA
population in a cell.
• Wobble base variation is not a reliable gauge of the
time course of mutational change.
• Evolution of bacterial genes is not orders of
magnitude faster than that of eukaryote genes.
Flawed Gene Bank Annotation
• Every new bacterial genome that is annotated has over 500
genes without orthologs that are termed ORFans (Open
Reading Frames without ancestors). While it is clear that a
significant number of these ORFans are nonsense, a routine
technique to distinguish bona fide genes from nonsense has
not been found.
• Through analysis of bias in amino acid distribution and codon
use we are able to identify nonsense codes and nonsense
sequences that reliably identify tens of thousands of putative
gene products that are entirely or partially nonsense.
Sources of Errors in Gene Notation
• Incorrect assembly of DNA due to insufficient overlap redundancy,
common in extremely high or low GC regions.
• Wrong strand choice and wrong frame choice caused by programmed
maximization of gene length. Especially challenging in high GC genomes
having multiple ORFs (MORFs) of equal length.
• Wrong start selection due to the standard practice of maximizing gene length
and the fact that, in addition to Met (atg), Val (gtg) and Leu (ttg) are
common start codons. This is further complicated by the absence of
upstream start signals (TATA, CAT boxes etc.) in bacteria.
• Unjustified assumption that the standard genetic code is appropriate to
all bacterial genomes until variants are established by biochemical
analysis.
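Frame ambiguity arises because stop codons (TAA, TAG, TGA) are AT-rich and therefore scarce in high-GC DNA, so several reading frames can run without a stop. The sketch below reports which of the six frames contain no standard stop codon along their whole length; it ignores start codons and alternative genetic codes, so it is a simplification of real gene calling, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an A/C/G/T sequence (case-insensitive)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def stop_free_frames(dna: str) -> list[str]:
    """Labels (+1..+3, -1..-3) of reading frames with no stop codon over their full length."""
    dna = dna.upper()
    open_frames = []
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            if not any(codon in STOPS for codon in codons):
                open_frames.append(f"{sign}{offset + 1}")
    return open_frames


print(stop_free_frames("GCC" * 10))   # GC-only: no stop in any of the six frames
print(stop_free_frames("ATGTAAGGG"))  # the TAA in frame +1 closes that frame
```

A gene caller that simply maximizes open length has no basis for choosing among six such frames, which is the MORF problem described above.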
Genetic Code Redundancy
• Because 64 possible three “letter” codons can be created from four
nucleotides and only 22 amino acids are encoded by them, the code is
redundant.
• It was assumed that all species used the same
“universal” code. This was a premature conclusion.
• Now at least 14 codons have been proven to have
different definitions in different species. Some
codons have as many as four definitions.
• Most of the codons with variable definitions are in
the AT rich half of the genetic code.
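The redundancy can be made concrete by counting how many codons map to each amino acid in the standard table (NCBI translation table 1). This covers the 20 common amino acids; selenocysteine and pyrrolysine, which bring the total to 22, are inserted by recoding UGA and UAG and are not represented in the table. A sketch:

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

codons_per_residue = Counter(CODE.values())
print(len(CODE))                    # 64 = 4 ** 3
print(len(codons_per_residue) - 1)  # 20 amino acids, excluding the stop symbol '*'
print(sorted(codons_per_residue.items(), key=lambda kv: (-kv[1], kv[0])))
```

Leu, Ser and Arg each have six codons while Met and Trp have one, so the degree of redundancy varies widely between amino acids.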
Once ribosomal proteins adjust to a significant change in
environment they undergo little further change for 100s of
millions of years.
Significant changes in ribosomal protein sequence accompany
appearance of new class, order, family or genus.
The magnitude of change decreases in that order.
Sample Alignment 5598 S19 Ribosomal Proteins
First row: one of 673 with major Indel (Opisthokonta and Archaea).
Last row: one of xxx with C-terminal addition [Better illustrated with Prosite Colors]
90% conserved residues in ribosomal protein S19
Sample of Alignment of 1495 S19 Ribosomal Proteins with DG conservation
1487 G+ bacteria, 5 Archaea and 3 Eukaryota with W=H
46 residues conserved at 90% identity
Sample of Alignment of 1757 S19 Ribosomal Proteins with {D}G conservation
1653 G- and 54 G+ bacteria*, 60 Eukaryota and 1 Archaea with W=H
35 residues conserved at 90% identity (20:GPRK)
*X(0,5)KK[GS]X(10,25)X(3)xX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]{D}GX(7)[LIVMP]X(7)x[LFI][GASR][DEA][FYME]X(0,25)
17 Erysipelotrichaceae (Firmicutes)
Separation of the S19 RibPros in bacteria, chloroplasts and plants
from those in spirochaetes, Opisthokonta, and archaea by a single
site (Z = W versus {W}).
X(0,74)X(19)ZX[RGWCKT]X(5)PX(3)[GARSDEN]X(6)XGX(7)[LIVMP]X(8)[LFI][GARS][DEA][FYME]X(2)[STP]X(5)[HKFYMT]X(0,25)
Sample of Alignment of 480 S19 Ribosomal Proteins with WK {D}G conservation
269 Alpha-Proteobacteria and 211 Eukaryota (mitochondrial)
20 residues conserved at 90% identity (10:GPRK)
X(0,11)X(4)WK[GS]X(10,25)X(3)xX[RGWCKT]X(5)PX(3)[GARDENS]X(4)[VIL][HYF]{D}GX(7)[LIVMP]X(7)x[LFI][GASR][DEA][FYME]X(0,25)
S19 Archaea Alignment
• X(0,50)X(3)[YFW][RHK]GX(15,35)XPX(3)[KR][RS]X(3)[RK][GRQENV]X(16,35)X(5)RX(9)[GSDRENA]X(6)XGX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• Additional 55-residue prefix adding two indels
• Aligns 192 eukaryotes as well as 142 archaea
46 Escherichia and 8 Shigella L1 ribosomal proteins
Amino acid sequence homology:
Only 4 mutations in 3 sequence positions.
DNA sequence homology:
4 sites of nonsynonymous mutation
Only 16 sites of synonymous mutation
15 of 235 possible wobble bases
1 1st base synonymous mutation.
Mutations not correlated with genus variation
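The synonymous/nonsynonymous counts above come from comparing aligned codons. The sketch below is a simplified codon-by-codon tally under the standard genetic code, not a formal method such as Nei-Gojobori: each differing codon is counted once, and codons that differ only at the third position are tallied separately. The function name and example sequences are invented.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def compare_cds(a: str, b: str) -> dict[str, int]:
    """Tally codon differences between two aligned, gap-free, in-frame sequences."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be equal-length multiples of three")
    tally = {"synonymous": 0, "nonsynonymous": 0, "third_position_only": 0}
    for i in range(0, len(a), 3):
        ca, cb = a[i:i + 3].upper(), b[i:i + 3].upper()
        if ca == cb:
            continue
        # Same amino acid -> synonymous; different amino acid -> nonsynonymous.
        key = "synonymous" if CODE[ca] == CODE[cb] else "nonsynonymous"
        tally[key] += 1
        if ca[:2] == cb[:2]:  # the change is confined to the wobble base
            tally["third_position_only"] += 1
    return tally


print(compare_cds("GGCGCTGAA", "GGTGCAGAT"))
print(compare_cds("CTG", "TTG"))  # a first-base synonymous change (Leu to Leu)
```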
DNA Sequence Alignment of E. coli and Shigella (local alignment, ClustalX)
Figure labels: GGA, GGC and GGT codons; one base change; multiple base changes
Analysis of DNA
• Of 22 glycine codons:
10 GGC
9 GGT
2 GGA
1 variable position (50% GGC / 50% GGT)
• 95% total conservation
– What does it mean?
– Total amino acid conservation all the way down to the DNA level
– Over billions of years the DNA has not changed at these wobble bases, showing
how highly conserved through evolution these bacteria actually are
– The codon bias is not due to bias in tRNA in the genus or species
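Assuming the 95% figure counts the 21 glycine codons that show no variation (10 GGC, 9 GGT and 2 GGA) out of the 22 examined, the arithmetic is:

```latex
\frac{10 + 9 + 2}{22} = \frac{21}{22} \approx 0.955 \approx 95\%
```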
Separating Chloroplasts from other eukaryota
• X(0,74)X(8)[TSIVM]XX[RK]X(5)[PFQLNH]X[FMYLS]X(6)[VI]XxGX(7)[LIVMPTCF]X(8)[LFIVMTP][GASRK][DEAQ][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• X = W: early chloroplasts and 95% of Viridiplantae
• X = H: Alveolata, Opisthokonta [fungi, metazoa (including humans)]
X(0,17)X(25)[TSIVM]WX[RK]X(5)[PFQLNH]X[FMYLS]X(6)[VI]XxGX(7)[LIVMPTCF]X(8)[LFIVMTP][GASRK][DEAQ][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
finds 1615 eukaryotes (1588 Viridiplantae, 14 early chloroplasts, 12 stramenopiles)
• X(0,31)X(14)GX(9)[PASYQV][KRINT][KQ][PS][NHS][SA][AG]X[RI][KPRH]X(5)[LIFM]X(1,2)X(7)[YFHSNRLAQ][LIVACT][VMGPSQAT]X(3)ZX(0,67)
Distribution of residue Z (columns: Bac, Euk, Vir, Strm, Fun, Metzo, Arch, Total; values in the order listed on the slide):
• H: 3806, 792, 516, 48, 107 (76 bilat), 0, 4602
• C: 0, 386, 47, 6, 125, 161, 0, 386
• A: 0, 0, 0, 0, 0, 0, 109, 109
• G: 0, 4, 0, 0, 0, 0, 37, 43
• S: 26, 5, 31
Gly (G), Ala (A), Arg (R), and Pro (P) are the key to perfect alignment.
Why GARP?
Gly
Most protein folds have three or more Gly residues that turn in a way that
only they can. Such Glys are found in all bacterial ribosomal proteins.
Pro
has a constrained conformation that provides consistent stability.
Arg
provides positive charge to balance negatively charged rRNA and forms
direct interactions with rRNA and tRNA.
Ala
is a major building block of alpha helices and beta strands.
_______________________________
GC (guanine-cytosine) base pairs have three hydrogen bonds, while AT (adenine-thymine) pairs have only two.
GC-rich DNA melts at a higher temperature than AT-rich DNA.
GC-rich codons may have been the first to acquire amino acid definitions.
The eight GC-only codons are:
GGG-Gly, GGC-Gly (G)
GCG-Ala, GCC-Ala (A)
CGG-Arg, CGC-Arg (R)
CCC-Pro, CCG-Pro (P)
How do we determine the three dimensional
structures of proteins?
X-ray crystallography!
Herbert Hauptman was awarded the Nobel Prize in 1985 for
developing methods of X-ray crystal structure
determination.
The determination of the structure of the ribosome
was awarded the Nobel Prize in 2009.
The 3D structures of all the ribosomal proteins are in
the Protein Data Bank
Sample alignment of S19 proteins of 142 Archaea, 19 Alveolata, 234 Opisthokonta and 66 Viridiplantae
Kingdom distribution of SRibPro Homologs
The 60 cyanobacteria have a sequence position in their S19 protein that is fully occupied by
glutamine. This position is occupied by glutamine in only three of the 1297 S19 proteins in G+
bacteria and 195 of the 3985 S19 proteins in G- bacteria and eukaryotes (60 of which are the
cyanobacteria). The species of the other 135 S19 RibPros having glutamine in this position may
provide evidence of a link between the ribosomes of cyanobacteria and those of other specific
phyla, classes, orders or genera.
The distribution of glutamines in the 5410 S19 proteins is given in the following table. Four
positions in G+ bacteria have greater than 10% glutamine occupancy; only one position in G- bacteria,
eukaryotes and archaea has greater than 10% occupancy (23%); and cyanobacteria
have five sites of greater than 10% glutamine occupancy. The five cyanobacteria sites correspond
to three of those in G+ and one in G-/Arc/Euk, but the 100% Gln site is peculiar to cyanobacteria.
Adding the 100% conserved Q captures 82 hits in SwP (36 cyano, 22 G- bacteria and 24
eukaryota: the usual chloroplasts (21), cyanelle, and plastids). The 5 glutamine sites retain
appreciable glutamine occupancies (25%, 18%, 16%, 50% and 100%). It may be that a subset of
cyanobacteria and chloroplasts share full glutamine occupancy in these five sites. If so, would it
have any significance?
Positions in S19 occupied by 5% or greater Glutamine
(% Gln occupancy; number of proteins in parentheses)
Pos | G+ | G-/Eu/Arc | Cyan
32 | 61 | 3 | 40
75 | 35 | 2 | 45
89 | 19 | 10 | 23
102 | 0 | 23 | 53
109 | 28 | 0 | 0
111 | 0.2 (3) | 4.9 (195) | 100 (60)
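The comparisons in the accompanying paragraph (four G+ positions, one G-/Eu/Arc position and five cyanobacterial positions above 10%, three of them shared between cyanobacteria and G+) can be recomputed directly from the occupancy table. A sketch with the values transcribed from the slide; `high_gln_sites` is a hypothetical name.

```python
# Glutamine occupancy (%) at S19 positions, transcribed from the slide's table.
GLN_OCCUPANCY = {
    "G+":        {32: 61, 75: 35, 89: 19, 102: 0, 109: 28, 111: 0.2},
    "G-/Eu/Arc": {32: 3, 75: 2, 89: 10, 102: 23, 109: 0, 111: 4.9},
    "Cyan":      {32: 40, 75: 45, 89: 23, 102: 53, 109: 0, 111: 100},
}


def high_gln_sites(group: str, cutoff: float = 10) -> set[int]:
    """Positions where glutamine occupancy in `group` exceeds `cutoff` percent."""
    return {pos for pos, pct in GLN_OCCUPANCY[group].items() if pct > cutoff}


cyan = high_gln_sites("Cyan")
print(sorted(cyan))                                # [32, 75, 89, 102, 111]
print(sorted(cyan & high_gln_sites("G+")))         # [32, 75, 89]
print(sorted(cyan & high_gln_sites("G-/Eu/Arc")))  # [102]
```

Position 111 is the only cyanobacterial site not shared with either other group, which is the 100% Gln site singled out in the text.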
Multiple copies of some RibPros?
S18 has 70 more actinobacterial homologs than other
S RibPros. The majority are in three groups: 34 Mycobacterium, 26
Streptomyces and 7 Nocardiaceae. These constitute additional
copies of S18 in these 67 species.
A major difference between the two copies of S18 in
Mycobacterium tuberculosis is that one copy has an additional
conserved cysteine at its C-terminus, while the other has only the two
conserved cysteines that are found in all Mycobacterium.
The positions of these cysteines in the folded protein need to be
mapped.
• MX(5,8)PFX(3,6)X(21)RX(9)[GSDRENA]X(6)NGX(7)[LIVMP]X(5){AIT}XX[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
• A modification of the (NG) S19 MUF set (shown here) captures
1946 eukaryotes, 1925 G- bacteria and 123 archaea. This
separates the plant world from Opisthokonta and archaea and at
the same time subdivides the G- bacteria. The S19 RibPros of the
plant world have total lengths closer to those of the G- bacteria
and have the highest sequence homology with the S19s of 63
cyanobacteria.
MX(0,15)WKX(0,40)RX(9)[GSDRENA]X(6)NGX(7)[LIVMP]X(5){AIT}XX[LFI][GASR][DEA][FYME]XX[STP]X(5)[HKFYMT]X(0,25)
This WK modification captures 97
Chromera velia (1), Cyanophora paradoxa (1), Dictyosteliida (3), Hemiselmis andersenii (1), Malawimonas jakobiformis (1)
Opisthokonta (70): Fungi (69) [Batrachochytrium dendrobatidis (1), Dikarya (68)], Trichoplax adhaerens (1)
Reclinomonas americana (1), Stramenopiles (4), Viridiplantae (15): Chlorophyta (7), Streptophytina (8)
Homology in S19 of 1535 Bac and 1485 plants
The introduction of the PF residues isolates 1486 S19s of the plant
world and 1489 G- bacteria from all Archaea and all but 46 G+ bacteria.
The 3022 S19s have 23 of 92 residues 95% conserved (9 GARP)
Peeling the ribosomal onion
Hsiao, C. et al. Mol Biol Evol 2009 26:2415-2425; doi:10.1093/molbev/msp163
Copyright restrictions may apply.
Fluorescent labeling of ribosomal proteins L1 and L9 within the 50S ribosomal subunit.
Fei J et al. PNAS 2009;106:15702-15707
©2009 by National Academy of Sciences
The Power of Perfect Alignment
Ribosomal Proteins will be used to demonstrate
the power of Gly, Ala, Arg, and Pro (GARP)
based perfect alignment.
Distribution of 4 SRibPros by Bacterial Phylum
Fully Conserved Gly and the Asn before it in G- Bacteria,
Eukaryotes and Archaea
Cyanobacteria Begat Chloroplasts
MX(0,22)X(26)[AST]X[DE]XZ3X(5,8)X(4)xXRXXXX[LM]
PXGXGX(15,17)[AS]XXXX[GA]X(5)X(1,11)X{w}X(3)D
X(5)PX(1,10)[GA]XXXGXX(23,25)XXGX(28)NX(12)
PX(7)Z4X(20,35)
120{w}, Z3(h) and Z4[wf] = three-site co-isolation of the entire class of cyanobacteria (44)
and 14 chloroplasts.
The chloroplasts have higher sequence
homology with cyanobacteria than with other
bacterial L1s
Figure: evolutionary tree labelled LUCA, Gr+ and Gr-
Figure panels: S19 mitochondrial (without N-terminus); S19 cytosolic (without N-terminus); mitochondrial; cytosolic
Two S19 Ribosomal proteins in Fungi
• The S19 MUF finds 265 fungal examples of two distinct types based on length and
conserved residues. One is approximately 150 aa long and the other approximately 90 aa
long. The shorter of the two resembles bacterial S19 and the longer resembles those of metazoa
and archaea.
• 143 Cytosolic S19: 140 to 163 aa long, 60 to 70 aa N-terminal
addition to the 65 aa S19 core protein present in
all S19 copies in all species. The addition
contains a 95% conserved RXRRX(3)RG
sequence also found in Metazoa and
Archaea but not in bacteria.
• 122 Mitochondrial S19: 85 to 105 aa long, 28 to 38 aa N-terminal
addition to the 65 aa S19 core. The addition
contains a 95% conserved RSXWKGP
sequence found in all alpha-proteobacteria
but not in any other bacteria, eukaryota or
archaea.
Mitochondrial and cytosolic S19 in Fungi
The following modification to the S19 MUF vector isolates 110 mitochondrial and 125 cytosolic S19
proteins:
X(0,74)X(21)XX(5)PX(3)[GSDRENA]X(5)[HYF][DNST]GX(7)[LIVMP]X(8)[LFI][GASR][DEA][FYME]X(0,35)
The H, Y and F residues at the [HYF] site separate fungal mitochondrial and cytosolic S19 proteins. They form two
distinctly different subsets with a difference in overall length. They share highly conserved residues in their
central core with all bacterial S19 RibPros but differ significantly throughout their sequences, especially at
the N- and C-termini.
Mitochondrial X(0,55)X(15)[LI]P[QNPR][FM][VIC]G[LIVA]X[FL]XX[HY][NT]GX(0,50)
Cytosolic
X(0,40)V[KR]TH[LCM]R[DN][ML][LIP]X(0,60)