Swetlana Nikolajewa, Andreas Beyer, Maik Friedel, Jens Hollunder, Thomas Wilhelm
Institute of Molecular Biotechnology, Jena, Germany
Overview: Purine-Pyrimidine Patterns

Part 1. New Classification Scheme of the Genetic Code
Part 2. Type II Restriction Enzyme Binding Sites
Overview: Genetic Code

Part 1. The purine-pyrimidine scheme of the genetic code shows
• amino-acid patterns and regularities of codons
• symmetry characteristics
• possible predecessors of our contemporary quaternary triplet code
• an explanation for the number (22) of tRNA genes in the mammalian mitochondrial genome
PuRines vs. PYrimidines
Purines: A, G. Pyrimidines: C, T.
A purine always pairs with a pyrimidine: G-C forms 3 H bonds, A-T forms 2 H bonds.
The Common Genetic Code Table
Triplets of the nucleobases A, G, C and U code for 20 amino acids (AAs): 4 × 4 × 4 = 4³ = 64 possible codons.

1st    2nd base                                    3rd
base   U          C          A          G          base
U      UUU Phe    UCU Ser    UAU Tyr    UGU Cys    U
       UUC Phe    UCC Ser    UAC Tyr    UGC Cys    C
       UUA Leu    UCA Ser    UAA Stop   UGA Stop   A
       UUG Leu    UCG Ser    UAG Stop   UGG Trp    G
C      CUU Leu    CCU Pro    CAU His    CGU Arg    U
       CUC Leu    CCC Pro    CAC His    CGC Arg    C
       CUA Leu    CCA Pro    CAA Gln    CGA Arg    A
       CUG Leu    CCG Pro    CAG Gln    CGG Arg    G
A      AUU Ile    ACU Thr    AAU Asn    AGU Ser    U
       AUC Ile    ACC Thr    AAC Asn    AGC Ser    C
       AUA Ile    ACA Thr    AAA Lys    AGA Arg    A
       AUG Met    ACG Thr    AAG Lys    AGG Arg    G
G      GUU Val    GCU Ala    GAU Asp    GGU Gly    U
       GUC Val    GCC Ala    GAC Asp    GGC Gly    C
       GUA Val    GCA Ala    GAA Glu    GGA Gly    A
       GUG Val    GCG Ala    GAG Glu    GGG Gly    G

3 termination codons: UGA, UAG, UAA.
The Met codon (AUG) is also the start codon.
The common genetic code table contains 64 fields…
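The 64 fields can also be generated rather than typed in. A minimal Python sketch, assuming the compact 64-letter string NCBI uses for translation table 1 (bases ordered U, C, A, G; `*` marks a stop codon); the names `BASES`, `AMINO` and `CODE` are illustrative.

```python
# Bases in the conventional table order U, C, A, G.
BASES = "UCAG"
# One-letter amino acid for each of the 64 codons in that order
# (NCBI translation table 1); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codon XYZ sits at index 16*X + 4*Y + Z when U=0, C=1, A=2, G=3.
CODE = {
    x + y + z: AMINO[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}

stops = sorted(codon for codon, aa in CODE.items() if aa == "*")
amino_acids = set(CODE.values()) - {"*"}

print(len(CODE))         # 64
print(stops)             # ['UAA', 'UAG', 'UGA']
print(CODE["AUG"])       # M (Met, also the start codon)
print(len(amino_acids))  # 20
```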
Purine-Pyrimidine Classification Scheme of the Genetic Code
• binary representation of nucleobases:
    purines: A, G → 1
    pyrimidines: C, U → 0
• 2³ = 8 different binary triplets: 000, 001, …, 111
• each of these again covers 8 codons, for instance:
    000 stands for three pyrimidines: CCC, CCU, UUC, …, UUU
    111 stands for three purines: GGG, GGA, GAA, …, AAA
• in complementary base pairing, G-C pairs form 3 hydrogen bonds and A-U pairs form 2
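The binary classification is a simple recoding. A Python sketch (the helper `binary_triplet` is a hypothetical name) that maps every codon to its purine/pyrimidine triplet and groups the 64 codons, so that each of the 2³ = 8 binary triplets ends up with 8 codons:

```python
from collections import defaultdict
from itertools import product

# Purines (A, G) -> 1, pyrimidines (C, U) -> 0.
PUR_PYR = {"A": "1", "G": "1", "C": "0", "U": "0"}


def binary_triplet(codon: str) -> str:
    """Purine/pyrimidine pattern of a codon, e.g. 'CGU' -> '010'."""
    return "".join(PUR_PYR[base] for base in codon.upper())


# Group all 4^3 = 64 codons by their binary triplet.
groups = defaultdict(list)
for bases in product("UCAG", repeat=3):
    codon = "".join(bases)
    groups[binary_triplet(codon)].append(codon)

for pattern in sorted(groups):
    print(pattern, groups[pattern])
```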
Purine-Pyrimidine Table of the Genetic Code

Codon   Strong codons     Mixed codons      Mixed codons      Weak codons
        (6 H bonds)       (5 H bonds)       (5 H bonds)       (4 H bonds)
000     Pro CC(C/U)       Ser UC(C/U)       Leu CU(C/U)       Phe UU(C/U)
001     Pro CC(A/G)       Ser UC(A/G)       Leu CU(A/G)       Leu UU(A/G)
100     Ala GC(C/U)       Thr AC(C/U)       Val GU(C/U)       Ile AU(C/U)
101     Ala GC(A/G)       Thr AC(A/G)       Val GU(A/G)       Ile/Met AU(A/G)
010     Arg CG(C/U)       Cys UG(C/U)       His CA(C/U)       Tyr UA(C/U)
011     Arg CG(A/G)       Stop/Trp UG(A/G)  Gln CA(A/G)       Stop UA(A/G)
110     Gly GG(C/U)       Ser AG(C/U)       Asp GA(C/U)       Asn AA(C/U)
111     Gly GG(A/G)       Arg AG(A/G)       Glu GA(A/G)       Lys AA(A/G)

XY(C/U) stands for the two codons XYC and XYU, XY(A/G) for XYA and XYG. The H-bond counts refer to the first two codon positions.

Ala = Alanine, Arg = Arginine, Asn = Asparagine, Asp = Aspartic acid, Cys = Cysteine, Gln = Glutamine, Glu = Glutamic acid, Gly = Glycine, His = Histidine, Ile = Isoleucine, Leu = Leucine, Lys = Lysine, Met = Methionine, Phe = Phenylalanine, Pro = Proline, Ser = Serine, Thr = Threonine, Trp = Tryptophan, Tyr = Tyrosine, Val = Valine
…the new scheme contains the same information in only 32 fields.
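A sketch of how the 32 fields can be derived from the common code, assuming the NCBI table-1 string as input: each field is keyed by a binary triplet plus the first two bases and holds two codons, and the H-bond class is the number of Watson-Crick hydrogen bonds of the first two bases (G/C = 3, A/U = 2). All names are illustrative.

```python
from collections import defaultdict

BASES = "UCAG"
# NCBI translation table 1, codons ordered UUU, UUC, ..., GGG; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}

PUR_PYR = {"A": "1", "G": "1", "C": "0", "U": "0"}
H_BONDS = {"G": 3, "C": 3, "A": 2, "U": 2}  # Watson-Crick pairing


def h_bond_class(doublet: str) -> int:
    """H bonds formed by the first two bases: 6 strong, 5 mixed, 4 weak."""
    return sum(H_BONDS[b] for b in doublet)


# Field key: (binary triplet, first two bases) -> the two codons it covers.
fields = defaultdict(list)
for codon, aa in CODE.items():
    pattern = "".join(PUR_PYR[b] for b in codon)
    fields[(pattern, codon[:2])].append((codon, aa))

for (pattern, doublet), entries in sorted(fields.items()):
    aas = "/".join(sorted({aa for _, aa in entries}))
    print(pattern, doublet, h_bond_class(doublet), aas)
```

The 16 first-two-base doublets times the two third-position classes (C/U or A/G) give the 32 fields.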
Amino Acid Patterns: Polar Requirement of NCN and NUN Codons
[Figure: purine-pyrimidine table of the genetic code, 8 binary codon classes × 4 H-bond classes]
C. R. Woese, G. J. Olsen, M. Ibba, D. Söll: Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process. MMBR 64 (2000) 202-236
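Which amino acids fall into the NCN and NUN groups can be read straight off the code. The Python sketch below only lists them (assuming the NCBI table-1 string as input); it does not reproduce the polar requirement values of Woese et al. Names are illustrative.

```python
BASES = "UCAG"
# NCBI translation table 1; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}


def by_second_base(base: str) -> set:
    """One-letter amino acids encoded by codons N<base>N (stop codons excluded)."""
    return {aa for codon, aa in CODE.items() if codon[1] == base and aa != "*"}


ncn = by_second_base("C")
nun = by_second_base("U")
print("NCN:", sorted(ncn))  # ['A', 'P', 'S', 'T']  (Ala, Pro, Ser, Thr)
print("NUN:", sorted(nun))  # ['F', 'I', 'L', 'M', 'V']  (Phe, Ile, Leu, Met, Val)
```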
Amino Acid Patterns: Hydrophobicity
[Figure: purine-pyrimidine table of the genetic code, 8 binary codon classes × 4 H-bond classes]
Kyte & Doolittle, 1982, http://biology-pages.info
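To relate hydrophobicity to codon positions, one can average the Kyte & Doolittle hydropathy index over codons grouped by their second base. The sketch below uses the published Kyte-Doolittle values; the grouping by second base and the per-codon weighting are assumptions made for illustration, not necessarily what the slide highlights.

```python
BASES = "UCAG"
# NCBI translation table 1; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}

# Kyte & Doolittle (1982) hydropathy index; positive = hydrophobic.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(second_base: str) -> float:
    """Average KD value over all sense codons N<second_base>N (each codon counts once)."""
    values = [KD[aa] for codon, aa in CODE.items()
              if codon[1] == second_base and aa != "*"]
    return sum(values) / len(values)


for base in BASES:
    print(f"N{base}N: {mean_hydropathy(base):+.2f}")
```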
Codon-Anticodon Symmetry
[Figure: purine-pyrimidine table of the genetic code, 8 binary codon classes × 4 H-bond classes]
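In the binary notation, pairing a codon with its anticodon swaps purines and pyrimidines and reverses the reading direction, while strong (G/C) and weak (A/U) positions stay strong and weak. A small sketch that checks both properties for all 64 codons, with the anticodon written 5'→3'; function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PUR_PYR = {"A": "1", "G": "1", "C": "0", "U": "0"}


def anticodon(codon: str) -> str:
    """Anticodon written 5'->3': the reverse complement of the codon."""
    return "".join(COMPLEMENT[b] for b in reversed(codon))


def binary(seq: str) -> str:
    """Purine (1) / pyrimidine (0) pattern."""
    return "".join(PUR_PYR[b] for b in seq)


def strength(seq: str) -> str:
    """'S' for strong bases (G, C; 3 H bonds), 'W' for weak ones (A, U; 2 H bonds)."""
    return "".join("S" if b in "GC" else "W" for b in seq)


def flip(bits: str) -> str:
    """Swap 0 <-> 1 and reverse the order."""
    return "".join("1" if b == "0" else "0" for b in reversed(bits))


codons = ["".join(p) for p in product("UCAG", repeat=3)]
# Pairing swaps purines and pyrimidines and reverses the reading direction ...
print(all(binary(anticodon(c)) == flip(binary(c)) for c in codons))
# ... while the strong/weak character of each position is kept.
print(all(strength(anticodon(c)) == strength(c)[::-1] for c in codons))
```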
Point Symmetry
[Figure: purine-pyrimidine table of the genetic code, 8 binary codon classes × 4 H-bond classes]
D. Halitsky: Extending the (Hexa-)Rhombic Dodecahedral Model of the Genetic Code: the Code's Four 6-fold Degeneracies and the Ten Orthogonal Projections of the 5-cube as 3-cube. Computer Systems Technology, 2004
Codon-Reverse Codon (XYZ↔ZYX) Symmetry
[Figure: purine-pyrimidine table of the genetic code, 8 binary codon classes × 4 H-bond classes]
Example: codon 5'-UAG-3' (Stop), paired with 3'-AUC-5'  ↔  reverse codon 5'-GAU-3' (Asp), paired with 3'-CUA-5'
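The example can be checked in a few lines. This sketch (hypothetical helper names; anticodons written 5'→3'; the NCBI table-1 string assumed as the code) reverses UAG, translates both triplets and prints their anticodons and binary patterns:

```python
BASES = "UCAG"
# NCBI translation table 1; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {x + y + z: AMINO[16 * i + 4 * j + k]
        for i, x in enumerate(BASES)
        for j, y in enumerate(BASES)
        for k, z in enumerate(BASES)}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PUR_PYR = {"A": "1", "G": "1", "C": "0", "U": "0"}


def reverse_codon(codon: str) -> str:
    """XYZ -> ZYX."""
    return codon[::-1]


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same direction."""
    return "".join(COMPLEMENT[b] for b in seq)


def anticodon(codon: str) -> str:
    """Anticodon written 5'->3' (reverse complement)."""
    return complement(codon)[::-1]


def binary(seq: str) -> str:
    """Purine (1) / pyrimidine (0) pattern."""
    return "".join(PUR_PYR[b] for b in seq)


codon = "UAG"
rev = reverse_codon(codon)
print(codon, CODE[codon], anticodon(codon), binary(codon))  # UAG * CUA 011
print(rev, CODE[rev], anticodon(rev), binary(rev))          # GAU D AUC 110
```

In general the anticodon of ZYX equals the complement of XYZ, and the binary pattern of ZYX is that of XYZ reversed, so a codon and its reverse sit in mirrored rows of the purine-pyrimidine table (011 and 110 here).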
Evolution of the Genetic Code
• our contemporary code is the quaternary triplet code: 4³ = 64 fields (CGU, UAC, …)
• quaternary doublet code: 4² = 16 fields
• binary doublet code: 2² = 4 fields (00, 01, 10, 11)
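The field counts follow from collapsing the codon alphabet step by step. The sketch below assumes, purely for illustration, that a doublet code reads the first two codon positions; it is not a model of how the code actually evolved.

```python
from itertools import product

# Purines (A, G) -> 1, pyrimidines (C, U) -> 0.
PUR_PYR = {"A": "1", "G": "1", "C": "0", "U": "0"}

# Quaternary triplet code: every triplet over U, C, A, G.
triplets = ["".join(p) for p in product("UCAG", repeat=3)]
# Quaternary doublet code (assumed here: the first two positions only).
doublets = {t[:2] for t in triplets}
# Binary doublet code: each doublet reduced to its purine/pyrimidine pattern.
binary_doublets = {"".join(PUR_PYR[b] for b in d) for d in doublets}

print(len(triplets), len(doublets), len(binary_doublets))  # 64 16 4
print(sorted(binary_doublets))                             # ['00', '01', '10', '11']
```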
Evolution: Scenario 1
[Figure: purine-pyrimidine table of the genetic code annotated with the binary doublets 00, 01, 10 and 11]
Evolution: Scenario 2
[Figure: purine-pyrimidine table of the genetic code annotated with the binary doublets 00, 01, 10 and 11]
Mitochondrial genomes have several surprising features:
• the genetic code of mitochondria?
• only 22 tRNAs are required for mammalian mitochondrial protein synthesis
The Mammalian Mitochondrial Genetic Code

Codon   Strong            Mixed             Mixed             Weak
        (6 H bonds)       (5 H bonds)       (5 H bonds)       (4 H bonds)
000     Pro CC(C/U)       Ser UC(C/U)       Leu CU(C/U)       Phe UU(C/U)
001     Pro CC(A/G)       Ser UC(A/G)       Leu CU(A/G)       Leu UU(A/G)
100     Ala GC(C/U)       Thr AC(C/U)       Val GU(C/U)       Ile AU(C/U)
101     Ala GC(A/G)       Thr AC(A/G)       Val GU(A/G)       Met/Met AU(A/G)
010     Arg CG(C/U)       Cys UG(C/U)       His CA(C/U)       Tyr UA(C/U)
011     Arg CG(A/G)       Trp/Trp UG(A/G)   Gln CA(A/G)       Stop UA(A/G)
110     Gly GG(C/U)       Ser AG(C/U)       Asp GA(C/U)       Asn AA(C/U)
111     Gly GG(A/G)       Stop AG(A/G)      Glu GA(A/G)       Lys AA(A/G)
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
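The differences between the two codes can be listed by comparing NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial). The 64-letter strings below follow NCBI's U, C, A, G ordering; the helper name `build` is illustrative.

```python
BASES = "UCAG"
# NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial); '*' = stop.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def build(amino: str) -> dict:
    """Map each codon XYZ to its amino acid, index 16*X + 4*Y + Z."""
    return {x + y + z: amino[16 * i + 4 * j + k]
            for i, x in enumerate(BASES)
            for j, y in enumerate(BASES)
            for k, z in enumerate(BASES)}


std, mito = build(STANDARD), build(VERT_MITO)
diffs = {c: (std[c], mito[c]) for c in std if std[c] != mito[c]}
for codon, (a, b) in sorted(diffs.items()):
    print(f"{codon}: standard {a} -> mitochondrial {b}")
```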
The Mammalian Mitochondrial Code
8 tRNAs for family codons + 14 tRNAs for non-family codons = 22

Codon   Strong            Mixed             Mixed             Weak
        (6 H bonds)       (5 H bonds)       (5 H bonds)       (4 H bonds)
000     tRNAPro CC        tRNASer1 UC       tRNALeu1 CU       tRNAPhe UU(C/U)
001                                                           tRNALeu2 UU(A/G)
100     tRNAAla GC        tRNAThr AC        tRNAVal GU        tRNAIle AU(C/U)
101                                                           tRNAMet AU(A/G)
010     tRNAArg CG        tRNACys UG(C/U)   tRNAHis CA(C/U)   tRNATyr UA(C/U)
011                       tRNATrp UG(A/G)   tRNAGln CA(A/G)   Stop UA(A/G)
110     tRNAGly GG        tRNASer2 AG(C/U)  tRNAAsp GA(C/U)   tRNAAsn AA(C/U)
111                       Stop AG(A/G)      tRNAGlu GA(A/G)   tRNALys AA(A/G)

A family tRNA (listed without a third-position bracket) serves all four codons of its box, i.e. both rows.
http://mamit-trna.u-strasbg.fr/2DStructures.html
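The tally of 22 can be reproduced under a simple counting rule, which is an assumption mirroring the slide rather than a model of anticodon wobble: one tRNA per four-codon family box with a single amino acid, otherwise one tRNA per amino acid in each (C/U) and (A/G) half-box, ignoring stop codons. The table string is NCBI translation table 2; names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# NCBI translation table 2 (vertebrate mitochondrial); '*' = stop.
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {x + y + z: VERT_MITO[16 * i + 4 * j + k]
             for i, x in enumerate(BASES)
             for j, y in enumerate(BASES)
             for k, z in enumerate(BASES)}


def count_trnas(code: dict) -> tuple:
    """Return (family, non_family) tRNA counts under the assumed rule:
    one tRNA per box whose 4 codons share one amino acid, otherwise
    one tRNA per amino acid in each (C/U) and (A/G) half-box, stops ignored."""
    family, non_family = 0, 0
    for x, y in product(BASES, repeat=2):
        box = [code[x + y + z] for z in BASES]
        if "*" not in box and len(set(box)) == 1:
            family += 1
            continue
        for half in ("CU", "AG"):
            aas = {code[x + y + z] for z in half} - {"*"}
            non_family += len(aas)
    return family, non_family


family, non_family = count_trnas(MITO_CODE)
print(family, "+", non_family, "=", family + non_family)  # 8 + 14 = 22
```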
Part 2. Common Patterns in Type II Restriction Enzyme Binding Sites

Restriction Enzyme (Endonuclease)
Restriction enzymes
• recognize short, specific DNA sequences
• enable bacteria to destroy foreign DNA
• are useful tools in biotechnology

The best-studied class of REs is type II; these enzymes cleave DNA within their recognition sequences.
Many recognition sequences are palindromic.
Are REases similar in their binding sites?
Enzyme    Source                       Recognition sequence   Pur (1)–pyr (0) pattern
AluI      Arthrobacter luteus          AG↓CT                  11↓00
HaeIII    Haemophilus aegyptius        GG↓CC                  11↓00
BamHI     Bacillus amyloliquefaciens   G↓GA TCC               1↓11 000
HindIII   Haemophilus influenzae       A↓AG CTT               1↓11 000
EcoRI     Escherichia coli             G↓AA TTC               1↓11 000
Examples from Kimball's Biology Pages
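The pur/pyr column is a direct recoding of each site. A Python sketch (hypothetical names, cleavage positions omitted) that also checks the palindrome property, i.e. that a site equals its own reverse complement:

```python
# DNA alphabet: purines A, G -> 1; pyrimidines C, T -> 0.
PUR_PYR = {"A": "1", "G": "1", "C": "0", "T": "0"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pattern(site: str) -> str:
    """Purine/pyrimidine pattern of a recognition site."""
    return "".join(PUR_PYR[b] for b in site)


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site == site.translate(COMPLEMENT)[::-1]


SITES = {"AluI": "AGCT", "HaeIII": "GGCC", "BamHI": "GGATCC",
         "HindIII": "AAGCTT", "EcoRI": "GAATTC"}
for name, site in SITES.items():
    print(f"{name:8s} {site:8s} {pattern(site)}  palindromic={is_palindromic(site)}")
```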
How significant is the pattern RR/YY (11/00)?
Frequencies of dinucleotides, trinucleotides and tetranucleotides in type II recognition sequences (n = 3726), coded in three possible coding schemes:
• R vs Y (G, A vs C, T)
• K vs M (G, T vs C, A)
• S vs W (G, C vs A, T)
Symmetrical recognition sequences: 98%; asymmetrical recognition sequences: 2%.
In the symmetrical set the most significant dinucleotides are RR (or 11) (p-value < 10^-63) and YY (or 00) (p-value < 10^-29).
In the asymmetrical set RRR, YYY and YYYY are even more significant, but RR and YY also stand out.
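The counting step behind these frequencies can be sketched as follows: each site is recoded in one of the three two-letter schemes and overlapping k-mers are tallied. The input is only the five example sites from the Kimball table, not the 3726-sequence data set, and no p-values are computed because the background model is not stated here. Names are illustrative.

```python
from collections import Counter

# Three two-letter reductions of the DNA alphabet (IUPAC codes).
SCHEMES = {
    "R/Y": {"G": "R", "A": "R", "C": "Y", "T": "Y"},
    "K/M": {"G": "K", "T": "K", "C": "M", "A": "M"},
    "S/W": {"G": "S", "C": "S", "A": "W", "T": "W"},
}


def kmer_counts(sites, scheme, k=2):
    """Count overlapping k-mers after recoding each site with the given scheme."""
    table = SCHEMES[scheme]
    counts = Counter()
    for site in sites:
        coded = "".join(table[b] for b in site)
        counts.update(coded[i:i + k] for i in range(len(coded) - k + 1))
    return counts


# Toy input: the five example sites, not the full data set.
sites = ["AGCT", "GGCC", "GGATCC", "AAGCTT", "GAATTC"]
for scheme in SCHEMES:
    print(scheme, dict(kmer_counts(sites, scheme)))
```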
Why is the motif RR..YY preferred?
Dinucleotides RR..YY are characterized by:
• stronger H-bond donor and acceptor clusters
  Figure 1: Example of an interaction between an H-bond donor cluster (resulting from two adjacent purines, AA) and an H-bond acceptor.
• specific geometrical properties:
  - minimal slide values
  - strong tilt in the negative direction
  - positive roll
• low stacking energy
Outlook
• looking for binary patterns in genomes

Additional information: http://www.imb-jena.de/tsb

Thank you for your attention!