4. Nucleic Acids

Overview
1. Introduction
2. Chemical structure of nucleic acids
3. 3D-structure of DNA
4. Copying of DNA
5. Genetic code
6. Translation
7. Tools in genetics
1. Introduction
Dogma in molecular biology
"Information is stored in DNA, copied to RNA and used to build proteins"
[Figure: replication (DNA to DNA), transcription (DNA to RNA), translation (RNA to Protein)]
DNA:           deoxyribonucleic acid;
RNA:           ribonucleic acid;
replication:   copying of entire genome prior to cell divisions;
transcription: copying of one or a few genes (from the DNA) to RNA;
translation:   synthesis of a protein according to information from RNA.
Empty arrows "violate" the dogma: RNA-viruses and reverse transcription.
The sequence of nucleic acids represents the basis for the storage of genetic
information. The 3D-structure is important for processes such as reading of this
information.
Chemical composition of E. coli cells
Molecule             Number    Types   Molecular mass   % of cell mass
nucleic acids: DNA   2-4       1       2.6*10^9         1
               mRNA  1000      1000    8*10^5           \
               tRNA  4*10^5    60      25’000            } 6
               rRNA  30’000    3       10^5 – 10^6      /
proteins             10^6      3000    40’000           15
H2O                                                     70
Nucleic acids and proteins make up a large fraction of the cellular mass.
Overview of nucleic acids
There are two main types of nucleic acids: deoxyribonucleic acid (DNA) and
ribonucleic acid (RNA). As explained later, the differences are rather subtle but with
significant consequences on stability and structure.
DNA
DNA is found in cells of prokaryotes, in cell nuclei and mitochondria of eukaryotes
and in viruses. It occurs in linear or circular form. The following table summarizes
information on the occurrence and size of DNA. ("Base pairs" will be explained
later.)
Organism                  base pairs   genes   length [mm]
Simian virus 40           5243         6       0.0017
Bacteriophage T4          ~166’000     >100    0.061
E. coli                   4’720’000    >3000   1.3
Yeast (S. cerevisiae)     13*10^6              4.3
Drosophila melanogaster   165*10^6             56
Mammals                   3000*10^6            1000
Mammals have about 1 m of DNA distributed over chromosomes (23 in humans,
haploid genome). The muntjac, an Asian deer, has about 1000*10^6 base pairs, but
only 3 large chromosomes.
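The lengths in the table can be checked from the number of base pairs. The sketch below assumes the B-DNA height of 3.4 Å per base pair for every genome; the function name `contour_length_mm` is illustrative and not part of these notes.

```python
RISE_PER_BP_NM = 0.34  # assumed B-DNA height per base pair (3.4 Å)

def contour_length_mm(base_pairs):
    """Return the contour length in millimetres of a B-DNA duplex."""
    return base_pairs * RISE_PER_BP_NM * 1e-6  # 1 nm = 1e-6 mm

genomes = {
    "Simian virus 40": 5243,
    "E. coli": 4_720_000,
    "Mammals": 3000 * 10**6,
}
for name, bp in genomes.items():
    print(f"{name}: {contour_length_mm(bp):.4g} mm")
```

Under this assumption the results come out close to the tabulated values (about 1 m for mammals); small deviations are expected because the height per base pair is only approximate.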
Only about 10% of the DNA in eukaryotes stores information: ~30’000 genes.
RNA
There are 3 types of RNA: transfer RNA, ribosomal RNA and messenger RNA. The
sizes below are for E. coli.
Name               size (nucleotides)   function
tRNA (transfer)    75-95                transfer of amino acid to ribosome
rRNA (ribosomal)   16S: 1542            structure and function of ribosome
                   23S: 2904
                   5S:  120
mRNA (messenger)   100 – 10’000         copy of gene read by ribosome
Ribosomes are very large molecular systems consisting of many proteins and nucleic
acids. They translate genomic information from mRNAs into proteins (see later). The
size indications "16S" etc. stem from ultra-centrifugation measurements and indicate
the sedimentation behavior (units are Svedberg).
Special attention has recently been given to certain small RNA strands that form
specific 3D structures with catalytic activity, the ribozymes. Interest in these
molecules is twofold: they may provide new aspects to the question of what was first,
proteins or DNA, and they may allow the construction of new types of “enzymes”.
History
1941      “one gene–one enzyme” hypothesis       Beadle, Tatum (1958)
1944      genetic information is on DNA          Avery
1946      bacterial genetics                     Lederberg, Tatum (1958)
1952      DNA: information for proteins          Hershey, Chase (1969)
1953      DNA double helix                       Watson, Crick (1962)
~1957     dogma of molecular biology             (Crick)
1961      gene regulation, lac-operon            Jacob, Monod (1965)
1962      restriction enzymes                    Arber (1978)
1965      first sequence of a tRNA (yeast, Ala)  Holley (1968)
1962-66   genetic code                           Nirenberg, Khorana (1968)
~1970     gene technology
1976      first 3D structure of an RNA           Rich, Klug (1982)
1977      sequencing of DNA                      Gilbert, Sanger (1980)
1978      splicing of mRNA                       Sharp, Roberts (1993)
1980…     x-ray/NMR: DNA structure studies       Rich, Dickerson
1982      ribozymes                              Cech, Altman (1989)
1982      photosynthetic reaction center         Deisenhofer, Huber, Michel (1988)
1984      homeobox                               Lewis, Nüsslein-Volhard, Wieschaus (1995)
1986      PCR (polymerase chain reaction)        Mullis (1993)
1991      prions                                 Prusiner (1997)
2000      ribosome at atomic resolution          Steitz
For Nobel Prize winners, the year of the prize is given in parentheses.
2. Chemical structure of nucleic acids
Nucleic acids are linear chains (like proteins). The elements of these chains, forming a
linear sequence of “letters” with genomic information, are nucleotides that are each
composed of a base, a sugar and a phosphate. Sugars and phosphates form the
backbone; they are identical for all nucleotides of a nucleic acid chain. The bases are
attached as side chains to the sugars; they differ from one “letter” to the other. Four
different bases occur in DNA (and RNA).
Consult also the figure on the next page for the following discussion of the three
components.
Phosphate group
Each phosphate group carries (at normal pH) one negative charge. The oxygen atoms
provide reactive centers for hydrogen bonds or ion binding.
Sugars (see figure at bottom of next page)
Consist of 5 carbons: C1’ to C5’
Chiral centers at:    C2’, C3’, C4’
Full name:            (2’-deoxy-)β-D-ribofuranose
  furanose:           5-membered ring
  ribose:             same chirality at C2’, C3’, C4’
  D:                  configuration around C4’
  β:                  trans configuration of oxygens on C1’ and C2’ (base!)
  2’-deoxy:           DNA rather than RNA
Chemical structure
RNA fragment AUG (end groups missing); atom radii: P > O > N > C > H
Left: gray shades for different nucleotides
Right: phosphate groups (black), sugars (light gray) and bases (dark gray)
[Figure: sugar fragment with atom labels H-O5’, C5’, C4’, O4’, C3’, H-O3’, C2’, (O2’), C1’ and the attached base]
The ring has internal flexibility: ring pucker (with strain).
The DNA sugars have two O-H groups, 3’ and 5’, where the phosphate groups are
attached. This means that the backbone is oriented.
Bases
Bases are heterocyclic rings with aromatic character; the rings are therefore (almost)
planar. The chemical structure of the common bases and the atom numbering is
provided in the figure below. Additional, rare, bases occur in tRNA (see later).
Characteristic features of the bases are:
Purines:     adenine  (A, Ade): NH2 at position 6
             guanine  (G, Gua): O at position 6, NH2 at position 2
Pyrimidines: thymine  (T, Thy): CH3 at position 5, O at positions 2 and 4
             cytosine (C, Cyt): O at position 2, NH2 at position 4
             uracil   (U, Ura): O at positions 2 and 4
These chemical differences define different patterns for interaction with other
molecules (e.g. proteins): hydrogen bond donors and acceptors, hydrophobic patches
(see the discussion of the 3D-structure of DNA below).
Nucleotides
Nucleotides consist of one (or more) phosphates, a sugar and a base. The base is
attached to the ribose at the C1’ via a glycosidic linkage. Ester bonds connect the
riboses and the phosphates. Normally, the phosphates bind to the ribose at the 5’ end.
DNA nucleotides: deoxyriboses plus A, G, T, C
RNA nucleotides: riboses plus A, G, U, C
Other related nucleotides are:
ATP: adenosine triphosphate;
GTP: guanosine triphosphate;
NAD: nicotinamide adenine dinucleotide.
These often serve as energy carriers in cells.
Polynucleotides
The combination of several nucleotides results in a sugar-phosphate backbone with a
5’-end group and a 3’-end group, each with or without phosphate. Several notations
are used: 5’-pA-C-G-T-3’ or simply ACGT.
DNA-strands are always written from 5’ to 3’.
3. 3D-structure of DNA
Interaction between nucleotides
The following interactions (de-)stabilize the 3D-structure of DNA:
- DNA and RNA carry negative charges. For stability reasons these must interact
with external cations, since no positive charges are found on the DNA.
- Electron clouds around atoms are polarized, which results in weak attraction
between non-polar atoms. In DNA, a sizeable force is due to the interaction
between π-orbitals of stacked bases: “base stacking”. This interaction is an
important factor in stabilizing DNA double helices.
- A hydrogen atom may be “shared” between two polar atoms, a donor and an
acceptor. Polar groups are found in all parts of a nucleotide; most interesting are
those in the bases: Base pairs are formed by hydrogen bonds between A – T and
G – C in DNA, and between A – U and G – C in RNA (see figure on page 5). The
bases in base pairs are complementary with respect to hydrogen bonds and space
requirements. Consequences are (a) [A] = [T] and [G] = [C] when base pairs are
formed ([X] indicates the concentration of X), and (b) GC-rich fragments are
more stable than AT-rich ones (see figures below).
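Both consequences can be illustrated with a few lines of Python. This is a minimal sketch: the helper names and the example sequence are made up for illustration, and the GC fraction is used only as a crude indicator of relative duplex stability.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_composition(strand):
    """Count each base over both strands of the duplex formed by `strand`."""
    partner = "".join(PAIR[base] for base in strand)
    both = strand + partner
    return {base: both.count(base) for base in "ACGT"}

def gc_fraction(strand):
    """Fraction of G and C in a strand (equal to the fraction of G-C pairs)."""
    return sum(strand.count(base) for base in "GC") / len(strand)

strand = "ATGCGGCTA"  # arbitrary example sequence
counts = duplex_composition(strand)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(round(gc_fraction(strand), 2))
```

Whatever single strand is chosen, the duplex counts satisfy [A] = [T] and [G] = [C], because every base is counted together with its pairing partner.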
For the specific recognition of DNA by other molecules, e.g. proteins, hydrogen bond
donors and acceptors as well as hydrophobic groups (CH3 in T) are essential.
Conformation
Structures may be described in various, equivalent ways:
- (x,y,z) coordinates: provides 3D-structure, but requires many numbers
- internal coordinates: bond lengths, bond angles and torsion angles around bonds;
the advantage here is that bond lengths and bond angles can be considered
constant, leaving only the torsion angles as parameters. A structure description by
torsion angles is called conformation.
- Bonds that can be rotated are all bonds along the backbone and the sugar-base
connection; bases are rigid. Note that the torsion angles in the sugar ring are
correlated, and different "sugar puckers" are observed:
[Figure: sugar ring (C1’, C2’, C3’, C4’, O4’) in "C3'-endo" pucker and in "C2'-endo" pucker]
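The two descriptions are linked by simple geometry: a torsion angle can be computed from the (x,y,z) coordinates of four consecutive atoms. The following is a minimal sketch using only the standard library; the function name and the test coordinates are illustrative and not taken from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p1, p2, p3, p4):
    """Torsion angle in degrees around the bond p2-p3 (range -180 to 180)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four atoms with the last one rotated by a quarter turn around the central bond.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied to the backbone and glycosidic bonds of every nucleotide, such a function turns a coordinate file into the conformation described above.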
Duplex structures
Double stranded DNA molecules adopt the famous double helix proposed by Watson
and Crick. Their model was based on the observation of a periodicity of both 3.4 Å
and 34 Å from fiber diffraction experiments, and of the occurrence of equal
concentrations [A]=[T] and [G]=[C]. The model corresponds to an ideal B-DNA
form, which is adopted by most DNA duplexes. Other double helix forms are A-DNA,
mostly observed for RNA, and Z-DNA, mostly an artifact caused by high salt
and the exclusive presence of G-C base pairs (this was however the first experimental
3D-structure in 1980).
General aspects of the model are that through base pairs A-T and C-G complementary
strands are formed, which run antiparallel to each other. A major groove and a minor
groove provide access to the bases, and thus allow sequence-specific recognition!
Finally, it should be mentioned that DNA structures are flexible; one should therefore
talk about "B-DNA type DNA". The following figure and table illustrate the three
forms and summarize their characteristic features.
Parameter                     A-DNA          B-DNA          Z-DNA
Handedness                    right          right          left
Major groove                  very deep      deep, wide     shallow
Minor groove                  shallow        deep, narrow   very deep
Repeating unit                1 base pair    1 base pair    2 base pairs
Bases per turn                10.9           10.0           12.0
Twist per base pair           33°            36°            GC: -51°; CG: -9°
Height per base               2.9 Å          3.4 Å          GC: 3.5 Å; CG: 4.1 Å
Pitch                         31.6 Å         34.0 Å         45.6 Å
Base pairing                  Watson-Crick   Watson-Crick   Watson-Crick
Sugar pucker                  3’ endo        2’ endo        C: 2’ endo; G: 3’ endo
Glycosyl angle                anti           anti           C: anti; G: syn
Base inclination (90°-tilt)   13°            -2°            9°
Base roll                     6°             -1°            3°
Propeller twist               15°            12°            4°
Axis displacement             4 Å            0 Å            -3 Å
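Several rows of the table are related: the pitch is the number of bases per turn times the height per base, and the mean twist is 360° divided by the number of bases per turn. The sketch below checks this with the tabulated numbers; the function names are illustrative, and for Z-DNA the two alternating step heights are simply averaged.

```python
def pitch(bases_per_turn, heights):
    """Helix pitch in Å from bases per turn and the height(s) per base in Å."""
    mean_height = sum(heights) / len(heights)
    return bases_per_turn * mean_height

def mean_twist(bases_per_turn):
    """Magnitude of the mean twist per base pair in degrees."""
    return 360.0 / bases_per_turn

helices = {
    "A-DNA": (10.9, [2.9]),
    "B-DNA": (10.0, [3.4]),
    "Z-DNA": (12.0, [3.5, 4.1]),  # GC and CG steps alternate
}
for name, (n, heights) in helices.items():
    print(f"{name}: pitch {pitch(n, heights):.1f} Å, twist {mean_twist(n):.1f}°")
```

For Z-DNA the mean twist of 30° matches the average magnitude of the two tabulated steps (-51° and -9°).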
The following figure explains the above entities twist, tilt, inclination, roll and
propeller twist.
Other structural features of DNA
Topology
A relaxed DNA is like normal B-DNA ("Tw=14"). Certain enzymes, the topoisomerases,
can cut the DNA open and undo double helical turns ("Tw=12"). In circular DNA, the
unwinding can be compensated by supercoiling ("Tw=14, Wr=-2, Lk=12").
Supercoiling is described by the number of twists Tw, the writhe (supercoil turns) Wr,
and the linking number Lk: Lk=Tw+Wr. Lk can only be changed by cutting circular
DNA open.
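The bookkeeping of the three states quoted above can be written out explicitly. This is only a small illustrative sketch of the relation between linking number, twist and writhe; the function name is hypothetical.

```python
def writhe(linking_number, twist):
    """Writhe Wr that follows from a fixed linking number Lk and a given twist Tw."""
    return linking_number - twist

# Relaxed circle: Lk = 14, all of it as twist.
print(writhe(14, 14))
# After a topoisomerase removed two turns and resealed the circle, Lk = 12.
# Either the helix stays unwound (Tw = 12, no supercoil) ...
print(writhe(12, 12))
# ... or it returns to the B-DNA twist (Tw = 14) and compensates by supercoiling.
print(writhe(12, 14))
```

Because Lk is fixed as long as the circle stays closed, any change in Tw must be balanced by an opposite change in Wr.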
Palindromic DNA
Palindromic DNA plays an important role during various events of DNA reading,
for example in the early stages of replication. Palindromic DNA is double
stranded DNA that reads the same from both 5’ ends. An example is:
GCATTAATGC
CGTAATTACG
Longer stretches can adopt cross-shaped
forms with B-DNA like arms.
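Whether a sequence is palindromic in this sense can be tested by comparing a strand with its reverse complement, i.e. with the partner strand read from its own 5’ end. A minimal sketch with illustrative function names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(strand):
    """True if the duplex reads the same from both 5' ends."""
    return strand.upper() == reverse_complement(strand)

print(is_palindromic("GCATTAATGC"))  # the example duplex above
print(is_palindromic("GAGAGA"))      # a purine-only repeat
```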
Triplexes and quadruplexes
New interactions among DNA-strands can be formed based on new types of base
pairing: Hoogsteen and Reverse Hoogsteen base pairs.
Triplexes occur for long sequences with only purines or pyrimidines (example
GAGAGA…).
Quadruplexes are essential for the protein-DNA structures found at the end of
chromosomes, the telomeres.
Packing of DNA in chromosomes
While the DNA occurs as a single long molecule in prokaryotes, it is more organized
in the cell nuclei of eukaryotes. The DNA double helix is first wound around histone
proteins forming nucleosomes; the x-ray structure of a complete nucleosome has
recently been determined at atomic resolution. These nucleosomes are then further
organized into chromatin and eventually form the chromosomes.
4. Copying of DNA
The replication of DNA is achieved by DNA polymerases. These require the
presence of a template DNA strand, a short starting strand called primer, and the
deoxynucleoside 5'-triphosphates dATP, dGTP, dTTP and dCTP. Polymerases will
elongate the primer with nucleotides that are complementary to the template strand.
Chain elongation always occurs in the 5' to 3' direction. Many polymerases possess
in addition a nuclease activity that allows them to remove mismatched nucleotides.
They achieve error rates of less than 10^-8 per base pair!
Replication occurs in a "semiconservative" manner: Each strand of a DNA duplex is
copied, i.e. complemented by a newly synthesized strand. The results are two
daughter molecules that each contain one parental and one new strand. Note that for both
daughter molecules synthesis occurs in the 5' to 3' direction, which requires
non-continuous synthesis in one case.
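The template-directed elongation of a primer can be mimicked in a few lines. This is a toy sketch, not a model of a real polymerase: the function name and sequences are illustrative, both strands are written 5' to 3', and proofreading is ignored.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template, primer):
    """Elongate `primer` along `template`; both are written 5' to 3'."""
    # The new strand runs antiparallel, so the template is read from its 3' end.
    template_3_to_5 = template[::-1]
    # The primer must already be paired with the 3' end of the template.
    for template_base, primer_base in zip(template_3_to_5, primer):
        if PAIR[template_base] != primer_base:
            raise ValueError("primer is not complementary to the template")
    new_strand = primer
    for template_base in template_3_to_5[len(primer):]:
        new_strand += PAIR[template_base]  # nucleotides are added at the 3' end
    return new_strand

print(extend_primer("AGGCTTAC", "GTA"))
```

The result is the full complementary strand, grown in the 5' to 3' direction from the primer.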
Many viruses have RNA as genetic material. Some of them rely on RNA polymerases
to replicate their genome. Others, called retroviruses (e.g. HIV-1), use reverse
transcription to make a DNA copy of their RNA genome. The figure shows a hybrid
DNA-RNA duplex with an RNA strand (dark) and a strand consisting of an RNA primer
(dark) and DNA continuation (light) as it occurs in HIV reverse transcription. This
hybrid involves a variation in the width of the minor groove.
Somewhat similar to replication, (other) RNA polymerases are used to copy selected
genes from DNA to RNA: transcription. Again a template made of DNA as well as
ribonucleoside triphosphates are needed. However, no primer for the new strand is
required (in fact primer synthesis for DNA replication is performed by RNA
polymerases). Transcription therefore relies on a complex system of promoter sites on
the DNA and proteins (e.g. transcription factors) to start transcription at the desired
site, i.e. to create a mRNA copy with the requested gene(s).
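At the level of sequences, transcription amounts to writing the RNA complement of the template strand, with U in place of T. A minimal sketch (the function name and the example gene fragment are made up; promoters and transcription factors are not modelled):

```python
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand):
    """Return the mRNA (5' to 3') made from a DNA template strand (5' to 3')."""
    # The template is read 3' to 5' while the RNA grows 5' to 3'.
    return "".join(RNA_PAIR[base] for base in reversed(template_strand.upper()))

print(transcribe("GGCCAT"))
```

The mRNA has the same sequence as the non-template DNA strand, except that U replaces T.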
5. Genetic code
Genetic information is the sequence of nucleotides in DNA (or RNA). It codes for the
sequences of amino acid residues in proteins. Because the "DNA-alphabet" contains
only 4 "letters" while the "protein-alphabet" contains 20 "letters", fragments of 3
nucleotides are required to identify a specific amino acid. These triplets of nucleotides
are called codons (the complementary triplets on tRNAs are called anticodons). The
table below provides the translation
from nucleotide triplets to amino acid residues, i.e. the genetic code.
Genetic code
First            Second position             Third
position (5')    U      C      A      G      position (3')
----------------------------------------------------------
U                Phe    Ser    Tyr    Cys    U
                 Phe    Ser    Tyr    Cys    C
                 Leu    Ser    STOP   STOP   A
                 Leu    Ser    STOP   Trp    G
----------------------------------------------------------
C                Leu    Pro    His    Arg    U
                 Leu    Pro    His    Arg    C
                 Leu    Pro    Gln    Arg    A
                 Leu    Pro    Gln    Arg    G
----------------------------------------------------------
A                Ile    Thr    Asn    Ser    U
                 Ile    Thr    Asn    Ser    C
                 Ile    Thr    Lys    Arg    A
                 Met    Thr    Lys    Arg    G
----------------------------------------------------------
G                Val    Ala    Asp    Gly    U
                 Val    Ala    Asp    Gly    C
                 Val    Ala    Glu    Gly    A
                 Val    Ala    Glu    Gly    G
----------------------------------------------------------
The genetic code has several interesting properties. It is degenerate, meaning that
most amino acids are coded by several "codons"; exceptions are Trp and Met. There
is a correlation between the number of codons for an amino acid and its frequency of
occurrence in proteins. Often, the codons for one amino acid have the first two "letters" in
common. Three codons serve as stop signals, ending a gene. The codon AUG codes
for Met but is also part of the initiation signal (start of a gene). Chemically similar
amino acids often share the middle base; an example is the presence of U or C for
hydrophobic residues.
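The table and some of these properties can be checked programmatically. The sketch below stores the standard code row by row in the same layout as the table; the variable and function names are illustrative, and `translate` simply reads codons from the first position rather than searching for an initiation signal.

```python
from collections import Counter

BASES = "UCAG"
# One row per (first, third) base; entries are ordered by second base U, C, A, G.
ROWS = """
Phe Ser Tyr Cys   Phe Ser Tyr Cys   Leu Ser STOP STOP   Leu Ser STOP Trp
Leu Pro His Arg   Leu Pro His Arg   Leu Pro Gln Arg     Leu Pro Gln Arg
Ile Thr Asn Ser   Ile Thr Asn Ser   Ile Thr Lys Arg     Met Thr Lys Arg
Val Ala Asp Gly   Val Ala Asp Gly   Val Ala Glu Gly     Val Ala Glu Gly
""".split()

CODE = {}
for i, first in enumerate(BASES):
    for k, third in enumerate(BASES):
        for j, second in enumerate(BASES):
            CODE[first + second + third] = ROWS[(i * 4 + k) * 4 + j]

def translate(mrna):
    """Translate an mRNA, codon by codon, until the first stop codon."""
    residues = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODE[mrna[pos:pos + 3]]
        if residue == "STOP":
            break
        residues.append(residue)
    return residues

degeneracy = Counter(CODE.values())
print(sorted(aa for aa, n in degeneracy.items() if n == 1))
print(translate("AUGGCCUGGUAA"))
```

Counting the entries confirms that only Met and Trp have a single codon and that three of the 64 codons are stop signals.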
The genetic code is almost universal. Thus human genes are correctly read by
bacterial systems. However, a few exceptions exist. Human mitochondria read
slightly differently; e.g. UGA codes for Trp rather than being a stop signal. Ciliated
protozoa have only one stop-signal.
Messenger RNAs contain the information for one or a few genes, with start and stop
signals (the start signal is more complex than just AUG). Eukaryotic genes and their
mRNA copies are often discontinuous, i.e. large parts of these sequences are not
translated. These intervening parts are called introns, while the coding parts are called
exons. The mRNA is subjected to a further process called splicing to exclude the
introns. The advantage of this complication is higher flexibility to form new proteins.
The 7700 bp long ovalbumin gene for example has 1872 bp coding for 624 residues.
[Figure: gene with alternating exons and introns (exon, intron, exon, intron, exon, intron, exon)]
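Splicing can be pictured as cutting out the introns and joining the exons. The sketch below uses a made-up toy transcript with hypothetical exon coordinates; the last lines redo the arithmetic for the ovalbumin numbers quoted above.

```python
def splice(pre_mrna, exons):
    """Join the exons, given as (start, end) index pairs with `end` excluded."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy transcript: exon (6 nt), intron (12 nt), exon (6 nt).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UGGUAA"
mature = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature)

# Ovalbumin gene as quoted in the text: 7700 bp in total, 1872 bp coding.
residues = 1872 // 3
coding_fraction = 1872 / 7700
print(residues, f"{coding_fraction:.0%}")
```

Only about a quarter of the quoted gene length is coding sequence; the rest is removed before translation.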
6. Translation
tRNA
Protein synthesis, translation, occurs on a molecular complex called ribosome. tRNAs
bring amino acids to the ribosome. To this end they have two binding sites: one for
the amino acid and one for the codon of the mRNA called the anticodon. There are
about 60 different tRNAs corresponding to the number of possible codons. These
consist of about 80 nucleotides and have a molecular weight of about 30’000. Many
bases in tRNAs are rare (examples: inosine, thymine) and often obtained by
modification of normal bases. The function of these rare bases remains unknown.
tRNAs form stable and water-soluble structures with short parts of double helices
formed by complementary segments. By maximizing the number of base pairs (using
the sequence) one arrives at a cross-like shape with four arms:
- acceptor-arm with CCA at 3’-end: amino acid binds here
- anticodon-arm
- D-arm including dihydrouridine
- T-arm including sequence TΨC (Ψ: pseudouridine)
The 3D-structure resembles the letter “L” with the acceptor and anticodon arms at
opposite ends yielding a maximal distance between the two binding sites.
The ribosome
Ribosomes are molecular “machines” for the synthesis of proteins. A cell contains
about 20’000 ribosomes corresponding to 1/3 of the cell mass. The ribosome consists
of two units. In E. coli the 50S subunit contains 32 proteins, the 5S and the 23S
rRNA; the 30S subunit contains 21 proteins and the 16S rRNA.
[Figure: schematic view of peptide growth in ribosomes, in three stages; labels: protein, tRNA (loaded), mRNA with 5’ end]
Recently a crystal structure at 2.4 Å resolution of the large subunit of the ribosome
from the prokaryote Haloarcula marismortui has been presented. A major finding is
that the ribosome acts as a ribozyme, i.e. the active site is formed exclusively of
rRNA with an adenine playing a similar role as the histidine in chymotrypsin.
7. Tools in genetics
Restriction enzymes
Restriction enzymes are endonucleases that cut specific, palindromic sequences of
DNA duplexes (nucleases are enzymes that cut DNA, exonucleases cut terminal
nucleotides, endonucleases cut within the DNA). The tool chest of molecular
biologists contains more than 100 restriction enzymes. An example is EcoRI, which cuts
as follows:
5'-N-N-N-G-A-A-T-T-C-N-N-N-3'         5'-N-N-N-G         A-A-T-T-C-N-N-N-3'
3'-N-N-N-C-T-T-A-A-G-N-N-N-5'   -->   3'-N-N-N-C-T-T-A-A         G-N-N-N-5'
The result of using restriction enzymes on a DNA fragment is called a restriction
map. Consider the following example: A 10kb DNA is cut by a restriction enzyme R1
into fragments of 2 and 8 kb, and by another enzyme R2 into fragments of 3 and 7 kb.
If both enzymes are applied, fragments of 2, 3 and 5 kb are obtained. Conclusion: R1
cuts near one end after 2 kb and R2 cuts 3 kb before the other end.
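The reasoning of this example can be automated by trying all candidate cut positions. The sketch below assumes a linear DNA and exactly one cut per enzyme; the function names are illustrative.

```python
from itertools import product

def digest(total, cut_positions):
    """Sorted fragment lengths after cutting a linear DNA of length `total`."""
    cuts = [0] + sorted(cut_positions) + [total]
    return sorted(right - left for left, right in zip(cuts, cuts[1:]))

def map_two_enzymes(total, r1_fragments, r2_fragments, double_fragments):
    """Cut positions (R1, R2) that are consistent with the double digest."""
    solutions = []
    # A single cut yielding fragments a and b lies at position a or b.
    for r1_cut, r2_cut in product(set(r1_fragments), set(r2_fragments)):
        if digest(total, [r1_cut, r2_cut]) == sorted(double_fragments):
            solutions.append((r1_cut, r2_cut))
    return sorted(solutions)

print(map_two_enzymes(10, [2, 8], [3, 7], [2, 3, 5]))
```

Two solutions are found, R1 at 2 kb with R2 at 7 kb and R1 at 8 kb with R2 at 3 kb. They are mirror images of each other and describe the same map read from opposite ends.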
The following figure shows the application of a restriction enzyme to insert a DNA
fragment into a plasmid (small circular DNA duplexes (1-200 kb) that can duplicate
autonomously).
[Figure: a restriction enzyme opens the plasmid, leaving single-stranded AATT / TTAA ends; the DNA fragment for insertion carries matching AATT / TTAA ends and is annealed into the opened plasmid]
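The origin of the AATT ends can be seen by simulating the cut on one strand. This is a minimal sketch assuming the EcoRI site GAATTC with the cut after the G, as in the scheme above; the function name and the test sequence are made up.

```python
SITE = "GAATTC"   # recognition sequence, read 5' to 3'
CUT_OFFSET = 1    # the strand is cut between G and A

def ecori_digest(strand):
    """Split one strand (5' to 3') at every recognition site."""
    fragments = []
    start = 0
    pos = strand.find(SITE)
    while pos != -1:
        fragments.append(strand[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = strand.find(SITE, pos + 1)
    fragments.append(strand[start:])
    return fragments

print(ecori_digest("CCCGAATTCAAAAGAATTCTT"))
```

Every fragment downstream of a cut starts with AATT. Because the site is palindromic, the same holds for the opposite strand, so any two ends produced by this enzyme can anneal.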
Separation of DNA fragments
Several techniques allow the separation of DNA fragments obtained for example from
restriction enzyme analysis. An often-used characteristic is the electrophoretic
mobility. Polyacrylamide gels can be used for fragments up to 1000 base pairs, and
porous agarose gels can resolve larger fragments with as many as 20 kb. Resolution
can be as good as one nucleotide difference in length of fragments with a few hundred
nucleotides.
After separation, DNA fragments can be transferred to nitrocellulose and hybridized
with a 32P-labeled probe. An autoradiogram then shows whether a fragment, and which
one, is complementary to the probe.
DNA sequencing
DNA can be sequenced by controlled termination of enzymatic replication. The DNA
to be sequenced is added to a polymerization mixture with DNA polymerase and
labeled triphosphates as building blocks. In each of four such mixtures one of the
nucleotides is also added as an analog (2',3'-dideoxy). Insertion of this analog will
terminate the replication process. The results are fragments ending at the various
positions of A (or T or G or C, respectively), which can be separated according to
their length, and thus provide the sequence. This method can, in an automated
version, be used to sequence entire genomes (human genome project).
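The logic of reading a sequence from terminated fragments can be sketched as follows. This is an idealized illustration: the function names are made up, every possible termination is assumed to occur, and fragment lengths are assumed to be resolved exactly.

```python
def terminated_lengths(new_strand, base):
    """Lengths of the fragments that end with the dideoxy analog of `base`."""
    return [i + 1 for i, b in enumerate(new_strand) if b == base]

def read_sequence(lanes):
    """Read the sequence from four lanes by sorting all fragments by length."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))

strand = "GATTACA"  # newly synthesized strand (unknown in a real experiment)
lanes = {base: terminated_lengths(strand, base) for base in "ACGT"}
print(lanes)
print(read_sequence(lanes))
```

Each lane alone only gives the positions of one base; combining the four lanes and ordering the fragments by length recovers the full sequence.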
PCR (polymerase chain reaction)
[Figure: one PCR cycle, with the 5' and 3' ends of all strands marked]
DNA to be amplified
Step 1: denature DNA
Step 2: anneal primers
Step 3: primer extension
Product of first cycle: two double stranded DNA molecules
Repeat cycles to yield a greater than 10^6-fold increase in DNA
PCR can be used to amplify very small amounts of DNA, including the DNA of a
single cell. Besides genetic testing, PCR is used in forensics and molecular
paleontology.
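The amplification factor follows from the doubling in each cycle. The sketch below assumes ideal cycles; the `efficiency` parameter is a hypothetical knob for less than perfect doubling and is not part of these notes.

```python
import math

def amplification(cycles, efficiency=1.0):
    """Fold increase after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return (1 + efficiency) ** cycles

def cycles_needed(fold):
    """Smallest number of ideal cycles that gives at least a `fold` increase."""
    return math.ceil(math.log2(fold))

n = cycles_needed(10**6)
print(n, amplification(n))
```

With perfect doubling, 20 cycles are enough to exceed a 10^6-fold increase; real reactions need somewhat more because the efficiency per cycle is below 1.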