Biomolecular chemistry 2. RNA and transcription

• Primary Source Material
• Biochemistry Berg, Jeremy M.; Tymoczko, John L.; and Stryer, Lubert (courtesy
of the NCBI bookshelf)
• Molecular Cell Biology Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence;
Matsudaira, Paul; Baltimore, David; Darnell, James E. (courtesy of the NCBI
bookshelf)
• Many figures and the descriptions for the figures are from the educational
resources provided at the Protein Data Bank (http://www.pdb.org/)
• Most of these figures and accompanying legends have been written by David S.
Goodsell of the Scripps Research Institute and are being used with permission. I
highly recommend browsing the Molecule of the Month series at the PDB
(http://www.pdb.org/pdb/101/motm_archive.do)
The Central Dogma
U.S. Department of Energy Human Genome Program (http://www.ornl.gov/hgmis)
• There are many ways of stating the central dogma of molecular biology. Apparently Francis Crick
originally defined it like this:
• The central dogma of molecular biology deals with the detailed residue-by-residue transfer of
sequential information. It states that information cannot be transferred back from protein to either
protein or nucleic acid. (http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology)
• The way that I think of the Central Dogma is: genetic information tends to flow from DNA to RNA to
proteins.
• The information stored as DNA becomes useful through gene expression.
• Gene expression means the production of a protein or a functional RNA from its gene.
• Gene expression involves several steps:
• Transcription: A DNA strand is used as the template to synthesize an RNA strand, which is called
the primary transcript.
• RNA processing: This step involves modifications of the primary transcript to generate a mature
mRNA (for protein genes) or a functional tRNA or rRNA. For RNA genes (tRNA and rRNA), the
expression is complete after a functional tRNA or rRNA is generated. However, protein genes
require additional steps.
• Nuclear transport: mRNA has to be transported from the nucleus to the cytoplasm for protein
synthesis.
• Protein synthesis: In the cytoplasm, mRNA binds to ribosomes, which can synthesize a
polypeptide based on the sequence of mRNA.
• Epigenetic information can be thought of as flowing the other way: changes in the cell (typically in
proteins or caused by proteins) that result in changes in gene expression but not in changes in the
genetic sequence itself.
The roles of RNA: more than just messengers
rRNA
And one more… catalytic RNA
• Messenger RNA is the template for protein synthesis (translation). An mRNA molecule may be produced for each gene or
group of genes that is to be expressed in E. coli, whereas a distinct mRNA is produced for each gene in eukaryotes. In E.
coli, the average length of an mRNA molecule is about 1.2 kilobases (kb).
• Transfer RNA carries amino acids in an activated form to the ribosome for peptide-bond formation, in a sequence dictated by
the mRNA template. There is at least one kind of tRNA for each of the 20 amino acids. Transfer RNA consists of about 75
nucleotides (having a mass of about 25 kDa), which makes it one of the smallest of the RNA molecules discussed here.
• Ribosomal RNA (rRNA), the major component of ribosomes, plays both a catalytic and a structural role in protein synthesis.
In E. coli, there are three kinds of rRNA, called 23S, 16S, and 5S RNA because of their sedimentation behaviour. One
molecule of each of these species of rRNA is present in each ribosome.
• The first catalytic RNA was discovered by Cech and coworkers in the early 1980s. Most naturally occurring ribozymes have
a role in mRNA splicing. However, in vitro evolution has resulted in ribozymes with a variety of different functions.
• Q: I was just wondering: if you gave us a sequence of DNA, let's say 5'-AATGCCAGT-3', and asked us to give the sequence
of the strand of mRNA read from the DNA strand, would it be 5'-ACUGGCAUU-3', because when the RNA polymerase is
transcribing DNA it reads the DNA from 3' to 5'? Unless you told us that the DNA strand shown was the coding strand.
• A: If you were given a piece of DNA and no other information, the correct answer would be to assume that it is the coding
strand and just replace the Ts with Us. In this case AAUGCCAGU. The reason is that we read genes 5' to 3' and, in the
absence of other information, you would have to assume that I'd give you the gene in the proper orientation (i.e., the
sequence of the coding strand) and as a piece of double stranded (ds) DNA. Now, the RNA polymerase would actually be
'reading' the other strand (template), but it would be creating a sequence identical to the coding sequence provided (except
with U's instead of Ts). The answer you provided would be correct only if I specified that the sequence provided was that of
the template strand (but I probably wouldn't do that).
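The two readings discussed in this Q&A can be checked with a short Python sketch. The function names are made up for illustration; the only assumption is that every sequence is written 5' to 3'.

```python
# Watson-Crick complement for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mrna_from_coding_strand(coding_5to3):
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


def mrna_from_template_strand(template_5to3):
    """RNA polymerase reads the template 3'->5' and builds RNA 5'->3',
    so the mRNA is the reverse complement of the template (U for T)."""
    reverse_complement = "".join(
        DNA_COMPLEMENT[base] for base in reversed(template_5to3.upper())
    )
    return reverse_complement.replace("T", "U")


# The sequence from the question, interpreted both ways.
print(mrna_from_coding_strand("AATGCCAGT"))    # AAUGCCAGU
print(mrna_from_template_strand("AATGCCAGT"))  # ACUGGCAUU
```

The first result is the default answer (sequence given as the coding strand); the second is correct only if the sequence is explicitly stated to be the template strand.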
DNA to RNA (transcription)
The 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg for his work in
determining the mechanism of RNA polymerase, including solving this crystal structure.
David S. Goodsell: The Molecule of the Month appearing at the PDB
• RNA synthesis (transcription) is the process of transcribing DNA nucleotide sequence information into RNA sequence information. RNA
synthesis is catalyzed by a large enzyme called RNA polymerase. The basic biochemistry of RNA synthesis is common to prokaryotes and
eukaryotes, although its regulation is more complex in eukaryotes. Despite substantial differences in size and number of polypeptide subunits,
the overall structures of these enzymes are quite similar between prokaryotes and eukaryotes, revealing a common evolutionary origin.
• RNA synthesis takes place in three stages: initiation, elongation, and termination. RNA polymerase performs multiple functions in this process:
• It searches DNA for initiation sites, also called promoter sites or simply promoters.
• It unwinds a short stretch of double-helical DNA to produce a single-stranded DNA template from which it will ‘read’ the sequence.
• It selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. RNA polymerase is completely
processive - a transcript is synthesized from start to end by a single RNA polymerase molecule.
• It detects termination signals that specify where a transcript ends.
• It interacts with activator and repressor proteins that modulate the rate of transcription initiation over a wide dynamic range. These
proteins, which play a more prominent role in eukaryotes than in prokaryotes, are called transcription factors.
• RNA polymerase is a huge factory with many moving parts. The one shown here, from PDB entry 1i6h, is from yeast (Saccharomyces cerevisiae). It is composed of
a dozen different proteins. Together, they form a machine that surrounds DNA strands, unwinds them, and builds an RNA strand based on
the sequence of the DNA. Once the enzyme gets started, RNA polymerase continues along the DNA, copying it into RNA strands thousands of
nucleotides long.
• In contrast with DNA synthesis, RNA synthesis can start de novo, without the requirement for a primer.
• Most newly synthesized RNA chains carry a highly distinctive tag on the 5’ end: the first base at that end is either pppG or pppA.
• RNA chains, like DNA chains, grow in the 5’-3’ direction.
• The Nobel Prize in Chemistry for 2006 went to Roger D. Kornberg of Stanford University, CA, USA "for his studies of the molecular basis of
eukaryotic transcription" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2006/index.html)
The transcription bubble
David S. Goodsell: The Molecule of the Month appearing at the PDB
• The region containing RNA polymerase, DNA, and newly synthesized RNA is called a transcription bubble because it contains a locally melted “bubble” of DNA. The
newly synthesized RNA forms a hybrid helix with the template DNA strand. This RNA-DNA helix is about 8 bp long, which corresponds to nearly one turn of a double
helix (10.4 bp/turn in B-form).
• The 3’ hydroxyl group of the RNA in this hybrid helix is positioned so that it can attack the α-phosphorus atom of an incoming ribonucleoside triphosphate. The
core enzyme also contains a binding site for the other DNA strand. About 17 bp of DNA are unwound throughout the elongation phase, as in the initiation phase. The
transcription bubble moves a distance of 170 Å (17 nm) in a second, which corresponds to a rate of elongation of about 50 nucleotides per second. Although rapid, it
is much slower than the rate of DNA synthesis, which is about 800 nucleotides per second.
• As you might expect, RNA polymerase needs to be accurate in its copying of genetic information. To improve its accuracy, it performs a simple proofreading step as
it builds an RNA strand. The active site is designed to be able to remove nucleotides as well as add them to the growing strand. The enzyme tends to hover around
mismatched nucleotides longer than properly added ones, giving the enzyme time to remove them. This process is somewhat wasteful, since proper nucleotides are
also occasionally removed, but this is a small price to pay for creating better RNA transcripts. Overall, RNA polymerase makes an error about once in 10,000
nucleotides added, or about once per RNA strand created. This rate is about 10,000 fold higher than DNA synthesis. The much lower fidelity of RNA synthesis can
be tolerated because mistakes are not transmitted to progeny. For most genes, many RNA transcripts are synthesized; a few defective transcripts are unlikely to be
harmful.
• PDB entry 1msw reveals the structure of a very small RNA polymerase that is made by the bacteriophage T7, shown here with blue tubes. A small transcription
bubble, composed of two DNA strands and an RNA strand, is bound in the active site. Notice how the two DNA strands form a double helix at the top of the picture.
The enzyme separates them in the middle and builds an RNA strand using the DNA on the right. Finally, at the bottom, the two DNA strands come back together.
• This structure was not determined by Roger Kornberg, but rather by Tom Steitz, a famous X-ray crystallographer. Professor Steitz was awarded the Nobel Prize in
Chemistry 2009 "for studies of the structure and function of the ribosome" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2009/index.html)
• Q: What does this mean "many RNA transcripts are synthesized"?
• A: This statement refers to the fact that there is not just one mRNA for each gene. When a gene is being expressed, many RNA polymerases are
copying it and making many mRNA molecules. If a couple of them have a mistake, it is probably not a big deal.
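The numbers quoted on this slide can be tied together with a little arithmetic. The sketch below is only a back-of-the-envelope check. It assumes a rise of 3.4 Å per base pair (a standard B-form value that is not stated in these notes) and uses the 1.2 kb average E. coli mRNA length from the earlier slide as an example transcript.

```python
RISE_PER_BP_ANGSTROM = 3.4       # assumption: B-form rise per base pair
ELONGATION_RATE_NT_PER_S = 50    # from the notes
ERROR_RATE_PER_NT = 1 / 10_000   # from the notes
ECOLI_MRNA_LENGTH_NT = 1200      # 1.2 kb average, from the earlier slide


def bubble_speed(rate_nt_per_s, rise=RISE_PER_BP_ANGSTROM):
    """Distance (in angstroms) the transcription bubble moves per second."""
    return rate_nt_per_s * rise


def seconds_to_transcribe(length_nt, rate_nt_per_s):
    """Time needed to synthesize a transcript of the given length."""
    return length_nt / rate_nt_per_s


def expected_errors(length_nt, error_rate):
    """Average number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt, error_rate):
    """Probability that a transcript contains no errors at all,
    treating each position as independent (a simplification)."""
    return (1 - error_rate) ** length_nt


print(round(bubble_speed(ELONGATION_RATE_NT_PER_S)), "angstroms per second")
print(seconds_to_transcribe(ECOLI_MRNA_LENGTH_NT, ELONGATION_RATE_NT_PER_S), "seconds per mRNA")
print(round(expected_errors(ECOLI_MRNA_LENGTH_NT, ERROR_RATE_PER_NT), 2), "errors per mRNA")
print(round(fraction_error_free(ECOLI_MRNA_LENGTH_NT, ERROR_RATE_PER_NT), 3), "error-free fraction")
```

Under these assumptions, 50 nucleotides per second corresponds to the 170 Å per second quoted above. The "about once per RNA strand" figure applies to a transcript roughly 10,000 nucleotides long; a typical 1.2 kb bacterial mRNA would carry an error much less often.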
Transcription is a highly regulated process
Example: the lac operon
lactose absent
lactose present
For transcription to occur, lactose must bind to the lac repressor.
This binding changes the conformation of the protein such that it
can no longer bind to the operator site and interfere with the
function of RNA polymerase
• With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes.
Only a fraction of these genes are turned on at any one time, however, and it is the subset that is
"expressed" that confers unique properties to each cell type.
• "Gene expression" is the term used to describe the transcription of the information contained within the
DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated
into the proteins that perform most of the critical functions of cells.
• Biologists study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed,
which in turn provides insights into how the cell responds to its changing needs.
• Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically
both to environmental stimuli and to its own changing needs. This mechanism acts as both an "on/off"
switch to control which genes are expressed in a cell as well as a "volume control" that increases or
decreases the level of expression of particular genes as necessary.
• The lac operon shown in this movie is one of the simplest gene regulation mechanisms, but it is actually a
bit more complicated than the extremely simplified version shown here.
• Although not shown here, the CAP protein that we met earlier is also involved in this process. Specifically,
CAP is a transcription factor. That is, it binds to a site next to the promoter site, and increases the affinity of
the DNA for RNA polymerase. This makes RNA polymerase more likely to bind and so transcription is
further increased.
• Q: What is the termination signal for RNA synthesis?
• A: Interestingly, RNA synthesis (transcription) is typically brought to a stop by the formation of an RNA stem
loop structure. The presence of this structure in the newly formed RNA is sufficient to disrupt the interaction
between RNA polymerase and the DNA, so the enzyme falls off and synthesis is stopped.
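The "on/off switch" and "volume control" ideas for the lac operon described above can be captured in a toy truth table. This is a deliberately crude sketch of the simplified model on this slide: the function name, the `cap_bound` parameter, and the "off" / "low" / "high" labels are illustrative assumptions, not measured expression levels.

```python
def lac_transcription_level(lactose_present, cap_bound):
    """Toy model of lac operon transcription.

    lactose_present: lactose is bound to the repressor, so the repressor
                     has released the operator (the on/off switch).
    cap_bound:       CAP is bound next to the promoter, raising the affinity
                     of the DNA for RNA polymerase (the volume control).
    """
    if not lactose_present:
        # Repressor sits on the operator and blocks RNA polymerase.
        return "off"
    return "high" if cap_bound else "low"


for lactose in (False, True):
    for cap in (False, True):
        print(lactose, cap, lac_transcription_level(lactose, cap))
```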
The mechanisms of DNA and RNA elongation
are similar
Active site of DNA polymerase
Active site of RNA polymerase
• The catalytic site of RNA polymerase resembles that of DNA polymerase in that it includes
two metal ions in its active form. One metal ion remains bound to the enzyme, whereas the
other appears to come in with the nucleoside triphosphate and leave with the pyrophosphate.
Three conserved aspartate residues of the enzyme participate in binding these metal ions.
Note that the overall structures of DNA polymerase and RNA polymerase are quite different;
their similar active sites are the products of convergent evolution.
• Q: Do metal ions in the catalytic site of RNA polymerase link to the same thing as in DNA
polymerase?
• A: In both enzymes the metal ions in the catalytic site are held in place by aspartic acid
residues. However, there do not appear to be any similarities in the overall structures of
DNA and RNA polymerases, showing that this use of the same type of amino acid is an
example of convergent evolution.
Transcription is much more complex in
eukaryotes than in prokaryotes
• In prokaryotes (bacterial and archaeal cells defined by the fact that they lack a nucleus), translation of mRNA begins while the transcript is still being synthesized.
• In eukaryotes (animal, plant, and fungal cells defined by the fact that they have a nucleus), transcription and translation take place in different cellular compartments: transcription takes
place in the membrane-bounded nucleus, whereas translation takes place outside the nucleus in the cytoplasm.
• A second major difference between prokaryotes and eukaryotes is the extent of RNA processing. Eukaryotes extensively process nascent pre-mRNA destined to become mature
mRNA. Primary transcripts (pre-mRNA molecules), the products of RNA polymerase action, acquire a cap at their 5’ ends and a poly(A) tail at their 3’ ends. Most importantly, nearly all
mRNA precursors in higher eukaryotes are spliced.
• primary transcript: Initial RNA product, containing introns and exons, produced by transcription of DNA. Many primary transcripts must undergo RNA processing to form the
physiologically active RNA species.
• Q: Many pictures show only the mRNA. What about tRNA and rRNA: do they also go through transcription and processing?
• A: tRNA and rRNA are encoded in the genome and are synthesized by RNA polymerases just like mRNA is. They will also undergo processing, but it is different than the processing
that occurs for mRNA.
• Q: Once processing is done and mature mRNA is formed, are there any other enzymes that help it leave the nucleus, or is it only the poly(A) tail that helps it do that?
• A: There are many proteins that are involved in the process of getting the mRNA out of the nucleus. There is a link to a very nice overview poster on the website. The ultimate goal of
these helper proteins is to get the mRNA through the Nuclear Pore Complex, which is the channel that allows material into or out of the nucleus. As we will see a bit later, movement of
mRNA inside the nucleus is primarily through random diffusion, whereas mRNA molecules outside of the nucleus are specifically transported to the location where they will be
translated.
• Q: Would you please explain prokaryotes and eukaryotes? Why are they discussed in each part, and why are they always being compared?
• A: The term 'karyote' refers to the nucleus and 'Pro' means 'before' and 'Eu' means 'true'. So Prokaryotes are the organisms that do not have a nucleus and Eukaryotes are cells that
have a nucleus. The nucleus is a membrane-enclosed structure that holds all of the DNA of the cell. Think of the nucleus as the yolk of an egg. Following this analogy, the white of the
egg is the cytoplasm. Bacteria and Archaea are prokaryotes and tend to be single-celled organisms. All complex multicellular life is made up of eukaryotic cells, but there are also
many examples of single-celled eukaryotes such as baker's yeast and amoebae. It is important to talk about both and compare them because biochemical research often bounces back
and forth between these two types of cells, using experiments in one type to draw conclusions about the other. More specifically, researchers often study a process in prokaryotes
because it tends to be simpler, and then attempt to extend their results to eukaryotes which are more relevant to human health concerns (because we are eukaryotes).
Mature eukaryotic vs. prokaryotic mRNA
Prokaryotic mRNA
Eukaryotic mRNA
• The 5' cap (also called an RNA cap, an RNA 7-methylguanosine cap or an RNA m7G cap) is a modified guanine nucleotide that has been added to the 5' end of the messenger RNA shortly
after the start of transcription. The 5' cap consists of a terminal 7-methylguanosine residue which is linked through a 5'-5'-triphosphate bond to the first transcribed nucleotide. Its presence is
critical for recognition by the ribosome and protection from RNases.
• Coding regions are composed of codons, which are decoded and translated into protein by the ribosome. Coding regions begin with the start codon and end with one of three possible
stop codons. In addition to their protein-coding role, portions of coding regions may also serve as regulatory sequences.
• Untranslated regions (UTRs) are sections of the RNA before the start codon and after the stop codon that are not translated, termed the five-prime untranslated region (5' UTR) and three-prime untranslated region (3' UTR), respectively. These regions are transcribed as part of the same transcript as the coding region. Several roles in gene expression have been attributed to
the untranslated regions, including mRNA stability, mRNA localization, and translational efficiency. The ability of a UTR to perform these functions depends on the sequence of the UTR and
can differ between mRNAs.
• The 3' poly(A) tail is a long sequence (often several hundred) of adenine nucleotides added to the "tail" (3' end) of the pre-mRNA. This addition is catalyzed by poly(A) polymerase and does
not require a template (it still proceeds 5’ to 3’, of course).
• Q: I did not understand the “cap” part that you mentioned. Why is it necessary for the processing of mRNA?
• A: The 5' cap on mRNA is important because it is a modification that is recognized by other proteins in the cell and identifies the mRNA as one that should be transported from the nucleus and
read by the ribosome. mRNA molecules that do not have a 5' cap are quickly degraded.
• Q: After splicing there are some regions in mature mRNA which are called UTRs. If during splicing all the nucleotides between two exons are cut out as introns, where do the UTR nucleotides
come from? I mean, are all the nucleotides in exons translated or not?
• A: The region that is translated (that is, read by the ribosome) is just the stretch of nucleotides that starts at the start codon and ends with the stop codon. Before the start codon and after the stop
codon are additional regions called the untranslated regions. As the name implies, these regions are not translated. There probably could be exons and introns in the UTRs as well.
Although exons are said to be 'expressed', it is better just to think of them as the parts of the mRNA that end up in the mature RNA, and not to think about them as the coding region.
• Q: I have a question about the 5' UTR and 3' UTR in the mRNA: are these parts created during transcription and left untranslated, or are they added in the process of making
mature mRNA? If they are added later, is the sequence the same for all mRNAs, or what is the basis of their creation?
• A: The 5' and 3' UTRs are created during transcription. They are encoded within the genome and will be different for every gene. Essentially, these are the bits of the RNA between
the 5' end and the start codon, and after the stop codon but before the poly(A) tail.
• Adapted from http://en.wikipedia.org/wiki/MRNA
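To make the relationship between the UTRs and the coding region concrete, here is a minimal Python sketch that splits an mRNA sequence into its three parts. It makes two simplifying assumptions that real cells do not always follow: translation starts at the first AUG, and it ends at the first in-frame stop codon (UAA, UAG, or UGA). The 5' cap and poly(A) tail are not modelled, and the function name and example sequence are made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Return (5' UTR, coding region, 3' UTR) for an mRNA written 5'->3'.

    Simplification: uses the first AUG as the start codon and the first
    in-frame stop codon as the end of the coding region.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon (steps of 3) from the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is part of the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


five_utr, coding, three_utr = split_mrna("GGCACCAUGGCUUAACCGUAAAA")
print(five_utr)   # GGCACC
print(coding)     # AUGGCUUAA
print(three_utr)  # CCGUAAAA
```

Note that all three parts come from the same transcript: nothing is added to create the UTRs, which is the point made in the answers above.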
Splicing of mammalian mRNA:
Introns and Exons
The primary transcript
is ‘spliced’ to form the
correct reading
sequence of the gene
One pre-mRNA molecule can be
spliced in a variety of different
ways, depending on the tissue
where it is expressed!
Chapter 1 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• Intron: Part of a primary transcript (or the DNA encoding it) that is removed by splicing during
RNA processing and is not included in the mature, functional mRNA, rRNA, or tRNA; also
called intervening sequence.
• Exon: Segment of a eukaryotic gene (or of its primary transcript) that reaches the cytoplasm
as part of a mature mRNA, rRNA, or tRNA molecule.
• Introns are precisely excised from primary transcripts, and exons are joined to form mature
mRNAs with continuous messages. Alternative splicing enlarges the repertoire of proteins in
eukaryotes and is a clear illustration of why the proteome is more complex than the genome.
• One pre-mRNA molecule can be spliced in a variety of different ways, depending on the tissue
where it is expressed! The specific example shown on this slide is the pre-mRNA molecule for
alpha-tropomyosin, which is involved in the ability of a cell to contract. The introns are the
black lines and the exons are the blue segments.
• Q: How can a cell accurately sense a decrease in some particular protein and find the
pieces of exons in the DNA, given that in most cases the introns are much larger than the exons? Is
it the protein's specific amino acid sequence, reflecting the complementary mRNA
sequence that it wants, that makes this happen without any mistakes?
• A: Introns are recognized by a very large complex of RNA and protein known as the
spliceosome. There are parts of the spliceosome that interact with the 5' GU, the 3' AG, the A
at the branch point, and the polypyrimidine tract. Some of these interactions are mediated by
RNA-RNA interactions and some are mediated by protein-RNA interactions. Since all introns
would share these same 4 features, the interactions of these features with the spliceosome are
how the cell is able to accurately identify the presence of introns in mRNA.
Splicing of mammalian mRNA:
Introns and Exons
• Figure and following legend from Nature Reviews Genetics 5, 389-396 (May 2004): “There
are several conserved motifs in the nucleotide sequences near the intron–exon boundaries
that act as essential splicing signals: GU and AG dinucleotides at the exon–intron and intron–
exon junctions, respectively (5'- and 3'-splice sites), a polypyrimidine tract (Py)n and an A
nucleotide at the branch site. Splicing takes place in two transesterification steps. In the first
step, the 2'-hydroxyl group of the A residue at the branch site attacks the phosphate at the
GU 5'-splice site. This leads to cleavage of the 5' exon from the intron and the formation of a
lariat intermediate. In the following step, a second transesterification reaction, which involves
the phosphate (p) at the 3' end of the intron and the 3'-hydroxyl group of the detached exon,
ligates the two exons. This reaction releases the intron, still in the form of a lariat.”
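The net result of the two transesterification steps (introns out, exons joined) can be sketched in a few lines of Python. This is only a bookkeeping illustration: the intron positions are supplied by hand, the only signals checked are the GU and AG dinucleotides at the intron ends, and the branch-site A and polypyrimidine tract are not verified. The function name and example sequences are made up.

```python
def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA and join the flanking exons.

    introns: list of (start, end) index pairs, end exclusive, marking
             each intron within pre_mrna (written 5'->3').
    Raises ValueError if an intron does not begin with GU and end with AG.
    """
    pre_mrna = pre_mrna.upper()
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks the GU...AG signals")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


exon1 = "AUGGCC"
intron = "GUAAGUACUAACUUUUUCCAG"  # GU ... branch-site A ... pyrimidines ... AG
exon2 = "GAUUAA"
pre_mrna = exon1 + intron + exon2
print(splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))]))  # AUGGCCGAUUAA
```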
• Q: I am just wondering that why has nature created introns?
• A: Good question. The simplest answer is that they are necessary for the process of
'alternative splicing'. Splicing is a highly regulated process and it turns out that, in response
to specific biochemical conditions, the spliceosome can purposely skip an exon during
splicing. At the protein level, this can mean that a particular portion (typically one whole
domain) of a protein could be missing, or potentially swapped with a different polypeptide
sequence. This process of alternative splicing means that the ~20,000 protein-coding
genes of humans actually encode many more than 20,000 proteins (the majority of genes
are thought to have alternative splicing options). It is generally thought that alternative
splicing is one of the keys for enabling animals as complex as humans to exist despite our
relatively small number of genes.
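To get a feel for how exon skipping multiplies the number of possible products, the sketch below enumerates every combination of included and skipped exons for a hypothetical gene. It is a counting exercise only: the exon names are invented, and in a real cell splicing is regulated, so not every combination is actually produced.

```python
from itertools import product


def exon_skipping_isoforms(exons, skippable):
    """List every isoform obtainable by including or skipping each
    skippable exon; all other exons are always included.

    exons:     exon names in 5'->3' order
    skippable: set of exon names that the spliceosome may skip
    """
    optional = [exon for exon in exons if exon in skippable]
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Exons not listed in `included` are constitutive and always kept.
        isoforms.append([exon for exon in exons if included.get(exon, True)])
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}):
    print("-".join(isoform))
```

With k skippable exons there are at most 2^k isoforms from this mechanism alone, so even a handful of optional exons per gene gives far more possible proteins than genes.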
RNA processing in the mitochondria
RNA polymerase
mTERF1 stops
transcription from
HSP2
Human mtDNA only has 3
promoters, 2 for the heavy strand
(HSP1 and HSP2) and 1 for the
light strand.
This means that all of the genes
(including mRNA, tRNA, and
rRNA) get synthesized as one
long strand.
Pearce, S., et al., Mitochondrial diseases: Translation matters, Mol. Cell. Neurosci. (2012)
Rackham et al. The human mitochondrial transcriptome and the RNA-binding proteins that regulate its expression. WIREs RNA 2012, 3: 675-695
• Since the mtDNA is transcribed as one long RNA molecule, it needs to be cut into smaller
pieces that represent the gene products.
• The tRNA sections of the RNA will fold up, even in the context of the longer chain.
• RNase enzymes recognize the folded tRNA and cut the RNA at the 5’ and 3’ ends to release
the tRNAs. As you can see from the genome, the tRNAs occur between the other coding
sequences, so by releasing the tRNAs, the other sequences are also released.
• Further processing gives the final tRNA, rRNA, and mRNA molecules.
• There are no introns in human mitochondrial genes.
• There is no 5’ cap on mitochondrial mRNA, though there is a poly(A) tail.
• For translation, ribosomes are assembled in the mitochondria from the mitochondrial rRNA
plus additional proteins encoded in the nuclear genome.
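The idea that cutting out the tRNAs automatically frees the sequences between them can be illustrated with a toy example. In this sketch the transcript is a made-up string in which "r", "t", and "m" stand in for rRNA, tRNA, and mRNA nucleotides, and the tRNA positions are given by hand rather than found by folding, which is what the RNase enzymes actually recognize.

```python
def release_by_trna_excision(transcript, trna_spans):
    """Cut a long transcript at the 5' and 3' ends of each tRNA.

    trna_spans: list of (start, end) index pairs, end exclusive.
    Returns (trnas, other_segments): the excised tRNAs and the pieces
    that are released as a side effect of those cuts.
    """
    trnas, others = [], []
    position = 0
    for start, end in sorted(trna_spans):
        if start > position:
            others.append(transcript[position:start])  # piece before this tRNA
        trnas.append(transcript[start:end])
        position = end
    if position < len(transcript):
        others.append(transcript[position:])           # piece after the last tRNA
    return trnas, others


# Placeholder transcript: rRNA, tRNA, mRNA, tRNA, mRNA.
transcript = "rrrrr" + "ttt" + "mmmmmm" + "ttt" + "mmmm"
trnas, others = release_by_trna_excision(transcript, [(5, 8), (14, 17)])
print(trnas)   # ['ttt', 'ttt']
print(others)  # ['rrrrr', 'mmmmmm', 'mmmm']
```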
HIV: reverse transcriptase is essential
see ‘HIV life cycle’ animation on webpage
• Retroviruses: these viruses can reverse the flow of genetic information (RNA to DNA rather than from DNA to RNA)! The most famous retrovirus is
human immunodeficiency virus 1 (HIV-1), the cause of AIDS. Retroviruses have two identical copies of a single-stranded RNA genome and an outer
envelope containing protruding viral glycoproteins.
• The retroviral envelope fuses directly with the plasma membrane (step 1).
• Following fusion, the nucleocapsid enters the cytoplasm of the cell; then deoxynucleoside triphosphates from the cytosol enter the nucleocapsid, where
viral reverse transcriptase and other proteins copy the ssRNA genome of the virus into a dsDNA copy (step 2).
• The viral DNA copy is transported into the nucleus (only one host-cell chromosome is depicted) and integrated into one of many possible sites in the
host-cell chromosomal DNA (step 3).
• The integrated viral DNA, referred to as a provirus, is transcribed by the host-cell RNA polymerase, generating mRNAs (dark red) and genomic RNA
molecules (light red). The host-cell machinery translates the viral mRNAs into glycoproteins and nucleocapsid proteins (step 4).
• The latter assemble with genomic RNA to form progeny nucleocapsids, which interact with the membrane-bound viral glycoproteins. Eventually the
host-cell membrane buds out and progeny virions are pinched off (step 5).
• Q: Is it possible that HIV gets into a piece of an intron and gets chopped out when normal transcription happens, since, as you said, the size of
an intron is much larger than that of an exon?
• A: One of the proteins encoded by the HIV genome is known as Tat. This protein is a transcription factor that promotes enhanced expression of the HIV genes.
There is probably always a low level of transcription resulting in the production of a small amount of Tat. As the Tat protein accumulates, there is a
positive feedback cycle that results in increased expression of the HIV genes. It is unlikely that the provirus would end up in an intron, though I suppose
it is possible. It is important to keep in mind that an infected individual would have many infected cells, and in each one the provirus will integrate in a
different spot. It has been reported that HIV tends to integrate into genes that are highly transcribed.
• Q: Another question: how does the DNA made from the HIV RNA get integrated into the host cell's chromosomal DNA? Is it always
expressed?
• A: The key to the integration of HIV dsDNA (following reverse transcription) is the enzyme integrase. This enzyme is responsible for delivering the
dsDNA to the chromosomal DNA, cutting the chromosomal DNA, and inserting the viral DNA into the chromosomal DNA by a strand-swapping
mechanism. This movie is a nice overview of the whole process: http://www.msd.com/pro/hiv/mk518/videos/English/moa_video.html
HIV: reverse transcriptase
David S. Goodsell: The Molecule of the Month appearing at the PDB
• Reverse transcriptase performs several different functions. As indicated by the name, it can
build DNA strands based on an RNA template. This reaction is performed in the polymerase
active site, which is formed by two sets of arms that surround the RNA and DNA. The
polymerase site is at the top in this illustration, taken from PDB entry 2hmi. After building the
DNA strand, the enzyme then removes the original RNA strand by cleaving it into pieces. This
is performed by a nuclease active site, which is located at the opposite end of the enzyme.
Finally, it builds a second DNA strand matched to the one that was just created to form the
final DNA double helix. This reaction is also performed by the polymerase site.
• Reverse transcriptase performs a remarkable feat, reversing the normal flow of genetic
information, but it is rather sloppy in its job. The polymerases used to make DNA and RNA in
cells are very accurate and make very few mistakes. This is essential because they are the
caretakers of our genetic information, and mistakes may be passed on to our offspring.
Reverse transcriptase, on the other hand, makes lots of mistakes, up to about one in every
2,000 bases that it copies. This high error rate turns out to be an advantage for the virus in
this era of drug treatment. The errors allow HIV to mutate rapidly, finding drug resistant
strains in a matter of weeks after treatment begins. Fortunately, recently developed
treatments that combine several drugs are often effective in combating this problem. Since
the virus is simultaneously attacked by several different drugs, it cannot mutate to evade all
of them at the same time.
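The three jobs of reverse transcriptase described above (build a DNA strand on the RNA template, remove the RNA, build the second DNA strand) can be followed at the sequence level with a short Python sketch. It tracks only base pairing: primers, the nuclease step, and copying errors are not modelled, and the function names and example sequence are made up. All sequences are written 5' to 3'.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_dna_strand(rna_5to3):
    """DNA strand built on the RNA template: the reverse complement
    of the RNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(rna_5to3.upper()))


def second_dna_strand(first_strand_5to3):
    """DNA strand built on the first DNA strand after the RNA is removed."""
    return "".join(DNA_TO_DNA_COMPLEMENT[b] for b in reversed(first_strand_5to3))


viral_rna = "AUGGCC"
first = first_dna_strand(viral_rna)
second = second_dna_strand(first)
print(first)   # GGCCAT
print(second)  # ATGGCC
```

The second strand has the same sequence as the original RNA with T in place of U, so the final double helix is a DNA copy of the viral genome.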
How do cancer cells escape crisis due to
telomere shortening?
They express the enzyme telomerase
which can extend their telomeres.
hTERT is the main
catalytic subunit of
the human
telomerase
enzyme
Telomerase activity
allows cancer cells to
maintain their telomeres
and thus avoid crisis
Telomerase is a reverse transcriptase!
Chapter 10 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• In some cases, the ‘crisis’ point may limit the growth of cancer cells
• In other cases, cancer cells have gained the ability to regenerate their telomeres by
increasing the expression of the enzyme telomerase.
• Telomerase is (of course) a normal human enzyme, but it is usually expressed only during
early stages of development and in the testes.
• Telomerase is a complex protein with multiple subunits. One of these subunits is a reverse
transcriptase, much like the enzyme made by HIV and other viruses.
• The telomerase also has an RNA portion (451 nucleotides) associated with it, which is
essentially the template for reverse transcription.
• HEK cells, like HeLa cells, are a widely used cell line. These cells are derived from normal Human
Embryonic Kidney cells that were transformed with viral DNA. They were created in the
1970s by Frank Graham (later of McMaster University in Canada), working in Alex van der Eb's
laboratory in Leiden, the Netherlands.
Speaking of cancer: another nasty
retrovirus trick
In the 1970s it was generally thought that many cancers were infectious diseases spread
by viruses. At the time, there was a lot of evidence to suggest that this was the case.
In 1911 Peyton Rous (Nobel 1966) demonstrated that something small enough
to pass through filter paper could cause cancer (sarcoma) in a chicken.
The ‘thing’ that caused the cancer was found to be a retrovirus, which was
designated Rous sarcoma virus (RSV).
It is now understood that viruses do have a role in
about 20% of human cancers. Specifically,
hepatitis B virus (HBV), hepatitis C virus (HCV), and human
papillomaviruses (HPV) trigger some commonly
occurring cancers.
http://jezebel.com
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• ALV is avian leukosis virus, which infects chickens but does not cause tumor formation. ALV
is another example of a retrovirus (that is, it has a similar life cycle to HIV), but it only infects
birds.
• RSV is Rous sarcoma virus, which does cause tumors (sarcomas) in chickens.
• Gardasil (Merck) is the vaccine against HPV (a virus with a DNA genome, not RNA). It was
FDA approved in 2006 for the prevention of cervical cancer. The vaccine consists of the viral capsid
protein assembled into virus-like particles that lack the DNA genome.
src is to blame
Normally, cells cultured in a dish (in this case,
chicken cells) will stop growing when their
edges touch (contact inhibition). Once infected
with RSV, they lose contact inhibition and
grow into a larger mass that is many cells thick.
They also change their shape and gain the
ability to grow without attachment to a surface.
This looks a lot like how we think tumors
must grow inside of a human body.
• Like HIV and ALV, RSV has a simple genome with a gene
for core proteins (gag), a gene for reverse transcriptase
(pol) and a gene for the envelope protein (env).
• Unlike HIV and ALV, RSV has one more gene, which
seemed to be causing the sarcoma to develop. This gene
was designated src.
The src gene must be responsible for causing the sarcoma. Accordingly, src is the classic
example of an oncogene, a gene that can transform a normal cell into a cancer cell.
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
RSV evolved from ALV
[Figure caption, partly lost in extraction: capture of src by avian leukosis virus (ALV). The precise mechanism is not known. In one possible scenario, ALV proviral DNA became integrated (by chance) next to the cellular src gene (c-src) in an infected chicken cell. The ALV provirus and the c-src gene were co-transcribed into a single hybrid RNA. After removal of the c-src introns, the hybrid RNA was packaged into a virus particle that became the ancestor of Rous sarcoma virus (RSV).]
In 1975, researchers got a big surprise: they found that the viral src (v-src) gene is very closely related to a normal chicken gene (c-src).
But there was an even more important lesson to be learned here, this one concerning the c-src gene. This cellular gene, one among tens of thousands in the chicken cell genome, could be converted into a potent viral oncogene following some slight remodeling by a retrovirus such as RSV. Because it was a precursor to an active oncogene, c-src was called a proto-oncogene. The very concept of a proto-oncogene was revolutionary: it implied that the genomes of normal vertebrate cells carry a gene that has the potential, under certain circumstances, to induce cell transformation and thus cancer.
Why does a virus carry a chicken gene?
[Figure: ALV virion → infection → reverse transcription → dsDNA provirus → integration into host cell chromosomal DNA next to c-src → transcription → packaging of hybrid RNA (carrying v-src) into capsid → RSV virion]
RSV evolved from ALV through the accidental incorporation of the
adjacent c-src gene in an ancient chicken.
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• The mechanism by which the src gene causes cells to transform to cancer cells is complex
and still a very active area of investigation. We will address this a bit later in the course.
• Notably, the gene for v-src lacks the introns present in c-src.
RNA can adopt well-defined tertiary structures
Did the person who made this realize they were referring to RNA structure, or did they think it was DNA...?
http://prion.bchs.uh.edu/bp_type/bp_structure.html
• Unlike DNA, which exists primarily in a single, very long three-dimensional structure, the double helix, the
various types of RNA exhibit different conformations. Differences in the sizes and conformations of the
various types of RNA permit them to carry out specific functions in a cell.
• The simplest secondary structures in single-stranded RNAs are formed by pairing of complementary bases.
“Hairpins” are formed by pairing of bases within ~ 5 to 10 nucleotides of each other, and “stem-loops” by
pairing of bases that are separated by ~50 to several hundred nucleotides. These simple folds can
cooperate to form more complicated tertiary structures, one of which is termed a “pseudoknot”. As
discussed on the next page, tRNA molecules adopt a well-defined three-dimensional architecture in solution
that is crucial in protein synthesis.
• Stem-loops, hairpins, and other secondary structures can form by base pairing between distant
complementary segments of an RNA molecule. In stem-loops, the single-stranded loop (dark red) between
the base-paired helical stem (light red) may be hundreds or even thousands of nucleotides long, whereas in
hairpins, the short turn may contain as few as 6 – 8 nucleotides.
• Interactions between the flexible loops may result in further folding to form tertiary structures such as the
pseudoknot. This tertiary structure resembles a figure-eight knot, but the free ends do not pass through the
loops, so no knot is actually formed.
• Q: Why would the 2'-OH in RNA make a B-form helix unstable?
• A: Take a look at the structure of DNA in the Jmol viewer (http://www.rcsb.org/pdb/explore/jmol.do?structureId=1bna&bionumber=1).
At the bottom of the page, change the Style to Ball and Stick. Find the 2' C
on one deoxyribose and try to imagine a hydroxyl group on it. This hydroxyl group would have a steric clash
with the phosphate attached to the 3' OH, or possibly with the base portion of the adjacent 3' nucleotide.
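The hairpin and stem-loop descriptions above boil down to two complementary segments of the same strand pairing in an antiparallel fashion, separated by an unpaired loop. The following is a minimal sketch of that idea, not a structure-prediction method: the function name, the default stem length, and the minimum loop size are all illustrative assumptions, and only Watson-Crick pairs are considered.

```python
# Illustrative search for short antiparallel stems within a single RNA strand.
# ASSUMPTION: only Watson-Crick pairs count; G-U wobble pairs are ignored here.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_stems(seq: str, stem_len: int = 4, min_loop: int = 3):
    """Return (i, j, loop_len) for every pair of stem_len-long segments that can pair.

    seq[i:i+stem_len] is the 5' half of the stem and seq[j:j+stem_len] the 3' half.
    loop_len is the number of unpaired bases between the two halves.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len - min_loop + 1):
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            # Antiparallel pairing: first base of the 5' half pairs with the
            # last base of the 3' half, and so on inward.
            if all((seq[i + k], seq[j + stem_len - 1 - k]) in WATSON_CRICK
                   for k in range(stem_len)):
                hits.append((i, j, j - (i + stem_len)))
    return hits


if __name__ == "__main__":
    # Toy sequence: a GGGG stem, an AAAA loop, and a CCCC stem.
    print(find_stems("GGGGAAAACCCC"))  # [(0, 8, 4)]
```

The reported loop length is what separates the two cases in the notes: a few bases for a hairpin, tens to hundreds for a stem-loop.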
RNA can adopt well-defined tertiary structures
>Yeast phenylalanine tRNA
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUG
AAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAU
UCGCACCA
secondary (2°)
primary (1°)
tertiary (3°)
http://ndbserver.rutgers.edu/atlas/xray/structures/T/tr0001/tr0001.html
• Transfer RNA (abbreviated tRNA) is a small RNA chain (73-93 nucleotides) that transfers a
specific amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis
during translation. It has a 3' terminal site for amino acid attachment. This covalent linkage is
catalyzed by an aminoacyl tRNA synthetase. It also contains a three-base region called the
anticodon that can base pair to the corresponding three-base codon region on mRNA. Each
type of tRNA molecule can be attached to only one type of amino acid, but because the
genetic code contains multiple codons that specify the same amino acid, tRNA molecules
bearing different anticodons may also carry the same amino acid.
• http://en.wikipedia.org/wiki/Transfer_RNA
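The anticodon-codon pairing described above is antiparallel, so the codon is the reverse complement of the anticodon when both are written 5' to 3'. Here is a small sketch using the yeast phenylalanine tRNA sequence from this slide. It assumes the anticodon sits at positions 34-36 (standard tRNA numbering) and uses strict Watson-Crick pairing only, ignoring wobble pairing and modified bases.

```python
# Yeast phenylalanine tRNA, copied from the slide (written 5' -> 3').
YEAST_TRNA_PHE = (
    "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUG"
    "AAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAU"
    "UCGCACCA"
)

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the mRNA codon (5' -> 3') that pairs with an anticodon (5' -> 3').

    Strict Watson-Crick pairing only; wobble pairing is deliberately left out.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


if __name__ == "__main__":
    # ASSUMPTION: anticodon at positions 34-36, i.e. Python slice [33:36].
    anticodon = YEAST_TRNA_PHE[33:36]
    print(anticodon, "pairs with codon", codon_for_anticodon(anticodon))  # GAA, UUC
```

UUC is one of the two codons for phenylalanine, which is what we would hope to see for this tRNA.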
• mFold is a tool that enables the prediction of DNA or RNA secondary structure.
• It has been in operation since 1995, making it one of the oldest bioinformatics tools on the
web.
• http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form
• mFold was developed primarily by Dr. Michael Zuker, now at the Rensselaer Polytechnic
Institute, while he was affiliated with the NRCC and later with Washington University, in St.
Louis.
mFold does a fairly good job of predicting
tRNA 2° structure
Rotate and flip
• These are the results obtained when I submitted the yeast phenylalanine tRNA sequence to the
mfold server
• Yeast phenylalanine tRNA
• GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
• The predicted structures are practically identical to the known structure that has been
experimentally determined and verified using multiple techniques.
• But the true structure is not always the one predicted to have the lowest energy.
• Human phenylalanine tRNA
• GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCUGGUUCGAUCCCGGGUUUCGGCA
• Try it yourself using tRNA sequences found online, or try to make up your own sequence that
folds into a specific shape!
Q. What factor is mFold not taking into account that could explain the difference between
theoretical and experimental 2° structures?
A. The tertiary structure. There could be contacts in three dimensions between regions that are
distant in primary and secondary structure. The contacts could provide additional stabilization
to one particular arrangement of secondary structural elements.
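To make the idea of computational secondary structure prediction concrete, here is a toy sketch. It is not what mFold does: mFold searches for low free energy structures, whereas this sketch swaps in the much simpler Nussinov approach, which just maximizes the number of nested base pairs. The minimum loop size of 3 and the inclusion of G-U pairs are illustrative assumptions.

```python
# Toy secondary structure prediction by base-pair maximization (Nussinov-style).
# This is a stand-in for illustration, NOT the energy model used by mFold.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def pair_table(seq: str, min_loop: int = 3):
    """best[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: base j is unpaired
            for k in range(i, j - min_loop):       # case 2: base j pairs with base k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best


def fold(seq: str, min_loop: int = 3) -> str:
    """Return one optimal structure in dot-bracket notation."""
    best = pair_table(seq, min_loop)
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                                # too short to hold a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))                # j can be left unpaired
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(fold("GGGAAAUCC"))  # (((...)))
```

Like mFold, this works purely from the primary sequence, so it shares the blind spot named in the answer above: it knows nothing about tertiary contacts.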
RNA can have catalytic properties
[Figure labels: 5' exon, self-cleaving intron (a ribozyme), 3' exon, + GTP, intermediate, intact mRNA]
http://www.rcsb.org/pdb/101/motm.do?momID=65
• Just like protein enzymes, RNA can act as a catalyst. In nature, the vast majority of examples
involve RNA acting to chemically cleave or splice itself. Most notably, there are a number
of examples of self-cleaving introns.
• The structure shown here is for a bacterial “group I intron”. This type of intron carries
out its splicing reaction using an extra GTP that acts as the nucleophile to break the
linkage between the 5' exon and the intron. In the second step, the 3' exon is transferred
from the intron to the 5' exon.
• The fact that RNA can act as both a catalyst and an information storage molecule has
led to the formation of the ‘RNA world hypothesis’. This is the idea that the first ‘living’
molecules on Earth may have been RNA-based.
• Researchers have made a variety of unnatural ribozymes in the laboratory.
• Adams et al. (2004) Nature 430: 45-50 (PDB entry 1u6b)
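The two-step splicing reaction described above can be followed as simple bookkeeping on strings. This is only a cartoon of which pieces end up joined to which; it says nothing about the chemistry or the fold of the ribozyme. The function name and the example sequences are made up for illustration, and the GTP cofactor is represented by a single "G".

```python
from typing import NamedTuple


class SpliceResult(NamedTuple):
    mature_rna: str        # 5' exon joined to 3' exon
    released_intron: str   # intron carrying the extra G at its 5' end


def group_i_splice(five_exon: str, intron: str, three_exon: str,
                   cofactor: str = "G") -> SpliceResult:
    """Cartoon of group I intron self-splicing as two strand-transfer steps."""
    # Step 1: the extra guanosine attacks the 5' exon/intron junction.
    # The 5' exon is released, and the G ends up on the 5' end of the intron.
    free_five_exon = five_exon
    intermediate = cofactor + intron + three_exon

    # Step 2: the free 5' exon attacks the intron/3' exon junction.
    # The 3' exon is transferred from the intron onto the 5' exon.
    cut = len(cofactor) + len(intron)
    released_intron = intermediate[:cut]
    mature_rna = free_five_exon + intermediate[cut:]
    return SpliceResult(mature_rna, released_intron)


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to make the pieces easy to tell apart.
    result = group_i_splice("AUGGCU", "GUAAGU", "CCAUAA")
    print(result.mature_rna)       # AUGGCUCCAUAA
    print(result.released_intron)  # GGUAAGU
```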