Biomolecular chemistry
2. RNA and transcription
• Primary Source Material
  • Biochemistry. Berg, Jeremy M.; Tymoczko, John L.; and Stryer, Lubert (courtesy of the NCBI bookshelf)
  • Molecular Cell Biology. Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence; Matsudaira, Paul; Baltimore, David; Darnell, James E. (courtesy of the NCBI bookshelf)
  • Many figures and the descriptions for the figures are from the educational resources provided at the Protein Data Bank (http://www.pdb.org/)
  • Most of these figures and accompanying legends were written by David S. Goodsell of the Scripps Research Institute and are used with permission. I highly recommend browsing the Molecule of the Month series at the PDB (http://www.pdb.org/pdb/101/motm_archive.do)

The Central Dogma
U.S. Department of Energy Human Genome Program (http://www.ornl.gov/hgmis)
• There are many ways of stating the central dogma of molecular biology. Apparently Francis Crick originally defined it like this:
• "The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that information cannot be transferred back from protein to either protein or nucleic acid." (http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology)
• The way that I think of the Central Dogma is: genetic information tends to flow from DNA to RNA to proteins.
• The information stored as DNA becomes useful through gene expression.
• Gene expression means the production of a protein or a functional RNA from its gene.
• Gene expression involves several steps:
  • Transcription: A DNA strand is used as the template to synthesize an RNA strand, which is called the primary transcript.
  • RNA processing: This step involves modifications of the primary transcript to generate a mature mRNA (for protein genes) or a functional tRNA or rRNA. For RNA genes (tRNA and rRNA), expression is complete once a functional tRNA or rRNA is generated.
However, protein genes require additional steps.
  • Nuclear transport: mRNA has to be transported from the nucleus to the cytoplasm for protein synthesis.
  • Protein synthesis: In the cytoplasm, mRNA binds to ribosomes, which synthesize a polypeptide based on the sequence of the mRNA.
• Epigenetic information can be thought of as flowing the other way: changes in the cell (typically in proteins or caused by proteins) that result in changes in gene expression but not in changes in the genetic sequence itself.

The roles of RNA: more than just messengers
[Figure labels: rRNA; and one more… catalytic RNA]
• Messenger RNA (mRNA) is the template for protein synthesis (translation). An mRNA molecule may be produced for each gene or group of genes that is to be expressed in E. coli, whereas a distinct mRNA is produced for each gene in eukaryotes. In E. coli, the average length of an mRNA molecule is about 1.2 kilobases (kb).
• Transfer RNA (tRNA) carries amino acids in an activated form to the ribosome for peptide-bond formation, in a sequence dictated by the mRNA template. There is at least one kind of tRNA for each of the 20 amino acids. Transfer RNA consists of about 75 nucleotides (having a mass of about 25 kDa), which makes it one of the smallest of the RNA molecules discussed here.
• Ribosomal RNA (rRNA), the major component of ribosomes, plays both a catalytic and a structural role in protein synthesis. In E. coli, there are three kinds of rRNA, called 23S, 16S, and 5S RNA because of their sedimentation behaviour. One molecule of each of these species of rRNA is present in each ribosome.
• The first catalytic RNA was discovered by Cech and coworkers in the early 1980s. Many naturally occurring ribozymes have a role in RNA splicing. However, in vitro evolution has resulted in ribozymes with a variety of different functions.
• Q: I was just wondering: suppose you gave us a sequence of DNA, let's say 5'-AATGCCAGT-3', and asked us to give the sequence of the strand of mRNA read from the DNA strand.
Would it be 5'-ACUGGCAUU-3', because when the RNA polymerase is transcribing DNA the strand is read from 3' to 5'? Unless you told us that the DNA strand shown was the coding strand.
• A: If you were given a piece of DNA and no other information, the correct answer would be to assume that it is the coding strand and just replace the Ts with Us. In this case, 5'-AAUGCCAGU-3'. The reason is that we read genes 5' to 3' and, in the absence of other information, you would have to assume that I'd give you the gene in the proper orientation (i.e., the sequence of the coding strand) and as a piece of double-stranded (ds) DNA. Now, the RNA polymerase would actually be 'reading' the other strand (the template), but it would be creating a sequence identical to the coding sequence provided (except with Us instead of Ts). The answer you provided would be correct only if I specified that the sequence provided was that of the template strand (but I probably wouldn't do that).

DNA to RNA (transcription)
The 2006 Nobel Prize in Chemistry was awarded to Roger Kornberg for his work in determining the mechanism of RNA polymerase, including solving this crystal structure.
David S. Goodsell: The Molecule of the Month appearing at the PDB
• RNA synthesis (transcription) is the process of transcribing DNA nucleotide sequence information into RNA sequence information. RNA synthesis is catalyzed by a large enzyme called RNA polymerase. The basic biochemistry of RNA synthesis is common to prokaryotes and eukaryotes, although its regulation is more complex in eukaryotes. Despite substantial differences in size and number of polypeptide subunits, the overall structures of these enzymes are quite similar between prokaryotes and eukaryotes, revealing a common evolutionary origin.
• RNA synthesis takes place in three stages: initiation, elongation, and termination.
RNA polymerase performs multiple functions in this process:
  • It searches DNA for initiation sites, also called promoter sites or simply promoters.
  • It unwinds a short stretch of double-helical DNA to produce a single-stranded DNA template from which it will ‘read’ the sequence.
  • It selects the correct ribonucleoside triphosphate and catalyzes the formation of a phosphodiester bond. RNA polymerase is completely processive: a transcript is synthesized from start to end by a single RNA polymerase molecule.
  • It detects termination signals that specify where a transcript ends.
  • It interacts with activator and repressor proteins that modulate the rate of transcription initiation over a wide dynamic range. These proteins, which play a more prominent role in eukaryotes than in prokaryotes, are called transcription factors.
RNA polymerase is a huge factory with many moving parts. The one shown here, from PDB entry 1i6h, is from yeast (Saccharomyces cerevisiae). It is composed of a dozen different proteins. Together, they form a machine that surrounds DNA strands, unwinds them, and builds an RNA strand based on the sequence of the DNA. Once the enzyme gets started, RNA polymerase continues along the DNA, building RNA strands thousands of nucleotides long.
• In contrast with DNA synthesis, RNA synthesis can start de novo, without the requirement for a primer.
• Most newly synthesized RNA chains carry a highly distinctive tag on the 5’ end: the first base at that end is either pppG or pppA.
• RNA chains, like DNA chains, grow in the 5’ to 3’ direction.
• The Nobel Prize in Chemistry for 2006 went to Roger D. Kornberg of Stanford University, CA, USA "for his studies of the molecular basis of eukaryotic transcription" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2006/index.html)

The transcription bubble
David S. Goodsell: The Molecule of the Month appearing at the PDB
• The region containing RNA polymerase, DNA, and newly synthesized RNA is called a transcription bubble because it contains a locally melted “bubble” of DNA. The newly synthesized RNA forms a hybrid helix with the template DNA strand. This RNA-DNA helix is about 8 bp long, which corresponds to nearly one turn of a double helix (10.4 bp/turn in B-form).
• The 3’ hydroxyl group of the RNA in this hybrid helix is positioned so that it can attack the alpha-phosphorus atom of an incoming ribonucleoside triphosphate. The core enzyme also contains a binding site for the other DNA strand. About 17 bp of DNA are unwound throughout the elongation phase, as in the initiation phase. The transcription bubble moves a distance of 170 Å (17 nm) in a second, which corresponds to a rate of elongation of about 50 nucleotides per second. Although rapid, this is much slower than the rate of DNA synthesis, which is about 800 nucleotides per second.
• As you might expect, RNA polymerase needs to be accurate in its copying of genetic information. To improve its accuracy, it performs a simple proofreading step as it builds an RNA strand. The active site is designed to be able to remove nucleotides as well as add them to the growing strand. The enzyme tends to hover around mismatched nucleotides longer than properly added ones, giving the enzyme time to remove them. This process is somewhat wasteful, since proper nucleotides are also occasionally removed, but this is a small price to pay for creating better RNA transcripts. Overall, RNA polymerase makes an error about once in 10,000 nucleotides added, or about once per RNA strand created. This error rate is about 10,000-fold higher than that of DNA synthesis. The much lower fidelity of RNA synthesis can be tolerated because mistakes are not transmitted to progeny. For most genes, many RNA transcripts are synthesized; a few defective transcripts are unlikely to be harmful.
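The numbers quoted for the transcription bubble can be checked with a little arithmetic. The following is a minimal sketch in Python and is not part of the original notes. The 3.4 Å rise per base pair is an assumed B-form value, and the function names are made up for illustration. It shows that 170 Å per second works out to about 50 nucleotides per second, and that an error rate of 1 in 10,000 gives about one error per strand only for a strand of roughly 10,000 nucleotides.

```python
# Assumed value (not stated in the notes): rise per base pair in B-form DNA.
RISE_PER_BP_ANGSTROM = 3.4


def elongation_rate_nt_per_s(bubble_speed_angstrom_per_s, rise=RISE_PER_BP_ANGSTROM):
    """Convert the speed of the transcription bubble into nucleotides per second."""
    return bubble_speed_angstrom_per_s / rise


def expected_errors(transcript_length_nt, error_rate_per_nt):
    """Average number of misincorporated nucleotides in one transcript."""
    return transcript_length_nt * error_rate_per_nt


def fraction_error_free(transcript_length_nt, error_rate_per_nt):
    """Probability that a transcript has no errors, assuming independent errors."""
    return (1.0 - error_rate_per_nt) ** transcript_length_nt


if __name__ == "__main__":
    print(elongation_rate_nt_per_s(170.0))    # about 50 nt/s
    print(expected_errors(10_000, 1e-4))      # about 1 error per 10,000-nt strand
    print(fraction_error_free(10_000, 1e-4))  # about 0.37
    print(expected_errors(1_200, 1e-4))       # about 0.12 for a 1.2 kb E. coli mRNA
```

Under these assumptions, even a transcript as long as 10,000 nucleotides comes out error-free roughly a third of the time, and a typical 1.2 kb E. coli mRNA is error-free most of the time, which is consistent with the point that a few defective transcripts are unlikely to be harmful.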
• PDB entry 1msw reveals the structure of a very small RNA polymerase that is made by the bacteriophage T7, shown here with blue tubes. A small transcription bubble, composed of two DNA strands and an RNA strand, is bound in the active site. Notice how the two DNA strands form a double helix at the top of the picture. The enzyme separates them in the middle and builds an RNA strand using the DNA on the right. Finally, at the bottom, the two DNA strands come back together.
• This structure was not determined by Roger Kornberg, but rather by Tom Steitz, a famous X-ray crystallographer. Professor Steitz was awarded the Nobel Prize in Chemistry 2009 "for studies of the structure and function of the ribosome" (http://nobelprize.org/nobel_prizes/chemistry/laureates/2009/index.html)
• Q: What does this mean: "many RNA transcripts are synthesized"?
• A: This statement refers to the fact that there is not just one mRNA for each gene. When a gene is being expressed, many RNA polymerases are copying it and making many mRNA molecules. If a couple of them have a mistake, it is probably not a big deal.

Transcription is a highly regulated process
Example: the lac operon
[Figure panels: lactose absent; lactose present]
For transcription to occur, lactose must bind to the lac repressor. This binding changes the conformation of the protein such that it can no longer bind to the operator site and interfere with the function of RNA polymerase.
• With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes. Only a fraction of these genes are turned on at any one time, however, and it is the subset that is "expressed" that confers unique properties to each cell type.
• "Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.
• Biologists study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed, which in turn provides insights into how the cell responds to its changing needs.
• Gene expression is a highly complex and tightly regulated process that allows a cell to respond dynamically both to environmental stimuli and to its own changing needs. This mechanism acts both as an "on/off" switch to control which genes are expressed in a cell and as a "volume control" that increases or decreases the level of expression of particular genes as necessary.
• The lac operon shown in this movie is one of the simplest gene regulation mechanisms, but it is actually a bit more complicated than the extremely simplified version shown here.
• Although not shown here, the CAP protein that we met earlier is also involved in this process. Specifically, CAP is a transcription factor. That is, it binds to a site next to the promoter site and increases the affinity of the DNA for RNA polymerase. This makes RNA polymerase more likely to bind, and so transcription is further increased.
• Q: What is the termination signal for RNA synthesis?
• A: Interestingly, RNA synthesis (transcription) is typically brought to a stop by the formation of an RNA stem-loop structure. The presence of this structure in the newly formed RNA is sufficient to disrupt the interaction between RNA polymerase and the DNA, so the enzyme falls off and synthesis is stopped.

The mechanisms of DNA and RNA elongation are similar
[Figure panels: Active site of DNA polymerase; Active site of RNA polymerase]
• The catalytic site of RNA polymerase resembles that of DNA polymerase in that it includes two metal ions in its active form.
One metal ion remains bound to the enzyme, whereas the other appears to come in with the nucleoside triphosphate and leave with the pyrophosphate. Three conserved aspartate residues of the enzyme participate in binding these metal ions. Note that the overall structures of DNA polymerase and RNA polymerase are quite different; their similar active sites are the products of convergent evolution.
• Q: Do metal ions in the catalytic site of RNA polymerase link to the same thing as in DNA polymerase?
• A: In both enzymes the metal ions in the catalytic site are held in place by aspartic acid residues. However, there do not appear to be any similarities in the overall structures of DNA and RNA polymerases, which shows that the use of the same type of amino acid is an example of convergent evolution.

Transcription is much more complex in eukaryotes than in prokaryotes
• In prokaryotes (bacterial and archaeal cells, defined by the fact that they lack a nucleus), translation of mRNA begins while the transcript is still being synthesized.
• In eukaryotes (animal, plant, and fungal cells, defined by the fact that they have a nucleus), transcription and translation take place in different cellular compartments: transcription takes place in the membrane-bounded nucleus, whereas translation takes place outside the nucleus in the cytoplasm.
• A second major difference between prokaryotes and eukaryotes is the extent of RNA processing. Eukaryotes extensively process nascent pre-mRNA destined to become mature mRNA. Primary transcripts (pre-mRNA molecules), the products of RNA polymerase action, acquire a cap at their 5’ ends and a poly(A) tail at their 3’ ends. Most importantly, nearly all mRNA precursors in higher eukaryotes are spliced.
• Primary transcript: Initial RNA product, containing introns and exons, produced by transcription of DNA. Many primary transcripts must undergo RNA processing to form the physiologically active RNA species.
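To make the path from DNA to primary transcript to mature mRNA concrete, here is a minimal sketch in Python. It is not from the original notes, and the function names, the toy gene, the exon coordinates, and the default tail length are all assumptions chosen for illustration. It reuses the 5'-AATGCCAGT-3' example from the earlier Q&A to show the coding-strand and template-strand conventions, then removes an intron and adds a poly(A) tail. The 5’ cap is represented only by a text label, since it is a modified nucleotide rather than a templated base.

```python
# Base-pairing rule used when RNA is built against a DNA template strand.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_coding(coding_dna):
    """mRNA for a coding strand written 5'->3': same sequence with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template(template_dna):
    """mRNA (5'->3') for a template strand written 5'->3': the reverse complement."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template_dna.upper()))


def process(pre_mrna, exons, tail_length=200):
    """Toy RNA processing: join exons, add a cap label and a poly(A) tail.

    exons is a list of (start, end) positions, 0-based and end-exclusive.
    tail_length=200 is an assumed, illustrative value.
    """
    body = "".join(pre_mrna[start:end] for start, end in exons)
    return {"cap": "m7G", "body": body, "poly_a_tail": "A" * tail_length}


if __name__ == "__main__":
    print(transcribe_coding("AATGCCAGT"))    # AAUGCCAGU
    print(transcribe_template("AATGCCAGT"))  # ACUGGCAUU

    # Hypothetical toy gene: exon 1, an intron beginning GT and ending AG, exon 2.
    toy_gene = "ATGGCC" + "GTAAGTTTCAG" + "GAATAA"
    primary_transcript = transcribe_coding(toy_gene)
    mature = process(primary_transcript, exons=[(0, 6), (17, 23)], tail_length=5)
    print(mature["body"])                    # AUGGCCGAAUAA
```

The two transcription functions give the two answers discussed in the Q&A: the first assumes the sequence provided is the coding strand, and the second assumes it is the template strand.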
• Q: Many pictures only show the mRNA. What about tRNA and rRNA? Do they also go through transcription and processing?
• A: tRNA and rRNA are encoded in the genome and are synthesized by RNA polymerases just like mRNA is. They also undergo processing, but it is different from the processing that occurs for mRNA.
• Q: Once processing is done and mature mRNA is formed, are there any other enzymes that help it leave the nucleus, or is it only the poly(A) tail that helps it do that?
• A: There are many proteins that are involved in the process of getting the mRNA out of the nucleus. There is a link to a very nice overview poster on the website. The ultimate goal of these helper proteins is to get the mRNA through the Nuclear Pore Complex, which is the channel that allows material into or out of the nucleus. As we will see a bit later, movement of mRNA inside the nucleus is primarily through random diffusion, whereas mRNA molecules outside of the nucleus are specifically transported to the location where they will be translated.
• Q: Would you please explain prokaryotes and eukaryotes? Why are they talked about in each part, and why are they always being compared?
• A: The term 'karyote' refers to the nucleus, 'Pro' means 'before', and 'Eu' means 'true'. So prokaryotes are organisms whose cells do not have a nucleus, and eukaryotes are organisms whose cells have a nucleus. The nucleus is a membrane-enclosed structure that holds all of the DNA of the cell. Think of the nucleus as the yolk of an egg. Following this analogy, the white of the egg is the cytoplasm. Bacteria and Archaea are prokaryotes and tend to be single-cell organisms. All complex multicellular life is made up of eukaryotic cells, but there are also many examples of single-cell eukaryotes, such as baker's yeast and amoeba.
It is important to talk about both and compare them because biochemical research often bounces back and forth between these two types of cells, using experiments in one type to draw conclusions about the other. More specifically, researchers often study a process in prokaryotes because it tends to be simpler, and then attempt to extend their results to eukaryotes, which are more relevant to human health concerns (because we are eukaryotes).

Mature eukaryotic vs. prokaryotic mRNA
[Figure panels: Prokaryotic mRNA; Eukaryotic mRNA]
• The 5' cap (also called an RNA cap, an RNA 7-methylguanosine cap, or an RNA m7G cap) is a modified guanine nucleotide that is added to the 5' end of the messenger RNA shortly after the start of transcription. The 5' cap consists of a terminal 7-methylguanosine residue which is linked through a 5'-5'-triphosphate bond to the first transcribed nucleotide. Its presence is critical for recognition by the ribosome and protection from RNases.
• Coding regions are composed of codons, which are decoded and translated into protein by the ribosome. Coding regions begin with the start codon and end with one of three possible stop codons. In addition to their protein-coding role, portions of coding regions may also serve as regulatory sequences.
• Untranslated regions (UTRs) are sections of the RNA before the start codon and after the stop codon that are not translated, termed the five-prime untranslated region (5' UTR) and three-prime untranslated region (3' UTR), respectively. These regions are transcribed as part of the same transcript as the coding region. Several roles in gene expression have been attributed to the untranslated regions, including mRNA stability, mRNA localization, and translational efficiency. The ability of a UTR to perform these functions depends on the sequence of the UTR and can differ between mRNAs.
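The relationship between the 5' UTR, the coding region, and the 3' UTR can be sketched in a few lines of Python. This is an illustrative simplification and not part of the original notes: it assumes translation starts at the first AUG in the sequence, which real ribosomes do not always do, and the function name and example sequence are hypothetical. The three stop codons (UAA, UAG, UGA) are those of the standard genetic code.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def split_mrna(mrna):
    """Split an mRNA (5'->3', without cap or poly(A) tail) into
    (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the first AUG is the start codon.
    Returns None if no start codon or no in-frame stop codon is found.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Read codon by codon, in frame with the start codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


if __name__ == "__main__":
    five_utr, coding, three_utr = split_mrna("GGCACCAUGGCCGAAUAACUUG")
    print(five_utr)   # GGCACC
    print(coding)     # AUGGCCGAAUAA
    print(three_utr)  # CUUG
```

Note that all three pieces come from the same transcript: the UTRs are simply whatever lies outside the stretch from the start codon to the stop codon.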
• The 3' poly(A) tail is a long sequence (often several hundred) of adenine nucleotides added to the "tail" (3' end) of the pre-mRNA. This addition is catalyzed by poly(A) polymerase and does not require a template (it still happens 5’ to 3’, of course).
• Q: I did not understand the “cap” part that you mentioned. Why is it necessary for the processing of mRNA?
• A: The 5' cap on mRNA is important because it is a modification that is recognized by other proteins in the cell and identifies the mRNA as one that should be transported from the nucleus and read by the ribosome. mRNA molecules that do not have a 5' cap are quickly degraded.
• Q: After splicing there are some regions in mature mRNA which are called UTRs. If during splicing all the nucleotides between two exons are cut out as introns, where do the UTR nucleotides come from? I mean, are all the nucleotides in exons translated or not?
• A: The region that is translated (that is, read by the ribosome) is just the nucleotides that start at the start codon and end with the stop codon. Before the start codon and after the stop codon are additional regions called the untranslated regions. As the name implies, these regions are not translated. There probably could be exons and introns in the UTRs as well. Although exons are said to be 'expressed', it is better just to think of them as the parts of the primary transcript that end up in the mature RNA, and not to think about them as the coding region.
• Q: I have a question about the 5' UTR and 3' UTR in the mRNA. Are these parts created during transcription and left untranslated, or are they added in the process of making mature mRNA? If they are added later, is the sequence the same for all mRNAs, or what is the basis of their creation?
• A: The 5' and 3' UTRs are created during transcription. They are encoded within the genome and will be different for every gene.
Essentially, these are the bits of the RNA between the 5' end and the start codon, and after the stop codon but before the poly(A) sequence.
• Adapted from http://en.wikipedia.org/wiki/MRNA

Splicing of mammalian mRNA: Introns and Exons
The primary transcript is ‘spliced’ to form the correct reading sequence of the gene.
One pre-mRNA molecule can be spliced in a variety of different ways, depending on the tissue where it is expressed!
Chapter 1 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• Intron: Part of a primary transcript (or the DNA encoding it) that is removed by splicing during RNA processing and is not included in the mature, functional mRNA, rRNA, or tRNA; also called an intervening sequence.
• Exon: Segment of a eukaryotic gene (or of its primary transcript) that reaches the cytoplasm as part of a mature mRNA, rRNA, or tRNA molecule.
• Introns are precisely excised from primary transcripts, and exons are joined to form mature mRNAs with continuous messages. Alternative splicing enlarges the repertoire of proteins in eukaryotes and is a clear illustration of why the proteome is more complex than the genome.
• The specific example of tissue-dependent splicing shown on this slide is the pre-mRNA molecule for alpha-tropomyosin, which is involved in the ability of a cell to contract. The introns are the black lines and the exons are the blue segments.
• Q: How can a cell accurately sense a decrease in some particular protein and find the pieces of exons in the DNA, since in most cases the introns are much larger than the exons? Is it the protein's specific amino acid sequence, reflecting the complementary mRNA sequence that it wants, that makes this happen without any mistakes?
• A: Introns are recognized by a very large complex of RNA and protein known as the spliceosome.
There are parts of the spliceosome that interact with the 5' GU, the 3' AG, the A at the branch point, and the polypyrimidine tract. Some of these interactions are mediated by RNA-RNA interactions and some are mediated by protein-RNA interactions. Since all introns share these same 4 features, the interaction of these features with the spliceosome is how the cell is able to accurately identify the presence of introns in pre-mRNA.

Splicing of mammalian mRNA: Introns and Exons
• Figure and following legend from Nature Reviews Genetics 5, 389-396 (May 2004): “There are several conserved motifs in the nucleotide sequences near the intron–exon boundaries that act as essential splicing signals: GU and AG dinucleotides at the exon–intron and intron–exon junctions, respectively (5'- and 3'-splice sites), a polypyrimidine tract (Py)n and an A nucleotide at the branch site. Splicing takes place in two transesterification steps. In the first step, the 2'-hydroxyl group of the A residue at the branch site attacks the phosphate at the GU 5'-splice site. This leads to cleavage of the 5' exon from the intron and the formation of a lariat intermediate. In the following step, a second transesterification reaction, which involves the phosphate (p) at the 3' end of the intron and the 3'-hydroxyl group of the detached exon, ligates the two exons. This reaction releases the intron, still in the form of a lariat.”
• Q: I am just wondering why nature has created introns?
• A: Good question. The simplest answer is that they are necessary for the process of 'alternative splicing'. Splicing is a highly regulated process and it turns out that, in response to specific biochemical conditions, the spliceosome can purposely skip an exon during splicing. At the protein level, this can mean that a particular portion (typically one whole domain) of a protein could be missing, or potentially swapped with a different polypeptide sequence.
This process of alternative splicing means that the ~20,000 protein-coding genes of humans actually encode many more than 20,000 proteins (the majority of genes are thought to have alternative splicing options). It is generally thought that alternative splicing is one of the keys for enabling animals as complex as humans to exist despite our relatively small number of genes.

RNA processing in the mitochondria
[Figure labels: RNA polymerase; mTERF1 stops transcription from HSP2]
Human mtDNA only has 3 promoters, 2 for the heavy strand (HSP1 and HSP2) and 1 for the light strand. This means that all of the genes (including mRNA, tRNA, and rRNA) get synthesized as one long strand.
Pearce, S., et al., Mitochondrial diseases: Translation matters, Mol. Cell. Neurosci. (2012)
Rackham et al. The human mitochondrial transcriptome and the RNA-binding proteins that regulate its expression. WIREs RNA 2012, 3: 675-695
• Since the mtDNA is transcribed as one long RNA molecule, it needs to be cut into smaller pieces that represent the gene products.
• The tRNA sections of the RNA will fold up, even in the context of the longer chain.
• RNase enzymes recognize the folded tRNA and cut the RNA at the 5’ and 3’ ends to release the tRNAs. As you can see from the genome, the tRNAs occur between the other coding sequences, so by releasing the tRNAs, the other sequences are also released.
• Further processing gives the final tRNA, rRNA, and mRNA molecules.
• There are no introns in human mitochondrial genes.
• There is no 5’ cap on mitochondrial mRNA, though there is a poly(A) tail.
• For translation, ribosomes are assembled in the mitochondria from the mitochondrial rRNA plus additional proteins encoded in the nuclear genome.

HIV: reverse transcriptase is essential
See the ‘HIV life cycle’ animation on the webpage.
• Retroviruses: these viruses can reverse the flow of genetic information (RNA to DNA rather than from DNA to RNA)!
The most famous retrovirus is human immunodeficiency virus 1 (HIV-1), the cause of AIDS. Retroviruses have two identical copies of a single-stranded RNA genome and an outer envelope containing protruding viral glycoproteins.
• The retroviral envelope fuses directly with the plasma membrane (step 1).
• Following fusion, the nucleocapsid enters the cytoplasm of the cell; then deoxynucleoside triphosphates from the cytosol enter the nucleocapsid, where viral reverse transcriptase and other proteins copy the ssRNA genome of the virus into a dsDNA copy (step 2).
• The viral DNA copy is transported into the nucleus (only one host-cell chromosome is depicted) and integrated into one of many possible sites in the host-cell chromosomal DNA (step 3).
• The integrated viral DNA, referred to as a provirus, is transcribed by the host-cell RNA polymerase, generating mRNAs (dark red) and genomic RNA molecules (light red). The host-cell machinery translates the viral mRNAs into glycoproteins and nucleocapsid proteins (step 4).
• The latter assemble with genomic RNA to form progeny nucleocapsids, which interact with the membrane-bound viral glycoproteins. Eventually the host-cell membrane buds out and progeny virions are pinched off (step 5).
• Q: Is it possible that HIV gets into an intron and gets chopped out when normal transcription happens, since, as you said, introns are much larger than exons?
• A: One of the proteins encoded in the HIV genome is known as Tat. This protein is a transcription factor that promotes enhanced expression of the HIV genes. There is probably always a low level of transcription resulting in the production of a small amount of Tat. As the Tat protein accumulates, there is a positive feedback cycle that results in increased expression of the HIV genes. It is unlikely that the provirus would end up in an intron, though I suppose it is possible.
It is important to keep in mind that an infected individual would have many infected cells, and in each one the provirus will integrate in a different spot. It has been reported that HIV tends to integrate into genes that are highly transcribed.
• Q: Another question: how does the DNA made from the HIV RNA get integrated into the host cell's chromosomal DNA? Does it always get expressed?
• A: The key to the integration of HIV dsDNA (following reverse transcription) is the enzyme integrase. This enzyme is responsible for delivering the dsDNA to the chromosomal DNA, cutting the chromosomal DNA, and inserting the viral DNA into the chromosomal DNA by a strand-swapping mechanism. This movie is a nice overview of the whole process: http://www.msd.com/pro/hiv/mk518/videos/English/moa_video.html

HIV: reverse transcriptase + primer
David S. Goodsell: The Molecule of the Month appearing at the PDB
• Reverse transcriptase performs several different functions. As indicated by the name, it can build DNA strands based on an RNA template. This reaction is performed in the polymerase active site, which is formed by two sets of arms that surround the RNA and DNA. The polymerase site is at the top in this illustration, taken from PDB entry 2hmi. After building the DNA strand, the enzyme then removes the original RNA strand by cleaving it into pieces. This is performed by a nuclease active site, which is located at the opposite end of the enzyme. Finally, it builds a second DNA strand matched to the one that was just created to form the final DNA double helix. This reaction is also performed by the polymerase site.
• Reverse transcriptase performs a remarkable feat, reversing the normal flow of genetic information, but it is rather sloppy in its job. The polymerases used to make DNA and RNA in cells are very accurate and make very few mistakes.
This is essential because they are the caretakers of our genetic information, and mistakes may be passed on to our offspring. Reverse transcriptase, on the other hand, makes lots of mistakes, up to about one in every 2,000 bases that it copies. This high error rate turns out to be an advantage for the virus in this era of drug treatment. The errors allow HIV to mutate rapidly, finding drug-resistant strains in a matter of weeks after treatment begins. Fortunately, recently developed treatments that combine several drugs are often effective in combating this problem. Since the virus is simultaneously attacked by several different drugs, it cannot mutate to evade all of them at the same time.

How do cancer cells escape crisis due to telomere shortening? They express the enzyme telomerase, which can extend their telomeres.
hTERT is the main catalytic subunit of the human telomerase enzyme.
Telomerase activity allows cancer cells to maintain their telomeres and thus avoid crisis.
Telomerase is a reverse transcriptase!
Chapter 10 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• In some cases, the ‘crisis’ point may limit the growth of cancer cells.
• In other cases, cancer cells have gained the ability to regenerate their telomeres by increasing the expression of the enzyme telomerase.
• Telomerase is (of course) a normal human enzyme, but it is usually expressed only during early stages of development and in the testes.
• Telomerase is a complex protein with multiple subunits. One of these subunits is a reverse transcriptase, much like the enzyme made by HIV and other viruses.
• Telomerase also has an RNA portion (451 nucleotides) associated with it, which is essentially the template for reverse transcription.
• HEK cells, like HeLa cells, are a widely used cell line. These cells are derived from normal Human Embryonic Kidney cells that were transformed with viral DNA. They were created in the 1970s by Frank Graham, who was working in Leiden, the Netherlands, at the time and later moved to McMaster University in Canada.
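Both HIV reverse transcriptase and the reverse transcriptase subunit of telomerase build DNA against an RNA template. The base-pairing rule involved can be sketched in Python as follows. This is a minimal illustration and not part of the original notes: the function names are hypothetical, the 10,000-nucleotide template length is an arbitrary round number chosen for the example, and only the 1-in-2,000 error rate comes from the text above.

```python
# Base-pairing rule used when DNA is built against an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template):
    """First-strand DNA (written 5'->3') complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna_template.upper()))


def expected_rt_errors(template_length_nt, error_rate_per_nt=1 / 2000):
    """Average number of copying mistakes, using the 'up to 1 in 2,000' figure."""
    return template_length_nt * error_rate_per_nt


if __name__ == "__main__":
    # The mRNA from the earlier coding-strand example, copied back into DNA.
    print(reverse_transcribe("AAUGCCAGU"))  # ACTGGCATT
    # Hypothetical 10,000-nt RNA template (an assumed, illustrative length).
    print(expected_rt_errors(10_000))       # about 5 mistakes per copy
```

With several mistakes in every copy of a template of this size, it is easy to see how a large population of viruses can quickly explore many sequence variants.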
Speaking of cancer: another nasty retrovirus trick
In the 1970s it was generally thought that many cancers were infectious diseases spread by viruses. At the time, there was a lot of evidence to suggest that this was the case.
In 1911 Peyton Rous (Nobel 1966) demonstrated that something small enough to pass through filter paper could cause cancer (sarcoma) in a chicken. The ‘thing’ that caused the cancer was found to be a retrovirus, which was designated Rous sarcoma virus (RSV).
It is now understood that viruses do have a role in about 20% of human cancers. Specifically, Hepatitis B (HBV), Hepatitis C (HCV), and human papillomaviruses (HPV) trigger some commonly occurring cancers.
http://jezebel.com
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• ALV is avian leukosis virus, which infects chickens but does not cause tumor formation. ALV is another example of a retrovirus (that is, it has a similar life cycle to HIV), but it only infects birds.
• RSV is Rous sarcoma virus, which does cause tumors (sarcomas) in chickens.
• Gardasil (Merck) is the vaccine against HPV (a virus with a DNA genome, not RNA). It was FDA approved in 2006 for the prevention of cervical cancer. The vaccine is the viral capsid protein assembled into virus-like particles that lack the DNA genome.
src is to blame
Normally, cells cultured in a dish (in this case, chicken cells) will stop growing when their edges touch (contact inhibition). Once infected with RSV, they lose contact inhibition and grow into a larger mass that is many cells thick. They also change their shape and gain the ability to grow without attachment to a surface. This looks a lot like how we think tumors must grow inside of a human body.
• Like HIV and ALV, RSV has a simple genome with a gene for core proteins (gag), a gene for reverse transcriptase (pol) and a gene for the envelope protein (env).
• Unlike HIV and ALV, RSV has one more gene, which seemed to be causing the sarcoma to develop.
This gene was designated src. The src gene must be responsible for causing the sarcoma. Accordingly, src is the classic example of an oncogene, a gene that can transform a normal cell into a cancer cell.
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
RSV evolved from ALV
In 1975, researchers got a big surprise: they found that the viral src (v-src) gene is very closely related to a normal chicken gene (c-src). Why does a virus carry a chicken gene?
[Textbook page shown on the slide; the beginning of the excerpt is cut off in the source] "... types of viruses carry genes that have little if any relatedness to DNA sequences native to the cells that they infect (see Sidebar 3.7). [...] But there was an even more important lesson to be learned here, this one concerning the c-src gene. This cellular gene, one among tens of thousands in the chicken cell genome, could be converted into a potent viral oncogene following some slight remodeling by a retrovirus such as RSV. Because it was a precursor to an active oncogene, c-src was called a proto-oncogene. The very concept of a proto-oncogene was revolutionary: it implied that the genomes of normal vertebrate cells carry a gene that has the potential, under certain circumstances, to induce cell transformation and thus cancer."
[Figure legend, left edge cut off in the source: capture of src by avian leukosis virus (ALV). The precise mechanism is not known; in the scenario shown, ALV proviral DNA (red) integrated by chance next to the cellular src gene (c-src, green), the ALV provirus and the src gene were transcribed into a single hybrid RNA, the src introns were removed (not shown), and the hybrid RNA was packaged into a virion that became the ancestor of Rous sarcoma virus (RSV).]
[Figure labels: ALV virion; infection, reverse transcription; dsDNA provirus; integration into host cell chromosomal DNA next to c-src; transcription; packaging of hybrid RNA into capsid; RSV virion carrying v-src]
RSV evolved from ALV through the accidental incorporation of the adjacent c-src gene in an ancient chicken.
Chapter 3 of Robert A. Weinberg, The Biology Of Cancer, Volume 1; Garland Pub, 2007
• The mechanism by which the src gene causes cells to transform to cancer cells is complex and still a very active area of investigation. We will address this a bit later in the course.
• Notably, the gene for v-src lacks the introns present in c-src.
RNA can adopt well-defined tertiary structures
Did the person who made this realize they were referring to RNA structure, or did they think it was DNA...? http://prion.bchs.uh.edu/bp_type/bp_structure.html
• Unlike DNA, which exists primarily in a single, very long three-dimensional structure, the double helix, the various types of RNA exhibit different conformations. Differences in the sizes and conformations of the various types of RNA permit them to carry out specific functions in a cell.
• The simplest secondary structures in single-stranded RNAs are formed by pairing of complementary bases. “Hairpins” are formed by pairing of bases within ~5 to 10 nucleotides of each other, and “stem-loops” by pairing of bases that are separated by ~50 to several hundred nucleotides. These simple folds can cooperate to form more complicated tertiary structures, one of which is termed a “pseudoknot”. As discussed on the next page, tRNA molecules adopt a well-defined three-dimensional architecture in solution that is crucial in protein synthesis.
• Stem-loops, hairpins, and other secondary structures can form by base pairing between distant complementary segments of an RNA molecule. In stem-loops, the single-stranded loop (dark red) between the base-paired helical stem (light red) may be hundreds or even thousands of nucleotides long, whereas in hairpins, the short turn may contain as few as 6 to 8 nucleotides.
• Interactions between the flexible loops may result in further folding to form tertiary structures such as the pseudoknot.
This tertiary structure resembles a figure-eight knot, but the free ends do not pass through the loops, so no knot is actually formed.
• Q: Why does the 2'-OH in RNA make the B-form helix unstable?
• A: Take a look at the structure of DNA in the Jmol viewer (http://www.rcsb.org/pdb/explore/jmol.do?structureId=1bna&bionumber=1). At the bottom of the page, change the Style to Ball and Stick. Find the 2' C on one deoxyribose and try to imagine a hydroxyl group on it. This hydroxyl group would have a steric clash with the phosphate attached to the 3' OH, or possibly with the base portion of the adjacent 3' nucleotide.
RNA can adopt well-defined tertiary structures
>Yeast phenylalanine tRNA
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
[Figure panels: primary (1°), secondary (2°), tertiary (3°)]
http://ndbserver.rutgers.edu/atlas/xray/structures/T/tr0001/tr0001.html
• Transfer RNA (abbreviated tRNA) is a small RNA chain (73-93 nucleotides) that transfers a specific amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis during translation. It has a 3' terminal site for amino acid attachment. This covalent linkage is catalyzed by an aminoacyl tRNA synthetase. It also contains a three-base region called the anticodon that can base pair to the corresponding three-base codon region on mRNA. Each type of tRNA molecule can be attached to only one type of amino acid, but because the genetic code contains multiple codons that specify the same amino acid, tRNA molecules bearing different anticodons may also carry the same amino acid.
• http://en.wikipedia.org/wiki/Transfer_RNA
• mFold is a tool that enables the prediction of DNA or RNA secondary structure.
• It has been in operation since 1995, making it one of the oldest bioinformatics tools on the web.
• http://mfold.rna.albany.edu/?q=mfold/RNA-Folding-Form
• mFold was developed primarily by Dr.
Michael Zuker, now at the Rensselaer Polytechnic Institute, while he was affiliated with the NRCC and later with Washington University in St. Louis.
mFold does a fairly good job of predicting tRNA 2° structure
[Figure annotation: Rotate and flip]
• These are the results obtained when I submitted the yeast phenylalanine tRNA sequence to the mFold server.
• Yeast phenylalanine tRNA
• GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
• The predicted structures are practically identical to the known structure that has been experimentally determined and verified using multiple techniques.
• But the true structure is not always the one predicted to have the lowest energy.
• Human phenylalanine tRNA
• GCCGAAAUAGCUCAGUUGGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCUGGUUCGAUCCCGGGUUUCGGCA
• Try it yourself using tRNA sequences found online, or try to make up your own sequence that folds into a specific shape!
Q. What factor is mFold not taking into account that could explain the difference between theoretical and experimental 2° structures?
A. The tertiary structure. There could be contacts in three dimensions between regions that are distant in primary and secondary structure. These contacts could provide additional stabilization to one particular arrangement of secondary structural elements.
RNA can have catalytic properties
[Figure: a self-cleaving intron (a ribozyme) between the 5' exon and the 3' exon; + GTP, intermediate, intact mRNA. http://www.rcsb.org/pdb/101/motm.do?momID=65]
• Just like enzymes, RNA can act as a catalyst. In nature, the vast majority of examples involve RNA acting to chemically cleave or splice itself. Most notably, there are a number of examples of self-cleaving introns.
• The structure shown here is for a bacterial “group I intron”. This type of intron carries out its splicing reaction using an extra GTP that acts as the nucleophile to break the linkage between the 5' exon and the intron. In the second step, the 3' exon is transferred from the intron to the 5' exon.
• The fact that RNA can act as both a catalyst and an information storage molecule has led to the formulation of the ‘RNA world hypothesis’. This is the idea that the first ‘living’ molecules on Earth may have been RNA-based.
• Researchers have made a variety of unnatural ribozymes in the laboratory.
• Adams et al. (2004) Nature 430: 45-50 (PDB entry 1u6b)
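For anyone who wants to experiment with secondary structure prediction offline, the idea behind the mFold slides can be illustrated with a much simpler method. The Python sketch below is not the mFold algorithm, which minimizes a free energy computed from thermodynamic parameters. Instead it uses the Nussinov dynamic-programming scheme, which just finds the largest number of non-crossing base pairs. The function name, the minimum loop size of 3, and the decision to allow G-U wobble pairs are all assumptions made for illustration.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (non-crossing) base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair,
    which prevents impossibly tight hairpin turns.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, splitting the problem in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# A toy hairpin: three G-C pairs closing a four-base loop.
print(nussinov_max_pairs("GGGAAAACCC"))  # 3

# The yeast phenylalanine tRNA sequence from the slides.
YEAST_PHE_TRNA = (
    "GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGU"
    "GUUCGAUCCACAGAAUUCGCACCA"
)
print(nussinov_max_pairs(YEAST_PHE_TRNA))
```

Because every pair counts equally here, the answer for a real tRNA will generally include more pairs than the cloverleaf actually contains, so it should be read as a toy illustration of the search problem rather than as a prediction. Like mFold, it also ignores tertiary contacts and pseudoknots.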