PHAR lecture 7 page 1

Genome Organisation

Synopsis: If the protein-coding portions of the human genome make up only 1.5%, what is the rest doing? Transcription of rRNA and tRNA by RNA pol I and III. Other non-coding RNAs (snoRNAs, miRNAs, natural antisense RNAs) and their processing and function.

Non-coding RNA

Although very little of the genome codes for proteins, a large amount of it is transcribed; some estimates put the figure as high as 90%. This only adds to the paradoxes of the genome. This non-coding RNA (ncRNA) consists, obviously, of the introns of protein-coding genes, but also of non-coding genes (what are these??) and of sequences described as antisense to, or overlapping, protein-coding genes. Let’s consider some of these ncRNAs. The obvious ones, the structural or, more accurately, infrastructural RNAs such as rRNA and tRNA, will be discussed first.

Processing rRNA and tRNA.

Ribosomal RNA in eukaryotes is actually 4 separate RNA species: 28S RNA, 18S RNA, 5.8S RNA and 5S RNA. The 28S, 18S and 5.8S rRNAs are transcribed by RNA pol I as one long 45S precursor (pre-rRNA); the 5S rRNA is transcribed separately by RNA pol III. The bacterial rRNAs (23S, 16S and 5S) are also transcribed as one long molecule. In eukaryotes the pre-rRNA is cleaved in a sequence of reactions. Initially the 45S pre-rRNA is modified by 2’-O-ribose methylation at many sites (humans have 106 sites) and some uridines are converted to pseudouridines. This process is guided by snoRNAs (we will meet them later). The 5.8S + 28S fragment is cleaved from the 18S, then the 5.8S species is released, although it remains hydrogen bonded to the 28S rRNA. The rRNA is then further modified by methylation at some sites.

There are many copies of the ribosomal RNA sequences in the genome (as there are of the histone genes). Some sequences are required by all cells in such large quantities that they have multiple copies in the genome.

Transfer RNA is also transcribed as a long precursor containing several tRNAs joined together.
RNase P releases the separate tRNAs by cleavage at the 5’ end of each tRNA. RNase P is an interesting enzyme because it contains both RNA and protein, and it is the RNA component that carries the RNase activity. It was this enzyme that led scientists to the discovery of ribozymes, RNA species capable of catalytic activity. The 3’ ends of all tRNAs carry a CCA sequence; in some tRNAs it is encoded in the DNA, in others it is attached after cleavage by a special enzyme. The CCA is important as this is where the amino acid is attached. Several of the bases in tRNA molecules are modified at this stage.

Other non-coding RNAs.

Small nuclear RNAs (snRNAs) form part of the spliceosome, which cuts the introns out of mRNA precursors. There are 5 snRNAs: U1, U2, U4, U5 and, you guessed it, U6. In case you wonder what happened to U3: it is a small nucleolar RNA involved in pre-rRNA processing. These RNA species are between 50 and 200 nucleotides long and complex with proteins to form snRNPs (small nuclear ribonucleoprotein particles).

Apart from these traditional structural RNAs there are some 800 other ncRNAs identified in mammals and some 20 000 putative ncRNAs from cDNA libraries. Possible DNA or RNA targets for these sequences have largely been identified by their potential to base pair.

snoRNAs are small nucleolar RNAs between 60 and 300 nucleotides in length. They recognise their target sequence by base pairing and then recruit specialised proteins to perform nucleotide modifications on these RNAs; usually 2’-O-ribose methylation, base deaminations such as adenine to inosine conversions and the addition of pseudouridines. These modifications are crucial to ribosome biogenesis and were the first demonstrations of a snoRNA role. Since then snoRNAs, in conjunction with snRNAs, have been suggested as regulators of alternative splice sites. snoRNAs are derived from the introns of pre-mRNA transcripts, suggesting that introns are not “junk” DNA.
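The idea that targets are found "by their potential to base pair" can be made concrete with a small sketch. This is only an illustration, assuming perfect Watson-Crick pairing over the whole guide; real guide RNAs tolerate mismatches and G-U pairs. The function names and both sequences are made up for the example and do not correspond to any real snoRNA or target.

```python
# Watson-Crick partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence (5'->3') that pairs antiparallel with rna (5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_target_sites(guide, target):
    """Return 0-based start positions in target where guide can pair perfectly.

    Assumption for this sketch: every base must pair (no mismatches, no G-U).
    """
    site = reverse_complement(guide)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further along so overlapping sites are also found.
        start = target.find(site, start + 1)
    return positions


if __name__ == "__main__":
    guide = "GCAUUCGA"                  # hypothetical guide sequence
    target = "AAAUCGAAUGCGGGUCGAAUGC"   # hypothetical target RNA
    print(reverse_complement(guide))         # UCGAAUGC
    print(find_target_sites(guide, target))  # [3, 14]
```

Searching for the reverse complement, rather than the guide itself, reflects the antiparallel orientation of the two paired strands.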
microRNA (miRNA) and short interfering RNA (siRNA) are very small RNA molecules, between 21 and 25 nucleotides long. These are very recently discovered RNA species and there is intense interest in the literature at present concerning their functions. They are seen as the next anti-viral agents, cures for cancer etc., even a replacement for fossil fuels!!! (well, not really, but you get the picture). The 2 species are quite similar; the differences lie in their origin.

MicroRNA comes from short endogenous hairpin loop structures, synthesised by RNA pol II, often from within introns. After the introns are cut out during splicing, the hairpin structures are processed in the nucleus by an RNase III endonuclease known as Drosha. It cuts the hairpin free, and the resulting 65 – 75 nt pre-miRNAs are exported to the cytoplasm by exportin 5 and further processed by another RNase III endonuclease, Dicer. The mature miRNAs are ~22 nt duplexes and usually act to repress translation of target mRNA sequences.

siRNAs are similar but are produced from long double-stranded RNA molecules or giant hairpin molecules, often of exogenous origin. This whole process is thought to be part of the cell’s antiviral defence. Researchers can also introduce their own double-stranded RNA. The double-stranded molecules are processed by Dicer. The processed siRNAs direct the destruction of endogenous mRNAs of the same sequence, a process known as RNA interference (RNAi), and this has been used very successfully by scientists to silence genes or knock them down.

These small RNA sequences are thought to have a role in post-transcriptional modulation of gene expression during development and differentiation. The vast number of these small non-coding RNAs, mostly located in introns, goes a long way to account for the transcription of much of the genome.

Other genomes.

Some organelles have their own genomes, in particular the mitochondrion and the chloroplast.
Let’s cover the mitochondrion first. Mitochondria evolved from bacteria that moved into larger cells and developed a nice symbiotic relationship with these “host” cells. This process is known as endosymbiosis. The genetics of present-day mitochondria derive from those first invading bacteria, in particular the circular chromosome. There can be multiple copies of this circular chromosome in one organelle, and the chromosome size varies from species to species. There is almost a C-value paradox concerning mitochondrial genome size as well. Humans and most animals have a ~16 kb mitochondrial genome, whereas yeast have an ~80 kb genome and some plants have 200 – 2000 kb mitochondrial chromosomes. Yet all mitochondrial genomes code for roughly the same number of genes: 13 essential components of oxidative phosphorylation and the electron transport pathway. Many of these proteins must be embedded into the membrane as they are translated. As well as these proteins, the mitochondrial genome encodes all the ribosomal RNAs (16S and 12S) and tRNAs (22 in total) required to translate these specialised proteins.

That only 22 different tRNAs are required for all the translation poses some interesting questions. Even with wobble, using the standard “universal” genetic code you need a minimum of 31 tRNAs to read all 61 sense codons. How do mitochondria get away with 22 tRNAs? The mitochondrial code differs slightly from the regular genetic code, but the differences are very minor, e.g. UGA is a stop codon in the standard code but codes for Trp in the human mitochondrion. However, the discrepancy in the number of tRNAs required has more to do with some “sloppy” base pairing between the codons and anticodons in the mitochondrion. A U in the first position of the anticodon can pair with any base in the 3rd position of the codon, kind of like a wild card. This relaxation means one anticodon can base pair with 4 codons.
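The "wild card" rule can be sketched in a few lines of code. This is a simplified illustration, not a full model of decoding: the standard wobble table below leaves out inosine, and the only code-table deviation included is the UGA example given above. The names `codons_read`, `STANDARD_WOBBLE`, `MITO_WOBBLE` and `UGA_MEANING` are invented for this example.

```python
# Watson-Crick partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases readable by each anticodon first-position base.
# Standard wobble rules (inosine omitted for brevity).
STANDARD_WOBBLE = {"U": "AG", "G": "CU", "C": "G", "A": "U"}
# Relaxed mitochondrial rule from the text: U acts as a wild card.
MITO_WOBBLE = dict(STANDARD_WOBBLE, U="UCAG")


def codons_read(anticodon, wobble):
    """List the codons (5'->3') that an anticodon (5'->3') can read."""
    first, second, third = anticodon
    # Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    # anticodon base 2 with codon base 2, both strictly Watson-Crick.
    prefix = PAIR[third] + PAIR[second]
    # Anticodon base 1 pairs with codon base 3 according to the wobble table.
    return [prefix + base for base in wobble[first]]


# The single code-table deviation mentioned in the text.
UGA_MEANING = {"standard": "Stop", "human mitochondrial": "Trp"}

if __name__ == "__main__":
    anticodon = "UAG"  # reads codons of the CUN (leucine) family
    print(codons_read(anticodon, STANDARD_WOBBLE))  # ['CUA', 'CUG']
    print(codons_read(anticodon, MITO_WOBBLE))      # ['CUU', 'CUC', 'CUA', 'CUG']
```

Under the standard rules the same anticodon reads only two of the four CUN codons, so a second tRNA is needed for the family; with the relaxed rule one tRNA covers all four, which is where the saving in tRNA number comes from.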
The mitochondrial genome comes from the mother. Almost all mitochondria in a fertilised egg come from the mother, hence maternal genetics can be traced by the mitochondrial genome.

Most of the mitochondrial proteins are encoded in the nuclear genome. These include all the enzymes needed to synthesise the DNA of the mitochondrial chromosome, the enzymes of the citric acid cycle and many of the soluble enzymes in oxidative phosphorylation. These proteins are synthesised in the cytoplasm and transferred to the mitochondria once they have been completely translated. Proteins destined for the mitochondria carry a pre-sequence at the N-terminus. This 15 – 35 amino acid sequence, containing a large number of positively charged side chains, is recognised by certain receptors which bring the proteins to the surface of the mitochondrion. The proteins, only partially folded, are then imported into the organelle, often having to cross 2 membranes. The pre-sequences are cleaved once the protein is localised, and folding is completed with the help of molecular chaperones.

The chloroplast genome is similar to the mitochondrial chromosome, reflecting its similar bacterial origin. The circular chromosome is larger than its mitochondrial counterpart, encoding 30 membrane proteins required for photosynthesis, the four rRNAs of the bacterial-type ribosomal system (23S, 16S, 5S and 4.5S), 20 ribosomal proteins and 30 tRNAs. This is enough tRNAs to translate the universal genetic code. The interesting protein coded for in the chloroplast genome is one of the subunits of the enzyme ribulose bisphosphate carboxylase, or Rubisco. This critical enzyme carboxylates ribulose 1,5-bisphosphate using gaseous CO2 and is responsible for carbon fixation. It is also the most abundant protein on earth, and one of its subunits is coded on the chloroplast genome!!

The vestigial chloroplast is also of importance to the malaria parasite, Plasmodium.
It houses certain enzymes of fatty acid synthesis and is of interest as a drug target.