The genome organisation - The University of Sydney

PHAR lecture 7
page 1
Genome Organisation
Synopsis: If protein-coding portions of the human genome make up only 1.5%, what is the
rest doing? Transcription of rRNA and tRNA by RNA pol I and III. Other non-coding RNAs,
snoRNAs, miRNAs, natural antisense RNAs and their processing and function.
Non-coding RNA
Although very little of the genome codes for proteins, a large amount of it is transcribed; some
estimates have it as high as 90%. This only adds to the paradoxes of the genome. This non-coding
RNA (ncRNA) obviously includes the introns of protein-coding genes, but also non-coding genes (what are
these??) and sequences which are described as antisense to or overlapping protein-coding genes. Let’s
consider some of these ncRNAs. The obvious ones, the structural or, more accurately described,
infrastructural RNAs such as rRNA and tRNA, will be discussed first.
Processing rRNA and tRNA.
Ribosomal RNA in eukaryotes is actually 4 separate RNA species: 28S RNA, 18S RNA, 5.8S RNA
and 5S RNA. The 28S, 18S and 5.8S rRNA are transcribed as a long precursor pre-rRNA of 45S. The
bacterial rRNAs (23S, 16S and 5S) are also transcribed as one long molecule. In eukaryotes the
pre-rRNA is cleaved in a sequence of reactions. Initially the 45S pre-rRNA is modified by 2’ O-ribose
methylation at many sites (humans have 106 sites) and some uridines are converted to pseudouridines. This
process is guided by snoRNAs (we will meet them later). The 5.8S + 28S fragment is cleaved from
the 18S then the 5.8S species is released, although it remains hydrogen bonded to the 28S rRNA. The
rRNA is then modified by methylation at some sites. There are many copies of the ribosomal RNA
sequences in the genome (as there are of the histone genes). Some sequences are required by all cells in
such large quantities that they have multiple copies in the genome.
Transfer RNA is also transcribed as a long precursor containing several tRNAs joined together.
RNase P releases the separate tRNAs by cleavage at the 5’ end of the tRNAs. RNase P is an
interesting enzyme because it contains both RNA and protein and it is the RNA component that is
capable of the RNase activity. It was this enzyme that led scientists to the discovery of ribozymes,
the RNA species capable of catalytic activity. The 3’ ends of all tRNAs carry a CCA; in some tRNAs it
is attached after cleavage, in others the sequence is encoded in the DNA. The attachment is done by
a special enzyme. The CCA is important as this is where the amino acid is attached. Several of the
bases in tRNA molecules are modified at this stage.
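The two end-processing steps just described can be sketched as a toy Python model. The function name, the leader_length argument and the example sequences are all hypothetical, made up for illustration; the real RNase P recognises the folded shape of the tRNA rather than counting nucleotides.

```python
def process_pre_trna(precursor: str, leader_length: int) -> str:
    """Toy model of tRNA end processing (not a real enzyme simulation).

    precursor     -- RNA sequence with a 5' leader still attached
    leader_length -- number of 5' nucleotides to remove (hypothetical input;
                     RNase P finds this site from the tRNA's folded shape)
    """
    if not 0 <= leader_length <= len(precursor):
        raise ValueError("leader_length must lie within the precursor")
    # Step 1: RNase P-like cleavage creates the mature 5' end.
    trimmed = precursor[leader_length:]
    # Step 2: add CCA only if it is not already encoded at the 3' end.
    if not trimmed.endswith("CCA"):
        trimmed += "CCA"
    return trimmed


# Made-up sequences: the first lacks a 3' CCA, the second already has one.
print(process_pre_trna("AAUUGCGGAUUUAGCUCAGG", leader_length=4))
print(process_pre_trna("AAUUGCGGAUUUAGCUCACCA", leader_length=4))
```

Both calls return a sequence ending in CCA, but only the first one has had it added.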
Other non-coding RNAs.
Small nuclear RNAs (snRNAs) form part of the spliceosome which cleaves the introns out of mRNA
precursors. There are 5 snRNAs: U1, U2, U4, U5 and, you guessed it, U6. In case you are wondering
what happened to U3, it turned out to be a snoRNA involved in pre-rRNA processing. These RNA
species are between 50 and 200 nucleotides long and complex with
proteins to form snRNPs (small nuclear ribonucleoprotein particles).
Apart from these traditional structural RNAs there are some 800 other ncRNAs identified in mammals
and some 20 000 putative ncRNAs from cDNA libraries. Possible DNA or RNA targets for these
sequences have largely been identified by their potential to base pair.
snoRNAs are small nucleolar RNAs between 60 and 300 nucleotides in length. They recognise their
target sequence by base pairing and then recruit specialised proteins to perform nucleotide
modifications to these RNAs; usually 2’ O-ribose methylation, base deaminations such as adenine to
inosine conversions and the addition of pseudouridines. These modifications are crucial to ribosome
biogenesis and these were the first demonstrations of a snoRNA role. Since then snoRNAs in
conjunction with snRNAs have been suggested as regulators for alternative splice sites. snoRNAs are
derived from the introns of pre-mRNA transcripts, suggesting that introns are not “junk” DNA.
microRNA (miRNA) and short interfering RNA (siRNA) are very small RNA molecules, ranging
from 21 to 25 nucleotides long. These are very recently discovered RNA species and there is
intense interest in the literature at present concerning the functions of these molecules. They are seen
as the next anti-viral agents, cures for cancer etc., even a replacement for fossil fuels!!! (well not really
but you get the picture). The 2 species are quite similar; the differences come from their source or
origin. MicroRNA comes from short endogenous hairpin loop structures, synthesised by RNA pol II,
often from within introns. After the introns are cut out during splicing the hairpin structures are
processed in the nucleus by an RNase III endonuclease known as Drosha. It cuts the hairpin
out of the longer transcript and the 65–75 nt pre-miRNAs are exported to the cytoplasm by exportin 5 and further
processed by another RNase III endonuclease, Dicer. The mature miRNAs are ~22 nt
duplexes and usually act to repress translation of target mRNA sequences.
siRNAs are similar but are produced from long double stranded RNA molecules or giant hairpin
molecules, often of exogenous origin. This whole process is thought to be part of the cell’s antiviral
defence. Researchers can also introduce their own double stranded RNA. The double stranded
molecules are processed by Dicer. The processed siRNAs can direct the destruction of
endogenous mRNAs of the same sequence. This process, RNA interference (RNAi), has been used very
successfully by scientists to silence genes or knock them down.
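Targeting by sequence can be illustrated with a short Python sketch. It is a toy model only: the function names, the fixed 21 nt fragment size and the mRNA sequence are assumptions made for illustration, and real Dicer products and target recognition are more subtle than an exact string match.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the strand that would base pair with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(strand: str, size: int = 21) -> list[str]:
    """Cut a strand into consecutive `size`-nt fragments (short remainder dropped)."""
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based positions in `mrna` that are perfectly complementary to `guide`."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


# Made-up mRNA; the introduced dsRNA copies a 42 nt stretch of it.
MRNA = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAGUUGAUCGGAUCCAGUCAGCUAA"
sense = MRNA[5:47]
antisense = reverse_complement(sense)

for guide in dice(antisense):
    print(guide, target_sites(guide, MRNA))
```

Each fragment cut from the antisense strand finds its complementary site inside the region of the mRNA that the double stranded RNA was copied from, which is why the introduced RNA silences only the matching gene.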
These small RNA sequences are thought to have a role in post-transcriptional modulation of gene
expression during development and differentiation. The vast number of these small non-coding RNAs,
mostly located in introns, goes a long way to account for the transcription of much of the genome.
Other genomes.
Some organelles have their own genomes, in particular the mitochondrion and the chloroplast. Let’s
cover the mitochondrion first. Mitochondria have themselves evolved from bacteria, after they moved
into larger cells and developed a nice symbiotic relationship with these “host” cells. This process is
known as endosymbiosis. The genetics of the present day mitochondria have evolved from the first
invading bacteria, in particular the circular chromosome.
There can be multiple copies of this circular chromosome in one organelle and the chromosome size
varies from species to species. There is almost a C-value paradox concerning mitochondrial genome
size as well. Humans and most animals have a ~16 kb mitochondrial genome, whereas yeast have an
~80 kb genome and some plants have 200 – 2000 kb mitochondrial chromosomes. Yet all
mitochondrial genomes code for roughly the same small set of proteins; in humans, 13 essential
components of oxidative phosphorylation and the electron transport pathway. Many of these proteins must be
embedded into the membrane as they are translated. As well as these proteins the mitochondrial genome
encodes all the ribosomal RNA (16S and 12S) and tRNAs (22 in total) required to translate these
specialised proteins. That only 22 different tRNAs are required for all the translation poses some
interesting questions. Even with the wobble, using the standard “universal” genetic code you need a
minimum of 31 tRNAs to read all 61 different codons.
How do mitochondria get away with 22 tRNAs?
The mitochondrion has a slightly different code from the standard genetic code, but the differences are
very minor, e.g. UGA is a stop codon in the standard code but codes for trp in the human mitochondrion. However, the discrepancy in the
number of tRNAs required has more to do with some “sloppy” base pairing between the codons and
anticodons in the mitochondrion. The U in the first position of the anticodon can pair with any base in
the 3rd position of the codon; kind of like a wild card. This relaxation means one anticodon can base
pair with 4 codons.
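The figure of 22 can be checked with a little counting. The sketch below assumes the vertebrate mitochondrial code as laid out in NCBI translation table 2 (an outside source, not part of these notes) and a simplified decoding rule: a block of four codons sharing their first two bases needs one tRNA if all four mean the same amino acid (the “wild card” U), otherwise one tRNA for each half of the block (codons ending in U/C, and codons ending in A/G) that codes for an amino acid. The function name is made up for illustration.

```python
# Vertebrate mitochondrial code (NCBI translation table 2), listed with each
# codon position cycling through T, C, A, G; '*' marks a stop codon.
MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def count_trnas(aas: str) -> int:
    """Count tRNAs needed under the simplified rule described in the lead-in."""
    total = 0
    for box in range(16):                       # 16 blocks sharing bases 1 and 2
        four = aas[4 * box:4 * box + 4]         # third base in order T, C, A, G
        if four[0] != "*" and len(set(four)) == 1:
            total += 1                          # one "wild card" tRNA reads all 4
            continue
        for pair in (four[:2], four[2:]):       # pyrimidine half, purine half
            if pair[0] != pair[1]:
                raise ValueError("rule only handles codes that split blocks in halves")
            if pair[0] != "*":
                total += 1                      # stop codons need no tRNA
    return total


print(count_trnas(MITO_AAS))
```

Under these assumptions there are 8 blocks read by a single tRNA, 6 blocks needing two, and 2 blocks where one half is a pair of stop codons, giving 8 + 12 + 2 = 22. The standard code cannot be counted with the same rule, because some of its blocks do not split into clean halves (AUA and UGA, for example).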
The mitochondrial genome comes from the mother. Almost all mitochondria in a fertilised egg come
from the mother, hence maternal genetics can be traced by the mitochondrial genome.
Most of the mitochondrial proteins are encoded in the nuclear genome. These include all the enzymes
needed to synthesise the DNA for the mitochondrial chromosome, the enzymes of the citric acid cycle
and many of the soluble enzymes in oxidative phosphorylation. These proteins are synthesised in the
cytoplasm and transferred to the mitochondria once they have been completely translated. The
proteins destined for the mitochondria have a pre-sequence at the N-terminus. This 15 –
35 amino acid sequence, containing a large number of positive side chains, is recognised by certain
receptors which bring the proteins to the surface of the mitochondrion. The proteins, only partially folded,
are then imported into the organelle, often having to cross 2 membranes. The
pre-sequences are cleaved once the protein is localised and folding is completed with the help of
molecular chaperones.
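As a rough illustration of what “a large number of positive side chains” means, here is a toy Python calculation of the fraction of positively charged residues at the start of a protein. The function name, the 25-residue default window, the choice of lysine and arginine as the positive residues, and the example sequence are all assumptions made for illustration; this is not a real targeting-signal predictor.

```python
# Lysine (K) and arginine (R) carry a positive charge at cellular pH.
POSITIVE = set("KR")


def positive_fraction(protein: str, window: int = 25) -> float:
    """Fraction of K/R residues in the first `window` residues of `protein`."""
    start = protein[:window]
    if not start:
        raise ValueError("protein sequence is empty")
    return sum(residue in POSITIVE for residue in start) / len(start)


# Made-up sequence: a K/R-rich N-terminus followed by an acidic stretch.
example = "MLRRALKRSAARLLGKRAASTGLRA" + "DEEDLLEAVDELLEDAGEVSDEEAL"

print(positive_fraction(example))               # N-terminal 25 residues
print(positive_fraction(example[25:]))          # the rest of the protein
```

The first number is much larger than the second, which is the kind of contrast the import receptors are thought to pick up on.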
The chloroplast genome is similar to the mitochondrial chromosome, reflecting its similar
bacterial origin. The circular chromosome is larger than its mitochondrial counterpart, encoding 30
membrane proteins required for photosynthesis, the four components of the bacterial ribosomal
system (23S, 16S, 5S and 4.5S), 20 ribosomal proteins and 30 tRNAs. This is enough tRNAs to
translate the universal genetic code. An interesting protein coded for in the chloroplast genome is the
large subunit of the enzyme ribulose bisphosphate carboxylase or Rubisco. This critical enzyme
carboxylates ribulose 1,5-bisphosphate using gaseous CO2 and is responsible for carbon fixation. It is
also the most abundant protein on earth and one of its subunits is coded on the chloroplast genome!!
A vestigial chloroplast (the apicoplast) is also of importance to the malaria parasite, Plasmodium. It codes for
certain enzymes in fat synthesis and is of interest as a drug target.