Uploaded by Nur Alfiani Fauziah Siagian

Molecular Biology II notes

MOLECULAR BIOLOGY II
THE PROGRAM:
The genome of eukaryotes and the complexity of genomes. Gene density. Histone-DNA
interactions, histone types. Chromatin organisation: nucleosomes, solenoids, loops.
Post-translational modifications of histones: bromodomain and chromodomain.
The remodeling complexes.
Replication: replication mechanism: semi-conservative replication, Okazaki fragments.
Eukaryotic DNA polymerases. The origins of replication: the ARS. The replication complex: ORC,
primases, helicases, SSB, topoisomerases, sliding clamps. Telomerase. Transposition: DNA
transposons and RNA transposons.
The transposition mechanisms. V(D)J recombination.
Transcription: the mechanism of transcription. Eukaryotic RNA polymerases. Structure of the
eukaryotic promoter: promoters for Pol II: the TATA box and the TFII factors.
The transcription complex: mediator complex, regulatory proteins and enzymes that modify
chromatin. Sites for regulation: enhancer, silencer and insulator. DNA binding proteins: DNA
binding domain and activation domain; leucine zipper, zinc finger, helix-loop-helix.
Promoters for Pol I and for Pol III. Maturation of messenger RNA: capping and
polyadenylation.
Splicing: mechanisms and function of group I and group II introns and of the spliceosome.
Alternative splicing. Splicing regulation: SR and hnRNP proteins.
Editing of mRNA: site-specific deamination.
Translation: translation mechanism in eukaryotes: initiation phase and factors involved. The
types of RNA involved in translation. The eukaryotic ribosome. The aminoacyl-tRNA synthetases.
The genetic code. Reading frame and ORF. Translation control.
Small regulatory RNAs: microRNAs and non-coding RNAs.
Organization and packaging of eukaryotic DNA
The DNA molecules that make up the chromosomes are closely associated with proteins, forming
the nucleoprotein complex called chromatin. The proteins associated with DNA are mostly small
(roughly 11-21 kDa) and basic: the histones. The remaining proteins are non-histone proteins,
such as the High Mobility Group (HMG) proteins. The histones are organized in octamers around
which the DNA is wrapped, forming the lowest level of chromatin compaction: the 10 nm fiber.
Experiments in which chromatin was digested with nucleases showed that the tract of DNA
associated with each histone octamer is usually about 200 bp long, of which 146-147 bp adhere
closely to the proteins through hydrogen bonds (about 140, mostly in the minor groove), while
the remaining portion of DNA is called the linker and can have different lengths.
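The figures above lend themselves to a quick calculation. The following is a minimal sketch, assuming the 200 bp repeat and 147 bp core quoted in these notes and a round haploid genome size of 3,000,000,000 bp (an illustrative assumption, not a value from the notes); the function names are hypothetical.

```python
REPEAT_BP = 200   # DNA per nucleosome repeat, as quoted in the notes
CORE_BP = 147     # DNA wrapped on the histone octamer, as quoted in the notes


def linker_length(repeat_bp=REPEAT_BP, core_bp=CORE_BP):
    """Average linker DNA per nucleosome, in bp."""
    return repeat_bp - core_bp


def nucleosome_count(genome_bp, repeat_bp=REPEAT_BP):
    """Approximate number of nucleosomes needed to package genome_bp."""
    return genome_bp // repeat_bp


print(linker_length())                   # 53
print(nucleosome_count(3_000_000_000))   # 15000000 (assumed genome size)
```

Since the linker length varies between organisms and cell types, the result is only an order-of-magnitude estimate.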
The histone octamer is formed by two H2A-H2B heterodimers and one tetramer of H3 and H4
(the more highly conserved among species). H1 is another histone that is not part of the
core but still contributes to compacting the DNA on the nucleosome: by interacting with the
linker DNA it constrains the DNA so that it adheres better to the core, and it promotes the
second level of compaction: the 30 nm fiber. The 30 nm fiber is described either as a helical
solenoid or as a zigzag, and it is in turn organized into loops that attach to the nuclear
matrix or to the chromosome scaffold. Topo II and the SMC (structural maintenance of
chromosomes) proteins are essential for this interaction.
Histone proteins are rich in lysine and arginine, which are positively charged. The core
histones have a highly conserved domain called the histone fold, consisting of 3 alpha-helices,
which allows them to assemble into dimers or tetramers and then onto DNA. They also have an
N-terminal tail that has no structural function but is a target for chemical modifications
that, through interaction with specific enzymes, regulate chromatin accessibility. The tails
undergo post-translational modifications:
acetylation of lysine (K) and methylation of lysine (K) and arginine (R)
phosphorylation of serine (S) and threonine (T).
Different modifications on different amino acid residues have different functional meanings, e.g.:
acetylation on K8 of H2 and on K14 of H3 -> transcriptionally active state
methylation on K9 of H3 and absence of acetylation -> inactive state
The modifications of histone tails depend on enzyme complexes such as
1. acetyltransferases and deacetylases
2. methyltransferases and demethylases
3. kinases and phosphatases
These modifications are selectively recognized by specific protein domains such as
bromodomains (acetylated lysine) and chromodomains (methylated lysine). The modifications
alter the charge of the histones, increasing or decreasing their interaction with DNA, but
they are not sufficient to move the nucleosome so as to make the gene accessible to
transcription factors. This role is played by remodeling complexes, which use the energy
released by ATP hydrolysis to slide or eject nucleosomes.
Chromatin (DNA + protein) undergoes marked organizational changes during the various
phases of the cell cycle. Interphase chromatin (G1-S-G2) is largely dispersed in the nucleus
(euchromatin), and the DNA associated with histones is organized in loops anchored at their
base to a filamentous structure called the nuclear matrix. In mitotic chromosomes (M), which
show the greatest degree of compaction and are therefore visible under an optical microscope,
the chromatin loops are anchored to a scaffold of specific proteins. To allow the
DNA to be anchored to these protein structures, specific recognition sequences are present on
the DNA, called MAR (Matrix Attachment Region) if they bind the nuclear matrix, and SAR
(Scaffold ...) if they bind the protein scaffold. MAR sequences usually have numerous
recognition sites for Topo II. Interphase chromatin also contains regions with different
levels of compaction; the more densely organized regions define heterochromatin:
- facultative heterochromatin: present in a region that has coding capacity, but which is expressed
only in certain conditions, or in certain tissues rather than in others
- constitutive heterochromatin: it always remains compact and has a mainly structural role rather
than coding (includes satellite DNA in the centromeric regions)
To create a functional artificial (linear) chromosome, the three essential components are: origin of
replication, centromere and telomeres.
The centromere is essential for the correct segregation of the sister chromatids; it consists
of a protein complex assembled on specific DNA sequences. The kinetochore, a fibrous structure
to which the microtubules attach, is part of this protein complex.
The centromere sequences of S. cerevisiae were the first to be isolated and sequenced; about
120 bp long, they are divided into three elements: a central CDE-II flanked by CDE-I and
CDE-III. CDE-II is about 90 bp long and is about 90% A-T, while CDE-I and CDE-III are very
conserved sequences of 9 and 11 bp. Each of these regions interacts with specific proteins of
the centromere protein complex. Centromeric nucleosomes contain a variant of histone H3,
CENP-A, with a more extensive N-terminal tail that allows better interaction with kinetochore
proteins.
Telomeres stabilize the ends of linear DNA molecules and prevent their degradation
(which is connected to cellular aging and apoptosis). Telomeric DNA consists of tandemly
repeated sequences that interact with specific proteins (TTAGGG Repeat binding Factors 1 and 2
in mammals) and form T-loop structures that protect the ends.
The human genome is about 250 times larger than that of yeast, but the number of
protein-coding genes grows far less across species (e.g. about 6,000 in S. cerevisiae and
about 23,000 in H. sapiens): this implies that the differences in genome size are essentially
due to the non-coding portions of DNA.
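The comparison can be made explicit with a short calculation. This is only a sketch using the round figures quoted in these notes (a 250-fold difference in genome size, 6,000 and 23,000 genes); the variable and function names are hypothetical.

```python
def fold_change(larger, smaller):
    """How many times 'larger' exceeds 'smaller'."""
    return larger / smaller


genome_ratio = 250.0                      # human vs yeast genome size (from the notes)
gene_ratio = fold_change(23_000, 6_000)   # human vs yeast gene number (from the notes)

# If genome size grows 250-fold but gene number grows less than 4-fold,
# the number of genes per unit of DNA must fall by the ratio of the two.
gene_density_drop = genome_ratio / gene_ratio

print(round(gene_ratio, 2))         # 3.83
print(round(gene_density_drop, 1))  # 65.2
```

In other words, with these figures the gene density of the human genome is roughly 65 times lower than that of yeast.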
Eukaryotic genes are generally discontinuous: the coding portions, the exons, are interrupted
on average by 8-9 introns per gene. Exonic DNA constitutes just over 1% of
the human genome. Although the intronic portions are not translated, they contribute to the
expansion of the transcriptome and the proteome, thanks to alternative splicing, which
allows the exons of a given gene to be combined in different ways,
giving different protein products: e.g. for the caspase-9 gene, CASP9 induces apoptosis while
CASP9S inhibits apoptosis. Another mechanism is exon shuffling, in which exons of two
different genes are reassembled through illegitimate crossing over and give rise to a
chimeric protein product.
Among the non-coding DNA there are also pseudogenes, i.e. non-functional copies of genes
inactivated by frameshift mutations or premature stop codons. An example is the family of
olfactory receptor genes, which is very large in mammals; in humans more than
half of its members have turned into pseudogenes. Although they do not encode a functional
protein, pseudogenes may still have functional roles: for example, in M. musculus the
inactivation of a Makorin1 pseudogene has been reported to induce severe developmental
anomalies.
Another portion of non-coding DNA is represented by satellite DNA, a highly repetitive fraction
of the genome consisting of thousands of tandem repetitions of short sequences. These regions
can be identified by density-gradient centrifugation in CsCl because they have a lower % of
G + C than bulk DNA (normally around 40%), so they form a separate band. In human DNA most of
the satellite DNA is found in the centromeric regions.
A further reason for the larger size of the human genome compared with that of simpler
organisms is the presence of gene families: these include clusters
of homeotic genes that derive from modifications of the same ancestral gene.
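The G + C percentage that separates satellite DNA from bulk DNA is straightforward to compute from a sequence. A minimal sketch follows; the repeat unit used in the example is a made-up sequence chosen only to illustrate a low G + C content, not a real satellite.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Hypothetical 5-bp repeat unit, tandemly repeated 10 times.
satellite_like = "AAGAT" * 10
print(gc_fraction(satellite_like))  # 0.2, well below the ~40% quoted for bulk DNA
```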
The Replication
Eukaryotic organisms have multiple origins of replication, whose number is linked to the size
of the genome itself, since a single origin would make the replication of the
entire genome far too slow. However, not all origins are
activated at each replicative cycle: the firing of a given origin is genetically controlled.
In yeast the origins of replication are called ARS, and about 350 are found. ARS contain a
consensus sequence (A) essential for binding the initiator proteins, flanked by A-T rich
sequences that favour the separation of the double helix. In eukaryotes too, replication
proceeds bidirectionally from the origin, with a leading strand and a lagging strand.
Replication is slower in eukaryotes and the Okazaki fragments are shorter than in prokaryotes
(100-200 nt).
The number of replicative proteins involved in the process is much higher in eukaryotes.
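The benefit of multiple origins can be illustrated with a rough estimate. The sketch below assumes evenly spaced origins that all fire at once and bidirectionally; the yeast genome size (12,000,000 bp) and the fork rate (50 bp/s) are illustrative assumptions, not values from these notes, and since not every origin fires in each cycle the multi-origin figure is a lower bound.

```python
def replication_time_s(genome_bp, n_origins, fork_rate_bp_per_s):
    """Idealized time (seconds) to replicate a genome.

    Assumes evenly spaced origins firing simultaneously, each producing
    two forks that move in opposite directions at a constant rate.
    """
    if n_origins < 1 or fork_rate_bp_per_s <= 0:
        raise ValueError("need at least one origin and a positive fork rate")
    return genome_bp / (2 * n_origins * fork_rate_bp_per_s)


GENOME_BP = 12_000_000   # assumed yeast genome size
FORK_RATE = 50           # assumed fork rate in bp/s

single = replication_time_s(GENOME_BP, 1, FORK_RATE)
many = replication_time_s(GENOME_BP, 350, FORK_RATE)   # ~350 ARS, from the notes

print(single / 3600)   # hours with a single origin
print(many / 60)       # minutes with 350 origins
```

Under these assumptions a single origin would need more than a day, while 350 origins bring the idealized time down to a few minutes.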
DNA replication takes place only during the S phase of the cell cycle. For it to occur
correctly, the replisome proteins must act at precise moments of the cell
cycle, so that each region of the genome is replicated only once per cycle;
this is ensured by the interplay of cyclins and cyclin-dependent kinases (CDKs). During the G1
phase the pre-replicative complex is formed on the origins bound by ORC: this is mediated by
two key replication-control proteins, CDC6 and CDT1, which are required for the loading of the
MCM2-7 helicase. When the CDKs phosphorylate CDC6 and CDT1, these detach and the helicase is
activated, leading to the opening of the origin and the assembly of the entire replicative
complex. CDK activity depends on the concentrations of specific cyclins during the cycle: it
is low during the G1 phase, when the pre-replicative complexes are assembled, and high from
the S phase onwards, which prevents new complexes from being loaded and therefore prevents a
second round of replication.
When the cell enters S phase the CDKs increase their activity and allow the start of
replication; their activity then decreases again at the end of M phase. Once the replication
origin is opened, RPA (replication protein A), which has high affinity for ssDNA, binds the
exposed strands. The beginning of DNA synthesis requires the Pol alpha-primase complex
(Prim1-2), which synthesizes short RNA-DNA primers. At this stage the polymerase switch takes
place: Pol alpha detaches to allow the loading of Pol delta and Pol epsilon, which are much
more processive, on the lagging and leading strands. This exchange is mediated by PCNA (the
sliding clamp), which is loaded onto the primed DNA thanks to the ATPase activity of RFC.
Meanwhile, Pol alpha-primase moves along the lagging strand to make the primer for the next
Okazaki fragment. Moreover, Pol epsilon and Pol delta have a 3'->5' exonuclease (proofreading)
activity that Pol alpha lacks.
The terminal regions of eukaryotic chromosomes, known as telomeres, would easily undergo
degradation, so it is important to keep them intact and replicate them without loss of genetic
information.
Telomere length is maintained by the enzyme telomerase.
The ends of the telomere have a protruding 3' end, with a sequence repeated
many times in tandem; in humans it is 5'-TTAGGG-3'.
Telomerase is a ribonucleoprotein formed by two main components:
a TERT protein that acts as a reverse transcriptase, synthesizing DNA from RNA, associated
with other accessory proteins (Est, from "ever shorter telomeres")
and a TERC component, an RNA that serves as the template.
Telomerase does not extend the recessed 5' strand, but rather keeps extending the
protruding 3' end. To do this, the 3' end pairs with the complementary sequence of the TERC
template RNA, while the protein component elongates it by synthesizing DNA. Once the
protruding strand is long enough, Pol alpha-primase intervenes and fills in the complementary
strand (the one ending in 5'), which nevertheless leaves a 3' overhang when the RNA primer is
removed by RNase.
Once the 3' strand has been extended, it can pair with complementary sequences further in,
forming a T-loop, stabilized by protein factors such as POT1, TRF1, TRF2, TIN2 and RAP1.
Telomerase activity is studied both from the point of view of cellular
aging and as a possible antitumor target, since some tumors reactivate the enzyme.
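The way an RNA template dictates the repeat added to the 3' overhang can be sketched in a few lines. This is a simplified illustration: the 6-nt template used here is an assumption chosen to be exactly complementary to one TTAGGG repeat (the real TERC template is longer), and the function names are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(template_rna_5to3):
    """Return the DNA strand (written 5'->3') copied from an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(template_rna_5to3.upper()))


def extend_overhang(overhang_dna, template_rna_5to3, rounds):
    """Append 'rounds' copies of the templated repeat to the 3' end of the overhang."""
    repeat = reverse_transcribe(template_rna_5to3)
    return overhang_dna + repeat * rounds


# Simplified 6-nt template (assumption), complementary to one telomeric repeat.
print(reverse_transcribe("CCCUAA"))             # TTAGGG
print(extend_overhang("TTAGGG", "CCCUAA", 2))   # TTAGGGTTAGGGTTAGGG
```

Each round stands for one cycle of pairing, synthesis and translocation; the fill-in of the complementary strand by Pol alpha-primase is not modeled.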
Regarding the removal and reassembly of nucleosomes during replication, both chromatin
remodeling and the histone chaperones CAF1 and NAP1 are required. As the replication fork
advances the H2A-H2B dimers dissociate, while the H3-H4 tetramers are randomly distributed
among the daughter molecules, and the gaps left empty are filled by newly synthesized histones
to re-form complete histone cores. This partitioning of tetramers is one way the cell
maintains the information carried by the chemical modifications of the histone code, and
therefore allows epigenetic inheritance.
The transposition
Transposition is a form of genetic recombination that moves mobile elements, or transposons,
from one site to another within the genome, with or without replication of the mobile element
itself. The target site into which the transposon is inserted is not always selected by a
particular criterion: sometimes the choice is essentially arbitrary, sometimes factors such as
the curvature of the DNA or the binding of proteins to the target sequence favor or disfavor
the insertion of the transposable element.
In eukaryotes transposition is a major source of genetic variability, and the genomes of
maize and humans are composed of up to 50% transposons (against about 1% of coding sequence).
There are different families of transposable elements that can be grouped according to the
structure of the transposon and the transposition mechanism:
1. DNA transposons
2. virus-like retrotransposons
3. poly-A retrotransposons
DNA transposons carry the genes for the enzymes necessary for
transposition, the transposases, and at both ends they have inverted
repeats that serve as recombination sites during transposition. This kind of transposon is
said to be autonomous because it contains all the elements necessary for its transposition,
while other elements, called non-autonomous, have only the inverted repeats but not the gene
for the transposase, and can therefore be transposed only through the action of a transposase
supplied by an autonomous element.
Virus-like retrotransposons have inverted repeats embedded within longer segments called
LTRs, long terminal repeats. These elements contain the genes for two enzymes necessary for
transposition: an integrase (transposase) and a reverse transcriptase, needed to convert the
RNA intermediate into DNA.
Poly-A retrotransposons do not have inverted repeats at their ends, but have
untranslated (UTR) sequences at both the 5' and the 3' end, followed by a stretch of A:T base
pairs called the poly-A tail. They carry two genes, ORF1 and ORF2: the ORF1 protein binds RNA,
while the ORF2 protein has both endonuclease and reverse transcriptase activity.
DNA transposons generally use a non-replicative transposition mechanism called cut-and-paste,
and virus-like retrotransposons use a similar mechanism for the integration step.
Initially the transposase binds the inverted repeats at the ends of the transposon and forms a
stable DNA-protein complex, called the synaptic complex.
The transposase cuts one strand at each end of the transposon, releasing a 3'-OH. The
mechanism by which the second strand is cut varies according to the transposon (see
below).
Once the transposon has been freed, its 3'-OH ends make a nucleophilic attack on the
phosphodiester backbone of the target DNA, so that the excised transposon is inserted into the
new site by transesterification. The 3'-OH attack generates a staggered cut on the target
site, and the two gaps at the ends are filled in by a DNA polymerase and then sealed by a
ligase. Filling in the gaps on the sides of the transposon generates short direct repeats
(target-site duplications).
The break generated at the donor site of the transposon must also be repaired by cellular
enzymes.
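The origin of the flanking direct repeats can be seen in a small string-based simulation of cut-and-paste. This is only a sketch under stated assumptions: sequences are treated as single strands written 5'->3', the donor break is simply rejoined, and the element, target and stagger length in the example are made up for illustration.

```python
def cut_and_paste(donor, te_start, te_end, target, cut_pos, stagger):
    """Move donor[te_start:te_end] into target at a staggered cut.

    The staggered cut spans target[cut_pos:cut_pos + stagger]; after gap
    filling, that stretch is present on both sides of the inserted element.
    Returns (repaired_donor, new_target).
    """
    if not (0 <= te_start < te_end <= len(donor)):
        raise ValueError("element coordinates outside donor")
    if cut_pos < 0 or stagger < 0 or cut_pos + stagger > len(target):
        raise ValueError("staggered cut outside target")
    element = donor[te_start:te_end]
    repaired_donor = donor[:te_start] + donor[te_end:]   # simplified repair
    duplicated = target[cut_pos:cut_pos + stagger]
    new_target = (target[:cut_pos] + duplicated + element
                  + duplicated + target[cut_pos + stagger:])
    return repaired_donor, new_target


# Hypothetical element with inverted ends (CAG ... CTG), in lowercase/uppercase
# only to make the parts easy to tell apart.
donor = "AAAA" + "CAGttttCTG" + "GGGG"
target = "ccgATGCAtgg"
print(cut_and_paste(donor, 4, 14, target, 3, 5))
# ('AAAAGGGG', 'ccgATGCACAGttttCTGATGCAtgg')
```

The 5-bp stretch ATGCA appears once in the original target and twice, as a direct repeat, around the inserted element.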
The 5' strand can be cut by different mechanisms:
in the bacterial transposon Tn7 there is a gene for a protein, TnsA, that acts like a
restriction endonuclease and cuts the 5' ends of the element.
in transposons Tn5 and Tn10 the 5' strand is cut by a transesterification carried out directly
by the free 3'-OH end, which makes a nucleophilic attack on the still intact strand and forms
a hairpin that is then hydrolysed by the transposase, generating a double-strand break. The
transposon Hermes instead cuts the single strands in a different order, so that the hairpins
form not on the ends of the transposon but on the flanking DNA that hosts the transposon,
freeing the element (similar to V(D)J recombination). Release by transesterification may be
common because it is economical: the element does not need to encode another enzyme
to cut the second strand.
In the replicative transposition mechanism (typical of phage Mu) the transposase again cuts
the 3' ends first, generating the free OH groups that attack the target DNA, while the 5' ends
remain attached to the donor site. This creates a structure that resembles a replication fork,
recruits the replication machinery and forms a structure called a cointegrate, in which there
are two copies of the transposon, one newly formed and one deriving from the original.
Replicative transposition is not widespread in eukaryotic genomes because it carries a high
risk of inversions and deletions that can be deleterious to the cell.
Virus-like retrotransposons use a mechanism similar to that of DNA transposons, but first
they must synthesize a cDNA (copy DNA) from the RNA intermediate thanks to the
action of a reverse transcriptase. The cDNA is recognized by an integrase that removes a few
nucleotides from the 3' ends and allows the attack on the target DNA, with the same
cut-and-paste mechanism.
The integrases have catalytic domains similar to those of the transposases, with a
conserved set of two aspartate residues (D) and a glutamate (E) that form the DDE
motif.
Poly-A retrotransposons (LINEs, SINEs, Alu) use a mechanism different from cut-and-paste,
called target-primed reverse transcription. The 5' UTR contains the promoter for the
transposon genes; the element is transcribed into mRNA, which is then translated to produce
the ORF1 and ORF2 proteins. ORF1 binds the mRNA, and ORF2, which
has endonuclease activity, cuts the target DNA, favoring T-rich sequences so that
they can pair with the poly-A tail of the mRNA, forming a DNA-RNA hybrid in which the cut DNA
serves as the primer for complementary DNA synthesis, again by ORF2. In subsequent steps the
RNA is degraded and the DNA strand complementary to the newly synthesized cDNA is synthesized.
Transposons have efficiently colonized the genomes of numerous organisms because their
regulation limits the number of copies and controls the choice of target sites, so that they
enter regions where insertion has no deleterious effects on the host.
Phage Mu uses a particular transposition mediated by the DDE transposase MuA and by the
MuB protein, which hydrolyses ATP, binds the DNA of the target site and allows the interaction
between the target DNA and the synaptic complex.
The Tc1/mariner elements, found respectively in C. elegans and Drosophila, are very
widespread and use an autonomous cut-and-paste mechanism. They are very numerous, but many
Tc1/mariner elements are inactivated by mutations in the transposase gene, which keep enzyme
activity at sub-optimal levels. Starting from the sequences of inactive elements, researchers
succeeded in reconstructing an active artificial Tc1/mariner element called Sleeping Beauty,
which is promising for mutagenesis and for the insertion of genes into eukaryotic DNA.
Virus-like retrotransposons such as the Ty elements are abundant in yeasts, particularly in
S. cerevisiae (Ty1, Ty3, Ty4 and Ty5), and are even packaged in protein shells inside the
cell, as if they were endogenous viral particles. They also show specificity for their
target sites: Ty1, Ty3 and Ty4 transpose upstream of tRNA genes, while Ty5 transposes into
telomeric sequences or other silent regions of the genome.
The most widespread poly-A retrotransposons in the human genome are the LINE elements, which
make up about 20% of the human genome,
and the SINEs (which include the Alu elements, 1.1 million copies in the human genome), which
have evolutionary origins different from the LINEs but, not being autonomous, must rely on the
proteins produced by the LINEs.
Recombination V (D) J
The transcription
In eukaryotes, transcription involves further levels of complexity compared to prokaryotes,
because of the multiple functions and the specialization that each eukaryotic cell can take
on. A first difference is due to the structure of eukaryotic chromosomes, organized in a
DNA-protein complex called chromatin, which hinders the advance of RNA polymerase. The degree
of chromatin compaction determines whether the DNA sequences with which RNA Pol and the
transcription proteins interact are accessible.
Another difference is that eukaryotic RNA polymerases do not recognize the promoter by
interacting directly with DNA, but require protein factors that recruit them. Furthermore, in
eukaryotes the genes are not transcribed in operons, but individually. In eukaryotes there are
three different polymerases:
RNA Pol I, which transcribes the 18S, 5.8S and 28S rRNA genes and works in the nucleolus.
Pol II, in the nucleoplasm, which transcribes mRNAs, miRNAs and some snRNAs.
Pol III, in the nucleoplasm, which transcribes the tRNAs, the 5S rRNA and the U6 snRNA
involved in splicing.
The eukaryotic RNA polymerases are experimentally distinguishable because they differ in their
sensitivity to alpha-amanitin (a toxin of Amanita phalloides): Pol II is highly sensitive,
Pol III moderately sensitive and Pol I insensitive.
Each Pol consists of about 12 subunits (500 kDa), of which some (1-2-3-6-11) are highly
conserved in both eukaryotic and prokaryotic organisms and present in all three polymerases,
while others vary depending on whether it is Pol I, II or III.
A characteristic of Pol II is the presence of the CTD (C-terminal domain) in the Rpb1 subunit,
which consists of a series of repeats of the seven-amino-acid sequence YSPTSPS, whose
residues can be phosphorylated and play a role in the initiation of transcription. In mammals
there are 52 copies of this heptad, and variations in the number of copies are often lethal.
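To get a feel for how many phosphorylatable positions the CTD offers, the repeats can be built and counted. This is a rough sketch that assumes 52 perfect copies of the consensus heptad YSPTSPS (real CTDs contain some degenerate repeats) and counts every hydroxyl-bearing residue (S, T, Y) as a potential site; the function names are hypothetical.

```python
HEPTAD = "YSPTSPS"   # consensus CTD repeat (assumed perfect in this sketch)


def build_ctd(n_repeats):
    """Idealized CTD made of n_repeats perfect consensus heptads."""
    return HEPTAD * n_repeats


def phosphorylatable_sites(seq):
    """Count residues with a hydroxyl group (Ser, Thr, Tyr)."""
    return sum(seq.count(residue) for residue in "STY")


ctd = build_ctd(52)
print(len(ctd))                     # 364 residues
print(ctd.count("S"))               # 156 serines
print(phosphorylatable_sites(ctd))  # 260 potential sites
```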
RNA Pol I transcribes the 18S and 28S rRNA genes, which are arranged in clusters of tandemly
repeated transcription units; the units are separated by spacers that contain the promoters
for the downstream genes, and each is transcribed into a single precursor RNA. The promoter
consists of two regions: a core promoter element that overlaps the transcription start site
(TSS), and an upstream promoter element called UPE. The formation of the initiation complex
requires two auxiliary transcription factors:
UBF (upstream binding factor) has HMG (high mobility group) domains and interacts as a dimer
with both the core promoter and the UPE, promoting the formation of a loop in the
DNA. Subsequently SL1 (TATA Binding Protein + TBP Associated Factors) binds the
DNA with the help of UBF and recruits RNA Pol I to the TSS to start transcription.
RNA Pol III has a promoter with two characteristic boxes located downstream of the TSS of the
genes:
Box A and B in the case of the tRNA genes
Box A and C in the case of 5S rRNA genes
In the case of the tRNA genes, TFIIIC binds both boxes A and B and recruits TFIIIB, which is
formed by TBP and TAF subunits. The interaction between TFIIIB and DNA recruits Pol III to
the TSS. For the 5S rRNA genes, the factor TFIIIA (which has zinc finger domains) binds the
internal promoter and recruits TFIIIC onto box C, which in turn recruits TFIIIB to the TSS and
consequently Pol III. For both Pol III and Pol II, a bend of 80° has been observed in the DNA
upon binding of TBP, which allows the
correct positioning of the DNA so that transcription can start.
The promoter of Pol II is much longer than those of the other two polymerases, and the
required transcription factors are far more numerous. This is because the activity of Pol II
must be finely modulated according to the cell type and tissue, and according to physiological
and environmental stimuli, so it needs more regulation.
The transcription factors of Pol II can be divided into:
basal factors, which are used to recruit the RNA Pol and assemble the initiation complex
regulatory factors, which function as activators or repressors of transcription.
A general outline of the structure of Pol II promoters can be expressed as:
- minimal or core promoter, consisting of the DNA necessary for the basal factors to start
transcription; it contains the initiator element (INR), the TATA box, the BRE (TFIIB
recognition element), CpG islands and the DCE (downstream core element); some core
promoters have no TATA box and are called TATA-less.
- proximal promoter elements, placed about 200 nt upstream of the TSS, and elements located up
to thousands of nucleotides away from the TSS, called enhancers, silencers and insulators,
which positively or negatively regulate transcription by binding activator or repressor
proteins.
The formation of the transcription initiation complex requires a large number of basal
factors: about 20, with a total mass greater than 1 MDa, which assemble to
form the pre-initiation complex (PIC). The first factor to bind is TFIID, which contains TBP
and at least 13 TAFs. TBP interacts with the minor groove of the DNA at the TATA box using
phenylalanine residues located in two highly conserved loops, which bend the DNA by 80° and
facilitate its interaction with other protein factors.
Then TFIIA binds, which has the dual function of stabilizing the TFIID-DNA interaction and
preventing the binding of repressors to the promoter.
Subsequently TFIIB binds, interacting both with the BRE on the DNA and with RNA Pol II, which
arrives together with the auxiliary factor TFIIF.
TFIIE and TFIIH bind next: TFIIE recruits TFIIH, which has the dual function of opening the
DNA thanks to its helicase activity and of phosphorylating the CTD tail of RNA Pol II.
Phosphorylation of the CTD changes the interactions between RNA Pol
II and the components of the pre-initiation complex, allowing the initiation of transcription
and elongation.
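The stepwise assembly described above can be summarized as an ordered dependency check. The sketch below treats the order given in these notes as strictly sequential, which is a simplification; treating Pol II and TFIIF as a single arriving unit is also an assumption, and all names are hypothetical.

```python
# Order of arrival as described in the notes (strict ordering is a simplification).
PIC_ORDER = ["TFIID", "TFIIA", "TFIIB", "TFIIF-Pol II", "TFIIE", "TFIIH"]


def can_bind(factor, bound):
    """A factor can bind only if every earlier factor is already bound."""
    idx = PIC_ORDER.index(factor)
    return factor not in bound and all(f in bound for f in PIC_ORDER[:idx])


def assemble(arrivals):
    """Return the factors that end up bound, given an order of arrival."""
    bound = []
    for factor in arrivals:
        if can_bind(factor, bound):
            bound.append(factor)
    return bound


# TFIIB and TFIIH arrive too early at first and are rejected until their turn.
arrivals = ["TFIIB", "TFIID", "TFIIA", "TFIIB",
            "TFIIF-Pol II", "TFIIH", "TFIIE", "TFIIH"]
print(assemble(arrivals) == PIC_ORDER)   # True
```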
For RNA Pol II to bind to the pre-initiation complex, the mediator is required: a
multiprotein complex of about 20 subunits, 7 of which are highly conserved between species,
which stabilizes the interactions between the PIC and Pol II and acts as a bridge between
activators bound upstream and the initiation complex.
RNA Pol II is formed by several subunits which give it the shape of a claw. The upper part of
the claw allows the entry of the DNA to be transcribed, while in the lower part there are the
sites for ribonucleotide entry and incorporation. Incorporation is promoted by the trigger
loop, a protein domain that allows the correct nucleotide to be discriminated,
promoting the nucleophilic attack of the 3'-OH of the nascent chain and the release of
pyrophosphate coordinated by Mg2+.
Once transcription has started, the elongation phase begins, assisted by protein factors such
as TFIIS, which stimulates the RNA cleavage activity of Pol II when a wrong nucleotide has
been inserted.
Transcription regulation
Pol II transcription is regulated through the interaction with proteins
generically called regulatory factors, which positively or negatively control the
transcription of specific genes, so as to modulate their expression in response to external
stimuli and cellular requirements.
The promoters of Pol II are divided into proximal regions (where the basal factors bind) and
distal regions, to which the transcriptional regulators generally bind.
However, regulatory factors can also bind in the proximal regions: in S. cerevisiae, for
example, the regions about 200 nt upstream of the TSS, called UAS (upstream activating
sequences), bind factors that activate transcription (e.g. Gal4).
To identify which promoter DNA sequences bind regulatory factors, it is possible to carry out
site-specific mutagenesis experiments on the promoter. The promoter is fused upstream of an
easily detectable reporter gene (e.g. GFP), whose expression can be measured by specific
assays. The chimeric construct is introduced into cultured cells and, by changing specific
sequences in the promoter, the production of GFP is measured to find out how mutations of
particular sequences influence the activity of the promoter (and therefore whether they reduce
the expression of the gene). These mutations often prevent the binding of basal or regulatory
transcription factors.
Both proximal and distal promoter regions are organised in boxes containing specific
sequences; among the most important proximal elements recognised by activators are the CAAT
box, the GC boxes and the Oct boxes.
The distal elements that bind regulatory factors that activate transcription are called
enhancers; they bind a complex of activators that creates a loop in the DNA and contacts the
proximal region and RNA Pol II via the mediator. The proteins that bind an enhancer can
either bind cooperatively to each other as an enhanceosome (a point mutation in one of the
proteins of the complex affects the efficiency of the whole activator complex), or, in the
flexible "billboard" model, each activator binds the enhancer independently of the others and
the overall effect is the combined result of each. In addition to enhancers, there are also
silencers, which bind transcriptional repressors that for example compete with mediator
binding, and insulators, which block the action of the enhanceosome assembled on an enhancer,
preventing it from communicating with the mediator at a promoter on the other side of the
insulator.
One of the most studied eukaryotic activators is Gal4, required to activate the genes of galactose
metabolism in yeast, by binding to the promoter of the GAL1 gene. When Gal4 binds four sites in
the UAS region, GAL1 transcription increases 1000-fold.
Thanks to experiments on Gal4 it was possible to demonstrate that the DNA binding domain and
the activation domain of transcription factors are independent and interchangeable.
If the promoter to which Gal4 binds is fused to the lacZ gene, the reporter protein is expressed.
If Gal4 is deprived of its activation domain, it still binds to DNA but does not allow lacZ
transcription.
If a hybrid protein is produced in the laboratory in which the Gal4 DNA binding domain is replaced
by the DNA binding domain of LexA (a bacterial repressor), fused to the Gal4 activation domain,
lacZ transcription is activated normally, provided the reporter carries LexA binding sites
upstream. This experiment shows unequivocally that the DBD and AD domains are separate and
independent.
This observation allowed the development of a very useful laboratory technique to detect
protein-protein interactions, known as the two-hybrid assay. In simplified terms: the DBD of a
transcription factor is fused to a bait protein X. DBD-X binds to the promoter of a reporter gene
whose product is easily detected, but in the absence of the activation domain the gene is not
expressed. Similarly, the AD is fused to a prey protein Y. If the prey Y interacts with the bait X,
the DBD-X + AD-Y complex is formed, and the reporter gene is transcribed and easily detected.
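The logic of the assay can be summarised in a small sketch. The `Fusion` class, the `reporter_on` function and the table of known interactions are hypothetical names introduced only for illustration; the model ignores expression levels, nuclear localisation and false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fusion:
    """Hypothetical fusion protein: a partner protein plus optional DBD and AD."""
    partner: str
    has_dbd: bool = False
    has_ad: bool = False


def reporter_on(bait: Fusion, prey: Fusion, interactions: set) -> bool:
    """Return True if a DBD and an AD are brought together at the reporter promoter."""
    if not bait.has_dbd:
        return False  # nothing tethers the complex to the promoter
    if bait.has_ad:
        return True   # the bait alone is already a complete activator
    # The AD reaches the promoter only if prey and bait partners interact.
    partners_bind = frozenset({bait.partner, prey.partner}) in interactions
    return prey.has_ad and partners_bind


# Assumed interaction table: X and Y bind each other, Z binds nothing.
known = {frozenset({"X", "Y"})}
print(reporter_on(Fusion("X", has_dbd=True), Fusion("Y", has_ad=True), known))
print(reporter_on(Fusion("X", has_dbd=True), Fusion("Z", has_ad=True), known))
```

The reporter is switched on only in the first case, where the two halves of the activator are reunited through the X-Y interaction.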
Activators act on transcription in different ways:
by chemically or sterically modifying transcription factors, increasing their affinity for Pol II, or
by recruiting chromatin remodeling complexes, which free the promoter region from nucleosomes.
The DNA binding domains interact with the major groove of the double helix through hydrogen
bonds and hydrophobic interactions. Usually a single domain recognizes 3-7 nt, while at least 16
nt are required for a DNA recognition sequence to be specific. This is achieved because most
activators dimerize through their DBD, interacting with two recognition sequences, which also
increases the possibilities for combinatorial control of transcription.
Activators can be classified according to the structure of their DBD.
1. Helix-Turn-Helix motif: formed by three folded alpha-helices, in which the recognition helix R
fits into the major groove and contacts the recognition sequence. There are variants of this
motif in certain activators, such as those that regulate homeotic genes in Drosophila: a
flexible arm is added to the three helices and interacts with the minor groove.
2. Zinc finger domain: present in the TFIIIA factor for transcription of the 5S rRNA. It consists of a
zinc atom coordinated by 4 residues (cysteines and histidines), creating a loop of 12-13 amino
acids, which is divided into two elements: an alpha-helix and a beta sheet, of which the
alpha-helix interacts with the major groove of DNA. This structure is present in many families of
activators, which usually carry a series of zinc fingers joined by short linkers of 7-8 amino
acids. Steroid hormone receptors have a zinc finger domain with 4 cysteine residues that
interacts with DNA recognition sequences.
3. Leucine zipper domain: dimers formed by two long alpha-helices, each comprising both a
dimerization domain and a DNA binding domain. The DNA binding domain is basic, and is
formed by the terminal part of the two alpha-helices, each of which contacts a sequence of 4 nt
in the major groove, on opposite faces of the DNA. The dimerization domain has a leucine
residue every 7 aa in the primary structure. When the alpha-helix forms, each turn of the helix
spans about 3.5 aa, so the leucine residues of each helix line up on the same face, one every
two turns. The dimer is held together by the hydrophobic interaction between the facing
leucines, which protrude from the alpha-helices like the teeth of a zipper. Dimerization allows
the alpha-helices to join both in homodimers and in heterodimers, as in the case of the
heterodimers of the proto-oncogene products c-Jun and c-Fos, which regulate gene expression
in response to stimuli such as stress, infections and cytokines.
4. Helix-Loop-Helix domain: each monomer consists of two alpha-helices connected by a loop,
and the longer helix has a basic portion in the N-terminal region, which allows binding to
DNA. Activators with this domain generally recognize E-boxes in the promoters of genes
involved in development; an example is MyoD, which promotes the differentiation of muscle
cells. The advantage of heterodimers is that one monomer can be constitutively expressed,
while the other can be controlled according to specific physiological demands.
5. Domain of the p53 family activators. They contain two DNA binding regions: a DBD with three
cysteines coordinated to a zinc atom, and a CTD that binds DNA in a non-specific manner.
The DBD binds precise response elements, usually close to the transcription start site (within
about 300 nt upstream). p53 proteins are tumor suppressors, which control numerous cellular
processes. When cells are healthy, p53 is negatively regulated; in response to external
stimuli the inhibition of p53 is lifted, and the target genes are expressed, causing apoptosis,
cellular senescence, autophagy etc.
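The heptad arithmetic described for the leucine zipper (item 3) can be checked with a few lines. This is only a sketch of the geometry, using the idealised value of 3.5 residues per turn quoted in the notes; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.5  # idealised value used in the notes for the zipper helix
HEPTAD = 7               # one leucine every 7 residues


def leucine_positions(first: int, count: int) -> list:
    """Residue indices of the heptad leucines, starting from index `first`."""
    return [first + HEPTAD * i for i in range(count)]


def turns_between_leucines() -> float:
    """Number of helix turns separating two consecutive heptad leucines."""
    return HEPTAD / RESIDUES_PER_TURN


def angular_position(residue_index: int) -> float:
    """Angle in degrees around the helix axis, taking residue 0 as 0 degrees."""
    return (residue_index * 360.0 / RESIDUES_PER_TURN) % 360.0


print(turns_between_leucines())
print([angular_position(p) for p in leucine_positions(0, 4)])
```

Every heptad leucine falls at the same angle, which is why the leucines form a continuous hydrophobic stripe along one face of the helix.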
Many activators activate transcription by recruiting chromatin modification and remodeling
complexes, such as histone acetyltransferases (HAT), which acetylate histone tails and promote
promoter opening (e.g. Gal4 recruits the SAGA complex). Similarly, transcription repressors can
recruit enzymes with deacetylase activity (HDAC), which further compact the chromatin and
make it inaccessible.
Often the regulation of transcription in eukaryotes is linked to the ability of cells to respond to a
variety of external stimuli that induce a specific response. Transcription activators act as a mediator
of the message from outside to inside the cell.
For example, steroid hormone receptors have three domains:
- a ligand binding domain.
- a DNA binding domain (zinc finger).
- an activation domain.
When the ligand (steroid hormones, vitamin D, retinoids) enters the cell, it binds to the receptor
and allows its dimerization: the receptor detaches from the Hsp90 protein, with which it interacts
in the absence of the ligand, enters the nucleus, binds DNA at HRE sequences (hormone
responsive elements) and activates target genes.
Other ligands interact with 7-transmembrane-domain receptors, which undergo a conformational
change upon binding and contact a G protein; this activates the enzyme adenylate cyclase, which
produces cAMP. High concentrations of cAMP activate PKA, a kinase that can phosphorylate
transcription factors such as CREB, which binds DNA through leucine-zipper domains. When
CREB is phosphorylated, it is activated and allows the transcription of genes for many
neuropeptides; it is important in neuronal plasticity and in long-term memory formation.
Processing and maturation of RNA
Transcription does not directly produce functional RNA molecules, but precursors that undergo a
series of maturation processes such as:
- Nucleolytic cuts by exo- and endonucleases, for the resolution of polycistronic precursors as in
the case of rRNAs.
- Capping at the 5' end and polyadenylation at the 3' end.
- Splicing for intron removal, and editing.
These modifications take place inside the nucleus, so that partially matured RNA precursors
cannot be used improperly by the translational apparatus. Mature RNAs are actively transported
from the nucleus to the cytoplasm.
In eukaryotes, the genes that code for rRNA precursors are repeated in tandem in hundreds of
copies, and each pre-rRNA gene is separated from the next by a non-transcribed spacer.
The pre-rRNA is a 45S transcript, synthesized by RNA Pol I, that contains three ribosomal
RNAs (18S, 28S and 5.8S rRNA) separated by internal spacers. The maturation of this
pre-rRNA, and of the 5S pre-rRNA (RNA Pol III), requires, in addition to nucleases, the
intervention of small nucleolar RNAs (snoRNA) in the form of ribonucleoprotein particles, the
snoRNPs (snoRNPs E1, E2, E3, C/D). The proteins associated with snoRNPs have enzymatic
activities that modify some of the rRNA nucleotides: for example, they direct methylation of the
ribose at the 2' position, or the conversion of uridine to pseudouridine.
The processing of eukaryotic pre-tRNAs involves the removal of a 5-15 nt sequence at the 3' end
by an RNase, followed by the addition of the CCA trinucleotide by a tRNA nucleotidyl transferase.
Moreover, the tRNA bases carry numerous post-transcriptional chemical modifications, which
serve to determine the specific identity of each tRNA and to stabilize its secondary and tertiary
structure.
Eukaryotic mRNAs are generated from precursors called hnRNA (heterogeneous nuclear RNA).
Maturation includes the addition of the 5' cap. Capping is the addition of a particular modified
nucleotide with protection and recognition functions. The cap consists of a 7-methylguanosine
bound to the 5' end through an unusual 5'-5' triphosphate bond which, unlike the normal 3'-5'
bond, is not recognized by exonucleases and therefore cannot be cut.
Capping requires different enzymatic activities:
- An RNA triphosphatase removes the terminal phosphate from the 5' end of the nascent transcript.
- A guanylyl transferase adds GMP to the resulting diphosphate, forming the 5'-5' linkage.
- An RNA methyl transferase adds a CH3 to the nitrogen in position 7 of the newly added guanosine.
- Another RNA methyl transferase adds a CH3 at the 2' position of the first transcribed nucleotide.
There are different types of caps:
1. Cap0: no nucleotide is methylated other than the 7-methylguanosine.
2. Cap1: only the first transcribed nucleotide is methylated at 2'.
3. Cap2: both the first and the second nucleotide are methylated at 2'.
Cap formation occurs at the beginning of transcription, when the nascent transcript contains only
20-40 bases. The CTD tail of RNA Pol II serves for the assembly of the protein complex that
catalyses the formation of the cap. The CTD tail is phosphorylated on serine 5 by TFIIH, and this
allows the recruitment of the capping enzymes.
(Furthermore, the CTD tail is essential for the interaction with elongation factors such as the
P-TEFb kinase, which phosphorylates serine 2 and promotes elongation. FACT removes histones.)
The functions of the cap are many:
1. it protects the RNA from degradation
2. it increases translation efficiency, because it allows binding of the translation initiation
factors (eIF4E, the cap binding protein)
3. it improves splicing efficiency
When RNA Pol II reaches the end of a gene, it transcribes a specific sequence motif called the
polyadenylation signal, a hexanucleotide of A and U (consensus AAUAAA), which is found a few
dozen nucleotides upstream of the cleavage and polyadenylation site. The A/U hexamer alone is
not sufficient, however, because many spurious polyadenylation sites would be found in AT-rich
regions of DNA, so a second sequence rich in G and U is also required, downstream of the
cleavage site.
The signal recruits two protein complexes:
the Cleavage and Polyadenylation Specificity Factor (CPSF), which binds the AU element, and the
Cleavage stimulation Factor (CstF), which binds the GU element. Together with these complexes
act the cleavage factors necessary for cutting, and a poly(A) polymerase (PAP), which adds the A
residues (ATP required).
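The need for two elements can be illustrated with a crude scanner. This is a sketch under stated assumptions: it takes AAUAAA as the hexamer, treats any run of five or more G/U residues as "GU-rich", and uses a hypothetical 60 nt search window; real polyadenylation sites are recognised by proteins, not by a regular expression.

```python
import re


def find_polya_signals(rna: str, search_window: int = 60) -> list:
    """Return 0-based positions of AAUAAA hexamers followed by a G/U-rich run.

    `search_window` (an illustrative value) is how far downstream of the
    hexamer the second element is looked for.
    """
    hits = []
    for match in re.finditer(r"(?=AAUAAA)", rna):  # lookahead allows overlaps
        start = match.start()
        downstream = rna[start + 6 : start + 6 + search_window]
        if re.search(r"[GU]{5,}", downstream):     # crude GU-richness test
            hits.append(start)
    return hits


with_gu = "CCC" + "AAUAAA" + "C" * 20 + "GUGUGUGU" + "CC"
without_gu = "CCC" + "AAUAAA" + "C" * 20
print(find_polya_signals(with_gu))     # hexamer plus GU-rich element
print(find_polya_signals(without_gu))  # hexamer alone is ignored
```

Requiring both elements is what filters out the isolated A/U hexamers that occur by chance in AT-rich regions.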
Initially the activity of PAP depends on the interaction with CPSF, to add the first adenines
(about ten), while in the elongation phase PAP interacts with the poly(A) binding protein (PABP),
which stimulates the rapid addition of a further 200-250 adenines, independently of the
interaction with the polyadenylation signal sequence.
Transcription termination
The cut at the polyadenylation site gives RNA Pol II the signal to detach from the template: the
uncapped 5' end of the RNA left downstream of the cut is recognized by an RNase such as Rat1
or Xrn2, which degrades the transcript.
Splicing and editing (core)
The sequence of a functional RNA is not co-linear with that of the template DNA from which it
originates, since a gene can contain stretches of sequence called introns that are removed from
the final transcript. The stretches of sequence that are joined together to form the mature
transcript are called exons.
Introns were discovered by studying the hexon gene, which codes for an Adenovirus capsid
protein: with the R-looping technique a DNA-RNA hybrid is formed and then visualized under the
electron microscope, which showed three loops corresponding to regions of DNA not present in
the mature RNA.
In humans, genes range from monoexonic, without any intron, to several hundred exons, as in
the case of the titin gene; on average human genes have about ten introns.
The removal of introns occurs after the corresponding stretch of precursor RNA has been
transcribed, by a cut-and-join mechanism in which the introns are first transcribed and then
removed. This was observed in hybridization experiments between DNA and RNA (precursor and
mature) of mouse beta-globin.
There are four different splicing mechanisms:
Nuclear splicing:
Specific for RNA Pol II transcripts, and therefore for mRNAs; it requires the intervention of a large
ribonucleoprotein complex called the spliceosome.
The correct recognition of splice sites in the precursor mRNA is determined by short signals at
the ends of the intronic sequences: almost all introns begin with the dinucleotide GU and end with
the dinucleotide AG. These dinucleotides are conserved within longer motifs, the consensus
sequences. The GU dinucleotide at the 5' end defines the donor site, while the AG at the 3' end
defines the acceptor site. Two other motifs are also present: a pyrimidine-rich tract upstream of
the acceptor site, and a branch site upstream of the pyrimidine tract.
Splicing involves two successive trans-esterification reactions. The first is the nucleophilic
attack of the 2'-OH of the branch-site adenine on the phosphate at the 5' end of the intron (the
GU dinucleotide). This generates a free 3'-OH end at what was the exon-intron junction. The
intron remains bound to the downstream exon and to the adenine of the branch point (which is
now engaged in 3 phosphodiester bonds), and forms a looped structure, the lariat.
The second trans-esterification reaction is the nucleophilic attack of the free 3'-OH end on the
phosphate at the 3' splice site, immediately downstream of the AG. This second reaction
joins the two exons and frees the intron lariat, which is then linearized and degraded.
The two reactions have a zero energy balance, but the overall reaction is nevertheless favored
and shifted to the right because it entails a net gain of entropy (the precursor mRNA molecule is
divided into smaller molecules) and because the removed intron is degraded immediately, which
prevents the reverse reaction.
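At the sequence level, the outcome of the two trans-esterifications can be sketched as follows. The function `splice` and the toy pre-mRNA are hypothetical; the intron boundaries are supplied by the caller, and only the GU...AG rule and the presence of a candidate branch-point A are checked, not the full consensus sequences.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int) -> tuple:
    """Remove pre_mrna[intron_start:intron_end] and join the flanking exons.

    Returns (mature_rna, excised_intron). Raises ValueError if the intron
    does not start with GU and end with AG, or has no internal adenine.
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron does not follow the GU...AG rule")
    if "A" not in intron[2:-2]:
        raise ValueError("no candidate branch-point adenine inside the intron")
    exon1 = pre_mrna[:intron_start]   # upstream exon (keeps the free 3'-OH)
    exon2 = pre_mrna[intron_end:]     # downstream exon
    return exon1 + exon2, intron


# Toy transcript: exon 1, a 20 nt intron, exon 2 (sequences are made up).
pre = "AUGGCC" + "GUAAGUCUAACUUUUUCCAG" + "GGAUAA"
mature, intron = splice(pre, 6, 26)
print(mature)
print(intron)
```

The mature RNA is shorter than the precursor and the intron is returned as a separate molecule, which mirrors the entropy argument given above.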
The nuclear splicing process requires a complex of about 200 proteins and 5 RNAs, called the
spliceosome. The 5 RNA molecules are the small nuclear RNAs U1, U2, U4, U5 and U6, which act
in the form of ribonucleoproteins, the snRNPs ("snurps"). All snRNPs share a common group of
seven Sm proteins that bind to a conserved site in the snRNAs.
The spliceosome has two main functions: it allows the correct recognition of the splice sites and
their correct positioning, so that the trans-esterifications take place efficiently.
In animals and plants there is also a minor spliceosome that uses the U11, U12, U4atac and
U6atac snRNAs. It is responsible for the removal of small introns whose acceptor and donor
sequences are highly conserved, with little degeneracy, but which lack the polypyrimidine tract
downstream of the branch site.
Spliceosome assembly is described by the spliceosome cycle:
1. the U1 snRNP binds the donor site at the 5' end of the intron through complementary base
pairing of its snRNA.
2. the Branch point Binding Protein (BBP) binds the branch site, while the U2AF factor binds the
polypyrimidine tract and the acceptor site at the 3' end: complex E is formed.
3. the U2 snRNP displaces BBP and binds the branch site, through a complementary interaction
that does not involve the adenine of the first trans-esterification, which bulges outwards and is
thus available for the reaction: complex A.
4. complex A is joined by the tri-snRNP formed by the U4/U6 and U5 snRNPs, which interacts with
U1 and U2 and brings the 5' splice site close to the adenine of the branch point: complex B1,
in which all snRNPs are present at the same time.
5. the U1 snRNP is released and replaced by the U6 snRNP in contact with the 5' splice site,
thanks to the protein Prp8 and the expenditure of ATP: complex B2.
6. the U4 snRNP is also released, which allows the interaction between the U6 snRNP (5') and
the U2 snRNP (3'), activating the spliceosome.
7. the activated spliceosome catalyzes, with expenditure of ATP, the first and second
trans-esterification reactions, forming the complexes C1 and C2.
The mature mRNA released is not naked, but is covered by ribonucleoproteins such as the EJC
(exon junction complex), which helps to recognize and target non-functional mRNAs generated by
incorrect splicing.
Malfunctioning of spliceosome assembly underlies numerous pathologies called
spliceopathies (spinal muscular atrophy, cystic fibrosis).
Self-splicing:
A phenomenon first observed in Tetrahymena rRNA, which is capable of splicing without the
mediation of protein factors. This revealed the catalytic properties of RNA, which can also
function as a ribozyme. These introns are grouped into two categories:
Class I autocatalytic introns
They are 200 to 500 nt long and have a highly conserved secondary structure with numerous
hairpins, which fold in three dimensions so as to bring the 5' and 3' splice sites close together.
The first trans-esterification reaction is not mediated by the adenine of a branch point, but by a
guanosine exogenous to the RNA, whose 3'-OH attacks the 5' donor site; this frees the 3'-OH of
the upstream exon, which attacks the phosphate at the 3' splice site, releasing the intron and
joining the exons.
Class II autocatalytic introns
They are 400 to 1000 nt long and consist of a ribozyme component that catalyzes the splicing and
of an ORF that codes for proteins involved in the mobilization of the intron through reverse
splicing (splicing + reverse transcription). The mechanism is the same as that of nuclear splicing,
with formation of a lariat (D6).
A fourth type of splicing, involved in tRNA maturation, uses endonucleases and ligases in an
ATP-dependent mechanism.
The splicing process is co-transcriptional, because the introns are removed from the RNA during
the elongation phase. The process is aided by the CTD tail of Pol II, thanks to the
phosphorylation of serine 2 by the P-TEFb kinase, which facilitates the recruitment of the factors
involved in splicing (serine 5 was already phosphorylated by TFIIH to recruit the capping proteins).
The number of genes does not differ greatly between very simple organisms such as yeast (6k)
and humans (23k) (always in the order of thousands), despite the immense phenotypic
differences between these organisms. This observation can be partly explained by alternative
splicing: the mechanism by which the same pre-mRNA can undergo different splicing events,
leading to alternative mRNAs that can code for different proteins. Over 90% of human genes are
affected by alternative splicing.
The events that contribute to alternative splicing are:
1. optional exons or mutually exclusive exons
2. intron retention
3. alternative donor and acceptor splice sites
The transcriptional variants determined by specific assortments of exons can produce numerous
different proteins, such as in the case of human troponin T, or the Drosophila Dscam gene whose
exons can be combined in 38k different variants, more than the number of Drosophila genes
themselves.
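The Dscam figure is a simple product of independent choices. The sketch below assumes the commonly cited numbers of mutually exclusive alternatives in the four variable exon clusters (12, 48, 33 and 2), which are not given in these notes.

```python
from math import prod

# Assumed numbers of mutually exclusive alternatives per variable exon cluster.
alternatives = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen from each cluster, so the counts multiply.
isoforms = prod(alternatives.values())
print(isoforms)  # 38016
```

Under these assumptions the product is 38,016, consistent with the roughly 38k variants mentioned above.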
The exons present in all the variants are called constitutive, those that may or may not exist are
called optional.
The presence of introns is therefore advantageous at the level of the individual, because it allows
the production of numerous isoforms of the same protein, guaranteeing a flexibility in gene
expression that supports a wide range of functions depending on cell type, developmental stage,
environmental stimuli or pathological conditions.
On the evolutionary scale, the presence of introns allows the birth of new proteins through exon
shuffling. This happens when two genes exchange exons, following an unequal or illegitimate
crossing over, producing chimeric proteins that share some features with other genes; an
example is the family of blood coagulation factors.
Splicing regulation modulates the efficiency with which protein factors recognize the splice sites.
This relies on motifs additional to the splice sites, which promote or inhibit their recognition and
can be found both in exons and in introns: exonic enhancers and silencers, and intronic
enhancers and silencers.
Exonic enhancer sequences are usually bound by serine- and arginine-rich (SR) proteins.
The interaction between SR proteins and exonic enhancer sequences facilitates spliceosome
assembly. Exonic silencer sequences instead bind ribonucleoproteins (hnRNP) that inhibit the
interaction between essential components of the spliceosome (e.g. the U1 and U2 snRNPs),
stopping their activity. The specific splicing pattern depends on the combined action of enhancers
and silencers.
Editing is a process of post-transcriptional modification that involves the conversion of one base
into another, or the insertion / deletion of nucleotides.
Editing can lead either to the replacement of an amino acid, modifying the structural properties of
a protein, or to the creation or elimination of start and stop codons. In animals, and in particular
in mammals, editing is almost exclusively carried out by conversion of bases. Editing can convert
a cytosine into a uracil; this is a fairly rare event, of which the best known example is the
apolipoprotein B gene, which codes for two isoforms of different molecular weight, ApoB100 in
the liver and ApoB48 in the intestine. The difference is due to the deaminase APOBEC1, which in
the intestine converts a CAA codon into UAA (C->U), producing a premature stop codon and
therefore a shorter protein (ApoB48, about 48% of the length of ApoB100).
Editing can also convert an A into an inosine. This event requires enzymes of the ADAR family
(adenosine deaminase acting on RNA), which recognize double-stranded RNA regions as
substrate. This type of modification is very frequent in Alu regions, because these contain many
repeats that can pair into double-stranded structures. Inosine is read as guanosine by the
translational apparatus, so A->I changes correspond to A->G changes. Dysregulation of this
editing is associated with neurological disorders such as depression, epilepsy and schizophrenia.
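The effect of both kinds of editing on the reading of a message can be sketched with a toy translator. The codon table below is deliberately partial (only the codons used in the example), the mRNA sequences are invented, and the function names are hypothetical.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "AUG": "Met", "CAA": "Gln", "CAG": "Gln",
    "CGG": "Arg", "GGA": "Gly", "UAA": "Stop",
}


def edit_c_to_u(rna: str, pos: int) -> str:
    """Deaminate the cytosine at `pos` to uracil (APOBEC-like C->U editing)."""
    if rna[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    return rna[:pos] + "U" + rna[pos + 1:]


def read_inosine_as_g(rna: str) -> str:
    """Inosine (written I) is decoded as guanosine by the ribosome."""
    return rna.replace("I", "G")


def translate(rna: str) -> list:
    """Translate in frame from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


mrna = "AUGCAAGGA"
print(translate(mrna))                            # unedited message
print(translate(edit_c_to_u(mrna, 3)))            # CAA -> UAA, premature stop
print(translate(read_inosine_as_g("AUGCIGGGA")))  # CAG edited to CIG, read as CGG
```

C->U editing of CAA truncates the product, while A->I editing of CAG changes the encoded amino acid without shortening the protein.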
TRANSLATION
Ribosomes have structures that are highly conserved in evolution; they are made of rRNA and
ribosomal proteins (rp). They consist of two subunits classified according to their sedimentation
coefficient, expressed in Svedberg units: the small 40S subunit, which contains the 18S rRNA,
and the large 60S subunit, which contains the 28S, 5.8S and 5S rRNAs. In addition, about
80 ribosomal proteins are present.
The rRNAs make up about 80% of cellular RNA, and their secondary structures are even more
conserved than their nucleotide sequences, because what is functionally relevant is the structure
of hairpins and folds that ensures the formation of three-dimensional domains. In eukaryotes, in
addition to the 4 canonical nucleotides, post-transcriptional modifications of rRNAs such as the
isomerization of uridine into pseudouridine and the methylation of ribose are very common.
The ribosome has three binding sites for tRNA: the A (aminoacyl) site, which binds the incoming
aminoacylated tRNA, the P (peptidyl) site, which binds the tRNA carrying the nascent peptide
chain, and the E (exit) site, which binds the discharged tRNA about to be released. Sites A and P
straddle the large and small subunits, with a half-site in each. The tRNAs are positioned inside
the sites so that the end of the long arm of the L, with the anticodon, pairs with the codon to be
read on the mRNA, at the decoding center. The end of the short arm of the L (the 3' end), which
carries the amino acid, is instead located in the peptidyl transferase center of the large subunit,
responsible for the formation of peptide bonds. Bond formation is catalyzed by the rRNA itself,
not by accessory proteins: it is due to the intrinsic enzymatic activity of some RNAs (ribozymes).
tRNAs function as molecular adapters, capable of recognizing the triplets of the genetic
message and of carrying the corresponding amino acids to be inserted into the protein sequence.
They are small molecules, with numerous unusual bases resulting from post-transcriptional
modifications. The cloverleaf secondary structure is highly conserved and consists of:
- acceptor arm: the amino acid binds to its CCA 3' end, which protrudes from the
double-stranded stem. In eukaryotes the terminal CCA sequence is added post-transcriptionally
by a tRNA nucleotidyl transferase.
- TΨC arm: presence of thymine (unusual in RNA) and pseudouridine
- D arm: dihydrouridine
- anticodon arm: its loop contains the triplet of nucleotides that recognizes the codons
of the mRNA
- variable arm
The cloverleaf folds into an L shape, with the acceptor arm at the end of the short arm and the
anticodon at the end of the long arm of the L.
tRNAs cannot play the role of molecular adapters alone: although they are able to recognize the
codons of the mRNA, they have no specific affinity for the amino acid to be loaded. The loading
process is mediated by the aminoacyl-tRNA synthetase enzymes. Each aminoacyl-tRNA
synthetase recognizes a single amino acid, but recognizes all the isoacceptor tRNAs for that
amino acid (degenerate code). These enzymes catalyze the formation of an ester bond between
the carboxyl group (COOH) of the amino acid and the 2'- or 3'-OH of the adenosine at the 3' end
of the tRNA, in a reaction that requires ATP.
In the first step the amino acid reacts with ATP to form an adenylated amino acid
(aa + ATP -> aa-AMP + PPi)
In the second step the amino acid is transferred from the AMP to the adenosine at the 3' end of
the tRNA, with release of AMP.
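Written out as a scheme, the two steps and their sum read as follows. The notation (aa for the amino acid, PP_i for pyrophosphate) is a convention chosen here, and the third line is simply the sum of the two steps described above.

```latex
\begin{align*}
\text{aa} + \text{ATP} &\longrightarrow \text{aa-AMP} + \text{PP}_i \\
\text{aa-AMP} + \text{tRNA} &\longrightarrow \text{aa-tRNA} + \text{AMP} \\
\text{aa} + \text{ATP} + \text{tRNA} &\longrightarrow \text{aa-tRNA} + \text{AMP} + \text{PP}_i
\end{align*}
```

The aa-AMP intermediate cancels out in the sum, which shows that one ATP is consumed for each tRNA loaded.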
Having two steps in the loading of the amino acid on the tRNA ensures greater control and
fewer errors: the enzyme may initially bind the wrong amino acid, because some amino acids are
very similar, but the error can be corrected before the aa-AMP is passed to the tRNA, thanks to a
proof-reading activity intrinsic to the enzyme itself.
In eukaryotic cells the mRNA is not translated in the nucleus, but in the cytoplasm, after being
exported through the nuclear pores. Eukaryotic mRNAs present:
- cap at the 5' end
- 5' UTR leader sequence
- 3' UTR trailer sequence
- poly(A) tail at the 3' end
and inside they present a coding region characterized by a single ORF (because they are
monocistronic), formed by a start codon AUG, a series of sense codons for the protein and a stop
codon (UGA, UAA, UAG).
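The definition of an ORF can be made concrete with a short sketch. The function `find_orf` is a hypothetical helper that takes the first AUG as the start codon and reads in frame until the first stop codon; the example sequence is invented, and start-codon context is ignored.

```python
from typing import Optional

STOP_CODONS = {"UGA", "UAA", "UAG"}


def find_orf(mrna: str) -> Optional[str]:
    """Return the ORF from the first AUG to the first in-frame stop codon.

    The returned string includes both the start and the stop codon.
    Returns None if there is no AUG or no in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):  # step codon by codon, in frame
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None


# 5' UTR (GGC) + ORF (AUG GCU UGG UAA) + 3' UTR (GG)
print(find_orf("GGCAUGGCUUGGUAAGG"))
```

Only stop codons in the same frame as the AUG terminate the ORF; stop triplets in the other two frames are read through.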
The fundamental event of protein synthesis is the formation of a new peptide bond between the
amino group of the incoming amino acid and the carboxyl group of the last amino acid of the
nascent chain. The chain is thereby transferred to the tRNA in A, lengthened by one new aa.
After the transfer of the chain to A, the translocation process restores the initial situation, with
the A site empty and ready to receive a new aa-tRNA. The small subunit is the first to interact
with the mRNA to form the initiation complex, but synthesis can take place only when the large
subunit has also been assembled. The assembled ribosome moves along the mRNA, decoding the
message and synthesizing the peptide chain until it encounters a stop codon. The protein
detaches and the two subunits are released into the cellular pool of free ribosomes.
Initiation:
In eukaryotes translation begins with the assembly of the pre-initiation complex: mRNA + tRNA +
40S subunit of the ribosome.
The first tRNA in eukaryotic synthesis carries a non-formylated methionine. The 40S subunit
interacts with the 5' end of the mRNA and migrates along the 5' UTR leader in a scanning
process, until it meets the AUG of the ORF.
The beginning of translation in eukaryotes is mediated by numerous factors called eIF, eukaryotic
initiation factors.
The factors eIF3 and eIF1 contribute to keeping the 40S and 60S subunits dissociated by binding
to the small subunit, while eIF6 binds the large subunit.
Recruitment of the initiator tRNA (Met) on the 40S subunit is mediated by eIF2-GTP and eIF5.
eIF2-GTP interacts with the initiator tRNA forming the ternary complex eIF2-GTP-Met-tRNA, which
through eIF5 binds the 40S subunit, forming the 43S pre-initiation complex.
In the meantime, and independently, the eIF4F complex, formed by eIF4A, eIF4G and eIF4E,
assembles on the mRNA. eIF4G is a large protein on which the whole complex is assembled. The
eIF4F complex interacts with the mRNA through the binding of the E subunit to the 5' cap, which
is why this subunit is also called the cap binding protein. Once eIF4F is mounted on the mRNA,
eIF4B is added.
At this point the 43S pre-initiation complex associates with the mRNA-eIF4 complex and the
small subunit can begin to move along the 5' UTR. For this to happen, helicase factors are
needed to remove any pairing and hairpins between complementary regions of the mRNA. The
helicase activity is performed by eIF4A (a component of eIF4F and a protein of the DEAD family,
Asp-Glu-Ala-Asp) and stimulated by eIF4B.
The 48S initiation complex (43S + eIF4) moves along the 5' UTR. It does not simply stop at any
AUG it encounters, but identifies the AUG of the ORF thanks to the presence of the Kozak
consensus sequence around it.
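Scanning with context-dependent start-codon selection can be sketched as below. The sketch assumes the usual simplified reading of the Kozak consensus, a purine at position -3 and a G at position +4 (taking the A of AUG as +1), and treats recognition as all-or-nothing, which real scanning is not; the function name and sequence are hypothetical.

```python
def first_kozak_aug(mrna: str) -> int:
    """Return the index of the first AUG in a strong Kozak-like context.

    Strong context here means a purine (A or G) at -3 and a G at +4,
    with the A of the AUG as +1. Returns -1 if no such AUG is found.
    """
    for i in range(3, len(mrna) - 3):
        if (
            mrna[i:i + 3] == "AUG"
            and mrna[i - 3] in "AG"   # position -3
            and mrna[i + 3] == "G"    # position +4
        ):
            return i
    return -1


# An upstream AUG in a weak context, then an AUG inside GCCACCAUGG.
leader = "CCUUAUGCCC"
mrna = leader + "GCCACCAUGGCU"
print(mrna.find("AUG"))       # first AUG in the sequence (weak context)
print(first_kozak_aug(mrna))  # AUG selected by the scanning sketch
```

The scanner passes over the first AUG, which has a pyrimidine at -3 and a C at +4, and selects the downstream AUG embedded in the consensus.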
During the scan the 40S subunit carries the Met-tRNA in the P site; when its anticodon pairs
with the AUG, the 60S subunit can be recruited, with release of eIF1 and eIF3 from the small
subunit and of eIF6 from the large subunit.
The eukaryotic mRNA has a poly(A) tail at the 3' end bound by the poly(A) binding protein (PABP),
which interacts with the eIF4G subunit of the eIF4F complex, which in the meantime remains
bound to the 5' end of the same mRNA. In this way the mRNA takes on a looped, circular shape
that stabilizes the molecule, makes its translation more efficient and facilitates the release of the
ribosome when it reaches the 3' end.
The vast majority of eukaryotic mRNAs use this cap-dependent mechanism, but some cellular
and viral mRNAs do not rely on a 5' cap: they have RNA sequences called IRES (Internal
Ribosome Entry Sites), which directly recruit the 40S subunit. Some viruses also have enzymes
that inactivate eIF4G, preventing cap-dependent translation and favoring the synthesis of their
own proteins.
At the end of the translation initiation process the ribosome has a loaded Met-tRNA in the P site
and an empty A site. In each elongation cycle a new aa-tRNA enters the A site through the
mediation of eEF1A, which first binds GTP and then the aa-tRNA, forming a ternary complex, and
takes it to the A site. It is important that the tRNA delivered to the A site corresponds to the
codon on the mRNA: if the codon-anticodon pairing is correct, it stimulates the hydrolysis of GTP,
and the eEF1A-GDP complex detaches from the aa-tRNA. Before it can be reused, eEF1A-GDP
must be reloaded with GTP by eEF1B.
The second phase of the elongation cycle is the formation of the peptide bond between the
NH2 of the amino acid in A and the COOH of the last amino acid of the chain in P: thanks to the
action of the peptidyl transferase, the nascent amino acid chain becomes bound to the tRNA in
the A site.
The last phase of the elongation cycle is translocation: the ribosome moves towards the 3' end
of the mRNA, the discharged tRNA in P moves to the E site, and the tRNA with the nascent chain
moves from A to P, leaving A empty for a new aa-tRNA. Translocation requires the GTPase
activity of the elongation factor eEF2.
The elongation cycle continues until a stop codon enters the A site of the ribosome, for which
there is no tRNA with a matching anticodon. Stop codons are recognized by the termination
(release) factors eRF1 (class I) and eRF3 (class II), which induce hydrolysis of the bond between
the polypeptide and the tRNA in the P site, and the separation of the two ribosomal subunits.
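The initiation, elongation and termination steps described above can be followed in a small
Python sketch. It is only an illustration under simplifying assumptions: the function name
translate, the example mRNA and the partial codon table are hypothetical, the factors (eEF1A,
eEF2, eRF1/eRF3) appear only as comments, and scanning is reduced to taking the first AUG.

```python
# Partial codon table: only the codons used in the example (illustrative, not complete).
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "AAA": "Lys", "UUU": "Phe", "GGC": "Gly"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Start at the first AUG, then repeat elongation cycles until a stop codon."""
    start = mrna.find("AUG")       # simplified scanning from the 5' end
    if start == -1:
        return []                  # no start codon: nothing is translated
    peptide = ["Met"]              # initiator Met-tRNA occupies the P site
    pos = start + 3                # codon now exposed in the A site
    while pos + 3 <= len(mrna):
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:   # no matching tRNA: release factors act here
            break
        # aa-tRNA delivery (eEF1A) and peptide bond formation;
        # codons missing from the partial table are shown as "Xaa"
        peptide.append(CODON_TABLE.get(codon, "Xaa"))
        pos += 3                   # translocation by one codon (eEF2)
    return peptide


print(translate("GGAUGGCUAAAUUUGGCUAAGG"))
```

For this toy mRNA the function returns Met, Ala, Lys, Phe, Gly and stops at the UAA codon.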
Regulation of translation
Regulatory mechanisms can relate to the stability of an mRNA: a more stable mRNA can generate
more protein product than one with a shorter life.
Eukaryotes can regulate the efficiency of translation in relation to the availability of
nutrients, or in response to signals (hormones and growth factors) that promote or inhibit growth.
These mechanisms of global regulation are based on modifications of the proteins involved in
translation, in particular phosphorylation of initiation factors. For example, phosphorylation
of eIF2 by specific kinases when amino acids are scarce prevents formation of the
eIF2-GTP-Met-tRNA ternary complex. Phosphorylation of eIF4E (the cap-binding protein of the eIF4F
complex), which follows growth factor binding, stimulates translation. eIF4E is inhibited by
4E-BP proteins (eIF4E-binding proteins), which prevent the interaction between eIF4G and eIF4E.
Growth factor binding leads to phosphorylation of the 4E-BPs, which detach from eIF4E, and this
promotes translation.
In addition to controls on the global regulation of translation, there are also controls for specific
genes, through autogenous and non-autogenous control mechanisms.
An example of autogenous control is the PABP gene: the PABP mRNA, in addition to the poly(A)
tail at the 3' end, contains an A-rich region in the 5' UTR. If PABP is synthesized in excess
of the cell's needs, it also binds these sequences at the 5' end, preventing assembly of the
eIF4F complex.
Non-autogenous controls involve protein factors that act as translational repressors. Two
examples in eukaryotes follow.
The first is the translational control of ferritin synthesis. Ferritin is a protein complex that
sequesters excess iron in the cell; if there is no excess iron it does not need to be synthesized.
The ferritin mRNAs contain in the 5' UTR, near the cap, an IRE (Iron Responsive Element)
sequence that forms a hairpin, which is recognized and bound by the repressor protein IRP (Iron
Regulatory Protein). When excess iron is present in the cell, it binds the IRP and induces a
conformational change that prevents binding to the IRE; translation of the ferritin mRNA is then
no longer inhibited and the protein is synthesized.
Another example is found in the Xenopus oocyte, which contains numerous maternal mRNAs that are
not translated immediately but are activated at precise stages of embryonic development. The
repression is due to a shortening of the poly(A) tail of these mRNAs and is mediated by the CPEB
protein, which binds the CPE element in the 3' UTR (trailer sequence). CPEB bound to the CPE
interacts with a second protein, Maskin, which binds eIF4E and prevents its interaction with
eIF4G. When CPEB is phosphorylated in response to specific signals, it no longer binds Maskin,
and eIF4E and eIF4G can therefore interact.
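The two repressor-based controls above (ferritin and the Xenopus maternal mRNAs) can be
summarized as simple on/off switches. The sketch below is a deliberately minimal boolean model:
the function names are hypothetical, and real regulation is graded rather than all-or-nothing.

```python
def ferritin_mrna_translated(iron_excess):
    """Ferritin mRNA: IRP bound to the IRE hairpin in the 5' UTR blocks translation."""
    irp_bound_to_ire = not iron_excess      # iron-bound IRP can no longer bind the IRE
    return not irp_bound_to_ire


def maternal_mrna_translated(cpeb_phosphorylated):
    """Xenopus maternal mRNA: Maskin held by CPEB keeps eIF4E away from eIF4G."""
    maskin_bound_to_cpeb = not cpeb_phosphorylated
    eif4e_free_for_eif4g = not maskin_bound_to_cpeb
    return eif4e_free_for_eif4g


for iron in (False, True):
    print("excess iron:", iron, "-> ferritin translated:", ferritin_mrna_translated(iron))
for phos in (False, True):
    print("CPEB phosphorylated:", phos, "-> mRNA translated:", maternal_mrna_translated(phos))
```

In both cases translation is switched on by removing the repressor from the mRNA, not by
adding an activator.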
There are also mechanisms for controlling the translation of specific genes that do not use
repressors.
An example is the GCN4 gene of yeast, which encodes a transcriptional activator of genes
required for amino acid synthesis and is normally expressed when the cell is short of amino
acids.
When amino acid levels in the cell are high, translation of the GCN4 mRNA is inhibited.
The mRNA sequence shows that upstream of the coding sequence there are four small ORFs (uORFs),
each consisting of an AUG, one or two codons for amino acids, and a termination codon. The 40S
subunit scans along the mRNA and recruits the 60S, which is released soon afterwards when it
reaches the stop codon. At each uORF the small subunit has a 50% chance of remaining attached to
the mRNA and then resuming translation by recognizing the AUG of the next uORF downstream. With
each passage from one uORF to the next, the small subunit is less likely to remain attached,
continue scanning and reach the true GCN4 ORF. When amino acids are scarce, instead, the 40S is
more likely to reach the AUG of the true ORF and translate it. This is because, after
translating uORF1 in the absence of amino acids, the 40S is unlikely to find an
eIF2-GTP-Met-tRNA ternary complex quickly enough to start again at the downstream uORFs, so it
skips them and continues scanning. By the time the 40S reaches the true ORF further downstream,
it has had time to find and recruit an eIF2-GTP-Met-tRNA, even though it is present in limiting
amounts, and it then starts translating GCN4, leading to amino acid biosynthesis.
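The reasoning above can be turned into a small worked calculation. The sketch below is a toy
model, not a measured one: it uses the 50% retention chance per translated uORF stated in these
notes, and adds one hypothetical parameter, p_reload, for the probability that the 40S has
reacquired a ternary complex before reaching the next uORF (close to 1 when amino acids are
abundant, low under starvation). It also assumes the complex is always reacquired before the
true ORF.

```python
def reach_main_orf(p_reload, p_stay=0.5, n_uorfs=4):
    """Probability that a 40S subunit that translated uORF1 reaches the GCN4 ORF."""
    prob = p_stay                      # survive termination at uORF1
    for _ in range(n_uorfs - 1):       # uORF2 .. uORF4
        translated = p_reload * p_stay     # reinitiates, then must survive termination
        skipped = 1.0 - p_reload           # no ternary complex yet: scans past the uORF
        prob *= translated + skipped
    return prob


print("amino acids abundant (p_reload = 1.0):", reach_main_orf(1.0))
print("amino acids scarce   (p_reload = 0.1):", round(reach_main_orf(0.1), 3))
```

With p_reload = 1 every uORF is translated and the result is 0.5 x 0.5 x 0.5 x 0.5 = 0.0625;
with p_reload = 0.1 most uORFs are skipped and the probability rises to about 0.43.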
REGULATORY ROLES OF ncRNAs
Of the two thirds of genomic DNA that are transcribed into RNA, only 2% correspond to mRNAs
coding for proteins: all the rest is classified as ncRNA (including tRNA and rRNA, whose
functions, however, have been known and understood for some time). Most ncRNAs still have
unclear functions.
Some of them are involved in regulation or take part in catalytic activities:
- the TERC RNA, involved in the maintenance and elongation of telomeres
- snoRNAs, involved in chemical modifications (pseudouridylation and 2'-O-methylation) of rRNA
nucleotides
- the snRNAs that make up the spliceosome
The miRNAs are small RNAs of 21-23 nt that derive from the processing of larger precursor
molecules and that, by interacting with mRNA molecules, influence their stability or
translatability.
The first miRNA was identified in C. elegans, where expression of the lin-14 gene, involved in
larval development, is negatively regulated at the post-transcriptional level by the product of
the lin-4 gene, a small RNA of 22 nt. The LIN-14 protein is present in the first stage of
larval development but soon disappears in the later stages, as the level of lin-4 RNA increases.
lin-4 is a microRNA with a region complementary to a 10 nt sequence repeated seven times in the
3' UTR of the lin-14 target mRNA. Pairing of the lin-4 miRNA with these sequences on the lin-14
mRNA reduces translation of the protein, regulating gene expression at the level of translation.
MiRNAs are generated from precursors called primary microRNAs, or pri-miRNAs. The pri-miRNAs
are transcribed by RNA polymerase II, matured in the nucleus and in the cytoplasm by the
DROSHA and DICER complexes respectively, and then assembled into the RISC complex.
The pri-miRNAs undergo capping, splicing, polyadenylation and a further maturation in two steps,
cropping and dicing. Cropping removes the sequences at the 5' and 3' ends of the pri-miRNA and
generates a hairpin structure called the pre-miRNA.
The pre-miRNA is exported to the cytoplasm through the nuclear pore by an exportin/Ran-GTP
complex and is subjected to dicing. Dicing removes the loop of the hairpin and produces the
miRNA duplex. One of the two strands is incorporated into the RISC complex, where it interacts
with the target mRNA, mostly in the 3' UTR.
MiRNAs cannot act on their own but require complexes such as RISC. RISC contains proteins of
the Argonaute (Ago) family. Ago proteins contain four domains: N, PAZ, MID and PIWI. The N and
PAZ domains lie in the N-terminal half and MID and PIWI in the C-terminal half. In Ago proteins
the 5' end of the miRNA (which will bind the target) is bound to MID and the 3' end to PAZ, while
the bases of the central region of the miRNA point inwards and are unavailable for pairing.
Initially the miRNA pairs with the messenger through the seed region, a sequence of seven
essential bases at the 5' end of the miRNA. When the seed region is perfectly paired, a
conformational change in Ago exposes the miRNA regions that were previously unavailable,
allowing further recognition. Depending on whether the pairing between miRNA and mRNA is perfect
or imperfect, the mRNA is directed to degradation or its translation is repressed.
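Seed pairing as described above can be illustrated by searching a 3' UTR for sites
complementary to the seed. This is a minimal sketch: both sequences are invented for the
example (they are not lin-4 or lin-14), the seed is taken as nucleotides 2-8 of the miRNA, which
is a common convention assumed here and not stated in these notes, and only perfect
Watson-Crick matches are considered.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence that pairs antiparallel with `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, seed_start=1, seed_len=7):
    """Return 0-based positions in `utr` that pair perfectly with the miRNA seed."""
    seed = mirna[seed_start:seed_start + seed_len]   # nucleotides 2-8 by default
    target = reverse_complement(seed)                # what the mRNA must contain
    return [i for i in range(len(utr) - seed_len + 1)
            if utr[i:i + seed_len] == target]


mirna = "ACGGAUCUUAGCAAGUCCGAUU"        # hypothetical 22 nt miRNA
utr = "GGAAAGAUCCGUUUCCAGAUCCGAA"       # hypothetical 3' UTR fragment
print(seed_sites(mirna, utr))
```

For these toy sequences the seed is CGGAUCU, its target site is AGAUCCG, and the function
finds it at positions 4 and 16 of the UTR fragment.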
One of the targets of miRNA-RISC is the eIF4A helicase of the eIF4F complex, which unwinds the
secondary structures that form in the 5' UTR to allow scanning by the 40S subunit. The
interaction between miRNA-RISC and eIF4A therefore prevents scanning.
Other ncRNAs are the lncRNAs: molecules from hundreds to thousands of nt long, with a 5' cap but
without a poly(A) tail and without an ORF. They have a regulatory role, and their levels can
vary between tissues or at different stages of cell differentiation.
In the nucleus lncRNAs can act as decoys or traps for transcription factors, preventing their
interaction with DNA. Alternatively they can act as scaffolds for the formation of
ribonucleoprotein complexes that modify the state of chromatin condensation, or they can recruit
chromatin-modifying enzymes such as DNA methyltransferase 3, which methylates DNA and thereby
inhibits transcription. An example is the Xist lncRNA, which inactivates an entire X chromosome
in cells where two X chromosomes are present. Xist is 17 kb long; it binds the chromosome and
recruits chromatin remodeling enzymes that silence it (gene dosage compensation, Barr body,
etc.). The antagonist of Xist, which prevents it from inactivating the second X chromosome, is
Tsix, a second lncRNA that binds Xist in a complementary manner and inhibits its activity.