Uploaded by Milou Veer

Summary labbuddy BIC

Introduction
Module 1
Part 1. Tomato genome
Part 2. Gene structure
Part 3. Gene detection
PCR
Southern blotting
BLAST
Module 2. Transcription and RNA processing in eukaryotes
Part 1. Transcription and processing
RNA processing steps
5’-cap
Splicing
Polyadenylation
Differences between Eukaryotes and Bacteria
Alternative splicing
Part 2. RT-PCR
RT-PCR
RNA isolation
Quantitative RT-PCR; qPCR
Part 3. Sequencing
Sanger dideoxy-sequencing
Module 3
Part 1. Western blot
Part 2. Mutations
Module 4. Recombinant DNA technology
Part 1. Cloning
Part 2. Applications
Introduction
You will gain better insight into the genome complexity of eukaryotes: what does the structure of a
gene look like, and how are genes organized in the genome? We will be doing research on the RbcS1
gene from tomato.
Module 1:
Genome organisation and gene structure in eukaryotes – How to identify, detect and isolate a
specific DNA fragment/gene in the genome? – What does the structure of a eukaryotic gene look
like? – How many homologous genes/DNA sequences are present in the genome?
Module 2:
Transcriptional analysis – Where, when and how strongly is a gene expressed?
Module 3:
Protein analysis – How to detect a specific protein? – What effects can different mutations in the
DNA have on the encoded protein?
Module 4:
Cloning – How to clone a gene, for example to make a fusion with the green fluorescent protein?
In total:
You will isolate a specific gene fragment, encoding the RbcS1 protein, from the mRNA that is
produced in tomato leaves. This cDNA fragment will then be cloned into E. coli, creating a
genetically modified bacterium.
The experiment centres on a gene that encodes a small subunit of the protein complex RuBisCo:
RbcS1 from tomato. RuBisCo, ribulose-1,5-bisphosphate carboxylase/oxygenase, is the best known
and most crucial enzyme in photosynthesis and is considered the most abundant protein on Earth.
RuBisCo is part of the Calvin cycle, which fixes CO2 in the chloroplast.
Module 1
This module has three parts:
Part 1: Visualize genome complexity by isolating and analysing the (genomic) DNA of tomato and
comparing it to the DNA of E. coli bacteria.
Part 2: Examine how a eukaryotic gene, i.e. tomato RbcS1, is built up. To do so, you will use a
software program to analyse DNA sequences.
Part 3: How to detect and isolate a specific DNA fragment/gene from the complex genomic DNA.
To this end, you will specifically isolate the RbcS1 gene from tomato and the TRYP gene from E. coli.
Part 1. Tomato genome
Only a small part of the genome of (higher) eukaryotes actually codes for functional genes or
proteins. It was long thought that a higher complexity of an organism would be reflected in its
genome size (more protein-coding genes, more complexity). However, we now know that there is no
correlation between genome size and organismal complexity.
Figure: the colored bars depict the range of genome sizes that are found within a particular class
of organisms. Angiosperms: all flowering plants. Protists: relatively simple eukaryotes.
When we are studying biological processes at the molecular level, most often we are interested in
studying the genes. We want to know what they encode and how they are regulated, to understand
how they influence a biological process or cause disease. So we need to identify which parts of the
genome represent (active) genes.
The nuclear genome of tomato consists of 900 million base pairs, divided over 12 chromosomes.
Like humans, tomato is diploid, which means that there are two copies of each chromosome, one
from each parent. So for each gene there are also two copies. The copies can vary a bit in
sequence, and we call them alleles. An allele represents a variant of a gene at a particular
position (locus) on a chromosome. If a diploid organism has two alleles that are identical, then we
call it a homozygote. If the two alleles differ in sequence, then we call it a heterozygote.
There is more DNA in the darkly stained areas than in the lighter stained areas.
The lighter stained areas are more accessible to DNA-binding proteins.
The darker stained areas we call heterochromatin. These parts of the chromosome are very compact.
Eukaryotic chromosomes are packaged into chromatin, which is composed of DNA and proteins (mostly
histones). The basic unit of chromatin is the nucleosome. The nucleosome contains about 150 bp of
DNA wrapped ~1.8 times around a core of eight histones. Another histone, H1, binds to the
nucleosomes and allows them to be arranged into higher order structures. In this way the DNA is
basically “wound up” to fit in the relatively small nucleus. This packaging makes the DNA not
readily accessible to regulatory proteins and, for example, RNA polymerases that need to transcribe
DNA into RNA. So the genes in these areas are mostly inactive.
The lighter stained areas we call euchromatin. Here the DNA is much less densely packaged, so it is
much more accessible to regulatory proteins and the transcription machinery. Active genes are
therefore mostly located in the euchromatin. The modification of the chromatin structure plays an
important role in eukaryotic gene regulation. The exact chromatin structure, i.e. the distribution
of heterochromatin and euchromatin areas, can differ between different cell types in an organism
and is controlled by proteins called chromatin remodelling enzymes.
In addition to the nuclear genome, mitochondria and chloroplasts contain their own DNA, which
encodes specific genes/proteins. The most accepted theory, the endosymbiotic theory, states that
mitochondria and chloroplasts arose when a eukaryotic cell (or its predecessor) took up an aerobic
proteobacterium (mitochondria) and later, in the case of plants, a photoautotrophic cyanobacterium
(chloroplasts) as an endosymbiont that later evolved into an organelle. The organization of the DNA
in mitochondria and chloroplasts is strikingly similar to the organization of DNA in bacteria.
During evolution, many of the genes of the original bacterium were lost from the genome of the
organelle and were transferred to the nucleus of the host.
Agarose gel electrophoresis
To study DNA, you need to visualize it. With agarose gel electrophoresis you can separate and
visualize DNA molecules based on their size. When a suspension of agarose (a polysaccharide
polymer from seaweed) is boiled and subsequently cooled down, it forms a gel containing pores.
Small DNA fragments migrate more easily through these pores than large DNA fragments, so with
agarose gel electrophoresis you can separate DNA fragments of different sizes.
The separating capacity of an agarose gel is, however, limited. Very small fragments (less than
50 bp) or very large fragments (more than 20 kb) can in principle not be separated on a standard
agarose gel. Generally, an agarose gel is used to separate DNA fragments between 50 bp and 20 kb.
There is a linear relation between the logarithm of the mobility and the agarose concentration.
Mostly a 1% agarose gel is used. A higher percentage of agarose results in smaller pores, and
vice versa.
To make sure that the DNA stays in the slot of the agarose gel and to visualize where you loaded
your sample on the gel, a loading buffer, containing 50% glycerol (to raise the density of the DNA
solution) and a dye is added to your DNA solution.
To visualize the DNA in the gel, the DNA is stained with a fluorescent dye called ethidium bromide.
It is an intercalating compound, meaning that it binds between the two strands of the (double
stranded) DNA molecule. Under UV light the DNA-bound dye fluoresces orange (emission around
590 nm), which makes the DNA bands visible.
To be able to determine the size of the DNA fragments, a so-called DNA size marker is loaded next
to your samples, so you can see what the sizes in bp are of the different bands.
Measuring DNA/RNA concentration
The concentration of a DNA or RNA solution can be measured using a spectrophotometer. You
measure the optical density (O.D.), i.e. the absorbance of light, at wavelengths of 260 nm and
280 nm. DNA and RNA absorb light at 260 nm, while proteins absorb light mostly at 280 nm.
The concentration of DNA can be calculated according to the law of Lambert-Beer: O.D. (A) = ε x c x L
A = the absorbance
L = the length of the light-path (cm)
c = the concentration (mol/l)
ε = the molar extinction coefficient (l/mol/cm)
In practice c is usually expressed in µg/ml; ε is then 0.020 per µg/ml per cm for DNA and 0.025
per µg/ml per cm for RNA.
The ratio (O.D. 260/280) should be above ~1.8 for DNA, and ~2.0 for RNA. When the ratio is lower, it
means that you are measuring a lot of (contaminating) proteins that cause the relatively high
absorbance at 280 nm.
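The calculation above can be sketched in a few lines of Python. This is only an illustration: it assumes that the coefficients 0.020 (DNA) and 0.025 (RNA) apply to concentrations in µg/ml with a 1 cm light path, and the function names and the example readings are made up.

```python
# Assumed units: epsilon in (ug/ml)^-1 cm^-1, so an A260 of 1.0 corresponds
# to roughly 50 ug/ml double-stranded DNA or 40 ug/ml RNA.
EPSILON = {"DNA": 0.020, "RNA": 0.025}
MIN_RATIO = {"DNA": 1.8, "RNA": 2.0}  # purity thresholds from the notes


def concentration_ug_per_ml(a260, kind="DNA", path_cm=1.0, dilution=1.0):
    """Lambert-Beer rearranged: c = A / (epsilon * L), corrected for dilution."""
    return a260 / (EPSILON[kind] * path_cm) * dilution


def looks_pure(a260, a280, kind="DNA"):
    """True when the 260/280 ratio reaches the threshold for this nucleic acid."""
    return a260 / a280 >= MIN_RATIO[kind]


# Hypothetical reading: A260 = 0.50, A280 = 0.26, sample diluted 10x.
print(concentration_ug_per_ml(0.50, "DNA", dilution=10))  # about 250 ug/ml
print(looks_pure(0.50, 0.26, "DNA"))
```

A ratio below the threshold would point to protein contamination, as described above.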
Restriction enzymes
A very important and much used method to analyze and manipulate DNA, is to digest (cut/cleave)
the DNA with so-called restriction enzymes. Restriction enzymes naturally occur in bacteria, where
they function to protect the genome against invading DNA, such as for example bacteriophages that
try to inject their genome into the bacteria. Restriction enzymes in the bacteria take care that this
invading DNA is cleaved. Many restriction enzymes are named after the bacterial species from which
they were first isolated; for example, EcoRI stands for restriction enzyme I from E. coli.
Many restriction enzymes recognize specific palindromic sequences and different enzymes recognize
different (unique) sequences, which we call restriction sites. Depending on the number of bases that
is recognized we speak of, for example, 4-, 6- or 8-cutters. EcoRI is an example of a 6-cutter that
recognizes the 6-bp sequence (restriction site) GAATTC.
When EcoRI cleaves the DNA it generates a so-called sticky end. This is called a sticky end because
the AATT-overhang that is generated can hybridize (anneal) with the TTAA-overhang generated in the
opposite (complementary) strand, since the two single-stranded overhangs are complementary.
The restriction enzyme SmaI cleaves the restriction site CCCGGG exactly in the middle, thereby
generating so-called blunt end fragments.
Restriction enzymes cut extremely reproducibly. This means that when you add sufficient restriction
enzyme, it will digest the DNA at every place where there is a recognition (restriction) site for that
enzyme. So you get a very reproducible digestion pattern. Different restriction enzymes will
generate different patterns.
The concentration of a restriction enzyme is expressed as units/µl, where 1 unit of enzyme can
digest 1 µg of DNA in 1 hour at the appropriate temperature. Most restriction enzymes work best at
37 °C. To make sure that each site is digested we usually add ~5-10x more enzyme than strictly
required based on the units. If you add too little enzyme, incubate the reaction at too low a
temperature, or incubate for too short a time, the enzyme will not cut at every restriction site.
This is called a partial digest. Digesting the DNA with two different restriction enzymes at the
same time is called a double digest.
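The reproducible digestion pattern can be illustrated with a small simulation of a complete digest of a linear DNA molecule. This is a sketch, not a lab tool: the function name, the `cut_offset` parameter and the 40 bp example sequence are made up, and only the top strand is searched (sufficient for palindromic sites such as GAATTC).

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return the fragment lengths after cutting a linear DNA at every site.

    cut_offset is the position inside the site where the top strand is cut
    (EcoRI cuts between G and AATTC, so the offset is 1).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Made-up 40 bp sequence containing two EcoRI sites.
example = "ATGC" + "GAATTC" + "TTAGGCCATGACGT" + "GAATTC" + "CCGTATAGCA"
print(digest(example))  # [5, 20, 15]
```

The same sequence always gives the same list of fragment lengths, which is why a digest produces a reproducible band pattern on a gel. An enzyme that cuts in the middle of a 6-bp site would use `cut_offset=3`; a partial digest is not modelled here.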
Part 2. Gene structure
In this part we will focus on the structural organization of a typical eukaryotic (protein-coding) gene.
Only a very small part of the genome actually codes for a protein. So from the complex genomic
DNA, we are often only interested in a small fraction of the DNA.
1. RbcS1 genomic DNA. This is the nucleotide sequence of a small part of chromosome 3 of tomato.
This piece contains the complete RbcS1 gene.
2. RbcS1 cDNA. cDNA stands for copy DNA and is a DNA copy of the RbcS1 messenger RNA (mRNA).
It contains the base T where the mRNA has the base U.
The difference between these two sequences is: the genomic sequence contains introns and exons,
the cDNA (mRNA) sequence contains only exons.
You can determine the position of the exons and introns by comparing the genomic DNA sequence
with the cDNA sequence.
Transcription start/stop signals
The distribution of exons and introns only becomes relevant after transcription has occurred,
because splicing occurs on the primary transcript. For transcription to start, a proper start signal
is required. This start signal occurs in a region that we call the promoter.
A promoter is a DNA element that determines/regulates the expression (the transcription of the
gene into RNA). The promoter is located before the first exon and is therefore not transcribed into
RNA. The promoter region contains, among other things, a sequence (TATA box) where the RNA
polymerase enzyme complex binds. Thereby, the promoter determines the start point for making RNA.
Eukaryotic RNA polymerases are not able to bind to promoter sequences on naked DNA. They require
additional DNA binding proteins to bind to the DNA first. These DNA binding proteins are called
transcription factors, and they are essential to initiate RNA synthesis by the RNA polymerase. In
eukaryotes, transcriptional regulatory regions, so-called enhancers, can additionally be located
far away from the actual transcription start site. The three-dimensional organization of the DNA
makes sure that the regulatory proteins are oriented in the proper way to assemble the transcription
complex. The signal that marks the end of the mRNA, the polyadenylation site, is the place on the
mRNA where the polyadenylation complex binds. This complex stabilizes the end of the mRNA by adding
a poly-A tail.
Polyadenylation is one of the processing steps during the production of mRNA in eukaryotes. Most
eukaryotic mRNAs have a poly-A tail at their 3’-end. This poly-A tail is not encoded as a stretch of
T’s in the DNA. The end of the mature mRNA is marked by the polyadenylation site on the primary
transcript. During transcription, the RNA polymerase synthesizes RNA far beyond the end of the
gene. Therefore, primary transcripts can be hundreds of nucleotides longer at their 3’-end than the
processed/mature mRNA. These ends are cleaved off by the so-called polyadenylation complex, which
binds at a conserved polyadenylation site, AAUAAA, in the mRNA. After cleavage behind this site, a
stretch of 100-250 A’s is added, the so-called poly-A tail.
Open reading frame, 5’-UTR and 3’-UTR
mRNA is translated into protein by the ribosomes. The ribosome “reads” the mRNA and couples the
appropriate amino acids into a chain. Different combinations of three bases (codons) code for
different amino acids. The open reading frame (ORF) is the part of the mRNA that is translated into
amino acids. The first codon (AUG) of the ORF is called the start codon, and the stop codon ends the
chain. The stop codon itself is not part of the ORF; it is not translated!
There are sequences in the cDNA before the start codon (AUG). The ribosome starts making protein
at the start codon, so the region in the mRNA located before the start codon will not be translated
into an amino acid sequence. We call this region the 5’-UTR, which stands for 5’-UnTranslated
Region. Similarly, there is a region after the stop codon called the 3’-UTR.
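The division of an mRNA into 5’-UTR, ORF and 3’-UTR can be sketched as follows. The sketch makes a simplifying assumption that is not stated in the notes: translation starts at the first AUG in the sequence. The function name and the example transcript are made up (it is not the real RbcS1 sequence).

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split an mRNA into (5'-UTR, ORF, stop codon, 3'-UTR).

    The ORF runs from the first AUG up to, but not including, the first
    in-frame stop codon. Returns None if no complete ORF is found.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Walk through the sequence codon by codon, in frame with the AUG.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return mrna[:start], mrna[start:i], codon, mrna[i + 3:]
    return None


# Made-up transcript for illustration only.
print(split_mrna("GGCACCAUGGCUUCCUCAUAAGCUUGG"))
# ('GGCACC', 'AUGGCUUCCUCA', 'UAA', 'GCUUGG')
```

Note that the stop codon is returned separately, because it is not part of the ORF.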
Part 3. Gene detection
There are different methods to detect a specific DNA sequence. We’ll learn three methods: PCR,
Southern blotting and BLAST searches.
PCR
Polymerase Chain Reaction, PCR, is a method to very strongly amplify a specific piece of DNA. Using
PCR you can create millions/billions of copies from one specific DNA molecule. There is one
prerequisite: you need to know the sequence of the ends of the DNA fragment that you want to
amplify, because you need to attach primers. Primers are single stranded DNA molecules of 20-30
nucleotides that are complementary to the ends of the fragment that you want to amplify. You need
two primers: one primer (the forward primer) complementary to one strand of the DNA (template) and
the other (reverse primer) complementary to the opposite strand of the template DNA.
!! DNA and RNA molecules can only be extended at their 3’-ends; synthesis proceeds from the 5’
end (phosphate) to the 3’ end (hydroxyl) !!
To make a lot of copies of DNA fragments you need a heat-stable DNA polymerase, the Taq
polymerase. Heat stability is crucial because the enzyme has to survive the repeated cycles of DNA
strand separation (denaturation), primer annealing and primer elongation (the actual synthesis of
new strands). Summarized:
Things necessary for PCR:
genomic tomato DNA
forward primer
reverse primer
dNTPs (A, C, G, T)
Taq polymerase
a PCR buffer
30 cycles of PCR: 2^30 = ~1,000,000,000 copies
The annealing temperature is estimated from the primer sequence:
annealing temperature (°C) = 2 x [number of A+T] + 4 x [number of G+C] – 5
Here 2 x [number of A+T] + 4 x [number of G+C] is the melting temperature (Tm) of the primer;
annealing is done about 5 °C below it.
reverse: 5’-GTCATCACTGCACAG-3’
To design primers for this PCR reaction, the forward primer should have the same sequence as the 5'
end of the indicated strand so it will bind to the 3'end of the complementary strand. The reverse
primer is the reverse complement of the 3' end of the indicated strand.
The sequence of the primers will also end up in the final PCR product. You can use this to add extra
sequences, such as a restriction site, to the ends of your PCR product.
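The primer rules above (forward primer identical to the 5' end of the indicated strand, reverse primer the reverse complement of its 3' end, annealing temperature from the A+T and G+C counts) can be written out as a short sketch. The function names, the primer length of 15 and the target sequence are made up for illustration; only the reverse primer GTCATCACTGCACAG comes from the notes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse, so the result reads 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def annealing_temperature(primer):
    """2 x (A+T) + 4 x (G+C) - 5, in degrees Celsius."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc - 5


def design_primers(indicated_strand, length=15):
    """Forward primer = 5' end of the strand; reverse primer = reverse
    complement of its 3' end."""
    forward = indicated_strand[:length].upper()
    reverse = reverse_complement(indicated_strand[-length:])
    return forward, reverse


def copies_after(cycles, start_copies=1):
    """Ideal doubling of the number of copies in every PCR cycle."""
    return start_copies * 2 ** cycles


# Made-up target whose 3' end matches the reverse primer from the notes.
target = "ATGGCTTCCTCAGTT" + "AAGGCC" + "CTGTGCAGTGATGAC"
print(design_primers(target))
# ('ATGGCTTCCTCAGTT', 'GTCATCACTGCACAG')
print(annealing_temperature("GTCATCACTGCACAG"))  # 41
print(copies_after(30))  # 1073741824
```

In a real PCR the doubling is not perfect in every cycle, so the number of copies after 30 cycles is an upper estimate.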
Q: Why is it important to use a positive control and what would you use?
Use a previously purified and verified gene fragment (corresponding to the gene you want to
amplify), to check that the PCR reaction was functional.
By using a known/verified DNA fragment as template you can check whether your primers and PCR
components all work fine, and can generate the correct PCR product. In this experiment you will get
a plasmid containing the RbcS1 gene or TRYP gene from your supervisor to use as positive control.
When possible, always use a positive control to verify that the PCR could work! Note, in practice it is
sometimes not possible to use a positive control as it may not be available in all experimental setups.
Q: Why is it important to use a negative control and what would you use?
Use all the ingredients, without the template (genomic) DNA, to check whether there is already
amplification of the fragments.
By leaving out the template DNA you can check whether there is any contamination (for example
genomic DNA or gene fragments in the water or contamination from the pipette). The negative
control should not give any products in the PCR!
Southern blotting
Southern blotting is a technique to detect specific DNA fragments in a complex mix based on the
principle of base pairing (hybridization) between complementary single stranded DNA molecules, just
like the annealing of primers to the template DNA in a PCR reaction: an A hybridizes to a T, and a
G to a C, by means of hydrogen bonds.
A Southern blot can be used for example to:
determine whether a DNA fragment is present in the genome
determine whether a gene isolated from organism X is also present in the genome of
organism Y.
determine how many homologous genes are present within the genome of an organism. For
example, to find out whether a certain gene is part of a gene family, such as RbcS1.
For a Southern blot you first need to make a so-called filter replica of your DNA fragments, for
instance after these fragments have been separated through agarose gel electrophoresis.
A simple way to transfer DNA from the agarose gel to the membrane filter, for which no special
equipment is needed, is via capillary force (setup above). It is important that the DNA that you
transfer is single stranded, because you want to use it for hybridisation!! The gel is therefore
incubated in a very basic (alkaline) solution (salt solution in the image above), which denatures
the DNA. Next, a membrane that will bind the DNA is placed on top of the gel. By stacking a pile of
(moisture-absorbing paper) towels on top of this membrane, the solution together with the DNA will
be sucked into the membrane by capillary force. To bind the DNA permanently to the membrane, the
membrane is often “baked” at 80 °C (or treated with UV light) after transfer.
Next, you incubate the blot (the membrane containing the single stranded DNA) with a single
stranded labelled DNA probe, representing the DNA fragment that you want to detect. Often a
radioactively labelled single stranded probe is used, because this allows very sensitive
detection. Alternatively, fluorescently labelled or enzyme-labelled probes can be used. A
radioactively labelled probe can be detected using a photosensitive X-ray film; the resulting image
is called an autoradiogram.
The single stranded labelled probe will now try to "bind/anneal/hybridize" to its complementary DNA
sequence on the blot. Probes that cannot bind are washed away. Just like for primers in a PCR
reaction, the stringency of hybridization depends on the temperature (the higher the temperature,
the more specific the hybridization will be, because the binding strength between the probe and the
DNA on the blot needs to be stronger at higher temperature) and on the salt concentration of the
hybridization/wash buffer (the lower the salt concentration, the more specific the hybridization
will be, as the binding between the complementary strands will be weaker).
So, when your probe is 100% complementary to the DNA on the blot, you can use a high stringency
(high temperature, e.g. 65 °C, and a low salt concentration) when incubating the blot with the
probe. If the probe is less specific, for example a homologous gene from a different organism
(which will have a slightly different sequence), you need to use a lower stringency to allow
sufficient (strong enough) binding of the probe. However, a too low stringency will cause the probe
to hybridize to non-specific places on the DNA. Therefore, optimal hybridization conditions need to
be found in practice.
Autoradiogram of a Southern blot performed on genomic DNA. Left: DNA size marker. Right: genomic
DNA digested with EcoRI and with HindIII.
The fact that only one band hybridizes in both digests strongly indicates that the probe was
generated from a single copy (unique) gene. If the probe were a gene that is part of a gene family,
or a repetitive sequence, you would expect multiple hybridizing bands.
The term Southern blotting is used when you transfer and analyse DNA on a blot.
There is another technique named Northern blotting. With this you run RNA on an agarose gel and
transfer it to a membrane. This can be used to determine whether a gene is transcribed, i.e. to
determine where (in which tissue), when and how strongly a gene is active.
Western blotting is yet another method, in which you separate proteins based on their size and
transfer them to a membrane (see Module 3).
A variation on Southern blotting is FISH: Fluorescent In Situ Hybridisation
In a FISH experiment, a fluorescently labelled DNA probe is used to hybridize to the chromosomes
spread on a microscope slide.
Basic Local Alignment Search Tool
Instead of comparing all sequences by hand, BLAST uses a mathematical algorithm to compare
sequences. This algorithm works in two steps. First, small pieces of the input sequence are compared
to sequences in the database. Next, the sequences that "match" these small pieces are compared more
thoroughly in the second step.
There are two standard methods to compare sequences: BLASTn and BLASTp. BLASTn (n = nucleic
acid) compares a nucleotide sequence with a database of nucleotide sequences. BLASTp (p =
protein) compares an amino acid sequence with a database of amino acid sequences.
Bit score
The bit score gives an indication of how homologous two sequences are. The higher the bit score,
the more two sequences resemble each other. The maximal bit score equals twice the length of the
sequence that you put in (your query). So, a sequence of 800 bp has a maximal bit score of 1600.
The Expect value, or E-value, is a parameter that describes the number of hits one can "expect" to
see by chance when searching a database of a particular size. It represents the chance that you
retrieve the same bit score when you use an arbitrary sequence in a BLAST search. The lower the
E-value, or the closer it is to zero, the more "significant" the match is. However, keep in mind
that virtually identical short alignments have relatively high E-values. This is because the
calculation of the E-value takes into account the length of the query sequence. These high E-values
make sense because shorter sequences have a higher probability of occurring in the database purely
by chance.
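Why a short perfect match still gets a relatively high E-value can be made concrete with the commonly used relation E = m x n x 2^(-S), where S is the bit score, m the query length and n the total length of the database. This formula is not given in these notes; it is the usual textbook form, and the real BLAST program applies extra length corrections. The database size of 10^9 bases is a made-up value.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value: E = m * n * 2^(-S) for bit score S."""
    return query_length * database_length * 2.0 ** (-bit_score)


DATABASE_LENGTH = 10 ** 9  # hypothetical database of one billion bases

# Perfect matches, using "maximal bit score = twice the query length".
short_hit = expect_value(bit_score=40, query_length=20,
                         database_length=DATABASE_LENGTH)
long_hit = expect_value(bit_score=400, query_length=200,
                        database_length=DATABASE_LENGTH)

print(short_hit)  # roughly 0.018: could occur by chance now and then
print(long_hit)   # practically zero
```

Both hits are perfect matches, but only the long one is convincingly significant. Doubling the database size doubles the E-value, which is why the E-value always has to be read in the context of the database that was searched.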
Module 2.Transcription and RNA processing in eukaryotes
The production of mRNA is called transcription. The transcription of a gene into RNA is called
expression of a gene.
Part 1. Transcription and processing
Transcription summarized in an image:
DNA and RNA contain different nucleotides.
deoxyribonucleotides (A, C, G, T; lacking an -OH group at the 2nd C-atom of the ribose).
ribonucleotides (A, C, G, U; containing an -OH group at the 2nd C-atom of the ribose).
Because DNA is double stranded, a gene can be encoded in either the upper strand or in the bottom
strand. The DNA strand that is used as a template to make the mRNA is called the template strand.
The complementary strand is called the non-template strand.
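The relation between the two strands and the mRNA can be shown with a minimal sketch. It assumes that the template strand is written 3'->5', so that the RNA comes out 5'->3'; the function name and the six-base example are made up.

```python
# Base pairing during transcription: A->U, C->G, G->C, T->A.
PAIRING = str.maketrans("ACGT", "UGCA")


def transcribe(template_3_to_5):
    """Build the mRNA (5'->3') by pairing each base of the template strand,
    which is given in 3'->5' orientation."""
    return template_3_to_5.upper().translate(PAIRING)


non_template = "ATGGCT"   # 5'->3'
template = "TACCGA"       # 3'->5', complementary to the non-template strand

mrna = transcribe(template)
print(mrna)                                     # AUGGCU
print(mrna == non_template.replace("T", "U"))   # True
```

The mRNA therefore has the same sequence as the non-template strand, with U in place of T.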
RNA processing steps:
Binding of RNA polymerase to the promoter region
Addition of a 5’-cap
Cleavage of the primary transcript at the 3’-end
Addition of a poly-A tail
Removing the introns, ie. splicing
Transport to the cytoplasm
Binding by the ribosomes
5’-cap
The first modification that occurs while the primary RNA transcript is being made is the addition
of a 5’-cap at the 5’-end of the RNA molecule. This is a modified guanine linked by three phosphate
groups to the start of the RNA. Its function is to stabilize the 5’-end of the mRNA and to aid
transport to the ribosomes.
Splicing
Eukaryotic genes contain exons and introns. Exons are the pieces translated into protein, and
introns need to be removed before the mRNA transcript can be translated into protein. The removal
of introns and the joining of exons is called splicing.
The spliceosome is the enzyme complex that does this. It consists of (>100) proteins as well as
so-called small nuclear RNAs (snRNAs). These core components of the spliceosome recognize conserved
nucleotides in the sequence of the intron. These conserved nucleotides occur in every intron and
are called splice sites. They are GU at the 5'-end and AG at the 3'-end; the so-called GU-AG rule.
Another conserved site is an A residue between 15 and 45 nucleotides upstream of the 3'-splice
site: the branch point.
The spliceosome ensures that the RNA is properly folded to allow the removal of the intron. First,
the 5’-donor end of the intron is joined to the internal branch point. Second, the two exons are
joined together.
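The outcome of splicing, and the GU-AG rule, can be illustrated with a small sketch. It only removes introns at positions that are given as input; it does not model the spliceosome or the branch point. The function name and the sequences are made up.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) index pairs (end exclusive).

    Every intron must start with GU and end with AG (GU-AG rule);
    otherwise a ValueError is raised.
    """
    pre_mrna = pre_mrna.upper()
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} breaks the GU-AG rule")
        mature.append(pre_mrna[position:start])  # keep the exon before it
        position = end
    mature.append(pre_mrna[position:])           # keep the last exon
    return "".join(mature)


# Made-up primary transcript: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCU", "GUAAGUCCUAACUUUCAG", "UCCUCAUAA"
pre = exon1 + intron + exon2
print(splice(pre, [(len(exon1), len(exon1) + len(intron))]))
# AUGGCUUCCUCAUAA
```

Passing different sets of intron coordinates for the same primary transcript gives different mature mRNAs, which is the idea behind alternative splicing.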
Polyadenylation
Almost all eukaryotic mRNAs contain a poly-A tail, 100-250 A’s, at the 3’-end of the mature mRNA.
This tail protects the 3’-end of the mRNA against nucleases that try to break down the RNA and aids
the transport of the mRNA to the cytoplasm. Polyadenylation is performed by an enzyme complex
called the polyadenylation complex. This complex recognizes and binds to a specific sequence in the
RNA, the so-called polyadenylation signal. Often this polyadenylation signal is AAUAAA (although
some variation in this sequence is observed). The RNA is cleaved several bases behind this signal
at the polyadenylation site. Next, a poly-A polymerase enzyme adds a stretch of A's to the end of
the mRNA molecule.
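The two steps of polyadenylation (cleavage behind the signal, then addition of A's) can be sketched as below. The cleavage distance of 20 nucleotides and the tail length of 150 are assumed values chosen for illustration; the notes only say "several" bases behind the signal and a tail of 100-250 A's. The function name and the sequence are made up.

```python
def polyadenylate(primary_transcript, signal="AAUAAA",
                  cleavage_offset=20, tail_length=150):
    """Cleave the transcript behind the polyadenylation signal and add a
    poly-A tail. Returns None when the signal is not present."""
    rna = primary_transcript.upper()
    signal_start = rna.find(signal)
    if signal_start == -1:
        return None
    # Assumed cleavage position: a fixed number of bases after the signal.
    cleavage_site = min(signal_start + len(signal) + cleavage_offset, len(rna))
    return rna[:cleavage_site] + "A" * tail_length


# Made-up primary transcript that runs on beyond the cleavage site.
primary = "AUGGCUUCCUAA" + "GCGCGC" + "AAUAAA" + "C" * 20 + "GUGUGUGUGU"
mature = polyadenylate(primary)
print(len(primary), len(mature))  # 54 194
```

The extra GU-rich stretch at the 3'-end of the primary transcript is cut off, and the A's that replace it are not encoded in the DNA.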
Differences between Eukaryotes and Bacteria
Bacteria have a circular genome that is located in the cytoplasm (no nucleus). In general this genome
is much smaller than eukaryotic genomes. This is partly because bacterial genes do not contain
introns. Furthermore, the genes are located much closer together and importantly, many genes are
organized in so-called operons. An operon is a functioning unit of bacterial DNA containing a cluster
of genes under the control of a single promoter. The genes in this cluster/operon are all transcribed
as one big mRNA, by which different proteins are encoded by one big mRNA.
The expression of these 5 genes (in the figure) is controlled by one transcription regulation
region (the promoter region) upstream of the genes. When transcription occurs, all 5 genes are
transcribed and become one large mRNA molecule. Ribosomes translate the mRNA immediately, since it
is already in the cytoplasm. Bacterial mRNA has no 5’-cap and no poly-A tail, so it is less stable
and breaks down faster.
With this way of organizing genes, only one transcription regulation/promoter region is required to
control the mRNA synthesis of all the genes for proteins that work together. The bacterium can
thereby switch the production of all proteins for one function on or off at once.
Alternative splicing
Higher eukaryotes, such as plants and mammals, are considered to be more complex organisms than
lower eukaryotes; compare for example humans to amoeba. However, as you know from the
introductory lecture, the complexity of an organism does not correlate with the genome size or the
number of genes in an organism!
One of the factors that contribute to a higher complexity is variation in the splicing of the
introns, so-called alternative splicing. Through alternative splicing, multiple different mRNA
molecules, consisting of different (combinations of) exons, can be made from one gene. As
a result, different proteins can be encoded by that one gene. An extreme example is the DSCAM gene
in Drosophila (fruit fly), which controls the growth direction of nerve cells. This gene has multiple
(cassette) exons, which can result in over 38,000 different proteins through alternative splicing.
Compared to this, the number of ~18,000 genes in Drosophila is rather small. So the number of
different proteins depends strongly on the alternative splicing of the primary RNA transcript.
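To see where a number like "over 38,000" comes from, you can multiply the number of alternatives per variable exon cluster. The counts used below (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17) are not given in this summary; they are the commonly cited values for Drosophila Dscam and are included here as an assumption.

```python
from math import prod

# Number of mutually exclusive alternatives in each variable exon cluster.
# These counts are an assumption taken from the commonly cited Dscam figures,
# not from the course text.
alternatives_per_cluster = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is chosen from every cluster, so the counts multiply.
n_isoforms = prod(alternatives_per_cluster.values())
print(n_isoforms)  # 38016
```

Because the choices in the clusters are independent, a handful of exon clusters is enough to give more possible mRNAs than the fly has genes.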
Below you see several examples of possible results of alternative splicing:
Alternative splicing is a tightly regulated process, which is often cell- or tissue-specific. In other words,
different cell types can show different alternative splicing (different exons are joined) and therefore
have different mRNAs from the same gene. In the image below you see the exon-intron distribution
in the primary transcript of the rat alpha-tropomyosin gene (a component of the cell cytoskeleton).
This gene is alternatively spliced in different cell types. As you can see in the image, different cells
contain differently spliced mRNAs. Note that each mRNA needs to have a poly-A tail, and therefore
multiple polyadenylation signals have to be present in the gene.
Part 2. RT-PCR
RT-PCR is a very sensitive method to
determine where, when and how strongly a
gene is expressed. PCR is used to amplify
specific mRNAs to study their abundance in
a certain organism or tissue.
Every cell has its own transcriptome, i.e. its
own collection of mRNAs that occur in that
cell type. One of the most used techniques
to study the occurrence of a specific mRNA
in a tissue is RT-PCR.
Before you can perform PCR on mRNA, the mRNA
first needs to be converted into so-called
copy DNA, or cDNA. This can be achieved by
an enzyme called reverse transcriptase.
Reverse transcriptase is an RNA-dependent
DNA polymerase that can use RNA as a
template to make a DNA strand. Like all
DNA polymerases, reverse transcriptase
needs a small double-stranded piece to
start the synthesis of a new strand.
Therefore a primer is needed that can
attach to the mRNA, so that a double-stranded region is created from which the reverse transcriptase can start making the new cDNA.
A much-used primer to make cDNA is an oligo-dT primer: a stretch of ~25 T's that can anneal to the
poly-A tail of eukaryotic mRNAs. After this primer anneals, the reverse transcriptase can, in the
presence of dNTPs (dATP, dCTP, dTTP, dGTP), make a new cDNA strand. This cDNA can then serve as a template for
the PCR, after which the PCR product is analysed on an agarose gel.
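The first-strand cDNA synthesis described above can be sketched as follows. This is a simplified model: the function name is hypothetical, and it assumes that the oligo-dT primer anneals to the very end of the poly-A tail.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna, primer_length=25):
    """Return the first-strand cDNA (written 5'->3') made from an oligo-dT primer.

    Returns None if the mRNA has no poly-A tail long enough for the primer to anneal.
    """
    if not mrna.endswith("A" * primer_length):
        return None
    # The cDNA is complementary and antiparallel to the mRNA, so we read the
    # mRNA from its 3' end and write down the complementary DNA base.
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


mrna = "AUGGCCUGA" + "A" * 25
cdna = reverse_transcribe(mrna)
print(cdna)
```

The cDNA starts with the T's of the primer, and an RNA without a poly-A tail (for example an rRNA) gives no product with this primer.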
The amount of PCR product that you see on the gel is (as long as the PCR has not reached its plateau)
proportional to the amount of cDNA that was in the sample, and therefore to the amount of mRNA that
was in the sample. The primers determine which gene is amplified, so the amount of PCR product in an
RT-PCR reaction is proportional to the amount of mRNA for that specific gene.
RNA isolation
You need isolated RNA from a certain organism or tissue before you can start an RT-PCR experiment.
There are different types of RNA:
mRNA: messenger RNA
tRNA: transfer RNA (brings correct amino acid to the mRNA during translation by the ribosomes)
snRNA: small nuclear RNA (has a catalytic function, for example in the spliceosome)
rRNA: ribosomal RNA (largest class, present in every cell)
Ribosomes are built up out of proteins as well as functional RNA molecules, the rRNA. Prokaryotic
and eukaryotic ribosomes are very similar; mainly the sizes of the subunits differ.
The rRNA's are encoded in the genome by the rDNA genes. These rDNA genes form a gene family,
with hundreds of members that are organized as tandem repeats in the genome. So there are many
rDNA genes, at a particular place in the genome, that are all transcribed into rRNA. Along each gene
many RNA polymerases are transcribing in one direction. The growing RNA transcripts appear as
threads extending outward from the DNA backbone. The shorter transcripts are close to the start of
transcription, the longer ones are near the end of the gene.
Only mRNA’s contain a poly-A tail!!
Isolating only the mRNA’s
For some experiments you only want mRNA. To
isolate mRNAs you make use of the fact that all
eukaryotic mRNAs contain a poly-A tail. By applying
the total RNA isolate to a column containing an
oligo-dT matrix (see below), the complementary
poly-A tails will hybridize/bind to the oligo-dT matrix.
As the other RNA classes do not have a poly-A tail,
they do not bind and can be washed away. Next the
mRNA's can be eluted from the column, resulting in
a pure mRNA preparation.
Quantitative RT-PCR; qPCR
If you want to know exactly how much more strongly a gene is expressed in one tissue or treatment
compared to the other, you can use quantitative RT-PCR, or qPCR. This technique is also called real-time PCR, because you follow the amount of double-stranded DNA that is produced in the PCR in
real time.
A commonly used method to perform a qPCR makes use of a fluorescent dye called SYBR green,
which binds to double-stranded DNA. It is only fluorescent when it is bound to double-stranded
DNA. To detect the SYBR green fluorescence a special type of PCR machine is used, which can
measure the amount of fluorescence after each PCR cycle. First a certain number of cycles is needed
to get sufficient signal (above background), next there is an exponential amplification of the specific
PCR product (a specific gene), and eventually a plateau is reached because the reaction becomes
saturated.
The different curves in the plot represent
different cDNA samples. To quantify the
difference in expression for this gene in the
different samples, a threshold is marked with a
red line in the figure in the exponential
amplification phase. By comparing the number of
PCR cycles that is needed to reach the threshold (the Ct value)
you can compare the expression level of the gene
in the different tissues. As PCR amplification is
exponential (2^n, where n = number of cycles), a
difference of 5 Ct values (five PCR cycles)
between two samples means a difference in
expression of 2^5 = 32. So in such a case the gene is
expressed 32-fold higher in one sample compared to
the other.
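The calculation above can be written as a small function. This is a minimal sketch: the function name and the efficiency parameter are assumptions for illustration, and it leaves out the normalisation to a reference gene that is normally part of a real qPCR analysis.

```python
def fold_difference(ct_sample_a, ct_sample_b, efficiency=2.0):
    """Expression of sample A relative to sample B from threshold cycles (Ct).

    Assumes the product multiplies by 'efficiency' every cycle (2.0 = perfect doubling).
    A lower Ct means more starting template, hence the order in the exponent.
    """
    return efficiency ** (ct_sample_b - ct_sample_a)


print(fold_difference(20, 25))  # 32.0
```

Sample A reaches the threshold five cycles earlier than sample B, so it contained 2^5 = 32 times more template at the start.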
Part 3. Sequencing
Sanger dideoxy-sequencing
Sequencing is a technique to determine the nucleotide sequence of a piece of DNA. A much-used
sequencing method is Sanger dideoxy sequencing.
Nowadays, there are also several so-called next-generation sequencing methods on the market.
When all cDNAs (and therefore all mRNAs) from a certain tissue are sequenced by next-gen
sequencing we call this RNA-seq. By counting the number of sequence reads that belong to a particular
gene, you know how strongly this gene is expressed and you can directly compare it to the number of
reads of all other genes. Two much-used methods are 454 sequencing and Illumina sequencing.
These methods are much faster and cheaper than Sanger sequencing; however, they generate only
relatively short sequence reads (~100 bp or < 400 bp) and they often make small mistakes. In this
course we will stick to Sanger dideoxy sequencing.
The term dideoxy comes from a
special modified nucleotide, called
a dideoxynucleotide (generally a
ddNTP). A dideoxynucleotide lacks
the 3'-OH group. This OH group is needed for
DNA synthesis, so once a dideoxynucleotide
is built in, the DNA strand can no longer be
elongated and DNA synthesis stops.
In a sequencing reaction, a low concentration of dideoxynucleotide (ddNTP) is added in addition to
the normal nucleotides (dNTPs). As a result, at each position there is a chance that either a normal
nucleotide is incorporated (allowing the strand to be elongated) or a dideoxynucleotide (elongation stops).
For example, the ingredients of a sequencing reaction are mixed with a low concentration of dideoxy ATP
(ddATP). Whenever an A needs to be incorporated, there is then a chance that a dideoxy-A is
incorporated, blocking further strand elongation. Like any DNA polymerase reaction, you need to add
a primer to the single-stranded DNA template, as the polymerase needs a small piece of double-stranded
DNA to elongate/synthesize the new
strand. Note that you only add 1 primer, which
will be elongated using the
complementary strand as template.
This can be done for all nucleotides, so
ddATP, ddTTP, ddCTP and ddGTP. This
will result in many differently sized
products, with a dideoxynucleotide at
every possible position in the DNA.
When you analyse these different products
in a very sensitive gel electrophoresis you
can "read" the sequence of the DNA
fragment starting from the smallest DNA
fragment (from 5'- to -3').
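The idea of reading the sequence from the fragment lengths can be simulated with a short script. This is a simplified sketch under stated assumptions: the function names and example sequences are made up, and every possible termination product is assumed to be present and detectable.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template, primer):
    """Return every chain-terminated product as (length, terminal base).

    template is the single-stranded DNA written 3'->5', so the new strand can be
    built left to right in the 5'->3' direction. primer is written 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    if not new_strand.startswith(primer):
        raise ValueError("primer does not anneal to the start of the template")
    fragments = []
    # A ddNTP can be built in at any position after the primer,
    # giving one fragment length per position.
    for end in range(len(primer) + 1, len(new_strand) + 1):
        fragments.append((end, new_strand[end - 1]))
    return fragments


def read_gel(fragments):
    """Read the sequence 5'->3' by going from the shortest to the longest fragment."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGTCAGTTC"  # written 3'->5'
primer = "ATGC"            # written 5'->3', pairs with the first four template bases
print(read_gel(sanger_fragments(template, primer)))  # CAGTCAAG
```

Note that the sequence you read is that of the newly synthesized strand, starting directly after the primer.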
By using fluorescently
labeled ddNTPs, where
each different ddNTP has
a different color, the
detection can be
automated using a column
to separate the differently
sized DNA fragments and
a laser to detect the
terminal nucleotide color.
The template that is sequenced should not be very complex, as this can interfere with the sequencing
reaction. For example, you cannot just sequence isolated genomic DNA. Therefore you first need
to purify a selected piece of DNA (such as a single gene) either by PCR (sequence a PCR fragment) or by
Module 3
This module focusses on protein detection and analysis. Although not every gene encodes a
protein, in many cases proteins are the key executors in a biological process.
In PART 1 (Western blot), you will learn a technique to detect a protein of interest by making use of
specific antibodies.
In PART 2 (Mutations), you will learn what effects different types of mutations in the DNA can have
on the protein that is encoded by a gene.
Part 1. Western blot
The amount of mRNA of a protein-coding gene that is present in a certain cell type does not
necessarily correlate with the amount of protein in that cell type. The amount of protein depends,
among other things, on the stability of the mRNA, the efficiency of translation, the stability of the protein,
and whether the protein (or mRNA) is transported to other cells.
To see how much protein is present in a certain tissue you need to detect the protein itself. One of
the methods is to use labelled antibodies that specifically bind to that protein. Such antibodies can
detect the protein either in the tissue itself or with a method called Western blotting: the proteins
are first isolated from a tissue, separated by size on a gel and transferred to a membrane, after which
the antibody detects the specific protein. Another method is with Green Fluorescent Protein.
To perform a western blot, you first need to extract the proteins from a certain tissue, for example
the leaves of tomato. Next, you separate the proteins based on their size by using polyacrylamide gel
electrophoresis, or PAGE.
Polyacrylamide gel
electrophoresis is somewhat
comparable to the separation of
DNA fragments based on size in
an agarose gel electrophoresis.
Acrylamide, when polymerized,
forms a gel with pores. The size
of these pores depends on the
percentage of acrylamide in the
gel. Smaller proteins can move
more easily (run faster) through
the pores of the gel than large
proteins. A higher percentage of
polyacrylamide means smaller
pores in the gel, which is more
suited to separate small
proteins. You therefore adjust the
percentage to the size of the protein.
Most often proteins are first denatured before they are loaded onto a protein gel. An easy way to do
this is to boil the proteins in a protein sample buffer. Furthermore, the protein sample buffer
contains beta-mercaptoethanol (or DTT) and SDS (see also picture above). Beta-mercaptoethanol (or
DTT) in a high concentration breaks the disulfide bonds that contribute to the secondary and
tertiary structure of proteins. SDS (a negatively charged soap-like molecule) binds to the (denatured)
proteins and makes sure that the proteins all get a uniform negative charge (in proportion to their
length) and an elongated shape. Therefore, all proteins will now migrate to the positive pole (anode)
and are separated by size only. After separation of the proteins
by size, the gel can be stained using a protein dye such as Coomassie blue to visualize the proteins in
the gel.
Antibody detection
In many cases the detection involves a two-step procedure. First a primary antibody is used, which
binds to the protein of interest (this antibody is made in an animal, by injecting it with the
protein). Next, a secondary antibody is used to detect the primary antibody (the secondary antibody is made
in a different animal species). The secondary antibody can be chemically labelled with a fluorescent group or with an
enzyme to allow detection (horseradish peroxidase, HRP, is often used).
To prevent non-specific binding of antibodies to the positively charged membrane, the areas of the
membrane where no protein is bound have to be blocked. Often proteins such as BSA (bovine serum
albumin) or nonfat dry milk are used to block the membrane, so that the antibodies will not stick to the membrane non-specifically.
Part 2. Mutations
A mutation is a permanent, structural change in the DNA (sequence). Often such a mutation does not
negatively affect the functioning of an organism, but in certain cases it does, for example when it lies in the coding
region of a gene. Mutations come in all kinds of shapes and sizes, from the deletion of a single base pair to the
relocation (translocation) of entire pieces of chromosomes.
Changes in DNA sequence can occur at different scales in the DNA, for example at the level of a whole
chromosome or at the level of a single gene.
The first type of mutation is called a SNP (pronounced "snip"). SNP stands for Single Nucleotide
Polymorphism. A SNP is a mutation where one nucleotide (for example an A) is replaced by
another nucleotide (in this case C, G or T). Such a mutation is also called a base (pair) substitution.
A different, frequently occurring type of mutation is a deletion. As the name implies, a deletion is a
mutation where part of a DNA sequence is lost. As a result, the mutated sequence is shorter than
the original one. Deletions can vary in length from one (or a few) lost nucleotides to large pieces of
a chromosome missing.
The third type of mutation that we will cover is an insertion. An insertion is an extra DNA sequence
that is inserted somewhere in the DNA. So in case of an insertion the mutated sequence is always
longer than the original (wild-type) sequence. Insertions can vary in length from one to several
nucleotides or even complete pieces of a chromosome.
A mutation can lead to a so-called "frame-shift" in
the reading frame. For example, if one base is
deleted, a different group of 3 bases forms the new
codon. As a result all following codons will also be
changed. Both deletions and insertions can lead to a
frame-shift (when the number of lost or added bases
is not a multiple of three). As a result the amino acid sequence
downstream of the mutation can be completely different
from the wild-type sequence. So, in most cases a
frame-shift will have a dramatic effect on the
encoded protein.
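A frame-shift is easy to see when you split a sequence into codons before and after a deletion. The sequence below is made up for the example, and the helper function name is hypothetical.

```python
def codons(sequence):
    """Split a coding sequence into complete codons (triplets), starting at position 0."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


wild_type = "ATGGCCAAGTTGGAC"
# Hypothetical mutant: the fourth base (a G) is deleted.
mutant = wild_type[:3] + wild_type[4:]

print(codons(wild_type))  # ['ATG', 'GCC', 'AAG', 'TTG', 'GAC']
print(codons(mutant))     # ['ATG', 'CCA', 'AGT', 'TGG']
```

Only the start codon is unchanged; every codon after the deletion is different, so the amino acid sequence changes from that point onwards.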
Splicing joins the exons of the primary transcript, resulting in one large
reading frame. A mutation can impact the splicing of
the primary mRNA if the mutation affects the
conserved splice sites. A mutation in a conserved splice site will prevent the correct splicing of the
intron, so that the intron becomes part of the mature mRNA. This will severely impact the open reading
frame of the mRNA, as it will change the codon sequence.
Module 4. Recombinant DNA technology
The development of recombinant DNA technology has triggered a revolution in biology. This
breakthrough allowed any DNA fragment, from any genome, to be isolated and to be fused to any
other DNA sequence of your choice: the so-called cloning of DNA. One of the many research
applications of cloning is, for example, to fuse a protein to a green fluorescent protein (GFP), to
study where a protein is located inside a cell.
Part 1. Cloning
Cloning involves the amplification (copying) of a specific DNA fragment in bacteria. Therefore, the DNA
fragment to be cloned is inserted into a plasmid that can replicate itself inside the bacteria. Plasmids are
circular, double stranded DNA molecules that occur naturally in bacteria. A plasmid can replicate
independently from the chromosomal DNA in bacteria and during each cell division at least one copy of the
plasmid is passed on to the daughter cell. Bacteria can exchange plasmids during conjugation. Therefore,
resistance to antibiotics can spread rapidly in a natural bacterial population.
When a plasmid is used for cloning, it is also called a
(cloning) vector. Below, you see an example of a typical
(artificial) plasmid (pUC18) used for cloning.
ori = origin of replication, allows the plasmid to be
replicated independently of the bacterial chromosome.
ampR = resistance gene against the antibiotic ampicillin.
Polylinker = a region on the plasmid containing a
collection of unique restriction sites, where a new DNA
fragment can be inserted. Also called the multiple cloning site.
lacZ = open reading frame (ORF) of a gene encoding a
beta-galactosidase. Note: the polylinker is located inside the open reading frame of the lacZ gene.
Cloning a gene using a plasmid consists of three steps
Step 1. Digestion and Ligation
DNA fragments are inserted into a plasmid at specific restriction sites in the polylinker. As a first step
the plasmid (vector) is digested with one (or two) restriction enzymes at the place where the DNA
fragment needs to be inserted into the vector. The same enzyme(s) are used to digest the DNA
fragment that needs to be cloned; this DNA fragment is called the insert. This creates fragments with
the same sticky ends as the linearized (digested) plasmid.
Only fragments that have matching sticky ends will efficiently hybridize/anneal. Next, the
enzyme DNA ligase is used to form the phosphodiester bonds between the DNA insert and the
plasmid. As a result the DNA fragment (the insert) is ligated into the plasmid, creating a so-called recombinant plasmid.
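The digestion step can be illustrated with a small script. As an example enzyme it uses EcoRI, which recognises GAATTC and cuts after the first G; the choice of enzyme, the function name and the example sequence are assumptions for illustration. Only the top strand of a linear piece of DNA is modelled.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut the top strand of a linear DNA at every occurrence of a restriction site.

    The defaults describe EcoRI (G^AATTC), used here as an example enzyme.
    cut_offset is the number of bases of the site that stay on the left fragment.
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


insert_source = "CCGGAATTCATGAAAGGGTGAGAATTCTTAA"
print(digest(insert_source))
# ['CCGG', 'AATTCATGAAAGGGTGAG', 'AATTCTTAA']
```

Every fragment to the right of a cut starts with AATT, the single-stranded overhang (sticky end) that can anneal to a vector cut with the same enzyme.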
Step 2. Transformation of competent E. coli cells
After your DNA fragment is ligated into the plasmid you need to introduce the recombinant plasmid into a
bacterium to multiply it. To introduce DNA molecules into bacteria we make use of so-called competent
cells, mostly competent E. coli cells.
There are several different ways to make bacteria "competent", so that they can take up a plasmid. In this
practical course we will treat E. coli bacteria with calcium chloride (CaCl2) to make them competent.
Treatment of actively growing E. coli cells with a solution of CaCl2 changes the cell wall in such a way that
cells become capable of absorbing DNA
molecules that are sticking to the outside
of the cell. The bacteria are given a
heat shock (90 seconds at 42 °C) to
trigger the uptake of the adhering DNA
molecules. Therefore, these
chemically competent cells are also called
"heat-shock cells". A different, much-used
method to introduce DNA molecules into
bacteria is electroporation, where a
short high-voltage (5 ms, 2500 V)
electroshock creates pores in the cell
walls of the bacteria and triggers the
uptake of plasmid DNA in solution. After
the heat shock or electroshock the
bacteria are incubated in a rich growth
medium to let them recover from their shock.
Introduction of a recombinant plasmid (or
ligation mixture) into bacteria is
called transformation. Note that one
bacterium will in general take up only one
plasmid molecule. If you apply a mixture
of plasmids, different bacteria will take up
different plasmids.
After a bacterium has taken up a plasmid,
the plasmid will replicate independently from
the chromosomal DNA, and as a result
many identical copies of that plasmid will
be present in the bacterium. As the
bacterium also divides, you get more and
more bacteria, called clones, containing
multiple copies of a specific recombinant plasmid.
Step 3. Selection of transformants
To make sure that the plasmid is maintained inside the bacteria, you need to apply a selection
pressure. For this you use an antibiotic resistance gene that is located on the plasmid. The antibiotic
differs between plasmids. By growing the bacteria on a growth medium to which the antibiotic is added,
only those bacteria that contain the (recombinant) plasmid will grow. These bacteria are
called transformants, and are now genetically modified organisms (GMOs). After 12-16 hours of growth you get
colonies of transformants on the plate. Each colony consists of a collection of identical bacteria, clones,
that all contain the same plasmid.
To check whether the plasmid in a bacterial colony contains an insert, an additional reporter system is present in the
plasmid. This is lacZ.
lacZ is a bacterial gene that encodes a β-galactosidase. This enzyme can convert the chemical
substance X-gal into a blue precipitate. When bacteria that contain the pUC18 plasmid are grown on a
plate containing ampicillin and the chemical X-gal, this will result in blue colonies. The polylinker, where a
foreign DNA fragment is inserted, is located inside the open reading frame of the lacZ gene. So when a
DNA fragment is ligated into the polylinker region, it will disrupt the reading frame of lacZ and as a result
no functional protein can be made. So when bacteria containing a recombinant plasmid are grown on a
medium with ampicillin, IPTG and X-gal, white colonies will appear. Therefore, this method is called blue-white screening.
Calculating the efficiency of competent cells.
For a cloning experiment to work you need good competent cells. If your competent cells are of bad
quality the efficiency of taking up recombinant plasmids from a ligation mixture will be too low,
resulting in no transformants (ie. no colonies).
The quality of competent cells is expressed in colony forming units (cfu) per microgram, i.e. the number of colonies
that you get when you transform the cells with 1 microgram of (undigested) plasmid DNA. For
example, when the competent cells have an efficiency of 10^8 cfu/µg, it means that if you use 1 µg of
(pUC18) plasmid you should get 10^8 (100 million) colonies. Because you cannot count 100 million
colonies on a plate, usually several plasmid dilutions are used to calculate the competence of the
cells. Competent cells with an efficiency of 10^7 cfu/µg (preferably higher) are suited for cloning.
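The efficiency can be calculated from a plate that was transformed with a diluted plasmid. The sketch below assumes that all transformed cells were plated; the function name and the example numbers are made up.

```python
def transformation_efficiency(colonies, ng_dna_plated):
    """Colony forming units per microgram of plasmid DNA."""
    # Multiply by 1000 to convert from "per nanogram" to "per microgram".
    return colonies * 1000 / ng_dna_plated


# Hypothetical plate: 150 colonies after transformation with 0.01 ng of pUC18.
efficiency = transformation_efficiency(150, 0.01)
print(f"{efficiency:.1e} cfu per microgram")  # 1.5e+07 cfu per microgram
```

With 1.5 x 10^7 cfu per microgram these hypothetical cells would just be good enough for cloning.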
Analyzing plasmids on gel
Plasmids are circular DNA molecules! Therefore, the speed at which they migrate in an agarose gel
depends on the three-dimensional structure of the circular DNA, and as a result a plasmid will typically not
run at the same speed as a linear (digested) DNA fragment. The following conformations can be
seen when analyzing an undigested plasmid on an agarose gel.
Most of the isolated plasmids will be twisted, forming a so-called supercoiled structure. Such a
supercoiled plasmid is much more compact, and as a result it will run much faster in an agarose
gel than expected based on the size of the plasmid.
The plasmid can also be in its open-circular form. In this case the plasmid is fully circular (relaxed),
which is often the result of a nick in one of the strands. When this is the case we speak of the
nicked open-circular form. Such an open-circular form runs at a slower speed than expected based
on the size of the plasmid.
These two forms will be the most apparent bands when analyzing an undigested plasmid on a gel. In case
the plasmid is cut in both strands, for example due to damage, you can sometimes also see the linear
form of the plasmid. The linear form will run faster than the open-circular form, but slower than the
supercoiled form.
Part 2. Applications
Genome library
A genomic DNA library is a collection of recombinant plasmids, each containing a different piece of
genomic DNA. To make one, the genomic DNA is digested and sub-cloned into plasmids. So the aim of a genomic
library is to have each part of the chromosome represented in a plasmid.
To reduce the number of plasmids to a more workable number, a much-used vector is a so-called BAC (Bacterial Artificial Chromosome) vector. In such a BAC vector you can easily clone
fragments up to 300 kb. So for the human genome (10x coverage) this would mean 300,000
To find your clone of interest, containing for example a gene that you want to study further,
you can use either PCR or Southern blotting. In case of a Southern blot, we speak of
a colony blot, where all the colonies in the library are spotted onto membranes. A single-stranded
labelled probe of the gene that you want to detect can then be used to identify the colony/plasmid
of interest.
cDNA library
A cDNA library is made to sub-clone every mRNA present in an organism or in a specific tissue or cell type of an organism. As you know, you cannot clone RNA directly into a plasmid; you first need to
convert the RNA into DNA. This is done by an enzyme called reverse transcriptase.
To be able to clone the cDNAs, they need to contain unique restriction sites at their ends. These are
generated by ligating so-called adapters (or linkers), small pieces of DNA containing the restriction
site, to the cDNA fragments (see image above). Now you can digest and ligate the cDNAs into a
vector and create a collection of clones representing every cDNA/mRNA of an organism.
One of the most used fluorescent proteins is the Green Fluorescent Protein (GFP). To make a GFP-fusion protein, the open reading frame (ORF) of GFP can be fused to the ORF of your gene of interest by
means of cloning. Depending on the protein, GFP can be attached at the N-terminus of the protein
(N-terminal GFP fusion) or at the C-terminus of the protein (C-terminal GFP fusion).
Reporter construct
A reporter gene (often simply reporter) is a gene that is expressed under the control of the
regulatory sequences (the promoter) of a gene, to study if, where and when that gene is expressed. A
reporter has the characteristic that it is easy to visualize (with high sensitivity) or can be used as a
selectable marker. Examples of commonly used reporters in plants are GFP, the beta-glucuronidase enzyme GUS (converting X-gluc into a blue precipitate inside cells) and the enzyme
luciferase (a light-emitting enzyme, for example from fireflies).
When a promoter is cloned in front of the open reading frame of a reporter gene, it often includes
the 5'UTR region of the gene to which that promoter belongs. This is because in many cases the
exact transcription start site is not (yet) known.
Expression vector
Using GMOs as bioreactors for protein synthesis
Two well-known examples are the production of insulin and the production of chymosin (to make
cheese).
Insulin is a peptide hormone, produced by beta cells of the pancreas, and is central to regulating
carbohydrate and fat metabolism in the body. It causes cells in the liver, skeletal muscles, and fat
tissue to absorb glucose from the blood. Before recombinant DNA technology insulin was isolated
from the pancreas of cows, horses or pigs.
Therefore, the part of the human gene that codes for insulin was cloned into an expression
vector and expressed in an E. coli host. As a result the bacteria produced synthetic insulin, which
closely resembled the human insulin.
Chymosin is an important enzyme for making cheese. It cleaves the casein proteins in milk, after which
these start to coagulate and precipitate. The enzyme is collected from the stomachs of young cows
(calves). However, in many countries there is a shortage of calf-stomach extracts, and because many
vegetarians do not approve of the use of animal extracts, alternative sources for this enzyme were needed.
To express a protein in bacteria, the open reading frame of a gene needs to be cloned into an
expression vector, where it comes under the control of a bacterial promoter. Below you see an
example of an expression vector for E. coli. In this case the lacZ regulatory region is used to control
the expression of the protein. Therefore, the protein-coding sequence of lacZ is replaced by the
open reading frame of the protein of interest. Bacteria containing this recombinant plasmid can be
specifically induced to produce the protein by adding the lactose analog IPTG (the same as used in blue-white screening).