Chap 5 Manipulation of Gene Expression in Procaryotes
I. Introduction
A major objective of gene cloning is to express the cloned gene, either to study its
biological function or to produce a recombinant protein (e.g. insulin). However, cloning a gene
does not guarantee that it will be expressed successfully.
Factors that influence gene expression
The nature of the transcriptional promoter and terminator sequences
The strength of the ribosome binding site
Efficiency of translation (mRNA stability, mRNA secondary structure, ...)
Number of copies of the cloned gene (or number of plasmids), and whether the gene is plasmid-borne or
integrated into the chromosome
Nature and cellular location of the expressed protein (intra- or extracellular? secreted?)
Post-translational processing: glycosylation, proteolytic processing, ...
The intrinsic stability of the protein (misfolding of the protein? susceptible to proteases?)
A large fraction of proteins (varying from 30% to 70% of all proteins made) is
degraded immediately after synthesis, before forming functional proteins [8]. These so-called
DRiPs (defective ribosomal products) are the result of
defective transcription or translation, alternative reading frame usage,
failed assembly into larger protein complexes, the incorporation of
wrong amino acids owing to mistakes by aminoacyl-tRNA
synthetases, or altered ubiquitin modifications. DRiPs are
degraded immediately to prevent the formation of protein aggregates,
which would affect cell viability.
Ubiquitination is a post-translational modification in which ubiquitin,
a 76–amino acid protein, is covalently added to lysine residues. In
humans, the ubiquitination reaction is catalyzed by >500 E3 ligases, each of which transfers ubiquitin
to specific protein targets. There are several types of ubiquitin modification, and these may have different effects on target
proteins. The best known is the polyubiquitin chain, which targets proteins for proteasomal degradation. The polyubiquitin
chain begins with a ubiquitin conjugated at its C terminus to a lysine residue in a target protein.
26S proteasome: a giant multicatalytic protease that resides in the cytosol and the
nucleus. The 20S core, which contains three distinct catalytic subunits, can be capped at either end by a 19S cap or an
11S cap. The binding of two 19S caps to the 20S core forms the 26S proteasome, which degrades
polyubiquitylated proteins into peptides [1].
Some of the above factors can be improved by proper design (e.g. selecting a strong promoter,
using multiple gene copies, ...).
The choice of expression system is very important: a poor choice may lead to mislocated proteins, poor translational processing,
poor transcription or protease-mediated degradation.
Major expression systems are classified into procaryotic and eucaryotic.
Procaryotic (e.g. E. coli):
Very well studied and commonly used in protein production.
Grows fast (doubling time ≈ 20 min) and grows easily → easy fermenter operation.
Normally high yield (high cell density).
Minimal media (simple composition, e.g. Na+, K+, Mg2+, Ca2+, NH4+, Cl-, SO4 2-,
glucose or other carbon sources) → cheap.
Often fails to perform suitable post-translational modifications.
Inclusion bodies (insoluble protein aggregates) formed upon overexpression make purification and
recovery of the protein conformation (protein renaturation) more difficult.
Eucaryotic: next chapter
II. Strong and regulatable promoters
Why strong promoters?
Some endogenously expressed proteins (e.g. viral proteins or tumor proteins) are degraded to short peptides and routed to
MHC class I in ER, where the formed complex leaves ER to the surface for presentation.
A strong promoter has a higher affinity for RNA pol, so the downstream gene is transcribed highly (frequently).
Why regulatable promoters?
Continuous overexpression of a cloned gene is often detrimental to the host cell
because it drains energy and other resources and impairs cellular functions.
→ Genes are placed under the control of strong and regulatable promoters.
→ Genes are expressed only when “induced”.
E. coli lac promoter:
(a) Regulated by IPTG
Cells are grown in the absence of lactose, so the lac repressor binds to the operator → the genes
cannot be transcribed. Gene expression starts only when IPTG is added.
[Figure: lac repressor, operator and genes 1, 2, 3, ...]
Induction (turn on) by IPTG:
IPTG prevents the lac repressor from binding to the operator
→ transcription occurs
Very commonly used
(b) Regulated by CAP (catabolite activator protein)
Binding of cAMP to CAP further enhances
the affinity of the promoter for RNA pol.
*The level of cyclic AMP is highest when the glucose level is low.
[Figure: CAP, RNA pol and genes 1, 2, 3, ...]
Combining the above: induce protein expression at high [IPTG] (or lactose) and
low [glucose] (high [cAMP]) → highest transcription.
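The combined lac control described above can be summarised as a small truth table. The following is a minimal qualitative sketch, not a kinetic model: the function name and the three labels ("off", "low", "high") are illustrative assumptions that do not come from the notes.

```python
def lac_promoter_activity(inducer_present: bool, glucose_high: bool) -> str:
    """Qualitative transcription level from the lac promoter.

    inducer_present: IPTG (or lactose) is in the medium, so the lac
                     repressor cannot bind the operator.
    glucose_high:    glucose is abundant, so cAMP is low and CAP does
                     not help RNA pol bind the promoter.
    The labels 'off', 'low' and 'high' are illustrative only.
    """
    if not inducer_present:
        return "off"   # repressor occupies the operator
    if glucose_high:
        return "low"   # derepressed, but no CAP-cAMP activation
    return "high"      # derepressed and activated by CAP-cAMP


# Print the full truth table.
for inducer in (False, True):
    for glucose in (True, False):
        level = lac_promoter_activity(inducer, glucose)
        print(f"inducer={inducer!s:5}  glucose_high={glucose!s:5}  ->  {level}")
```

Only the combination of inducer present and low glucose gives the "high" state, which matches the induction condition stated above.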
Trp promoter (regulates the transcription of the genes responsible for Trp synthesis):
off (negatively regulated): the tryptophan-trp repressor complex binds to the trp
operator → transcription is shut down
on (derepressed): removal of tryptophan releases the repressor from the operator
Bacteriophage T7 promoter:
[Figure: lac promoter → T7 RNA pol gene → T7 RNA pol; T7 promoter → target gene]
The T7 promoter is very strong, but it requires T7 RNA pol for transcription.
The two recombinant genes (the T7 RNA pol gene and the target gene) can be co-introduced into the cells for expression.
Alternatively, the gene encoding T7 RNA pol can be integrated into the
chromosomal DNA to form a stable strain.
pL promoter (from bacteriophage λ):
Controlled by the cI repressor protein
Cells carrying a temperature-sensitive cI repressor are grown at 28°C (the cI repressor is
expressed from its own promoter, pCI, at 28°C) → the cI repressor prevents
transcription → when the cell density (CD) is high enough, the temperature is raised to 42°C → the thermosensitive cI
repressor is inactivated → transcription is on.
The effectiveness of deactivating a repressor depends on the ratio
(number of repressor molecules) / (number of copies of the promoter sequence)
ratio too large → difficult to induce
ratio too small → transcription is “leaky” (transcription occurs in the absence of inducer)
Put the repressor gene on one plasmid with a low copy number (e.g. 1-8 copies/cell)
Put the promoter-target gene on another plasmid with a high copy number (e.g. 30-300 copies/cell)
→ the ratio is kept in a range where the promoter can be effectively repressed and induced.
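The repressor-to-promoter ratio can be estimated with simple arithmetic. The sketch below is only illustrative: the function name and the number of repressor molecules made per gene copy are hypothetical assumptions, and the notes give no numerical threshold for "too large" or "too small".

```python
def repressor_to_promoter_ratio(repressor_plasmid_copies: int,
                                promoter_plasmid_copies: int,
                                repressors_per_gene_copy: float) -> float:
    """Approximate number of repressor molecules per promoter copy.

    repressors_per_gene_copy is a hypothetical parameter: how many
    repressor molecules one copy of the repressor gene maintains.
    """
    if promoter_plasmid_copies <= 0:
        raise ValueError("promoter plasmid copy number must be positive")
    total_repressor = repressor_plasmid_copies * repressors_per_gene_copy
    return total_repressor / promoter_plasmid_copies


# Hypothetical example: 10 repressor molecules per repressor gene copy.
low_copy_repressor = repressor_to_promoter_ratio(5, 100, 10.0)     # repressor on a low-copy plasmid
same_copy_repressor = repressor_to_promoter_ratio(100, 100, 10.0)  # repressor on the same high-copy plasmid
print(low_copy_repressor, same_copy_repressor)
```

Changing the copy number of either plasmid shifts the ratio, which is why the two genes are placed on plasmids with different copy numbers.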
III. Expression vectors
Regulatable, strong promoters do not by themselves guarantee a high yield of gene product. Efficiency
of translation, stability of the protein, etc. are also factors [2]. Expression vectors are similar to
cloning vectors but contain additional elements that confer efficient expression.
e.g. The expression plasmid pKK233-2 contains:
tac promoter (a hybrid that includes the -10 region of the lac
promoter and the -35 region of the trp promoter; it can be induced
by IPTG and is 3X and 10X stronger than the trp and lac promoters, respectively)
RBS, ori (RBS: a sequence of 6-8 nt (e.g. UAAGGAGG) in the mRNA that can base
pair with rRNA on the ribosome; generally, the stronger the binding of mRNA to rRNA,
the more efficient the translation initiation)
Not all mRNAs are translated with the same efficiency; differential translation and transcriptional regulation enable cells to
adapt to different stresses (environmental, heat shock, oxygen, ...)
An ATG start codon about 8 nt downstream from the RBS (optional)
Multiple cloning site
Ampr gene as a selectable marker
The RNA sequence from the RBS to the first few codons of the cloned gene must not
form intrastrand loops, which hamper binding to the ribosome
The DNA sequence is written as the coding strand, so ATG (AUG in the mRNA) is often seen as the start codon
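The spacing rule between the RBS and the start codon can be checked on a sequence. The sketch below assumes the example RBS from these notes (UAAGGAGG) and an arbitrary search window; the function name and the `max_gap` default are illustrative assumptions.

```python
def find_rbs_and_start(sequence: str, rbs: str = "UAAGGAGG", max_gap: int = 12):
    """Locate an RBS followed by a start codon.

    Accepts either mRNA (U) or coding-strand DNA (T). Returns a tuple
    (rbs_index, start_index, spacing), where spacing is the number of
    nucleotides between the end of the RBS and the A of AUG, or None
    if no RBS with an AUG within max_gap nt is found.
    """
    mrna = sequence.upper().replace("T", "U")
    pos = mrna.find(rbs)
    while pos != -1:
        rbs_end = pos + len(rbs)
        # Window long enough to contain an AUG that starts max_gap nt downstream.
        window = mrna[rbs_end:rbs_end + max_gap + 3]
        gap = window.find("AUG")
        if gap != -1:
            return pos, rbs_end + gap, gap
        pos = mrna.find(rbs, pos + 1)
    return None


# Hypothetical coding-strand sequence: RBS, 8-nt spacer, then ATG.
example = "GGG" + "TAAGGAGG" + "AAATTTCC" + "ATG" + "GCT"
print(find_rbs_and_start(example))
```

For this hypothetical sequence the reported spacing is 8 nt, the distance quoted above for the optional ATG.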
Fusion Proteins
Problem: the yield of foreign proteins is normally low for various reasons (e.g. degradation by host proteases).
Solution: covalently attach the cloned gene product to a stable (host) protein to form a
fusion protein, which protects the desired recombinant protein.
Construct the fusion at the DNA level
The transcribed RNA must have the correct base sequence (a stop codon in the middle must
be eliminated)
The reading frame must be correct and the base sequence of the linker must be precise,
otherwise the ORF will be wrong (the precise sequences of the two proteins must be known)
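The two requirements above (no stop codon in the middle, correct reading frame through the linker) can be checked mechanically before a construct is built. This is a minimal sketch under stated assumptions: the function name is hypothetical, all sequences are made-up coding-strand DNA, and the fusion is assumed to be read from its first base.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_fusion_orf(partner: str, linker: str, target: str) -> list:
    """Return a list of problems found in a partner-linker-target fusion.

    All three arguments are coding-strand DNA. An empty list means the
    fusion is one uninterrupted ORF that ends with a stop codon.
    """
    fusion = (partner + linker + target).upper()
    problems = []
    if not fusion.startswith("ATG"):
        problems.append("fusion does not start with ATG")
    if (len(partner) + len(linker)) % 3 != 0:
        problems.append("target gene is out of frame with the partner")
    if len(fusion) % 3 != 0:
        problems.append("fusion length is not a multiple of 3")
    # Split into complete codons, reading from the first base.
    usable = len(fusion) - len(fusion) % 3
    codons = [fusion[i:i + 3] for i in range(0, usable, 3)]
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {number}")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("fusion does not end with a stop codon")
    return problems


# Hypothetical sequences; the linker encodes Ile-Glu-Gly-Arg.
print(check_fusion_orf("ATGGCTAAA", "ATCGAAGGTCGT", "GCTCCGTAA"))  # no problems
print(check_fusion_orf("ATGGCTTAA", "ATCGAAGGTCGT", "GCTCCGTAA"))  # partner kept its stop codon
```

The second call shows the case mentioned in the notes: the stop codon of the first gene was not removed, so translation would end before the linker.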
Cleavage of fusion proteins
The fusion may not be suitable as the final product because:
The biological function might be lost
Stringent regulation by government agencies (e.g. FDA)
The EK cleavage site enables cleavage of the fusion by enterokinase at the specified site.
Another linker often used is the Xa linker (Ile-Glu-Gly-Arg), which is recognized by
a blood coagulation factor (Xa) and cleaved specifically at its C-terminus → the
desired protein should therefore be in the second segment.
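Because factor Xa cuts on the C-terminal side of Ile-Glu-Gly-Arg, the cleavage can be simulated as a string split. The sketch below is illustrative: the function name and the example protein sequences are hypothetical, written in one-letter amino acid codes.

```python
def cleave_at_factor_xa(fusion: str, site: str = "IEGR"):
    """Split a fusion protein after the first factor Xa site.

    Returns (n_terminal_part, c_terminal_part). The linker stays on the
    N-terminal part, so the C-terminal part carries no extra residues.
    Returns None if the site is absent.
    """
    index = fusion.find(site)
    if index == -1:
        return None
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


# Hypothetical carrier (MKTAY) + Xa linker (IEGR) + hypothetical target (APTSSS).
carrier_part, target_part = cleave_at_factor_xa("MKTAY" + "IEGR" + "APTSSS")
print(carrier_part)  # carrier plus linker
print(target_part)   # released target protein
```

The split shows why the desired protein is placed in the second segment: it is released without any linker residues attached.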
Applications of fusion proteins (many applications; only one example is given)
e.g. simplifying purification
Dual function of the fusion partner:
reduces degradation and enables cleavage
Flag (a peptide recognized by EK)
IL2: a cytokine that stimulates
both T-cell growth and B-cell Ab production
The Flag peptide enables the product to be purified by immunoaffinity chromatography, in which a MAb
directed against Flag is immobilized on a polypropylene support and used as the
ligand that binds the fusion.
Unidirectional tandem gene arrays
Generally, gene expression increases as the plasmid copy number increases, but other plasmid-encoded proteins (e.g. antibiotic resistance) are over-expressed too
→ wasting metabolic energy and constraining cellular activities
→ increasing the copy number is not always effective
→ tandem gene arrays (cloning multiple copies of the gene into a low-copy-number plasmid)
Problem: simple end-to-end ligation of DNA results in a random orientation of the genes in the array.
The gene, together with its translational start and stop signals, is cloned into the EcoRI site.
The gene is cut out with AvaI → the DNA fragments have non-identical sticky ends → ligation
results in a unidirectional array.
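Whether ligation is directional depends on whether the sticky end is self-complementary. The sketch below assumes the AvaI recognition sequence C^YCGRG (Y = C or T, R = A or G), which is not given in the notes; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def ava1_overhang(site: str) -> str:
    """Return the 4-nt 5' overhang left when AvaI cuts the given 6-bp site.

    AvaI is assumed to recognise C^YCGRG and to cut after the first C,
    so the overhang is the middle four bases, YCGR.
    """
    site = site.upper()
    is_ava1_site = (
        len(site) == 6
        and site[0] == "C"
        and site[1] in "CT"
        and site[2:4] == "CG"
        and site[4] in "AG"
        and site[5] == "G"
    )
    if not is_ava1_site:
        raise ValueError(f"{site} is not an AvaI site (CYCGRG)")
    return site[1:5]


def ligates_in_either_orientation(overhang: str) -> bool:
    """True if the overhang is self-complementary (palindromic)."""
    return overhang.upper() == reverse_complement(overhang)


for site in ("CTCGGG", "CCCGGG"):
    overhang = ava1_overhang(site)
    print(site, overhang, ligates_in_either_orientation(overhang))
```

An asymmetric site such as CTCGGG gives an overhang (TCGG) that can only pair with the opposite end of another fragment, so the copies join head to tail. A symmetric site such as CCCGGG gives a palindromic overhang and would not enforce one orientation.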
V. Golden Gate Shuffling: A One-Pot DNA shuffling Method
Limitations of the traditional cloning methods
1. Time-consuming
2. Inefficient
3. Require unique restriction sites
→ these become limiting for large recombinant DNA molecules.
Golden Gate Shuffling is a protocol to assemble separate DNA fragments together into
an acceptor vector in one step and one tube.
The principle of the cloning strategy is based on the ability of type IIs restriction
enzymes to cut outside of their recognition site.
1. Two DNA ends are terminated by the same 4 nucleotides (sequence f, composed of
nucleotides 1234; complementary nucleotides are noted in italics).
2. Sequence f is flanked by a BsaI (a type IIs restriction enzyme) recognition sequence, B.
3. The type IIs restriction enzyme removes the enzyme recognition sites and generates
ends with complementary 4 nt overhangs.
4. These ends can be ligated seamlessly, creating a junction that lacks the original site.
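The cut-outside-the-site behaviour can be illustrated on a short sequence. The sketch below assumes the BsaI recognition sequence GGTCTC with cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, which the notes do not spell out. It handles only a site in the forward orientation, and the function name and example sequence are hypothetical.

```python
BSAI_SITE = "GGTCTC"  # assumed recognition sequence; cut is GGTCTC(N1)/(N5)


def bsai_overhang(top_strand: str):
    """Return the 4-nt overhang produced by a forward-oriented BsaI site.

    The top strand is cut 1 nt after the site and the bottom strand
    5 nt after it, so the overhang is the 4 nt that follow the single
    spacer base. Returns None if there is no site or the sequence is
    too short.
    """
    seq = top_strand.upper()
    index = seq.find(BSAI_SITE)
    if index == -1:
        return None
    start = index + len(BSAI_SITE) + 1  # skip the site and one spacer nt
    overhang = seq[start:start + 4]
    return overhang if len(overhang) == 4 else None


# Hypothetical fragment end: 2 nt, BsaI site, 1 spacer nt, overhang AATG, insert.
print(bsai_overhang("TT" + "GGTCTC" + "A" + "AATG" + "CCCC"))
```

Because the overhang lies downstream of the recognition sequence, the site stays on the discarded piece, which is why the ligated junction lacks it.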
One-pot, one-step assembly of a GFP construct from 10 constructs.
1. A DNA shuffling protocol first selects a number of 4-nucleotide
‘recombination sites’ on a nucleotide sequence alignment of several homologous genes.
2. The selection of these recombination sites defines modules that consist of a core
sequence (C1-C9) flanked by two 4 nt sequences.
3. These modules can be amplified by PCR with primers designed to add flanking BsaI
sites on each side of the modules (the BsaI cleavage sites overlap perfectly with
the recombination sites).
4. The recipient expression vector, pX-LacZ, contains two BsaI sites compatible with the
first (C1) and last (C9) modules.
5. Incubation for 2 minutes at 37°C and 5 minutes at 16°C, both steps repeated 50 times.
6. Incubation for 5 minutes at 50°C (final digestion).
7. Incubation for 5 minutes at 80°C (heat inactivation).
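The ordering of modules in the one-pot reaction is dictated only by the 4 nt overhangs, which can be mimicked by chaining strings. The sketch below is a simplification under stated assumptions: it uses one variant per module position (in a real shuffling reaction several homologous variants share each pair of overhangs and are joined at random), and the function name and sequences are hypothetical.

```python
def assemble_modules(modules, vector_left: str, vector_right: str) -> str:
    """Join modules in the order dictated by their 4-nt overhangs.

    modules: iterable of (left_overhang, core, right_overhang) tuples,
             in any order, top-strand sequence only.
    vector_left / vector_right: overhangs exposed by the acceptor vector.
    Returns the assembled insert, each junction overhang written once.
    """
    remaining = {}
    for left, core, right in modules:
        if left in remaining:
            raise ValueError(f"two modules share the left overhang {left}")
        remaining[left] = (core, right)

    product = ""
    current = vector_left
    while current in remaining:
        core, right = remaining.pop(current)  # each module is used once
        product += current + core
        current = right

    if remaining or current != vector_right:
        raise ValueError("modules do not form one chain between the vector overhangs")
    return product + current


# Two hypothetical modules given out of order; cores are lower case for readability.
shuffled = [("GCTT", "ccc", "TTCG"), ("AATG", "aaa", "GCTT")]
print(assemble_modules(shuffled, vector_left="AATG", vector_right="TTCG"))
```

The modules are supplied out of order, yet only one product is possible, because each overhang matches exactly one neighbour.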
Increasing protein stability
Normally, the half-lives of proteins range from a few minutes to hours (some exceptions exist, e.g.
collagen has a half-life of years),
because cells contain proteases that degrade unnecessary and
abnormal proteins. This enables the cells to recycle resources, a housekeeping
function that maintains cell viability.
Normally, proteins with more disulfide bonds (S-S between Cys) and with certain amino acids
at the N-terminus are more stable → more protein accumulates and the yield increases.
Change the a.a. at the N-terminus
Increase the number of S-S bonds
Co-express chaperone proteins (e.g. groEL, dnaJ, dnaK, ...) to aid protein folding
Ex: stability of β-galactosidase with certain a.a. added to the N-terminus

a.a. added            Half-life
Met, Ser, Ala         > 20 h
Thr, Val, Gly         > 20 h
Ile, Glu              > 30 min
(residue not given)   ~ 2 min
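The practical meaning of these half-lives becomes clearer when they are converted into the fraction of protein that survives a given time. The sketch below assumes simple first-order (exponential) decay, which the notes do not state; the function name is illustrative.

```python
def fraction_remaining(half_life_min: float, elapsed_min: float) -> float:
    """Fraction of a protein pool left after elapsed_min minutes.

    Assumes first-order decay with no new synthesis:
    fraction = 0.5 ** (elapsed / half_life).
    """
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_min / half_life_min)


# Compare a 2 min half-life with a 20 h (1200 min) half-life after 10 min.
for label, half_life in (("2 min", 2.0), ("20 h", 1200.0)):
    print(f"half-life {label}: {fraction_remaining(half_life, 10.0):.3f} remains after 10 min")
```

Under this assumption, a protein with a 2 min half-life is almost gone after 10 min, while one with a 20 h half-life is essentially untouched.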
Overcoming O2 Limitation
Oxygen is generally required for cell growth, to support respiration and to maintain cellular
functions and protein expression, but the solubility of oxygen is low. If the cell density (CD) is high, even
supplying more air or oxygen or increasing the stirring speed may not be enough. When
O2 is depleted, the cells enter stationary phase and eventually die.
If engineering approaches fail, what can we do? → bacterial hemoglobin
Solution: the bacterium Vitreoscilla lives in stagnant ponds (oxygen-deficient). To obtain
oxygen for growth and metabolism, the bacterium expresses a hemoglobin-like protein that
captures oxygen from the environment and transports it into the cell.
When this gene is cloned and expressed in E. coli, the recombinant E. coli shows higher
metabolic activity and higher protein production at low levels of O2.
VII. DNA Integration into the Host Chromosome
Why integrate DNA into the chromosome?
Plasmid-borne expression drains cellular energy because the antibiotic-resistance and other plasmid genes are
expressed; plasmid replication requires resources and energy too.
Plasmid instability: plasmid-free cells outgrow plasmid-bearing cells, so after several passages the percentage of cells
bearing plasmids drops and protein production drops as well. If the foreign DNA is on the chromosome, such loss is much less likely.
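The takeover of a culture by plasmid-free cells can be illustrated with a toy generation-by-generation model. Everything numerical here is a hypothetical assumption (the loss probability per division and the growth advantage of plasmid-free cells), and the function name is illustrative. The sketch only shows the direction of the effect.

```python
def plasmid_bearing_fraction(generations: int,
                             loss_per_division: float = 0.01,
                             growth_advantage: float = 0.1) -> float:
    """Fraction of plasmid-bearing cells after a number of generations.

    Toy model with hypothetical parameters:
    - plasmid-bearing cells double each generation, and a fraction
      loss_per_division of their daughters lose the plasmid;
    - plasmid-free cells multiply by 2 * (1 + growth_advantage).
    The culture starts with plasmid-bearing cells only.
    """
    bearing, free = 1.0, 0.0
    for _ in range(generations):
        new_bearing = 2.0 * bearing * (1.0 - loss_per_division)
        new_free = 2.0 * (1.0 + growth_advantage) * free + 2.0 * bearing * loss_per_division
        total = new_bearing + new_free
        bearing, free = new_bearing / total, new_free / total  # renormalise to fractions
    return bearing


for n in (0, 10, 25, 50):
    print(f"after {n:2d} generations: {plasmid_bearing_fraction(n):.2f} of cells carry the plasmid")
```

With any non-zero loss rate the plasmid-bearing fraction falls every generation, which mirrors the drop in protein production described above.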
Choose a suitable integration site.
Clone part of the chromosomal DNA sequence at the integration site into the vector (e.g. plasmid). The
chromosomal DNA sequence on the vector and that at the integration site must be similar in sequence over at least
50 bp, so that homologous recombination can occur.
Clone the target gene (and promoter) into the plasmid (vector) so that it is flanked by the
chromosomal DNA sequence.
Transfer the plasmid into a host cell (the vector does not replicate within the host cell).
Select the host cells that have the target gene integrated into the chromosome. More
details are discussed in the eucaryotic cell chapter.
Increasing Secretion
Secretion is important for many human proteins (e.g. peptide hormones such as insulin, growth factors and many other blood
proteins are secreted).
In industry, it is often desirable that the protein be secreted because:
The stability of a cloned gene product in E. coli depends on its cellular location;
secreted proteins tend to be more stable.
For example, recombinant proinsulin is 10X more stable if exported into the periplasm (the space between the
inner and the outer membrane).
Secreted proteins may give a higher purification recovery yield because they are free from
thousands of cellular proteins.
Drawback: the recombinant protein concentration in the medium is low.
Secreted proteins have a signal peptide at the N-terminus, which facilitates
protein transport through the secretory pathway. When the protein crosses the
membrane, the signal peptide is cleaved by a peptidase to give the mature protein.
E. coli and other Gram-negative bacteria have an outer membrane, which prevents proteins
from being secreted into the medium.
Use Gram-positive bacteria or eucaryotes, which do not have an outer membrane (but Gram-negative bacteria
such as E. coli are usually an excellent first choice).
Fuse a signal peptide to the protein, or engineer a fusion protein with a signal peptide at the N-terminus.
Lower the expression level, because over-expression can sometimes overload the
secretion machinery and thereby reduce secretion.
Co-express a limiting factor of the secretion pathway. For instance, cloning the prlA4 and
secE genes (which encode the major components of the molecular apparatus that moves
proteins across the membrane) into E. coli → the percentage of secreted (and mature) protein
increases from 50% to 90%.
Clone the bacteriocin release protein: bacteriocin (in Gram-negative bacteria) is secreted
with the help of the bacteriocin release protein, which permeabilizes the inner and outer
membranes → co-express this protein with the target protein under the control of the
same promoter.
Reducing the Metabolic Load
Expression of a foreign gene often changes the metabolism and impairs normal cellular
function because of the increased metabolic load (burden), for the following reasons:
Competition for amino acids, tRNA and energy (ATP).
Dissolved oxygen (DO) is often insufficient for cell metabolism and plasmid maintenance.
Increasing the plasmid copy number requires increasing amounts of cellular
energy for plasmid replication and maintenance.
Foreign proteins may jam the export sites and impair proper localization of host proteins.
Foreign proteins may be toxic to the cells.
Plasmid instability: cells without plasmid outgrow cells with plasmid → loss of the
recombinant plasmid.
Energy-intensive processes such as nitrogen fixation and protein synthesis slow down.
Translational errors: because tRNA can become limiting, incorrect a.a. may be
incorporated (the chance can be 10 times higher when overexpression occurs).
Use a low-copy-number plasmid instead of a high-copy-number plasmid.
Integrate the foreign DNA into the chromosome.
Use a strong, regulatable promoter, so that the cell culture is divided into two phases:
Growth phase: cell growth without target protein expression
Production phase: when the CD is high, production is induced (e.g. by IPTG, heat, ...)
Express at a modest level (e.g. 5% of the total protein), but at high CD in fermentation.
High expression and high CD are not easy to achieve simultaneously. Expression at a high level (15%) but at low CD may not be
more efficient.
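The trade-off between expression level and cell density can be made concrete with a back-of-the-envelope yield estimate. In the sketch below the cell densities and the protein fraction of dry cell weight (0.5) are hypothetical values chosen only for illustration; only the 5% and 15% expression levels come from the notes.

```python
def volumetric_yield(expression_fraction: float,
                     cell_density_g_per_l: float,
                     protein_fraction_of_dcw: float = 0.5) -> float:
    """Recombinant protein per litre of culture (g/L).

    expression_fraction:     target protein as a fraction of total cell protein
    cell_density_g_per_l:    dry cell weight per litre (hypothetical values below)
    protein_fraction_of_dcw: assumed share of dry cell weight that is protein
    """
    return expression_fraction * cell_density_g_per_l * protein_fraction_of_dcw


# Hypothetical cultures: modest expression at high CD vs. high expression at low CD.
modest_high_cd = volumetric_yield(0.05, 50.0)
high_low_cd = volumetric_yield(0.15, 10.0)
print(f"5% of total protein at 50 g/L:  {modest_high_cd:.2f} g/L")
print(f"15% of total protein at 10 g/L: {high_low_cd:.2f} g/L")
```

With these assumed densities the modest-expression, high-CD culture gives the larger yield per litre, which is the point made above.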
X. Appendix
Inclusion body (IB)
Protein aggregates that usually lack biological function → separate the IB by
centrifugation or filtration (may facilitate purification) → IB denaturation, then renaturation
Denaturation can be performed with strong acid, strong base, high temperature and pressure, proteases, etc.
Usually chemical agents are used:
Urea: 8-10 M, disrupts H-bonds and hydrophobic interactions.
Guanidine hydrochloride (GuHCl): 6-8 M, disrupts hydrophobic and ionic interactions.
Dithiothreitol (DTT), β-mercaptoethanol: disrupt S-S bonds.
EDTA or EGTA: chelate metal ions to avoid unwanted chemical reactions.
Renaturation (refolding)
Dialysis: change the buffer and dilute the denaturant gradually.
Renaturation buffer: usually contains Tris-HCl (pH buffer), a low concentration of
denaturant (e.g. urea) to prevent aggregation, and an oxidizing agent to oxidize the -SH
groups for S-S bond formation.