High level expression of recombinant proteins in E. coli.

Overproduction of proteins in E. coli.
General ref: Current Protocols in Molecular Biology, Ausubel et al Eds.
Objectives
To understand the following issues with respect to production of foreign proteins in E. coli.
1. The need to provide an E. coli promoter and ribosomal binding site.
2. The need to keep expression turned off during growth and propagation of the clone.
3. Problems related to stability and purification.
4. Use of affinity purification systems.
5. Recombinant Phage display.
Reasons for over-expressing proteins.
1. To purify large amounts for study or for sale.
2. To purify from a more convenient heterologous organism.
3. To purify away from other components of the originating organism.
4. As a prelude to in vitro mutagenesis.
Overview
The amount of care necessary to successfully express a foreign protein in E. coli depends on how
much yield you need. If you're just trying to get enough to detect activity, then most any fusion to a valid
E. coli promoter will probably do. For many research purposes, expressing the protein as about a percent
of the bacterial protein is probably more than enough. If the gene comes with its own promoter, this may
be achievable by simply putting the gene on a multicopy vector. If the gene is without a promoter (a
cDNA for example), one can get this level of expression from fusing to any number of strong E. coli
promoters. At this level of expression, one is mainly concerned with avoiding problems caused by some
noxious property of the gene product (e.g. instability, refusal to fold, toxicity to the host, mRNA degradation
signals in the untranslated regions).
For some purposes, one needs high yields. Yields in excess of 40% of total E. coli protein can be
obtained. To reach these yields, one should expect to optimize every step in the expression pathway.
Getting high level transcription is usually not too hard. One may have to supply optimal translational start
signals, to supply a transcriptional terminator, to remove some non-optimal codons, to remove or replace
untranslated regions, and to be prepared to recover large amounts of insoluble protein.
When high level expression is coupled to in vitro mutagenesis, one should expect additional
problems with the mutants. Mutant proteins are generally less stable, and therefore more susceptible to
degradation and insolubility. Multiple mutations cause progressively more trouble.
General methods of boosting expression.
1. Increase copy number of the gene.
2. Fuse to more powerful transcription and/or translation signals.
(eg. lac, lambda PL, Trp, TAC, beta-lactamase.)
Other problems and potential solutions:
1. Codon preferences - resynthesize gene or segments thereof with favored codons, particularly codon #2,
or replace runs of adjacent unfavorable codons.
2. Degradation of protein - lon- host; fusion to an E. coli protein.
3. Degradation of mRNA - alteration of gene sequences; use of RNAse- host.
4. Insolubility of the expressed protein -
a. Solubilize by denaturation and renaturation.
b. Solubilize under nondenaturing conditions.
c. Try growth at reduced temperature
d. Look out for missing cofactors (like metal ions) in the growth medium.
e. Express at a reduced rate to give the protein a chance to fold.
f. Be happy with the soluble portion. (But if it's a small portion, beware that it might represent
mistranslated or unfolded material).
5. Expression of the protein is toxic to E. coli - Use a tightly controlled promoter to keep expression
turned off until the clone has been grown up.
General procedure for high level expression:
These days, most protein high expression experiments in E. coli are done after insertion of the
foreign insert into an expression vector designed to add a well behaved polypeptide as an N terminal
fusion. Typically there is a well regulated promoter and powerful translation initiation signals provided by
the vector. Special host strains for high expression are used that are engineered to aid the process, and
may have specific genetic modifications to complement specific requirements of the vector system in use.
During growth, including plating after transformation, one needs to keep the promoter in the off state.
Pilot expression experiments are done followed by separation of the E. coli lysate to soluble and insoluble
fractions by centrifugation. Both are then dissolved with SDS and run on an SDS polyacrylamide gel.
The recombinant protein should be an obvious massive band by comparison to an untransformed sample
of the host strain. The fusion polypeptide is usually the basis of a purification scheme by affinity
chromatography or batch elution. Typically, a protease site is present at the fusion junction to enable
cleaving the foreign protein away from its fusion partner. Then there is often a final chromatographic
purification to remove the protease and residual contaminating proteins. An activity assay for the foreign
protein is not strictly required, but if available will form the verification that the prepared protein is folded
in a biologically meaningful way. Some variations on these systems are described later in this document.
Historical Examples.
A number of the early protein expression experiments are cited below, because these experiments
exposed the mechanisms of a variety of problems that one would now try to avoid in the design stage of
an expression experiment.
Somatostatin - Itakura et al. (1977) Science 198, 1056.
Somatostatin is a peptide hormone. From the known amino acid sequence, a somatostatin gene
was synthesized with E. coli codon preferences. It was expressed from the lactose promoter with and
without fusion to beta-galactosidase, with the latter found to stabilize the peptide. The fusion was made
after a Met residue so that somatostatin was recovered from the fusion protein after cyanogen bromide
cleavage. The unfused construct produced no detectable somatostatin, and the fusion construct produced
a disappointingly low yield of insoluble protein.
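The cyanogen bromide release step can be sketched in a few lines of Python. CNBr cleaves on the
C-terminal side of Met, so a peptide with no internal Met comes off intact from a carrier that ends in
Met. This is only an illustration: the carrier fragment is a made-up placeholder (not the real
beta-galactosidase sequence) and the function name is hypothetical. Only the 14 residue somatostatin
sequence is real.

```python
def cnbr_fragments(protein):
    """Split a protein (one-letter code) after every Met, as CNBr would.

    Simplification: the chemistry converts the Met to homoserine lactone;
    here the residue is simply left as 'M' at the end of each fragment.
    """
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


SOMATOSTATIN = "AGCKNFFWKTFTSC"   # somatostatin-14, contains no Met
CARRIER_END = "AAAM"              # hypothetical placeholder for the carrier, ending in Met

if __name__ == "__main__":
    print(cnbr_fragments(CARRIER_END + SOMATOSTATIN))
```

The same function shows why the strategy fails for a protein with internal Met residues: it would be
cut into several pieces.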
This was the first published attempt to mass produce a eucaryotic protein in E. coli. It mainly
served to anticipate some of the problems that must be overcome for successful mass expression. The
solubility problem remains something that requires a customized solution for each protein, although stable
globular proteins do better than short peptides. This experiment did establish the strategy of fusing
foreign peptides to a carrier protein to stabilize them. More specific means of cleaving the fusion junction
are now available.
The low yield was related to a failure to adequately down-regulate the expression of the insert
while the clone was being grown and propagated. The lac regulation was overpowered by the copy
number of the vector (pBR322). Even though pBR322 exists in only about 20 molecules per cell, this is
enough to titrate out the available lac repressor. This causes partially constitutive expression of the insert,
which causes selection for deletions that take out the promoter or the insert.
It is a common error for people to get a poor yield and blame it on degradation, when what really
happened is that the gene or promoter was already genetically damaged in the construct by the time they
looked for expression. To make a case for degradation, one should pulse label with [35S]-methionine and
observe that the protein really is produced and then degraded. (Note: Look to be sure the protein has an
internal met codon first; the initiator met is often removed by posttranslational processing).
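The check for an internal Met is easy to automate before planning a pulse label experiment. The sketch
below assumes the protein is given in one-letter code starting with the initiator Met; the function
names are hypothetical.

```python
def internal_met_count(protein):
    """Count Met residues other than the one at position 1 (the initiator)."""
    return protein.upper()[1:].count("M")


def can_label_with_met(protein):
    """True if a [35S]-Met label would survive removal of the initiator Met."""
    return internal_met_count(protein) > 0


if __name__ == "__main__":
    for seq in ("MAGCKNFFWKTFTSC", "MKTAYIAKQRMQ"):
        print(seq, internal_met_count(seq), can_label_with_met(seq))
```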
Another symptom of genetic instability caused by expression leakage is that the yield drops off
precipitously as the clone is propagated. So the clone might produce a great yield in a small pilot
experiment, and then make almost nothing when scaled up to several hundred liters. One should consider
keeping back a small sample of the culture to allow examination of the plasmid DNA itself after the fact.
Genetic instability will often show up as a heterogeneous set of deletions. However, you need to keep in
mind that point mutations in the promoter, or even mutations in the host background can also destroy the
expression of the insert.
Instability and ampicillin resistance.
The instability problem when growing expression clones is worse when trying to maintain the
clone with ampicillin resistance than with other antibiotics. This is because ampicillinase leaks out of the
cells while they are growing in liquid culture and destroys the ampicillin in the culture fluid. After that,
bacteria that lose the plasmid tend to overgrow the culture. A typical experience goes as follows:
1. The clones behave as expected on an ampicillin plate.
2. Small scale cultures produce the protein as expected.
3. An overnight preculture is prepared to start a large scale growth.
4. When the large culture is inoculated the next day, the optical density increases only slightly, and
then decreases. To the practiced eye, there is an accumulation of stringy debris indicative
of lysis.
5. The effect is non reproducible. Sometimes the large scale culture grows and sometimes it lyses.
When it does grow, there can be a long lag phase, and the protein yield is typically less
than anticipated from the small scale culture.
The explanation is that the ampicillin is cleared from the preculture and then ampicillin sensitive
bacteria that have lost the plasmid overgrow to various degrees by morning. When the preculture is used
to inoculate media with fresh ampicillin, the bacteria begin to grow. But they cannot synthesize cell wall
due to the ampicillin, so they lyse.
This problem shows similar symptoms to a T1 phage infestation. T1 is a bacteriophage of E. coli
that survives dehydration, and spreads as an airborne contaminant. It causes aggressive lysis, producing
plaques on plates the size of a quarter. T1 infestation is rare, but when a culture gets accidentally infected,
lyses, and is then opened, it can spread enough airborne contamination throughout the lab or even an entire
building that no one can grow E. coli cultures for years afterwards. This forces everyone to derive T1
resistant versions of all of their strains. This is a tremendous setback when it happens, hence anyone who
sees a culture of E. coli lyse unexpectedly is advised to autoclave it without opening it.
Clearly, it is inadvisable to have a background of cultures lysing unexpectedly due to this ampicillin
selection problem because it reduces vigilance against the T1 infestation problem.
If you work with an expression plasmid based on ampicillin selection, you need to take special
steps to maintain the selection. The growth is generally done more continuously to avoid precultures
going to saturation. However the growth may still be done in stages with inoculation into fresh
medium.
Some biotech companies are promoting expression vectors based on different antibiotics to
counter this effect.
Human growth hormone - Goeddel et al. (1979) Nature 281, 544.
This is probably the first published successful mass expression of a eucaryotic protein in E. coli.
Human growth hormone is a 191 residue peptide hormone. The first 24 codons were resynthesized with
an Eco RI site upstream of the AUG convenient for joining to the lac promoter and ribosome binding site.
The other end was made as a Hae III site. The synthetic segment was first cloned and sequenced in an
independent vector to verify the correct sequence.
The cDNA was cloned as a Hae III fragment which omits the first 24 codons. The two parts of the
gene were then ligated together and joined to an Eco RI site downstream of two lac promoters.
They used lac iQ (overproducer of lactose repressor) to get tighter control over expression and
downstream transcriptional fusion to the tet resistance gene of pBR322 to guard against deletion. Upon
induction, they got 20% of cellular protein as HGH.
This strategy anticipates some of the common tricks still used today. The resynthesis of the
N-terminal region as part of a linker (or more recently as a PCR primer) is a standard method of achieving a
fusion. Oligo synthesis is sufficiently advanced today to easily reach lengths of up to 100 bases. Lac iQ is
still used to improve control over the lac promoter. Sometimes the lac i gene is placed on the cloning
vector so that its gene copy number is increased together with the number of lac promoters. However, the
lac promoter still leaks expression of the insert. Other promoter systems (lambda PL and T7) can give a
negligible basal expression level, and are preferred for inserts with toxic properties. These improved
promoter systems have supplanted the use of transcriptional fusions to an antibiotic gene as the preferred
way to stabilize troublesome inserts. Additionally, one tries to avoid serial propagation.
This experiment also typifies the multistep constructions that were common in the last decade. In
a multistep construction (where you're putting a lot of different restriction fragments together), there are
lots of things that can go wrong. As much as possible, you need to make one joint at a time, clone the
intermediate, verify it, and then cut it back out to use in the next step. Today, one would try to use
modern techniques and materials to reduce the number of steps. For example, one would use an
established expression vector that already had the promoter, lots of convenient restriction sites, a host
strain, and a history of successful expression experiments. This would avoid the steps involving creation
of the vector. It would probably be preferable to use the synthetic segment as a PCR primer, and therefore
lift out the intact HGH gene in one step. However, it is not a useful simplification to throw four or more
fragments into a ligation reaction and expect them to all join together in the proper order.
The object of simplifying a construction is to increase the reliability, not to reduce your work load.
Steps that are for verification improve reliability and should be included as much as possible. It's the steps
that are mainly opportunities for something else to go wrong that you're trying to eliminate.
Beta-globin - L. Guarente et al. (1980) Cell 20, 543.
When joining the beta-globin AUG to the ribosomal binding site of the lac promoter, they made a
set of deletions with exo III and S1 to get a variety of spacings. The clones were translationally fused
downstream to lac z so that the efficiency of the various arrangements on the 5' end could be assayed by
looking at beta galactosidase activity. When an efficient construct was found, the 3' end was replaced to
make an unfused beta-globin gene.
The figure above shows the relative activity recovered based on the exact sequence that was
deleted. This experiment served to show that the spacing between the AUG and the ribosome binding site
is critical. In modern constructs, one uses an exact copy of an efficiently translated E. coli gene for this
region.
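When checking a planned junction, it helps to measure the spacing explicitly. The sketch below looks
for a Shine-Dalgarno-like motif and reports the number of nucleotides between the end of the motif and
the A of the next ATG. The motif "AGGA", the 20 nt search window, and the function name are assumptions
made for illustration; the example sequence resembles the lac z start region but should not be taken as
an authoritative copy of it.

```python
def sd_to_start_spacing(seq, sd="AGGA", max_gap=20):
    """Return (sd_index, atg_index, spacing) for each SD-like motif.

    spacing = nucleotides between the last base of the motif and the A of
    the first ATG found within max_gap nucleotides downstream.
    Indices are 0-based.
    """
    seq = seq.upper().replace("U", "T")
    results = []
    pos = seq.find(sd)
    while pos != -1:
        sd_end = pos + len(sd)
        # The ATG must start no more than max_gap nt after the motif.
        atg = seq.find("ATG", sd_end, sd_end + max_gap + 3)
        if atg != -1:
            results.append((pos, atg, atg - sd_end))
        pos = seq.find(sd, pos + 1)
    return results


if __name__ == "__main__":
    example = "TTAGGAAACAGCTATGACC"
    print(sd_to_start_spacing(example))
```

Running the same function over each candidate in a deletion series gives a quick table of spacings to
compare against the measured expression levels.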
Proinsulin - K. Talmadge, et al. (1980) PNAS 77, 3988.
Preproinsulin has a eucaryotic signal sequence at its N-terminus that normally directs it to be
secreted. Beta-lactamase has a bacterial signal sequence at its N-terminus that directs its secretion into
the periplasmic space. Several fusions with part of the bacterial and part of the eucaryotic signal
sequences were made. They all directed secretion of the protein, and in each case the signal sequence was
properly cleaved off to create correct mature proinsulin. In fact, even the plain proinsulin signal without
any bacterial component worked.
[Figure: amino acid sequences at the signal junctions of the fusions. The beta-lactamase signal
(MSIQHFRVALIPFFAAFCLPVFA) is cleaved before HPETLVK..., and the preproinsulin signal
(MALWRMFLPLLALLVLWEPKPAQA) is cleaved before FVKQHLCGPHLVEALYLVCGE.... The hybrid constructs join
progressively shorter portions of the beta-lactamase signal, through short linker residues, to the
preproinsulin sequence.]
From Talmadge et al. (1980) PNAS 77, 3988-3992.
This experiment established the feasibility of causing the foreign protein to be secreted into the
periplasmic space, along with removal of the signal sequence. The idea was that by secreting the foreign
protein, it would be easy to purify, and protected from stability and solubility problems. However, it turns
out that the periplasm has even more proteases in it than the cytoplasm, so one generally gets a lower yield
this way than by just leaving it in the cytoplasm.
Strength of the ribosomal binding site.
Ref: Mott et al. (1985) PNAS 82, 88-92.
In order to over express the E. coli rho protein, it was fused to the lambda PL promoter either with
its own ribosomal binding site or with the ribosomal binding site of the lambda cII gene. The former
construct gave rho as 3%-5% of the cellular protein after induction, whereas the latter gave approximately
40%. So even with bacterial genes, it can help to improve the translation signals.
The lambda PL promoter is still one of the best around due to its very low basal expression level,
its high activity, and its ease of induction (heat). However, you have to use a host with the CI857 ts
lambda repressor gene in it, and you have to be sure to grow the clones at 32C so as to avoid leakage of
expression. Modern expression vectors usually come already carrying the translational signals from a
heavily expressed gene like CII. Optimally, you try to make your fusion right at the initiator AUG.
A completely synthetic approach.
Ref: Jay et al. (1984) PNAS 81, 2290-2294.
The gene for human gamma interferon was completely synthesized including a strong
bacteriophage T5 promoter and a strong ribosome binding site. The gene was ligated together from a
series of 66 overlapping oligonucleotides as illustrated in the stylized diagram below. One can put many
oligos together in a single ligation, although it may be wise to assemble the gene as a series of smaller
restriction fragments that can be independently cloned, sequenced and then ligated together to form the
whole gene.
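The idea of overlapping oligonucleotides can be illustrated with a simple tiling scheme: top strand
oligos cover the gene end to end, and bottom strand oligos are offset by half an oligo so that each one
bridges the junction between two top strand neighbors. This is a generic sketch, not the design used
by Jay et al.; the oligo length and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene, length=40):
    """Split a gene into top strand oligos and staggered bottom strand oligos.

    Top oligos are consecutive windows of `length`. Bottom oligos are the
    reverse complements of windows shifted by length // 2, so each spans a
    junction between adjacent top oligos. The final oligos may be shorter.
    """
    gene = gene.upper()
    half = length // 2
    top = [gene[i:i + length] for i in range(0, len(gene), length)]
    bottom = [revcomp(gene[i:i + length]) for i in range(half, len(gene), length)]
    return top, bottom


if __name__ == "__main__":
    top, bottom = tile_oligos("ATGCCGTTAGCA", length=4)
    print(top)
    print(bottom)
```

A real design would also trim the ends to the desired restriction site overhangs and check the
overlaps for similar melting temperatures, which this sketch does not attempt.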
The synthetic gene was ligated into a plasmid vector such that the tet resistance gene was fused
downstream to hold on selection against loss of the interferon gene. Human interferon accumulated at >
15% of cellular protein.
Today, individual oligos of 100 bases can be made with little risk of incorporating errors. This is
because the chemistry has been changed so that error products (failure to add a base at any step) are
capped and left in a condition so that they can all be removed from the correct product in a one step
purification at the end of the synthesis. So it is possible to construct entirely synthetic genes of substantial
length.
Use of high copy number vectors.
Ref: Winter et al. (1982) Nature 299, 756-758.
M13 RF maintains a copy number of about 200 molecules per infected cell. Genes cloned into
M13 with their own promoters can have high level expression, even if their own promoters are not
particularly strong. M13 is a phage that packages a single stranded circle of DNA into the capsid. Within
the cell M13 grows as a double stranded plasmid called RF (replicative form). Methods of mutagenesis
prior to PCR-based methods were based on priming a single stranded template with a mutagenic primer.
Hence M13 vectors fit easily into that strategy.
In order to avoid deletions, one should make the phage propagate by infection rather than by
division of infected cells. Also, one should avoid serial culturing.
Inclusion bodies.
Heavily expressed proteins often aggregate and form inclusion bodies. Inclusion bodies pellet
with the bacterial debris after cell lysis. Since people often discard this fraction, it is easy to mistakenly
believe that the expressed protein has been degraded.
Inclusion bodies can actually protect a protein from degradation. Also, after isolation by
differential centrifugation and washing, the over-expressed protein may be almost pure within the
inclusion body. Inclusion bodies are typically washed with Triton X-100 or other mild denaturants to
remove contaminating membrane proteins prior to subsequent purification. It is, however, possible that a
recombinant protein would be in the inclusion body because it was hydrophobic, in which case it may
wash out with the Triton X-100. Hence it is advisable to follow the distribution of the recombinant
protein into all fractions by SDS PAGE until the purification procedure is worked out.
Proteins within inclusion bodies are insoluble, often in a denatured state, and may have
inappropriate disulfide bonding. One generally solubilizes the inclusion bodies in a denaturing agent,
such as guanidine hydrochloride, reduces, and then tries to refold the protein out of a low concentration of
guanidine hydrochloride and in the presence of reduced and oxidized glutathione. It is possible to conduct
chromatography under denaturing conditions for removal of contaminants that might interfere with
refolding. Further purification of the refolded form will be necessary along with physical characterization
to assure that it is the correct native form. Conditions for refolding vary from protein to protein and may
be hard to find.
Ref: Tsuji et al, (1987) Bioch 26, 3129-3134.
Proteins can be in inclusion bodies for reasons other than being unfolded. For example, RNA
binding proteins are often found in inclusion bodies by virtue of being networked with cellular RNA. It
may be possible to release such proteins without denaturation by treating with RNAse.
Sometimes proteins can be engineered to avoid certain folding problems. For example, if a cys is
involved in inappropriate disulfide bonding, and homologous proteins suggest an alternative acceptable
amino acid, making that replacement by in vitro mutagenesis might improve the ease of isolation of the
protein.
Protease- mutant host bacteria.
E. coli has numerous proteases that can attack and degrade recombinant proteins. In particular,
synthesis of protease La, which is the product of the lon locus, is induced by the presence of abnormal
proteins. lon is a heat shock gene, and is probably there to degrade damaged proteins after heat shock.
Recombinant proteins are degraded 2-4 times more slowly in lon- cells. Alternatively, the htpR (rpoH) locus
which encodes an alternative sigma factor for directing the induction of heat shock genes can be mutated.
Ref: Goff and Goldberg (1985) Cell 41, 587.
mRNA half life.
Most E. coli messages have half lives of about 1-2 minutes. T4 gene 32 mRNA has a half life of
about 30 minutes, this being part of the phage's strategy to achieve high level expression. Sequences in
the 5' untranslated region of the message confer this excessive stability. Expression cassettes have been
constructed wherein the 5' end of T4 gene 32 is fused to the beginning of the recombinant gene.
As far as I can tell, this system has not appeared in a commercial vector yet. However, be warned
that the opposite effect can happen by accident. You may inadvertently introduce sequences in an
untranslated region that destabilize the mRNA.
Ref: Frey et al., (1988) Gene 62, 237-247.
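The payoff from a longer half life can be estimated with a simple steady state argument. If a message
is made at a constant rate and decays with first order kinetics, its steady state level is proportional
to its half life. The sketch below makes exactly those two assumptions (constant synthesis, first order
decay) and nothing more; the function names are hypothetical.

```python
import math


def steady_state_level(synthesis_per_min, half_life_min):
    """Steady state mRNA level for constant synthesis and first order decay."""
    decay_constant = math.log(2) / half_life_min
    return synthesis_per_min / decay_constant


def fold_gain(stable_half_life_min, typical_half_life_min):
    """Ratio of steady state levels at equal synthesis rates."""
    return (steady_state_level(1.0, stable_half_life_min)
            / steady_state_level(1.0, typical_half_life_min))


if __name__ == "__main__":
    # Half lives quoted in the text: about 30 min versus 1-2 min.
    print(fold_gain(30, 2), fold_gain(30, 1))
```

Under these assumptions a 30 minute half life gives roughly 15 to 30 times more message than a 1-2
minute half life at the same rate of transcription.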
mRNA secondary structure.
When fusing an E. coli translation start to the segment of a gene encoding the N terminal, it is
possible to inadvertently create mRNA secondary structure that ties up the initiator codon or the ribosome
binding sequence. Such proposed constructs should be checked out for secondary structure problems
using prediction programs. Mfold in GCG is adequate for this purpose. Mfold can also be found on the
internet.
Many modern expression vectors include an N terminal fusion peptide that can be cleaved away
after purification of the fusion product. This keeps the novel sequences away from the translation controls
and avoids inadvertent creation of mRNA secondary structure problems.
Interaction of the transgenic protein with chaperonins
Ref: Overproduction of Anabaena 7120 ribulose-bisphosphate carboxylase/oxygenase in Escherichia
coli - Larimer and Soper (1993) Gene 126: 85-92.
In photosynthetic organisms Rubisco (D-ribulose-1,5-bisphosphate carboxylase/oxygenase) catalyzes the initial
step in the reductive pentose phosphate pathway. Refolding of Rubisco in vitro requires chaperonins.
High-level production of Rubisco activity from E. coli was aided by the simultaneous overproduction of
the E. coli (GroESL) chaperonins.
Curiously, some proteins may be stabilized in strains carrying a defective chaperonin (Reidhaar-Olson
et al., Biochemistry 29: 7563-7571 (1990)).
Problems with non preferred codons.
E. coli uses preferred codons among the synonymous sets for its own highly expressed proteins.
The implication is that the non preferred codons are translated inefficiently. Eucaryotic genes are full of
these non preferred codons, yet they usually can be highly expressed without trouble. However,
sometimes it does help to put a preferred codon at amino acid #2, or to fix stretches of adjacent non
preferred codons.
An alternative solution is to use expression hosts that contain additional tRNA genes added for the
purpose of increasing the level of tRNA specific for non-preferred codons.
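A quick scan of the coding sequence will show whether codon #2 or any run of adjacent codons is likely
to be a problem. The sketch below is illustrative only: the set of "rare" codons is a short list of
codons commonly described as infrequent in highly expressed E. coli genes, not a complete or
authoritative usage table, and the function names are hypothetical.

```python
# Assumed illustrative set of non-preferred codons; substitute a real usage table.
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CGA", "CCC"}


def split_codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rare_codon_report(cds, rare=RARE_CODONS, min_run=2):
    """Report rare codon count, whether codon #2 is rare, and runs of rare codons.

    Runs are given as (first, last) codon numbers, 1-based and inclusive.
    """
    codons = split_codons(cds)
    flags = [codon in rare for codon in codons]
    runs = []
    i = 0
    while i < len(codons):
        if flags[i]:
            j = i
            while j < len(codons) and flags[j]:
                j += 1
            if j - i >= min_run:
                runs.append((i + 1, j))
            i = j
        else:
            i += 1
    return {
        "n_rare": sum(flags),
        "codon2_rare": len(codons) > 1 and flags[1],
        "runs": runs,
    }


if __name__ == "__main__":
    print(rare_codon_report("ATGAGGGCTAGAAGGCTATAA"))
```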
Other genetic code problems.
Some genomes don't use the same genetic code as E. coli. For example, most mitochondrial
genomes use a few altered codons. Once the altered code is known, the gene will have to be altered by in
vitro mutagenesis at the variant codons to match the E. coli code. A few human nuclear genes are edited at
the RNA level. RNA editing is found elsewhere, reaching an absurd level in Trypanosome mitochondria.
One would have to be sure to be working from the final sequence after editing.
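As an illustration of the recoding step, the sketch below lists the codons in a vertebrate
mitochondrial reading frame that would be misread by E. coli and proposes replacements. It handles only
two differences, UGA (Trp in the mitochondrion, stop in E. coli) and AUA (Met in the mitochondrion, Ile
in E. coli). Mitochondrial AGA/AGG terminators and the codes of other organisms are not handled, and
the function name is hypothetical.

```python
# Vertebrate mitochondrial codon -> E. coli codon encoding the same amino acid.
MITO_TO_ECOLI = {
    "TGA": "TGG",  # Trp
    "ATA": "ATG",  # Met
}


def recode_for_ecoli(cds):
    """Return (recoded_sequence, changes) for an in-frame coding sequence.

    changes is a list of (codon_number, old_codon, new_codon), 1-based.
    Any trailing partial codon is dropped.
    """
    cds = cds.upper().replace("U", "T")
    recoded = []
    changes = []
    for number, start in enumerate(range(0, len(cds) - len(cds) % 3, 3), start=1):
        codon = cds[start:start + 3]
        new_codon = MITO_TO_ECOLI.get(codon, codon)
        if new_codon != codon:
            changes.append((number, codon, new_codon))
        recoded.append(new_codon)
    return "".join(recoded), changes


if __name__ == "__main__":
    print(recode_for_ecoli("ATGTGAATACCC"))
```

The list of changes is the useful output here, since it is effectively the list of positions that need
in vitro mutagenesis.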
T7 RNA Polymerase/Promoter systems
This system (marketed under the name pET by Novagen, but also publicly available) is very
popular today. It expresses the foreign gene from a T7 promoter on the vector. T7 is a bacteriophage that
makes its own RNA polymerase that is specific for its own promoters. The T7 polymerase is provided
either by an inducible T7 polymerase gene in the host, or by infecting the culture with a phage carrying the
polymerase gene after the cells have been grown up.
There is a multiple cloning site downstream of the T7 promoter. If the gene already has a suitable
ribosomal initiation site, it can simply be inserted in the correct orientation. Alternatively, one can add a
strong ribosomal binding site, engineer the codons to match E. coli's preferences, add the restriction sites,
and even add a transcriptional terminator, all by using the linker, or PCR fusion procedures described
earlier in the course. Versions of the vector exist that have a strong ribosomal binding site and a cloning
site right at the AUG, so that one could fuse right at the AUG.
If the T7 polymerase gene is in the host background, it will be under the control of the lambda PL
promoter which is in turn under control of the lambda CI857 ts repressor gene also in the host
background. This promoter has one of the lowest basal expression levels of any around, but there is still a
little leakage of expression of the transgene. If the basal expression of the transgene proves toxic to the
host, then one grows up the clone in a host with no T7 polymerase gene, and then introduces it by
infection with a phage carrying the polymerase gene. This is the major advantage of the T7 systems. One
can alter the method of control of expression without having to make new constructs.
Some people have reported leakage of expression in this system even without the T7 polymerase.
When going for 0 basal expression, one has to worry about leakage of transcription from other promoters
in the vector that read through into the transgene.
Fusion systems:
There are now numerous commercial systems marketed in which you fuse your protein to some other
protein that provides a purification handle. Sometimes the fusion is designed to direct secretion into the
periplasmic space. Then some means is provided to subsequently cleave the fusion apart. One needs to
pay attention as to whether or not the cleavage will leave extraneous amino acids attached to your protein.
Hopefully the cleavage will be accomplished on the folded fusion protein, directly releasing your folded
polypeptide. Unfortunately, sometimes you have to denature the fusion protein to get the protease to
cleave the fusion site.
Maltose-Binding Protein Fusions
This system, based on vectors pMAL-c2 or -p2, can be obtained from New England Biolabs. In
this system, you make a translational fusion downstream of malE, which is a secreted E. coli protein that
binds maltose. When expressed in pMAL-p2 the fusion protein is recovered from the periplasmic space.
Alternatively, the pMAL-c2 version is designed to leave the fusion protein in the cytoplasm. In either
case, the fusion protein can be purified by affinity chromatography on an amylose column, and then
cleaved with factor Xa protease which is specific for the fusion site, leaving your protein with a few extra
amino acids at the N-terminus. The vector also provides an XmnI site at the fusion junction. XmnI cuts
to leave blunt ends (its specificity is GAANN^NNTTC), making it possible to fuse
with no extra amino acids if you can arrange for your insert to start with the first codon at a blunt end.
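Because the XmnI recognition sequence contains four unspecified bases, it is worth locating the site
and its cut position by machine rather than by eye, and checking that the insert itself carries no
XmnI site. The sketch below uses a regular expression for GAANNNNTTC and reports where the blunt cut
falls. The junction sequence in the example is an illustrative stretch encoding Ile-Glu-Gly-Arg
followed by a few more codons; it is an assumption for demonstration, not a sequence copied from the
vendor's documentation.

```python
import re

# GAANN^NNTTC: the blunt cut falls after the fifth base of the 10 bp site.
XMNI_PATTERN = re.compile(r"(?=(GAA[ACGT]{4}TTC))")
XMNI_CUT_OFFSET = 5


def xmni_sites(seq):
    """Return (site_start, cut_index) for every XmnI site, 0-based.

    seq[:cut_index] is the fragment upstream of the blunt cut.
    A lookahead is used so that overlapping sites are all reported.
    """
    seq = seq.upper()
    return [(m.start(), m.start() + XMNI_CUT_OFFSET) for m in XMNI_PATTERN.finditer(seq)]


if __name__ == "__main__":
    junction = "ATCGAGGGAAGGATTTCAGAATTC"   # hypothetical junction: ATC GAG GGA AGG = I E G R
    for start, cut in xmni_sites(junction):
        print(start, junction[:cut], junction[cut:])
```

In this example the cut falls immediately after the Arg codon, so a blunt insert beginning with its
first codon would follow the factor Xa site directly.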
As we saw before, secretion into the periplasmic space turns out to usually reduce the yield (in this
case about 4 x). However, some proteins that form disulfide bonds fold better if secreted. On the other
hand, large proteins that are normally cytoplasmic have trouble getting through the membrane. The major
attraction of making the fusion is to allow affinity purification based on the maltose binding domain,
before cleaving it off with factor Xa. Factor Xa cleaves after the Ile Glu Gly Arg at the fusion site.
Other fusion systems:
There are a variety of other commercially available fusion systems that are designed to assist
purification of your protein, then let you cleave your protein away from the bacterial domain.
Novagen, Inc., has a variety of T7 pET type vectors designed to effect fusions with various proteins
that can be used as purification handles:
Tag          N/C terminal   Basis for detection and/or purification
T7-tag       N              monoclonal antibody
S-tag        N              RNAse S-protein
His-tag      N or C         metal chelation chromatography
HSV-tag      C              monoclonal antibody
pelB/ompT    N              potential periplasmic localization
Pharmacia Biotech uses plasmids (pGEX vectors) designed for inducible, high-level intracellular
expression of genes or gene fragments as fusions with glutathione S-transferase (GST). The fusion
proteins can be detected using colorimetric assay or immunoassay and purified using Glutathione
Sepharose 4B affinity chromatography.
Eastman Kodak has the Flag System that is based on the Flag marker octapeptide that is fused to a
protein by molecular cloning of its DNA coding sequence adjacent to the protein coding sequence for
expression in an appropriate vector. Detection is by specifically binding mouse monoclonal antibodies to
the octapeptide, while purification is by affinity chromatography. An amino-terminal Flag peptide can be
removed by the protease, enterokinase. The Flag fusion proteins can be expressed in E. coli, yeast, insect,
or animal cells.
InVitrogen uses a system based on fusion to thioredoxin with purification by binding to a phenylarsine
oxide resin.
In all cases where immunodetection or immunoaffinity purification is used, one has to use a tag that
has no endogenous counterpart.
In the various combinations above, one can cleave with factor Xa, thrombin, or enterokinase. Any of
these might hit sites within your protein, in which case you switch to a vector that uses one of the others.
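Before committing to a vector, one can scan the protein sequence for internal copies of the protease
recognition sites. The sketch below uses the Ile-Glu-Gly-Arg site for factor Xa given earlier, plus
commonly quoted recognition sequences for thrombin (LVPRGS) and enterokinase (DDDDK); these two are
assumptions taken from general usage, not from this document. Real proteases also cut at related
sequences, so a clean result here is only a first check. Function names and the test sequence are
hypothetical.

```python
# Assumed canonical recognition sequences (one-letter code).
PROTEASE_SITES = {
    "factor Xa": "IEGR",
    "thrombin": "LVPRGS",
    "enterokinase": "DDDDK",
}


def internal_sites(protein, sites=PROTEASE_SITES):
    """Map each protease to the 1-based positions where its site begins."""
    protein = protein.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        pos = protein.find(motif)
        while pos != -1:
            positions.append(pos + 1)
            pos = protein.find(motif, pos + 1)
        hits[name] = positions
    return hits


def usable_proteases(protein, sites=PROTEASE_SITES):
    """Proteases with no exact match to their site inside the protein."""
    return [name for name, positions in internal_sites(protein, sites).items()
            if not positions]


if __name__ == "__main__":
    candidate = "MKTIEGRAAADDDDKLL"   # made-up sequence for illustration
    print(internal_sites(candidate))
    print(usable_proteases(candidate))
```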
A variety of other proteases have made their way into commercial expression vector systems.
Amersham Pharmacia has a product called PreScission protease that is based on the rhinovirus
protease. The protease itself is marketed as a noncleavable GST fusion. That way, you can bind your GST
fusion to glutathione-conjugated Sepharose and then just mix the protease in. Your protein is released, and
the GST fusion partner as well as the protease are retained on the resin.
New England Biolabs markets a system named IMPACT where the cleavage is effected by the activity
of a self splicing protein called an intein. Chong et al., 1998. Nucl. Acids Res. 26:5109.
Nature of the N terminus.
N terminal fusions tend to stabilize proteins in E. coli more so than C terminal fusions. Even other
E. coli proteins may be stabilized by an N terminal fusion to a more stable partner. In some cases, adding
even one residue may stabilize a protein. This may mean that E. coli uses some signals at the N terminus
to govern protein turnover. Hence fusion to a well behaved N terminal fusion partner can avoid a number
of potential problems: secondary structure at the initiator codon, protein turnover initiated by the N
terminal sequence, poor expression due to suboptimal N terminal codons, and often aggregation.
It can happen that the recombinant protein starts to aggregate as soon as it is cleaved away from its
fusion partner. Usually if the fusion protein avoided inclusion bodies, both parts are probably folded. If
the problem is not caused by folding, one may be able to avoid it by adjusting conditions, or by further
purification of the fusion protein prior to cleavage.
Sometimes people find that the fused protein is so much more stable than the protein with the
natural N terminus that they choose to do their experiments directly on the fusion protein. In this regard,
small fusions, like His tags, are less likely to perturb the properties of the protein than larger fusion
partners. A common fusion partner, GST, dimerizes. Hence doing direct characterization of GST fusions
carries this extra perturbation.
N terminal fusion systems typically leave some extraneous residues at the N terminus after
protease cleavage. In many cases, the exact N terminal sequence is unimportant to the function of the
protein, and people just tolerate this alteration. If a protein's function depends critically on its N terminal
residue(s), this will create special problems. In some fusion systems, one can control the
sequence immediately after protease cleavage so that no extraneous residues are left. However, some
thought should be given to exactly what the natural N-terminus should be. Eucaryotes as well as
procaryotes often remove the initiator methionine, and sometimes some more residues. If the protein is
produced with a C terminal His tag from a eucaryotic vector in cell culture (where it is presumably
naturally processed), it can be purified and the mature N terminus can be determined. Classically, this
would be done by Edman degradation. For proteins of < 50 kDa, mass spectrometry after removal of the
His tag is sufficiently accurate at determining molecular weight to directly define the mature N terminus.
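The mass comparison described above can be sketched as follows. This is a minimal illustration that assumes an unmodified polypeptide (no acetylation, disulfides, or other modifications) and uses standard average residue masses. The function names, the example sequence, and the "observed" mass are hypothetical.

```python
# Average residue masses in Da (amino acid minus one water); standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N and C termini

def average_mass(seq):
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

def best_n_terminus(seq, observed_mass, max_trim=5):
    """Try removing 0..max_trim N-terminal residues and return the
    (residues_removed, predicted_mass, observed_minus_predicted) closest to the observation."""
    candidates = []
    for n in range(max_trim + 1):
        predicted = average_mass(seq[n:])
        candidates.append((n, predicted, observed_mass - predicted))
    return min(candidates, key=lambda c: abs(c[2]))

# Hypothetical construct and a made-up "observed" mass that is 0.5 Da
# away from the prediction for the protein lacking its initiator Met.
construct = "MAHKLTEEQRSVLG"
observed = average_mass(construct[1:]) + 0.5
print(best_n_terminus(construct, observed))
```

Because the lightest residue (Gly) contributes about 57 Da, a measurement accurate to a few daltons is enough to tell neighboring N-terminal truncations apart in this idealized case.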
Methods to verify the correct structure and fold:
1. Biological assay, if available. This can be problematical for in vitro mutated proteins, since the
mutation may by design destroy the biological activity.
2. Classically, protease sensitivity, Circular Dichroism spectra, and fluorescence measurements can
help verify a fold similar to the protein isolated from its natural source.
3. If the protein is < 50 kDa, mass spectrometry can verify the expected molecular weight.
4. Analytical ultracentrifugation can measure whether or not the protein self associates, or
associates with another component in the expected stoichiometry. AUC may also be able to
detect a gross deviation from the expected shape of the molecule.
5. If the protein is < 45 kDa, then a 2D NMR experiment called HSQC can provide much
information about the physical state of the protein. This is the simplest NMR experiment for
protein structure, and does not require assigning signals to specific residues or solving the 3D
structure. It can indicate:
- How many residues are in a unique fold, and how many are in a highly mobile unfolded
state.
- The temperature or pH at which there is loss of structure.
- In a comparison between wild type and mutant, whether the effect of the mutation is local.
- Whether association with some substrate or other small molecule occurs.
Recombinant Phage Antibody System
Pharmacia Biotech also has the Recombinant Phage Antibody System (RPAS) designed for the cloning
and expression of recombinant antibody fragments in bacteria. In this system, one makes two insertions
into a fusion protein, one derived from an Ig heavy chain variable region, and one from an Ig light chain
variable region. The fusion protein juxtaposes the chains to allow formation of an antigen binding site.
The fusion protein is displayed on the surface of an fd (M13) phage, allowing one to screen a library of
plaques with a labeled antigen. Even better, one can purify phage that bind the antigen by affinity and
then reinfect the host.
In essence the system produces single polypeptide versions of antibodies (scFv) quickly in bacterial
cultures. The "cleavage" of the soluble antigen binding domain from the phage protein domain is done by
an interesting genetic manipulation. There is an amber stop codon after the antigen binding domain and
before the phage protein domain. To get phage-bound antibody, one expresses from an amber suppressor
strain. To get soluble antibody, one expresses from a non-suppressor strain.
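The effect of the amber codon can be illustrated with a toy translation routine. This is a sketch under stated assumptions: the suppressor is taken to insert glutamine at UAG (as a supE-type suppressor does), suppression is treated as complete although in real strains it is only partial, and the short DNA sequence and function name are hypothetical.

```python
# Standard genetic code, indexed by codon position in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, amber_suppressor=False, suppressor_residue="Q"):
    """Translate an open reading frame from its first base.

    In a suppressor host (assumption: Gln inserted at every TAG) the amber
    codon is read through; otherwise TAG terminates like TAA and TGA.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        if codon == "TAG" and amber_suppressor:
            protein.append(suppressor_residue)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Toy construct: "antibody" part (ATG GCT), amber codon (TAG),
# "phage coat" part (AAA), ochre stop (TAA).
toy = "ATGGCTTAGAAATAA"
print(translate(toy, amber_suppressor=False))  # soluble product stops at the amber codon
print(translate(toy, amber_suppressor=True))   # read-through gives the fusion
```

The same DNA therefore yields either the free antigen binding domain or the phage-displayed fusion, depending only on the host strain.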
Phage Display
A popular variation on the above theme is to fuse a library of peptides to the phage coat protein
and to purify the particular sequences that bind to some ligand. Typically the library is composed of
random sequences, and the clones that bind are sequenced and used to discern amino acid patterns
required for binding. Commercial libraries are available with random 7 mer peptides or 12 mer peptides.
One could hope to screen a large enough library to contain all possible 7 mers (20^7, about 1.3 x 10^9
sequences), but longer peptides will necessarily have only a fraction of all possible sequences present
(20^12 is about 4 x 10^15). Screening is generally by panning, in
which the library of phage, each containing the DNA specifying its displayed sequence, is reacted with a
surface coated with ligand. Phage retained on the surface are eluted, amplified, and panned again several
times. Typically one uses relatively non stringent binding conditions (high concentration of the ligand) at
first because the concentration of phage that will bind is so low. Stringency is then increased in later
rounds of panning by decreasing the concentration of ligand.
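The point about library completeness is simple arithmetic, sketched below. The calculation assumes every clone is an independent, uniform draw from all 20^n peptides, which real libraries (for example those built from degenerate codons) only approximate. The library size of 10^9 clones is a hypothetical figure chosen for illustration.

```python
import math

def sequence_space(length, alphabet=20):
    """Number of distinct peptides of the given length."""
    return alphabet ** length

def expected_coverage(clones, length):
    """Expected fraction of all possible peptides present at least once,
    assuming uniform random sampling: 1 - exp(-clones / space)."""
    ratio = clones / sequence_space(length)
    return -math.expm1(-ratio)

def clones_needed(length, coverage=0.95):
    """Independent clones needed to reach the requested expected coverage."""
    return math.ceil(-sequence_space(length) * math.log1p(-coverage))

library_size = 10**9  # hypothetical number of independent clones
for n in (7, 12):
    print(n, sequence_space(n), expected_coverage(library_size, n), clones_needed(n))
```

Under these assumptions a library of this size holds roughly half of all 7 mers but well under a millionth of all 12 mers, which is why longer-peptide libraries can only sample sequence space.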
Phage Display can also be used with protein domains subjected to saturation mutagenesis.
However, non secreted proteins are often not expressed efficiently in this system because they fail to
successfully pass through the bacterial membrane in the assembly of the virus.
Variations on this theme are:
1. The coat protein gene could be on a phagemid. A phagemid is a plasmid that additionally has
an origin of replication from an M13-like phage. When a helper phage is provided, a
single stranded version of the phagemid ends up packaged as if a viral genome.
2. Rather than coating the surface with the ligand, one could coat it with an antibody to the ligand.
Binding could then be as a sandwich.
3. Stratagene has a lambda phage cDNA cloning vector out of which a phagemid cassette can be
mobilized such that the phagemid expresses the cDNA (fragment) fused to the M13 coat
protein.
4. Novagen has a T7 phage display system.
Refs:
Clackson, T., et al. 1991. Making antibody fragments using phage display libraries. Nature 352:624-628.
Scott, J.K., et al. 1990. Searching for peptide ligands with an epitope library. Science 249:386-390.
Hogrefe, H.H., et al. 1993. Cloning in a bacteriophage lambda vector for the display of binding proteins
on filamentous phage. Gene 137:85-91.
Problems
1. Given the following cDNA cloned in pBR322, propose a simple and efficient scheme to achieve
controlled high level expression. Assume that the insertion into the Hind III site of pBR322 has destroyed
the tet gene promoter and that the clone is tet sensitive. Use the tet gene to monitor expression and to
ensure against deletion of gene X. If you wanted to take the gene completely out of this vector and put it
into another expression vector, how would you proceed?
2. You express a mutant protein in E. coli and find little in the cell lysate with a Western blot. What do
you suspect first, and how would you test your hypothesis?
3. Do you expect any special problems from expressing a human mitochondrial gene in E. coli? If so,
what would you do about it?
4. You have determined the protein sequence from a trypanosome mitochondrial gene. What would be
the most direct way to express this protein in E. coli?
5. You make an expression construct that makes a high yield in a small pilot growth, but gives
disappointing yields when large scale (100 liter) cultures are grown up. What do you think is the problem,
and how would you solve it?