10. Expression vectors and production systems

Because production of a cloned protein yields large quantities of the product, the yield is readily analysed and visualized by SDS-PAGE. Induced versus uninduced expression may be compared, as well as the compartment and state of production : intracellular vs periplasmic vs extracellular (secretion), and soluble or insoluble. (The latter aspects are not shown in the Figure.)

1. Basic aspects

- regulation
  - high expression levels may
    - cause toxicity (of the protein product) to the cell
    - cause "metabolic drain" : disturb the metabolism
    => any change that reduces the expression level (e.g. by mutations, loss of plasmid, ...) will give the cell a growth advantage
  - to avoid toxicity : grow the cell population to an appropriate level before synthesis of the "cloned" protein starts
    => i.e. induce expression (and until then, keep it repressed)
  - induce expression in late-logarithmic phase (with cells still metabolically active)
  - regulation is primarily at the transcriptional level
  - other factors : translation, stability (vector, transcript, product)
- transcription
  - E. coli RNA polymerase : α2, β, β'
    general promoter structure ('consensus' sequences)
      σ70 promoter : TTGACA (16-19 nt) TATAAT (5-8 nt) start (+1)
      (zones / optimal distances) : -35 (17 nt) -10 (7 nt) +1
    UP-elements can have an extra influence on efficiency
  - initiation : efficient promoters and their regulation
    - lacUV5 : IPTG, LacI, LacIq, titration, 'leakiness'
    - trp : tryptophan + aporepressor (TrpR), induction by IAA (indoleacrylic acid)
    - Ptac : upstream of -20 from Ptrp + downstream of -20 from Plac (cfr.
      PtacI, PtacII, Ptrc, and other variants) ; induction by IPTG
    - PL, PR : repressor CIts857 : induction by temperature shift
      or : CIwt + induction by mitomycin C
    - T7 : delivery of T7 RNA polymerase required
      - encoded on another (compatible) plasmid
      - encoded by a prophage (cIts) in a lysogenic E. coli strain
      - by infection with a recombinant M13 vector carrying T7 gene 1
    - araBAD : arabinose + AraC : positive and negative regulation
    => PL and PBAD are repressed very efficiently, Plac (and its derivatives) are 'leaky' ; with phage T7 promoters, all interference with the E. coli polymerase is avoided.
  - termination
    - factor-dependent (Rho (ρ), Tau (τ), NusA (of phage λ))
    - factor-independent (GC-rich stem-loop structure + oligo-T stretch)
    often used in (most) expression vectors are :
    - the fd terminator (in Ff phages between gIII and gVIII, see figure in ch.4)
    - the rrnB operon terminators T1 and T2
    - the phage T7 terminator Tφ
- translation
  - initiation : RBS (ribosome binding site), AUG (or GUG ; 91% vs 8% in E. coli)
  - sequence motifs : Shine-Dalgarno sequence
    RBS versus Shine-Dalgarno (SD) motif :
      RBS = about 55 nt, between positions -35 and +22 (with respect to AUG)
      SD : complementary to the 3' end of 16S rRNA : 5' ...GAUCACCUCCUUA-OH 3' (E. coli)
    basic pattern : 5' ...AGGA... +/- 7 nt ... AUG (start) ... 3'
    efficiency is quite unpredictable
    variations : GAGG, GGAG, GGAGG
  - secondary structure
    the AUG (GUG) initiation triplet should be in a readily accessible region, e.g.
    at the top of a stem-loop structure
    triplets following AUG influence efficiency (as well as the preceding nt) (and obviously also the secondary structure)
  - codon usage
    - the effect of codon usage is very complex
      - same amino acid => multiple codons
      - same codon => multiple tRNAs
      - different codons => same tRNA
      - codon-anticodon binding strength
    - comparison of codon usage frequencies of highly expressed genes versus poorly expressed genes hints at the effect of some particular triplets on the expression yield (see Table)
    - with synthetic genes : using the 'optimal' codons usually gives a good expression level (though not necessarily the highest one)
- gene dosage effect
  - increasing copy number may have
    - no effect
    - a positive effect
    - a negative effect (e.g. the expression of trypsin)
    => can only be determined empirically ; no general rule
- stability
  - a transcription termination signal beyond the target gene is required for plasmid stability
  - par locus (or loci) for plasmid stability (important at the time of induction)
  - transcript stability : mRNA degradation is a complex process
    - both 5' and 3' UTR play a role (e.g. a stem-loop structure in the 5' UTR gives stabilisation)
    - there is no inverse correlation between the size and half-life of an mRNA
      hence : degradation does not depend on non-specific endonucleolytic cleavage
  - protein stability : degradation is strongly regulated in E. coli : there is a large number of proteases in the cytoplasm, periplasm and at the inner and outer membranes.
    N-end rule of Varshavsky :
    - R, K, L, F, Y, W : half-life of 2 min on a test protein
    - other amino acids (except P) : half-life of 10 hours
    Initiating Met : the formyl-Met (first amino acid) is usually cleaved off. This seems to work most readily if the 2nd amino acid has a short side-chain (which facilitates cleavage by the methionine aminopeptidase).
    Stress conditions cause induction of certain proteases, among others
    protease La (lon gene) ; these are under the control of σ32 promoters :
    - rpoH : gene for the RNA polymerase σ32 subunit
    - mutants in rpoH can give a dramatic increase of expression yield (and product stability)
    There are many more specific observations that may be taken into account.

2. Major expression systems (induction and regulatory circuits)

some examples :
- the pET system
  - with T7 RNA polymerase
  - extra regulation (repression) by LacI (IPTG induction)
  - delivery of T7 RNA polymerase from an additional expression unit on a (DE3) prophage with PlacUV5 as promoter (also under LacI control : lacO as operator sequence)
  - additional control by T7 lysozyme, encoded on a compatible (pACYC-derived) plasmid (pLysE or pLysS)
    => the lysozyme binds to and inhibits (residual) T7 RNA polymerase
- the binary trp-cI system
  - double system using compatible plasmids
  - target gene downstream of the PL promoter
  - synthesis of CI (encoded by the cI repressor gene) is regulated by Ptrp
  - addition of tryptophan activates expression of the target gene by shutting down CI synthesis
    (as seen above : regulation by CIts857 is also possible, but this regulation is less strict (wt-CI binds more efficiently), and addition of an inducer in large fermentors is easier than increasing the temperature ; moreover, the temperature shift would induce stress mechanisms (heat-shock))
- pBAD system
  - without arabinose : dimeric AraC binds to the I1 and O2 operator sites (the DNA loop blocks transcription)
  - dimeric AraC + arabinose : binds to the I1 and I2 sites
  - this is sensitive to catabolite repression : CAP + cAMP (cAMP is low if the glucose concentration is high)
    => the interplay of arabinose and glucose concentrations might be used to control the level of expression
    (n.b. CAP regulation is also present in the lac operon, but tac promoters are not CAP-dependent)

3. Production strategies

E. coli is the major 'first line' organism for expression
=> disadvantages (limitations)
  - no (few) post-translational modifications
  - virtually no efficient secretion routes towards the medium (secretion brings the protein into the periplasmic space)
  - extensive S-S bridges are difficult to form

Why recombinant expression ?
=> increase production yield
=> facilitate purification
=> create novel variants (mutants, insertions, fusions, etc.)

The expression product may :
- be soluble or insoluble
- be a mature protein or a fusion product
- accumulate in the cytoplasm or be secreted into the periplasm (type II secretion)
  (with E. coli, extracellular secretion is very limited, e.g. type I secretion based on haemolysin transport)

Expression product :
- as mature protein
  - may be deposited in inclusion bodies
    - allows easy purification
    - protects against break-down
    - requires solubilisation and refolding => denature (in e.g. guanidinium hydrochloride) and (try to) renature
    - retarding the synthesis rate may reduce this process (in part) (e.g. by lowering the temperature)
  - the expression construct requires critical manipulation of the ribosome binding region to warrant efficient (high-level) translation initiation
  - the Met problem : is the initiator (formyl-)methionine removed ?
- as fusion protein (e.g.
  random insertions at ScaI site of cat : AGTACT)
  - N-terminal and C-terminal fusions are possible (or multiple partners)
  - advantage : translation initiation efficiency may be largely retained (in C-terminal fusions)
  - fusion partner can be a target for purification (tag)
  - fusion partner may have an activity that can be used to assay the production level
  - fusion to a signal peptide sequence can promote secretion
    => secretion allows formation of (at least some) S-S bridges
  - fusion partner may allow anchoring in the membrane
  - an epitope recognized by a specific antibody may be added (for detection, quantification or purification)
  - choice of intracellular or periplasmic expression (secretion)

Trying to avoid (or to reduce) the formation of inclusion bodies (improve chances of protein folding)
- grow cells at reduced temperature (retards the expression process)
  (or use other conditions that retard growth, e.g. composition of the media & pH values)
- co-expression of chaperones : DnaK, GroES, GroEL
- removing critical amino acid positions (e.g. in interferon)
- fusion to thioredoxin, or some other proteins

Secretion : is a specific kind of fusion
- the fusion partner (signal peptide) is removed during secretion
- no methionine remains at the N-terminus
- position of cleavage is not always guaranteed
  => different secretion signals may have to be tried : from OmpA, OmpT, PelB, β-lactamase, alkaline phosphatase, etc. Some of these may allow leakage to the growth medium when overexpressed.
- !!! but not all proteins are 'secretable'

In general :
1) small peptides : often (peptide) stability problem => fusion approach preferred : e.g. LacZ fusions
2) intermediate size :
   few S-S : intracellular
   more S-S : periplasmic (secretion)
3) large proteins : more problematic

4. Purification and processing

Fusions to allow easy purification : e.g. glutathione S-transferase (GST), MalE (MBP : maltose binding protein), oligo-His (hexa-His), etc. : partners (carriers) or tags.
Purification by affinity chromatography or by immobilisation followed by elution. Release of the tag/partner by cleavage with factor Xa, enterokinase, etc., or by intein processing.

Some examples :

1. C-terminal fusion in pBAD/His vectors
   - six histidine codons
   - insertion in correct reading frame : BglII site
   - binding to a column with immobilized Ni2+
   - cleavage site for enterokinase (D-D-D-D-K*) (* = cleavage site)
     (n.b. some residual amino acids are left at the N-terminus after processing)
2. C-terminal fusion to biotin carboxylase
   - the biotin is covalently attached to a lysine by biotin ligase (BirA)
     (E. coli has one biotinylated protein but it does not bind to streptavidin in its native configuration)
   - immobilisation to streptavidin (batchwise or by affinity chromatography)
   - factor Xa cleavage site : I-E-G-R* or I-D-G-R* (followed by not-P and not-R) (* = cleavage site)
     (secondary sites usually G-R*)
3. processing by intein
   - N-terminal fusion to intein + chitin binding domain (CBD)
     => self-cleavage at the N-terminus of the intein, induced by thiol compounds
   - expression from the T7 promoter, fused to lacO => induction with IPTG
   - bind the fusion product to the chitin column
   - wash the column, then equilibrate it with a DTT solution => in situ cleavage at 4°C overnight
   - elute the target protein
   - strip the fusion partner (intein-CBD) from the column with SDS
4. MalE fusions : vectors pMAL-c2 and pMAL-p2 (without (c) or with (p) the signal peptide sequence of the malE gene : cytoplasmic or periplasmic expression)
   - expression unit between the Ptac promoter and the rrnB terminators ; lacIq on the vector to provide a sufficient quantity of repressor
   - fusion between the malE carrier and lacZα, separated by 10xN and the MCS for insertion of the target gene. This in-frame construct produces blue colonies on BCIG (X-gal) media. The polar 10xN (poly-asparagine) peptide separates the two protein moieties. Cloning in the MCS produces white colonies.
   - affinity purification on an amylose column : following elution with maltose and factor Xa cleavage, the fusion partner is removed by a second chromatography on amylose.
   - the MCS contains an XmnI site at its left side, which overlaps the coding sequence of the factor Xa cleavage site and allows an exact fusion to the target gene sequence. Other insertion sites leave some extra amino acids at the N-terminus after factor Xa cleavage.
   - the EcoRI cloning site lies in the same reading frame as the EcoRI site in lacZ : exchanging a cloned gene from λgt11 is easy.

5. Optimisation strategies

- C-terminal fusion to a reporter gene to assay expression yield
- monitoring expression level variation upon (minor) modifications
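
Worked example (sketch) : the protease recognition sites listed under "4. Purification and processing" can be checked in a candidate fusion sequence before cloning. The Python fragment below is only a minimal illustration, under these assumptions : protein sequences are given in one-letter code ; only the primary motifs from the notes are modelled (D-D-D-D-K* for enterokinase ; I-E-G-R* or I-D-G-R*, followed by a residue that is neither P nor R, for factor Xa) ; the secondary G-R* sites are ignored. The function names and the test sequences are hypothetical and are not taken from any vector manual.

```python
import re

# Recognition motifs as listed in the notes; the cut ('*') falls directly
# after the last residue of each motif.
ENTEROKINASE = re.compile(r"DDDDK")
# Factor Xa: I-E-G-R or I-D-G-R, followed by a residue that is neither P nor R.
# The lookahead requires that a following residue exists, so a motif at the
# very end of the sequence is not reported.
FACTOR_XA = re.compile(r"I[ED]GR(?=[^PR])")


def cleavage_positions(protein, pattern):
    """Return the 0-based index of the first residue after each cut."""
    return [match.end() for match in pattern.finditer(protein)]


def cleave(protein, pattern):
    """Split a protein (one-letter code) at every predicted cut."""
    bounds = [0] + cleavage_positions(protein, pattern) + [len(protein)]
    return [protein[start:end] for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical His-tagged fusion with an enterokinase site.
    print(cleave("MHHHHHHGSDDDDKAVTE", ENTEROKINASE))
    # Hypothetical fusions with a factor Xa site; the second one is
    # followed by P and is therefore not predicted to be cut.
    print(cleave("MKIEGRISEF", FACTOR_XA))
    print(cleave("MKIDGRPSEF", FACTOR_XA))
```

Such a scan is also a quick way to spot internal sites in the target protein itself, which would be cut together with the linker ; it says nothing about accessibility of the site in the folded fusion, which still has to be determined empirically.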