Practical tips for cloning, expressing and purifying proteins for structural biology

Aled Edwards
Banting and Best Department of Medical Research, University of Toronto, Canada
aled.edwards@utoronto.ca
Affinium Pharmaceuticals, Toronto, Canada
aedwards@afnm.com

Molecular biological approaches to structural biology
An excellent structural sample usually has the following properties:
• Lack of conformational heterogeneity
• Soluble at high concentrations
• Pure
Molecular biology is probably the fastest way to transform a “poor” sample into an “excellent” one.

Outline
• Historical perspective on engineering proteins for structural biology
• Practical advice for cloning/purification of structural samples
• Ancillary benefits of high-throughput studies

RNA polymerase II
From 15 Å to 3 Å by eliminating heterogeneity

Another source of sample heterogeneity
Eukaryotic proteins comprise multiple domains
• Conformational heterogeneity lowers the probability of crystallization
• Protein domains
   - Are resistant to proteolysis
   - Fold autonomously
   - Can usually be expressed in bacteria
   - Are between 15 and 30 kDa (NMR or X-ray size)
   - Are the fundamental unit of protein function
• Domains are often the only tractable targets for HTP crystallography

EBNA1 DNA-binding domain
(No sequence homologue in database)

RPA Domain Structure
A collection of OB-folds
[Figure: RPA subunits RPA70, RPA32 and RPA14, with domains A and B labeled]

RPA crystallization
• Start with full-length protein purified using baculovirus (Wold)
• Identify a domain (aa 1-442) that is soluble in E. coli (Wold)
• Crystallize the domain (7 Å)
• Use limited proteolysis to define a smaller domain (aa 161-442) (3.5 Å, and same cell as the 7 Å crystal)
• Create many constructs varying the N- and C-termini to identify the final construct (aa 181-422) (2.2 Å, structure solved)
Final tally: 15 different constructs

RPA70 Domains A and B
Two OB-folds bound to DNA
[Figure: domains A and B, with the L12 and L45 loops labeled]

How does one map domains?
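The construct-scanning step in the RPA example (many constructs with varied N- and C-termini around a proteolytically defined fragment) can be pictured as a small enumeration. This is only an illustrative sketch: the function name, the step size and the number of steps are assumptions, not the boundaries that were actually tested; only the starting fragment (aa 161-442) is taken from the slide.

```python
from itertools import product


def enumerate_constructs(core_start, core_end, n_steps=3, step=10):
    """Return (start, end) residue ranges made by trimming a fragment.

    Hypothetical helper: trims 0..n_steps*step residues from the
    N-terminus and, independently, from the C-terminus.
    """
    starts = [core_start + i * step for i in range(n_steps + 1)]
    ends = [core_end - i * step for i in range(n_steps + 1)]
    # Keep only ranges that still describe a non-empty fragment.
    return [(s, e) for s, e in product(starts, ends) if s < e]


# Proteolytically defined RPA70 fragment from the slide: aa 161-442.
# n_steps and step are assumed values chosen for illustration.
constructs = enumerate_constructs(161, 442, n_steps=3, step=10)
for start, end in constructs:
    print(f"aa {start}-{end} ({end - start + 1} residues)")
```

With these assumed settings the grid holds 16 candidate ranges, one of which is aa 181-422, the construct reported above as giving the 2.2 Å crystals. In practice the trimming steps would be guided by proteolysis data and secondary-structure predictions rather than a fixed increment.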
Domain mapping using limited proteolysis
[Figure: TFIIS and protease; Integrative Proteomics]

TFIIS Domain Structure
• Domain I (1-124): binds holoenzyme; similar to elongin, CRSP70
• Domain II (131-240): RNA polymerase binding
• Domain III (264-309): transcript cleavage and read-through (nucleic acid binding?)

DomainHunter™: Industrialized Domain Mapping
• Partial proteolysis in 96-well plates
• Optimized set of proteases
• Low protein requirement
• No SDS-PAGE
• No N-terminal sequencing
• Direct identification of domains by mass spectrometry

DomainHunter™: Protease Titration
[Figure: mass spectra across a protease titration; x-axis m/z, y-axis r.i.; labeled peaks at 20507, 21612, 21952, 23332, 25360, 31650, 33318 and 35057]

DomainHunter Applied to NMR Sample
[Figure: cleavage map along residue number (N-terminus to 140) showing V8 cleavage sites and chymotrypsin sites, with fragments A-D]

Matching sequence   Mass      Expression   Solubility
G[44-133]R          10324.0   +++          ++
G[44-150]D          12352.0   no
I[55-133]R          9131.0    ++           ++
I[55-150]D          11159.0   no

Structural Proteomics
MTH40, MTH1615, MTH152, MTH1184, MTH1175, MTH538, MTH150, MTH1790, MTH129, MTH1699
Nat. Str. Biol. Oct/Nov 2000
MTH1048
5 more done
3 more soon

Molecular biology for crystallization and for large-scale studies
1. Basic steps in creating expression vectors for E. coli
2. Practical tips for making fewer mistakes
3. Application of methods to higher throughput
4. Alternate expression systems
5. Some results

E. coli is the first choice... why?
• Cost effective
• Easy to grow
• Abundance of expertise and reagents
• Easy to incorporate selenomethionine
• High yield
• Rapid doubling time and rapid scale-up

Factors involved in successful expression of recombinant proteins in the Escherichia coli cytoplasm
Expression vector
   - Copy number (gene dosage: sometimes less is better than more)
   - Promoter choice (T7, Ptac, Plac, Para)
   - Little or no expression before induction
   - Reliable and adjustable expression
mRNA stability (RNaseE- mutant)
Translation
   - Consensus SD sequence
   - Proper spacing and sequence before the initiation codon
   - Possible mRNA secondary structures that block ribosome binding, or an internal ribosome binding site
Codon bias

But which E. coli?
BL21(DE3): F- ompT hsdSB (rB-,mB-), gal, dcm, (DE3)
BL21-Star(DE3): F- ompT hsdSB (rB-,mB-), gal, dcm, rne131, (DE3)
BL21-Gold(DE3): F- ompT hsdS (rB- mB-) dcm+ Tetr gal endA (DE3)
Tuner(DE3): F- ompT hsdSB (rB- mB-) gal dcm lacY1 (DE3)

Conventional cloning approach
1. Select vector of choice
2. Restriction digest the vector
3. PCR the insert
4. Restriction digest the insert
5. Ligate the vector and insert
6. Transform and plate
7. Pick colonies and screen for insert
8. Screen positive clones for protein expression
9. Sequence positive clones

Which vector/tag?
1. T7 RNA polymerase-based systems are the overwhelming choice
   - Highly specific
   - High yields
   - Exquisitely controlled
2. Choice of vector
   - Restriction sites (are there internal sites in the gene?)
   - Are there many possible sites?
   - Are the enzymes commonly available?
   - Do the enzymes cut near the ends of DNA fragments?
3. Which tag?
   - Relatively little data on which tag generates the best proteins for crystallization
   - His-tag, GST and MBP are all effective for purification
   - His-tag offers the advantage of being able to screen +/- tag for crystals (double bang for the buck)
   - Make sure there is a protease site to remove the tag

Practical issues with cloning
1. Choice of protease???
   - Thrombin (more difficult to get but highly effective)
   - TEV, recombinant with His-tag, stable mutant with less autoproteolysis activity (Waugh), needs calcium, finicky
   - Factor X, enterokinase... avoid
“I can’t use thrombin, it digests my protein”

Purification of Thrombin from Thrombostat
1. We started with 10,000 units of Thrombostat from Parke-Davis, dissolved in 10 ml of 50 mM NaPO4 pH 6.5 and 5% glycerol.
2. The solution was then spun at 10,000 rpm for 10 min in an SS34 rotor to clarify it.
3. This was then loaded onto a Poros S column (7.5 mm x 100 mm, Perseptive Biosystems) pre-equilibrated in the above buffer at 3 ml/min.
4. The column was then washed in the above buffer until the OD280 reached zero.
5. The column was then washed with 100 mM NaPO4 pH 6.5 and 5% glycerol until the absorbance went to zero.
6. Thrombin was then eluted from the column in 300 mM NaPO4 pH 8.5 and 5% glycerol at a flow rate of 1 ml/min.
7. 0.5 ml fractions were collected and run on a 15% SDS-PAGE gel, and the 35 kDa protein (thrombin) was pooled and frozen in small aliquots. Total protein yield was about 3 mg in 10 ml of buffer.

Schleiff, E., Khanna, R., Orlicky, S. and Vrielink, A. Expression, purification, and in vitro characterization of the human outer mitochondrial membrane receptor human translocase of the outer mitochondrial membrane 20. Arch. Biochem. Biophys.
367:95-103 (1999)

Practical issues with cloning
Restrict the plasmid
   - Double digestion often leaves one end undigested, which in turn results in high background due to re-ligation
   - Phosphatase treatment and gel purification of a large prep make life much easier in the long run
   - Optimize the system to get no background

Practical issues with cloning
PCR the insert
   - For HTP studies, conditions need to be optimized for each genome or clone
   - Order primers from a reputable supplier (the most common problem is in deprotecting oligos)
   - Have someone else double-check the primer sequence
   - Order primers with the requisite overhang (be over-cautious)
   - Use an error-correcting polymerase

Practical issues with cloning
Digest the PCR insert
   - Make sure that there are no internal sites
   - Purify the restricted product

Practical issues with cloning
Ligation and transformation
   - If the vector control background is low and the PCR product is purified, there should be no problem
   - Use highly competent cells

Practical issues with cloning
Screen for positive clones
   - PCR screen from colony
   - Screen by protein expression
   - Make note of expression, as well as solubility

Cloning (conventional method)
[Figure: expression cassettes around the gene, elements in the order shown on the slide]
   - T7, 6His, TEV, STOP
   - T7, TEV, 6His, STOP
   - T7, 6His, MBP, TEV, STOP
   - T7, 6His, TRX, TEV, STOP

Screening for inserts by PCR
[Figure: PCR screen of clones]

TOPO cloning

GATEWAY™ Cloning System Technology - λ Phage
[Figure: λ attP and E. coli attB recombine (IHF, Int) to give attL and attR in the E. coli lysogen; attL + attR return to attB + attP with IHF, Int and Xis]

GATEWAY™ Cloning System Technology - λ Phage
[Figure: engineered sites attB1/attB2, attP1/attP2, attL1/attL2 and attR1/attR2 (IHF, Int; IHF, Int, Xis); pairings shown include attB1 x attP1, attB2 x attP2, attR1 x attL1 and attR2 x attL2]
“Gateway type” cloning

Cloning and Test Expression
[Figure: workflow, x 96 throughout]
   - Ligate, transform, pick clones (x 96), PCR (x 96)
   - Cultures: 300 µl (x 96) or 24 x 3 ml; LB with Kan, Amp; 37 °C; induce at OD600; grow O/N at 15 °C or 20 °C
   - Spin, dissolve pellet in SDS
   - Spin, freeze, lyse with BugBuster™, spin again (supernatant)
   - SDS-PAGE

[Figure: bar chart for 1750 clones; y-axis 0-100; categories: cloned, expressed, soluble]

Expression systems for eukaryotic proteins
• Baculovirus infection of insect cells
   - Simple, relatively cost effective, selenomethionine-compatible, not fully able to replicate human post-translational modifications
• Viral infection of human cells
   - Viruses not as easy to work with, high yield, proper modification
• Stable transformation of human cells
   - Usually lower expression. After selection, transcription sometimes goes away. Low throughput due to the selection process
• Transfection of human cells
   - High expression in few cells, uses up lots of DNA

Generation of recombinant baculoviruses and gene expression with the Bac-To-Bac expression system
[Figure: Bac-to-Bac workflow. pFastBacDual donor (p10, pPolh, Foreign Gene 1, Foreign Gene 2, Tn7L, Tn7R); transformation into competent DH10Bac E. coli cells carrying the bacmid (lacZ, mini-attTn7) and helper; transposition and antibiotic selection; E. coli (LacZ-) containing recombinant bacmid; mini-prep of high molecular weight DNA; transfection of insect cells with CELLFECTIN reagent; infection for recombinant gene expression or viral amplification. Timeline labels: Day 1, Days 2-3, Day 4, Day 8]

Protein Purification
Parallel purification of proteins
[Figure: samples labeled 1-5 and 1’-5’]

ProteoMax: Automated Protein Purification and Concentration System
Affinium Pharmaceuticals

A few observations from our work

Structure determination strategy
• < 20 kDa: 15N-labeled, 15N/13C-labeled; 3-5 weeks of NMR data collection
• > 20 kDa: Se-Methionine labeled; synchrotron data

Orthologues
68 Escherichia coli, 68 Thermotoga maritima
Thermotoga maritima: Topt 80 °C; 1,860,725 bp; 1,877 ORFs
Escherichia coli: Topt 37 °C; 4,639,221 bp; 4,288 ORFs
Expressed & soluble: 62, 48
Concentratable to > 2 mg/ml: 50, 44
[Figure: overlap diagram with regions 15, 35 and 9]
9 proteins could not be purified from either species

Total Crystals (30)
[Figure: T. maritima / E. coli overlap diagram with regions 11, 3 and 13]

Total Good/Promising NMR spectra (14)
[Figure: T. maritima / E. coli overlap diagram with regions 4, 4 and 2]

NMR & Crystallography: complementary!
24 small proteins for which both crystal trials and NMR data were collected
[Figure: good/promising HSQC versus crystals, overlap diagram with regions 10, 3 and 6]
Of 32 proteins that gave poor HSQCs, 7 have crystallized

Data storage and Mining: Defined Vocabulary
Property                       Vocabulary
Expression level               0-5 (no expression to high expression)
Solubility (test expression)   0-5 (insoluble to highly soluble)
Concentratability              0-5 (or mg/ml)
Crystal trials                 clear, precipitate, crystal
Initial HSQC NMR               good, promising, poor

[Figure: expression/solubility testing, scored 0-5]

Empirical Bioinformatics
Solubility tree based on 58 sequence properties (Kluger & Gerstein)
[Figure: decision tree with leaves labeled mostly soluble and mostly insoluble]

Efficiency through mining crystal screens
[Figure: crystallization conditions versus different proteins; outcomes: clear drop, precipitate, crystal]
Affinium Pharmaceuticals

Crystal trial: Diminishing Returns

Collaborators on Structural Proteomics
Lawrence McIntosh (UBC): C. Mackereth, G. Lee
Thomas Szyperski (SUNY Buffalo)*
Mike Kennedy (PNNL)*: J. Cort, T. Ramelot
Mark Gerstein (Yale)*: Yuval Kluger, Ning Lan
Kalle Gehring (McGill): I. Ekiel, G. Kozlov
Dave Wishart (U. Alberta): S. Bhattacharyya
Sherry Mowbray (Sweden)
Liang Tong (Columbia)*
John Hunt (Columbia)*
Andrzej Joachimiak (ANL)*
Weontae Lee (Yonsei U.)
Guy Montelione (Rutgers)*
Emil Pai (U. Toronto): V. Saridakis, N.
Wu
*Northeast Structural Genomics Consortium
*Midwest Structural Genomics Consortium
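
The defined vocabulary from the data storage and mining slide lends itself to a small validated record type. What follows is a minimal sketch, assuming Python; the class name, field names and example target are hypothetical, and only the allowed values (0-5 scores, clear/precipitate/crystal, good/promising/poor) come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled vocabularies taken from the slide.
CRYSTAL_TRIAL = {"clear", "precipitate", "crystal"}
HSQC = {"good", "promising", "poor"}


@dataclass
class TargetRecord:
    """One protein target scored with the defined vocabulary (hypothetical schema)."""

    name: str
    expression: int                          # 0 (no expression) to 5 (high expression)
    solubility: int                          # 0 (insoluble) to 5 (highly soluble)
    concentratability: Optional[int] = None  # 0-5, if scored
    crystal_trial: Optional[str] = None      # clear / precipitate / crystal
    hsqc: Optional[str] = None               # good / promising / poor

    def __post_init__(self):
        # Reject scores outside the 0-5 scale.
        for field_name in ("expression", "solubility", "concentratability"):
            value = getattr(self, field_name)
            if value is not None and not (isinstance(value, int) and 0 <= value <= 5):
                raise ValueError(f"{field_name} must be an integer from 0 to 5, got {value!r}")
        # Reject free-text outcomes that are not in the vocabulary.
        if self.crystal_trial is not None and self.crystal_trial not in CRYSTAL_TRIAL:
            raise ValueError(f"crystal_trial must be one of {sorted(CRYSTAL_TRIAL)}")
        if self.hsqc is not None and self.hsqc not in HSQC:
            raise ValueError(f"hsqc must be one of {sorted(HSQC)}")


# Illustrative record; the values are made up, not data from the study.
record = TargetRecord(
    "example_target", expression=4, solubility=3,
    crystal_trial="precipitate", hsqc="promising",
)
print(record)
```

Validating at construction time keeps free-text variants out of the data set, which is what makes later mining across many targets straightforward.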