Native-Source Structural Proteomics

Nathaniel Echols*, Monica Totir*, Andrew May#, Chloe Zubieta*, Alisa Moskaleva*, Tom Alber*
* UC Berkeley
# Fluidigm Corporation

Protein Structure Initiative Bottlenecks Workshop
April 15th, 2008

Native-source structural proteomics
• Native sources provide access to samples that may be difficult to obtain by recombinant methods
• Project goal: obtain structures of complexes and low-abundance proteins
  1. Scale up purification (>100 g protein)
  2. Scale down crystallization (picoliter reactions)
• No cloning, no overexpression

Experimental approach
• Use E. coli as a model system to develop the purification protocol necessary to go from grams of starting material to 100 μg fractions
• Screen the final samples at a concentration of >10 mg/mL in Fluidigm Topaz chips and identify the crystallizable fractions
• Identify samples by mass spectrometry
• Set the selected samples in diffraction-capable chips or nanodrop crystallization trays for X-ray data collection

Proof-of-concept: the E. coli proteome
• Small, well-studied proteome, but still some novelty:
  • 4243 predicted proteins (manageable number of molecular species)
  • 860 membrane proteins
  • 1000 proteins with >90% sequence identity to known structures
  • 1250 with >50% sequence identity
  • 2000 with >30% sequence identity
  • Nearly 1400 uncharacterized non-membrane proteins
• Existing structures allow us to validate the approach
• Easy to grow in massive quantities
• Lysis and clarification are relatively simple

Proteome component sizes
Cellular protein content is dominated by large assemblies

Purification scheme
A new philosophy (keep everything) required new strategies
• Lyse at pH 7-8
• Cross-flow size fractionation: 500 kDa TFF
• Proteins/complexes bigger than 500 kDa: sucrose gradients, size exclusion chromatography, MonoQ
• Proteins/complexes smaller than 500 kDa: Capto Q steps, Phenyl, MonoQ/MonoS, SP Sepharose, Superdex 200, Phenyl, MonoQ/MonoS
Scalable, gentle purification scheme

Purification scheme (continued)
• Proteins/complexes smaller than 500 kDa: Blue, Heparin, Capto MMC, Superdex 200, Phenyl, MonoQ/MonoS
• Column size and approx. protein quantity, from largest to smallest step:
  • 1-2 L: 50 g
  • 300 mL: 10 g
  • 20-50 mL: 1 g
  • 1-8 mL: 10-100 mg
[Figure: typical anion-exchange chromatogram of the final samples]

The first large-scale prep
• 200 g of E. coli cells grown in M9 minimal medium and lysed
• Purification scheme: Capto Q, Phenyl, MonoQ/MonoS
• 272 fractions analyzed in a 96-well Caliper electrophoresis robot and selected for crystallization
[Figure: Caliper “gel”]

Crystallization pipeline
• Purity checked by Caliper gel
• Microfluidic crystallization with the Fluidigm TOPAZ system (8.96 chips)
• Promising chip crystals → MS identification → diffraction-capable chips → X-ray data collection
• Sub-optimal chip crystals → MS identification → 96-well sitting drop for further optimization → X-ray data collection

Microfluidic crystallization
• 272 samples set in Fluidigm TOPAZ 8.96 chips with Index screen
• Automated inspection and scoring required to find crystals efficiently
• 190/272 (70%) produced crystals or microcrystals in chips (high redundancy in crystal forms)
• 50 unique crystal forms by visual inspection
• High-quality crystals possible even in very impure samples
[Figure: sample purity (%) vs. resolution (Å)]
http://www.fluidigm.com/topaz.htm

Crystal optimization
• 66 samples picked for optimization in nanodrop vapor diffusion trays (using Mosquito robot)
• Protocol: sample 40%-100% precipitant concentration with different protein:well ratios (1:3, 1:1, 3:1)
• 50 of 66 hits (76%) were reproducible by this method

Diffraction-capable microfluidic chips
“Hands-free” data collection
[Figure labels: reagents, samples, 10 nL sample chambers; ALS Beamline 8.3.1]

Structure determination
• MS identification of unique crystals should be the first step
• 25 unique native datasets collected at ALS 8.3.1/12.3.1
  • 15 already published structures identified
  • 3 structures novel in E. coli, phased by MR
• Robotics and automation software used for data collection and processing whenever possible

Rapid structure identification by MR
• Concept: identify the protein from “anonymous” diffraction data (no mass spec info)
• Search set: every PDB structure homologous to an E. coli protein (~10,000 models)
• Molecular replacement rotation function run using each model
• Identical structures are usually high-scoring
• Homologous proteins may still score better than average
• Potential solutions can be verified by full MR

Experimental phasing
• The largest bottleneck: much more manual labor required
• Cryoprotectants contain heavy monovalent ions (Br−, Rb+)
• Metal quick-soaks (0.5-5 mM):
  • Ethyl mercury phosphate/thimerosal
  • HgCl2 or PCMBS (p-chloromercuribenzenesulfonic acid)
  • SmCl3
  • PtCl4, PtCl6

Current structures, new and old
(Structures labelled in red were identified by brute-force search.)
New (% identity to PDB):
• Methylglyoxal reductase (37%)
• pGlucose isomerase (65%)
• β-glucosidase (?) (bglA) (33%)
Old:
• ycaC
• Argininosuccinate lyase
• pSer aminotransferase
• Dihydrodipicolinate synthase
• Molybdopterin biosynthesis prot. B
• PPIase
• Catalase HPII (also in truncated form)
• Citrate synthase
• Lysyl-tRNA synthetase
• Cystathionine -synthase
• Transhydrogenase domain I
• Pyruvate kinase
• Hsp31 chaperone
• 5-keto-4-deoxyuronate isomerase

Purity of crystallized samples

Summary
• Macro-to-micro strategy tested with E. coli
• Large-scale fractionation pipeline:
  • New approaches and equipment (TFF, larger columns, Caliper CE robot) needed to scale up and keep everything
  • Currently 464 fractions isolated for crystallization
• Small-scale crystallization:
  • >50% of fractions crystallized in Topaz microfluidic format
  • Many impure fractions yielded starting crystals
  • Optimization in sitting drops and new diffraction chips was efficient
• Structure determination:
  • 25 data sets collected, 18 structures phased, all oligomeric
  • 3 structures novel to E. coli
  • Brute-force molecular replacement was used in most cases

Future directions
• Continue improvements to purification methods
• Pathogenic organisms (e.g. Mycobacteria)
• Plant/mammalian proteomes: diploid, much larger and more complex
• Smaller sets of related proteins:
  • Protease-resistant domains
  • Serum proteins
  • ATP-binding proteins
  • Metalloproteins
  • Large complexes

Acknowledgements
• Tom Alber, Monica Totir, Chloe Zubieta, Alisa Moskaleva
• Andy May (Fluidigm)
• Scott Gradia, James Berger (UCB)
• James Holton (ALS)
• George Meigs, Jane Tanamatchi (ALS)
• ALS beamlines 8.3.1, 12.3.1
• Tony Iavarone (QB3 MS facility)
• Scripps Center for Mass Spectrometry
• W.M. Keck Foundation
• Millipore Corporation
• Funded in part by UC Discovery/Fluidigm Corporation and NIGMS grant GM71326-02

Second large-scale prep: a better purification scheme
• 1000 g of E. coli cells grown in M9 minimal medium and lysed
• Lysate at pH 7
• Cross-flow size fractionation: 500 kDa TFF
• Proteins/complexes bigger than 500 kDa: sucrose gradients, size exclusion chromatography, MonoQ
• Proteins/complexes smaller than 500 kDa: SP Sepharose, Blue, Heparin, Capto MMC, Superdex 200, Phenyl, MonoQ/MonoS
• 192 unique final samples to be screened in 8.96 chips and subsequently set up in diffraction-capable chips

Apparently rare proteins accessible
[Figure: number of genes vs. abundance (number of transcripts)]
Note: I will have to look this up. Or do we have something like this?
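
Sketch: ranking models in the brute-force MR search

The idea on the "Rapid structure identification by MR" slide (run a rotation function against every model, then look for scores that stand out from the average) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the slides do not say which MR program or scoring statistic was used, so the model IDs, the score values, the z-score criterion and the cutoff of 3.0 are all hypothetical.

```python
from statistics import mean, pstdev


def rank_models(rotation_scores, z_cutoff=3.0):
    """Rank search models by rotation-function score.

    rotation_scores: dict mapping a model ID to its best rotation-function
    score (hypothetical input; higher is assumed to be better).
    Returns a list of (model_id, score, z_score, flagged) tuples sorted by
    descending score. `flagged` is True when the z-score meets the cutoff.
    """
    if len(rotation_scores) < 2:
        raise ValueError("need at least two scored models")
    values = list(rotation_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of all scores
    ranked = []
    for model_id, score in sorted(
        rotation_scores.items(), key=lambda kv: kv[1], reverse=True
    ):
        # Guard against identical scores, where sigma is zero.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        ranked.append((model_id, score, z, z >= z_cutoff))
    return ranked


def candidates_for_full_mr(rotation_scores, z_cutoff=3.0):
    """Return IDs of models that stand out enough to verify by full MR."""
    return [
        model_id
        for model_id, _, _, flagged in rank_models(rotation_scores, z_cutoff)
        if flagged
    ]


# Made-up example: twenty background models and one clear outlier.
example_scores = {f"model_{i}": 4.0 + 0.1 * (i % 5) for i in range(20)}
example_scores["hit"] = 12.0
example_candidates = candidates_for_full_mr(example_scores)
```

In this made-up example only the outlier is flagged. With real data a homolog may score only slightly above average, so the cutoff would need tuning, and every flagged model would still need confirmation by a full molecular replacement run, as the slide notes.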