
Native-Source Structural Proteomics
Nathaniel Echols*, Monica Totir*, Andrew May#, Chloe Zubieta*,
Alisa Moskaleva*, Tom Alber*
* UC Berkeley
# Fluidigm Corporation
Protein Structure Initiative Bottlenecks Workshop
April 15th, 2008
Native-source structural proteomics
• Native sources provide access to samples that may be difficult to obtain by recombinant methods
• Project goal: obtain structures of complexes and low-abundance proteins
1. Scale up purification (>100 g protein)
2. Scale down crystallization (picoliter reactions)
• No cloning, no overexpression.
Experimental approach
• Use E. coli as a model system to develop the purification protocol needed to go from grams of starting material to 100 μg fractions
• Screen the final samples at a concentration of >10 mg/ml in Fluidigm Topaz chips and identify the crystallizable fractions
• Identify samples by mass spectrometry
• Set the selected samples in diffraction-capable chips or nanodrop crystallization trays for X-ray data collection
Proof-of-concept: the E. coli proteome
• Small, well-studied proteome, but still some novelty:
  • 4243 predicted proteins (manageable number of molecular species)
  • 860 membrane proteins
  • 1000 proteins with >90% sequence identity to known structures
  • 1250 with >50% sequence identity
  • 2000 with >30% sequence identity
  • Nearly 1400 uncharacterized non-membrane proteins
• Existing structures allow us to validate the approach
• Easy to grow in massive quantities
• Lysis and clarification are relatively simple
Proteome component sizes
Cellular protein content is dominated by large assemblies
Purification scheme
A new philosophy ("keep everything") required new strategies.
• Lyse at pH 7-8
• Cross-flow size fractionation (500 kDa TFF)
  • Proteins/complexes bigger than 500 kDa: sucrose gradients → size exclusion chromatography → MonoQ
  • Proteins/complexes smaller than 500 kDa: Capto Q (steps) → Phenyl → MonoQ/MonoS; SP Sepharose → Superdex 200 → Phenyl → MonoQ/MonoS
Scalable, gentle purification scheme
Purification scheme (continued)
Proteins/complexes smaller than 500 kDa: Blue → Heparin → Capto MMC → Superdex 200 → Phenyl → MonoQ/MonoS

Column size    Approx. protein quantity
1-2 L          50 g
300 mL         10 g
20-50 mL       1 g
1-8 mL         10-100 mg

[Figure: typical anion-exchange chromatogram of the final samples]
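The column sizes and approximate protein quantities listed above imply a roughly constant load per bed volume, so column choice follows from the amount of protein at each stage. A minimal sketch of picking the smallest listed column that can take a given load, assuming the size/quantity pairs are read in the order they appear on the slide and that the upper end of each quoted quantity is the usable capacity (both assumptions; real binding capacity depends on the resin and the sample):

```python
# Approximate column size -> protein quantity pairs from the slide, in grams.
# Assumption: the upper end of each quoted range is treated as usable capacity.
COLUMNS = [
    ("1-8 mL", 0.1),     # 10-100 mg
    ("20-50 mL", 1.0),   # ~1 g
    ("300 mL", 10.0),    # ~10 g
    ("1-2 L", 50.0),     # ~50 g
]


def pick_column(load_g):
    """Return the smallest listed column whose approximate capacity covers load_g.

    Returns None when the load exceeds the largest column, which in practice
    would mean splitting the load across several runs.
    """
    for size, capacity_g in COLUMNS:  # COLUMNS is ordered smallest first
        if load_g <= capacity_g:
            return size
    return None


if __name__ == "__main__":
    for load in (0.05, 5.0, 40.0, 120.0):
        print(f"{load:g} g -> {pick_column(load)}")
```

Ordering the list from smallest to largest keeps the lookup a single linear scan; with only four column sizes there is no need for anything more elaborate.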
The first large-scale prep
• 200 g of E. coli cells grown in M9 minimal medium and lysed
• Purification scheme: Capto Q → Phenyl → MonoQ/MonoS
• 272 fractions analyzed with a 96-well Caliper electrophoresis robot and selected for crystallization
[Figure: Caliper "gel"]
Crystallization pipeline
1. Purity checked by Caliper gel
2. Microfluidic crystallization with the Fluidigm TOPAZ system (8.96 chips)
3. MS identification of chip crystals:
  • Promising chip crystals → diffraction-capable chips
  • Sub-optimal chip crystals → 96-well sitting drops for further optimization
4. X-ray data collection
Microfluidic crystallization
• 272 samples set in Fluidigm TOPAZ 8.96 chips with the Index screen
• Automated inspection and scoring required to find crystals efficiently
• 190/272 (70%) produced crystals or microcrystals in chips (high redundancy in crystal forms)
• 50 unique crystal forms by visual inspection
• High-quality crystals possible even in very impure samples
[Plot: sample purity (%) vs. resolution (Å)]
http://www.fluidigm.com/topaz.htm
Crystal optimization
• 66 samples picked for optimization in nanodrop vapor-diffusion trays (using a Mosquito robot)
• Protocol: sample 40%-100% precipitant concentration at different protein:well ratios (1:3, 1:1, 3:1)
• 50 of 66 hits (76%) were reproducible by this method
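The optimization protocol is a small factorial grid: every precipitant level crossed with every protein:well ratio. A minimal sketch that enumerates the drops for one hit, assuming 10-point steps across the 40%-100% range (the slide gives only the range) and a hypothetical 200 nL total drop volume; `optimization_grid` and its parameters are illustrative names, not part of any robot software:

```python
from itertools import product

# Precipitant levels as a percentage of the original hit's concentration.
# Assumption: 10-point steps across the 40%-100% range quoted on the slide.
PRECIPITANT_PCTS = range(40, 101, 10)
# protein:well volume ratios in the drop, from the slide
RATIOS = [(1, 3), (1, 1), (3, 1)]


def optimization_grid(hit_precipitant_pct, drop_nl=200.0):
    """List every drop for one hit: each precipitant level at each protein:well ratio.

    drop_nl is a hypothetical total drop volume; the slide does not give one.
    """
    drops = []
    for pct, (protein_parts, well_parts) in product(PRECIPITANT_PCTS, RATIOS):
        protein_nl = drop_nl * protein_parts / (protein_parts + well_parts)
        drops.append({
            "precipitant_pct": hit_precipitant_pct * pct / 100,
            "protein_nl": protein_nl,
            "well_nl": drop_nl - protein_nl,  # reservoir solution added to the drop
        })
    return drops


if __name__ == "__main__":
    grid = optimization_grid(hit_precipitant_pct=25.0)
    print(len(grid), "drops per hit")
    for drop in grid[:3]:
        print(drop)
```

With these assumptions each hit needs 7 × 3 = 21 drops, which is why a nanodrop robot is practical for dozens of hits at once.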
Diffraction-capable microfluidic chips
• "Hands-free" data collection
[Figure: chip diagram (reagents, samples, 10 nL sample chambers); ALS Beamline 8.3.1]
Structure determination
• MS identification of unique crystals should be the first step
• 25 unique native datasets collected at ALS 8.3.1/12.3.1
  • 15 already published structures identified
  • 3 structures novel to E. coli, phased by MR
• Robotics and automation software used for data collection and processing whenever possible
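The slides do not say how duplicate crystal forms were recognized before spending MS and beamline time on them. One common check is to compare space group and unit-cell parameters; the sketch below groups datasets that way. The `Dataset` fields, the tolerances, and the example cells are hypothetical, and alternative cell settings (for example re-indexed axes) are deliberately not handled:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    space_group: str
    cell: tuple  # (a, b, c) in Å, then (alpha, beta, gamma) in degrees


def same_form(x, y, edge_tol=0.02, angle_tol=1.0):
    """True if two datasets share a space group and near-identical unit cells.

    edge_tol is a relative tolerance on a, b, c; angle_tol is absolute, in degrees.
    Alternative cell settings and indexing ambiguities are not handled.
    """
    if x.space_group != y.space_group:
        return False
    edges_ok = all(abs(p - q) <= edge_tol * max(p, q)
                   for p, q in zip(x.cell[:3], y.cell[:3]))
    angles_ok = all(abs(p - q) <= angle_tol
                    for p, q in zip(x.cell[3:], y.cell[3:]))
    return edges_ok and angles_ok


def group_crystal_forms(datasets, **tolerances):
    """Greedy grouping: each dataset joins the first group whose founding member matches."""
    groups = []
    for ds in datasets:
        for group in groups:
            if same_form(group[0], ds, **tolerances):
                group.append(ds)
                break
        else:  # no existing group matched, so this is a new crystal form
            groups.append([ds])
    return groups


if __name__ == "__main__":
    hypothetical = [
        Dataset("hit_A", "P212121", (52.1, 67.3, 88.0, 90, 90, 90)),
        Dataset("hit_B", "P212121", (52.4, 67.0, 88.5, 90, 90, 90)),
        Dataset("hit_C", "P21", (45.2, 60.1, 70.3, 90, 101.5, 90)),
    ]
    for group in group_crystal_forms(hypothetical):
        print([ds.name for ds in group])
```

Comparing each dataset only against a group's first member keeps the grouping simple and order-dependent; a production version would also test the symmetry-equivalent cell settings.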
Rapid structure identification by MR
• Concept: identify the protein from "anonymous" diffraction data (no mass spec information)
• Search set: every PDB structure homologous to an E. coli protein (~10,000 models)
• Molecular replacement rotation function run using each model
• Identical structures usually score highly
• Homologous proteins may still score better than average
• Potential solutions can be verified by full MR
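The slides do not say exactly how "high-scoring" models were picked from the ~10,000-model search. A minimal sketch, assuming each model's best rotation-function score for one dataset has already been computed by an MR program (that step is not shown) and that models are ranked by a Z-score against the whole search set, which matches the "better than average" framing; `shortlist_models`, `z_cutoff`, and `max_hits` are hypothetical names and defaults:

```python
from statistics import mean, stdev


def shortlist_models(scores, z_cutoff=3.0, max_hits=10):
    """Rank MR search models by how far their rotation-function score exceeds the mean.

    scores maps a model identifier to its best rotation-function score for one
    dataset. Returns (model, z) pairs with z >= z_cutoff, best first; these are
    the candidates to verify by full molecular replacement.
    """
    values = list(scores.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # every model scored the same, so nothing stands out
    ranked = sorted(((name, (s - mu) / sigma) for name, s in scores.items()),
                    key=lambda item: item[1], reverse=True)
    return [(name, z) for name, z in ranked if z >= z_cutoff][:max_hits]


if __name__ == "__main__":
    # Hypothetical scores: 200 unrelated models near 1.0 and one clear outlier.
    example = {f"model_{i}": 1.0 + 0.01 * (i % 7) for i in range(200)}
    example["model_match"] = 2.5
    print(shortlist_models(example))
```

Standardizing against the whole search set means a homologue that is only moderately above the background can still surface, while the cheap rotation-only search keeps the expensive full MR runs to a handful of candidates.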
Experimental phasing
• The largest bottleneck: much more manual labor required
• Cryoprotectants contain heavy monovalent ions (Br⁻, Rb⁺)
• Metal quick-soaks (0.5-5 mM):
  • Ethyl mercury phosphate/thimerosal
  • HgCl2 or PCMBS (p-chloromercuribenzenesulfonic acid)
  • SmCl3
  • PtCl4, PtCl6
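For the 0.5-5 mM quick-soaks, the volume of heavy-atom stock to add follows from a simple mass balance. A minimal sketch, assuming the stock is pipetted directly into an existing drop of known volume (the slide does not say how the soaks were set up) and using a hypothetical 100 mM stock and 2 µL drop; `stock_volume_to_add` is an illustrative name:

```python
def stock_volume_to_add(drop_ul, stock_mm, target_mm):
    """Volume of heavy-atom stock (µL) that brings a drop to target_mm.

    Mass balance: target = stock * v / (drop + v), solved for v.
    """
    if not 0 < target_mm < stock_mm:
        raise ValueError("target must be positive and below the stock concentration")
    return target_mm * drop_ul / (stock_mm - target_mm)


if __name__ == "__main__":
    # Hypothetical 2 µL drop and 100 mM stock, across the 0.5-5 mM range.
    for target in (0.5, 1.0, 5.0):
        v = stock_volume_to_add(2.0, 100.0, target)
        print(f"{target} mM: add {v:.3f} µL")
```

Solving for the added volume, rather than using C1V1 = C2V2 with the original drop volume, accounts for the small dilution the stock itself causes; at these concentrations the difference is only a few percent.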
Current structures, new and old
(Structures labelled in red were identified by brute-force search.)
New (% identity to PDB):
Methylglyoxal reductase (37%)
pGlucose isomerase (65%)
β-glucosidase (?) (bglA) (33%)
Old:
ycaC
Argininosuccinate lyase
pSer aminotransferase
Dihydrodipicolinate synthase
Molybdopterin biosynthesis protein B
PPIase
Catalase HPII (also in truncated form)
Citrate synthase
Lysyl-tRNA synthetase
Cystathionine -synthase
Transhydrogenase domain I
Pyruvate kinase
Hsp31 chaperone
5-keto-4-deoxyuronate isomerase
Purity of crystallized samples
Summary
• Macro-to-micro strategy tested with E. coli
• Large-scale fractionation pipeline:
  • New approaches and equipment (TFF, larger columns, Caliper CE robot) needed to scale up and keep everything
  • Currently 464 fractions isolated for crystallization
• Small-scale crystallization:
  • >50% of fractions crystallized in Topaz microfluidic format
  • Many impure fractions yielded starting crystals
  • Optimization in sitting drops and new diffraction chips was efficient
• Structure determination:
  • 25 data sets collected, 18 structures phased, all oligomeric
  • 3 structures novel to E. coli
  • Brute-force molecular replacement was used in most cases
Future directions
• Continue improvements to purification methods
• Pathogenic organisms (e.g. Mycobacteria)
• Plant/mammalian proteomes: diploid, much larger and more complex
• Smaller sets of related proteins:
  • Protease-resistant domains
  • Serum proteins
  • ATP-binding proteins
  • Metalloproteins
  • Large complexes
Acknowledgements
• Tom Alber, Monica Totir, Chloe Zubieta, Alisa Moskaleva
• Andy May (Fluidigm)
• Scott Gradia, James Berger (UCB)
• James Holton (ALS)
• George Meigs, Jane Tanamatchi (ALS)
• ALS beamlines 8.3.1, 12.3.1
• Tony Iavarone (QB3 MS facility)
• Scripps Center for Mass Spectrometry
• W.M. Keck Foundation
• Millipore Corporation
• Funded in part by UC Discovery/Fluidigm Corporation and NIGMS grant GM71326-02
Second large-scale prep: a better purification scheme
• 1000 g of E. coli cells grown in M9 minimal medium and lysed
• Lysis at pH 7
• Cross-flow size fractionation (500 kDa TFF)
  • Proteins/complexes bigger than 500 kDa: sucrose gradients → size exclusion chromatography → MonoQ
  • Proteins/complexes smaller than 500 kDa: SP Sepharose → Blue → Heparin → Capto MMC → Superdex 200 → Phenyl → MonoQ/MonoS
• 192 unique final samples to be screened in 8.96 chips and subsequently set up in diffraction-capable chips
Apparently rare proteins accessible
[Plot: number of genes vs. abundance (# transcripts)]