12_Sauder - PSI Structural Biology Knowledgebase

PSI Data Management and Reporting:
Expectations, Standards and Utility
J. Michael Sauder
Director, Bioinformatics
NYSGXRC Project Leader
NIGMS Expectations
• http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-05-001.html
• “… a database for deposition of information on experimental outcome
data (both successful and unsuccessful).”
• “These data include … cDNA cloning, expression vector
construction, protein production and purification, protein
biochemical characterizations, crystallization screening,
synchrotron and NMR data collection, etc.”
• “The PSI Research Network centers will be required to provide plans for
the collection, maintenance, and transfer of experimental results
into this central data repository.”
• PepcDB… will contain information on these important results and
provide a platform for cross-center data mining to capitalize on the
PSI investment.
Protocols vs Results
• General protocols are reported by each PSI Center
in PepcDB
• General protocols have been published in the
literature by several Centers
• However, one of the real values of PepcDB lies in the
detailed experimental trial results for each target
– Which clones were made? (PSI-MR)
– Which constructs yield soluble protein? (which don’t?)
– What are the fermentation conditions? Purification?
– What was the protein yield? The final concentration? The
experimental molecular weight?
– What conditions gave crystals? How many crystal forms?
What was the cryoprotectant? Which conditions led to
diffraction data? To the structure?
TargetDB/PepcDB Data Mining
• TargetDB status is informative, but far more useful would be
data about
– Small scale expression/solubility testing
– Large scale purification yield, concentration, oligomeric state
– Conditions that yielded diffracting crystals
• Publications
– Overton et al (2008) Bioinformatics 24:901-907. “ParCrys: a Parzen window
density estimation approach to protein crystallization propensity prediction”
(PDB, TargetDB, PepcDB)
– Martin-Galiano et al (2008) Proteins 70:1243-1256 “Predicting experimental
properties of integral membrane proteins by a naive Bayes approach”
(TargetDB)
– Bannen et al (2007) J Struct Funct Genomics 8:217-226 “Effect of low-complexity
regions on protein structure determination” (TargetDB/PepcDB)
– Smialowski et al (2007) Bioinformatics 23:2536-2542 “Protein solubility:
sequence based prediction and experimental verification” (TargetDB)
– Slabinski et al (2007) Bioinformatics 23:3403-3405 “XtalPred: a web server for
prediction of protein crystallizability” (TargetDB)
– Nair & Rost (2004) Nucl Acids Res 32:W517-W521 “LOCnet and LOCtarget:
subcellular localization for structural genomics targets” (TargetDB)
Process vs Reporting
[Figure: flowchart of internal pipeline states, each with a numeric status code (0 to 950), set against the statuses that are reported. Reported milestones: Selected, Cloned, Expressed, Soluble, Purified, Crystallized, Diffraction data (dataset collected), In PDB (structure deposited). Internal states also include in-progress, waiting, on-hold, failed, voided, abandoned, and optimization steps for molecular biology, fermentation, purification, and crystallization. The pairing of individual codes with labels is not recoverable from the extracted text.]
Need to Consider the Future… Now
• How much data are we capturing in our databases
compared to how much we are reporting?
• What will happen to Center data after PSI-2?
• We should ensure that as much as possible of our
Center data is publicly accessible in PepcDB
Trial Data Reporting by Center
Center: experimental trial details reported to PepcDB
• JCSG: protein sequence, cloning vector, fermentation media,
purification method, crystallization conditions
• MCSG: protein sequence, cloning vector, expression host,
temperature, media
• NESGC: protein sequence
• NYSGXRC: DNA and protein sequence, construct boundaries, cloning
vector, small scale expression/solubility scores, media,
MW, large scale media, volume, induction time/temp,
pellet weight, harvest date, SeMet Y/N, purification yield,
concentration, purity, MW, oligomeric state, start/end
dates, mass spec pass/fail, analysis comments, MW,
crystallization conditions, protein concentration,
temperature, cryo, harvest/collection dates, anomalous
scatterer, diffraction resolution
PepcDB Trial Schema
NYSGXRC <protocolDetails>
<protocolId>SGX_MOLBIO_PCR
### Molecular Biology - PCR ####
PCR start date: 03/20/2007
PCR last updated: 04/16/2007
Notebook #: 1358 Page: 13
<protocolId>SGX_MOLBIO_TOPO_TRANSFORM
### Molecular Biology - cloning ####
SGX clonename: 10001b2BSt5p1
Vector: pSGX4 (BS)
<protocolId>SGX_MOLBIO_EXPR_SOL
### Small scale expression/solubility ###
Expression score: HIGH
Solubility rating: HIGH
Predicted molecular weight (kDa): 44.95
Growth Media (small scale): ZYP-5052
Observed molecular Weight (kDa): 46
Sonication buffer: PLB1</protocolDetails>
<protocolId>SGX_FERM_ECOLI_ZYP
### Fermentation ###
SGX PID: 11732
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N
<protocolId>SGX_PURIF_ECOLI_BACT
### Purification ###
SGX PID: 11732
SGX pool: 1
Selenomet: N
Start date: 06/21/2006
End date: 06/23/2006
Yield (mg): 52.3
Final concentration (mg/ml): 52.3
Observed molecular weight (kDa): 33
Purity (%): 98
Oligomeric state: monomer (1 subunit)
Notebook #: 1136 Page: 115
Slide callouts (questions about details not present in the record): DNA source? Primers? Host cells? Antibiotic resistance? Purification steps? Buffers?
NYSGXRC <protocolDetails>
<protocolId>SGX_MALDI</protocolId>
### Mass Spec - MALDI ###
Mass Spec Status: Passed
<protocolId>SGX_ESI-MS
### Mass Spec - ESI-MS ###
Mass Spec Status: Passed
Observed MW: 32528
<protocolId>SGX_XTAL
### Crystallization ###
SGX XID: 27611
Tray barcode: N0081969
Temperature: 21
Protein concentration (mg/ml): 26
Well location: G 12
Well conditions: [100mM] 1M Hepes pH 7.5 + [25%] 50% PEG 3350 + [200mM] 1M Magnesium Chloride hexahydrate
Cryoprotectant comment: [20%] 80% Glycerol
Harvest date: 09/05/2006
Collection date: 09/05/2006
APS resolution: 2.3
Crystal status: D-DATASET COLLECTED
Slide callouts (questions about details not present in the record): Crystal morphology? Space group?
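Because the <protocolDetails> text above is free-form "field: value" lines grouped under "### section ###" headers, a data miner has to parse it before use. The following is a minimal sketch of such a parser, not PepcDB's own tooling; the function name parse_protocol_details and the short sample (copied from the fermentation and purification values above) are illustrative assumptions.

```python
import re


def parse_protocol_details(text):
    """Split free-text protocol details into {section: {field: value}}.

    Assumes the layout shown on the slides: a '### title ###' header line
    followed by 'field: value' lines. Hypothetical helper, not a PepcDB API.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Header lines start and end with one or more '#' characters.
        header = re.fullmatch(r"#+\s*(.*?)\s*#+", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        # Split on the first colon only, so values may contain colons.
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return sections


sample = """\
### Fermentation ###
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Selenomet: N
### Purification ###
Yield (mg): 52.3
Purity (%): 98
Oligomeric state: monomer (1 subunit)
"""

record = parse_protocol_details(sample)
print(record["Purification"]["Yield (mg)"])
```

All values come back as strings; units stay embedded in the field names, which is one reason a structured name/value/unit representation is easier to mine.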
Proposed Data Reporting
• Molecular biology
– DNA source, primers, vector, PSI-MR clone ID, Host,
antibiotic resistance
– Expression and solubility rating (small scale), media,
predicted and observed molecular weight
• Fermentation
– Media, volume, induction time, temp, selenoMet?
• Purification
– Purification steps, final buffer, yield, concentration,
molecular weight, purity, oligomeric state
– Accurate MW if mass spec done
• Crystallization
– Temperature, protein concentration, well conditions,
cryoprotectant and resolution, if applicable
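One way a Center might audit its own updates against the proposal above is a simple completeness check. This sketch is only illustrative: the field labels in PROPOSED_FIELDS are shortened paraphrases of the bullets above (not PepcDB tag names), and missing_fields is a hypothetical helper.

```python
# Proposed fields per stage, paraphrased from the slide (labels are assumptions).
PROPOSED_FIELDS = {
    "Molecular biology": [
        "DNA source", "primers", "vector", "PSI-MR clone ID", "host",
        "antibiotic resistance", "expression rating", "solubility rating",
        "media", "predicted molecular weight", "observed molecular weight",
    ],
    "Fermentation": [
        "media", "volume", "induction time", "induction temperature", "selenoMet",
    ],
    "Purification": [
        "purification steps", "final buffer", "yield", "concentration",
        "molecular weight", "purity", "oligomeric state",
    ],
    "Crystallization": [
        "temperature", "protein concentration", "well conditions",
        "cryoprotectant", "resolution",
    ],
}


def missing_fields(reported):
    """Return {stage: [proposed fields not reported]} for incomplete stages."""
    gaps = {}
    for stage, wanted in PROPOSED_FIELDS.items():
        have = {name.lower() for name in reported.get(stage, [])}
        absent = [field for field in wanted if field.lower() not in have]
        if absent:
            gaps[stage] = absent
    return gaps


# Hypothetical example: a record reporting fermentation fully and
# purification partially.
reported = {
    "Fermentation": ["media", "volume", "induction time",
                     "induction temperature", "selenoMet"],
    "Purification": ["yield", "concentration", "molecular weight",
                     "purity", "oligomeric state"],
}
gaps = missing_fields(reported)
print(gaps["Purification"])
```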
<MeasurementName> <…Value>
• Alternative mechanism to report experimental data
– <MeasurementName>molecular weight</MeasurementName>
– <MeasurementValue>32475</MeasurementValue>
– <MeasurementUnit>Da</MeasurementUnit>
• Examples
– Molecular weight
– Isoelectric point
– Phosphorylation
– Methylation
– Element analysis / stoichiometry
– etc.
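As a rough sketch of how a name/value/unit triple like the one above could be generated programmatically, the snippet below uses Python's standard xml.etree.ElementTree. The wrapper element name trialMeasurement and the nesting are assumptions for illustration; the actual PepcDB schema should be consulted for the required structure.

```python
import xml.etree.ElementTree as ET


def measurement_element(name, value, unit=None):
    """Build one measurement as an XML element.

    The 'trialMeasurement' wrapper and child layout are assumed here,
    not taken from the PepcDB schema definition.
    """
    parent = ET.Element("trialMeasurement")
    ET.SubElement(parent, "MeasurementName").text = name
    ET.SubElement(parent, "MeasurementValue").text = str(value)
    if unit is not None:
        ET.SubElement(parent, "MeasurementUnit").text = unit
    return parent


elem = measurement_element("molecular weight", 32475, "Da")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

Generating the tags through a library rather than by string concatenation keeps opening and closing tags matched and escapes special characters in values.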
Optional tags
• http://mmcif.pdb.org/sg-data/protprod.html
• PDB-proposed mmCIF-like tags to describe cloning,
expression, purification, crystallization, etc.
• Examples
– _entity_src_gen_pure.protein_concentration
– _entity_src_gen_pure.protein_yield
– _entity_src_gen_pure.protein_oligomeric_state
– _pdbx_buffer_components.name
– _pdbx_buffer_components.conc
– _exptl_crystal_grow.temp
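To illustrate how trial values might be laid out as mmCIF-style tag/value pairs, here is a small sketch that uses three of the tag names listed above with the purification values from the NYSGXRC example. The helper to_tag_lines is hypothetical, and the units each tag expects are not stated on the slide, so the numbers are placeholders rather than validated entries.

```python
def to_tag_lines(values):
    """Format {tag: value} as aligned mmCIF-style 'tag  value' lines.

    Values containing spaces are wrapped in single quotes, following the
    usual STAR/mmCIF convention. Hypothetical helper for illustration.
    """
    width = max(len(tag) for tag in values)
    lines = []
    for tag, value in values.items():
        text = str(value)
        if " " in text:
            text = f"'{text}'"
        lines.append(f"{tag.ljust(width)}  {text}")
    return lines


# Values taken from the purification excerpt; expected units are an assumption.
example = {
    "_entity_src_gen_pure.protein_concentration": 52.3,
    "_entity_src_gen_pure.protein_yield": 52.3,
    "_entity_src_gen_pure.protein_oligomeric_state": "monomer",
}

lines = to_tag_lines(example)
for line in lines:
    print(line)
```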
Recommendation
• NYSGXRC plans to further improve our reporting of
trial results in 2008
• We encourage all PSI Centers to use the
PepcDB <protocolDetails> or <trialMeasurement>
tags to report as many experimental trial results as
possible in their PepcDB XML updates
• See associated poster
Acknowledgements
• SGX LIMS development team
– Ryan Allis
– Chris Hansen
– Peter Hillier
– Ken Schwinn
• AECOM - Veena Venkatagiriyappa (Fiser lab)
• Andrei Kouranov (PDB)
• LIMS improvements suggested by SGX protein production,
crystallization, and beamline staff
• This work was supported by SGX Pharmaceuticals, Inc.,
and NIH Grant U54 GM074945