Automating Steps in Protein Structure Determination by NMR

CS 296.4
April 13, 2009
Outline
Background
Steps in NMR protein structure determination
The ACE cycle (Assign-Calculate-Evaluate)
The assignment problem
Algorithms for automated NOE assignment
Semi-automated methods
More-automated methods
Conclusions
The Steps in
Protein Structure Determination by NMR
1. Sample preparation
2. Data collection
3. Data evaluation
4. Structure calculation
5. Structure refinement
6. Structure deposition
The Steps in
Protein Structure Determination by NMR
1. Sample preparation
(a) protein selection
(b) gene engineering
(c) protein expression
(d) protein purification
(e) buffer optimization
(f) isotope labeling
2. Data collection
(a) HSQC
(b) amide H/D exchange
(c) triple-resonance
3. Data evaluation
(a) spectrum calculation
(b) peak picking
4. Structure calculation
5. Structure refinement
6. Structure deposition
(and maybe write a
paper and graduate)
Automatable Steps in
Protein Structure Determination by NMR
1. Sample preparation
2. Data collection
3. Data evaluation
4. Structure calculation
5. Structure refinement
6. Structure deposition
The Assign-Calculate-Evaluate cycle
in automated NOE assignment and structure calculation.
Fig. 2 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.
Automating NOE Assignments
and
THE Assignment Problem
There are MANY assignment tasks
1. Resonance Assignment (interpreting data)
2. NOE Assignment (interpreting data)
and one major assignment problem:
ambiguous assignments
Due to the data collection problems of
1. Completeness (missing data points)
2. Uniqueness (unresolvable data points)
Unambiguously assigning a NOESY cross peak
from Fig. 3 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.
Automated NMR Protein structure calculation
Peter Guntert (2003) Progress in NMR Spectroscopy, 43, 105-125
Algorithms for automated NOESY assignment
Semi-automated methods
1. ASsign NOEs (ASNO, 1993)
2. Structure Assisted NOE Evaluation (SANE, 2001)
More-automated methods
1. NOAH (1995)
2. Ambiguous Restraints for Iterative Assignment (ARIA, 1997)
3. AutoStructure (1999)
4. KNOWledge-based NOE assignments (KNOWNOE, 2002)
5. CANDID (2002)
ASNO (1993) Guntert, Berndt, & Wuthrich
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. Set of estimated structures
User specifies
1. Tolerance = max allowed chemical shift error
2. dmax = max interproton distance causing NOE
3. nmin = min # structures with d < dmax
Algorithm steps
1. each cross peak: find all possible assignments (¹Hj, ¹Hk)
2. each (¹Hj, ¹Hk): n = # of structures with d < dmax
3. Prune all (¹Hj, ¹Hk) with n < nmin
User intervention
1. Manually check and refine NOE assignments (¹Hj, ¹Hk)
2. Refine set of structures and rerun algorithm
Fig. 1 (1993) J Biomol NMR, 3, 601, Guntert, Berndt, & Wuthrich.
demo: Dendrotoxin K, 7 kDa, 57 AA, bbRMSD = 0.32 Å
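The three ASNO algorithm steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the function name `asno_filter`, the dictionary-based data layout, the proton labels, and all numbers in the toy data are invented here for the example.

```python
from itertools import product
from math import dist


def asno_filter(peaks, shifts, structures, tol, d_max, n_min):
    """Return {peak index: [(proton_j, proton_k), ...]} after the distance prune.

    peaks      -- list of (shift_1, shift_2) cross peak positions in ppm
    shifts     -- {proton name: assigned chemical shift in ppm}
    structures -- list of {proton name: (x, y, z)} coordinate sets in Angstrom
    """
    result = {}
    for idx, (w1, w2) in enumerate(peaks):
        # Step 1: every proton pair whose shifts match the peak within tol.
        first = [p for p, s in shifts.items() if abs(s - w1) <= tol]
        second = [p for p, s in shifts.items() if abs(s - w2) <= tol]
        candidates = [(a, b) for a, b in product(first, second) if a != b]
        kept = []
        for a, b in candidates:
            # Step 2: count the structures in which the pair is closer than d_max.
            n = sum(1 for s in structures if dist(s[a], s[b]) < d_max)
            # Step 3: prune pairs supported by fewer than n_min structures.
            if n >= n_min:
                kept.append((a, b))
        result[idx] = kept
    return result


# Toy data, invented for illustration only.
shifts = {"HA_5": 4.20, "HN_6": 8.10, "HN_40": 8.11}
peaks = [(4.20, 8.10)]
structures = [
    {"HA_5": (0.0, 0.0, 0.0), "HN_6": (2.5, 0.0, 0.0), "HN_40": (20.0, 0.0, 0.0)},
    {"HA_5": (0.0, 0.0, 0.0), "HN_6": (2.8, 0.0, 0.0), "HN_40": (18.0, 0.0, 0.0)},
]
assignments = asno_filter(peaks, shifts, structures, tol=0.02, d_max=5.0, n_min=2)
print(assignments)
```

In this toy case the chemical shifts alone leave two candidates for the peak (HN_6 and HN_40 differ by only 0.01 ppm), and the distance count over the structure set removes the one that is far away in both structures, so the printed result is `{0: [('HA_5', 'HN_6')]}`.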
SANE (2001) Duggan, Legge, Dyson, & Wright
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
User specifies Filters
1. Distance (Set of estimated structures)
2. Chemical Shift (tolerance = max allowed error)
3. Secondary structure (unlikely NOE assignments)
4. Assignment (expected NOE assignments)
5. NOE contribution (same as in ARIA method)
Algorithm steps
1. each cross peak: find all possible assignments (¹Hj, ¹Hk)
2. Apply five filters to prune list of (¹Hj, ¹Hk)
3. Write unique or ambiguous distance restraints, or violations
User intervention
1. Violation analysis
Fig. 1 (2001) J Biomol NMR, 19, 321, Duggan, et al.
demo: LFA-1 I-domain, 21.3 kDa, 183 AA, bbRMSD = 0.29 Å
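A short Python sketch of SANE steps 2 and 3: filters are applied to the candidate list, and the number of survivors decides what is written. The helper names, the candidate fields (`shift_error`, `distance`), and the numbers are hypothetical. Only two of the five filters are shown, and the sketch assumes that a peak with no surviving candidate is what gets reported as a violation.

```python
def classify_peak(candidates, filters):
    """Apply every filter to every candidate and label the peak by what survives."""
    survivors = [c for c in candidates if all(f(c) for f in filters)]
    if len(survivors) == 1:
        label = "unique"       # written as a unique distance restraint
    elif survivors:
        label = "ambiguous"    # written as an ambiguous distance restraint
    else:
        label = "violation"    # left for the user's violation analysis
    return label, survivors


def shift_filter(tol):
    # Keep candidates whose chemical shift mismatch is within the tolerance (ppm).
    return lambda c: c["shift_error"] <= tol


def distance_filter(d_max):
    # Keep candidates whose distance in the estimated structures is at most d_max.
    return lambda c: c["distance"] <= d_max


filters = [shift_filter(0.02), distance_filter(5.0)]

# Toy candidate lists, invented for illustration only.
peak_a = [
    {"pair": ("HA_5", "HN_6"), "shift_error": 0.00, "distance": 2.6},
    {"pair": ("HA_5", "HN_40"), "shift_error": 0.01, "distance": 19.0},
]
peak_b = [
    {"pair": ("HB_7", "HN_8"), "shift_error": 0.01, "distance": 3.1},
    {"pair": ("HB_7", "HN_9"), "shift_error": 0.01, "distance": 4.4},
]
peak_c = [
    {"pair": ("HG_3", "HN_50"), "shift_error": 0.01, "distance": 22.0},
]

labels = [classify_peak(p, filters)[0] for p in (peak_a, peak_b, peak_c)]
print(labels)
```

Writing each filter as a predicate keeps the list extensible: the secondary structure, expected-assignment, and NOE contribution filters would slot in as further entries in `filters`.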
NOAH (1995) Mumenthaler & Braun
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. Scalar coupling constants (3JNH)
Algorithm calculates
1. Distance constraints from NOE assignments
2. Angle constraints from scalar couplings
Algorithm uses
1. Structure-based filter (recognizes correct constraints)
2. Chemical Shift limit (tolerance = max allowed error)
3. Error-tolerant target function in DIAMOD (1994)
(minimizes effect of incorrect distance constraints
from incorrect NOE assignments)
Fig. 1 (1995) J Mol Biol, 254, 465, Mumenthaler & Braun
demo: 3 proteins ranging from 57 to 74 residues
(1995) J Mol Biol, 254, 465, Mumenthaler & Braun
NMRa/b=DEN=57, TEN=74, REP=69 residues
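As background on how scalar couplings can yield angle constraints, here is a hedged sketch based on the Karplus relation, assuming the couplings are the backbone ³J(HN,Hα) values. The coefficients are the commonly quoted Pardi, Billeter & Wuthrich (1984) parametrization. The thresholds and phi ranges in `phi_range_from_coupling` are illustrative rules of thumb chosen for this example and are not taken from the NOAH paper.

```python
import math


def karplus_3j_hn_ha(phi_deg, a=6.4, b=-1.4, c=1.9):
    """Karplus relation: J = a*cos^2(theta) + b*cos(theta) + c, theta = phi - 60 deg."""
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def phi_range_from_coupling(j_hz, small=6.0, large=8.0):
    """Map a measured coupling (Hz) to an allowed phi range in degrees, or None.

    The thresholds and ranges are illustrative assumptions, not NOAH's rules.
    """
    if j_hz < small:
        return (-90.0, -30.0)    # helix-like phi
    if j_hz > large:
        return (-160.0, -80.0)   # extended, beta-like phi
    return None                  # intermediate couplings give no constraint here


j_helix = karplus_3j_hn_ha(-60.0)     # about 4.2 Hz
j_strand = karplus_3j_hn_ha(-120.0)   # about 9.7 Hz
print(round(j_helix, 2), phi_range_from_coupling(j_helix))
print(round(j_strand, 2), phi_range_from_coupling(j_strand))
```

A small coupling is consistent with helical phi values and a large one with extended conformations, which is why a measured coupling can be turned into a dihedral angle range.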
ARIA (1997) Nilges, et al.
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. Assignment cutoff, p, decreases for each cycle
5. (opt) preliminary structures, manual assignments
6. (opt) RDCs, scalar couplings, d-angles, S-S or H-bonds
Algorithm calculates in each cycle
1. Unique and partial NOE assignments
2. Unique and ambiguous distance restraints
3. Merges distance restraints with other input data
4. Bundle of refined structures (typically 20)
ARIA (1997) Nilges, et al.
Ambiguous restraints
An NOE cross peak with more than one possible assignment
is treated as a weighted composite of all of them.
Ambiguous distance restraints are introduced to incorporate the
distance dk of each possible NOE assignment.
To reduce the number of assignment possibilities, each relative
contribution Ck is calculated from dk and the average distance
for all possible assignments in the lowest-energy n of 20 conformers
from the previous cycle. The largest Ck that add up to the cutoff
value, p, for that cycle are kept; the rest are discarded.
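The pruning by relative contribution can be sketched as follows. This is a minimal illustration that assumes the usual inverse sixth power dependence of the NOE on distance, so that each contribution is d_k^-6 normalized over all candidates and the ambiguous restraint acts on the effective distance (sum of d_k^-6)^(-1/6). The function names, labels, and distances are invented for the example.

```python
def effective_distance(distances):
    """Effective distance of an ambiguous restraint: (sum of d^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def prune_assignments(avg_distances, p):
    """Keep the largest contributions until their sum reaches the cutoff p.

    avg_distances -- {assignment label: average distance d_k in Angstrom},
                     taken over the lowest-energy conformers of the previous cycle
    """
    weights = {k: d ** -6 for k, d in avg_distances.items()}
    total = sum(weights.values())
    contributions = {k: w / total for k, w in weights.items()}
    kept, running = [], 0.0
    for k, c in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(k)
        running += c
        if running >= p:
            break
    return kept, contributions


# Toy distances, invented for illustration only.
avg_distances = {"A": 3.0, "B": 4.0, "C": 8.0}
kept_early, contributions = prune_assignments(avg_distances, p=0.95)
kept_late, _ = prune_assignments(avg_distances, p=0.80)
print(kept_early, kept_late)
print(effective_distance(avg_distances.values()))
```

With these numbers candidate A alone contributes about 0.85, so a cutoff of 0.95 keeps A and B while a cutoff of 0.80 keeps only A. Lowering p from cycle to cycle therefore removes more of the weak candidates as the structures improve.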
Fig. 1 (1997) J Mol Biol, 269, 408, Nilges, et al.
demo: β-spectrin PH domain, 106 residues
MAN data derived from manual assignments
80 ms and 30 ms data differ only in mixing times
β-spectrin PH domain, 106 residues
Table 1 (1997) J Mol Biol, 269, 408, Nilges, et al.
AutoStructure (1999) Moseley & Montelione
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. Scalar couplings
5. Slow amide H/D exchange data
6. Preliminary structure
7. Preliminary H-bonded pairs
Algorithm calculates
1. Distance restraints
2. Dihedral angle restraints
3. H-bonding pairs
4. Refined structures
demo: basic fibroblast growth factor (127 residues)
(a) 10 NMR-derived structures
(b) bbRMSD = 0.7 Å between manual and AutoStructure-derived structures
Fig. 1 (1999) Curr. Opin. Struct. Biol., 9, 635, Moseley & Montelione.
(& Y.J. Huang PhD thesis)
KNOWNOE (2002) Gronwald, et al.
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. NOESY cross peak volume probability distribution
5. Preliminary structure
User specifies
1. Tolerance = max allowed chemical shift error
2. initial value of dmax = max interproton distance
3. Number, N, of current best structures
Algorithm, working together with CNS, iteratively will
1. build A-list of uniquely assigned NOE cross peaks
2. calculate P(Ak, a | Vo) for all other peaks
3. add to A-list all peaks with P(Ak, a | Vo) > cutoff (0.8-0.9)
4. use current A-list to calculate N structures
KNOWNOE (2002) Gronwald, et al.
The problem of ambiguous assignments is addressed
with a Bayesian algorithm based on NOE cross peak
volume probability distributions derived from 326 spectra.
P(Ak, a | Vo) = probability that more than fraction a of
cross peak volume Vo is due to assignment k
If P(Ak, a | Vo) > cutoff value (typically 0.8 to 0.9)
then consider that peak assigned to k for the next cycle.
These authors state that their algorithm is
“Based on the observation that cross peak volume and
correct cross peak assignment are not independent of
each other”.
Figures 3 & 4 (2002) J. Biomol. NMR, 23, 271, Gronwald, et al.
Probability distributions of distance (left) and volume (right)
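The A-list update in each KNOWNOE cycle can be sketched as below. The probabilities P(Ak, a | Vo) are taken here as precomputed inputs, because their calculation depends on the volume distributions from the paper. The function name `extend_a_list`, the peak labels, and the numbers are hypothetical.

```python
def extend_a_list(a_list, probabilities, cutoff=0.9):
    """Add peaks whose best assignment has P(Ak, a | Vo) above the cutoff.

    a_list        -- {peak: assignment} for peaks already considered assigned
    probabilities -- {peak: {assignment: P(Ak, a | Vo)}} for the remaining peaks
    """
    updated = dict(a_list)
    for peak, per_assignment in probabilities.items():
        if peak in updated:
            continue  # already on the A-list from an earlier cycle
        best, p_best = max(per_assignment.items(), key=lambda kv: kv[1])
        if p_best > cutoff:
            updated[peak] = best
    return updated


# Toy values, invented for illustration only.
a_list = {"peak1": ("HA_5", "HN_6")}
probabilities = {
    "peak2": {("HB_7", "HN_8"): 0.93, ("HB_7", "HN_30"): 0.04},
    "peak3": {("HG_3", "HN_9"): 0.55, ("HG_3", "HN_41"): 0.40},
}
new_a_list = extend_a_list(a_list, probabilities, cutoff=0.9)
print(sorted(new_a_list))
```

Here peak2 joins the A-list because one assignment clearly dominates its volume, while peak3 stays ambiguous and is reconsidered after the next round of structure calculation.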
CANDID (2002) Hermann, Guntert & Wuthrich
Input “data”
1. Protein’s amino acid sequence
2. Proton resonance assignments
3. NOESY cross peak list (one pair of chemical shift positions per peak j)
4. Previously assigned NOE distance constraints
5. (opt) other conformational constraints
User specifies
1. Tolerance = max allowed chemical shift error
2. Cycle-dependent parameters (thresholds, cutoffs, etc.)
Algorithm uses
1. Structure-based filters (like NOAH)
2. Ambiguous distance constraints (like ARIA)
3. Network anchoring (new)
4. Constraint combination (new)
from (2002) J. Mol. Biol., 319, 209, Hermann, Guntert, & Wuthrich.
Fig. 1 (2002) J. Mol. Biol., 319, 209, Hermann, Guntert, & Wuthrich.
CANDID (2002) Hermann, Guntert & Wuthrich
Ways to handle problems caused by having no preliminary structure in the first cycle
1. Network anchoring
“… evaluates the self-consistency of NOE assignments independent of
knowledge of the 3D protein structure.”
“… a sensitive approach for detecting erroneous ‘lonely’ constraints …”
2. Constraint combination
“… an extension of the concept of ambiguous NOE assignments.”
“… reduces the impact of unidentified artifact constraints in the input for
the first structure calculation.”
Result:
“The correct fold is obtained in cycle 1 of a de novo structure calculation.”
from (2002) J. Mol. Biol., 319, 209, Hermann, Guntert, & Wuthrich.
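Constraint combination can be illustrated with a small Python sketch. It assumes that two constraints are merged into one ambiguous constraint carrying the union of their assignments and the larger of the two upper bounds, so the merged constraint is satisfied whenever at least one of the originals is correct. The data layout, function names, and numbers are invented for the example. The error estimate further assumes that the two original constraints are wrong independently of each other.

```python
import math


def combine_constraints(c1, c2):
    """Merge two distance constraints into one ambiguous constraint (assumed form)."""
    return {
        "assignments": c1["assignments"] | c2["assignments"],
        # Taking the larger bound keeps the merged constraint valid if either
        # original constraint was correct.
        "upper_bound": max(c1["upper_bound"], c2["upper_bound"]),
    }


def combined_error_probability(p1, p2):
    """The merged constraint is wrong only if both originals are wrong."""
    return p1 * p2


# Toy long-range constraints, invented for illustration only.
c1 = {"assignments": {("HA_5", "HB_60")}, "upper_bound": 4.0}
c2 = {"assignments": {("HN_12", "HG_77")}, "upper_bound": 5.5}
merged = combine_constraints(c1, c2)
p_merged = combined_error_probability(0.1, 0.1)
print(merged["upper_bound"], len(merged["assignments"]), p_merged)
```

If each original constraint has a 10% chance of being an artifact, the merged one is wrong only about 1% of the time under the independence assumption. The price is that the merged constraint carries less structural information than the two originals.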
Questions?
Conclusions