Automating Steps in Protein Structure Determination by NMR
CS 296.4
April 13, 2009

Outline
  Background
    Steps in NMR protein structure determination
    The ACE cycle (Assign-Calculate-Evaluate)
    The assignment problem
  Algorithms for automated NOE assignment
    Semi-automated methods
    More-automated methods
  Conclusions

The Steps in Protein Structure Determination by NMR
  1. Sample preparation
     (a) protein selection
     (b) gene engineering
     (c) protein expression
     (d) protein purification
     (e) buffer optimization
     (f) isotope labeling
  2. Data collection
     (a) HSQC
     (b) amide H/D exchange
     (c) triple-resonance
  3. Data evaluation
     (a) spectrum calculation
     (b) peak picking
  4. Structure calculation
  5. Structure refinement
  6. Structure deposition (and maybe write a paper and graduate)

Automatable Steps in Protein Structure Determination by NMR
  1. Sample preparation
  2. Data collection
  3. Data evaluation
  4. Structure calculation
  5. Structure refinement
  6. Structure deposition

The Assign-Calculate-Evaluate cycle in automated NOE assignment and structure calculation
Fig. 2 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.

Automating NOE Assignments and THE Assignment Problem
There are MANY assignment tasks
  1. Resonance Assignment (interpreting data)
  2. NOE Assignment (interpreting data)
and one major assignment problem: ambiguous assignments,
due to the data collection problems of
  1. Completeness (missing data points)
  2. Uniqueness (unresolvable data points)

Unambiguously assigning a NOESY cross peak
From Fig. 3 (2003) Progress in NMR Spectroscopy, 43, 105, Guntert.

Automated NMR Protein structure calculation
Peter Guntert (2003) Progress in NMR Spectroscopy, 43, 105-125
Algorithms for automated NOESY assignment
Semi-automated methods
  1. ASsign NOEs (ASNO, 1993)
  2. Structure Assisted NOE Evaluation (SANE, 2001)
More-automated methods
  1. NOAH (1995)
  2. Ambiguous Restraints Iterative Assignments (ARIA, 1997)
  3. AutoStructure (1999)
  4. KNOWledge-based NOE assignments (KNOWNOE, 2002)
  5. CANDID (2002)

ASNO (1993) Guntert, Berndt, & Wuthrich
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. Set of estimated structures
User specifies
  1. Max allowed chemical shift error
  2. dmax = max interproton distance causing NOE
  3. nmin = min # structures with d < dmax
Algorithm steps
  1. For each cross peak: find all possible assignments (¹Hj, ¹Hk)
  2. For each (¹Hj, ¹Hk): n = # of structures with d < dmax
  3. Prune all (¹Hj, ¹Hk) with n < nmin
User intervention
  1. Manually check and refine NOE assignments (¹Hj, ¹Hk)
  2. Refine set of structures and rerun algorithm
Fig. 1 (1993) J Biomol NMR, 3, 601, Guntert, Berndt, & Wuthrich.
Demo: Dendrotoxin K, 7 kDa, 57 AA, bbRMSD = 0.32 Å

SANE (2001) Duggan, Legge, Dyson, & Wright
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
User specifies filters
  1. Distance (set of estimated structures)
  2. Chemical shift (max allowed error)
  3. Secondary structure (unlikely NOE assignments)
  4. Assignment (expected NOE assignments)
  5. NOE contribution (same as in ARIA method)
Algorithm steps
  1. For each cross peak: find all possible assignments (¹Hj, ¹Hk)
  2. Apply the five filters to prune the list of (¹Hj, ¹Hk)
  3. Write unique or ambiguous distance restraints, or violations
User intervention
  1. Violation analysis
Fig. 1 (2001) J Biomol NMR, 19, 321, Duggan, et al.
Demo: LFA-1 I-domain, 21.3 kDa, 183 AA, bbRMSD = 0.29 Å
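
The enumerate-then-prune steps listed for ASNO (and reused by SANE's chemical shift and distance filters) can be sketched roughly as below. This is only an illustration under stated assumptions, not the published implementation: the function names, the dictionary-based data layout, and the toy numbers are all hypothetical, and a single tolerance is applied to both peak coordinates.

```python
from itertools import product
from math import dist


def candidate_assignments(peak, shifts, tol):
    """Step 1: all proton pairs whose assigned shifts match both
    peak coordinates to within tol (ppm). Layout is hypothetical."""
    w1, w2 = peak
    first = [name for name, s in shifts.items() if abs(s - w1) <= tol]
    second = [name for name, s in shifts.items() if abs(s - w2) <= tol]
    return [(a, b) for a, b in product(first, second) if a != b]


def prune_by_structures(candidates, structures, d_max, n_min):
    """Steps 2 and 3: count the structures in which the pair is closer
    than d_max, and keep the pair only if that count reaches n_min."""
    kept = []
    for a, b in candidates:
        n = sum(1 for coords in structures if dist(coords[a], coords[b]) < d_max)
        if n >= n_min:
            kept.append((a, b))
    return kept


# Toy data (invented): three protons, two estimated structures.
shifts = {"HA": 4.50, "HB": 1.20, "HG": 1.22}
structures = [
    {"HA": (0, 0, 0), "HB": (3, 0, 0), "HG": (9, 0, 0)},
    {"HA": (0, 0, 0), "HB": (4, 0, 0), "HG": (8, 0, 0)},
]
peak = (4.51, 1.21)

cands = candidate_assignments(peak, shifts, tol=0.03)
assigned = prune_by_structures(cands, structures, d_max=5.0, n_min=2)
print(cands)     # two candidates: HB and HG overlap in chemical shift
print(assigned)  # only the pair that is close in both structures survives
```

In this toy case the chemical shift filter alone leaves the peak ambiguous, and the structure-based count resolves it, which is the situation the user-intervention step is meant to double-check.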

NOAH (1995) Mumenthaler & Braun
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. Scalar coupling constants (³JNH)
Algorithm calculates
  1. Distance constraints from NOE assignments
  2. Angle constraints from scalar couplings
Algorithm uses
  1. Structure-based filter (recognizes correct constraints)
  2. Chemical shift limit (max allowed error)
  3. Error-tolerant target function in DIAMOD (1994)
     (minimizes the effect of incorrect distance constraints from incorrect NOE assignments)
Fig. 1 (1995) J Mol Biol, 254, 465, Mumenthaler & Braun
Demo: 3 proteins ranging from 57 to 74 residues (DEN = 57, TEN = 74, REP = 69 residues)
(1995) J Mol Biol, 254, 465, Mumenthaler & Braun

ARIA (1997) Nilges, et al.
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. Assignment cutoff, p, decreases for each cycle
  5. (opt) preliminary structures, manual assignments
  6. (opt) RDCs, scalar couplings, dihedral angles, S-S or H-bonds
Algorithm calculates in each cycle
  1. Unique and partial NOE assignments
  2. Unique and ambiguous distance restraints
  3. Merges distance restraints with other input data
  4. Bundle of refined structures (typically 20)

ARIA (1997) Nilges, et al.
Ambiguous restraints
An NOE cross peak with more than one possible assignment is treated as a weighted composite of all of them. Ambiguous distance restraints are introduced to incorporate the distance dk of each ambiguous NOE assignment. To reduce the number of assignment possibilities, each relative contribution Ck is calculated from dk and the average distance for all possible assignments, using the n lowest-energy of the 20 conformers from the previous cycle.
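
A rough sketch of how one ambiguous peak might be handled: compute a relative contribution for each candidate assignment, then keep the largest contributions until their sum reaches the cycle's cutoff p. The slides do not give the formula, so the r^-6 weighting used here, C_k = d_k^-6 / sum_i d_i^-6, is an assumption based on the usual distance dependence of the NOE; the function names and toy distances are hypothetical.

```python
def relative_contributions(avg_distances):
    """C_k = d_k**-6 / sum_i d_i**-6 (assumed r^-6 weighting).
    avg_distances maps a candidate assignment to its average distance."""
    weights = {k: d ** -6 for k, d in avg_distances.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def keep_until_cutoff(contribs, p):
    """Keep the largest contributions until their running sum reaches p."""
    kept, running = [], 0.0
    ranked = sorted(contribs.items(), key=lambda kc: kc[1], reverse=True)
    for k, c in ranked:
        if running >= p:
            break
        kept.append(k)
        running += c
    return kept


def effective_distance(distances):
    """r^-6 summed distance for an ambiguous restraint (assumed form)."""
    return sum(d ** -6 for d in distances) ** (-1 / 6)


# Toy average distances in Angstrom for three candidate assignments (invented).
avg = {"HA-HB": 3.0, "HA-HG": 4.0, "HA-HD": 8.0}
c = relative_contributions(avg)

early_cycle = keep_until_cutoff(c, p=0.99)  # loose cutoff keeps more candidates
late_cycle = keep_until_cutoff(c, p=0.80)   # tighter cutoff keeps fewer
print(early_cycle, late_cycle)
```

Because p decreases from cycle to cycle, the same peak goes from a partial (ambiguous) assignment toward a unique one as the structures improve.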
The largest Ck that add up to the cutoff value, p, for that cycle are kept; the rest are discarded.
Fig. 1 (1997) J Mol Biol, 269, 408, Nilges, et al.
Demo: β-spectrin PH domain, 106 residues

Table 1 (1997) J Mol Biol, 269, 408, Nilges, et al.
β-spectrin PH domain, 106 residues
MAN: data derived from manual assignments
The 80 ms and 30 ms data differ only in mixing times

AutoStructure (1999) Moseley & Montelione
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. Scalar couplings
  5. Slow amide H/D exchange data
  6. Preliminary structure
  7. Preliminary H-bonded pairs
Algorithm calculates
  1. Distance restraints
  2. Dihedral angle restraints
  3. H-bonding pairs
  4. Refined structures
Fig. 1 (1999) Curr. Opin. Struct. Biol., 9, 635, Moseley & Montelione. (& Y.J. Huang PhD thesis)
Demo: basic fibroblast growth factor (127 residues)
  (a) 10 NMR-derived structures
  (b) manual and AutoStructure-derived structures
  bbRMSD = 0.7 Å between manual and AutoStructure-derived structures

KNOWNOE (2002) Gronwald, et al.
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. NOESY cross peak volume probability distribution
  5. Preliminary structure
User specifies
  1. Max allowed chemical shift error
  2. Initial value of dmax = max interproton distance
  3. Number, N, of current best structures
Algorithm, working together with CNS, will iteratively
  1. build A-list of uniquely assigned NOE cross peaks
  2. calculate P(Ak, a | Vo) for all other peaks
  3. add to A-list all peaks with P(Ak, a | Vo) > cutoff (0.8-0.9)
  4. use current A-list to calculate N structures

KNOWNOE (2002) Gronwald, et al.
The problem of ambiguous assignments is addressed with a Bayesian algorithm based on NOE cross peak volume probability distributions derived from 326 spectra.
P(Ak, a | Vo) = probability that more than fraction a of cross peak volume Vo is due to assignment k
If P(Ak, a | Vo) > cutoff value (typically 0.8 to 0.9), then consider that peak assigned to k for the next cycle.
These authors state that their algorithm is “Based on the observation that cross peak volume and correct cross peak assignment are not independent of each other”.
Figures 3 & 4 (2002) J. Biomol. NMR, 23, 271, Gronwald, et al.
Probability distributions of distance (left) and volume (right)

CANDID (2002) Herrmann, Guntert & Wuthrich
Input “data”
  1. Protein’s amino acid sequence
  2. Proton resonance assignments
  3. NOESY cross peak list (pairs of chemical shift coordinates)
  4. Previously assigned NOE distance constraints
  5. (opt) other conformational constraints
User specifies
  1. Max allowed chemical shift error
  2. Cycle-dependent parameters (thresholds, cutoffs, etc.)
Algorithm uses
  1. Structure-based filters (like NOAH)
  2. Ambiguous distance constraints (like ARIA)
  3. Network anchoring (new)
  4. Constraint combination (new)
Fig. 1 (2002) J. Mol. Biol., 319, 209, Herrmann, Guntert, & Wuthrich.

CANDID (2002) Herrmann, Guntert & Wuthrich
Ways to handle the problems caused by having no preliminary structure in the first cycle
  1. Network anchoring
     “… evaluates the self-consistency of NOE assignments independent of knowledge of the 3D protein structure.”
     “… a sensitive approach for detecting erroneous ‘lonely’ constraints …”
  2. Constraint combination
     “… an extension of the concept of ambiguous NOE assignments.”
     “… reduces the impact of unidentified artifact constraints in the input for the first structure calculation.”
Result: “The correct fold is obtained in cycle 1 of a de novo structure calculation.”
From (2002) J. Mol. Biol., 319, 209, Herrmann, Guntert, & Wuthrich.

Questions?

Conclusions