RECOORD
REcalculated COORdinates Database

Jurgen Doreleijers
Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison
jurgen@bmrb.wisc.edu

Aart Nederveen
Bijvoet Center for Biomolecular Research, Utrecht University
a.j.nederveen@chem.uu.nl

Wim Vranken
Macromolecular Structure Database, European Bioinformatics Institute
wim@ebi.ac.uk

Aim
• Recalculation of protein structures based on deposited NMR restraints using state-of-the-art methods
• Goals:
  • decrease user- and software-dependent biases
  • allow a better comparison between structures
  • allow comparison between different structure calculation programs
  • provide a database for the development and assessment of validation tools and calculation protocols

Overview recalculation project: design of RECOORD
Stages: restraint manipulation, recalculation, analysis
1     PDB: coordinates, restraints
2     BMRB: STAR files (Doreleijers et al. 2003)
3     EBI/UU: generation of consistent STAR files
4, 5  CNS: topology, MD SA, refinement; CYANA: sequence, MD SA, …
6     analysis: improvement? correlations? …

Databases now publicly available
• DOCR/FRED (BMRB): databases containing converted and filtered restraints
  http://www.bmrb.wisc.edu/servlets/MRGridServlet
• RECOORD (EBI): database containing recalculated coordinates
  http://www.ebi.ac.uk/msd/recoord

1  PDB: Selection
• Formats (if distance restraints available):
  • CNS/XPLOR
  • DIANA/DYANA/CYANA
  • DISCOVER/MSI
• PDB entries selected:
  • only proteins
  • no HET atoms
  • multimers allowed (not yet recalculated)
  • at least 20 residues
Finally, 545 monomers were selected.

2  BMRB: STAR files (Doreleijers et al. 2003)
3  EBI/UU: Generation of consistent STAR files

Conversion issues
• Data are converted to formats readable by calculation software (e.g. XPLOR/CNS and CYANA) by the FormatConverter available within the CCPN software (Wim Vranken, EBI).
Problems:
• Differences between coordinate and restraint data, e.g.:
  • 1 chain in PDB entry, 2 chains in restraint list
  • residue numbering can differ in PDB entry and restraint list
  • restraints for residues not present in PDB entry…
• Nomenclature in restraint list

4  Building topology
• Starting script: generate_easy.inp from CNS
• Automated detection in original ensemble of:
  • Disulfide bridges (<3 Å S-S distance in original first models)
  • Cis peptides (if |ω| < 25° in original first models)
  • Protonation state of histidines (use CNS patches HISD, HISE)
• CYANA: sequence based on CNS topology
  • Add CYSS, HIST, HIST+, cPRO in sequence
  • Automated generation of disulfide restraints

5  CONDOR computer cluster (CS, University of Wisconsin-Madison)
• More than 800 processors used
• Total CPU time: 31,169 hours (about 3.5 years on a single workstation)
• Example 2EZM, calculation of 1 model (101 a.a., 2.2 GHz P4 computer):
  CYANA   31 seconds
  CNS    340 seconds

6  Evaluation of structure quality
• Agreement with experimental restraints
• Improvement?
• Comparison of CNS and CYANA
• Relation between NMR data quality and structural quality

Distance restraints violations
[Figure: histogram, frequency vs. RMS distance restraints violations (Å)]
ORG (original entries): 0.08 Å (0.14 Å)
CNW (recalculated in CNS and refined in water): 0.04 Å (0.05 Å)

Dihedral restraints violations
[Figure: histogram, frequency vs. RMS dihedral restraints violations (degrees)]
ORG (original entries): 1.6° (4.6°)
CNW (recalculated in CNS and refined in water): 0.5° (0.5°)

Results: quality indicators
Performance of CNS vs. CYANA (no water refinement yet)
Average value over 545 entries                  Original PDB   CNS recalculation   CYANA recalculation
RMS distance restraints violations (Å)          0.08 ± 0.14    0.04 ± 0.06         0.04 ± 0.05
RMS dihedral restraints violations (degrees)    1.6 ± 4.6      0.5 ± 0.7           0.5 ± 0.7
Packing quality (Z-score), WHATCHECK            -3.5 ± 1.9     -4.1 ± 1.9          -4.3 ± 1.8
Bumps per 100 residues                          73 ± 63        11 ± 9              86 ± 37
% most favoured, PROCHECK                       69 ± 14        69 ± 13             61 ± 14

Results: quality indicators
Performance of CNS before and after water refinement

Average value over 545 entries                  Original PDB   CNS recalculation   CNS + water refinement
RMS distance restraints violations (Å)          0.08 ± 0.14    0.04 ± 0.06         0.04 ± 0.05
RMS dihedral restraints violations (degrees)    1.6 ± 4.6      0.5 ± 0.7           0.5 ± 0.5
Packing quality (Z-score), WHATCHECK            -3.5 ± 1.9     -4.1 ± 1.9          -2.5 ± 2.0
Bumps per 100 residues                          73 ± 63        11 ± 9              10 ± 7
% most favoured, PROCHECK                       69 ± 14        69 ± 13             76 ± 11

Improvement: packing and Ramachandran Z-scores
[Figure: plot with axes "improvement packing" and "improvement Ramachandran"; some entries marked "missing data"]
Improvement in Z-score: ΔZ = Z_refined - Z_original
For ~5% of entries no improvement is possible because of NMR data missing relative to what the authors used.

In search of correlations (Pearson coefficient)
Upper-right triangle: refined (correlations higher). Lower-left triangle: original (correlations lower).

                          data      RMS         circular   packing     Ramachandran   bumps
                          density   violations  variance   (Z-score)   (Z-score)
data density                ·       -0.23       -0.46       0.35        0.31          -0.03
RMS violations            -0.11       ·          0.22      -0.25       -0.37           0.58
circular variance         -0.32      0.00         ·        -0.60       -0.67           0.25
packing (Z-score)          0.32     -0.06       -0.49        ·          0.69          -0.39
Ramachandran (Z-score)     0.16     -0.11       -0.48       0.48         ·            -0.51
bumps                      0.04      0.04        0.07      -0.21       -0.47            ·
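As a rough illustration of how a matrix like the one above could be computed, the sketch below implements the Pearson coefficient and the Z-score improvement ΔZ = Z_refined - Z_original in plain Python. The function names, the indicator names, and the toy values are all hypothetical; they are not RECOORD data and this is not the project's actual analysis script.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(indicators):
    """Map each unordered pair of indicator names to its Pearson r."""
    names = list(indicators)
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def delta_z(z_refined, z_original):
    """Per-entry improvement: Z_refined - Z_original."""
    return [zr - zo for zr, zo in zip(z_refined, z_original)]


# Hypothetical toy values for four entries (NOT RECOORD data).
toy = {
    "data_density": [5.0, 10.0, 15.0, 20.0],
    "circular_variance": [0.20, 0.15, 0.10, 0.05],
    "ramachandran_z": [-4.0, -3.0, -3.5, -1.0],
}
matrix = correlation_matrix(toy)

# The first two toy indicators are exactly linearly related,
# so their coefficient comes out very close to -1.
for pair, r in matrix.items():
    print(pair, round(r, 2))
```

Running `correlation_matrix` once on the original ensembles and once on the refined ones would give the two triangles of the matrix; with real data, entries with missing indicators would need to be filtered out first.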
In search of correlations (Bumps)
[Same correlation matrix as above]

In search of correlations (NMR data density)
[Same correlation matrix as above]

Correlation between NMR data density and Ramachandran Z-score
[Figure: Ramachandran Z-score vs. NMR data density, r = 0.31]

Correlation between NOE completeness and packing Z-score
[Figure: packing Z-score vs. NOE completeness, r = 0.20]
NMR data-based indicators cannot yield any indication of the normality of the structures.

In search of correlations (Precision)
[Same correlation matrix as above]

Correlation between precision and data density
[Figure: circular variance vs. NMR data density, r = -0.46]

Correlation between precision and Ramachandran
[Figure: circular variance vs. Ramachandran plot appearance (Z-score), r = -0.67; entry 1SUT labelled]
Proteins with high Ramachandran normality will have small circular variance.

Correlation between RMSD and structural uncertainty (QUEEN)
[Figure: backbone RMSD (Å) vs. structural uncertainty, r = -0.69]
Structural uncertainty imposes a lower limit on the RMSD.

Conclusions I
• NMR-STAR files made consistent for 545 out of ~1700 entries
• Protocols and scripts available for recalculation in CYANA and CNS
• Validation database available for testing of new protocols
• Improvement compared to original data: 1 standard deviation closer to X-ray db
  • violations in original data do not limit recalculation effort
  • refinement in water required
  • 5% no improvement: data missing

Conclusions II
• Correlations higher after recalculation and refinement, though most of them still weak
• Highest correlation: precision vs. Ramachandran score & structural uncertainty (QUEEN)

Acknowledgements
• Utrecht University: Alexandre Bonvin, Rob Kaptein
• EBI Cambridge: Wim Vranken
• CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley
• Radboud University Nijmegen: Chris Spronk, Sander Nabuurs
• RIKEN Japan: Peter Güntert
• Institut Pasteur Paris: Michael Nilges