The Protein Data Bank
An Introduction

History of the PDB
  1970s
    Community discussions about how to establish an archive of protein structures
    Cold Spring Harbor meeting in protein crystallography
    PDB established at Brookhaven (October 1971; 7 structures)
  1980s
    Number of structures increases as technology improves
    Community discussions about requiring depositions
    IUCr guidelines established
    Number of structures deposited increases
  1990s
    Structural genomics begins
    PDB moves to RCSB
  2000s
    wwPDB formed

Myoglobin, Hemoglobin, Lysozyme, Ribonuclease

Crystal Structures
  Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips. Nature 181, 662-666, 1958.
  Hemoglobin: Perutz. Proc. R. Soc. A265, 161-187, 1962.
  Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma. Nature 206, 757, 1965.
  Ribonuclease: Kartha, Bello, Harker. Nature 213, 862-865, 1967.
  Ribonuclease: Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards. J. Biol. Chem. 242, 3753-3757, 1967.

1970s
  Grass roots community efforts to archive data
  Protein crystallographers discuss how to archive data
  June 1971: Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972)
  October 1971: PDB is announced in Nature New Biology (7 structures; vol. 233, 1971, page 223)
  1975: PDB receives first funding from NSF (~32 structures)

1980s
  Technology takes off
  Structural biology is able to focus on medical problems
  Community efforts to promote data sharing
  IUCr guidelines requiring data deposition in the PDB are published

Cooperative Community Action
  Individual letters to editors of journals
  Committees
    IUCr Commission on Biological Macromolecules
    ACA/USNCCr
    Richards committee
  Funding agencies
  Articles in journals

  Larger molecules such as proteins and nucleic acids should be deposited in the Brookhaven Data Bank. I believe you can get further advice about how to handle these macromolecular structures from Michael Rossmann of Purdue University.

  It is NIH policy that results of NIH-sponsored research should be published and made available to other scientists.

  We are, therefore, pleased to learn of the appointment of a committee under the auspices of the American Crystallographic Association and the U.S. National Committee for Crystallography to establish guidelines for deposition in the Brookhaven Protein Data Bank and/or the Cambridge Data File.

1990s
  Number of structures increases exponentially
  Complexity of structures increases
  New databases begin to emerge
  User community for the PDB expands dramatically
  mmCIF dictionary created
  PDB moves

2000s
  Continued growth in structures
  Structural genomics takes off
  wwPDB is formed (www.wwpdb.org)

Structural Genomics
  “The next step beyond the human genome project”
  From the NIH Request for Proposals for Structural Genomics Centers:
  “These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics.
  This project will require the determination of a large number of protein structures in a high-throughput mode.”

PSI Structures (October 2007)

Scientific Challenges
  Number of data files continues to increase
  Information content of each data file is increasing
  Many more very large macromolecular complexes
  New structure determination methods
  Structural genomics

Technical Challenges
  How do we represent diverse data?
  How do we make a searchable database?
  How do we integrate with other data resources?
  How do we make a scalable system?
  How do we meet the needs of a diverse community?

Structure Determination Pipeline (X-ray)
  Hypothesis Driven
  Target Selection
  Crystallomics
  Data Collection
  Structure Determination
  Data Deposition
  Publication
  Data Release
  Isolation, Expression, Purification, Crystallization

The Data Pipeline

The Data Processing Pipeline

Depositions since 2000

PDB Depositors 1999-2007

What is Checked and Validated?
  Protein sequence and chemistry of small molecule ligands
  Correspondence and cross references to other databases
  Experimental details
  Correspondence of coordinates with primary data
  Conformation of protein (Ramachandran plot)
  Biological assembly
  Crystal packing
  “good”, “fair”

Validation letter
  Oct. 15, 16:19:01 2002
  Thank you for using RCSB. The following geometrical and stereochemical features have been calculated for your structure:

  CLOSE CONTACTS
  --------------
  ==> Close contacts in same asymmetric unit. Distances smaller than 2.2 Angstroms are considered as close contacts.
  none
  ==> Close contacts based on crystal symmetry. Distances smaller than 2.2 Angstroms are considered as close contacts.
  none

Data Query Support Strategy
  Data quality
  Data standardization
  Extended annotation
  Improved query functionality
  Extended query options

Remediation: Scope and Statistics
  All primary citations verified (45K)
  Sequences & taxonomy updated for 61K sequences
  Ligand stereochemistry and nomenclature for 13M monomers and 170K non-polymer molecules
  Symmetry and coordinate transformations for 280 virus entries
  10814 diffraction source & beamline updates
  ~1000 miscellaneous uniformity issues

Virus Structure Remediation
  Before, After

PDB Usage
  Areas warranting additional outreach: Africa, Southeast Asia, South America, Australasia

Searching and Reporting Capabilities

Sequence References
  Sequence properties displayed
  Multiple domain definitions displayed
  UniProt and structure sequences mapped

Citation Information

Access to Remediated and Unremediated Data

Planning for the future

Funding
  E-MSD is supported by grants from the Wellcome Trust, the EU (TEMBLOR, NMRQUAL and IIMS), CCP4, the BBSRC, the MRC and EMBL.
  PDBj is supported by a grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Technology (MEXT).
  The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.
  The RCSB PDB is supported by grants from the National Science Foundation, National Institute of General Medical Sciences, the Office of Science-Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.

wwPDB annotators

wwPDBAC

wwPDB retreat September 2007
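
Appendix: close-contact check (illustrative sketch)

The validation letter quoted earlier reports atom pairs closer than 2.2 Angstroms within the same asymmetric unit. A minimal sketch of that kind of check might look like the following. It is not the RCSB validation code: the function name, the atom labels, the (name, coordinates) layout, and the optional set of bonded pairs to skip are all assumptions made for illustration. Only the 2.2 Angstrom threshold comes from the letter.

```python
import math
from itertools import combinations

# Threshold quoted in the validation letter, in Angstroms.
CLOSE_CONTACT_CUTOFF = 2.2


def find_close_contacts(atoms, bonded=frozenset(), cutoff=CLOSE_CONTACT_CUTOFF):
    """Return (name_a, name_b, distance) for each atom pair closer than cutoff.

    atoms  : list of (name, (x, y, z)) tuples; this layout is hypothetical.
    bonded : set of frozenset({name_a, name_b}) pairs to skip, since covalently
             bonded atoms are expected to sit closer than the cutoff (assumption).
    """
    contacts = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms, 2):
        if frozenset((name_a, name_b)) in bonded:
            continue  # expected short distance, not a clash
        distance = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
        if distance < cutoff:
            contacts.append((name_a, name_b, distance))
    return contacts


# Made-up coordinates, for illustration only.
atoms = [
    ("A:ASP10:OD1", (0.0, 0.0, 0.0)),
    ("A:LYS55:NZ", (1.5, 0.0, 0.0)),
    ("A:HOH201:O", (5.0, 0.0, 0.0)),
]

contacts = find_close_contacts(atoms)
for name_a, name_b, distance in contacts:
    print(f"{name_a} - {name_b}: {distance:.2f} A")
if not contacts:
    print("none")
```

This sketch covers only contacts within one set of coordinates. The second check in the letter, contacts based on crystal symmetry, would additionally require generating symmetry-related copies of the atoms, which is not attempted here. The all-pairs loop is quadratic in the number of atoms; a production check would likely use a spatial grid or tree instead.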