The Protein Data Bank: An Introduction

History of the PDB
1970s
 Community discussions about how to establish an archive of protein structures
 Cold Spring Harbor meeting in protein crystallography
 PDB established at Brookhaven (October 1971; 7 structures)
1980s
 Number of structures increases as technology improves
 Community discussions about requiring depositions
 IUCr guidelines established
 Number of structures deposited increases
1990s
 Structural genomics begins
 PDB moves to RCSB
2000s
 wwPDB formed
Crystal Structures
 Myoglobin: Kendrew, Bodo, Dintzis, Parrish, Wyckoff, Phillips, Nature 181, 662-666, 1958.
 Hemoglobin: Perutz, Proc. R. Soc. A265, 161-187, 1962.
 Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma, Nature 206, 757, 1965.
 Ribonuclease: Kartha, Bello, Harker, Nature 213, 862-865, 1967. Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards, J. Biol. Chem. 242, 3753-3757, 1967.
1970s
Grass roots community efforts to archive data
 Protein crystallographers discuss how to archive data
 June 1971 Cold Spring Harbor meeting brings groups together (Cold Spring Harbor Symposia on Quantitative Biology, vol. XXXVI, 1972.)
 October 1971 PDB is announced in Nature New Biology (7 structures; vol 233, 1971, page 223)
 1975 PDB receives first funding from NSF (~32 structures)
1980s
Technology takes off
 Structural biology is able to focus on medical problems
 Community efforts to promote data sharing
 IUCr guidelines requiring data deposition in the PDB are published
Cooperative Community Action
 Individual letters to editors of journals
 Committees
 IUCr Commission on Biological Macromolecules
 ACA/USNCCr
 Richards committee
 Funding agencies
 Articles in journals
Larger molecules such as proteins and nucleic acids should be deposited in the Brookhaven Data Bank.
I believe you can get further advice about how to handle these macromolecular structures from Michael Rossmann of Purdue University.
It is NIH Policy that results of NIH sponsored research should be published and made available to other scientists.
We are, therefore, pleased to learn of the appointment of a committee under the auspices of the American Crystallographic Association and the U.S. National Committee for Crystallography to establish guidelines for deposition in the Brookhaven Protein Data Bank and/or the Cambridge Data File.
1990s
 Number of structures increases exponentially
 Complexity of structures increases
 New databases begin to emerge
 User community for the PDB expands dramatically
 mmCIF dictionary created
 PDB moves
2000s
 Continued growth in structures
 Structural genomics takes off
 wwPDB is formed
www.wwpdb.org
Structural Genomics
“The next step beyond the human genome project”
From the NIH Request for Proposals for Structure Genomics Centers:
“These studies should lead to an understanding of structure/function relationships and the ability to obtain structural models of all proteins identified by genomics. This project will require the determination of a large number of protein structures in a high-throughput mode.”
PSI Structures (October 2007)
Scientific Challenges
 Number of data files continues to increase
 Information content of each data file is increasing
 Many more very large macromolecular complexes
 New structure determination methods
 Structural genomics
Technical Challenges
 How do we represent diverse data?
 How do we make a searchable database?
 How do we integrate with other data resources?
 How do we make a scalable system?
 How do we meet the needs of a diverse community?
Structure Determination Pipeline (X-ray)
Hypothesis Driven Target Selection
Crystallomics (Isolation, Expression, Purification, Crystallization)
Data Collection
Structure Determination
Data Deposition
Publication
Data Release
The Data Pipeline
The Data Processing Pipeline
Depositions since 2000
PDB Depositors 1999-2007
What is Checked and Validated?
 Protein sequence and chemistry of small molecule ligands
 Correspondence and cross references to other databases
 Experimental details
 Correspondence of coordinates with primary data
 Conformation of protein (Ramachandran plot)
 Biological assembly
 Crystal packing
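As an illustration of the conformation check, a Ramachandran plot is built from the backbone torsion angles phi and psi of each residue. The following is a minimal sketch of that calculation, not the validation software used by the PDB. The function name torsion_deg and the plain (x, y, z) tuple input are assumptions made for this example.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Hypothetical helper for illustration; each point is an (x, y, z) tuple.
    """
    b0 = _sub(p0, p1)            # vector from p1 to p0
    b1 = _sub(p2, p1)            # central bond, p1 to p2
    b2 = _sub(p3, p2)            # vector from p2 to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# For residue i of a protein chain:
#   phi = torsion_deg(C[i-1], N[i], CA[i], C[i])
#   psi = torsion_deg(N[i], CA[i], C[i], N[i+1])
```

Plotting psi against phi for every residue gives the Ramachandran plot. How the resulting points are scored against allowed regions is outside this sketch.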
“good”
“fair”
Validation letter
Oct. 15, 16:19:01 2002
Thank you for using RCSB. The following geometrical and stereochemical
features have been calculated for your structure:
CLOSE CONTACTS
-------------
==> Close contacts in same asymmetric unit. Distances smaller than 2.2
Angstroms are considered as close contacts.
none
==> Close contacts based on crystal symmetry. Distances smaller than 2.2
Angstroms are considered as close contacts.
none
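The close-contact test in the letter above can be sketched in a few lines. This is an illustrative sketch only, not the RCSB validation code. The 2.2 Angstrom cutoff comes from the letter; the function name close_contacts, the (label, coordinates) input format, and the bonded parameter for skipping covalently bonded pairs are assumptions made for this example. Contacts generated by crystal symmetry would additionally require applying the symmetry operators, which is not shown.

```python
import math
from itertools import combinations

CUTOFF = 2.2  # Angstroms, the threshold quoted in the validation letter

def close_contacts(atoms, bonded=(), cutoff=CUTOFF):
    """Return (label_a, label_b, distance) for atom pairs closer than cutoff.

    atoms  : list of (label, (x, y, z)) tuples (hypothetical input format)
    bonded : pairs of labels to skip, e.g. covalently bonded atoms
             (an assumption; the letter does not say how these are handled)
    """
    skip = {frozenset(pair) for pair in bonded}
    hits = []
    for (la, pa), (lb, pb) in combinations(atoms, 2):
        if frozenset((la, lb)) in skip:
            continue
        d = math.dist(pa, pb)  # Euclidean distance, Python 3.8+
        if d < cutoff:
            hits.append((la, lb, round(d, 3)))
    return hits

# Made-up coordinates for illustration only.
example_atoms = [
    ("A", (0.0, 0.0, 0.0)),
    ("B", (1.5, 0.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
    ("D", (2.0, 1.0, 0.0)),
]
example_hits = close_contacts(example_atoms, bonded=[("A", "B")])
```

An empty result corresponds to the "none" lines in the letter. The all-pairs loop is quadratic in the number of atoms, so a real implementation for large structures would likely use a spatial grid or tree.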
Data Query Support Strategy
Data quality
Data standardization
Extended annotation
Improved query functionality
Extended query options
Remediation: Scope and Statistics
 All primary citations verified (45K)
 Sequences & taxonomy updated for 61K sequences
 Ligand stereochemistry and nomenclature for 13M monomers and 170K non-polymer molecules
 Symmetry and coordinate transformations for 280 virus entries
 10814 diffraction source & beamline updates
 ~1000 miscellaneous uniformity issues
Virus Structure Remediation
Before
After
PDB Usage
Areas warranting additional outreach:
Africa, Southeast Asia, South America, Australasia
Searching and Reporting Capabilities
Sequence References
Sequence properties displayed
Multiple domain definitions displayed
UniProt and structure sequences mapped
Citation Information
Access to Remediated and Unremediated Data
Planning for the future
Funding
E-MSD is supported by grants from the Wellcome Trust, the EU (TEMBLOR, NMRQUAL and IIMS), CCP4, the BBSRC, the MRC and EMBL.
PDBj is supported by grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Technology (MEXT).
The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.
The RCSB PDB is supported by grants from the National Science Foundation, National Institute of General Medical Sciences, the Office of Science-Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.
wwPDB annotators
wwPDBAC
wwPDB retreat September 2007