Shuchismita Dutta
Spring 2010
Overview
Exploring the PDB format file
File formats and dictionaries
Finding PDB format files and other files
Validation
Exploring the PDB file
Metadata
Coordinates
Title section
OBSLTE 18-JUL-84 1HHB 2HHB 3HHB 4HHB
SPLIT 1JGP 1JGQ 1JGO
CAVEAT 1B86 THERE ARE CHIRALITY ERRORS IN C-ALPHA CENTERS
REVDAT 4 24-FEB-09 4HHB 1 VERSN
REVDAT 3 01-APR-03 4HHB 1 JRNL
REVDAT 2 15-OCT-89 4HHB 3 MTRIX
REVDAT 1 17-JUL-84 4HHB 0
SPRSDE 17-JUL-84 4HHB 1HHB
Remarks: the numbers mean something
REMARK 0
REMARK 0 THIS ENTRY (2Q41) REFLECTS AN ALTERNATIVE MODELING OF THE
REMARK 0 ORIGINAL STRUCTURAL DATA (R1XJ5SF) DETERMINED BY AUTHORS
REMARK 0 OF THE PDB ENTRY 1XJ5: G.E.WESENBERG,D.W.SMITH,
REMARK 0 G.N.PHILLIPS JR.,E.BITTO,C.A.BINGMAN,S.T.M.ALLARD,
REMARK 0 CENTER FOR EUKARYOTIC STRUCTURAL GENOMICS (CESG).
Data collection details:
X-ray source, detector, data collection details (200)
Fiber diffraction (205)
NMR (210, 215, 217)
Neutron diffraction (230)
Electron crystallography (240)
Electron Microscopy (245)
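The remark numbers listed above can be used to guess the experimental technique from the REMARK lines of a file. A minimal Python sketch, using only the numbers on this slide; the names `DATA_COLLECTION_REMARKS` and `methods_from_remarks` and the sample header lines are illustrative and not part of any PDB library:

```python
# Remark numbers for data collection details, as listed on the slide above.
DATA_COLLECTION_REMARKS = {
    200: "X-ray diffraction",
    205: "Fiber diffraction",
    210: "NMR",
    215: "NMR",
    217: "NMR",
    230: "Neutron diffraction",
    240: "Electron crystallography",
    245: "Electron microscopy",
}


def methods_from_remarks(lines):
    """Return the set of techniques whose remark numbers occur in `lines`."""
    found = set()
    for line in lines:
        fields = line.split()
        # A REMARK line starts with the keyword followed by the remark number.
        if len(fields) >= 2 and fields[0] == "REMARK" and fields[1].isdigit():
            method = DATA_COLLECTION_REMARKS.get(int(fields[1]))
            if method:
                found.add(method)
    return found


# Hypothetical header fragment, for illustration only.
example = [
    "REMARK   3 REFINEMENT.",
    "REMARK 200 EXPERIMENTAL DETAILS",
    "REMARK 280 CRYSTAL",
]
print(methods_from_remarks(example))
```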
Crystallographic details:
Vm, Matthews coefficient
Crystallographic symmetry
Remark 3
Data from each refinement software has its own template and details
Remarks: the numbers mean something
Biological assembly information
Example of a virus (1AYN)
Remarks
Compound details
Missing residues, atoms
Geometry: close contacts, bond length, angle and torsion deviations, stereochemistry
Ligand details
Related entries
Sequence details
Chemistry sections:
Primary Structure & Ligand
DBREF 1BH0 A 1 29 UNP P01275 GLUC_HUMAN 53 81
SEQADV 1BH0 LYS A 17 UNP P01275 ARG 69 ENGINEERED
SEQADV 1BH0 LYS A 18 UNP P01275 ARG 70 ENGINEERED
SEQADV 1BH0 GLU A 21 UNP P01275 ASP 73 ENGINEERED
SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR
SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU
SEQRES 3 A 29 MET ASN THR
MODRES 2F4K NLE A 65 LEU NORLEUCINE
MODRES 2F4K NLE A 70 LEU NORLEUCINE
HET PO4 D 147 1
HET PO4 B 147 1
HET HEM A 142 43
HET HEM B 148 43
HET HEM C 142 43
HET HEM D 148 43
HETNAM PO4 PHOSPHATE ION
HETNAM HEM PROTOPORPHYRIN IX CONTAINING FE
HETSYN HEM HEME
FORMUL 5 PO4 2(O4 P 3-)
FORMUL 7 HEM 4(C34 H32 FE N4 O4)
FORMUL 11 HOH *221(H2 O)
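The SEQRES records above spread one chain over several lines. As a rough sketch, they can be joined and translated to a one-letter sequence in Python. The function name `seqres_to_sequences` is illustrative, and the parser assumes whitespace-separated fields with a non-blank chain ID, which is a simplification of the fixed-column layout:

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequences(lines):
    """Collect SEQRES residues per chain and return one-letter strings.

    Assumed field order: SEQRES serNum chainID numRes res1 res2 ...
    Residues outside the standard 20 are written as 'X'.
    """
    chains = {}
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "SEQRES":
            chain_id, residues = fields[2], fields[4:]
            chains.setdefault(chain_id, []).extend(residues)
    return {
        chain: "".join(THREE_TO_ONE.get(res, "X") for res in residues)
        for chain, residues in chains.items()
    }


# The SEQRES records for entry 1BH0 shown above.
records = [
    "SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR",
    "SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU",
    "SEQRES 3 A 29 MET ASN THR",
]
sequences = seqres_to_sequences(records)
print(sequences["A"], len(sequences["A"]))
```

The length of the joined sequence can be compared with the residue count declared in the fourth field (29 here) as a quick consistency check.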
Secondary Structure & Connectivity
HELIX 1 AA SER A 3 GLY A 18 1 16
HELIX 2 AB HIS A 20 SER A 35 1 16
HELIX 3 AC PHE A 36 TYR A 42 1 7
SHEET 1 A 4 ILE A 18 LEU A 23 0
SHEET 2 A 4 LEU A 111 VAL A 118 -1 O GLY A 115 N TRP A 19
SSBOND 1 CYS A 6 CYS A 127 1555 1555 2.02
SSBOND 2 CYS A 30 CYS A 115 1555 1555 2.02
SSBOND 3 CYS A 64 CYS A 80 1555 1555 2.03
SSBOND 4 CYS A 76 CYS A 94 1555 1555 2.01
LINK NE2 HIS A 87 FE HEM A 143 1555 1555 1.94
LINK NE2 HIS B 92 FE HEM B 147 1555 1555 2.07
LINK FE HEM B 147 O1 OXY B 150 1555 1555 1.87
LINK FE HEM A 143 O1 OXY A 150 1555 1555 1.66
CISPEP 1 PRO A 98 PRO A 99 0 0.53
CISPEP 2 GLY A 109 PRO A 110 0 -0.01
Miscellaneous
SITE 1 ACT 3 HIS H 57 ASP H 102 SER H 195
SITE 1 AC1 12 HIS H 57 ASN H 98 LEU H 99 ILE H 174
SITE 2 AC1 12 ASP H 189 ALA H 190 SER H 195 TRP H 215
SITE 3 AC1 12 GLY H 216 GLY H 219 HOH H 264 HOH H 270
REMARK 800 SITE
REMARK 800 SITE_IDENTIFIER: ACT
REMARK 800 EVIDENCE_CODE: AUTHOR
REMARK 800 SITE_DESCRIPTION: CATALYTIC SITE
REMARK 800 SITE_IDENTIFIER: AC1
REMARK 800 EVIDENCE_CODE: SOFTWARE
REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MID H 1
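A SITE description can run over several continuation lines, as site AC1 does above. The sketch below merges those lines per site and compares the number of residues found with the declared count. It is a simplified illustration: `parse_sites` is a hypothetical helper, and it assumes whitespace-separated fields with non-blank chain IDs rather than the fixed columns of the format guide.

```python
def parse_sites(lines):
    """Merge SITE continuation lines into one entry per site identifier.

    Assumed field order: SITE seqNum siteID numRes, then groups of
    (resName, chainID, resSeq).
    """
    sites = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SITE":
            continue
        site_id, declared = fields[2], int(fields[3])
        rest = fields[4:]
        # Residues come in groups of three fields: name, chain, number.
        residues = [tuple(rest[i:i + 3]) for i in range(0, len(rest), 3)]
        entry = sites.setdefault(site_id, {"declared": declared, "residues": []})
        entry["residues"].extend(residues)
    return sites


# The SITE records shown above.
records = [
    "SITE 1 ACT 3 HIS H 57 ASP H 102 SER H 195",
    "SITE 1 AC1 12 HIS H 57 ASN H 98 LEU H 99 ILE H 174",
    "SITE 2 AC1 12 ASP H 189 ALA H 190 SER H 195 TRP H 215",
    "SITE 3 AC1 12 GLY H 216 GLY H 219 HOH H 264 HOH H 270",
]
sites = parse_sites(records)
for site_id, entry in sites.items():
    complete = len(entry["residues"]) == entry["declared"]
    print(site_id, entry["declared"], len(entry["residues"]), complete)
```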
Crystallographic info, Coordinate Transformations & coordinates
CRYST1 88.814 95.207 89.164 90.00 104.96 90.00 P 1 21 1 8
ORIGX1 1.000000 0.000000 0.000000 0.00000
ORIGX2 0.000000 1.000000 0.000000 0.00000
ORIGX3 0.000000 0.000000 1.000000 0.00000
SCALE1 0.011259 0.000000 0.003009 0.00000
SCALE2 0.000000 0.010503 0.000000 0.00000
SCALE3 0.000000 0.000000 0.011609 0.00000
MODEL 1
ATOM 1 N SER A 41 -9.122 -10.304 89.511 0.12 51.94 N
ATOM 2 CA SER A 41 -8.282 -11.187 88.650 0.12 52.75 C
ATOM 3 C SER A 41 -7.051 -11.693 89.414 0.12 52.51 C
ATOM 4 O SER A 41 -6.646 -11.108 90.421 0.12 53.15 O
ATOM 5 CB SER A 41 -7.845 -10.416 87.393 0.12 51.93 C
ATOM 6 OG SER A 41 -7.250 -11.264 86.423 0.12 52.59 O
ATOM 7 N THR A 42 -6.473 -12.792 88.935 0.12 51.75 N
ATOM 8 CA THR A 42 -5.290 -13.380 89.552 0.12 50.38 C
...
ENDMDL
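The SCALE records above are not independent data: they can be derived from the cell parameters in CRYST1. The Python sketch below rebuilds the matrix, assuming the usual orientation with a along x and b in the xy plane, which the identity ORIGX matrix above suggests but does not prove for every entry. The function name `scale_matrix` is illustrative.

```python
import math


def scale_matrix(a, b, c, alpha, beta, gamma):
    """Return the 3x3 matrix mapping orthogonal (angstrom) to fractional coordinates.

    Assumes a along x and b in the xy plane. Angles are in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit cell volume.
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    # Orthogonalization matrix (fractional -> orthogonal), upper triangular.
    o11, o12, o13 = a, b * cg, c * cb
    o22, o23 = b * sg, c * (ca - cb * cg) / sg
    o33 = volume / (a * b * sg)
    # Its inverse, written out element by element, is the SCALE matrix.
    return [
        [1 / o11, -o12 / (o11 * o22), (o12 * o23 - o13 * o22) / (o11 * o22 * o33)],
        [0.0, 1 / o22, -o23 / (o22 * o33)],
        [0.0, 0.0, 1 / o33],
    ]


# Cell parameters from the CRYST1 record above.
scale = scale_matrix(88.814, 95.207, 89.164, 90.00, 104.96, 90.00)
for row in scale:
    print(" ".join(f"{value:10.6f}" for value in row))
```

Within rounding, the diagonal and the upper-right element agree with SCALE1 to SCALE3 above (0.011259, 0.010503, 0.011609 and 0.003009).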
Coordinate section: a closer look
Alternate conformer ID
ATOM 49 N GLY A 8 2.326 4.110 1.416 1.00 42.03 N
ATOM 50 CA GLY A 8 3.121 3.079 2.065 1.00 42.27 C
ATOM 51 C GLY A 8 3.533 3.408 3.476 1.00 42.32 C
ATOM 52 O GLY A 8 4.302 2.642 4.092 1.00 44.09 O
ATOM 53 N GLY A 9 3.080 4.526 4.038 1.00 40.18 N
ATOM 54 CA GLY A 9 3.330 4.880 5.396 1.00 40.11 C
ATOM 55 C GLY A 9 4.552 5.685 5.709 1.00 39.75 C
ATOM 56 O GLY A 9 4.720 6.098 6.885 1.00 40.96 O
ATOM 57 N ASER A 10 5.404 6.014 4.753 0.33 39.21 N
ATOM 58 CA ASER A 10 6.598 6.814 5.042 0.33 38.11 C
ATOM 59 C ASER A 10 6.236 8.234 5.479 0.33 36.87 C
ATOM 60 O ASER A 10 5.150 8.733 5.233 0.33 32.77 O
ATOM 61 CB ASER A 10 7.516 6.864 3.822 0.33 39.46 C
ATOM 62 OG ASER A 10 8.894 6.884 4.237 0.33 40.79 O
ATOM 63 N BGLY A 10 5.404 6.014 4.753 0.67 39.21 N
ATOM 64 CA BGLY A 10 6.598 6.814 5.042 0.67 38.11 C
ATOM 65 C BGLY A 10 6.236 8.234 5.479 0.67 36.87 C
ATOM 66 O BGLY A 10 5.150 8.733 5.233 0.67 32.77 O
Microheterogeneity (1ENM)
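In the example above, residue 10 of chain A is modeled as SER (conformer A, occupancy 0.33) and as GLY (conformer B, occupancy 0.67). The Python sketch below reads the alternate conformer ID and occupancy by column position and checks that the occupancies of each atom add up to 1. The helpers `atom_line` and `parse_atom` are illustrative; `atom_line` exists only to rebuild correctly spaced records, because the spacing of the excerpt above was lost in extraction.

```python
from collections import defaultdict


def atom_line(serial, name, alt, res, chain, seq, x, y, z, occ, b, elem):
    """Build an ATOM record using the fixed-column layout (78 columns here)."""
    return (f"ATOM  {serial:5d} {name:<4}{alt:1}{res:>3} {chain:1}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}          {elem:>2}")


def parse_atom(line):
    """Slice an ATOM record by column (1-based column numbers in comments)."""
    return {
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "occupancy": float(line[54:60]),  # columns 55-60
    }


# Four of the atoms of residue 10 shown above.
records = [
    atom_line(57, " N  ", "A", "SER", "A", 10, 5.404, 6.014, 4.753, 0.33, 39.21, "N"),
    atom_line(58, " CA ", "A", "SER", "A", 10, 6.598, 6.814, 5.042, 0.33, 38.11, "C"),
    atom_line(63, " N  ", "B", "GLY", "A", 10, 5.404, 6.014, 4.753, 0.67, 39.21, "N"),
    atom_line(64, " CA ", "B", "GLY", "A", 10, 6.598, 6.814, 5.042, 0.67, 38.11, "C"),
]

occupancy_totals = defaultdict(float)
residue_names = defaultdict(set)
for atom in map(parse_atom, records):
    occupancy_totals[(atom["chain"], atom["res_seq"], atom["name"])] += atom["occupancy"]
    residue_names[(atom["chain"], atom["res_seq"])].add(atom["res_name"])

print(dict(occupancy_totals))
print(dict(residue_names))
```

A position whose set of residue names has more than one member is a case of microheterogeneity.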
Insertion codes
ATOM 1 N GLU L 1C 63.677 26.331 17.947 1.00 31.77 N
ATOM 2 CA GLU L 1C 64.338 26.818 16.736 1.00 35.78 C
ATOM 3 C GLU L 1C 63.351 27.360 15.717 1.00 41.73 C
ATOM 4 O GLU L 1C 63.320 28.565 15.489 1.00 49.37 O
ATOM 5 CB GLU L 1C 65.320 25.825 16.101 1.00 38.64 C
ATOM 6 N ALA L 1B 62.537 26.499 15.096 1.00 36.03 N
ATOM 7 CA ALA L 1B 61.571 26.988 14.116 1.00 33.01 C
ATOM 8 C ALA L 1B 60.631 28.018 14.729 1.00 32.42 C
ATOM 9 O ALA L 1B 60.238 27.865 15.872 1.00 31.68 O
ATOM 10 CB ALA L 1B 60.810 25.845 13.511 1.00 33.36 C
ATOM 11 N ASP L 1A 60.262 29.089 14.012 1.00 33.13 N
ATOM 12 CA ASP L 1A 59.378 30.016 14.691 1.00 35.05 C
ATOM 13 C ASP L 1A 57.965 29.526 14.760 1.00 31.74 C
ATOM 14 O ASP L 1A 57.476 28.873 13.851 1.00 36.72 O
ATOM 15 CB ASP L 1A 59.593 31.557 14.587 1.00 41.32 C
ATOM 16 CG ASP L 1A 58.724 32.268 13.564 1.00 46.17 C
ATOM 17 OD1 ASP L 1A 57.452 32.455 13.924 1.00 47.60 O
ATOM 18 OD2 ASP L 1A 59.188 32.658 12.472 1.00 48.99 O
ATOM 19 N CYS L 1 57.321 29.802 15.860 1.00 22.52 N
ATOM 20 CA CYS L 1 56.005 29.353 16.036 1.00 15.35 C
ATOM 21 C CYS L 1 55.351 30.160 17.077 1.00 15.83 C
ATOM 22 O CYS L 1 56.002 30.636 17.968 1.00 18.73 O
Residue numbering (1DWD)
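The records above contain four different residues that all carry residue number 1 in chain L, distinguished only by the insertion code. The short Python sketch below shows why a residue key needs all three parts. The tuples are copied from the excerpt, and the variable names are illustrative.

```python
# Residue identifiers (chain, resSeq, iCode) in file order, from the records above.
residues_in_file_order = [
    ("L", 1, "C"),  # GLU
    ("L", 1, "B"),  # ALA
    ("L", 1, "A"),  # ASP
    ("L", 1, ""),   # CYS, no insertion code
]

# Keying on chain and residue number alone merges four residues into one.
by_number_only = {(chain, seq) for chain, seq, _ in residues_in_file_order}

# Including the insertion code keeps them distinct.
by_full_id = set(residues_in_file_order)

# Sorting on the insertion code does not reproduce the chain order here
# (the file runs 1C, 1B, 1A, 1), so file order is the safer sequence order.
sorted_ids = sorted(residues_in_file_order)

print(len(by_number_only), len(by_full_id), sorted_ids == residues_in_file_order)
# prints: 1 4 False
```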
Connectivity & Bookkeeping
CONECT 73 80
CONECT 80 73 81
CONECT 81 80 82 84
CONECT 82 81 83 88
MASTER 2487 0 28 47 52 0 0 673322 31 280 104
END
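Each CONECT record above lists one atom serial number followed by the serial numbers of the atoms bonded to it. A minimal Python sketch that turns them into a bond table follows. `parse_conect` is an illustrative name, and splitting on whitespace is an assumption that only holds while serial numbers are short enough not to fill their five-character columns.

```python
def parse_conect(lines):
    """Return {atom_serial: set of bonded serials} from CONECT records."""
    bonds = {}
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "CONECT":
            atom, *partners = map(int, fields[1:])
            bonds.setdefault(atom, set()).update(partners)
    return bonds


# The CONECT records shown above.
records = [
    "CONECT 73 80",
    "CONECT 80 73 81",
    "CONECT 81 80 82 84",
    "CONECT 82 81 83 88",
]
bonds = parse_conect(records)

# Bonds that appear in one direction only within this excerpt.
one_way = sorted(
    (atom, partner)
    for atom, partners in bonds.items()
    for partner in partners
    if atom not in bonds.get(partner, set())
)
print(bonds)
print(one_way)
```

In a complete file each bond is normally listed from both atoms, so one-way bonds in this excerpt simply point at records that were not shown.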
The PDB format guide
Located at
– http://www.wwpdb.org/documentation/format32/v3.2.html
Defines all the records that appear in the PDB file
Includes templates for all records and remarks
www.wwpdb.org
Keeping track of all the information
PDB format file is a report from a database
The database is built on the PDB exchange and chemical component dictionaries
The PDB exchange dictionary captures every piece of data from the PDB format file to build the mmCIF format file
Validation uses dictionaries to
– Check inter-relationships between different data components
– Match information to chemical component dictionary
-snip-
PDB Format vs mmCIF Format
PDB format file
– 80 characters wide
– Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
– Includes name, source and sequence of all polymers
– Can include a maximum of 62 chains and 99999 atoms
mmCIF format file
– Free format
– Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
– Includes name, source and sequence of all polymers
– No restriction on the number of chains or atoms in the file
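The two limits quoted for the PDB format follow from its fixed column widths. The small Python calculation below reproduces them. It assumes that the chain ID is a single character drawn from upper-case letters, lower-case letters and digits; that choice of character set is an assumption made here, not something stated on the slide.

```python
import string

# One column for the chain ID. Assumed character set: A-Z, a-z and 0-9.
chain_ids = string.ascii_uppercase + string.ascii_lowercase + string.digits
max_chains = len(chain_ids)

# Five columns for the atom serial number: the largest five-digit integer.
max_atoms = 10**5 - 1

print(max_chains, max_atoms)
# prints: 62 99999
```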
Dictionaries
PDB Exchange (pdbx) dictionary
– (http://mmcif.pdb.org/)
– Includes the syntax, definitions, relations, boundaries
– Includes examples for the contents of the mmCIF format file.
Chemical Component Dictionary
– Describes all residues in the PDB files (standard, non-standard amino acids, nucleotides and other ligands - ions, drugs, cofactors, inhibitors)
– 1-3 alphanumeric character identifier
– Includes model & idealized coordinates for components, connectivities, name, formula, SMILES strings
– Maintained by the wwPDB.
– Used for data processing and validation of structures
Ligand cif file
Ligand Expo - Search Options
Ligand Expo – Substructure Search
Ligand Expo - Browse Options
-snip-
Diagram: from PDB format file to mmCIF format file
– PDB Exchange Dictionary includes syntax & definitions for mmCIF format files
– Instance of valine matched to VAL in the Chemical Component Dictionary
Downloading files
Coordinate
– PDB
80 characters wide
Created for X-ray structures
Updated for NMR, EM and other methods
– mmCIF
More flexible format
Based on mmCIF (PDBX) dictionary
– PDBML
XML translation of mmCIF format files
– Biological Unit
Experimental data
– SF files
Distributed in mmCIF format
– Constraints file
Validated by BMRB
Archive download
The ftp archive
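Files in the ftp archive are grouped into subdirectories named after the middle two characters of the PDB ID. The Python sketch below builds the relative paths for one entry. The directory layout and file names are stated here as an assumption about the archive's "divided" tree and are not taken from the slide; `divided_paths` is a hypothetical helper, and no download is attempted.

```python
def divided_paths(pdb_id):
    """Return assumed relative archive paths for a four-character PDB ID."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    middle = pdb_id[1:3]  # subdirectory name, e.g. 'hh' for 4hhb
    root = "pub/pdb/data/structures/divided"  # assumed archive layout
    return {
        "pdb": f"{root}/pdb/{middle}/pdb{pdb_id}.ent.gz",
        "mmCIF": f"{root}/mmCIF/{middle}/{pdb_id}.cif.gz",
    }


paths = divided_paths("4HHB")
for file_format, path in paths.items():
    print(file_format, path)
```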
RCSB PDB website
The Structure Summary page
Asymmetric and Biological Unit
Structure Analysis (RCSB tables)
Validation
Quality assessment
Is the structure well determined overall?
Is the structure suitable for your analysis and/or modeling requirements?
Are local regions that you are interested in well determined?
When to Validate?
Diagram: validation points in the data pipeline
Stages: Refinement, Data Deposition (Depositor), Primary Annotation, Validation, PDB Entry, Archival Data, Core Database, Distribution Site, Download Data, Use of PDB data
Steps:
– Step 1: PDB ID
– Step 2: Validation Report
– Step 3: Corrections
– Step 4: Depositor Approval
– Step 5: Functional Annotation
Validation
What is validated?
Chemistry
– Of polymer (match to DB and internal consistency)
– Of ligands, ions, inhibitors (match to dictionary)
Geometry
– Close contacts
– Bond length, angle, torsion etc. deviations
– Ramachandran plot
Experimental data
– SF check
– R factors
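As an illustration of the geometry checks listed above, the Python sketch below measures one bond length from ATOM coordinates and compares it with a reference value. The coordinates are the N and CA atoms of SER A 41 from the coordinate excerpt earlier in this deck. The reference length of about 1.458 angstroms for a peptide N-CA bond is an assumed textbook value, and real validation software uses full restraint libraries and statistical cutoffs rather than a single number.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points, in angstroms."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# N and CA of SER A 41 from the ATOM records shown earlier.
n_atom = (-9.122, -10.304, 89.511)
ca_atom = (-8.282, -11.187, 88.650)

IDEAL_N_CA = 1.458  # assumed reference bond length, in angstroms

bond = distance(n_atom, ca_atom)
deviation = bond - IDEAL_N_CA
print(f"N-CA = {bond:.3f}, deviation = {deviation:+.3f}")
```

The same distance function applied to atoms that are not bonded gives a simple close-contact test once a minimum allowed separation is chosen.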
How to validate?
MolProbity
EDS server
PROCHECK
WHAT_CHECK/WHAT IF
Validation server at RCSB PDB
Electron Density Server report
Real-space R-value
Validation at RCSB PDB
Summary
Exploring the PDB format file
– Documentation available from the wwPDB website
File formats and dictionaries
– Documentation and links available from the wwPDB website
Finding PDB format files and other files
– Links available from wwPDB and RCSB PDB websites
Validation
– Links available from RCSB PDB website
NSF, NIGMS, DOE, NLM, NCI,
NINDS, NIDDK
Wellcome Trust, EU,
CCP4, BBSRC, MRC, EMBL
BIRD-JST, MEXT NLM